DURGAPRASAD NUTHALAPATI
Birmingham, Alabama +1-205-***-**** ************************@*****.*** LinkedIn
SUMMARY
Data Engineer with 4+ years of experience building cloud-native data platforms and real-time streaming pipelines on AWS, Azure, and GCP. Expert in designing scalable ETL/ELT frameworks using Apache Spark, Databricks, Delta Lake, Kafka, and Airflow to support both batch and streaming analytics. Proven success in enabling machine learning operations through ML-ready feature engineering, MLflow tracking, and Feature Store integration, directly supporting predictive modeling and AI-driven decisioning. Skilled in optimizing data workflows, ensuring data governance, and collaborating with cross-functional teams to deliver reliable, high-performance data infrastructure.
TECHNICAL SKILLS
● Programming & Languages: Python, SQL, PySpark, Scala, Java, Shell Scripting
● Data Engineering & Pipelines: ETL/ELT Development, Data Pipeline Architecture, Spark Structured Streaming, Data Modeling, Data Quality Validation, Metadata Management, Data Governance, Pipeline Optimization, dbt
● Cloud Platforms: AWS (S3, Glue, EMR, Lambda, Redshift), GCP (BigQuery, DataProc), Azure (Synapse)
● Data Platforms & Tools: Apache Spark, Delta Lake, Apache Airflow, Apache Kafka, Snowflake, Databricks, MLflow, Terraform, Jenkins, Git, GitHub Actions, Prometheus, Grafana, Apache Atlas
● Machine Learning: Feature Engineering, ML Data Pipelines, MLflow Experiment Tracking, Model Deployment support
● DevOps & Databases: Docker, Kubernetes, PostgreSQL, MongoDB, Cassandra
● Security & Compliance: Data Security Best Practices, Encryption, GDPR/CCPA Awareness
PROFESSIONAL EXPERIENCE
Data Engineer Epsilon AL, USA Feb 2025 – Present
● Engineered Delta Lake ingestion frameworks on Databricks using Spark Structured Streaming to unify advertising datasets and deliver ML-ready feature outputs for trading teams across 8 operational units (ingestion sketch after this role's bullets).
● Orchestrated cross-cloud ETL pipelines with Airflow and AWS Glue to consolidate 12 audience intelligence sources and enable real-time segmentation for faster model training cycles.
● Designed and deployed large-scale feature pipelines for AI-driven audience modeling using Spark, Databricks Feature Store, and vector embeddings, producing 30+ enriched features and reducing model refresh effort for data science teams.
● Enabled ML operations by implementing MLflow experiment tracking and model artifact storage, supporting 45+ experiments with consistent lineage across campaign analytics teams.
● Coordinated with data scientists and trading teams to align Delta Lake ingestion and feature pipelines with AI modeling and real-time segmentation needs.
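Illustrative sketch of the Structured Streaming ingestion pattern referenced in the first bullet above; the storage paths, event columns, and Delta table name are assumptions for illustration, not production values.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ad-events-ingest").getOrCreate()

    # Read raw advertising events with Databricks Auto Loader (path is hypothetical).
    raw = (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://example-bucket/ad-events/")
    )

    # Derive timestamp and partition columns for downstream feature pipelines.
    events = (
        raw.withColumn("event_ts", F.to_timestamp("event_time"))
           .withColumn("event_date", F.to_date("event_ts"))
    )

    # Write the stream to a bronze Delta table with checkpointing for reliable recovery.
    (
        events.writeStream
        .format("delta")
        .option("checkpointLocation", "s3://example-bucket/_checkpoints/ad_events")
        .partitionBy("event_date")
        .trigger(processingTime="1 minute")
        .toTable("analytics.ad_events_bronze")
    )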
Big Data Engineer Robosoft Technologies India Jan 2023 – May 2023
● Developed Python-based ingestion and Spark cleaning jobs to process mobility-app logs from 4 feature teams, landing transformed Parquet files on HDFS and improving downstream usability for analytics pipelines and job consumers.
● Designed partitioned Hive table schemas and lightweight Spark SQL transformations to support product metrics for 2 flagship mobile applications, enabling analysts to query usage trends against production data more reliably.
● Built API extraction utilities and Kafka producer scripts to stream device-interaction events into a staging HDFS layer across 3 app release cycles, enabling engineering teams to validate release telemetry and reproduce issues from persisted event logs.
● Documented dataset definitions, Hive table mappings, partition strategies, and source-to-target lineage for 12 foundational tables, improving handoff quality and reducing pipeline deployment rework during enhancements.
Data Engineer Tata Consultancy Services (TCS) India Aug 2019 – Dec 2022
● Designed enterprise-scale data integration layers using PySpark, SQL, and ADF to consolidate financial records across 5 business units, enabling risk analytics teams to process 10M+ operational entries with consistent lineage.
● Engineered Snowflake and PostgreSQL analytical marts supporting predictive credit-risk initiatives, allowing data scientists to train 14 forecasting models with standardized version control.
● Enhanced Azure Databricks workloads by restructuring cluster configurations, partitioning logic, and Delta Lake maintenance routines, cutting nightly pipeline duration from 7 hours to under 4.3 hours (maintenance sketch after this role's bullets).
● Established automated reporting ecosystems with Python, Power BI, and Airflow, removing 12 manual workflows and enabling refreshed insights for 9 stakeholder groups.
● Constructed Kafka-based event capture frameworks for transactional telemetry, enabling near–real-time anomaly signals that strengthened fraud-detection operations across distributed teams.
● Conducted behavioral analytics using Pandas, Spark SQL, and ML-aligned feature pipelines, producing insights that guided retention strategies across 6 regional markets.
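A minimal sketch of the kind of Delta Lake maintenance routine described in the Databricks optimization bullet above; the table names, partition column, and Z-order key are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-maintenance").getOrCreate()

    # Rewrite the hot table partitioned by the column nightly jobs filter on (names are hypothetical).
    (
        spark.read.table("finance.transactions_raw")
        .repartition("txn_date")
        .write.format("delta")
        .mode("overwrite")
        .partitionBy("txn_date")
        .saveAsTable("finance.transactions")
    )

    # Compact small files and clear stale versions during the nightly maintenance window.
    spark.sql("OPTIMIZE finance.transactions ZORDER BY (business_unit)")
    spark.sql("VACUUM finance.transactions RETAIN 168 HOURS")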
PROJECTS
Cloud-Native Real-Time Data Pipeline for Predictive Insights:
● Developed and deployed a high-performance real-time data pipeline using Kafka, Spark Structured Streaming, and Delta Lake to process 5M+ mobility events daily, enabling ML feature stores to refresh in under 90 seconds. Implemented workflow orchestration with Airflow and automated schema handling in Python, reducing operational failures across 7 stages and ensuring reliable data availability for 4 production applications.
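Orchestration sketch for the pipeline stages above, assuming a standard Airflow 2.x deployment; the DAG id, task callables, and 15-minute schedule are illustrative placeholders rather than the project's actual configuration.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def validate_schema(**_):
        # Placeholder for the automated Python schema-handling step.
        pass


    def refresh_feature_store(**_):
        # Placeholder for publishing curated features downstream.
        pass


    with DAG(
        dag_id="mobility_feature_refresh",
        start_date=datetime(2024, 1, 1),
        schedule_interval=timedelta(minutes=15),
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        validate = PythonOperator(task_id="validate_schema", python_callable=validate_schema)
        refresh = PythonOperator(task_id="refresh_feature_store", python_callable=refresh_feature_store)

        validate >> refresh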
End-to-End ML Feature Engineering & Model Serving Architecture:
● Built an automated feature-engineering pipeline using Databricks, Python, and Snowflake, transforming 30+ raw attributes for 6 predictive models. Containerized model serving with Docker and used MLflow for tracking and deploying 20+ reproducible model versions with governance.
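A minimal MLflow tracking and registration sketch in the spirit of this project; the experiment name, model, parameters, and metric value are illustrative assumptions, not recorded results.

    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LogisticRegression

    mlflow.set_experiment("audience-propensity")  # hypothetical experiment name

    with mlflow.start_run(run_name="candidate-v1"):
        model = LogisticRegression(max_iter=200)
        # model.fit(X_train, y_train) would run here on the engineered feature set.
        mlflow.log_param("max_iter", 200)
        mlflow.log_metric("auc", 0.87)  # placeholder value for illustration
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            registered_model_name="audience_propensity",  # versioned for governed deployment
        )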
EDUCATION
University of Alabama at Birmingham, Birmingham, Alabama
Master of Science in Computer Engineering Jun 2023 – Dec 2024
LB Reddy College of Engineering, India
B.Tech in Computer Science and Engineering Aug 2015 – May 2019
CERTIFICATIONS
● AWS Certified Data Engineer Associate
● Microsoft Certified Azure Data Engineer Associate