Post Job Free

Senior Data Engineer – Cloud Data Platforms Expert

Location:
Halethorpe, MD
Posted:
March 18, 2026

Contact this candidate

Resume:

Sanghavi Yelmelwar

Data Engineer

Open to Relocate | ***********@*****.*** | 202-***-**** | LinkedIn

PROFESSIONAL SUMMARY

Data Engineer with 5+ years of experience delivering high-performance data pipelines and cloud data platforms supporting financial analytics, machine learning, and real-time reporting. Skilled in Python, PySpark, Apache Spark, SQL, and distributed data processing with Kafka and Airflow. Hands-on experience with the AWS data ecosystem (S3, EMR, Glue), Azure Databricks, Snowflake, and dbt for building scalable ETL/ELT workflows and governed data models. Proven ability to integrate AI-driven workflows and optimize data infrastructure to support advanced analytics, risk modeling, and large-scale enterprise data systems.

TECHNICAL SKILLS

Programming & Query Languages: Python, SQL, PySpark, Java, Scala, R, Shell Scripting
Big Data & Distributed Processing: Apache Spark, PySpark, Apache Kafka, Hadoop (HDFS, Hive), Spark Structured Streaming
Data Engineering & ETL Pipelines: ETL/ELT Pipelines, Apache Airflow, AWS Glue, dbt Core, Real-Time Data Pipelines, Data Integration
Cloud Platforms: AWS (S3, EMR, Glue, Redshift, Lambda, Athena, EC2), Azure (Data Factory, Databricks), Cloud Data Architecture, Data Lake
Data Warehousing & Databases: Snowflake, Amazon Redshift, PostgreSQL, MySQL, SQL Server, Oracle, MongoDB, DynamoDB, Cassandra
Data Modeling & Governance: Data Modeling, Data Warehousing, Data Governance, Data Quality (Great Expectations), Query Optimization
Streaming Data Platforms: Apache Kafka, Kafka Streams, Spark Structured Streaming, Event-Driven Architecture
DevOps & Infrastructure as Code: Git, GitHub, GitLab CI/CD, Jenkins, Docker, Kubernetes (EKS, AKS), Terraform, CI/CD Pipelines
Machine Learning & AI: Feature Engineering, Machine Learning Pipelines, PyTorch, MLflow, NLP, LLM Data Processing, LangChain
Data Visualization & BI: Power BI, Tableau, Advanced Excel, KPI Dashboards
Methodologies: Agile, Scrum, SDLC, Cross-Functional Collaboration

EXPERIENCE

JPMorgan Chase – Data Engineer

Delaware, USA | February 2025 – Present

• Architected and deployed scalable AWS data infrastructure (Amazon S3, AWS EMR, AWS Glue) to consolidate multi-source financial datasets (trading, risk, client data) into a centralized data lake, reducing time-to-insight for analytics initiatives by 50%.

• Developed high-performance ETL/ELT pipelines using PySpark and Apache Spark, implementing incremental data loading, partitioning, and parallel processing, reducing data latency by 30% and improving cost efficiency in cloud workloads.
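A watermark-based incremental load of the kind described in this bullet can be sketched in plain Python with Pandas (a simplified stand-in for the PySpark jobs; the table and column names are illustrative, not from the original pipelines):

```python
from datetime import datetime

import pandas as pd

def incremental_load(source: pd.DataFrame, watermark: datetime,
                     ts_col: str = "updated_at") -> tuple[pd.DataFrame, datetime]:
    """Return only rows newer than the last watermark, plus the new watermark."""
    batch = source[source[ts_col] > watermark]
    if batch.empty:
        return batch, watermark
    return batch, batch[ts_col].max()

# Illustrative trade records; in a real pipeline these would come from S3/Glue.
trades = pd.DataFrame({
    "trade_id": [1, 2, 3],
    "updated_at": pd.to_datetime(["2025-03-01", "2025-03-02", "2025-03-03"]),
})

# Only trades 2 and 3 are newer than the 2025-03-01 watermark.
batch, new_wm = incremental_load(trades, datetime(2025, 3, 1))
```

Persisting the returned watermark between runs is what keeps each load incremental rather than a full re-read of the source.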

• Designed enterprise-grade data models and transformations using dbt and SQL, integrating market data APIs, transactional databases, and unstructured datasets to produce governed, audit-ready datasets for regulatory reporting and quantitative analytics.

• Implemented automated data orchestration and CI/CD pipelines using Apache Airflow, Jenkins, GitLab CI/CD, and Git, improving deployment reliability and reducing manual operational errors while accelerating delivery by 25%.

• Collaborated with Data Scientists and ML Engineers to build feature engineering pipelines in PySpark, supporting fraud detection and credit risk machine learning models, improving predictive accuracy by 15%.

• Integrated AI/LLM-driven data workflows using LangChain, OpenAI APIs, and Python, enabling natural language querying of financial datasets and automated reporting pipelines, reducing analyst reporting time by 40%.

Wipro – Data Engineer

India | November 2019 – December 2023

• Engineered 15+ production-grade ETL/ELT pipelines using Azure Data Factory, Databricks (PySpark), and Snowflake, migrating legacy Hadoop/Hive workloads to Azure cloud architecture, improving data availability for BI and analytics by 40%.

• Developed large-scale data streaming and batch processing pipelines using Apache Spark, Apache Kafka, and Apache Airflow, ingesting high-volume POS transaction and IoT sensor data, reducing analytics latency by 35%.

• Designed distributed real-time event processing architecture using Apache Kafka, Spark Structured Streaming, and ZooKeeper, ensuring fault-tolerant streaming pipelines with 99.9% availability for manufacturing analytics platforms.
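The tumbling-window aggregation at the heart of such a streaming pipeline can be sketched in plain Python (a stand-in for Spark Structured Streaming's windowed grouping; the event fields and window size are illustrative):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_s: int = 60):
    """Bucket (timestamp, sensor_id) events into fixed windows, counting per sensor."""
    counts: dict[tuple[int, str], int] = defaultdict(int)
    for ts, sensor_id in events:
        # Floor the timestamp to the start of its window.
        window_start = (ts // window_s) * window_s
        counts[(window_start, sensor_id)] += 1
    return dict(counts)

# Illustrative IoT sensor events: (epoch seconds, sensor id).
events = [(0, "s1"), (30, "s1"), (61, "s1"), (10, "s2")]
windows = tumbling_window_counts(events)
# → {(0, "s1"): 2, (60, "s1"): 1, (0, "s2"): 1}
```

In Spark this flooring is done by `window()` over the event-time column, with checkpointing providing the fault tolerance the bullet refers to.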

• Built a Python-based data validation and quality framework using Pandas and SQLAlchemy, improving data reliability and reducing production data issues by 45% across multiple pipelines.
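A rule-based check of the sort such a Pandas validation framework performs might look like the following (column names, rule names, and the sample data are invented for illustration):

```python
import pandas as pd

# Each rule maps a name to a predicate over the frame; a row fails if the predicate is False.
RULES = {
    "amount_non_negative": lambda df: df["amount"] >= 0,
    "store_id_present":    lambda df: df["store_id"].notna(),
}

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per rule with its pass/fail counts."""
    results = []
    for name, rule in RULES.items():
        passed = rule(df)
        results.append({"rule": name,
                        "passed": int(passed.sum()),
                        "failed": int((~passed).sum())})
    return pd.DataFrame(results)

# Illustrative POS transactions with one bad row per rule.
pos = pd.DataFrame({"amount": [10.0, -5.0, 7.5], "store_id": ["A1", "B2", None]})
report = validate(pos)
```

Keeping rules as named predicates makes the framework extensible: new checks are added as dictionary entries rather than new code paths.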

• Implemented DevOps and Infrastructure-as-Code (IaC) practices using GitHub Actions, Terraform, Docker, and Azure Kubernetes Service (AKS), accelerating release cycles by 40% and maintaining 99% production uptime.

• Created interactive Power BI dashboards delivering real-time supply chain KPIs and IoT analytics, eliminating 50% of manual reporting processes and enabling data-driven decision-making for executive stakeholders.

EDUCATION

Master’s in Data Science – University of Maryland, Baltimore, MD, USA
Bachelor’s in Electronics and Communication Engineering – J B Institute of Engineering and Technology, Hyderabad, India


