
Senior Data Engineer – Cloud-Native Data Platforms & ML Pipelines

Location:
Levittown, NY, 11756
Salary:
75000
Posted:
December 11, 2025


Resume:

Divya Shah

*****.*@**************.*** +1-628-***-**** LinkedIn GitHub Portfolio

SUMMARY

Dynamic Data Engineer with 4+ years of experience building large-scale, cloud-native data platforms and AI-enabled pipelines across AWS, Azure, and GCP. Skilled in Spark, PySpark, Databricks, SQL, Python, Kafka, Snowflake, and advanced ETL/ELT frameworks, with a strong track record delivering high-impact projects including statewide data modernization, multi-terabyte analytics systems, and ML-driven automation pipelines processing 1B+ records daily. Proven success improving data quality by 95%, accelerating query performance by 40–85%, reducing operational costs by $1M+, and supporting business ecosystems worth $2B+ in annual revenue. Adept in data modeling, warehousing, MDM, Airflow orchestration, serverless ingestion, API integrations, and end-to-end architecture leadership for government, healthcare, and insurance clients.

PROFESSIONAL EXPERIENCE

JP Morgan Chase & Co., NY May 2024 – Present

Data Engineer

• Led end-to-end Data Architecture & Integration Strategy for the State of Hawaii’s $120M Child Welfare Modernization Program, directing cross-state teams (HI, GA, IN) and enabling migration from legacy systems to the Cardinality AI platform supporting 10M+ annual case transactions.

• Designed enterprise Data Architecture Models, Compliance Management Frameworks, and star-schema warehouses, improving analytical query performance by 40%, scalability by 35%, and establishing a unified single source of truth for statewide child welfare reporting.

• Engineered and validated complex SQL, ETL pipelines, and S2T mapping frameworks to migrate 50M+ records with 99.2% accuracy, eliminating multi-year data quality gaps and reducing downstream reconciliation efforts by $1.2M annually.
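The S2T (source-to-target) migration pattern above can be sketched as follows. This is a minimal illustration, not the actual project code: the table names, column mapping, and row-count reconciliation check are all hypothetical stand-ins, here run against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical source-to-target (S2T) column mapping; the real mapping
# frameworks covered 50M+ records across many legacy tables.
S2T_MAPPING = {
    "case_id": "CaseNumber",
    "child_name": "ClientFullName",
    "opened_on": "CaseOpenDate",
}

def migrate_and_validate(conn):
    """Copy mapped columns from legacy to target, then verify row counts."""
    cur = conn.cursor()
    src_cols = ", ".join(S2T_MAPPING.keys())
    tgt_cols = ", ".join(S2T_MAPPING.values())
    cur.execute(f"INSERT INTO target ({tgt_cols}) SELECT {src_cols} FROM legacy")
    src_count = cur.execute("SELECT COUNT(*) FROM legacy").fetchone()[0]
    tgt_count = cur.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    return src_count, tgt_count

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE legacy (case_id TEXT, child_name TEXT, opened_on TEXT);
CREATE TABLE target (CaseNumber TEXT, ClientFullName TEXT, CaseOpenDate TEXT);
INSERT INTO legacy VALUES ('C-1', 'A. Doe', '2020-01-15'), ('C-2', 'B. Roe', '2021-06-02');
""")
src, tgt = migrate_and_validate(conn)
assert src == tgt  # row-count reconciliation passes
```

A production framework would extend this with column-level checksums and type/nullability checks to reach the accuracy figures cited above.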

• Built secure, automated AWS serverless ingestion pipelines (S3, Lambda, CloudWatch), cutting manual processing by 45%, enabling real-time ingestion of 500K+ daily records, and reducing operational overhead by $600K/year.
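The S3-triggered Lambda ingestion described above follows a standard event-driven shape. A minimal handler sketch is shown below; the bucket and key names are illustrative, and the actual object read (via boto3) is noted in a comment rather than performed, so the sketch stays self-contained.

```python
import json
import urllib.parse

def lambda_handler(event, context):
    """Minimal AWS Lambda handler for S3 ObjectCreated events.

    The production pipeline would fetch each object with boto3 and load it
    downstream; here we only extract bucket/key from the event payload.
    """
    records = []
    for rec in event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        # S3 event keys are URL-encoded; '+' stands for a space.
        key = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])
        # Real implementation: s3.get_object(Bucket=bucket, Key=key)
        records.append({"bucket": bucket, "key": key})
    return {"statusCode": 200, "body": json.dumps(records)}

# Example S3 event, trimmed to the fields the handler uses (names hypothetical).
event = {"Records": [{"s3": {"bucket": {"name": "intake-bucket"},
                             "object": {"key": "daily/records+2024.csv"}}}]}
result = lambda_handler(event, None)
```

CloudWatch then alarms on handler errors and throughput, which is what makes the pipeline hands-off in practice.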

• Acted as Lead Integration Engineer on Boomi, architecting REST + SOAP API ecosystems for bi-directional data exchange across state agencies (e.g., GA DHS), enabling sub-second data synchronization, and improving interoperability and compliance by 50%.

• Provided technical leadership across multiple state programs, ensuring full data migration readiness, integration compliance, and successful deployment of Cardinality AI products (Communicare DSP Invoicing & Foster Parent Invoicing), contributing to 2 major go-lives impacting $900M+ annual state funding workflows.

Hexaware Technologies, India Apr 2021 – Aug 2023

Data Engineer

• Designed and optimized large-scale ETL & ML pipelines on Azure Databricks (PySpark, Python) processing 10+ TB of enterprise data (policy, claims, EHR, customer, market), improving data availability for analytics by 40% and cutting ingestion time by 20%, enabling insights across business units handling $2B+ in annual policy revenue.

• Automated data validation, transformation, and reconciliation workflows using Python & PySpark, reducing manual processing by 30% and cutting data quality issues by 95%, saving 1,500+ analyst hours annually and accelerating actuarial & finance cycles.
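The reconciliation workflow above boils down to diffing source and target record sets by key. The production version ran in PySpark; this pure-Python stand-in (with hypothetical field names) shows the same logic.

```python
def reconcile(source_rows, target_rows, key="id"):
    """Compare source and target record sets; report keys that are
    missing from the target, extra in the target, or mismatched."""
    src = {r[key]: r for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    missing = sorted(set(src) - set(tgt))
    extra = sorted(set(tgt) - set(src))
    mismatched = sorted(k for k in set(src) & set(tgt) if src[k] != tgt[k])
    return {"missing": missing, "extra": extra, "mismatched": mismatched}

# Illustrative data: record 3 never landed, record 2 was corrupted in flight.
source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}, {"id": 3, "amount": 75}]
target = [{"id": 1, "amount": 100}, {"id": 2, "amount": 999}]
report = reconcile(source, target)
```

Automating this diff per load is what turns manual spot checks into the 95% reduction in data quality issues cited above.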

• Built a fully automated ML modeling pipeline generating 200–250 models/run with feature selection, EDA, hyperparameter tuning, and automatic best-model promotion, increasing modeling accuracy, reliability, and deployment speed for marketing & underwriting teams.
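The best-model promotion step above reduces to scoring every candidate on held-out data and keeping the winner. A minimal sketch follows; the candidate "models" here are simple callables standing in for the fully trained AutoML estimators, and the metric (MSE) is one plausible choice.

```python
def mse(y_true, y_pred):
    """Mean squared error over paired sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def promote_best(candidates, X_val, y_val):
    """Score every candidate on validation data; return the best name.

    `candidates` maps a model name to a predict-style callable. In the
    real pipeline each entry was a trained model from the AutoML run.
    """
    scores = {name: mse(y_val, [model(x) for x in X_val])
              for name, model in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Tiny illustrative validation set: the target is exactly 2x.
X_val = [1.0, 2.0, 3.0]
y_val = [2.0, 4.0, 6.0]
candidates = {
    "linear_2x": lambda x: 2 * x,      # perfect fit
    "biased": lambda x: 2 * x + 1.0,   # constant offset
}
best, scores = promote_best(candidates, X_val, y_val)
```

At 200–250 models per run, this selection loop is what makes automatic promotion practical.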

• Leveraged AutoML, MLflow, and Azure MLOps to improve predictive accuracy of CLV and churn models by 15%, reducing time-to-production from 3 weeks to 5 days and supporting business decisions impacting millions of customers.

• Performed advanced analytics (Python, SQL, Cox models, time series forecasting) on claims severity, policy lapsation, and clinical outcomes to support risk prediction and pricing strategies, influencing initiatives worth $50M+ in revenue impact.

• Delivered interactive dashboards using Power BI & Tableau for segmentation, policy performance, and clinical metrics, supporting real-time decision-making and increasing customer conversion and retention rates by 20% across digital channels.

KPIT Technologies, India Dec 2020 – Apr 2021

Junior Data Engineer

• Engineered and optimized cloud-native data pipelines across AWS, Snowflake, and Redshift to process 1B+ records daily, improving pipeline reliability by 40% and supporting data workloads exceeding $50M+ in annual business reporting.

• Orchestrated 15+ Airflow DAGs and 50+ Control-M jobs to automate enterprise data refresh cycles, achieving a 99.8% on-time pipeline success rate and reducing operational overhead by 35%.

• Built high-performance ETL pipelines using Apache Spark, Pandas, and NumPy to process multi-terabyte datasets, reducing transformation time by 43% and increasing system throughput by 20% on AWS EMR.
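A representative transform from such a pipeline is a keyed aggregation. In production this ran as a Spark/Pandas job on EMR; the stand-in below shows the same shape on plain Python dicts with hypothetical field names.

```python
from collections import defaultdict

def transform(rows):
    """Aggregate raw event rows into per-customer totals.

    The Spark equivalent is a groupBy("customer_id").sum("amount");
    this version makes the logic visible without a cluster.
    """
    per_customer = defaultdict(float)
    for row in rows:
        per_customer[row["customer_id"]] += row["amount"]
    return dict(per_customer)

# Illustrative input batch.
rows = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c2", "amount": 5.5},
    {"customer_id": "c1", "amount": 4.5},
]
totals = transform(rows)
```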

• Designed and deployed star-schema data models consolidating 12+ enterprise source systems, improving analytical query performance by 35% and accelerating business-reporting cycles for U.S. stakeholders.
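The star-schema pattern above (one fact table joined to conformed dimensions) can be sketched in a few lines. All table and column names below are illustrative, not taken from the actual source systems, and the example runs against in-memory SQLite.

```python
import sqlite3

# Minimal star schema: one fact table joined to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (customer_key INTEGER, date_key INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'Northeast'), (2, 'West');
INSERT INTO dim_date VALUES (20240101, 2024);
INSERT INTO fact_sales VALUES (1, 20240101, 120.0), (2, 20240101, 80.0), (1, 20240101, 30.0);
""")

# Typical analytical query: revenue by region for a given year.
rows = conn.execute("""
    SELECT c.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    JOIN dim_date d ON f.date_key = d.date_key
    WHERE d.year = 2024
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
```

Keeping facts narrow and dimensions small is what delivers the query-time gains cited above once 12+ source systems are consolidated.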

• Tuned Snowflake & Redshift workloads using materialized views and optimized distribution/sort keys, reducing dashboard/report execution time from 12 minutes to under 90 seconds (over 85% improvement).

• Automated infrastructure provisioning with Terraform and GitHub Actions CI/CD, cutting deployment time by 70% and ensuring consistent, compliant data environments for large-scale U.S. healthcare analytics.

TECHNICAL SKILLS

• Programming Languages: Python, SQL, Scala, Java, R, Bash

• Big Data Ecosystem: Apache Spark, PySpark, Databricks, Apache Flink, Hadoop, HDFS, Hive, HBase, Kafka (Streaming & CDC), Spark Structured Streaming

• Cloud Technologies: AWS (EMR, EC2, S3, Glue, Athena, Redshift, DynamoDB, Lambda, Kinesis, Elasticsearch, Step Functions, QuickSight), Azure (Data Factory, Data Lake Gen2, Databricks, Synapse Analytics, Azure SQL, Blob Storage, Event Hub), GCP (BigQuery, Pub/Sub, Cloud Storage, Dataflow)

• Visualization & Reporting: Tableau, Power BI, AWS QuickSight, Excel (Advanced)

• ETL & Data Integration Tools: SSIS, SSRS, Informatica, Fivetran, Talend, DBT (Data Build Tool), SAS, PySpark, Tableau Prep

• Data Processing & Analytics Packages: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, MLflow, AutoML, Feature Engineering, Data Pipelines

• Version Control & Databases: Git, GitHub, GitLab, SQL Server, PostgreSQL, MySQL, Snowflake, Redshift, BigQuery, Cassandra, DynamoDB

• Data Management & Governance: Data Modeling, Dimensional Modeling, Data Warehousing, Data Quality, Metadata Management, Master Data Management (MDM), Data Lineage, Data Governance, Data Cataloging (Glue Catalog, Alation)

CERTIFICATES

AWS Certified Cloud Practitioner
AWS Certified Developer – Associate

EDUCATION

Master of Science in Computer Science

New York Institute of Technology, New York, USA

Bachelor of Engineering in Computer Engineering

Dharmsinh Desai University, Gujarat, India


