Data Engineer Machine Learning

Location:
Naperville, IL, 60563
Salary:
$80000
Posted:
September 10, 2025

Resume:

Koushik Mummadi
Data Engineer
Chicago, USA | +1-660-***-**** | ****************@*****.*** | Portfolio

Summary

Data Engineer with 5+ years of experience delivering scalable, cloud-based data solutions across Azure, AWS, and GCP platforms. Proven expertise in optimizing data pipelines, integrating diverse sources, and enhancing analytics performance. Skilled in SQL, Python, Spark, and advanced ETL processes to support business intelligence and decision-making. Adept at collaborating with cross-functional teams, ensuring data accuracy, and driving measurable operational improvements in healthcare, technology, and enterprise environments.

Skills

Programming Languages: Python, SQL, Scala, Bash/Shell
Machine Learning & AI: Generative AI, Machine Learning (ML), Anaplan
Big Data Technologies: Apache Spark, Hadoop, Hive, Kafka, Delta Lake
Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics
ETL/ELT Tools: Apache Airflow, dbt, Apache NiFi, Azure Data Factory
Cloud Platforms: AWS (S3, Glue, EMR, Lambda, Redshift), Azure (ADF, Data Lake, Synapse), GCP (BigQuery, Dataflow, Composer)
Data Modeling & Pipelines: Star/Snowflake Schema, CDC, Batch & Streaming Pipelines, Lakehouse Architecture
Databases: PostgreSQL, MySQL, MongoDB, Cassandra, DynamoDB
CI/CD & Containerization: Docker, Kubernetes, Jenkins, GitHub Actions, Terraform
Monitoring & Logging: Prometheus, Grafana, Datadog, CloudWatch, ELK Stack
Data Governance & Security: Great Expectations, Amundsen, OpenMetadata, Apache Ranger, RBAC, GDPR & HIPAA Compliance
Version Control & Collaboration: Git (GitHub, GitLab), JIRA, Confluence, Agile/Scrum

Professional Experience

UnitedHealth Group, USA April 2025 – Present

Data Engineer

Project: Claims Data Lakehouse Modernization (Azure)

• Created Azure Data Factory pipelines ingesting real-time claims from Event Hubs, reducing latency by 65% and ensuring timely availability of claim data to actuarial teams across multiple insurance risk analysis scenarios.

• Used Azure Synapse for incremental transformations to optimize storage cost by 40%, enabling high-throughput query performance for regulatory audit teams and simplifying policy review processes for compliance departments enterprise-wide.

• Modeled dimensional schemas within Azure SQL Data Warehouse, reducing complex business intelligence query time by 58% and supporting instant insights generation for operations executives handling state-wide insurance process escalations.

• Applied Great Expectations validations in Databricks for ingestion quality, achieving 97% data accuracy thresholds and minimizing corrective effort during quarterly HIPAA data audits and internal claim verification assessments.

• Scheduled Azure Data Factory workflows using Apache Airflow, replacing legacy manual orchestration and achieving 99.5% SLA adherence for nightly batch jobs across regional datasets within UnitedHealth’s health coverage reporting stack.

• Developed interactive Power BI dashboards over Synapse views, improving executive visibility into high-risk claim categories and enabling earlier intervention in fraudulent insurance behaviors across provider networks and local coverage zones.

• Used Azure Key Vault integrated with role-based access control for secret management, improving platform access security and achieving zero credential exposure incidents during internal penetration testing and quarterly audit checkpoints.

• Built Grafana dashboards to monitor Databricks job health metrics in real-time, reducing failure resolution times by 45% and ensuring seamless ingestion of insurance claims from various provider data pipelines daily.
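Purely as an illustration of the ingestion-quality checks referenced in the bullets above (not the actual UnitedHealth code), a minimal Great Expectations sketch over a claims DataFrame in Databricks might look as follows; the Delta path, column names, and the legacy 0.x SparkDFDataset API are assumptions:

```python
# Minimal sketch, assuming the legacy Great Expectations 0.x dataset API.
# The Delta path and column names below are placeholders.
from great_expectations.dataset import SparkDFDataset
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
claims_df = spark.read.format("delta").load("/mnt/bronze/claims")  # hypothetical bronze-layer path

claims_ds = SparkDFDataset(claims_df)
claims_ds.expect_column_values_to_not_be_null("claim_id")
claims_ds.expect_column_values_to_be_unique("claim_id")
claims_ds.expect_column_values_to_be_between("claim_amount", min_value=0)

results = claims_ds.validate()
if not results.success:
    # Fail the Databricks job so bad batches never reach the silver layer
    raise ValueError("Claims batch failed ingestion-quality checks")
```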
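Similarly, the Airflow orchestration of Azure Data Factory described above could reduce to a DAG along these lines; the DAG id, connection id, pipeline, resource group, and factory names are hypothetical, and the operator from apache-airflow-providers-microsoft-azure is assumed:

```python
# Sketch of an Airflow DAG triggering an ADF pipeline nightly; all names are illustrative.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.microsoft.azure.operators.data_factory import (
    AzureDataFactoryRunPipelineOperator,
)

with DAG(
    dag_id="nightly_claims_batch",                 # hypothetical DAG name
    schedule_interval="0 2 * * *",                 # nightly batch window
    start_date=datetime(2025, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    run_claims_pipeline = AzureDataFactoryRunPipelineOperator(
        task_id="run_claims_ingest",
        azure_data_factory_conn_id="adf_default",  # Airflow connection to the factory
        pipeline_name="pl_claims_ingest",          # hypothetical ADF pipeline
        resource_group_name="rg-data-platform",    # hypothetical resource group
        factory_name="adf-claims-prod",            # hypothetical factory
        wait_for_termination=True,                 # let Airflow track the ADF run for SLA reporting
    )
```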

Hexaware Technologies, India Jul 2020 – Aug 2023

Data Engineer

Project: Customer 360 Platform – Retail Analytics (AWS)

• Built AWS Glue ETL jobs to integrate sales and loyalty data from S3 into Redshift, reducing report generation time by 62% and enhancing marketing campaign targeting across all regional customer segments.

• Designed event-driven architecture using Lambda and SQS for processing new transactions in real-time, enabling 95% faster availability of sales signals and improving customer churn prediction for retention analysis dashboards.

• Created SCD Type 2 dimensional models in Amazon Redshift, enabling accurate historical tracking of customer profiles and supporting strategic decisions across loyalty program restructuring and targeted promotions in retail channels.

• Implemented CDC using AWS DMS for syncing operational databases to S3 in Parquet format, reducing transformation lag by 70% and supporting data science models with fresher features for segmentation.

• Automated deployment of EMR Spark jobs using Terraform and Jenkins, reducing provisioning time by 80% and ensuring consistent execution across dev, staging, and prod environments for distributed processing workloads.

• Set up Redshift Spectrum queries to combine clickstream and transaction data in S3, enabling 360-degree customer view and improving product placement decisions based on behavioral and transactional affinity scoring.

• Monitored ETL pipeline health via CloudWatch and integrated alerts into Slack, reducing unplanned downtime by 50% and enabling proactive engineering response for critical ingestion pipeline failures in real time.

• Created QuickSight dashboards over Redshift reporting layer, enabling merchandisers and business heads to track product affinity, category-level sales trends, and campaign ROI with refreshed data every 15 minutes via scheduler.
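As a rough sketch of the Glue ETL pattern described in the first bullet of this project (not the production job itself), the catalog databases, tables, Redshift connection, and temp bucket below are hypothetical names:

```python
# Skeleton of a Glue PySpark job joining sales and loyalty data and loading Redshift.
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw sales and loyalty data registered in the Glue Data Catalog (hypothetical names)
sales = glue_context.create_dynamic_frame.from_catalog(database="retail_raw", table_name="sales")
loyalty = glue_context.create_dynamic_frame.from_catalog(database="retail_raw", table_name="loyalty")

# Join on customer_id, then load the result into Redshift through a Glue connection
joined = sales.toDF().join(loyalty.toDF(), "customer_id", "left")
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=DynamicFrame.fromDF(joined, glue_context, "joined"),
    catalog_connection="redshift-conn",                               # hypothetical Glue connection
    connection_options={"dbtable": "analytics.customer_sales", "database": "dev"},
    redshift_tmp_dir="s3://example-temp-bucket/glue/",                # hypothetical staging path
)
job.commit()
```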
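The event-driven Lambda/SQS flow mentioned above boils down to a handler of roughly this shape; the message fields and the downstream step are assumptions:

```python
# Sketch of a Lambda handler consuming SQS-batched transaction events.
import json


def lambda_handler(event, context):
    """Each SQS record body is assumed to be a JSON sales transaction."""
    processed = 0
    for record in event.get("Records", []):
        txn = json.loads(record["body"])
        # Hypothetical downstream step: emit the sales signal to a real-time store or stream
        print(f"transaction {txn.get('transaction_id')} for customer {txn.get('customer_id')}")
        processed += 1
    return {"processed": processed}
```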

Brightmind Technologies, India Jan 2019 – Jun 2020

Data Analyst

Project: Patient Insights Data Platform (GCP)

• Developed ingestion pipelines using Dataflow to stream patient interaction logs from Cloud Pub/Sub into BigQuery, achieving near real-time updates and supporting decision-making across digital healthcare interaction optimization teams.

• Built and deployed scalable ETL transformations in dbt layered over BigQuery datasets, reducing model refresh times by 55% and enabling faster reporting of patient behavior patterns for outreach improvement programs.

• Enabled streaming analytics with BigQuery materialized views, reducing dashboard latency by 85% and allowing clinical analysts to monitor care outcomes by location and care provider in almost real time.

• Applied patient-level data quality checks using Great Expectations integrated with Composer, detecting anomalies at ingestion and improving data trust levels among regulatory compliance analysts reviewing clinical effectiveness.

• Automated daily data snapshots using Cloud Functions and Composer to track schema changes, enabling early detection of upstream changes and reducing pipeline failure rates by 70% across critical healthcare reporting layers.

• Implemented IAM role segregation across datasets and buckets to enforce patient data privacy, passing two consecutive HIPAA audits with zero compliance issues and improved overall GCP security posture.

• Visualized cross-platform performance using Looker dashboards backed by BigQuery, enabling business leaders to measure satisfaction impact across care apps and identify UX gaps that influenced digital engagement scores.

• Monitored pipeline health with Stackdriver and set alerts for pipeline SLA violations, ensuring on-time delivery of compliance reports and improving DevOps response time by 42% during ingestion failure scenarios.
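As an illustrative sketch of the Pub/Sub-to-BigQuery streaming described in the first bullet of this project (not the actual platform code), the project, subscription, bucket, and table names below are placeholders:

```python
# Sketch of an Apache Beam streaming pipeline: Pub/Sub -> parse JSON -> BigQuery.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="patient-insights-demo",                       # hypothetical project id
    region="us-central1",
    temp_location="gs://patient-insights-demo/temp",       # hypothetical bucket
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadInteractions" >> beam.io.ReadFromPubSub(
            subscription="projects/patient-insights-demo/subscriptions/interactions-sub"
        )
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="patient-insights-demo:analytics.patient_interactions",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```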
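The daily schema snapshots mentioned above could be approximated by a small Cloud Function like this sketch; the monitored table and snapshot destination are hypothetical:

```python
# Sketch of an HTTP-triggered Cloud Function that records a table's column list each day,
# so upstream schema changes can be detected by diffing consecutive snapshots.
from datetime import date

from google.cloud import bigquery


def snapshot_schema(request):
    client = bigquery.Client()
    table = client.get_table("patient-insights-demo.analytics.patient_interactions")  # hypothetical table
    rows = [
        {"snapshot_date": date.today().isoformat(), "column_name": f.name, "column_type": f.field_type}
        for f in table.schema
    ]
    errors = client.insert_rows_json("patient-insights-demo.monitoring.schema_snapshots", rows)
    if errors:
        raise RuntimeError(f"Snapshot insert failed: {errors}")
    return f"Captured {len(rows)} columns"
```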

Education

Master's, Computer Science Aug 2023 – May 2025

Concordia University, Wisconsin, USA

Certification

AWS Certified Data Engineer – Associate


