Data Engineer Machine Learning

Location:

South Davis, TX, 76013

Posted:

September 10, 2025

Resume:

Monish Bhargava Chippa

+1-551-***-**** ************@*****.*** LinkedIn

Professional Summary

Results-oriented Data Engineer with 4+ years of experience designing, developing, and maintaining scalable data pipelines and cloud-based data platforms. Proven track record in implementing real-time and batch data processing solutions using Python, SQL, Spark, and cloud services (AWS, GCP). Experienced in data modeling, ETL orchestration, data warehouse optimization, and cross-functional collaboration. Passionate about building reliable, secure, and high-performance data systems that support business intelligence, machine learning, and analytics teams.

Technical Skills

Programming Languages: Python, SQL, Bash, Scala

Big Data Tools: Apache Spark, Hadoop, Kafka, Flink

ETL & Orchestration: Apache Airflow, dbt, Informatica, AWS Glue

Cloud Platforms: AWS (S3, Redshift, Lambda, Glue, EMR), GCP (BigQuery, Dataflow), Azure (Data Factory)

Databases & Warehousing: PostgreSQL, MySQL, MongoDB, Snowflake, Redshift, BigQuery

Data Modeling & Frameworks: Dimensional Modeling, Star/Snowflake Schema, Data Vault 2.0

Containers & CI/CD: Docker, Kubernetes (basic), Git, Jenkins

Reporting & Visualization: Power BI, Tableau, Looker

Other Tools: Terraform (basic), Jupyter, Pandas, NumPy, REST APIs, JSON, Parquet, Avro

Professional Experience

Data Engineer, Comcast

Aug 2023 – Present

Developed and maintained enterprise-grade ETL pipelines using Python, Spark, and Airflow, processing over 2 TB of log and event data daily from customer devices and applications.

Built real-time data ingestion and streaming pipelines using Kafka and Spark Streaming, reducing data latency by 75% for downstream reporting tools.

Designed and implemented a data lake architecture using AWS S3, Glue, and Redshift, enabling analysts and data scientists to self-serve high-quality datasets.

Led a performance tuning initiative across ETL pipelines, reducing pipeline runtimes by 40% by optimizing Spark jobs, partitioning strategies, and storage formats (Parquet).

Deployed automated data validation and quality checks using Great Expectations and Airflow hooks, ensuring >98% data integrity across all ingestion layers.

Collaborated closely with data scientists to provision feature stores and ML-ready datasets, accelerating model deployment cycles by 30%.

Contributed to migrating legacy workflows from on-prem Hadoop to AWS EMR and Glue, achieving cost savings of ~$10K/month.

Data Analyst, Deloitte

Aug 2020 – Aug 2022

Analyzed large client datasets (5M+ records) across financial, healthcare, and retail industries using Python (Pandas) and SQL, delivering insights that led to 15%+ revenue optimization for multiple clients.

Created interactive Power BI dashboards used by senior leadership, which replaced weekly manual reports and reduced reporting time by 60%.

Designed and implemented data extraction workflows for unstructured client data (PDFs, Excel files, APIs), automating reporting pipelines and increasing accuracy by 25%.

Participated in the AWS data migration team, helping clients transition from on-prem SQL Server to Redshift, including schema conversion and ETL validation.

Supported data modeling efforts by building dimensional models and materialized views for reporting use cases.

Coordinated with cross-functional teams including developers, PMs, and client stakeholders to ensure timely delivery of analytics solutions.

Projects

IoT Real-Time Data Pipeline for Operations Monitoring

Tools: Kafka, Spark Streaming, Airflow, AWS (S3, Redshift)

Built and deployed a fault-tolerant streaming pipeline that ingested telemetry data from 500K+ devices, processed it with Spark Streaming, and stored it in Redshift for analytics.

Enabled real-time dashboarding and alert systems with <5 second data latency.

Cloud-Native Data Warehouse Optimization

Tools: Snowflake, dbt, Airflow, Python

Refactored slow SQL transformations using dbt and redesigned the schema models into a star schema.

Integrated Airflow to run incremental models and automated lineage documentation.

Reduced monthly Snowflake costs by 20% through partitioning, clustering, and tuning.

Finance Data ETL Pipeline Automation

Tools: Python, SQL, AWS Lambda, S3

Developed a serverless data pipeline using Lambda functions to ingest, validate, and push financial data to S3 daily.

Replaced manual Excel workflows and improved reporting SLA from 24 hours to 30 minutes.

Certifications

AWS Certified Data Engineer – Associate

Databricks Certified Data Engineer Associate

Confluent Certified Developer for Apache Kafka

Microsoft Azure Data Engineer Associate

Education

Master of Arts in Information Technology and Management

Webster University, St. Louis, MO, USA Aug 2022 – May 2024
