Monish Bhargava Chippa
+1-551-***-**** ************@*****.*** LinkedIn
Professional Summary
Results-oriented Data Engineer with 4+ years of experience designing, developing, and maintaining scalable data pipelines and cloud-based data platforms. Proven track record of implementing real-time and batch data processing solutions using Python, SQL, Spark, and cloud services (AWS, GCP). Experienced in data modeling, ETL orchestration, data warehouse optimization, and cross-functional collaboration. Passionate about building reliable, secure, and high-performance data systems that support business intelligence, machine learning, and analytics teams.
Technical Skills
Programming Languages: Python, SQL, Bash, Scala
Big Data Tools: Apache Spark, Hadoop, Kafka, Flink
ETL & Orchestration: Apache Airflow, dbt, Informatica, AWS Glue
Cloud Platforms: AWS (S3, Redshift, Lambda, Glue, EMR), GCP (BigQuery, Dataflow), Azure (Data Factory)
Databases & Warehousing: PostgreSQL, MySQL, MongoDB, Snowflake, Redshift, BigQuery
Data Modeling & Frameworks: Dimensional Modeling, Star/Snowflake Schema, Data Vault 2.0
Containers & CI/CD: Docker, Kubernetes (basic), Git, Jenkins
Reporting & Visualization: Power BI, Tableau, Looker
Other Tools: Terraform (basic), Jupyter, Pandas, NumPy, REST APIs, JSON, Parquet, Avro
Professional Experience
Data Engineer, Comcast
Aug 2023 – Present
Developed and maintained enterprise-grade ETL pipelines using Python, Spark, and Airflow, processing over 2 TB of log and event data daily from customer devices and applications.
Built real-time data ingestion and streaming pipelines using Kafka and Spark Streaming, reducing data latency by 75% for downstream reporting tools.
Designed and implemented a data lake architecture using AWS S3, Glue, and Redshift, enabling analysts and data scientists to self-serve high-quality datasets.
Led a performance-tuning initiative across ETL pipelines, cutting pipeline runtimes by 40% by optimizing Spark jobs, partitioning strategies, and storage formats (Parquet).
Deployed automated data validation and quality checks using Great Expectations and Airflow hooks, ensuring >98% data integrity across all ingestion layers.
Collaborated closely with data scientists to provision feature stores and ML-ready datasets, accelerating model deployment cycles by 30%.
Contributed to migrating legacy workflows from on-prem Hadoop to AWS EMR and Glue, achieving cost savings of ~$10K/month.
Data Analyst, Deloitte
Aug 2020 – Aug 2022
Analyzed large client datasets (5M+ records) across financial, healthcare, and retail industries using Python (Pandas) and SQL, delivering insights that led to 15%+ revenue optimization for multiple clients.
Created interactive Power BI dashboards used by senior leadership, which replaced weekly manual reports and reduced reporting time by 60%.
Designed and implemented data extraction workflows for unstructured client data (PDFs, Excel files, APIs), automating reporting pipelines and increasing accuracy by 25%.
Participated in the AWS data migration team, helping clients transition from on-prem SQL Server to Redshift, including schema conversion and ETL validation.
Supported data modeling efforts by building dimensional models and materialized views for reporting use cases.
Coordinated with cross-functional teams, including developers, PMs, and client stakeholders, to ensure timely delivery of analytics solutions.
Projects
IoT Real-Time Data Pipeline for Operations Monitoring
Tools: Kafka, Spark Streaming, Airflow, AWS (S3, Redshift)
Built and deployed a fault-tolerant streaming pipeline that ingested telemetry data from 500K+ devices, processed it with Spark Streaming, and stored it in Redshift for analytics.
Enabled real-time dashboarding and alert systems with <5-second data latency.
Cloud-Native Data Warehouse Optimization
Tools: Snowflake, dbt, Airflow, Python
Refactored slow SQL transformations using dbt and redesigned schema models into a star schema.
Integrated Airflow to run incremental dbt models and to automate lineage documentation.
Reduced monthly Snowflake costs by 20% through partitioning, clustering, and tuning.
Finance Data ETL Pipeline Automation
Tools: Python, SQL, AWS Lambda, S3
Developed a serverless data pipeline using Lambda functions to ingest, validate, and push financial data to S3 daily.
Replaced manual Excel workflows and improved reporting SLA from 24 hours to 30 minutes.
Certifications
AWS Certified Data Engineer – Associate
Databricks Certified Data Engineer Associate
Confluent Certified Developer for Apache Kafka
Microsoft Azure Data Engineer Associate
Education
Master of Arts in Information Technology and Management
Webster University, St. Louis, MO, USA Aug 2022 – May 2024