BHARGAVI KOMMI
Sr Data Engineer
203-***-**** **************@*****.*** linkedin
Professional Summary
• Seasoned Data Engineer with over 5 years of experience delivering robust, scalable, and cloud-native data solutions across Azure and AWS ecosystems.
• Specialized in building real-time and batch data pipelines using tools like Azure Data Factory, AWS Glue, Apache Spark, Kafka, and Databricks.
• Proven success in implementing secure, cost-efficient, and high-performance architectures for clients in e-commerce, finance, and retail banking.
• Hands-on experience enabling AI/ML workflows by delivering clean, versioned, and feature-rich datasets for data science teams using PySpark, dbt, and Delta Lake.
• Skilled in collaborating with cross-functional teams including data scientists, product managers, compliance officers, and DevOps engineers to ensure business-aligned and compliant data delivery.
• Adept at implementing data governance, security, and compliance controls such as RBAC, SOX requirements, and data lineage using tools like Lake Formation, Azure Purview, and CloudWatch.
• Experienced in building metadata-driven frameworks for reusable ingestion, transformation, and monitoring to support scalable analytics and reporting platforms.
• Strong analytical and communication skills with a focus on driving data engineering best practices, mentoring junior engineers, and supporting agile delivery cycles.
Education
University of New Haven Jan 2023 – Dec 2024
Master's in Data Science West Haven, CT, USA
Certifications
• Microsoft Certified: Azure Data Engineer Associate
• AWS Certified Data Engineer – Associate
Experience
JPMorgan Chase & Co. Feb 2024 – Present
Sr Data Engineer New York, USA
• Leading the build of a data platform on Azure for real-time risk and compliance analytics using Event Hubs, ADLS Gen2, and Azure Functions.
• Collaborating with quant teams and ML engineers to streamline model input data pipelines for market risk and stress testing simulations.
• Engineered end-to-end workflows in Azure Data Factory and orchestrated machine learning batch inference using Synapse and Databricks.
• Developed Delta Lake-based datasets with change data capture (CDC) to track transaction-level audit trails.
• Enabled business teams with self-service analytics through curated Power BI datasets, row-level security, and governance tagging.
• Optimized Spark clusters for heavy workloads, reducing job latency by 45% through caching, partition tuning, and adaptive execution.
• Automated metadata cataloging and schema validation workflows with Azure Purview and dbt.
• Integrated pipeline health dashboards with Azure Monitor and Application Insights to support 24x7 data availability SLAs.
• Worked across data governance, risk compliance, and legal teams to enforce lineage, consent, and data residency controls.
• Provided ongoing mentoring and code reviews, promoting modular and reusable code for ingestion and transformation layers.
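The CDC-based audit-trail pattern above can be illustrated with a minimal, framework-free sketch (the production work used Delta Lake change data capture; the event shape, field names, and in-memory state here are hypothetical stand-ins):

```python
from dataclasses import dataclass, field

@dataclass
class CdcTable:
    """Toy stand-in for a Delta table: current state plus an audit trail."""
    rows: dict = field(default_factory=dict)        # key -> current row
    audit_log: list = field(default_factory=list)   # append-only change history

    def apply(self, event):
        """Apply one CDC event of the form {'op', 'key', 'data'}."""
        op, key = event["op"], event["key"]
        before = self.rows.get(key)
        if op in ("insert", "update"):
            self.rows[key] = event["data"]
        elif op == "delete":
            self.rows.pop(key, None)
        # Audit trail keeps before/after images for every transaction-level change.
        self.audit_log.append({"op": op, "key": key,
                               "before": before,
                               "after": self.rows.get(key)})

table = CdcTable()
table.apply({"op": "insert", "key": "txn-1", "data": {"amount": 100}})
table.apply({"op": "update", "key": "txn-1", "data": {"amount": 250}})
table.apply({"op": "delete", "key": "txn-1", "data": None})
```

The audit log retains the full before/after lineage of each key even after deletion, which is what makes transaction-level audit queries possible downstream.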
Citibank (via TCS) May 2021 – Jan 2023
AWS Data Engineer Hyderabad, India
• Designed and developed secure, scalable ETL pipelines in AWS Glue and PySpark to support regulatory reporting across multiple financial product lines.
• Partnered with data science teams to deliver engineered features for credit risk scoring and customer churn modeling.
• Processed batch and streaming data using S3, Kinesis, Lambda, and Step Functions to enable near real-time transaction monitoring.
• Optimized data pipeline performance and cost using partitioning, bucketing, and lifecycle policies on S3.
• Implemented a reusable metadata-driven ingestion framework to handle 100+ data sources with schema drift handling.
• Integrated Lake Formation and IAM policies to ensure granular access control and encryption compliance (SOX/PCI-DSS).
• Built visualizations using QuickSight to help compliance teams monitor high-risk transactions and anomalies.
• Worked closely with DevOps and security teams to integrate pipelines with centralized monitoring and secrets management via CloudWatch and AWS Secrets Manager.
• Coordinated with QA and business analysts to build test data strategies and validate pipeline outputs in sandbox environments.
• Enabled experimentation with ML fraud detection models by provisioning training data snapshots through version-controlled S3 buckets.
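The metadata-driven ingestion with schema drift handling described above can be sketched as follows (a simplified pure-Python illustration; the real framework ran on AWS Glue/PySpark, and the source names, columns, and metadata shape here are hypothetical):

```python
# Each source is described by a metadata entry; incoming records are aligned
# to the registered schema, and new (drifted) columns are reported rather
# than silently dropped.
SOURCE_METADATA = {
    "card_transactions": {"columns": ["txn_id", "amount", "currency"]},
    "loan_payments": {"columns": ["payment_id", "amount", "due_date"]},
}

def ingest(source, records):
    expected = SOURCE_METADATA[source]["columns"]
    drifted = set()
    aligned = []
    for rec in records:
        # Track any columns not present in the registered schema (drift).
        drifted.update(k for k in rec if k not in expected)
        # Align every record to the registered schema; missing fields -> None.
        aligned.append({col: rec.get(col) for col in expected})
    return aligned, sorted(drifted)

rows, drift = ingest("card_transactions",
                     [{"txn_id": 1, "amount": 9.5, "currency": "USD",
                       "merchant_id": "m-7"}])  # merchant_id is drifted
```

Because sources are described by metadata rather than per-source code, onboarding a new feed is a configuration change, which is how a single framework can scale to 100+ sources.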
Flipkart May 2019 – Apr 2021
Data Engineer Mumbai, India
• Built a unified data ingestion framework using Apache Spark and Hive on Hadoop to process 20TB+ of user activity data daily for Flipkart’s product recommendation engine.
• Collaborated with data scientists to preprocess and transform training datasets for personalization models using PySpark.
• Integrated streaming data from Kafka and Flume to enable near real-time user behavior tracking and A/B testing analytics.
• Designed and implemented star schema models for the Product and Customer domains to support executive dashboards.
• Contributed to Flipkart’s transition to a data lake architecture by organizing raw, processed, and curated zones on HDFS.
• Enabled fraud detection and inventory anomaly use cases through anomaly detection features built using Spark MLlib.
• Automated pipeline validation, data profiling, and anomaly detection workflows using Python and Airflow.
• Worked in cross-functional teams with product managers and analysts to ensure feature data requirements met business KPIs.
• Conducted proof-of-concept for migrating analytics workloads from Hive to Presto for faster reporting.
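The inventory anomaly detection above can be illustrated with a small pure-Python sketch (the production features were built with Spark MLlib; the z-score approach, sample data, and threshold here are illustrative assumptions):

```python
from statistics import mean, pstdev

def zscore_anomalies(values, threshold=2.0):
    """Flag values whose z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # no variation, nothing to flag
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily inventory counts; the final spike is the anomaly.
daily_inventory = [100, 102, 98, 101, 99, 100, 500]
flagged = zscore_anomalies(daily_inventory, threshold=2.0)
```

The same per-key statistic computed over Spark windows is a common way to surface fraud and inventory outliers at scale.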
• Mentored junior data engineers on Spark best practices and code modularization for reusable ingestion logic.
Technical Skills
Languages: Python, SQL, Java, Scala, Shell Scripting, Bash, C++, YAML, HTML/CSS, JavaScript
Cloud Platforms: AWS (Glue, S3, Lambda, Redshift, EMR, Step Functions, Kinesis, Athena, Lake Formation), Azure (Data Factory, Databricks, Synapse Analytics, ADLS Gen2, Event Hubs, Key Vault, Azure Functions), GCP (BigQuery, Dataflow, Pub/Sub)
Data Engineering Tools: Apache Spark, PySpark, Apache Kafka, Apache Hive, Apache Hudi, Apache Flink, Sqoop, Oozie, Airflow, NiFi, StreamSets, dbt, Talend, Informatica, SSIS
Data Warehousing & Databases: Snowflake, Redshift, BigQuery, Azure Synapse, Teradata, Oracle, PostgreSQL, MySQL, SQL Server, HBase, MongoDB, DynamoDB
DevOps & Automation: Jenkins, GitHub Actions, Azure DevOps, Docker, Kubernetes, Terraform, Ansible, Maven, Gradle, CI/CD Pipelines
Monitoring & Logging: CloudWatch, Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Azure Monitor, Datadog
Data Science & Analytics: Pandas, NumPy, scikit-learn, MLlib, Power BI, Tableau, QuickSight, Jupyter, SageMaker, MLflow
Version Control & IDEs: Git, GitHub, Bitbucket, VS Code, JupyterLab, IntelliJ, Eclipse, PyCharm
Data Governance & Security: Azure Purview, AWS Lake Formation, Collibra, Alation, Ranger, RBAC, IAM, Data Encryption, GDPR/SOX Compliance