HAREN CHOWDARY DOPPALAPUDI
Data Engineer
TX, USA 346-***-**** ************************@*****.***
https://www.linkedin.com/in/haren-chowdary-doppalapudi/
PROFESSIONAL SUMMARY
Data Engineer with 4+ years of experience architecting cloud-native, real-time, and scalable data solutions across the healthcare and finance domains. Proficient in building robust ETL/ELT pipelines using Talend, AWS Glue, and dbt, and in implementing real-time data streaming with Apache Kafka and Apache Beam. Expertise spans modern data stacks including Spark, Snowflake, Airflow, and Databricks, with hands-on experience deploying ML-ready pipelines and orchestrated workflows on AWS (S3, Glue, Lambda, Redshift, EMR) and Azure (Synapse, Blob Storage). Skilled in Python, SQL, and Java, with a strong grounding in data governance, analytics, and DevOps using Docker, Terraform, Jenkins, and Kubernetes. Adept at supporting end-to-end data initiatives from ingestion and transformation through visualization and machine learning, leveraging tools such as Tableau, Power BI, MLflow, and Scikit-learn.
TECHNICAL SKILLS
Languages
Python, SQL, Java, R
Cloud Platforms
AWS (S3, Lambda, Glue, Redshift, EMR), Azure (Synapse, Blob)
Big Data & Processing
Apache Spark, Kafka, Beam, Hive, Hadoop, Databricks
Data Orchestration
Apache Airflow, AWS Step Functions, dbt
Databases & Warehouses
Snowflake, Redshift, PostgreSQL, MySQL, MongoDB, BigQuery
ETL & Integration
AWS Glue, Talend, Informatica, SSIS
CI/CD & DevOps
Jenkins, GitHub Actions, Docker, Kubernetes, Terraform
Data Visualization
Tableau, Power BI, Excel (Advanced), Looker
ML/DS Tools
Scikit-learn, TensorFlow, Pandas, NumPy, Matplotlib, MLflow
Tools & IDEs
VS Code, Jupyter, PyCharm, IntelliJ, Git
PROFESSIONAL EXPERIENCE
Senior Data Engineer CVS Health – Texas Sep 2024 – Present
Built scalable data ingestion pipelines using AWS Glue and Redshift to process over 50 TB of clinical data, increasing reliability by 60% and enabling near real-time access for care coordination and patient analytics.
Developed streaming ingestion workflows using Apache Kafka, AWS S3, and Snowflake, ensuring sub-minute data availability to support live alerting for critical care teams and hospital operations.
Migrated historical clinical and financial datasets using Python and SQL, applying schema normalization, outlier handling, and value imputation to ensure consistency across analytical models and governance dashboards.
Built real-time data pipelines using Apache Beam (Dataflow) and BigQuery on Google Cloud, achieving sub-second processing of high-frequency transactional data for care quality monitoring and provider scoring.
Automated ML pipelines using TensorFlow and MLflow, enabling scheduled retraining and performance monitoring for models predicting chronic disease progression and readmission risks.
Orchestrated data workflows using Apache Airflow and AWS Step Functions, improving fault tolerance, SLA tracking, and coordination between batch and streaming processes in distributed data systems.
Developed CI/CD workflows with GitHub Actions, Jenkins, and Terraform to deploy infrastructure and ETL processes across dev, staging, and production, reducing release effort and improving consistency.
Stored HL7- and FHIR-compliant patient interaction records in AWS Lake Formation, supporting schema-on-read access and simplifying compliance for downstream applications and reporting pipelines.
Designed clinical dashboards in Tableau to monitor patient risk scores, length of stay trends, and treatment effectiveness, enabling faster decisions by operational and care delivery leaders.
Integrated Great Expectations into validation layers to enforce schema, null, and range checks across 300+ datasets, improving trust in analytics and data products used by population health teams.
Delivered analysis-ready features and datasets for time-series forecasting, patient segmentation, and care quality models by collaborating with data scientists and analysts across multiple business verticals.
Used dbt to modularize and version SQL transformations, integrating with Airflow for scheduling and lineage tracking across patient-level and department-level data marts.
Captured metadata, lineage, and access policies using the AWS Glue Data Catalog to support HIPAA compliance, internal audits, and self-service data access for clinical analytics teams.
Data Engineer HCL Technologies – India Jun 2020 – Jul 2023
Migrated enterprise data warehouses from legacy SQL systems to Hadoop and Apache Spark, reducing analytical query latency by 40% while enabling petabyte-scale processing for investment risk and compliance workloads.
Automated ingestion of 100M+ daily records using AWS Lambda, Talend, and S3, streamlining delivery of financial market feeds and transactional logs into the Hadoop ecosystem for unified data processing.
Engineered Kafka-based pipelines using Python for real-time trading data ingestion, achieving 5-second latency for fraud detection, anomaly classification, and rapid response by compliance and audit teams.
Implemented low-latency classification of trade events using Apache Beam and HDFS, ensuring regulatory triggers and surveillance alerts were generated in near real-time for Tier-1 financial operations.
Orchestrated batch and streaming workflows with Apache Airflow, embedding retry logic, alerts, and dynamic scheduling across SLA-sensitive ETL jobs, ensuring reliable and transparent data movement.
Integrated relational systems like MySQL and PostgreSQL with Hive and HDFS to support consistent cross-source analytics, improving accessibility for analysts across product, audit, and risk departments.
Built scalable forecasting and anomaly detection models using Azure Databricks and Spark MLlib, enabling risk teams to predict asset anomalies and mitigate exposure in volatile portfolios.
Tuned and monitored Spark jobs running on Azure to optimize cluster performance, improve memory usage, and meet SLA-bound data delivery targets during high-volume processing windows.
Created Snowflake-based data marts with granular role-based access for financial reporting, enabling automated refreshes, schema versioning, and secured access for regulators and internal auditors.
Designed internal dashboards using Excel (Advanced) to visualize investment patterns, product uptake, and exposure KPIs, supporting executive decisions without third-party BI dependency.
Developed modular SQL workflows using dbt, standardizing transformation logic, promoting reuse across compliance and finance teams, and reducing downstream reporting inconsistencies.
Applied unsupervised ML techniques like K-means clustering and Association Rule Mining to surface behavioral patterns among retail investors, guiding experimentation in new product offerings.
Deployed containerized ETL components using Docker and streamlined releases through Jenkins pipelines, improving reproducibility and reducing environment drift across dev, QA, and production stages.
Maintained data lineage, schema evolution, and access controls using the AWS Glue Data Catalog, ensuring governance alignment and full traceability across all critical financial data assets.
EDUCATION
Master of Science in Data Science (Sep 2023 – Dec 2024)
New Jersey Institute of Technology
Bachelor of Technology in Computer Science Engineering (Jul 2017 – Jun 2021)
Maharishi University of Information Technology