Post Job Free

Senior Data Engineer

Location:
Plano, TX
Salary:
$90,000
Posted:
October 15, 2025

Contact this candidate

Resume:

Harsha Gudapati

859-***-**** — *************@*****.*** — linkedin.com/in/gharshasree

Summary

•Senior Data Engineer with 4+ years of experience delivering high-performance cloud-native data platforms across Finance, Healthcare, and Pharma.

•Hands-on in Azure (ADF, Databricks, Synapse, ADLS, Purview, Key Vault), AWS (S3, Glue 4.0, EMR 6.x, Redshift RA3, Lambda, EventBridge, Athena), and GCP (Dataproc, BigQuery, Cloud Functions).

•Expertise in Spark 3.x (PySpark/Scala), Kafka 3.x, and the Hadoop ecosystem (Hive, Sqoop, HBase, Pig, NiFi) for batch and streaming ETL/ELT.

•Built governed lakehouse solutions with Delta Lake and dbt, processing 10+ TB/day and supporting enterprise reporting for 50+ dashboards.

•Optimized pipelines to cut compute cost by 20%, reduced query runtimes by 40%, and achieved less than 5 s latency for streaming workloads.

•Implemented data governance, lineage, and compliance (HIPAA, GDPR, FINRA) with Unity Catalog, Purview, IAM least-privilege, and end-to-end encryption.

•Deployed and orchestrated pipelines with Airflow 2.x and Logic Apps; automated releases with Git, Jenkins, GitHub Actions, and Terraform.

•Containerized jobs with Docker and Kubernetes; integrated semi-structured data from MongoDB and Cosmos DB.

•Delivered insights via Power BI, Tableau, and QuickSight; collaborative Agile/Scrum partner and mentor with strong cross-team communication skills.

Education

Master's in Information Technology, Cincinnati, OH

University of Cincinnati

Experience

Senior Data Engineer Jan 2025 – Present

Bank of America, Charlotte, NC

•Led the design and development of secure ETL/ELT pipelines using Azure Data Factory and Databricks, integrating data from SQL servers, APIs, and streaming Kafka sources.

•Engineered real-time streaming pipelines with Kafka and Spark Structured Streaming, achieving <5s latency for high-volume financial transaction monitoring.

•Implemented enterprise-wide data governance and lineage with Unity Catalog and Azure Purview, ensuring compliance with FINRA and GDPR regulations.

•Containerized Spark ETL jobs with Docker and orchestrated them via Kubernetes, enabling elastic scaling and cost optimization.

•Introduced dbt models and Delta Lake architecture on Databricks, improving governance and reproducibility.

•Optimized data warehouses in Azure Synapse and Snowflake, reducing query runtime by 40% through partitioning, clustering, and performance tuning.

•Built and maintained CI/CD pipelines with GitHub Actions and Jenkins, automating testing, deployment, and monitoring of data workflows.

•Designed and delivered interactive Power BI dashboards, enabling executives to track KPIs, revenue metrics, and operational performance.

•Partnered with InfoSec teams to enforce IAM least-privilege policies, encryption, and Azure Key Vault integration for sensitive data assets.

•Mentored junior engineers on best practices for Databricks notebooks, version control with Git, and reproducible data engineering workflows.
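To illustrate the low-latency transaction monitoring described above, here is a minimal pure-Python sketch of the tumbling-window aggregation logic that a Kafka + Spark Structured Streaming job of this kind performs (the account names, amounts, and 5-second window are illustrative, not the actual production pipeline):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def window_start(ts: datetime, width: timedelta) -> datetime:
    """Align an event timestamp to the start of its tumbling window."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % width  # time elapsed inside the current window
    return ts - offset

def aggregate_transactions(events, width=timedelta(seconds=5)):
    """Group (timestamp, account, amount) events into tumbling windows,
    mirroring what a Structured Streaming groupBy(window(...)) computes."""
    totals = defaultdict(float)
    for ts, account, amount in events:
        totals[(window_start(ts, width), account)] += amount
    return dict(totals)

# Hypothetical transaction events spanning two 5-second windows.
events = [
    (datetime(2025, 1, 1, 12, 0, 1), "acct-1", 100.0),
    (datetime(2025, 1, 1, 12, 0, 4), "acct-1", 50.0),
    (datetime(2025, 1, 1, 12, 0, 7), "acct-1", 25.0),
]
totals = aggregate_transactions(events)
```

In the real job the same grouping would be expressed declaratively with Spark's `window()` function over a Kafka source, with a watermark handling late-arriving events.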

Data Engineer April 2024 – Dec 2024

Baxter Health, Cary, NC

•Built and automated batch + streaming ETL pipelines using AWS Glue, S3, Redshift, and EMR, processing millions of healthcare claims and patient records.

•Ingested and standardized HL7/FHIR clinical data into HIPAA-compliant data lakes, ensuring secure handling of PHI.

•Developed real-time analytics workflows with Kafka, Spark, and Redshift, improving clinical decision-making for providers.

•Deployed event-driven ingestion pipelines using AWS Lambda and EventBridge, automating schema validation and anomaly detection.

•Delivered real-time healthcare dashboards using QuickSight, enabling executives to monitor patient outcomes and costs.

•Integrated semi-structured data from MongoDB and Cosmos DB into Redshift for unified analytics.

•Migrated legacy ETL jobs to modern Spark workloads on Databricks and EMR, improving scalability and reducing runtime by 35%.

•Implemented automated data quality checks with Great Expectations; integrated pipeline monitoring with CloudWatch and Splunk.

•Created dashboards in Tableau to track patient outcomes, claims processing efficiency, and regulatory KPIs.

•Collaborated with compliance and data science teams to align pipeline designs with HIPAA, FISMA, and organizational policies.
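The automated data quality checks mentioned above can be sketched as follows. This is a simplified pure-Python stand-in for the kinds of expectations a Great Expectations suite validates before claims data lands in Redshift; the field names and allowed status values are illustrative, not the actual claims schema:

```python
def check_claims(records):
    """Validate claim records, returning (index, reason) pairs for failures.

    Mirrors three common expectation types: not-null, numeric range,
    and value-in-set. Field names here are hypothetical.
    """
    failures = []
    for i, rec in enumerate(records):
        if not rec.get("claim_id"):
            failures.append((i, "claim_id missing"))
        if not isinstance(rec.get("amount"), (int, float)) or rec["amount"] < 0:
            failures.append((i, "amount must be a non-negative number"))
        if rec.get("status") not in {"submitted", "approved", "denied"}:
            failures.append((i, "status not in allowed set"))
    return failures

# One clean record and one that violates all three checks.
records = [
    {"claim_id": "C1", "amount": 120.5, "status": "approved"},
    {"claim_id": "", "amount": -5, "status": "pending"},
]
failures = check_claims(records)
```

In production these checks would be declared as a Great Expectations suite and wired to CloudWatch alarms, so a failing batch is quarantined rather than loaded.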

Data Engineer Oct 2021 – Dec 2023

Dr. Reddy's Laboratories, Hyderabad, India

•Developed and optimized ETL pipelines using Azure Data Factory (ADF) and Databricks, integrating ERP and clinical trial data into Azure Synapse Analytics.

•Designed PySpark-based data transformations to process structured/unstructured data from SQL Server, MySQL, and flat files.

•Ensured compliance with GDPR and ICMR regulations by implementing encryption, masking, and access-control policies across datasets.

•Automated end-to-end workflows with Airflow and Azure Logic Apps, enabling real-time alerting for SLA breaches.

•Managed distributed storage on HDFS clusters; improved Spark SQL performance by 30% with partitioning, indexing, and caching.

•Designed Hive tables and used Sqoop imports to migrate ERP data into the data lake.

•Prototyped pipelines on GCP Dataproc and BigQuery for cost-effective analytics of clinical trial datasets.

•Developed interactive reports in Power BI and Tableau, enabling stakeholders to visualize clinical trial progress and pharma operations.

•Implemented metadata-driven ingestion frameworks, reducing onboarding time for new datasets by 25%.

•Collaborated with cross-functional R&D teams, ensuring robust data integration pipelines supported business intelligence and research reporting.
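The metadata-driven ingestion framework noted above can be sketched in a few lines. The idea is that each dataset is described by a metadata row, and a single generic pipeline expands those rows into ingestion steps, so onboarding a new source means adding a row rather than writing a new pipeline (dataset names, sources, and paths below are hypothetical):

```python
# Hypothetical dataset configs; in ADF these would live in a control table.
DATASETS = [
    {"name": "erp_orders", "source": "sqlserver", "path": "erp/orders", "format": "parquet"},
    {"name": "trial_results", "source": "flatfile", "path": "clinical/trials", "format": "csv"},
]

def build_ingestion_plan(configs):
    """Expand per-dataset metadata rows into concrete ingestion steps."""
    plan = []
    for cfg in configs:
        plan.append({
            "dataset": cfg["name"],
            "copy_from": f'{cfg["source"]}://{cfg["path"]}',
            "land_as": f'raw/{cfg["name"]}.{cfg["format"]}',
        })
    return plan

plan = build_ingestion_plan(DATASETS)
```

A single parameterized ADF pipeline (or Airflow DAG) then iterates over this plan, which is what makes the reported 25% onboarding-time reduction plausible: new datasets reuse the same tested copy logic.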

Technical Skills

•Cloud Platforms: Azure (ADF, ADLS Gen2, Synapse, Databricks, Purview), AWS (S3, Glue, Redshift RA3, EMR, Lambda, EventBridge, Athena), GCP (Dataproc, BigQuery, Cloud Functions)

•Big Data & ETL: Apache Spark 3.x (PySpark/Scala), Databricks, Kafka 3.x, Delta Lake, dbt, Hadoop (Hive, Pig, Sqoop, HBase, NiFi), Airflow 2.x, data quality processes

•Data Warehousing: Snowflake, Azure Synapse, AWS Redshift RA3, BigQuery

•Databases: SQL Server, MySQL, Oracle, PostgreSQL, MongoDB, Cosmos DB

•Languages & Scripting: Python, SQL, PySpark, Scala, Shell Scripting, PowerShell

•Visualization: Power BI, Tableau, QuickSight

•DevOps & Tools: Git, Jenkins, GitHub Actions, Terraform, Docker, Kubernetes

•Methodologies: Agile, Scrum, CI/CD, DataOps


