
Azure Data Engineer

Location:
Arlington, TX
Posted:
September 12, 2025

Resume:

KAVYASREE JAKKAMPUDI

Senior Data Engineer

Phone : 682-***-****

Mail : *************@*****.***

LinkedIn: www.linkedin.com/in/kavyasree-jakkampudi-038050111

SUMMARY:

Over 10 years of IT experience, including 5+ years in ETL development, data warehousing, and database optimization.

Strong expertise in Azure Data Factory, Databricks, Synapse, and MS SQL Server, with proven ability to support analytics teams in retrieving, summarizing, and transforming data from multiple sources.

Skilled in SQL Server scripting, data modeling, and database performance tuning in both on-premises and Azure cloud contexts. Adept at working closely with SQL Server DBAs, system architects, and application developers to design stable, reliable, and optimized databases.

Demonstrated ability to manage multiple priorities, concurrent requests, and fast-paced project timelines while ensuring data quality and compliance.

Strong hands-on expertise with Azure Cloud services: Azure Data Factory, Databricks, Logic Apps, Function Apps, Synapse Analytics, Analysis Services, Azure Blob Storage, Azure Data Lake (ADLS Gen2), and Azure DevOps.

Integrated complex on-premises data sources with Azure, utilizing Data Lake Storage and Blob Storage for efficient cloud migration and data management.

Designed and deployed scalable, reliable data pipelines using Azure Data Factory activities such as data ingestion, transformation, stored procedures, and custom tasks.

Expertise in designing and managing Azure Data Factory (ADF) pipelines for orchestrating complex ETL and ELT workflows across cloud and hybrid environments.

Leveraged Azure Databricks to drive data transformation and analytics, enabling data-driven decision-making across business functions.

Proficient in developing Spark pipelines using both Scala and PySpark, supporting large-scale data processing and complex data transformations.

Created advanced PySpark scripts for exploratory and statistical analysis to support actionable business insights.

Automated business processes using Azure Logic Apps and Azure Functions, enhancing operational efficiency and system integrations. Managed Azure DevOps for CI/CD workflows and infrastructure automation.

Utilized Azure Synapse Analytics for data integration, modeling, exploration, machine learning, and business intelligence initiatives.

Hands-on experience managing secrets and keys securely using Azure Key Vault across integrated Azure services.

Configured Azure Active Directory (Azure AD) for role-based access control (RBAC), identity management, and secure authentication across enterprise environments.

Managed scalable storage solutions using Azure Blob Storage for secure data storage, archival, and seamless integration with Azure-based services.

Implemented enterprise-wide data governance using Azure Purview, ensuring compliance, metadata management, and data cataloging. Developed ingestion frameworks supporting multiple file formats (Avro, Parquet, CSV, JSON, ORC) for optimized storage and querying.

Extensive experience with Snowflake, including Snowpipe and SnowSQL for automated ingestion, query optimization, data transformation, and platform administration.

Built and orchestrated data pipelines using Apache Airflow, employing DAGs for robust task scheduling, monitoring, and error handling.

Designed and optimized complex SQL queries in Snowflake and Synapse to analyze large datasets and support business intelligence efforts. Built efficient Power BI and Tableau dashboards for reporting, leveraging DAX for advanced calculations and dynamic visualizations.

Strong experience in Data Modeling, applying Star Schema, Snowflake Schema, and dimensional modeling techniques to support data warehouse and reporting solutions.

Created and maintained data models using ERwin Data Modeler, documenting data structures, generating DDL scripts, and supporting data architecture initiatives.

Practical expertise with Hadoop ecosystem tools: HDFS, Hive, Tez, MapReduce, Apache Sqoop, Oozie, Kafka, and Spark for large-scale data processing. Engineered real-time streaming pipelines using Kafka and Azure Event Hubs, enabling real-time data analytics and messaging architectures.

Optimized ETL performance using best practices in encryption, partitioning, indexing, and query optimization for both batch and streaming workloads.

Delivered enterprise data solutions using Informatica and SSIS, facilitating complex ETL processing and enterprise-wide data integrations.

Experienced in Data Warehousing (EDW) solutions, designing and implementing Data Marts and Data Lakes using Azure Synapse, SQL Server, and Snowflake.

Managed source control and collaboration via Git/GitHub, alongside CI/CD pipeline automation using Jenkins.

Active participant in Agile methodologies and full Software Development Life Cycle (SDLC) for continuous improvement and project delivery.

ADDITIONAL INFORMATION:

Cloud Computing : Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, Azure Databricks, PolyBase, Azure Cosmos DB, Azure Key Vault, Azure DevOps, Function Apps, Logic Apps, Azure Purview.

Languages : Python, SQL, HiveQL, PySpark, Scala, Pig, JavaScript

Databases : MySQL, Oracle, MS SQL Server, Teradata, PostgreSQL, MongoDB.

Big Data : Sqoop, Hive, HBase, Flume, Hadoop, Kafka, Apache Spark, Hortonworks, Cloudera, Keycloak.

CI/CD Tools : Terraform, Apache Airflow, Jenkins

Version Control : Git, Bitbucket

Automation Tools : Maven, SBT

File Formats : CSV, JSON, XML, ORC, Parquet, Delta

Other Tools : Visual Studio, SQL Navigator, SQL Server Management Studio, Eclipse, Postman

EDUCATION:

Bachelor of Technology – KLU, May 2013

CERTIFICATIONS:

Azure Fundamentals (DP-900)

Azure Data Engineer Associate (DP-203)

Certified Snowflake SnowPro Core

IBM Data Science

AWS Solutions Architect

Oracle Certified Foundations Associate

WORK EXPERIENCE:

Senior Azure Data Engineer Sep 2022 – Present

Cardinal Health Inc, Dublin, OH

Responsibilities:

Supported data analytics teams by retrieving and summarizing data from multiple healthcare data sources, applying advanced SQL and data transformation techniques.

Developed and optimized ETL workflows to integrate data from relational, semi-structured, and unstructured sources into Azure SQL, Synapse, and Data Lake.

Designed stable and optimized SQL Server databases in collaboration with DBAs and architects, ensuring high performance and scalability in an Azure environment.

Applied data warehousing principles in building analytical datasets, improving reporting accuracy and performance.

Managed multiple simultaneous requests and priorities, consistently delivering high-quality solutions under tight deadlines.

Designed and developed metadata-driven Azure Data Factory (ADF) pipelines to perform complex data integration workflows, implementing ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processing patterns.

Integrated on-premises healthcare databases (MySQL, Cassandra) with cloud-based storage solutions (Azure SQL DB, Azure Blob Storage) using Azure Data Factory (ADF) to process clinical data, EHR records, claims data, and medical billing information for downstream analytics.

Developed Azure Databricks Notebooks to extract, transform, and load healthcare data in various formats including HL7, FHIR, CCD, JSON, XML, CSV, and Parquet into Azure Data Lake and external data warehouses for patient care analytics.

Migrated large volumes of clinical and operational data from on-premises healthcare systems to Azure SQL, Azure Synapse Analytics, and Azure Data Lake Storage Gen2 (ADLS Gen2), ensuring compliance with HIPAA regulations.

Designed scalable ingestion pipelines in Azure Data Factory (ADF) using parameterized datasets and linked services to enable secure, efficient ingestion of EHR and patient claims data into Azure cloud storage.

Implemented event-driven Azure Data Factory (ADF) pipelines for both real-time and batch ingestion of healthcare transactional data, ensuring timely data availability for clinical reporting.

Optimized Azure Synapse Analytics workloads for healthcare data analytics, supporting complex reporting on patient outcomes, readmission rates, and healthcare utilization.

Designed healthcare data transformation workflows using Azure Databricks, Spark SQL, and Python/Scala, processing sensitive PHI and storing curated datasets in ADLS Gen2 and Azure Blob Storage.

Automated incremental data loading of healthcare datasets using watermarking strategies in Azure Data Factory (ADF), reducing data refresh processing time and supporting near real-time clinical insights.
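
The watermarking approach above is typically implemented in ADF with a Lookup activity that reads the last high-water mark and a Copy activity filtered on it; the PySpark sketch below illustrates the same idea outside ADF. All paths, table names, and columns are hypothetical, and spark is assumed to be an active SparkSession (as in a Databricks notebook).

    from pyspark.sql import functions as F

    watermark_path = "abfss://config@datalake.dfs.core.windows.net/watermarks/encounters/"

    # 1. Read the last successfully loaded high-water mark.
    last_wm = spark.read.parquet(watermark_path).agg(F.max("watermark_ts")).collect()[0][0]

    # 2. Pull only source rows modified since that watermark (hypothetical on-prem SQL Server source).
    incremental = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://onprem-host;databaseName=ehr")
        .option("dbtable", f"(SELECT * FROM dbo.encounters WHERE modified_at > '{last_wm}') src")
        .load())

    # 3. Append the delta to the lake and store the new watermark for the next run.
    incremental.write.mode("append").parquet("abfss://bronze@datalake.dfs.core.windows.net/encounters/")
    (incremental.agg(F.max("modified_at").alias("watermark_ts"))
        .write.mode("overwrite").parquet(watermark_path))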

Tuned Azure Stream Analytics jobs to enable low-latency processing of real-time patient monitoring data streamed from medical devices and health apps via Azure Event Hubs.

Developed custom Apache Airflow DAGs in Python for orchestrating ingestion, transformation, and validation tasks on clinical and operational healthcare data.
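
As a minimal illustration of the DAG-based orchestration mentioned above, the sketch below wires two placeholder tasks together; the DAG id, schedule, and callables are hypothetical, not taken from the actual project.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def ingest_claims(**context):
        # Placeholder for the actual ingestion step (e.g., copying an extract into the data lake).
        print("ingesting claims extract")


    def validate_claims(**context):
        # Placeholder for row-count / schema validation of the ingested data.
        print("validating claims extract")


    with DAG(
        dag_id="claims_daily_ingest",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    ) as dag:
        ingest = PythonOperator(task_id="ingest_claims", python_callable=ingest_claims)
        validate = PythonOperator(task_id="validate_claims", python_callable=validate_claims)

        ingest >> validate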

Developed scalable PySpark ETL pipelines in Azure Databricks to process over 10 TB of healthcare data monthly, including EHR, claims, and lab results, enabling faster analytics for clinical reporting.

Built custom data transformation scripts using Python and Spark SQL to clean and normalize sensitive PHI/PII datasets, improving data quality and regulatory compliance for downstream healthcare applications.
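
A simplified sketch of the kind of cleansing and masking transformation described above; the column names, masking rules, and storage paths are illustrative assumptions rather than the actual scripts.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("phi-cleanup-sketch").getOrCreate()

    raw = spark.read.parquet("abfss://bronze@datalake.dfs.core.windows.net/members/")

    clean = (raw
             # normalize free-text name fields
             .withColumn("first_name", F.initcap(F.trim("first_name")))
             .withColumn("last_name", F.initcap(F.trim("last_name")))
             # mask direct identifiers before data leaves the restricted zone
             .withColumn("ssn_masked", F.concat(F.lit("***-**-"), F.substring("ssn", -4, 4)))
             .drop("ssn")
             # standardize dates that arrive in mixed formats
             .withColumn("dob", F.to_date("dob", "MM/dd/yyyy")))

    clean.write.mode("overwrite").parquet("abfss://silver@datalake.dfs.core.windows.net/members/")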

Automated healthcare data workflows using Python-based Apache Airflow DAGs, orchestrating ingestion and processing tasks across Azure Data Lake and Azure Databricks environments.

Applied advanced PySpark performance tuning techniques (caching, broadcast joins, partitioning) to optimize large-scale healthcare data processing, reducing execution times by 30%.
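
The tuning techniques listed above are sketched below on hypothetical claims and provider datasets; the paths, columns, and the partition count of 200 are illustrative assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-tuning-sketch").getOrCreate()

    claims = spark.read.parquet("abfss://bronze@datalake.dfs.core.windows.net/claims/")        # large fact data
    providers = spark.read.parquet("abfss://bronze@datalake.dfs.core.windows.net/providers/")  # small dimension

    # Broadcast the small dimension so the join avoids shuffling the large claims dataset.
    enriched = claims.join(F.broadcast(providers), on="provider_id", how="left")

    # Repartition on the column used downstream, and cache because several aggregations reuse the frame.
    enriched = enriched.repartition(200, "service_date").cache()

    monthly = (enriched
               .groupBy(F.trunc("service_date", "month").alias("month"))
               .agg(F.count("*").alias("claim_count")))
    monthly.write.mode("overwrite").parquet("abfss://silver@datalake.dfs.core.windows.net/claims_monthly/")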

Created external tables and stored procedures in Azure Synapse Analytics for efficient querying of medical records and healthcare claims data. Automated business workflows using Azure Logic Apps, streamlining system integrations and process orchestration across diverse enterprise applications.

Secured sensitive PHI/PII data using Azure Key Vault and implemented RBAC within Azure Data Factory (ADF) pipelines for controlled access to confidential healthcare datasets.

Designed healthcare data warehousing solutions including data cleansing, surrogate key generation, SCD Type 2 handling, and CDC implementations within Snowflake to support clinical and financial analytics.
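
The SCD Type 2 handling above was built in Snowflake; since those SQL scripts are not reproduced here, the sketch below shows the same expire-and-insert pattern using Delta Lake's Python merge API on a hypothetical patient dimension (spark being an active Databricks session).

    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    dim_path = "abfss://gold@datalake.dfs.core.windows.net/dim_patient/"
    updates = spark.read.parquet("abfss://silver@datalake.dfs.core.windows.net/patient_updates/")

    # Step 1: close out current rows whose tracked attributes changed.
    (DeltaTable.forPath(spark, dim_path).alias("d")
        .merge(updates.alias("u"), "d.patient_id = u.patient_id AND d.is_current = true")
        .whenMatchedUpdate(
            condition="d.address <> u.address OR d.plan_code <> u.plan_code",
            set={"is_current": "false", "end_date": "current_date()"})
        .execute())

    # Step 2: insert new current versions for changed or previously unseen patients.
    current = spark.read.format("delta").load(dim_path).filter("is_current = true")
    changed_or_new = (updates.alias("u")
        .join(current.alias("d"), F.col("u.patient_id") == F.col("d.patient_id"), "left")
        .filter("d.patient_id IS NULL OR d.address <> u.address OR d.plan_code <> u.plan_code")
        .select("u.*")
        .withColumn("is_current", F.lit(True))
        .withColumn("start_date", F.current_date())
        .withColumn("end_date", F.lit(None).cast("date")))
    changed_or_new.write.format("delta").mode("append").save(dim_path)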

Built real-time ingestion pipelines for medical device data, appointment scheduling systems, and lab results using Apache Kafka, Flume, and NiFi, improving healthcare data availability for downstream analytics.

Leveraged Delta Live Tables (DLT) for continuous ingestion and transformation of EHR and patient-generated health data, ensuring data integrity and consistency across analytics platforms.
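
A minimal Delta Live Tables sketch of the continuous ingestion described above; it only runs inside a Databricks DLT pipeline, and the source path, table names, and quality expectation are illustrative.

    import dlt
    from pyspark.sql import functions as F


    @dlt.table(comment="Raw EHR events landed via Auto Loader (illustrative source path)")
    def ehr_bronze():
        return (spark.readStream
                .format("cloudFiles")
                .option("cloudFiles.format", "json")
                .load("abfss://landing@datalake.dfs.core.windows.net/ehr/"))


    @dlt.table(comment="Validated EHR events with a basic quality expectation applied")
    @dlt.expect_or_drop("valid_patient_id", "patient_id IS NOT NULL")
    def ehr_silver():
        return dlt.read_stream("ehr_bronze").withColumn("ingested_at", F.current_timestamp())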

Architected and maintained Delta Lake Medallion Architecture across Bronze, Silver, and Gold layers, processing 50M+ patient records and 1M+ monthly appointments from EMR systems, optimizing resources and reducing cluster costs by 25%.

Designed and automated 25+ PySpark ETL pipelines in Azure Databricks for HL7, FHIR, CCD, JSON, CSV, XML, and Parquet data, handling 10+ TB of EHR, claims, and telemetry monthly; reduced manual prep time by 40%, ensuring HIPAA compliance with Unity Catalog and Azure Key Vault.

Leveraged Delta Live Tables (DLT) and Azure Databricks streaming to process 200M+ annual healthcare transactions, boosting pipeline efficiency by 30% and reducing ETL time by 35% via Spark SQL tuning, caching, and partitioning for near real-time patient insights.

Integrated and automated data transformations using Data Build Tool (DBT) within Azure Databricks and Synapse environments for modular data pipeline development.

Built 10+ Gold layer datasets integrated with Power BI, supporting predictive analytics for risk scoring, readmissions, and population health; collaborated on MLflow models to reduce intervention delays by 20%, managing secure end-to-end pipelines with Azure Data Lake and ADLS Gen2.

Leveraged Fabric Dataflows Gen2 to streamline ingestion and transformation of structured and unstructured healthcare data (HL7, FHIR, CCD, CSV, JSON), reducing data processing time by 30% and improving care insights.

Built and managed real-time data pipelines using Delta Live Tables (DLT) in Azure Databricks, ensuring continuous data ingestion, transformation, and monitoring.

Integrated Microsoft Fabric OneLake with Power BI to deliver centralized, unified views of clinical and operational metrics, enhancing decision-making for hospital leadership and improving reporting accuracy by 25%.

Designed and optimized healthcare-specific data models in CosmosDB, supporting fast retrieval of patient records and clinical encounter data, improving query performance by 20% for real-time healthcare applications.

Configured Azure Event Hubs to enable near real-time ingestion of medical device telemetry, appointment scheduling data, and clinical alerts, improving streaming data availability and supporting real-time analytics.
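
One common way to consume Event Hubs from Databricks is through its Kafka-compatible endpoint; the Structured Streaming sketch below assumes a hypothetical namespace, hub name, and checkpoint path, with the connection string normally pulled from Azure Key Vault rather than hard-coded.

    from pyspark.sql import functions as F

    # Placeholder connection string; in practice fetched from a Key Vault-backed secret scope.
    conn = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."

    telemetry = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
        .option("subscribe", "device-telemetry")                # the Event Hub name acts as the Kafka topic
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.sasl.mechanism", "PLAIN")
        .option("kafka.sasl.jaas.config",
                'org.apache.kafka.common.security.plain.PlainLoginModule required '
                f'username="$ConnectionString" password="{conn}";')
        .load())

    decoded = telemetry.select(F.col("value").cast("string").alias("payload"), "timestamp")

    (decoded.writeStream
        .format("delta")
        .option("checkpointLocation", "abfss://bronze@datalake.dfs.core.windows.net/_chk/telemetry/")
        .start("abfss://bronze@datalake.dfs.core.windows.net/telemetry/"))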

Led troubleshooting and performance optimization efforts across Azure Data Factory, Azure Databricks, and SQL environments, resolving pipeline failures, addressing data inconsistencies, and reducing healthcare data processing times by 35% through targeted tuning and root cause analysis.

Deployed and maintained Azure Functions and Container Apps within a complex security ecosystem for healthcare data workflows, ensuring compliance with HIPAA and enterprise governance standards.

Implemented CI/CD pipelines using Azure DevOps, ARM templates, and Terraform, deploying secure healthcare data ingestion and transformation code across multiple environments.

Brought 3–5 years of experience deploying and securing data pipelines and applications within enterprise security frameworks, including RBAC, encryption, and Azure Key Vault integration.

Developed Python-based test frameworks and automated QA pipelines to validate healthcare ETL processes, ensuring data accuracy, completeness, and regulatory compliance.
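
A small pytest-style sketch of the kind of automated validation described above; the sample dataset path and the two rules are hypothetical.

    import pytest
    from pyspark.sql import SparkSession


    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[2]").appName("etl-tests").getOrCreate()


    def test_claims_have_no_null_keys(spark):
        claims = spark.read.parquet("tests/data/claims_sample.parquet")
        assert claims.filter("claim_id IS NULL").count() == 0


    def test_claim_amounts_are_non_negative(spark):
        claims = spark.read.parquet("tests/data/claims_sample.parquet")
        assert claims.filter("billed_amount < 0").count() == 0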

Deployed Azure Functions for event-driven transformations of clinical data sourced from APIs, HL7/FHIR messages, and legacy healthcare systems. Automated CI/CD pipelines using Azure DevOps, ARM templates, and Terraform for healthcare data ingestion, processing, and reporting applications.

Streamlined the deployment of Apache Airflow DAGs using containerized pipelines via Docker, Kubernetes, and Azure DevOps, supporting scalable orchestration of healthcare ETL workloads.

Optimized healthcare analytics data models in Snowflake, Apache Hive, and HBase, supporting efficient storage and querying for operational and clinical performance reporting.

Built and maintained secure, scalable ETL pipelines processing medical and operational datasets using Spark, and SQL, supporting population health analytics, readmission reduction programs, and patient safety initiatives.

Collaborated with BI and clinical informatics teams to develop Power BI dashboards tracking key healthcare metrics such as patient throughput, claims processing times, and appointment scheduling efficiency.

Ensured data security and regulatory compliance (HIPAA, PHI protection) through Azure Key Vault, Unity Catalog, RBAC, ACLs, and encryption mechanisms.

Integrated healthcare-specific data formats (HL7, FHIR, CCD, CSV, JSON, Parquet) into analytics pipelines, enabling interoperability between EPIC systems and Azure cloud platforms.

Designed event-driven workflows using Azure Logic Apps and Azure Functions to automate ingestion of real-time lab results, clinical notes, and patient registration data.

Developed custom JavaScript functions within Azure Data Factory (ADF) to perform advanced data validation, transformation, and error handling across ETL pipelines.

Leveraged JavaScript in Azure Functions for event-driven data processing, enabling real-time transformation of healthcare datasets before storing them in ADLS Gen2.

Environment: Azure Data Factory (ADF), Azure Data Lake Storage Gen2 (ADLS Gen2), Azure SQL Database, Azure Blob Storage, Azure Databricks, Delta Live Tables (DLT), Azure Synapse Analytics (Serverless SQL Pools), Microsoft Fabric, Azure Key Vault, Azure DevOps, Snowflake, Power BI, SQL, Change Data Capture (CDC), GitHub, Azure Databricks Repos, Databricks Audit Logs, Role-Based Access Control (RBAC), SOX Compliance, PCI DSS Compliance.

Azure Data Engineer May 2019 – Aug 2022

Akamai Technologies, New Jersey, NJ

Responsibilities:

Designed and managed scalable data pipelines using Azure Data Factory (ADF), Azure Databricks, Azure Synapse Analytics, and Azure SQL Database to support global cloud and CDN operations.

Engineered metadata-driven Azure Data Factory (ADF) pipelines for ETL/ELT processes, optimizing ingestion from on-premises SQL, MySQL, and Cassandra to Azure Data Lake and Synapse.

Implemented data processing workflows in Azure Data Factory pipelines, using the Copy activity to move data between diverse sources and destinations, the Filter activity to extract subsets of data based on defined conditions, and the ForEach activity to automate iterative tasks, improving processing efficiency and workflow automation.

Migrated large-scale datasets from on-premises systems to Azure Data Lake Storage Gen2 (ADLS Gen2), Azure SQL DB, Azure Databricks, and Azure Synapse SQL Pools, streamlining Akamai’s cloud migration strategies.

Developed and optimized PySpark scripts and Spark SQL transformations in Azure Databricks for high-performance data processing.

Enhanced Spark job efficiency via partitioning, broadcast joins, and caching, ensuring resource optimization across Akamai’s data pipelines.

Managed Linked Services and Self-hosted Integration Runtime (SHIR) configurations for secure and reliable data transfers across hybrid environments.

Built scalable Delta Lake architectures to enable efficient storage and real-time analytics for telemetry and CDN traffic data.

Integrated real-time streaming data pipelines using Azure Event Hubs, Azure Stream Analytics, and Azure Databricks to support Akamai’s CDN performance monitoring.

Designed telemetry ingestion frameworks capturing IoT data from edge devices using Azure IoT Hub, storing large-scale data in Azure Cosmos DB and Blob Storage.

Automated end-to-end pipelines using Apache Airflow and DAGs, ensuring reliable orchestration and monitoring of distributed workloads.

Managed enterprise data assets using Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Synapse, ensuring proper metadata tagging, cataloging, and lifecycle management while implementing data governance frameworks and access controls for secure and compliant data management across Azure cloud environments.

Designed and optimized scalable data pipelines in Azure Databricks using PySpark, processing large-scale CDN and telemetry data with performance tuning (partitioning, broadcast joins, caching), and enforced data validation, anomaly detection, and automated data profiling using Python and SQL scripts to ensure data quality and integrity across Azure Data Lake, Synapse, and reporting layers.
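
A condensed example of the automated profiling and validation step mentioned above; the dataset, columns, and the 1% null threshold are illustrative assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("cdn-profiling-sketch").getOrCreate()
    df = spark.read.format("delta").load("abfss://silver@datalake.dfs.core.windows.net/cdn_requests/")

    profile = df.select(
        F.count("*").alias("row_count"),
        F.countDistinct("edge_node_id").alias("distinct_edge_nodes"),
        F.sum(F.col("response_ms").isNull().cast("int")).alias("null_response_ms"),
        F.expr("percentile_approx(response_ms, 0.99)").alias("p99_response_ms"),
    ).collect()[0]

    # Fail the run early if basic expectations are violated.
    assert profile["row_count"] > 0, "no rows ingested for this partition"
    assert profile["null_response_ms"] / profile["row_count"] < 0.01, "too many null latencies"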

Integrated Snowflake with Azure Data Factory pipelines for scalable, cross-cloud data storage and ELT processes, automated end-to-end workflows using Apache Airflow and ADF, and developed interactive Power BI dashboards for real-time visualization of CDN traffic analytics, latency trends, and Akamai’s global infrastructure KPIs.

Developed CI/CD pipelines for Python applications using Azure Pipelines and Jenkins, integrating code deployments and automated testing.

Conducted Proof of Concept (POC) initiatives for evaluating and optimizing Azure Databricks for data processing and CDN performance analytics.

Applied advanced data modeling within Azure Synapse Analytics, including partitioning, indexing, and PolyBase integration to enhance query performance.

Implemented security best practices for Azure SQL DB, Synapse Analytics, and Cosmos DB, managing permissions via SSMS and Role-Based Access Control (RBAC). Built event-driven architectures supporting Akamai’s security and CDN analytics using Azure Event Hubs.

Developed modular Python scripts to automate data ingestion, transformation, and reporting tasks, integrating them within Azure Data Factory and Databricks workflows.

Implemented Python-based CI/CD pipelines using Azure DevOps and Jenkins for automated testing, deployment, and monitoring of data pipelines.

Leveraged Power BI for interactive reporting and visualization of global CDN traffic, performance metrics, and operational KPIs.

Proficient in Azure Blob Storage, Azure Data Lake Storage Gen2, Azure HDInsight, Azure Functions, and Azure Logic Apps for data processing and workflow automation.

Enabled unified querying across diverse data sources using PolyBase to streamline data access and reporting.

Managed source code using GitHub, ensuring version control and facilitating code collaboration within cross-functional teams.

Collaborated with Product Owners and Business Analysts to capture requirements and design data solutions aligned with Akamai’s cloud and CDN service offerings.

Led Agile ceremonies (sprint planning, reviews, retrospectives) and daily Scrum meetings to ensure project delivery and alignment with business goals.

Environment: Azure Data Lake Storage Gen2, Azure Data Factory, Azure Purview, Azure Event Hubs, Azure SQL Server, Azure Synapse Analytics, Azure Blob Storage, Microsoft Fabric, Azure Key Vault, Azure Logic Apps, Azure Function Apps, Azure Analysis Services, Azure Service Bus, Snowflake Database, Oracle Databases, PySpark, Scala, Python, Spark SQL, SnowSQL, Power BI, GitHub, Agile methodology, JIRA.

Big Data Developer Feb 2017 - Apr 2019

State of Nebraska, Lincoln, NE

Responsibilities:

Developed and managed scalable ETL workflows handling data extraction, transformation, and loading from structured, semi-structured, and unstructured sources into big data platforms.

Developed robust end-to-end data ingestion pipelines utilizing Apache Flume, Sqoop, Pig, Kafka, Oozie, and MapReduce to efficiently ingest and process behavioral and streaming data into HDFS for analytics.

Executed regular data imports from relational databases like MySQL and MongoDB into HDFS using Sqoop, enabling seamless integration and storage of structured and semi-structured data.

Orchestrated ingestion of log files and real-time streaming data from diverse sources using Flume and Kafka, ensuring reliable and scalable data capture for downstream processing.

Utilized PL/SQL for developing ETL/ELT processes within Oracle environments, enhancing data transformation and workflow automation.

Led large-scale migrations of datasets from enterprise databases such as Netezza, Oracle, and SQL Server to Hadoop ecosystems, facilitating big data analytics adoption.

Integrated HBase with Hive in the Analytics Zone, optimizing table design and enabling efficient NoSQL data storage, retrieval, and querying for real-time and batch analytics.

Designed and optimized Hive queries using HQL to emulate MapReduce jobs, while improving query efficiency with partitioning, bucketing, and bucket joins tailored to business requirements.
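
The partitioning and bucketing techniques above are easiest to see in DDL; the sketch below issues hypothetical HiveQL through a Hive-enabled Spark session (database, table, and column names are illustrative).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().appName("hive-ddl-sketch").getOrCreate()
    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    # Partitioned + bucketed table: partitions enable pruning, buckets support bucketed joins.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.case_events (
            case_id    STRING,
            event_type STRING,
            event_ts   TIMESTAMP
        )
        PARTITIONED BY (event_date DATE)
        CLUSTERED BY (case_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # The predicate on the partition column lets the engine skip irrelevant partitions entirely.
    spark.sql("""
        SELECT event_type, COUNT(*) AS events
        FROM analytics.case_events
        WHERE event_date = DATE'2018-06-01'
        GROUP BY event_type
    """).show()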

Implemented fault-tolerant mechanisms and disaster recovery strategies in Hadoop clusters by configuring data replication, redundancy, and job-level safeguards across HDFS, YARN, and MapReduce.

Developed and maintained data transformation workflows using Pig Latin scripts and PySpark, leveraging distributed processing for scalable data cleansing, filtering, and aggregation.

Built and managed real-time streaming data pipelines by integrating Kafka, Spark Streaming, and Hive, supported by workflow orchestration with Oozie and configuration management using Zookeeper for high availability.

ETL Developer/ Data Modeler June 2013 - Mar 2016

Verizon, Hyderabad, IND

Responsibilities:

Gathered business requirements, created design documents, and developed complex T-SQL (DDL, DML) scripts for stored procedures, views, tables, and user-defined functions, while designing and modeling relational databases using Erwin Data Modeler.

Designed and developed staging databases for efficient data processing and loading into central repositories, leveraging Erwin Data Modeler for logical and physical data modeling and SQL Server Integration Services (SSIS) for ETL.

Created and optimized ETL processes for various data sources (SQL Server, Flat Files, Excel, Oracle, DB2), transforming data with SSIS packages while ensuring consistency and quality across the data pipeline.

Designed and implemented relational and dimensional data models using Erwin, and optimized T-SQL queries to improve ETL performance in SSIS packages, ensuring efficient data processing.

Developed data integration solutions with SSIS and Informatica PowerCenter, creating robust ETL mappings and ensuring high-quality data across multiple source systems while collaborating with stakeholders.

Worked on production support, translating business requirements into logical and physical data models using Erwin, and collaborating with development teams for debugging, version control, and optimization.

Conducted ETL performance tuning, troubleshooting, and capacity estimation, working closely with business analysts to deliver accurate reporting and dashboards.

Followed Waterfall SDLC, engaging in all phases including requirements gathering, data model design with Erwin, implementation, testing, and deployment for seamless ETL and data warehousing solutions.


