Dheeraj Atmakuri
Azure Data Engineer
*******************@*****.*** +1-469-***-**** in/datmakuri/
PROFESSIONAL SUMMARY
• Azure Data Engineer with 5+ years of experience designing scalable cloud data platforms, ETL/ELT pipelines, and analytics solutions using Azure Databricks, Azure Data Factory, Power BI, Microsoft Fabric, and lakehouse architectures across pharmaceutical, financial services, and enterprise domains.
• Expertise in building end-to-end data pipelines using PySpark, Spark SQL, Azure Data Factory, and Informatica (IICS, PowerCenter) to integrate data from Salesforce, SQL Server, REST APIs, and on-premises sources into Delta Lake, Snowflake, Azure Synapse, and ADLS Gen2.
• Specialized in Azure-native data engineering using Databricks (Delta Lake, Medallion Architecture), Synapse Analytics, Event Hubs, Azure Functions, and Purview to deliver enterprise-scale lakehouse solutions, streaming pipelines, and AI-ready feature stores.
• Proven track record in data modeling (star schema, snowflake schema, data vault) using Erwin and PowerDesigner, optimizing dimensional models for Power BI semantic layers, regulatory reporting, and high-performance analytics.
• Strong focus on data governance, security, and compliance, implementing Unity Catalog, Microsoft Purview, RBAC, audit logging, and data lineage tracking to meet GxP, 21 CFR Part 11, HIPAA, and financial regulatory requirements.
• Proficient in CI/CD and MLOps automation using Azure DevOps, GitHub Actions, Terraform, and the Databricks CLI to deploy notebooks, ADF pipelines, ML models, and infrastructure as code across dev, QA, and production environments, reducing deployment time by 50%.
TECHNICAL SKILLS
• Azure (Primary): Databricks, Data Factory (ADF), ADLS Gen2, Synapse, Event Hubs, Functions, Microsoft Fabric, Purview, Unity Catalog
• AWS (Secondary): Glue, Redshift, Lambda, S3, DMS, CloudWatch
• ETL / ELT: ADF, Databricks Jobs, Informatica PowerCenter, SSIS, Airflow
• Data Engineering & Modeling: Delta Lake, Medallion Architecture, Star/Snowflake Schema, Data Vault, Snowflake
• Programming & SQL: Python, PySpark, T-SQL, PL/SQL, Snowflake SQL, REST APIs
• DevOps & CI/CD: Azure DevOps, GitHub Actions, Terraform, Git, IaC
• Data Quality & Governance: Great Expectations, RBAC, Audit Logging, HIPAA, HL7, FHIR
• Analytics & BI: Power BI, DAX, Semantic Models, Tableau
WORK EXPERIENCE
PNC Financial Services, Dallas, Texas | Jan 2025 – Present
Azure Data Engineer
• Build and maintain ETL/ELT pipelines in Azure Databricks using PySpark and Spark SQL to ingest and process financial data from on-prem and cloud sources, including end-to-end orchestration in Azure Data Factory.
• Implement Medallion Architecture (bronze, silver, gold) using Delta Lake to standardize data quality, support schema evolution, and maintain audit history (illustrated in the sketch after this role).
• Orchestrate ADF pipelines integrated with Databricks Jobs, handling dependencies, parameters, scheduling, and error handling for batch and incremental loads.
• Optimize Databricks performance through autoscaling, partition pruning, caching, and Z-ordering, reducing runtime by 40% and lowering compute costs.
• Develop reusable PySpark modules and notebook templates to streamline transformation logic and reduce development effort across teams.
• Implement automated data quality checks for schema drift, null validation, record counts, and freshness using Delta Live Tables and Databricks workflows.
• Build CI/CD pipelines for Databricks notebooks and ADF workflows using Azure DevOps/GitHub Actions, enabling automated testing and environment promotion.
• Design dimensional models and fact tables to support profitability, liquidity, and credit risk reporting for finance and analytics teams.
• Monitor and troubleshoot pipeline failures using Databricks logs, the Spark UI, ADF run history, and Azure Log Analytics to resolve root causes and improve stability.
• Migrate legacy SSIS/Informatica ETL workflows into ADF pipelines, improving maintainability, performance, and cloud scalability while reducing operational cost.
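For illustration, a minimal sketch of the bronze-to-silver Delta Lake pattern referenced above, assuming a Databricks runtime with Delta Lake; the table names, columns, and Z-order keys are hypothetical placeholders rather than production code.

```python
# Minimal bronze -> silver Medallion step (Databricks/Delta Lake assumed).
# All table and column names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Read the raw bronze table (ingested as-is from source systems).
bronze = spark.read.table("bronze.transactions")

# Standardize, validate, and de-duplicate into the silver layer.
silver = (
    bronze
    .filter(F.col("transaction_id").isNotNull())               # basic null validation
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("ingest_date", F.to_date("ingest_ts"))
    .dropDuplicates(["transaction_id"])                        # keeps re-runs idempotent
)

(silver.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")                         # controlled schema evolution
    .saveAsTable("silver.transactions"))

# Co-locate frequently filtered keys so queries benefit from data skipping.
spark.sql("OPTIMIZE silver.transactions ZORDER BY (account_id, ingest_date)")
```

In practice a step like this would run as a Databricks Job task triggered from ADF, with the quality checks factored into reusable modules.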
Accenture, India | May 2020 – Dec 2023
Azure Data Engineer
• Designed and developed ETL/ELT pipelines using Azure Databricks (PySpark, Spark SQL) and Azure Data Factory to ingest and transform data from SQL Server, Salesforce, REST APIs, and on-prem sources into Delta Lake on ADLS Gen2.
• Implemented Delta Lake Medallion Architecture (bronze, silver, gold) with ACID transactions, schema enforcement, and time travel to ensure high-quality, audit-ready data for analytics and ML workloads.
• Built real-time streaming pipelines using Spark Structured Streaming, Azure Event Hubs, and Auto Loader to process financial and healthcare datasets with sub-second latency (see the Auto Loader sketch after this role).
• Migrated legacy Informatica PowerCenter ETL workflows into ADF pipelines while preserving mapping logic, metadata controls, and SCD frameworks.
• Processed HL7 and FHIR healthcare data using ADF and Databricks in compliance with HIPAA, supporting claims, patient records, and provider analytics.
• Optimized Databricks performance through autoscaling, partition pruning, broadcast joins, caching, and Z-ordering, reducing runtime by up to 60% and cutting compute cost.
• Built CI/CD automation using Azure DevOps and GitHub Actions to deploy Databricks notebooks, ADF pipelines, and Terraform IaC across dev, QA, and production.
• Applied data governance and security controls using Unity Catalog, Microsoft Purview, RBAC, and column-level encryption to protect PII and meet financial and healthcare regulations.
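For illustration, a minimal sketch of the Auto Loader ingestion pattern referenced above, assuming a Databricks runtime; the ADLS paths and table name are hypothetical, and an Event Hubs feed would simply swap in a different source format.

```python
# Minimal Auto Loader (cloudFiles) stream into a bronze Delta table (Databricks assumed).
# Paths and table names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

stream = (
    spark.readStream
    .format("cloudFiles")                                              # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/lake/_schemas/claims")  # schema tracking/evolution
    .load("/mnt/lake/landing/claims/")                                 # ADLS Gen2 landing zone
)

(stream.writeStream
    .option("checkpointLocation", "/mnt/lake/_checkpoints/claims_bronze")  # exactly-once bookkeeping
    .trigger(availableNow=True)      # incremental batch; drop the trigger for continuous streaming
    .toTable("bronze.claims"))
```

The checkpoint location is what makes restarts safe: Auto Loader tracks which files have been ingested, so the same stream definition serves both backfills and ongoing incremental loads.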
Tech Mahindra | Jan 2020 – May 2020
Data Engineering Intern
• Collaborated with cross-functional teams and business stakeholders to gather data requirements and deliver scalable, production-grade data engineering solutions.
• Designed and implemented secure, high-performance data pipelines using Apache Spark and Databricks, leveraging the Medallion Architecture to improve job runtimes by 20%.
• Provisioned and managed Databricks Workspaces, enabling secure integration with AWS S3 and Azure Data Lake Storage (ADLS).
• Optimized ETL workflows through Spark performance tuning, SQL query optimization, and caching techniques, reducing compute costs and improving resource efficiency.
• Automated batch and real-time data ingestion pipelines using Delta Lake and Databricks Workflows, enhancing data reliability and timeliness in both staging and production.
• Developed and implemented data validation and quality frameworks using PySpark and SQL to ensure accuracy, completeness, and consistency for downstream analytics and reporting (a minimal example follows this role).
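For illustration, a minimal sketch of the kind of PySpark validation check referenced above; the table name and key column are hypothetical placeholders, and a production framework would externalize the rules as configuration.

```python
# Minimal PySpark data-quality gate: row counts, null keys, and duplicates.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("staging.orders")
total = df.count()

checks = {
    "non_empty": total > 0,
    "no_null_keys": df.filter(F.col("order_id").isNull()).count() == 0,
    "unique_keys": total == df.select("order_id").distinct().count(),
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # Fail the job so downstream analytics never consume bad data.
    raise ValueError(f"Data quality checks failed: {failed}")
```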
EDUCATION
The University of Texas at Dallas
Master of Science - Information Technology and Management | GPA: 3.9/4.0
CERTIFICATIONS
Databricks Certified Data Engineer Associate
Azure Fundamentals (AZ-900)
AWS Certified Data Engineer - Associate
AWS Certified Solutions Architect - Associate
AWS Certified Cloud Practitioner