
Data Engineering Azure

Location:
Dallas, TX
Posted:
September 15, 2025


NAVEEN CH

Email: ******.*.***@*****.*** Mobile: +1-469-***-****

PROFESSIONAL SUMMARY

● 4+ years of experience in data engineering, specializing in cloud-based data integration, ETL processes, and building scalable data pipelines across AWS and Azure environments.

● Expertise in designing and optimizing ETL workflows, data warehousing, and data lakes, leveraging tools such as AWS Glue, Azure Data Factory, Redshift, Synapse Analytics, and S3.

● Strong command of big data technologies such as PySpark, Hadoop, and Kafka, used to process large datasets for both batch and real-time analytics with high performance and reliability.

● Proficient in data modeling, database optimization, and developing BI solutions with tools like Tableau, Power BI, and QuickSight to support business insights and reporting.

● Skilled in CI/CD pipelines, Docker, and Git, automating deployment processes and ensuring continuous delivery of data engineering solutions.

TECHNICAL SKILLS

Programming Languages: Python, SQL, Hive queries, UDFs, Bash, PySpark

Amazon Web Services (AWS): AWS Glue, S3, Athena, Redshift, Step Functions, Lambda, RDS, CloudWatch, EMR, EC2, VPC, SageMaker, IAM, EKS, QuickSight, CodePipeline, Migration Hub

Microsoft Azure Services: Azure Data Factory, ADLS, Azure Synapse Analytics, Azure SQL, Azure ML, AKS, Azure Blob Storage, Azure Functions, Azure Active Directory, Azure VM, Azure VNet

Methodologies: Data Integration, CI/CD, IaC, Agile, Scrum

ETL & Data Warehousing: ETL, Data Modeling, Data Warehousing, Change Data Capture, Batch Data Pipelines, Real-time Data Streaming, Data Pipeline Optimization, Schema Design, DBT

Databases: Amazon Redshift, Azure SQL, SQL Server, RDS, Databricks, Microsoft SSMS, SSIS, SSAS, SSRS

Big Data: Hadoop, Spark (PySpark, Databricks), Kafka, Kinesis, MapReduce, EMR

Business Intelligence Tools: Tableau, Power BI, QuickSight

Containerization & Deployment: Docker, Jenkins, Git, CodePipeline, CI/CD, API Design and Development

PROFESSIONAL EXPERIENCE

HCA HEALTHCARE Nashville, TN

Data Engineer March 2024 - Present

● Engineered ETL pipelines using AWS Glue to migrate 50+ TB of healthcare data into Redshift and S3, automating data transformation and storage processes while reducing manual intervention.

● Deployed AWS Lambda functions triggered on incoming data to run real-time transformations, reducing data latency by 90% and improving operational efficiency.

● Architected a data lake in Amazon S3, managing 30 TB of raw healthcare data and enabling scalable storage with high availability for analytics.

● Optimized Redshift schemas and data models, reducing query execution times by 40%, and increasing performance for large-scale healthcare analytics workloads.

● Implemented VPC, IAM policies, and security controls, ensuring HIPAA compliance and securing access to sensitive data for over 500 users across multiple systems.

● Developed interactive dashboards in Amazon QuickSight, providing real-time insights into healthcare KPIs, patient trends, and readmission rates, reducing reporting time by 50%.

● Configured CloudWatch for pipeline monitoring, identifying and resolving data pipeline failures within minutes, reducing downtime by 25%.

● Streamlined data flow using Amazon Kinesis and Lambda to process patient data in real time, reducing data processing delays by 75% (a minimal sketch of this streaming pattern appears after this role's bullets).

● Automated CI/CD pipelines with CodePipeline, reducing manual deployment time by 60% and ensuring reliable updates to data processing workflows.

● Optimized Redshift performance by enhancing stored procedures and implementing partitioning strategies, improving query speed by 40%.

● Developed predictive machine learning models with Amazon SageMaker, delivering patient readmission forecasts with 85% accuracy.

● Implemented data encryption and compliance controls through AWS KMS and IAM, ensuring data security for 10 TB of sensitive patient data.
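For illustration, the following is a minimal sketch of the Kinesis-to-Lambda streaming pattern referenced above; the bucket name, record fields, and transform step are hypothetical placeholders, not actual HCA code.

    # Minimal sketch: Lambda handler consuming a Kinesis stream and landing
    # transformed records in S3 (hypothetical bucket and field names).
    import base64
    import json

    import boto3

    s3 = boto3.client("s3")
    TARGET_BUCKET = "example-processed-events"  # hypothetical bucket

    def handler(event, context):
        for record in event["Records"]:
            # Kinesis payloads arrive base64-encoded inside the Lambda event.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            payload["processed"] = True  # placeholder transform step
            key = f"events/{record['kinesis']['sequenceNumber']}.json"
            s3.put_object(Bucket=TARGET_BUCKET, Key=key, Body=json.dumps(payload))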

YES BANK Pune, India

Data Engineer July 2022 - August 2023

● Migrated 60 TB of legacy data from on-premises systems to Azure Synapse Analytics and Azure SQL, cutting infrastructure costs by 30% and improving scalability.

● Automated ETL pipelines with Azure Data Factory, processing 10 million banking records per day and reducing data processing time by 45%.

● Built real-time data processing solutions using Azure Functions, handling over 5,000 banking transactions per minute with 99.9% uptime.

● Deployed a data lake in Azure Data Lake Storage (ADLS), storing 40 TB of structured and unstructured financial data for secure and efficient analytics.

● Transformed large datasets using Azure Databricks and PySpark, optimizing batch processing to reduce run times from 12 hours to under 3 hours (see the PySpark sketch after this role's bullets).

● Implemented CI/CD pipelines with Jenkins and Azure DevOps, automating deployment of data engineering projects and reducing deployment time by 70%.

● Ensured secure user access with Azure Active Directory (AAD) and role-based authentication, managing 2,000+ user accounts across cloud-based systems.

● Developed real-time analytics using Azure Synapse and Azure ML, delivering fraud detection and customer behavior insights with 92% accuracy.

● Secured sensitive data using Azure Key Vault for encryption and Azure VNet for private network configurations, ensuring compliance with financial regulations.

● Migrated legacy SSIS/SSRS/SSAS workloads to Azure Synapse Analytics, improving data processing and reporting by 40%.

● Monitored data pipelines with Azure Data Factory monitoring tools, tracking lineage and enabling full traceability of data movements.

● Collaborated with cross-functional teams to develop an enterprise-wide data warehouse architecture in Azure Synapse, enabling high-performance analytics.
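A minimal PySpark sketch of the kind of Databricks batch transform described above; storage paths, column names, and the aggregation itself are hypothetical and assume ADLS access is already configured on the cluster.

    # Minimal sketch: batch aggregation of settled transactions with PySpark.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily-txn-batch").getOrCreate()

    raw = spark.read.parquet(
        "abfss://raw@examplelake.dfs.core.windows.net/transactions/"  # hypothetical path
    )

    daily = (
        raw.filter(F.col("status") == "SETTLED")
           .withColumn("txn_date", F.to_date("txn_ts"))
           .groupBy("account_id", "txn_date")
           .agg(F.sum("amount").alias("daily_total"),
                F.count("*").alias("txn_count"))
    )

    # Partitioning output by date keeps downstream incremental reads cheap.
    daily.write.mode("overwrite").partitionBy("txn_date").parquet(
        "abfss://curated@examplelake.dfs.core.windows.net/daily_totals/"
    )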

BHARTI AXA LIFE INSURANCE Remote, India

Data Engineer March 2021 - July 2022

● Developed ETL pipelines using AWS Glue, processing 20 TB of insurance data monthly and loading it into Amazon Redshift for efficient analytics.

● Optimized Redshift schemas and data models, improving query execution time by 30% and reducing report generation time from 3 hours to 1 hour.

● Implemented Amazon Athena for direct querying of over 50 TB of unstructured data in S3, enabling real-time analytics without full data extraction.

● Built serverless Lambda functions for real-time processing of 1,000+ insurance claims per minute, ensuring immediate updates to claim status.

● Created and maintained interactive dashboards using Tableau and Power BI, visualizing KPIs and insurance trends for internal stakeholders.

● Utilized PySpark and Databricks to process large datasets in a distributed environment, reducing data transformation times by 60%.

● Monitored and troubleshot ETL pipelines through AWS CloudWatch, reducing failures by 30% and improving pipeline reliability.

● Optimized Redshift query performance by tuning stored procedures and implementing partitioning strategies, reducing query time by 25%.

● Implemented Change Data Capture (CDC), streamlining incremental data loading and improving data freshness for 10 million daily records (a minimal sketch of this incremental pattern appears after this role's bullets).

● Managed secure data access using AWS IAM, ensuring role-based permissions for over 200 users, preventing unauthorized access to sensitive insurance data.

● Developed data lineage tracking systems within AWS Glue, ensuring transparency and traceability for all data processes and transformations.

● Implemented Docker containers for consistent and portable deployment of data engineering applications, enhancing workflow scalability and repeatability.
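As referenced in the CDC bullet above, the following is a minimal sketch of watermark-based incremental loading, one common batch form of Change Data Capture; the source paths, column names, and hard-coded watermark are hypothetical simplifications.

    # Minimal sketch: watermark-based incremental load (simple batch CDC).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-incremental-load").getOrCreate()

    # High-water mark from the previous run; in practice this would be read
    # from a control table or S3 object rather than hard-coded.
    last_watermark = "2022-06-30 23:59:59"

    changed = (
        spark.read.parquet("s3://example-raw/claims/")  # hypothetical source
             .filter(F.col("updated_at") > F.lit(last_watermark))
    )

    # Append only the changed rows, then persist max(updated_at) as the
    # watermark for the next run.
    changed.write.mode("append").parquet("s3://example-curated/claims/")
    next_watermark = changed.agg(F.max("updated_at")).first()[0]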

EDUCATION

Master of Science in Data Science, University of North Texas, Denton, TX (May 2025)


