
Azure Data Engineer

Location:
Austin, TX, 78701
Posted:
October 21, 2025


Resume:

Professional Summary

Results-driven Data Engineer with 5+ years of experience designing, building, and optimizing data pipelines and architectures in the Healthcare and Financial domains. Proficient in Python, SQL, Spark, Kafka, Azure, AWS, Airflow, and ETL frameworks. Proven track record of handling structured and unstructured data, implementing HIPAA- and SOX-compliant systems, and deploying end-to-end data pipelines that improve data accessibility and drive business intelligence. Adept at cross-functional collaboration, Agile/Scrum delivery, and optimizing big data systems for large-scale data operations.

Professional Experience

Azure Data Engineer, Humana Healthcare, USA Jan 2025 – Present

Engineered scalable data pipelines using Apache Spark (PySpark) and Azure Data Factory to ingest, transform, and load 20+ TB of healthcare data, including EMR, claims, and lab records, in alignment with FHIR/HL7 standards.

Developed real-time data streaming solutions using Azure Event Hubs, Kafka, and Databricks Structured Streaming for patient vitals monitoring and proactive clinical alerting.

Built secure, compliant data lakes and analytics platforms using Azure Data Lake Storage (Gen2), Azure Synapse Analytics, and Azure Purview to support HIPAA-compliant analytics.

Automated 100+ data workflows using Apache Airflow and Azure Logic Apps, integrating DAG-level retry logic, failure alerts, and SLA monitoring for mission-critical processes.

Implemented role-based access controls (RBAC), data masking, and PII encryption using Azure Key Vault, Managed Identity, and Data Catalog to ensure HIPAA and GDPR compliance.

Integrated pipelines into CI/CD workflows using Azure DevOps, Jenkins, and Docker, enabling version control, blue-green deployments, and rollback capabilities.

Applied Great Expectations for robust data validation, ensuring data integrity and quality before, during, and after ingestion stages.

Collaborated with data science teams to operationalize ML models (e.g., chronic condition prediction) and surfaced results via Power BI and Tableau dashboards for clinical and operational teams.

Designed dimensional models (star/snowflake) for analytics use cases like cost optimization, readmission trends, and patient segmentation, boosting reporting efficiency by 30%.

Partnered with cross-functional teams (Product, Security, Analytics) to drive improvements in data architecture, observability, and incident response.

AWS Data Engineer, Hexaware Technologies (Banking), India Sep 2019 – Apr 2023

Developed high-throughput Apache Spark and Scala pipelines to process 30+ million financial transactions daily, supporting fraud detection and regulatory reporting.

Built real-time fraud detection analytics using Apache Kafka and Spark Structured Streaming, enabling <1s latency alerts and live compliance notifications.

Designed and maintained cloud-based data architectures using AWS Glue, S3, Redshift, and Athena, with robust partitioning and schema versioning.

Designed dimensional models in Snowflake, BigQuery, and SQL Server, utilizing SCD (Type 1 & 2) for regulatory and business reporting.

Executed cloud migration strategies, moving legacy ETL to AWS and GCP platforms, using Snowpipe, Dataflow, and Cloud Composer for ingestion and orchestration.

Automated infrastructure provisioning using Terraform and AWS CloudFormation, standardizing Dev/Stage/Prod environments and improving deployment consistency.

Enabled CI/CD using GitHub Actions, Jenkins, and Docker, facilitating code reuse and seamless deployments across microservices and data pipelines.

Ensured PCI-DSS compliance by implementing data security controls such as AES-256 encryption, tokenization of sensitive fields, and secure handling of reporting outputs.

Created interactive dashboards in Power BI and Tableau, allowing risk and finance teams to track KPIs such as net interest margin, charge-offs, and delinquency rates.

Integrated predictive fraud models into ETL pipelines and BI layers, enabling proactive compliance monitoring and case management insights.

Documented data lineage, created ER diagrams, and maintained data dictionaries to support audits, SOX/Basel III reporting, and operational transparency.

Jr. Data Engineer, Mphasis, India Feb 2018 – Aug 2019

Built data ingestion and classification pipelines using Apache NiFi and Python, storing processed logs securely in GCP Cloud Storage for scalable access.

Assisted in building batch-oriented ETL pipelines using Python and SQL to process structured datasets from internal CRM, finance, and web analytics systems (~2 TB).

Supported senior engineers in maintaining AWS-based infrastructure, including S3, RDS (PostgreSQL), and Lambda, for scalable and cost-effective data processing.

Developed data cleansing scripts to handle missing values, inconsistent schemas, and malformed records, increasing data pipeline reliability by 25%.

Created basic data validation rules and automated data profiling scripts using pandas, helping identify anomalies and ensure pipeline output quality.

Helped implement incremental load strategies for MySQL and Oracle sources, reducing daily job runtime by 40%.

Participated in migration of legacy ETL jobs to AWS Glue, learning to write and debug PySpark scripts under guidance.

Designed dashboards and reports using Excel, Tableau, and Google Data Studio for internal sales and operations teams.

Collaborated with QA and product teams to perform UAT on new data pipelines and ensure alignment with reporting logic.

Maintained technical documentation and flowcharts for existing data pipelines and participated in weekly code reviews.

Took part in Agile ceremonies (sprint planning, standups), contributing to task estimation, testing, and deployment planning.

Skills

•Programming Languages: Python (NumPy, Pandas), SQL, PySpark, Shell Scripting, R

•Big Data Technologies & Streaming Analytics: Apache Spark, Hadoop, AWS Kinesis, Apache Kafka, Apache Flink

•Data Warehousing: Amazon Redshift, Google BigQuery, Azure Synapse Analytics, Snowflake

•Database Management & BI Tools: Oracle, Teradata, SQL Server, PostgreSQL, PL/SQL, MySQL, MongoDB, NoSQL, Tableau, Power BI

•ETL Tools: Talend, Azure Data Factory, Apache Airflow, AWS Glue, Informatica PowerCenter, SSIS, SSRS, TIDAL

•Cloud Platforms: AWS (S3, EC2, EMR, Lambda, Glue, Redshift), Google Cloud, Azure (Databricks, Data Lake Storage Gen2, Cosmos DB)

•Containerization and Orchestration: Docker, Kubernetes

•Data Skills: Visualization, Data Modeling, Data Normalization, Data Warehousing, Data Mining, Data Analysis, Statistics

•Machine Learning & Big Data: Scikit-Learn, TensorFlow, Keras, Hive, HDFS, Scala, SAP HANA

Education

Master’s in Business Analytics May 2023 – Dec 2024

The University of Texas at Dallas, Richardson, Texas, USA

Certifications

AWS Certified Cloud Practitioner

AWS Certified Data Engineer - Associate


