
Azure Data Engineer with 5+ Years of Experience

Location:
Toronto, ON, Canada
Posted:
January 26, 2026


Resume:

Sushma

647-***-**** ******************@*****.*** linkedin

Summary

• Results-driven Azure Data Engineer with 5+ years of experience designing, developing, and optimizing large-scale data solutions using Azure Data Factory (ADF), Databricks (PySpark, Delta Lake), Synapse Analytics, and Azure Data Lake (ADLS Gen2).

• Skilled in building data pipelines, ETL/ELT frameworks, and real-time streaming systems leveraging Kafka, Spark Streaming, and Python. Strong background in data modeling, orchestration, and performance optimization, with a deep understanding of distributed computing and cloud architecture.

Professional Experience

Azure Data Engineer

Rogers, Canada SEP 2024 – Present

• Designed and deployed end-to-end ETL pipelines in Azure Data Factory (ADF) using linked services, datasets, and parameterized pipelines for scalable and reusable data ingestion.

• Developed multi-layer data lakehouse architecture (Bronze–Silver–Gold) on Azure Data Lake Storage Gen2 (ADLS) using Delta Lake for raw, curated, and analytics-ready datasets.

• Engineered data transformation frameworks in Azure Databricks using PySpark, Spark SQL, and Delta Lake to process structured and semi-structured data at scale (see the PySpark sketch at the end of this section).

• Implemented Delta Live Tables (DLT) for automated ETL orchestration, enforcing data quality expectations and schema validation.

• Designed incremental and CDC pipelines using ADF Mapping Data Flow to efficiently process change data.

• Optimized Spark cluster configurations with autoscaling, caching, and partition pruning, reducing job runtime by 35%.

• Created parameterized ingestion frameworks in ADF to integrate multiple data sources such as SQL Server, REST APIs, Cosmos DB, and SFTP.

• Integrated Azure Key Vault for secure credential storage and managed identities for ADF, Databricks and Synapse authentication.

• Developed Synapse dedicated SQL pools with partitioning, materialized views, and distribution strategies for analytical workloads.

• Collaborated with business teams to design star and snowflake schemas in Synapse and Power BI for reporting and analytics.

• Implemented Databricks REST APIs for automated cluster creation, notebook execution, and job monitoring through DevOps pipelines.

• Built CI/CD pipelines in Azure DevOps for deploying ADF pipelines, Databricks notebooks, and Synapse scripts across environments using YAML and ARM templates.

• Designed real-time streaming ingestion pipelines using Event Hub and Databricks Structured Streaming for low-latency data processing (see the streaming sketch at the end of this section).

• Integrated Azure Monitor and Log Analytics for pipeline monitoring, alerting, and data drift detection.

• Developed data validation frameworks in PySpark using Great Expectations for row-level and schema-level checks.

• Configured Azure Purview for data cataloging, classification, and lineage tracking across ADF, Synapse, and Databricks.

• Tuned Delta tables using OPTIMIZE, VACUUM, and ZORDER to maintain performance and storage efficiency.

• Built Power BI datasets and dashboards on top of Synapse and Databricks Delta tables for business insights.

• Managed Terraform scripts for IaC-based provisioning of ADF, Databricks, Synapse, and Key Vault.

• Implemented ADLS lifecycle policies for cost optimization and automatic archival of cold data.
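
The following is an illustrative sketch of the Bronze-to-Silver Delta Lake transformation and OPTIMIZE/VACUUM/ZORDER maintenance patterns referenced in the bullets above; the storage paths and column names (event_id, event_ts, customer_id) are assumed placeholders, not details of the actual pipelines:

```python
# Minimal sketch, assuming placeholder ADLS paths and columns; on Databricks the `spark`
# session is already provided.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze_path = "abfss://bronze@datalake.dfs.core.windows.net/events"
silver_path = "abfss://silver@datalake.dfs.core.windows.net/events"

# Promote raw (Bronze) events to a curated Silver Delta table.
bronze_df = spark.read.format("delta").load(bronze_path)

silver_df = (
    bronze_df
    .dropDuplicates(["event_id"])                     # drop replayed/duplicate events
    .filter(F.col("event_id").isNotNull())            # basic quality expectation
    .withColumn("event_date", F.to_date("event_ts"))  # partition-friendly date column
)

(silver_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save(silver_path))

# Periodic maintenance: compact small files, co-locate frequently filtered keys,
# then clean up unreferenced files (the default retention window applies).
spark.sql(f"OPTIMIZE delta.`{silver_path}` ZORDER BY (customer_id)")
spark.sql(f"VACUUM delta.`{silver_path}`")
```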
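
The streaming ingestion work above is sketched below; this assumes reading Event Hubs through its Kafka-compatible endpoint (the dedicated Event Hubs Spark connector is an equally valid route), with the namespace, event hub name, connection string, and paths as placeholders:

```python
# Minimal sketch, assuming the Event Hubs Kafka-compatible endpoint (port 9093).
# On non-Databricks Spark, drop the `kafkashaded.` prefix from the JAAS login module class.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw_stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "my-namespace.servicebus.windows.net:9093")
    .option("subscribe", "telemetry")  # the Event Hub is exposed as a Kafka topic
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config",
            'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required '
            'username="$ConnectionString" password="<event-hubs-connection-string>";')
    .load()
)

# Keep the raw payload and ingestion timestamp, landing continuously into a Bronze Delta table.
events = raw_stream.select(
    F.col("value").cast("string").alias("body"),
    F.col("timestamp").alias("ingest_ts"),
)

(events.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "abfss://bronze@datalake.dfs.core.windows.net/_checkpoints/telemetry")
    .start("abfss://bronze@datalake.dfs.core.windows.net/telemetry"))
```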

Data Engineer

Wipro, India MAY 2021 – DEC 2023

• Built scalable ADF ingestion pipelines to extract and transform data from on-prem SQL Server, Oracle, and SAP into ADLS Gen2, using the Self-Hosted Integration Runtime for hybrid connectivity.

• Migrated legacy SSIS and Python ETL jobs into ADF and Databricks, improving maintainability and performance by 50%.

• Built PySpark-based data cleansing and transformation scripts to handle duplicates, nulls, and schema drift efficiently (see the cleansing sketch at the end of this section).

• Implemented data lakehouse framework in Databricks using Delta Lake for unified batch and streaming data processing.

• Designed Synapse serverless SQL pools for ad-hoc querying and analytics directly on ADLS data.

• Automated Databricks job scheduling and cluster policies via REST APIs and Azure Logic Apps (see the REST API sketch at the end of this section).

• Developed ADF pipeline templates for dynamic source–target mapping using metadata-driven design.

• Integrated ADF with Azure Key Vault and Managed Identity to securely access secrets and storage accounts.

• Created PySpark UDFs and window functions to implement business rules and transformations.

• Built ADF logging framework using Azure SQL Database to track pipeline runs, errors, and metadata.

• Deployed event-driven pipelines using Event Grid triggers for real-time ingestion of incoming files.

• Collaborated with analysts to develop data marts and reporting datasets in Power BI backed by Synapse SQL views.

• Implemented schema evolution and time travel features in Delta Lake to manage historical data.

• Conducted Spark performance tuning using job metrics, caching, and repartitioning strategies.

• Monitored resource utilization and optimized Databricks cluster costs using Azure Cost Management and Advisor.

• Integrated Git repositories with ADF and Databricks to enable version control and peer-reviewed development.

• Created PowerShell and Azure CLI automation scripts for provisioning resources and refreshing pipelines.

• Documented data lineage and process flows using Azure Purview and Confluence for governance.

• Supported disaster recovery and high availability by configuring geo-redundant ADLS and Synapse failover setups.

• Implemented data retention and archival policies aligned with compliance and audit requirements.
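
A minimal sketch of the PySpark cleansing pattern referenced above, using window-based de-duplication, null standardization, and mergeSchema to absorb schema drift; the column names and paths are assumptions:

```python
# Minimal sketch, assuming placeholder columns (customer_id, updated_at, country) and paths.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

raw = spark.read.format("delta").load("abfss://bronze@datalake.dfs.core.windows.net/customers")

# Keep only the most recent record per business key (window-based de-duplication).
latest_first = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
cleaned = (
    raw
    .withColumn("rn", F.row_number().over(latest_first))
    .filter(F.col("rn") == 1)
    .drop("rn")
    .filter(F.col("customer_id").isNotNull())   # reject records missing the key
    .fillna({"country": "UNKNOWN"})             # standardize missing values
)

# Handle schema drift on write: allow new source columns to be added to the target table.
(cleaned.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("abfss://silver@datalake.dfs.core.windows.net/customers"))
```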
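
A minimal sketch of the REST-based job automation referenced above, using the public Databricks Jobs 2.1 API; the workspace URL, token, and job ID are placeholders (in practice the token would be retrieved from Azure Key Vault):

```python
# Minimal sketch: trigger an existing Databricks job and poll its run status.
import requests

WORKSPACE = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<databricks-personal-access-token>"                      # placeholder credential
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Trigger a run of an existing job...
run = requests.post(
    f"{WORKSPACE}/api/2.1/jobs/run-now",
    headers=HEADERS,
    json={"job_id": 123},
).json()

# ...then poll its status.
status = requests.get(
    f"{WORKSPACE}/api/2.1/jobs/runs/get",
    headers=HEADERS,
    params={"run_id": run["run_id"]},
).json()
print(status["state"]["life_cycle_state"])  # e.g. PENDING, RUNNING, TERMINATED
```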

Technical Skills

Azure Cloud: Data Factory (ADF), Synapse Analytics, Data Lake Storage Gen2 (ADLS), Event Hub, Event Grid, Key Vault, Logic Apps, Azure Monitor, Purview

Databricks & Big Data: Azure Databricks, PySpark, Spark SQL, Delta Lake, Delta Live Tables (DLT), Structured Streaming, Jobs API

DevOps & Automation: Azure DevOps, Git, YAML Pipelines, ARM Templates, Terraform, REST APIs, PowerShell, Azure CLI

Data Integration: SQL Server, Oracle, Cosmos DB, SAP, REST APIs, Kafka, SFTP

Programming: Python (PySpark, pandas, NumPy), SQL, Scala (basic scripting)

Data Modeling & BI: Star/Snowflake Schema, Synapse Views, Power BI, DAX, Data Mart Design

Governance & Security: Azure RBAC, ACLs, Managed Identities, Key Vault Secrets, Encryption, Purview Cataloging

Monitoring & Optimization: Log Analytics, Application Insights, Spark UI, Ganglia, Azure Monitor

Education

Loyalist College, Canada

Postgraduate in Cloud Computing

Gayatri Vidya Parishad Engineering College, India

Bachelor's in Computer Science


