Arun G
Toronto, ON
Email: *********************@*****.***
Professional Summary
Azure Data Engineer with 3+ years of hands-on experience designing and implementing scalable, cloud-native data solutions using Azure Data Factory (ADF), Azure Databricks, ADLS Gen2, Azure Synapse Analytics, and Delta Lake. Strong expertise in Lakehouse architecture, the Medallion design pattern (Bronze/Silver/Gold), Spark optimization, and CI/CD implementation using Azure DevOps. Experienced in migrating legacy ETL solutions to modern Databricks-based distributed data platforms.
Core Technical Skills
Cloud & Data Platform
Azure Data Factory (ADF), Azure Databricks, Azure Data Lake Storage Gen2 (ADLS), Azure Synapse Analytics, Delta Lake, Azure Key Vault, Azure Active Directory, Azure DevOps
Programming & Processing
PySpark, Spark SQL, Python, SQL
Architecture & Modeling
Lakehouse Architecture, Medallion Architecture (Bronze/Silver/Gold), Data Modeling, Batch/Incremental Data Loading, Delta MERGE, Auto Loader, Change Data Capture (CDC), SCD Type 1 & 2, Data Validation, Data Quality Checks
DevOps & Security
Git, Azure DevOps CI/CD Pipelines, RBAC, Managed Identities, Key Vault Integration
Visualization
Power BI
Professional Experience
Data Engineer
Ontario Securities Commission (OSC)
Toronto, ON
Sep 2022 – Jul 2025
Project: Enterprise Data Platform Modernization & Databricks Migration
Led migration of enterprise data platform from Azure Synapse to Azure Databricks, improving distributed processing performance by approximately 40%.
Designed and implemented Lakehouse architecture using the Medallion pattern (Bronze/Silver/Gold) with Delta Lake.
Built and orchestrated 25+ parameterized Azure Data Factory (ADF) pipelines integrating HTTP sources, ADLS Gen2, and Databricks notebooks.
Developed a scalable ingestion framework using Databricks Auto Loader to process large CSV files (1–1.5 GB per file), enabling incremental ingestion with schema evolution.
Developed optimized PySpark transformations and Delta Lake MERGE operations for incremental upserts and SCD processing.
Designed parameterized ADF pipelines with event-based and scheduled triggers for dynamic workflow orchestration.
Integrated ADF with Databricks clusters for notebook execution and job orchestration.
Configured autoscaling clusters and optimized Spark jobs to improve performance and reduce execution time.
Implemented CI/CD pipelines using Azure DevOps for automated deployment of ADF pipelines and Databricks artifacts across environments.
Enforced security best practices using RBAC, Managed Identities, and Azure Key Vault for secrets management.
Performed data validation, reconciliation, and quality checks across all transformation layers.
Collaborated in Agile sprint cycles, participating in backlog grooming, sprint planning, and release deployments.
Reviewed and enhanced Scala-based Spark notebooks to ensure compatibility with PySpark frameworks and improve maintainability.
Environment:
Azure Data Factory (ADF), Azure Databricks, ADLS Gen2, Azure Synapse Analytics, Delta Lake, PySpark, Spark SQL, Azure DevOps CI/CD, Git, RBAC, Azure Key Vault.
Education
Bachelor of Computer Application, Bharathiar University, Chennai, India, 2015
PG Diploma in Web Design and Development, Conestoga College, Ontario, Canada, 2021