
Azure Data Engineer with Data Lakes and BI Expertise

Location: Toronto, ON, Canada
Posted: January 30, 2026


***********.***@*****.***

+1-437-***-****

Prabhu Kumar

Azure BI and Data Engineer

Introduction:

Data Engineer with 6+ years of hands-on experience building and operating cloud-scale data engineering solutions on Azure. Experienced in designing end-to-end data pipelines using Azure Data Factory, Databricks (PySpark), Delta Lake, and Snowflake, with a focus on performance, reliability, and data governance.

Profile Summary:

• Strong experience in designing and building large-scale, cloud-based data engineering pipelines on Azure, with a focus on reliability, scalability, and maintainability.

• Hands-on expertise in developing metadata-driven ETL/ELT frameworks using Azure Data Factory, enabling automated ingestion, transformation, validation, and reprocessing of data at scale (a minimal sketch follows this list).

• Proficient in big data processing using Databricks (PySpark) and Apache Spark to handle high-volume batch and incremental workloads efficiently.

• Extensive experience integrating and optimizing analytical data platforms using Snowflake, Azure Synapse Analytics, and Delta Lake to support enterprise reporting and analytics.

• Strong understanding of data pipeline performance tuning, including partitioning strategies, incremental loading strategies, and Spark job optimization.

• Experience implementing robust data quality, validation, and reconciliation mechanisms to ensure data consistency across OLTP and OLAP systems.

• Solid background in building semantic models and analytical datasets to support BI tools such as Power BI, SSRS, and SSAS, enabling self-service and enterprise reporting.

• Skilled in orchestrating end-to-end data workflows, monitoring pipeline execution, and troubleshooting production issues to support business-critical reporting.

• Practical exposure to version control and CI/CD practices using Git and Azure DevOps for controlled and repeatable data and BI deployments.
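
The metadata-driven framework mentioned above is easiest to see in code. Below is a minimal PySpark sketch of one way a control-table-driven incremental load can work; the table names, column names, and watermark logic are illustrative assumptions, not the actual framework.

```python
# Sketch: metadata-driven incremental ingestion driven by a control table.
# ops.etl_control and its columns (source_table, target_table, watermark_column,
# last_watermark) are hypothetical names used for illustration only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("metadata-driven-ingest").getOrCreate()

# One row per managed source table, carrying the last successfully loaded watermark.
control = spark.table("ops.etl_control").collect()

for row in control:
    # Incremental load: pull only rows newer than the stored watermark.
    incremental = (
        spark.table(row["source_table"])
             .where(F.col(row["watermark_column"]) > F.lit(row["last_watermark"]))
    )

    # Append the new slice to the Delta target table.
    incremental.write.format("delta").mode("append").saveAsTable(row["target_table"])

    # Advance the watermark so failures and reruns reprocess only what is needed.
    new_wm = incremental.agg(F.max(row["watermark_column"])).first()[0]
    if new_wm is not None:
        spark.sql(
            f"UPDATE ops.etl_control SET last_watermark = '{new_wm}' "
            f"WHERE source_table = '{row['source_table']}'"
        )
```

With a pattern like this, onboarding a new source becomes a row insert into the control table rather than a new pipeline, which is what keeps manual intervention low.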

TECHNICAL SKILLS:

Data Engineering & Big Data: Apache Spark, PySpark, Delta Lake, Batch & Incremental Processing, Metadata-Driven ETL/ELT, Partitioning, Performance Tuning & Optimization

Cloud & Data Platforms: Microsoft Azure (Azure Data Factory, Azure Databricks, Azure Synapse Analytics, ADLS Gen2, Azure Key Vault, Azure Purview)

Databases & Warehousing: Snowflake, Microsoft SQL Server, Azure Synapse Analytics, OLTP & OLAP Data Modeling

ETL & Orchestration: Azure Data Factory (ADF), SQL Server Integration Services (SSIS), Databricks Notebooks & Workflows

Business Intelligence: Power BI, SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), Tableau

Programming & Query Languages: Python, SQL (T-SQL, Snowflake SQL), PySpark, Scala, PowerShell, Bash, Rust, R, Java

DevOps & Integration: Git, Azure DevOps, CI/CD for Data Pipelines, REST APIs, JSON

Data Governance & Quality: Data Validation & Reconciliation, Data Lineage, Metadata Management, Security & Access Control

API & System Integration: REST APIs, JSON-based Data Exchange, Secure Service-to-Service Authentication

TECHNICAL PROJECTS & GITHUB PORTFOLIO

GitHub: https://github.com/prabhukumarm98-rgb/data-engineering-portfolio

• Maintained a production-grade data engineering portfolio demonstrating end-to-end design and operation of large-scale, cloud-native data platforms across batch, streaming, and real-time workloads.

• Built Infrastructure-as-Code foundations using Terraform and CloudFormation, including multi-region AWS data platforms with VPC peering, IAM-based security hardening, and Kubernetes-based Spark and Airflow deployments on EKS.

• Implemented enterprise-grade data pipelines with defined SLAs, including petabyte-scale Spark ETL workloads, real-time fraud detection using Apache Flink with CEP patterns, and CDC-based incremental data ingestion (a MERGE-based sketch of the CDC pattern follows this list).

• Designed modern data platform architectures such as Lakehouse implementations using Delta Lake and Iceberg, Snowflake performance optimization with zero-copy cloning, and Data Mesh patterns with federated governance.

• Integrated observability and MLOps capabilities using OpenTelemetry instrumentation, automated alerting with anomaly detection, ML feature stores with online/offline consistency, and cost monitoring with budget forecasting dashboards.

• Conducted performance benchmarking and comparative analysis across data technologies, including Spark vs Flink workload evaluation, storage format benchmarks with real metrics, message queue throughput analysis, and join optimization strategies.

• Emphasized production-readiness through monitoring, security, scalability, cost optimization, and documented architectural trade-offs across all projects.
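
One concrete building block behind the CDC work above is a Delta Lake MERGE that applies inserts, updates, and deletes from a change feed. The sketch below assumes a hypothetical change-feed schema (an id key plus an op flag of 'I', 'U', or 'D') and made-up lake paths; it is one way to express the pattern, not the portfolio's exact code.

```python
# Sketch: applying a CDC batch to a Delta table with MERGE (upsert + delete).
# The paths and the change-feed schema (id, op, ...) are illustrative assumptions.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("cdc-merge").getOrCreate()

changes = spark.read.format("delta").load("/lake/bronze/orders_changes")
target = DeltaTable.forPath(spark, "/lake/silver/orders")

(
    target.alias("t")
    .merge(changes.alias("c"), "t.id = c.id")
    .whenMatchedDelete(condition="c.op = 'D'")                # source deletes
    .whenMatchedUpdateAll(condition="c.op = 'U'")             # updates overwrite the row
    .whenNotMatchedInsertAll(condition="c.op IN ('I', 'U')")  # new rows are inserted
    .execute()
)
```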

Education:

● Degree: Bachelor of Arts

● University: Rabindranath Tagore University

Certification:

● Microsoft Power BI Data Analyst

Work Experience:

Aptos, Azure and BI Engineer

Montreal, QC, Aug 2023 – Present

Responsibilities:

• Architected and implemented enterprise-scale data ingestion and transformation pipelines using Azure Data Factory (ADF), supporting hundreds of tables across multiple source systems with automated dependency management, retries, and controlled reprocessing.

• Designed and optimized Spark-based ETL workloads in Azure Databricks using PySpark and Apache Spark, reducing end-to-end pipeline execution time by ~30–45% through partition pruning, incremental loading strategies, and Spark performance tuning.

• Built and maintained Delta Lake–based storage layers on Azure Data Lake Storage Gen2 (ADLS), implementing schema evolution, time travel, and optimized file layouts to support historical reprocessing and downstream analytics.

• Implemented analytical data platforms using Snowflake and Azure Synapse Analytics, enabling high-performance SQL querying and scalable OLAP workloads for enterprise reporting.

• Developed a metadata-driven ETL/ELT framework using Azure Data Factory, Databricks notebooks, and configuration-driven control tables to standardize pipeline execution and significantly reduce manual intervention during failures.

• Orchestrated Databricks notebooks and Spark jobs through Azure Data Factory pipelines, implementing robust error handling, logging, monitoring, and alerting to support production-grade data engineering workflows.

• Implemented data quality, reconciliation, and validation checks using SQL, PySpark, and control tables to ensure data consistency across OLTP systems and analytical warehouses.

• Tuned Spark workloads in Databricks using broadcast joins, caching strategies, and optimized partition sizing to improve throughput, reliability, and cloud resource utilization (a brief sketch follows this list).

• Designed curated analytical datasets and semantic models in Snowflake and SQL Server to support Power BI, SQL Server Reporting Services (SSRS), and SQL Server Analysis Services (SSAS), enabling enterprise reporting and self-service analytics.

• Implemented data governance, security, and access controls using Azure Key Vault and Azure Purview, ensuring secure credential management, metadata visibility, and end-to-end data lineage.

• Utilized Git and Azure DevOps for version control, CI/CD pipelines, and controlled deployment of data engineering and BI artifacts across environments.

• Collaborated with data architects, analysts, and business stakeholders to translate complex analytical requirements into scalable, production-ready data engineering solutions.
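
As a brief illustration of the tuning levers named above (partition pruning, broadcast joins, and partition sizing), here is a compact PySpark sketch. The paths, column names, and the 200-partition figure are made up for the example.

```python
# Sketch: common Spark tuning levers; all names and numbers are illustrative.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# Partition pruning: filtering on the partition column lets Spark skip whole files.
sales = (
    spark.read.format("delta").load("/lake/silver/sales")
         .where(F.col("load_date") == "2024-01-31")
)

# Broadcast join: ship the small dimension table to every executor, avoiding a shuffle.
stores = spark.read.format("delta").load("/lake/silver/stores")
joined = sales.join(broadcast(stores), "store_id")

# Partition sizing: repartition before a wide write to avoid the small-files problem.
(
    joined.repartition(200, "load_date")
          .write.format("delta").mode("overwrite")
          .partitionBy("load_date")
          .save("/lake/gold/sales_by_store")
)
```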

Axis Bank, Cloud and BI Consultant

Mumbai, IN, Mar 2021 – Jul 2023

Responsibilities:

• Integrated Power BI with various data sources, including databases, Excel files, and cloud-based services, for seamless connectivity.

• Architected and implemented a scalable data ecosystem using Apache Spark, processing large volumes of data every month and reducing operational costs.

• Ensured data security and privacy within Azure Data Factory through data masking and anonymization processes.

• Participated in database code reviews as a DBA to maintain coding standards and best practices.

• Conducted BI data profiling and metadata management to build a comprehensive understanding of the data.

• Developed and managed a real-time fraud detection system using Spark and Scala to reduce fraudulent transactions (a streaming sketch follows this list).

• Crafted SSIS package designs aligned with business requirements after collaborative discussions with business users.

• Managed data integration in Azure Synapse to unlock advanced analytics capabilities.

• Enforced coding standards and performed automated checks on committed database code by setting up Git hooks.

• Configured role-based security within SSAS to control user access and protect sensitive data.

• Implemented Blob Storage metadata search for efficient data retrieval.

• Optimized data processing efficiency through incremental (delta) data loading strategies in SSIS.

• Troubleshot and resolved complex data-related issues in OLAP and OLTP systems.

• Ensured data accuracy and reliability by implementing data validation and integrity constraints in the database.

• Maintained data quality checks and cleansing routines within ETL processes to keep data accurate.

• Fine-tuned report performance in SSRS by optimizing SQL queries and indexing.

• Designed and implemented SQL database data validation and profiling in partnership with data governance teams.

• Administered report subscriptions in SSRS and scheduled report delivery to designated recipients.

• Implemented security measures in Power BI to control access and ensure data confidentiality.

• Set up integration with Azure Time Series Insights for time-series analytics in Azure Databricks.
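
The fraud detection system above was built in Spark and Scala; as a rough PySpark Structured Streaming equivalent of the core idea, the sketch below flags high-value transactions from a stream in near real time. The Kafka broker, topic, message schema, and threshold are all assumptions for illustration.

```python
# Sketch: near-real-time flagging of suspicious transactions.
# The Kafka broker/topic, message schema, and the 10,000 threshold are
# illustrative assumptions; the original system used Spark with Scala.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("fraud-stream").getOrCreate()

schema = (
    StructType()
    .add("txn_id", StringType())
    .add("account_id", StringType())
    .add("amount", DoubleType())
    .add("event_time", TimestampType())
)

txns = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
    .select("t.*")
)

# Simple rule for the sketch: flag any transaction above a fixed amount.
suspicious = txns.where(F.col("amount") > 10000)

# Land alerts in a Delta path for downstream review and reporting.
query = (
    suspicious.writeStream.format("delta")
    .option("checkpointLocation", "/lake/chk/fraud")
    .outputMode("append")
    .start("/lake/alerts/suspicious_txns")
)
query.awaitTermination()
```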

Apollo, BI Consultant

Mumbai, IN, Apr 2019 – Mar 2021

Responsibilities:

● Performed data validation and data blending, used SQL queries, and created empty extracts to shift the performance load to Tableau Server instead of Tableau Desktop.

● Used filters, transformations, calculated fields, LODs, sets, groups, and parameters.

● Created performance metrics, scorecards, what-if analyses, and forecasting models based on built-in and custom statistical analysis.

● Used SQL Server Integration Services (SSIS) to build data integration and workflow solutions, including Extract, Transform and Load (ETL) solutions for data warehousing applications.

● Worked with SSIS to integrate and analyze data from multiple homogeneous and heterogeneous sources (CSV, Excel, Oracle DB, and SQL Server).

● Performed loading of historical data into the Enterprise Data Warehouse using both full and incremental loads.

● Created SSIS packages to extract, transform, and load data using transformations such as Lookup, Derived Column, Conditional Split, Aggregate, Pivot, Slowly Changing Dimension, Merge Join, and Union All.

● Developed custom logging so that every executing SSIS package records each inserted row in a custom logging table.

● Migrated DTS packages from SQL Server 2005 to SQL Server 2008 as SSIS packages.

● Created ETL packages from different data sources (SQL Server, flat files, Excel source files, XML files, etc.) and loaded the data into destination tables using a variety of SSIS transformations.

● Used SSIS Control Flow items (For Each Loop, For Loop and Sequence containers, Execute Package, Execute SQL, Script, and Send Mail tasks) and transformations (Conditional Split, Data Conversion, Lookup, Derived Column, Aggregate, Multicast); handled extracts and data transformation at the source level wherever needed.

● Created drill-down reports, interactive graphs, and visuals for comprehensive reporting.


