Imran Shah
Lead Data Engineer | ETL Data Engineer | Cloud Data Engineer
***************@*****.*** | 778-***-**** | Vancouver, BC | github.com/codevector809
SUMMARY
Senior Data Engineer with 8+ years of experience designing scalable ETL pipelines, distributed data platforms, and analytics solutions across AWS and Azure. Expert in Python, SQL, Apache Spark, Apache Kafka, and Apache Airflow for batch and real-time processing of financial, healthcare, and enterprise data. Strong experience building cloud data architectures, data lakes, and lakehouses with Snowflake, Amazon Redshift, and Azure Synapse. Proven track record of leading end-to-end data engineering initiatives, optimizing pipeline performance, reducing latency, and enabling self-service analytics for product, data, and BI teams.
SKILLS
Languages
Python, SQL, Scala, Java, Bash
Big Data Technologies
Apache Spark, Hadoop, Hive, HBase, Presto, Apache Flink
ETL / Data Pipeline Tools
Apache Airflow, dbt, Apache NiFi, Talend, Informatica
Data Warehousing
Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse
DevOps & Infrastructure
Docker, Kubernetes, Terraform, Jenkins, Git, CI/CD
Data Architecture
Data Lakes, Data Lakehouse, Star Schema, Snowflake Schema, Data Modeling
Data Governance & Compliance
GDPR, HIPAA, Apache Atlas, Collibra, AWS Lake Formation
Streaming & Real-Time Processing
Apache Kafka, Spark Streaming, Kafka Streams, AWS Kinesis, Pub/Sub
Cloud Platforms
AWS (S3, Redshift, Glue, Lambda, EMR), GCP (BigQuery, Dataflow, Pub/Sub), Azure (Data Factory, Synapse, ADLS)
Databases
PostgreSQL, MySQL, MongoDB, Cassandra, DynamoDB
Monitoring & Observability
Prometheus, Grafana, ELK Stack, Datadog
PROFESSIONAL EXPERIENCE
Lead Data Engineer
EAGIS INC.
2024 – Present
•Architected and deployed a clinical data warehouse on Oracle and SQL Server, consolidating 20+ disparate data sources and improving reporting coverage for operations and clinical teams.
•Designed and optimized batch and real-time data pipelines using Apache Spark, PySpark, and Apache Kafka, improving ingestion SLAs and reducing latency for clinical event processing.
•Developed high-performance PL/SQL stored procedures and optimized SQL queries to improve reporting and analytics performance for data and clinical teams.
•Built end-to-end real-time data processing pipelines by integrating Kafka with Spark-based processing on a shared distributed computing platform.
•Led data modeling initiatives with business and analytics stakeholders, creating star-schema models that improved data quality, traceability, and HIPAA-aligned compliance.
Senior Data Engineer
CL CONSULTING
2021 – 2024
•Designed cloud-native data architecture on Azure (Data Factory, Synapse, ADLS Gen2), supporting analytics for millions of workforce records and increasing platform scalability by 3x.
•Built scalable Spark-based ETL pipelines on Azure Databricks and Synapse to process multi-terabyte datasets and improve data freshness for executive dashboards and HR analytics.
•Implemented ingestion frameworks to land raw and curated data into ADLS and Synapse, enabling centralized Data Lake and Data Lakehouse architectures for analytics and machine learning.
•Developed curated semantic layers and reusable data models for Power BI and Tableau, improving report performance and enabling self-service analytics for business and product teams.
Associate Data Engineer
AUTOX
2019 – 2021
Remote
•Maintained and improved ETL and ELT pipelines processing daily financial transaction data on AWS, increasing throughput and stability for regulatory and risk reporting.
•Migrated a legacy on-premises data warehouse to AWS using S3, Redshift, Glue, Lambda, and EMR, modernizing the data platform and enabling scalable analytics with lower operational overhead.
•Designed and orchestrated ETL pipelines using Apache Airflow and PySpark on EMR to load curated datasets into Snowflake and Redshift, improving data freshness and reducing manual work.
•Built dimensional data models using star and snowflake schemas in Snowflake to support finance and risk analytics, improving query performance and usability for analysts and data scientists.
•Implemented monitoring and alerting for critical data workflows, reducing incident resolution time and improving reliability for batch and near-real-time data pipelines.
Software Engineer
TECH VERSE
2018 – 2019
•Developed and maintained scalable backend services using Python and PL/SQL to support transactional and reporting use cases for enterprise applications.
•Built and integrated RESTful APIs to connect multiple enterprise systems and third-party applications, enabling real-time and batch data exchange across platforms.
•Collaborated with cross-functional teams using Agile/Scrum methodologies to deliver high-quality software and data solutions, improving time-to-market for key features.
•Troubleshot, debugged, and optimized existing applications and database queries to improve performance, scalability, and reliability under increasing data volumes.
PROJECTS
Healthcare Streaming Data Platform - CHORD
Architected a hybrid Data Lakehouse platform combining Spark-based batch processing and Kafka-powered streaming pipelines to ingest clinical events into Oracle/SQL Server warehouses and analytics layers, enabling more timely clinical and operational insights.
Financial Data Warehouse Migration to AWS - AUTOX
Led core data streams in migrating a legacy on-premises data warehouse to AWS S3, Redshift, and Snowflake using Spark, Glue, and Apache Airflow, enabling scalable data warehousing on distributed systems and reducing infrastructure management effort.
EDUCATION
University of Sargodha
Master of Science