
Data Engineer Big

Location:
Canada
Salary:
1
Posted:
August 06, 2025


Resume:

DHYEY CHAUHAN

Ottawa, ON +1-437-***-**** **************@*****.***

Summary

Results-driven Cloud and Data Engineer with 3+ years of hands-on experience designing, developing, and optimizing cloud-native data pipelines and infrastructure on AWS. Proven expertise in building scalable ETL workflows using AWS Glue, Lambda, Athena, and Redshift for real-time and batch data processing. Skilled in managing large datasets across S3, RDS, and Snowflake, while implementing CI/CD pipelines with GitHub Actions, Terraform, and CloudFormation. Adept at leveraging Apache Spark, Airflow, and Kafka for big data transformation and orchestration. Strong background in data modeling, governance, and compliance frameworks including HIPAA and GDPR, with a consistent focus on cost optimization, automation, and business impact.

Skills

Programming & Scripting: Python, SQL, Bash, Java, PySpark

Data Engineering & ETL: Apache Airflow, dbt, AWS Glue, Azure Data Factory, Google Dataflow, Talend, Informatica, SSIS, SFTP, ETL Automation

Big Data & Distributed Systems: Apache Spark, Apache Hadoop, SparkSQL, Kafka, Apache Flink, Databricks

Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics

Cloud Platforms & Services: AWS (S3, Redshift, Lambda, Glue, EC2, RDS), Azure, GCP

Databases: MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, DynamoDB

Containerization & Orchestration: Docker, Kubernetes, Amazon ECS

Infrastructure as Code (IaC): Terraform, AWS CloudFormation

Monitoring & Logging: CloudWatch, Prometheus, Grafana, ELK Stack

DevOps & CI/CD Tools: Git, GitHub Actions, Jenkins, Azure DevOps, GitLab CI

Data Modeling & Governance: Star/Snowflake Schema, Data Lineage, Data Quality Management, Metadata Management, OLTP, OLAP, GDPR, HIPAA

Analytics & Visualization: Power BI, Tableau, Looker, SQL-based Dashboards

Experience

Wawanesa Group Aug 2024 – Present

Cloud Engineer

Architected and deployed a fault-tolerant AWS infrastructure using EC2, S3, Lambda, Redshift, RDS, and CloudFront, supporting 1.5M+ monthly enterprise data transactions with 99.99% uptime.

Automated infrastructure provisioning via Terraform and CloudFormation, cutting manual configuration effort by over 70%, and reducing deployment errors by 60%.

Designed and orchestrated ETL pipelines using AWS Glue, Step Functions, and Lambda, enabling daily ingestion and transformation of 12M+ records from multiple sources.

Containerized and deployed over 20 microservices using Docker and Amazon ECS with Fargate, increasing deployment efficiency by 45% and improving scalability.

Developed and maintained CI/CD pipelines using GitHub Actions and AWS CodePipeline, reducing release cycles from 2 weeks to 2 days.

Monitored cloud workloads with CloudWatch, Prometheus, and SNS, reducing MTTR (Mean Time to Resolution) by 35% for production issues.

Enforced security controls including KMS encryption, IAM policies, and VPC segmentation, achieving 100% HIPAA compliance in quarterly audits.

Optimized S3 storage using intelligent tiering and lifecycle policies, cutting monthly costs by 22% (~$1,200/month savings).

Creative Newtech Ltd Feb 2020 – Dec 2022

Data Engineer

Built and maintained over 30 automated ETL pipelines using Apache Spark (PySpark), Kafka, and Apache Airflow, processing 500GB+ daily from 20+ sources.

Migrated data workflows from on-prem to AWS S3, Redshift, and Athena, reducing report latency by 40% and saving analysts 8 hours/week.

Engineered transformation logic in SQL and Python to clean and enrich 10TB+ of healthcare datasets, increasing data quality scores by 28%.

Created high-performance data models in Redshift using star/snowflake schemas, improving dashboard load times by 50%.

Implemented serverless ETL transformations with AWS Glue and Lambda, decreasing processing costs by 30%.

Developed and published 12+ dashboards in Power BI and Tableau, supporting real-time KPI monitoring for 7 clinical departments.

Established data governance controls including encryption, masking, and audit logs, ensuring full HIPAA and GDPR compliance across 3 annual audits.

Integrated ETL jobs into CI/CD pipelines via GitHub and AWS CodeBuild, reaching 90% test coverage and reducing job rollback by 25%.

Projects

Real-Time Data Pipeline for Financial Transactions

Tools & Technologies: AWS (S3, Lambda, Kinesis, Glue, Redshift), Apache Spark, Airflow, Python, SQL, Terraform, GitHub Actions

Description: Developed a real-time data ingestion and transformation pipeline to process high-volume financial transactions for fraud detection and reporting.

Key Contributions:

Built streaming ingestion using AWS Kinesis Firehose and processed events via AWS Lambda and Glue Jobs.

Transformed data using PySpark on AWS Glue and loaded insights into Amazon Redshift for business dashboards.

Scheduled pipeline orchestration with Apache Airflow, ensuring >99% SLA compliance.

Enabled schema validation, logging, and S3 backup versioning for audit compliance and fault tolerance.

Automated infrastructure with Terraform and integrated deployment via GitHub Actions.

Impact: Reduced data latency from 30 minutes to <5 minutes and improved fraud detection response times by 45%.

Education

University of Guelph-Humber

PG Certificate in Cloud Computing (2024)

PG Certificate in AI & ML (Dean’s List, 2023)

GLS University, Gujarat

Bachelor of Science in Information Technology (2019)

Certifications

AWS Certified Cloud Practitioner (2025)
