Aspiring Data Engineer with Python, SQL, ETL, Cloud (Azure)

Location:

Quan Tan Binh, 72100, Vietnam

Posted:

November 14, 2025


Resume:

NGUYEN MINH DUY

Data Engineer

Linh Trung Ward, Thu Duc City, Ho Chi Minh City

+84-795-***-*** | *************@*****.*** | linkedin.com/in/minzi03 | github.com/minzi03

Objective

Aspiring Data Engineer with hands-on experience in designing, building, and optimizing data pipelines and cloud-based data warehousing solutions. Proficient in developing scalable and reliable data infrastructures that integrate batch and real-time workflows, enabling advanced analytics, business intelligence, and data-driven decision making.

Education

VNU-HCM University of Information Technology (UIT) | Sep. 2021 – Expected Graduation: 2025

Bachelor of Information Technology, Major in Information Systems | GPA: 3.1 / 4.0

– Relevant Coursework: Database Systems, Data Warehousing & OLAP, Cloud Computing (Azure), Big Data Analytics with Spark, Machine Learning & Data Mining, Business Intelligence & Visualization.

Skills

Programming & Querying: Python (Pandas, NumPy, PySpark, Scikit-learn), SQL (T-SQL, PostgreSQL, MySQL, Snowflake), R, Java, C/C++

Data Engineering & Orchestration: Azure Data Factory, Databricks, Apache Airflow, dbt, GitHub Actions (CI/CD)

Streaming & Real-Time Processing: Apache Kafka, Debezium CDC, Spark Structured Streaming

Cloud & Lakehouse Platforms: Azure (Synapse Analytics, Data Lake Gen2), Snowflake, Delta Lake (ACID, Z-Ordering, Partitioning), MinIO (S3-compatible), MongoDB

Data Warehousing & Modeling: ETL/ELT workflows, Medallion Architecture (Bronze–Silver–Gold), Star/Snowflake Schema, SSIS, SSAS, MDX

BI & Visualization: Power BI (DAX, modeling, KPIs), Excel (Pivot, Power Query), SSRS, Matplotlib, Seaborn

Machine Learning & Analytics: EDA, Feature Engineering, Forecasting (LSTM, LightGBM), Regression, Clustering

Collaboration & DevOps: Git/GitHub, Docker, Jupyter Notebook, Agile/Scrum, CI/CD

Projects

Modern Data Stack Pipeline | Kafka, Debezium, MinIO, Airflow, Snowflake, dbt, GitHub Actions | GitHub | Sep. 2025

– Built a containerized real-time pipeline simulating banking transactions with PostgreSQL as the operational source.

– Implemented Change Data Capture (CDC) using Debezium and streamed events via Apache Kafka to MinIO as Parquet files.

– Orchestrated ingestion with Apache Airflow and applied the Medallion Architecture (Bronze–Silver–Gold) in Snowflake.

– Developed dbt staging, dimension, and fact models with tests and SCD Type-2 snapshots for history tracking.

– Automated validation and deployment using GitHub Actions (CI/CD) to ensure reliability and reproducibility.

Azure E-Commerce ETL Pipeline & Analytics | ADF, ADLS Gen2, Databricks, Synapse, Power BI, Delta Lake | GitHub | Jun. 2025

– Designed and developed a production-grade ETL pipeline on Azure using ADF, Databricks (PySpark), Synapse, and Power BI following the Medallion Lakehouse Architecture (Bronze–Silver–Gold).

– Automated ingestion from MySQL, MongoDB, and HTTP/CSV APIs through dynamic, parameterized ADF pipelines leveraging Lookup–ForEach–Copy activities.

– Transformed and optimized data in Databricks (PySpark + Delta Lake) with schema enforcement, surrogate keys, partitioning, and Z-Ordering for query performance.

– Modeled a Star Schema with 1 Fact table (2.9M rows), 8 Dimensions, and a Bridge table to support analytical workloads.

– Exposed curated data via Synapse external tables and views, powering Power BI DirectQuery dashboards with 20+ KPIs for Sales, Customer Insights, and Logistics; the dashboards surfaced 20% YoY growth and delivery delays of 10–12 days.

– Implemented Logic App alerts and integrated Azure Monitor for automated health checks and pipeline observability.

Movie Data Warehouse (ETL & OLAP) | SQL Server, SSIS, SSAS, SSRS, Power BI, Python | GitHub | Jul. 2024

– Designed a Star Schema with 1 Fact Table (780K records) and 8 Dimension Tables derived from over 1M raw rows.

– Built SSIS ETL pipelines processing over 780K rows per batch, improving overall data quality by 21%.

– Developed SSAS cubes with 7 measures and 25+ MDX queries to support multi-dimensional analytics and performance KPIs.

– Created 15+ reports in SSRS and Power BI covering 2015–2024, highlighting revenue trends, top-performing movies, and production distribution by country.

– Extended the project with a Python forecasting module, achieving 94% accuracy in movie revenue prediction using SVR.

Awards & Certifications

• Google (2024): Professional Certificates — Data Analytics, Advanced Data Analytics, Business Intelligence.

• Microsoft (2025): DP-900 Microsoft Azure Data Fundamentals; DP-203 Data Engineering on Microsoft Azure; DP-3027 Implement a Data Engineering Solution with Azure Databricks; PL-300 Power BI Data Analyst.

• IBM (2025): IBM Data Engineering Professional Certificate — In Progress (16-course specialization).

• DataCamp (2025): Data Analyst Associate; Data Engineer Associate; Data Engineer.
