
Data Engineer

District 1, 71000, Vietnam
May 23, 2024



Pham Duc Manh

Bien Hoa • 033*******


Amazon Retail Data Analytics May 2024 - Jun 2024

Orchestrated end-to-end data workflows with Apache Airflow and Docker, extracting data from the Amazon website and storing the raw data in PostgreSQL.

Integrated Azure Data Lake Storage Gen2 for scalable, secure storage, managing data movement from PostgreSQL into the data lake.

Implemented ETL processes in Apache Airflow to streamline data transformation and loading, and used Azure Data Factory and Databricks for advanced analytics, with Databricks handling high-performance data processing and AI workloads.

Performed data analysis with Azure Synapse Analytics and built interactive Tableau dashboards to present the resulting insights.

Technology: Airflow, Docker, PostgreSQL, Azure Data Lake Storage Gen2, Azure Data Factory, Databricks, Azure Synapse Analytics, Tableau

API Real-Time Data Streaming Jan 2024 - Feb 2024

Orchestrated end-to-end data workflows using Apache Airflow and Apache Kafka, enhancing the efficiency of data generation, streaming, and processing tasks.

Generated random user data via the API and streamlined its storage and management in PostgreSQL and Cassandra.

Leveraged Apache Spark's distributed computing power for efficient data processing, ensuring scalability and performance in handling large datasets.

Implemented stream monitoring and schema management with Confluent Control Center and Schema Registry, with ZooKeeper coordinating the Kafka cluster, maintaining data integrity and consistency.


Technology: Airflow, Kafka, PostgreSQL, Spark, ZooKeeper, Cassandra, Docker

Big Data for Movie Analytics Oct 2023 - Dec 2023

Led the data processing and cleaning phase, starting with extracting data from the IMDb website and box office figures from IMDb Pro.

Reduced data processing time by 15% by implementing an end-to-end data pipeline.

Improved data accuracy by 20%, measured by the reduction in discrepancies found during the data analysis phases.

Cleaned and prepared the data thoroughly, improving the accuracy of downstream analysis by data analysts.

Facilitated a 10% increase in the accuracy of the box office prediction model, which informed marketing and distribution strategies.


Technology: Spark, Snowflake, Python, Airflow, PostgreSQL, Streamlit, Git, Docker

EDUCATION

Bachelor of Mathematics and Computer Science, University of Science - VNUHCM Sep 2021 - Present


Languages: Python, SQL, C++

Tools: Apache Hadoop, Apache Spark, Apache Kafka, Apache Airflow, MySQL, PostgreSQL, Apache Cassandra, Power BI, Snowflake, GraphQL, Docker

Cloud Computing Platforms: AWS, Microsoft Azure


TOEIC Listening & Reading: 760; Speaking & Writing: 280 (IIG Vietnam)
