
Data Engineer - Information Technology

Location:
Hanoi, Vietnam
Posted:
July 01, 2025


Resume:

Bin Pham

Data Engineer

091******* | ************@*****.*** | www.linkedin.com/in/binpham | 221 Hoang Hoa Tham, Ha Noi

OBJECTIVE

Data Engineer with over one year of experience developing, maintaining, and optimizing data pipelines and services focused on data integration, monitoring, and real-time analytics. Skilled in handling both streaming and batch data workflows and in building robust data warehouse and lakehouse architectures. Experienced in deploying data services and integrating service endpoints across distributed systems, with exposure to DevOps practices such as containerization and infrastructure automation. Passionate about clean data architecture, scalable systems, and delivering actionable insights through reliable, automated, and well-orchestrated data platforms.

CERTIFICATIONS

IELTS 7.5

Goethe-Zertifikat B1

EDUCATION

Ha Noi University 2020 - 2025

Information Technology (English-language program)

WORK EXPERIENCE

Innovation Center, VNPT-IT 6/2024 - 5/2025

Data Engineer

1. Chatbot lakehouse project:

• Developed a microservice architecture using Docker. Orchestrated Spark on Kubernetes for batch data extraction and transformation.

• Stored processed data in MinIO with Iceberg catalogs and tables.

• Queried Iceberg tables using StarRocks.

• Created data dashboards and charts in Superset for insights (see the sketch below).
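
A minimal sketch of this batch path, assuming an Iceberg catalog named "lake", a MinIO endpoint at minio:9000, and placeholder bucket, schema, and credential values rather than the project's actual configuration:

# Illustrative PySpark batch job: read raw chatbot exports, transform them, and write an
# Iceberg table stored in MinIO. All names here are placeholders for the example.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("chatbot-lakehouse-batch")
    # Register an Iceberg catalog whose warehouse lives in a MinIO bucket (via S3A).
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://warehouse/")
    # Point the S3A filesystem at MinIO instead of AWS S3.
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minio")
    .config("spark.hadoop.fs.s3a.secret.key", "minio123")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Extract: raw chatbot logs landed as JSON files in a staging bucket.
raw = spark.read.json("s3a://staging/chatbot_logs/*.json")

# Transform: keep the fields the dashboards need and derive a partition column.
cleaned = (
    raw.select("conversation_id", "bot_id", "user_message", "bot_response", "created_at")
       .withColumn("load_date", F.to_date("created_at"))
)

# Load: write an Iceberg table that StarRocks and Superset query downstream.
cleaned.writeTo("lake.chatbot.conversations").partitionedBy(F.col("load_date")).createOrReplace()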

2. Chatbot JSON Telegram Real-Time Notification:

• Used Airflow to schedule Kafka streaming and trigger Spark tasks on new data arrival. Streamed nested JSON logs via Kafka for real-time data ingestion.

• Applied Spark to normalize chatbot log responses and extract status-code values.

• Pushed status-code insights to XCom for further use.

• Sent status-code notifications to users via a Telegram bot (see the sketch below).
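
A minimal sketch of the notification flow, in Airflow 2 style: one task parses a nested chatbot log and pushes the status code to XCom, a second pulls it and alerts via the Telegram Bot API. The sample payload, bot token, and chat id are placeholders, not production values.

# Illustrative Airflow DAG for the alerting path described above.
import json
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

TELEGRAM_TOKEN = "<bot-token>"
TELEGRAM_CHAT_ID = "<chat-id>"

def extract_status_code(**context):
    # In the real pipeline this record arrives from the Kafka/Spark stage;
    # here a sample nested JSON message stands in for it.
    message = '{"request": {"bot_id": "b1"}, "response": {"status": {"code": 502}}}'
    status_code = json.loads(message)["response"]["status"]["code"]
    # Push the extracted value to XCom so downstream tasks can reuse it.
    context["ti"].xcom_push(key="status_code", value=status_code)

def notify_telegram(**context):
    status_code = context["ti"].xcom_pull(task_ids="extract_status_code", key="status_code")
    if status_code >= 400:  # only alert on error responses
        requests.post(
            f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage",
            json={"chat_id": TELEGRAM_CHAT_ID, "text": f"Chatbot returned status {status_code}"},
            timeout=10,
        )

with DAG(
    dag_id="chatbot_status_alerts",
    start_date=datetime(2024, 6, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_status_code", python_callable=extract_status_code)
    notify = PythonOperator(task_id="notify_telegram", python_callable=notify_telegram)
    extract >> notify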

3. Real-Time APIs Data Processing and Visualization:

• Built microservices with Docker for Airflow, Spark, Kafka, MinIO, Iceberg, Trino, and Superset. Scheduled Spark workflows with Airflow to process and store data in Iceberg tables.

• Streamed data from multiple AI service URLs using Kafka.

• Triggered Spark to extract, transform, and load log data into MinIO and Iceberg.

• Queried Iceberg data efficiently using Trino and created sub-tables for Superset.

• Integrated Superset with Trino to build dashboards and charts for real-time data insights (see the sketch below).
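
A minimal sketch of the streaming leg, assuming a Kafka topic named "ai-service-logs" and an Iceberg catalog configured as in the batch sketch above; the event schema is illustrative, not the real log format.

# Illustrative Spark Structured Streaming job: consume JSON logs from Kafka,
# flatten the fields the dashboards need, and append to an Iceberg table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

# Iceberg catalog and MinIO (S3A) settings omitted here; configured as in the batch sketch.
spark = SparkSession.builder.appName("api-logs-streaming").getOrCreate()

# Expected shape of one API log event (placeholder fields).
schema = StructType([
    StructField("service_url", StringType()),
    StructField("status_code", IntegerType()),
    StructField("latency_ms", IntegerType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "ai-service-logs")
    .option("startingOffsets", "latest")
    .load()
    # Kafka delivers the payload as bytes; decode and parse the JSON body.
    .select(F.from_json(F.col("value").cast("string"), schema).alias("log"))
    .select("log.*")
)

# Append into an Iceberg table that Trino and Superset query downstream.
query = (
    events.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3a://warehouse/checkpoints/ai-service-logs")
    .toTable("lake.monitoring.api_logs")
)
query.awaitTermination()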

4. Feathr AI Feature Development:

• Designed a new submission pipeline for Feathr AI using Kubernetes, Airflow, and Spark. Wrapped Spark jobs within Airflow DAGs to automate and scale feature processing workflows.

• Used Pydantic for robust data validation and schema enforcement in Python components.

• Leveraged boto3 for seamless interaction with AWS services (e.g., S3, EMR).

• Deployed Spark on a Kubernetes cluster to optimize job execution and resource allocation.

• Enhanced monitoring and observability using Airflow's UI, Spark UI, and custom logging.

• Improved user interaction through parameterized DAGs and dynamic configurations for flexible job submissions (see the sketch below).
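
A minimal sketch of the submission path, in Pydantic v2 style: the request is validated before anything is launched, and the packaged job is staged in S3 with boto3. The field names, bucket, and key layout are assumptions for the example, not Feathr's actual schema.

# Illustrative validation and artifact-staging step for a feature-job submission.
import boto3
from pydantic import BaseModel, Field

class FeatureJobSubmission(BaseModel):
    job_name: str = Field(min_length=1)
    feature_names: list[str] = Field(min_length=1)
    spark_executor_instances: int = Field(default=2, ge=1, le=50)
    output_path: str  # e.g. an S3/MinIO prefix the job writes features to

def submit(job: FeatureJobSubmission, artifact_path: str) -> str:
    """Upload the packaged job and return the S3 key the DAG will launch from."""
    s3 = boto3.client("s3")
    key = f"feathr-jobs/{job.job_name}/job.zip"
    s3.upload_file(artifact_path, "feature-store-artifacts", key)
    # From here a parameterized Airflow DAG picks up the key and runs Spark on Kubernetes.
    return key

# Usage: invalid submissions fail fast with a clear Pydantic validation error.
request = FeatureJobSubmission(
    job_name="user_click_features",
    feature_names=["clicks_7d", "sessions_30d"],
    output_path="s3://feature-store/offline/user_click_features",
)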

National Centre for Hydro-Meteorological Forecasting 7/2023 - 3/2024

Data Analyst

1. Predictive Weather Analytics:

• Designed and queried data tables to support future weather prediction models.

• Utilized Microsoft SQL Server to handle and analyze large-scale meteorological datasets.

• Developed interactive dashboards and visualizations using Tableau (see the sketch below).
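
A minimal sketch of the kind of SQL Server query behind those tables and dashboards, with placeholder connection details, table, and column names:

# Illustrative aggregation of daily observations per station for a forecasting feature table.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=weather-db;DATABASE=meteo;"
    "UID=analyst;PWD=<password>"
)

query = """
    SELECT station_id,
           CAST(observed_at AS DATE) AS obs_date,
           AVG(temperature_c)        AS avg_temp_c,
           SUM(precipitation_mm)     AS total_precip_mm
    FROM dbo.observations
    GROUP BY station_id, CAST(observed_at AS DATE)
"""

# The resulting frame feeds both the prediction models and the Tableau dashboards.
daily = pd.read_sql(query, conn)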

2. Precipitation Prediction & Analysis:

• Built regression models using Python and scikit-learn to estimate precipitation probabilities.

• Derived actionable insights from ingested weather data reports.

• Communicated results effectively through data visualizations in Tableau (see the sketch below).
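
A minimal sketch of such a precipitation model, assuming a prepared daily_weather.csv with placeholder feature columns rather than the project's actual inputs:

# Illustrative logistic regression estimating the probability of rain the next day.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("daily_weather.csv")  # placeholder columns: humidity, pressure_hpa, avg_temp_c, rain_next_day
X = df[["humidity", "pressure_hpa", "avg_temp_c"]]
y = df["rain_next_day"]  # 1 if measurable rain fell the next day, else 0

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predicted rain probabilities, scored with ROC AUC before visualizing in Tableau.
probs = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, probs))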

Self Projects

Developer

1. Bar Website:

• Utilized MySQL as the database and Node.js for backend development.

• Developed key functions, including user login, reservation management, gift card purchases, and account management.

• Successfully implemented and deployed the application for customer use.

2. House Price Prediction:

• Used Jupyter Notebook and scikit-learn to train regression and random forest models.

• Visualized model results with Seaborn and Matplotlib.

• Built an interactive user interface with Streamlit for input and model interaction (see the sketch below).
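
A minimal sketch of the Streamlit front end, assuming the trained model was saved with joblib and using placeholder feature names rather than the project's real ones:

# Illustrative Streamlit app: the user enters a few features, the saved model predicts a price.
import joblib
import pandas as pd
import streamlit as st

model = joblib.load("house_price_model.joblib")  # e.g. the trained random forest

st.title("House price estimator")
area = st.number_input("Area (m2)", min_value=10.0, value=80.0)
bedrooms = st.number_input("Bedrooms", min_value=1, value=2, step=1)
age = st.number_input("Building age (years)", min_value=0, value=5, step=1)

if st.button("Estimate price"):
    features = pd.DataFrame([{"area_m2": area, "bedrooms": bedrooms, "age_years": age}])
    price = model.predict(features)[0]
    st.write(f"Estimated price: {price:,.0f} VND")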

SKILLS

Technical skills:

• Object-Oriented Programming: Proficient in applying OOP principles to build modular and maintainable code

• Data Structures & Algorithms: Strong foundation in implementing efficient algorithms and data manipulation techniques

• ETL & ELT Pipelines: Experienced in building and optimizing batch and streaming data pipelines for scalable data processing

• Data Integration: Skilled in ingesting and transforming data from APIs, relational databases, flat files, and message queues (e.g., Kafka)

• Exploratory Data Analysis (EDA): Capable of analyzing and profiling raw datasets to extract insights and guide data modeling decisions

Tech stack:

• Big Data Technologies: Apache Spark, Kafka, Apache Iceberg, Superset

• Programming Languages: Python, SQL, Java

• Databases & Storage: PostgreSQL, MySQL, MongoDB, MinIO (object storage), Apache Iceberg (table format)

• Query Engines & BI: Trino, StarRocks, Superset

• Data Warehousing: Proficient in designing and maintaining data warehouse and lakehouse architectures

• Workflow Orchestration: Apache Airflow

• Containerization & DevOps: Docker, Kubernetes

• Machine Learning: scikit-learn and mathematical models

• Version Control & Collaboration: Git, GitHub


