Hoàng Minh Triết
GitHub | LinkedIn | +847******** | *****************@*****.***
Career Objective
A proactive, detail-oriented fresher Data Analyst seeking opportunities to transition into a Data Engineer role in the long term. With a solid background in data analytics, specialized in e-commerce, I am committed to applying my knowledge to drive business growth.

Working Experience
Data Analyst | ADA (acquired Customore) | Sep 2024 – Present
Tools: Python, PostgreSQL, FastAPI, Spreadsheets/Excel, Slack, Notion, Dropbox
• Build ETL pipelines and automation tools as requested for all clients.
• Manage projects and roadmaps end to end and collaborate with other FTEs to complete clients' requests.
• Handle monthly data checks, data accuracy, and product/bundle mapping for all clients.
• Maintain product feature and category mapping and sellers' configurations for all clients.

E-Commerce Data Analyst Intern | Customore | March 2024 – Sep 2024
• Checked and loaded client data and ensured its integrity.
• Loaded and aggregated traffic and sales data for clients' sellers across all marketplaces.
• Supported ad-hoc fixes for clients' inflated/deflated data.

Projects participated in:
Reckitt Benckiser Real-Time Dashboard for Mega Day
Project duration: 3 weeks
Project description: Crawl and process data from both Seller Center and Brand Portal sources and refresh the data at the client's requested frequencies during mega days.
Client: Reckitt Benckiser
Departments involved: Analytics, Tech, Account
Project role: Data Engineer
Technologies used: Python, PostgreSQL, Spreadsheet/Excel, Google Sheets API
• Created 4 scripts that pull real-time data from Lazada Brand Portal, Lazada Seller Center, Shopee Brand Portal, and TikTok Seller Center by calling the marketplaces' hidden APIs and using Playwright automation.
• Created a local back-up script that automatically queries data from the database source and pastes it into the spreadsheet to support the tech tool's services.
• Delivered robust pipelines/scripts that pull data every 5 minutes to refresh the client's Google Sheets dashboard during mega days (see the refresh-loop sketch below).
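A minimal sketch of the 5-minute refresh loop described above, assuming a PostgreSQL source and the gspread client; the connection string, sheet key, table, and columns are placeholders rather than the project's actual code.

import time

import gspread
import psycopg2

# Hypothetical connection details; the real project used Customore's own database and dashboard sheet.
PG_DSN = "dbname=analytics user=analyst host=localhost"
SHEET_KEY = "<google-sheet-key>"

def fetch_snapshot(conn):
    # Placeholder query for the latest mega-day sales snapshot.
    with conn.cursor() as cur:
        cur.execute("SELECT seller, gmv, orders FROM megaday_sales ORDER BY seller;")
        return [["seller", "gmv", "orders"]] + [list(row) for row in cur.fetchall()]

def refresh_dashboard():
    gc = gspread.service_account(filename="service_account.json")
    worksheet = gc.open_by_key(SHEET_KEY).worksheet("Realtime")
    with psycopg2.connect(PG_DSN) as conn:
        rows = fetch_snapshot(conn)
    worksheet.update(range_name="A1", values=rows)  # overwrite the dashboard range in place

if __name__ == "__main__":
    while True:
        refresh_dashboard()
        time.sleep(300)  # refresh every 5 minutes, as during mega days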
Reckitt Benckiser Offline ETL (offline data)
Project duration: 4 weeks
Project description: Upload offline data from 9 marketplaces to Customore's database, perform product mapping for SKUs, and update the data in Power BI.
Client: Reckitt Benckiser Offline
Departments involved: Analytics
Project role: Data Engineer
Technologies used: Oracle Ubuntu VM, Bash, Python, Flask, Redis, Celery, MySQL, Spreadsheet/Excel
• Developed a lightweight application that lets Analytics members upload raw client data to the database through the pipeline's ETL process (see the upload-endpoint sketch after this project's code link).
• Overcame cloud resource-overhead issues by switching from containerization to systemd (systemctl) to manage the Celery workers, Flask, and the ETL pipeline, significantly reducing the resources needed to deploy these services.
• Configured MySQL master-slave database replication, reducing the workload on the main shard by 50%.
• Increased the Analytics team's capacity by cutting processing time by 40% compared with the Excel/Spreadsheets upload template (1 month of data for 1 marketplace uploaded in under 2 minutes).
• Visit code: Here
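A minimal sketch of that upload flow, assuming Flask for the endpoint and Celery with a Redis broker for the background ETL task; the route, file layout, and ETL body are illustrative placeholders, not the project's actual code.

from pathlib import Path

from celery import Celery
from flask import Flask, jsonify, request

app = Flask(__name__)
celery = Celery("offline_etl", broker="redis://localhost:6379/0")  # hypothetical app/broker names

UPLOAD_DIR = Path("/tmp/offline_uploads")
UPLOAD_DIR.mkdir(parents=True, exist_ok=True)

@celery.task
def run_etl(file_path: str, marketplace: str) -> None:
    # Background ETL: clean the raw file, map SKUs, load into MySQL (details omitted here).
    ...

@app.route("/upload", methods=["POST"])
def upload():
    # Accept a raw marketplace file from an Analytics member and queue it for ETL.
    raw = request.files["file"]
    marketplace = request.form.get("marketplace", "unknown")
    dest = UPLOAD_DIR / raw.filename
    raw.save(dest)
    run_etl.delay(str(dest), marketplace)  # hand the heavy work to a Celery worker
    return jsonify({"status": "queued", "file": raw.filename}), 202

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)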
Lazada Brand Portal Reports Automation
Project duration: 3 weeks
Project description: Automate downloading reports from the Lazada Brand Portal source for all of Customore's clients.
Client: All clients with traffic data from this source
Departments involved: Analytics
Project role: Data Engineer
Technologies used: Python, Django, Docker, Huey, Redis, MySQL, Spreadsheet/Excel
• Developed an end-to-end Playwright bot that automates downloading Lazada Brand Portal reports for sellers, reducing processing time by 66% versus the manual procedure (369 reports downloaded in 30 minutes).
• Created a Python script that renames and concatenates the downloaded files into an internal format (see the sketch below).
• Improved data accuracy by fixing 18 months of inflated traffic data for 4 sellers.
• Visit code: Here (tasks executed by Playwright), Here (integration into the back-end)
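A minimal sketch of the rename-and-concatenate step, assuming the Brand Portal exports are CSV files in a single folder and that the internal format only needs a seller column plus normalized headers; the folder, filename pattern, and columns are assumptions.

from pathlib import Path

import pandas as pd

REPORT_DIR = Path("downloads/brand_portal")       # hypothetical download folder
OUTPUT_FILE = Path("output/brand_portal_traffic.csv")

frames = []
for csv_path in sorted(REPORT_DIR.glob("*.csv")):
    seller = csv_path.stem.split("_")[0]          # e.g. "sellerA_2024-11-11.csv" -> "sellerA"
    df = pd.read_csv(csv_path)
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]  # normalize headers
    df.insert(0, "seller", seller)                # tag every row with its seller
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)
combined.to_csv(OUTPUT_FILE, index=False)
print(f"Wrote {len(combined)} rows from {len(frames)} reports to {OUTPUT_FILE}")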
Accenture Real-Time Reports for Mega Day
Project duration: 3 weeks
Project description: Process data in real time during mega day to generate reports in the client's format and upload them to the client's SharePoint folder for 81 sellers.
Client: Accenture
Departments involved: Analytics, Tech, Product, Account
Project role: Data Engineer
Technologies used: Python, Spreadsheet/Excel, Google Sheets API, PostgreSQL
• Built a scalable back-up pipeline that collects data for 46 sellers from Lazada Seller Center and generates traffic and voucher reports on the Analytics team's local machine; by leveraging multi-threaded processing, data for 10 sellers was collected in under 2 minutes and output as 20 JSON reports (see the sketch below).
• Created an internal ORM script that renames voucher and traffic reports into the client's format for 35 Shopee sellers, matching on the reports' voucher codes and item codes.
• Achieved over 90% back-up coverage for more than 1,100 traffic and voucher reports generated in 1 hour, with a refresh frequency of 10 minutes per seller.
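A minimal sketch of the multi-threaded collection step, assuming a hypothetical fetch_seller_data helper that wraps the Seller Center calls; the seller IDs, output folder, and worker count are illustrative.

import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

OUTPUT_DIR = Path("output/megaday_reports")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

def fetch_seller_data(seller_id: str) -> dict:
    # Placeholder for the real Seller Center traffic/voucher call.
    return {"seller_id": seller_id, "traffic": [], "vouchers": []}

def collect(seller_ids: list[str], max_workers: int = 10) -> None:
    # The calls are I/O-bound, so a thread pool gives near-linear speedup per seller.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_seller_data, sid): sid for sid in seller_ids}
        for future in as_completed(futures):
            sid = futures[future]
            report = future.result()
            (OUTPUT_DIR / f"{sid}_traffic_voucher.json").write_text(json.dumps(report))

if __name__ == "__main__":
    collect([f"seller_{i:02d}" for i in range(1, 11)])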
Nestle Philippines Market Share Report
Project duration: 2 weeks
Project description: Produce the market share report at brand x category and SKU levels for month minus 1.
Client: Nestle
Departments involved: Analytics, Account
Project role: Data Analyst
Technologies used: Python, Requests, Spreadsheet/Excel, PostgreSQL
• Developed a Playwright script to automate scraping SKUs' total ratings, current-month ratings, and ASP.
• Created tools that retrieve new SKUs from the sellers' stores to refresh the recurring market share report (see the sketch below).
• Adjusted market share ratings based on Account or client requests.
• Reduced processing time by 70% compared with manual collection methods (over 400 SKUs collected and scraped from Lazada in under 1 hour).
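A minimal sketch of the SKU-refresh tool, assuming a hypothetical paginated store-listing endpoint that returns JSON; the URL, parameters, and response fields are placeholders, not the marketplace's actual API.

import requests

STORE_URL = "https://example.com/api/store/{shop_id}/products"  # placeholder endpoint

def fetch_store_skus(shop_id: str) -> list[dict]:
    # Page through the (hypothetical) store listing and collect basic SKU fields.
    skus, page = [], 1
    while True:
        resp = requests.get(STORE_URL.format(shop_id=shop_id), params={"page": page}, timeout=30)
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:
            break
        skus.extend({"sku_id": i["id"], "name": i["name"], "price": i.get("price")} for i in items)
        page += 1
    return skus

def find_new_skus(shop_id: str, known_ids: set[str]) -> list[dict]:
    # Keep only SKUs not yet present in the recurring market share report.
    return [s for s in fetch_store_skus(shop_id) if s["sku_id"] not in known_ids]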
Engineering Projects
Real-Time Job Scraping from the Indeed Recruiting Website
• Technologies used: Oracle Ubuntu VM, Bash, Java, Kafka, Spark Streaming, MongoDB Atlas, Grafana, Kubernetes, Docker, Google Sheets API.
• Built a complete end-to-end pipeline that pairs Kafka with Spark Streaming to scrape and process data in real time; used the Google Sheets API and Grafana to monitor the scraped results (see the producer sketch below).
• Integrated lightweight Kubernetes (K3s) and configured the ZooKeeper container, ZooKeeper service, and Kafka load balancer to replicate 2 Kafka clusters on Oracle Cloud. Scraped over 3,000 recruiting data points from the website.
• Visit code: Here
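The pipeline itself is written in Java; as an illustration only, this is a minimal Python sketch of the producer side using kafka-python, with a hypothetical scrape_jobs helper, a placeholder topic name, and a local broker.

import json

from kafka import KafkaProducer  # kafka-python

TOPIC = "indeed-jobs"  # placeholder topic name

def scrape_jobs() -> list[dict]:
    # Placeholder for the real Indeed scraper; returns job postings as dicts.
    return [{"title": "Data Engineer", "company": "Example Co", "location": "Hanoi"}]

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for job in scrape_jobs():
    producer.send(TOPIC, value=job)  # the Spark Streaming job consumes this topic downstream

producer.flush()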
Web Scraping and Modeling of the Carrefour E-Commerce Platform
• Technologies used: Java, Java Spark, MySQL
• Built complete star-schema data modeling for the MySQL database. Developed a scraper that collects SKU details based on search keywords, a processor that cleans and structures the data before saving it to the database, and a scheduler that controls task frequencies. Each scrape creates or updates data in all 5 tables, and a market-share table computes % revenue across all SKUs in the database (see the sketch below).
• Visit code: Here
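A minimal sketch of that market-share computation, assuming a fact table named sku_sales with sku_id and revenue columns; the actual project implements this in Java against its own schema, so the table, columns, and credentials here are placeholders (the query needs MySQL 8+ for window functions).

import mysql.connector

# Share of total revenue per SKU, computed with a window function over the grouped totals.
QUERY = """
    SELECT sku_id,
           SUM(revenue) AS sku_revenue,
           SUM(revenue) / SUM(SUM(revenue)) OVER () * 100 AS revenue_share_pct
    FROM sku_sales
    GROUP BY sku_id
    ORDER BY revenue_share_pct DESC;
"""

conn = mysql.connector.connect(host="localhost", user="app", password="app", database="carrefour")
cur = conn.cursor()
cur.execute(QUERY)
for sku_id, sku_revenue, share in cur.fetchall():
    print(f"{sku_id}: {sku_revenue:.2f} ({share:.2f}% of total revenue)")
cur.close()
conn.close()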
SKU Ratings Scraping with a CI/CD Pipeline on the Lazada E-Commerce Platform
• Technologies used: Oracle Ubuntu VM, Bash, Python, GitHub Actions, SonarQube, Docker
• Developed a Python-based Playwright scraper that mimics user interactions with Lazada's platform, achieving a 60% performance improvement over manual methods. Implemented a full CI/CD workflow to automate code testing, analysis, and builds, reaching 80% test-suite coverage for the Python script (see the test sketch below).
• Visit code: Here
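A minimal sketch of the kind of unit test counted toward that coverage, assuming a hypothetical parse_rating helper that turns raw page text into structured numbers; the helper and its test cases are illustrative, not the repository's actual code.

import pytest

def parse_rating(raw: str) -> tuple[float, int]:
    # Turn a raw "score (count ratings)" string into (score, count).
    score_part, count_part = raw.split("(")
    score = float(score_part.strip())
    count = int(count_part.split()[0].replace(",", ""))
    return score, count

@pytest.mark.parametrize(
    "raw, expected",
    [
        ("4.8 (1,234 ratings)", (4.8, 1234)),
        ("5.0 (2 ratings)", (5.0, 2)),
    ],
)
def test_parse_rating(raw, expected):
    assert parse_rating(raw) == expected

def test_parse_rating_rejects_garbage():
    with pytest.raises(ValueError):
        parse_rating("no ratings yet")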
Educational Background
International Business, Rouen program of Open University | GPA: 3.66/4.0 | August 2021 – Present

Certifications
Microsoft Office Specialist (Word, Excel, PowerPoint), Microsoft. Visit certification: Here
Python for Data Analytics, Datapot. Visit certification: Here
Analyzing and Visualizing Data with Microsoft Power BI, Datapot. Visit certification: Here
Querying Data with Transact-SQL, Datapot. Visit certification: Here
Azure Data Fundamentals, Datapot. Visit certification: Here

Languages
Vietnamese: Native | English: IELTS 7.0 | French: A2