Luke Gao
202-***-**** *****@***.***
Education
University of Maryland College Park, MD
Master of Science in Information Systems Aug. 2021 – Dec. 2022
Hong Kong Baptist University
Bachelor of Science in Data Science Aug. 2017 – Jun. 2021
Experience
Data Engineer Aug. 2022 – Present
Antra, Inc. Sterling, VA
• Collaborated with a team of three to create a cloud-first data ingestion solution using Python, SQL, and Spark, achieving 99.7% uptime.
• Ingested data from diverse sources using SQL and internal APIs to build data views optimized for BI tools such as Tableau.
• Reviewed and enhanced data dictionaries to include a more complete history, improving consistency across domains.
• Partnered with back-end developers to document data pipelines, APIs, and system integrations, facilitating efficient troubleshooting.
Big Data Developer Intern May 2021 – Aug. 2021
Gree Electric Zhuhai, China
• Built a basic ETL process under guidance to ingest transactional and event data from an internal ERP system with 5,000 daily active users, resulting in annual cost savings of $71,000 by reducing reliance on external vendors.
• Utilized PySpark to optimize data processing for large streaming datasets, achieving a 20% improvement in both ingestion speed and processing efficiency.
• Improved data fetching efficiency by 10% through query optimization and indexing.
Data Analyst Intern Jun. 2020 – Sep. 2020
Zhuhai Rural Commercial Bank Zhuhai, China
• Used Scala to write streaming data to HDFS and to implement Spark jobs for data processing.
• Assisted in automating a manual data reconciliation process between different sources.
• Collected 2,000 sets of facial recognition data and processed them with MapReduce.
Projects
Maryland Global Consulting Program Python, Databricks, Tableau, Power BI Sep. 2022 – Dec. 2022
• Collaborated with a team of five to develop a data-driven global expansion solution for a telemetry system manufacturer.
• Gathered and analyzed global telemetry market data and built data visualization dashboards to support global expansion strategies.
• Prioritized target audiences, formulated a global expansion strategy, and presented recommendations to the client.
• Awarded a scholarship of $1,500 from CIBE, a Title VI grant from the U.S. Department of Education.
F1 Powerhouses Analyzer Python, Databricks, Azure Data Lake, Azure Data Factory May 2022 – Aug. 2022
• Designed and implemented an ETL pipeline to ingest diverse data sources and transform the data to support machine learning, BI reporting, and SQL analytics.
• Conducted in-depth analysis to identify the most dominant drivers and teams in Formula 1, both over the last decade and all time.
• Developed data visualization dashboards to illustrate the dominance periods of drivers and teams, showcasing their performance levels and historical achievements.
Technical Skills
Languages: Python, C#, SQL, SAS, JavaScript, R, Scala
Tools: Git, Docker, Apache Spark, Azure Data Factory, Azure Databricks, Amazon Web Services, Google Cloud Platform, Pig, Hive, Hadoop, Impala, VS Code/Visual Studio, LINQ
Database: MySQL, MS SQL Server, Azure SQL, MongoDB, PostgreSQL