Data Engineer

Location:
Lorton, VA
Posted:
October 28, 2023


Resume:

Minglei Cai

+1-202-***-****

ad0oro@r.postjobfree.com

EDUCATION

Georgetown University Sep 2021 - May 2023

Master of Science, Data Science and Analytics GPA: 3.63/4.0 Washington DC, US

University of Science and Technology of China (THE World #74) Sep 2017 - May 2021

Bachelor of Science, Mathematics (Major), Computer Science (Minor) Hefei, China

SKILLS

Programming: Python, SQL, R, JavaScript/TypeScript, C++, Java, C#, MATLAB

Cloud Computing / ML: Kafka, Spark, Hadoop, Hive, HBase, Torch, Scikit-Learn, TensorFlow, NLTK, spaCy, NumPy

Data Visualization: Tableau, Power BI, D3.js, Matplotlib, Seaborn, Plotly, Altair, ggplot2, R Shiny, Datashader

Frontend / Backend / Database: Angular, Spring, Flask, MySQL, PostgreSQL, MongoDB

CERTIFICATIONS

AWS Certified Cloud Practitioner Sep 2023

AWS Certified Solutions Architect – Associate Oct 2023

WORK EXPERIENCE

Georgetown University Aug 2021 - Dec 2022

Teaching Assistant / IT Support Washington DC, US

Provided maintenance for educational technologies and resolved software/device issues.

Graded homework, conducted Q&A sessions, and addressed coding/math problems raised by students.

Hefei Zhongke Leinao Tech Co. Feb 2020 - Feb 2021

Data Engineer Intern Hefei, China

Implemented real-time data pipelines for sensor data collected during refrigerator testing, using Kafka for data ingestion, Spark for data processing and modeling, and FineBI for data visualization.

Implemented anomaly detection modules for the monitoring machines, increasing the precision by 22%.

Worked closely with infrastructure engineers on protocol integration; optimized and redesigned the database schemas, decreasing workloads by 21% and reducing database storage usage by 31%.

Developed Python Plotly scripts to generate summary reports for the manufacturing production lines.

PHIMA Intelligence Tech Co. Nov 2019 - Jan 2020

Data Engineer Intern Maanshan, China

Optimized ETL processes using Spark, reducing processing time by 35%.

Built real-time video streaming pipelines for surveillance cameras using Alibaba Cloud Link Vision Video and implemented image preprocessing modules using TensorFlow.

Built the quarterly reports from the company data warehouse using Tableau.

COURSE PROJECTS

Big Data – Impact of the Russia-Ukraine Conflict on Commodity Markets Nov 2022

Applied LDA and sentiment analysis to extract topic and sentiment features from over 10 million Reddit comments related to the Russia-Ukraine conflict, using Spark on Azure Databricks.

Built XGBoost, Random Forest, and feed-forward neural network models to predict commodity prices, achieving an R-squared value of over 0.2 for the price of natural gas.

Visualization – What Makes a Great Buffet Restaurant in Florida Apr 2022

Conducted aspect-based sentiment analysis and identified 3 significant factors for positive reviews.

Created interactive plots with D3.js and Plotly to visualize quantitative insights from 1M+ Yelp reviews.

NLP – A Conversational Chatbot Nov 2022

Used PyTorch and Transformers to fine-tune DialoGPT (a transformer-based language model).

Deployed the chatbot via the Google Voice API, making it conveniently accessible through text messaging for entertainment use.

Generative AI – Social Media AI Assistance Apr 2023

Used Diffusers and Transformers to integrate ChatGPT and Stable Diffusion, constructing pipelines for prompt-based image generation, image captioning, and text generation.

Developed a web application with FastAPI, TypeScript, and Angular, enhancing artistic creation for graphic design and boosting the efficiency of mass-producing image assets.


