
Data Engineer

Location:
Arlington, VA
Posted:
March 29, 2020


Resume:

Lihua (Neo) Pei

Email: *****.*******@*****.*** Phone: +1-847-***-**** Portfolio of Projects: https://lihuapeineo.github.io

Objective

I am a permanent resident seeking Data Engineer and Data Scientist opportunities.

Education

The George Washington University, Washington, DC

Master's Degree in Data Analytics. GPA: 3.64/4.00. Expected May 2020

Stony Brook University, Stony Brook, NY

Bachelor's Degree in Applied Mathematics & Statistics. GPA: 3.55/4.00. December 2017

Skills

Languages: Python, SQL, R, Java, and HTML/CSS.

Systems: MySQL, MongoDB, Linux, Spark, Excel, and LaTeX.

Others: Familiar with Machine Learning, Data Visualization, Database Management, Deep Learning, and Natural Language Processing.

Industry Experience

Fresh Air DC Project - uRADMonitor, Inc. Washington, DC

Data Scientist Intern May 2019 - October 2019

ETL: Designed and built new ETL pipelines in Python (pandas and NumPy) to transform data into an analytics-friendly schema, reducing ETL time by 90%.

Database: Designed and managed a remote MySQL database (accessed via PyMySQL), hosted on the George Washington University’s cloud servers, that collected data from 6 uRAD monitors updating every 30 seconds.

Comparative Analysis: Conducted time series and correlation analyses on millions of observations from test and control groups to evaluate the new uRAD monitors’ performance.

Cross-Functional Work: Presented reports with business insights and improvement suggestions to the engineering department.

Fire Pillar Studio Hong Kong, China

Database Developer June 2017 – May 2018

Database: Designed and managed the project's local MySQL databases, which stored over 20,000 original user-updated pictures along with processed data.

Deep Learning: Helped develop a PyTorch-based artificial neural network that recognizes cartoon characters in pictures.

Unity: Worked with the team to develop a Unity program that gives characters lifelike breathing.

Data Analytics Projects

Stocks Trend Prediction System, The George Washington University

SVM (Support Vector Machine) September 2018 – December 2018

Collected data on the top 500 stocks, with 18 significant features, by scraping Yahoo Finance with Beautiful Soup.

Built an SVM-based machine learning model to predict market trends and stock prices, achieving 78% average accuracy.

Spotlight Twitters Analysis, The George Washington University

Topic Modeling May 2018 – October 2018

Used regular expressions and NLTK to build text-cleaning functions that transformed 12,000 tweets into an analytics-friendly bag of words.

Built an NLP model using the LSI and LDA methods. The model performed well at extracting the topic of each given tweet.

Research Experience

Optimal Topologies Searching Research Project, Stony Brook University

Research Assistant, guided by Prof. Yuefan Deng August 2017 - March 2018

Database: Designed and managed a local database storing over 10 million generated topologies for research purposes.

Algorithm: Designed and implemented a genetic algorithm to search for optimal topologies among the 10 million generated topologies on a high-performance computer.

Data Analysis: Collected the performance data of new topologies and visualized experimental results. The optimal topology achieved 150% to 320% better efficiency than commonly used ring topologies.


