
CMU Master of Science Student, Dell Data Science Intern

Pittsburgh, PA
July 28, 2022



Yiluo Qin Phone number: 909-***-****




Carnegie Mellon University, Pittsburgh, PA, Feb 2021 – Dec 2022

Master of Science in Electrical and Computer Engineering, Cumulative GPA: 3.7/4.0

University of California San Diego, San Diego, CA, Aug 2016 – Jul 2020

Bachelor of Science in Data Science, Cumulative GPA: 3.6/4.0, Major GPA: 3.8/4.0

SKILLS

Programming Languages: Java, Python, SQL, JavaScript, R, HTML

Software/Packages: Scikit-learn, PySpark, PyTorch, PowerBI, Git, Tableau, MapReduce framework, TensorFlow, and AWS

Knowledge: Machine Learning, Deep Learning, Data Visualization, Hypothesis Testing, NoSQL, and Advanced Statistics and Probability

INTERNSHIP/RESEARCH EXPERIENCES

Data Science Research Intern at Dell, Pittsburgh, PA, Jun 2022 – Aug 2022

• Performed data preprocessing and validation on real-time Dell telemetry data that feeds user product experience data.

• Designed methods to tackle problems such as sparse data columns, date-time mismatches, and duplication errors in the source data.

• Collaboratively built an improved semi-supervised model to predict users' overall experience ratings (0 to 10) with 70% accuracy.

Research Assistant on TikTok User Behavior Research at CMU MINT Lab, Pittsburgh, PA, Nov 2020 – Feb 2022

• Collaborated with Dr. Klug and analyzed four main user assumptions about the functionality of the TikTok algorithm.

• Performed quantitative data analysis to test and verify how user understandings of algorithms influence content creation.

• Discovered that features including video creation time and video engagement were strongly positively correlated with video play count.

Research Assistant on Summer Sleep Study Research at CMU, Pittsburgh, PA, Jun 2021 – Sep 2021

• Applied SMOTE and undersampling techniques to five distinct classes to tackle dataset imbalance on the server.

• Co-developed a CNN model that classified children’s five sleep stages from an EEG channel with 60% accuracy.

Data Science Intern at SAV-E, San Diego, CA, Sep 2020 – Apr 2021

• Obtained data on the three most essential environmental components to create a single Life Cycle Analysis model in OpenLCA.

• Used flows, processes, and process systems to assemble an environmental sustainability model for white cotton T-shirts.

• Helped construct a visual indicator, on a scale from 1 to 10, of the environmental cost of cotton T-shirts on the Amazon product page.

Teaching Assistant at Halicioglu Institute of Data Science, UCSD, San Diego, CA, Apr 2019 – Jun 2020

• Tutored three different classes with enrollments of 100+ and 200+ students and worked within tutor teams of 10+ and 20+.

• Held weekly office hours and led weekly discussion sessions to answer coding questions and to review core lecture concepts.

Data Science Intern at Teradata, San Diego, CA, Jul 2019 – Sep 2019

• Helped to clean up and reconstruct multiple health care datasets using Pandas and SQL.

• Helped build a prediction model with SageMaker and TensorFlow for a local pharmaceutical company.

Data Analyst Intern at Microsoft, Shanghai, China, Jun 2018 – Sep 2018

• Preprocessed and cleaned electricity consumption data for the Shanghai and Beijing campuses with MySQL.

• Identified redundant relationships and replaced them with more efficient ones to link 50+ tables in PowerBI.

• Built an interactive electricity consumption comparison dashboard for the Shanghai and Beijing campuses with PowerBI.

PROJECTS

U.S. Major Cities UberX Entry Year Binary Prediction, Pittsburgh, PA, May 2022

• Investigated whether early UberX entry has significant economic implications, such as worsened local transportation competition.

• Found, using R, that UberX entry year is related to attributes such as average household income and number of cars per household.

• Performed 5-fold cross-validation on three generalized linear models and selected the one with the highest accuracy (70%).

Chicago Crime Rate Visualization Dashboard, Pittsburgh, PA, Dec 2021

• Reconstructed and filtered online source data for greater feature completeness using SQL and Python.

• Wrote several JavaScript functions to link different sections of the dashboard and enable various on-click and scroll effects.

• Synchronized visualizations with HighCharts, JavaScript, HTML/CSS, and Python to display the final data.

Olympic Game Dataset from Kaggle, San Diego, CA, Dec 2021

• Identified functional dependencies across all attributes and designed an ER Diagram to fully represent cardinalities.

• Wrote CREATE TABLE statements with appropriate data type declarations and populated the tables in PostgreSQL.

• Decomposed the resulting entities into 3rd Normal Form (3NF) and executed queries based on questions of interest.

Read and Unread Prediction Dataset from Kaggle, San Diego, CA, Apr 2020

• Predicted user read/unread behavior based on user-book information from the training set (unsupervised learning).

• Constructed a baseline model with Popularity Ratio, Jaccard Similarity, and a combination of both.

• Developed two new features with normalization and redesigned the decision algorithm to optimize the decision rule.

• Achieved a top 8% ranking in a class of about 380 undergraduate and 350 graduate students with a 74.5% accuracy score.

Room Occupancy Prediction, San Diego, CA, Dec 2018

• Predicted room availability based on features including temperature, humidity, energy consumption, etc. (supervised learning).

• Engineered an additional feature, ‘Hour’, to reach a higher prediction score.

• Compared the performance of different models and obtained 98.34% accuracy on the test set using Logistic Regression.

PUBLICATIONS

Trick and Please. A Mixed-Method Study On User Assumptions About the TikTok Algorithm, Pittsburgh, PA, Mar 2021

• Published in the Proceedings of the 13th ACM Web Science Conference 2021 (WebSci’21).
