Machine Learning Data Scientist

Location:
New York, NY
Salary:
20/hr
Posted:
October 22, 2023

Resume:

Haoyang Li

*** ********* ***, *** **** City, NY ad0j8o@r.postjobfree.com +1-646-***-**** LinkedIn: linkedin.com/in/lhy1999

Programming Languages: R, Python (Pandas, NumPy, Matplotlib, TensorFlow, PyTorch), MATLAB, SQL, SAS

Software & Platforms: Tableau, MySQL, MongoDB, Oracle, AWS (S3, Redshift), PySpark, Power BI, Neo4j

Certificates: AWS Certified Cloud Practitioner

5+ years of expertise in machine learning, data processing, and analytics, underpinned by a solid foundation in math and statistics, with experience in executing core projects and constructing ML models.

EDUCATION

Columbia University New York, USA

Master of Arts in Statistics Expected December 2023

Queen's University Ontario, CA

Bachelor of Science (Hons.), Major in Statistics, Minor in Economics 09/2017 – 05/2022

Dean's Honor List for the 2019 – 2022 Academic Year

PROFESSIONAL EXPERIENCE

Baynovation San Jose, CA, USA

Data Scientist Intern 07/2023 – 09/2023

Upgraded from a traditional risk assessment approach to the advanced XGBoost gradient boosting framework, improving model accuracy from 80% to 90%.

Executed break-even analyses and evaluated risk models (random forest, XGBoost, Cox Model), ensuring the alignment of financial predictions with market dynamics.

Collaborated in the development of a streamlined web application with Flask, allowing both customers and staff to input property details seamlessly, generate risk predictions, and visualize analytical outcomes.

Performed regular sensitivity analyses on the developed models to assess their robustness to changes in the input parameters and to ensure consistent performance across scenarios.

Sino Life Insurance Shenzhen, CHN

Data Analyst Intern, Dept. of Compliance and Risk Management 06/2021 – 08/2021

Specialized in data analytics by collecting and cleansing a significant dataset of bond defaults in China since 2011.

Employed Tableau (dynamic dashboards, interactive reports) for sophisticated data visualization. This approach facilitated the unveiling of intricate patterns and insights, enhancing the strategic analysis of bond market trends.

Leveraged R to construct a logistic regression model for the analysis of various bond types. Used the glm function for model development, achieving a model accuracy of 85%. Then applied cross-validation and hypothesis testing, resulting in a 20% improvement.

Smart Super Market (Peking University Shenzhen) Shenzhen, CHN

Research Assistant for Dr. Ding 04/2021 – 06/2021

Collaborated on a Python-powered smart supermarket initiative using MOT and YOLO v3. The project aimed to give customers a seamless checkout experience without human intervention and to reduce merchant costs.

Led the data cleaning and preprocessing stages, utilizing tools such as Pandas for data manipulation and OpenCV for image processing. Employed LabelImg for the meticulous annotation of the training dataset.

Spearheaded the training of the RetinaNet model using TensorFlow. This model was specifically optimized for dense and overlapping object detection, ensuring accurate product identification even in crowded scenarios.

PROJECTS

Explainable and Robust Graph Neural Network for Spatio-Temporal Prediction (Columbia University) New York, USA

Research Assistant for Dr. Mo 09/2023 – Current

Employed the Tweepy library in Python to gather tweets pertaining to New York City, targeting specific geographical regions and keywords to obtain highly pertinent, localized data for in-depth spatio-temporal analysis.

Utilized NLTK for essential data preprocessing steps such as stop word removal, common word elimination, and keyword filtering. These steps were crucial in reducing noise in the data and improving the overall accuracy of the subsequent analysis.

Employed Random Forest and ULMFiT classifiers to categorize tweets, aiding in data organization and facilitating effective further analysis for event detection, achieving 86.3% accuracy on the dataset.

Analysis of User Behavior on Taobao Using MySQL New York, USA

Active Participant 02/2023 – 04/2023

Built a data pipeline with MySQL, automating ETL processes that load the Taobao dataset into a MySQL database, ensuring that it was well-structured and optimized for complex query operations.

Employed advanced SQL queries for data extraction and analysis. Leveraged Tableau along with ggplot2 in R for sophisticated data visualizations, enabling the identification of peak user activities.

Used R with the caret package to create segmented user profiles. This analysis revealed a 65% repurchase likelihood within specific user segments, empowering the development of highly targeted marketing strategies.

SKILLS SUMMARY


