
Data Scientist Intern

Location: Manhattan, NY, 10019
Posted: May 06, 2024


Resume:

Yuheng Hu

*** * ***** **, *** York, NY *****

Email: **********@*******.*** Tel: 1-516-***-**** LinkedIn URL: www.linkedin.com/in/yuhenghu

EDUCATION BACKGROUND

Columbia University New York, USA

Master of Electrical Engineering 09/01/2023-12/2024 (Expected)

University of Nottingham Nottingham, UK

Bachelor of Electrical and Electronic Engineering (GPA: 3.76/4.0) 09/01/2019-07/01/2023

Provost’s Scholarship granted in Dec. 2022 (top 3%)

PROFESSIONAL EXPERIENCES

Hangzhou Shengxing Energy Technology 07/01/2023-08/31/2023 Data Scientist Intern Zhejiang, China

Applied full-stack Java skills to develop the "Power System Simulation Data Visualization Platform" using a Vue front end, the Spring Boot framework, and a MySQL database, creating 1K+ chart and graph components.

Wrote and executed comprehensive unit tests covering 95%+ of functional code, reducing code violations by 15% and improving maintainability.

Developed data models that enable the platform's data analysis functions, such as Time Series Prediction and Clustering, and deployed the platform as part of the software system on a cloud server (Aliyun Windows Server).

Collected data with SQL and preprocessed it with Pandas, maintaining a 99% accuracy rate in data cleansing and validation.

Used Tableau for data analysis and visualization, enhancing reporting capabilities and the user interaction experience.

Zhejiang Uniview Technology 06/20/2022-08/20/2022 Data Scientist Intern Zhejiang, China

Developed the data model for an electronic gate monitoring project with a focus on video recognition, achieving a 20% improvement in accuracy over the previous iteration.

Implemented Computer Vision (CV) capabilities: conducted data collection and governance, processed image data, and annotated it with LabelMe, establishing an initial dataset of 3,000 training images.

Employed semi-supervised methods for dataset augmentation, increasing the volume of data for modeling by 60%.

Contributed to the person image recognition module (semantic segmentation and image recognition), using a CNN as a baseline and ultimately selecting, training, and deploying YOLOv5 as the optimal model.

Built and fine-tuned models using PyTorch and TensorFlow and wrote interfaces for model deployment, achieving an F1 score of 97%, which exceeded departmental expectations.

PROJECT EXPERIENCES

Designed and Developed a Comprehensive Online Shopping Website Database 09/01/2023-11/01/2023

Spearheaded the design and implementation of a robust database for an e-commerce platform, leveraging PostgreSQL, resulting in a 20% improvement in data management and scalability.

Played a pivotal role in architecting the database system, including meticulous schema definition, table creation, and establishment of intricate relationships and constraints in a PostgreSQL environment.

Engineered a Python-based application that interfaces with the database, supporting real-time data updates and complex queries through an intuitive web interface, which improved data accessibility and streamlined operational processes.

Advanced Blood Pressure Estimation Project Using PPG and ECG Signals 09/01/2022-06/01/2023

Designed a cuff-less, non-invasive blood pressure monitoring system that integrates Artificial Neural Networks with PPG and ECG signals, significantly improving on existing prediction accuracy benchmarks relevant to cardiovascular disease.

Reduced the prediction error of blood pressure readings to within 5 mmHg, outperforming the accuracy benchmarks set by existing commercial blood pressure measuring devices.

Employed rigorous validation techniques, comparing the system's readings against a standard blood pressure monitor under identical conditions to ascertain the prediction error and refine the model's precision.

Developed a comprehensive SQL database to store PPG/ECG signal values for in-depth analysis and selected key signal features and environmental factors, improving the prediction algorithm's accuracy and reliability by 25%.

SKILLS

Programming: Python (sklearn, Pandas, NumPy, Regex, matplotlib, seaborn), SQL, C++, Java, JavaScript

Machine Learning: Generalized Linear Models, Tree-based Models, Clustering Models, Boosting Models, Regularization, Time Series Analysis, Neural Networks

Data Technology: Excel, Google Cloud Platform, Tableau, Power BI, MATLAB


