
Data Assistant

Location:
San Diego, CA
Posted:
February 19, 2021


Resume:

Jiahe Feng (Jeffrey)

**** ******** ** #****, *** Diego, CA, 92122; 484-***-****; adkbp6@r.postjobfree.com

http://www.linkedin.com/in/jeffrey-jiahe-feng; Blog: https://jeffrey7377.github.io

Education

University of California San Diego, La Jolla, CA September 2018 - March 2022

- B.S. Data Science; B.S. Cognitive Science with a Specialization in Machine Learning and Neural Computation; Minor in Japanese Studies.

- GPA: 3.9 (Cumulative), 3.91 (Major). Provost Honors; Warren College Honor Society Member.

Skills

Data Skills: Exploratory data analysis, data mining, recommendation systems, data modeling, scalable analytics, data scraping, regex, geospatial data, data management, data visualization, web design, signal processing, experimental design and analysis.

Programming Languages: Python, Java, SQL, HTML, CSS, JavaScript, R, MATLAB.

Tools/Packages: NumPy, pandas, Matplotlib, AWS, Dask, Hadoop, Spark, BeautifulSoup, GeoPandas, SciPy, scikit-learn, KNIME.

Work History

Data Science Intern at Scripps Research January 2021 - Current

- Working under Dr. Eric Zorrilla to identify genomic risk factors for human alcohol use phenotypes.

- Retrieving genetic data and selecting relevant clinical risk factors from the UK Biobank via the Scripps HPC server.

- Performing GWAS and SNP association testing with the REGENIE and SAIGE packages to identify associated genetic variants.

- Planning to publish a research paper by the end of the spring quarter with two other students from biology backgrounds.

Research Assistant at UCSD Jacobs School of Engineering October 2020 - Current

- Working under Dr. Michael R. Davidson to help build the Python package Geodata, which streamlines the manipulation of high-resolution geographic datasets such as renewable energy resource profiles and land use data.

- Examining the rebound of PM2.5 in northern China after the COVID-19 lockdown by retrieving NASA Earthdata with Geodata, creating interactive Plotly charts and Matplotlib animations, and isolating specific provinces with Cartopy's shapereader and Xarray.

Teaching Assistant at University of California, San Diego September 2019 - Current

- Courses: DSC10 Principles of Data Science (Fall 19, Spring 20) and COGS 108 Data Science in Practice (Winter 21).

- Topics include data analysis, hypothesis testing, data visualization, web scraping, text analysis, and machine learning in Python.

- Tutored over 800 students; duties include creating and grading assignments, leading discussion sections, and holding office hours.

Projects

Supervised ML Algorithm Comparison December 2020

- Replicated a Cornell University study that compared the performance of different supervised machine learning algorithms.

- Across 5 trials, 3 datasets, and 4 algorithms (Logistic Regression, ANN, Random Forest, and Gradient Boosting), built models with hyperparameters tuned by grid search and cross-validation, totaling 4,275 train/validation cycles.

- Ranked the test accuracy and runtime of models across algorithm/dataset combinations and tested the differences for statistical significance.

Clothing Fit Classification December 2020

- Selected 200,000 instances of clothing rental data from renttherunway.com, imputed missing values, and created data visualizations.

- Compared validation accuracy across different feature selections and natural language processing models applied to the review text.

- Attained 81% accuracy in predicting whether a user would find a clothing item fit, small, or large, and evaluated the models.

Steam Game Play Prediction November 2020

- Predicted whether a Steam user would play a given game from over 170,000 player-game pairs with limited features.

- From raw JSON data, generated positive and negative prediction labels and built training features for each user-game pair, including popularity ratio, Jaccard similarity, cosine similarity, and other extractable features.

- Obtained 72% test accuracy and ranked in the top 13% of the class competition (around 400 undergraduates and 250 graduate students).

Effect of Climate Change Visualization June 2020

- Made a data visualization web page, hosted on GitHub, showing rising temperatures and the effects of climate change in California.

- Used JavaScript and the Highcharts library to create interactive charts that users can manipulate.

- Created user-centered designs that help viewers easily understand the severity of global warming.

Imagined Emotion Project February 2020 - March 2020

- Selected over 10 GB of electroencephalography (EEG) recordings collected by UCSD researchers in the 1980s and cleaned the data.

- Used the MATLAB toolbox EEGLAB to identify prediction labels, then classified emotions with various ML algorithms in Python.

- Successfully predicted positive/negative emotions from patients with over 95% accuracy.


