
Scientist Intern Analyst

Location:
Washington, DC
Posted:
June 01, 2023


Resume:

Yujia (Sian) Jin

202-***-**** *****@**********.*** Washington, DC 20007 LinkedIn: www.linkedin.com/in/sian-jin-6878a5224 GitHub: https://github.com/goldenfishome Portfolio: https://goldenfishome.github.io/index.html

EDUCATION

Georgetown University, Master of Science in Data Science and Analytics, Washington DC 08/2021–05/2023
Relevant Coursework: Big Data and Cloud Computing, Neural Networks and Deep Learning, Database Systems for SQL, Advanced Data Visualization, Statistical Learning, Computational Linguistics, Data Structures and Algorithms
GPA: 3.9/4.0

Queen’s University Belfast, Bachelor & Master of Pharmacy, UK 09/2017–07/2021 GPA: 3.55/4.0

WORK EXPERIENCE

Data Scientist Intern, Civilience, Delaware 05/2022–08/2022

Web-scraped over 10,000 records of online infectious-disease text and applied topic modeling and sentiment analysis to TF-IDF and Word2Vec features to show trends and areas of focus in infectious disease

Designed and built the infrastructure for an ETL data pipeline on AWS (EC2, Lambda, EventBridge, S3, DynamoDB), automated with Lambda, which processed billions of records per day and substantially reduced manual effort

Built a Random Forest model on customer satisfaction survey data and derived actionable insights from feature importances

Deployed a recommendation system based on cosine similarity and collaborative filtering using user information and inputs; built a Discord bot with a REST API to automatically assign mentors to users, helping the client increase subscriptions by 5%

Research Analyst, Queen’s University Belfast, Belfast 09/2020–06/2021

Used Excel and SQLite to build databases of phage depolymerases reported in publications from the previous five years

Conducted hypothesis testing on bioinformatic gene data, applied K-means clustering to group genes into 5 clusters (with K selected by silhouette score), and visualized the dimensionality-reduced results

Project Control Analyst Intern, Worley Parsons, Shanghai 06/2019–09/2019

Organized the purchasing department's financial bid analysis forms and the engineering drawings for each unit of every project

Became proficient in Encompass (an electronic document management system) and assisted the project control department with file transfers, cutting processing time by 8 hours per week

Used Excel Power Query to track employee working-hour data and project summary data, and visualized the insights in Power BI dashboards and Tableau

PROFESSIONAL PROJECTS

Data Visualization Project: Volcano Analysis & Visualizations 01/2022–05/2022

Conducted exploratory data analysis and created 9 sets of interactive visualizations using Python (matplotlib, plotly), R, and JavaScript, applying statistical analysis to identify insights in time-series, geospatial, and quantitative data

Built a website in HTML/JavaScript that combines visualizations and descriptions to tell a data science story about volcanoes

Clinical Trials Analytics Project 08/2021–12/2021

Cleaned and preprocessed tweets and other text data through tokenization, stemming, and TF-IDF feature extraction

Explored Naive Bayes, Support Vector Machine, and Gradient Boosting models on clinical trials data; applied Recursive Feature Elimination and random search over hyperparameters, reaching 85% accuracy in predicting terminated clinical trials

Real-Estate Market Analysis 08/2021–12/2021

Collaborated in a 4-member team to apply statistical methods in R, including t-tests, chi-square tests, ANOVA, linear regression, and bootstrapping, to analyze the US real-estate market under the influence of the COVID-19 pandemic

Found a 4% increase in prices from before to after the pandemic and explained it with the top 5 contributing factors

ACADEMIC RESEARCH

Paper Publication: Weather Forecast with Climate Change Dataset via Machine Learning Method 07/2020–08/2020

Used linear regression, time-series, and neural network methods to predict a weather index for Alabama, United States, over one year, and compared the accuracy of the methods to identify the best-performing approach

The paper “Weather Conditions Prediction with Climate Change Dataset with Linear Regression Model, Autoregressive Model, LSTM Model” has been accepted for publication in the International Core Journal of Engineering

SKILLS

Skills: R, Python, SQL, HTML, JavaScript, Linux, CSS, object-oriented programming, ArcGIS, MongoDB, data mining
Models: Linear Regression, Logistic Regression, Ridge, LASSO, Decision Trees, Random Forest, Gradient Boosting, XGBoost, SVM, ARM, Naïve Bayes, Clustering, PCA, Neural Network, RNN, CNN, Natural Language Processing, A/B Testing
Tools: MySQL, Excel, Git, GitHub, Tableau, Jira, AWS, Hadoop, Spark, PyTorch, TensorFlow, Microsoft Azure, Pandas, NumPy
Visualization: ggplot2, leaflet, d3.js, matplotlib, seaborn, bokeh, plotly, Tableau, Altair
