Yujia (Sian) Jin
202-***-**** *****@**********.*** Washington DC, 20007
Linkedin: www.linkedin.com/in/sian-jin-6878a5224
Github: https://github.com/goldenfishome
Portfolio: https://goldenfishome.github.io/index.html
EDUCATION
Georgetown University, Master of Science in Data Science and Analytics, Washington DC 08/2021–05/2023
Relevant Coursework: Big Data and Cloud Computing, Neural Networks and Deep Learning, Database Systems for SQL, Advanced Data Visualization, Statistical Learning, Computational Linguistics, Data Structures and Algorithms
GPA: 3.9/4.0
Queen’s University Belfast, Bachelor & Master of Pharmacy, UK 09/2017–07/2021
GPA: 3.55/4.0
WORK EXPERIENCE
Data Scientist Intern, Civilience, Delaware 05/2022–08/2022
Built a web scraper to automatically download over 10,000 online text records on diseases, then applied topic modeling and sentiment analysis to TF-IDF and Word2Vec output to show the trends and focus areas of infectious diseases
Developed the infrastructure and architecture of an ETL data pipeline on AWS (EC2, Lambda, EventBridge, S3, DynamoDB) and automated the code via Lambda; the pipeline processed billions of records per day and substantially reduced manual effort
Built a Random Forest model on customer satisfaction survey data and provided actionable insights through feature importance
Deployed a recommendation system based on cosine similarity and collaborative filtering using user info and inputs; built a Discord bot with a REST API to automatically assign mentors to users, which helped the client increase subscriptions by 5%
Research Analyst, Queen’s University Belfast, Belfast 09/2020–06/2021
Used Excel and SQLite to build databases of phage depolymerases reported in publications from the past five years
Conducted hypothesis testing on bioinformatic gene data, applied K-means clustering to group genes into 5 types, with K selected by silhouette score, and visualized the dimension-reduced result through
Project Control Analyst Intern, Worley Parsons, Shanghai 06/2019–09/2019
Organized the purchasing department's financial bid analysis forms and the engineering drawings of each unit for each project
Mastered Encompass software (an electronic document management system) and assisted the project control department with file transfer, cutting the time spent on the process by 8 hours per week
Used Excel Power Query to track employee working hours and project summary data, and visualized the insights in Power BI and Tableau dashboards
PROFESSIONAL PROJECTS
Data Visualization Project: Volcano Analysis & Visualizations 01/2022–05/2022
Conducted exploratory data analysis and statistical analysis, and created 9 sets of interactive visualizations using Python (matplotlib, plotly), R and JavaScript to surface insights from time series, geospatial and quantitative data
Built a website using HTML/JavaScript showcasing visualization and description to tell a data science story on volcanoes
Clinical Trials Analytics Project 08/2021–12/2021
Implemented data cleaning and TF-IDF to preprocess tweets and text data by tokenizing, stemming, and extracting features
Explored Naive Bayes, Support Vector Machine, and Gradient Boosting models on clinical trials data; applied Recursive Feature Elimination and random search over hyperparameters, achieving 85% accuracy in predicting terminated clinical trials
Real-Estate Market Analysis 08/2021–12/2021
Coordinated with a 4-member team to build statistical models in R, including t-tests, chi-square tests, ANOVA, linear regression, and bootstrap, to gain insight into the US real-estate market under the influence of the COVID-19 pandemic
Found a 4% increase in price between the periods before and after the pandemic, and explained it through the top 5 factors
ACADEMIC RESEARCH
Paper Publication: Weather Forecast with Climate Change Dataset via Machine Learning Method 07/2020–08/2020
Used linear regression, time series and neural network methods to predict the weather index in Alabama, United States, for one year, and compared the accuracy of the methods to identify the best performer
The finished paper “Weather Conditions Prediction with Climate Change Dataset with Linear Regression Model, Autoregressive Model, LSTM Model” has been accepted and is to be published in International Core Journal of Engineering
SKILLS
Skills: R, Python, SQL, HTML, JavaScript, Linux, CSS, object-oriented programming, ArcGIS, MongoDB, data mining
Models: Linear Regression, Logistic Regression, Ridge, LASSO, Decision Trees, Random Forest, Gradient Boosting, XGBoost, SVM, ARM, Naïve Bayes, Clustering, PCA, Neural Network, RNN, CNN, Natural Language Processing, A/B Testing
Tools: MySQL, Excel, Git, GitHub, Tableau, Jira, AWS, Hadoop, Spark, PyTorch, TensorFlow, Microsoft Azure, Pandas, NumPy
Visualization: ggplot2, leaflet, d3.js, matplotlib, seaborn, bokeh, plotly, Tableau, Altair