JIATING WU
Tel: 814-***-**** Email: *************@*****.***
EDUCATION
Columbia University New York City, NY
MA in Statistics ****-**** (expected)
University of California, Davis California, USA
BS in Statistics 2020-2024
Relevant courses: Probability, Statistical Inference, Linear Regression Models, Sample Surveys, Statistical Machine Learning, Applied Data Science, Advanced Data Analysis, Time Series Analysis, Statistical Learning, Bayesian Statistical Inference, Statistical Data Science, Statistical Data Technologies, Mathematical Statistics, Regression Analysis, Nonparametric Statistics, Introduction to Programming
PROFESSIONAL EXPERIENCE
Part-time Assistant ByteDance (Remote) Aug. 2023 - Sep. 2023
● Utilized Excel to filter and clean data, created pivot tables, and established data relationships using VLOOKUP
● Learned basic SQL programming and self-studied MySQL to query and manage data
● Studied ByteDance's in-house database products through internal training and tutorials, gaining an in-depth understanding of their technology innovations, industry practices, and competitors
PROJECT EXPERIENCE
Applied Data Science Columbia University Jan. 2025 - May 2025
● Analyzed U.S. BRFSS 2015 dataset to identify behavioral and clinical risk factors for heart disease and heart attack.
● Developed predictive models including Random Forest, Decision Tree, and Artificial Neural Network.
● Performed data cleaning, PCA, entropy analysis, and interaction effect visualization to support feature selection and interpretation.
● Found that general health, age, high blood pressure, and cholesterol were the strongest predictors.
Advanced Data Analysis Columbia University Jan. 2025 - May 2025
● Built regression and classification models to predict COVID-19 patient length of stay (LOS) using OpenML hospital treatment data.
● Applied and compared Linear Regression, Logistic Regression, Random Forest, and XGBoost; XGBoost achieved the best test performance.
● Identified ward type, number of patient visitors, and illness severity as the most important predictors influencing LOS.
● Conducted data preprocessing, EDA, feature encoding, and model validation using Python and scikit-learn.
Linear Regression Models Columbia University Sep. 2024 - Dec. 2024
● Enhanced Diff-UNet (3D diffusion-based U-Net) to generate denoised electron density maps from cryo-electron microscopy images.
● Integrated simulated maps into EModelX pipeline to improve protein structure reconstruction accuracy.
● Preprocessed high-resolution cryo-EM data and simulated density maps using EMAN2; trained model on 64 × 64 × 64 voxel cubes.
● Demonstrated improved amino acid type prediction and alignment with known PDB structures through 3D reconstruction tasks.
Statistical Learning University of California, Davis (UCD) Mar. 2023 - Jun. 2023
Project 1 Clustering Algorithms
● Wrote a Python class to implement the k-means and k-medoids methods to improve data analysis capabilities.
Project 2 Clustering and Manifold Learning
● Compared the DBSCAN, k-means, and mean shift clustering methods; visualized and analyzed their performance on a given dataset.
● Conducted multi-dimensional scaling and PCA, and performed a segmentation analysis on given photos.
Analysis of Worker Wage University of California, Davis (UCD) Jul. 2022 - Aug. 2022
● Implemented a multiple linear regression model in R that incorporated age, education level, job class, and health insurance to understand the factors affecting the wage level of male workers in the Mid-Atlantic region.
● Demonstrated that workers' wages can be roughly predicted from the identified key factors: higher education levels, health insurance, and information jobs all positively impact wages.
Non-Parametric Statistics University of California, Davis (UCD) Mar. 2022 - Jun. 2022
Project 1 The Number of Toys a Particularly Agile Cat Breed “Destroys” in a Week
● Analyzed cat toy data using an exact binomial test to understand the toy-buying habits of agile cat owners; concluded that owners must buy more than 3 toys per week to keep up with their particularly agile cat.
Project 2 Comparison Between SAT Scores with Three Different GPA Groups
● Applied an F-test of equal means for non-parametric ANOVA to an academic dataset to establish whether SAT scores are associated with high school GPA; found evidence of a significant difference in SAT scores across GPA groups.
Project 3 Test for Independence
● Used a permutation chi-squared test to determine whether gender and sports choice are independent in a health club dataset; found a significant dependency, which informed marketing recommendations for the club.
SKILLS
● Languages: Fluent in English, native speaker of Mandarin
● IT: Proficient in R, Python, MySQL, Jupyter Notebook, MATLAB, Overleaf, Colab