Lanxin Kang
Data Science • Data Analytics • Data Analyst
ad2hc8@r.postjobfree.com • LinkedIn
314-***-**** • St. Louis, MO
EDUCATION
Washington University in St. Louis, McKelvey Engineering School & Olin Business School St. Louis, MO
• M.S. in Engineering Data Analytics & Statistics 09/2020 – 05/2023
• M.S. in Business Analytics
• Courses: Probability and Stochastic Process, Machine Learning, Time Series Analysis, Text Mining, Optimization
Arizona State University, W. P. Carey School of Business Tempe, AZ
• B.S. in Finance - Cum Laude 08/2014 – 12/2017
SKILLS
• Programming & Tools: Python (Pandas, Scikit-Learn, Matplotlib, NumPy, Seaborn, SciPy), SQL, R, Tableau, Excel
• Concepts: Supervised/Unsupervised ML, DL, Naïve Bayes, Statistical Analysis, Hypothesis Testing, Optimization
PROFESSIONAL EXPERIENCE
Saint Louis University St. Louis, MO
Research Assistant 08/2023 – Present
• Reproduce a paper's ideal solution in R by applying PCA to enrichment profiles and selecting significant principal components from a plot to capture high-variance, non-redundant information.
• Read an article on applying neural networks to detect change points in cells.
MyEdMaster Leesburg, VA
Data Scientist Intern, Remote 02/2023 – 05/2023
• Built and trained a random forest model to predict self-disease by extracting, cleaning, and merging datasets, reaching about 85% accuracy within 1.5 months.
• Transformed survey data (265 rows, 132 columns) into a model-ready format in 2 days using R: filled missing values, detected outliers, and converted text responses into numerical values.
• Supported team members by interpreting experimental results from articles.
Airswift Trusted Worldwide Sichuan, China
Billing Analyst (Full-time) 05/2019 – 12/2019
• Applied a regression model in Excel to analyze historical accounts receivable in support of the manager, achieving 80% of expected receivable revenue.
• Improved efficiency by using VLOOKUP to merge multiple datasets, cutting the process from 3 weeks to 2 weeks.
SELECTED PROJECTS
Robotics (Python) 04/2023
• Improved the model's accuracy by 1% beyond the common benchmark by analyzing target values and applying a gamma regression for predictions.
• Conducted exploratory data analysis on 41 features by analyzing descriptive statistics, uncovering patterns, and visualizing feature correlations; used a parametric paired t-test to compare the performance of multiple models.
• Reduced RMSE to 0.168 by selecting gamma regression with lasso regularization over baseline models such as random forest with k-means clustering, PCA, and linear regression with regularization.
Customer Churn – BCG (Forage) 09/2023
• Validated the price-effect hypothesis by selecting features, merging them into a new dataset of 14k+ records, and running a regression analysis whose p-values below 0.05 confirmed statistical significance.
• Revealed a historical average price difference of $2.2 between churned (1,417) and retained (13k+) customers.
• Performed exploratory data analysis by merging and cleaning datasets with 20k+ rows and visualizing correlations with Seaborn heatmaps.
Spam Filter (Python) 02/2023
• Ranked 1st out of 14 in model accuracy with an AUC of 99.759% in recognizing spam emails, using a linear model with a logistic loss function.
• Improved spam recognition by implementing gradient descent from scratch and optimizing it for three loss functions, which determined the optimal feature weights.