Shuyan (Flora) Li
Email: ************@*.************.*** Phone: 660-***-**** LinkedIn: https://www.linkedin.com/in/li-shuyan
EDUCATION
Northwestern University Evanston, Illinois
Master of Science in Analytics (GPA: 3.82/4.0) December 2020
Coursework: Reinforcement Learning, Text Analytics (NLP), Big Data, Deep Learning, Analytics Value Chain, Predictive Analytics, Data Mining, Databases, Data Warehousing, Machine Learning, Data Visualization, Business Leadership Insights
Grinnell College Grinnell, Iowa
Bachelor of Arts in Computer Science, minor in Statistics (GPA: 3.8/4.0) May 2019
Honors: Dean’s List, Phi Beta Kappa
Coursework: Data Analytics, Software Design and Development, Artificial Intelligence, Analysis of Algorithms, Data Structures
TECHNICAL SKILLS
Tools: Python (scikit-learn, keras, numpy, pandas, matplotlib), R (dplyr, flexdashboard, shiny, ggplot2), Snowflake, SQL, Hadoop, Spark, Hive, Java, JavaScript (d3.js, Vue.js, React.js), Looker, Tableau, GitHub, AWS, Microsoft Azure
Techniques: Market Basket Analysis, Recommender Systems, Gradient Boosting, Neural Networks, Decision Trees, Random Forests, Time Series Analysis, Cluster Analysis, Retention Models, Migration Model, Survival Analysis, Bootstrapping, GLMs, SVM, PCA, A/B Testing
EXPERIENCE
Autodesk, Inc San Francisco, California
Data Science Intern June 2020 – December 2020
Designed and launched a data schema in SQL; transferred the required data to Snowflake using Data Build Tool (DBT).
Explored customer profiles and subscriptions; visualized findings in an interactive dashboard built with R (flexdashboard and shiny).
Trained models that advise which products to recommend to potential customers, using Market Basket Analysis and XGBoost classification in Python; constructed and validated the model pipeline, whose output tables (in Looker) receive more than 30 pageviews each day and save sales reps 46% of their time, while increasing subscription rate and Total Contract Value.
Retail Banking & Wealth Management, HSBC Group Chicago, Illinois
Student Data Science Consultant September 2019 – June 2020
Established an open-source database that can be leveraged for geospatial analysis; extracted and processed data using APIs and Python.
Analyzed overall branch performance, prioritized a list of low-performance/potential branches using clustering algorithms, and visualized breakdowns in an interactive dashboard built with Tableau and D3.js.
Forecasted average daily footfall with gradient-boosted trees and devised a strategy to reduce 80 poorly performing US branches.
Greenwich.HR St Louis Park, Minnesota
Student Data Science Consultant September 2019 – December 2019
Applied several clustering methods (K-Means, Mean-Shift clustering, etc.) to identify clusters of companies with a high correlation between hiring data metrics and 30-day stock performance.
Applied clustering and classification methods to select a list of companies to include in an "optimal" trading portfolio (index fund).
ACT, Inc Iowa City, Iowa
ACTNext (AI/Machine Learning group) Intern June 2019 – September 2019
Conducted research on open-source Online Judge Systems, such as DOMjudge and OnlineJudge; installed them and compared their performance through load testing and stress testing.
Produced a demo of an online learning platform with contest mode; implemented a Learning Progression feature by customizing the front end, applying recommendation systems, and reconstructing modules using Python and Vue.
Shenzhen Institutes of Advanced Technology, Chinese Academy of Science Shenzhen, China
Data Science Research Assistant May 2018 – August 2018
Collected and wrangled massive-scale 2015 mobile phone location data from Shenzhen using R and SQL.
Drafted a Top-N model in Python to estimate people's identification risk from mobile phone location data over 30 days.
Implemented the MaxCliqueDyn algorithm in Java to lower re-identification risk, decreasing the uniquely identified population from 49% to 12% for the top three locations.
PROJECTS
Face Aging: Optimized a GAN model to create face aging/de-aging effects on photos of people and developed a CNN model to evaluate the GAN's performance by classifying the age group of the generated photos.
Analytics Value Chain: Productionized a time series model to predict open prices for the top 10 S&P 500 stocks for the following week, interacted with AWS S3 and RDS, and built a Flask web app with Docker.
News Recommendation: Modeled a content-based recommender system for news subscribers and achieved the best overall performance in the class on accuracy, diversity, and novelty.