Data Science Intern

Location:

Palos Verdes Estates, CA, 90274

Posted:

April 12, 2023


Resume:

RUIXIANG XI

Tel: 949-***-**** Email: ***************@*****.***

Address: *** * **** **, *** York NY 10036

EDUCATION

Columbia University - M.A. Statistics 09/2022 - 12/2023

University of California, Irvine - B.S. Mathematics, concentration in Data Science 09/2020 - 06/2022 GPA: 3.7/4.0 Honor: UCI Dean's Honor List

Irvine Valley College - Mathematics, Teacher Education, and Social and Behavioral Sciences 09/2018 - 05/2020 GPA: 3.6/4.0

Relevant Courses: Statistical Inference, Linear Regression Models, Algorithms for Data Science, Statistical Machine Learning, Statistical Methods for Data Analysis, Applied Data Science, Advanced Data Analysis, Robust Statistics, Stochastic Processes, Optimization, Group Theory, Game Theory, Principles of Macroeconomics, Analytic Geometry, Global Economy

INTERNSHIP EXPERIENCE

Shenzhen Institute of Computing Sciences, Shenzhen, CHN

Software Development Engineer Intern 07/2021 - 09/2021

• Used Linux to write the self-testing components of a data quality enhancement system built around automatic rule discovery and a combination of rules and AI.

• Used DBeaver to import, export, and back up data and to run SQL queries, improving the performance of test scripts.

• Wrote automated test scripts in Python to test the data error checking, entity matching, data enhancement, and data standardization modules of the data quality system.

• Worked closely with the system development team to fix more than 60 errors found by the test scripts.

• Wrote a Python program that combined NLP word segmentation with the K-Means algorithm to automatically cluster over 100k records of machine learning training data, saving over 200 hours of labor.

Deloitte, Shenzhen, CHN

Data Analytics Engineer Intern 10/2021 - 12/2021

• Collected and prepared client sales data using SQL, Excel, and Python to build marketing mix models that lifted ROI by 5 basis points.

• Cleaned, standardized, and visualized data on the dynamics of China's vehicle market using database and business intelligence software to forecast market trends driven by the growth of electric vehicles.

• Developed a SAS program that automated the refinement of linear regression models for specific customer segments, saving 25 hours of labor each month.

• Helped draft program reports, proposals, and meeting minutes, and completed detailed research on factors affecting vehicle sales in China using scikit-learn random forest models.

RELATED PROJECTS

Predict the Effect of the COVID-19 Spread in China on the Economic Recession at the Province Level

• Used Mathematica to pull raw daily COVID-19 case data through an API, and visualized the province-level spread of COVID-19 in China as animations and geographic bubble charts using built-in resource functions.

• Used RStudio to compare GDP growth in China before and after the COVID-19 spread, then determined the degree of COVID-19's impact and predicted the economic recession with statistical tests and machine learning.

Apply Machine Learning Algorithms to Recognize Images of Kuzushiji (Ancient Japanese Calligraphic Characters)

• Used singular value decomposition (SVD) to perform principal component analysis and reduce the dimensionality of the data.

• Generated training data and implemented logistic regression for classification; estimated the intrinsic geometry of the data manifold with Isomap and visualized the results with scatter plots.

• Used cuML's GPU-accelerated t-SNE to run 40% faster than scikit-learn's t-SNE, and completed a report based on decision trees.

Develop a Customer Segmentation to Define Marketing Strategy from a Credit Card Usage Behavior Dataset

• Performed K-Means clustering on the data, evaluated clustering metrics (inertia, silhouette scores), applied PCA to improve feature extraction, and visualized how the clusters segmented the samples with a Seaborn pairplot.

• Segmented customers into smaller groups based on the clustering, applied business analytics, and formulated targeted marketing strategies to stimulate consumption and reduce risk.

Explore the Large N.Y.C. Taxi Trip Record Dataset with Distributed Dask ML on AWS Fargate Using AWS CloudFormation

• Loaded 100 million taxi trip records from a public AWS S3 bucket into a distributed Dask DataFrame.

• Used the asynchronous client.persist method to keep large collections in distributed cluster memory, then called df.compute for fast analysis.

• Predicted trip duration with Dask-ML linear regression, using the Dask Fargate cluster as the training backend.

• Monitored cluster performance through the Network Load Balancer's public DNS.

SKILLS & INTERESTS

• Technical skills: Python, Linux, AWS, SAS, R, SQL, Excel, pandas, Dask, NumPy, scikit-learn, TensorFlow, GPT, etc.

• Interests: Basketball, Snowboarding, Reading, Traveling.
