Xinyu Lou
+1-857-***-**** **********@*****.*** LinkedIn : linkedin.com/in/xinyu-lou/ Boston, MA
EDUCATION
Northeastern University Boston, MA Sept 2021 – Jul 2023 Master in Analytics
Coursework: Enterprise/Predictive Analytics, Visualization, Big Data, Database Management, Data Mining, Machine Learning
China University of Mining and Technology Beijing, China Sept 2017 - Jun 2021 Bachelor of Science in Mathematics
Coursework: Mathematical Analysis, Probability Theory, Theory of Complex Functions, Mathematical Modelling, Numerical Analysis, Time Series, Database Theory, Applied Stochastic Processes, Statistical Models, Mathematical Statistics
SKILLS
• Data Analytics: A/B Testing, Causal Inference, Tableau, PowerBI, SAS, Google Analytics, MS Excel
• Machine Learning: Python (Scikit-Learn), R, SPSS, Predictive Modeling, Hypothesis Testing, Regression Analysis
• Database Processing: SQL, AWS S3, Google Cloud BigQuery
• Data Engineering: Spark, Hadoop, Hive, MapReduce
WORK EXPERIENCE
DATA SCIENTIST San Francisco, CA
DeFiner Jan 2023 - present
• Extracted and processed 14GB+ of blockchain transaction and event log data and identified 5+ KPIs, including transaction time, transaction value, and transaction frequency
• Developed a dashboard report automation system that creates 4 interactive Tableau reports weekly from the transaction performance database, saving 30% of reporting time
• Applied anomaly detection algorithms, including Isolation Forest and K-Means, and statistical analysis to the sale price data to reduce potential errors, improving data accuracy by 13%
DATA ANALYST Gansu, China
Gansu Ministry of Construction Dec 2017 - Aug 2020
• Designed surveys to gather satisfaction data from 1k+ examinees to determine whether examinees' attitudes caused the decline in exam participation
• Performed sentiment analysis using BERT to analyze examinees' attitudes toward the exam
• Transformed the exam registration information of 34 provinces into latitude and longitude coordinates using the Geocoding API
• Visualized 40k+ exam registration records to analyze the geographic concentration of registrations
• Worked with a team of 5 to present a comprehensive analysis report to the government on changing the exam registration locations, successfully increasing the number of tests by 46.6%
PROJECTS
Banking Customer Churn Prediction and Analysis (Git Link)
• Developed automated ETL pipelines using SageMaker and EC2 to load bank customer data from CSV files, improving working efficiency by 20%
• Predicted customer churn using Random Forest, Logistic Regression, and K-Nearest Neighbors, reaching an accuracy of 86.5% with the optimal model
• Improved classification performance (accuracy, F1, or AUC score) by 4% via 5-fold cross-validation and identified the top factors that influenced the results
Amazon Customer Reviews Analysis and Topic Modeling (Git Link)
• Created a web crawler to scrape review data from the Amazon website using HTTP requests and regular expressions
• Preprocessed review text with tokenization, stemming, and stop-word removal, and extracted features using TF-IDF
• Extracted the top 10+ popular topics using K-means clustering and Latent Dirichlet Allocation
MovieLens Movie Recommendation System
• Processed and analyzed a dataset of 27M ratings of 58K movies by 280K users and conducted OLAP with Spark SQL
• Built a recommendation system (ALS model) in Python that predicts movie ratings and makes specific recommendations based on users' preferences
• Tuned the hyperparameters using 5-fold CV and applied the optimal hyperparameters to the final model, reaching RMSE = 0.88