Data Analyst Scientist

Location:

Saint Paul, MN

Posted:

March 03, 2023


Resume:

Raymond An

612-***-****

***************@*****.***

PROFESSIONAL SUMMARY

Data Scientist with 10+ years of experience completing machine learning projects and solving business problems.

Proven track record of delivering successful data-driven solutions to complex business problems

Strong knowledge of programming languages such as Python and R, as well as SQL and other data manipulation tools

Strong understanding of machine learning techniques and algorithms, including supervised and unsupervised learning, deep learning, and reinforcement learning

Expertise in data science packages such as Pandas and sklearn. In-depth knowledge of machine learning methods, such as RandomForest, SVM, K-NN, and Naïve Bayes.

Experience working with software development teams; familiar with Git, DevOps, and the Azure cloud.

Experience with Linux environments. Knowledge of data science-related topics, including text analytics and basic NLP, plus hands-on experience in image recognition.

Proficient in using data science-related software, such as Alteryx, Python, R, SPSS, Excel, etc.

Experience in data analysis across multiple industries, such as FMCG, HR, automotive, and steel and refractories.

Skilled at explaining statistics and machine learning to stakeholders with no quantitative analytics background. Strong communication and presentation skills, with the ability to convey technical concepts to non-technical audiences.

EXPERIENCE

RHI Magnesita, Jun 2020 – Nov 2022

Digital Solution Team (APO)

Data Scientist

•As a member of the data science team, created a predictive model that forecasts the lining thickness for a steel plant. This digital product was named APO.

•Investigated new customers' data to determine if it was suitable for the APO solution. Customers provided a list of variable names for the data available in their systems, and the APO data scientist provided a data acquisition template. They worked together to check if the customer's data was consistent and applicable to the APO solution.

•Monitored the performance of the running model and implemented improvements in a timely manner to increase the accuracy of the model. Communicated with the backend and frontend teams about the changes and updated the wiki page.

•Improved lining visualization using the Python Open3D package to display a 3D view of the lining and dust. Created an algorithm to eliminate outliers in the laser detection data while maintaining the correct information.

•Conducted time series forecasting using the ARIMA model from the Python statsmodels library to model the energy consumption trend of a furnace. Tuned the model order as a hyperparameter and packaged the script as a single Flask package for the backend team to implement.

•Promoted APO to the Greater China/East Asia market and hosted several potential client meetings and seminars.

iTutorGroup, Dec 2019 – May 2020

Data Science Center

Senior AI Engineer

•Retrieved data using sophisticated SQL queries.

•Created an automated ETL pipeline in Python for data pre-processing and used it for the prediction model.

•Improved the methodology and accuracy of the model that identifies customers likely to request refunds, and helped all parties involved in refund prevention use the prediction results.

Project 1: Potential customer identification

•Purpose: To identify potential customers from a base of 5 million customers and provide a list of high-potential customers to the sales team.

•Used the Python sklearn package to build a prediction model that ranks leads by their likelihood of becoming a contract customer.

•Mined several factors, mainly geographical information and media information.

•Used one-hot encoding to transform text information into binary (0/1) factors, and added metrics to interpret the text information. For example, used disposable income as an input together with the province factor, and the percentage of contract customers using a certain media vehicle together with the media factor.

Honeywell, Sep 2016 – Nov 2019

Human Resources Predictive Analytics

Predictive Analytics Analyst

•Conducted people analytics by forecasting the flight risk of all Honeywell employees worldwide on a yearly basis with good accuracy; the forecasts received very positive feedback from clients.

•Identified hard-to-fill open positions across Honeywell on a weekly basis by creating an automation tool that combines machine learning methodology and fast data preparation.

•Developed a "health score" for all Honeywell plants in the US and Canada by considering factors such as comparison, overtime, and years of service. This score was given to each factory on a monthly basis in an effort to monitor and reduce the activity of labor unions.

•Presented findings to internal global clients.

Project 1: Individual attrition risk identifier

•Purpose: To predict the attrition risk of Honeywell employees, identify the key drivers of attrition, and take actions to reduce attrition.

•Data input: Data was categorized into 6 classes: Organizational Data, Job Data, Compensation Data, Performance Data, Manager Data, and Resume Data.

•Data preparation and cleaning: Depending on the characteristics of each variable, different methods were used to impute missing data.

•Model set-up: Trained a RandomForest classifier to distinguish the active group from the terminated group. Tuned the number of trees to balance the bias-variance tradeoff.

•Result presentation: The final output was the probability of attrition for each Honeywell employee in the coming year. Alongside the risk percentage, selected important factors were presented with color coding. Tools used: Python, Jupyter Notebook.

Project 2: Overdue requisition predictor

•Purpose: To identify which job requisitions will take longer to fill, direct the hiring team's attention to the difficult requisitions flagged by the prediction model, and help recruiters shorten the recruiting period. Key drivers of overdue requisition:

•Data input: The label was the exact number of days taken to fill each historical requisition. The inputs were the basic information of the requisition and information about the manager who created it.

•Model set-up: Regression model, trained with a RandomForest regressor.

•Result presentation: Weekly report, using Alteryx to automate the whole process from data preparation to final report. The report was tailored to each recruiter, as each recruiter has their own scope of responsibility and color-coding preferences. Tool used: Alteryx.

The Nielsen Company, Feb 2012 – Aug 2016

Department of Advanced Analytics Consulting

Data Analyst

•Data collection: Helped the client provide the required data.

•Conducted modeling research in SPSS or CG by processing raw data and aligning the dependent and independent variables.

•Presented the research results in phases and interpreted the statistical results for the client.

•Trained and guided new analysts, strengthening their understanding of the model and its practical use.

•Data cleansing and validation: Because data can come from different sources, combined it into a single spreadsheet and aligned it to the same period (for example, summing days into weeks or dividing months into weeks). Validated the data, checked for abnormal points, and reported them to the client.

•Simulation: After presenting the model results to the client, created a simulation tool in Excel to show how effectiveness could improve under the suggestions and how much revenue could increase through different marketing and in-store activities.

•Model coefficient testing: Tested the model tens of times until all coefficients were valid and sensible, which required good knowledge and business understanding of each coefficient and of how the client would interpret the results. Finally, ensured that the overall accuracy met the standards for VIF, p-value, Durbin-Watson, and goodness of fit.

EDUCATION

Executive Master of Engineering Management

St. Cloud State University, Saint Paul, MN

Master of Science in Management Science and Engineering

Tongji University, Shanghai

Bachelor of Science in Engineering Management

Hunan University, Changsha
