Data Software Engineer

Riverside, Rhode Island, United States
November 02, 2018



Providence, RI ***** 949-***-****


EDUCATION

Brown University, Providence, RI May 2019

ScM. in Biostatistics

Relevant Coursework: Statistical Learning and Big Data, Generalized Linear Models, Survival Analysis, Practical Data Analysis

University of California, Irvine, CA Jun. 2017

B.A. in Business Economics

Relevant Coursework: Applied Econometrics; Financial Market & Macroeconomics; Math of Finance; Business Decisions

EXPERIENCE

AutoGravity Co., Irvine, CA May 2018 – Aug. 2018

FP&A and Data Analytics, Intern

• Built a forecasting model in R for the AutoGravity app that predicts approval rates across lenders to give customers quotes and soft inquiries, increasing applications by 10%.

• Designed dashboards and dynamic KPI visualizations in Tableau, connected to the data warehouse.

• Consulted with a software engineer to port the app code from R to Python using NumPy and scikit-learn for further model testing.

• Wrote SQL queries to improve the quality of large datasets from Amazon Redshift.

• Provided upper management with key insights by tracking and interpreting Mixpanel data and funding reports.

Brown University, Providence, RI Sep. 2017 – May 2018

Data Assistant

• Enhanced the university’s dorm room lottery system by creating functions in R to automate the selection process.

• Maintained and improved the accuracy of hundreds of datasets in the database, including student records and room occupancy lists.

• Assisted with administrative processes by processing the university’s Data Service using SQL Server.

Fenchem Biotek Ltd., Chino, CA Apr. 2017 – Jun. 2017

Data Analyst Assistant

• Analyzed sales data with pivot tables in Excel to forecast customer demand and predict potential profits.

• Generated reports in Tableau to communicate findings more easily across departments.

• Designed and implemented personalized recommendation policies, and successfully registered 2 new customers.

PROJECTS

Master’s Thesis – Characteristics Associated with Screening and Diagnostic Evaluation Non-compliance of National Lung Screening Trial (NLST) Study Participants – IN PROGRESS

• Conducting exploratory data analysis in R to identify data structures, and classifying participants with cluster analysis.

• Performing table joins, data cleaning, and missing data imputation on large datasets (more than 50,000 observations and 500 variables).

• Developing a hierarchical logistic regression model to estimate the noncompliance probability of future participants.

Graduation Thesis – Oil Price Change and Auto Stock Returns in Economic Cycle

• Showed that changes in oil prices have no statistically significant effect on the stock returns of car companies in most periods under the capital asset pricing model. Summarized the findings in a report and delivered a final PowerPoint presentation.

• Cleaned monthly crude oil and auto stock price data and divided it into 24 tables based on the economic cycle, then visualized the cleaned data using Python.

• Performed various statistical analyses (e.g., descriptive statistics, t-tests, regression analysis) on each of the tables obtained by inner joining the stock data with the oil data.

NYC Taxi Trip Duration Prediction in R

• Visualized time series and spatial data in R to detect trip flows and abnormal patterns, and engineered new features, including weather, to improve the model’s performance.

• Transformed spatial features with PCA to aid decision tree splits.

• Tested multiple regression models, including Lasso, GBM, and XGBoost, and tuned the hyperparameters to optimize the model.

Twitter Text-mining in R

• Identified Dos Equis as the most influential beer brand on Twitter among 5 chosen companies by implementing a sentiment analysis routine in R to find popular topics and words among the most-retweeted tweets.

• Identified data trends through visualization of data using ggplot2.

SKILLS

Technical: R, Python, SAS, SQL, Alteryx, Spark, Linux, Excel, Tableau, PowerPoint

Statistics: Machine Learning, Optimization, Predictive Modeling, Data Mining, Statistical Computing, Data Visualization