data science

Location:

Arlington, VA

Posted:

April 23, 2021


Resume:

DATA SCIENCE, MACHINE LEARNING, ECONOMICS AND PREDICTIVE MODELING

Summary

Senior data scientist with over 8 years of experience working and learning in the private sector, the public sector, and academia. As a data scientist, I offer a unique perspective combining the rigor of an academic background with the pragmatism that comes from business. I love technically challenging work that yields real, tangible benefits for end-users.

- Focus on solutions which are rigorous, technically sound, and address specific business needs of end-users

- Consistent, fruitful engagement with stakeholders to ensure model development meets the requirements of the business

- Ability to communicate highly complex, technical topics in simple, commonsense language

- Experience working with both structured and unstructured data

- Experience working with both supervised and unsupervised learning

- Extensive experience modeling in Python, PySpark, Spark, R, etc.

- Extensive mathematical knowledge of different modeling approaches including Linear Regression, Logistic Regression, GLMs (Generalized Linear Models), Linear Mixed Models, Decision Trees, Gradient Boosting, Random Forests (Bagging/Bootstrap Sampling), Neural Networks (Feed Forward, RNNs, LSTMs, CNNs), Monte Carlo, Bayesian Updating, Clustering (K-Means), and others

- Extensive knowledge of more classical statistical techniques, including hypothesis testing (A/B Testing), analysis of variance, dealing with heteroskedasticity and autocorrelation, etc.

- Familiarity working with various data science platforms, including significant work with Microsoft Azure, Databricks, and running Python on virtual machines

Experience

Vocational

03/2020-Current Senior Data Scientist, Nestle USA, Arlington, VA (Remote).

Nestle is a multinational consumer-packaged-goods company based in Switzerland

At NUSA (Nestle USA), I was the second data scientist hired to build an internal, company-wide data science department from the ground up. During my time there, I have worked to educate leadership on the role of data science within an organization and have repeatedly connected with stakeholders to ensure that our data science work is both relevant and valuable to the company. In the immediate aftermath of COVID, my team developed a demand-forecasting model that accurately predicted weekly consumption by region. While previous, simpler time-series models had difficulty adjusting to the "new normal" of a post-COVID world, the machine learning algorithms we employed continued to predict consumption at a very granular (SKU-State) level, giving demand planners significantly more accurate, actionable information.

Additionally, I developed several POCs during my time at NUSA, including clustering reviews using NLP (ELMo/LSTMs) to identify problems with production, and a random forest model to predict customer orders that were likely to be late and result in a fine. The latter allows end-users in supply chain to intervene ahead of time and avoid incurring late fees.

- Developed a demand-forecasting model based on IRI syndicated data to help demand planners allocate resources effectively

- Utilized machine-learning models to implement a high-performing demand forecasting framework from scratch that worked well even after COVID

-Satisfied critical request from executive leadership: previous models could not adjust to the dramatic swings after COVID

-Despite an extremely small sample size post-COVID, the model obtained consistent, high-quality results through the use of hierarchical modeling (MLlib/GBT)

-Combined several disparate data sources, at different granularities (sales data, economic data, SNAP spending, and more) into one master dataset

-Worked in PySpark, Python, on Azure Databricks

-The model is a significant improvement over the baseline, univariate time-series forecasts that were used previously

- Worked extensively with stakeholders to provide updates and tailor model characteristics to better assist end-users

- Advised on how best to modify existing predictive out-of-stock models to accurately forecast for a longer time-horizon

- Developed POC which leveraged NLP techniques and clustering algorithms (K-Means) to attribute causes to negative reviews

-Used web scraping to create dataset of Amazon reviews from scratch

-Developed POC based on said dataset using NLP, worked with LDA (Latent Dirichlet Allocation) and ELMo (bidirectional LSTM) to extract and cluster key topics

-Worked with a variety of Python packages including numpy/cupy, nltk, pandas, torch/PyTorch, and regex

- Built a model to identify orders that are at risk of being delivered late. This would allow individuals along the supply chain to intervene and avoid the associated fees.

-Created dataset leveraging numerous, disparate sources (outside data from customers, internal order data, weather data, and transportation data)

-Used PCA to handle the very large number (hundreds) of sparse categorical variables

09/2016 – 01/2019 Researcher/Graduate Student Instructor, University of Michigan, Ann Arbor

Pursued a doctorate at the University of Michigan with research focused on the intersection of labor market monopsony and machine learning. Additionally led sections, held office hours, and otherwise aided tenured professors in teaching undergraduates how to approach both theoretical and empirical economic analysis.

- Researched the effects of market power and their intersection with technology on general equilibrium models

- Explored using machine learning techniques (e.g., ANNs) to predict economic behavior typically thought of as random

- Taught courses on the principles of economics and worked directly with honors students in the economics department, advising them on their theses in International Economics

- Closely mentored students and assisted them with implementing theoretical, statistical, and machine learning models to generate results for their theses

- Successfully completed qualifying exams in econometrics and microeconomics

06/2012 – 07/2016 Senior Associate Economist, Federal Reserve Bank of Chicago, Chicago.

Developed theoretical and empirical models with members of both the research team and the policy team. Our results formed the basis of several research projects, and the models I developed were a critical part of the monetary-policy decision-making process.

- Worked with policymakers and senior economists to develop predictive models for key economic indicators

- Worked with a variety of econometric models and techniques, including fixed effects and random effects, vector autoregression, heteroskedasticity-robust standard errors, hypothesis testing, etc.

- Used a variety of Machine Learning models including linear regression, logistic regression, time series modeling, decision trees (including random forests/bootstrapping/boosting), and clustering algorithms such as Gaussian Mixture Models/K-Means

- Wrote up reports to deliver at FOMC meetings translating the technical results of our modeling into clear, straightforward language understood by non-experts

- Developed machine learning models to explain patterns in data associated with leverage-based asset bubbles

- Used web scraping algorithms to programmatically obtain housing data in key cities across the US

- Parsed scraped data with Pandas, BeautifulSoup, and Regex; worked with various machine learning and statistical models to analyze housing prices nationwide and their connection to leverage in the financial industry

- Created several asset pricing models with KNN and Random Forests to analyze house prices in various regions as a function of both micro-level features as well as macroeconomic indicators

- Utilized Gaussian Mixture Models to cluster loans in the REPO market based on risk indicators

- Implemented Monte-Carlo simulations and simulated reinforcement learning models in response to sovereign debt crises

Miscellaneous

2018–2019 Personal Work with NLP (Natural Language Processing).

- Scraped comment data from Reddit to build a dataset

- Cleaned and transformed data using the Python packages pandas, regex, NLTK, and word2vec

- Built an LSTM using torch/PyTorch to analyze textual data

Publications.

- Bubbles and Leverage: A Simple and Unified Approach, FRB of Chicago Working Paper No. 2013-21

- Interest Rates or Haircuts? Prices versus Quantities in the Market for Collateralized Risky Loans, FRB of Chicago Working Paper No. WP-2016-19

- Interest Rates and Asset Prices: A Primer, Economic Perspectives, Vol. 38, 4th, 2014

Education

2016–2018 Doctor of Philosophy: Economics, University of Michigan, Ann Arbor.

Withdrawn before completion to pursue other opportunities

Successfully completed qualifying exams in econometrics and microeconomics

2008 – 2012 Bachelor of Science: Pure Mathematics, Purdue University, Indianapolis.

Technology

Programming Languages: Python, PySpark, R, MATLAB, SQL, LaTeX

Packages: numpy, torch, pandas, mllib, matplotlib, plotly, scikit-learn, xgboost, catboost, dplyr, sqlite, regex

Communication Skills

2020 Communicating with stakeholders

Gathering business requirements from end-users

Presenting status updates and advocating for meaningful work

2016-2018 International Economics and Introduction to Macroeconomics at University of Michigan

2015 Dynamic Stochastic General Equilibrium Seminar at Federal Reserve Bank of Chicago

2014 Leverage Based Asset Pricing Seminar at Federal Reserve Bank of Chicago

2013 Empirical Phenomena Associated With Bubbles Seminar at Federal Reserve Bank of Chicago

2012 Undergraduate Research Opportunities at Purdue Mathematics Presentation

Interests

- Karaoke

- Coffee

- Mathematics

- Romance Novels
