
Machine Learning Engineer - Data Scientist

Location:
Chicago, IL
Posted:
June 16, 2021


Resume:

PROFILE SUMMARY

• Senior Data Scientist with over 15 years of IT experience and 11 years of Data Science experience.

• Over 9 years of experience developing Python solutions, with additional experience in MATLAB, R, Java, JavaScript, SQL, and C/C++.

• Experience applying machine learning techniques including Naïve Bayes, Linear and Logistic Regression Analysis, Neural Networks, RNNs, CNNs, Transfer Learning, Time-Series Analysis, Decision Trees, and Random Forests.

• Experience building statistical models on big data sets using cloud/cluster computing assets on AWS and Azure.

• Strong ability to devise and propose creative and innovative ways to look at problems by using business acumen, data models, statistical analysis, and a practical and direct understanding of the subject matter.

• Highly capable of discovering patterns in data using algorithms, visual representations, and intuition. Ability to use experimental and iterative approaches to validate findings.

• Apply advanced statistical and predictive modeling techniques to build, maintain, and improve real-time decision systems. Recommendations are strengthened with precise analysis, real-world intuition, and an undogmatic approach to modeling techniques.

• In-depth knowledge of statistical procedures that are applied in both Supervised and Unsupervised machine learning problems.

• Ability to perform exploratory analysis on varying types of data and datasets, allowing for a full knowledge of the subject matter, a nuanced understanding of the variables in question, and a technically sound insight into the required modeling approach.

• Strong background in working with advanced analytical teams to design, build, validate, and refresh data models.

• Excellent communication skills (verbal and written) to communicate with clients/stakeholders and team members.

• Highly perceptive with other people, allowing for a strong ability to facilitate a dynamic and constructive team environment.

• Ability to quickly gain a keen understanding of niche subject matter domains via research and communication with all parties involved. Ability to design and implement effective novel solutions to be used by other subject matter experts.

TECHNICAL SKILLS

Analytic Development: Python, R, SQL, Excel

Python Packages: NumPy, Pandas, scikit-learn, TensorFlow, PyTorch, SciPy, Matplotlib, Seaborn

IDEs: Jupyter, Spyder, RStudio, Google Colab

Databases: MySQL, Oracle

Version Control & Project Tracking: Git, GitHub, Jira

Machine Learning: Natural Language Processing & Understanding, Machine Learning algorithms including text recognition, image classification, and forecasting

Data Query: Azure, Google Cloud, SQL, and various SQL databases, data warehouses, and data lakes

Deep Learning: Machine Perception, Data Mining, Machine Learning algorithms, Neural Networks, RNN, CNN, Transfer Learning, TensorFlow, Keras, PyTorch

Artificial Intelligence: Text Understanding, Classification, Pattern Recognition, Recommendation Systems, Targeting Systems, Ranking Systems, and Time Series

Analysis Methods: Advanced Data Modeling, Statistical, Exploratory, Bayesian Analysis, Inference, Regression Analysis, Multivariate analysis, Sampling methods, Forecasting, Segmentation, Clustering, Sentiment Analysis, Predictive Analytics, Decision Analytics, Design and Analysis of Experiments, Factorial Design and Response Surface Methodologies, Optimization, and State-Space Analysis

Analysis Techniques: Classification and Regression Trees (CART), Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, RNN including LSTM, CNN, Transfer learning, Linear and Logistic Regression, Naïve Bayes, Simplex, Markov Models, and Jackson Networks

Data Modeling: Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series Analysis

Applied Data Science: Natural Language Processing, Machine Learning, Text Recognition, Image Classification, Social Analytics, Predictive Maintenance

Soft Skills: Excellent communication and presentation skills; ability to work well with stakeholders to discern needs accurately and articulate issues clearly; leadership; mentoring; coaching.

PROFESSIONAL WORK EXPERIENCE

Sept 2019 – Present

Location: Chicago, IL

Position: Senior Data Scientist

Company: McDonald’s

Project Summary:

McDonald’s is an American fast-food company. Implemented a machine learning algorithm (ANN) to build a time-series predictive model that forecasts demand through data analytics. Worked with the operations research team to optimize warehouse inventory and layout by predicting how long certain work would take. Our data model improved the supply chain team's efficiency and reduced overage costs for raw materials.

· Explored time-series machine learning models such as Vector ARIMA (VARIMA), SARIMA, and Facebook Prophet for time-series data analysis (a minimal Prophet sketch follows this list).

· Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn in Python to develop various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, Naïve Bayes, Random Forests, K-means, and KNN for data analysis.

· Implemented machine learning algorithms and concepts such as K-means clustering (multiple variants), Gaussian distributions, and decision trees.

· Analyzed data using data visualization tools, feature extraction tools, and supervised machine learning techniques to achieve project objectives.

· Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models.

· Performed supervised, unsupervised, and semi-supervised classification and clustering of warehouse inventory.

· Classified documents with machine learning: neural networks and deep learning, K-nearest neighbors, K-means, Random Forest, logistic regression, and SVM.

· Created an analysis model to optimize resource allocation and warehouse layout.

· Documented logical data models, semantic data models and physical data models.

· Performed complex modeling, simulation and analysis of data and processes.

· Performed forecasting through various regression and ensemble machine learning models.

· Analyzed key indicators in Python using machine learning concepts such as regression, bootstrap aggregation (bagging), and boosting.
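
The sketch below illustrates the kind of Prophet-based demand forecast described above. It is a minimal example: the file name and column names are hypothetical placeholders, not the actual project data.

    import pandas as pd
    from prophet import Prophet  # packaged as "fbprophet" before v1.0

    # Prophet expects a frame with columns "ds" (date) and "y" (target value).
    history = pd.read_csv("daily_demand.csv")  # hypothetical input file
    history = history.rename(columns={"date": "ds", "units_sold": "y"})

    model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
    model.fit(history)

    # Forecast demand 28 days ahead, with uncertainty intervals.
    future = model.make_future_dataframe(periods=28)
    forecast = model.predict(future)
    print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())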

Nov 2018 – Aug 2019

Location: Wichita, KS

Position: Lead Data Scientist

Company: Cargill

Experience Summary:

Cargill is a privately held American international food conglomerate whose major businesses are trading, purchasing, and distributing grain and other agricultural commodities. Our team used a CNN with computer vision to build a machine learning model that detects unhealthy hydrophytes. The model helped regulators work more efficiently by automatically detecting unhealthy hydrophytes in hydroponic farming, which increased harvest rates and, in turn, revenue.

· Performed statistical analysis and built statistical models in R and Python using various supervised and unsupervised Machine Learning algorithms, including Regression, Decision Trees, Random Forests, Support Vector Machines, K-Means Clustering, and dimensionality reduction.

· Used MLlib, Spark's machine learning library, to build and evaluate different models.

· Defined list codes and code conversions between the source systems and the data mart, and kept the enterprise metadata library current with any changes or updates.

· Developed a Ridge regression model to predict customers' energy consumption and evaluated it using Mean Absolute Percentage Error (MAPE); see the sketch after this list.

· Developed and enhanced statistical models by leveraging best-in-class modeling techniques.

· Developed and validated a neural network classification model for predicting feature labels.

· Implemented logistic regression to model customer default and identified factors that are good predictors.

· Designed a model to predict whether a customer will respond to a marketing campaign based on customer information.

· Developed Random Forest and logistic regression models for this classification task; fine-tuned the models to favor recall over raw accuracy, managing the trade-off between false positives and false negatives.
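
A minimal sketch of a Ridge regression evaluated with MAPE, as in the bullet above. The data here is a synthetic stand-in; the real customer features are assumptions, not project data.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for customer energy-consumption data.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 4))  # e.g., temperature, occupancy, hour, day
    y = X @ np.array([3.0, -1.5, 2.0, 0.5]) + 50 + rng.normal(scale=2.0, size=500)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = Ridge(alpha=1.0).fit(X_train, y_train)

    # Mean Absolute Percentage Error (MAPE).
    pred = model.predict(X_test)
    mape = np.mean(np.abs((y_test - pred) / y_test)) * 100
    print(f"MAPE: {mape:.2f}%")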

Feb 2015 - Oct 2018

Location: Charlottesville, VA

Position: Data Scientist

Company: BioVista

Experience Summary:

BioVista Inc. is a private drug development services company. The company repurposes drugs and calculates the risks of drugs before trials. This is done using multidimensional characterizations of pharmacologically relevant features, such as genes, diseases, drugs, pathways, and cell types, to identify and quantify potential risks and discover new indications for drugs in development or already on the market. We utilized literature mining methodologies to perform Natural Language Processing. Information extraction and named entity recognition were used to link concepts and arguments from different articles and infer implicit knowledge from seemingly unrelated concepts, to discover new potential uses for existing drugs. More simply put, we used machine learning, natural language processing, and graph databases to build a graphical representation of Swanson's ABC model for drug repurposing.

· Became proficient in Natural Language Processing, SQL queries, and web scraping, collecting literature using BeautifulSoup.

· Familiarized myself with various drug and disease knowledge databases, such as DrugBank, ChemBank, OMIM, KEGG, and PubMed, in addition to genomic databases such as MIPS, PDB, GEO, and GenBank.

· Utilized graph clustering algorithms such as ClusterONE (Clustering with Overlapping Neighborhood Expansion) and MBiRW (Measure and Bi-Random Walk), a k-means-based network clustering algorithm.

· Worked with large data sets by integrating Spark via sparklyr.

· Utilized Natural Language Processing (NLP) techniques, such as term frequency-inverse document frequency (TF-IDF), to measure the importance of terms within the literature when constructing the document network used with the above-mentioned graph clustering algorithms (a minimal sketch follows this list).

· Used machine learning with Theano/Keras to further perform risk assessment of potential adverse effects from drug-drug interactions.

· Collaborated with molecular simulation experts to combine literature-based discovery with computer simulations for the purpose of predicting the risk of certain drugs.

· Utilized multiple cross-validation frameworks (k-folds and train-validate-test, where appropriate) to optimize models and ensure generalizability of drug risk assessments.
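
The sketch below shows one way a TF-IDF document network of this kind can be constructed. The abstracts are hypothetical placeholders, and the similarity matrix stands in for the weighted network edges fed to the graph clustering algorithms above.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical stand-ins for article abstracts mined from the literature.
    abstracts = [
        "Drug A inhibits pathway X in cell type Y.",
        "Pathway X is implicated in disease Z.",
        "Gene G regulates expression along pathway X.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(abstracts)

    # Pairwise cosine similarities form the weighted edges of the document
    # network, which graph clustering algorithms can then partition.
    similarity = cosine_similarity(tfidf)
    print(similarity.round(2))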

Oct 2013 - Feb 2015

Location: Muncie, Indiana

Position: Staff Applied Mathematician

Company: Lincoln National Corporation

Experience Summary:

I worked with an actuary analyzing, classifying, and looking for patterns in data frames. This included working with finance and risk analysis and using statistical methods to predict liability. Worked under lead actuary John Beekman and implemented various types of techniques.

· Tested survival analysis techniques using various methods (the Accelerated Failure Time model, the Proportional Hazards model, and Cox Proportional Hazards, or CPH) to estimate default probability and default time, and chose the best-performing model; a minimal CPH sketch follows this list.

· Created lifetime loss, prepayment and recovery forecasts for credit due diligence.

· Developed credit risk underwriting models to support underwriting of various card products.

· Applied linear regression in Python and R to understand the relationships between different attributes of the dataset and the causal relationships among them.

· Set up the model risk management framework covering both operational and base models.

· Delivered portfolio risk dashboard as a package covering all aspects of the credit life cycle.

· Delivered machine learning based models for predicting signs/features of borrowers likely to default to challenge a conventional statistical model.

· Information used included structured and semi-structured data elements collected from both internal and external sources.

· Created a target operating model that balances modeling rigor with product-specific priorities and nuances.
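
A minimal sketch of a Cox Proportional Hazards fit for default timing, using the lifelines library. The column names and the toy data are illustrative assumptions, not the actual portfolio data.

    import pandas as pd
    from lifelines import CoxPHFitter

    # Toy loan data: observation time, default indicator, and two covariates.
    df = pd.DataFrame({
        "months_observed": [12, 24, 6, 36, 18, 30, 9, 27],  # time to default or censoring
        "defaulted":       [1, 0, 1, 0, 1, 0, 1, 0],        # 1 = defaulted, 0 = censored
        "credit_score":    [580, 720, 650, 600, 640, 700, 610, 680],
        "utilization":     [0.9, 0.3, 0.4, 0.8, 0.7, 0.2, 0.6, 0.5],
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="months_observed", event_col="defaulted")
    cph.print_summary()  # hazard ratios show which attributes drive default risk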

June 2010-Sept 2013

Location: Indianapolis, Indiana

Company: CarpetLand USA

Position: Business Intelligence Specialist

Experience Summary:

During my time at Carpetland USA, I was tasked with predicting carpet sales based on prior sales data and seasonality trends. I constructed a multivariate linear regression model to predict the prices of several carpet types, including Berber, frieze, and outdoor, in order to maintain an accurate inventory; this, in turn, helped increase profit by 30%.
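
A minimal, illustrative sketch of such a multivariate regression. The features (prior-month sales, month of year, an outdoor-season flag) and the numbers are hypothetical simplifications, not the actual sales data.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical monthly data: [prior-month sales, month (1-12), outdoor-season flag].
    X = np.array([
        [120, 1, 0], [135, 2, 0], [150, 3, 0], [180, 4, 1],
        [210, 5, 1], [230, 6, 1], [220, 7, 1], [190, 8, 1],
        [160, 9, 0], [140, 10, 0], [130, 11, 0], [125, 12, 0],
    ])
    y = np.array([130, 148, 170, 205, 228, 240, 215, 185, 155, 138, 128, 122])

    model = LinearRegression().fit(X, y)
    print("R^2:", round(model.score(X, y), 3))
    print("Predicted sales:", model.predict([[175, 4, 1]]).round(1))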

· Developed processes and tools to monitor and analyze performance and data accuracy.

· Enhanced data collection procedures to include relevant information to build and continuously optimize the analytics systems.

· Collaborated with I.T. to continuously optimize business performance.

· Processed, cleansed and verified the integrity of data from various sources used for analysis and reporting.

· Leveraged the latest data visualization tools and techniques to present and communicate analysis to the leadership team utilizing data management, analytics modeling, and business analysis.

· Used predictive modeling to increase and optimize customer experiences, revenue generation, ad targeting and other business outcomes.

· Advised leadership team and stakeholders with data-driven solutions and recommended strategies that address business challenges.

Feb 2008 - May 2010

Location: Indianapolis, Indiana

Company: J.S. Hein Masonry

Position: Data Analyst

Experience Summary:

During my time at J.S. Hein Masonry, I developed a classification algorithm using multivariate logistic regression to determine which Indianapolis neighborhoods exhibited the most growth, based on housing prices, crime statistics, voter and population demographics, and internal neighborhood growth, including business development and previous development trends. I used this analysis as a guide for management to determine which building contractors would offer the best investment for J.S. Hein in terms of job security and overall profit. As a result, we put in bids for five different contracts, which increased our yearly profit by 10%.

· Used unsupervised learning techniques such as K-means clustering and Gaussian Mixture Models to cluster customers into different risk groups based on health parameters provided through wearable technology regarding their activities and health goals.

· Applied multiple statistical modeling approaches to determine the usefulness of the wearable technology data for various insurance products.

· Used survival modeling techniques, such as Poisson regression, hidden Markov models, and Cox proportional hazards, to model time to different events using wearable data (time to death for life insurance, time to next hospital visit, time to next accident, time to critical illness, etc.).

· Data required extensive cleaning and preparation for machine learning modeling, as some observations were censored without any clear notification.

· We solved a binary classification problem (whether a member transfers to a lower risk group or not, given a financial incentive) with logistic regression.

· An artificial neural network was built with Keras/TensorFlow in Python to solve a binary classification problem for premiums and their intersection with the discriminant; a minimal sketch follows this list.

· We used modifiers, including L1 regularization, dropout, and Nesterov momentum, to enhance the neural network and optimize generalization.

· Provided reporting dashboard for stakeholders and project owners to rapidly provide information via Plotly, Bokeh, Matplotlib, and Seaborn.

· Retrieved data from devices, which was streamed to our company database using AWS Kinesis (real-time data streaming).
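
A minimal Keras/TensorFlow sketch combining the L1 regularization, dropout, and Nesterov momentum mentioned above. The layer sizes, hyperparameters, and random training data are illustrative assumptions.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    model = keras.Sequential([
        keras.Input(shape=(20,)),  # 20 hypothetical wearable/health features
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l1(1e-4)),  # L1 regularization
        layers.Dropout(0.3),  # dropout to improve generalization
        layers.Dense(32, activation="relu",
                     kernel_regularizer=regularizers.l1(1e-4)),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # binary classification output
    ])

    # SGD with Nesterov momentum, as described in the bullet above.
    sgd = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
    model.compile(optimizer=sgd, loss="binary_crossentropy", metrics=["accuracy"])

    # Random stand-in training data.
    X = np.random.rand(256, 20)
    y = np.random.randint(0, 2, size=(256, 1))
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)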


Education

BA in Mathematics, Ball State University

PhD in Differential Topology, Wayne State University


