Professional Summary
Experienced Data Scientist & Machine Learning Engineer with over a decade of experience applying deep learning, artificial intelligence, and statistical methods to data science problems, increasing understanding and enhancing profits and market share.
Skilled in developing algorithms and implementing novel approaches to non-trivial business problems in a timely and efficient manner; experienced with knowledge bases and language ontologies.
Good knowledge of executing solutions with common NLP frameworks and libraries in Python (NLTK, spaCy, gensim) or Java (Stanford CoreNLP, NLP4J). Familiarity with the application of Neural Networks, Support Vector Machines (SVM), and Random Forests.
Skilled in using Pandas, NumPy, Seaborn, Matplotlib, and scikit-learn in Python to develop machine learning models with algorithms such as logistic regression, Random Forest, gradient-boosted decision trees, and neural networks.
Experience in the application of Naïve Bayes, Regression Analysis, Neural Networks/Deep Neural Networks, Support Vector Machines (SVM), and Random Forest machine learning techniques.
Experience in machine learning models and statistical models on big data sets using cloud/cluster computing assets with AWS and Azure.
Well versed in comparing text embedders such as the Google Universal Sentence Encoder, Doc2Vec, TF-IDF, BERT, and ELMo to identify the embedder that yields the best-performing model.
Good knowledge of applying statistical analysis and machine learning techniques to live data streams from big data sources using PySpark and batch processing techniques.
Hands-on experience in credit risk modeling, including hazard, severity, and exposure models.
Knowledgeable in Convolutional Neural Networks, Computer Vision, and customer behavior predictive modeling for lapse/churn and withdrawal.
Extensive experience in time-series analysis and survival analysis.
Excellent communication skills (verbal and written) with demonstrated ability to communicate with clients/stakeholders and team members.
Technical Skills
Analytic Development: Python, JavaScript, MATLAB, SAS, Spark, SQL, VBA, C++, C
Python Packages: NumPy, Pandas, scikit-learn, TensorFlow, SciPy, Matplotlib, Seaborn, Numba, spaCy, NLTK, LightGBM, XGBoost, CatBoost, Dask, Gensim
IDEs: Jupyter, Spyder, MATLAB, Visual Studio.
Version Control: GitHub, Git
Machine Learning: Time Series Prediction, Natural Language Processing & Understanding, Machine Intelligence, Generalized Linear Models, Machine Learning algorithms
Data Query: Azure, Google Cloud, Amazon Redshift, Kinesis, EMR; RDBMS, Snowflake, data warehouses, data lakes, and various SQL and NoSQL databases.
Deep Learning: Machine perception, Machine Learning algorithms, Neural Networks, TensorFlow, Keras.
Artificial Intelligence: text understanding, NLP, Computer Vision, customer behavior predictive modeling, classification, pattern recognition, targeting systems, ranking systems.
Analysis Methods: Advanced Data Modeling, Time Series Analysis, Forecasting, Predictive Analytics, Statistical Analysis, Sentiment Analysis, Exploratory Analysis, Stochastic Calculus, Bayesian Analysis and Inference, Regression Analysis, Linear Models, Multivariate Analysis, Sampling Methods, Segmentation, Clustering, Big Data and Query Interpretation, Design and Analysis of Experiments, Association Analysis
Analysis Techniques: Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, Principal Component Analysis, Recurrent Neural Networks, Regression, Naïve Bayes
Data Modeling: Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series Analysis, Survival Analysis
Applied Data Science: Natural Language Processing, Machine Learning, Social Analytics, Predictive modeling
Soft Skills: Excellent communication and presentation skills; ability to work well with stakeholders to discern needs accurately, leadership, mentoring, coaching
Professional Experience
Sr. Data Scientist February 2022 – Present
Roche Group
Genentech, San Francisco, CA
Building a text model to identify connections with fraudulent claims. Data was collected from SQL Server, cleaned, and modeled in an AWS environment.
Accessed production SQL database to pull extract for validation with third-party data.
Performed data validation between SQL Servers and third-party systems.
Worked with big data (10M+ observations of text data) using SQL and SQLite.
Cleaned the text data using different techniques.
Integrated with AWS platform environment.
Utilized cloud computing resources for model optimization/tuning of hyperparameters, and cross-validation of statistical data science models.
Used Pandas, NumPy, Seaborn, Matplotlib, SciKit-learn in Python for developing various machine learning models and utilized algorithms such as Logistic regression, Random Forest, Gradient Boost Decision Tree, and Neural Network.
Built and analyzed datasets using Python and R.
Applied linear regression in Python and SAS to understand relationships between different attributes of the dataset and potential causal links between them.
Applied Exploratory Data Analysis (EDA) to summarize the main characteristics of the datasets.
Performed exploratory analysis of the text data using bag-of-words features and clustering methods such as K-means and DBSCAN.
Utilized Git for version control on GitHub to collaborate work with the team members.
Compared embedders such as the Google Universal Sentence Encoder, Doc2Vec, TF-IDF, BERT, and ELMo to identify the embedder yielding the best-performing model (see the sketch after this list).
Implemented models to predict previously identified key performance indicators (KPIs) among all attributes.
Developed several ready-to-use templates of machine learning models based on specifications given and assigned clear descriptions of purpose and variables to be given as input into the model.
Prepared reports and presentations using Tableau, MS Office, and ggplot2 that accurately conveyed data trends and associated analysis.
Worked with Data Warehouse architecture and wrote SQL queries.
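The embedder comparison above can be illustrated with a minimal sketch using scikit-learn and Gensim, assuming a labeled set of claim texts; the logistic-regression probe, input names, and parameters are illustrative assumptions, not the production pipeline.

```python
# Minimal sketch: compare a TF-IDF and a Doc2Vec embedder on labeled claim text
# by scoring each with the same cross-validated logistic-regression probe.
# `texts` (list of strings) and `labels` (binary fraud flags) are assumed inputs.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def score_embedding(X, labels):
    """Mean 5-fold F1 of a logistic-regression probe on a fixed embedding."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, labels, cv=5, scoring="f1").mean()

def compare_embedders(texts, labels):
    # TF-IDF baseline on unigrams and bigrams
    tfidf = TfidfVectorizer(ngram_range=(1, 2), min_df=5)
    X_tfidf = tfidf.fit_transform(texts)

    # Doc2Vec document embeddings via Gensim
    tagged = [TaggedDocument(t.split(), [i]) for i, t in enumerate(texts)]
    d2v = Doc2Vec(vector_size=100, min_count=2, epochs=20)
    d2v.build_vocab(tagged)
    d2v.train(tagged, total_examples=d2v.corpus_count, epochs=d2v.epochs)
    X_d2v = np.vstack([d2v.infer_vector(t.split()) for t in texts])

    return {"tfidf": score_embedding(X_tfidf, labels),
            "doc2vec": score_embedding(X_d2v, labels)}
```

Keeping the downstream classifier fixed makes the comparison about the embedding rather than the model; heavier embedders (BERT, ELMo, Universal Sentence Encoder) would slot into the same scoring function.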
Sr. Data Scientist August 2020 – January 2022
Compass Insurance Group
Grand Rapids, MI
Compass Insurance Group is one of the largest insurance brokers in the region, with a presence in 20 states, serving clients ranging from large corporations to individuals. I used several advanced Machine Learning models and Deep Learning techniques for different use cases across the organization. I developed a recommender system that incorporated Principal Component Analysis and Singular Value Decomposition to make collaborative suggestions about prospective customers. Early A/B testing showed that insurance agents who used the recommender system closed deals 13% more often than those who had not adopted it.
Developed recommendation systems, churn prediction, and Customer Lifetime Value models.
Developed time-series models using ARIMA, SARIMA, and Deep Learning approaches with Recurrent Neural Networks and LSTMs to forecast and optimize sales and revenue (a SARIMA sketch follows this list).
Automated time-series analysis with Prophet.
Developed personalized product recommendations with machine-learning algorithms that used Collaborative filtering to better meet the needs of existing customers and acquire new customers.
Created machine-learning pipelines employing logistic regression, random forest, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
Developed optimization algorithms for use with data-driven models, including supervised, unsupervised, and reinforcement learning.
Researched statistical machine-learning methods that included forecasting, supervised learning, classification, and Bayesian methods.
Advanced the technical sophistication of solutions through the use of machine learning and other advanced technologies.
Performed exploratory data analysis and data visualizations using R and Tableau.
Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.
Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
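A minimal sketch of the seasonal ARIMA forecasting referenced above, using statsmodels; the series, model order, and seasonal order are illustrative assumptions rather than the production configuration.

```python
# Minimal sketch: fit a SARIMA model on a monthly revenue series and forecast ahead.
# `revenue` is an assumed pandas Series with a monthly DatetimeIndex.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def forecast_revenue(revenue: pd.Series, horizon: int = 12) -> pd.Series:
    model = SARIMAX(revenue,
                    order=(1, 1, 1),               # assumed non-seasonal (p, d, q)
                    seasonal_order=(1, 1, 1, 12),  # assumed yearly seasonality on monthly data
                    enforce_stationarity=False,
                    enforce_invertibility=False)
    fitted = model.fit(disp=False)
    return fitted.forecast(steps=horizon)
```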
Lead Data Scientist November 2019 – August 2020
Sound Off Signal
Hudsonville, MI
Sound Off Signal is one of the largest US manufacturers of integrated, strategic lighting and controls products (e.g., lighting, sirens, speakers, and switches). Purchase order processing was automated using Optical Character Recognition (OCR) on scanned documents. Prior to my work, purchase orders were handwritten and filed in a filing cabinet, which took additional time and manpower whenever a record needed to be referenced. This system was automated so that scanned documents could be submitted to an API, with the OCR results stored in a SQL database for easy referencing. I used NLP and Computer Vision techniques to build it.
Led the development team and implemented the complete solution.
Wrote code in Python and SQL.
Implemented various NLP techniques to identify text fragments.
Used Tensorflow and Keras to implement word embeddings.
Used Google Tesseract and Amazon Textract (a Tesseract sketch follows this list).
Used common NLP techniques such as pre-processing (tokenization, part-of-speech tagging, parsing, stemming).
Used NLP techniques to sort and classify documents.
Used hierarchical time-series analysis to forecast component usage across various products.
Used the AWS Redshift data warehouse and Boto3 to access AWS resources from Python.
Worked with product development team for optimal product design, pricing and marketing strategy.
Performed semantic analysis (named entity recognition, sentiment analysis), modeling and word representations (RNN / ConvNets, TF-IDF, LDA, word2vec, doc2vec).
Built predictive modeling using Machine Learning algorithms such as Random Forests, Naive Bayes, Neural Networks, MaxEnt, SVM, Topic Modeling/LDA, Ensemble Modeling, GB, etc.
Used CNN techniques for various object recognition tasks.
Used pre-trained models (VGG16, ResNets, Inceptions, DenseNet, U-Net, etc.) for transfer learning on small datasets.
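A minimal sketch of the Tesseract OCR step referenced above, using pytesseract with a SQLite store; the purchase-order regex, table schema, and paths are hypothetical stand-ins for the production API and SQL database.

```python
# Minimal sketch: OCR a scanned purchase order with Tesseract, pull a PO number,
# and persist the result for later lookup. Regex and schema are hypothetical.
import re
import sqlite3
import pytesseract
from PIL import Image

PO_NUMBER = re.compile(r"PO[-\s]?(\d{5,})")  # hypothetical purchase-order pattern

def extract_po(image_path: str) -> dict:
    text = pytesseract.image_to_string(Image.open(image_path))
    match = PO_NUMBER.search(text)
    return {"po_number": match.group(1) if match else None, "raw_text": text}

def store_result(db_path: str, record: dict) -> None:
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS scans (po_number TEXT, raw_text TEXT)")
    con.execute("INSERT INTO scans VALUES (?, ?)",
                (record["po_number"], record["raw_text"]))
    con.commit()
    con.close()
```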
Sr. Data Scientist/ML Engineer June 2018 – October 2019
Frito-Lay Inc.
Byron Center, MI
Frito-Lay is one of the largest snack food manufacturers in the world and a subsidiary of PepsiCo. I was responsible for regional analytical activities, creating and updating ML models. The machine-learning algorithms we employed predicted consumption at a very granular (SKU-State) level, providing demand planners with significantly more accurate, actionable information.
Developed a demand-forecasting model based on different time-series techniques to assist demand planners effectively allocate resources.
Utilized machine-learning models to implement a high-performing demand forecasting framework from scratch.
Satisfied critical requests from executive leadership: previous models could not adjust to the market demands.
The model obtained consistent, high-quality results using hierarchical modeling (MLlib/GBT); a sketch follows this list.
Advised about how best to modify existing predictive out-of-stock models to accurately forecast for a longer time-horizon.
Worked in PySpark and Python on Azure Databricks.
The model was a significant improvement over the baseline univariate time-series forecasts that were used previously.
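A minimal PySpark sketch of the MLlib GBT demand-forecasting approach referenced above; the Parquet path, feature columns, and hyperparameters are assumptions rather than the actual schema or settings.

```python
# Minimal sketch: gradient-boosted trees (Spark MLlib) predicting weekly units
# at SKU-State grain from lagged demand and calendar features (assumed columns).
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.regression import GBTRegressor

spark = SparkSession.builder.appName("demand-forecast-sketch").getOrCreate()
df = spark.read.parquet("sku_state_weekly.parquet")  # hypothetical path

indexers = [StringIndexer(inputCol=c, outputCol=f"{c}_idx", handleInvalid="keep")
            for c in ("sku", "state")]
assembler = VectorAssembler(
    inputCols=["sku_idx", "state_idx", "week_of_year", "lag_1", "lag_4", "lag_52"],
    outputCol="features")
gbt = GBTRegressor(featuresCol="features", labelCol="units", maxDepth=6, maxIter=100)

model = Pipeline(stages=indexers + [assembler, gbt]).fit(df)  # fit on a time-based split in practice
```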
Data Scientist May 2016 – June 2018
Dollar General
Kentwood, MI
Dollar General is one of the largest discount retailers in the US. I was a Data Scientist for the regional marketing team, working to forecast future sales. Sales data from the past three years was analyzed and fit to models; an ARIMA model was fit to the data to forecast weekly sales into the next quarter. The models revealed shopping trends that Dollar General was under-capitalizing on.
Engineered a solution in the R programming language.
Experimented with time-series models such as ARIMA and GARCH to produce reliable forecasts.
Accessed and integrated large datasets from remote servers using SQL.
Applied statistical tests to determine appropriate autocorrelation and partial autocorrelation lags (see the sketch after this list).
Forecasted sales for the next quarter.
Cleaned and normalized data set to optimize performance and reliability of predictions.
Collaborated with advertising to form a plan to capture the market during newly revealed consumer trends.
Communicated results through interactive visuals using the JavaScript library D3.
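The production solution was written in R; the Python sketch below (statsmodels) illustrates the same workflow of checking autocorrelation and partial autocorrelation lags before fitting an ARIMA model, with the series name and model order as assumptions.

```python
# Minimal sketch: inspect ACF/PACF to choose lags, then fit an ARIMA and
# forecast roughly one quarter of weekly sales. `weekly_sales` is an assumed Series.
import pandas as pd
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima.model import ARIMA

def fit_weekly_arima(weekly_sales: pd.Series, horizon: int = 13) -> pd.Series:
    print("ACF :", acf(weekly_sales, nlags=12).round(2))   # guides the MA order
    print("PACF:", pacf(weekly_sales, nlags=12).round(2))  # guides the AR order
    model = ARIMA(weekly_sales, order=(1, 1, 1)).fit()      # order assumed for illustration
    return model.forecast(steps=horizon)
```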
Data Scientist for Sales January 2014 – April 2016
Sears Parts and Service
Grand Rapids, MI
Sears Parts and Service was part of Sears Roebuck Co. I served on an Analytics team for the Sales department as a Data Science consultant for the Sears Parts and Service store division. I led a team to optimize a recommendation engine using collaborative filtering and content-based recommender systems. The revenue impact from deployment of the new optimized recommender system on regional sales was expected to exceed 9%.
Used techniques such as collaborative filtering, content-based filtering, and demographic filtering to create different recommender systems.
Used A/B testing to test the effectiveness of different types of recommender systems and optimized the most effective recommender system after careful tests and research.
Partially solved the "cold start" problem by incorporating a demographic-based recommender into the final SVD recommender system (see the sketch after this list).
Performed data pre-processing and cleaning with Pandas and NumPy, and carried out feature engineering and imputation of missing values in Python.
Performed stemming and lemmatization of text to remove superfluous components and make the resulting corpus as small as possible while containing all important information.
Solved analytical problems and effectively communicated methodologies and results.
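A minimal sketch of the SVD recommender with a demographic fallback for cold-start users referenced above; the ratings-matrix layout, segment key, and component count are illustrative assumptions.

```python
# Minimal sketch: SVD collaborative filtering with a demographic-segment fallback
# for users who have no interaction history. Inputs are assumed:
#   ratings      - DataFrame indexed by user_id, one column per item, interaction scores
#   demographics - DataFrame indexed by user_id with a 'segment' column
import pandas as pd
from sklearn.decomposition import TruncatedSVD

def build_recommender(ratings: pd.DataFrame, demographics: pd.DataFrame, k: int = 20):
    svd = TruncatedSVD(n_components=k, random_state=0)
    user_factors = svd.fit_transform(ratings.values)
    item_factors = svd.components_
    scores = pd.DataFrame(user_factors @ item_factors,
                          index=ratings.index, columns=ratings.columns)

    # Average predicted scores within each demographic segment for cold-start users
    segment_means = scores.join(demographics["segment"]).groupby("segment").mean()

    def recommend(user_id, segment=None, n=5):
        if user_id in scores.index:          # known user: use SVD-reconstructed scores
            row = scores.loc[user_id]
        else:                                 # cold start: fall back to segment average
            row = segment_means.loc[segment]
        return row.nlargest(n).index.tolist()

    return recommend
```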
Data Scientist January 2010 – January 2014
Independent Contractor
California, CA
Worked on several small projects in data science and statistics as a freelance data scientist.
Examined the relationship between SAT/ACT scores and college admissions.
Performed deep mathematical analysis of large datasets using R and ggplot2 to produce visualizations that revealed the relationships and trends within the data.
Investigated correlations between temperature and energy demand.
Created logistic regression model to demonstrate likelihood of acceptance into various industries.
Performed NLP, topic modeling, and clustering analysis on job titles and descriptions to identify multiple employment opportunities in the same field with different names.
Utilized decision trees in Python to explain feature importance and observe the effect of weather data on product sales (as sketched below).
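A minimal sketch of the decision-tree feature-importance analysis mentioned in the last item; the weather and sales column names are assumed for illustration.

```python
# Minimal sketch: fit a shallow decision tree on weather features (assumed numeric
# columns) and rank their importance for explaining product sales.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

def weather_feature_importance(df: pd.DataFrame) -> pd.Series:
    features = ["temperature", "precipitation", "humidity", "day_of_week"]  # assumed columns
    tree = DecisionTreeRegressor(max_depth=4, random_state=0)
    tree.fit(df[features], df["units_sold"])  # 'units_sold' is an assumed target column
    return pd.Series(tree.feature_importances_, index=features).sort_values(ascending=False)
```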
Data Analyst May 2006 – January 2010
Sears Holdings Company
Chico, CA
Marketing Data Analyst for the Regional Sales Division. Supported many aspects of the business with internal analytics and provided lean, actionable, data-driven insights for clients on indirect procurement consulting engagements. Performed various statistical analyses and linear optimization techniques to create the greatest possible profit impact for clients.
Maintained and contributed to many internal R packages used for building and diagnosing models, and automated reporting.
Used R to perform ad-hoc analyses and deeper drill downs into spend categories of particular interest to clients on a project-to-project basis.
Performed large data cleaning and preparation tasks using R and SQL to gather information from disparate and incompatible data sources from across a client’s entire enterprise to provide a complete view of all indirect spends.
Helped maintain a large database of commodity and vendor information using SQL.
Maintained various visualization tools and dashboards used to provide data-driven insights.
Operations Management Tutor August 2013 – May 2014
Chico State University
Chico, CA
Tutored quantitative methods for process optimization.
Education and Training
Master of Science: Applied Statistics
Michigan Technological University - Houghton, MI
Advanced Statistical Methods:
Developed understanding of statistics, predictive modeling, probability, and time-series data.
Enhanced experience with graphical methods, probability models, parameter estimation, and hypothesis testing.
Programming and Technology:
Combined tested techniques with emerging technologies.
Improved familiarity with industry-standard software and tools such as R, Python, and SAS.
Gained experience with real-world datasets to overcome common challenges.
Communication and Leadership:
Built the skillset necessary to draw accurate conclusions, present outcomes with confidence, and drive organizational decision-making.
Education
Bachelor of Science: Applied Mathematics
California State University Chico - Chico, CA