Zane Smith - Data Science

Location:

Indianapolis, IN

Salary:

Posted:

August 12, 2020

Resume:

Zane

Rodnick-Smith

Data Scientist

Ph: ***-***-*99

E: **********@*****.***

Phone Number: 720-***-****

Email: ****.*******@*****.***

Professional Summary

Data science and statistics professional, creative thinker, and problem solver. Able to distill high-performing solutions from data to drive business strategy. Versatile, results-driven, and meticulous professional in data science and programming. Experience in machine learning and data mining with large structured and unstructured datasets, performing data acquisition, data validation, predictive modeling, and data visualization. Experience in text mining: converting words and phrases in unstructured data into numerical values.

10 years of experience in Data Science and Statistics

10 years of experience in Information Technology

Technical Summary

Used statistical packages in Python and R together with SQL to build complex statistical models for predictive analysis, principal component analysis, and cluster analysis. Experience in designing informative visualizations using Tableau, and in publishing and presenting dashboards and storylines on web and desktop platforms.

Familiarity with developing, deploying, and maintaining production NLP models with scalability in mind.

Hands-on experience in implementing linear discriminant analysis (LDA), linear and logistic regression models, Naïve Bayes, support vector machine classifiers, K-nearest neighbors, Random Forests, Decision Trees, and neural networks, while applying know-how of Principal Component Analysis to strengthen recommender systems.

Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.

Adept in statistical programming languages like R and Python, as well as Big Data technologies like Spark, Hadoop 2.0, Hive, and HDFS; experienced in Spark 2.1, Spark SQL, and PySpark.

Experienced with visualization tools like Tableau, Matplotlib, and ggplot2.

Skilled in using dplyr and ggplot2 in R, and Pandas, NumPy, Matplotlib, and Seaborn in Python, for exploratory data analysis.

Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables.

Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.

Expert knowledge in statistics, mathematics, machine learning, recommendation algorithms and analytics with excellent understanding of business operations and analytics tools for effective analysis of data.

The ability to balance the “art and science” by solving analytical problems using quantitative and qualitative approaches that will be critical to driving high-end business value.

Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation.

Drives the analytics roadmap proactively by identifying opportunities in the data based on the business priorities working with all divisions.

Responsible for solving problems in the domains of e-commerce, shipping, Internet of Things, and spatial analytics with batch, real-time, and predictive models.

Analyze large data sets comprising e-commerce data (clickstream, order data, tracking data, competitive price changes, currency fluctuations) to optimize business goals.

Stays current with research in data science, machine learning, operations research and Natural Language Processing to ensure we are leveraging best-in-class techniques, algorithms, and technologies.

Works closely with Senior Leadership to champion informatics-based innovation efforts and to develop and execute a prioritized roadmap of analytic studies that targets advanced analytics initiatives.

Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction

Proactively researches and develops moderately complex proofs of concept that have the potential to serve as conceptual designs that analysts and data science practitioners can use in their respective initiatives.

Researches and implements methodologies to measure the impact of the technologies.

Provides business expertise and supports the development of models and analysis to provide the organization with insights.

Technical Skills

Data Science Specialties: Natural Language Processing, Machine Learning, Internet of Things (IoT) analytics, Social Analytics, Predictive Maintenance

Programming Languages, Frameworks, Solutions: Java, Python, R, R-Shiny, JavaScript, SQL, MATLAB, SPSS, MiniTab, Hive, Spark, Scala

Version Control: GitHub, Git, SVN

IDE: Jupyter, Spyder, IntelliJ, Eclipse

Data Frameworks: R, Python, HiveQL, Spark, Spark SQL, Storm, Scala, Impala, MapReduce, Kinesis, EMR

Analytic Tools: Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, RNN, Regression, Naïve Bayes

Visualization: Tableau, R, R Shiny, ggplot2, Power BI, seaborn, matplotlib

Modeling and Methods: Bayesian Analysis, Inference, Models, Regression Analysis, Linear models, Multivariate analysis, Stochastic Gradient Descent, Sampling methods, Forecasting, Segmentation, Clustering, Sentiment Analysis, Predictive Analytics

Databases: Azure, Google, Amazon Redshift; HDFS, RDBMS, data lakes, and various SQL and NoSQL databases and data warehouses

Deep Learning: Machine perception, Data Mining, Machine Learning algorithms, Neural Networks, TensorFlow, Keras, PyTorch

Soft Skills: Able to deliver presentations and highly technical reports; collaboration with stakeholders and cross-functional teams; advisement on how to leverage analytical insights. Development of clear analytical reports which directly address strategic goals.

Professional Experience

Senior Data Scientist

DaVita

Denver, CO June 2019 - Present

Led a data science product unit in using structured and unstructured dialysis patient biometric data with machine learning to predict whether a patient requires rehospitalization and to allow intervention to prevent future hospital visits.

Identified important and interesting questions about large datasets, then translated those questions into concrete analytical tasks.

Researched and tested survival models for the data, including state-of-the-art neural networks for survival analysis using the Python deep learning packages Theano, TensorFlow, and Keras.

Provided evidence that survival analysis was the wrong machine learning approach for the project and convinced the principal project lead to change to a classification approach.

Implemented the XGBoost classifier in Python on structured patient biometric data.

Delivered feature engineering on structured patient biometric data to improve results. Approaches included:

o One-hot encoding categorical data

o Converting data labeled “MISSING” by original source providers into numpy NaN format to be usable by the algorithm

o Testing small subsamples of features to determine feature importance

Tested and implemented multiple ways to handle missing values in the data, including replacing them with a measure of central tendency (mean, median), removing values, using tree-based algorithms that can use missing values as decision nodes, and imputing the missing values using the R package MICE (Multivariate Imputation by Chained Equations).

Introduced new features into the dataset in collaboration with the data engineer and principal project lead, most significantly previous hospital admission count, which led to a significant lift in accuracy.
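The feature engineering steps listed above (one-hot encoding, mapping the "MISSING" label to NaN, and central-tendency imputation) can be sketched in plain Python. This is only an illustration under stated assumptions: the column names and values are hypothetical, and the standard library stands in for the pandas and NumPy tooling a real pipeline would more likely use.

```python
import math
from statistics import median

MISSING_LABEL = "MISSING"  # label named in the resume; everything else here is hypothetical


def to_float_or_nan(value):
    """Convert a raw cell to float, mapping the 'MISSING' label to NaN."""
    if value == MISSING_LABEL:
        return float("nan")
    return float(value)


def impute_median(values):
    """Replace NaN entries with the median of the observed entries."""
    observed = [v for v in values if not math.isnan(v)]
    fill = median(observed)
    return [fill if math.isnan(v) else v for v in values]


def one_hot(values):
    """One-hot encode a categorical column; returns (sorted categories, rows)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical toy records, not real patient data.
raw_albumin = ["3.9", "MISSING", "4.1", "3.5"]
access_type = ["fistula", "catheter", "fistula", "graft"]

albumin = [to_float_or_nan(v) for v in raw_albumin]
albumin_filled = impute_median(albumin)
categories, encoded = one_hot(access_type)
```

Leaving the NaN values in place instead of imputing them is the alternative for tree-based learners that handle missing values natively.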

Worked in an Anaconda environment, coding in Python and R.

Implemented grid search from the scikit-learn package in Python to efficiently test multiple hyperparameters for the machine learning algorithm.

Implementations done in collaboration with the data engineer led to a gain of over 30% in accuracy over previously tested machine learning models.

Produced rank-order feature importance tables to provide subject matter experts with a list of important drivers of dialysis hospitalization.

Used values from the SHAP library in Python to give subject matter experts individualized drivers on a patient level to help plan treatment and interventions.

Project received significant attention from C-level executives and, as a result of the changes implemented, was approved for pilot testing.

Collaborated with the data engineer to introduce Python code into the data pipeline to produce machine learning predictions quickly and efficiently.

Collaborated with the data engineer to encode unstructured doctors' notes into features identified by subject matter experts using Doc2Vec and cosine similarity values for machine learning in Python.
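As a rough illustration of the cosine similarity step mentioned above, the sketch below scores one note embedding against a few concept embeddings and keeps one similarity value per concept as a feature. The vectors and concept names are hypothetical placeholders; in practice the embeddings would come from a trained Doc2Vec model, which is not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real ones would come from a trained Doc2Vec model.
note_vector = [0.2, 0.7, 0.1]
concept_vectors = {
    "fluid overload": [0.1, 0.8, 0.1],
    "missed treatment": [0.9, 0.1, 0.0],
}

# One similarity feature per concept, ready to join onto the structured data.
features = {
    name: cosine_similarity(note_vector, vec)
    for name, vec in concept_vectors.items()
}
```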

Experimented with ensemble methods of machine learning analysis to improve prediction results, including stacking Random Forest, Stochastic Gradient Descent Classifier, Support Vector Machines, Naïve Bayes, and K-Nearest Neighbors.

Made use of Anaconda environments for dependency control in Python.

Became familiar with HIPAA regulations to protect the privacy of subjects in the dataset and anonymize data points.

Documented changes and results of experiments through use of Jupyter Notebooks in Python to track versions

In collaboration with data engineer and subject matter experts, discovered errors in dataset and identified source for correction.

Created visualizations to help explain the prediction results using a ROC curve in the matplotlib library in Python
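For readers unfamiliar with how a ROC curve is built, the following minimal sketch computes the curve's points and the area under it by sweeping a threshold over model scores. The labels and scores are hypothetical, and the plotting call itself is omitted; the resulting `fpr` and `tpr` lists are what would be passed to a matplotlib line plot.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Sort by descending score so each step lowers the threshold.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so ties move together.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Hypothetical labels (1 = rehospitalized) and model scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
area = auc(fpr, tpr)
```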

Determined cross correlations among the feature data by producing a heatmap in the seaborn library in Python

Developed a dashboard in Tableau to provide valuable insights to stakeholders.

Created visualizations to help interpret model predictions and explain feature importance.

Data Scientist

Equinor

Austin, TX July 2018 – June 2019

Used machine learning and statistical techniques to analyze invoices and transactions for a large oil company.

Used Python and Excel to create flat files from invoice data.

Developed a Python script to automate comparisons between internal company data and subcontractor invoices.

Used machine learning to detect error rates and flag invoices in need of correction.

Successfully lowered the rate of error from the subcontracting company.

Along with a software engineer, successfully standardized the subcontractor reporting system.

Along with a software engineer, improved the efficiency of cataloging itemized lists of charges on subcontractor invoices using SQL tables.

Worked on creating filters and calculated sets for preparing dashboards and worksheets in Tableau.

Identified areas of inefficiency and waste that could be improved upon using Excel graphs and Tableau dashboards

Delivered various complex scorecards, dashboards, and reports. Collaborated on database design, data ingestion schemas. Developed interfaces with RESTful services.

Utilized TensorFlow and Keras in Python to create an artificial neural network for a production model.

Data Scientist

FDM Group

New York City, NY Feb 2018 – June 2018

Worked on Wall Street to analyze financial and logistics data for a consulting firm.

Used Excel to create analytics spreadsheets for outside firms.

Applied Bayesian statistics to financial data to model outcomes of investments using the R programming language.

Used time-series analysis and ARIMA modeling in R to predict bond trade fluctuations.

Created dashboards of financial data using Tableau and Power BI to present to executive-level stakeholders.

Along with a business intelligence analyst, drafted and created a proposal to increase the efficiency of the company's recruitment and training program.

Delivered presentations to C-suite executives and other nontechnical audiences.

Wrote SQL queries to pull financial transaction data from an on-premise Oracle database.

Used R and SQL to clean and transform normalized financial data into flat files for analysis.

Senior Data Scientist

Apple

Austin, TX Jan 2017 – Jan 2018

Worked as a data scientist to analyze sentiment in preparation for the iPhone X launch and the critical response to the product release.

Gathered data from various social media sources to perform sentiment analysis.

Evaluated performance of bag-of-words and TF-IDF tokenization.

Performed stemming and lemmatization as well as stop word removal.

Implemented sentiment analysis on a large dataset of customer product reviews.

Created a convolutional neural network model using TensorFlow and Keras in Python.

Grouped reviews by sentiment score to perform topic modeling and provide insight into data trends.
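To make the bag-of-words versus TF-IDF comparison above concrete, here is a small sketch of both representations in plain Python. The reviews are hypothetical, and the weighting shown (term frequency as count over document length, inverse document frequency as log of N over document frequency) is one common variant; libraries typically add smoothing and normalization.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Raw term counts for one tokenized document."""
    return Counter(tokens)


def tf_idf(corpus):
    """TF-IDF weights per document, with tf = count / doc length and
    idf = log(N / document frequency). Other variants add smoothing."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized reviews (stop words removed).
corpus = [
    ["phone", "screen", "great"],
    ["phone", "battery", "poor"],
    ["phone", "camera", "great"],
]
counts = bag_of_words(corpus[0])
weights = tf_idf(corpus)
```

In this toy corpus "phone" appears in every review, so its TF-IDF weight is zero even though its raw count matches that of the other terms, which is the practical difference between the two representations.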

Created an LDA model in Python with gensim to extract topics from a large corpus of documents.

Created data presentations designed to reduce bias and tell the true story behind the data, pulling millions of rows of data using SQL and performing exploratory data analysis.

Applied breadth of knowledge in programming (Python, R) for descriptive and inferential statistics.

Utilized a diverse array of technologies and tools as needed, to deliver insights such as R, SAS, Matlab, Tableau and more.

Involved in extensive ad-hoc reporting, routine operational reporting, and data manipulation to produce routine metrics and dashboards for management.

Created parameters, action filters, and calculated sets for preparing dashboards and worksheets in Tableau.

Worked with other data scientists and architects on custom data visualization solutions using tools like Tableau and Python packages.

Involved in running Spark jobs for processing millions of records.

Built and published customized interactive reports, report schedules, and dashboards using Tableau Server.

Lead Data Scientist

NYOS

Austin, TX Aug 2015 – Dec 2016

Involved in evaluating and prescribing methods for company processes and procedures.

Created content for mentoring individuals in on-level statistics.

Utilized data-driven methodologies for analyzing junior statistician performance, which resulted in more effectively assessing junior statistician needs and supporting struggling employees.

Compiled performance data in csv files using R and created reports for review by administration with ggplot2

Developed statistical models using Bayesian probabilities to predict likelihood of churn.

Examined conditional and marginal probabilities to create a recommender system using collaborative filtering and similarity scores.

Performed Z-tests and T-tests to perform optimization on price points for various products sold by the company
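As an illustration of how a Z-test can compare two price points, the sketch below runs a two-sided, two-proportion z-test on purchase rates. The counts are hypothetical and the choice of a pooled two-proportion test is an assumption made for the example; the tests actually used on the pricing data are not specified here.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical purchase counts at two price points.
z, p_value = two_proportion_z_test(successes_a=120, n_a=1000,
                                   successes_b=90, n_b=1000)
```

A small p-value would suggest the purchase rates at the two price points differ by more than chance alone would explain.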

Investigated the usability of machine learning in R&D for new products and for finding appropriate price points based on similar features to existing products.

Lead Data Scientist

VeraBank Harker Heights

Harker Heights, TX Aug 2014 – Aug 2015

Mentored and led a team in methods of fraud detection.

Designed a new training methodology in statistics that met financial standards and requirements.

Researched available data sources and examined the common thread of class imbalance in financial fraud detection

Instructed employees in statistical methods and data visualization techniques.

Improved model performance by 3 percentage points by utilizing a Gaussian Mixture model.

Demonstrated to administrative executives the statistical significance of the improvement in model performance and the increased recall in fraud detection.

Involved in leading a large team of statisticians to create mathematical and statistical models to evaluate trends and provide insight into data.

Led a project among employees in predicting the outcome of loans using Bayesian statistics and clustering on customer data in Python with scikit-learn.

Created a dashboard using R to report to stakeholders the estimated monthly return on investments as well as the weekly number of fraudulent purchase requests correctly identified.

Data Scientist

Independent Contractor

Austin, TX Jul 2012 – Jul 2014

Worked on several small projects in data science and statistics as a freelance data scientist.

Examined the relationship between SAT/ACT scores and college admissions.

Performed deep mathematical analysis of large datasets, using R and ggplot to produce visualizations that revealed the relationships and trends within the data.

Investigated the correlations between temperature and energy demand.

Created a logistic regression model to demonstrate likelihood of acceptance into various industries.

Performed NLP, topic modeling, and clustering analysis on job titles and descriptions to identify multiple employment opportunities in the same field with different names.

Utilized decision trees in Python to explain feature importance and observe the effect of weather data on product sales.

Data Science Research Associate

University of Texas

Dallas, TX Aug 2010 – Jul 2012

Participated in data-driven research project regarding sub-clinical autistic traits in the general population

Gathered data observing the presence of specified traits.

Organized data using Excel to create CSV files for data processing.

Performed exploratory data analysis in R.

Plotted correlations among various features using the ggplot library.

Demonstrated feature importance to show which features were the best predictors of the traits.

Evaluated performance of tree-based models, SVMs, and logistic regression to predict presence of traits

Implemented a logistic regression model to calculate the probability of presence of traits.

Model was used by the professor to evaluate factors that influence the expression of the traits.

Worked in an R environment using packages like tidyverse and ggplot to explore and visualize the data.

Education

University of Texas

Master of Science in Cognition and Neuroscience

Dallas, Texas

Reed College

Bachelor of Arts in Psychology

Portland, Oregon
