Zane Rodnick-Smith
Data Scientist
Phone Number: 720-***-****
Email: adfaaz@r.postjobfree.com
Professional Summary
Data science and statistics professional, creative thinker, and problem solver. Able to distill high-performance solutions from data to drive business strategy. Versatile, results-driven, and meticulous professional in data science and programming. Experienced in Machine Learning and Data Mining with large structured and unstructured datasets, performing Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization. Experienced in text mining, converting words and phrases in unstructured data into numerical values.
10 years of experience in Data Science and Statistics
10 years of experience in Information Technology
Technical Summary
Used statistical packages in Python and R, together with SQL, to build complex statistical models for predictive analysis, principal component analysis, and cluster analysis. Experienced in designing informative visualizations in Tableau and in publishing and presenting dashboards and stories on web and desktop platforms.
Familiarity with developing, deploying, and maintaining production NLP models with scalability in mind.
Hands-on experience implementing linear discriminant analysis (LDA), linear and logistic regression models, Naïve Bayes, support vector machine classifiers, K-nearest neighbors, Random Forests, Decision Trees, and neural networks, and applying Principal Component Analysis to strengthen recommender systems.
Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
Adept in statistical programming languages such as R and Python, as well as Big Data technologies such as Spark, Hadoop 2.0, Hive, and HDFS; experienced in Spark 2.1, Spark SQL, and PySpark.
Experienced with visualization tools such as Tableau, Matplotlib, and ggplot2.
Skilled in using dplyr and ggplot2 in R, and pandas, NumPy, Matplotlib, and Seaborn in Python, for exploratory data analysis.
Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables.
Highly skilled in using Hadoop (Pig and Hive) for basic analysis, extraction of data from the infrastructure, and data summarization.
Expert knowledge in statistics, mathematics, machine learning, recommendation algorithms and analytics with excellent understanding of business operations and analytics tools for effective analysis of data.
Able to balance the “art and science” of analytics by solving problems with both quantitative and qualitative approaches, which is critical to driving high-end business value.
Establishes scalable, efficient, automated processes for large-scale data analysis, model development, model validation, and model implementation.
Proactively drives the analytics roadmap by working with all divisions to identify opportunities in the data based on business priorities.
Responsible for solving problems in the domains of e-commerce, shipping, Internet of Things, and spatial analytics with batch, real-time, and predictive models.
Analyzes large datasets comprising e-commerce data (clickstream, order data, tracking data, competitive price changes, currency fluctuations) to optimize business goals.
Stays current with research in data science, machine learning, operations research and Natural Language Processing to ensure we are leveraging best-in-class techniques, algorithms, and technologies.
Works closely with Senior Leadership to champion informatics-based innovation efforts and to develop and execute a prioritized roadmap of analytic studies that targets advanced analytics initiatives.
Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction
Proactively researches and develops moderately complex proofs of concept that have the potential to serve as conceptual designs that analysts and data science practitioners can use in their respective initiatives.
Researches and implements methodologies to measure the impact of the technologies.
Provides business expertise and supports the development of models and analysis to provide the organization with insights.
Technical Skills
Data Science Specialties: Natural Language Processing, Machine Learning, Internet of Things (IoT) analytics, Social Analytics, Predictive Maintenance
Programming Languages, Frameworks, Solutions: Java, Python, R, R-Shiny, JavaScript, SQL, MATLAB, SPSS, Minitab, Hive, Spark, Scala
Version Control: GitHub, Git, SVN
IDE: Jupyter, Spyder, IntelliJ, Eclipse
Data Frameworks: R, Python, HiveQL, Spark, Spark SQL, Storm, Scala, Impala, MapReduce, Kinesis, EMR
Analytic Tools: Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, RNN, Regression, Naïve Bayes
Visualization: Tableau, R, R Shiny, ggplot2, Power BI, Seaborn, Matplotlib
Modeling and Methods: Bayesian Analysis, Inference, Models, Regression Analysis, Linear Models, Multivariate Analysis, Stochastic Gradient Descent, Sampling Methods, Forecasting, Segmentation, Clustering, Sentiment Analysis, Predictive Analytics
Databases: Azure, Google, Amazon Redshift, HDFS, RDBMS, SQL and NoSQL databases, data warehouses, data lakes
Deep Learning: Machine Perception, Data Mining, Machine Learning Algorithms, Neural Networks, TensorFlow, Keras, PyTorch
Soft Skills: Able to deliver presentations and highly technical reports; collaborate with stakeholders and cross-functional teams; advise on how to leverage analytical insights; develop clear analytical reports that directly address strategic goals.
Professional Experience
Senior Data Scientist
DaVita
Denver, CO June 2019 - Present
Lead a data science product unit using structured and unstructured dialysis patient biometric data with machine learning to predict whether a patient will require rehospitalization, enabling intervention to prevent future hospital visits.
Identified important and interesting questions about large datasets, then translated those questions into concrete analytical tasks.
Researched and tested survival models for the data, including state-of-the-art neural networks for survival analysis using the Python deep learning packages Theano, TensorFlow, and Keras.
Provided evidence that survival analysis was the incorrect machine learning approach for the project and convinced the principal project lead to change to a classification approach.
Implemented the XGBoost classifier in Python on structured patient biometric data.
Delivered feature engineering on structured patient biometric data to improve results. Approaches included:
o One-hot encoding categorical data
o Converting data labeled “MISSING” by original source providers into numpy NaN format to be usable by the algorithm
o Testing small subsets of features to determine feature importance
Tested and implemented multiple ways to handle missing values in the data, including replacing them with a measure of central tendency (mean, median), removing them, using tree-based algorithms that can handle missing values directly at split nodes, and imputing them with the R package mice (Multivariate Imputation by Chained Equations).
Introduced new features into the dataset in collaboration with the data engineer and the principal project lead, most significantly the previous hospital admission count, which led to a significant lift in accuracy.
Worked in an Anaconda environment, coding in Python and R.
Implemented grid search from the scikit-learn package in Python to efficiently test multiple hyperparameters for the machine learning algorithm.
Implementations done in collaboration with the data engineer led to a gain of over 30% in accuracy over previously tested machine learning models.
Produced rank-ordered feature importance tables to provide subject matter experts with a list of important drivers of dialysis hospitalization.
Used values from the SHAP library in Python to give subject matter experts individualized, patient-level drivers to help plan treatment and interventions.
The project received significant attention from C-level executives, and as a result of the changes implemented, it was approved for pilot testing.
Collaborated with the data engineer to introduce Python code into the data pipeline to produce machine learning predictions quickly and efficiently.
Collaborated with the data engineer to encode unstructured doctor’s notes into features identified by subject matter experts, using Doc2Vec and cosine similarity values, for machine learning in Python.
Experimented with ensemble machine learning methods to improve prediction results, including stacking Random Forest, Stochastic Gradient Descent Classifier, Support Vector Machine, Naïve Bayes, and K-Nearest Neighbors models.
Used Anaconda environments for dependency control in Python.
Became familiar with HIPAA regulations to protect the privacy of subjects in the dataset and to anonymize data points.
Documented changes and experiment results in Jupyter Notebooks to track versions.
In collaboration with data engineer and subject matter experts, discovered errors in dataset and identified source for correction.
Created visualizations, including ROC curves plotted with Matplotlib in Python, to help explain prediction results.
Identified cross-correlations among features by producing a correlation heatmap with Seaborn in Python.
Developed a dashboard in Tableau to provide valuable insights to stakeholders.
Created visualizations to help interpret model predictions and explain feature importance.
Data Scientist
Equinor
Austin, TX July 2018 – June 2019
Used machine learning and statistical techniques to analyze invoices and transactions for a large oil company.
Used Python and Excel to create flat files from invoice data.
Developed a Python script to automate comparisons between internal company data and subcontractor invoices.
Used machine learning to estimate error rates and flag invoices in need of correction.
Successfully lowered the error rate from the subcontracting company.
Along with a software engineer, standardized the subcontractor reporting system.
Along with a software engineer, improved the efficiency of cataloging itemized lists of charges on subcontractor invoices using SQL tables.
Worked on creating filters and calculated sets for preparing dashboards and worksheets in Tableau.
Identified areas of inefficiency and waste that could be improved upon using Excel graphs and Tableau dashboards
Delivered various complex scorecards, dashboards, and reports. Collaborated on database design and data ingestion schemas. Developed interfaces with RESTful services.
Used TensorFlow and Keras in Python to create an artificial neural network for a productionized model.
Data Scientist
FDM Group
New York City, NY Feb 2018 – June 2018
Worked on Wall Street to analyze financial and logistics data for a consulting firm.
Used Excel to create analytics spreadsheets for outside firms.
Applied Bayesian statistics to financial data to model investment outcomes using the R programming language.
Used time-series analysis and ARIMA modeling in R to predict bond trade fluctuations.
Created dashboards of financial data using Tableau and Power BI to present to executive-level stakeholders.
Along with a business intelligence analyst, drafted a proposal to increase the efficiency of the company's recruitment and training program.
Delivered presentations to C-suite executives and other nontechnical audiences.
Wrote SQL queries to pull financial transaction data from an on-premises Oracle database.
Used R and SQL to clean and transform normalized financial data into flat files for analysis.
Senior Data Scientist
Apple
Austin, TX Jan 2017 – Jan 2018
Worked as a data scientist to analyze sentiment in preparation for the iPhone X launch and the critical response to the product release.
Gathered data from various social media sources to perform sentiment analysis.
Evaluated the performance of bag-of-words and TF-IDF vectorization.
Performed stemming and lemmatization as well as stop-word removal.
Implemented sentiment analysis on a large dataset of customer product reviews.
Created a convolutional neural network model using TensorFlow and Keras in Python.
Grouped reviews by sentiment score to perform topic modeling and provide insight into data trends.
Created an LDA model in Python with gensim to extract topics from a large corpus of documents.
Created data presentations that reduced bias and told an accurate story of the people behind the data, pulling millions of rows of data with SQL and performing exploratory data analysis.
Applied a breadth of programming knowledge (Python, R) to descriptive and inferential statistics.
Used a diverse array of tools as needed, such as R, SAS, MATLAB, and Tableau, to deliver insights.
Involved in extensive ad-hoc reporting, routine operational reporting, and data manipulation to produce routine metrics and dashboards for management.
Created parameters, action filters, and calculated sets for preparing dashboards and worksheets in Tableau.
Worked with other data scientists and architects to build custom data visualization solutions using tools such as Tableau and Python packages.
Ran Spark jobs to process millions of records.
Built and published customized interactive reports and dashboards, and scheduled reports, using Tableau Server.
Lead Data Scientist
NYOS
Austin, TX Aug 2015 – Dec 2016
Involved in evaluating and prescribing methods for company processes and procedures.
Created content for mentoring individuals in on-level statistics.
Used data-driven methodologies to analyze junior statistician performance, which led to more effective assessment of junior statisticians' needs and better support for struggling employees.
Compiled performance data into CSV files using R and created reports with ggplot2 for review by administration.
Developed statistical models using Bayesian probabilities to predict the likelihood of churn.
Examined conditional and marginal probabilities to create a recommender system using collaborative filtering and similarity scores.
Performed z-tests and t-tests to optimize price points for various products sold by the company.
Investigated the use of machine learning in R&D for new products and for finding appropriate price points based on features shared with existing products.
Lead Data Scientist
VeraBank Harker Heights
Harker Heights, TX Aug 2014 – Aug 2015
Mentored and led a team in methods of fraud detection.
Designed a new training methodology in statistics that met financial standards and requirements.
Researched available data sources and examined the common thread of class imbalance in financial fraud detection
Instructed employees in statistical methods and data visualization techniques.
Improved model performance by 3 percentage points by using a Gaussian Mixture Model.
Demonstrated to administrative executives the statistical significance of the improvement in model performance and the increased recall in fraud detection.
Led a large team of statisticians in creating mathematical and statistical models to evaluate trends and provide insight into data.
Led a project among employees to predict loan outcomes using Bayesian statistics and clustering on customer data in Python with scikit-learn.
Created a dashboard in R to report to stakeholders the estimated monthly return on investments as well as the weekly number of fraudulent purchase requests correctly identified.
Data Scientist
Independent Contractor
Austin, TX Jul 2012 – Jul 2014
Worked on several small data science and statistics projects as a freelance data scientist.
Examined the relationship between SAT/ACT scores and college admissions.
Performed in-depth mathematical analysis of large datasets, using R and ggplot2 to produce visualizations that revealed relationships and trends within the data.
Investigated the correlations between temperature and energy demand.
Created a logistic regression model to demonstrate the likelihood of acceptance into various industries.
Performed NLP, topic modeling, and clustering analysis on job titles and descriptions to identify employment opportunities in the same field that go by different names.
Used decision trees in Python to explain feature importance and observe the effect of weather data on product sales.
Data Science Research Associate
University of Texas
Dallas, TX Aug 2010 – Jul 2012
Participated in data-driven research project regarding sub-clinical autistic traits in the general population
Gathered data on the presence of specified traits.
Organized data in Excel to create CSV files for data processing.
Performed exploratory data analysis in R.
Plotted correlations among various features using the ggplot2 library.
Demonstrated feature importance to show which features were the best predictors of the traits.
Evaluated performance of tree-based models, SVMs, and logistic regression to predict presence of traits
Implemented a logistic regression model to calculate the probability that the traits were present.
The model was used by the professor to evaluate factors that influence the expression of the traits.
Worked in an R environment using packages such as tidyverse and ggplot2 to explore and visualize the data.
Education
University of Texas
Master of Science in Cognition and Neuroscience
Dallas, Texas
Reed College
Bachelor of Arts in Psychology
Portland, Oregon