Senior Data Science Consultant

Location:
San Antonio, TX
Posted:
March 22, 2023

Resume:

Professional Summary

Data scientist with ** years’ experience processing and analyzing data across a variety of industries. Leverages various mathematical, statistical, and machine learning tools to collaboratively synthesize business insights and drive innovative solutions for productivity, efficiency, and revenue.

●Experienced in the application of Bayesian Techniques, Advanced Analytics, Neural Networks and Deep Neural Networks, Support Vector Machines (SVMs), and Decision Trees with Random Forest ensemble.

●Proven creative thinker with a strong ability to devise and propose novel ways to look at and approach problems using a combination of business acumen and mathematical methods.

●Experience applying statistical models to big data sets using cloud-based cluster computing on AWS, Azure, and other Unix-based architectures.

●Identifies patterns in data and uses experimental and iterative approaches to validate findings.

●Working alongside Data Engineers to productionize algorithms and solutions.

●Applies advanced predictive modeling techniques to build, maintain, and improve real-time decision systems.

●In-depth knowledge of statistical procedures that are applied in both Supervised and Unsupervised Machine Learning problems.

●Uses machine learning techniques to promote marketing and merchandising ideas.

●Contributed to advanced analytical teams to design, build, validate, and re-train models.

●Excellent communication skills (verbal and written) to communicate with clients, stakeholders, and team members.

●Ability to quickly gain an understanding of niche subject matter domains, and design and implement effective novel solutions to be used by other subject matter experts.

●Experience implementing industry-standard analytics within specific domains and expanding these methods with data science techniques such as Natural Language Processing and clustering algorithms to derive insight.

Technical Skills

●Analytic Development/Platforms: Python, R, SQL, Excel.

●Python Packages: NumPy, Pandas, scikit-learn, TensorFlow, SciPy, Matplotlib, Seaborn, Azure Notebooks.

●IDE: Jupyter, Spyder, RStudio, Google Colab, MySQL.

●Version Control: GitHub.

●Machine Learning: Natural Language Processing and Understanding, Machine Learning algorithms including text recognition, image classification, and forecasting; XGBoost.

●Data Query: Azure, Google Cloud, SQL, and various SQL databases, data warehouses, and data lakes.

●Deep Learning: Machine Perception, Data Mining, Machine Learning algorithms, Neural Networks, TensorFlow, Keras, PyTorch, Long Short-Term Memory (LSTM).

●Artificial Intelligence: Text Understanding, Classification, Pattern Recognition, Recommendation Systems, Targeting Systems, Ranking Systems, and Time Series.

●Analysis Methods: Advanced Data Modeling, Statistical and Exploratory Analysis, Bayesian Analysis, Inference, Regression Analysis, Multivariate Analysis, Sampling Methods, Forecasting, Segmentation, Clustering, Sentiment Analysis, Predictive Analytics, Decision Analytics, Design and Analysis of Experiments, Factorial Design and Response Surface Methodologies, Optimization, State-Space Analysis, and KNN Regression.

●Analysis Techniques and Tools: Classification and Regression Trees (CART), Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, Recurrent Neural Networks (RNN) including LSTM, Linear and Logistic Regression, Naïve Bayes, Simplex, Markov Models, Jackson Networks, Docker, Kalman Filtering, and Gaussian Mixture Models (GMMs).

●Data Modeling: Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series Analysis.

●Applied Data Science: Natural Language Processing, Machine Learning, Text Recognition, Image Classification, Social Analytics, Predictive Maintenance.

●Cluster Management: Kubernetes, Databricks.

●Open-Source Data Platforms and Algorithms: XGBoost, MLFlow, CatBoost, LightGBM.

●Cloud: Amazon Web Services (AWS), Google Cloud Platform (GCP), Azure.

Professional Experience

Senior Data Science Consultant

DaVita

Feb 2022 to Present

Boulder, CO (worked remotely from Atlanta)

I worked with a team to develop a model end to end, from requirements gathering through deployment into production on GCP using AutoML and Vertex AI. We trained XGBoost models on patient history records covering 2 million patient treatments, tuned to patient clusters, to predict improvement over prescriptions generated both by rules and by actual physicians. The goal of the project was to recommend better prescriptions for patients on peritoneal dialysis. 93% of patients were predicted to have improved outcomes, and 81% were predicted to no longer be at risk.
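
A minimal sketch of the cluster-then-boost approach described above, using scikit-learn and XGBoost. The file name, column names, cluster count, and the binary "improved_outcome" target are illustrative placeholders, not the actual patient-treatment schema, and the production pipeline on Vertex AI/AutoML is not shown.

    # Sketch: cluster patient treatments, then train one XGBoost classifier per cluster.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    df = pd.read_csv("treatments.csv")                  # hypothetical extract
    features = df.drop(columns=["improved_outcome"])    # hypothetical numeric features
    target = df["improved_outcome"]                     # hypothetical binary target

    # Assign each patient treatment to a cluster.
    scaled = StandardScaler().fit_transform(features)
    clusters = KMeans(n_clusters=4, random_state=0).fit_predict(scaled)

    # Train a separately tuned XGBoost model for each patient cluster.
    models = {}
    for c in sorted(set(clusters)):
        mask = clusters == c
        X_tr, X_te, y_tr, y_te = train_test_split(
            features[mask], target[mask], test_size=0.2, random_state=0)
        model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)
        model.fit(X_tr, y_tr)
        models[c] = model
        print(f"cluster {c}: holdout accuracy {model.score(X_te, y_te):.3f}")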

●Served as Team Lead, managing 2 Data Scientists and an intern and working with 1 Data Engineer and 1 Technical Writer.

●Accomplished project planning through GitLab tasks.

●Used different Machine Learning and Deep Learning algorithms.

●Deployed models using Google Kubernetes Engine and Docker.

●Made weekly presentations to business and medical teams.

●Worked in an Agile development environment with daily stand-ups.

●Worked directly with physicians to develop domain knowledge.

●Used GCP services including Vertex AI, AutoML, Colab, Dataproc, BigQuery, and others.

●Dockerized production files.

●Hosted on a Google Cloud Platform compute instance.

●Worked with the UX team to create a physician-facing app.

●Worked with technical writer to provide documentation for doctors who would be end users.

●Performed data cleaning and preparation with statistical and visual analysis.

●Ensured work complied with HIPAA regulatory mandates pertaining to cloud computing and protecting the privacy and security of electronic protected health information (ePHI).

●Utilized Light Gradient Boosting Machine (LightGBM) distributed gradient boosting framework.

●Applied CatBoost algorithm for gradient boosting on decision trees.

●Applied Support Vector Machine (SVM) linear model for classification and regression problems.

●Used Python to develop and train models and worked with a combination of clustering algorithms in scikit-learn to help optimize the recommendation process.

●Worked with Airflow, BigQuery, and SQL.

Data Science Consultant

Anthem Inc.

Jul 2021 to Feb 2022

Atlanta, GA

I served as a Data Science Consultant interacting with 3 Data Science dev teams working on data extraction (mining), UI development/optimization, and pipeline builds. My consulting work involved interacting with 4 Data Scientists, 3 Software Engineers, 1 Project Manager, and the Head of Data Science.

●Extracted text from documents using OCR.

●Applied cosine similarity with BERT embeddings to find relevant sections of text in documents (see the sketch after this list).

●Applied OCR to extract handwritten signatures and dates.

●Generated Regex patterns to collect text from relevant sections.

●Utilized OpenCV to find page numbers and text coordinates.

●Stored data on local Hadoop cluster.

●Led weekly presentations to business stakeholders to refine output.

●Used Jira for sprint planning and cards.

●Used Bitbucket and Git for code management.

●Built deep learning neural network models from scratch using GPU-accelerated libraries like PyTorch.

●Employed PyTorch, Scikit-Learn and XGBoost libraries to build and evaluate the performance of different models.

●Utilized Amazon Textract machine learning (ML) service to automatically extract text, handwriting, and data from scanned documents.

●Used Pandas for data manipulation.

●Troubleshot machine learning model code in Python (TensorFlow) using PyTest to keep the pipeline moving.

●Applied fuzzy search algorithms to help locate records relevant to searches.

●Used MS Teams for communication.
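
A minimal sketch of the cosine-similarity retrieval step referenced above: embed a query and candidate document sections with a BERT-family model and rank the sections by similarity. The sentence-transformers library, the model name, and the example text are assumptions made for illustration; the actual documents, OCR output, and pipeline are not shown.

    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    model = SentenceTransformer("all-MiniLM-L6-v2")     # assumed embedding model

    sections = [                                        # invented example sections
        "The undersigned agrees to the terms of coverage described herein.",
        "Signature of the policy holder and the date signed.",
        "Appendix B: definitions of terms used in this agreement.",
    ]
    query = "handwritten signature and date"

    section_vecs = model.encode(sections)
    query_vec = model.encode([query])

    # Rank sections by cosine similarity to the query.
    scores = cosine_similarity(query_vec, section_vecs)[0]
    best = max(range(len(sections)), key=lambda i: scores[i])
    print(f"most relevant section: {best} (score {scores[best]:.2f})")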

Lead Data Scientist

Neal Analytics

Jan 2021 to Jul 2021

Seattle, WA

The project at Neal was for Niagara Water, to help them predict their shipping costs when they needed to make ad hoc shipments to supplement their existing shipping contracts. The challenge was to improve model accuracy. The solution involved clustering shipping lanes and then using the clusters to fit a regression tree. A Kalman filter, a prediction-correction model for linear time-variant or time-invariant systems, was applied.

The team consisted of myself in the Lead Data Scientist role, 2 Data Scientists who reported to me, 1 Project Manager, 1 Data Engineer, and 1 ML Operations Specialist. The team applied an Agile methodology with daily standups and bi-weekly presentations. My primary responsibility was model development and deployment. The project represents approximately $2M in estimated annual cost savings.
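
A minimal sketch of the lane-clustering plus regression-tree approach described above: cluster shipping lanes with a Gaussian Mixture model, carry the cluster label forward as a feature, and fit an XGBoost regressor to predict lane cost. The file and column names are illustrative placeholders, and the Kalman-filter correction applied at bid time is not shown.

    import pandas as pd
    from sklearn.mixture import GaussianMixture
    from sklearn.model_selection import train_test_split
    from xgboost import XGBRegressor

    lanes = pd.read_csv("shipping_lanes.csv")           # hypothetical extract
    lane_features = lanes[["distance_miles", "origin_zip3", "dest_zip3", "weight_lbs"]]

    # Cluster lanes and use the cluster id as an additional model feature.
    gmm = GaussianMixture(n_components=5, random_state=0).fit(lane_features)
    X = lane_features.assign(lane_cluster=gmm.predict(lane_features))
    y = lanes["shipment_cost"]                          # hypothetical target

    # Fit a boosted regression tree on the clustered lanes.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    reg = XGBRegressor(n_estimators=400, max_depth=5, learning_rate=0.05)
    reg.fit(X_tr, y_tr)
    print(f"holdout R^2: {reg.score(X_te, y_te):.3f}")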

●Clustering shipping lanes using a Gaussian Mixture model.

●Fitting clusters to an XGBoost regression tree to make the final prediction and set target negotiation goals based on the statistical likelihood of achieving each goal given the bid prediction.

●Applying a Kalman filter at bid time to update any mispredictions to give a target negotiation goal to Niagara’s bid negotiators.

●Building the model on an Azure Notebook and Azure Databricks data analytics platforms.

●Programming model functions using Python.

●Working within Databricks, developing largely in Python, Spark, PySpark, MLlib, Pandas, and NumPy.

●Fitting several preliminary Bayesian and machine learning models in Scala and Python (with PySpark for data retrieval in Python) for the purpose of improved understanding of data, and for feature selection.

●Applying and running search and decision algorithms such as XGBoost, MLFlow, CatBoost, LightGBM.

●Conducting regression tests using KNN regression.

●Applying Long Short-Term Memory (LSTM)/Recurrent Neural Network (RNN) architectures to deep learning.

●Creating, deploying, and running container applications using Docker.

●Using Kubernetes for clustering.

ML and Data Scientist

Buoy Health

Jan 2020 to Jan 2021

Boston, MA

Buoy Health claims that its symptom-checker chatbot leverages AI to deliver personalized and more accurate diagnoses. The company’s algorithm was trained on clinical data from thousands of medical papers in an effort to mirror the literature referenced by physicians; the data covers roughly 5 million patients and approximately 1,700 conditions. Starting from the symptoms provided by the user via natural language processing, the chatbot matches the symptoms to all possible conditions and then asks clarifying questions to narrow them down to the best selection.
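
A generic, illustrative sketch of matching free-text symptoms to candidate conditions with TF-IDF vectors and cosine similarity. This is not Buoy Health's actual algorithm; the condition descriptions and symptom text below are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    conditions = {                                       # invented descriptions
        "migraine": "throbbing headache light sensitivity nausea",
        "influenza": "fever chills cough sore throat body aches fatigue",
        "allergic rhinitis": "sneezing runny nose itchy eyes congestion",
    }
    user_input = "I have a fever, a bad cough, and my whole body aches"

    # Vectorize the condition descriptions, then score the user's text against them.
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(list(conditions.values()))
    query_vec = vectorizer.transform([user_input])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()

    for name, score in sorted(zip(conditions, scores), key=lambda t: t[1], reverse=True):
        print(f"{name}: {score:.2f}")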

●Worked as a Data Scientist to generate data models using Erwin and developed a relational database system.

●Analyzed the business requirements of the project by studying the Business Requirement Specification document.

●Converted business problem into a well-defined Machine Learning problem. Created Key Performance Indicators (KPI) for project success. Delivered actionable insights to stakeholders.

●Used R and Python for Exploratory Data Analysis, ANOVA tests, and hypothesis testing.

●Applied linear regression in Python and SAS to understand the relationships between different attributes of the dataset and the causal relationships between them.

●Utilized matplotlib in Python to generate data visualizations to convey results, diagnostics, and useful insights to team members and team lead.

●Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and R, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.

●Designed mappings to process the incremental changes that exist in the source table. Whenever source data elements were missing in source tables, these were modified or added in consistency with the third normal form based OLTP source database.

●Provided expertise and recommendations for physical database design, architecture, testing, performance tuning and implementation.

●Designed logical and physical data models for multiple OLTP and Analytic applications.

●Designed the physical model for implementing the model into an Oracle 9i physical database.

●Involved with Data Analysis, primarily identifying Data Sets, Source Data, Source Metadata, Data Definitions, and Data Formats.

●Tuned database to optimize performance of indexes and SQL statements.

●Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.

●Applied expert-level understanding of different databases in combination for data extraction and loading, joining data extracted from different databases and loading it to a specific database.

●Worked very closely with Data Architects and the DBA team to implement data model changes in the database in all environments.

Data Scientist

Baltimore Orioles

Nov 2017 to Sept 2019

Baltimore, Maryland (Remote)

For the Baltimore Orioles, I worked on pitcher relief prediction. Game statistics for the 2017-2018 season were analyzed and pitchers were clustered into one of three categories. Pitcher clusters were then passed to a random forest to determine the optimal time for substitution. The model was then used to create a paper-based framework for General Manager decisions in the bullpen.
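
A minimal sketch of the cluster-then-forest approach described above. The file, column names, cluster count, and the binary "pull_pitcher" target are illustrative placeholders, not the actual game-statistics schema.

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    games = pd.read_csv("pitcher_game_stats.csv")       # hypothetical extract
    stats = games.drop(columns=["pull_pitcher"])         # hypothetical numeric features

    # Reduce dimensionality (PCA is computed via SVD), then cluster pitchers into 3 groups.
    reduced = PCA(n_components=5).fit_transform(StandardScaler().fit_transform(stats))
    cluster = KMeans(n_clusters=3, random_state=0).fit_predict(reduced)

    # Pass the cluster label along with the raw stats to a random forest that predicts
    # whether this is the optimal time to substitute the pitcher.
    X = stats.assign(pitcher_cluster=cluster)
    y = games["pull_pitcher"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    print(f"holdout accuracy: {forest.score(X_te, y_te):.3f}")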

●Performed visual data exploration in Python using the Matplotlib and Seaborn libraries.

●Applied PCA utilizing SVD for variable selection to reduce model complexity.

●Used K-Means, GMM, and DBSCAN to classify pitchers into distinct strategic classes for analysis.

●Normalized data to improve accuracy and performance; used statistical and analytic tests such as Grubbs' test to find and remove outliers.

●Explored data statistically and visually to support model construction.

●Built specific domain knowledge by finding nontechnical experts and integrating their knowledge into models.

●Performed feature engineering and selection to generate high performing, understandable models.

●Implemented a final model using decision trees in a random forest ensemble to predict pitcher substitution.

Data Scientist

YRC Freight

Jan 2015 to Sept 2017

Atlanta, GA

For YRC I worked to optimize trailer loading and unloading. Utilizing machine vision, a convolutional neural network was used to recognize irregular loads, and then a linear programming solution based on convex hulls was applied to fill trailers. The model reduced labor costs and the time required to load trailers.
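
A simplified sketch of the optimization step described above, posed here as a small binary knapsack-style program in PuLP: choose which pallets to load so trailer volume and weight limits are respected while maximizing the value of the load. The real solution handled irregular loads and convex hulls; the pallet data and capacity limits below are invented for illustration.

    import pulp

    pallets = {                      # name: (volume_ft3, weight_lbs, value)
        "A": (60, 900, 5.0),
        "B": (45, 1200, 4.0),
        "C": (80, 700, 6.5),
        "D": (50, 1500, 4.5),
    }
    TRAILER_VOLUME = 150             # illustrative capacity limits
    TRAILER_WEIGHT = 3000

    prob = pulp.LpProblem("trailer_loading", pulp.LpMaximize)
    load = {p: pulp.LpVariable(f"load_{p}", cat="Binary") for p in pallets}

    # Maximize the value of the loaded pallets subject to volume and weight limits.
    prob += pulp.lpSum(load[p] * pallets[p][2] for p in pallets)
    prob += pulp.lpSum(load[p] * pallets[p][0] for p in pallets) <= TRAILER_VOLUME
    prob += pulp.lpSum(load[p] * pallets[p][1] for p in pallets) <= TRAILER_WEIGHT

    prob.solve()
    print("loaded:", [p for p in pallets if load[p].value() == 1])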

●Applied linear programming and optimization techniques to tessellate the space on each trailer.

●Used TensorFlow in Python for machine vision to recognize non-standard labeling.

●Implemented a Convolutional Neural Network for machine vision to read non-uniform labeling.

●Coded the solution in Python, utilizing NumPy, CVXPY, and PuLP for linear programming.

●Deployed the solution onto an AWS EC2 instance, providing access to forklift operators.

●Integrated models to create a solution in compliance with federal and company regulations.

●Explored data visually in Python using the Matplotlib and Seaborn packages.

Data Scientist

Target Corporation

March 2012 to 2014

Minneapolis, Minnesota

Worked to forecast future sales. Sales data for the past three years was analyzed and fit to models. An ARIMA model was fit to the data in order to forecast weekly sales into the next quarter. The models revealed shopping trends that were being under-capitalized by Target.
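
A minimal sketch of fitting an ARIMA model to weekly sales and forecasting the next quarter (13 weeks). The original work was done in R; this illustration uses Python's statsmodels, and the file and column names are invented. The order (p, d, q) would normally be chosen from the autocorrelation and partial autocorrelation diagnostics mentioned below.

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    sales = pd.read_csv("weekly_sales.csv", parse_dates=["week"], index_col="week")
    series = sales["net_sales"].asfreq("W")              # hypothetical weekly series

    # (1, 1, 1) is only a placeholder order; real lags come from ACF/PACF analysis.
    model = ARIMA(series, order=(1, 1, 1)).fit()
    print(model.forecast(steps=13))                      # next quarter of weekly sales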

●Engineered a solution in the R programming language.

●Experimented with time series models such as ARIMA and GARCH to produce reliable forecasts.

●Accessed and integrated large datasets from remote servers using SQL.

●Applied statistical testing to the model to determine appropriate autocorrelation and partial autocorrelation lags.

●Forecasted sales for the next quarter.

●Cleaned and normalized data set to optimize performance and reliability of predictions.

●Collaborated with advertising to form a plan to capture the market around newly revealed consumer trends.

●Communicated results through interactive visuals using the JavaScript library D3.

Education

Master of Science in Analytics

Georgia Institute of Technology

Atlanta, Georgia

Bachelor of Science in Applied Mathematics

Georgia Institute of Technology

Atlanta, Georgia

Bachelor of Science in Physics

Georgia Institute of Technology

Atlanta, Georgia


