Gareth Williams - Data Science

Atlanta, GA
July 31, 2020

Professional Summary

Experience in the application of Naïve Bayes, Regression Analysis, Neural Networks/Deep Neural Networks, Support Vector Machines (SVM), and Random Forest machine learning techniques.

Creative thinker with a strong ability to devise and propose innovative ways to look at problems using business acumen, mathematical theory, data models, and statistical analysis.

Experience applying advanced statistical and predictive modeling techniques to build, maintain, and improve real-time decision systems.

Work on productizing algorithms and solutions.

In-depth knowledge of statistical procedures that are applied in both Supervised and Unsupervised Machine Learning problems.

Experience working with advanced analytics teams to design, build, validate, and refresh data models.

Excellent communication skills (verbal and written) to communicate with clients/stakeholders and team members.

Ability to quickly gain an understanding of niche subject matter domains, and design and implement effective novel solutions to be used by other subject matter experts.

Experience implementing industry-standard analytics methods within specific domains, such as Online Analytical Processing (OLAP) cubes in indirect procurement, and applying data science techniques to extend those methods. Examples include using Natural Language Processing to help normalize vendor names, implementing clustering algorithms for supplier justification analysis, and deriving novel metrics for prioritizing work on commodity fracturing.

Technical Skills

Analytic Development: Python, R, MATLAB, SQL, SCIP, C, Rust

Machine Learning: Natural Language Processing & Understanding, Image Recognition and Detection, Forecasting

Artificial Intelligence: Text comprehension, Audio Generation, Classification, Pattern Recognition, Recommendation Systems, Targeting Systems, Ranking Systems

Analysis Methods: Advanced Data Modeling, Forecasting, Predictive Analytics, Statistical Analysis, Sentiment Analysis, Exploratory Analysis, Stochastic Analysis, Bayesian Analysis, Inference, Regression Analysis, Linear Models, Multivariate Analysis, Sampling Methods, Segmentation, Clustering, Decision Analytics, Design and Analysis of Experiments, Association Analysis

Analysis Techniques: Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, RNN, Regression, Naïve Bayes

Data Modeling: Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series Analysis

Applied Data Science: Natural Language Processing, Machine Learning, Internet of Things (IoT) Analytics, Social Analytics, Predictive Maintenance

Python Packages: NumPy, pandas, scikit-learn, TensorFlow, SciPy, Matplotlib, Seaborn, Keras

Version Control: GitHub, Git

IDE: Jupyter Notebook, Spyder, RStudio, Visual Studio, Code::Blocks, Atom

Data Query: Azure, SQL

Deep Learning: Machine Perception, Neural Networks, TensorFlow, Keras

Soft Skills: Excellent communication and presentation skills; ability to work well with stakeholders to discern needs accurately; leadership, mentoring, and coaching

Professional Experience

August 2019 – Present

Data Scientist

Enhance IT, Atlanta, Georgia

Project Summary:

Worked as a Machine Learning Engineer/Data Researcher on a team that built, trained, and tested a Natural Language Processing model that took in comments/tweets, classified whether they contained valuable information, and then extracted that information. The goal of the project was to increase the collection of reports and determine whether each one was news, whether it was worth reporting, and whether it was worth sending a news crew to report on it further.

Project Points:

Used Python, R, and SQL to collect, explore, and analyze the comments/tweets.

Used Python and NLTK to tokenize, pad, and vectorize the comments/tweets.

Created and trained an Artificial Neural Network with TensorFlow on the tokenized comments/tweets.

Helped deploy the model behind a REST API using Flask.

Wrote extensive SQL queries to extract data from the database.

Built a deep learning model for text classification and analysis.

September 2017 – June 2019

Data Scientist

CSX Transportation, Jacksonville, Florida (Remote)

Project Summary:

Worked as a Machine Learning Engineer/Data Scientist on a team that researched methods to balance the cost of removing a rail car for inspection against the cost of it failing on the job. As a team, we collected, preprocessed, and analyzed the data. We then built, trained, and tested a variety of models and selected the most robust one to move into production.

Project Points:

Used SQL, R, and Python to collect, analyze, and preprocess the data.

Explored the data in R and Python using statistical methods and visualization packages.

Cleaned and preprocessed the data in Python with pandas and NumPy, including feature engineering and imputation of missing values in the dataset.

Used Multiple Linear Regression to improve the identification of parts likely to fail, which saved the company a little under a million dollars per year.

Created a predictive analytics system to decrease the likelihood of part failure and reduce associated costs.

Explored the use of Survival Modeling to estimate the time to failure of certain parts.

July 2015 – May 2017

Data Scientist

Debsbru Productions LLC, San Francisco, California

Project Summary:

Used convolutional neural networks to identify and flag/remove brand names and logos from every frame of a film. A second task was to take film footage of the actors' faces and map it to specific scenes in the movie. Some scenes had only one take, and the actors' faces were not clearly visible in it. With a talented team, we created a training set of our own faces speaking and acting out the scenes. We then used images of the targeted actors to generate the scenes with their faces. The footage was sent to a separate team to fit and edit into the final product.

Project Points:

Used a pretrained model to segment frame images into objects for identification and removal/replacement.

Used Python and C to import, build, and generate the actors' facial movements.

Created and preprocessed video and images.

Used a Convolutional Neural Network trained on the motions in the targeted acted scenes.

Generated a product which was then used in the final cut of the movie.

February 2012 – April 2015

Data Scientist Research Associate

Computer Communication Research Labs, Norfolk, Virginia

Project Summary:

The project classified populations into socioeconomic classes (normal and elite). Clustering was our main approach to identifying and predicting the fluidity of classes as they moved through time. Using data collected from a simulation, we were able to predict and classify how classes within a given area changed or remained the same.

Project Points:

Used Python for the clustering analysis and NetLogo for the simulation.

Explored the data using statistical methods and Python visualization packages.

Preprocessed the datasets that NetLogo produced during the simulation phases, along with past datasets.

Implemented a variety of Clustering models with the preprocessed data to classify population classes.

Developed a segmentation solution using various clustering analysis methods, including k-means clustering, Gaussian mixture models, and DBSCAN.


Education

Bachelor of Science in Applied Mathematics

University of California, Davis

Davis, California
