
DON AGIRO

Data Scientist

Atlanta, GA 402-***-**** adkrv6@r.postjobfree.com

Professional Summary

Detail-oriented Data Scientist with 10 years of experience solving complex problems through proven machine learning, statistical, and project management skills to optimize data and improve the quality of business insights.

●Skilled in data science, data mining, SQL queries, data modeling, business analytics, data visualization, and machine learning in Python using TensorFlow.

●Experience using R, Python, and SQL.

●Extensive data science experience in Python (NumPy, pandas, TensorFlow, Matplotlib) and the R tidyverse, finding lean, actionable solutions and insights for real-world business problems.

●Exceptional communication skills; able to clearly relay results and solutions to stakeholders.

●Design custom BI reporting dashboards in Python using Dash with Plotly (see the brief sketch after this list).

●Expert in transforming business requirements into statistical data models in Python; design and build solutions using TensorFlow.

●Build statistical models in Python using TensorFlow and report using BI solutions that scale across massive volumes of structured and unstructured data.

●Experience applying machine learning models (e.g., naïve Bayes, linear regression, deep neural networks, support vector machines (SVM), decision trees, random forest, and XGBoost).

●Experience implementing statistical models on big data sets using cloud computing services (e.g., AWS and Azure).

●Strong ability to devise and propose innovative ways to analyze problems by using acquired business acumen, mathematical theories, data models, and statistical analysis.

●Discover patterns in data using algorithms and SQL queries, and use an experimental, iterative approach to validate findings in Python using TensorFlow.

●Experience working with relational databases using advanced SQL skills.

●In-depth knowledge of statistical procedures that are applied in both supervised and unsupervised machine learning problems.

●Excellent communication skills (verbal and written) to communicate with clients/stakeholders and team members.

●Strong experience in the Software Development Life Cycle (SDLC) and in supervising teams of domain-specific experts to meet product specifications and benchmarks within given deadlines.

●Ability to quickly gain an understanding of niche subject-matter domains and to design and implement effective, novel solutions for use by other subject-matter experts.
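
The sketch below illustrates the kind of Dash-with-Plotly reporting dashboard referenced in the bullets above. It is a minimal, hypothetical example (the data and column names are invented), not code from any employer project.

    # Minimal Dash + Plotly dashboard skeleton (hypothetical data).
    import pandas as pd
    import plotly.express as px
    from dash import Dash, dcc, html

    # Stand-in report data; a real dashboard would query a database.
    df = pd.DataFrame({"date": pd.date_range("2021-01-01", periods=90),
                       "revenue": range(90)})

    app = Dash(__name__)
    app.layout = html.Div([
        html.H1("BI Report"),
        dcc.Graph(figure=px.line(df, x="date", y="revenue")),
    ])

    if __name__ == "__main__":
        app.run(debug=True)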

Skills

●Python, R, MySQL

●TensorFlow, Keras, PyTorch, Prophet

●pandas, NumPy, NLTK, SQLAlchemy, Matplotlib, Seaborn

●Linear Regression, Logistic Regression, KNN, Neural Networks, Time Series

●Project Coordination, Interpersonal Communication, Emotional Intelligence

Professional Experience

Data Scientist at Southern Company

Atlanta, Georgia Apr 2021 – Present

I work with a team mandated to analyze the pipe systems of Southern Company's gas operations. The main objective of the project is to build a data model that flags the probability of pipe failure (e.g., a puncture) within a predicted time range, so the company can optimize preventative maintenance on its pipe systems and address pipes identified for maintenance review before an actual puncture occurs.

●Working with stakeholders to evaluate the existing system and determine the best algorithm for managing resource demand.

●Assessing the company's existing data analysis and reporting of pipe damage data.

●Leading cross-functional collaborations between technical vendors and company leadership.

●Working with the Data Science team to map out a solution for generating real-time streams of relevant pipeline data and creating appropriate data sets.

●Working with the Data Science team to identify, design, and evaluate third-party tools for building the predictive modeling solution that determines threat levels of pipes within the company's industrial pipeline infrastructure.

●Developing time series models to help with resource management (see the forecasting sketch after this list).

●Programming in Python.
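
As context for the time series work above, here is a minimal forecasting sketch using Prophet (listed under Skills). The series is synthetic and stands in for demand data; nothing here is taken from Southern Company systems.

    # Fit Prophet to a synthetic daily demand series and forecast 90 days ahead.
    import numpy as np
    import pandas as pd
    from prophet import Prophet

    dates = pd.date_range("2020-01-01", periods=365, freq="D")
    demand = 100 + 10 * np.sin(np.arange(365) / 7) + np.random.randn(365)
    df = pd.DataFrame({"ds": dates, "y": demand})  # Prophet's required columns

    model = Prophet(weekly_seasonality=True)
    model.fit(df)

    future = model.make_future_dataframe(periods=90)
    forecast = model.predict(future)
    print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())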

Data Scientist at Coca-Cola

Atlanta, Georgia Jan 2020 – Apr 2021

At Coca-Cola, a beverage company, I worked with manufacturing equipment datasets to provide complex data extracts, programming, and analytical modeling. This work supported the automation of various routine manufacturing processes by predicting time to failure, preventing extended downtime and allowing appropriate preventative maintenance to be scheduled. IoT data was incorporated for up-to-date predictions. When COVID-19 hit, we focused on generating automated system alerts and predictive solutions to increase the reliability of the plants under reduced staffing. Tasks included:

●Through survival analysis techniques and machine learning algorithms, we worked to improve how manufacturing teams predicted part failures (see the survival-analysis sketch after this list).

●The project required the use of data mining methods, hypothesis testing, regression analysis, and various other statistical analysis and modeling methods.

●Presented weekly updates to managers and key stakeholders, previewing user interface designs and analytical results such as stress-analysis findings.

●Presented findings using PowerPoint, with Tableau and Excel for data work and charts.

●Participated in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, following Agile methodologies; operated in 2-week sprints with weekly stand-ups.

●Worked in a Git development environment.

●Responsible for preparing data for use with machine learning models.

●Used Python to create a semi-automated conversion process to generate raw, archive-linked data files.

●Provided software training and further education on model applications to incoming team members.

●Reported initial findings on converting Excel, text, and image files to CSV.

●Collaborated with the computer vision team to better understand how to extract meaning from images and PDF files.

●Used predictive modeling, data mining methods, factor analysis, ANOVA, hypothesis testing, and normal distribution analysis.

●The project was implemented with custom APIs in Python, using visualization tools such as Tableau and ggplot to create dashboards.
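
The following sketch shows one common way to set up the survival analysis described above: a Cox proportional-hazards model of part time-to-failure. The lifelines library, the synthetic data, and all column names are assumptions for illustration only, not the project's actual stack.

    # Cox proportional-hazards model of part time-to-failure (synthetic data).
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    parts = pd.DataFrame({
        "runtime_hours": rng.exponential(1000, size=200),  # observed runtime
        "failed": rng.integers(0, 2, size=200),            # 1 = failure observed
        "load_factor": rng.uniform(0.5, 1.0, size=200),    # covariate
    })

    cph = CoxPHFitter()
    cph.fit(parts, duration_col="runtime_hours", event_col="failed")
    cph.print_summary()

    # Median predicted time-to-failure, usable for maintenance scheduling.
    print(cph.predict_median(parts))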

Data Scientist at Ameritas Insurance Company

Lincoln, Nebraska Jan 2018 – Dec 2019

Worked with actuaries to calculate total claims using a time series model. I was tasked with building an ARIMA time series model in Python to determine the value of a claim and append the results to a dataset used to fit a linear regression model predicting the total value of expected claims. Model results were deployed on an AWS EC2 instance, and stakeholders could retrieve real-time results through a web API as new data was added. Tasks included:

●Database Administration: built a MySQL database to house and manipulate the project data.

●Python libraries (Prophet, pandas, NumPy): used Python to explore the data and evaluate whether it was fit for a time series analysis.

●Dickey-Fuller test: ran an augmented Dickey-Fuller test to determine whether the data was stationary (see the sketch after this list).

●Data decomposition: applied one-step differencing to make the data stationary without overfitting.

●Statistical model exploration (ARIMA and spectral analysis): ran two different time series models to determine which gave more accurate projections.

●Model Tuning: performed routine checks on the model to ensure anomalies such as structural breaks were accounted for and performance was not affected.

●Deployment: used Flask to build an API that displayed results on a website for stakeholders.
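
The sketch below walks through the stationarity check, differencing, and ARIMA fit described in the bullets above, using statsmodels. The claims series is synthetic, and the model order is an illustrative assumption.

    # Augmented Dickey-Fuller test, differencing, and ARIMA forecast.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic monthly claim totals with a trend (hence non-stationary).
    idx = pd.date_range("2015-01-01", periods=60, freq="MS")
    claims = pd.Series(1000 + 5 * np.arange(60) + np.random.randn(60) * 20,
                       index=idx)

    # ADF test: a p-value >= 0.05 fails to reject non-stationarity.
    if adfuller(claims)[1] >= 0.05:
        claims = claims.diff().dropna()  # one-step differencing

    result = ARIMA(claims, order=(1, 0, 1)).fit()
    print(result.forecast(steps=12))  # next 12 months of expected claims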

Senior Data Scientist at Humanity and Inclusion

Houston, Texas Dec 2014 – Nov 2017

Tasked with supporting the Monitoring and Evaluation officer in collecting data on refugees for analysis. A logistic regression model classified the refugee population by immediate need of attention (i.e., need for immigration support from IOM, food from WFP, medical support from Handicap International, etc.). The model was deployed on an AWS EC2 instance, and the resulting analysis was stored in a MySQL server. Stakeholders accessed the results through a web API. Tasks included:

●Database Management: designed and built a MySQL database to store and manipulate the collected data.

●Data Collection: manually collected the data and entered it into the MySQL database.

●Data Cleaning, Imputation, Tokenizing: used Python libraries (pandas, NLTK, NumPy, Keras) to clean and prepare the data for analysis.

●Statistical model exploration (naïve Bayes, logistic regression): tested multiple models to determine how well each technique performed.

●Hyperparameter Tuning: used a Keras HyperModel to tune hyperparameters and improve the model's performance (see the tuning sketch after this list).

●Model Deployment/Implementation: deployed the model on an AWS EC2 instance and built a web API for stakeholders to access and view the results.

●Monitoring and Evaluation: performed routine checks of the model's performance against new data and evaluated whether it needed fine-tuning or replacement with a new model.

●Project Coordination: managed a team of two data officers who assisted in collecting the data; provided direction on how to organize the data and feed it into the MySQL database.

●IT Support: provided employees with ICT support (i.e., software installations, hardware maintenance, troubleshooting computer-related issues, etc.).
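
To illustrate the Keras HyperModel tuning mentioned above, here is a minimal random-search sketch with keras-tuner. The feature matrix, labels, and search space are synthetic placeholders, not the project's actual configuration.

    # Random hyperparameter search over a small classifier with keras-tuner.
    import numpy as np
    import keras_tuner as kt
    from tensorflow import keras

    X_train = np.random.rand(500, 10)            # synthetic features
    y_train = np.random.randint(0, 4, size=500)  # 4 hypothetical need classes

    class NeedsClassifier(kt.HyperModel):
        def build(self, hp):
            model = keras.Sequential([
                keras.layers.Dense(hp.Int("units", 16, 128, step=16),
                                   activation="relu"),
                keras.layers.Dropout(hp.Float("dropout", 0.0, 0.5, step=0.1)),
                keras.layers.Dense(4, activation="softmax"),
            ])
            model.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])
            return model

    tuner = kt.RandomSearch(NeedsClassifier(), objective="val_accuracy",
                            max_trials=10, overwrite=True)
    tuner.search(X_train, y_train, validation_split=0.2, epochs=20)
    best_model = tuner.get_best_models(num_models=1)[0]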

Data Scientist at Saerbeck Municipality

Saerbeck, Germany Jun 2012 – Nov 2014

Worked on a deep learning solution, funded by Shell and the European Union, using an LSTM model to regulate power output against demand fluctuation. Time series analysis showed recurring peaks during certain hours of the day, but supply did not always satisfy demand when irregular events (soccer games, farmers markets, tourist influxes, etc.) suddenly increased power consumption. By accommodating both long-term and short-term data, the LSTM model predicted demand given current events, enabling efficient management and regulation of clean energy.

●Database Management: designed and built a MySQL database to store and manipulate the collected data.

●Data Collection: data was collected through digital IoT devices that stored all readings directly in the MySQL database.

●Data Cleaning, Normalization, Imputation, Tokenizing: used Python libraries (pandas, NLTK, NumPy, TensorFlow, Keras) to clean and prepare the data for analysis with models built in Keras.

●Statistical model exploration (ANN, RNN, LSTM): experimented with different models to compare performance across approaches (see the LSTM sketch after this list).

●Hyperparameter Tuning: used Keras Tuner's random search to tune parameters and keep the model at peak performance.

●Model Evaluation and Adaptation: performed scheduled checks to confirm the model was not drifting on new data; tuned or rebuilt the model as needed.

●Public Relations: worked with the marketing department to inform the general public about the renewable energy project and its financial, economic, and environmental benefits to program participants.
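
As a companion to the LSTM work described above, the sketch below trains a small Keras LSTM on a synthetic hourly demand series with a daily cycle. The window size and architecture are illustrative assumptions, not the deployed model.

    # LSTM next-hour demand forecaster on a windowed time series (synthetic).
    import numpy as np
    from tensorflow import keras

    hours = np.arange(24 * 60)  # 60 days of hourly readings
    demand = 50 + 10 * np.sin(2 * np.pi * hours / 24) + np.random.randn(len(hours))

    def make_windows(series, window=24):
        # Shape inputs as (samples, window, 1) with next-step targets.
        X = np.stack([series[i:i + window]
                      for i in range(len(series) - window)])
        return X[..., np.newaxis], series[window:]

    X, y = make_windows(demand)
    model = keras.Sequential([
        keras.Input(shape=(24, 1)),
        keras.layers.LSTM(32),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, batch_size=32)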

Lead Data Scientist at Centre of Expertise Water Technology

Leeuwarden, Netherlands Aug 2010 – Apr 2012

Designed an artificial neural network (ANN) for the local wastewater plant. The ANN model was fed data such as weather, water salinity, and pH levels from the pretreatment phase and analyzed this information to predict whether the bacteria and mangroves needed an intervention to keep them efficient and running all year round.

●Statistical model exploration (logistic regression, ANN): conducted tests on multiple models to evaluate the best solution for the project.

●Database management: constructed a MySQL database to define, manipulate and manage the acquired data for this project.

●Data Collection: collected the data using digital meters that fed readings directly into the database; some data was collected manually and entered into the database.

●Data Preprocessing: used Python libraries (pandas, NLTK, NumPy, TensorFlow, Keras) to clean and prepare the data for analysis with the ANN model.

●Hyperparameter Tuning: used the Keras HyperModel tuner to fine-tune the model and increase performance.

●Model Deployment: deployed the model on an AWS EC2 instance; stakeholders could view real-time results through a web API (see the deployment sketch after this list).

●Project Coordination: managed a team of three undergraduate students who assisted in collecting the data and storing it in the MySQL database.
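
Deployment bullets in several roles above mention serving model results through a web API; the sketch below shows a minimal Flask endpoint of that kind. The model file, route, and payload format are hypothetical stand-ins.

    # Minimal Flask endpoint serving predictions from a saved model.
    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    model = joblib.load("ann_model.joblib")  # hypothetical serialized model

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expects a JSON body like {"features": [[6.8, 12.3, 0.4]]}.
        features = request.get_json()["features"]
        return jsonify({"prediction": model.predict(features).tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)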

Education

Master of Science in Environment and Energy

University of Twente

Enschede, Netherlands

Bachelor of Science in Actuarial Science

University of Nebraska Lincoln

Lincoln, Nebraska

Bachelor of Science in Information Technology

United States International University

Nairobi, Kenya

Certifications

Google Cloud Platform

●Machine Learning Models: https://google.qwiklabs.com/public_profiles/31d72a3b-5679-4f40-9c77-952a0cadbb08

●Deploy on Kubernetes: https://google.qwiklabs.com/public_profiles/31d72a3b-5679-4f40-9c77-952a0cadbb08

●Explore ML and AI: https://google.qwiklabs.com/public_profiles/31d72a3b-5679-4f40-9c77-952a0cadbb08

●Data insights with BigQuery: https://google.qwiklabs.com/public_profiles/31d72a3b-5679-4f40-9c77-952a0cadbb08
