data scientist

Location:

Houston, TX, 77001

Salary:

Posted:

July 07, 2022


Resume:

BABATUNDE A

Email: *****@******************.***

Contact: 469-***-****

Professional Summary:

7+ years of experience in machine learning algorithms, data mining techniques, and natural language processing.

Experienced scrum master and agile certified professional.

Worked on end-to-end basis from gathering business requirements, pulling the data from different data sources, data wrangling, implementing machine learning algorithms, deploying models, and presenting end results to clients.

Mined and analyzed large datasets using Python and R. Created an automated data cleansing module using a supervised learning model in Python.

Worked with data manipulation and modeling packages such as Pandas, NumPy, SciPy, and Keras.

Implemented various statistical tests like ANOVA, A/B testing, Z-Test, T-Test for various business cases.

Worked with text analytics tools and word-embedding models such as Word2Vec and GloVe.

Knowledge in Seq2Seq models, Bag of Words, Beam Search, and other natural language processing (NLP) concepts.

Experienced with hyperparameter tuning techniques such as Grid Search and Random Search.

Performed outlier analysis using methods such as Z-score analysis, linear regression, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and Isolation Forest.

Knowledge of PostgreSQL and Unix shell scripting. Designed and developed a wide variety of PostgreSQL modules and shell scripts optimized for performance.

Commendable knowledge in SQL and relational databases (Oracle, SQL Server, gp admin)

Worked with MicroStrategy visualization to create business reports with key KPIs

Experienced with DevOps tools such as Docker containers and Jenkins.

Worked with DevOps teams on deployment by writing Python code for custom logic to support infrastructure as code.

Technical Skills:

Database Management

MySQL, Entity relationship Diagrams (ERD)

Languages

Python, R

Machine Learning

Techniques

Regression: Linear, Polynomial, Support Vector, Decision Trees

Classification: Logistic Regression, K-NN, Naïve Bayes, Decision Trees, Support Vector Machines

Clustering: K-means, Hierarchical

Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks

Dimensionality Reduction: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA)

Ensemble Learning: Random Forest, Bagging, Boosting, Stacking

Natural Language Processing (NLP): Tokenization, Part-of-speech tagging, Parsing, Stemming, Lemmatization, Named Entity Recognition, Semantic Analysis, Sentiment Analysis, Latent Dirichlet Allocation

Time-series Forecasting: Multiplicative & Additive decomposition, exponential smoothing, Holt-Winters multiplicative model, AR, MA, ARMA, ARIMA, SARIMA

Data Visualization

Tableau, Microsoft Power BI; R: ggplot2 and Plotly; Python: Matplotlib, Seaborn

Professional Experience:

Client: USAA, San Antonio, TX Feb 2022 - Present

Role: Data Scientist – Machine Learning

Responsibilities:

Created an Automated Ticket Routing algorithm for the support team using Natural Language processing and other machine learning algorithms.

Analyzed and significantly reduced customer churn using machine learning to streamline risk prediction and intervention models.

Worked with K-Means, K-Means++, and hierarchical clustering algorithms for customer segmentation.

Performed outlier analysis using methods such as Z-score analysis, linear regression, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and Isolation Forest.

Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.

Worked with PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), and other dimensionality reduction techniques on classification problems with linear models.

Worked with sales forecast and campaign sales forecast models such as ARIMA, Holt-Winters, Vector Autoregression (VAR), and Autoregressive Neural Networks (NNAR).

Experimented with predictive models including Logistic Regression, Support Vector Machine (SVM), and reinforcement learning to prevent retail fraud.

Worked with ETL developers to increase the data inflow standards using various preprocessing methods.

Worked with Survival Analysis for customer dormancy rates, periods, and inventory management.

Built an automated chatbot as a customer service upgrade to better assist online customers, using text classification and a knowledge base.

Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets for modeling.

Deep knowledge of scripting and statistical programming in Python. Advanced SQL skills for working efficiently with very large datasets. Able to handle non-standard machine learning datasets.

Built visualizations to facilitate research into the Human Connectome Project data and to identify anatomical and functional connectivity within the healthy human brain, as well as in brain disorders such as dyslexia, autism, Alzheimer's disease, and schizophrenia.

Performed Exploratory Data Analysis (EDA) to maximize insight into the dataset, detect the outliers and extract important variables.

Developed data preprocessing pipelines using Python, R, and Linux scripts on an on-premises high-performance cluster and on AWS and GCP cloud VMs.

Developed and evaluated models on MRI data using machine learning methods such as KNN and deep learning.

Client: Visa, Austin, TX Jan 2020 - Dec 2021

Role: Data Scientist

Responsibilities:

Assisted Business Analysts and Data Scientists with data preprocessing: data preparation, data cleaning, data masking, data analysis, and data profiling.

Created a classification model that reduced the false alerts generated by the existing anti-money laundering and fraud detection system by 35%.

Successfully upgraded Informatica from version 9.6.1 HF2 to 10.1.2

Worked with claim classification models to reduce the different workloads for the Core Operations team.

Implemented a CNN model to go through various documents coming from downstream and identify sets of images for the claims department.

Explored and created new data sets to work with and implemented a few data science workflow platforms for future applications.

Designed and implemented workflow methodologies for the claim predictions API and helped create the ETL components that pull the required data.

Created various models, including SVM with RBF kernel, Multilayer Perceptron neural network, KNN, Lasso, Ridge, and Elastic Net regression models.

Worked with K-fold cross validation and other model evaluation techniques throughout different projects.

Worked with text extraction modules such as Tesseract to extract text from various documents and process the text with NLTK.

Set up a data archival process that freed up 24 TB of space in the production environment and improved performance.

Refined the data quality design, leading to 20% faster batches, and added new data quality checks.

Worked on backlog management in JIRA as scrum master and on capacity management for the EM team.

Worked with Client Management to define and design the cloud migration strategy for the department.

Mentored 14 data engineers in Informatica and led the enterprise memory team.

Client: AB Microfinance Bank, Nigeria May 2017 - Oct 2019

Role: Data Scientist/Analyst

Responsibilities:

Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, including product recommendation and allocation planning.

Designed, built, and deployed a set of Python modeling APIs for customer analytics, integrating multiple machine learning techniques for user behavior prediction.

Applied various machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, natural language processing (NLP), regression models, and supervised and unsupervised learning.

Segmented the customers based on demographics using K-means Clustering.

Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.

Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau and Power BI. Also used R to generate regression models to provide statistical forecasting.

Used the big data tool Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of loan defaults on AWS.

Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server.

Created deep learning models using TensorFlow and Keras, combining all tests into a single normalized score to predict residency.

Collected data needs and requirements by interacting with other departments.

Client: Ikarodu Healthcare, Nigeria Feb 2015 - Apr 2017

Role: Data Scientist/Analyst

Responsibilities:

Gathered, analyzed, documented, and translated application requirements into data models, and supported the standardization of documentation and the adoption of standards and practices related to data and applications.

Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using Sqoop, Pig, Flume, Hive, MapReduce and HDFS.

Wrote user defined functions (UDFs) in Hive to manipulate strings, dates and other data.

Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.

Applied clustering algorithms such as hierarchical and K-means clustering using scikit-learn and SciPy.

Performed complex pattern recognition on automotive time series data and forecast demand using ARMA and ARIMA models and exponential smoothing for multivariate time series data.

Delivered and communicated research results, recommendations, opportunities to the managerial and executive teams, and implemented the techniques for priority projects.

Designed, developed and maintained daily and monthly summary, trending and benchmark reports repository in Tableau Desktop.

Generated complex calculated fields and parameters, toggled and global filters, dynamic sets, groups, actions, custom color palettes, statistical analysis to meet business requirements.

Implemented visualizations and views such as combo charts, stacked bar charts, Pareto charts, donut charts, geographic maps, sparklines, and crosstabs.

Published workbooks and data source extracts to Tableau Server, implemented row-level security, and scheduled automatic extract refreshes.

Environment:

-Experience in using AWS and Azure environments

-Proficient in building models in Python

-Expertise in R

-Hands on with SQL and ALTERYX

-Python

-Spark

Education: MSc, Statistics, Sam Houston State University, Huntsville, Texas, USA.

BSc, Mathematics & Statistics, University of Lagos, Nigeria.
