Data Science

Location:

Florham Park, NJ

Posted:

July 08, 2021

Resume:

******.***@*****.***

732-***-****

TECHNICAL SKILLS

COMMUNICATION SKILLS verbal, written, presentations

LEADERSHIP supports project goals, business use case and mentors team

QUALITY continuous improvement in project processes, workflows, automation and ongoing learning and achievement

CLOUD Analytics in cloud-based platforms (AWS, MS Azure, Google Cloud)

ANALYTICS Data Analysis, Data Mining, Statistical Analysis, Multivariate Analysis, Stochastic Optimization, Linear Regression, ANOVA, Hypothesis Testing, Forecasting, ARIMA, Sentiment Analysis, Predictive Analysis, Pattern Recognition, Classification, Behavioral Modeling

PROGRAMMING LANGUAGES Python, R, SQL, Scala, Java, MATLAB, C, SAS, F#

LIBRARIES NumPy, SciPy, Pandas, Theano, Caffe, SciKit-learn, Matplotlib, Seaborn, TensorFlow, Keras, NLTK, PyTorch, Gensim, Urllib, BeautifulSoup4, MxNet, Deeplearning4j, EJML, dplyr, ggplot2, reshape2, tidyr, purrr, readr

DEVELOPMENT Git, GitHub, Bitbucket, SVN, Mercurial, PyCharm, Sublime, JIRA, TFS, Trello, Linux, Unix

DATA EXTRACTION AND MANIPULATION Hadoop HDFS, Hortonworks Hadoop, MapR, Cloudera Hadoop, Cloudera Impala, Google Cloud Platform, MS Azure Cloud, both SQL and NoSQL, Data Warehouse, Data Lake, SQL, HiveQL, AWS (RedShift, Kinesis, EMR, EC2)

MACHINE LEARNING Supervised Machine Learning Algorithms (Linear Regression, Logistic Regression, Support Vector Machines, Decision Trees and Random Forests, Naïve Bayes Classifiers, K Nearest Neighbors), Unsupervised Machine Learning Algorithms (K Means Clustering, Gaussian Mixtures, Hidden Markov Models, D), Imbalanced Learning (SMOTE, AdaSyn, NearMiss), Deep Learning Artificial Neural Networks, Machine Perception

APPLICATIONS Recommender Systems, Predictive Maintenance, Forecasting, Fraud Prevention and Detection, Targeting Systems, Ranking Systems, Deep Learning, Strategic Planning, Digital Intelligence

WORK EXPERIENCE

BANK OF NY MELLON November 2019 - Present

Data Scientist Florham Park, NJ

Worked with the Identity Management Team within the Information Security Division to develop self-service tools for internal employees. Worked to establish cloud controls for identity governance and assess risks associated with cloud service providers.

Programmed solutions using Python libraries such as NumPy and pandas.

Used machine learning and statistical modeling techniques to develop and evaluate algorithms that improve performance, quality, data management, and accuracy.

Managed version control setup for the phantom platform using Git.

Set up a playbook for events and classification containers.

Developed an app called risk hub and moved it to production deployment using Django.

Contributed to the design and prototyping of medium- to high-complexity machine learning systems.

Worked with product managers to formulate the data analytics problem.

Cleaned and debugged datasets and the codebase before applications reached QA.

Used RASA to build a POC web app to show chat bot usage

Provided recommendations for controls for implementation of IAM on the cloud.

Performed analysis of user profiles and current application entitlements by organization, department, and group.

Built a recommender system to auto-provision application and platform access for new employees and contractors so they are productive as soon as they are onboarded.

Entitlement Analytics

Data Analysis and Reporting

Developed a Recommendation Engine of Entitlements and Applications

Performed Evaluation of on-prem and cloud controls.

Benchmarked on-prem identity management system vs cloud identity management systems

Responsible for presenting findings to stakeholders.

Selected and built dashboards for internal use.

Familiar with machine learning modeling using Python and frameworks like TensorFlow.

Implemented and cleaned datasets for network access based on user profiles.

Randalls Food and Stores May 2018-November 2019

Data Scientist Houston, TX

Worked on a sales forecasting project using an artificial neural network developed in PyTorch along with Facebook’s Prophet model. I performed data cleaning in Python on a large dataset covering several years of data across different departments in dozens of stores and produced highly accurate forecasts for each store and department.

Created a model using Facebook Prophet to produce highly accurate predictions of weekly sales.

Evaluated model performance on a large dataset (multiple years of daily data for dozens of departments per store and dozens of stores).

The deployed model produced highly accurate forecasts up to 6 months in advance for every store and department.

Worked in a Cloudera Hadoop environment using Python, SQL, and Tableau

HDFS (Cloudera): Pulled data from Hadoop cluster.

Worked within the Enterprise Applications team as a Data Scientist.

Used Python, Pandas, NumPy, and SciPy for exploratory data analysis, data wrangling, and feature engineering.

Used Tableau and TabPy for visualization of analyses.

Worked alongside Business Analysts, Data Analysts, and Data Engineers.

Consulted with various departments within the company, including SIU and Safety.

Managed and matched claim numbers into fraud cases.

Cleaned fraud data to be joined with the claims data (~73k observations)

Researched and assessed the fraud predictive analytics scenario in terms of predicting final outcomes for new claims.

Created a Tableau dashboard to help SIU present its annual report.

Tried kernel density estimation in lower dimensional space as a feature to predict fraud.

Tested anomaly detection models such as Expectation Maximization, Elliptical Envelope, and Isolation Forest.

Multivariate analysis of safety programs from the last 10 years.

Used regression to determine the correlation of participation in the safety program with outcome of claims.

Performed hypothesis testing and statistical analysis to determine statistically significant changes in claims after participation in the safety program.

Presented findings of impact testing.

Workers Compensation fraud detection

Prepared data for exploratory analysis

Engineered actuarial formulas.

Collaborated with other data scientists on use cases that included workplace accident prediction and sentiment analysis.

Technologies: Cloudera Hadoop, Hadoop HDFS, Python, SQL, Tableau, TabPy, Pandas, NumPy, SciPy, Data Modeling, Multivariate Analysis, Regression Analysis, Hypothesis Testing, Exploratory Analysis, Sentiment Analysis, Predictive Analytics.

Omnicare Inc May 2017-April 2018

Data Scientist/NLP Engineer Stafford, TX

Worked with NLP to classify text with data drawn from a big data system. The text categorization involved labeling natural language texts with relevant categories from a predefined set. One goal was to target users by automated classification, which let us create cohorts to improve marketing. The NLP text analysis monitored, tracked, and classified online user discussion of products and services. The machine learning classifier was trained to identify whether a cohort was a promoter or a detractor. Overall, the project improved marketing ROI and customer satisfaction.

Oversaw the entire production cycle for extracting and displaying metadata from various assets, developing a report display that is easy to grasp and draw insights from.

Performed NLP preprocessing in Python using libraries such as NLTK.

Collaborated with both the Research and Engineering teams to productionize the application.

Assisted various teams in bringing prototyped assets into production.

Expertise in applying data mining techniques and optimization techniques in B2B and B2C industries and proficient in Machine Learning, Data/Text Mining, Statistical Analysis and Predictive Modeling.

Utilized MapReduce/PySpark Python modules for machine learning & predictive analytics on AWS.

Implemented assets and scripts for various projects using R, Java, and Python.

Built sustainable rapport with senior leaders.

Developed and maintained a data dictionary to create metadata reports for technical and business purposes.

Built and maintained dashboards and reporting based on the statistical models to identify and track key metrics and risk indicators.

Kept up to date with the latest NLP methodologies by reading 10 to 15 articles and whitepapers per week.

Extracted source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.

Parsed and manipulated raw, complex data streams to prepare them for loading into an analytical tool.

Involved in defining the source to target data mappings, business rules, and data definitions.

Project environment was AWS and Linux.

Technologies Used: Python, R, Java, Kubernetes, Docker, ELK Stack (ElasticSearch, Logstash, Kibana), AWS Comprehend

TGS-Nopec February 2014-April 2017

Data Scientist Houston, TX

TGS-Nopec is a publicly traded company listed on the Norwegian Stock Exchange with global headquarters in Houston. The company's primary business is performing exploration studies for the oil and gas industry. Its principal products are data and insights for the oil and energy industries, including multi-client geophysical data, multi-client geological data, imaging services and reservoir solutions, data and analytics, machine-learning solutions, and well performance insights. At TGS, I performed a number of statistical studies, including well performance and drilling optimization using deep neural nets.

Application of data mining techniques and optimization techniques in B2B and B2C industries and Machine Learning, Data/Text Mining, Statistical Analysis and Predictive Modeling.

Utilized PySpark Python modules for machine learning & predictive analytics in Hadoop on AWS.

Predictive modeling using state-of-the-art methods.

Implemented advanced machine learning algorithms using Caffe, TensorFlow, Scala, Spark, MLlib, R, and other tools and languages as needed.

Programming and scripting in R, Java, and Python.

Developed a data dictionary to create metadata reports for technical and business purposes.

Built a reporting dashboard on the statistical models to identify and track key metrics and risk indicators.

Applied boosting to the predictive model to improve its efficiency.

Extracted source data from Amazon Redshift on AWS cloud platform.

Parsed and manipulated raw, complex data streams to prepare for loading into an analytical tool.

Explored different regression and ensemble models in machine learning to perform forecasting

Developed new financial models and forecasts.

Improved efficiency and accuracy by evaluating models in R.

Involved in defining the source to target data mappings, business rules, and data definitions.

Performed end-to-end Informatica ETL testing for custom tables by writing complex SQL queries on the source database and comparing the results against the target database.

TMD Staffing June 2012-January 2014

Data Analyst Katy, TX

Applied Machine Learning, Data/Text Mining, Statistical Analysis and Predictive Modeling.

Implemented Event Task for executing an application automatically.

Involved in defining the source to target data mappings, business rules, and data definitions.

Assisted in continual monitoring, analysis, and improvement of the AWS Hadoop Data Lake environment.

Built and maintained dashboard and reporting based on the statistical models to identify and track key metrics and performance indicators.

Involved in fixing bugs and minor enhancements for the front-end modules.

Performed data mining and developed statistical models using Python to provide tactical recommendations to the business executives.

Integrated R into MicroStrategy to expose metrics determined by more sophisticated and detailed models than natively available in the tool.

Participated in feature engineering, including feature intersection generation, feature normalization, and label encoding with SciKit-Learn preprocessing.

Worked on outlier identification with Gaussian Mixture Models using Pandas, NumPy, and Matplotlib.

Applied feature engineering techniques to 200+ predictors in order to find the most important features for the models. Tested the models with classification methods such as Random Forest, Logistic Regression, and Gradient Boosting Machine, and performed hyper-parameter tuning to optimize the models.

University of Port Harcourt Sept 2009-January 2012

IT Associate Rivers State, Nigeria

Assisted users in initiating services.

Experience with Microsoft Exchange Migration

Open Directory and Dot Net framework.

Implemented Event Task for executing an application automatically.

Involved in defining the source to target data mappings, business rules, and data definitions.

Assisted in basic computer repair/reformat.

EDUCATION

Higher National Diploma (Bachelor Equivalent): Petroleum Engineering Technology

University of Port Harcourt, Rivers State, Nigeria

Master of Science: Business Analytics

Merrimack College, North Andover, Massachusetts

Certificate, Petroleum Data Technology

Lone Star College, Cypress, TX

Chidi O.

DATA SCIENTIST

Phone: 000-***-**** Email: ****.*********@*****.***

SUMMARY

ABOUT ME

9 Years in Data Science

12 years in Information Technology

Expertise in Machine Learning, Deep Learning, Convolutional Neural Networks

Projects involving NLP, NLU, Text Mining, Predictive Analytics, Artificial Intelligence

Techniques for structured and unstructured big data

Extensive exposure to the analytics project life cycle under CRISP-DM (Cross Industry Standard Process for Data Mining) and to web applications using Scrum methodologies.

Use machine learning to advance systems such as product recommendations, search ranking and relevance, image attribution, demand routing, fit recommendations, inventory forecasting, threat modeling, etc.

Business understanding, Data understanding, Data preparation, Modeling, Evaluation and Deployment.

Experienced in practical application of data science to business problems to produce actionable results.

Experience in Natural Language Processing (NLP), Machine Learning & Artificial Intelligence.

Experience with AWS cloud computing, Spark (especially AWS EMR), Kibana, Node.js, Tableau, Looker.

Able to incorporate visual analytics dashboards.

Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction

Knowledge of Apache Spark and of developing data processing and analysis algorithms using Python.

Programming in Java and Python, and writing SQL queries.

Use of machine learning libraries and frameworks such as NumPy, SciPy, Pandas, Theano, Caffe, SciKit-learn, Matplotlib, Seaborn, TensorFlow, Keras, NLTK, PyTorch, Gensim, Urllib, and Beautiful Soup.

Experience working in industrial or manufacturing environments around Operations Analytics, Supply Chain Analytics, and Pricing Analytics.

Ability with algorithms, data query and process automation.

Evaluation of datasets and complex data modelling.
