Data scientist/Data analyst

Location:
Austin, TX
Salary:
65
Posted:
March 05, 2021

Resume:

PROFESSIONAL SUMMARY:

Data Scientist / Data Science professional with 8+ years of experience in acquiring appropriate datasets, scrubbing data to isolate the target data, engineering features using statistical techniques, performing exploratory data analysis, building diverse machine learning algorithms for predictive models, and designing visualizations that support business profitability.

Extensive experience in Machine Learning solutions to various business problems and generating data visualizations using Python and R.

Proficient in identifying trends and discovering insights from high-dimensional data sets using a variety of supervised and unsupervised algorithms.

Extensive work experience with the Python and R stack, including libraries such as scikit-learn, pandas, NumPy, Scrapy, dplyr, ggplot2, seaborn and matplotlib.

Experience in Deep Learning using libraries such as Theano, TensorFlow and Keras.

Hands-on experience working with Amazon Web Services (AWS) offerings such as EC2, S3, CloudWatch, VPC, CloudFront, CloudFormation, Amazon Redshift, Amazon EMR and Amazon SQS.

Expertise in Machine Learning models such as Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, clustering (K-means, Hierarchical) and Bayesian methods.

Experience deploying models using Azure ML Studio, and familiarity with Spark, Hadoop MapReduce, Statistical Inference, Applied Machine Learning, Data Visualization, Big Data, Time Series, Categorical Data Analysis and Database Design.

Experience in Data manipulation, wrangling, model building and visualization with large data sets.

Skilled at performing Data Extraction, Data Screening, Data Cleaning, Data Exploration, Data Visualization and Statistical Modelling of varied datasets, structured and unstructured, as well as implementing large-scale Machine Learning and Deep Learning Algorithms to deliver resourceful insights.

Hands-on experience in the entire Data Science project life cycle, including Data Acquisition, Data Cleaning, Data Manipulation, Data Mining, Machine Learning Algorithms, Data Validation and Data Visualization.

Knowledge of Deep Learning and Artificial Neural Networks such as Convolutional Neural Networks, Recursive Neural Networks and Recurrent Neural Networks.

Experienced in Machine Learning techniques for Regression and Classification, such as Linear Regression, Logistic Regression, Decision Trees and Support Vector Machines, using scikit-learn in Python.

In-depth Knowledge of Dimensionality Reduction (PCA, LDA), Hyper-parameter tuning, Model Regularization (Ridge, Lasso, Elastic net) and Grid Search techniques to optimize model performance.

Skilled at Python, SQL, R and Object Oriented Programming (OOP) concepts such as Inheritance, Polymorphism, Abstraction, Encapsulation.

Working knowledge of database creation and maintenance of physical data models with Oracle, DB2 and SQL Server databases, as well as normalizing databases up to third normal form (3NF) using SQL functions.

Experience in Web Data Mining with Python's Scrapy and BeautifulSoup packages, along with working knowledge of Natural Language Processing (NLP) to analyze text patterns.

Skilled in Big Data Technologies like Apache Spark (PySpark, Spark Streaming, MLlib), Hadoop Ecosystem (MapReduce, HDFS, HIVE, Kafka, Ambari).

Proficient in Ensemble Learning using Bagging, Boosting (AdaBoost, XGBoost) and Random Forests, and in clustering methods such as K-means.

Experience in developing Supervised Deep Learning algorithms, including Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, LSTM and GRU, and Unsupervised Deep Learning techniques such as Self-Organizing Maps (SOMs), in Keras and TensorFlow.

Experience building machine learning solutions using PySpark for large data sets on the Hadoop ecosystem.

Built and deployed an LSTM recurrent neural network architecture in one project to improve model accuracy; also knowledgeable in Deep Learning approaches such as traditional Artificial Neural Networks and Convolutional Neural Networks.

Skilled at Data Visualization with Tableau, Power BI, Seaborn, Matplotlib, ggplot2 and Bokeh, and at building interactive graphs using Plotly and Cufflinks.

Knowledge of Cloud services like Amazon Web Services (AWS) and Microsoft Azure ML for building, training and deploying scalable models.

Highly proficient in using T-SQL to develop complex Stored Procedures, Triggers, Tables, Views, User Functions, User Profiles and Relational Database Models, and to enforce data integrity through SQL joins and query writing.

Proficient in using PostgreSQL, Microsoft SQL Server and MySQL to extract data using multiple types of SQL queries, including Create, Join, Select, Conditionals, Drop and Case.

Hands-on experience in Machine learning algorithms such as Linear Regression, Logistic Regression, Decision Tree (CART), Random Forest, SVM, K-Nearest Neighbors, Naïve Bayes, K-means Clustering, Principal Components Analysis and more.

Professional knowledge of deep learning algorithms such as Artificial Neural Network (ANN), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN).

Experienced in Data Integration, Validation and Data Quality controls for ETL process and Data Warehousing using Informatica, MS Visual Studio, SSMS, SSIS and SSRS.

Proficient in model validation and optimization using k-fold cross validation, ROC curve, confusion matrix and F1 score.

Strong skills in statistical methodologies such as hypothesis testing, ANOVA, Monte Carlo simulation, principal component analysis, correspondence analysis, ARIMA time series analysis and structural equation modeling.

Excellent experience in Python, using the packages pandas, NumPy, datetime, matplotlib, seaborn, scikit-learn, SciPy, statsmodels and PySpark for data cleaning, data manipulation, data mining, machine learning, data validation and data visualization.

Strong computational background (complemented by Statistics/Math/Algorithmic expertise), solid understanding of machine learning algorithms, and a love for finding meaning in multiple imperfect, mixed, varied and inconsistent data sets.

Knowledge of and experience in extracting information from text data using Natural Language Processing (NLP) methods such as Bag of Words, Sentiment Analysis, TF-IDF and Topic Modeling using LDA.

Solid ability to write and optimize diverse SQL queries, proficiently using MySQL, Oracle, Access and SQL Server.

Ability to effectively organize and manage multiple assignments, with excellent analytical and problem-solving skills.

EDUCATION

UNIVERSITY OF NEW HAMPSHIRE, Durham, NH

Master of Science – Analytics

UNIVERSITY OF LAGOS, Lagos, NG 2009

Bachelor of Science - Mechanical Engineering

TECHNICAL SKILLS

Languages

Java 8, Python, R

Python and R

NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, Seaborn, ggplot2, caret, dplyr, purrr, readxl, tidyr, RWeka, gmodels, RCurl, C50, twitteR, NLP, reshape2, rjson, plyr, Beautiful Soup, rpy2

Algorithms

Kernel Density Estimation and Non-parametric Bayes Classifier, K-Means, Linear Regression, Neighbors (Nearest, Farthest, Range, k, Classification), Non-Negative Matrix Factorization, Dimensionality Reduction, Decision Tree, Gaussian Processes, Logistic Regression, Naïve Bayes, Random Forest, Ridge Regression, Matrix Factorization/SVD

NLP/Machine Learning/Deep Learning

LDA (Latent Dirichlet Allocation), NLTK, Apache OpenNLP, Stanford NLP, Sentiment Analysis, SVMs, ANN, RNN, CNN, TensorFlow, MXNet, Caffe, H2O, Keras, PyTorch, Theano, Azure ML

Cloud

Google Cloud Platform, AWS, Azure, Bluemix

Web Technologies

JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

Data Modeling Tools

Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

Big Data Technologies

Hadoop, Hive, HDFS, MapReduce, Pig, Kafka

Databases

SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.

Reporting Tools

MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

ETL Tools

Informatica PowerCenter, SSIS.

Version Control Tools

SVN, GitHub

BI Tools

Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Operating System

Windows, Linux, Unix, Macintosh, Red Hat

PROFESSIONAL EXPERIENCE:

Kaplan Test Prep - New York, NY Nov 2019 – Present

Data Scientist

Kaplan, Inc. is an American for-profit corporation that provides educational services to colleges, universities, corporations and businesses, including higher education programs, professional training and certifications, test preparation and student support services.

Responsibilities:

Implement statistical models and apply machine learning tools to evaluate and improve educational experiences within Kaplan’s products.

Collaborate iteratively with stakeholders to align business needs with data collection, evaluation, analysis and presentation.

Participated in all phases of data mining, data collection, data cleaning, developing models, validation and visualization to deliver data science solutions.

Developed advanced analytical models and computational solutions using large-scale data manipulation and transformation, statistical analysis, machine learning and visualization.

Collaborated with data engineers and the operations team to implement ETL processes; wrote and optimized SQL queries to extract data that fit the analytical requirements.

Performed EDA to gain insight into business problems.

Applied several machine learning/deep learning models that take data and make recommendations based on the model predictions.

Created end-to-end machine learning pipelines that take data from a database, perform transformation and pre-processing on the data either at query level or after the query, get model predictions, write results to a table and alert users if necessary.

Generate reports to meet regulatory requirements.

Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.

Used Deep Learning frameworks such as MXNet, Caffe2, TensorFlow, Theano, CNTK and Keras to help customers build DL models.

Created statistical models, both distributed and standalone, to build various diagnostic, predictive and prescriptive solutions.

Performed data imputation using Scikit-learn package in Python.

Apply appropriate experimental design methodologies to answer product questions.

Rockefeller Capital Management - New York, NY May 2017 - Oct 2019

Data Scientist

Rockefeller Capital Management is a full-service financial services firm that grew out of the family office for the storied Rockefeller family. The firm caters to families managing intergenerational wealth, and it specializes in investments selected through an environmental, social and governance (ESG) lens. It currently oversees more than $5.4 billion in assets under management (AUM).

Responsibilities:

Used machine learning and intuitive design to build powerful tools, grounded in value and simplicity, that help investors make smarter decisions about their money.

Executed data processing including statistical analysis, variable selection, dimensionality reduction and custom attribute engineering, as well as the evaluation of new data sources.

Performed time sensitive ad hoc analyses on a broad range of challenges across the company.

Created detailed models to evaluate consumer behavior with a view to streamlining business processes.

Developed technical competence within the company in the areas of data science and customer experience, resulting in increased employee productivity and cost savings to the company.

Developed stock trading models to identify stock trading signals in real time.

Analyzed large stock datasets and performed quantitative analysis of stock features to develop the trading algorithms.

Used Pandas, NumPy, SciPy, Matplotlib, Scikit-learn and NLTK in Python for developing various machine learning algorithms.

Tuned machine learning models, including Bayes point, logistic regression, decision tree and neural network models, for accuracy; deployed the prediction models and evaluated them on the test data.

Visualized, interpreted and reported findings, and developed strategic uses of data, with Python libraries such as NumPy, Scikit-learn, Matplotlib and Seaborn.

Used Python 3.x (NumPy, SciPy, pandas, Scikit-learn, seaborn) and R (caret, trees, arules) to develop a variety of models and algorithms for analytic purposes.

Performed data cleaning, feature scaling and feature engineering.

Performed missing value treatment, outlier capping and anomaly treatment using statistical methods, and derived customized key metrics.

Performed analysis using industry leading text mining, data mining and analytical tools and open source software.

Applied natural language processing (NLP) methods to data to extract structured information.

Implemented deep learning algorithms such as Artificial Neural Network (ANN) and Recurrent Neural Network (RNN), tuned hyperparameters and improved models with the Python package TensorFlow.

Evaluated models using Cross Validation, ROC curves and used AUC for feature selection.

Created dummy variables for certain datasets to feed into the regression.

Created data pipelines using big data technologies such as Hadoop and Spark.

To predict stock performance, analyzed individual stocks and their corresponding subreddits, identifying common recurring topics and how public sentiment toward those topics changed over time.

Used clustering to separate post titles into groups, measured sentiment toward a post by its upvote-to-downvote ratio, and used XGBoost and Logistic Regression models to predict the stock market.

Developed a visualization tool to automatically generate stock transaction reports with stock and performance charts.

Worked with Python to create a backtesting simulation of algorithm performance, which was reproduced in Power BI for visualization.

Worked extensively with SQL for data manipulation and applied analytical inferences to meet core business needs.

Made substantial use of Excel, especially Pivot Tables, to present results and derive conclusions, particularly for finance and client-facing work.

Presented results and reports to senior board members, particularly with visual tools such as Tableau.

Responsible for implementing the company's KPI reporting and distributing it to the board and other levels of management.

Trained and mentored employees from international offices in monitoring business systems and applications.

Integrated new non-traditional datasets with incremental predictive power into existing investment management processes.

Santander - Boston, MA Jan 2016 - May 2017

Sr Data Scientist

The project was to build machine learning models in Python for credit risk modeling, predicting the Probability of Default (PD), Loss Given Default (LGD) and Pre-Provision Net Revenue (PPNR) to facilitate the development of loan assessment criteria.

Responsibilities:

Collaborated with data engineers and the operations team to implement ETL processes; wrote and optimized SQL queries to extract data that fit the analytical requirements.

Exported the data set from Hive to MySQL using Sqoop after processing the data.

Used Python packages to build Machine Learning models.

Conducted multivariate analysis on the data to identify underlying patterns and relationships within the data.

Performed exploratory data analysis using the pandas and NumPy packages in Python 3.x.

Designed and completed Tableau dashboards to illustrate analytical results of dataset.

Used Scikit-Learn and statsmodels to develop classification algorithms such as Logistic Regression, Random Forest and Support Vector Machine models that help in decision making.

Used Scikit-Learn and statsmodels to develop regression algorithms such as Linear Regression, Decision Tree, Random Forest models that help in decision making.

Performed data pre-processing and feature engineering for further predictive analytics using Python.

Boosted the performance of regression models by applying polynomial transformation and feature selection.

Generated report on predictive analytics using Matplotlib and Seaborn including visualizing model performance and prediction results.

Worked in the Power BI environment to create daily, weekly and monthly reports and publish them to the server.

Evaluated the performance of classification algorithms such as Logistic Regression, Decision Trees and Random Forest with k-fold cross validation, using Pandas and Scikit-learn.

Implemented, tuned and tested the model on AWS EC2 with the best algorithm and parameters.

Prominent Securities Limited - Lagos, NG May 2010 - Oct 2014

Python Developer/Data Analyst

Prominent Securities Limited was born out of the perceived need to enhance the accessibility of industry, commerce and individuals to capital, whether loan or equity, and to high-grade investments. Their client base spans the entire spectrum of the investing public, ranging from individual clients to corporate and institutional investors.

Responsibilities:

Managed and executed the successful delivery of statistical models and reporting tools to meet business needs.

Analyzed historical policies that were scored based on whether they were priced too low/too high resulting in more/less profit. Created a predictive algorithm for the score.

Efficiently interpreted results and communicated findings and potential value to managers and business partners.

Performed data pre-processing and feature engineering for further predictive analytics using Python.

Performed data collection, data cleaning, data profiling, data visualization and report creation.

Performed data cleaning on PySpark data frames that had missing data and extreme outliers, and explored the data to draw relationships and correlations between variables.

Set up a database in AWS using RDS and configured backups for an S3 bucket.

Implemented data pre-processing using Scikit-Learn. Steps included imputation for missing values, scaling, logarithmic transformation and one-hot encoding.

Integrated solutions within existing business processes using automation techniques.

Managed complex big data projects: data exploration, model building, performance evaluation and testing.

Interfaced with various stakeholders across multiple business units to understand their requirements, and developed new solutions or enhanced existing ones to meet their needs.


