Data Scientist Analyst

Location:

United States

Posted:

January 19, 2023


Resume:

Bhanu Chandar Rangenine

Lead Data Scientist

Phone: +1-614-***-****

Email: *****@*****.***

LinkedIn: www.linkedin.com/in/bhanu-chandar-rangenine-84683115a

Summary:

Experienced Data Scientist with over 10 years of experience in Data Extraction, Data Modeling, Data Wrangling, Statistical Modeling, Data Mining, Machine Learning and Data Visualization.

Domain knowledge and experience in Telecom, Banking and Financial industries.

Expertise in transforming business resources and requirements into manageable data formats and analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.

Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, testing and validation, and data visualization.

Proficient in Machine Learning algorithms and Predictive Modeling, including Regression Models, Decision Trees, Random Forests, XGBoost, Sentiment Analysis, Naïve Bayes Classifier, SVM and Ensemble Models.

Proficient in Statistical Methodologies including Hypothesis Testing, ANOVA, Time Series, Principal Component Analysis, Factor Analysis, Cluster Analysis and Discriminant Analysis.

Proficient in Natural Language Processing (NLP), Text Mining, spaCy and Stanford NER.

Knowledge of time series analysis using AR, MA, ARIMA, ARCH and GARCH models.

Strong experience with Python (2.x, 3.x) to develop analytic models and solutions.

Proficient in Python 2.x/3.x with SciPy Stack packages including NumPy, Pandas, SciPy, Matplotlib and IPython.

Working experience in the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL and PySpark.

Experienced in provisioning virtual clusters on the AWS cloud using services such as EC2, S3 and EMR.

Proficient in data visualization tools such as Tableau, Python Matplotlib, R Shiny to create visually powerful and actionable interactive reports and dashboards.

Experience in building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau (9.x/10.x).

Experienced in Agile methodology and SCRUM process.

Strong business sense and ability to communicate data insights to both technical and nontechnical clients.

Technical Skills:

Statistical Methods

Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Theorem, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Autocorrelation

Machine Learning

Regression analysis, Bayesian Method, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, K-Means Clustering, KNN and Ensemble Method, Natural Language Processing (NLP)

Languages

Python (2.x/3.x), R, SAS, SQL, T-SQL

Data Visualization

Tableau, Matplotlib, Seaborn, ggplot2

Reporting Tools

Tableau Suite of Tools 10.x, 9.x, 8.x (Desktop, Server and Online), SQL Server Reporting Services (SSRS)

Databases

MySQL, PostgreSQL, Oracle, HBase, Amazon Redshift, MS SQL Server 2016/2014/2012/2008 R2/2008, Teradata

Operating Systems

PowerShell, UNIX/UNIX Shell Scripting (via PuTTY client), Linux and Windows

AWS

EC2, S3, Route 53, AWS CLI, CodePipeline, CodeDeploy

Professional Experience:

CITI Bank, Tampa, FL June 2020 – Present

Lead Data Scientist

Project: Work List Manager-Optimizing Prediction Tool

Technologies: Python, NLP, spaCy, Flask, Oracle

Business objective: WLM-OPT predicts low- and high-quality name patterns generated from the work list.

Responsibilities:

Understood the client’s requirements and the objectives of the project.

Identified the business problem and converted it into a data problem.

Processed, cleansed and verified the integrity of the data used for analysis in Python.

Extensively used the Seaborn package for data visualization.

Converted some categorical columns to Boolean columns, since most of the data took one specific value.

Used spaCy for feature extraction, converting text to vectors with spaCy vectorization.

Implemented various algorithms in Python and applied them to the datasets.

Observed that extreme gradient boosting performed better on the accuracy, precision and F1 score metrics.

Finally provided data insights and recommendations for the model.

JPMorgan Chase & Co, Tampa, FL Dec 2019 – June 2020

Lead Data Scientist

Project: Data Crawler

Technologies: Python, NLP, spaCy, Flask, AWS

Business objective: Data Crawler is an AI/ML program that predicts and generates a schema from any system-generated logs. The crawler is developed in Python using spaCy and converts structured, semi-structured and unstructured logs into a structured format.

Responsibilities:

Understand client’s requirements and objectives of the project

Understanding the business problem and converting the same into a data problem

Held daily discussions with management and the client for a smoother transition of the project.

Processed, cleansed and verified the integrity of the data used for analysis in Python.

Developed a Python program to extract system-generated logs from Kafka topics.

Developed Python connections to AWS S3 to store the logs in S3 buckets.

Performed text mining, predictive modeling and statistical modeling using the logs.

Applied Machine Learning algorithms/Advanced Analytics.

Applied spaCy NER to find entities in the logs.

Viteos Capital Market Services Ltd Jan 2019 – Dec 2019

Senior Data Scientist

Project: VU Rec Break Prediction

Technologies: Python, Flask, MongoDB

Business objective: Smart reconciliation of complex trades and positions. Viteos’s reconciliation technology workflow ensures data is collected from all external sources: prime brokers, counterparties, FCMs, custodians and administrators. The data is then run through the Break Recommendation Engine, which predicts breaks.

Responsibilities:

Applied various imputation techniques, as the data set was very large and had many missing values.

Processing, cleansing and verifying the integrity of data used for analysis in Python.

Involved in exploratory data analysis (EDA) for the given data set.

Extensively used Seaborn package for data visualization.

Converted some categorical columns to Boolean columns, since most of the data took one specific value.

Made moderate use of feature engineering techniques, converting many numerical features to categorical and vice versa depending on the situation.

Implemented various algorithms in Python and applied them to the datasets.

Used Boosting and Bagging techniques to further improve the accuracy of the algorithm.

Applied Machine Learning algorithms/Advanced Analytics

Finally provided data insights and recommendations for the model.

Viteos Capital Market Services Ltd June 2018 – Dec 2018

Senior Data Scientist

Project: Distracted Driver Detection (image analytics)

Technologies: Python, Keras, TensorFlow, CNN

Business objective: The client is a well-known insurance firm in the US and one of the fastest-growing companies in the Life Insurance sector. The company wanted to better insure its customers by testing whether dashboard cameras can automatically detect drivers engaging in distracted behaviors.

Responsibilities:

Extensively used Convolutional Neural Networks to identify the features of a distracted driver.

Used data augmentation (shear, zoom and rotation) to generate more data and control overfitting.

Classified distracted drivers by connecting the extracted features to feed-forward neural networks.

Improved the performance of a service using state-of-the-art Convolutional Neural Networks.

Built models on multiple pre-trained nets (ResNet, VGG16, DenseNet) and applied ensembling for better accuracy.

Was part of the core architecture team; experimented with multiple pre-trained nets and tuned their parameters for better accuracy.

Used dropout judiciously to control overfitting.

Stored the pre-trained weights in an .h5 file for easy trials of different algorithms.

CenturyLink Nov 2016 – June 2018

Data Scientist

Project: Customer Churn Model

Technologies: R, SQL, Tableau, Oracle

Business objective: To define and communicate the stages through which a customer progresses when considering, purchasing and using products

Responsibilities:

Understand client’s requirements and objectives of the project

Identifying Business problem and converting the same into a data problem.

Processing, cleansing and verifying the integrity of data used for analysis in R.

Involved in exploratory data analysis (EDA) for the given data set.

Applied various data visualization techniques like base plot and ggplot for better data interpretation.

Implemented various algorithms in R and applied them to the datasets.

Held daily and weekly calls with management and the client for a smoother transition of the project.

Performed text mining, predictive modeling and statistical modeling.

Applied Machine Learning algorithms/Advanced Analytics

CenturyLink Sep 2015 – Oct 2016

Data Scientist

Project: Propensity model for customer response mode

Technologies: R, SQL, Tableau, Oracle

Business objective: Build a propensity model to predict which customers will respond to a product.

Responsibilities:

Phase 1: Performed Exploratory Data Analysis, data cleaning, feature scaling and feature engineering.

Performed data sanitization, missing value treatment and outlier treatment.

Phase 2: Created dummy variables for categorical variables and binned variables for continuous variables.

Performed statistical analysis: descriptive statistics, hypothesis testing and ANOVA.

Performed feature selection by picking the most predictive features from the model.

Used variable reduction techniques to drop insignificant variables (multicollinearity).

Divided the data into training and validation datasets.

Phase 3: Built a response model at the customer level using logistic regression.

Used p-values to assess the fit of the model.

Used Boosting and Bagging techniques to further improve the accuracy of the algorithm.

Finally provided data insights and recommendations for the model.

EKA Analytics May 2014 – Aug 2015

Data Analyst

Project: Identify the NPS (Net Promoter Score) By Using Text Mining Analysis

Technologies: R, NLP

Responsibilities:

Identified customer or agent names from web chats.

Identified positive and negative words in the web chats.

Identified the most frequent or repeated words.

Identified credit amounts from web chats.

Started the project with a transfer learning approach using GloVe pre-trained weights.

Used an Embedding layer to obtain the weight matrix of the training data by embedding with the GloVe weights.

Used a bidirectional LSTM layer to improve the accuracy of the model.

Used a Dropout layer to control overfitting.

EKA Analytics Jan 2013 – May 2014

Data Analyst

Project: Next best offers for banking customers

Technologies: SAS, SQL Server, Excel

Business objective: Analyze customer banking activity to detect opportunities for personal bankers to cross-sell and up-sell.

Responsibilities:

Understood the business problem and pulled the relevant information.

Pulled together and analyzed information held in transactional systems.

Worked with 2.7 million daily customer events.

Built a predictive model to identify effective customers.

Built a recommendation engine, a type of information filtering system that attempts to present items likely to be of interest to the user.

Validated the model using methods such as cross-validation, grid search and bootstrapping.

Used validation metrics such as the KS statistic, Gini, ROC curve, sensitivity, AUC and Somers' D.

Checked model stability during the testing phases and with out-of-time validation.

Built various models and compared their performance and accuracy.

Documented the processes for future analysts to reference.
