
Data Scientist

St. Charles, Missouri, United States
March 21, 2019





MOBILE: +1-913-***-****

Data Scientist with a strong mathematical background, proficient in Natural Language Processing, Machine Learning and Deep Learning. Using statistics, I have identified, described, processed and implemented algorithms to solve challenging business problems. I prioritize learning new techniques and tools, as the field of data science is ever evolving.


Over 6 years of experience in Machine Learning, Deep Learning, Data Mining with large datasets of structured and unstructured data, Data Validation, Data Acquisition, Data Visualization and Predictive Modeling; developed predictive models that help provide intelligent solutions.

Experience with statistical programming languages such as R and Python.

Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating Data Visualizations using R and Python.

Hands-on experience with Customer Churn, Sales Forecasting, Market Mix Modeling, Customer Classification, Survival Analysis, Sentiment Analysis, Text Mining and Recommendation Systems.

Experience in using Statistical procedures and Machine Learning algorithms such as ANOVA, Clustering, Regression and Time Series Analysis to analyze data for further Model Building.

Strong mathematical knowledge and hands-on experience in implementing Machine Learning algorithms such as K-Nearest Neighbors, Logistic Regression, Linear Regression, Naïve Bayes, Support Vector Machines, Decision Trees, Random Forests, Gradient Boosted Decision Trees and Stacking Models.

Expertise in unsupervised Machine Learning algorithms such as K-Means, Density-Based Clustering (DBSCAN) and Hierarchical Clustering, and strong knowledge of Recommender Systems.

Hands-on experience in implementing Dimensionality Reduction Techniques such as Truncated SVD, Principal Component Analysis and t-distributed Stochastic Neighbor Embedding (t-SNE).

Proficient in advising on the use of data for compiling personnel and statistical reports and preparing personnel action documents, identifying patterns within data, analyzing data and interpreting results.

Good knowledge of Deep Learning concepts such as Multi-Layer Perceptrons, Deep Neural Networks, Artificial Neural Networks, Convolutional Neural Networks and Recurrent Neural Networks.

Hands-on experience with Deep Learning Techniques such as Back Propagation, Choosing Activation Functions, Weight Initialization based on Optimizer, Avoiding Vanishing Gradient and Exploding Gradient Problems, Using Dropout, Regularization and Batch Normalization, Gradient Monitoring and Clipping, Padding and Striding, Max Pooling and LSTM.

Experience in using Optimization Techniques such as Gradient Descent, Stochastic Gradient Descent, Adam, Adadelta, RMSprop and Adagrad.

Experience in building models with Deep Learning frameworks such as TensorFlow and Keras.

Actively involved in all phases of the data science project life cycle, including Data Extraction, Data Cleaning, Data Visualization and Model Building.

Extensive hands-on experience and high proficiency in writing complex SQL, including stored procedures, triggers, joins and subqueries; also used MongoDB for data extraction.

Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, MapReduce concepts, and ecosystems including Hive and Pig.

Experience with data visualization using tools such as ggplot2, Matplotlib, Seaborn and Tableau, and in using Tableau to publish and present dashboards and storylines on web and desktop platforms.

Experienced in Python data manipulation for loading and extraction, and in using Python libraries such as NumPy, SciPy and Pandas, along with Spark 2.0 (PySpark, MLlib), to develop a variety of models and algorithms for analytic purposes.

Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments.

Proficient in Mathematical Matrix Operations, Statistics, Linear Algebra, Probability, Differentiation, Integration and Geometry.

Extensive experience working in Test-Driven Development and Agile-Scrum environments.

Experience in using the Git version control system.

Excellent initiative and innovative thinking skills, with the ability to analyze details while keeping a big-picture view; excellent organizational, project management and problem-solving skills.


Client: Starbucks, Seattle, WA

Jan 2018 – Present

Role: Data Scientist

Description: Starbucks Corporation is an American coffee company and coffeehouse chain. Starbucks was founded in Seattle, Washington in 1971. As of 2018, the company operates 28,218 locations worldwide. This project involved creating statistical machine learning models for product sales forecasting, customer segmentation, Customer Lifetime Value prediction and Customer Churn.


Gathered and reviewed business requirements and analyzed data sources.

Involved in various pre-processing phases of text data like Tokenizing, Stemming, Lemmatization and converting the raw text data to structured data.

Performed data collection, data cleaning, feature scaling, feature engineering and validation; visualized, interpreted and reported findings; and developed strategic uses of data with Python libraries such as NumPy, Pandas, SciPy and Scikit-Learn.

Implemented various statistical techniques to manipulate the data, such as missing data imputation, principal component analysis and sampling, and used t-SNE for visualizing high-dimensional data.

Worked with Customer Churn Models including Random forest regression, lasso regression along with pre-processing of the data.

Constructed a vocabulary to convert text data into numeric form for machine processing, using approaches such as Bag of Words, TF-IDF, Word2Vec and Average Word2Vec.

Used testing methods such as A/B testing and multivariate testing to measure the impact of new initiatives.

Applied Clustering Algorithms such as K-Means to categorize customers into certain groups.

Worked with sales forecast and campaign sales forecast models such as ARIMAX, Holt-Winters and Vector Autoregression (VAR).

Developed deep learning NLP algorithms for analyzing text, improving on the existing dictionary-based approaches.

Evaluated models using cross-validation, log loss and ROC curves, used AUC for feature selection, and worked with Elastic technologies such as Elasticsearch and Kibana.

Implemented an LSTM network of moderate depth in TensorFlow to capture the information in sequences.

Created a distributed TensorFlow environment across multiple devices (CPUs and GPUs) and ran them in parallel.

Implemented machine learning algorithms such as Logistic Regression, SoftMax Classifier, Random Forest and Decision Trees.

Client: Commonwealth Bank of Australia, Sydney

Jul 2015 – Dec 2017

Role: Data Scientist

Description: Commonwealth Bank of Australia is an Australian multinational bank with businesses across New Zealand, Asia, the United States and the United Kingdom. It provides a variety of financial services including retail, business and institutional banking, funds management, superannuation, insurance, investment and broking services. This project involved customer lifetime value (CLV) modelling, sorting customers into levels with different credit ranges, and increasing their credit limits based on their credit history and credit card usage. It also included fraud detection, customer analytics, NLP tasks, ticket routing techniques, etc.


Performed Data Profiling to learn about behavior across features such as traffic pattern, location, date and time. Integrated with external data sources and APIs to discover interesting trends.

Worked on personalization, target marketing, customer segmentation and profiling.

Performed data cleaning, feature scaling, featurization and feature engineering.

Used Pandas, NumPy, SciPy, Matplotlib, Seaborn and Scikit-learn in Python at various stages of developing machine learning models, and utilized machine learning algorithms such as Linear Regression, Naive Bayes, Random Forests, Decision Trees, K-Means and KNN.

Segmented customers by behavior or specific characteristics such as age, region, income and geographical location, applying clustering algorithms to group customers with similar behavior patterns.

The results of the segmentation help to learn the Customer Lifetime Value of every segment, discover high-value and low-value segments, and improve customer service to retain customers.

Analyzed and implemented several research proof-of-concept models for real-time fraud detection over credit card and online banking purchases.

Worked with credit analysis and risk modeling algorithms to implement customer acquisition strategies in the real-time business.

Studied and implemented fraud detection models to monitor unusual purchases across the customer base and alert customers with updates.

Performed clustering with historical, demographic and behavioral data as features to implement personalized marketing that offers the right product to the right person at the right time on the right device.

Evaluated models using cross-validation, measured performance with log loss, and used ROC curves and AUC for feature selection.

Used Principal Component Analysis and t-SNE in feature engineering to analyze high-dimensional data.

Addressed overfitting and underfitting by tuning the hyperparameters of the algorithms and by using L1 and L2 regularization.

Used Spark's Machine learning library to build and evaluate different models.

Client: Mindtree, Hyderabad, Telangana

Feb 2014 – Jun 2015

Role: Data Scientist

Description: Mindtree is one of the fastest growing digital, business consulting and technology service firms. Mindtree has more than 20,000 employees. I worked on various in-house projects that handled customer analytics, NLP tasks, OCR models, etc.


Developed Python scripts for day-to-day business activities.

Worked with various regression algorithms such as Random Forest Regression, Decision Tree Regression, Polynomial Regression, Binomial Regression and Support Vector Machines to forecast machinery failures in the automotive industry.

Worked with various customer analytics such as Customer targeting, campaign sales analysis, KPI analysis, forecasting sales, NLP models.

Worked on Personalized marketing models to implement simplicity and targeted marketing for specific customers.

Worked with clustering algorithms to target specific groups of customers to generate profitable revenue.

Used market basket analysis and association rules analysis to identify patterns and data quality issues and to leverage insights.

Used Convolutional Neural Network (CNN) to perform image classification and object detection.

Worked on huge data sets and developed predictive models with machine learning techniques.

Developed statistical models by leveraging best-in-class modeling techniques.

Acquired image dataset of products from different data sources and aggregated into one dataset on Amazon Redshift.

Worked with OpenCV, pytesseract and other image processing techniques to analyze the text content of scanned documents.

Created models to segment images into different feature regions to implement text extraction per business requirements.

Converted unstructured pure text consumer comments data to structured dataset using NLP techniques and feature engineering.

Created a text classification model using RNN and LSTM with TensorFlow.

Explored and visualized the data to obtain descriptive and inferential statistics for a better understanding of the dataset.

Built predictive models using Python scikit-learn, including Support Vector Machine, Decision Tree, Naive Bayes Classifier and Neural Network models plus ensembles of these models, to evaluate how the likelihood to recommend of customer groups would change under different sets of services.

Implemented the training process using cross-validation and test sets, evaluated the results based on different performance metrics, collected feedback and retrained the model to improve performance.

Client: CSC Technologies - Hyderabad, Telangana

Dec 2012 – Jan 2014

Role: Data Analyst

Description: Participated in a project that integrated local and third-party designs to create solutions to project problems defined by the business requirements. As an Integration Developer, applied technical and data processing knowledge to solve moderately complex marketing and data manipulation problems on very large volumes of data. With extensive resources and large project-oriented designs, we integrated solutions that were innovative and practical.


Communicated effectively, both verbally and in writing, with the client team.

Completed documentation on all assigned systems and databases, including business rules, logic, and processes.

Created test data and test case documentation for regression and performance testing.

Designed, built, and implemented relational databases.

Determined changes in physical database by studying project requirements.

Developed intermediate business knowledge of the functional area and its processes to understand how data is applied to support business functions.

Facilitated the gathering of moderately complex business requirements by defining the business problem.

Utilized SPSS statistical software to track and analyze data.

Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.

Used advanced Microsoft Excel to create pivot tables, used VLOOKUP and other Excel functions.

Successfully interpreted data to draw conclusions for managerial action and strategy.

Created Data chart presentations and coded variables from original data, conducted statistical analysis as and when required and provided summaries of analysis.

Maintained the data integrity during extraction, manipulation, processing, analysis and storage.


Master of Computer Information Systems and Information Technology, University of Central Missouri

Bachelor of Electronics and Communication Engineering, JNTUH


Salesforce Certified Platform Developer I Credential ID: 17810079

Salesforce Certified Administrator (SCA) Credential ID: 17371963



C#, VB.NET, ASP.NET, JavaScript, R, Python, SFDC, Visualforce


SQL Server, Oracle 11g, MS-Access.


Matrix Operations, Differentiation, Integration, Probability, Statistics, Linear Algebra, Geometry.

Machine Learning Algorithms

Logistic Regression, Linear Regression, Support Vector Machines, Decision Trees, K-Nearest Neighbors, Random Forests, Gradient Boosted Decision Trees, Stacking Classifiers, Cascading Models, Naive Bayes, K-Means Clustering, Hierarchical Clustering and Density-Based Clustering.

Machine Learning Techniques

Principal Component Analysis, Truncated SVD, Data Standardization, L1 and L2 Regularization, Loss Minimization, Hyperparameter Tuning, Performance Measurement of Models, Featurization and Feature Engineering, Content-Based and Collaborative Filtering, Matrix Factorization, Model Calibration, Productionizing Models, A/B Testing, Point and Interval Estimation, Hypothesis Testing, Cross Validation, Decision Surface Analysis, Periodic Model Retraining, t-distributed Stochastic Neighbor Embedding (t-SNE).

Deep Learning

Artificial Neural Networks, Convolutional Neural Networks, Multi-Layer Perceptrons, Recurrent Neural Networks, LSTM, GRU, SoftMax Classifier, Back Propagation, Chain Rule, Choosing Activation Functions, Dropout, Optimization Algorithms, Vanishing and Exploding Gradient, Striding, Padding, Optimized Weight Initializations, Gradient Monitoring and Clipping, Batch Normalization, Max Pooling.


Agile, Waterfall
