Data Scientist

St. Charles, Missouri, United States
March 25, 2019

Data Scientist

636-***-**** e-mail id :

Professional Summary

Passionate data enthusiast with around 6 years of experience as a professionally qualified Data Scientist in Statistical modelling, Machine Learning, Data mining and Data Visualization.

Developed Statistical Machine Learning, Text Analytics and Data Mining solutions for various business scenarios and generated Data Visualizations using Python and Tableau.

Strong ability to analyze data sets for signals and patterns, and to group data in ways that answer questions and solve complex data puzzles.

Good knowledge of statistical analysis techniques such as Confidence Intervals, Hypothesis testing, ANOVA, Conjoint analysis, sentiment analysis and semantic analysis.

Comprehensive experience in developing solutions to complex business problems using Machine Learning models and presenting the results through visualizations in Python.

Adept in, with a deep understanding of, Statistical modeling, Model testing, problem analysis, model comparison, Optimization and Validation.

Constructed multiple Machine Learning models using Python's scikit-learn library and used the NumPy and Pandas libraries to work with DataFrames, dictionaries and NumPy arrays.

Strong hands-on work experience in implementing Linear and Multiple Linear Regression, Logistic Regression, SVM, Naïve Bayes, Decision Trees, Random Forest Classifiers, Natural Language Processing and K-means Clustering methods to solve various business problems.

Implemented deep learning techniques such as Artificial Neural Networks, Recurrent Neural Networks and Convolutional Neural Networks using TensorFlow, Keras and Theano to solve various business problems.

Forecasted sales, demand for loans and future values using Time-series modelling techniques such as Autoregressive, Moving Average, ARIMA and Holt-Winters.

Highly skilled in advanced Regression modeling, Time series analysis, Correlation and Multivariate analysis.

Developed visualizations to display and explain results using Python packages such as Seaborn, Matplotlib, ggplot and Pygal.

Extracted and worked with data from multiple database sources such as Oracle, SQL Server, DB2 and Teradata, and NoSQL stores such as MongoDB and Cassandra.

Presented the results of ML models in Tableau for clients and senior management.

Extensive knowledge of the Data Science Lifecycle, SDLC, Waterfall and Agile methodologies; used Agile methodologies to develop software products.

Proficient in Statistical modeling and Applied Mathematical methods, with expert knowledge of various business and engineering domains.

Forecasted behavior of mechanical, thermal and fluid systems by creating mathematical models using linear, multiple linear and non-linear Regression, and performed fault analysis of systems.

Professional, enthusiastic and self-driven leader who has led multiple teams in analyzing real-world business problems and collaborated with scientists and engineers, with a vision of adding value to the business from data through Data Science and Machine Learning techniques.

Relocation: anywhere in United States

No sponsorship required to work in the United States

Work Experience

J.C. Penney, Plano, Texas February 2017 to present

Data Scientist 1

Role Summary: J.C. Penney is an omnichannel retail chain with over 870 stores in the United States and Puerto Rico. It is investing in technology and resources to make the shopping experience easy and seamless across all channels and devices, with convenient delivery and pickup options. My job involves creating a Customer lifetime value model to reduce customer churn and performing sales forecasting.


Discussed insights obtained from data with management, assisted in making business decisions, and reduced Customer Churn by 10% within a few months of implementation.

Performed Customer segmentation based on customer behavior, demographics, transactions and customer-specific details such as age and income, and created multiple customer classes.

Constructed customer classes with historical, demographic and behavioral data as features using Random Forest Classifier and Logistic Regression to help the marketing team understand customers' purchase patterns.

Assisted the marketing team in devising a business strategy to target customers with discount coupons, deals and offers to improve customer purchases.

Identified distinct patterns in how customers respond to offers, clustered their actions using K-means, K-means++ and Hierarchical Clustering, and segmented them into different groups to help the marketing team further analyze customers' behavioral patterns.

Modeled Customer lifetime value (CLV) with Multiple Linear Regression on customers' first six months of data, identified high- and low-value segments, and helped the employer understand customers and improve customer service to retain them.

To improve revenue, proposed marketing strategies to target potential customers using their first three months of data and the regression model, from which CLV was evaluated for every new customer.

Collaborated with the risk management team and provided insights using analysis models from Python libraries such as pyfolio, empyrical, qfrm and VisualPortfolio.

Investigated large datasets to handle missing values, cleaned messy datasets and applied feature scaling to standardize range of independent variables.

Researched predictive models including Logistic Regression, Support Vector Machine (SVM) and reinforcement learning to prevent retail fraud.

Improved model performance by tuning hyper-parameters using optimization techniques such as Grid search, Random search and Bayesian optimization, and increased model efficiency using XGBoost.

Validated models using cross-validation and loss functions to measure model performance, and created Confusion Matrices and ROC and CAP curves. Addressed overfitting and underfitting by tuning hyperparameters and applying L1 and L2 Regularization.

Applied dimensionality reduction techniques such as Principal Component Analysis (PCA) to extract the most relevant features from high-dimensional data.

Forecasted sales from historical sales data using Time-series modelling techniques such as ARIMA and Holt-Winters models. Assisted the supply chain management team in meeting customer demand and maintaining stock at stores.

Visualized results using the Matplotlib and Seaborn libraries and used Tableau to present results on dashboards for team members, management and other relevant departments in the company.

Client: Wells Fargo October 2014 to November 2016

Data Scientist

Role Summary: Wells Fargo & Company is an American multinational financial services company headquartered in San Francisco, California, with central offices throughout the United States. It is the world's fourth-largest bank by market capitalization and the third-largest bank in the US by total assets. The role involved evaluating customer credit data and financial statements to determine the degree of risk involved in lending money.


Developed predictive solutions to support commercial banking team using machine learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, Support Vector Machine in Python.

Conducted analysis in assessing customer behaviors with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.

Tuned parameters with K-Fold Cross Validation and Grid search to optimize model performance.

Worked on data cleaning, data preparation and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas, and Scikit-learn.

Worked with Agile methodologies, Scrum stories and sprints in a Python-based environment, alongside data analytics and Excel data extracts.

Designed and built high-volume, real-time data ingestion frameworks and automated ingestion from various data sources into Big Data technologies such as Hadoop.

Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.

Used MySQL to create SQL tables, load data and write SQL UDFs.

Designed and optimized complex SQL queries involving table joins in MySQL.

Created daily, weekly and monthly reports in Tableau Desktop and published them to the server.

Imported and exported data between Oracle and HDFS using Sqoop.

Worked on Excel using VLOOKUP, pivots, conditional formatting, large record sets, data manipulation and cleaning.

Used GitHub for version control, a fast and lightweight way to manage source code and keep track of changes to files.

Environment: Python, MySQL, SAS, Pig, HDFS, Hive, Excel, Tableau and Git

Client: Seasonal Tastes July 2014 to September 2014

Data Scientist

Role Summary: Seasonal Tastes is a restaurant with locations in Gurgaon, Mumbai and Hyderabad that serves Chinese, Asian, International and traditional vegetarian Indian cuisine. My role involved identifying customer sentiment about food and service using reviews from various websites, to help shape advertising strategies, improve customer service and grow the customer base.


Performed sentiment analysis of customer reviews and classified each review as good, bad or neutral to understand how customers feel about the business.

Applied the Porter Stemmer from NLTK (Natural Language Toolkit) with a bag-of-words model built using the CountVectorizer class to process text data.

Created a predictive model using LSTM Recurrent Neural Networks (RNNs) to study reviews and obtain feedback on customer service, helping the employer reduce customer churn.

Experimented with other classification models such as Random Forests, Logistic Regression and Naïve Bayes to classify customer reviews.

Extracted data from the web using web scraping and text mining, and preprocessed it into a tab-separated file of reviews.

Cleaned dirty data and prepared it for feature extraction using the CountVectorizer class from scikit-learn's feature extraction module.

Automated customer service by creating a chatbot that responds to customer queries using deep learning and text processing with the NLTK library.

Evaluated model performance using a confusion matrix, classification report and accuracy score. Improved model performance using k-fold cross-validation and XGBoost, achieving model accuracy of 92%.

Developed Recommender systems using Apriori association rule learning on sales data. Recommended attractive deals and cuisines, increased the number of customers by 15%, and worked with the marketing team to devise a marketing strategy.

Demonstrated experience in the design and implementation of Statistical models, Predictive models, enterprise data models, metadata solutions and data lifecycle management in both RDBMS and Big Data environments.

Presented simple visualizations of results using Python's Seaborn library.

Increased client business by 10% in six months by efficiently transforming customer service based on feedback obtained from sentiment analysis.

Client: Westpac Banking Corporation, India August 2013 to June 2014

Data Analyst

Role Summary: Westpac is an Australian bank and financial-services provider with 14 million customers and almost 40,000 employees. The job involved collecting data from various sources and passing it through Informatica workflows to store it in a data warehouse. The project involved data correction and business logic implementation using PL/SQL and scripting languages such as Shell.


Acquired data from primary and secondary data sources and maintained databases and data systems.

Set up new client data and prepared it for entry into the new platform.

Loaded data by converting CSV files into the corresponding database tables.

Worked with management team to create prioritized list of needs for each business segment.

Monitored and resolved data flow issues on a daily basis. Also created views for the reporting team to use for daily marketing numbers.

Collaborated with the reporting team to resolve data discrepancies and apply logical data corrections across reports.

Generated ad-hoc Tableau reports from Excel sheets, flat files and CSV files.

Used data mining techniques for outlier detection and created an algorithm to identify connections between customer trends.

Created software solutions within a Software Development Lifecycle (SDLC) and Agile environment.

Performed computational tasks on data by creating Pig, Hive and MapReduce scripts to access and transform data in HDFS.

Developed and implemented metadata models for reporting functionalities and developed automated process for data corrections.

Wrote SQL, NoSQL and PL/SQL scripts to extract data from databases and for testing purposes.

Reviewed logical model with application developers, ETL team, DBAs, and testing team to provide information about data model and business requirements.

Identified and logged defects when tests failed, using SQL to narrow down the root cause for efficient investigation by the development team.

Used advanced Excel functions to generate spreadsheets and pivot tables.


Master's: Computer and Information Sciences

Bachelor's: Electronic and Computer Sciences

References available upon request
