AVINASH KUSTAGI
**** ****** **, ********, ** ****5 +1-816-***-**** skype: amkustagi
*******.*******@*****.*** www.linkedin.com/in/kustagi https://github.com/amkustagi
PROFESSIONAL SUMMARY
Data Science professional with 5 years of experience in the analytics and IT industry, with an emphasis on the Retail, Beverage and Services domains. Handled large volumes of structured, semi-structured and unstructured data to deliver innovative products and services for Fortune 500 clients.
Certified Google TensorFlow and Microsoft Azure professional.
Worked on many data analytics projects for clients such as Coca-Cola, Red Lobster and Essilor.
Experienced in applying data mining, machine learning, natural language processing and time series modeling techniques in the retail and service domains.
Hands-on experience in natural language processing, topic extraction, machine learning algorithms and time series forecasting techniques.
Hands-on experience with exploratory data analysis, data cleaning, visualization and statistical modeling using Python 2.7, RStudio and Tableau.
Highly conversant with machine learning models such as Linear and Logistic Regression, Decision Trees, Random Forest, Support Vector Machines, K-Means and K-Nearest Neighbors.
Handled non-traditional data sources (Facebook, Twitter streaming API, web scraping of Yelp.com).
Good understanding of distributed computing with Azure HDInsight Cluster. Used an HBase Hadoop cluster with Hive, Pig and Oozie for analytics.
Experience in handling large volumes of cloud data using Azure HDInsight Cluster, Blob Storage and Azure Machine Learning.
Solid exposure to querying relational databases such as MS SQL Server and DB2, and NoSQL databases such as HBase.
Extensively used open source tools (RStudio, Python and WEKA) for analysis and for building machine learning models.
Hands-on experience with advanced data analysis models, including linear and constraint optimization models, across multiple brands.
Experienced in allocating the marketing budget for Coca-Cola brands. Very good understanding of applying time series models to forecast sales and revenue data and to uncover anomalies and hidden trends in time series data.
PROFESSIONAL SKILLS
Languages
Python & R
Azure
Machine Learning platform, HDInsight Cluster, Spark Cluster
Big Data
Apache Spark 2.2+, Spark SQL, Hive, Pig, Sqoop, Oozie, Hadoop
Analysis
Predictive Analytics, Forecasting, Supervised Learning, Unsupervised Learning, Time Series Modeling, Constraint Optimization Models, PCA, Factor Analysis, Conjoint Analysis, Correspondence Analysis, Market Basket Analysis, Churn Prediction
Relational Databases
MS SQL Server, DB2
NoSQL
HBase
Tools
Tableau 9.3+, AWS EC2, Excel, PowerPoint
Data Science Libraries
R Packages
ggplot2, caret, dplyr, tidyr, fpp, forecast, party, etc.
Python Libraries
Pandas, NumPy, SciPy, NLTK, Beautiful Soup, Scikit-Learn, MLlib, TensorFlow, Keras
PROFESSIONAL CERTIFICATIONS
R & Statistics – Harvard
Machine Learning with TensorFlow on Google Cloud Platform
Processing Big Data with Hadoop in Azure HDInsight – Microsoft
Implementing Real-Time Analytics with Hadoop in Azure HDInsight – Microsoft
Certification in Marketing and Operations Research Management
PROFESSIONAL EXPERIENCE
Titan Data Group Inc St Paul, MN Jan’17 to Present
Role: Data Science Analyst
As a Data Science Analyst, I am responsible for developing products in the Machine Learning and AI space. I work with a team of Data Scientists to build an AI matching engine, chatbots and advanced ad-hoc analytical reports for large retail, mid-sized bank, insurance and airline clients.
Responsibilities:
Building Machine Learning enabled chatbots to revitalize the customer experience at centers. Using conversational transcripts to analyze and predict customer sentiment and enhance the user experience.
Building an AI matching engine using Open API data sources to find the best-matching candidate profiles
Configuring an Azure HDInsight Spark cluster (~5 worker nodes) in the Azure environment and applying unsupervised learning techniques to understand customers’ buying behavior
Conducting exploratory analysis with descriptive statistics (box plots, scatter plots, heat maps) to understand customers’ needs and insurance subscriptions
Using Logistic Regression to predict hot prospects for insurance sales
Integrating data from various sources (Sales & Marketing, CRM) using PySpark (Python API for Spark)
Creating interactive Tableau visualizations to show the impact of various industry attributes on sales
Environment: R, Python, Azure HDInsight, Azure ML, AWS, NLP, Tableau, SQL, S
Analytics Quotient Services Bangalore, INDIA (Part of Millward Brown) May’12-Dec’15
Analytics Quotient Services is a marketing analytics company that unravels the story behind data. It specializes in building innovative data visualization tools and custom analytical products, tailored to client needs, to slice, dice and simulate business data. We used R and Python extensively to build descriptive, prescriptive and predictive models.
Role: Data Scientist
Responsibilities:
•Building analytics frameworks
•Responsible for data collection, data understanding, data preparation and data normalization
•Mentoring and collaborating with 1 or 2 Analysts and Trainee Analysts to deliver projects
•Analyzing and processing data using Excel, R and Python
•Working with Knowledge and Insights directors and presenting analytical findings using dynamic Tableau dashboards, PowerPoint templates and ggplot2 visualizations
•Applied Natural Language Processing techniques to consumer comments to perform Sentiment Analysis
•Worked on data analysis projects for Essilor, Red Lobster and Coca-Cola
•Price optimization and budget allocation using constraint optimization algorithms and time series (ARIMA) forecasting techniques
Project 1:
Overall Satisfaction Analysis for Red Lobster Using Regression Models – Used R for Analysis
This was a descriptive statistics project for the Red Lobster brand. It involved identifying which factors influence consumers’ overall satisfaction. Overall satisfaction was rated on a scale of 1 to 5, with 1 being the least satisfied and 5 the most satisfied. The independent factors included quality of food, ambiance, location, waiting time, etc., any of which could play a vital role in the overall result.
We used survey results from the SMG (Service Management Group) database to analyze the impact on overall customer satisfaction, and ordinal logistic regression to explain the importance of each feature. The business equation looks like this:
Overall Satisfaction Rating = food quality + ambiance + time spent + amount spent + cleanliness + service + taste + piping hot + parking + delivery + location + waiting time + etc.
The analysis involved predicting the overall satisfaction (an ordinal rating) by analyzing the impact of each independent factor in explaining the output.
Packages used: MASS package, for ordinal logistic regression modeling.
Analysis Outcome: I repeated the same experiment for different locations and analyzed the driving factors for each. As expected, customers at restaurants near downtown were more concerned about waiting time and ambiance. In contrast, customers in the suburbs were more concerned about food taste, delivery time, distance to travel, etc.
Environment: RStudio for analysis, Tableau for data visualization and the MASS package for the ordinal logistic regression model.
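For illustration, the proportional-odds form of ordinal logistic regression (the model family fitted by `polr` in the MASS package) can be sketched in plain Python. This is a minimal sketch, not the project code: the function name `category_probabilities`, the two drivers, the coefficients and the thresholds are all made-up assumptions, not results from the Red Lobster analysis.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(features, coefficients, thresholds):
    """Return P(rating = 1..K) under a proportional-odds model.

    P(rating <= j) = sigmoid(threshold_j - sum(coef_i * x_i)),
    with K - 1 thresholds sorted in increasing order.
    """
    linear = sum(c * x for c, x in zip(coefficients, features))
    cumulative = [sigmoid(t - linear) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        # Each category probability is a difference of cumulative probabilities.
        probs.append(cum - previous)
        previous = cum
    return probs

# Hypothetical inputs: two standardized drivers (e.g. food quality, waiting time)
# and four thresholds separating the five rating levels.
example_probs = category_probabilities(
    features=[1.0, -0.5],
    coefficients=[0.8, -0.6],
    thresholds=[-2.0, -1.0, 0.0, 1.5],
)
print([round(p, 3) for p in example_probs])
```

A positive coefficient shifts probability mass toward higher ratings, which is how the fitted coefficients can be read as the relative importance of each driver.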
Project 2:
Sentiment Analysis for Red Lobster Client using Natural Language Processing (NLP)
Sentiment analysis using public comments from SMG database for Red Lobster Client
(It was in a Proof of Concept Stage)
Tools Used: Python – BeautifulSoup and NLTK packages for processing text; NumPy and Pandas for working with data frames.
In this project, the corpus consisted of public comments from the SMG database along with Yelp.com reviews of the restaurant. We extracted the data in HTML format and created a BeautifulSoup object to extract the required fields.
• Used the NumPy and Pandas packages to work with data frames
• Used CountVectorizer to convert the corpus into a bag of words and created the vector space with all the features (unique words)
• Applied transformations: adjusted the minimum and maximum document frequency
• Applied bigrams and n-grams to capture phrases such as Cheddar Bay Biscuits, Red Lobster, Pina Colada
• Removed the stop words using NLTK
• Applied stemming and lemmatization to understand the frequency of the words
• Used the AFINN sentiment dictionary to analyze the corpus
Environment: Python, using BeautifulSoup for information retrieval and NLTK (Natural Language Toolkit)
NLP programming concepts used: text mining, information retrieval, web scraping, stemming, lemmatization, stop word removal, sentiment analysis, topic extraction.
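The steps above (stop word removal, n-gram bag of words, lexicon-based scoring) can be sketched with the Python standard library alone. This is a simplified illustration rather than the project code: the stop-word set and the word scores below are small made-up stand-ins for the NLTK stop-word list and the AFINN dictionary, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stand-ins: a tiny stop-word list and an AFINN-style lexicon
# (word -> integer score). These values are assumptions for the sketch,
# not the NLTK stop-word list or the published AFINN scores.
STOP_WORDS = {"the", "was", "were", "and", "but", "a", "is"}
LEXICON = {"great": 3, "love": 3, "good": 2, "slow": -1, "cold": -1, "terrible": -3}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens, dropping stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) for a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(text, max_n=2):
    """Count all 1-grams up to max_n-grams in one document."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

def sentiment_score(text):
    """Sum the lexicon scores of the tokens; unknown words score 0."""
    return sum(LEXICON.get(t, 0) for t in tokenize(text))

review = "The cheddar bay biscuits were great but the service was slow"
counts = bag_of_words(review, max_n=3)
print(counts["cheddar bay biscuits"], sentiment_score(review))
```

Counting trigrams keeps a menu item such as "cheddar bay biscuits" together as one feature instead of three unrelated words. Because stop words are dropped before the n-grams are built, some n-grams join words that were not adjacent in the original text.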
Project 3:
Annual Marketing Budget Allocation for the Brand Coca-Cola (For USA, China, Philippines, Turkey, European Markets)
•Responsible for data collection, data preparation and data normalization. Used SQL and Excel for data prep.
•Supported data consultants in the data modeling phase. We used constraint optimization algorithms to optimize the marketing budget, and time series models (decomposition, trend and seasonality detection, forecasting and exponential smoothing) to predict market share and brand share for allocating the marketing budget.
•Used R (dplyr, ggplot2, fpp, forecast packages) for statistical analysis and data modeling. Used Excel for constraint optimization algorithms.
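As an illustration of the exponential smoothing mentioned above, here is a minimal pure-Python sketch of simple exponential smoothing. The project itself used R packages; the function name, the smoothing parameter and the market-share figures here are made-up assumptions for the example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels for `series`.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with level_0 = y_0.
    The last level serves as the one-step-ahead forecast.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]
    for value in series[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels

# Made-up monthly market-share figures (percent), for illustration only.
market_share = [20.0, 22.0, 21.0, 23.0]
levels = simple_exponential_smoothing(market_share, alpha=0.5)
forecast = levels[-1]  # flat forecast for the next period
print(levels, forecast)
```

A larger `alpha` makes the forecast follow recent observations more closely, while a smaller one smooths out short-term noise.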
Project 4:
Coca-Cola Freestyle Ad-Hoc Requests & KPI Dashboard Creation
Responsible for database design, data normalization and loading. Used SQL and ETL with the CCV tool (a Coca-Cola proprietary tool) to create the database and to extract and process the data.
Supported the data consultant in the analysis phase by gathering and preparing the required data. We used descriptive analysis and machine learning techniques to build the KPI dashboard, with Tableau for visualizations.
Used SQL and Excel for processing and Tableau for visualization
Project 5:
My Coke Rewards Redemption Analysis
Analyzed the Redemption and Take rate by brand for the financial year – Excel (extensively)
Project 6:
Red Lobster Account: Ad hoc requests from Red Lobster – Seafood Chain in USA
Used the SMG (Service Management Group) database for most of the analysis and SQL for data preparation
Sentiment analysis using public comments from SMG database (POC Stage- Used Python (NLTK) for analysis)
Monthly Media Activity intelligence & Competitor Pricing reports for top level management.
Overall satisfaction analysis using descriptive statistics
Project 7:
Info Market Tool and Survey Portal automation for Essilor – A Leading Lens Company in the World
•Responsible for project delivery. Managed 2 analysts and coordinated with the development team to build a visualization dashboard
Used Excel for processing and Time Series and Regression concepts for forecasting. Delivered a tactical and strategic KPI dashboard to help managers and top-level management make decisions.
Wings iNet Technologies India Pvt. Ltd – Pune, INDIA Bangalore, INDIA May’11- Feb’12
Role: Business Associate
Worked on ETL projects – It involved automation of marketing and financial reports
Integrating data from multiple data sources and building a foundational data source using SQL
Creating dashboards that help management keep track of KPIs
Sanctum Technologies Pvt Ltd – Bangalore, INDIA Bangalore, India July’08 – July’09
Role: Software Engineer
Managing PPC and display campaigns
Organic Search - SEO using keyword research, backlink optimization, performance tuning and blogging
ACADEMIC PROJECT EXPERIENCES
Predicting the Fetal Status using classification algorithms
Predicting the Fetal Status (Normal/Susceptible/Pathologic) using Cardiotocograph Data
Used Regression, Decision Tree, Random Forest, Bagging and Boosting algorithms to train the classifier and used the best model for further classification (Used RStudio - caret and party packages)
N-Stock portfolio optimization using Non-Linear Programming
Used GRG Non-Linear algorithm and Constraint optimization algorithm to optimize the stock portfolio.
Sentiment analysis of GOP debates using Natural Language Processing.
Web scraped the primary debate text, applied lemmatization to process it and used a TF-IDF vectorizer for feature creation. Analyzed the sentiment of the text over time. Analyzed the presidential candidates’ sentiments towards different topics like China, India, ISIS, Israel etc. (Used Python – NLTK package)
Churn Analysis using a Telecom dataset.
Identifying consumer preferences across 4G, LTE, voice mail plan, total day minutes, total international calls, total number of service calls etc.
Identifying high-value customers using bins, conducting descriptive and predictive statistics, and predicting the churn rate.
Spam filter using Machine Learning
Used OLS, Ridge, Lasso and Elastic Net regularization, KNN, Decision Tree, Random Forest, Support Vector Machines (Linear/RBF), Stochastic Gradient Descent, Gradient Boosting, Bagging, AdaBoost, XGBoost, Stacking and Artificial Neural Network classifiers to train the model and used the best model for production. (Used Python - Scikit-learn)
EDUCATION
Master’s in Business Intelligence and Analytics (GPA 3.96/4.0)
Rockhurst University, Helzberg School of Management, Kansas City, MO
Bachelor’s in Electronics and Communication
Don Bosco Institute of Technology, Bangalore, INDIA