
Senior Data Scientist / Data Analyst

Plainsboro Township, NJ
May 14, 2020



Ambar Dubey




Around 6 years of hands-on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence, Analytics Models, Python, R, MS Excel, SQL and AWS (S3, EC2, Redshift).

Experience working with data-centric Python libraries such as NumPy and Pandas for data analysis, numerical data manipulation and scientific computing.

Knowledge of supervised learning models such as KNN, Decision Trees, Random Forests, Bagging, Boosting and GLM (Linear Regression, Logistic Regression).

Experience using statistical procedures and machine learning algorithms such as ANOVA, Clustering, Regression and Time Series Analysis to analyze data for further model building.

Hands-on experience with clustering algorithms such as K-means and LDA, and with predictive algorithms.

Expertise in Model Development, Data Mining, Predictive Modeling, Data Visualization, Data Cleaning and Management, and Database Management.

Expertise in applying data mining and optimization techniques in B2B and B2C industries; proficient in Machine Learning, Data/Text Mining, Statistical Analysis and Predictive Modeling.

Experience converting raw data into processed data by merging sources and identifying outliers, errors, trends, missing values and distributions in the data.

Experience performing statistical analysis, including univariate and bivariate analysis, on processed data.

Experience in Hypothesis Testing, Chi-Square Test and A/B Testing.

Extensive experience with design and development of Tableau Visualization Solutions.

Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.

Expertise in cloud computing with AWS, Azure and GCP

Good knowledge of database creation and maintenance of physical data models in Oracle, Snowflake, Redshift (AWS) and SQL Server databases.

Experience in SQL programming (DDL, DML, and DCL) using Constraints, Querying, Joins, Keys, Indexes, Data Import/Export, Tables, Views, Window Functions.

Excellent understanding of Agile and Waterfall software development methodologies and of tools such as JIRA and Microsoft Project.

Experience working on Agile projects with bi-weekly sprints to deliver high-quality code in a timely manner, participating in daily stand-up meetings and regular Sprint Planning, Sprint Retrospective and Sprint Demo meetings.

Efficient at collaborating with team members and comfortable working independently to achieve targets.

Excellent communication and interpersonal skills, backed by a sound professional ability to work independently as well as part of a team.



Technical Skills

Programming Languages

Python, R, Java, C, C++, HTML5, XML, Unix, Spark

Statistical and Visualization Tools

RStudio, SciPy, Tableau, IBM SPSS, RapidMiner, Google Analytics, Seaborn, Matplotlib, A/B Testing, Twitter Analytics, Adobe Analytics

Cloud Technologies

AWS (EC2, Lambda, S3, Redshift, Athena, Glue), Azure, Google Cloud Platform, Google AutoML


Machine Learning Algorithms

GLM (Linear Regression, Logistic Regression), Decision Trees (Random Forests, GBM), Clustering (K-Means), Classification (KNN, Naïve Bayes)


Python Libraries

NumPy, Pandas, Scikit-Learn, Matplotlib, NLTK, AutoML


Databases

SQL, PostgreSQL, Oracle R12, MS SQL Server, Google BigQuery, Hive, Teradata, NoSQL, DynamoDB, Redshift, Databricks, Impala


Professional Experience

Pride - Chicago, IL

Role: Data Analyst Jan 2020 – Present

Description: PRIDE is an initiative, begun in 2016, whose goal is to develop, field-test and disseminate employment training and capacity-building programs that will improve vocational rehabilitation and employment options for refugees with disabilities in Illinois and elsewhere.


Created reports and visualizations in Tableau to identify growth opportunities and maximize profits

Extracted and transformed US Census data to perform exploratory data analysis to help investors make informed decisions

Consistently met strict deadlines while documenting and referring complex issues to appropriate personnel

Environment: Tableau, Python, Microsoft Project


Certifications

1. Information Privacy and Security (CITI Program)

2. HSP and Social Behavior (CITI Program)

Route - Chicago, IL

Role: Data Scientist Aug 2019 – Dec 2019

Description: Route is a SaaS company built for the unique and complex needs of industries such as field services, facility management and property management. The project involved creating a model that identifies objects in a picture, to be integrated into the company's mobile application, which will eventually help the company automate its cost projections.


Developed a Python web-scraping script to build an image corpus, improving the efficiency of the data preparation task by 75%

Built dashboards and wrote ETL scripts to automate data collection using Python, JavaScript and Google Cloud Platform.

Performed Data Labeling using Google Image Labeling API with 95% accuracy

Implemented a neural network for object detection to label entities in an image with an accuracy of 78%

Augmented the data to 200% of the original dataset size and labeled data with the Google Image Labeling API at a 5% error rate

Worked independently and collaboratively throughout the complete analytics project lifecycle, including data extraction and preparation, design and implementation of scalable machine learning analyses and solutions, and documentation of results.

Organized and facilitated Agile and Scrum meetings, which included Sprint planning, Daily Scrums, Sprint Review and Sprint Retrospective.

Created and maintained the Product Increment, Product Backlog and Sprint Backlog, defining new system operations for the project.

Environment: TensorFlow, Python, Selenium, Google AutoML, Caffe

Zuri - Brazil

Role: Data Analytics Consultant Mar 2019 – Jun 2019

Description: Zuri is a product manufacturing company based in Brazil that wanted to launch its first cosmetic product in Recife, Brazil. To do so, it needed to understand the Brazilian market and create a 3-year financial forecast as a pitch deck for investors.


Devised and implemented A/B testing to analyze the target customers and their preferences.

Performed ad-hoc analysis to understand key performance indicators in the Brazilian cosmetic product market

Used SQL for deduping, running quality checks and handling nulls

Generated and presented a 3-year forecast model to upper management for decision support

As a subject matter expert, interacted with business stakeholders to understand their requirements and ensure the project was completed efficiently and on time.

Created and maintained a positive relationship with counterparts in Brazil (internal stakeholders)

Environment: Python, MS Office, Oracle R12

GSG Consultants - Chicago, IL

Role: Data Scientist Aug 2018 – Dec 2018

Description: GSG Consultants is one of the largest engineering and construction companies based in Chicago, Illinois. GSG Consultants has played a key role in shaping the infrastructure of the Midwest and has earned a reputation for providing outstanding services in the most time-sensitive environments. The company was relying on conventional methods to add new customers to its portfolio, which was having a negative impact on its bottom line. As part of the project, I devised a strategy to improve the company’s social media presence and to leverage that presence to attract more customers.


Developed an unsupervised machine learning model to understand competitors’ preferences

Used distributed systems and built data pipelines to determine product impact across marketing channels, and proposed solutions to increase that impact.

Performed sentiment analysis of 20,000+ competitors’ tweets to identify words and phrases that generated higher engagement

Used social media traffic analytics to devise a strategy that helped GSG increase customer engagement by 40%

Proactively analyzed data with SQL to uncover insights that increased business value and impact

Benchmarked social media practices of competitors to understand the best practices followed within the industry

Created a competitive matrix to compare the characteristics of multiple competitors in the market and identify their differences, strengths and weaknesses.

Created logical and innovative solutions to complex problems and presented proposals to stakeholders

Worked on Text Analytics, created word clouds and retrieved data from social networking platforms using Python

Environment: Python, Twitter Analytics, NLTK, SQL, Spark, Hive

Mindmap Advance Research Pvt. Ltd - Delhi, India

Role: Data Analyst May 2017 - May 2018

Description: Mindmap Research is a market research company in India that gathers and analyzes data about customers, competitors, distributors and other actors and forces in the marketplace. Using this data, it identifies problems in the client’s business, understands the needs of existing customers and makes well-informed market decisions to develop an effective strategy.


Researched and recommended the correct Target Group based on consumer demand

Created and managed a technical team and increased revenue by 7%, while reducing dependency on third-party vendors

Assessed key market, competitive and customer trends to evaluate the impact on market segments helping the clients make informed business decisions to increase customer base and product quality.

Researched and developed presentations in Q3 of 2017 that helped to procure a $0.5 million contract in Q4 of 2017

Implemented intuitive Tableau dashboards using survey results and performed statistical tests to quality check data using internal tools.

Helped train new hires on the project specifications and helped them understand the work process and its importance.

Environment: SQL, IBM SPSS, Tableau, Microsoft Office Suite

Wipro Technologies - Delhi, India

Role: Data Analyst Sep 2014 - Apr 2017

Description: Wipro Technologies is a multinational corporation that provides information technology, consulting and business process services.


Created reports and dashboards to streamline the workforce, increasing the efficiency by 15%

Optimized complex SQL queries by tuning the logical structure of the databases, which resulted in a 50% reduction in execution time.

Mentored fellow contractors on different activities like splitting major tasks into small tasks, assigning, tracking the progress, running daily scrum meetings, providing technical help and reviewing code.

Responsible for maintaining 100% uptime for the client’s ecommerce portal

Provided technical solutions to the application support team by writing ad-hoc SQL queries against a 20 TB database

Environment: Python, SQL, Unix, Putty, Java, HTML


Projects

Natural Language Processing

1. Spam Detection

Created a spam classifier to differentiate authentic articles from the spam articles

Increased the precision of the model by 3% and reduced processing time tenfold

Worked with different data formats such as JSON and implemented machine learning algorithms in Python.

2. Sentiment Analysis on Yelp Reviews

Used multiple classification models to predict sentiment polarity on 10,000+ user reviews

Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, and Naïve Bayes.


1. Churn Modeling

Created a segmentation model for a bank using an artificial neural network to identify which customers are at the highest risk of leaving the bank.

Used k-fold cross-validation to check for overfitting and grid search for hyperparameter optimization

Achieved 5% higher accuracy than traditional machine learning models

2. Credit Card Fraud Detection

Explored variables comprehensively using univariate and bivariate analysis techniques

Balanced dataset using the SMOTE sampling technique

Implemented multiple machine learning techniques for the detection of credit card fraud which improved the recall by 31%


1. German Credit Card Score Detection

Conducted data cleaning (imputation) and exploratory data analysis (univariate and bivariate analysis)

Used decision trees (random forest, C5.0) to predict the credit score from German credit score data

Exploratory Data Analysis

1. Analyzing Delays in American Airlines

Designed 12 interactive workbooks and dashboards for airline performance measures and operational characteristics

Used R to explore the causes of flight delays across different airlines and a range of contributing factors


Education

Master's in MIS - University of Illinois, Chicago, IL
