
Data Scientist

Location:
Chicago, IL
Salary:
70000
Posted:
October 05, 2017


Resume:

Aditya Guli

**** ***** **** *****, *** ***, Chicago, IL 60616 *******@****.***

585-***-**** LinkedIn Quora

Summary

I was a Senior Data Analyst in the Fraud Operations department at Amazon.com. My primary focus was to act as a liaison between the software engineering team and senior management in building high-performing algorithms (called Rules) to detect fraudulent activity, performing requirements gathering, rule auditing, trend analysis, data manipulation, and visualization to deploy those rules.

I have mastered predictive modeling, machine learning, hypothesis testing, time series analysis, exploratory techniques, data wrangling and processing, and dashboard development, using tools such as R, Python, SQL, SAS Enterprise Miner, Shiny, and Tableau.

Work Experience

Amazon.com, Hyderabad, IN Senior Data Analyst, Fraud Operations

As a Data Analyst, my key role was to help operations run the business free of fraudulent transactions by assisting in building high-performing fraud-detection algorithms (called Rules), performing data manipulation, trend analysis, and requirements gathering, and acting as a liaison between software engineering and senior management to deploy those rules. Our purpose was to understand the backend processes and business logic, and to use that understanding to define and validate the data, then clean, sample, and analyze it and provide visualizations to management. I redefined and created several new processes, publishing weekly and monthly reports and dashboard visualizations to drive better decision quality. I was also involved in cleaning false positives out of the department metrics used to gauge all associate performance within the department. A brief description of the projects I was involved in is given in the project section.

Skills & Tools

Hypothesis Testing.

Logistic Regression Analysis.

Machine Learning.

Text Mining.

Trend Analysis.

Time Series Analysis.

R Programming.

Python.

SQL.

Tableau.

Shiny R.

Advanced MS Excel.

OBIEE.

SAS Enterprise Miner.

Relevant Project Work

Insurance Claims Prediction – Logistic Regression Analysis

Logistic regression to predict whether a person will make an insurance claim.

Current insurance premium pricing models only use the past three years of data to rate policies.

The stepwise selection method was used to select the significant predictors of the claim indicator (a brief modeling sketch follows this list).

Two separate models were created, one each for personal and commercial vehicles.

Misclassification Rate:

oCommercial lines – 29%

oPersonal lines – 19%

If the use of the car is for commercial reasons then:

oOdds of a claim go down as the age of the vehicle or the number of years on the job increases, or when the license is not revoked.

oThe odds of a claim go up if the area is urban and if the vehicle type is unknown, a pickup truck, or a sports car.

If the car is used for private reasons, then:

oOdds of a claim go down as the age of the vehicle increases, when the license is not revoked, or when the person is married.

oOdds of a claim go up if the area is urban, as the distance from work increases, as the number of children who drive increases, and if the vehicle type is unknown, a pickup truck, or a sports car.
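Below is a minimal sketch of the modeling setup, in Python with scikit-learn. The DataFrame claims, its claim_flag target, and the car_use column are hypothetical names, and forward selection stands in for the stepwise procedure used in the project.

# Minimal sketch: one logistic model per vehicle-use type, with forward
# selection standing in for stepwise selection. All column names are
# hypothetical; misclassification rate is reported as 1 - accuracy.
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def misclassification_rate(df: pd.DataFrame) -> float:
    X = pd.get_dummies(df.drop(columns=["claim_flag"]), drop_first=True)
    y = df["claim_flag"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)
    model = LogisticRegression(max_iter=1000)
    # Forward selection approximates the stepwise variable selection.
    selector = SequentialFeatureSelector(model, direction="forward")
    selector.fit(X_tr, y_tr)
    cols = X_tr.columns[selector.get_support()]
    model.fit(X_tr[cols], y_tr)
    return 1 - accuracy_score(y_te, model.predict(X_te[cols]))

# One model each for commercial and personal vehicles:
# for use, part in claims.groupby("car_use"):
#     print(use, misclassification_rate(part.drop(columns=["car_use"])))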

Sentiment Analysis

Sentiment analysis on data from Amazon Product Reviews.

The key columns for this analysis are:

oHelpfulness Numerator – the number of users who found the review helpful.

oHelpfulness Denominator – the total number of users who voted on the review.

oText – the full text of the review.

oSummary – the review summary.

The objective of the project was to create two new columns and two word clouds (a labeling sketch follows this list):

oSentiment – whether the review was positive or negative.

oUsefulness – whether the review was helpful or not.
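A minimal sketch of that labeling step, assuming a pandas DataFrame reviews with HelpfulnessNumerator, HelpfulnessDenominator, and Text columns; VADER and the majority-vote threshold are illustrative choices, not necessarily those used in the project.

# Sketch: derive Sentiment and Usefulness columns, then build word clouds.
# Requires nltk.download("vader_lexicon") once; thresholds are illustrative.
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer
from wordcloud import WordCloud

sia = SentimentIntensityAnalyzer()

def label_reviews(reviews: pd.DataFrame) -> pd.DataFrame:
    # Sentiment: positive if VADER's compound score is above zero.
    reviews["Sentiment"] = reviews["Text"].map(
        lambda t: "positive" if sia.polarity_scores(t)["compound"] > 0
        else "negative")
    # Usefulness: helpful if a majority of voters found the review helpful.
    ratio = (reviews["HelpfulnessNumerator"]
             / reviews["HelpfulnessDenominator"].clip(lower=1))
    reviews["Usefulness"] = ratio.ge(0.5).map(
        {True: "helpful", False: "not helpful"})
    return reviews

def word_cloud(reviews: pd.DataFrame, sentiment: str) -> WordCloud:
    # One cloud per sentiment label, built from the concatenated review text.
    text = " ".join(reviews.loc[reviews["Sentiment"] == sentiment, "Text"])
    return WordCloud(width=800, height=400).generate(text)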

Regression Analysis: Multiple Linear Regression on gameplay data for StarCraft 2

A preliminary exploratory data analysis showed that the response variable, Actions Per Minute (APM), had very high variance and non-linear relationships with the other predictor variables.

The data was subset by skill level (also called league index), which contributed to the high variance in the data.

APM was predicted for each of these subsets using the rest of the independent variables.

After building a model for each skill level, the following variables were significant and most common:

oSelectbyHotKeys

oGapBetweenPACs

oComplexUnitsMade

oActionsInPAC

oNumberOfPACs

oWorkersMade

The stepwise method was used to build the regression models. Statistical tests (Lack of Fit, Durbin-Watson, and Anderson-Darling) showed that each model explained most of the variation in APM, and the residual plots supported this. A per-league modeling sketch follows.
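A minimal per-league modeling sketch with statsmodels, assuming a DataFrame sc2 with APM and LeagueIndex columns plus the predictors listed above (column names follow the resume's spelling and may differ in the actual data set):

# Fit one OLS model per league and run the residual diagnostics mentioned
# above (Durbin-Watson for autocorrelation, Anderson-Darling for normality).
import pandas as pd
import statsmodels.api as sm
from scipy.stats import anderson
from statsmodels.stats.stattools import durbin_watson

PREDICTORS = ["SelectbyHotKeys", "GapBetweenPACs", "ComplexUnitsMade",
              "ActionsInPAC", "NumberOfPACs", "WorkersMade"]

def fit_by_league(sc2: pd.DataFrame) -> dict:
    results = {}
    for league, part in sc2.groupby("LeagueIndex"):
        X = sm.add_constant(part[PREDICTORS])
        fit = sm.OLS(part["APM"], X).fit()
        results[league] = {
            "r_squared": fit.rsquared,
            "durbin_watson": durbin_watson(fit.resid),
            "anderson_darling": anderson(fit.resid).statistic,
        }
    return results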

Time Series Analysis: Netflix Stock Prices Data

The project was structured to track the stock price on a day-to-day basis and to evaluate potentially investing in the company.

The response variable was the column named “close”, since the price reported for a stock is always the closing price for that day. The next day’s opening price and adjusted closing price (the price after the dividend is subtracted) also depend on the previous day’s closing price.

The primary objective of this project was to find candidate model(s), account for seasonality, forecast the prices, and create 95% confidence intervals (a forecasting sketch follows).
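A hedged sketch of the forecasting step with statsmodels, assuming a DataFrame nflx indexed by trading day with a “close” column; the SARIMA orders here are placeholders that would normally come from ACF/PACF inspection of candidate models.

# Fit a seasonal ARIMA on the closing price and produce point forecasts
# with 95% intervals; the (1,1,1)x(1,0,1,5) orders are illustrative only.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def forecast_close(nflx: pd.DataFrame, steps: int = 30):
    model = SARIMAX(nflx["close"], order=(1, 1, 1),
                    seasonal_order=(1, 0, 1, 5))  # 5-day trading week
    fit = model.fit(disp=False)
    fc = fit.get_forecast(steps=steps)
    return fc.predicted_mean, fc.conf_int(alpha=0.05)  # 95% intervals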

Amazon.com, Fraud Project 2013 and 2014

This project was conducted every year in March. The data set was a collection of 2,000+ seller accounts flagged as fraudulent that year. We dove into the raw data to discover new modus operandi and fraud trends on e-commerce platforms; I was credited with identifying 4 new patterns. A few of my action items from this project were:

Updating or building new predictive models for early detection of malicious activity.

Documenting the project outcomes with specific identifiable attributes for any future analysis.

Presenting visualizations to the global management team as charts (bar, pie, Pareto) reporting the total number of incorrect investigations, poor-performing rules, etc.

Amazon.com, Metric Cleaning 2014

As one of 5 analysts, I analyzed a large sample of data that our automated system had flagged as incorrect investigations.

The primary focus was recognizing the contribution of false positives to the raw data and the factors that led them to be captured in the metric, and fixing the issue from a systems standpoint.

This project led to a 10% improvement in associates’ metrics and gave a more accurate picture of departmental performance.

Another measure of the project’s success was redundancy (in audits and/or investigations), which went down by 7% and led to yearly targets being readjusted from 11.8% to 8%.

Challenges: We did not have variables for some key issues we wanted to fix, such as the gap between when a suspicious transaction was detected and when it was flagged. We worked around this by encouraging associates to be proactive and forward these instances.

Education

Bowling Green State University, Bowling Green, Ohio August 2017

Master of Science in Analytics

GPA: 3.2

MVSR Engineering College – Osmania University, Hyderabad, India June 2011

BE in Computer Science and Engineering

GPA: 3.12


