Data Analyst Python

Location:
Chicago, IL
Salary:
$75,000 to $95,000
Posted:
January 03, 2021


SHUBHAM PURI

+1-773-***-**** adi4qt@r.postjobfree.com https://www.linkedin.com/in/purishubham/ https://github.com/shubpuri

SUMMARY

Business analyst with skills in software development, machine learning, data analysis, predictive modelling, and relational databases, and proficiency in probability, statistics, data extraction, data cleaning, and other data mining and warehousing techniques.

Highly proficient in writing complex SQL queries to build key performance indicators (KPIs) from large datasets.

Experienced in creating dashboards and ad-hoc reports using various analytical tools and techniques.

EDUCATION

Master of Science in Business Analytics, University of Illinois, Chicago Dec 2019

Bachelor of Technology, Computer Science, Guru Gobind Singh Indraprastha University, India May 2018

TECHNICAL SKILLS

Languages and Databases: Python, R, MySQL, NoSQL, Git, Oracle, T-SQL, Java, JavaScript, C, C++, HTML5, XML, XQuery

Statistical and Visualization Tools: Tableau, Power BI, MicroStrategy, QlikView, Qlik Sense, RStudio, SAS, SPSS, RapidMiner, Google Analytics, Matplotlib, Alteryx, Spotfire, Adobe Analytics, Data Studio, Navision, Looker, Datameer, Informatica

Technologies: AWS (EC2, S3, SageMaker), Azure, Google Cloud Platform, Spark, Hadoop, MapReduce

Packages: pandas, NumPy, SciPy, scikit-learn, SQLAlchemy, TensorFlow, PyTorch, Keras, OpenCV, Selenium, BeautifulSoup (bs4), Scrapy, MS Excel

Algorithms: Multinomial Regression, Tree-based models (Decision Trees, Random Forests, GBM, XGBoost), Clustering (K-Means), Classification (KNN, Naïve Bayes, SVM), Unsupervised learning (PCA, NMF)

PROFESSIONAL WORK EXPERIENCE

Data Analyst, University of Illinois - Department of Disability, Chicago July 2020 – Current

Project description: Analysing refugee and immigrant data on disabled and non-disabled immigrants in the United States.

Extracted US census data for disabled and non-disabled immigrants and performed exploratory data analysis using Python.

Created Tableau dashboards and stories comparing attributes of US immigrants by disability status, and presented them in a report to help investors make informed decisions.

Applied statistical techniques, such as hypothesis testing for significance and correlation matrices, to pre/post survey data on disability.

Technologies Used: Python, SQL, Power BI, Tableau, MS Excel, Jupyter

Data Analyst, Rewards Network, Chicago Mar 2020 – July 2020

Project description: Evaluating the performance of current Rewards Network (RN) products at restaurants across the US by analysing credit card network data and member data.

COVID-19 Impact: Collaborated on a dashboard to monitor the impact of COVID-19 on the business and to identify vulnerable sectors where damage could be contained.

RTN Impact: Analysed a new product, Real-Time Notifications (RTN), by creating SQL Server tables. Calculated customer spend on restaurant dines over the previous and following 90 days, with and without RTN.

Merchant Availability: Created KPIs from SQL tables and assessed the payment pipelines between merchants, banks, and Rewards Network for consistency. Identified authentication and settlement blockages and suggested improvements to minimize capital lost to discrepancies.

Technologies Used: Python, SQL, Git, ETL, R, MicroStrategy, Power BI, DBeaver, JIRA, AWS Redshift, S3, Linux

Data Analyst, Machinery Marketing International, Chicago Oct 2019 – Feb 2020

Project description: Generating new leads for the sales department and analysing the existing customer database.

Used XQuery, XML, and Selenium to scrape over 500,000 records from various websites to identify potential client leads. Performed univariate and bivariate analysis on the existing customer database to match the right customers with the right products and increase sales. Reported findings to the CEO and collaborated with various teams across MMI.

Conducted technical interviews to hire summer interns in the data department.

Technologies Used: Python, Selenium, Git, XQuery, XML, Excel, HubSpot, Tableau, Data Management

Data Scientist, Reading in Motion, Chicago Aug 2019 – Dec 2019

Project description: Extracting the student dataset used by RIM as part of its curriculum and performing analytical operations on it.

Cleaned the datasets, performed exploratory data analysis and checked for correlations.

Created visualizations comparing the performance of students and classes across academic years. Applied predictive models to forecast the following year's performance.

Designed a more efficient data warehouse by defining primary keys in each table to support more systematic data entry and management going forward.

Technologies Used: Python, MS Access, Excel, Tableau

Data Scientist Intern, Power Construction LLC, Chicago Jun 2019 – Aug 2019

Project description: Performing analytical operations and applying predictive models to incident data to minimize money lost through claims at the construction site.

Created a Safety Dashboard in Power BI and Tableau to rate all subcontractors on various metrics and correlations, in order to identify the safest job for a given subcontractor score.

Extracted tables from various sources using SQL, performed statistical significance testing, and used a time series model (79.5% accuracy) to predict the number of incidents and the value of the associated claims in the following years. Captured these changes in a System Design Document for the Safety Department.

Software Development Intern, HCL Infosystems, Delhi Apr 2017 – Jul 2017

Used machine learning to improve the efficiency and usability of a desktop keyboard, so that the keyboard continuously retrains and updates itself to match the user's needs and writing style.

Performed basic data operations in Excel by writing DAX formulas.

ACADEMIC PROJECTS

Classification: Detection of Credit Card Fraud

Performed univariate and bivariate analysis on the variables and balanced the dataset using SMOTE oversampling.

Implemented logistic regression to detect credit card fraud, improving recall by 51%.

Classification: Credit Score Analysis

Predicted applicants' good/bad credit risk using a decision tree classification model with 86% accuracy.

Natural Language Processing: Stock Trading using Sentiment Analysis

Used Selenium to scrape news headlines for various stocks from businesstimes.com and performed sentiment analysis to obtain polarity scores.

Implemented a logistic regression classification model to make buy/sell stock decisions based on sentiment scores, with a precision of 86%.

Natural Language Processing: Text Mining and Sentiment Analysis

Predicted whether a customer would give a positive or negative review based on the polarity and sentiment scores of Yelp data.

Tokenized, normalized, and filtered the text, and used Lasso regression and random forests to predict review sentiment scores.

Electronic Medical Records

Classified the types of patients who access the Electronic Medical Records (EMR) portal using a real-world dataset from a health care survey.

Used various attributes for screening and cleaned the data by removing NA/error values and transforming features into categories. After identifying the most important attributes through exploratory analysis, compared random forest and logistic regression models to select the best classifier.

Chronic Kidney Disease Prediction

Predicted whether a patient has chronic kidney disease (CKD) from medical factors such as age, diabetes, and BMI, using a real-world dataset.

Cleaned the data by removing cases with NULL values. Performed chi-squared and correlation tests for feature reduction, followed by cross-validated logistic regression to identify the features contributing most to a patient's odds of developing CKD.

Prostate Cancer Identification

Predicted whether a prostate cancer patient would survive beyond 7 years based on medical metrics such as cancer stage, race, and symptoms.

After cleaning the data (removing NULL values and transforming variables), analysed the interactions between factors. Applied statistical models to classify patients and identify the features most significant in increasing a prostate cancer patient's odds of survival.

Analysis and Prediction of House Sales in King County

Studied factors such as house area and number of bedrooms to build several house-price models, and compared them to select the best one.

Used a Kaggle dataset with 19 house features and 21,613 observations, and implemented predictive models including regression, decision trees, and boosting algorithms (AdaBoost and Gradient Boosting).