Data Analysis

Location:

San Jose, CA

Posted:

October 16, 2017


Resume:

Sucharu Gupta

Add: *** Brandon Street #***, San Jose, CA 95134

Mob: 669-***-**** Email: ***********@*****.***

CAREER OBJECTIVE: Seeking a full-time position as a Data Scientist/Analyst. I am authorized to work in the USA and will not need visa sponsorship in the future.

EDUCATION

Master's in Statistics May 2015 - May 2017

San Jose State University, CA

Bachelor of Engineering in Electronics and Communication Jul 2007 - May 2011

RIMT University, INDIA

RELATED COURSEWORK: Regression, SAS (Statistical Analysis System), Sampling, Web and Data Mining, Design & Analysis of Experiments, Multivariate Analysis, Statistical Consulting, Clustering Analysis and its Applications.

SKILLS

• Programming languages: R, Python, SQL, SAS

• Expert in Visualization and dashboards using Shiny, ggplot2 and Tableau

• Machine Learning Algorithms: Linear and Logistic Regression, Decision Trees, Clustering techniques, Association Rule Mining, Naïve Bayes, Dimension Reduction Techniques (Principal Component Analysis and Multiple Correspondence Analysis).

• R packages: caret, dplyr, tidyr, randomForest, mice, Rtsne, rgl, clusCA, clustrd, rpart, arules, nnet, ggplot2, missForest

• Python packages: pandas, NumPy, scikit-learn, matplotlib

• Tools: JMP, SPSS Modeler, Minitab 18

GRADUATION PROJECTS

Winner of the IFCS 2017 Data Analysis Challenge in Tokyo on low back pain (LBP) patients:

• Objective:

o To automatically classify patients based on baseline variables in order to find clinically applicable and useful groups.

• Performed the analysis using Cluster Correspondence Analysis for dimension reduction, with Reduced K-means clustering on top of it.

• Imputed missing data using the mice and missForest packages in R.

• Visualized the data using profile plots and 2D and 3D graphs built with the Rtsne library.

• Important decisions were made in choosing the number of clusters and interpreting the clusters.

Analysis of Bacterial Mutagenicity NTP Dataset

• Objective:

o Determine the toxicity of chemicals to identify their harmful effects on the DNA of living beings.

o Find biologically active chemicals that have a close relationship with carcinogenicity.

• Used data mining techniques such as logistic and multinomial regression and random forest to study the factors causing gene mutagenicity in toxicological databases.

• Used the R Shiny web application framework to present the data analysis as an interactive web application.

Cost Estimation and Analysis Regression Models

• Objective:

o Cost analysis model to be used internally by management to describe the actual cost of products.

o Cost estimation model to determine the product price for potential customers.

• Analyzed the impact of each independent factor in explaining the output.

• Variable and model selection was done using backward and forward selection methods, R-squared values, p-values, and AIC/BIC values.

• ROC curve plots and a confusion matrix were used to assess the accuracy of the results.

Overall Customer Satisfaction Analysis Project

• Objective:

o To identify what factors influenced the overall satisfaction of consumers

o To reduce customer churn and foster loyalty

• Overall satisfaction was rated on a scale of 1 to 5, with 1 being the least satisfied and 5 being the most satisfied. Built multinomial regression models using R and scikit-learn to study the independent factors (e.g., customer dealing, size of stores, workers present, location, waiting times) that played a vital role in the overall results.

• Studied how to maintain the loyalty of customers by looking into important factors like minimizing dropped calls, Internet service speed and delayed text notifications.

• The data was taken through stages including cleaning, integrating, handling NAs, removing outliers, building models, testing, and finally delivering the model to the end users.

JOB EXPERIENCE (2+ Years)

Data Analyst at Punjab and Sindh Bank, INDIA Sep 2012 – Jan 2015

• Extracted and analyzed data from TransUnion CIBIL credit reports; CIBIL holds one of the largest collections of consumer data globally.

• Before loan approval, analyzed factors such as credit score, homeownership, purpose of loan, term length of loan, loan amount, annual income, employment status, and previous bankruptcy fraud.

• Handled, maintained, and analyzed data using Finacle software (by Infosys).

• Ability to multi-task, prioritize and manage multiple simultaneous projects under tight and conflicting deadlines.
