Data Analyst Mental Health

Location:
Albany, NY
Posted:
March 27, 2020

Resume:

SHUBHAM PAL

+1-518-***-**** adcgmv@r.postjobfree.com LinkedIn GitHub

EDUCATION

Master of Science, Data Science, University at Albany, SUNY. CGPA: 3.30. Expected graduation: May 2020.

Relevant Courses: Data Mining, Machine Learning, Natural Language Processing, Statistics, Data Warehousing and Business Intelligence, Business Analytics, Vector Analysis, Theory of Stochastics

Bachelor of Technology, Electrical Engineering, Jawaharlal Nehru Technological University, Hyderabad. Graduated May 2018.

Relevant Courses: Programming in C, Data Structures and Algorithms, Linear Algebra, Mathematical Modelling, Numerical Methods, Complex Analysis

PROJECTS

QUORA QUESTION PAIR SIMILARITY (Machine Learning, Jan 2019 – March 2019) –

• Identify which questions asked on Quora are duplicates of questions that have already been asked.

•This could be useful to instantly provide answers to questions that have already been answered.

•The task is to predict whether a pair of questions is a duplicate or not. Extracted text features using NLP techniques (BOW, TF-IDF, W2VEC). Achieved a log loss of 0.30 using XGBoost.
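
For readers unfamiliar with the metric, log loss is the mean negative log-likelihood of the true labels under the predicted probabilities (lower is better). The following is a minimal sketch in plain Python; the labels and probabilities are made-up illustration values, not results from this project.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) can never occur.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels (1 = duplicate pair) and model probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4]
loss = log_loss(labels, probs)
print(round(loss, 4))
```

A model that always predicts 0.5 scores about 0.693 on this metric, so values well below that suggest the classifier is doing better than chance.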

APPAREL-RECOMMENDATION-USING-TEXT-AND-IMAGE-SIMILARITY (Machine Learning, Aug 2019 – Dec 2019) –

•Downloaded Amazon’s apparel data from Amazon’s product APIs.

•Recommended similar apparel products based on product title and product image using BOW, IDF, W2VEC and CNN (VGG16).

•Designed a weighted W2VEC model that assigns weights to the product brand and color, which gave better recommendations.

•Also developed a simple CNN (VGG16) image-similarity recommender using Keras.

MACHINE LEARNING FOR PROFIT AND FUN (Machine Learning, Aug 2019 – Dec 2019) –

•Obtain a dataset that contains sufficient features about HKJC (Hong Kong Jockey Club) races and competing horses to build predictive models.

•Perform Exploratory Data Analysis to gain insight into the underlying structure of the data that may then aid in feature engineering and model specification.

•Perform random oversampling to address the class imbalance in this classification problem.

•Train and test different models to find the best one through experimentation with techniques like grid search and cross-validation, then evaluate them using metrics such as accuracy, precision/recall, and ROC AUC.
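
The random oversampling step can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual code; the function name `random_oversample` and the toy rows are assumptions made for the example.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class in the original data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to make up the shortfall.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels

# Hypothetical features; label 1 (e.g. "won the race") is the rare class.
rows = [[1.0], [2.0], [3.0], [4.0], [9.0]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.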

CHATBOT FOR HOTEL BOOKING (Natural Language Processing, April 2019 – May 2019) –

•Project to evaluate how intelligent chatbots can be used to access open datasets.

•Extracted hotel data using RapidAPI in Python 3.

•Loaded the large raw data into a relational database using sqlite3.

•Designed an interpreter that extracts the intent and entities of the input message and maps them to relevant data in the dataset using SQL queries.
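
The interpreter idea described above can be sketched with the standard `sqlite3` module. Everything specific here is an assumption made for illustration: the `hotels` table, its columns, the sample rows, the keyword rules, and the `interpret` and `answer` helpers are hypothetical and are not taken from the project.

```python
import re
import sqlite3

# Hypothetical in-memory table standing in for the extracted hotel data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (name TEXT, city TEXT, price REAL)")
conn.executemany("INSERT INTO hotels VALUES (?, ?, ?)", [
    ("Hotel A", "Albany", 120.0),
    ("Hotel B", "Albany", 80.0),
    ("Hotel C", "Boston", 150.0),
])

KNOWN_CITIES = {"albany", "boston"}

def interpret(message):
    """Return (intent, entities) using simple keyword and regex rules."""
    text = message.lower()
    intent = "find_hotel" if re.search(r"\b(hotel|room|stay|book)\b", text) else "unknown"
    tokens = re.findall(r"[a-z]+", text)
    city = next((t for t in tokens if t in KNOWN_CITIES), None)
    match = re.search(r"under \$?(\d+)", text)
    max_price = float(match.group(1)) if match else None
    return intent, {"city": city, "max_price": max_price}

def answer(message):
    """Map the extracted intent and entities to a parameterized SQL query."""
    intent, entities = interpret(message)
    if intent != "find_hotel" or entities["city"] is None:
        return []
    sql = "SELECT name, price FROM hotels WHERE lower(city) = ?"
    params = [entities["city"]]
    if entities["max_price"] is not None:
        sql += " AND price <= ?"
        params.append(entities["max_price"])
    sql += " ORDER BY price"
    return conn.execute(sql, params).fetchall()

results = answer("Book a hotel in Albany under $100")
print(results)
```

Passing the extracted entities as query parameters, rather than formatting them into the SQL string, keeps user input from altering the query itself.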

STACK OVERFLOW TAG PREDICTOR (Web Scraping, Sept 2018 – Dec 2018) –

• The task is to predict as many tags as possible for a given Stack Overflow question, which contains a title and a body (code).

•Precision and recall should be high, as correct tagging helps route the question to the right person.

•Modelled with only 0.5 million data points due to limited compute.

•Got a precision and recall of 66% and 33% respectively.
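
For a multi-label task such as tag prediction, precision and recall can be computed by pooling counts over all questions (micro-averaging). The sketch below assumes micro-averaging, which the resume does not specify, and the tag lists are toy data rather than project results.

```python
def micro_precision_recall(true_tags, predicted_tags):
    """Micro-averaged precision and recall over per-question tag sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_tags, predicted_tags):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # predicted tags that are correct
        fp += len(pred - truth)   # predicted tags that are wrong
        fn += len(truth - pred)   # true tags that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: two questions with three true tags each.
true_tags = [["python", "pandas", "csv"], ["java", "spring", "rest"]]
predicted = [["python"], ["java", "android"]]
precision, recall = micro_precision_recall(true_tags, predicted)
print(round(precision, 2), round(recall, 2))
```

A model that predicts only a few tags per question tends to show this pattern: precision stays relatively high while recall drops, because many true tags are never predicted.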

WEB APPLICATION FOR DATA ANALYSIS (Data Visualization, Jan 2019 – Feb 2019)

•Developed an interactive web application UI that lets users upload their datasets, visualize their features, perform different types of analysis, check summaries, and then predict and visualize the results.

•Used Shiny’s dynamic select-input feature, which updates according to the dataset uploaded in the UI.

•Used R tools such as plotly and shiny to build this interactive dashboard.

SKILLS

Programming Languages: Python, R, SAS, MATLAB

Database Systems: NoSQL, SQL, ETL (Data Warehousing and Database Modelling), Advanced Microsoft Excel, Spark

Machine Learning and Deep Learning: Classification, Regression, Clustering, PCA, t-SNE, Recommendation (using Sklearn), Neural Networks (MLP), CNN, RNN (LSTM) using Keras.

NLP: Bag of Words, TF-IDF, W2VEC, N-grams, NLTK (hands-on experience)

Python Libraries: Matplotlib, Pandas, IPython, Scikit-learn, NLTK, Seaborn, TensorFlow, Tweepy, Keras, NumPy.

Data Visualization: Tableau, Shiny, Seaborn, Matplotlib

Cloud Computing skills: Amazon Web Services (AWS).

EXPERIENCE

New York State Office of Mental Health, Albany, NY (Data Analyst Intern) February 2020 – Present

•Extract, interpret, and analyze data to identify key metrics and transform large raw data into relevant, actionable information using PL/SQL & SAS.

•Collect, cleanse, model, and analyze structured and unstructured data for the Office of Mental Health from the MDW (Medicaid Data Warehouse) using Oracle SQL Developer.

•Create visually impactful dashboards in Tableau for data reporting using roll-up tables and publish them to the server.

•Create reports to share findings and recommendations with the internal team and other stakeholders.

The New York State Energy Research and Development Authority (Data Analyst) April 2019 – February 2020

•Engineered real-time financial reports & dashboards in Tableau for NY’s residential and commercial projects.

•Spearheaded multiple analyses, devising predictive models & algorithms on a wide range of key metrics using SAS and SQL.

•Presented insights & strategy recommendations to project managers, team leads, & executive management.

•Manipulated big data to provide queries, statistical summaries, and data visualizations.

•Forecasted consumption / engagement & identified consumer preferences to augment social media outreach.

RoboSpecies Technologies Pvt. Ltd., India (Data Analyst) May 2017 – Jan 2018

•Developed and implemented data collection systems and other strategies that optimize statistical efficiency.

•Wrote advanced SQL queries for extracting key metrics and optimizing performance.

•Identified and removed duplicate data, and imported, cleaned, transformed, and validated data from the database.

•Built regression and classification models to optimize and visualize the sales of the robotic parts the company deals in, and prepared reports in Tableau to convey insights.


