Data Analyst Engineering

Location: United States

Posted: March 08, 2020


Resume:

PRERAK SHAH

857-***-**** | ****.****@*****.***.*** | https://www.linkedin.com/in/sprerak48 | Boston

EDUCATION

Northeastern University Boston, MA

Master's in Analytics with concentration in Statistical Modeling (GPA: 3.89/4) Sep 2019 – Present

Dwarkadas J. Sanghvi College of Engineering, Mumbai University Mumbai, India

B.E. in Computer Engineering (GPA: 9.00/10) July 2016 – May 2019

Shri Bhagubhai Mafatlal Polytechnic, Mumbai University Mumbai, India

Diploma in Computer Engineering (GPA: 8.89/10) July 2013 – May 2016

PROFESSIONAL EXPERIENCE

April Innovations Mumbai, IN

Machine Learning Developer Nov 2018 – July 2019

• Participated in meetings, workflow design, and the design of an optimal data pipeline architecture

• Performed data wrangling on 10k images for training and testing using the pandas and NumPy libraries

• Used a pre-trained YOLOv3 model for image classification and for detecting nude images on the company website

• Built a CNN model with Python libraries to categorize and block nude images on the company website, and hosted it on an AWS EC2 server

• The model achieved 84–90% accuracy in predicting nude images

• Developed a dashboard using Flask and Matplotlib to track performance metrics and generate insights from NudeNet predictions

• This project helped increase the website's visibility in search engine optimization (SEO)

Money Control Mumbai, IN

Data Analyst Dec 2017 – Oct 2018

• Participated in client meetings and assembled large data sets that meet functional business requirements

• Scraped about 5k stockholder comments from websites and blogs using the Beautiful Soup library

• Performed data munging with pandas and NumPy, and structured the raw data in an SQLite database

• Performed sentiment analysis with NLTK and other NLP libraries using SVM and RNN models, increasing accuracy by 15% and supporting development of the Money Control chatbot

• The analysis was used in Dialogflow, where the sentiment analysis helped design a chatbot that answers end users' basic stock-related questions based on their behavior

ACADEMIC PROJECTS

Title: Detecting Three Types of Spoofing in Trading with Citibank (MIT Fintech) Boston, MA

Technology: Python, Tableau Feb 2020 – Present

• Performed data munging and feature normalization using the pandas and NumPy libraries

• Applied PCA for feature reduction and exploratory statistics

• Developed a logistic regression model using scikit-learn to predict spoofing in trading data of more than 1.4 million (14 lakh) entries

• Improved accuracy by 20% by using XGBoost for classification and prediction

• Visualized spoofing by each end user within a given time frame using a Tableau dashboard

Title: Coronavirus Data Visualization Boston, MA

Technology: Excel, Tableau Jan 2020 – Feb 2020

• Gathered publicly available data from the WHO and CDC using Tableau Server for 31 Jan 2020 to 9 Feb 2020

• Performed correlation-based inferential statistics for the 10 countries and the 10 Chinese provinces most affected by the coronavirus

• Calculated the fatality ratio and the recovered-to-confirmed ratio for mainland China

• Plotted the infected areas on a heat map

• Compared deaths from previous coronavirus outbreaks with those from the novel coronavirus (2019-nCoV) and visualized them using pie charts

Title: Churn Modelling Boston, MA

Technology: R, R Shiny Nov 2019 – Dec 2019

• Performed data cleaning on a 10k-record churn-modelling dataset using the dplyr and BSDA packages

• Created linear and logistic regression models to determine trends in customer behavior with credit-card services

• Analyzed the factors driving customers' exit from the credit-card service using clustering and regularization techniques in RStudio, with visualization and inferential statistics done using party, ggthemes, caret, corrplot, ggplot2, and mlr

• Resolved overfitting by using an SVM model, achieving 83–88% accuracy

• Predicted customers' exit from the credit-card service using randomForest, XGBoost, and gbm

Title: Autonomous Tagging of Text Data using Machine Learning Mumbai, IN

Technology: Python, Google Colab, AWS July 2017 – May 2019

• Scraped the Stack Overflow discussion board using urllib and stored the data in AWS S3 buckets

• Cleaned and loaded data from a variety of sources (Quora, Stack Overflow) using SQL queries

• Removed stop words from the text data and lemmatized it using the NLTK library

• Used TF-IDF vectorization for feature extraction, assigning a weight to each word in the text

• Developed an LSTM model using Keras to determine the most-used feature tags and group similar discussions

• The model achieved 82–88% accuracy in categorizing similar discussions based on tags and feature vectors
