Kashyap Bhuva
Jersey City, NJ 508-***-**** ********@*****.*** *******@***.*** GitHub LinkedIn Blog
Seeking full-time job opportunities in Data Science Analytics starting May 2021
EDUCATION
MS in Data Science, Worcester Polytechnic Institute: GPA 4/4, May 2021
Data Science Graduate Credits, NMIMS University: GPA 3.47/4, May 2018
Bachelor of Engineering, Mechanical, University of Mumbai: GPA 7.42/10, May 2017
PROFESSIONAL EXPERIENCE
National Grid, Data Scientist Co-op, Brooklyn, New York, July 2020 - Present
Weather Forecast Validation: Evaluating the performance of weather forecast models from different vendors
• Enabling the company to decide whether to unsubscribe from underperforming vendors.
• Productionizing the validation of weather forecast models from different vendors and comparing their performance.
• Creating Python scripts in a virtual AWS environment to build an automated, scheduled framework that extracts data from unstructured JSON files, preprocesses the actual and forecasted data, and evaluates accuracy metrics in Python and Power BI.
Worcester Polytechnic Institute, Graduate Assistant, Mathematics and Statistics Dept., Worcester, Massachusetts, Jan 2020 - July 2020
Hansa Cequity, Associate Data Scientist, Mumbai, India, Jun 2018 - May 2019
a) Customer Churn Model: Predicting whether a deactivated customer will churn.
• Achieved a model lift of 2.3 and cut the churn rate by 25% by enabling the contact center to target high-propensity customers in the top 3 deciles.
• Involved EDA, feature engineering, categorization of continuous variables, feature selection using information value and weight of evidence, and evaluation and validation in R using gains and lift charts, AUC, and the KS statistic.
b) Forecasting: Predicting the number of promo calls to be made for each product over the next 3 months using time series
• Used Box-Jenkins Methodology (ARIMA) for forecasting.
• The forecast enabled optimal allocation of contact-center executives, minimizing both resource idle time and resource shortages and saving 5276 allocated work hours per month.
c) Market Basket Analysis: Recommending media packs/plans
• Obtained the top association rules by setting minimum thresholds for support and confidence.
• Implemented customer base profiling using a decision tree for all association rules in which an antecedent had multiple consequents, to recommend the pack a customer is most likely to purchase.
• Enabled the contact center to increase cross-sell with a lift of 1.8.
d) Loan Purchase Propensity Model: Predicting the convertibility of leads
• Achieved a model lift of 2.5, meeting the client's lead-targeting goal.
• Involved EDA, data cleaning, categorization of continuous variables, feature selection using information value, cross-validation in R, and evaluation using gains and lift charts and AUC.
• Classified leads into Hot, Warm, and Cold by setting probability thresholds for each bank branch. Completed in-time and out-of-time model validation. Deployed and productionized the model in real time using SQL procs for automated lead scoring, with a test-and-control approach for monthly tracking of model performance.
Algorithms used: Logistic Regression, XGBoost, Lasso & Ridge, Decision Trees, Apriori, ARIMA
Falcon Award Winner (Oct 2018), for being among the top-performing new recruits.
TECHNICAL SKILLS & CERTIFICATIONS
Concepts: Statistics, Predictive Modeling, Machine Learning, Data Visualization, NLP, Neural Networks & Deep Learning, Business Analytics, Clustering for Segmentation, Cloud Computing
Tools/Frameworks: Python, R, SQL, SAS, SPSS, Power BI, Tableau, Excel, DataRobot, AWS EC2, AWS S3
Python Libraries: NumPy, pandas, scikit-learn, NLTK, TensorFlow, Keras
Certifications: Base SAS Certified Programmer; completed 16 certificates from MOOCs in data science
ACADEMIC PROJECTS (WORCESTER POLYTECHNIC INSTITUTE)
a) Sentiment Analysis: Detecting hate speech on Twitter using natural language processing techniques such as stemming, lemmatization, and stop-word removal; vectorization techniques such as Bag of Words, TF-IDF, and Word2vec; and classification using Naïve Bayes, Support Vector Machines, K-Nearest Neighbors, and Artificial Neural Networks.
b) Bank Stock Returns Prediction (Ongoing): Predicting the annual stock return of a given bank by clustering banks and then building a model on each cluster. The dataset contains stock returns data for several banks over the last two decades.
c) Predicting Appointment No-Shows: Estimated no-show propensity at the appointment level for a hospital in Greater Boston. Used Python for data cleaning and feature engineering, DataRobot's AutoML tool for modeling, and a payoff matrix to iterate toward the classification threshold that maximizes profit.
BLOG AND RESEARCH PUBLICATIONS
• Blog on Towards Data Science: Kaggle/Academic vs Real-World Data Science Analytics, May 2020
• Publication: Comparative Study of the Machine Learning techniques for Predicting the Employee Attrition, Aug 2018
• Publication: Data Intelligence for Maximizing the life of the CNC cutting tool, Jan 2017