Dearborn, MI ***************@*****.*** +1-313-***-****
www.linkedin.com/in/harshitaShiva
SUMMARY
* **** ********** ** * junior business analyst using Tableau, Excel, SQL, Airtable, Microsoft Office suite and Google suite to support clients' report analysis.
Proficient in Python, R, OOP, SDLC, C/C++, Excel, MapReduce, Tableau, SQL, Agile methodology, machine learning methodologies, ETL, and multivariate statistics.
Passionate about analytics driven by statistical insight and about machine learning methodologies.
Well organized, with strong oral and written communication, creativity, and the ability to collaborate with stakeholders from different teams.
Adaptive to change and willing to learn new techniques to apply best practices to business needs.
EDUCATION
University of Michigan, Dearborn, MI, USA August 2019 - Present
Masters in Data Science, anticipated graduation date: May 2021
New Horizon College of Engineering, Bengaluru, India August 2014 - July 2018
Bachelors in Information Science and Technology
TECHNICAL SKILLS
Languages: C, C++, Java, Python, R, PHP
Operating Systems: Windows, Linux
Database Technologies: SQL, Hadoop, Spark, ETL Frameworks
Visualization Tools: Tableau, PowerBI
Understanding in: Agile methodology, SDLC, SAP HANA, JIRA
PROFESSIONAL EXPERIENCE
Junior Business Analyst, Parentof Solutions Pvt. Ltd. (Bengaluru, India) August 2018 - August 2019
Responsible for requirement gathering, data collection and categorization of raw data acquired from clients.
Communicated with clients to improve innovation and project optimization.
Organized project planning, tracked project progress, and held brainstorming sessions to propose effective strategies.
Created feedback storyboards and dashboards on collected data using Tableau.
Responsible for tracking the project's progress, feedback, and client follow-ups.
Maintained communication between team, manager and the Managing Director.
Tools Used: Microsoft Excel, Google Office Suite, Airtable, Tableau, SQL
PROJECTS
1. Coronavirus: A brief Analysis September 2020 - December 2020
Implemented time series analysis using Python libraries to analyze and understand the growth of the Coronavirus disease on a regional basis.
Calculated the recovery rate and mortality rate based on the data set to understand the severity of the virus.
Analyzed the growth factor of the virus over months and weeks worldwide and in the USA.
Implemented a simple linear regression model to measure how far the actual records deviated from the model's predictions.
2. NFL Big Data Bowl 2020 September 2020 - December 2020
Predicted the average expected points added (or EPA) per player, in order to determine how good players are at defending against passing plays relative to other players using the NFL dataset.
Processed the data using PySpark to prepare it for building a model that predicts a player's average EPA.
Implemented a linear regression model that predicted the data with a mean squared error of only 0.22 on previously unseen testing data.
HARSHITA SHIVARAMAKRISHNA
3. Speaker Recognition June 2020
Implemented an algorithm in Python to recognize the speaker among 100 unique audio files using classification methods.
Used the librosa and sklearn libraries.
4. Mobile Inventory Management System January 2020 - April 2020
Created a web-based inventory management system using PHP and SQL queries to automate the maintenance of billing, consumer/employee details, and a summary of products sold/unsold.
Followed the database development process, producing an entity-relationship diagram, a relational schema diagram, and triggers.
Tools Used: AMPPS Webserver, PHP, phpMyAdmin
5. Predicting Emotion Intensities Twitter October 2019 - December 2019
Dataset: “WASSA-2017 Shared Task on Emotion Intensity”, Saif M. Mohammad and Felipe Bravo-Marquez.
Approach consisted of 3 distinct phases: preprocessing, feature engineering, and machine learning.
Preprocessing included cleaning the dataset to remove irrelevant features using the Ekphrasis library.
Feature extraction covered syntactic, semantic, and word-embedding features.
Iterated over 3 different regression algorithms: Linear Regression, Random Forest Regression and SVM.
Chose the best model for each emotion and calculated the mean of the correlation coefficients over the four emotions, arriving at a value of 0.6264, which would be at 10th position on the leaderboard of the WASSA 2017 competition.
Tools Used: Python 3.0, Jupyter Notebooks, NLTK toolkit
6. Genetic Algorithm-Based Gene Subset Selection for Disease Prediction December 2017 - May 2018
Obtained gene subsets using genetic algorithms and analyzed them to retrieve informative genes based on Pareto non-dominance.
Used Support Vector Machine (SVM) algorithm for data set classification and Principal Component Analysis (PCA) to reduce data set dimensionality.
Implemented the algorithms in Python and obtained the dataset from the online public catalog of the National Cancer Institute, USA.
The classification provides predicted values ranging from 0 to 4: 0 is a healthy gene set, 1 is a person with plausible disease, and 2-4 are the different stages and intensities of the present disease.
The result is a scatter plot representing two sets of data points as diseased and healthy.
Tools used: Ubuntu 16.04, Python 2.7
CERTIFICATIONS
JMP software, Dec 2020 (Ongoing).
SAS Programming, Udemy, July 2020.
Tableau 2020 A-Z, Udemy, June 2020.
Machine Learning, offered by Stanford, Coursera (Ongoing).
R Programming, Inference for Linear Regression, Data Visualization, Data Camp, Sep 2019.
Python for Data Science, Data Camp, Dec 2018.
SQL for Data Science, Udemy, Nov 2018.
Digital Marketing, Digital Academy 360, April 2018.