Sales Data

Location:

Brooklyn, NY

Posted:

March 28, 2020

Resume:

VINEETHREDDY NALLA

Jersey City, NJ ***** 201-***-**** ******************@*****.*** https://www.linkedin.com/in/vineethreddy-nalla/

EDUCATION

Pace University, Seidenberg School of CSIS New York, NY

Master of Science (M.S.) in Information Systems Machine Learning and Data Science GPA: 3.7 December 2019

University College of Engineering, Osmania University Hyderabad, India

Bachelor of Engineering (B.E) Electronics and Communication Engineering May 2017

RELEVANT COURSEWORK

DBMS, Deep Learning, Cloud Computing, Machine Learning, Probability and Statistics, NLP, Data Visualization, Computer Vision.

TECHNICAL SKILLS

Programming Languages: C++, Java, SQL, Python, R, Linux, MATLAB, SAS, Spark, Perl, Scala, Julia, and JavaScript.

Relevant Python Libraries: Pandas, Theano, PyTorch, SciPy, Matplotlib, Seaborn, scikit-learn, Keras, TensorFlow, NumPy, NLTK.

Tools: QlikView, R Shiny, Tableau, Power BI, Excel, Looker, MongoDB, Airflow, D3.js, NoSQL, Docker.

Data Science: Random Forest, SVM, Logistic Regression, Decision Trees, Time series, KNN, Gradient Boosting.

Cloud Technologies: Hadoop, MapReduce, Hive, Kafka, Athena, EMR, RDS, HDFS, EC2, S3, Redshift, AWS, Kinesis.

WORK EXPERIENCE

Sema4 Stamford, CT

Data Science Intern, Bioinformatics September 2019 – December 2019

●Migrated clinical information from cancer patients’ EMR data into AWS Redshift and performed EDA to understand the correlations between features.

●Developed NLP regular expression rules to identify keywords pertaining to different cancer types in order to determine cancer stages, and visualized the percentage of cancer patients at each stage.

●Performed survival analysis on genomics data to predict patients’ chances of survival based on cancer stage, evaluating several machine learning models (Random Forest, XGBoost, LR) to build a two-stage model that achieved an F-Score of 90%.

Boehringer Ingelheim Danbury, CT

Biostatistics and Data Science Intern June 2019 – September 2019

●Collected, merged and cleaned raw sales data of the ongoing clinical trial drug extracted from internal websites using web scraping techniques, and performed EDA with Tableau to understand the key features of the data.

●Created two interactive R Shiny dashboards to monitor KPIs, helping senior management better understand the business before releasing the drug to market.

●Used PCA for dimensionality reduction and applied an AdaBoost ensemble model over the reduced feature space to predict drug sales, reducing the error by 18% relative to the baseline model.

Pace University New York, NY

Data Science Researcher, NLP & Machine Learning January 2018 – May 2019

●Performed sentiment analysis on customer reviews using the AFINN, Bing, and NRC lexicons with Tidy to determine the number of positive and negative reviews.

●Extracted data by web scraping with CSS selectors and automated text normalization, stemming, lemmatization, and stop word removal by implementing scheduling in R.

●Coded an R Shiny application to display the rating, review summary, and star rating predictions, along with TF-IDF and AFINN lexicon results, to visualize the predictions and provide actionable data for future strategy.

Mercedes Benz, Silver Star Hyderabad, India

Enterprise Data Management January 2016 – November 2017

●Designed a MySQL database to manage the car parts inventory and notify customers about part availability.

●Configured the database using Access and SQL to track past customer behavior and increase the profit margin.

●Collaborated with other departments to validate the accuracy of customer and car part data using SQL Server.

ACADEMIC PROJECTS

Superstore Sales Prediction Nov 2019 – Dec 2019

●Used a combination of time series analysis techniques to forecast next-year sales for different products of a furniture store.

●Visualized the sales with the seaborn package to examine the trend and seasonality in each year; sales showed an increasing trend and a recurring seasonal pattern, with low sales at the beginning of the year and high sales at the end.

●Tested the stationarity of the time series using the Dickey-Fuller statistical test and eliminated the non-stationarity in the data with the appropriate “ACF” and “PACF” values before applying an ML model.

●Modeled yearly averages using a SARIMA model, and the monthly residuals using three different regression models. Of these, gradient boosting regression showed the best performance, with a 61% improvement over the baseline model.

Credit Scoring Project, Pace University Sep 2019 – Nov 2019

●Conducted exploratory credit score analysis in Python to predict loan obtainability.

●Extracted important features for determining the loan status by applying Random Forest and Decision Trees.

●Compared three ML models for predicting loan availability using ROC curves, t-test statistics, and confusion matrices.

●Applied grid search with cross-validation for hyperparameter tuning and achieved a 73% accuracy rate in predicting loan feasibility.
