Resume


Data Python

Location:
Buffalo, New York, United States
Posted:
October 15, 2018


Resume:

+1-716-***-**** Shreyas Prashant Kulkarni ***, Englewood Ave, Buffalo, NY 14214

ac7d57@r.postjobfree.com https://www.linkedin.com/in/shreyaskulkarni20/ https://github.com/Shreyas20

EDUCATION

University at Buffalo, State University of New York Aug 2017- Feb 2019 (Expected)

Master of Science (M.S.) in Data Science GPA - 3.38

Courses: Probability Theory, Numerical Mathematics, Statistical Data Mining I, Programming and Database Fundamentals for Data Scientists, Data Intensive Computing, Machine Learning, Statistical Data Mining II, Database Systems, Data Science: Industrial Overview

Pune Institute of Computer Technology, Pune July 2013-May 2017

Bachelor of Engineering (B.E.) in Computer Science Graduated with first class

Courses: Data Structures, Operating Systems, Software Engineering, Computer Network, Database Management, Parallel Computing

WORK EXPERIENCE

1) Froot Research, Pune, India Data Science Research Intern June 2018- Aug 2018

- Technologies – Python, Apache Spark, Association rule mining, MySQL, Pandas, NumPy, WARMR, ILP

- Developed an algorithm to bucket integer data according to its distribution along a column or array, for use in generating frequent patterns.

- Generated frequent patterns for a multi-relational database with virtual joins using Spark, and studied whether any other method was more efficient.

- Detected anomalies in association rule mining to check whether timestamps have any effect on the generated rules.

2) Unisoft Technologies, Pune, India Database Intern June 2017- July 2017

- Focused on Oracle database

- Passed Oracle Database Certified Associate with 77%

3) GS Lab, Pune, India Project Intern July 2016- June 2017

- Technologies and languages – RabbitMQ, MongoDB, Spark, Elasticsearch, Kibana, J48, Java, Python, JS, HTML, Weka

- Project: Generic User Activity Analysis and Prediction, which can be used in the backend of any platform for analysis and recommendations

- Analyzed user data efficiently, using various diagrams, to customize the system for the vendor

- Enhanced user experience by predicting results from the attributes requested by users, with 80% accuracy.

TECHNICAL SKILLS

Languages and databases: Python, R, MySQL, MongoDB, Oracle SQL, MATLAB, Scala, C, C++, Java, JSP, LISP, HTML, JS

Tools: Apache Spark, Hadoop, RStudio, Elasticsearch, Kibana, Weka, Android, RabbitMQ, Selenium, Eclipse

PROJECTS

1) Consensus based distributed Neural Network Java, Weka tools, PeerSim

- Working on an independent project with Prof. Haimonti Dutta in which a consensus-based neural network model is built in Java using Weka.

- A gossip protocol is used to optimize communication cost among the nodes of the distributed model, which is visualized using the PeerSim package.

2) Santander product recommendation R, Association Rules, SVD, Hierarchical clustering, K-means, ggplot2

- Analyzed user data by gender, nationality, age, and profession, and drew conclusions to guide recommendations

- Generated various association rules between user details and account details using Apriori in R to get recommendations.

- Performed low-rank approximation using SVD on training data of 30M users and recommended account types for 3M rows of test data based on users' personal details. Also tried clustering the data with different algorithms, which failed due to disparity.

3) Article classification using spark Python, NYTimesArticleAPI, RDD, Apache Spark, MLlib, TF-IDF

- Collected NYT articles on various topics such as politics, weather, and sports using the NYTimesArticleAPI library in Python.

- Removed stopwords, symbols, and digits, then tokenized the articles and built scaled TF-IDF feature vectors for all articles in the training set.

- Trained classification algorithms such as logistic regression, random forest (RF), and Naive Bayes on the feature vectors using PySpark, reaching 70-80% accuracy for all methods.

4) Word count and word co-occurrence using MapReduce Python, ArticleAPI, TweePy, Pandas, D3.js, MRJob, Hadoop

- For the topic 'shooting', Twitter data and NYT articles were collected using TweePy and NYTimesArticleAPI.

- After removing stopwords, symbols, and digits in the mapper, word counts were computed for both datasets using the MRJob library in Python.

- Co-occurrence of the top 10 words was calculated, and word clouds of the word-count and co-occurrence results were created using D3.js for both datasets.

5) Neural Network and Deep NN implementation Python, TensorFlow, NumPy

- Implemented a neural network in Python with 94% accuracy on the MNIST dataset.

- Implemented deep neural networks with 1, 2, 3, 5, and 7 hidden layers using TensorFlow and compared their results.

6) Data Scientists Analysis R, regsubsets, glmnet, rf, ggplot2

- A project in R that separated survey respondents into different classes based on various parameters

- Predicted the salaries of respondent classes such as students and career-switchers, using workers' responses as the training set. Compared supervised ML algorithms (OLS, subset selection, shrinkage methods, random forests, and trees) to determine the best prediction method for data containing categorical variables. Random forest gave approx. 85% accuracy, making it the best method.

- Analyzed and pointed out different trends in the field of data science based on these responses.

7) Twitter Data analysis R, TwitteR, Fiftystater, ggplot2

- Collected flu-related tweets by hashtag in R and plotted them by location on a heatmap organized by state.

- Compared it with a heatmap of actual flu levels in each state and drew conclusions about the relationship between tweets and actual flu.

8) NYC Parking Violation Analysis Python, NumPy, Pandas, Matplotlib, MySQL, PyMySQL

- Analyzed 4M rows of NYC parking data stored in a MySQL database and drew conclusions from the observed patterns using visuals created with the Matplotlib library in Python. Storing the entire dataset in MySQL gave higher operational speed than Pandas.

- Used PyMySQL to perform database operations from Python, with Pandas data frames and NumPy for handling data efficiently.

ADDITIONAL ACTIVITIES

- Created a pip-installable Python package for the bucketing algorithm, licensed under GNU GPL v3.0, which converts an integer array to categorical data.

- 1st zonal winner in network and cyber security at Seven Mentors in 2016

- Published paper ‘Generic User Event Activity Analysis and Prediction’ in IRJET journal.


