Data Python

Buffalo, New York, United States
February 18, 2019


Soham Sachin Gupte

*** ********* ******, *******, **-14214

716-***-****

Academics:

University at Buffalo, State University of New York
M.S. in Computer Science, GPA: 3.4, August 2017 to February 2019
Courses: Algorithm Analysis and Design, Computer Security, Modern Networking Concepts, Distributed Systems, Machine Learning, Pattern Recognition, Data Intensive Computing, Bioinformatics and Data Mining

University of Mumbai
B.E. in Electronics Engineering, First Class, August 2013 to May 2017
Courses: Digital Circuits Design, Microprocessors and Microcontrollers, Artificial Intelligence and Fuzzy Logic, Advanced Computer Networks

Experience:

Integral Information Systems LLC, Data Science Intern, June 2018 to August 2018

Developed an algorithm that accurately predicts 30-day all-cause hospital readmissions.

Overcame the drawbacks of earlier algorithms that relied only on clinical factors by including several non-clinical factors such as education levels, lifestyle, local weather, income, diet, pollution levels, zip codes, and medication compliance.

Used techniques such as vectorization, bucketing, and normalization, along with algorithms such as Logistic Regression, Support Vector Machines, and Boosted Decision Trees, during development. Achieved an accuracy of up to 96 percent on the test dataset.

Technologies used: Python, Jupyter Notebook, Pandas, Scikit-learn

Technical Skills:

Languages and Technologies: Python, Java, C, R, Pandas, Scikit-learn, TensorFlow, Hadoop, Spark, MapReduce, MySQL, HTML, CSS, JavaScript, Node.js, GitHub, Eclipse, Docker

Operating Systems: Windows 10, Ubuntu


Projects:

Publish/Subscribe Distributed System using Docker Containers

Implemented a decentralized magazine subscription system with multiple brokers hosted on Docker containers. Subscribers and publishers are connected to brokers based on geography, and topic-based routing is used to service subscription requests. The backend is written in Python, with socket programming and multithreading used for communication between brokers.

Data Analytics Using Spark

Collected thousands of New York Times articles on topics such as sports, entertainment, and politics using the Article Search API and BeautifulSoup. Using PySpark, performed data cleaning, tokenization, and feature extraction (using TF-IDF) on the collected articles, then used the resulting dataset to train Random Forest, Naïve Bayes, and Logistic Regression classifiers. The maximum accuracy achieved was around 80%.

Word Count and Co-occurrence Using Hadoop and MapReduce

Tweets and New York Times articles related to the topic ‘Shootings’ were collected using Tweepy and the NYT Article Search API. Within HDFS, the required data cleaning operations, such as stop-word removal and stemming, were performed in the mapper, and the word count (using MRJob) and word co-occurrence calculations were performed in the reducer. The results of the MapReduce program were visualized as a word cloud using D3.js.
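The mapper/reducer split described above can be illustrated with a minimal sketch in plain Python. This is not the project's code: it does not use MRJob or Hadoop, it omits stemming, and the stop-word list, the function names, and the sample lines are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stop-word list; a real job would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "of", "and", "to", "was", "is"}


def clean(text):
    """Lowercase, tokenize on letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def mapper(line):
    """Emit (word, 1) for each token and (pair, 1) for each co-occurring pair."""
    tokens = clean(line)
    for word in tokens:
        yield word, 1
    # Pairs come from the distinct words of one line, in sorted order,
    # so ("a", "b") and ("b", "a") share a single key.
    for pair in combinations(sorted(set(tokens)), 2):
        yield pair, 1


def reducer(key, values):
    """Sum the counts emitted for one key."""
    return key, sum(values)


def run_job(lines):
    """Simulate the shuffle phase: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(k, v) for k, v in grouped.items())


# Made-up sample input, for illustration only.
counts = run_job([
    "Shooting reported in the city",
    "City police respond to shooting",
])
```

Here `counts` maps single words to their frequency and sorted word pairs to the number of lines in which both words appear, which is the kind of output a word cloud could then be drawn from.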

Reliable Data Transport Protocols

Implemented the Stop-and-Wait, Go-Back-N, and Alternating Bit (ABT) data transfer protocols from scratch in C, then analyzed and compared their performance on a simulator. Throughput was measured for each implementation across different scenarios and packet loss probabilities, and the results were in line with expectations.

Principal Component Analysis and Association Rules

Implemented the PCA dimensionality reduction algorithm to reduce an n-dimensional dataset to 2 dimensions, tested it on a medical dataset, and compared the implementation’s performance with the scikit-learn implementations of the SVD and t-SNE algorithms. Implemented the Apriori algorithm to compute frequent itemsets in a dataset containing information on gene expression and diseases. Implemented logic to mine association rules that may help researchers uncover relationships between gene expressions and a particular disease.
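For reference, a minimal sketch of Apriori frequent-itemset mining with a rule-confidence helper is given here in plain Python. It is not the project's code: the function names, the item labels, and the toy transactions are hypothetical, and support is expressed as a raw transaction count.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Keep only candidates contained in at least min_support transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    current = count({frozenset([item]) for item in items})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A and C) / support(A)."""
    return frequent[antecedent | consequent] / frequent[antecedent]


# Made-up toy transactions; the labels are illustrative only.
transactions = [
    {"G1_up", "G2_down", "disease"},
    {"G1_up", "disease"},
    {"G1_up", "G2_down"},
    {"G2_down", "disease"},
]
frequent = apriori(transactions, min_support=2)
rule_conf = confidence(frequent, frozenset({"G1_up"}), frozenset({"disease"}))
```

With a minimum support of 2, the three single items and the three pairs are frequent, while the three-item set appears in only one transaction and is dropped. `rule_conf` is the confidence of the rule `G1_up -> disease` on this toy data.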

Classification Algorithms

Implemented K-NN, Naïve Bayes, Random Forests, and Boosted Decision Trees from scratch in Python and tested them on the given datasets. Evaluated each algorithm with 10-fold cross-validation using metrics such as accuracy, precision, and recall.

Service-based Waypoints Application

Developed an application that takes source and destination cities as input from the user and displays a map with a route and the major towns along it. Clicking a marker along the route displays important weather information for that town.

Technologies used: Google Maps API, Open Weather API, HTML, CSS, JavaScript
