
Data Scientist

Richardson, Texas, United States
August 17, 2018




Mailing Address: *** W Renner Rd, Apt 210, Richardson, TX - 75080.

Phone No: +1-510-***-****

Willing to relocate and travel

E-Mail ID:



The University of Texas at Dallas, TX August 2016 – May 2018 M.S. in Computer Science (Intelligent Systems Track)

TECHNICAL SKILLS

Area of Interest: Natural Language Processing, Deep Learning, Machine Learning, Big Data Analysis, Data Science, Statistics

Programming Languages: Python, Scala, R, Java

Frameworks and Libraries: PyTorch, Docker, Flask, Spark, Keras, TensorFlow, NumPy, SciPy, spaCy, Kafka, Elasticsearch, OpenCV


INSIGHT SOFTMAX CONSULTING LLC. San Francisco, CA and Dallas, TX September 2017 - Present Data Scientist

- Project: Ticket Price Modeling: Building RESTful APIs for interaction between request data and deployed Deep Learning Models. [Python/Flask]

- Project: Statements of Law: Parsing Legal Documents and Extracting Relevant Paragraphs and Keywords using EmbedRank and spaCy.

- Using PySpark to transform Terabytes of Legal Data in a distributed manner using Dataproc Cloud Instances.

- Filtering out metadata and extracting only the most relevant paragraphs containing citations pertinent to the case.

- Labelled the extracted paragraphs with meaningful tags and associated concepts. [PySpark]

FORMATION INC. San Francisco, CA July 2017 – August 2017 Data Science Intern

- Project: Like2Vec: Recommendation System using Word Embeddings

- Contributing to the development of a Recommendation Pipeline by modifying already existing modules like LLR.

- Added configurations for making the Code generic along with Cleaning and Commenting using Scala Style Guide. [Scala]

- Tuning Hyperparameters to attain the best model by adding an Evaluations module to calculate metrics like RMSE & Recall.

SKYMIND LABS San Francisco, CA March 2017 – June 2017 Deep Learning Researcher

- Project: Isomantics: Translation using Neural Word Embeddings

- Performing Inferential Statistics on the word embeddings trained using different models like gensim, fasttext. [Python]

- Using Keras to train a translation matrix to convert one language to another available in the form of Neural Embeddings.

- Finding Similarity between the languages through statistical analysis of the Singular Values of the Translation Matrices.

- Co-presented the research project at multiple conferences including Open Data Science Conference ’17 alongside Dr. Mike Tamir, Head of Data Science at Uber ATG and UC Berkeley lecturer.

PUBLICATIONS

Jain, V., Gupta, A., Sharma, S., Dubey, A., “Comparative Analysis of Machine learning algorithms in OCR” in 3rd 2016 International Conference on “Computing for Sustainable Global Development”, 2016, 0973-7529 and IEEE Xplore.

Gupta, A., Kataria, R., Gupta, N., “Stock Market Prediction using Machine Learning” in (IJIET) International Journal of Innovations in Engineering and Technology, Vol. 6(4), 2016, 2319-1058.

PROJECTS

Seasonal Fashion Trends Predictor June – July 2018

- Developed a fashion trend predictor pipeline involving an Instagram scraper and an object and color classifier.

- Equipped it with the ability to recommend the season in which a particular dress should be worn. [Python]

Real Time Prediction of Market Sentiment using Twitter Feeds April – May 2017

- Developed a Python scraper to grab tweets using Twython and streamed them to a consumer using Kafka.

- The consumer extracted the sentiment of the tweets and streamed the features as JSON to Elasticsearch for indexing and then to Kibana for Visualization. [Python]

Pedestrian Recognition and Detection using OpenCV and Machine Learning April – May 2017

- Developed a Real Time Pedestrian Detection and Recognition System to track how often multiple pedestrians cross the frame.

- Used Haar Classifier for Detection and Fisherfaces algorithm for Recognition of Pedestrians. [OpenCV, Python]

Stock Market Prediction using Machine Learning May 2016

- Extracted the OHLC data of IBM's stock from Yahoo Finance and derived technical indicators such as MACD, EMA, and DEMA.

- Trained Machine Learning Algorithms like Logistic Regression and SVM on the data to build the Hypothesis. [R]

Santander Customer Satisfaction Prediction March 2016

- Transformed the data into an operational form.

- Predicted the customer satisfaction of Santander Customers. [Machine Learning - R]

Predicting Airbnb User Destination January 2016

- Transformed Airbnb user data into a useful structure, chose the most probable fields, and sampled the data. Tried various algorithms such as Neural Nets and trained the chosen algorithm on the data.

- Predicted the destination by running the test data against the model. [Machine Learning - R]

Prediction of Market Sentiment using Twitter Feeds Oct 2015 - Nov 2015

- Established a correlation between a feature list containing Mentions, Hashtags, NER, and Links and the Sentiment of the tweets.

- Built a model that proved that the data extracted from a tweet alone can predict the emotional state of the user.

Prediction of Flight Delays using Single-Node Hadoop Cluster March 2014 – August 2014

- Built a linear regression model to predict how many minutes a particular flight will be delayed in a specific month.

- [Hadoop, R, Pig, Dataset (over 100,000 Records)]
