Data Scientist

Location:
Bloomington, IN
Posted:
September 06, 2017

Resume:

Rakshesh Shah

San Francisco, California *****

ac16u7@r.postjobfree.com 812-***-**** www.linkedin.com/in/raksheshshah www.github.com/raxshah

EDUCATION

Master of Science in Data Science, Indiana University Bloomington January 2016 – Present Cumulative GPA: 3.9/4

Bachelor of Engineering in Electronics & Com., Gujarat Technological University June 2009 – June 2013 Cumulative GPA: 3.7/4

WORK EXPERIENCE

Data Scientist Intern, Asurion Research Labs, San Mateo, CA May 2017 – August 2017

• Part of a team building a next-generation NLP chatbot to automate the insurance claim process.

• Spent most of the time on data cleaning and on preparing a classification model using machine learning techniques.

• Created data visualizations (ggplot) and a dashboard (shiny) to give management insight into the business problem.

• Incorporated machine learning components using full-stack agile software development on AWS and AzureML.

Graduate Research Assistant, Indiana University Bloomington January 2017 – May 2017

• Assisted in implementing machine learning algorithms using a big data analytics pipeline.

• Analyzed datasets using quantitative analysis techniques, e.g. generalized linear models and stochastic models.

Graduate Teaching Assistant, Indiana University Bloomington August 2016 – Present

• Teaching assistant for courses: Python for Data Science; Applied Machine Learning

• Conducting office hours to mentor students and resolve their questions on machine learning and data science techniques.

Software Engineer, Tata Consultancy Services, India March 2014 – December 2015

• Prepared data sets from various database systems using SQL, Java, ETL, etc. for targeting, reporting, and data science research, applying data cleansing tools, wrangling, and data preprocessing techniques.

• Analyzed large data sets using statistical methods on the Hadoop ecosystem to support fact-based decisions.

• Successfully delivered two data analytics projects for clients Comcast USA and Vodafone UK.

• Led and mentored a team of 8 people to deliver a proof-of-concept (POC) application.

Software Engineer Intern, Innoventaa Technocrats Pvt. Ltd, India July 2013 – February 2014

• Delivered an ecommerce website module. Used J2EE, Servlet, Hibernate, RESTful web services, HTML, jQuery.

KEY PROJECTS

Iowa House Price Prediction Challenge [Techniques: Linear, Ridge, LASSO regression, Random Forest, AWS]

• Secured a top 10% rank in a Kaggle competition for predicting house prices using regression techniques.

• Developed an end-to-end machine learning pipeline covering exploratory data analysis, feature engineering, model building, performance evaluation, and online testing with a large data set. Implemented the data pipeline on AWS.

Otto Group Product Classification [Techniques: Decision Tree, Ensemble Stacking, RF, Naive Bayes, SVM]

• Built a predictive model, using various classification and statistical data analysis techniques, that classifies products into product categories. Performed EDA to prepare the model. Secured a top 20% rank in a Kaggle competition.

Inventory and Customer Management System, TCS [Techniques: HDFS, Pig, Hortonworks, Agile, Git]

• Created a proof-of-concept application for managing inventory and customer information using the Hadoop ecosystem for client Comcast USA during professional career. Spent most of the time on data preprocessing.

Real-time Twitter Analytics [Techniques: Sentiment analysis, NLTK, Kinesis Firehose, Elasticsearch, Kibana]

• Built a near-real-time discovery platform to analyze sentiment from tweets. It continuously measures and compares sentiment for given input terms. Implemented the pipeline on AWS.

Dogs vs Cats Image Classification, Deep learning [Techniques: CNN, Keras, Theano, TensorFlow]

• Implemented the VGG16 ImageNet model to distinguish cats from dogs using deep learning techniques. Used a GPU on Google Cloud to reach an accuracy of 80% on 35000 images.

TECHNICAL SKILLS

Programming Languages: Python, R, Java, SQL

Machine/Deep Learning: Classification, Regression, Clustering, Ensemble methods, Recommendation, Bayesian methods, Convolutional and Recurrent Neural Networks

Big data: Spark, Hadoop, MapReduce, Hive, NoSQL, HBase, AWS

Natural Language Processing: Topic modeling (LDA), Word2Vec, Sentiment analysis, POS tagging

Libraries: Scikit-learn, Matplotlib, bokeh, shiny, ggplot2, TensorFlow (GPU), Keras

Misc: Statistics, Data visualization, Tableau, Statistical modeling, Shell scripting, Excel

Certifications: Oracle Certified Java Programmer; 14 MOOCs in data science


