
Data Scientist

Sunnyvale, California, United States
February 24, 2018



*** * ******** ******, ********* CA-94085

Mobile: 812-***-****



Web Portfolio:



Master of Science in Data Science at Indiana University Bloomington (GPA 3.6) Jan 2016 to Dec 2017

Bachelor of Engineering in Information Technology at University of Pune (GPA 3.7) Jun 2008 to Jun 2012

Technical Skills

Programming Languages: Python, R, Java, C/C++, VBA Macros, PL/SQL, Shell Scripting

Machine Learning: Clustering, Neural Networks, Deep Learning, Optimization, Computer Vision techniques, NLP, Bagging, Boosting

Statistics: Bayesian Inference, Monte Carlo methods, Statistical Inference, Time Series Analysis, Probability Distributions, Markov Chains

Python Libraries: Scikit-Learn, NumPy, SciPy, Pandas, Gensim, NLTK, Matplotlib, Seaborn, Statsmodels, Beautiful Soup, XGBoost, PySpark

Tools/Frameworks: TensorFlow, Keras, Dataiku Data Science Studio, Weka, Tableau, Alteryx, SVN, Git, Jira

Big Data: MapReduce, Spark, Hive, Pig, Flume, Sqoop

Work Experience

QxBranch (Data Scientist Intern) – Washington, D.C. Jan 2017 to Aug 2017

Modelling Cyber Risk

• Helped build a fully integrated tool that lets reinsurance analysts assess the cyber risk of organizations so they can write policies and build portfolios effectively

• Implemented Positive-Unlabeled Learning and Domain-Infused Machine Learning techniques for predicting external breaches

• Designed and implemented a change data capture framework for augmenting the master dataset over time from multiple data sources

• Scraped Google News and identified keywords using TF-IDF and TextRank, increasing the AUC score by 10%

• Found temporal patterns by merging the Censys and ARIN datasets with other sources, enabling better identification of security breaches

Stochastic Modelling for Cricket Fantasy League Prediction

• Used Monte Carlo simulations to estimate posterior distributions for the different outcomes of each player faceoff pair

• Created hierarchical logistic regression models and infused domain knowledge to generate ball-by-ball simulations of a match

• Predicted the fantasy points a player will score in a cricket match using multiple simulations, with an RMSE of 15 points

• Generated equipoised matchups for maximizing betting profits by extending the Elo rating system from chess to cricket

• Implemented web scraping in Python to gather data from ESPN and cricsheets, with automatic updates after every match

Accenture (Software Engineering Analyst) – Bangalore, India Jun 2012 to Nov 2015

• Implemented an anti-money-laundering system using rule-based AI and clustering-based anomaly detection

• Worked as a SQL and Hive Developer for Banking and Communication industry clients

• Performed Data Migration to HDFS from Mainframe and Change Data Capture with Apache Hive and Sqoop

• Automated the process of monthly report generation using UDFs in Python for potential money laundering customers

Projects

San Francisco Crime Classification-Kaggle Competition (Gaussian Naïve Bayes, Python Scikit-Learn, R ggplot2, Tableau)

• Performed feature engineering and discovered significant insights by identifying spatial and temporal patterns of crimes in San Francisco

• Implemented a multiclass classifier using Naïve Bayes and XGBoost to assign each crime to one of 20 crime categories

Web Traffic Time Series Forecasting (ARIMA, LSTM-Recurrent Neural network, Keras, Statsmodels)

• Predicted future web traffic to Wikipedia pages using an ensemble of self-tuned (using ACF and PACF) ARIMA and LSTM models

• Used differencing and transformation to make the series stationary for ARIMA and attained a SMAPE score of 61.3

Parts Of Speech Tagging (Hidden Markov Model, Viterbi Algorithm, Forward Backward algorithm, LSTM)

• Achieved an accuracy of 94% at word level and 91% at sentence level on the Brown Corpus using a higher-order HMM coded in Python

• Implemented the same system using an LSTM (TensorFlow) and achieved an accuracy of 95.23%

Topic modelling using Supervised, Semi Supervised and Unsupervised Techniques (Expectation Maximization Algorithm)

• Implemented the EM algorithm from scratch on the 20 Newsgroups dataset with a varying visibility factor to predict topics

• Identified topics in the text with an accuracy ranging from 81% to 72% as the visibility factor decreased from 1 to 0.1

Tetris game playing AI bot (Probabilistic Search)

• Created a bot in Python that plays Tetris using a two-stage look-ahead probabilistic search

Achievements

• Data Science Scholar Fellowship from School of Informatics, Computing and Engineering, Indiana University Bloomington

• Ranked second among about 100 Ph.D. and master's students in a tournament to build a Tetris-playing bot

• Accenture Stellar Award for automating the process of monthly report generation for potential money laundering customers
