Data Scientist, Machine Learning, Predictive Modelling

Location:
Seattle, WA
Posted:
October 09, 2017

Resume:

Akash Devgun

***** **, **** ******, *** M***, Bellevue, WA, 98007 720-***-**** ********@********.***

LinkedIn: https://www.linkedin.com/in/akash-devgun-b86b563b GitHub: https://github.com/AkashDevgun

SUMMARY

Experienced Machine Learning Engineer and Data Scientist with an insatiable intellectual curiosity and the ability to mine hidden gems inside datasets. Responsible for applying Machine Learning and Deep Learning techniques to business problems such as forecasting, recommendation engines, anomaly detection, classification, regression, and clustering.

Skills & Abilities

MACHINE LEARNING SKILLS

Classification, Regression, Collaborative Filtering, Topic Modelling, Supervised, Unsupervised, Naive Bayes, Linear/Logistic Regression, Regularization, k-NN, Support Vector Machines (SVMs), Decision Trees, Ensemble Methods (Random Forest, Gradient Boosting Trees (GBM)), Association Rules, Bayesian Inference, PCA, SVD, Clustering (k-means, GMM, Spectral, Hierarchical)

DEEP LEARNING SKILLS

Multilayer Neural Nets, Convolutional and Recurrent Neural Networks, RNN-LSTMs, Restricted Boltzmann Machine

TOOLS & TECHNOLOGIES

Python, R, MATLAB, Java, scikit-learn, SQL, Theano, Keras, TensorFlow, Spark, MLlib, Hadoop, MapReduce

COMPETENCY

Machine Learning, Statistical Machine Learning, Deep Learning, Data Mining, Data Analytics, Statistics, Data Visualizations, Statistical Modelling, Predictive Modelling, Natural Language Processing

Experience

SOFTWARE DEVELOPER, MACHINE LEARNING ENGINEER MICROSOFT RESEARCH AG-IOT, ZENSA LLC APRIL 2017 – PRESENT

Applied Random Forest imputation models to fill in missing values in weather data for microclimate predictions.

Reduced prediction error by 60% through feature engineering, feature selection, and modelling with XGBoost.

Produced time-series predictions using seasonality decomposition and RNN-LSTMs.

Technologies and Skills Used – Python, R, Scikit-learn, Machine Learning, Deep Learning, Data Science, TensorFlow

DATA SCIENTIST SRP SYSTEMS INC. JUL. 2016 – SEP. 2016

DYNAMIC PRICING SYSTEM – FINANCIAL SERVICES

Applied regression models using scikit-learn and R to forecast demand from prices for rental stores.

Performed data visualization and analysis on a time-series dataset. Ran ensemble methods and RNN-LSTMs to find trends and patterns.

Handled the imbalanced dataset using sampling methods. Selected the best features to reduce overfitting. Improved RMSE from 1.07 to 0.29 using the ensemble method XGBoost.

Ran a Spark cluster on AWS. Developed a price-vs-demand optimization module to maximize revenue.

Technologies and Skills Used – Python, R, Spark, MLlib, Scikit-learn, Machine Learning, Data Science

SOFTWARE ENGINEER-2 (MACHINE LEARNING) SAMSUNG JAN. 2013 – MAR. 2015

INTELLIGENT INTRUSION DETECTION SYSTEM

Developed a client-side data collection module, using an XML-based design, to collect logs from the system and apps.

Performed data acquisition and data cleaning, and stored the data in a suitable format.

Implemented a server-side threat detection and rule generation mechanism using a Decision Tree classifier and neuro-fuzzy logic.

HYBRID PERSONALIZED RECOMMENDATION SYSTEM

Implemented a collaborative recommender based on cosine similarity for the Samsung apps and user rating dataset.

Handled missing data values using a Restricted Boltzmann Machine.

Performed dimensionality reduction using PCA and SVD; recommendation error improved.

Technologies and Skills Used – Python, Java, C++, R, Machine Learning, Deep learning, NLP

SOFTWARE DEVELOPER-1 PLAYBUFF/ARCH MOBILE SOLUTIONS JUN. 2011 – FEB. 2012

Developed the 2D mobile game Monku and the 3D mobile game DoubleTrap2.

Monku was among the top 10 games in the Nokia Ovi Store during the first 3 weeks after release.

Technologies and Skills Used – Java, C++, Data Structures, SQL

Academic Projects

CHURN PREDICTION ON IMBALANCED DATASET (DATA ANALYTICS: SYS ALGOS APPS PROJECT)

Inferred missing data using surrogate decision trees. Created a balanced dataset using sampling methods.

Reduced overfitting and performed classification to predict churn using scikit-learn. Improved AUC from 0.47 to 0.69 with Random Forest.

Configured a Spark cluster on AWS. Performed best-feature selection and predictions using MLlib.

NBA PREDICTIONS AND ANALYTICS (DATA MINING PROJECT)

Scraped data from a sports database. Created features highly correlated with wins. Selected the best features with an Extra Trees model.

Performed classification to predict the winner of upcoming matches, achieving 67% accuracy with Gradient Boosting Trees (GBM).

Applied Spectral Clustering and Gaussian Mixture Models to the set of best features, forming player clusters with 89% accuracy.

SCIENCE QUESTION ANSWERING (MACHINE LEARNING PROJECT)

Applied Convolutional and Recurrent Neural Networks, using Theano and TensorFlow, to word vectors to extract features.

Performed feature engineering with the Wikipedia API to add useful features to the dataset. Improved accuracy from 50.4% to 72.4% on Kaggle.

MUSIC RECOMMENDATION SYSTEM (ANALYSIS OF HIGH DIMENSIONAL DATASET PROJECT)

Extracted features from music data using MFCC and further reduced dimensions with PCA. Achieved 63.2% song similarity with k-NN.

Applied a Recurrent Neural Network (LSTM) to predict song similarity with genres. Achieved 82.1% accuracy with the RNN-LSTM.

IMAGE RECOGNITION AND OBJECT CLASSIFICATION (COMPUTER VISION PROJECT)

Ran a Support Vector Machine (SVM) to classify a dataset of 101 categories and achieved 76% accuracy. Applied Convolutional Neural Networks with a pyramid pooling layer to handle images of different sizes. Achieved 90.4% testing accuracy with CNNs.

Education

MASTER’S AUG. 2015 – DEC. 2016 UNIVERSITY OF COLORADO, BOULDER (GRADUATED)

Major: Computer Science (GPA: 3.748), Related coursework: Machine Learning, Probabilistic Graphical Models, Computer Vision (Deep Learning), Big Data, Data Mining, Analytics: System Algos Apps, Analysis of High Dimensional Datasets, Data Science

BACHELOR’S JUL. 2007 – MAY 2011 NATIONAL INSTITUTE OF TECHNOLOGY, JALANDHAR

Major: Computer Science, Related coursework: Probability Theory, Data Structures and Algorithms, Natural Language Processing

Achievements

Won 1st prize for Best Developer Tool at the CU-Hackathon, April 2016, at the University of Colorado, Boulder.

Publications

"Mathematical Model to generate 3D Surface using Machine Learning Genetics Algorithms", IEEE International Conf. CICN, pp. 1237-1242, 2014

"Machine Learning Adaptive Approach to remove Impurities Over BigData", IEEE International Conf. ECCE, pp. 220-225, 2014


