Data Scientist

Redmond, WA
October 07, 2018

University at Albany, State University of New York



Master's in Computer Science, GPA: 3.54/4.0, May 2017

Rajasthan Technical University, Kota, India

Bachelor's in Computer Science, June 2014

Ranked in the top 30% in the Predict Future Sales competition [Kaggle]


Programming Languages: Python, R

Databases: T-SQL, SQL Server [with SSIS]

Big Data Tools: Apache Spark

ML Libraries: NumPy, pandas, scikit-learn, XGBoost, CatBoost, etc.

Visualization Libraries: Matplotlib, Plotly, Seaborn, ggplot

Cloud Platform: Azure Stack, Azure cloud

Deep Learning Frameworks: Keras, TensorFlow


PeopleTech Group January 2018 – Present

Jr. Data Scientist

Helped define initial prototypes and proofs of concept with Expedia and Microsoft for various business problems.

Performed initial analysis and problem breakdown using advanced SQL queries and quantitative analysis to leverage the full power of the data.

Engineered features that increased the F-score by 15% for the YOB use case, using target encoding, binning, scaling, nearest neighbors, multi-way interactions, etc., in Python and SQL.

Engineered ad hoc data pipelines to test the Azure hybrid architecture designed by the Solutions Architect and to evaluate the efficacy of Microsoft on-premises services.

Developing and testing a generalized framework that automates the intermediate steps of the data science pipeline.

Operationalizing predictive models for deployment and generating the final reports.

Center for Technology in Government August 2017 – December 2017

Research Fellow [Volunteer work]

Worked on the Urban Blight project to help the local government fight urban blight.

Trained object detection models on custom objects [transfer learning] in TensorFlow and Keras.

Built a graph network from GPS data and computed the nearest spatial distance between two objects from their location data, using Apache Spark and Cassandra, to notify the neighborhood within a certain vicinity of the originating house [on a severe incident].

Helped create various data analytics services [including visualization] for a given area, including but not limited to identifying severe houses, identifying active/inactive areas, and locating/aggregating neighborhood houses by incident severity.

Center for Technology in Government June 2016 – April 2017

Research Assistant

Improved the efficiency of record retrieval from Apache Solr using a master-slave framework.

Developed an object storage architecture using SAIO and Solr for email archiving in ERMS.

Generated test scripts using Selenium and JMeter for performance and load testing of the ERMS website.


University at Albany, State University of New York May 2016 – May 2017

Researcher [Dropbox][GitHub] (Java, Python, Calculus, ML, Projected GD, convex optimization)

Completed project-based research on graph analytics (Graph-IHT) using ML and anomalous pattern detection techniques under Professor Feng Chen, alongside two PhD students.

Tested the elevated mean statistic, least squares, and logistic regression cost functions for Graph-IHT and Graph-MP on simulated graph data.

The final results predicted irregularities in a given graph with more than 90% accuracy.


Perform Cloud Data Science [MCSE, Exam 70-774]

Deep Learning & Machine Learning Specialization by Andrew Ng

Data Analysis Nanodegree by Udacity [Enrolled]

