
Data Microsoft Office

Harrison, New Jersey, United States
January 24, 2018





MS, Information Systems, New Jersey Institute of Technology, Newark, NJ, Sept 2016 – Dec 2017

B. Tech, Computer Science, Guru Gobind Singh Indraprastha University, India, Aug 2011 – July 2015

TECHNOLOGY


Programming: Python, Java, HTML, CSS, JavaScript, HTML5, XML, Bootstrap

MySQL, SQL Server

Analytics Tools and visualization skills: Tableau, IPython, Power BI, Advanced Excel, R, RapidMiner, MATLAB, Minitab, Bloomberg

Apache Hadoop, Apache and Elastic MapReduce, Amazon Web Services, Apache Kafka, Apache Spark Streaming (basics), PySpark, Azure ML Studio, Apache Oozie, Apache Pig, Apache Hive, Apache HBase

Shiny (R), Flask (Python)

Version Control: Git, Docker

Python (NumPy, pandas, scikit-learn), Linux, Selenium, Machine Learning, Statistics, UML Modeling, Corporate Finance, Visio, Microsoft Office, Requirement Analysis, Unit Testing, Test-Driven Development


Predicting Backorder Risk for Products

Resampled an imbalanced classification dataset using SMOTE, improving accuracy by 20% and reducing training time by 50% with various machine learning algorithms in scikit-learn, streaming over 2 TB of data.

Twitter Sentiment Analytics using Apache Spark Streaming APIs and Python

Used Apache Kafka to buffer live tweet data fetched through the Twitter API.

Used the Spark Streaming API to convert the live data into DStreams, performed sentiment analysis on it, and visualized the results.

Working with EDGAR datasets: wrangling, pre-processing, and exploratory data analysis

Extracted all statistical tables from 10-K and 10-Q filings using Python.

Developed a Python pipeline that generates the URL for retrieving first-day-of-the-month data from the EDGAR Log File Dataset.

Handled missing data, computed summary metrics, and performed anomaly detection.

Logged all operations to a log file, compiled the summaries of 12 files into one file, and uploaded it to Amazon S3.

Zillow Kaggle Dataset

Data Ingestion and Wrangling.

Predicted log errors with several prediction models, evaluated using RMSE and MAPE; Random Forest gave the best result.

Deployed the model with Azure ML Studio, invoking it through the JSON API.

Created a REST API that, given a latitude and longitude, returns the 10 closest homes.

Big Data Analysis of Wikipedia dataset

Processed big data and performed predictive analysis on the Wikispecies dataset in Hadoop fully distributed mode.

Identified the most popular species in Wikipedia by parsing the XML and applying Google's PageRank algorithm using MapReduce.


Associate System Engineer, IBM

Carried out automation testing of provisioned data and pre-delivery sanity checks using Selenium.

Analyzed issues related to data loading and the conversion of files into different formats.

Identified defects and errors in data prior to data processing. Collaborated with back-end and database testers.

Participated in design calls to understand customer requirements and provide suggestions on them.

Developed SQL procedures for loading data into the database.

Prepared data for exploratory analysis, intelligent data products, and dashboards.

Designed dashboards and data visualizations to communicate meaningful metrics to different customers according to their requirements.





May 2017

October 2016




August 2015 – July 2016
