JIGYASA KOHLI
EDUCATION
MS, Information Systems, New Jersey Institute of Technology, Newark, NJ (Sept 2016 – Dec 2017)
B. Tech, Computer Science, Guru Gobind Singh Indraprastha University, India (Aug 2011 – July 2015)
TECHNOLOGY
Programming: Python, Java, HTML, HTML5, CSS, JavaScript, XML, Bootstrap
Database: MySQL, SQL Server
Analytics Tools: Tableau, IPython, Power BI, Advanced Excel, R, RapidMiner, MATLAB, Minitab, Bloomberg
Distributed programming: Apache Hadoop, Apache and Elastic MapReduce, Amazon Web Services, Apache Kafka, Apache Spark Streaming (basics), PySpark, Azure ML Studio, Apache Oozie, Apache Pig, Apache Hive, Apache HBase
Advanced visualization skills: Shiny (R), Flask (Python)
Version Control: Git, Docker
Others: Python (NumPy, pandas, scikit-learn), Linux, Selenium, Machine Learning, Statistics, UML Modeling, Corporate Finance, Visio, Microsoft Office, Requirement Analysis, Unit Testing, Test-Driven Development
PROJECTS
Predicting Backorder Risk for Products
Resampled the imbalanced classification dataset using SMOTE (Synthetic Minority Over-sampling Technique). Compared various machine learning algorithms in scikit-learn on over 2 TB of streaming data, improving accuracy by 20% and reducing training time by 50%.
Twitter Sentiment Analytics using Apache Spark Streaming APIs and Python
Used Apache Kafka to buffer live tweet data fetched through the Twitter API.
Used the Spark Streaming API to convert the live data into DStreams, then performed sentiment analysis on the stream and visualized the results.
Working with EDGAR Datasets: Wrangling, Pre-processing and Exploratory Data Analysis
Extracted all statistical tables from 10-K and 10-Q filings using Python.
Developed a Python pipeline that generates the URLs for retrieving data for the first day of each month from the EDGAR Log File Dataset.
Handled missing data, computed summary metrics and performed anomaly detection.
Logged all operations to a log file, combined the summaries of 12 files into a single file and uploaded it to Amazon S3.
Zillow Kaggle Dataset
Performed data ingestion and wrangling.
Predicted log errors with several prediction models and evaluated them using RMSE and MAPE. Random Forest gave the best results.
Deployed the model with Azure ML Studio and invoked it through its JSON API.
Created a REST API that, given a latitude and longitude, returns the 10 closest homes.
Big Data Analysis of Wikipedia Dataset
Processed the Wikispecies dataset and performed predictive analysis on it using Hadoop in fully distributed mode.
Identified the most popular species on Wikipedia by parsing the XML and applying Google's PageRank algorithm with MapReduce.
PROFESSIONAL EXPERIENCE
Associate System Engineer, IBM (August 2015 – July 2016)
Automated testing of provisioned data and pre-delivery sanity checks using Selenium.
Analyzed issues related to data loading and the conversion of files into different formats.
Identified defects and errors in data prior to processing, collaborating with back-end and database testers.
Participated in design calls to understand customer requirements and suggest improvements to them.
Developed SQL procedures to load data into the database.
Prepared data for exploratory analysis, intelligent data products and dashboards.
Designed dashboards and data visualizations that communicated meaningful metrics to each customer according to their requirements.
September 2016
September 2017
May 2017
October 2016
December 2016
*****@****.***
https://www.linkedin.com/in/jigyasakohli/
https://github.com/jmsjigyasa