Data Analyst

Secaucus, NJ
March 20, 2018

Data Analyst/Tableau Developer/Business Intelligence Analyst/Data Scientist


M.S. Business Information Systems, New Jersey Institute of Technology, Newark, NJ, Sept 2016 – Dec 2017

Relevant Courses: Database Management Systems, Data Analytics, Big Data Systems, Corporate Finance, Business Process Innovation, Data Mining

B.S. Computer Science, Guru Gobind Singh Indraprastha University, India, August 2011 – July 2015

Relevant Courses: Data Structures, Algorithms Design and Analysis, C, OOP using C++, Data Warehousing

TECHNOLOGY

Programming Skills: Python, Java, HTML, CSS, C, C++, JavaScript, HTML 5, XML, Bootstrap, Selenium, UNIX/LINUX

Big Data: Hadoop, Elastic MapReduce, Kafka, Apache Spark, Streaming (Basics), PySpark, Oozie, Pig, Hive

Cloud Platforms: Amazon Web Services, Google Cloud Platform, Azure

Databases: MySQL, NoSQL, SQL Server, PostgreSQL

Statistics: Linear Regression, Decision Trees, Logistic Regression, Random Forest, K-Means Clustering, KNN, Naive Bayes Classifier, Neural Networks, PCA, XGBoost, Gradient Boosting

BI Tools: Tableau, Jupyter Notebook, Power BI, Advanced Excel, R, RapidMiner, MATLAB, Minitab, SAS

Others: SDLC, UML Modeling, Corporate Finance, Visio, Microsoft Office, Requirement Analysis

EXPERIENCE

Data Analyst, IBM, August 2015 – July 2016

Created technical specifications, test plans and test data to support ETL data flows.

Carried out automated testing of provisioned data and pre-delivery sanity checks using Selenium, increasing the efficiency of the application by 70%.

Created complex stored procedures, triggers, cursors, tables, views and other database objects using T-SQL in SQL Server Management Studio.

Analyzed issues related to data loading and file-format conversion, and identified defects and errors in data prior to processing.

Helped tune existing T-SQL code to improve performance.

Designed dashboards and developed ad-hoc reports using Tableau as per customer requests.

ACADEMIC PROJECTS

Geospatial database (Amazon RDS, MS SQL Server 2016)

Used Amazon RDS for this geospatial project because the data was high-dimensional.

Normalized the MSHA dataset, designed the database using SQL and PL/SQL scripts, and implemented ODBC to fetch data from the database based on coordinates.

Twitter Sentiment Analytics using Apache Spark Streaming APIs and Python (Data Science & Cloud)

Used Apache Kafka to buffer live tweet data fetched through the Twitter API, used the Spark Streaming API to convert the live data into DStreams, and performed sentiment analysis along with visual analysis of the results.

Zillow Zestimate Dataset Data Analysis

Cleaned the data and applied feature engineering to reduce its high dimensionality.

Predicted the log error in the Zillow Zestimate using random forest regression and neural network models, and calculated MAE and RMSE to evaluate the models.

Deployed the model on Azure ML Studio and made REST API calls to get the 10 closest homes to a given latitude and longitude.

Flight Data Analysis (Big Data & Cloud)

Automated data processing with an Oozie workflow on the Airline On-Time Performance dataset in Hadoop fully distributed mode, and used MapReduce and Pig to identify airlines by their probability of being on schedule, the airports with the longest and shortest average taxi time per flight, and the most common reasons for flight cancellation.

SberBank Russian Housing Problem Data Analysis (Selenium, Python, Tableau)

Scraped the latitude and longitude values of sub-areas in Moscow using Selenium in Python for geospatial analysis.

Performed EDA, handled missing values, carried out feature selection and engineering, and built regression models.

Designed a dashboard in Tableau to analyze sales and the growth of house prices as different features change.

Predicted housing prices using various regression models and calculated RMSE to evaluate model performance.


