
Python, SQL, Splunk, Hive, Spark, Machine Learning, Tableau, Data Analytics

Surat, Gujarat, India
October 22, 2019



Arpit Bapna

Data Science Enthusiast


Associate Software Engineer

Accenture, Pune

From Dec’17 – Apr’19

* ****

* ******

Worked as a developer on a BFSI (banking, financial services and insurance) domain project.

Responsibilities:

- Deployed Python scripts for report generation for the customer.

- Performed data analysis using Splunk and Python.

- Applied data visualization techniques to communicate final reports to the client.

- Implemented enhancements based on client requirements.

- Gathered requirements and performed root cause analysis in consultation with various stakeholders.

Professional Photographer

Oragraphy, Vadodara

From May’17 – Oct’17

6 Months

Handled web development and graphic design for a leading photography company; also worked as a professional cinematographer.

ASP.NET MVC Internship

L&T Heavy Engineering

From May’16 – June’16

2 Months

Worked as a C# and ASP.NET developer intern; designed, deployed and tested three web-based applications on Windows Server 2012.

Technical Skills

Frameworks: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, StatsModels, OpenCV, TensorFlow, Keras, ASP.NET MVC

ML/DL Techniques: Linear Regression, Logistic Regression, Advanced Regression, SVM, Random/Decision Forests, Time Series Analysis, Clustering (K-Means and Hierarchical), NLP and Deep Learning

Tools & Languages: Python, SQL, Tableau, Hadoop, Hive, Sqoop, Apache Spark, Flume, Jupyter, Swift, C#, Excel

Key Skills

• Supervised/Unsupervised Learning
• Business Analysis & Strategy
• Predictive Analytics & Modeling
• Team Management & Leadership
• Data Wrangling & EDA
• Data Visualization & Storytelling

Education

Degree | Institute | Board/University | Year | Performance
PGDDS | IIIT-B | IIIT | Sept’18 – Sept’19 | 3.5/4.00
B.Tech | MPSTME | NMIMS | Aug’13 – May’17 | 2.8/4.00
Class XII | Radiant English Academy | CBSE | 2013 | 63.40%
Class X | Ryan International School | ICSE | 2011 | 62.50%

KEY DATA SCIENCE PROJECTS

• UBER DEMAND-SUPPLY GAP (EXPLORATORY DATA ANALYSIS PROJECT): The project aimed to find out why and how cars were unavailable and rides were cancelled during certain hours of the day, and to recommend a solution to reduce the problem.


A Chinese automobile company aspires to enter the US market and has two main concerns: which variables are significant in predicting the price of a car, and how well those variables describe the price. The task is to model car prices against the independent variables, which in turn helps management understand exactly how prices vary with those variables.



A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at a price below their actual value and flip them at a higher price. For this purpose, the company has collected a dataset of house sales in Australia. The task is to build a regression model using regularization to predict the actual value of prospective properties and decide whether or not to invest in them.

• HANDWRITTEN DIGIT RECOGNITION (SUPPORT VECTOR MACHINE PROJECT): Handwritten digit recognition is a classic problem in pattern recognition. Given images of handwritten digits from 0 to 9, written by various people in boxes of a fixed size (similar to the application forms used by banks and universities), the task is to develop a Support Vector Machine model that correctly classifies each digit from its pixel values. This is therefore a 10-class classification problem.


In the telecom industry, customers can choose from multiple service providers and actively switch from one operator to another. In this highly competitive market, the telecommunications industry experiences an average annual churn rate of 15-25%. Because acquiring a new customer costs 5-10 times more than retaining an existing one, customer retention has become even more important than customer acquisition. To reduce churn, telecom companies need to predict which customers are at high risk of leaving. This is therefore a classification problem in which several models are implemented and compared using appropriate evaluation metrics.


New York City is a thriving metropolis. Like most other metros of its size, one of the biggest problems its citizens face is parking. The classic combination of a huge number of cars and cramped geography leads to a huge number of parking tickets. To analyse this phenomenon scientifically, the NYC Police Department has collected data on parking tickets, and this project performs exploratory analysis on that data. Spark makes it possible to analyse the full files at high speed, rather than taking a series of random samples that only approximate the population. The scope of this analysis was limited to parking tickets issued over the year 2017-2018.

RESEARCH PUBLICATIONS

1. “A REVIEW OF TEXT MINING TECHNIQUES AND APPLICATIONS”, International Journal of Computer (IJC), Volume 24, No. 1, pp. 170-176, 2017.

2. “EXAMINATION QUESTION CLASSIFIER: A NOVEL APPROACH TO APPLYING NLP IN ACADEMIC APPLICATION”, International Research Journal of Engineering and Technology (IRJET), Volume 04, Issue 04, Apr’17.

CERTIFICATES

• Object Oriented Programming Using C#, NIIT

• Python for Data Science, edX & UC San Diego, Feb’19 – Apr’19

• Web Application Development Using ASP.NET MVC5, NIIT
