
Data Scientist

Singapore, West Region, 64, Singapore
June 09, 2018



Arun V Sankar


www.linkedin.com/in/arunvsankar

DOB – 26/10/1987

Senior Data Scientist (Contract) - Techvantage Systems: Predictive Modelling

Lead Data Scientist (HR Analytics) - UST Global Inc.

• Working with the Chief People Officer to leverage organization-wide data for data-driven decisions

• Focusing mainly on employee stagnation, compensation and gender gaps, leveraging text data, and key attrition parameters

Research Assistant (Intern)- National University of Singapore

• Innovative efficiency and stock returns - Found that the innovative efficiency (IE) of firms, scaled by R&D expenses, is a strong predictor of future returns

Data Scientist (Intern) - Barghest Building Performance (BBP) Ltd

• Approach temperature estimation model - My model helped improve the efficiency of the chillers that cool large buildings

Probationary Officer, Canara Bank

QA Analyst/Business Analyst, Wipro Technologies

Oct 2016 – Jan 2018

July 2015- Mar 2016

ACHIEVEMENTS

• All India Rank 58 in GATE Industrial Engineering 2013

• 96.14 percentile in CAT 2012 (Common Admission Test)

July 2015- Mar 2016

Jan 2011-April 2013

July 2014- Jan 2015


National University of Singapore- Data Analytics, Masters in Technology CGPA: 7.7/10

College of Engineering Trivandrum- Industrial Engineering, Bachelors in Technology, CGPA: 6.9/10


Jan 2015- June 2016

June 2006- May 2010

CERTIFICATIONS

• Intro, Intermediate and Expert level certified in both R and Python - DataCamp

• Python for Data Analysis and Visualization - Udemy

• Deep Learning with Python and Keras - Udemy

• Scala and Spark for Big Data and Machine Learning

Feb 2018 – Apr 2018


Worked with the leadership team on data-driven decision making and insight generation from employee data.

Areas I have worked on in HR analytics include:

• Built live dashboards in Power BI with key indices/metrics

• Scorecard generation for identifying how the true performers fare against others in terms of compensation and career growth

• Predicting the likelihood of a successful recruit for high potential / non high potential positions based on profile of existing high potential and high performing staff

• Employee churn modelling using ML algorithms, with extensive data preparation, cleansing and new variable generation done mainly in R and Excel

• Sentiment analysis and text mining on exit interview and employee survey data, which helped build a manager scorecard

• Analysis of gender diversity within UST

Time Series Forecasting:

I worked on the commission calculation system for a packaging and parcel giant in Canada, where quota setting for each territory was predicted using time series analysis and modelling. Extensive dashboarding and feature generation helped in building a good forecast for future quarters using random forest, with ANOVA as the baseline model.

Innovative Efficiency and Stock Returns

The sample consists of 25 years (1982-2007) of patent filings data in the US. We find that the innovative efficiency (IE) of firms, scaled by R&D expenses, is a strong predictor of future returns. Summary statistics of IE by industry revealed significant variation across industries. We sort firms into Low, Medium and High groups based on percentiles of IE in year t-1, and portfolios are formed every year from 1982 to 2007. Characteristics such as number of firms, R&D expenses, size, market equity, stock momentum and asset growth are calculated for each group, and the operating performance and the correlation matrix of the variables are evaluated. High IE portfolios generally provide the highest operating performance, and the significant positive correlation suggests that IE measures are potential predictors of operating performance. This project was done mainly in R.

Investment Risk Analysis in a New Production Division using Monte Carlo Simulation

Identified the best alternative (new vs. old production system) using Monte Carlo simulation. This method is more realistic since it adds uncertainty to all future cash flows by randomly generating inputs (in this scenario, 32,000 sets) drawn from a bell curve with a given mean and standard deviation. Important factors are then evaluated using tornado and spider diagrams, and a rainbow diagram is plotted to identify the breakeven quantity of each project. The variables found to be sensitive are used in random trials, and these values are used to compute the annual worth (AW) of each project. A probabilistic risk analysis of each alternative is performed and their risk profiles are compared to find the best alternative under uncertainty.
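A minimal Python sketch of this kind of Monte Carlo comparison, not the original project code. It assumes a uniform annual revenue and cost drawn once per trial from normal distributions, no salvage value, and an annual worth computed with the standard capital recovery (A/P) factor. All function names and figures are hypothetical, chosen only for illustration.

```python
import random
import statistics


def capital_recovery_factor(rate, years):
    """A/P factor: converts a present amount into an equivalent annual amount."""
    growth = (1 + rate) ** years
    return rate * growth / (growth - 1)


def simulate_annual_worth(investment, revenue_mean, revenue_sd,
                          cost_mean, cost_sd, rate, years,
                          trials=32000, seed=0):
    """Return one simulated annual worth (AW) per trial."""
    rng = random.Random(seed)
    annual_capital_cost = investment * capital_recovery_factor(rate, years)
    results = []
    for _ in range(trials):
        # Each trial draws its inputs from a bell curve (mean, standard deviation).
        revenue = rng.gauss(revenue_mean, revenue_sd)
        cost = rng.gauss(cost_mean, cost_sd)
        results.append(revenue - cost - annual_capital_cost)
    return results


def risk_profile(aw_values):
    """Summarise a simulated AW distribution."""
    losses = sum(1 for aw in aw_values if aw < 0)
    return {
        "mean_aw": statistics.fmean(aw_values),
        "sd_aw": statistics.stdev(aw_values),
        "prob_loss": losses / len(aw_values),
    }


# Hypothetical figures for illustration only (not from the project).
old_system = simulate_annual_worth(200_000, 150_000, 20_000,
                                   90_000, 10_000, 0.10, 10, seed=1)
new_system = simulate_annual_worth(500_000, 260_000, 40_000,
                                   120_000, 15_000, 0.10, 10, seed=2)

old_profile = risk_profile(old_system)
new_profile = risk_profile(new_system)
```

Comparing `old_profile` and `new_profile` (mean AW, spread, and probability of a negative AW) is one simple way to express the risk profile of each alternative.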

Approach Temperature Estimation Model

My model helped improve the efficiency of the chillers that cool large buildings. A large amount of data was available from the SCADA system installed in the plant, and I was able to isolate the best efficiency the chillers achieved under different sets of conditions. These points were used to build a polynomial regression model for the approach temperature, which is subsequently fed to the chiller system, automating a process that had previously been done manually.
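A minimal Python sketch of a polynomial regression fit of this kind, not the production model. It assumes a single hypothetical predictor (chiller load fraction) and synthetic, noise-free readings; the actual SCADA variables and polynomial degree are not stated here. The fit solves the least-squares normal equations directly so that only the standard library is needed.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y for the Vandermonde matrix X.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical (load fraction, approach temperature) readings; synthetic data.
load = [0.2, 0.4, 0.5, 0.6, 0.8, 1.0]
approach = [1.0 + 0.5 * x + 2.0 * x * x for x in load]

coeffs = fit_polynomial(load, approach, degree=2)
estimate = predict(coeffs, 0.7)
```

With real plant data, the inputs would be the best-efficiency operating points selected from the SCADA history, and `estimate` would be the approach temperature setpoint passed to the chiller system.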

