Data Analyst/Data Scientist

Location:

Bangalore, Karnataka, India

Salary:

350,000 per annum

Posted:

June 12, 2021

Resume:

EDUCATION:

**** - **** **** *****

VIKAASA SCHOOL – MADURAI

**** - **** ***** *****

TAGORE VIDHYALAYAM – MADURAI

****-** - ********* ** ************ (B.Arch. - 5 Year Course) RVS SCHOOL OF ARCHITECTURE – DINDIGUL

FROM 2-NOV-2020 – CERTIFIED DATA SCIENTIST

DATA MITES INSTITUTION

BANGALORE

(Course Completed)

RAAJAVIGNESH S

Personal Information:

ADDRESS:

NO.51, K-34, VENKATASAMY LAYOUT

NAGANATHPURA - ELECTRONIC CITY

BANGALORE 560100

PHONE:

+91-805*******

EMAIL:

*********@*****.***

LINKEDIN:

https://www.linkedin.com/in/raajavignes

GitHub:

https://github.com/Raaja6897/Machine_Learning_Projects

HackerRank:

https://www.hackerrank.com/raaja6897

LANGUAGE:

ENGLISH – FLUENT

FRENCH – LIMITED

TAMIL – NATIVE

TELUGU – MOTHER TONGUE

PROFILE: DATA ANALYST/DATA SCIENTIST

Business-minded data scientist able to deliver valuable insights through data analytics and advanced data-driven methods. Passionate about bringing machine learning and deep learning to business and technology to develop products that solve real-life problems, thinking from the consumer's perspective.

CERTIFICATIONS:

MACHINE LEARNING:

INTRODUCTION TO MACHINE LEARNING FROM DUKE UNIVERSITY

(Coursera)

PROGRAMMING:

PYTHON FROM MICROSOFT (edX)

CLOUD:

MICROSOFT AZURE AI PARTICIPATION CERTIFICATE

MANAGEMENT:

INTRODUCTION TO SOFTWARE PRODUCT MANAGEMENT (Coursera)

DIGITAL MARKETING:

DIGITAL MARKETING CERTIFIED ASSOCIATE FROM SIMPLILEARN

PROGRAMMING AND TOOLS:

Languages and Libraries: Python - Pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn, NLTK, TensorFlow, Keras

Relational Databases: MySQL

Analytics and BI: Tableau, Power BI, MS Excel

Statistics: Hypothesis Testing, ANOVA, Chi-Square, Data Distribution - Binomial, Bernoulli, Poisson, Exponential

Machine Learning: Linear Regression, Lasso, Ridge, SVM, Random Forest, Decision Tree, Naïve Bayes, KNN, K-means, Gradient Boosting Classifier, XGBoost, Stochastic Gradient Descent, LightGBM, CatBoost.

Deep Learning: MLP, LSTM, RNN, CNN, ANN

Skills: Hadoop, Apache Spark, PySpark (Learning), Spark SQL (Learning)

PROJECTS:

1. Client Project – Real-Time Project from Rubixe

Business Problem: Predict sales for 20+ products and identify High Potential and Low Potential customers, that is, customers who buy the product versus those who are only enquiring about it.

Solution: The leads come from different sources such as calls, the website, live chat, live chat PPC, and customer referrals. The problem is naturally multiclass, but the data was not sufficient for multiclass modelling. I therefore grouped customers marked Converted, Long Term, or Previous Customer as High Potential by adding a column with values 1 and 0, where 1 represents High Potential and 0 represents Low Potential.
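
The grouping described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the key names `status` and `high_potential` and the status value `Enquiry` are assumptions, and only the Converted, Long Term, and Previous Customer categories come from the description.

```python
# Statuses treated as High Potential, taken from the project description.
HIGH_POTENTIAL_STATUSES = {"converted", "long term", "previous customer"}


def add_high_potential_flag(records, status_key="status"):
    """Return copies of the records with a binary 'high_potential' column.

    1 represents High Potential and 0 represents Low Potential.
    The key names are hypothetical; the real dataset may differ.
    """
    labelled = []
    for record in records:
        status = str(record.get(status_key, "")).strip().lower()
        flag = 1 if status in HIGH_POTENTIAL_STATUSES else 0
        labelled.append({**record, "high_potential": flag})
    return labelled


# Toy rows for illustration only; not real project data.
leads = [
    {"source": "website", "status": "Converted"},
    {"source": "live chat", "status": "Enquiry"},
    {"source": "call", "status": "Long Term"},
]
labelled_leads = add_high_potential_flag(leads)
print([row["high_potential"] for row in labelled_leads])
```

Collapsing several statuses into one binary target trades detail for more examples per class, which is the usual reason to do it when a multiclass target is too sparse.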

EDA - SciPy; Visualization - Seaborn, Matplotlib

Validation for model: Accuracy Score, Area Under Curve (AUC), Recall, Precision, and F1 Score with ROC Curve Comparison

Accuracy – 94%, AUC Score – 99%, F1 Score – 94%

Final Model Selection: Random Forest, as the accuracy is 94% and recall is also 94%. Other models tended to overfit the data, with 100% accuracy and 100% on all validation metrics.

2. COVID-19 Prediction using Chest X-Ray: Image Classification (Convolutional Neural Network, Deep Learning)

Medical Problem: Identify patients infected with COVID-19 using chest X-rays.

Solution: Using a CNN (VGG16) and also a custom CNN, the model was trained with over 3000 images in each class: normal patient X-ray images and COVID patient X-ray images.

Using VGG16 and custom convolution layers.

For a medical problem, the validation metric we use is recall.

Recall – 92%

Validation – Confusion Matrix and ROC Curve with 96% accuracy.

SKILLS:

• Machine Learning

• Deep Learning

• Python

• SQL

• Fast Learner

• Communication

• Problem Solving

• Product Management

INTEREST:

KNOWLEDGE

MARKETING

PRODUCT MANAGEMENT

ARTIFICIAL INTELLIGENCE

INTERNET OF THINGS (IOT)

QUANTUM COMPUTING

OTHER DETAILS:

D.O.B: 13-07-1997

Marital status: Single

MACHINE LEARNING

MODELS USED:

Regression:

1. Linear Regression,

2. ElasticNet,

3. Support Vector Regressor (SVR from svm),

4. Lasso,

5. Ridge,

6. ExtraTreeRegressor,

7. XGBRegressor,

8. Stochastic Gradient Descent Regressor (SGD),

9. LightGBM Regressor,

10. Gradient Boosting Regressor,

11. Cat Boost Regressor,

12. KNN Regressor,

13. Random Forest Regressor

Classification:

1. Logistic Regression,

2. Naïve Bayes,

3. Random Forest,

4. K-Nearest Neighbors,

5. MLP (Neural Network),

6. XGBoost,

7. SVC from SVM,

8. Gradient Boosting Classifier,

9. Stochastic Gradient Descent (SGD),

10. Light Gradient Boosting,

11. Cat Boost

DEEP LEARNING

MODELS USED:

MLP: Multilayer Perceptron

CNN: Convolutional Neural Network

PROJECTS:

3. Advanced House Price Prediction: Regression (Machine Learning)

Business Problem: Predict house sale prices from features such as total area, constructed area, built area, year built, number of rooms, location, garage space, etc.

Solution: The main difficulty was that the dataset has 80 columns (79 features and 1 Sale Price column) but only 1460 rows. Features with more than 50% null values were removed. In other features a null carries meaning: for example, a null Garage value states that the house has no garage, so such nulls were replaced with 'None', which helped improve accuracy. Because there are only 1460 rows, outliers could not be removed, so I applied a log transform to reduce the skewness of those features.
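
The three preprocessing steps above can be sketched as follows. This is a minimal standard-library illustration, not the project's code: the column names `LotArea`, `GarageType`, and `PoolQC` and the toy rows are hypothetical, and `math.log1p` is used as one common form of log transform (the description only says "log transform").

```python
import math


def drop_mostly_null(rows, threshold=0.5):
    """Drop features whose share of null values is above the threshold."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    n = len(rows)
    keep = [
        col for col in columns
        if sum(row[col] is None for row in rows) / n <= threshold
    ]
    return [{col: row[col] for col in keep} for row in rows]


def fill_none_category(rows, column):
    """Replace nulls with the category 'None' (for example, no garage)."""
    return [
        {**row, column: "None" if row[column] is None else row[column]}
        for row in rows
    ]


def log_transform(rows, column):
    """Apply log(1 + x) to a skewed, non-negative numeric feature."""
    return [{**row, column: math.log1p(row[column])} for row in rows]


# Hypothetical toy rows for illustration only.
rows = [
    {"LotArea": 8450, "GarageType": "Attchd", "PoolQC": None},
    {"LotArea": 9600, "GarageType": None, "PoolQC": None},
    {"LotArea": 215245, "GarageType": "Detchd", "PoolQC": "Gd"},
]
cleaned = drop_mostly_null(rows)          # PoolQC is null in 2 of 3 rows
cleaned = fill_none_category(cleaned, "GarageType")
cleaned = log_transform(cleaned, "LotArea")
print(cleaned)
```

Compressing large values with a log transform keeps every row in a small dataset while reducing the pull of extreme values on a linear model.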

Linear Regression gave an accuracy of 88% and a Mean Squared Error of 0.016498.

Validation of Model: Mean Squared Error

Accuracy – 88%

4. Liver Patient Prediction (Machine Learning)

Medical Problem: Predict whether a patient has liver disease based on their medical test features.

Solution: Features include Age, Total Bilirubin, Direct Bilirubin, Alkaline Phosphatase, etc.

With this test data we were able to classify each patient as a liver disease or non-liver-disease patient.

This is a medical problem, so we must choose the model with the highest recall.

Validation for model: Accuracy Score, Area Under Curve (AUC), Recall, Precision, and F1 Score with ROC Curve Comparison

Accuracy – 75%, AUC Score – 79%, F1 Score – 85%, Recall – 82%

Final Model Selection is Random Forest. This model has the highest recall of 82%.
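
Selecting a model by recall, as in the liver patient project, can be sketched like this. Recall is the share of actual positive cases the model finds, so a higher recall means fewer missed patients. The model names, labels, and predictions below are toy values assumed for illustration; they are not the project's data or results.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def pick_by_recall(y_true, predictions_by_model):
    """Return (model_name, recall) for the model with the highest recall."""
    scores = {
        name: recall(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy labels and predictions for illustration only.
y_true = [1, 1, 1, 1, 0, 0]
predictions = {
    "model_a": [1, 1, 0, 0, 0, 0],  # misses two positive cases
    "model_b": [1, 1, 1, 0, 1, 0],  # misses one positive case
}
best_name, best_recall = pick_by_recall(y_true, predictions)
print(best_name, best_recall)
```

In this toy case the second model is chosen even though it raises one false alarm, which reflects the priority on not missing positive cases.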

5. Walk or Run Classification using IoT Data (Machine Learning)

Business Problem: Identify whether a person is walking or running from smart watch IoT data.

The smart watch has accelerometer and gyroscope sensors. From their readings we should classify whether a person is walking or running.

Solution: The features are the accelerometer X, Y, Z axes and the gyroscope X, Y, Z axes.

We found that the accelerometer X and Z readings spike positive when the person runs.

Validation for model: Accuracy Score, Area Under Curve (AUC), Recall, Precision, and F1 Score with ROC Curve Comparison

Accuracy – 99%, AUC Score – 100%, F1 Score – 99%

Final Model Selection: Most of the models scored 99% to 100%. As we have 86000 rows of data, the models trained very well, so overfitting is not a concern in this problem.

The final model selected is Random Forest, which tends to perform well on classification problems.

6. Computer Vision: Face Mask Detection using OpenCV and a custom-trained model with ImageNet weights.

A model was trained on over 2000 'with mask' and 'without mask' images using MobileNet with ImageNet weights.

Face detection from a Caffe model is added to a custom detect-and-predict-mask function, and my trained model is loaded in H5 format.

With an OpenCV video stream from the laptop camera, a frame is captured every second and the prediction model is applied. The project was tested with good accuracy, ranging from 70% to 99.9% in real time when a frame is input every second.
