EDUCATION:
VIKAASA SCHOOL – MADURAI
TAGORE VIDHYALAYAM – MADURAI
****-** - ********* ** ************ (B.Arch. - 5 Year Course) RVS SCHOOL OF ARCHITECTURE – DINDIGUL
FROM 2-NOV-2020 – CERTIFIED DATA SCIENTIST
DATA MITES INSTITUTION
BANGALORE
(Course Completed)
RAAJAVIGNESH S
Personal Information:
ADDRESS:
NO.51, K-34, VENKATASAMY LAYOUT
NAGANATHPURA - ELECTRONIC CITY
BANGALORE 560100
PHONE:
EMAIL:
*********@*****.***
LINKEDIN:
https://www.linkedin.com/in/raajavignesh/
GitHub:
https://github.com/Raaja6897/Machine_Learning_Projects
HackerRank:
https://www.hackerrank.com/raaja6897
LANGUAGE:
ENGLISH – FLUENT
FRENCH – LIMITED
TAMIL – NATIVE
TELUGU – MOTHER TONGUE
PROFILE: DATA ANALYST/DATA SCIENTIST
Business-minded data scientist able to deliver valuable insights through data analytics and advanced data-driven methods. Passionate about bringing machine learning and deep learning to business and technology to develop products that solve real-life problems, always thinking from the consumer's perspective.
CERTIFICATIONS:
MACHINE LEARNING:
INTRODUCTION TO MACHINE LEARNING FROM DUKE UNIVERSITY
(Coursera)
PROGRAMMING:
PYTHON FROM MICROSOFT (edX)
CLOUD:
MICROSOFT AZURE AI PARTICIPATION CERTIFICATE
MANAGEMENT:
INTRODUCTION TO SOFTWARE PRODUCT MANAGEMENT (Coursera)
DIGITAL MARKETING:
DIGITAL MARKETING CERTIFIED ASSOCIATE FROM SIMPLILEARN
PROGRAMMING AND TOOLS:
Languages and Libraries: Python - Pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn, NLTK, TensorFlow, Keras
Relational Databases: MySQL
Analytics and BI: Tableau, Power BI, MS Excel
Statistics: Hypothesis Testing, ANOVA, Chi-Square, Data Distributions - Binomial, Bernoulli, Poisson, Exponential
Machine Learning: Linear Regression, Lasso, Ridge, SVM, Random Forest, Decision Tree, Naïve Bayes, KNN, K-means, Gradient Boosting Classifier, XGBoost, Stochastic Gradient Descent, LightGBM, CatBoost
Deep Learning: MLP, LSTM, RNN, CNN, ANN
Skills: Hadoop, Apache Spark, PySpark (Learning), Spark SQL (Learning)
PROJECTS:
1. Client Project – Real-Time Project from Rubixe company
Business Problem: Predict sales of 20+ products and identify High Potential and Low Potential customers, that is, customers who buy the product versus those who are only enquiring about it.
Solution: The leads come from different sources such as calls, website, live chat, live chat PPC, customer referrals, etc. This is essentially a multiclass problem, but the data was not sufficient for multiclass modelling. I therefore grouped Converted, Long Term, and Previous customers as High Potential by adding a column with values 1 and 0, where 1 represents High Potential and 0 represents Low Potential.
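A minimal sketch of the labelling step described above, in plain Python. The status names, the `status` field, and the sample rows are assumptions for illustration only; they are not the client's actual data or column names.

```python
# Hypothetical status labels; the exact category names in the client data are not given.
HIGH_POTENTIAL_STATUSES = {"Converted", "Long Term", "Previous Customer"}


def add_potential_flag(records, status_key="status"):
    """Return copies of the records with a binary 'high_potential' field (1 or 0)."""
    flagged = []
    for record in records:
        row = dict(record)  # copy so the input rows are left untouched
        row["high_potential"] = 1 if row[status_key] in HIGH_POTENTIAL_STATUSES else 0
        flagged.append(row)
    return flagged


# Illustrative rows only, not real client data.
leads = [
    {"source": "Website", "status": "Converted"},
    {"source": "Live Chat", "status": "Just Enquiry"},
    {"source": "Customer Referral", "status": "Long Term"},
]

flagged_leads = add_potential_flag(leads)
print([row["high_potential"] for row in flagged_leads])
```

Collapsing the classes this way turns the task into binary classification, which needs fewer examples per class than the original multiclass framing.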
EDA - SciPy, Visualization - Seaborn, Matplotlib
Validation for model: Accuracy Score, Area Under Curve (AUC), Recall, Precision and F1 Score with ROC Curve Comparison
Accuracy – 94%, AUC Score – 99%, F1 Score – 94%
Final Model Selection: Random Forest, as both accuracy and recall are 94%. Other models tended to overfit the data, scoring 100% accuracy and 100% on all validation metrics.
2. COVID-19 Prediction using Chest X-Ray: Image Classification (Convolutional Neural Network, Deep Learning)
Medical Problem: To identify patients infected with COVID-19 using chest X-rays.
Solution: Using CNN – VGG16 and also a custom CNN, the model was trained with over 3,000 images in each class: normal patient X-ray images and COVID patient X-ray images.
Using VGG16 and custom convolution layers.
As this is a medical problem, the validation metric used is Recall. Recall – 92%
Validation – Confusion Matrix and ROC Curve with 96% accuracy.
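A minimal sketch of how recall and accuracy are read off a confusion matrix, in plain Python. The labels below are made-up toy values (1 = COVID, 0 = Normal, an assumed encoding) and do not reproduce the project's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of actual positives that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("recall:", recall(tp, fn))
print("accuracy:", accuracy(tp, fp, fn, tn))
```

Recall is the metric to watch in a screening setting because a false negative (a missed infection) is costlier than a false positive.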
SKILLS:
• Machine Learning
• Deep Learning
• Python
• SQL
• Fast Learner
• Communication
• Problem Solving
• Product Management
INTEREST:
KNOWLEDGE
MARKETING
PRODUCT MANAGEMENT
ARTIFICIAL INTELLIGENCE
INTERNET OF THINGS (IOT)
QUANTUM COMPUTING
OTHER DETAILS:
D.O.B: 13-07-1997
Marital status: Single
MACHINE LEARNING
MODELS USED:
Regression:
1. Linear Regression,
2. ElasticNet,
3. Support Vector Regressor (SVR from svm),
4. Lasso,
5. Ridge,
6. ExtraTreeRegressor,
7. XGBRegressor,
8. Stochastic Gradient Descent Regressor (SGD),
9. LightGBM Regressor,
10. Gradient Boosting Regressor,
11. CatBoost Regressor,
12. KNN Regressor,
13. Random Forest Regressor
Classification:
1. Logistic Regression,
2. Naïve Bayes,
3. Random Forest,
4. K-Nearest Neighbors,
5. MLP (Neural Network),
6. XGBoost,
7. SVC from SVM,
8. Gradient Boosting Classifier,
9. Stochastic Gradient Descent (SGD),
10. Light Gradient Boosting,
11. CatBoost
DEEP LEARNING
MODELS USED:
MLP: Multilayer Perceptron
CNN: Convolutional Neural Network
PROJECTS:
3. Advanced House Price Prediction: Regression (Machine Learning)
Business Problem: To predict house prices from features such as total area, constructed area, built area, year built, number of rooms, location, garage space, etc.
Solution: The challenge was that the data has 80 columns (79 features and 1 Sales Price column) but only 1,460 rows. Features with more than 50% null values were removed. For other features a null carries meaning, e.g. a null Garage value states that the house has no garage, so these nulls were replaced with ‘None’. This helped to improve accuracy. As there are only 1,460 rows, outliers could not be removed, so I applied a log transform to reduce the skewness of those features.
Got an accuracy of 88% and a Mean Squared Error of 0.016498 from Linear Regression.
Validation of Model: Mean Squared Error
Accuracy – 88%
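A minimal sketch of the cleaning steps described for this project, in plain Python. The column names and rows are hypothetical, and `math.log1p` (log of 1 + x) is an assumed variant of the log transform, chosen here so that zero values stay valid.

```python
import math


def drop_sparse_columns(rows, max_null_fraction=0.5):
    """Drop columns whose share of None values exceeds max_null_fraction."""
    columns = list(rows[0].keys())
    keep = [
        col for col in columns
        if sum(1 for row in rows if row[col] is None) / len(rows) <= max_null_fraction
    ]
    return [{col: row[col] for col in keep} for row in rows]


def fill_missing_category(rows, column, placeholder="None"):
    """Replace None in a categorical column with an explicit 'None' category."""
    return [
        {**row, column: placeholder if row[column] is None else row[column]}
        for row in rows
    ]


def log_transform(rows, column):
    """Apply log(1 + x) to a skewed numeric column."""
    return [{**row, column: math.log1p(row[column])} for row in rows]


# Hypothetical rows for illustration only.
houses = [
    {"garage_type": "Attached", "pool_quality": None, "lot_area": 8450},
    {"garage_type": None, "pool_quality": None, "lot_area": 9600},
    {"garage_type": "Detached", "pool_quality": "Good", "lot_area": 215245},
    {"garage_type": "Attached", "pool_quality": None, "lot_area": 11250},
]

cleaned = drop_sparse_columns(houses)            # pool_quality is 75% null, so it is dropped
cleaned = fill_missing_category(cleaned, "garage_type")
cleaned = log_transform(cleaned, "lot_area")
print(cleaned[1])
```

Keeping the extreme rows and compressing them with a log transform preserves all 1,460 examples while reducing the pull of large values on a linear model.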
4. Liver Patient Prediction (Machine Learning)
Medical Problem: To predict whether a patient has liver disease, based on their medical test features.
Solution: Features include Age, Total Bilirubin, Direct Bilirubin, Alkaline Phosphatase, etc.
With this test data we were able to classify each patient as a liver disease or non-liver-disease patient.
This is a medical problem, so we must choose the model with the highest recall.
Validation for model: Accuracy Score, Area Under Curve (AUC), Recall, Precision and F1 Score with ROC Curve Comparison
Accuracy – 75%, AUC Score – 79%, F1 Score – 85%, Recall – 82%
Final Model Selection: Random Forest. This model has the highest recall, at 82%.
5. Walk or Run Classification using IoT Data (Machine Learning)
Business Problem: To identify whether a person is walking or running using IoT data from a smart watch.
The smart watch has accelerometer and gyroscope sensors. From the sensor graphs we should classify whether a person is walking or running.
Solution: Using the accelerometer X, Y, Z axes and gyroscope X, Y, Z axes.
We found that the accelerometer X and Z readings spike positive when the person runs.
Validation for model: Accuracy Score, Area Under Curve (AUC), Recall, Precision and F1 Score with ROC Curve Comparison
Accuracy – 99%, AUC Score – 100%, F1 Score – 99%
Final Model Selection: Most of the models score 99% to 100%. As we have 86,000 rows of data, the models trained very well, and overfitting is unlikely to be a problem here.
The final model selected is Random Forest, a consistently strong performer on the classification problems above.
6. Computer Vision: Face Mask Detection using OpenCV and a custom-trained model with ImageNet weights.
Over 2,000 images of ‘with mask’ and ‘without mask’ were used to train MobileNet with ImageNet weights.
Face detection from a Caffe model is added to a custom detect-and-predict-mask function, and my trained model is loaded in H5 format.
With an OpenCV video stream from the laptop camera, a frame is taken every second and the prediction model is applied. The project was tested with good accuracy, ranging from 70% to 99.9% in real time when an image is input every second.