Data Science Trainee

Location:

Posted:

October 16, 2019

Resume:

Professional Summary:

I am an aspiring data scientist and chatbot enthusiast with expertise in creating regression models using predictive data modelling. I have deep knowledge of statistical methods, data analytics, database creation, and managing coupled data, along with strong communication and coordination skills. I have acquired data science skills, including machine learning, exploratory data analysis, and programming with Python libraries, through both classroom training and online courses.

Education:

Graduation:

B.Tech (Computer Science and Engineering) - Anurag College of Engineering - 2011 to 2015

Intermediate:

Narayana Jr. College - 2009 to 2011

Work Experience:

Company : Data Jango

Role : Data Science Trainee

Duration : 27 January 2019 – Present

Roles and Responsibilities : Collected data for stock market prediction by understanding the data and its attributes, their distributions, and their relationships with other attributes.

Built an end-to-end model for a multi-class classification problem (water plant working condition), achieving an accuracy of 82% with Random Forest.

Working on POCs for various projects.

Worked on the IMDB dataset: preprocessed the data, lemmatized the corpus, and used it to build count vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF) models, achieving an accuracy of 92% on the train set.

Applied hashing to the corpus and got an accuracy of 89% (the model was slightly overfitted and is currently being fine-tuned).

Built a chatbot using Google’s Dialogflow.

Built a customized Named Entity Recognizer (NER) with spaCy.

DATA SCIENCE TOOLKIT:

Machine Learning:

Supervised learning:

Classification: Logistic Regression, K-Nearest Neighbors, SVM, Decision Trees and Naive Bayes; Ensemble Classifiers: Bagging, Random Forest, Gradient Boosting, AdaBoost and Voting Classifiers

Regression: Linear Regression, Ridge, Lasso and Elastic Net Regression, SVR, Ensemble Regressors: Gradient Boosting, Random Forest

Anomaly Detection: Gaussian distribution, Multivariate normal distribution and Oversampling anomalous records using SMOTE

Unsupervised learning:

Clustering: K-Means clustering, Hierarchical clustering

Dimensionality Reduction: Principal Component Analysis (PCA), Locally Linear Embedding (LLE)

Solvers: Stochastic Gradient Descent

Natural Language Processing:

Text Processing: Cleaning text, Tokenizing, Removing special characters & stop words, Expanding contractions, Case conversions, Correcting words, Stemming, Lemmatization, POS Tagging and Parsing

Text Classification, Summarization: Feature Extraction, Topic modeling, Automated document summarization, Text Classification, Clustering, Semantic Analysis & Sentiment Analysis

Statistical Analysis:

Data Processing : Data transformation, Data quality check

Exploratory Data Analysis: Handling missing values, Feature extraction, Feature transformation and Feature selection.

Correlation, Segmentation, Linear regression and Logistic regression

Tools And Technologies:

Data Science Tools: Python, scikit-learn, TensorFlow, Keras, statsmodels, Pandas, NumPy, SciPy.

Statistical Concepts: Central Tendency (mean, median, mode), Variance, Standard Deviation, Z-Score, Covariance, Correlation, Central Limit Theorem, Statistical significance, Confidence Interval, P-Value, Chi-Square test, ANOVA.

Deep Learning: DNN, CNN (Image Processing, Object Detection)

NLP: Beautiful Soup, spaCy, NLTK, WordNet and Gensim

Data Visualization: Matplotlib and Seaborn.

Database: MySQL

Web Technologies: HTML, CSS, Bootstrap, JavaScript

Dialogflow: Experience in building basic chatbots using Google’s Dialogflow.

Projects:

Data Jango Technologies Duration: Apr’19 to Present

Title: IMDB Data Set

Team Size: 4 Members

Worked on the IMDB reviews data set and handled all pre-processing.

Also built stemmed and lemmatized corpora, which were then used to build count vectorizer and TF-IDF models.

Was involved in fine-tuning the model, which achieved an accuracy of 84% on unseen data.

Data Jango Technologies Duration: Jan’19 to Mar’19

Title: Stock Market Data Set

Team Size: 4 Members

Analyzed the provided data to build best-fit models and provided appropriate insights into the business problem.

Found the best features and extracted data from the database by identifying relations within the data provided.

Built a linear regression model for the data and obtained an accuracy of 86.7% on unseen data.

Displayed the predictions on a web page using web services and the sample pickle file.

Tools used: MySQL, MySQL Workbench, Python, Jupyter Notebook.

Company : Amazon Development Centre (India) Pvt. Ltd.

Role : Seller Support Associate.

Duration : 29 September 2015 – 28 November 2018

Accomplishments:

ISTQB-Certified Tester Foundation Level.

Participated in Code Debugging at Anurag College of Engineering in 2015.

Coordinated the Technical Fest (2015) at Anurag College of Engineering.
