Data Scientist

Location: Burnaby, BC, Canada

Salary: 95000

Posted: March 10, 2020

Resume:

Sagar Parikh

Data Scientist

Email: ***************@*****.***

Contact: 778-***-****

LinkedIn: www.linkedin.com/in/parikh-sagar

GitHub: www.github.com/sagarparikh2013

TECHNICAL SKILLS:

• Programming: Python, Java (J2SE)

• Machine Learning: Natural Language Processing, Neural Networks, Deep Learning

• Data Visualization: Tableau, D3.js

• Web/Cloud Technologies: Amazon Web Services (Cloud), JS, REST services

• Model Interpretations: LIME, SHAP

• Frameworks & Tools: PySpark, Hadoop, NLTK, Spark ML, sklearn

• Databases: MySQL, MongoDB (NoSQL), Cassandra

EXPERIENCE:

Data Scientist, Insurance Corporation of British Columbia, Vancouver (1 Year) May 2019 – April 2020

• Led various Natural Language Processing (NLP) applications, including information extraction and classification tasks on claims data (millions of rows), for internal business targets.

• Currently implementing BERT with a custom SentencePiece tokenizer.

• Utilized transfer learning techniques for deep learning models on structured and unstructured data.

• Reviewed Language Model code and created a visualization script to evaluate its performance.

• Created a front-end portal for an internal data labelling task and built a model to retain only relevant information, reducing the training data by 60%.

• Methodologies used: TF-IDF vectors with linear models (Logistic Regression, Naïve Bayes) and tree-based models (GBT, Random Forest); deep neural networks (CNNs, LSTMs, BERT) using custom-trained GloVe word embeddings.

• Technology stack: PySpark, Spark ML, Keras, PyTorch, LIME, SHAP, sklearn

• Results: Improved AUPR on structured data from 58% to 72% and presented the progress to multiple stakeholders within ICBC.

Cloud Developer Intern, SoftVan, Ahmedabad Sept 2017 – Dec 2017

• Created an end-to-end application (Missing Persons Portal) with front-end and back-end capabilities and deployed it on AWS Cloud with Auto-Scaling feature.

• The web portal lets users file a complaint for a missing person and view the status of that complaint.

• Used various AWS services (EC2, S3, RDS, Lambda and Elastic Beanstalk) and built the application using RESTful web services in Java.

PROJECTS:

Toxic Comments Classification - Kaggle (NLP)

• Detected various types of toxicity, such as threats, obscenity, and insults, in a dataset of comments from Wikipedia talk page edits.

• Experimented with various pre-processing and text cleaning procedures to extract features.

• Started with TF-IDF models and continued with more advanced models using word vectors and dropout techniques. Best performance was achieved with LSTM + GloVe embeddings.

FootWizard – Football Match Outcome Prediction

• Scraped and extracted data from multiple sources and APIs to predict the outcome of EPL football matches.

• Cleaned the data, performed feature engineering by analysing the dataset, and tried out various models. The most important features were players’ FIFA scores, the team’s last 5 scores, home or away status, and so on.

• Deployed the best model with an interactive Flask web app to generate real-time EPL match predictions. Project Link

Automatic Question Generation (NLP)

• Selected topically important sentences from text documents and performed gap selection, using noun phrases (NP) and adjective phrases (ADJP) extracted by the Stanford Parser from those sentences as candidate gaps.

• Utilized the NLTK parser and grammar syntax logic to generate fill-in-the-blank questions.

• Question classification was done with a pre-trained SVM classifier. Outputs the report in PDF, HTML and email.

Exploratory Data Analysis on Amazon’s Product Reviews Dataset

• Visual analysis of 143 million product reviews across multiple categories over 20 years.

• Performed ETL using scalable algorithms running in parallel on a multi-node cluster, leveraging efficient caching and join techniques. Reduced data from 90GB to 20GB.

• Uncovered correlations and interesting trends across various features, such as the day of the week with the most reviews, month-wise trends, and helpfulness of reviews.

• Tools: Big Data tools such as PySpark and Parquet. Data visualization in Tableau: Public Dashboard Link

Drowsy Driving Detection System

• Designed an application that tracks a car driver’s eyes and sets off an alarm if the driver appears drowsy.

• Applied computer vision techniques to the live feed from the dashboard camera to determine the percentage of eye openness, alerting the driver and notifying the administrators (using AWS SNS) as soon as the level falls below the set threshold.

EDUCATION:

MSc. in Computer Science - Big Data Specialization, Simon Fraser University, B.C. Sept 2018 - April 2020

Courses: Natural Language Processing, Big Data I, Big Data II, Algorithms for Big Data, Machine Learning, Statistics

B.E. in Information Technology, Gujarat Technological University, Ahmedabad Aug 2014 – May 2018

Courses: Artificial Intelligence, Big Data, Data Mining and Business Intelligence, Cloud Computing, Web Technologies
