Data Scientist
Email: email@example.com Contact: 778-***-****
LinkedIn: www.linkedin.com/in/parikh-sagar GitHub: www.github.com/sagarparikh2013
TECHNICAL SKILLS:
• Programming: Python, Java (J2SE)
• Machine Learning: Natural Language Processing, Neural Networks, Deep
• Data Visualization: Tableau, D3.js
• Web/Cloud Technologies: Amazon Web Services (Cloud), JS, REST services
• Model Interpretations: LIME, SHAP
• Frameworks & Tools: PySpark, Hadoop, NLTK, Spark ML, sklearn
• Databases: MySQL, MongoDB (NoSQL),
Data Scientist, Insurance Corporation of British Columbia, Vancouver (1 Year) May 2019 – April 2020
• Led various Natural Language Processing (NLP) applications, including information extraction and classification on claims data (millions of rows), for internal business targets.
• Currently working on implementing BERT with a custom SentencePiece tokenizer.
• Utilized transfer learning techniques for deep learning models on structured and unstructured data.
• Reviewed language model code and created a visualization script to evaluate its performance.
• Created a front-end portal for an internal data-labelling task and built a model to retain only relevant information, reducing the training data by 60%.
• Methodologies used: TF-IDF vectors with linear models (Logistic Regression, Naïve Bayes) and tree-based models (GBT, Random Forest); deep neural networks (CNNs, LSTMs, BERT) with custom-trained GloVe word embeddings.
• Technology stack: PySpark, Spark ML, Keras, PyTorch, LIME, SHAP, sklearn
• Results: Improved AUPR (area under the precision-recall curve) on structured data from 58% to 72% and presented the progress to multiple stakeholders within ICBC.
Cloud Developer Intern, SoftVan, Ahmedabad Sept 2017 – Dec 2017
• Created an end-to-end application (Missing Persons Portal) with front-end and back-end capabilities and deployed it on AWS with Auto Scaling enabled.
• The portal lets users file a missing-person complaint and view the status of that complaint.
• Used various AWS services (EC2, S3, RDS, Lambda and Elastic Beanstalk) and built the application on RESTful web services in Java.
Toxic Comments Classification - Kaggle (NLP)
• Detected various types of toxicity, such as threats, obscenity and insults, in a dataset of comments from Wikipedia’s talk page edits.
• Experimented with various pre-processing and text cleaning procedures to extract features.
• Started with TF-IDF models and continued with more advanced models using word vectors and dropout techniques. Best performance was achieved with LSTM + GloVe embeddings.
FootWizard – Football matches outcome prediction
• Scraped and extracted data from multiple sources and APIs to predict the outcome of EPL football matches.
• Cleaned the data, performed feature engineering by analysing the dataset, and tried out various models. The most important features included players’ FIFA scores, the team’s last 5 scores, and whether the match was home or away.
• Deployed the best model with an interactive Flask web app to generate real-time EPL match predictions. Project Link
Automatic Question Generation (NLP)
• Selected topically important sentences from text documents and performed gap selection, using the Stanford Parser to extract noun phrases (NP) and adjective phrases (ADJP) from those sentences as candidate gaps.
• Utilized the NLTK parser and grammar syntax logic to generate fill-in-the-blank questions.
• Classified questions with a pre-trained SVM classifier. Outputs the report in PDF, HTML and email.
Exploratory Data Analysis on Amazon’s Product Reviews Dataset
• Visually analysed 143 million product reviews spanning multiple categories and 20 years.
• Performed ETL using scalable algorithms running in parallel on a multi-node cluster, leveraging efficient caching and join techniques. Reduced data from 90GB to 20GB.
• Uncovered correlations and interesting trends across features such as the day of the week with the most reviews, month-wise trends and helpfulness of reviews.
• Tools: Big Data tools such as PySpark, Parquet. Data Visualization in Tableau: Public Dashboard Link
Drowsy Driving Detection System
• Designed an application that tracks a car driver’s eyes and sets off an alarm if the driver appears drowsy.
• Applied computer vision techniques to the live feed from the dashboard camera to determine how open the driver’s eyes are, as a percentage. As soon as the level goes below the set threshold, the system alerts the driver and notifies the administrators (using AWS SNS).
EDUCATION:
MSc. in Computer Science - Big Data Specialization, Simon Fraser University, B.C. Sept 2018 - April 2020
Courses: Natural Language Processing, Big Data I, Big Data II, Algorithms for Big Data, Machine Learning, Statistics
B.E. in Information Technology, Gujarat Technological University, Ahmedabad Aug 2014 – May 2018
Courses: Artificial Intelligence, Big Data, Data Mining and Business Intelligence, Cloud Computing, Web Technologies