Data Scientist

Location:
Sunderland, MA
Salary:
$60000 per year
Posted:
February 15, 2019

Resume:

Raka Dalal

*** ******* **** *** **, Sunderland, MA 01375

443-***-**** ac8h9c@r.postjobfree.com rakadalal1409.wordpress.com RakaDalal rakadalal

Education

University of Maryland Baltimore County - Baltimore, MD  Aug 2016 - Dec 2018
MASTERS IN COMPUTER SCIENCE  GPA: 4.0/4.0

Jadavpur University - Kolkata, India  Aug 2012 - May 2016
BACHELOR OF ENGINEERING IN COMPUTER SCIENCE AND ENGINEERING  GPA: 8.16/10.0

Online Certified Courses (UDEMY)

Data Science A-Z, Deep Learning A-Z, Hive for Processing Big Data, Complete & Practical SAS, AWS Concepts, AWS Essentials, Apache Spark

Technical Skills

Languages: Python (most experienced), C, C++, C#, Java, MATLAB, Octave, JavaScript, PHP, HTML, CSS
Tools/Frameworks: PyTorch, Keras, TensorFlow, Scikit-learn, NumPy, Pandas, NLTK, Gretl; ETL Tools: SAS, SSIS
Big Data Frameworks: Hadoop, Hive, Apache Spark (RDD, DataFrames, SparkSQL, MLLib, Spark Streaming, GraphX)
Data Visualization: Tableau, Microsoft Excel; Web Visualization: Flask
Database Concepts: SQL, PL/SQL, PostgreSQL; NoSQL: MongoDB
AWS Services: IAM, VPC, S3, EC2, RDS, SNS, CloudWatch, ELB, Auto Scaling, Route 53, Lambda

Experience

Biosight (Data Scientist) Kissimmee, FL

TOOLS: PANDAS, NUMPY, KERAS, SCIKIT-LEARN, FLASK Nov 2018 - Present

• I built a diagnostic tool for detecting brain cancer from brain MRI images. I used convolutional neural network (CNN) architectures to train the model and achieved high training and validation accuracy. I also developed an interactive user interface powered by Python Flask to showcase our work.
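For illustration, a minimal Keras sketch of a binary CNN classifier of this kind; the folder layout ("data/train", "data/val" with per-class subdirectories), image size, and layer sizes are assumptions for the sketch, not the production model.

# Minimal sketch of a binary MRI classifier in Keras (hypothetical data layout:
# "data/train" and "data/val" folders with one subdirectory per class).
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (128, 128)  # assumed input resolution

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(*IMG_SIZE, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # tumor / no tumor
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=IMG_SIZE, color_mode="grayscale", class_mode="binary")
val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/val", target_size=IMG_SIZE, color_mode="grayscale", class_mode="binary")

model.fit(train_gen, validation_data=val_gen, epochs=10)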

University of Maryland Baltimore County (Graduate Research Assistant) Baltimore, MD

TOOLS: NLTK, WORD2VEC, GENSIM, STANFORD DEPENDENCY PARSER, PANDAS, NUMPY  Jan 2017 - May 2018

• Created an end-to-end pipeline that uses a semi-supervised bootstrap learning model to extract different relations from a large-scale cybersecurity text dataset with limited training samples and populate a knowledge graph. We evaluated our model on the CVE dataset and achieved high accuracy. [Github]

• Automated the process of extracting semantic relations from sensor input by formalizing the knowledge coming from sensors in industrial production lines as digital twin models and by introducing a semantic query mechanism. We published this work at IKG 2017. [Publication]

General Electric (Research Fellow Intern) GRC, Niskayuna, NY

TOOLS: FLASK, NLTK, WORD2VEC, GENSIM, REGEX  May 2017 - Aug 2017

• I built a semi-supervised bootstrap learning approach to extract relations from a large unstructured text dataset, with an iterative client feedback loop. Evaluations over diverse datasets, including aircraft engine maintenance records and a Google relation extraction corpus, showed promising results. I also designed a user interface using Python Flask to showcase my work and let users give feedback.
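A much-simplified sketch of the bootstrap relation-extraction loop used in the UMBC and GE projects above: seed entity pairs induce textual patterns, which in turn harvest new pairs. The toy corpus, seed pair, and naive pattern matching below are placeholders, not the actual pipelines.

# Highly simplified sketch of semi-supervised bootstrap relation extraction:
# known entity pairs induce textual patterns; patterns then extract new pairs.
# Corpus, seeds, and the pattern representation here are illustrative only.
import re
from collections import Counter

corpus = [
    "CVE-2017-0144 affects Windows SMBv1 servers.",
    "The Heartbleed bug affects OpenSSL 1.0.1.",
]
seeds = {("CVE-2017-0144", "Windows SMBv1")}  # known (vulnerability, product) pairs

def extract_pattern(sentence, e1, e2):
    """Return the text between the two entities as a crude pattern."""
    m = re.search(re.escape(e1) + r"\s+(.*?)\s+" + re.escape(e2), sentence)
    return m.group(1) if m else None

def bootstrap(corpus, seeds, iterations=2):
    relations = set(seeds)
    for _ in range(iterations):
        # 1. Learn patterns from the current relation instances.
        patterns = Counter()
        for sent in corpus:
            for e1, e2 in relations:
                p = extract_pattern(sent, e1, e2)
                if p:
                    patterns[p] += 1
        # 2. Apply the most frequent patterns to harvest new pairs (very naive matching).
        for pattern, _ in patterns.most_common(3):
            for sent in corpus:
                m = re.search(r"(\S+)\s+" + re.escape(pattern) + r"\s+(.+?)[.]", sent)
                if m:
                    relations.add((m.group(1), m.group(2)))
    return relations

print(bootstrap(corpus, seeds))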

Samsung Research & Development (Summer Research Intern) Bangalore, India

TOOLS: MATLAB, OCTAVE, PCA, FFT  May 2015 - July 2015

• Detected a user's stress level with high accuracy from raw accelerometer and gyroscope data. Filtering and Principal Component Analysis were applied to the data to obtain the resultant signal, and the Fast Fourier Transform of this signal gave the heart rate, which was used to infer the stress level.
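A rough Python sketch of that signal chain (the original work used MATLAB/Octave): smoothing, PCA down to one component, then an FFT peak search in a plausible heart-rate band. The sampling rate and synthetic signal are placeholders.

# Rough sketch: filter, PCA to one component, FFT peak -> heart rate.
import numpy as np

fs = 50.0                                    # assumed sensor sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)
acc = np.column_stack([                      # placeholder 3-axis accelerometer data
    np.sin(2 * np.pi * 1.2 * t), np.cos(2 * np.pi * 1.2 * t), 0.1 * np.random.randn(t.size)])

# Simple moving-average filter to suppress high-frequency noise.
kernel = np.ones(5) / 5
filtered = np.apply_along_axis(lambda x: np.convolve(x, kernel, mode="same"), 0, acc)

# PCA via SVD: project onto the first principal component.
centered = filtered - filtered.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
signal = centered @ vt[0]

# FFT peak in a plausible heart-rate band (0.7-3 Hz, i.e. 42-180 bpm).
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
spectrum = np.abs(np.fft.rfft(signal))
band = (freqs >= 0.7) & (freqs <= 3.0)
heart_rate_bpm = 60 * freqs[band][np.argmax(spectrum[band])]
print(f"Estimated heart rate: {heart_rate_bpm:.0f} bpm")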

Indian Institute of Technology Kharagpur (Summer Research Intern) Kharagpur, India

TOOLS: POSTGRESQL, NUMPY, SCIKIT-LEARN, TABLEAU, MICROSOFT EXCEL  May 2014 - July 2014

• The project involved analyzing GPS data from trucks to detect hotspots based on stopping time. We clustered the data using Density-Based Spatial Clustering (DBSCAN), then characterized the clusters by their stopping-time and busyness distributions, and finally visualized them on Indian road maps.
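An illustrative scikit-learn version of the clustering step; the coordinates and the eps/min_samples thresholds below are made-up placeholders.

# Illustrative DBSCAN clustering of truck stop points with scikit-learn.
# eps is given in radians because the haversine metric works on (lat, lon) in radians.
import numpy as np
from sklearn.cluster import DBSCAN

# (latitude, longitude) of detected stops, in degrees -- placeholder data
stops = np.array([
    [22.5726, 88.3639], [22.5730, 88.3642], [22.5728, 88.3640],   # one hotspot
    [28.7041, 77.1025], [28.7043, 77.1027],                        # another hotspot
    [19.0760, 72.8777],                                            # isolated stop -> noise
])

kms_per_radian = 6371.0
eps_km = 0.5  # stops within ~500 m belong to the same hotspot (assumed threshold)

db = DBSCAN(eps=eps_km / kms_per_radian, min_samples=2,
            metric="haversine", algorithm="ball_tree")
labels = db.fit_predict(np.radians(stops))
print(labels)   # e.g. [0 0 0 1 1 -1]; -1 marks noise points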

Major Projects

Auto-Completion System [Github][Blog Post] [Tools: Pandas, NLTK, Scikit-learn, NumPy] Aug 2018 - Sep 2018

• The project involved building an auto-completion system for customer service representatives by suggesting sentence completions. I developed a trigram Katz back-off model end to end, which gave strong prediction results, and finally wrapped the system in an HTTP server using Flask.
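A much-simplified back-off sketch of the idea (full Katz back-off additionally applies discounting and back-off weights, which are omitted here); the toy corpus is a placeholder.

# Simplified trigram back-off: fall back from trigram to bigram to unigram
# counts when a context is unseen.
from collections import Counter

corpus = "thank you for calling . how can i help you today .".split()  # toy corpus

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))

def next_word(w1, w2):
    """Suggest the most likely next word, backing off to shorter contexts."""
    tri = {w3: c for (a, b, w3), c in trigrams.items() if (a, b) == (w1, w2)}
    if tri:
        return max(tri, key=tri.get)
    bi = {b: c for (a, b), c in bigrams.items() if a == w2}
    if bi:
        return max(bi, key=bi.get)
    return unigrams.most_common(1)[0][0]

print(next_word("how", "can"))   # -> "i"
print(next_word("thank", "you")) # -> "for"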

Movie Recommendation System [Github] [Tools: Apache Spark, AWS, Hadoop YARN] Sep 2018 - Oct 2018

• Designed a movie recommendation system using an item-based collaborative filtering approach. I used the Apache Spark framework extensively to build the recommendation system and used Amazon's Elastic MapReduce service to run the script on a cluster with Hadoop YARN.
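A minimal PySpark sketch of item-based collaborative filtering in the same spirit: pair up movies rated by the same user and score each pair by cosine similarity. The ratings data and column names are illustrative placeholders, not the actual job.

# Minimal item-based collaborative filtering sketch on Spark RDDs.
from math import sqrt
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("item-cf-sketch").getOrCreate()

ratings = spark.createDataFrame(
    [(1, "Alien", 5.0), (1, "Aliens", 4.0), (2, "Alien", 4.0),
     (2, "Aliens", 5.0), (2, "Gravity", 3.0)],
    ["user_id", "movie", "rating"],
).rdd.map(lambda r: (r.user_id, (r.movie, r.rating)))

def make_pairs(pair):
    # (user, ((movie1, rating1), (movie2, rating2))) -> ((movie1, movie2), (rating1, rating2))
    (_, ((m1, r1), (m2, r2))) = pair
    return ((m1, m2), (r1, r2))

# Self-join on user to get every pair of movies the same user rated.
movie_pairs = ratings.join(ratings).filter(
    lambda p: p[1][0][0] < p[1][1][0]).map(make_pairs).groupByKey()

def cosine(rating_pairs):
    xx = sum(x * x for x, _ in rating_pairs)
    yy = sum(y * y for _, y in rating_pairs)
    xy = sum(x * y for x, y in rating_pairs)
    return xy / (sqrt(xx) * sqrt(yy)) if xx and yy else 0.0

similarities = movie_pairs.mapValues(lambda v: cosine(list(v)))
print(similarities.collect())   # e.g. [(('Alien', 'Aliens'), 0.97...), ...]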

Dialogue Classification [Github] [Tools: Pandas, NumPy, Scikit-learn, Keras, seaborn, matplotlib] Oct 2018 - Nov 2018

• I used the Simpsons dataset on Kaggle to build a classifier that predicts which character spoke a given line of dialogue. I did extensive feature engineering (linguistic and statistical) and spot-checked ML classifiers such as Naive Bayes, Logistic Regression, ANN (best performance), CNN, and Bidirectional LSTM.
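An illustrative scikit-learn spot check of this kind; the two-character toy dataset and plain TF-IDF features stand in for the real Kaggle data and the engineered linguistic/statistical features.

# Spot-checking several classifiers on dialogue lines with cross-validation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

lines = ["D'oh!", "Eat my shorts!", "Mmm... donuts", "Don't have a cow, man"] * 10
speakers = ["Homer", "Bart", "Homer", "Bart"] * 10

models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500),
}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, lines, speakers, cv=3)
    print(f"{name}: mean accuracy {scores.mean():.2f}")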

House Price Prediction [Github] [Kernel] [Tools: Pandas, NumPy, Scikit-learn, Keras, xgboost] Aug 2018 - Sep 2018

• I worked on the Ames Housing dataset, which consists of 79 explanatory variables describing every aspect of residential homes in Ames, Iowa; the training data has 1461 data points across 81 variables. After extensive feature engineering, I predicted the final price of each home using the XGB Regressor.
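A minimal sketch of the modelling step with XGBRegressor; the three-column feature frame below is a stand-in for the engineered Ames Housing features.

# Train/validate an XGBoost regressor on placeholder house-price features.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({                      # stand-in for the 79 explanatory variables
    "GrLivArea": rng.integers(500, 4000, 200),
    "OverallQual": rng.integers(1, 10, 200),
    "YearBuilt": rng.integers(1900, 2010, 200),
})
y = 50 * X["GrLivArea"] + 10000 * X["OverallQual"] + rng.normal(0, 5000, 200)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
model.fit(X_train, y_train)
preds = model.predict(X_val)
print("Validation RMSE:", mean_squared_error(y_val, preds) ** 0.5)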

Prediction of academic references for Wikipedia articles [Github] [Tools: Beautiful Soup, NLTK, Regex] Jan 2015 - Aug 2016

• Crawled Wikipedia for academic CS articles, starting from Wikipedia Books, to gather statistics about the references in Wikipedia articles up to 2012. The data was cleaned and analyzed, and a statistical model was built using N-grams to predict references that would be added in the future.


