Data Assistant

Location:

Seattle, WA

Posted:

January 26, 2018

Resume:

KHYATI PAREKH

**** *** *** ** * +1-973-***-**** * ********@**.***

RELEVANT SKILLS

• Languages: Python, R, HTML, CSS, C/C++, JavaScript

• Tools: Tableau, Unity 3D, MS Office

• Cloud: Amazon Web Services, Microsoft Azure

• Databases: SQLite, MySQL, AsterixDB, SQL Server, Hadoop, Redshift, Spark

EDUCATION

2016-Ongoing MS in Data Science, University of Washington, Seattle, Washington

Relevant courses: Applied Statistics, Data Visualization, Database Management, Introduction to Statistics and Probability, Machine Learning

2012-2016 B.Tech in Computer Science, Nirma University, India

EXPERIENCE

Graduate Research Assistant – University of Washington, June ’17 – Sept ’17

Analyzed and visualized, in Python, data from a syntrophic mutualism experiment between Desulfovibrio vulgaris Hildenborough (DvH) and the archaeon Methanococcus maripaludis (Mmp). The data includes codon changes between the evolved and original cultures. Used machine learning to identify which genes affected the mutation the most, and Bokeh plots to visualize the results.

Research Assistant – Indian Institute of Management, Ahmedabad, Jan ’16 – May ’16

As an intern, I was part of a team that built part of a mobile ad hoc network (MANET) formed by users in a campus community. We developed simulations and evaluations of scheduling algorithms in Java to design an effective data-sharing application. The data-sharing work included defining and implementing the data model and data transformations for uploaded data.

PROJECTS

Visualization of Fan Fiction text data

The project consisted of creating interactive visualizations for data extracted from fanfiction.net. We used JavaScript and D3 for the visualizations and JSON files to store the data from the website. We also embedded Tableau sheets in HTML webpages to visualize several statistics.

Python vs PySpark

In this project, I compared scalable data analysis in Python and in PySpark. To compare runtimes on the same dataset, I stored my CSV files in an S3 bucket and read them from there into Python and PySpark respectively. I gradually increased the file size from 100 MB to 10 GB to see the difference in performance between PySpark and Python.

Capstone Project: Amplero Time Series Analysis - Ongoing

Our goal is to predict the future behavior of customers from multivariate time series. Using machine learning and/or deep learning methods, we will identify churners and non-churners and estimate the probability of a customer either switching their existing plan or leaving the network altogether (state transitioning). We will also conduct research on measuring the similarity between two time series by magnitude, using value comparison, trend comparison, distribution comparison, distance analysis, and statistical tests such as the t-test.

EXTRACURRICULAR ACTIVITIES

GPSS Coordinator at FIUTS

Currently hold an executive position on the board of trustees of FIUTS (Foundation for International Understanding Through Students). FIUTS is a non-profit organization at the University of Washington for incoming international students.

Percussion ensemble

Part of the percussion ensemble at the University of Washington under the guidance of Dr. Bonnie Whiting. The group focuses on contemporary music of many genres composed for percussion ensemble.
