Data Software Engineer

Posted: September 01, 2017

AISHWARYA NITIN KAPSE

******@***.*** | 949-***-**** | https://github.com/aishwaryakapse | https://www.linkedin.com/in/aishwaryakapse | 3655 Pruneridge Avenue, #21, Santa Clara, CA 95051

EDUCATION:

Master of Science – Computer Engineering (Computer Software), GPA: 3.74 Jun 2017

University of California, Irvine, CA

Relevant Courses: Information Storage, Middleware and Distributed Systems, Next Generation Search Systems, Projects in Databases and Web Applications, Design and Analysis of Algorithms, Advanced System Software, Computer Networks

EXPERIENCE:

Software Engineer (Contract) – Python, Scala, Spark, PySpark, Bash May 2017 – Aug 2017

MapR Technologies – 350 Holger Way, San Jose, CA

Built proactive support tooling for MapR customers, deflecting support cases and conserving support resources.

• Built a data lake in the MapR File System from customer logs using the rsync utility, SFTP, and Bash scripting.

• Indexed logs from the data lake into Elasticsearch using Fluentd for visualization in Kibana.

• Computed aggregate statistics by running SQL queries over the indexed data with Spark SQL, PySpark, and Hive.

• Built a proof of concept that stores logs from MapR Streams in MapR-DB using Spark Streaming and the Kafka API.

Informatics Intern – Python, Django Framework, Docker, Jenkins Oct 2016 – Mar 2017

Zymo Research Corp – Irvine, CA

Automated data collection and filtering. Implemented storage of hierarchical data for faster access.

• Collected and filtered DNA data from sources such as NCBI and ArrayExpress.

• Stored data in Amazon S3 and in HBase on Amazon EMR.

• Implemented the nested set model to store hierarchical bioinformatics data in MySQL using the Django framework.

Graduate Student Researcher May 2016 – Sep 2016

Donald Bren School of Information and Computer Sciences – University of California, Irvine, CA

Crawled crawler-unfriendly, AJAX-generated websites for the Cloudberry project.

• Extracted dynamic content related to the Zika virus from sites such as healthmap.org and promedmail.org.

• Used open-source crawlers such as Crawljax and Scrapy with Splash.

• Collected live Twitter feeds using the Twitter Streaming API and Apache Kafka.

Movie Recommendations from MovieLens Data Set – Scala, Apache Spark Mar 2016 – Jun 2016

• Generated movie recommendations from one million ratings using item-based collaborative filtering with cosine similarity.

• Achieved better results than the Alternating Least Squares (ALS) model built into Spark MLlib.

Persistent Storage of Access Logs – Scala, Apache Spark Mar 2016 – Jun 2016

• Simulated real-time access-log generation by streaming a large log file through the netcat utility into Apache Kafka.

• Extracted and stored information using Spark Streaming, regular expressions, and a Cassandra database.

Search Engine for UCI ICS Domain – Java Jan 2016 – Mar 2016

• Crawled content on the ics.uci.edu domain using crawler4j and built an inverted index over it.

• Ranked results by term frequency, inverse document frequency, HTML tags, URL data, and length.

E-commerce Movie Shopping Website – Java, JavaScript Jan 2016 – Mar 2016

• Designed an e-commerce website supporting add, delete, search, and update operations and a shopping cart.

• Built with HTML, CSS, JavaScript, Servlets, JSPs, and a MySQL database; deployed the website on AWS.

• Optimized performance using prepared statements and load-inline functions, and extended the database using SAX parsing.

TECHNICAL SKILLS:

Programming: Java, Python, Scala, SQL, Bash

Operating Systems: Linux, Unix

Big Data Technologies: Apache Spark, Hadoop 2.0, MapReduce, Spark Streaming, MapR File System, MapR Streams, Hive

Databases: MySQL, Cassandra, HBase, MapR-DB

Tools: Amazon S3, Amazon EC2, Amazon EMR, MLlib, Jenkins, Docker, Elasticsearch, Fluentd, Eclipse, IntelliJ

Web Crawlers: Crawljax (for AJAX, Java), Scrapy with Splash (Python)

Web Development: HTML5, CSS3, JSP, Servlets, JavaScript, Django Framework
