Engineer Software

Location:

Richardson, TX

Posted:

February 20, 2017


Resume:

Shengyang Su *****.***********@*****.*** 469-***-****

Summary

New M.S. graduate in Computer Science seeking a full-time Software Engineer or Data Engineer position

- Hands-on experience with Web Development, Big Data, Search Engines, and Natural Language Processing

- Familiar Languages: Java, SQL, R, Python, PHP, and Scala

- Web Development: Spring Framework, RESTful, HTML/CSS, JavaScript, jQuery, Bootstrap, MySQL

- Data Engineering: Hadoop, MapReduce, HDFS, Spark (Streaming), Hive, Pig, Cassandra, Kafka, MongoDB

Education

M.S. in Computer Science

The University of Texas at Dallas August 2014 - December 2016 Dallas, USA

Work Experience

Data Engineer Intern Mount Sinai Health System, New York Summer, 2015

- Conducted data mining and analysis using R

- Designed and implemented a data pipeline which enabled batch analysis

- Made predictions for certain cancer medicines by training and testing machine learning classifiers (logistic regression, SVM, decision trees, etc.) on a gene expression dataset

Software Engineer Ping An Insurance Group, Shanghai 04/2010-12/2013

- Designed and implemented internal data processing tools used by the team to process and analyze large data sets of financial statements

- Back-end development: designed cross-platform system integration interfaces, implemented system enhancements, and developed new features for the existing system

- Front-end development: used HTML/CSS, JavaScript, and jQuery (Ajax) to dynamically display the URL landing page, search page, and profile creation page

Projects

Spark Streaming Log Aggregation (Scala, Spark Streaming, Apache Kafka, Apache ZooKeeper, MongoDB)

- Implemented a function to generate streaming logs including timestamp, advertiser, publisher, website, geo, and bid

- Used Apache ZooKeeper and Apache Kafka to receive the streaming messages

- Filtered out invalid data, aggregated logs by publisher and geo, and computed the average bid

Library Management Website (PHP, MySQL, JavaScript)

The Library Management System is management software for monitoring and controlling transactions in a typical library. The project was developed in PHP, MySQL, Bootstrap, and JavaScript, with features such as searching, adding to cart, checking books in and out, and budget and user management.

Food Recommendation System (Spark, MLlib, Play Framework)

- Trained a model on a food product dataset of around 60,000 Amazon reviews (userId, productId, rating)

- Implemented a web interface using the Play Framework where users rate products and get recommendations

- Fetched a randomly selected product from an Amazon page and parsed it using the Lagarto HTML parser

- Used the MLlib ALS recommender to predict ratings for all products the user has not rated, then showed the top 10 results by rating

IMDB Reviews Analysis (Hadoop, Hive, Pig, Cassandra)

- Implemented chaining of MapReduce jobs with both in-memory and reduce-side joins. Achieved the desired output using secondary sorting and custom partitioning in MapReduce jobs on HDFS

- Implemented various complex Hive and Cassandra queries to gain analytical insights from the IMDB movie database

- Developed different User Defined Functions (UDFs) in Pig and Hive to filter data based on various constraints

Information Retrieval System, Lyrics Search Engine (Java, Apache Lucene, Spring Framework)

Built a web-based search engine for a collection of lyrics crawled from 120,000 webpages. Indexed the crawled data using Lucene and built two relevance models. Analyzed search results using natural language analysis tools. Optimized the results by clustering with multiple machine learning methods.

Geo-based Tweet Sentiment Analysis (Python, NLTK)

- Collected real-time streaming data with Tweepy, including each tweet's ID, content, and corresponding location

- Processed the data, converting longitude and latitude coordinates to detailed addresses using GeoCoder and Python

- Reorganized data and applied sentiment analysis classifiers based on Naive Bayes

- Implemented the K-means algorithm to separate states into clusters with Spark (Scala)
