Machine learning

Location:

Queens, NY

Posted:

October 20, 2020


Resume:

Jiang

646-***-****

**********@*****.***

GitHub

CERTIFICATIONS

Machine Learning

NN and Deep Learning

NN Hyperparameter Tuning

Convolutional NN

Sequence Models

SKILLS

Programming languages

Python

JavaScript

Java

MIPS

ML and Data Analysis

Libraries

NLTK

Gensim

TensorFlow

PyTorch

Scikit-learn

OpenCV

NumPy

Pandas

Matplotlib

Seaborn

Technologies

SQL

Git

Linux

Key Courses Taken

Computer Vision

Artificial Intelligence

Analysis of Algorithms

Software Engineering

Databases

Operating Systems

Computational Geometry

Graph Theory

Computer Networks

Calculus 2, 3, 4

Probability Theory

Linear Algebra

Extracurricular Activities

Archer Club

EDUCATION

• Stony Brook University Sep 2016 – Aug 2020

• Bachelor of Science, double major in Computer Science and Applied Mathematics and Statistics

• Overall GPA: 3.37

WORK EXPERIENCE

Machine Learning Intern at UPMONTH (June 2020 – Sep 2020)

• I was a machine learning intern at a small technology company, where I improved the search engine of its existing SaaS product using NLP.

• Our CEO asked me to find a way to extract concepts from documents and use them to improve the existing search engine.

• This was a single-person project. I started by researching state-of-the-art methods for document embedding and ended up implementing EmbedRank, which uses Doc2Vec to embed documents and key phrases in the same vector space. I first built a data pipeline to extract text from documents and preprocess it into a clean form. Then I implemented a key phrase extractor based on POS tagging. Next, I trained Doc2Vec from scratch on the 100k documents available in our database. Lastly, I implemented MMR (maximal marginal relevance) to add diversity (reduce redundancy) when ranking the key phrases.

• This project was a success. The old search engine relied heavily on regular expressions and inverse document frequency. My Doc2Vec model embeds documents in a high-dimensional vector space, which can be used to match conceptual queries from users. The model was also used to return a ranked list of relevant documents while a customer was typing up a new document. When users hover over the list of relevant documents, they can see the top key phrases extracted from each document. It also provided the freedom to expand the breadth of search capabilities without substantially increasing or rethinking the footprint devoted to search.
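
A minimal sketch of the MMR ranking step described above, assuming cosine similarity between the document vector and candidate key phrase vectors. The names `cosine`, `mmr_rank`, `lam`, and `top_n` are illustrative and not taken from the project code. In practice the vectors would come from the trained Doc2Vec model; here they are small hand-written lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_rank(doc_vec, phrase_vecs, top_n=3, lam=0.5):
    """Return indices of key phrases in MMR order.

    lam (hypothetical parameter name) trades relevance to the document
    against redundancy with phrases that were already selected.
    """
    relevance = [cosine(doc_vec, p) for p in phrase_vecs]
    candidates = list(range(len(phrase_vecs)))
    selected = []

    def score(i):
        # Redundancy is the highest similarity to any already selected phrase.
        redundancy = max(
            (cosine(phrase_vecs[i], phrase_vecs[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < top_n:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors: phrases 0 and 1 are near-duplicates, phrase 2 is different.
doc = [1.0, 0.0]
phrases = [[1.0, 0.1], [1.0, 0.12], [0.7, -0.7]]
diverse_order = mmr_rank(doc, phrases, top_n=3, lam=0.5)
relevance_order = mmr_rank(doc, phrases, top_n=3, lam=1.0)
```

With `lam=1.0` the ranking reduces to plain relevance ordering. Lower values push near-duplicate phrases further down the list.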

PROJECTS

Congressional Boundary Redistricting System (Jan 2020 – May 2020)

• My focus on this team was to collect and clean dirty real-world data.

• Our goal was to build a web application that shows the congressional boundaries of the United States and helps prevent gerrymandering (unfair districting).

• As a team of 4, we aimed to build an interactive web application that helps users edit congressional boundaries and increase fairness.

Classifying Hand Signs Using Self-Implemented CNN (Jan 2020)

• Built my own CNN using NumPy. Key functions include zero padding, forward and backward propagation for the Conv layer, and forward and backward propagation for the Pooling layer.

• Used a Conv2D -> MaxPool -> Conv2D -> MaxPool -> FullyConnected network structure. I used ReLU for all Conv layers and softmax for the output layer.

• Used the Adam optimizer to minimize the loss. I trained the model on mini-batches to speed up convergence.

• The final training accuracy was 94% and the test accuracy was 80%.

• Learned the fundamentals of CNNs and their components.
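
A minimal NumPy sketch of the forward-pass building blocks listed above (zero padding, Conv layer, ReLU, and max pooling). It assumes a channels-last layout `(batch, height, width, channels)`. The function names and signatures are illustrative and not taken from the project code, and backward propagation is omitted.

```python
import numpy as np


def zero_pad(X, pad):
    """Zero-pad the height and width of a batch X of shape (m, H, W, C)."""
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")


def conv_forward(X, W, b, stride=1, pad=0):
    """Naive convolution. X: (m, H, W_in, C_in), W: (f, f, C_in, C_out), b: (C_out,)."""
    m, H, W_in, _ = X.shape
    f, _, _, C_out = W.shape
    H_out = (H + 2 * pad - f) // stride + 1
    W_out = (W_in + 2 * pad - f) // stride + 1
    X_pad = zero_pad(X, pad)
    Z = np.zeros((m, H_out, W_out, C_out))
    for i in range(H_out):
        for j in range(W_out):
            r, c = i * stride, j * stride
            window = X_pad[:, r:r + f, c:c + f, :]  # shape (m, f, f, C_in)
            # Contract the window with every filter over (f, f, C_in).
            Z[:, i, j, :] = np.tensordot(window, W, axes=([1, 2, 3], [0, 1, 2])) + b
    return Z


def relu(Z):
    """Element-wise ReLU activation."""
    return np.maximum(Z, 0)


def max_pool_forward(A, f=2, stride=2):
    """Max pooling over f x f windows. A: (m, H, W_in, C)."""
    m, H, W_in, C = A.shape
    H_out = (H - f) // stride + 1
    W_out = (W_in - f) // stride + 1
    out = np.zeros((m, H_out, W_out, C))
    for i in range(H_out):
        for j in range(W_out):
            r, c = i * stride, j * stride
            out[:, i, j, :] = A[:, r:r + f, c:c + f, :].max(axis=(1, 2))
    return out


# Toy input: 2 images of 4x4 with 3 channels, five 3x3 filters of all ones.
X = np.ones((2, 4, 4, 3))
W = np.ones((3, 3, 3, 5))
b = np.zeros(5)
Z = conv_forward(X, W, b, stride=1, pad=1)
P = max_pool_forward(relu(Z), f=2, stride=2)
```

The explicit loops over output positions keep the sketch close to the layer-by-layer description. A production implementation would normally vectorize them.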
