Jiang
He
**********@*****.***
GitHub
CERTIFICATIONS
Machine Learning
NN and Deep Learning
NN Hyperparameter Tuning
Convolutional NN
Sequence Models
SKILLS
Programming languages
Python
JavaScript
Java
MIPS
C
ML and Data Analysis
Libraries
NLTK
Gensim
TensorFlow
PyTorch
Scikit-learn
OpenCV
NumPy
Pandas
Matplotlib
Seaborn
Technologies
SQL
Git
Linux
Key Courses Taken
Computer Vision
Artificial Intelligence
Analysis of Algorithms
Software Engineering
Databases
Operating Systems
Computational Geometry
Graph Theory
Computer Networks
Calculus 2, 3, 4
Probability Theory
Linear Algebra
Extracurricular Activities
Archer Club
EDUCATION
• Stony Brook University Sep 2016 – Aug 2020
• Bachelor of Science, double major in Computer Science and Applied Mathematics and Statistics
• Overall GPA: 3.37
WORK EXPERIENCE
Machine Learning Intern at UPMONTH (June 2020 – Sep 2020)
• Machine learning intern at a small technology company, where I improved the search engine of its existing SaaS product using NLP.
• The CEO asked me to find a way to extract concepts from documents and use them to improve the existing search engine.
• This was a solo project. I began by researching state-of-the-art methods for document embedding and chose to implement EmbedRank, which uses Doc2Vec to embed documents and key phrases in the same vector space. I first built a data pipeline to extract text from documents and clean it. I then implemented a key phrase extractor based on POS tagging and trained Doc2Vec from scratch on the 100k documents available in our database. Lastly, I implemented MMR (maximal marginal relevance) to add diversity (reduce redundancy) when ranking the key phrases.
• This project was a success. The old search engine relied heavily on regular expressions and inverse document frequency. My Doc2Vec model embeds documents in a high-dimensional vector space, which can be used to match conceptual queries from users. The model was also used to return a ranked list of relevant documents while a customer was typing a new document. When users hover over the list of relevant documents, they can see the top key phrases extracted from each document. It also made it possible to broaden search capabilities without substantially increasing or rethinking the footprint devoted to search.
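To illustrate the MMR ranking step described above, here is a minimal sketch in Python. It is not the code used at UPMONTH: the function names (`cosine`, `mmr_rank`), the trade-off parameter `lam`, and the toy vectors are assumptions made for illustration, and the similarity normalization used in the published EmbedRank method is omitted for brevity.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmr_rank(doc_vec, phrase_vecs, top_n=3, lam=0.5):
    """Pick up to top_n phrase indices by maximal marginal relevance.

    lam (hypothetical name) trades relevance to the document (lam = 1)
    against diversity among the phrases already selected (lam = 0).
    """
    relevance = [cosine(v, doc_vec) for v in phrase_vecs]
    selected = []
    remaining = list(range(len(phrase_vecs)))
    while remaining and len(selected) < top_n:
        def score(i):
            # Redundancy: similarity to the closest phrase picked so far.
            redundancy = max(
                (cosine(phrase_vecs[i], phrase_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: phrases 0 and 1 are near-duplicates, phrase 2 is less relevant
# to the document but points in a different direction.
doc = np.array([1.0, 0.0])
phrases = np.array([[1.0, 0.1], [1.0, 0.11], [0.6, -0.8]])
picked = mmr_rank(doc, phrases, top_n=2, lam=0.5)
```

With `lam=0.5` the near-duplicate phrase is skipped in favor of the more diverse one; with `lam=1.0` the ranking falls back to pure relevance.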
PROJECTS
Congressional Boundary Redistricting System (Jan 2020 – May 2020)
• My focus on this team was collecting and cleaning dirty real-world data.
• Our goal was to build a web application that displays U.S. congressional district boundaries and helps prevent gerrymandering (unfair districting).
• As a team of 4, we aimed to build an interactive web application that helps users edit congressional boundaries to increase fairness.
Classifying Hand Signs Using Self-Implemented CNN (Jan 2020)
• Built my own CNN using NumPy. Key functions include zero padding and forward and backward propagation for the Conv and Pooling layers.
• Used a Conv2D->MaxPool->Conv2D->MaxPool->FullyConnected network structure, with ReLU for all Conv layers and softmax for the output layer.
• Used the Adam optimizer to minimize the loss. I trained the model on mini-batches to speed up convergence.
• Final training accuracy was 94% and test accuracy was 80%.
• Learned the fundamentals of CNNs and their components.
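The forward-pass building blocks named above (zero padding, Conv forward, Pooling forward) can be sketched in NumPy as follows. This is an illustrative reconstruction, not the project's code: the function names, the (batch, height, width, channels) layout, and the toy inputs are assumptions, and backward propagation is left out.

```python
import numpy as np

def zero_pad(x, pad):
    """Zero-pad the height and width of a (m, H, W, C) array."""
    return np.pad(x, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")

def conv_forward(x, w, b, stride=1, pad=0):
    """Convolution forward pass.

    x: (m, H, W, C_in), w: (f, f, C_in, C_out), b: (C_out,)
    """
    m, h, wd, _ = x.shape
    f, _, _, c_out = w.shape
    h_out = (h + 2 * pad - f) // stride + 1
    w_out = (wd + 2 * pad - f) // stride + 1
    x_pad = zero_pad(x, pad)
    out = np.zeros((m, h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            r, c = i * stride, j * stride
            window = x_pad[:, r:r + f, c:c + f, :]  # (m, f, f, C_in)
            # Sum over the filter's height, width, and input channels.
            out[:, i, j, :] = np.tensordot(window, w, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def relu(z):
    """Element-wise ReLU activation."""
    return np.maximum(z, 0)

def max_pool_forward(x, f=2, stride=2):
    """Max pooling forward pass over f x f windows of a (m, H, W, C) array."""
    m, h, wd, c = x.shape
    h_out = (h - f) // stride + 1
    w_out = (wd - f) // stride + 1
    out = np.zeros((m, h_out, w_out, c))
    for i in range(h_out):
        for j in range(w_out):
            r, s = i * stride, j * stride
            out[:, i, j, :] = x[:, r:r + f, s:s + f, :].max(axis=(1, 2))
    return out

# Toy check: a 3x3 all-ones filter over an all-ones 4x4 image with pad=1
# counts how many real (non-padded) pixels each window covers.
conv_out = conv_forward(np.ones((1, 4, 4, 1)), np.ones((3, 3, 1, 2)), np.zeros(2), pad=1)
pool_out = max_pool_forward(np.arange(16.0).reshape(1, 4, 4, 1))
```

The explicit loops over output positions keep the sketch readable; a vectorized version would be faster but harder to follow.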