Python, SQL, Java, Machine Learning, Data Mining, Data Visualization

Location:

Philadelphia, PA

Posted:

December 10, 2020


Resume:

Lihong He

**** * ***** ****** *** *** – Philadelphia, PA, 19121 – USA

+1-267-***-**** | ******.**.**@*****.*** | lihong-hehe

• 5+ years of experience applying Machine Learning techniques and data analysis.

• Strong knowledge of Classification, Clustering, and Regression techniques.

• Experienced in data crawling, data extraction, data cleaning, and data analysis.

• Proficient in data visualization tools such as Highcharts and Matplotlib.

• Extensive hands-on experience with Python libraries such as Scikit-Learn, PyTorch, NumPy, SciPy, and Pandas.

• Proficient with Python, Java, SQL, and Matlab.

Education

Temple University Philadelphia, PA

Ph.D. candidate in Computer Science, GPA: 3.93/4.00 Jan 2016–Present

+ Research Interest: Machine Learning, NLP, Data Mining, and Information Retrieval.

Huazhong University of Science and Technology Wuhan, China

Master of Communication & Information Engineering, GPA: 83/100 Sep 2010–Mar 2013

Huazhong University of Science and Technology Wuhan, China

Bachelor of Electronics & Information Engineering, GPA: 88/100 Sep 2006–Jun 2010

Work Experience

Nokia Bell Labs New Providence, NJ

Data Scientist Intern Jun-Aug 2020

Automated log parsing with a focus on string clustering:

+ Implemented a tokenizer for log files.

+ Grouped log strings into clusters using sentence embeddings and traditional clustering algorithms (e.g., K-Means, DBSCAN, and Hierarchical Clustering).

+ Developed a demo to support the log parsing process.

Environment: Python, JSON, CSV, and Excel.

Temple University, Department of Computer Science Philadelphia, PA

Research Assistant Jan 2016-Present

Research in Machine Learning, NLP, data collection, and data analysis:

+ Retrieved entities from the Web by combining Deep Learning techniques (e.g., BiLSTM, CNN) and Regular Expressions.

+ Predicted comment volume for a news article with Random Forest, SVM, Neural Network, and Linear Regression.

+ Extracted named entities and keywords from tweets using NLP tools (e.g., StanfordNER, AlchemyAPI).

+ Implemented a crawler with a user-friendly interface, collecting terabytes of data from online news websites.

+ Managed large datasets using Pandas data frames and MySQL.

+ Worked with multiple file types (CSV, JSON, HTML, XML, and Excel) for data analysis.

+ Performed data visualization with Matplotlib in Python and Highcharts in JavaScript.

Environment: Python, Java, MySQL, Matlab, JavaScript, HTML, XML, and JSON.

Skills

+ Machine Learning, Deep Learning, NLP, Data Mining, Clustering, Regression, MapReduce

+ Neural Network, Random Forest, SVM, K-Means, DBSCAN, Hierarchical Clustering

+ Python, Java, SQL, Matlab, JavaScript

+ Scikit-Learn, PyTorch, TensorFlow, NumPy, SciPy, Pandas, Matplotlib, Seaborn

Publications

+ L. He et al., "Cannot Predict Comment Volume of a News Article before (a few) Users Read It", ICWSM, 2021.

+ L. He et al., "On the Dynamics of User Engagement in News Comment Media", WIRE, 2019.

+ S. Zhang, L. He et al., "How to Invest my Time: Lessons from Human-in-the-Loop Entity Extraction", KDD, 2019.

+ S. Zhang, L. He et al., "Regular Expression Guided Entity Mention Mining from Noisy Web Data", EMNLP, 2018.

+ J. Yuan, L. He, E. Dragut, W. Meng, C. Yu, "Result Merging for Structured Queries on the Deep Web with Active Relevance Weight Estimation", Information Systems, 2017.

+ A. Schneider, L. He, A. Mukherjee, E. Dragut, "COIN – An Inexpensive and Strong Baseline for Predicting Out of Vocabulary Word Embeddings", NAACL, 2021 (under review).

+ L. He et al., "Retrieving User Comments from the Web", WWW, 2021 (under review).

Projects

Extraction of Entity Mention from Noisy Web Data:

+ Proposed a hybrid framework that combines the expressive power of regular expressions, the capability of deep learning, and a human-in-the-loop approach for entity extraction from web data.

+ Investigated the impact of regular expressions and human annotations on the proposed deep neural network.

+ Studied the trade-offs between spending time writing regular expressions and manually labeling text fragments.

Prediction of Comment Volume for News Article:

+ Proposed novel features for predicting user comment volume.

+ Evaluated multiple machine learning techniques (Random Forest, SVM, Neural Network, Linear Regression) on the prediction task.

+ Analyzed the influence of early user commenting activity on the eventual comment volume of a news article.

Topic-Based Tweets Classification Using Higher Level Features:

+ Extracted named entities and keywords from tweets to build higher level features.

+ Compared the performance of multiple classifiers (SVM, Random Forest, ANN, CNN) for topic modeling of tweets.

Crawling User Comments from the Web:

+ Identified the commenting system (e.g., Livefyre, Disqus) incorporated by a web page.

+ Devised an automatic solution for extracting user comments from websites at scale.

Article & Comment Collection and Data Analysis for News Outlets:

+ Implemented a crawler with a user-friendly interface to extract articles and comments from news outlets.

+ Analyzed user interest and commenting activity across news outlets.

Awards

Summer Research Award 2019 - the most distinguished award that Temple University offers to graduate students to encourage their pursuit of research activity in the summer.

Leadership

Supervisor of the Crawler Interface: Supervised two undergraduates in improving the visual design of our crawler's interface.

Supervisor of Comment Collection: Supervised three master's students on comment collection for Disqus, Viafoura, and Coral Talk.

Head Teacher of an Undergraduate Class: Supported undergraduate students in both their studies and daily life.
