Lihong He
**** * ***** ****** *** *** – Philadelphia, PA, 19121 – USA
+1-267-***-**** – adijz4@r.postjobfree.com – lihong-hehe
• 5+ years of experience applying Machine Learning techniques and data analysis.
• Strong knowledge of Classification, Clustering and Regression techniques.
• Experienced in data crawling, data extraction, data cleaning, and data analysis.
• Proficient in data visualization tools such as Highcharts and Matplotlib.
• Extensive hands-on experience with Python libraries such as Scikit-Learn, PyTorch, NumPy, SciPy, and Pandas.
• Proficient with Python, Java, SQL, and Matlab.
Education
Temple University Philadelphia, PA
Ph.D. candidate in Computer Science, GPA: 3.93/4.00 Jan 2016–Present
+ Research Interest: Machine Learning, NLP, Data Mining, and Information Retrieval.
Huazhong University of Science and Technology Wuhan, China
Master of Communication & Information Engineering, GPA: 83/100 Sep 2010–Mar 2013
Huazhong University of Science and Technology Wuhan, China
Bachelor of Electronics & Information Engineering, GPA: 88/100 Sep 2006–Jun 2010
Work Experience
Nokia Bell Labs New Providence, NJ
Data Scientist Intern Jun-Aug 2020
Automated log parsing with a focus on string clustering:
+ Implemented a tokenizer for log files.
+ Grouped log strings into clusters using sentence embeddings and traditional clustering algorithms (e.g., K-Means, DBSCAN, and Hierarchical Clustering).
+ Developed a demo to support the process of log parsing.
Environment: Python, JSON, CSV, and Excel.
Temple University, Department of Computer Science Philadelphia, PA
Research Assistant Jan 2016–Present
Conducted research in Machine Learning, NLP, and data collection and analysis:
+ Retrieved entities from the Web by combining Deep Learning techniques (e.g., BiLSTM, CNN) and Regular Expressions.
+ Predicted comment volume for a news article with Random Forest, SVM, Neural Network, and Linear Regression.
+ Extracted named entities and keywords from tweets using NLP tools (e.g., StanfordNER, AlchemyAPI).
+ Implemented a crawler with a user-friendly interface, collecting terabytes of data from online news websites.
+ Managed large datasets using Pandas data frames and MySQL.
+ Worked on different types of files (CSV, JSON, HTML, XML, and Excel) for data analysis.
+ Performed data visualization with Matplotlib in Python and Highcharts in JavaScript.
Environment: Python, Java, MySQL, Matlab, JavaScript, HTML, XML, and JSON.
Skills
+ Machine Learning, Deep Learning, NLP, Data Mining, Clustering, Regression, MapReduce
+ Neural Network, Random Forest, SVM, K-Means, DBSCAN, Hierarchical Clustering
+ Python, Java, SQL, Matlab, JavaScript
+ Scikit-Learn, PyTorch, TensorFlow, NumPy, SciPy, Pandas, Matplotlib, Seaborn
Publications
+ L. He et al., "Cannot Predict Comment Volume of a News Article before (a few) Users Read It", ICWSM, 2021.
+ L. He et al., "On the Dynamics of User Engagement in News Comment Media", WIRE, 2019.
+ S. Zhang, L. He et al., "How to Invest my Time: Lessons from Human-in-the-Loop Entity Extraction", KDD, 2019.
+ S. Zhang, L. He et al., "Regular Expression Guided Entity Mention Mining from Noisy Web Data", EMNLP, 2018.
+ J. Yuan, L. He, E. Dragut, W. Meng, C. Yu, "Result Merging for Structured Queries on the Deep Web with Active Relevance Weight Estimation", Information Systems, 2017.
+ A. Schneider, L. He, A. Mukherjee, E. Dragut, "COIN – An Inexpensive and Strong Baseline for Predicting Out of Vocabulary Word Embeddings", NAACL, 2021 (under review).
+ L. He et al., "Retrieving User Comments from the Web", WWW, 2021 (under review).
Projects
Extraction of Entity Mention from Noisy Web Data:
+ Proposed a hybrid framework that combines the expressive power of regular expressions, the learning ability of deep neural networks, and a human-in-the-loop approach for entity extraction from web data.
+ Investigated the impact of regular expressions and human annotations on the proposed deep neural network.
+ Studied the trade-offs between spending time writing regular expressions and manually labeling text fragments.
Prediction of Comment Volume for News Article:
+ Proposed novel features for predicting user comment volume.
+ Explored the ability of multiple machine learning techniques (Random Forest, SVM, Neural Network, Linear Regression) on the prediction task.
+ Analyzed the influence of early user commenting activity on the eventual comment volume of a news article.
Topic-Based Tweets Classification Using Higher Level Features:
+ Extracted named entities and keywords from tweets to build higher level features.
+ Compared the performance of multiple classifiers (SVM, Random Forest, ANN, CNN) for topic modeling of tweets.
Crawling User Comments from the Web:
+ Identified the commenting system (e.g., Livefyre, Disqus) incorporated by a web page.
+ Devised an automatic solution for extracting user comments from websites at scale.
Article & Comment Collection and Data Analysis for News Outlets:
+ Implemented a crawler with a user-friendly interface to extract articles and comments from news outlets.
+ Analyzed user interest and commenting activity across news outlets.
Awards
Summer Research Award 2019 - the most distinguished award that Temple University offers to graduate students to encourage their pursuit of research activity in the summer.
Leadership
Supervisor of the Crawler Interface: Supervised two undergraduates in improving the look of our crawler's interface.
Supervisor of Comment Collection: Supervised three master's students collecting comments from Disqus, Viafoura, and Coral Talk.
Head Teacher of an Undergraduate Class: Supported undergraduate students in both their studies and daily life.