Shanchan Wu
Department of Computer Science, University of Maryland, College Park, MD 20742, USA
Mobile: 301-***-**** Email: ***@**.***.***
Research Interests
Social Media and Social Network Analysis, Information Retrieval, Machine Learning and
Data Mining
Education
University of Maryland, College Park 2006-2012
Ph.D. Candidate, Computer Science (expected graduation: August 2012)
Dissertation Topic: Prediction in Social Media for Monitoring and Recommendation
(tentative)
University of Maryland, College Park 2006-2008
M.S., Computer Science
Tsinghua University, Beijing, China 2001-2004
Master of Engineering, Department of Automation
Tsinghua University, Beijing, China 1997-2001
Bachelor of Engineering, Department of Automation
Research and Work Experience
University of Maryland, College Park 2008-present Research Assistant
Advisor: Louiqa Raschid
- Built a prediction model for microblogs (Twitter) by learning from a hybrid network.
- Developed a scalable tool to crawl the user follower network and user information
  through the Twitter API. It runs in coordination across multiple servers, speeding up
  data collection.
- Built a prediction system for blogs. Analyzed the links between different blogs. Built
  profiles for blogs. Developed multiple solutions and trained a Ranking SVM. Identified
  a number of features that improve prediction accuracy.
- Processed the blog data and built a retrieval system based on the open-source Lucene
  library. Implemented a state-of-the-art document similarity metric.
IBM T.J. Watson Research Center 05/2009-08/2009
Summer Research Intern in Information Retrieval
- Designed and implemented a novel algorithm and a similarity metric to search for the
  top-K similar sentences in a translation memory, which stores segments of text that
  have been previously translated. Implemented the search algorithm on the lower-level
  Lucene API and built a demo of top-K similar-sentence search. Conducted experimental
  evaluations on real datasets.
- Set up a Hadoop cluster configuration for cloud computing. Built the translation
  memory index using the Hadoop cluster.
IBM T.J. Watson Research Center 05/2008-08/2008
Summer Research Intern in Database Systems
- Studied the DB2 for z/OS table space allocation algorithm, which finds available free
  space to insert a new record while trying to maintain data clustering and to minimize
  wasted space and unnecessary space extension. Developed a tool to simulate the
  algorithm, giving the product and development teams a convenient way to optimize it.
  Built a platform for simulating new table space allocation algorithms. Produced a
  published research paper from this work.
- Improved the current algorithm. Identified several techniques to improve it,
  implemented some of them, and demonstrated better performance. Planned a direction for
  further improvement.
Yahoo! China 07/2004-07/2006 Software Engineer
- Led a tool development team. Led the projects to build a popular search terms ranking
  system and the new customer center.
- Applied Yahoo Search Technology to the web publishing system. Improved the system
  efficiency.
- Improved the fetch program for Yahoo China, which automatically fetched particular
  data from external web pages, inserted it into a database, and pushed the generated
  HTML files to the Yahoo China front-end servers.
- Provided technical support for the Yahoo China portal revision. Developed editing
  tools for the portal editors, including slide shows and live text/picture
  broadcasting.
- Administered and maintained the Yahoo China web publishing system (Yahoo Jake), which
  was distributed across a cluster of Unix servers.
Tsinghua University, Beijing, China 09/2001-07/2004 Research Assistant
- Designed a technique to analyze sequential patterns in web logs. Built a demo system
  for web log mining and recommendation. Developed a model to predict web requests.
- Applied rough set theory to attribute reduction and to the discretization of
  real-valued attributes. Discretization is an important preprocessing step that makes
  data mining techniques usable in real applications containing attributes with
  continuous values.
Academic Honors
- Admitted to the Master's program at Tsinghua University with exemption from entrance
  exams, 2001
- General Electric Scholarship, Tsinghua University, 1999-2000
- Sanwa Bank International Fund Scholarship, Tsinghua University, 1998-1999
- Excellent Student Scholarship, Tsinghua University, 1997-1998
- Excellent Freshman Scholarship, Tsinghua University, 1997
Publications
{
Computer Skills
Languages: Java (proficient), C++ (proficient), C, Python, Perl, PHP, Unix Shell Script, SQL
Tools and OS: Eclipse, Microsoft Visual Studio, Unix/Linux, Windows
Databases: MySQL, SQL Server, DB2
References
Available upon request.