DAVID ARTHUR SMITH
home:
** *********** *******
Amherst, MA 01002, USA
Phone: +1-413-***-****
Mobile: +1-410-***-****
office:
********** ** Computer Science
University of Massachusetts Amherst
140 Governors Drive
Amherst, MA 01003-9264, USA
Phone: +1-413-***-****
online:
Email: *******@**.*****.***
http://www.cs.umass.edu/~dasmith
Education
Johns Hopkins University
2010 Ph.D. in Computer Science
Advisor: Jason Eisner
National Science Foundation fellowship (2003–6)
Wolman fellowship (2002–3)
Harvard University
1994 A.B. summa cum laude in Classics (Greek)
Harvard National Scholar
Professional Experience
University of Massachusetts Amherst
September 2008–
Research Assistant Professor, Department of Computer Science, Center for Intelligent Information Retrieval
Johns Hopkins University
September 2002–September 2008
Research Assistant, Department of Computer Science, Center for Language and Speech Processing
Machine learning for natural language processing: semi-supervised learning and efficient inference techniques; syntactic parsing; morphological disambiguation; machine translation and word alignment
Summer Research Workshop, 2003: Member of Syntax for Statistical Machine Translation team
Google, Inc.
May 2005–September 2005
Internship in Machine Translation group
Research on improved training and decoding for machine translation
Tufts University
July 1994–August 2002
Perseus Digital Library Project
Information retrieval and extraction, named-entity disambiguation, digital libraries, document layout analysis, document alignment, morphological analysis
Teaching Experience
Introduction to Natural Language Processing Fall 2009
Department of Computer Science, University of Massachusetts Amherst (585)
Designer and Instructor
Advanced undergraduate/graduate class with students from computer science and
linguistics; enrollment: 17
Mining Text and Images in Digital Libraries Using Grid Computing Spring 2009
Department of Computer Science, University of Massachusetts Amherst (791MT)
Designer and Instructor, with James Allan and R. Manmatha
Graduate seminar with readings and final project; enrollment: 10
Empirical Research Methods in Computer Science Fall 2005
Department of Computer Science, Johns Hopkins University (600.408)
Designer and Primary Instructor (with Noah Smith)
One-credit course for advanced undergraduates and graduate students on computer-intensive statistics and experimental design; enrollment: 18
An Overview of Statistical Machine Translation August 2006
Conference of the Association for Machine Translation in the Americas, Cambridge, MA
Designer and Primary Instructor (with Charles Schafer)
Tutorial on data, models, and algorithms in statistical MT for broad audience;
enrollment: 12
Invited course lectures:
Tufts University (CS 0150-TC, Classics 0191-TC), Information retrieval in digital
libraries, February 2002
Grants and Contracts
DARPA Machine Reading: A Universal Machine Reading System (co-PI, $2.5M) 2009–2014
NSF Data-Intensive Computing: Mining a Million Books: Linguistic and Structure Analysis, Fast Expanded Search, and Improved OCR (co-PI, $2.3M) 2009–2013
Army/MURI: SUBTLE: Situation Understanding Bot through Language and Environment (co-PI, $634k) 2007–2012
NIH Clinical and Translational Science program (CIIR membership subcontract, $40k) 2010–2015
NSF CluE: Learning Word Relationships Using TupleFlow (senior personnel, $450k) 2009–2011
Yahoo!, Inc.: Data-Intensive Processing for Better Search, Analysis, and OCR (PI, in-kind access to Yahoo!'s Hadoop cluster) 2009–2011
NEH Start-up Grants: OCRonym: Entity Extraction and Retrieval for Scanned Books (co-PI, $50k) 2009–2010
Dissertation
[1] David A. Smith. Efficient Inference for Trees and Alignments: Modeling Monolingual
and Bilingual Syntax with
Hard and Soft Constraints and Latent Variables. PhD thesis, Johns Hopkins University,
2010.
Refereed Conference Proceedings
[
Refereed Journal Articles
[32] Gregory R. Crane, Robert F. Chavez, Anne Mahoney, Thomas L. Milbank, Jeffrey A.
Rydberg-
Invited Presentations
Princeton University, Computer Science Department, February 2011
Carnegie Mellon University, Language Technologies Institute, February 2011
MIT, Computer Science and Artificial Intelligence Laboratory, January 2011
Humboldt University, Berlin, Institut für deutsche Sprache und Linguistik, January 2011
UCLA, Institute for Pure and Applied Mathematics, August 2010
University of Edinburgh, School of Informatics, March 2008
University of Pittsburgh, Computer Science Department, February 2008
University of Maryland, Computer Science Department, February 2008
Tufts University, Computer Science Department, December 2007
Advising
Doctoral Committees
Kedar Bellare. Advisor, Andrew McCallum. 2009 (proposal)
Gregory Druck. Advisor, Andrew McCallum. 2009 (proposal)
David Mimno. Advisor, Andrew McCallum. 2009 (proposal)
Lisa Friedland. Advisor, David Jensen. 2010 (proposal)
Jangwon Seo. Advisor, Bruce Croft. 2010 (proposal)
Xiaobing Xue. Advisor, Bruce Croft. 2010 (proposal)
Michael Bendersky. Advisor, Bruce Croft. 2010 (proposal)
Current Advisees
Jason Naradowsky. UMass Ph.D. student; co-advisor, Andrew McCallum. 2008
Kriste Krstovski. UMass Ph.D. student. 2009
Xiaoye Wu. UMass Ph.D. student. 2009
Other Research Supervised
Elif Aktolga (UMass Ph.D. student). Qualifying synthesis project, with James Allan. 2009.
Jinyoung Kim (UMass Ph.D. student). Qualifying synthesis project, with Bruce Croft. 2009.
Andrew Kae (UMass Ph.D. student). Qualifying synthesis project, with Erik Learned-Miller. 2009–10.
Jacqueline Feild (UMass Ph.D. student). Qualifying synthesis project, with Erik Learned-Miller. 2009–10.
David Goff (Cornell undergraduate). Summer REU Site advisee. 2010.
Jeff Dalton (UMass Ph.D. student). Qualifying synthesis project, with James Allan. 2010–11.
Service
Journal reviewing: Computational Linguistics, Computers and the Humanities, Literary and Linguistic Computing, Proceedings of the National Academy of Sciences
Conference reviewing: ACH/ALLC, ACL, COLING (Machine Learning area chair), DH10, ICML, HLT-NAACL, EACL, EMNLP, IJCNLP, SIGIR
Department committees: graduate program committee (UMass, 2010–11); ad-hoc committee for new institute for computational and experimental linguistics (UMass, 2009–10); curriculum (UMass, 2008–10); graduate student recruiting (JHU, 2003–7); system administration (JHU, 2003–8)
Software
Programmer for document management system for the Perseus Digital Library (http://www.perseus.tufts.edu), 1999–2002. One of the largest heterogeneous humanities digital libraries, Perseus presents sources for language, literature, art, and archaeology for several periods from the ancient Mediterranean through 19th century North America. Users viewing documents receive automatically generated information on morphology, lexicon, translations, technical terms, and named entities, as well as temporal and spatial visualizations. As of fall 2005, traffic on this site had reached 15,000,000 page views to 500,000 users a month.
Programmer for Perseus: Sources and Studies on Ancient Greece, 2.0 (Yale U. P., 1996), 3.0 (Yale U. P., 2000).
Personal Details
Date of Birth: 27 October 1972
Citizenship: USA
Languages: English (native); ancient Greek, Latin, French, German (reading); Arabic
(basic)