Eric Glover Ph.D.
E-mail: *********@**********.***,
Web:
http://www.ericglover.com/
Overview:
A strong academic background and a proven track record of delivering
effective (user-facing and back-end) large-scale commercial systems in
the areas of Machine Learned Relevance (MLR), web-scale categorization,
system architecture, and algorithm development.
About two years of entrepreneurial experience as CEO of an angel-funded startup.
More than a dozen years of commercial web search experience as summarized
below in the highlights.
Highlights:
Current Job: Fellow at Quixey.com
Quixey:
Advisor: Oct 2009 - August 2011
Fellow: August 2011 - Present
Starting as an early adviser, joined full time in August 2011 as a Fellow -
responsible for many things including driving the large-scale machine learning effort.
Currently responsible for search relevance, testing infrastructure, and assisting with the architecture.
My team designs, develops, evaluates and deploys new relevance features and functions, and implements
various algorithms related to other search features including autosuggest, query log mining, and others.
General - Entrepreneurial:
CEO and Co-founder of a small media search company that received angel funding
- Responsible for defining, designing and leading the implementation of
the entire back-end system, as well as managing the operations of the company.
CTO and Co-founder of Airtime High Tech Stuff, LLC., a commercial website development company
specializing in high-performance integrated website and system architectures.
Searchme:
Web-scale categorization system (CHOCO): Technical lead/primary
coder - large-scale categorization system (more than 1000 categories) -
run on Billions of documents in days. System included full web-based UI
for training, active learning, evaluation, ontology management and
automated error checking to aid Search Analysts/contractors.
Vertical suggest/query intention mining: Leveraging full document
to category matrix, produced a ranked list of "vertical suggestions" in
real-time per search.
Multimedia blending: Technical lead - project ranked and indexed
multimedia (YouTube, Hulu, Imeem, Flickr and others) content. Solved
complex AI problems arising from features that differ from those of
text web pages.
Implemented custom automated feed processing and defined techniques for
efficiently discovering appropriate pages to index.
Competitive analysis/judgment collection system (TORGO): Technical
lead - Internal web-based system for collecting judgments for
competitive analysis and MLR training. System included scrapers to pull
data from our competitors, a caching system, and an easy-to-use UI.
Flexible design of code and DB schema to rapidly change user-judgment
options.
MLR feature design and development: Various roles - coding,
system design, and formally defining features. Key accomplishments
include defining and writing the design specification of an XPATH/Perl
based system for rapidly (hours) adding totally new features including
use of structured data.
Near-real time feed processing system ("El Rapido"): Technical
lead/primary coder - near-real-time data inclusion from RSS feeds.
System included full process from fetching and parsing feeds to
indexing and generation of MLR features. Search Quality team managed
feeds and options (via Excel). Significant perceived relevance boost -
time from initial proposal to live under 3 weeks.
Led projects on automatic page quality, spam, and many relevance
(DCG) improvements.
Presented in board meetings, as well as involved in other
business meetings.
Ask.com:
Invited presenter at the NATO MMDSS conference in Gazzada, Italy,
September 2007
At Ask.com, defined and created multiple core search,
entity-extraction and classification-related technologies used by
millions of users every day
Led a team of engineers to develop and implement internal
infrastructure products for improved management of structured data,
classification, and data mining
NEC Labs:
Managed a team of several programmers and students to develop a
modular enterprise search technology architecture and demonstration
system capable of learning new categories in minutes, and performing
category-specific search over a variety of (unstructured) data sources.
Developed a prototype enterprise search system (Inquirus). This
system incorporated many new technologies including: rapid category
learning (active learning based), advanced feature selection,
SVM-based classification, automated query
modifications, intelligent resource routing, multiple-source
capabilities, automated query expansion, and search strategies
General - academic:
PhD in CSE/AI, focusing on preference-based web metasearch
Published several highly cited conference and journal
papers, as well as filed more than ten patent applications related to
data mining/topic extraction, search engine architectures, and search
technologies.
Broad knowledge of computer science/AI, including algorithms,
software, and hardware (a master's degree in VLSI)
Education
4/1994 BSE Electrical Engineering, Magna Cum Laude from University of
Michigan, Ann Arbor, MI
5/1997 MSE Electrical Engineering, University of Michigan, Ann
Arbor, MI
8/2001 Ph.D. Computer Science Engineering, University of Michigan,
Ann Arbor, MI
Dissertation:
Eric J. Glover, Using Extra-Topical User Preferences to Improve
Web-Based Metasearch, Ph.D. Dissertation, University of Michigan,
2001.
Employment
8/2011 - Present Fellow at Quixey.com
Quixey is a new startup created to enable people to discover the
applications they need. Quixey is all about the next generation of
search, called Functional Search: instead of searching for apps by name
or category, search by what you want to do.
1/2011 - 8/2011 CTO and co-founder of Airtime High Tech Stuff, LLC
9/2009 - 8/2011 CEO and co-founder of Intelligent Search Solutions Inc.
9/2009 - 8/2011 Adviser to Quixey.com
8/2009 - 10/2009 - Consultant for Lighthouse Capital Partners
Hired as a consultant to aid in the sale
of Searchme Intellectual Property (acquired by Lighthouse after
Searchme shut down). Responsible for attending business meetings,
answering technical questions, and leading the effort to produce a
functional demonstration system.
3/2007 - 7/2009 - Principal Scientist/Classification Architect/Sr.
Staff Scientist at SearchMe.com
Initially hired to design and develop
the core categorization infrastructure. Work required design and
presentation in board meetings, as well as substantial coding. System
(see highlights above) was used to enable categorization of Billions of
web pages (in days) assigning more than 1000 possible labels. The full
document-category matrix was used at run-time for Vertical Suggestions
and relevance ranking, a key Searchme differentiating feature. System
required both offline and online components. Offline required ease of
use for Search Analysts to rapidly train (hours), as well as
active-learning, feature selection, and evaluation components. Online
system required new algorithms to maximize performance - for over 1000
categories, classifiers with the accuracy of non-linear classifiers ran
with performance approaching that of linear classifiers (one midrange
server could process 10-20 pages per second across all categories).
System also included integrated
ontology and deployment management - the Ontologist could manage which
categories should be user-facing, how categories relate, as well as
specific classifier options (external UI was text-lists like 'higher
recall' or 'higher precision', etc). Training system included a
distributed task manager which ran on six dedicated servers (with other
servers temporarily added as needed when there were too many jobs) and
required minimal maintenance, and virtually no configuration for
clients.
After completing most of the CHOCO system, I designed and led a small
team to build and implement the TORGO user-judgment system. This system
was designed for contractors and Search Analysts to very rapidly make
judgments about query/url pairs. Unlike systems in use at other search
engines, this system collected significantly more data - users could
provide query or url specific judgments, could see cached pages, manage
the scraping. Scrapers could be easily added - with the final system
enabling non-engineers to add custom scrapers to control production
system options. The initial system design assumed that the specific
menus, scrapers and judgments would be determined after project was
completed and launched, and had to be extremely flexible and reliable.
New types of judgments could be added in minutes - with no code changes.
Near real-time inclusion system (El Rapido) - Designed and coded a
system (managed by Search Quality) that would take an input RSS list
and, for each feed, fetch, process, and insert its content into the
internal document storage system for rapid indexing. System was
designed to ensure fresh and
fully-processed (classification, other metadata and page contents)
results were available for ranking in minutes. Work included managing
MLR features to ensure reasonable ranking - total project time under 3
weeks from initial suggestion of idea until live. Provided a
significant perceived relevance boost.
Multimedia blending - Tech-lead on key differentiating project that
would mix-in (organic ordering) multimedia content with regular web
(and news - see above). Project required defining features and XML feed
specifications (to both UI and Indexing engineers), writing custom feed
processing, and developing new MLR approaches. Resultant system was
able to include new sources, with features of different meanings, and
using very few (low thousands) judgments develop a distributionally
consistent MLR function (consistent with the separately trained regular
web-MLR). Key problems solved: 1: Which features to use, 2: How to train
given less data and different feature 'meaning' (e.g. Page Views on
YouTube is different from Page Views on Imeem, and Hulu doesn't have
Page Views), 3: How to spider/locate content with resource constraints
(can't have all of YouTube, need to have videos for relevant queries),
4: Manage editorial preferences and content expiration (Hulu videos are
higher quality than un-official YouTube videos), 5: System must be
generic - to enable rapid addition of future sources.
Many other large-scale projects - please ask, highlights above
summarize several others.
2/2007 - 3/2007 - Manager (3) at Ask.com
Managed four engineers, playing an active role in design and
development of new technologies in the areas of classification, entity
extraction, relevance, and structured data management.
5/2004 - 2/2007 - Research Engineer (Software Engineer 5) at Ask.com
(IAC Search and Media formerly AskJeeves)
Recent work included leading a small
team of engineers to develop multiple internal products in the areas of
extraction, machine learning, and structured data management. Focus
included large-scale processing and analysis, highly-accurate
classification, and efficient algorithms. Previous work within Ask
included developing highly visible, very high impact core technologies
- currently on the live site. Core technologies are in the areas of
entity extraction/classification, information extraction from
semi-structured data (including the Wikipedia), relevance and
disambiguation.
11/2002 - 4/2004 - Research Staff Member at NEC Laboratories America -
Project leader of the Inquirus project
I managed a team of three
full-time programmers/developers and several students, and participated
in several outside collaborations. I was responsible for creating new
research ideas and communicating them to the development team for
incorporation into our Inquirus search system - implemented primarily
in C++. The (new)
Inquirus search system is a modular architecture (several patents filed
on various aspects of the system) that included dynamic routing of
search resources (query processing, result processing and data
resources). Several demonstration systems were built, including a
MEDLINE based demo system demonstrating high-precision and high recall
for test medical queries. A second
demonstration system included built-in active learning for very rapid
category generation. An outside user (using the entirely web-based
interface) could train a custom search category (such as "Movie
Reviews", "Computer Science Papers", "Clinical Trials", "Executive
Bios", and others) in minutes. At project termination, the search
architecture included the ability to search in Japanese (including
proper word splitting), and process inbound data sources in multiple
encodings.
Research and technology highlights of the Inquirus project: System
utilized a new technology we invented called search strategies. Very
fast active learning for improved category creation. Efficient SVM
based classification. Real-time feature ranking and feature selection.
System included modules for various data interfaces (including Web,
Oracle, MySQL, Z39.50). Multi-language/character encoding technology
(Japanese term extraction using Chasen).
Relevant research: Automated methods for local hierarchy generation
from small document clusters. New methods for predicting the generality
or specificity of a document (improves relevance). Technology for
automatic discovery of related medical concepts. Use of web structure
to improve classification accuracy and concept naming. Demonstrated
effective use of uncertainty sampling with SVMs and use of web
structure (extended anchortext/anchortext windows) for extremely
accurate Yahoo document classification. New method for web-graph
modeling, incorporating local web communities. New technology for
improved phrasal/concept extraction and concept grouping.
7/2001 - 10/2002: Scientist at the NEC Laboratories America (formerly
named NEC Research Institute), Princeton, NJ
Worked with Steve Lawrence, Gary Flake* and C. Lee Giles* on improving
metasearch and data mining. Continued dissertation work and developed
new methods for feature extraction/selection, and improved document
classification. Continued work on the Inquirus 2 prototype, and
participated in various research activities related to data mining.
*Gary and Lee left the laboratory prior to October 2002.
1/1999 - 6/2001: Intern at NEC Research Institute, Princeton, NJ
Collaborated with C. Lee Giles, Gary Flake and Steve Lawrence on
improving and modeling web metasearch. Involved in implementing a
content-based metasearch engine that considered more than just
keywords. For more detailed information please refer to the
publications below.
9/1998 - 12/1998: CAD GSI, University of Michigan, Ann Arbor, MI
Duties: Responsible for assisting students with CAD related questions
or problems. Supported: Mentor Graphics suite (Design Architect,
Quicksim, Accusim, IC Station, Design Viewpoint Editor), EPOCH,
Synopsys, Verilog XL, SignalScan.
Significant accomplishments include re-writing of the digital
transistor models for the VLSI class. Helped to debug and prevent
software problems.
1/1995 - 8/1998: Graduate Student Research Assistant for the
University of Michigan Digital Library (UMDL) project, University of
Michigan, Ann Arbor, MI
Designed and prototyped multiple software agents including the
Remora, WebAgent, and the Preference Agent. UMDL agents were written
primarily in C++. Agents were developed in the CORBA framework under
SOLARIS, and required extensive use of the Web. Wrote numerous CGI
scripts in PERL, as well as other tools including web robots which
automatically downloaded and analyzed web pages.
UMDL research focused on a distributed AI (agent) architecture as a
basis for a multi-purpose digital library. Library functions included
searching (both across and inside of) collections, document retrieval,
electronic commerce and pricing, user interface and preferences. The
UMDL project was used to provide content to local middle school and
high school children as part of their science curriculum.
9/1994 - 12/1994: CAD Graduate Student Instructor (GSI), University
of Michigan, Ann Arbor, MI
Duties: Responsible for assisting students from many Electrical
Engineering classes in using the Mentor Graphics tool set. Aided
students in using Design Architect, IC Station, Accusim, Quicksim and
HSPICE. Responsibilities included problem solving and basic circuit
debugging.
Publications:
Eric Glover, The "Real World" Web Search Problem, MMDSS NATO
Conference, Gazzada, Italy, September 2007.
Please e-mail for an electronic copy of the actual paper.
Eric J. Glover, David M. Pennock, Steve Lawrence, and Robert
Krovetz. Inferring hierarchical descriptions, Proceedings of the
Eleventh International Conference on Information and Knowledge
Management (CIKM'02), November 2002.
David M. Pennock, Sandip Debnath, Eric J. Glover, and C. Lee Giles.
Modeling information incorporation in markets with application to
detecting and explaining events, Proceedings of the 18th Conference on
Uncertainty in Artificial Intelligence (UAI-2002), pp. 405-413,
August 2002.
Eric J. Glover, Kostas Tsioutsiouliklis, Steve Lawrence, David M.
Pennock, and Gary W. Flake. Using web structure for classifying and
describing web pages, Proceedings of the Eleventh International World
Wide Web Conference, pp. 562-569, May 2002.
Gary Flake, Eric Glover, Steve Lawrence, C. Lee Giles. Extracting
Query Modifications from Nonlinear SVMs, Proceedings of the Eleventh
International World Wide Web Conference, May 2002.
David M. Pennock, Gary W. Flake, Steve Lawrence, Eric J. Glover, and
C. Lee Giles. Winners don't take all: Characterizing the competition
for links on the web, Proceedings of the National Academy of Sciences,
Volume 99, Issue 8, pp. 5207-5211, April 2002.
Steve Lawrence, Frans Coetzee, Eric Glover, David Pennock, Gary Flake,
Finn Nielsen, Robert Krovetz, Andries Kruger, and C. Lee Giles.
Persistence of Web References in Scientific Research, IEEE Computer,
vol 34, no 2, pp 26--31, 2001
Eric J. Glover, Gary W. Flake, Steve Lawrence, William P.
Birmingham, Andries Kruger, C. Lee Giles, David M. Pennock. Improving
Category Specific Web Search by Learning Query Modifications,
Symposium on Applications and the Internet, SAINT 2001, San Diego,
California, January 8--12, 2001.
Frans Coetzee, Eric Glover, Steve Lawrence, and C. Lee Giles. Feature
selection in web applications using ROC inflections. In Symposium
on Applications and the Internet, SAINT, San Diego, CA,
January 8--12 2001.
Andries Kruger, C. Lee Giles, Frans Coetzee, Eric Glover, Gary
Flake, Steve Lawrence, and Cristian Omlin. DEADLINER: Building a new
niche search engine. In Ninth International Conference on Information
and Knowledge Management, CIKM 2000, Washington, DC,
November 6--11 2000.
Eric J. Glover, Steve Lawrence, Michael D. Gordon, William P.
Birmingham, C. Lee Giles, "Web Search -- Your Way," Accepted to
Communications of the ACM
Eric J. Glover, Steve Lawrence, William P. Birmingham, C. Lee Giles,
"Architecture of a Metasearch Engine that Supports User Information
Needs," Eighth International Conference on Information and Knowledge
Management (CIKM 99), Kansas City, MO, November, 1999
Eric J. Glover, Steve Lawrence, Michael D. Gordon, William P.
Birmingham, C. Lee Giles, "Recommending Web Documents Based on User
Preferences," in ACM SIGIR 99 Workshop on Recommender Systems,
Berkeley, CA, August, 1999
E. J. Glover, S.R. Lawrence, K.D. Bollacker, C.L. Giles, W.P.
Birmingham, G.W. Flake, "A Metasearch Engine Architecture That Supports
Individual Information
Needs," NEC Research Institute Technical Report, TR# 99-063, May 13,
1999
E. J. Glover, W. P. Birmingham, and M. D. Gordon, "Improving Web
Search Using Utility Theory," in Proceedings of the First International
Workshop on Web Information and Data Management, WIDM 98. Bethesda,
Maryland, 1998
Eric J. Glover, Sunju Park, Anil Arora, Daniel Kiskis and Edmund
Durfee, "A case study on the evolution of software tools selection and
development in a large-scale multi-agent system," in Workshop on
Software Tools for Developing
Agents, AAAI 1998. Madison, WI: AAAI
E. J. Glover and W. P. Birmingham, "Using Decision Theory To Order
Documents," in Digital Libraries 98, Pittsburgh, PA, 1998: ACM
D. E. Atkins, W. P. Birmingham, E. H. Durfee, E. J. Glover, T.
Mullen, E. A. Rundensteiner, E. Soloway, J. M. Vidal, R. Wallace, and
M. P. Wellman, "Toward Inquiry-Based Education Through Interacting
Software Agents," IEEE Computer, vol. 29, pp. 69-76, 1996
Patents:
Filed more than ten patents including those related to entity
detection/extraction, search architectures,
efficient data mining, medical concept extraction/relationship
discovery, improved metasearch performance, automatic hierarchy
generation and document cluster naming, improved document
classification techniques using web structure.
Hobbies
Digital photography, traveling, cooking, hacking (the good kind)
computer security, and online gaming.
last updated: April 18, 2012