Saket S.R. Mengle
Chicago, IL-60616
Phone no: 312-***-****
Email: *****@**.***.***
Web Page: http://omega.cs.iit.edu/~smengle
Areas of Interest
Text Mining, Information Retrieval, Natural Language Processing, Machine Learning
Education
Illinois Institute of Technology Exp: Aug. 2009
Ph.D. Candidate in Computer Science
(Post Proposal-Nov, 2008)
Illinois Institute of Technology May, 2007
M.S. in Computer Science (G.P.A: 4.0)
K.J. Somaiya College of Engineering, Mumbai June, 2005
B.E. in Computer Engineering (Rank: 2/120)
Vivekanand Polytechnic, Mumbai June, 2002
Diploma in Computer Technology (Rank: 5/70)
Awards and Achievements
- First place showing in the IIT Research Day 2007 - Graduate Division competition
- Candidate for Graduate Outstanding Academic Achievement Award at IIT, Chicago, 2007
Publications
Journals
- S. Mengle, N. Goharian, Ambiguity Measure Feature Selection Algorithm. Journal of the
American Society for Information Science and Technology (to appear)
- S. Mengle, N. Goharian, Passage Detection using Text Classification. Journal of the American
Society for Information Science and Technology (to appear)
Conferences
- A. Platt, N. Goharian, S. Mengle, Improving Off-topic Search Detection using Time
Series Information, In proceedings of ACM 22nd Symposium on Applied Computing (ACM
SAC), March 2007.
- S. Mengle, N. Goharian, A. Platt, FACT: Fast Algorithm for Categorizing Text, In
proceedings of IEEE 5th International Conference on Intelligence and Security Informatics
(IEEE ISI), May 2007.
- S. Mengle, N. Goharian, Using Ambiguity Measure Feature Selection Algorithm for
Support Vector Machine Classifier, In proceedings of ACM 23rd Symposium on Applied
Computing (ACM SAC), March 2008.
- S. Mengle, N. Goharian, A. Platt, Discovering Relationships among Categories using
Misclassification Information, In proceedings of ACM 23rd Symposium on Applied
Computing (ACM SAC), March 2008.
- S. Mengle, N. Goharian, Detecting Hidden Passages from Documents. In proceedings of
SIAM Conference on Data Mining (SIAM SDM) Workshop, April 2008.
- N. Goharian, S. Mengle, On Document Splitting in Passage Detection. In proceedings of
31st Conference on Research and Development in Information Retrieval (ACM SIGIR), July
2008.
- A. Platt, S. Mengle, N. Goharian, Improving Classification Based Off-topic Search
Detection via Category Relationships, In proceedings of ACM 24th Symposium on Applied
Computing (ACM SAC), March 2009.
Experience
Teaching Assistant
Dept. of Computer Science in IIT, Chicago Jan 2006 - Present
CS429 - Information Retrieval
CS529 - Advanced Information Retrieval
CS422 - Data Mining
CS425 - Databases
Computer Science Dept, Georgetown University, Washington DC Fall 2007
COSC-416 - Information Retrieval
Research Assistant Summer 2006, 2007, 2008
IIT Information Retrieval Laboratory (http://ir.iit.edu)
Internet Media Developer Aug 2005 - Jan 2006
IIT Online, Chicago
Technical Skills
- Programming Languages: C, C++, Java and Python
- Databases: Oracle, MS SQL Server and MySQL
- Web Application Development: PHP, ASP and JavaScript
Professional Membership
- ACM student member
- IEEE student member
- SIAM student member
Research/Projects
- Feature Selection in Text Classification (on-going)
- Passage Detection Using Text Classification (on-going)
- Discovering Relationships Among Categories (on-going)
- Gnutella-style P2P file sharing system using Java (Spring, 2007)
- Sports Website using SOAP technology (2004-2005)
Research Summary
My research focuses on text classification and addresses three research problems: feature selection,
passage detection and category relationship detection. A brief summary of each is provided
below. I have co-authored two journal papers in the Journal of the American Society for Information Science and
Technology and seven conference papers at venues such as ACM SIGIR, SIAM SDM, ACM SAC and IEEE ISI.
Feature Selection in Text Classification: Text classification algorithms are widely used to assign predefined
categories to documents. A key problem in text classification is the high dimensionality of the feature space. Feature
selection methods reduce the size of the feature set to improve classification effectiveness and efficiency. We proposed,
developed and evaluated the Ambiguity Measure (AM) feature selection algorithm, which selects the most unambiguous
features from the feature set. Unambiguous features are those whose presence in a document indicates with a
strong degree of confidence that the document belongs to only one specific category. Ambiguous features, by contrast,
mislead a single-labeled classifier and decrease its effectiveness on unbalanced datasets. We evaluated AM feature
selection with both Naïve Bayes and Support Vector Machine text classifiers and showed that our approach
outperforms eight existing feature selection methods on five benchmark datasets.
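The idea of scoring features by how unambiguously they point to one category can be illustrated with a minimal sketch. The scoring formula here (the fraction of a term's occurrences that fall in its most frequent category), the function names and the threshold value are assumptions made for illustration; they are not taken from the summary above.

```python
from collections import Counter, defaultdict


def ambiguity_scores(docs):
    """Score each term by how concentrated it is in a single category.

    docs: iterable of (tokens, category) pairs.
    Returns {term: (score, category)} where score is in (0, 1].
    Assumed formula: occurrences in the top category / total occurrences.
    """
    per_term = defaultdict(Counter)
    for tokens, category in docs:
        for token in tokens:
            per_term[token][category] += 1

    scores = {}
    for term, counts in per_term.items():
        category, top_count = counts.most_common(1)[0]
        scores[term] = (top_count / sum(counts.values()), category)
    return scores


def select_features(scores, threshold=0.8):
    """Keep only terms whose score reaches the (hypothetical) threshold."""
    return {term for term, (score, _) in scores.items() if score >= threshold}


# Toy training set, invented for this example.
docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "election", "vote"], "politics"),
    (["goal", "striker"], "sports"),
]
scores = ambiguity_scores(docs)
selected = select_features(scores, threshold=0.8)
```

In this toy data, "team" appears equally in both categories and is filtered out, while "goal" appears only in "sports" and is kept.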
Passage Detection using Text Classification: Although text classifiers work effectively to identify the category of
a document in its entirety, they fail to identify the category of hidden passages within documents. Passage detection
has various applications. Passages may be hidden within a
document to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to
all corporate and governmental organizations that deal with sensitive information. Although passage retrieval is used
to identify passages that are related to user-specified queries, no prior work detects passages that belong
to user-specified categories. We proposed passage detection, which detects the passages related to a given category
rather than to a query. One of the issues in passage detection is the approach used for splitting documents
into passages. Earlier methods such as the discourse passage approach (splitting a document based on paragraphs and
sentences) and the windowing approach (splitting a document into fixed word windows) did not use the information
generated by the text classifiers to define passages. We introduced the keyword based dynamic passage
(KDP) approach, which defines passages only around the terms with high term weights, and showed that it
improves effectiveness. Moreover, unlike existing windowing approaches, KDP does not depend on a window size
parameter, and it also works when discourse information is not available.
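For reference, the fixed-window baseline mentioned above can be sketched in a few lines. This illustrates only the windowing approach, not KDP; the function name and the non-overlapping window layout are assumptions for illustration.

```python
def fixed_windows(tokens, size):
    """Split a token list into consecutive, non-overlapping windows.

    The last window may be shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[start:start + size] for start in range(0, len(tokens), size)]


# Toy document, invented for this example.
tokens = "a b c d e f g".split()
windows = fixed_windows(tokens, size=3)
```

The dependence on `size` is visible here: a passage that straddles a window boundary is split regardless of its content.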
Category Relationship Detection: Knowledge of relationships among categories can be utilized to create an ontology
of categories. Manual efforts such as Yahoo Directories and Open Directory Project exist. However, manual
evaluation requires a large amount of resources and may be based on limited knowledge. We introduce a
methodology to discover relationships among categories. Prior work used clustering to find relationships among
categories. We utilize the misclassification information generated by the text classifiers to discover relationships
among categories. Misclassifications are defined as the false positives and false negatives that are generated during
the process of classification. We observed that misclassifications mostly occur among categories that are closely
related to each other. Hence, a relationship is predicted between two categories when documents belonging to one
category are consistently misclassified as the other category. Our evaluation indicates that misclassification-based
approaches outperform a state-of-the-art clustering-based approach on the category relationship detection task.
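A minimal sketch of the misclassification idea: count how often documents of one category are predicted as another, and flag the pair when that rate is high. The rate definition, the function name and the `min_rate` threshold are assumptions for illustration rather than the procedure used in the work summarized above.

```python
from collections import Counter


def related_categories(true_labels, predicted_labels, min_rate=0.5):
    """Return {(true_cat, predicted_cat): rate} for frequent confusions.

    rate = documents of true_cat predicted as predicted_cat
           / all documents of true_cat   (assumed definition)
    """
    totals = Counter(true_labels)
    confusions = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return {
        pair: count / totals[pair[0]]
        for pair, count in confusions.items()
        if count / totals[pair[0]] >= min_rate
    }


# Toy labels, invented for this example.
true_labels = ["cars"] * 4 + ["trucks"] * 2 + ["music"] * 4
predicted = ["cars", "trucks", "trucks", "cars",
             "cars", "trucks",
             "music", "music", "music", "cars"]
pairs = related_categories(true_labels, predicted, min_rate=0.5)
```

Here "cars" and "trucks" are confused with each other half of the time and are flagged in both directions, while the single "music" document predicted as "cars" falls below the threshold.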