Computer Science Assistant

Location:

Chicago, IL

Posted:

February 16, 2013

Resume:

Saket S.R. Mengle

** * **** ******, *****

Chicago, IL-60616

Phone no: 312-***-****

Email: *****@**.***.***

Web Page: http://omega.cs.iit.edu/~smengle

Areas of Interest

Text Mining, Information Retrieval, Natural Language Processing, Machine Learning

Education

Illinois Institute of Technology Expected: Aug. 2009

Ph.D. Candidate in Computer Science

(Post Proposal-Nov, 2008)

Illinois Institute of Technology May, 2007

M.S. in Computer Science (G.P.A: 4.0)

K.J. Somaiya College of Engineering, Mumbai June, 2005

B.E. in Computer Engineering (Rank: 2/120)

Vivekanand Polytechnic, Mumbai June, 2002

Diploma in Computer Technology (Rank: 5/70)

Awards and Achievements

- First place in the IIT Research Day 2007 Graduate Division competition

- Candidate for Graduate Outstanding Academic Achievement Award at IIT, Chicago, 2007

Publications

Journals

- S. Mengle, N. Goharian, Ambiguity Measure Feature Selection Algorithm. Journal of the American Society for Information Science and Technology (to appear)

- S. Mengle, N. Goharian, Passage Detection using Text Classification. Journal of the American Society for Information Science and Technology (to appear)

Conferences

- A. Platt, N. Goharian, S. Mengle, Improving Off-topic Search Detection using Time Series Information, In proceedings of ACM 22nd Symposium on Applied Computing (ACM SAC), March 2007.

- S. Mengle, N. Goharian, A. Platt, FACT: Fast Algorithm for Categorizing Text, In proceedings of IEEE 5th International Conference on Intelligence and Security Informatics (IEEE ISI), May 2007.

- S. Mengle, N. Goharian, Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier, In proceedings of ACM 23rd Symposium on Applied Computing (ACM SAC), March 2008.

- S. Mengle, N. Goharian, A. Platt, Discovering Relationships among Categories using Misclassification Information, In proceedings of ACM 23rd Symposium on Applied Computing (ACM SAC), March 2008.

- S. Mengle, N. Goharian, Detecting Hidden Passages from Documents. In proceedings of SIAM Conference on Data Mining (SIAM SDM) Workshop, April 2008.

- N. Goharian, S. Mengle, On Document Splitting in Passage Detection. In proceedings of 31st Conference on Research and Development in Information Retrieval (ACM SIGIR), July 2008.

- A. Platt, S. Mengle, N. Goharian, Improving Classification Based Off-topic Search Detection via Category Relationships, In proceedings of ACM 24th Symposium on Applied Computing (ACM SAC), March 2009.

Experience

Teaching Assistant

Dept. of Computer Science in IIT, Chicago Jan 2006 - Present

CS429 - Information Retrieval

CS529 - Advanced Information Retrieval

CS422 - Data Mining

CS425 - Databases

Computer Science Dept, Georgetown University, Washington DC Fall 2007

COSC-416 - Information Retrieval

Research Assistant Summer 2006, 2007, 2008

IIT Information Retrieval Laboratory (http://ir.iit.edu)

Internet Media Developer Aug 2005 - Jan 2006

IIT Online, Chicago

Technical Skills

- Programming Languages: C, C++, Java and Python

- Databases: Oracle, MS SQL Server and MySQL

- Web Application Development: PHP, ASP and JavaScript

Professional Membership

- ACM student member

- IEEE student member

- SIAM student member

Research/Projects

- Feature Selection in Text Classification (on-going)

- Passage Detection Using Text Classification (on-going)

- Discovering Relationships Among Categories (on-going)

- Gnutella-style P2P file sharing system using Java (Spring 2007)

- Sports Website using SOAP technology (2004-2005)

Research Summary

My research focuses on text classification and addresses three problems: feature selection, passage detection, and category relationship detection. A brief summary of each is provided below. I have co-authored two journal papers in the Journal of the American Society for Information Science and Technology and seven conference papers at venues such as ACM SIGIR, SIAM SDM, ACM SAC and IEEE ISI.

Feature Selection in Text Classification: Text classification algorithms are widely used to assign predefined categories to documents. A key problem in text classification is the high dimensionality of the feature space. Feature selection methods reduce the size of the feature set to improve classification effectiveness and efficiency. We proposed, developed and evaluated the Ambiguity Measure (AM) feature selection algorithm, which selects the most unambiguous features from the feature set. Unambiguous features are those whose presence in a document indicates, with a strong degree of confidence, that the document belongs to only one specific category. Ambiguous features, by contrast, mislead a single-labeled classifier and decrease its effectiveness on unbalanced datasets. We evaluated AM feature selection on both Naïve Bayes and Support Vector Machine text classifiers and showed that our approach outperforms eight existing feature selection methods on five benchmark datasets.
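
As a rough illustration of the idea (not the published algorithm), the sketch below assumes that a term's score is the largest fraction of its occurrences that fall in a single category, and that features are kept when this score meets a cutoff. The function names, the toy documents and the 0.8 threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def ambiguity_scores(docs):
    """Score each term by the share of its occurrences in its dominant category.

    docs is an iterable of (tokens, category) pairs. A score of 1.0 means the
    term was seen in one category only (assumed here to mean "unambiguous").
    """
    per_term = defaultdict(Counter)  # term -> Counter of category -> occurrences
    for tokens, category in docs:
        for token in tokens:
            per_term[token][category] += 1
    return {
        term: max(counts.values()) / sum(counts.values())
        for term, counts in per_term.items()
    }


def select_features(docs, threshold=0.8):
    """Keep terms whose score is at least the (hypothetical) threshold."""
    return {t for t, s in ambiguity_scores(docs).items() if s >= threshold}


# Toy training data, invented for illustration only.
docs = [
    (["goal", "match", "team"], "sports"),
    (["goal", "team", "season"], "sports"),
    (["market", "team", "stock"], "business"),
    (["stock", "market", "season"], "business"),
]
scores = ambiguity_scores(docs)
selected = select_features(docs, threshold=0.8)
```

In this toy data, "team" and "season" occur in both categories and are dropped, while terms seen in one category only are kept.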

Passage Detection using Text Classification: Although text classifiers work effectively to identify the category of a document in its entirety, they fail to identify the category of hidden passages within documents. Passage detection has various applications. Passages may be hidden within a document to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations that deal with sensitive information. Although passage retrieval is used to identify passages related to user-specified queries, no prior work has been done to detect passages that belong to user-specified categories. We proposed passage detection, which detects passages related to a given category rather than to queries. One of the issues in passage detection is the approach used for splitting documents into passages. Earlier methods, such as the discourse passage approach (splitting a document based on paragraphs and sentences) and the windowing approach (splitting a document into fixed word windows), did not use the information generated by the text classifiers to define passages. We introduced the keyword based dynamic passage (KDP) approach, which defines passages only around terms with high term weights, and showed that it improves effectiveness. Moreover, unlike existing windowing approaches, KDP does not depend on a window size parameter, and it also works when discourse information is not available.
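
The two earlier splitting strategies mentioned above can be sketched as follows. This is a minimal illustration that assumes paragraphs are separated by blank lines and that tokens are whitespace-delimited; the function names and the window size are hypothetical. KDP is not sketched because its details are not given in this summary.

```python
import re


def discourse_passages(text):
    """Discourse-style split: one passage per paragraph (blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def window_passages(text, size=50):
    """Windowing split: consecutive, non-overlapping windows of `size` words."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


# Toy document, invented for illustration only.
text = "a b c d e f g\n\nh i j"
by_paragraph = discourse_passages(text)
by_window = window_passages(text, size=4)
```

Each resulting passage would then be passed to a text classifier on its own, so the choice of splitter determines what the classifier can detect.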

Category Relationship Detection: Knowledge of relationships among categories can be used to create an ontology of categories. Manual efforts such as Yahoo Directories and the Open Directory Project exist. However, manual evaluation requires a large amount of resources and may be based on limited knowledge. We introduce a methodology to discover relationships among categories. Prior work used clustering to find such relationships. We instead utilize the misclassification information generated by the text classifiers. Misclassifications are defined as the false positives and false negatives generated during the process of classification. We observed that misclassifications mostly occur among categories that are closely related to each other. Hence, a relationship is predicted between two categories when documents belonging to one category are consistently misclassified as the other category. Our evaluation indicates that misclassification-based approaches outperform the state-of-the-art clustering-based approach for the category relationship detection task.
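
A minimal sketch of the misclassification idea, under the assumption that "consistently misclassified" can be approximated by the fraction of a category's documents that the classifier assigns to another category. The function name, the labels and the 0.4 cutoff are hypothetical and are not taken from the papers.

```python
from collections import Counter


def related_categories(true_labels, predicted_labels, min_rate=0.4):
    """Return {(true_cat, predicted_cat): rate} for frequent confusions.

    rate is the share of documents of true_cat that were labeled predicted_cat.
    Pairs below the (hypothetical) min_rate cutoff are discarded.
    """
    totals = Counter(true_labels)
    confusions = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    rates = {pair: n / totals[pair[0]] for pair, n in confusions.items()}
    return {pair: rate for pair, rate in rates.items() if rate >= min_rate}


# Toy classifier output, invented for illustration only.
true_labels = ["cars"] * 4 + ["trucks"] * 3 + ["politics"] * 3
predicted_labels = [
    "cars", "trucks", "trucks", "cars",   # half of "cars" labeled "trucks"
    "cars", "trucks", "trucks",           # one "trucks" labeled "cars"
    "politics", "politics", "cars",       # one "politics" labeled "cars"
]
related = related_categories(true_labels, predicted_labels, min_rate=0.4)
```

Here only the pair ("cars", "trucks") passes the cutoff; the occasional confusion between "politics" and "cars" is treated as noise.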
