Data Management

Location:

Kitchener, ON, Canada

Posted:

January 20, 2015

Resume:

Curriculum Vitae

Anup Kumar Chalamalla ( Email: **********@*****.*** )

PhD Candidate, David R. Cheriton School of Computer Science, University of Waterloo, Canada

http://www.cs.uwaterloo.ca/~akchalam http://twitter.com/anupkumarc +1-226-***-****

Education

PhD Candidate in Computer Science, University of Waterloo Advisor: Prof. Ihab F. Ilyas

Thesis: Modeling Lineage of Anomalies in Data (GPA : 88.13/100) Sep 2009 - Present

Bachelor of Technology in Computer Science and Engineering

Indian Institute of Technology Bombay July 2003 - May 2007

Work Experience

• Research Assistant, University of Waterloo, Sep. 2009 - Present

• Research Intern, Data Analytics Group, Qatar Computing Research Institute, Sep ’11 - Mar ’12

• Software Engineer, Information Management Group, IBM Research, May 2007 - May 2009

Interests

Data Analytics, Economics, Quantitative Finance, Algorithms and Software Development

(I am generally interested in applying data analytics solutions to problems in economics and

quantitative finance; specifically, social media analysis to measure economic activity/behavior, anomaly

detection in big finance data, statistical analysis of big data, and algorithm design in these topics. I have

published in top data analysis conferences, patented ideas, and taken relevant university finance courses.)

Patents

1. A SYSTEM AND METHOD FOR CHECKING DATA FOR ERRORS. Ouzzani M., Papotti P.,

Ilyas I., Chalamalla A. PCT App. no. PCT/GB2014/051***-****.

2. A METHOD OF PROCESSING A RATINGS DATASET. Ilyas I., Amer-Yahia S., Chalamalla A.

G06Q30/02, Pub No. WO2014177181 (A1) - 2014-11-06.

Summary of Major Research Projects

• Anomaly Detection in Streaming Data through Predictive Analytics:

• [Problem/Motivation] Data arrives continuously from a large number of sources in many applications

including those in financial institutions. My system deals with automatically identifying and correcting

anomalies in streamed data. It is imperative to keep every snapshot of a continuously updated database

of records free from errors for use in other applications.

• [Challenges] When there are errors in streamed data, sufficient evidence to identify them precisely is

not available as only partial data is known at a given time, and more data is expected at a later time. By

making judgements about anomalies prematurely (and greedily), there is a possibility that decisions

based on anomalies may need to be reversed at a later stage causing an increase in maintenance cost.

• [Contributions] We designed a principled approach which models the behavior of patterns in the data

using techniques such as regression, mixed models and ARIMA to predict future behavior of patterns.

The forecasts are then used by an algorithm designed based on statistical analysis of data to precisely

identify anomalies in the data.

• Portfolio Management based on Graph Analytics:

• [Problem/Motivation] Typically, in trading systems consumers interact with instruments/stocks to buy

or review. Discovering prominent interaction patterns between communities of consumers and stocks

is important. For example, tech professionals aged 25-30 *bought* S&P 500 internet companies’

stocks. A graph analytics driven approach can be used to discover such patterns and use them to

recommend customized portfolios with common attributes/themes to specific consumers.

• [Contributions] Our system aims to discover interaction patterns between communities of different

entity-types in social interaction systems. Patterns are ranked based on a scoring function or by satisfying

a diversity constraint on the universe of entities. See patent 2. This work was reviewed at a

top data management journal, PVLDB, with two favorable reviews; it is to be revised and resubmitted.

• Modeling Lineage of Anomalies in Data:

• [Problem/Motivation] Real world databases including those in financial institutions have anomalies

due to integration from multiple sources. Traditionally, the database is made to comply with a set of

business rules: records violating these rules are identified and updated to remove errors.

• [Contributions] My PhD Thesis goes a step further to address the lineage of anomalies/errors by

tracking the origins of data records to their sources (a large number of them) in a data integration

pipeline. Working at a more fundamental level of a database system, we enable storing and querying

the lineage efficiently, and through algorithms relying on statistical analysis, prune lineage to precisely

identify the sources of errors and prevent them. (See publication 1)

• Anomaly Detection in Text Data:

• [Problem/Motivation] Texts generated by social media and speech-to-text conversion of telephonic

conversations of financial transactions are noisy but rich with business intelligence for enterprises.

• [Contributions] I worked on a system (while at IBM Research) which mines a text database to

discover outlier conversations which reflect reasons for failure of a transaction. I used machine

learning tools such as clustering to detect anomalies with high accuracy. (See publications 3-4)

Publications

1. A. Chalamalla, I. F. Ilyas, M. Ouzzani, P. Papotti: Descriptive and Prescriptive Data Cleaning,

SIGMOD 2014.

2. S. Negi, S. Joshi, A. Chalamalla, L V Subramaniam: Automatically extracting dialog models from

conversation transcripts. ICDM 2009

3. A. Chalamalla, S. Negi, L V Subramaniam, G. Ramakrishnan. Identification of Class Specific

Discourse Patterns, CIKM 2008

4. T. Faruquie, S. Negi, A. Chalamalla, L V Subramaniam. Exploiting Context to Detect Sensitive

Information in Conversations. CIKM 2008 [poster]

5. S. Khaitan, G. Ramakrishnan, S. Joshi, A. Chalamalla. RAD: A Scalable framework for Annotator

Development, ICDE 2008 [Demo]

6. Identifying and Correcting Anomalies in Streaming Data (In progress)

7. Mining Community Relationship Patterns in Social Data (Technical Report Available)

Awards and Honors

• PhD Entrance Merit Scholarship, University of Waterloo, 2009 - 11

• UW Graduate Merit Scholarship, University of Waterloo, 2010 - 14

• Ranked 59th all-India in the IIT Joint Entrance Examination 2003 (among 200,000 candidates)

• Outstanding Achievement Award, by IBM Research for developing data mining solutions for

Corporate Brand Analytics (COBRA) project, 2008

Relevant Coursework

• Theory and Math: Probability and Stochastic Theory, Numerical Modeling for Finance, Statistical

Inference, Bayesian Learning, Linear Optimization, Differential Equations, Design and Analysis of

Algorithms, Advanced Logic, Theory of Computation

• Data Science and Systems: Statistical Foundations of Clustering, Advanced Database Systems,

Cloud Data Management

• Others: Topics in Software Engineering, Artificial Intelligence, BioInformatics

Programming Languages and Tools

Python, C++, Java, Matlab, R, Hadoop/MapReduce, Hive, SQL, PL/SQL, ETL, C Language

Professional and Extra-Curricular Activities

• Peer Reviewer for Conferences SIAM Data Mining, SIGMOD, ICDE, KDD, PVLDB

• Student Member of ACM

Anup Chalamalla, January 2015
