Curriculum Vitae
Anup Kumar Chalamalla ( Email: abjvwt@r.postjobfree.com )
PhD Candidate, David R. Cheriton School of Computer Science, University of Waterloo, Canada
http://www.cs.uwaterloo.ca/~akchalam http://twitter.com/anupkumarc +1-226-***-****
Education
PhD Candidate in Computer Science, University of Waterloo Advisor: Prof. Ihab F. Ilyas
Thesis: Modeling Lineage of Anomalies in Data (GPA: 88.13/100) Sep 2009 - Present
Bachelor of Technology in Computer Science and Engineering
Indian Institute of Technology Bombay July 2003 - May 2007
Work Experience
• Research Assistant, University of Waterloo, Sep. 2009 - Present
• Research Intern, Data Analytics Group, Qatar Computing Research Institute, Sep ’11 - Mar ’12
• Software Engineer, Information Management Group, IBM Research, May 2007 - May 2009
Interests
Data Analytics, Economics, Quantitative Finance, Algorithms and Software Development
(I am generally interested in applying data analytics solutions to problems in economics and
quantitative finance; specifically, social media analysis to measure economic activity/behavior, anomaly
detection in big finance data, statistical analysis of big data, and algorithm design in these topics. I have
published in top data analysis conferences, patented ideas, and completed relevant university finance courses.)
Patents
1. A SYSTEM AND METHOD FOR CHECKING DATA FOR ERRORS. Ouzzani M., Papotti P.,
Ilyas I., Chalamalla A. PCT App. no. PCT/GB2014/051***-****.
2. A METHOD OF PROCESSING A RATINGS DATASET. Ilyas I., Amer-Yahia S., Chalamalla A.
G06Q30/02, Pub No. WO2014177181 (A1) - 2014-11-06.
Summary of Major Research Projects
• Anomaly Detection in Streaming Data through Predictive Analytics:
• [Problem/Motivation] In many applications, including those in financial institutions, data arrives
continuously from a large number of sources. My system automatically identifies and corrects
anomalies in streamed data. It is imperative to keep every snapshot of a continuously updated database
of records free from errors for use in other applications.
• [Challenges] When streamed data contains errors, there is often insufficient evidence to identify them
precisely, because only partial data is known at a given time and more is expected later. Judging
anomalies prematurely (and greedily) risks decisions that must be reversed at a later stage, which
increases maintenance cost.
• [Contributions] We designed a principled approach which models the behavior of patterns in the data
using techniques such as regression, mixed models and ARIMA to predict future behavior of patterns.
The forecasts are then used by an algorithm designed based on statistical analysis of data to precisely
identify anomalies in the data.
• Portfolio Management based on Graph Analytics:
• [Problem/Motivation] Typically, in trading systems, consumers interact with instruments/stocks to buy
or review them. Discovering prominent interaction patterns between communities of consumers and stocks
is important. For example: tech professionals aged 25-30 *bought* stocks of S&P 500 internet
companies. A graph-analytics-driven approach can discover such patterns and use them to
recommend customized portfolios with common attributes/themes to specific consumers.
• [Contributions] Our system aims to discover interaction patterns between communities of different
entity-types in social interaction systems. Patterns are ranked based on a scoring function or on satisfying
a diversity constraint on the universe of entities. See patent 2. This work was reviewed at PVLDB, a
top data management journal, and received two favorable reviews; it is to be revised and resubmitted.
• Modeling Lineage of Anomalies in Data:
• [Problem/Motivation] Real-world databases, including those in financial institutions, have anomalies
due to integration from multiple sources. Traditionally, the database is made to comply with a set of
business rules: records violating these rules are identified and updated to remove errors.
• [Contributions] My PhD Thesis goes a step further to address the lineage of anomalies/errors by
tracking the origins of data records to their sources (a large number of them) in a data integration
pipeline. Working at a more fundamental level of a database system, we enable storing and querying
the lineage efficiently, and through algorithms relying on statistical analysis, prune lineage to precisely
identify the sources of errors and prevent them. (See publication 1.)
• Anomaly Detection in Text Data:
• [Problem/Motivation] Texts generated by social media and speech-to-text conversion of telephonic
conversations of financial transactions are noisy but rich with business intelligence for enterprises.
• [Contributions] While at IBM Research, I worked on a system that mines a text database to
discover outlier conversations that reflect reasons for the failure of a transaction. I used machine
learning tools such as clustering to detect anomalies with high accuracy. (See publications 3-4)
Publications
1. A. Chalamalla, I. F. Ilyas, M. Ouzzani, P. Papotti. Descriptive and Prescriptive Data Cleaning.
SIGMOD 2014.
2. S. Negi, S. Joshi, A. Chalamalla, L V Subramaniam. Automatically extracting dialog models from
conversation transcripts. ICDM 2009.
3. A. Chalamalla, S. Negi, L V Subramaniam, G. Ramakrishnan. Identification of Class Specific
Discourse Patterns. CIKM 2008.
4. T. Faruquie, S. Negi, A. Chalamalla, L V Subramaniam. Exploiting Context to Detect Sensitive
Information in Conversations. CIKM 2008 [Poster].
5. S. Khaitan, G. Ramakrishnan, S. Joshi, A. Chalamalla. RAD: A Scalable Framework for Annotator
Development. ICDE 2008 [Demo].
6. Identifying and Correcting Anomalies in Streaming Data (In progress)
7. Mining Community Relationship Patterns in Social Data (Technical Report Available)
Awards and Honors
• PhD Entrance Merit Scholarship, University of Waterloo, 2009 - 11
• UW Graduate Merit Scholarship, University of Waterloo, 2010 - 14
• Ranked 59th all-India in the IIT Joint Entrance Examination 2003 (among 200,000 candidates)
• Outstanding Achievement Award, by IBM Research for developing data mining solutions for the
Corporate Brand Analytics (COBRA) project, 2008
Relevant Coursework
• Theory and Math: Probability and Stochastic Theory, Numerical Modeling for Finance, Statistical
Inference, Bayesian Learning, Linear Optimization, Differential Equations, Design and Analysis of
Algorithms, Advanced Logic, Theory of Computation
• Data Science and Systems: Statistical Foundations of Clustering, Advanced Database Systems,
Cloud Data Management
• Others: Topics in Software Engineering, Artificial Intelligence, BioInformatics
Programming Languages and Tools
Python, C++, Java, Matlab, R, Hadoop/MapReduce, Hive, SQL, PL/SQL, ETL, C Language
Professional and Extra-Curricular Activities
• Peer Reviewer for Conferences: SIAM Data Mining, SIGMOD, ICDE, KDD, PVLDB
• Student Member of ACM
Anup Chalamalla, January 2015