Niketan Pansare
Website: http://www.cs.rice.edu/~np6/
Email: ***@****.***
Blog: http://niketanblog.blogspot.com/
Objective
To obtain a research internship in the field of approximate query processing and applied
machine learning (especially for large-scale systems) for Summer 2013.
Education
Ph.D. in Computer Science, Rice University, USA.
2009 - Present
Master of Science in Computer Engineering, University of Florida, USA.
2007 - 2009
Bachelor of Engineering in Information Technology, VJTI, Mumbai.
2002 - 2006
Post Graduate Diploma in Embedded Systems, Electronics Corporation of India Limited, Mumbai.
2003 - 2004
Experience
Research intern, IBM Research, India.
Summer 2011
- Developed a novel topic model for spoken language (STM) that explicitly takes into account
uncertainties arising in speech-to-text translation.
- Link: http://www.cs.rice.edu/~np6/Papers/SpokenTopicModel.pdf (ICDM '12 paper).
Software Development Engineer Intern, Microsoft, Seattle.
Summer 2008
- Developed Table Analysis Tool for Cloud, which is a set of canned data mining tasks for
non-expert users using Microsoft Excel as front-end and SQL Server in the cloud.
- Link: http://tinyurl.com/9ph9br3
Software Engineer, MAQSoftware, Mumbai.
2006 - 2007
- Developed enterprise web applications using C#, ASP.NET Ajax and XML.
- Developed a data warehouse (Usage Reporting) for Microsoft using C# and SQL Server BI.
Publications
Pansare N, Jermaine CM, Haas P, Rajput N. Topic Models over Spoken Language. IEEE International Conference on Data Mining (ICDM '12), December 2012.
2012
Pansare N, Borkar V, Jermaine CM, Condie T. Online Aggregation for Large MapReduce Jobs. Proc. VLDB Endow., August 2011.
2011
Sahay S, Rajput N, Pansare N. Social Ranking for Spoken Web Search. CIKM 2011.
2011
Arumugam S, Dobra A, Jermaine CM, Pansare N, Perez L. The DataPath system: a data-centric analytic processing engine for large data warehouses. ACM SIGMOD '10, June 2010.
2010
Pansare N. Multi-query optimization in the Datapath system. Master's thesis, University of Florida, Gainesville, USA.
2009
Tools
Programming languages: C, C++, Java, R, C#, Scheme, Common Lisp
Technologies: Hadoop, SQL Server BI, Servlet, JSP, Ajax, ASP.NET
Projects
Spoken Topic Model (STM)
- Implemented STM (see ICDM '12 paper) in C++ using the GNU Scientific Library (GSL).
- Modified the CMU Sphinx4 speech-to-text engine and generated data by providing it with real-world audio files (TedTalks/Yale).
- Tested the effectiveness of STM by comparing it to Latent Dirichlet Allocation using off-the-shelf classifiers (SVMlight, SVMmulticlass and Weka).
Online Aggregation (OLA)
- Modified Hyracks (a Hadoop-like system) to provide the necessary machinery for OLA.
- Dealt with the inspection paradox in a principled way to provide unbiased estimates using a Bayesian model implemented in C++.
- Tested the overall system using a Wikipedia traffic dataset.
Datapath system
- Implemented a data-centric database from the ground up and tested it on a 10TB-scale TPC-H data set.
- Developed a multi-query optimizer in C++ to provide data-centric query plans.
Non-research/Personal
- Yadmt: Tool to find the best classifier for your dataset using statistical tests suggested in the machine learning literature. Link: http://code.google.com/p/yadmt/.
- Voca: A desktop app (written in Java) that is designed to run in the background with minimal user interaction/interference and that allows users to issue voice commands. Link: https://www.facebook.com/voca.desktop.
For a detailed listing of my projects/courses, see http://www.linkedin.com/in/niketan.
References
Peter Haas
IBM Research
Chris Jermaine
Rice University