Roshan R Sumbaly
Contact Information
********@*****.*** http://cs.stanford.edu/people/rsumbaly
+1-415-***-**** http://github.com/rsumbaly
Interests
Building large-scale systems and data-mining
Education
Stanford University, USA 2008 - 2010
M.S. in Computer Science
Specialization - Database Systems
Teaching Assistant - Information Retrieval & Web Search, Data Mining
BITS Pilani, India 2004 - 2008
B.E. (Honors) in Computer Science
Cumulative GPA - 10.0/10.0
Teaching Assistant - Computer Programming 1, Parallel Computing
Experience
LinkedIn, USA
Software Engineer (April 2010 - Present)
Working on Project Voldemort (http://project-voldemort.com) and Hadoop as a part of SNA
(http://sna-projects.com)
Yahoo! Inc, USA
Technical Yahoo! Intern (June 2009 - September 2009)
As a part of the Cloud Computing & Data Infrastructure team, incorporated various
compression algorithms at three different tiers of the PNUTS / Sherpa distributed datastore,
resulting in a decrease in average round-trip latency.
Stanford University, USA
Graduate Research Assistant (September 2008 - June 2009)
Worked in collaboration with the Computational Earth & Environmental Science group to
port various sparse complex matrix solvers to NVIDIA GPU Clusters using CUDA.
Hewlett Packard Labs, India
Research Intern (January - June 2008)
Proposed and built a prototype data integration middleware (based on Grid Monitoring
Architecture) for aggregation of HP's enterprise data. Data integration was achieved using RDF
& SPARQL.
Indian Institute of Science (IISc), India
Research Intern (May - July 2007)
Worked in the Grid Applications Research Lab on prediction of job queue wait time in batch
scheduled machines (like Torque, LSF) using historical logs. Proposed new metrics and
algorithms while also building a generic simulator for replaying logs to test new clustering
algorithms.
Bhabha Atomic Research Centre (BARC), India
Intern (May - July 2006)
Worked on a scheduling algorithm, based on backfilling optimization and fairness policies, and
deployed it on a 512 node cluster in the Supercomputing Research Facility at BARC. Also
contributed to an in-house distributed monitoring system.
International Conference on High Performance Computing (HiPC 2007)
AIGA - Artificially Intelligent Grid Assistant
Developed a Grid based Question Answering system capable of mining answers from distributed
data-sets.
Published an article in the IEEE Technical Committee on Scalable Computing (TCSC) Newsletter
titled "Deployment of a Natural Language Processing system on a Grid"
Projects
Stanford University, USA
Update Summarization
Built a system which generates a summary of a multi-document dataset based on the
assumption that the user has already read a given set of documents.
Opinion Mining over Large News Datasets
Developed metrics and algorithms to determine the opinion about people by mining the New York
Times corpus (1.8 million articles spanning over 20 years).
Implemented using Aster Data's nCluster - a Map-Reduce based RDBMS with infrastructure
provided by Amazon EC2.
Supervised Machine Learning Classifiers for Usenet newsgroup messages
Implemented variants of classical classifiers like Naive Bayes, SVM, Decision Trees and Nearest
Neighbor methods. Analyzed various existing feature selection methodologies and proposed
new domain specific features to enhance accuracy of the classifier.
Encrypted Tweets
Built a client side symmetric-key encryption system for Twitter using Greasemonkey.
Also built a proxy server capable of performing a man-in-the-middle attack on SSL.
BITS Pilani, India
Analysis and Implementation of Load Balancing Algorithms in Distributed Environments
Simulated variants of the classical balls-into-bins load balancing algorithms using the SimGRID
Toolkit.
Personalization using Link Analysis
Implemented various link analysis algorithms on browsing history for personalized
recommendations.
Skills
Programming: C/C++, Java, Python, SQL, OpenMP and MPI Parallel Programming
Toolkits: Eclipse, Lucene, CUDA, Hadoop
Platforms: Linux, Windows, Solaris, Mac OS X
Worked on Amazon's EC2, S3, Elastic MapReduce and Google's App Engine. Managed AWS
resources ($30K worth of computing time) for 50 students as TA at Stanford
Achievements & Awards
Recipient of CEES/RPSEA 2008-09 Fellowship for research in GPGPUs
Recipient of UC Berkeley Fellowship & Purdue University Graduate Fellowship for 2008-09
Recipient of Narotam Sekhsaria Scholarship for 2008-10
Awarded BITS Pilani Alumni Global 30 under 30 Award in 2009
Recipient of Dhirubhai Ambani Undergraduate Scholarship & BITS Merit Scholarship, for all
four years of undergraduate studies
Awarded the Gold Medal for highest GPA in 2004 batch of BITS Pilani
Led a team of four to win the National Runners-up Prize at Microsoft's Imagine Cup 2007 for
the project eduGRID
Founder & Student Coordinator, Linux User Group and CSD (Centre for Software
Development) at BITS
Coordinator, Conferences
Solaris and Open Solaris, Java: Now & Future, Web and Mobile Applications using
NetBeans, University Days, Sun Microsystems
Microsoft Robotic Studio, Microsoft
Cluster & Grid Computing, CDAC: Centre for Development of Advanced Computing