Kapil Dalwani
Software Engineer Data Science at Eventbrite
************@*****.***
Summary
Kapil is a graduate of the Computer Science department at The Johns Hopkins University.
He has strong fundamentals in data structures and algorithms.
He has taken formal courses in Machine Learning, Natural Language Processing, Information Retrieval and Information Extraction, and is ready to apply the knowledge gained in his coursework to real-life problems.
Currently, he is working with AT&T Interactive, also called YellowPages.com. He works in the Data Insight group, where his role is to mine user query logs. The team is also responsible for finding patterns in user searches, which helps improve search relevancy. Over the past year, he has gained substantial experience working with Hadoop and Hive. He has also worked on the search engine side of the products and has gained experience with SOLR/Lucene.
Before joining ATTi, he worked at a start-up, where he mainly handled the search side of a social networking website. He is proficient in Lucene and Solr. On the back end, he writes server-side code in PHP, with MySQL for the database. He also writes Python code for administrative and ad hoc tasks.
His interests lie in applying machine learning, natural language processing and information retrieval methods to big data by leveraging the power of Hadoop.
Specialties
Hadoop,
Hive,
Natural Language Processing,
NLP,
Information retrieval,
Machine learning,
ML,
Java,
Python,
Big data,
Recommendations.
Experience
Software Engineer Data Science at Eventbrite
July 2012 - Present (1 year 6 months)
Recommendation, Machine Learning and Search. Data discovery.
Improving event classification using Machine Learning techniques.
Working on SOLR to provide quality search results.
Recommending events using social graph.
Hadoop pipeline: adding features such as autocomplete and spell checking to search.
Writing Hadoop pipelines to solve data mining problems.
Logistic Regression,
LibLinear, Octave
Hadoop
Java
Python
Django
Redis
Hive
SOLR
Lucene
Machine Learning,
scikit-learn
Senior Software Engineer with Data Insight Team at AT&T Interactive [YellowPages.com]
January 2011 - July 2012 (1 year 7 months)
Working with the Data Insight Team on query log analysis, click-through rates, etc.
Worked on projects to mine user query/search logs and suggest spelling corrections and related queries to users.
Kapil worked on extending an existing version of a data library, making it extremely useful for reading Hive tables inside MapReduce jobs.
Currently, Kapil is implementing the next version of the Search Offline Simulator (SOS). SOS is built on the Hadoop MapReduce paradigm and mines users' query logs. It is a useful product for presenting complex dimensions and metrics to the user, converting business requirements into useful information. SOS is also used as a pre-step to A/B testing, where the current production version can be compared with a new candidate search version. Using a probabilistic CTR model, it helps determine how well (or badly) the new version will perform.
On an ad hoc basis, Kapil is responsible for providing useful insights on user query logs and click-through data, using Hive as a tool.
Worked on improving ranking algorithms for local search. Worked on an end-to-end search engine using SOLR.
Hadoop, MapReduce
Hive
Scribe
Software Engineer for Search at Pipio
December 2009 - September 2010 (10 months)
I mainly handled the search aspect of the product, working on SOLR and Lucene.
I designed and implemented the search and hashtag functions on the site.
I wrote a lot of server-side code in PHP and also worked on ad hoc Python code deployment and mailer projects.
I wrote many stored procedures and SQL queries in MySQL.
XMPP and JS were some other cool technologies I worked on.
Software Intern at Aleph Point
June 2009 - February 2010 (9 months)
I worked as a software developer for Aleph Point. The work was challenging and required a great deal of knowledge of SOLR/Lucene.
Team member at CLSP'09 workshop at Center for Language and Speech Processing
June 2009 - July 2009 (2 months)
During summer 2009, I volunteered to work in the CLSP workshop. I was part of the n-gram team, and under the tutelage of Prof. Satoshi Sekine we built an n-gram search engine.
More details can be found here:
http://www.cs.jhu.edu/~kapild/files/projects.html#nsearch
Publications:
1) N-gram Search Engine with Patterns Combining Token, POS, Chunk and NE Information. Satoshi Sekine and Kapil Dalwani. Proceedings of LREC, 2010.
2) New Tools for Web-Scale N-grams. Dekang Lin, Ken Church, Heng Ji, Satoshi Sekine, David Yarowsky, Shane Bergsma, Kailash Patil, Emily Pitler, Rachel Lathbury, Vikram Rao, Kapil Dalwani and Sushant Narsale. Proceedings of LREC, 2010.
Product Engineer at CoreObjects
December 2006 - January 2008 (1 year 2 months)
I worked on two projects at CoreObjects, both built from scratch.
The first was an audio sharing and streaming website built on a small social network. It followed an MVC architecture, with Spring, Hibernate and Struts as its major components.
The second project was an Eclipse plugin that took input from the user to generate code on the fly, using FreeMarker templates. The final product was a website built on Adobe Flex.
2 recommendations available upon request
Software Engineer at Computer Sciences Corporation
August 2004 - November 2006 (2 years 4 months)
1 recommendation available upon request
Courses
MS, Computer Science
The Johns Hopkins University
Natural Language Processing
Machine Learning
Algorithms
Data Mining
Information Retrieval
Publications
N-gram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
May 20, 2010
Authors: Satoshi Sekine, Kapil Dalwani
We developed a search tool for n-grams extracted from a very large corpus (the current system uses the entire Wikipedia, which has 1.7 billion tokens). The tool supports queries with an arbitrary number of wildcards and/or specification by a combination of token, POS, chunk (such as NP, VP, PP) and Named Entity (NE). It outputs the matched n-grams with their frequencies as well as all the contexts (i.e. sentences, KWIC lists and document ID information) where the matched n-grams occur in the corpus. A search takes a fraction of a second on a single-CPU Linux PC (1 GB memory and 500 GB disk).
Education
The Johns Hopkins University
MS, Computer Science, 2008 - 2009
Activities and Societies: My coursework has been related to the fields of Natural Language Processing, Information Retrieval, Information Extraction, Data Mining and Machine Learning.
Punjab Engineering College
B.E., Electrical and Electronics Engineering, 2000 - 2004
Projects
Rdio + Google Chrome plugin
March 2012 to April 2012
Members: Kapil Dalwani
Description
A one-click way (for Rdio®) to add your currently playing Rdio song to your Rdio playlist. One Click Add.
OneClick+
OneClick+: It's a Google Chrome extension that provides a one-click solution for adding your currently playing song to your Rdio playlist. OneClick+ works with Rdio®.
Overview
This extension uses 3-legged OAuth authentication with your Rdio account. After the initial handshake of exchanging tokens, it loads your owned playlists and the most recently played song. You can just click the '+' link next to a playlist name to add that song to that playlist. It's a one-click solution for adding your currently playing song to one of your playlists.
a) It is useful for people who listen to a lot of recommendations and would like to easily add the song they are currently listening to to a playlist of their choice.
b) You have the option to create a new playlist and add the song to it.
c) You can search for songs, and it will return the most common track (top hit) for that search query. You can then add that song to the playlist of your choice or to a new playlist.
ToBikeToBart
February 2012 to Present
Members: Kapil Dalwani
ToBikeToBart:
This is a very simple Android app that uses the BART API. The purpose of the app is to tell users whether they are allowed to carry their bikes on BART.
The inputs are the arrival and departure stations, along with the specific time the user wishes to travel. The app then calls the BART API and reports whether the user can carry a bike on the following trains.
I face this problem when I am biking to and from work. At times I don't want to bike and would rather take BART with my bike instead.
Hence, ToBikeToBart.
Other funny names: ToBOrNotToBToB: To bike or not to bike to BART
BART API: http://api.bart.gov
Kaggle Yelp Recsys 2013
August 2013 to Present
Members: Kapil Dalwani
My first attempt at a Kaggle competition.
The goal is to predict Yelp business ratings. I use the libFM library.
https://www.kaggle.com/c/yelp-recsys-2013/
Skills & Expertise
Java
Data Mining
Machine Learning
Natural Language Processing
Information Retrieval
Solr
Hadoop
MapReduce
Algorithms
Big Data
scikit
Hive
Recommendation
Recommender Systems
pandas
Python
Certifications
Cloudera Certified Hadoop Developer
Cloudera, June 2011
Interests
Biking,
Listening to rock classical music, investing in stocks
3 people have recommended Kapil
"I worked with Kapil on two assignments during his tenure in CoreObjects. Kapil is a fine blend of
Intelligence, Logic and aptitude with right attitude. Good reasoning ability, logical, problem solving
approach, Patient, keeping the surrounding light even in tensed situation with his balance of sense of humor
and handling pressure. A good team player and potential to be a good mentor to fellow members."
Hemant Kumar, Engineer, CoreObjects, worked directly with Kapil at CoreObjects
"Kapil has been known to me for almost 1.5 years now and is a very sincere, hardworking and dedicated
person. He is technically very sound and has great communication skills. A good team player and has a
responsible attitude towards any task given to him. He’s done a great job during his tenure with Coreobjects.
Overall he is a very friendly, helpful and a good person to work with. I wish him All the Best. :)"
Smita Tyagi, Member - CI, Coreobjects India Pvt. Ltd., worked indirectly for Kapil at CoreObjects
"Kapil has exceptional analytic and technical skills. He was consistently the top performer amongst his peers and has the ability to persevere to produce results."
Sunjay Jose, Senior Software Engineer, Computer Sciences Corporation, managed Kapil at Computer Sciences Corporation