Resume

Engineer Network

Location:

San Jose, CA

Posted:

October 04, 2013

Resume:

Sudnya Padalikar

San Jose, CA

ab9utb@r.postjobfree.com

OBJECTIVE

To shape the development of a new vertically integrated product using disruptive technology from the fields of computer vision, computational photography, machine learning, and high-performance parallel computing.

RELEVANT PROJECTS

Multi-Modal Gesture Recognition

• Performs 20-class gesture recognition using a training set of 7,754 labeled RGB-D images.

• Classification is performed using a multi-level neural network.

• Feature selection is performed automatically using a sparse autoencoder on unlabeled video.

Neural Network Classification Library

• C++ implementation of a neural network classifier on arbitrary input data.

• GPU-accelerated; scales to millions of input features.

• Applied to image gesture recognition.

Discrete and Linear Optimization Library

• From-scratch C++ implementation of a constrained linear and discrete optimization solver for general-purpose NP-hard optimization problems.

• The high-level design uses simulated annealing, greedy heuristics, local search, and tabu search.

• Applied to neural network training and vehicle routing.

EDUCATION

Georgia Institute of Technology, Atlanta, GA

• MS - Computer Science – August 2007 - December 2008

Vishwakarma Institute of Technology, Pune University, India

• BE - Computer Engineering – August 2002 - July 2006

EXPERIENCE

NVIDIA: Santa Clara, CA

GPU Streaming Multiprocessor Architect – February 2012 - Present

• Modeled new architecture features in the processor performance simulator.

• Evaluated new instructions by modeling them in the simulator and writing directed performance tests. This implementation was also used to evaluate the RTL design and performance bottlenecks.

• Implemented hardware support for precise exceptions in the simulator.

• Debugged complex workloads (e.g., CUDA Nested Parallelism uScheduler) on a full-chip simulator involving multiple interacting units.

NVIDIA: Santa Clara, CA

GPU Architecture Engineer – March 2010 - February 2012

• Ported an architecture simulation framework to enable evaluation of the next generation GPUs.

• Wrote bringup tests for a new feature in the next generation GPU.

• Developed a tool to translate application traces into unit tests that run on RTL.

• Worked on an instrumentation tool that captures GPU processor state for performance studies.

NFinTes: Marietta, GA

Research Intern – April 2009 - February 2010

• Wrote a discrete event simulator to evaluate GPU remote procedure calls among nodes.

• Simulator was written in C++ with MPI.

• Used CUDA applications as workloads to perform the evaluation.

Qualcomm Inc: San Diego, CA

Summer Engineering Intern – May 2008 - August 2008

• Integrated layer 2 software on a WiMAX base station.

• Implemented a framework for performance and functional evaluation of WiMAX base station software on a QDSP6 processor.

IBM: Pune, India

Associate Systems Engineer – August 2006 - July 2007

• Worked on profiling and testing of Proventia integrated security appliances.

G.S. Labs: Pune, India

Engineering Project Intern – June 2005 - July 2006

• Designed and implemented a voice bridge tying together Skype (P2P VoIP network) and Asterisk (open source IP-PBX).

SELECTED PUBLICATIONS

Sudnya Padalikar and Gregory Diamos. "GPU-RPC: Exploiting The Latency Tolerance of CUDA Applications." In NVIDIA Research Summit, San Jose, California, USA, September 2009.

MAJOR PROJECTS

A Massively Parallel Simulator - Archaeopteryx

• Built a massively parallel simulator (CUDA based) to simulate future parallel processor architectures on current GPUs.

• Explicit separation between functional and timing models.

• Achieves high performance by exploiting data structure locality, hierarchical synchronization, and minimal state-per-thread.

Parallel Discrete Event Simulator

• The simulator is partitioned into models which describe the system being simulated and a kernel that manages events and time synchronization.

• Detailed timing models for network links, Ethernet devices, and layer 3 and 4 protocols.

• Sequential and parallel implementations of the simulator kernel, each with identical interfaces. The parallel version implements the Chandy-Misra-Bryant time synchronization algorithm using MPI.

SKILLS

Languages:

• C++, Python, Java, MATLAB, C, CUDA, PHP, Perl, Intel x86 assembly, ARM assembly, NVIDIA GPU assembly, scripting (Csh, Bash).

Libraries:

• STL, Boost, MPI, Pthreads, OpenMP, BLAS, LAPACK, NumPy.

REFERENCES

• Will be provided on request.
