Salil Wadhavkar 267-***-****
*** ****** ***, *** ** *****@**************.***
Evanston, IL 60202 http://www.salilwadhavkar.com
Objective
To obtain a full-time position related to computer engineering.
Research Interests
Heterogeneous multi-core design, high-performance microarchitecture, cycle-level processor simulation tools,
instruction-level parallelism, improving single-thread performance using novel microarchitectural techniques,
trace-level reuse.
Education
North Carolina State University Raleigh, NC
Ph.D., Computer Engineering (GPA: 4.0/4.0) Aug 2006 - Dec 2012 (Expected)
Thesis: Designing a Workload-agnostic Heterogeneous Multi-core Processor
Temple University Philadelphia, PA
M.S.E., Electrical Engineering (GPA: 3.97/4.0) Aug 2003 - May 2006
Thesis: Reducing the Overhead of Runahead Execution using RENO
University of Mumbai Mumbai, India
B.E., Electronics Engineering Sep 1999 - June 2003
Publications
FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical Superscalar
Template. Niket K. Choudhary, Salil Wadhavkar, Tanmay Shah, Hiran Mayukh, Jayneel Gandhi, Brandon Dwiel,
Sandeep Navada, Hashem H. Najaf-abadi, and Eric Rotenberg. IEEE Micro Top Picks, Issue 3, May-June 2012.
FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical Superscalar
Template. Niket K. Choudhary, Salil Wadhavkar, Tanmay Shah, Hiran Mayukh, Jayneel Gandhi, Brandon Dwiel,
Sandeep Navada, Hashem H. Najaf-abadi, and Eric Rotenberg. International Symposium on Computer Architecture,
June 2011.
FabScalar. Niket K. Choudhary, Salil Wadhavkar, Tanmay Shah, Sandeep Navada, Hashem H. Najaf-abadi, and
Eric Rotenberg. Workshop on Architectural Research Prototyping (WARP), held in conjunction with ISCA-36, June
2009.
Current Research
Choosing cores for robust heterogeneous multi-cores: Single-ISA heterogeneous multi-cores offer a promising
approach for designing future multi-cores. Given the application diversity in today's systems, a multi-core can be
composed of cores designed to cater to different application characteristics. This research investigates methods to
recommend a set of core designs that offer close to optimal performance for a wide range of applications by studying
common application characteristics and processor resource requirements. A set of such diverse cores eliminates the
need to design a per-application optimal core from scratch using time-consuming design space explorations.
Past Research
Framework for Processor Customization and Design-Space Exploration: A detailed cycle-accurate
simulation framework to model a customizable RTL (Verilog) model of a superscalar processor was developed using
C++. This framework can be used wherever a fast and accurate estimate of performance is needed, for example, in the
design-space exploration of a superscalar processor or the pre-RTL evaluation of microarchitectural techniques.
Partial trace-level reuse: Trace-level reuse eliminates the need to execute complete sequences of instructions by
matching the trace inputs with previously stored instances. However, this requires a large amount of storage to be
effective. This research optimized general trace-level reuse by identifying that dependence chains are more suitable for
reuse due to the dataflow relationship among instructions.
Work Experience
Intel Corp., Software Services Group Santa Clara, CA
Performance Tools Intern Feb 2011 - Aug 2011
Ported SEP, a performance profiling tool, to support heterogeneous processor platforms.
Investigated techniques to improve resource utilization of SPEC benchmark suites on heterogeneous processor
platforms.
North Carolina State University, Dept. of Electrical and Computer Engg. Raleigh, NC
Research Assistant Aug 2007 - Present
North Carolina State University, Dept. of Electrical and Computer Engg. Raleigh, NC
Teaching Assistant Aug 2006 - Aug 2007
University of Pennsylvania, Dept. of Computer and Information Science Philadelphia, PA
Research Assistant Jun 2006 - July 2006
Temple University, Dept. of Electrical Engg. Philadelphia, PA
Teaching, Research Assistant Jun 2004 - May 2006
Selected Academic Projects
Long Latency Cache Miss Tolerant Architectures: A comprehensive evaluation and comparison of L2 cache
miss latency tolerant techniques such as Runahead Execution, Checkpointed Early Load Retirement, and Continual
Flow Pipelines, on a ROB-based dynamically scheduled superscalar substrate. 2007.
Smart Victim Cache: Improving the performance of a victim cache by selectively allocating and replacing cache
blocks. 2007.
MiniC Compiler: A VLIW compiler for MiniC that performs scheduling and employs well-known back-end
optimizations. 2007.
UNIX Thread Library: A user-level thread library, similar to the POSIX thread library, including a preemptive
round-robin scheduler and synchronization primitives. 2007.
Mutual Exclusion in Distributed Systems: A library for ensuring mutual exclusion between processes, using an
optimized version of the Lamport algorithm. 2007.
Profiling and Timing Analysis for Embedded Systems: A study of application behavior and code timing
analysis for embedded systems using the M16C microcontroller. 2007.
RTOS Instrumentation: A study of the µC/OS-II real-time operating system using instrumentation code and
transmitting data on the serial port to represent system events. 2007.
Thread-Level Speculation on Multi-core Processors: A detailed memory-side simulator extension for exploiting
Thread-Level Speculation in Chip Multi-Processors. 2006.
Benchmark Study for Thread-Level Speculation: A run-time and code-level analysis of the floating-point
benchmark equake to identify opportunities for exploiting Thread-Level Speculation. 2006.
MSI Implementation: An implementation of the MSI cache coherence protocol for parallel processors using SESC.
2006.
Technical Skills
Simulator development, C/C++, Assembly, Verilog HDL, BASH, Perl, Cadence Tools (Schematic and Layout),
LaTeX, Linux, Windows.
Professional Affiliations
Student Member - IEEE, ACM-SIGARCH
Member - Phi Kappa Phi