DIBAKAR BARUA
*** **** ******* ****, ***. ***, San Jose, California
+1-404-***-****
H-1B Visa Holder
*******.*******@*****.***
https://github.com/dibakarbarua
https://linkedin.com/in/dibakarbarua
OBJECTIVE
Product Development/Prototyping professional looking for a Full-Time position in Hardware-Software Development/Co-design for the Computer Hardware Industry
EXPERIENCE
• ASIC-Verification Engineer, NVIDIA (Santa Clara, California) April 2017 – Present
o Working in the discrete GPU Full-Chip Verification team, supporting 6 chips simultaneously
o Debugging failures weekly using waveforms and log parsing; frequently interfacing with RTL/Architecture owners of various clusters and modules in the GPU, based on an understanding of feature testing at System Software scale. Full ownership of all individual issues
o Supported the full-chip test bench infrastructure: optimized software for a ~1000x CPU runtime gain by replacing the legacy PLI-based DRAM C-model interface with DPI-C
o Profiled the entire register-access and memory-access infrastructure at full-chip level using C++/System-Verilog to gauge scope for future optimizations, and reduced runtime from days to hours
o Detailed understanding of GPU pipelines, with special emphasis on the GPGPU compute framework supporting HPC (High Performance Computing) and ML (Machine Learning) workloads
o Created ML-based optimizations for internal infrastructure flows for version control and gatekeeping
• Hardware Engineer, Oracle America Inc. (Santa Clara, California) June 2016 – March 2017
o Worked in the FPGA Platform team, prototyping future SPARC™-based CPUs on FPGAs
o Updated an existing interface gasket that transfers data from the L3 cache to the PCIe interface IP, adopting an updated intra-chip packet format for new data-protection features
o Added error-handling features to the existing core prototype by using GPIO bypass on a Microblaze soft-core processor and software modelling of the error-handling IP
o Created the flow for using Scandump (a JTAG-based debugging technique) on the latest FPGA architectures
• Logic Design Intern, Intel Corporation (Folsom, California) May 2015 – December 2015
o Worked in the Media Encoder Front End Design Team of Intel's Visual and Parallel Computing Group
o Supported the team in verification of an HEVC Decoder cluster consisting of several units
o Wrote several Python scripts and tools for this verification work, assisted in writing cover points and assertions for different projects, and set up the framework for a cluster testbench based on Intel's current-generation flow
• Research Assistant, Alternate Computing Lab, Georgia Tech January 2014 – March 2014
o Worked on a team research project to design C++ models of FPGA-based accelerators for computing problems arising in Machine Learning algorithms
• Hardware Intern, Mentor Graphics (Noida, India) December 2013 – January 2014
o Carried out LUT count optimization in various hardware modules of the newly modified Design-Ware Library in Verilog. Performed output validation using C-Shell scripts
EDUCATION
Georgia Institute of Technology (GPA 3.9/4) August 2014 – May 2016
M.S. in Electrical and Computer Engineering
Specialization: Computer Architecture, System-Software
Relevant Courses: Advanced Computer Architecture, Advanced Programming Techniques, Advanced Operating Systems, Advanced VLSI Systems, Parallel & Distributed Computer Architecture, Compiler Design
Netaji Subhas Institute of Technology, University of Delhi 76.98% (GPA: 3.96/4.00) August 2010 – May 2014
B.E. in Electronics and Communication Engineering
Relevant Courses: Computer Architecture, Digital Design, Operating Systems, Data Structures and Algorithms
SELECTED ACADEMIC PROJECTS
• CS 6241 – Compiler Optimizations (Loop Optimization Passes for LLVM Backend)
o Developed loop optimization schemes for HIR-to-LIR translation in LLVM using techniques such as array index expression redundancy elimination, loop fusion, and global value numbering
o Obtained a speedup over non-optimized LIR (IR = Intermediate Compiler Representation of microcode)
• Bachelor’s Thesis: Viterbi Decoder Implementation using Hardware-Software Co-Design
o Built a novel Viterbi Decoder implementation by offloading the Traceback Subroutine from a sequential decoder running on a soft-core microprocessor to dedicated parallel hardware on an FPGA
• CS 6210 – Advanced OS (Implementation of a User-Level Threads Library)
o Designed a pThreads-based User-Level Threads library in C using a preemptive round-robin scheduler and context-switching paradigms
• ECE 8873 – Advanced Memory Systems (State of the Art Cache Replacement and Insertion Policies)
o Implemented recent Cache Replacement and Insertion Policies such as DRRIP and SHiP-PC, based on reuse patterns of cache lines
o Used the actual ISCA CRC Simulator for implementation
• ECE 6100 – Advanced Computer Architecture (Implementation of Cache Coherence Protocols)
o Implemented the MI, MSI, MESI, MOSI, MOESI and MOESIF Cache Coherence Protocols on 4-, 8- and 16-core multiprocessor traces
o Ran experiments on the simulator across 8 traces to assess their likely sharing properties
• CS 6210 – Advanced OS (Distributed Web Proxy Server)
o Implemented a web proxy server with an internal caching mechanism for storage, based on various cache-replacement algorithms (LRU, LFU)
o Compared the performance of the different algorithms under varying load characteristics
• ECE 6100 – Advanced Computer Architecture (Tomasulo Algorithm on an OoO Superscalar Processor)
o Designed a simulator implementing the Tomasulo Algorithm on an out-of-order processor with parameterized fetch count, functional units and common data bus
• ECE 6100 – Advanced Computer Architecture (Cache Simulator and Cache Design for SPEC benchmarks)
o Designed a Cache Simulator for an L1 cache supported by a Victim Cache and Strided Prefetcher
SKILL SET
Programming: C, C++, Python, System-Verilog, Verilog, Assembly (x86), SASS (Nvidia), C-shell
Software Libraries/OS: C++ STL, MPI, OpenGL, pThreads, Linux, GNU Toolchain (gdb, awk, sed, Make, etc.)
Tools: CUDA, ns-2, pSpice, EagleCAD, GNU Radio, Cadence Virtuoso
Design and Verification: VCS, Verdi, ModelSim, UVM, DPI-C, PLI, C-models for Verification, Fullchip/SoC Testbench Development
Protocols, Bus Architectures: I2C, SPI, USB, Modbus, 1-wire, UART, TCP/IP, JTAG, PCIe, AXI
FPGAs and Prototyping: Xilinx Ultrascale/Virtex-7, Vivado, Synplify Pro, Microblaze, ARM Based SoCs