
System Software C++

Location:
Potomac, MD
Posted:
October 12, 2023


Resume:

* ******** **, #***

San Francisco, CA, *****

415-***-****

ad0b4x@r.postjobfree.com

AWASTHI, VINAY

OBJECTIVE Obtain a position where I can apply my analytical and programming skills to create world-class products.

SKILLS & ABILITIES Proficient in performance analysis and system programming. I am currently enabling nested encrypted VMs using the VMM isolation technology called TDX (https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-whitepaper-v4.pdf) for Microsoft. I optimized the runtime for Intel Binary Translation, reducing translation overhead from 4x+ to 1.6x. I developed and optimized the OpenJDK 12 ZGC low-pause garbage collector (1 TB memory collections), eliminating the need for Intel to develop an equivalent of the SPARC M7 check-and-load instruction and saving the 13% performance degradation observed on x86 with the ZGC low-pause collector (Java). I developed SEP drivers to collect and attribute system software performance/power data using VTune for Linux and OSX. I optimized OpenCL image stabilization frameworks running on CPUs/GPUs for power/performance for Apple, using hand-coded AVX2 intrinsics and unrolling of cross-file, 3-level nested loops. I know C and C++17 STL. I am also enrolled in the Stanford AI certification program, completing in 2022.

EXPERIENCE

INTEL CORPORATION

1997 to Present

Security New OS & New CPU (2018 – current)

In this role I have been enabling TDX and helping Microsoft with TDX VM nesting. Before this role, I was enabling VMware to implement TDX, SGX, and CET (shadow stack for return tracking). My role is to enable Microsoft Windows 11 to do authorized booting and encrypted memory management/paging setup for the VTLs that run to support unmodified Windows 11 in a TD environment. I am also architecting vCPU VTLs for booting, enabling Hyper-V for initial pre-boot and boot-loading key management with encryption enabled, and supporting the stability of VMs in a nested TD environment.

OpenJDK Java compiler low pause collector (ZGC) 2015 – 2018 (Mar)

In the Java Compiler Group, I implemented barrier analysis code allowing Intel to check any instruction sequence to validate its feasibility (SPARC M7 implemented a load-and-test instruction for the ZGC garbage collector, and Intel had various pointer-versioning instruction ideas to check as part of GC pathfinding…). I implemented a new barrier in ZGC, reducing performance overhead from 14% to 5%. The rest of the team then reduced it to 3.5%, and this patch was given to Oracle as part of the ZGC release.

Security (2011-2015)

Intel acquired the Transmeta Binary Translation code base, and 3 teams were formed. I was in the Binary Translation team that produced ROP/JOP gadget detection and prevention using Binary Translation, used by McAfee to do batch-mode threat analysis, reducing weeks of analysis to hours. The idea was to run all binaries in a container and JIT them as they run to check control flow integrity (this became the CET feature in the CPU coming out in Tiger Lake), as well as to detect loops and transform them from SSE to AVX so that old binaries can take advantage of new vector instructions without needing recompilation. I was performance lead for this project. I identified and fixed issues in decoder (XED) performance and self-modifying code (JavaScript in browsers) performance, as well as other security issues related to changes in page protections while running this translator.

Apple 2008-2011

I worked with Apple to optimize Final Cut Pro frameworks with Intel AVX/AVX2 instructions using Intel compiler intrinsics, as the compiler could not do the 3-level loop unrolling used in these frameworks for motion stabilization. I also wrote OpenCL CPU implementation whitepapers and presented OpenCL-on-CPU capabilities at WWDC 2011. I created the VTune SEP driver to collect PEBS data and decode symbols without using symbolication frameworks, to pinpoint where PEBS interrupt events originate at the time of the interrupt. This driver is used by VTune for performance event data collection on Mac OSX.

Graphics 2007-2008

Intel Graphics Gen6 Fulsim: C++ cycle-accurate simulation models of the depth and render caches as well as the data port units.

The Larrabee and Gen6 development teams used these models to verify graphics unit optimizations and correctness.

Nehalem MOCOE 2005-2007

Intel Nehalem processor Memory Ordering Center of Expertise: Here my role was to develop the internal memory ordering spec for all Intel x86 CPU validation teams to follow, and also to provide it as input to the team writing the external memory ordering spec for the C++ community. A formal spec existed for Itanium but not for x86.

Networking 1997-2005

Networking Silicon and driver development.

I joined this team as a gigabit Ethernet and 10/100 Ethernet driver developer for UnixWare, Solaris, and Linux operating systems. After 2 years I moved on to be performance lead for networking stack and silicon development (TCP/IP offload, fragmentation offload), where I developed specialized drivers and software to showcase silicon performance capabilities for short 64-byte packets (maximum stress on stack/silicon, as there is not enough payload). I also proposed interrupt moderation. I also debugged large, complex issues involving platform bring-up (Itanium, new chipsets, new operating systems), PXE boot install (real-to-protected-mode transitions, EFI and PXE boot, etc.), the PCI Express protocol (new at that time), device interoperability issues (Broadcom and Intel on the same bus causing data corruption), and issues related to a new PHY (internally developed versus one Intel outsourced from Marvell).

The above debugging also involved cache coherency domain debug (Itanium and x86-64), where the IOH kept data for too long, causing data corruption, and changes to the MESI protocol for Itanium, as some assumptions made by designers were proven wrong when data arrived at a much higher rate (gigabit versus 10/100 Ethernet).

https://slideplayer.com/slide/11567806/

http://hg.openjdk.java.net/zgc/zgc/rev/ee60614dc39e

I also implemented G1GC and POGC heterogeneous memory support, where the Java heap is distributed across DRAM and Intel's new persistent memory.

https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-whitepaper-v4.pdf

BACG BRITISH AMERICAN CONSULTING GROUP DATA WAREHOUSING PROJECT – 1995-1997

Here I worked as lead C++ developer/consultant helping BACG design data warehousing software.

ACCENTURE 1993-1995 TQT MALAYSIA

C++ project to manage customers for Time Quantum Telekom Malaysia.

EDUCATION

INDIAN INSTITUTE OF TECHNOLOGY, KANPUR

I have a master's degree from here with specialization in control systems and polymer science.

STANFORD

Enrolled in AI graduate certification program completing in 2021.

COMMUNICATION

I have published various papers within Intel (Intel did not want to publish them externally as they relate to Intel’s trade secrets), have 2 patents, and have presented for Intel at the WWDC conference.

LEADERSHIP I am lead security architect, enabling Intel TDX for nested VM/VM migration (U.S. Patent Application No. 17/134,339, filed 12/26/2020) for Microsoft. I also lead G1GC development for Optane (NVDIMM) and MCDRAM. Earlier, I was lead performance architect for the Intel Binary Translator (Transmeta Crusoe), which was used for ROP/JOP gadget detection as well as JIT transformation of SSE instructions to AVX without recompiling.

https://www.linkedin.com/in/vinay-awasthi-86910b3/


