AJO E JOSE
+91-996******* *********@*****.*** Bengaluru, India www.linkedin.com/in/ajo-e-jose
SUMMARY
Heterogeneous Computing & Embedded Systems Engineer with 4.5+ years of experience in hardware acceleration and real-time system design. Expert in partitioning complex DSP and ML workloads across Xilinx Versal/XDNA architectures. Proven track record in optimizing kernels using SIMD/VLIW and designing custom FPGA IPs. Skilled in low-level C/C++ firmware, RTOS (FreeRTOS), and wired/wireless communication protocols for real-time embedded systems.
TECHNICAL SKILLS
Hardware Platforms: Xilinx Versal ACAP (VCK190, VEK280), AMD XDNA NPU, STM32, ESP32
Programming & Scripting: C, C++, Python, MATLAB, OpenCL, Bash, TCL, AIE intrinsics, HLS pragmas
Interfaces & Protocols: NoC, AXI4/AXI-Stream, DMA, I2C, SPI, UART, BLE, MQTT
Tools: Vivado, XRT, Vitis Analyzer, Vivado ILA, AI Engine Analyzer
Embedded & OS: FreeRTOS, PetaLinux, Yocto Project, STM32CubeMX, bare-metal
Build & CI/CD: Makefile, CMake, Git, Jira, CI/CD pipelines
Debugging & Testing: JTAG, GDB, Valgrind, perf, gprof, GoogleTest (gtest), logic analyzer
WORK EXPERIENCE
Yantravision Software Pvt. Ltd. Bengaluru, India
Contract Employee for AMD Xilinx Apr 2022 – Present
AI Compiler and Kernel Engineer XDNA NPU Team
Contributed to the Ryzen AI compiler middle-end, transforming quantized and post-optimized ONNX IR graphs into hardware-aligned execution plans for XDNA-based NPUs to maximize throughput.
Developed high-performance NPU kernels for operators such as Conv and MatMul using intrinsics and AIE-API, leveraging SIMD/VLIW architectures to achieve peak hardware efficiency.
Designed a spatial-temporal partitioning framework to scale large tensor workloads across multiple NPU cores, significantly reducing inter-core synchronization overhead.
Architected multi-level tiling and memory orchestration strategies for L1/L2 caches, including static scratchpad planning and double-buffered DMA to hide data movement latency.
Implemented a heuristic-based cost model to automate tiling decisions by analytically evaluating execution cycles, memory traffic, and compute utilization.
Executed cycle-accurate profiling and simulation to eliminate pipeline stalls, fine-tuning instruction scheduling and register allocation for maximum operator throughput.
Heterogeneous Computing Engineer Xilinx Versal
Implemented heterogeneous DSP/ML applications on Versal VEK280 by partitioning workloads across AI Engine, Programmable Logic (FPGA), and ARM cores to achieve high-throughput, real-time performance.
Optimized AIE kernels for compute and memory efficiency using advanced tiling strategies, SIMD vectorization, and VLIW-based instruction scheduling.
Designed high-performance custom FPGA IPs and data movers using Vitis HLS, enabling efficient integration between AIE graphs, NoC, and LPDDR memory subsystems.
Built MATLAB reference models to validate algorithmic correctness and evaluate signal quality of fixed-point AIE implementations.
Developed C/C++ host applications using XRT APIs and the OpenCL programming model to manage asynchronous execution, buffer allocation, and kernel orchestration.
Generated Vivado XSA platforms to enable Vitis-based hardware/software integration.
Identified and resolved system-level bottlenecks, stalls, and deadlocks using Vitis Analyzer and Vivado ILA, iterating on HW/SW partitioning to meet strict latency targets.
Contributed tutorials to the official AMD Vitis-Tutorials repository (https://github.com/Xilinx/Vitis-Tutorials), documenting end-to-end Versal design flows and best practices for VCK190/VEK280 integration and optimization.
Accord Innovations Pvt. Ltd. Bengaluru, India
Embedded Systems Intern Trainee Dec 2020 – Oct 2021
Contributed to early-stage prototyping of medical electronic devices, working on firmware development, hardware bring-up, functional testing, and demo preparation.
Developed embedded firmware for STM32 and ESP32 microcontrollers using C/C++. Used FreeRTOS to implement multitasking for sensor data acquisition and communication handling.
Implemented wired communication protocols such as UART, I2C, and SPI for interfacing sensors like load cells, accelerometer/gyroscope, color sensors, and RFID modules. Applied signal processing filters (e.g., moving average, low-pass filters) to improve measurement accuracy.
Designed and integrated wireless communication using BLE, Wi-Fi, and MQTT, and configured AWS IoT Core for cloud connectivity.
Designed small validation PCBs using KiCad. Performed comprehensive hardware and software debugging using oscilloscopes, logic analyzers, JTAG, and GDB.
EDUCATION
B.Tech in Electronics and Communication Engineering 2016 – 2020
IES College of Engineering (APJ Abdul Kalam Technological University) Thrissur, KL
COURSES & CERTIFICATIONS
FPGA Design & HLS (Udemy): C-based hardware design, HLS pragmas, performance optimization
Operating Systems from Scratch (Udemy): Processes, memory management, multithreading
Advanced Embedded Systems & Microcontrollers (Livewire): Bare-metal programming, protocols, real-time systems