Generative AI Software Architect

Location: Phoenix, AZ

Posted: January 12, 2025


Resume:

Imtiaz Sajwani

********@*****.*** | https://github.com/ImtiazSajwani | www.linkedin.com/in/imtiaz-sajwani-900869a/ | U.S. Citizenship

Generative AI Software Architect

Innovative and solutions-driven Generative AI Software Architect with deep expertise in designing and optimizing large-scale AI and deep learning architectures across cloud and heterogeneous platforms. Skilled in deploying Generative AI models, integrating state-of-the-art hardware solutions, and optimizing performance on AI accelerators, with a proven track record of delivering high-impact AI solutions for enterprise applications. Experienced in end-to-end AI infrastructure, from distributed training on cloud platforms to customized FPGA emulation environments, using frameworks and practices such as DeepSpeed, ONNX, and MLOps for seamless AI deployment. Known for a strategic approach to total cost of ownership (TCO) optimization and for driving digital transformation in sectors such as fintech, media, and autonomous systems, delivering scalable AI architectures that meet complex client needs.

Areas of Expertise

• Generative AI & Machine Learning

• AI Deployment & MLOps

• AI Model Optimization

• FPGA & Hardware Integration

• Platform & Infrastructure Support

• Cloud & Virtual Platforms

• Sales Enablement & TCO Analysis

• Technical Communication

Professional Experience

DCAI/SMG – Intel, Chandler, AZ 05/2020 – Current

Generative AI Solution Architect/Tech Sales Specialist

Developed and executed AI-driven solutions for enterprise clients, leveraging advanced technologies to drive digital transformation and enable scalable, high-performance Generative AI applications.

• Deployed a RAG/LLM and agentic AI pipeline for the US Army using NVIDIA NIM/NeMo, including model optimization.

• Partnered with born-in-the-cloud, strategic, and next-gen public and private cloud clients in the financial, e-commerce, and startup sectors to deploy end-to-end Gen AI/ML workloads on heterogeneous platforms, leveraging MLOps frameworks and inference servers (Triton/vLLM) to ensure seamless deployment and operational efficiency.

• Architected advanced Retrieval-Augmented Generation (RAG) and Large Language Model (LLM) pipelines for agentic AI deployment on Intel platforms, achieving optimized embedding and LLM performance.

• Designed and implemented distributed training environments using DeepSpeed and MPI on Intel platforms; authored engineering blogs and presented to the Board of Advisors, showcasing successful inference and training capabilities on the cloud for Intel customers.

• Optimized Generative AI large language models (LLMs) and diffusion models on Intel accelerators using C++/Python, enhancing performance and efficiency for high-impact fintech and media use cases.

• Optimized and deployed a secure small language model (SLM) in C++/Python with robust guardrails on an NVIDIA platform for a US Army-specific use case.

• Led an AI engineering team in migrating AI workloads from NVIDIA GPUs to Intel platforms, successfully influencing sales strategies through comprehensive end-to-end TCO analysis.

Habana Labs/Nervana – Intel, Chandler, AZ 01/2014 – 04/2020

AI/Deep Learning Software Engineer

Designed and optimized deep learning models and algorithms on Habana processors, enhancing AI performance and efficiency for scalable enterprise applications.

Cloud Computing

• Optimized NLP (RoBERTa) and Computer Vision (ResNeXt) workloads by implementing DSP/IA custom kernels in C++ for Spring Hill NNP, successfully deploying models in Facebook data centers using the PyTorch/ONNX framework and NNPI software stack to boost processing efficiency.

• Provided comprehensive end-to-end platform support for UEFI BIOS, CSME, PCode firmware, and Facebook's Linux Kernel with Root of Trust (ROT) boot integration for Deep Learning (DL) accelerator SoCs, while remotely leading a failure analysis team to ensure robust performance and reliability.

Autonomous Driving (AD)

• Defined the AD platform architecture (Waymo/Cruise/Argo/Tesla) for the Xeon compute board/AI SW accelerator.

• Architected the camera interface board for Autonomous Driving (AD) on FPGA with C/C++ and Verilog.

• Analyzed ADAS CV workloads on Xeon/SPH and developed IA/DSP kernel ops for DL models.

• Architected and implemented a DPDK-NTB SW stack on Xeon for a fail-safe AD usage model.

TMG – Intel, Chandler, AZ 04/2008 – 12/2013

Hybrid FPGA Emulation/SW Engineer

Engineered and deployed hybrid FPGA emulation environments to validate and optimize software-hardware integration, accelerating development cycles for Intel's cutting-edge products.

• Design and Integration of UHFI SW stack with Simics functional simulator for Intel Hybrid platform.

• Integration of SystemC/TLM functional models for SoC Virtual Platform (VP).

• Design and implementation of USB 3.1 Virtual Device Transactor on FPGA using C++/Verilog.

• Integration of Agent IOSF XTOR with Base-IA (CPU/System Agent) for Virtex-7 FPGA.

• Design, implementation, and simulation of IOSF-Sideband XTOR SW/HW; the XTOR was based on the SCMI 2.0 interface.

• Emulated the RTL on a ZeBu emulator with a graphics model.

UNO/NBI – Intel, Chandler, AZ

Software Engineer

Additional Relevant Experience

System Integration Solutions, Atlanta, GA

System Engineer/FAE

Transcom Engine Corp., Perth-Australia

Embedded Software Engineer/FAE

Education Qualifications

Master of Computer Science (MCS) in AI/Machine Learning, Georgia Institute of Technology, Atlanta, GA

Bachelor of Science (B.S.) in Computer System Engineering, Curtin University of Technology, Perth, Australia

Associate of Applied Science (A.A.S.) in Computer Science, Community College TAFE, Perth, Australia

Professional Courses and Training

• Generative AI with Large Language Models

• Quantum Computing – QxQ-IBM

• Programming a Quantum Computer with Qiskit – Coursera

• Intel Artificial Intelligence Sales Champion

• AWS Cloud Practitioner Essentials

• Strategic Management, Macquarie University, Sydney, Australia

US Patents

• P19800 - REMOTE USB NETWORK DEVICE CONTROL

• P19799 - REMOTE USB VIDEOPHONE COMMUNICATION

• P19798 - EMULATED UNIVERSAL SERIAL BUS INPUT DEVICES

Engineering blogs

• Optimize artificial intelligence BERT-based language apps

• Empower Applications with Optimized LLMs: Performance, Cost, and Beyond

Customer Blogs

• Roblox

• Netflix
