Full-Stack AI Engineer - LLM & RAG Expert

Location: Jersey City, NJ

Posted: May 13, 2026


Resume:

Sai Krishna Kokkula

Jersey City, NJ ***** 848-***-**** **********************@*****.*** LinkedIn GitHub Website

PROFESSIONAL SUMMARY

Full-Stack & AI Engineer with 3+ years of experience building production-grade LLM applications, RAG pipelines, and scalable backend systems. Proven track record at Walmart and Centene delivering measurable performance gains: a 40% reduction in hallucination rates, a 3x throughput improvement, and sub-200ms semantic search over 500K+ documents. Expert in Python (FastAPI), React/Next.js, LangChain, and cloud-native deployments.

TECHNICAL SKILLS

AI / GenAI: LangChain, OpenAI API, RAG, Prompt Engineering, LLM Fine-tuning (LoRA), FAISS, ChromaDB, Qdrant, Semantic Search

Backend: Python, FastAPI, Flask, Node.js (Express), REST APIs, Microservices, SQLAlchemy, PostgreSQL, MongoDB, Redis

Frontend: React.js, Next.js, TypeScript, Redux Toolkit, Tailwind CSS, HTML5/CSS3

DevOps & Cloud: Docker, Docker Compose, GitHub Actions, GCP (GCS, Cloud Logging), Vercel, CI/CD

Security & Testing: JWT, OAuth2, RBAC, Rate Limiting, Jest, Mocha, Cypress

PROFESSIONAL EXPERIENCE

Full-Stack AI Developer, Walmart – Bentonville, AR (Oct 2025 – Present)

AI-Powered Product Search & Recommendation Engine

Designed production RAG pipelines (LangChain + OpenAI API + FAISS/ChromaDB), cutting chatbot hallucination rates by 40%

Built semantic search over 500K+ documents using embedding models + FAISS, achieving <200ms query latency

Implemented hybrid dense-sparse retrieval with reranking, improving top-5 search accuracy by 35%

Fine-tuned Llama 2/3 and Mistral on domain data using LoRA; served via FastAPI with streaming responses

Architected async FastAPI services handling 5K+ concurrent requests with PostgreSQL connection pooling; reduced DB load 60% via Redis caching

Migrated Node.js/Express + MongoDB API to FastAPI + PostgreSQL, achieving a 3x throughput improvement

Optimized SQL queries (indexing, partitioning, CTEs), reducing p95 latency from 2s to 150ms

Built prompt playground with versioned templates and real-time token counting, used daily by 20+ AI engineers

Containerized full RAG stack with Docker Compose; set up GitHub Actions CI/CD, reducing deployment time from 30 min to 8 min

Full-Stack Developer, Centene Corporation, USA (May 2024 – Sep 2025)

Intelligent Customer Support Chatbot (RAG + LLM)

Built an end-to-end RAG-powered customer support assistant using LangChain + OpenAI API over a company knowledge base

Developed scalable full-stack applications with React, Next.js (SSR/App Router), FastAPI, and TypeScript with Redux Toolkit

Designed and maintained versioned RESTful APIs with async processing, proper validation, and JWT/OAuth2 security

Implemented Redis caching and PostgreSQL/MongoDB query optimizations, significantly reducing API response latency

Built and deployed microservices with Docker on GCP; automated CI/CD pipelines using GitHub Actions

Created document ingestion pipelines with embedding generation for knowledge base indexing using FAISS and ChromaDB

Front-End Engineer, Nisum – Hyderabad, India (Dec 2022 – Nov 2023)

Web Application Development (Full Stack with Python & React)

Built reusable React/Angular/Vue component libraries integrated with Python backends (Django, Flask, FastAPI) via REST APIs

Implemented state management (Redux, Context API), JWT/OAuth authentication, and responsive cross-browser UI

Improved page load performance via lazy loading, code splitting, and Webpack/Vite optimization

Wrote unit and integration tests (Jest, Mocha, Cypress); participated in Agile sprints, code reviews, and design handoffs

EDUCATION

Pace University, New York - Master's degree in Computer Science (Jan 2024 – Dec 2025)
