AI/ML Engineer - Vision, LLMs & Deployment Expert

Location:

Fairfax, VA

Posted:

April 07, 2026

Resume:

Pranav Koduru

571-***-**** Fairfax, VA *************@*****.*** LinkedIn GitHub

SUMMARY

AI/ML Engineer with 2 years of production experience across computer vision systems, LLM integration, and full-stack AI deployment. Built and shipped systems spanning TensorRT-optimized inference pipelines, QLoRA fine-tuning of 7B-parameter VLMs, and multi-agent RAG architectures, each running in Docker-based production environments. MS in Computer Science with hands-on experience deploying on AWS, GCP, and platforms such as Railway and Vercel.

SKILLS

• ML / Deep Learning: PyTorch, TensorFlow, Hugging Face Transformers, Scikit-Learn, OpenCV, Albumentations, PEFT/LoRA/QLoRA, Quantization, TensorRT

• LLMs and GenAI: OpenAI API, Anthropic API, LangChain, LlamaIndex, Qdrant, Pinecone, RAG, Multi-agent systems, Prompt Engineering, Embedding models

• Languages: Python, SQL, JavaScript

• MLOps and Infrastructure: Docker, FastAPI, AWS, Weights & Biases, MLflow, Git, GCP, Microsoft Azure

• Data & Analytics: Pandas, NumPy, Matplotlib, Plotly

EXPERIENCE

ATAI Labs May 2023 – Dec 2023

Machine Learning Engineer Hyderabad, India

• Built a synthetic data generation pipeline in Blender using 3D scene meshes to simulate rare failure cases, boosting downstream model accuracy by 14% and cutting down hours by 20% by generating 10K edge-case frames not captured in production footage.

• Designed a CNN-based anomaly classifier for live surveillance feeds, reducing manual review workload by 70% and cutting mean time-to-alert from several minutes to 15 seconds by replacing threshold-based triggers with a ResNet-based model trained on 5K labelled frames.

• Optimized a DeepLabV3+ segmentation pipeline for real-time scene element detection, reducing inference latency from 8 seconds to 1.5 seconds and achieving 60 FPS on a production GPU by converting the model with TensorRT and applying layer fusion and dynamic batch scheduling.

• Refactored and parallelized data ingestion workflows across a 5-person cross-functional team, increasing throughput by 38.5% while maintaining full pipeline reproducibility by migrating sequential I/O to asynchronous multiprocessing with Python and asyncio and introducing validation checkpoints.

ATAI Labs Aug 2022 – May 2023

Machine Learning Intern (promoted to full-time) Hyderabad, India

• Trained pixel-level segmentation models for warehouse occupancy detection, achieving 89.2% classification accuracy on an 8-class held-out test set of 24K images by applying InceptionNet with a custom class-weighted loss to address class imbalance in the dataset.

• Built and curated a 60K+ image dataset for proprietary computer vision tasks, achieving 95% inter-annotator agreement and reducing label noise by 15% by designing a multi-stage review pipeline with automated outlier flagging using Scikit-learn and cross-validation by domain experts.

• Implemented continuous model monitoring to proactively detect and mitigate model drift, maintaining stable performance within 3–5% variance in key metrics (mIoU, accuracy, F1 score) by tracking data distribution shifts, prediction confidence, and retraining triggers in a dynamic production environment.

• Designed a comprehensive image augmentation pipeline using Albumentations and OpenCV, improving model generalization and robustness and contributing to a 5–10% increase in validation performance under varied lighting, occlusion, and viewpoint conditions.

PROJECTS

Personalized Learning Roadmap Platform

• Architected a full-stack RAG system with GPT-4, serving personalized learning roadmaps in 3–8s end-to-end with 85% relevance accuracy at under 100ms vector retrieval latency by chunking 6K curriculum documents into semantic units, indexing with all-MiniLM-L6-v2 embeddings, and applying MMR re-ranking to diversify retrieved context.

• Designed a dynamic knowledge graph of learning paths using React Flow, enabling exploration of prerequisite relationships and skill dependencies, improving user navigation and interpretability of roadmaps through structured node-link visualization.

• Deployed the production system with a React-based frontend (hosted on Vercel) and containerized backend services (FastAPI + Qdrant), integrating React Flow for interactive roadmap visualization and DuckDuckGo API for real-time resource enrichment.

Vision-Language Model Defect Detection System

• Fine-tuned LLaVA-1.5-7B with QLoRA for manufacturing defect classification across 15 categories, achieving 91.6% accuracy and 83% recall on a ~5K-sample held-out test set while keeping GPU memory usage under 12 GB by applying 4-bit quantization with PEFT LoRA adapters (r=8, α=16) on a single RTX-class GPU, training for ~3–5 epochs with a custom defect-weighted cross-entropy loss.

• Designed a greedy-decoding inference pipeline using Hugging Face AutoProcessor, processing images in under 3s per sample with <32 tokens generated per prediction and loading into a FastAPI-compatible inference path, containerized via Docker with an evaluation workflow.

• Structured the full training lifecycle as a reproducible MLOps pipeline, from dataset generation through checkpoint selection to evaluation, enabling one-command reproducibility and experiment comparison via Weights & Biases across local GPU runs by packaging the project as an installable Python module and externalising all hyperparameters into a YAML config.

AI-Powered Personal Finance Platform

• Built a dual-provider LLM microservice routing across Claude and OpenAI over 10+ task types, dynamically selecting models by complexity and cutting estimated inference cost by ~60% vs. a single-provider approach.

• Designed a 3-layer memory system (structured key facts, a delta-compressed AI narrative, and a 5-message recency buffer) that maintains conversation accuracy at constant token cost regardless of session length.

• Engineered a zero-API-cost keyword classifier across 8 domains and a dynamic prompt pipeline injecting only task-relevant DB context, reducing average prompt size by an estimated 40% while improving response coherence.

• Implemented a 6-action agentic layer where Claude outputs validated structured JSON payloads driving real application actions (SQL execution, UI rendering, and user management) from a single natural language input.

EDUCATION

• George Mason University Fairfax, Virginia

Master's in Computer Science 2025

Related Coursework: Data Mining, Machine Learning, Fundamentals of Artificial Intelligence, Analysis of Algorithms.

GPA: 3.6/4.0

• Andhra University Visakhapatnam, India

Bachelor's in Computer Science Engineering 2023

Related Coursework: Data Structures and Algorithms, Data Warehousing and Data Mining, Machine Learning, Image Processing, Object Oriented Software Engineering.

GPA: 3.3/4.0

• Certifications: Azure AI Fundamentals (Microsoft), Google Cloud certifications (Google Developer), AI for All: GenAI Practices (NVIDIA), LangChain and Vector Databases in Production (Activeloop), Retrieval Augmented Generation with LangChain and LlamaIndex (Activeloop).
