Sai Vivek Katkuri
AZ, USA +1-929-***-**** *****************@*****.*** LinkedIn GitHub
PROFESSIONAL SUMMARY
AI Engineer with significant expertise in designing scalable Generative AI solutions and optimizing models across various domains. Key projects include engineering intelligent layout suggestions and autonomous agents, improving user experience and efficiency. Skilled in real-time personalization and decision-making through advanced recommendation systems. Focused on leveraging AI expertise to foster innovative solutions supporting organizational growth.
TECHNICAL SKILLS
•Programming & Scripting: Python, SQL, R, TypeScript, Bash
•Machine Learning & Deep Learning: PyTorch, TensorFlow, Keras, Scikit-learn, Random Forest, XGBoost, LightGBM, CatBoost, AutoML, Isolation Forest, DBSCAN, LSTM, pandas, numpy, matplotlib
•Generative AI & Large Language Models: GPT-4, Hugging Face Transformers, LLaMA 2, Amazon Titan Text, LLaMA, Mistral, Claude (Anthropic), Instructional Fine-Tuning, Prompt Engineering
•LLM Fine-Tuning & Multi-Agent Systems: LangChain, LangGraph, CrewAI, LoRA, QLoRA, PEFT, Multi-Agent Coordination, Task Decomposition
•Retrieval-Augmented Generation & Vector Search: RAG Pipelines, Vector Indexing, Semantic Search, Context-Aware Reasoning, Contextual Retrieval
•MLOps & Deployment: FastAPI, Flask, Docker, Kubernetes, MLflow, Airflow, DVC, Prefect, Terraform, REST API Development, AWS SageMaker, GCP Vertex AI, Azure ML, Serverless (AWS Lambda), Local Deployment
•Data Management & Storage: Pinecone, Neo4j, PostgreSQL, MongoDB, Redis, Amazon S3, MySQL, NoSQL, Knowledge Graphs, Graph Databases, Chroma DB, Qdrant, Milvus, Weaviate, PostgreSQL (vector-based)
•Time-Series Analysis & Model Explainability: Facebook Prophet, ARIMA, SARIMA, Holt-Winters, Bayesian Structural Time Series (BSTS), Pandas, NumPy, Matplotlib, EDA, SHAP, LIME, Feature Engineering, Anomaly Detection, Model Evaluation
CERTIFICATIONS
● Microsoft Certified: Azure AI Engineer Associate
PROFESSIONAL EXPERIENCE
Webflow Jan 2025 - Present
AI Engineer
•Designed and launched AI-driven features across 5+ Webflow modules, such as intelligent layout suggestions, automated SEO content, and dynamic form interactions, enhancing usability for over 100,000 users.
•Architected and implemented retrieval-augmented generation (RAG) pipelines using Pinecone and ChromaDB, increasing content relevance in AI-generated blocks by 30%.
•Engineered and maintained knowledge graphs with Neo4j to power structured design intelligence, enabling accurate and explainable UI recommendations that boosted new user onboarding by 18%.
•Built autonomous AI agents for selecting templates, tagging themes, and assembling page components, decreasing initial site setup time by 40%.
•Refined full model lifecycle from training to deployment across 7 development and staging environments, ensuring 99.2% uptime and maintaining 92%+ prediction accuracy.
•Led development of semantic search tools and AI-assisted design helpers, contributing to two successful product releases and directly influencing roadmap priorities through internal showcases.
•Conducted 6 cross-functional demos highlighting AI integration across design workflows, promoting adoption of generative features and receiving strong feedback from early testers.
Accenture Jan 2021 - Jul 2023
Machine Learning Engineer
•Applied LoRA techniques to optimize parameter efficiency during fine-tuning, cutting memory usage by 40% and reducing training time by 2.5x across 6 enterprise NLP models deployed in internal automation workflows.
•Fine-tuned a 65B parameter LLM using QLoRA on a single 48GB A100 GPU, achieving 15.8-bit precision equivalence and reducing projected infrastructure costs by over 30 lakhs annually for internal prototypes.
•Created over 50 structured prompt templates tailored to IT support and service desk scenarios, increasing model response relevance by 37% in pilot user evaluations and reducing manual escalation by 22%.
•Customized LLaMA 2 models using lightweight adaptation methods, boosting task-specific F1 scores by 18% and slashing retraining effort by 60%, saving 100+ engineering hours per model update cycle.
•Deployed and containerized model services via Docker across 3 staging environments, ensuring consistent runtime behavior and reducing environment-specific errors by 90% during integration testing.
•Benchmarked and reduced inference latency from 680ms to 470ms (about a 31% reduction) and cut compute resource usage by 35%, helping maintain SLA targets for 24/7 real-time ticketing systems across 5 global clients.
•Collaborated with DevOps and QA teams to automate deployment pipelines for ML services, reducing manual intervention by 70% and enabling weekly rollouts for experimentation and updates.
BizGaze Limited Jun 2019 - Jan 2021
Junior Machine Learning Engineer
•Assisted in developing and validating machine learning models (Random Forest, XGBoost, LSTM) on historical server logs, improving resource forecasting accuracy, reducing over-provisioning by 20%, and enhancing load balancing across 50+ servers.
•Processed and transformed 2.1 million+ log entries using Pandas and NumPy to create structured datasets for time-series modeling, resulting in 30% faster data pipeline execution for training tasks.
•Contributed to real-time anomaly detection using Isolation Forest and DBSCAN, which led to a 15% decrease in unplanned outages by flagging irregular usage patterns before failure events.
•Improved model performance through hyperparameter tuning with GridSearchCV and Bayesian Optimization, achieving 12% lower MAE and 18% reduction in RMSE across three test environments.
•Helped build and expose prediction services via RESTful APIs using Flask, supporting 500+ real-time inference requests per day with minimal latency.
PROJECTS
AI-Powered Adaptive Course Generation System
•Built a personalized learning engine using CrewAI, LLMs, and RAG to generate real-time adaptive courses based on learner profiles and web trends.
•Unified Pinecone, PostgreSQL, and SerpAPI for long-term memory, semantic search, and up-to-date content delivery, deployed via AWS Lambda.
Agentic RAG-Powered Mentor Recommendation System
•Designed a multi-agent mentor matching system with CrewAI, LangChain, and RAG, tailoring recommendations to user interaction history.
•Enriched discovery using Knowledge Graphs and vector search (Pinecone, Chroma DB) for contextual, skill-linked mentor suggestions.
AI-Powered Code Review API https://github.com/vivekprojects-GIT/AgenticAI/tree/main/code_review_service
•Architected and deployed a production-ready RESTful API service using FastAPI that automates intelligent code review, providing developers with instant feedback on algorithmic efficiency, security vulnerabilities, and industry best practices
•Integrated Google Gemini LLM (gemini-1.5-flash) to deliver context-aware, multi-language code analysis with 12 standardized feedback categories including efficiency analysis, algorithmic suggestions, maintainability assessments, and security recommendations
•Designed robust API infrastructure with /review-code POST endpoint featuring comprehensive input validation using Pydantic models, structured JSON responses, and proper error handling for malformed requests
•Developed comprehensive testing framework using pytest with complete coverage for successful requests, validation errors, and edge cases, ensuring reliability and maintainability of core functionality
•Implemented production-grade features including OpenAPI/Swagger documentation at /docs endpoint, structured logging for all requests and model responses, and language-specific analysis capabilities
•Validated system performance through extensive testing with multiple programming languages and algorithms (including inefficient implementations like bubble sort), confirming consistent delivery of all expected feedback fields
•Impact: Enables automated code quality assessment for development teams and external clients, reducing manual review overhead while maintaining high code standards
•AI-Powered Code Review: Python, FastAPI, Google Gemini LLM, pytest
AI Coding Support Chatbot https://github.com/vivekprojects-GIT/AgenticAI/tree/main/coding_support_bot
•Architected and deployed an intelligent conversational AI system using LangGraph orchestration framework that provides contextual coding guidance and hints without revealing complete solutions, enhancing users' problem-solving capabilities
•Engineered multi-node conversation flow with specialized LangGraph nodes for clarification requests, hint generation, general support, and safety filtering to ensure educational value while preventing direct answer disclosure
•Integrated Google Gemini Flash model via Google AI API for fast, cost-effective responses while maintaining conversational context across multiple interaction turns using LangChain memory management
•Implemented robust safety guardrails and filtering mechanisms that intelligently block direct code solution requests while allowing constructive guidance, ensuring the chatbot maintains its educational purpose
•Deployed production-ready service using LangGraph Platform with langgraph deploy command, providing accessible endpoint URL for seamless integration with external applications and testing environments
•Established comprehensive project structure using langgraph init scaffolding with documented API structure, usage guidelines, and deployment procedures for maintainability and team collaboration
•Impact: Enables scalable coding education support, allowing developers to receive personalized guidance while building independent problem-solving skills through contextual assistance rather than direct solutions
•AI-Powered Coding Support Chatbot: LangGraph, Gemini Flash, LangChain, Google AI API
EDUCATION
Southern Arkansas University May 2025
Master of Science, Data Science
Kim's Degree & PG College Sep 2021
Bachelor of Science, Mathematics, Statistics & Computer Science