Job Overview & Responsibilities:
This role focuses on designing, building, and scaling production-grade Retrieval-Augmented Generation (RAG) systems and Agentic AI solutions. The position owns the full lifecycle of RAG pipelines—from data ingestion and indexing to retrieval, reranking, and generation—ensuring high relevance, performance, and reliability.
Key responsibilities include integrating and optimizing vector databases, improving chunking and embedding strategies, and building robust evaluation frameworks to measure retrieval accuracy and response quality. The role also develops AI Agents and multi-agent workflows, enabling dynamic reasoning, tool/function calling, and serverless integrations.
In addition, the position designs and implements Model Context Protocol (MCP) servers, exposing internal tools, APIs, and datasets to LLMs and integrating them across frontend, backend, and model layers. The role contributes to system architecture and MLOps, including model serving infrastructure, monitoring, observability, and cost/performance optimization for LLM inference.
Required Skills & Experience:
3–5 years of experience in machine learning, deep learning, advanced analytics, or applied GenAI
Strong proficiency in Python (FastAPI/Flask); Node.js is a plus
Hands-on experience building production-level RAG systems
Deep understanding of LLMs, embeddings, vector databases, and reranking techniques
Practical experience with LangChain, LlamaIndex, LangGraph, or similar frameworks
Familiarity with Docker, Linux, and basic DevOps/MLOps
Strong systems thinking, debugging, and problem-solving skills
Excellent communication and presentation skills; able to explain complex AI concepts to both technical and non-technical audiences
Fluent in written and spoken English
Teaching and mentoring mindset, supporting junior engineers and cross-functional stakeholders
Preferred Qualifications:
Experience in client training, presales support, solution presentations, or creating AI learning materials (documentation, slides, demos)