Sai Teja, Narra Venkata
AI & Machine Learning Engineer
Fort Worth, Texas • 940-***-**** • *******************@*****.*** • LinkedIn
Professional Summary
AI & Machine Learning Engineer with 4+ years of experience building scalable Generative AI systems and Agentic Workflows.
Advanced expertise in KG-RAG/GraphRAG architectures, utilizing knowledge graphs to model complex data relationships and enhance retrieval precision.
Extensive experience architecting stateful, multi-step agentic workflows with human-in-the-loop (HITL) validation using LangGraph.
Comprehensive understanding of the full SDLC for AI products, driving solutions from requirement gathering and system design to automated testing and CI/CD deployment.
Technical Skills
●Programming & Web Technologies: Python, JavaScript, C++, FastAPI, Flask, React.js, Next.js, TailwindCSS, Axios, Git
●ML & AI Frameworks: Scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, Hugging Face, ONNX, TensorRT, OpenAI API, LangChain, LangGraph, FAISS
●Data Engineering & Pipelines: ETL Pipelines, Databricks, Spark, PySpark, Airflow, Kafka, Ray, n8n
●Databases: PostgreSQL, MongoDB, Snowflake, PgVector, Pinecone, Neo4j
●Cloud & MLOps: Azure (AI Foundry, AI Services, DevOps), AWS (S3, SageMaker, Lambda, Bedrock, Glue), GCP (BigQuery, Vertex AI, Google ADK), MLflow, ClearML, Docker, Kubernetes, KServe, GitHub Actions, Jenkins
●Monitoring & DevOps Tools: Prometheus, Grafana, Splunk
●Visualization & Reporting: Tableau, Plotly, Matplotlib, Seaborn
●NLP & LLM Systems: spaCy, Hugging Face Transformers, OpenAI API, RAG pipelines, KG-RAG, vector search, LLM orchestration, A/B Testing
●Project & Team Collaboration: Agile Scrum, Jira, Confluence, Slack, Microsoft Teams, GitHub Projects
Professional Experience
AI Software Engineer
U.S. Bank – Remote, United States Jul 2024 – Present
Domain: Finance & Risk Analytics
●Designed POC AI Agents using agent-to-agent (A2A) architectures to automate complex financial workflows, reducing manual processing time by 30%.
●Architected stateful, multi-step agentic workflows using LangGraph and LangChain with HITL validation, improving task completion reliability for internal pilots.
●Developed RAG architecture patterns to ground agents in real-time data, increasing response accuracy by 25%.
●Engineered scalable feature pipelines to process 1M+ daily documents and managed FAISS vector indexing for RAG services.
●Deployed internal pilot GenAI apps on Microsoft Azure AI Foundry, architecting an MCP server with Pydantic for schema validation to standardize context requests between agents and Azure AI Services.
●Optimized RAG retrieval by implementing hybrid search and GraphRAG with new chunking strategies, cutting retrieval latency by 20%.
●Implemented embedding caching to reduce API costs by 15% and configured Prometheus/Grafana to monitor token cost and latency.
Environment: Python, Scikit-learn, Airflow, MLflow, Azure ML, Kafka, Grafana, PostgreSQL, Docker, GitHub Actions, Azure DevOps, AKS
Python Developer
Cisco Systems – Bengaluru, India Aug 2021 – Jul 2023
●Contributed to clinical NLP and risk scoring pipelines, improving diagnostic accuracy by 12% on 2M+ patient records.
●Supported fine-tuning of 3 LLMs with Hugging Face, reducing inference error by 9%.
●Engineered modular pipelines in Airflow/ClearML to automate 30+ weekly experiments, doubling reproducibility.
●Integrated OCR pipelines (using Tesseract-OCR) to digitize scanned medical intake forms, unlocking unstructured data for downstream NLP analysis.
●Designed RPA workflows to automate data reconciliation between legacy systems and modern EHRs, reducing manual data entry time by 30%.
●Prototyped RAG retrieval with PgVector and FAISS, cutting latency by 10%.
●Collaborated with 3 teams to evaluate model fairness, flagging bias in 17% of outputs.
●Built NER pipelines in Python using spaCy to extract ICD/CPT codes from discharge summaries; achieved 91% F1-score on test data.
●Developed Grafana dashboards by querying model logs via RESTful APIs and Python scripts to monitor drift and prediction quality across 5 hospital sites.
●Wrote Python ETL scripts to ingest and normalize clinical data (sourced from legacy HL7 V2 and modern FHIR feeds) from Snowflake into pandas-based pipelines, enabling 360-degree patient views for risk prediction.
Environment: Python, Scikit-learn, XGBoost, Airflow, ClearML, Hugging Face, PgVector, FAISS, LangChain, LangGraph, Docker, Snowflake
Junior Software Engineer
Cisco Systems – Remote, India Jun 2021 – Jul 2021
●Built 4 REST APIs with Flask/FastAPI to support EHR data exchange and JWT-based patient access, handling 100K+ monthly API calls.
●Developed 3 dashboards in Plotly Dash/Grafana to visualize clinical KPIs (e.g., readmission rates, appointment no-shows), saving 50+ hours/month in manual reporting.
●Containerized services with Docker and automated CI/CD with GitHub Actions, reducing deployment cycles by 45%.
●Experimented with YOLO pose estimation models in medical imaging workflows as a POC, achieving real-time local inference using PyTorch on CUDA.
●Developed clinician-facing UI modules in Next.js and TailwindCSS to streamline access to lab results and visit summaries, reducing information retrieval time by ~25% per patient session.
●Integrated Prometheus + Alertmanager to monitor API health and flag anomalies in lab data ingestion, cutting debugging time by 35%.
Environment: Python, React.js, FastAPI, Flask, Next.js, PostgreSQL, Docker, GitHub Actions, Prometheus, TailwindCSS, PyTorch, CUDA, Telegram API
Personal Projects
College Admissions Guide: Collaborating with senior engineers from Infosys and Verizon on an agent-based college admissions guide using LangGraph, LangChain, and Postgres RAG, with reviewer agents, web search, and a React UI delivering personalized, up-to-date recommendations.
Automated Safety Surveillance System: Designed and deployed a real-time vehicle crash detection system using YOLOv8 with PyTorch, CUDA, and cuDNN on GPU, achieving 3–4x faster inference than CPU-based YOLOv4 benchmarks. Integrated the Telegram API to trigger instant alerts on detected events.
Agentic AI Assistant: Built a production-ready agentic AI assistant using FastAPI, OpenAI API, RAG, and PgVector, enabling real-time task execution with PDF ingestion and 90% goal completion across 1,000+ queries.
Knowledge Distillation: Compressed a 33.9M-parameter teacher model with 80.5% accuracy on the CIFAR-10 dataset into a 6.3M-parameter student model while maintaining 78% accuracy using logit-based distillation.
Sentinal911: Built a FastAPI backend to analyze 911 call audio using the Whisper model, scoring urgency/deception on a 0–1 scale and flagging fake swatting calls with 85%+ test accuracy. Developed a Next.js frontend to display 10+ real-time metrics per call, including dispatcher fatigue, drop rate, and anomaly flags.
Education
University of North Texas, Denton, TX
Master of Science in Artificial Intelligence, 3.9 GPA
Certifications
• AWS – Certified Machine Learning Engineer Associate
• NVIDIA – Building RAG Agents with LLMs
• NVIDIA – Generative AI with Diffusion Models