
AI/ML Engineer - LLMs, RAG, MLOps, Production AI

Location: United States
Salary: 85000
Posted: April 30, 2026


Resume:

AGRIMA JAIN

AI/ML ENGINEER

Location: BOSTON, MA

Phone No: +1-551-***-**** Email: **************@*****.*** LinkedIn

SUMMARY

AI/ML Engineer with 4 years of experience building scalable, production-grade systems across Healthcare, Media & Speech Analytics, and Retail/E-commerce domains. Specialized in LLMs, RAG pipelines, and ensemble models (XGBoost, ClinicalBERT), delivering 15–18% accuracy improvements in healthcare AI applications. Strong expertise in end-to-end ML pipelines and MLOps (AWS SageMaker, MLflow, GitHub Actions), with experience deploying low-latency microservices using Docker and FastAPI. Proven track record in speech analytics and ASR using Whisper and Wav2Vec2, with solid foundations in Python, PyTorch, Hugging Face, and SQL.

SKILLS

• Programming: Python, SQL, JavaScript

• Web Technologies: HTML, CSS, JavaScript

• Machine Learning & AI: XGBoost, Scikit-learn, ClinicalBERT, Ensemble Models, NLP, LLMs, RAG Pipelines

• Deep Learning & NLP: PyTorch, Hugging Face, Whisper, Wav2Vec2

• MLOps & Deployment: AWS (SageMaker, Step Functions), MLflow, GitHub Actions, CI/CD

• APIs & Frameworks: FastAPI, REST APIs

• Data & Analytics: Pandas, EDA, Feature Engineering, Matplotlib, Power BI, Tableau

• Tools & Platforms: Docker, PostgreSQL

WORK EXPERIENCE

AI/ML ENGINEER CODAMETRIX, BOSTON, MA SEP 2025 – PRESENT

Project: Automated Medical Coding Platform using AI/ML

Description:

Developed a production-grade AI/ML system for automated medical coding and billing using stacked ensembles and RAG pipelines on FHIR data, improving accuracy, reducing manual effort, and accelerating revenue cycle workflows.

Responsibilities:

• Designed and deployed end-to-end AI/ML systems for automated medical coding, processing thousands of real-time patient records daily from FHIR-compliant data sources.

• Built a stacked ensemble model combining XGBoost and ClinicalBERT embeddings, with a simple meta-learner, improving prediction accuracy by 15–18% over single models.

• Implemented out-of-fold (OOF) stacking pipelines to ensure proper validation, reduce overfitting, and maintain stable performance in production.

• Developed a RAG pipeline using LLM APIs (OpenAI/Claude) on top of structured and unstructured clinical data to improve contextual understanding in edge cases.

• Engineered NLP pipelines to process large-scale clinical notes (300K+ documents), generating embeddings and features for downstream models.

• Deployed models as Dockerized microservices with low-latency inference (120–200 ms), enabling near real-time billing and prediction workflows.

• Automated training and retraining workflows using AWS SageMaker and Step Functions, enabling continuous model updates with new data.

• Used MLflow for experiment tracking and model versioning, ensuring reproducibility across development and production environments.

• Built CI/CD pipelines using GitHub Actions and Docker, enabling reliable and frequent deployments with testing and rollback support.

FULL STACK AI ENGINEER INTERN AMWELL, BOSTON, MA MAY 2025 – AUG 2025

Project: AI-Powered Healthcare Scheduling & RAG Platform

Description:

Built a scalable LLM-based scheduling platform using RAG and agentic workflows to improve provider matching and automate triage. Developed low-latency NestJS microservices with Redis and deployed via AWS with CI/CD-enabled MLOps pipelines.

Responsibilities:

• Built AI-driven RAG systems and LLM-powered scheduling intelligence for Blue Cross Blue Shield’s virtual care platform, improving provider matching accuracy and reducing query latency.

• Engineered LLM-enabled backend services using NestJS and Node.js with Redis caching and microservice architecture, optimizing inference latency and ensuring high system reliability.

• Designed agentic AI workflows leveraging LLM reasoning for automated appointment triage and rescheduling, minimizing manual intervention and improving operational efficiency.

• Developed deep learning–based anomaly detection models (Autoencoders) to monitor scheduling patterns and stabilize predictions across high-volume clinical data streams.

• Automated MLOps and AI deployment pipelines using AWS Lambda, SageMaker, Terraform, and CI/CD, enabling scalable model deployment, monitoring, and continuous improvement.

SOFTWARE ENGINEER HAPPYMONK AI LABS, BENGALURU, INDIA JAN 2021 – DEC 2023

Project 2: Speech Analytics & Transcription Platform Aug 2022 – Dec 2023

Role: Software Engineer – AI/ML Engineer

Description:

Built a speech analytics system to process and analyze audio data for transcription and insights generation across enterprise use cases.

Responsibilities:

• Developed and maintained speech-to-text pipelines processing 10K–20K audio files/day, ensuring stable and reliable performance.

• Fine-tuned pre-trained ASR models (Whisper/Wav2Vec2) on domain-specific data, improving transcription quality and reducing errors.

• Built NLP modules for sentiment analysis and keyword extraction to support business insights.

• Designed and deployed REST APIs using FastAPI for model inference and integration with internal applications.

• Optimized audio preprocessing (audio cleaning, segmentation), leading to improvements in transcription consistency.

• Containerized services using Docker and assisted with deployment to cloud environments.

• Monitored model outputs and logs to identify issues like data noise, edge-case failures, and performance bottlenecks.

Project 1: License Plate Detection & OCR-Based Recognition System Jan 2021 – Jul 2022

Role: Associate Software Engineer – AI/ML Engineer

Description:

Developed a real-time computer vision system to detect vehicle license plates and extract text from images and video streams using YOLOv5 and Tesseract OCR.

Responsibilities:

• Worked with a large dataset of ~300K images and video frames, performing exploratory data analysis (EDA) to understand data distribution and identify quality issues.

• Prepared and annotated datasets in YOLOv5 format, ensuring accurate bounding boxes for license plates and vehicles.

• Built a preprocessing pipeline using OpenCV, including image resizing, normalization, and augmentations (rotation, scaling, brightness) to improve model generalization.

• Trained and fine-tuned a YOLOv5 (PyTorch) model for license plate detection, achieving around 93.5% mAP@0.5 on validation data.

• Reduced false positives by ~13% through confidence threshold tuning, improved preprocessing, and better training data quality.

• Developed a real-time inference pipeline for video streams, improving latency and detection throughput.

• Applied confidence thresholding and Non-Max Suppression (NMS) to eliminate overlapping detections and improve prediction quality.

• Integrated Tesseract OCR to extract structured alphanumeric text from detected license plates.

• Improved OCR performance using preprocessing techniques such as grayscale conversion, noise reduction, and contour detection.

• Evaluated model performance using IoU, precision, and recall, and iteratively refined the system.

EDUCATION

Master of Science, Information Systems Dec 2025

Northeastern University Boston, MA

Bachelor of Technology, Bioinformatics May 2021

Dr. D Y Patil Vidyapeeth Pune, IN

ACADEMIC PROJECTS

Cloud-Native Web Application with IaC

• Built a scalable full-stack web app with React, Spring Boot, and AWS (EC2, RDS, S3), improving media delivery by 40% and achieving a Lighthouse accessibility score of 90.

• Automated cloud infrastructure with Terraform and GitHub Actions, reducing environment setup time by 60%.

MedFlow – AI Healthcare Intake Chatbot

• Designed a React and Node.js intake chatbot for clinics that used Claude and OpenAI with LangChain and vector search to answer billing and policy questions in real time, helping patients self-serve common queries.

• Implemented guardrails such as PII redaction, emergency intent checks, and output validation, and stored structured summaries in Postgres and Redis, reducing manual data entry and triage effort for staff.


