Onsite required Tues/Wed/Thurs
We are looking for a Software Development Engineer to build and scale an AI-powered document parsing platform that extracts structured data from complex PDFs (pharmaceutical batch records, certificates, regulatory documents) using OCR, LLMs, and RAG. You will work across the full stack - backend AI pipelines, frontend chat interface, and cloud infrastructure.
Roles & Responsibilities
Design and develop production-grade RAG (Retrieval-Augmented Generation) pipelines for domain-specific document querying with hybrid search, reranking, and multi-agent answer synthesis
Build and optimize document processing pipelines using AWS Textract for OCR extraction from tables, handwritten content, and structured forms
Integrate and orchestrate multiple LLMs (Claude, Gemini) for intent classification, data extraction, validation, and conversational AI
Develop and maintain the FastAPI backend - REST APIs, streaming endpoints (SSE), authentication, and background task processing
Build responsive frontend features using Next.js, React, and TypeScript - chat interface, PDF viewer with highlights, real-time progress tracking
Manage cloud infrastructure on AWS - EC2 deployment, S3 storage, RDS (PostgreSQL), and IAM configuration
Work with vector databases (Weaviate) and graph databases (Neo4j) for semantic search and structural document querying
Implement chunking strategies, embedding generation, cross-encoder reranking, and semantic caching for accurate document retrieval
Deploy and monitor AI models and services in production - model fallback chains, retry mechanisms, error handling
Write clean, maintainable code with proper logging, error handling, and documentation
Required Skills
Python (FastAPI, async programming, pandas)
TypeScript / React (Next.js)
RAG systems - vector search, embeddings, chunking, reranking (production-grade)
LLM integration - prompt engineering, structured output, multi-model orchestration
AWS - EC2, S3, Textract, RDS
PostgreSQL
REST API design with streaming (SSE)
Git, basic CI/CD, Linux server management
Good to Have
Weaviate, Neo4j, or similar vector/graph databases
Gemini Vision or GPT-4V for document image analysis
LangChain / LangGraph
Docker, nginx
Pharmaceutical/regulated document experience
Experience: 3-6 years
The benefits that you are eligible for with Collins Consulting, Inc:
401(k)
Medical, Dental and Vision Insurance
Term Life Insurance
Accidental Death and Dismemberment
Long Term Disability