We are looking for a Document Extraction and Inference Engineer with expertise in traditional machine learning algorithms and rule-based NLP techniques.
The ideal candidate will have a strong foundation in document processing, structured data extraction, and inference modeling using classical ML approaches.
You will work on designing, implementing, and optimizing document extraction pipelines for various applications, ensuring accuracy and efficiency.
Key Responsibilities
Develop and implement document parsing and structured data extraction techniques.
Utilize OCR (Optical Character Recognition) and pattern-based NLP for text extraction.
Build foundational models (NOT LLMs) to solve inference problems.
Optimize rule-based and statistical models for document classification and entity recognition.
Design feature engineering strategies for improving inference accuracy.
Work with structured and semi-structured data (PDFs, scanned documents, XML, JSON).
Implement knowledge-based inference models for decision-making applications.
Collaborate with data engineers to build scalable document processing pipelines.
Conduct error analysis and improve extraction accuracy through iterative refinements.
Stay updated with advancements in traditional NLP and document processing techniques.
Required Qualifications
Bachelor’s or Master’s degree in Computer Science, AI, Machine Learning, or a related field.
3+ years of experience in document extraction and inference modeling.
At least 5 years of overall experience.
Strong proficiency in Python and ML libraries (Scikit-learn, NLTK, OpenCV, Tesseract).
Expertise in OCR technologies, regular expressions, and rule-based NLP.
Experience with SQL and database management for handling extracted data.
Knowledge of probabilistic models, optimization techniques, and statistical inference.
Familiarity with cloud-based document processing (AWS Textract, Azure Form Recognizer).
Strong analytical and problem-solving skills.
Preferred Qualifications
Experience with graph-based document analysis and knowledge graphs.
Knowledge of time series analysis for document-based forecasting.
Exposure to reinforcement learning for adaptive document processing.
Understanding of the credit/loan processing domain.
Location: Chennai, India