Our hiring process:
We thoroughly review applications against our job requirements, ensuring each candidate receives personal attention from our experienced recruiters. We believe individual assessments are essential for recognizing unique talents.
Successful applicants may be invited to submit a video interview for review by the hiring manager, typically followed by a test or short project to gauge fit with our team.
If you progress further, you will be invited to an interview with our hiring manager and/or the interview team. Please note that we conduct live interviews only face-to-face or via Zoom, to keep a personal touch.
Finally, if our interests align, we will discuss your offer after a live conversation.
At INKHUB, we are ingesting 10 million raw PDFs to build the internet's most comprehensive catalog of marketing-grade B2B content, all tagged, summarized, and searchable by topic, company, or intent.
What You'll Do:
Manage the ETL pipeline that transforms raw PDFs into structured resources.
Optimize our summarization and classification flow using open-source models, with GPT-4 as a fallback.
Implement quality filtering logic to maintain high standards (e.g., age of documents, page count).
Map assets to a detailed topic taxonomy, covering over 9,000 topics.
Create dense embeddings using sentence-transformers.
Load and query embeddings with Milvus or pgvector.
Establish 'freshness' logic for indexing new or updated content based on various criteria.
Develop a QA and evaluation harness focusing on compliance and monitoring.
Expose semantic search capabilities via FastAPI with advanced filtering and ranking options.
Collaborate closely with our Tech Lead to enhance UX integration and snippet generation.
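For a flavor of the work, the quality filtering step mentioned above (document age, page count) might start out as something like the sketch below. The `PdfMeta` fields, the function name, and every threshold are hypothetical and chosen only for illustration; they are not our production values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PdfMeta:
    """Minimal metadata for one ingested PDF (hypothetical schema)."""
    doc_id: str
    published: date
    page_count: int


# Illustrative thresholds only; real values would be tuned against data.
MAX_AGE_DAYS = 5 * 365
MIN_PAGES = 2
MAX_PAGES = 200


def passes_quality_filter(doc: PdfMeta, today: date) -> bool:
    """Return True if the document is recent enough and a sensible length."""
    age_days = (today - doc.published).days
    # Reject documents dated in the future or older than the cutoff.
    if age_days < 0 or age_days > MAX_AGE_DAYS:
        return False
    # Reject one-page flyers and very long documents.
    return MIN_PAGES <= doc.page_count <= MAX_PAGES


if __name__ == "__main__":
    today = date(2024, 1, 1)
    docs = [
        PdfMeta("recent-whitepaper", date(2023, 6, 1), 12),
        PdfMeta("old-report", date(2015, 1, 1), 30),
        PdfMeta("one-page-flyer", date(2023, 9, 1), 1),
    ]
    kept = [d.doc_id for d in docs if passes_quality_filter(d, today)]
    print(kept)
```

In practice a filter like this would sit early in the pipeline, so that summarization and embedding compute is spent only on documents worth indexing.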
Your Toolbox:
Proficiency in Python, PyTorch, sentence-transformers, and OpenAI APIs (or similar pretrained LLMs).
Familiarity with FastAPI, Milvus or pgvector, PyPDF/Tika, and orchestration tools like Airflow or AWS Lambda.
Experience with Docker, GPU scheduling, and SQL (Athena/Redshift).
You Might Be a Fit If...
You have built production ML pipelines that reach real users, not just theoretical models.
You possess experience in semantic search, embeddings, or extensive tagging systems.
You enjoy tackling unstructured data and transforming it into understandable formats.
You thrive in fast-paced environments, iterating quickly based on feedback and tracking meaningful metrics.
Why This Role Matters:
In this role, your models will define the relevance and freshness of over a million resources and 50,000+ company pages. Your work ensures INKHUB maintains its edge in the industry!