Job Description
Join Our Talented Team at Protagonist
We fuse rigorous, methodologically sound analysis with our cutting-edge technology platform, Narrative Analytics®. This powerful combination enables us to quantitatively analyze open-source media, deliver strategic recommendations, and craft executive-level communication strategies for clients with missions that matter.
Why Us?
Our team is a vibrant mix of communication specialists, data scientists, and subject matter experts with extensive experience across U.S. Government agencies, non-profit organizations, and Fortune 500 companies. By joining Protagonist, you'll immerse yourself in a collaborative environment where innovation thrives and your contributions truly matter.
What We Do
Innovative Solutions: We co-develop cutting-edge solutions with our clients to address tough communication problems and capitalize on opportunities to make a tangible impact.
Data-Driven Insights: Our tools and methodologies provide actionable insights that help clients meet their communication objectives and stay ahead of global challenges.
Applied Expertise: We integrate our solutions within client organizations, leveraging our deep expertise to address critical issues and ensure sustainable success.
Be Part of Something Bigger
At Protagonist, you'll work on compelling projects that make a real difference. We seek talented individuals eager to contribute to our mission and grow alongside us. If you're passionate about communication, data analysis, and making an impact, we invite you to explore a career with Protagonist.
Explore Your Future with Us!
Ready to take the next step in your career? Join us at Protagonist and be part of a team that's making a difference.
About You
As our PhD Machine Learning Intern, you have a passion for cutting-edge AI research and its practical applications in narrative intelligence. You will play a key role in advancing our GEN5 System by developing and optimizing state-of-the-art Retrieval-Augmented Generation (RAG) architectures. You are deeply familiar with the latest developments in large language models, vector databases, and information retrieval systems. You thrive on solving complex technical challenges at the intersection of NLP research and production systems, and you're excited about translating academic insights into real-world impact.
Primary Responsibilities
During this internship, you will focus on research, development, and implementation of advanced RAG systems for our GEN5 platform. You will work closely with our Senior Machine Learning Engineers, Data Scientists, and VP of Technology to push the boundaries of what's possible in narrative analytics through intelligent information retrieval and generation.
Specific Responsibilities
Research and implement novel RAG architectures optimized for multi-modal narrative data processing
Design and develop advanced retrieval mechanisms using dense and sparse vector representations
Experiment with hybrid search approaches combining semantic similarity and keyword-based retrieval
Optimize embedding models and vector databases for large-scale narrative content indexing
Develop and evaluate chunking strategies for complex, multi-document narrative datasets
Implement and fine-tune reranking models to improve retrieval precision
Design evaluation frameworks for RAG system performance, including relevance, faithfulness, and narrative coherence metrics
Collaborate with the Data Science team to integrate RAG capabilities into existing narrative detection pipelines
Conduct experiments on prompt engineering and context optimization for improved generation quality
Research and implement techniques for handling multi-language and cross-cultural narrative content
Contribute to research publications and technical documentation of methodologies and findings
Present research progress and findings to cross-functional teams and stakeholders
Requirements
Currently pursuing a PhD in Computer Science, Machine Learning, Natural Language Processing, or related field
Authorized to work in the US
Must be able to work on US Government contracts that may be restricted to US persons
Strong theoretical foundation in machine learning, deep learning, and natural language processing
Hands-on experience with transformer architectures, large language models, and embedding models
Proficiency in Python and deep learning frameworks (PyTorch, TensorFlow, Hugging Face)
Experience with vector databases and similarity search systems (Pinecone, Milvus, FAISS, OpenSearch, pgvector)
Knowledge of information retrieval concepts and evaluation metrics
Experience with distributed computing and large-scale data processing
Strong research and analytical skills with ability to work independently
Excellent communication skills and ability to present complex technical concepts clearly
Preferred Qualifications
Published research in RAG systems, information retrieval, or related NLP areas
Experience with multi-modal learning and cross-lingual NLP
Familiarity with knowledge graph construction and reasoning
Familiarity with narrative analysis or social media data processing
Experience with A/B testing and experimental design for ML systems
Background in computational social science or digital humanities
What You'll Gain
Hands-on experience applying cutting-edge AI research to real-world problems with societal impact
Opportunity to work with large-scale, diverse datasets spanning global narratives
Mentorship from experienced ML engineers and data scientists
Exposure to production ML systems serving enterprise clients
Potential for research publication and conference presentations
Experience in a fast-paced, mission-driven startup environment
The pay rate for this position is $32.00 per hour, and the expected duration is four months.
Protagonist is an Equal Opportunity Employer.
Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.