Fireworks
Redwood City, CA, 94063
... learning
* Strong understanding of RL fundamentals, including policy gradients, actor-critic methods, offline RL, and preference-based learning
* Experience with reinforcement fine-tuning of LLMs (e.g., PPO, DPO, GRPO)
* Experience building and ...

Sep 30