Purbesh Mitra
240-***-**** | ******@***.*** | linkedin.com/in/purbeshmitra | GitHub | Website
Summary
ML research engineer experienced in building LLM post-training pipelines (RLVR, self-distillation, LoRA SFT), distributed inference systems, and model compression for edge deployment. Published at IEEE venues; open-source projects with trained models on Hugging Face. Seeking ML Engineer and Applied Research Scientist roles.
Industry Experience
MediaTek USA May 2024 – August 2024
Research Intern, Software R&D Division San Jose, CA
• Built deep learning coordination models for WiFi-8 multi-access-point environments; scaled the system to handle action-space configurations spanning two orders of magnitude.
• Applied knowledge distillation and quantization to compress the trained model, achieving a 16× size reduction with 1% performance loss and enabling edge-device deployment.
Research Leadership
US Army Research Lab Project 2024 – Present
Technical Lead Adelphi, MD
• Leading development of an agentic AI framework for RF data applications in collaboration with ARL; designing an end-to-end system integrating LLM-based agents with domain-specific tools; supervising junior researchers at UMD.
Projects
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs [GitHub] 2025
• Built a GRPO-based reinforcement learning fine-tuning pipeline for multi-iteration LLM reasoning using Unsloth, TRL, and vLLM; outperformed standard GRPO on MATH500 and AIME24 benchmarks. Open-sourced trained models on Hugging Face.
Semantic Soft Bootstrapping [GitHub] 2025
• Developed an RL-free self-distillation post-training method using offline logit matching with LoRA adapters on Qwen2.5-3B; improved math reasoning on GSM8K and MATH500 with only 256 training samples. Open-sourced the trained model and dataset on Hugging Face.
Distributed Mixture-of-Agents for Edge LLM Inference [GitHub] 2024
• Implemented a distributed inference framework enabling collaborative LLM inference across edge devices with bounded latency guarantees; demonstrated accuracy–latency trade-offs for resource-constrained environments.
Decentralized Federated Learning with Gossip Protocols 2023 – 2024
• Designed decentralized learning algorithms with provable O(1) convergence guarantees; built a Bayesian optimization pipeline for fair scheduling in sparse gossip networks.
SOLOgenBench LLM Benchmark [GitHub] 2025
• Built an LLM benchmark resistant to training data contamination using generated word lists; evaluated 30+ models (GPT-4, Claude, Gemini, Grok, Llama, DeepSeek, Qwen) with rules-based Python evaluation; each run costs <$0.05.
Computer Vision for NEMS Fabrication, Indian Institute of Science Summer 2017
• Built a computer vision pipeline for automated monolayer detection; designed an Arduino-based microscope automation system integrating image processing and mechanical control.
Education
University of Maryland, College Park 2020 – Present
M.S. 2024; Ph.D. in Electrical and Computer Engineering College Park, MD
• Advisor: Prof. Sennur Ulukus. Focus: LLM post-training, reinforcement learning, distributed ML
Indian Institute of Technology Delhi 2018 – 2020
M.Tech. in Electrical Engineering New Delhi, India
Jadavpur University 2014 – 2018
B.E. in Electronics and Telecommunication Engineering Kolkata, India
Technical Skills
ML/AI: PyTorch, Hugging Face Transformers, TRL, Unsloth, vLLM, LoRA/QLoRA, GRPO, Knowledge Distillation, Bayesian Optimization, Reinforcement Learning, Agentic AI, Tool Calling, Probability Theory
Programming: Python, C, MATLAB
Tools & Infrastructure: HPC cluster, Hugging Face Hub (Models & Datasets), AWS SageMaker, Claude Code, Cursor
Selected Publications
P. Mitra, S. Ulukus, “Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without RL,” under review. [Link]
P. Mitra, S. Ulukus, “MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs,” IEEE ICMLA, 2025. [Link]
P. Mitra, P. Kaswan, S. Ulukus, “Distributed Mixture-of-Agents for Edge Inference with LLMs,” IEEE PIMRC, 2025. [Link]
Full list of 11 publications available on Google Scholar.