About

I am a researcher and developer with expertise in natural language processing and large language models. My work focuses on training intelligent LLM agents to solve real-world problems in domains such as coding and shopping. Check out my Google Scholar for publications.

Currently, I am training Amazon Rufus models and improving RL efficiency from both the algorithmic (off-policy learning, adaptive rollout) and systems (asynchronous RL) perspectives. I actively contribute to open-source RL projects.

Experience

  • Senior Applied Scientist @Amazon (present)
  • Research Intern @MSR and @Tencent AI
  • CS PhD @HKUST

Selected Publications

  • Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

    arXiv, 2026 · OpenKimi Blog

    Reproduces the Kimi K1.5/K2 RL algorithm and provides a theoretical understanding of PMD as implicit regularization in LLM post-training.

  • Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse

    arXiv, 2025 · Adaptive Rollout

    Dynamically allocates rollout budgets and reuses previously generated correct responses to improve RLVR sampling efficiency.

  • Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models

    NeurIPS, 2025 · Think-RM

    A framework for RL training of generative reward models enhanced with long-horizon reasoning.

  • WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

    EMNLP, 2025 · WebAgent-R1

    Trains web agents effectively via end-to-end multi-turn RL with verifiable rewards.