About
I am a researcher and developer with expertise in natural language processing and large language models. My work focuses on training intelligent LLM agents to solve real-world problems in domains such as coding and shopping. Check out my Google Scholar for publications.
Currently, I am training Amazon Rufus models and improving RL efficiency from both algorithmic (off-policy learning, adaptive rollout) and systems (asynchronous RL) perspectives. I actively contribute to open-source RL projects.
Experience
- Senior Applied Scientist @Amazon (present)
- Research Intern @MSR and @Tencent AI
- CS PhD @HKUST
Selected Publications
- Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training
  Reproduces the Kimi K1.5/K2 RL algorithms and provides a theoretical understanding of PMD as implicit regularization in LLM post-training.
- Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse
  Dynamically allocates rollout budgets and reuses previous correct responses to improve RLVR sampling efficiency.
- Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
  A novel framework for RL training of generative reward models enhanced with reasoning processes.
- WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
  Effective web-agent training with multi-turn RL and verifiable rewards.