About Me
I am Yuyang Ding (丁誉洋), a third-year Ph.D. student at the Institute of Artificial Intelligence, Soochow University, advised by Assoc. Prof. Juntao Li and Prof. Min Zhang.
My research interests lie in RL for LLM Reasoning, particularly Agentic RL and Robust Reward Modeling.
I am currently a research intern at ByteDance Seed. My research focuses on the joint optimization of algorithms and infrastructure to enable scalable and effective reinforcement learning.
- On the algorithmic side, I work on scaling LLM Agents to effectively operate in more complex environments and real-world interactions.
- On the systems side, I focus on scaling reinforcement learning frameworks to efficiently run on large-scale computational resources.
Research Experience
Research Intern at ByteDance Seed (25.06 - Present)
- Maintainer and Core Contributor of LLM RL Library veRL, advised by Chi Zhang, Xibin Wu, and Haibin Lin
- Reinforcement Learning with Generative Rewards (25.06 - 25.10):
  - Algorithm Paper: FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
  - Infrastructure Design: Asynchronous Reward Loop Design for Efficient and Flexible Reward Computation
- Agentic Reinforcement Learning (25.10 - Present):
  - Ongoing research; stay tuned
PhD student at OpenNLG group (23.09 - Present)
- Advised by Juntao Li and Min Zhang
- LLM Reasoning (25.02 - Present):
  - SFT with Synthetic Data: ScaleQuest: Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch
  - Robust Process Reward Modeling: SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning
- Robust Learning in Traditional NLP Tasks (23.09 - 25.02):
Publications
* denotes equal contribution.

FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Yuyang Ding, Chi Zhang, Juntao Li, Haibin Lin, Xin Liu, Min Zhang
TL;DR: We propose Flawed-Aware Policy Optimization (FAPO), which penalizes flawed patterns to achieve more efficient and reliable reinforcement learning.

SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning
Yuyang Ding, Xinyu Shi, Juntao Li, Xiaobo Liang, Zhaopeng Tu, Min Zhang
TL;DR: We propose Self-Denoising Monte Carlo Annotation (SCAN), an efficient Process Reward Model (PRM) data synthesis and noise-tolerant learning framework.

ScaleQuest: Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch
Yuyang Ding, Xinyu Shi, Xiaobo Liang, Juntao Li, Zhaopeng Tu, Qiaoming Zhu, Min Zhang
TL;DR: We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.

DS-NER: Unveiling and Addressing Latent Noise in Distant Annotations
Yuyang Ding, Dan Qiao, Juntao Li, Jiajie Xu, Pingfu Chao, Xiaofang Zhou, Min Zhang
TL;DR: We investigated the noise distribution in distantly supervised annotations and proposed targeted denoising and robust training strategies.

GNER: Rethinking Negative Instances for Generative Named Entity Recognition
Yuyang Ding, Juntao Li, Pinzheng Wang, Zecheng Tang, Bowen Yan, Min Zhang
TL;DR: We introduce GNER, a Generative Named Entity Recognition framework, which demonstrates enhanced zero-shot capabilities across unseen entity domains.
- COLING 2022. SelfMix: Robust Learning Against Textual Label Noise with Self-Mixup Training. Dan Qiao*, Chenchen Dai*, Yuyang Ding*, Juntao Li, Qiang Chen, Wenliang Chen, Min Zhang
- SCIS (CCF-A). OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch. Juntao Li*, Zecheng Tang*, Yuyang Ding*, Pinzheng Wang*, Pei Guo, Wangjie You, Dan Qiao, Wenliang Chen, Guohong Fu, Qiaoming Zhu, Guodong Zhou, Min Zhang
- EMNLP 2023. CMD: a framework for Context-aware Model self-Detoxification. Zecheng Tang, Keyan Zhou, Juntao Li, Yuyang Ding, Pinzheng Wang, Yan Bowen, Renjie Hua, Min Zhang
Honors and Awards
- National Scholarship
- CCF Elite Collegiate Award
- ICPC National Invitational Programming Contest, Gold Medal
- ICPC Asia-East Continent Final Contest (EC-Final), Silver Medal
Education
- 2023.09 - current, PhD Student, Institute of Artificial Intelligence, Soochow University
- 2019.09 - 2023.06, B.Eng., School of Computer Science and Technology, Soochow University
Internships
- 2025.06 - current, Research Intern, ByteDance Seed, Shanghai, China