🎓 About Me

I am Yuyang Ding (丁誉洋), a third-year Ph.D. student at the Institute of Artificial Intelligence, Soochow University, advised by Assoc. Prof. Juntao Li and Prof. Min Zhang.

My research interests lie in RL for LLM reasoning, particularly agentic RL and robust reward modeling.

I am currently a research intern at ByteDance Seed, where I work on the joint optimization of algorithms and infrastructure to enable scalable and effective reinforcement learning.

Research Experience

Research Intern at ByteDance Seed (25.06 - Present)

Maintainer and core contributor of the LLM RL library veRL, advised by Chi Zhang and Xibin Wu
Reinforcement Learning with Generative Rewards (25.06 - 25.10):
- Algorithm Paper: FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
- Infrastructure Design: Asynchronous Reward Loop for efficient and flexible reward computation
Agentic Reinforcement Learning (25.10 - Present):
- Ongoing research; stay tuned.

Ph.D. Student at OpenNLG Group (23.09 - Present)

Advised by Juntao Li and Min Zhang
LLM Reasoning (25.02 - Present):
- SFT with Synthetic Data: ScaleQuest: Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch
- Robust Process Reward Modeling: SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning
Robust Learning in Traditional NLP Tasks (23.09 - 25.02):
- Robust NER: GNER, DS-NER; Robust Classification: SelfMix; Robust QA: COLDQA.

📝 Publications

* denotes equal contribution.

ICLR 2026

FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

Yuyang Ding, Chi Zhang, Juntao Li, Haibin Lin, Min Zhang

TL;DR: We propose Flawed-Aware Policy Optimization (FAPO), which penalizes flawed patterns to achieve more efficient and reliable reinforcement learning.

NeurIPS 2025

SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning

Yuyang Ding, Xinyu Shi, Juntao Li, Xiaobo Liang, Zhaopeng Tu, Min Zhang

TL;DR: We propose Self-Denoising Monte Carlo Annotation (SCAN), an efficient Process Reward Model (PRM) data synthesis and noise-tolerant learning framework.

ACL 2025

ScaleQuest: Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch

Yuyang Ding, Xinyu Shi, Xiaobo Liang, Juntao Li, Zhaopeng Tu, Qiaoming Zhu, Min Zhang

TL;DR: We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.

IEEE TKDE

DS-NER: Unveiling and Addressing Latent Noise in Distant Annotations

Yuyang Ding, Dan Qiao, Juntao Li, Jiajie Xu, Pingfu Chao, Xiaofang Zhou, Min Zhang

TL;DR: We investigate the noise distribution in distantly supervised annotations and propose targeted denoising and robust training strategies.

ACL 2024

GNER: Rethinking Negative Instances for Generative Named Entity Recognition

Yuyang Ding, Juntao Li, Pinzheng Wang, Zecheng Tang, Bowen Yan, Min Zhang

TL;DR: We introduce GNER, a Generative Named Entity Recognition framework, which demonstrates enhanced zero-shot capabilities across unseen entity domains.

COLING 2022

SelfMix: Robust Learning Against Textual Label Noise with Self-Mixup Training

Dan Qiao*, Chenchen Dai*, Yuyang Ding*, Juntao Li, Qiang Chen, Wenliang Chen, Min Zhang

SCIS (CCF-A)

OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

Juntao Li*, Zecheng Tang*, Yuyang Ding*, Pinzheng Wang*, Pei Guo, Wangjie You, Dan Qiao, Wenliang Chen, Guohong Fu, Qiaoming Zhu, Guodong Zhou, Min Zhang

EMNLP 2023

CMD: a framework for Context-aware Model self-Detoxification

Zecheng Tang, Keyan Zhou, Juntao Li, Yuyang Ding, Pinzheng Wang, Bowen Yan, Renjie Hua, Min Zhang

EMNLP 2022

COLDQA: Robust Question Answering against Distribution Shifts with Test-Time Adaptation: An Empirical Study

Hai Ye, Yuyang Ding, Juntao Li, Hwee Tou Ng

🎖 Honors and Awards

National Scholarship
CCF Elite Collegiate Award
ICPC National Invitational Programming Contest, Gold Medal
ICPC Asia-East Continent Final Contest (EC-Final), Silver Medal

📖 Education

2023.09 - Present, Ph.D. Student, Institute of Artificial Intelligence, Soochow University
2019.09 - 2023.06, B.Eng., School of Computer Science and Technology, Soochow University

💻 Internships

2025.06 - Present, Research Intern, ByteDance Seed, Shanghai, China