Basic Information

Yuanheng Zhu (朱圆恒)    Male    Institute of Automation, Chinese Academy of Sciences
Email: yuanheng.zhu@ia.ac.cn
Address: 95 Zhongguancun East Road, Beijing
Postal code: 100190

Research Interests

Multi-embodied intelligence, intelligent game decision-making, large language models, deep reinforcement learning, multi-agent reinforcement learning

Student Recruitment

My research topics include:

  • Multi-embodied intelligence
    Embodied intelligence, the latest direction combining robotics with modern artificial intelligence, emphasizes an embodied agent (such as a robot or virtual character) that autonomously perceives, decides, and acts in a physical or simulated environment. Compared with a single embodied agent, a group of homogeneous or heterogeneous embodied agents can complete tasks cooperatively or competitively in a shared environment, jointly adapting to complex, dynamic surroundings through communication, collaboration, or game play, and thereby significantly improving task efficiency, system robustness, and collective intelligence.
    Using wheeled and wheel-legged physical robots as carriers, we are building multi-embodied-agent systems that progress from virtual games to indoor environments to open outdoor settings, cooperatively completing embodied tasks such as pursuit, patrol, search and rescue, and transportation, and pushing general artificial intelligence toward real-world deployment.
      

  • Multi-agent reinforcement learning
    Multi-agent reinforcement learning (MARL), a key technology behind AlphaGo and AlphaStar, has achieved remarkable success. Compared with single-agent reinforcement learning, MARL faces challenges such as non-stationary dynamics, partial observability, credit assignment, and scalability. In particular, multi-agent systems trained in closed virtual game environments additionally face joint dynamics and uncertainty, both within the system and from the external environment, when deployed in the open real world. MARL therefore needs to be combined with advanced AI techniques, including meta-learning, world models, continual learning, and large models, to improve autonomous decision-making in open environments (see the first sketch after this list).
       

  • RL post-training and agents for large models
    The success of large language models such as ChatGPT and DeepSeek has greatly advanced artificial general intelligence. Reinforcement learning plays a major role in both preference alignment and post-training of large language models, and building general-purpose agents on top of LLMs has become one of the frontiers of AI. Our focus is on improving the efficiency of RL post-training, lowering its training difficulty, and eliciting broader intelligence beyond domains such as mathematics and code. We are also working on improving the generality and application scope of LLM agents (see the second sketch after this list).
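As a toy illustration of the non-stationarity challenge mentioned in the multi-agent reinforcement learning item above, the sketch below runs two independent Q-learners on a repeated 2x2 coordination game. The payoff matrix, hyperparameters, and setup are invented for illustration only and are not taken from any of our papers.

```python
import numpy as np

# A hypothetical 2x2 cooperative matrix game: both agents receive the same payoff.
# Rows = agent 1's action, columns = agent 2's action.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 0.5]])  # two coordination points; (0, 0) is the better one

rng = np.random.default_rng(0)
q1 = np.zeros(2)      # independent Q-values of agent 1 (one per own action)
q2 = np.zeros(2)      # independent Q-values of agent 2
alpha, eps = 0.1, 0.2  # learning rate and exploration rate

def act(q):
    # epsilon-greedy over the agent's own actions
    return rng.integers(2) if rng.random() < eps else int(np.argmax(q))

for t in range(5000):
    a1, a2 = act(q1), act(q2)
    r = PAYOFF[a1, a2]                 # shared team reward
    # Each agent treats the other agent as part of the environment,
    # which is exactly what makes the learning problem non-stationary.
    q1[a1] += alpha * (r - q1[a1])
    q2[a2] += alpha * (r - q2[a2])

print("agent 1 Q-values:", q1)
print("agent 2 Q-values:", q2)
```

Whether the two learners settle on the better coordination point depends on exploration and initialization, which is one reason MARL research studies centralized training, credit assignment, and communication on top of such independent learners.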
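For the RL post-training item, the following is a minimal sketch of the group-relative advantage idea popularized by GRPO-style post-training of language models, assuming scalar rewards from a verifier. The reward values and group size are hypothetical; a real pipeline would feed these advantages into a clipped policy-gradient loss over token log-probabilities, typically with a KL penalty toward the reference model.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within a group of sampled responses to one prompt.

    Each response's advantage is its reward minus the group mean, divided by
    the group standard deviation, so no separate value network is needed.
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical scalar rewards (e.g., from a verifier on a math problem)
# for four sampled responses to the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_relative_advantages(rewards)
print(adv)  # responses judged correct get positive advantages

# In RL post-training, every token of response i would then be weighted by
# adv[i] inside a PPO/GRPO-style surrogate objective.
```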
      

I recruit one or two master's students each year. Applicants with backgrounds in artificial intelligence, robotics, automatic control, computer science, electronics, or mathematics are welcome to contact me by email at yuanheng.zhu@ia.ac.cn.


NEWS!

Runyu received the IEEE CIS Graduate Student Research Grant (5 recipients worldwide each year)

Three papers accepted at ICLR 2025

  • Runyu's Divergence-Regularized Discounted Aggregation: Equilibrium Finding in Multiplayer Partially Observable Stochastic Games
  • Yuqian's INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning
  • Jiajun's Empowering LLM Agents with Zero-Shot Optimal Decision-Making through Q-learning [https://github.com/laq2024/MLAQ]

Two papers accepted at ICML 2025

  • Runyu's Constrained Exploitability Descent: An Offline Reinforcement Learning Method for Finding Mixed-Strategy Nash Equilibrium
  • Kaixuan's DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

One paper accepted in IEEE TNNLS 2025

Education

2010-09 to 2015-07   Institute of Automation, Chinese Academy of Sciences   Ph.D.
2006-09 to 2010-07   Nanjing University   B.S.

Work Experience

2017-10 to present    Institute of Automation, Chinese Academy of Sciences, Associate Professor
2017-12 to 2018-12    University of Rhode Island, USA, Visiting Scholar
2015-07 to 2017-10    Institute of Automation, Chinese Academy of Sciences, Assistant Professor

Professional Service
2024-01 to present, IEEE Transactions on Games, Associate Editor
2024-01 to present, IEEE Computational Intelligence Society, Education Competition Subcommittee Chair
2022-12 to 2028-04, IEEE Transactions on Neural Networks and Learning Systems, Associate Editor
2022-08-20 to 2022-08-23, 2022 IEEE Conference on Games, Program Co-Chair
2019-04 to present, Technical Committee on Data-Driven Control, Learning and Optimization, Chinese Association of Automation, Member
2017-09 to present, Technical Committee on Adaptive Dynamic Programming and Reinforcement Learning, Chinese Association of Automation, Member

Awards

  • Supported by the Beijing Nova Program of Science and Technology (2024).

  • The project "Efficient Deep Reinforcement Learning Algorithms and Optimality Analysis" received the Second Prize of the Natural Science Award of the Beijing Science and Technology Awards (ranked 2nd).

  • The project "Learning Control Theory and Methods for Intelligent Unmanned Systems under Constraints" received the Second Prize of the Tianjin Natural Science Award (ranked 2nd).

  • Three journal paper awards: the IEEE TASE 2022 Best Paper Award (the only one that year), the IEEE TETCI 2022 Outstanding Paper Award (the only one that year), and the Control Theory & Applications 2017 Outstanding Paper Award.

  • Led team members to win first prizes or championships seven times in domestic and international competitions, including the CoG Fighting Game AI Competition, the RoboMaster AI Challenge, the China AI+ Innovation and Entrepreneurship Competition, and the SSCAIT StarCraft ladder.

  • Listed among the world's top 2% scientists in the Stanford University ranking (2022, 2023); IEEE Senior Member; member of the Youth Innovation Promotion Association, Chinese Academy of Sciences.

Books

Game Artificial Intelligence Methods (游戏人工智能方法). Dongbin Zhao, Yuanheng Zhu, Zhentao Tang, Kun Shao. Science Press.

Intelligent Connected Vehicles: Decision and Control Technology (智能网联汽车—决策控制技术). Dongbin Zhao, Qichao Zhang, Yuanheng Zhu, Dong Li. China Communications Press.

Papers
[1] Fu, Yuqian; Zhu, Yuanheng; Chai, Jiajun; Zhao, Dongbin. LDR: Learning Discrete Representation to Improve Noise Robustness in Multiagent Tasks. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025, 55(1): 513-525. 2nd author, corresponding author. http://dx.doi.org/10.1109/TSMC.2024.3487535
[2] Hu, Guangzheng; Zhu, Yuanheng; Li, Haoran; Zhao, Dongbin. FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8(6): 4033-4045. 2nd author, corresponding author. http://dx.doi.org/10.1109/TETCI.2024.3383454
[3] Li, Haoran; Zhang, Yaocheng; Wen, Haowei; Zhu, Yuanheng; Zhao, Dongbin. Stabilizing Diffusion Model for Robotic Control with Dynamic Programming and Transition Feasibility. IEEE Transactions on Artificial Intelligence, 2024. 4th author, corresponding author.
[4] Li, Boyu; Li, Haoran; Zhu, Yuanheng; Zhao, Dongbin. MAT: Morphological Adaptive Transformer for Universal Morphology Policy Learning. IEEE Transactions on Cognitive and Developmental Systems, 2024. 3rd author.
[5] Li, Luntong; Zhu, Yuanheng. Boosting On-Policy Actor-Critic With Shallow Updates in Critic. IEEE Transactions on Neural Networks and Learning Systems, 2024. 2nd author, corresponding author. http://dx.doi.org/10.1109/TNNLS.2024.3378913
[6] Chai, Jiajun; Zhu, Yuanheng; Zhao, Dongbin. NVIF: Neighboring Variational Information Flow for Cooperative Large-Scale Multiagent Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(12): 17829-17841. 2nd author, corresponding author. http://dx.doi.org/10.1109/TNNLS.2023.3309608
[7] Zhu, Yuanyang; Wang, Zhi; Zhu, Yuanheng; Chen, Chunlin; Zhao, Dongbin. Discretizing Continuous Action Space With Unimodal Probability Distributions for On-Policy Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 2024. 3rd author. http://dx.doi.org/10.1109/TNNLS.2024.3446371
[8] Zhu, Yuanheng; Li, Weifan; Zhao, Mengchen; Hao, Jianye; Zhao, Dongbin. Empirical Policy Optimization for n-Player Markov Games. IEEE Transactions on Cybernetics, 2023, 53(10): 6443-6455. 1st author. http://dx.doi.org/10.1109/TCYB.2022.3179775
[9] Liu, Minsong; Li, Luntong; Hao, Shuai; Zhu, Yuanheng; Zhao, Dongbin. Soft Contrastive Learning With Q-Irrelevance Abstraction for Reinforcement Learning. IEEE Transactions on Cognitive and Developmental Systems, 2023, 15(3): 1463-1473. 4th author. http://dx.doi.org/10.1109/TCDS.2022.3218940
[10] Hu, Guangzheng; Zhu, Yuanheng; Zhao, Dongbin; Zhao, Mengchen; Hao, Jianye. Event-Triggered Communication Network With Limited-Bandwidth Constraint for Multi-Agent Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(8): 3966-3978. 2nd author, corresponding author. http://dx.doi.org/10.1109/TNNLS.2021.3121546
[11] Chai, Jiajun; Chen, Wenzhang; Zhu, Yuanheng; Yao, Zongxin; Zhao, Dongbin. A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(9): 5417-5429. 3rd author, corresponding author. http://dx.doi.org/10.1109/TSMC.2023.3270444
[12] Chai, Jiajun; Li, Weifan; Zhu, Yuanheng; Zhao, Dongbin; Ma, Zhe; Sun, Kewu; Ding, Jishiyu. UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(4): 2093-2104. 3rd author. http://dx.doi.org/10.1109/TNNLS.2021.3105869
[13] Tang, Zhentao; Zhu, Yuanheng; Zhao, Dongbin; Lucas, Simon M. Enhanced Rolling Horizon Evolution Algorithm With Opponent Model Learning: Results for the Fighting Game AI Competition. IEEE Transactions on Games, 2023, 15(1): 5-15. 2nd author. http://dx.doi.org/10.1109/TG.2020.3022698
[14] Hu, Guangzheng; Li, Haoran; Liu, Shasha; Zhu, Yuanheng; Zhao, Dongbin. NeuronsMAE: A Novel Multi-Agent Reinforcement Learning Environment for Cooperative and Competitive Multi-Robot Tasks. 2023 International Joint Conference on Neural Networks (IJCNN), 2023. 4th author.
[15] Zhu, Yuanheng; Zhao, Dongbin. Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(3): 1228-1241. 1st author. http://dx.doi.org/10.1109/TNNLS.2020.3041469
[16] Tang, Zhentao; Liang, Rongqin; Zhu, Yuanheng; Zhao, Dongbin. Intelligent decision-making methods for real-time fighting games (实时格斗游戏的智能决策方法). Control Theory & Applications, 2022, 39(6): 969-985. 3rd author. https://d.wanfangdata.com.cn/periodical/kzllyyy202206001
[17] Li, Weifan; Zhu, Yuanheng; Zhao, Dongbin. Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target. Complex & Intelligent Systems, 2022, 8(2): 1205-1216. 2nd author, corresponding author. http://dx.doi.org/10.1007/s40747-021-00577-6
[18] Zhu, Yuanheng; Zhao, Dongbin; He, Haibo. Optimal Feedback Control of Pedestrian Flow in Heterogeneous Corridors. IEEE Transactions on Automation Science and Engineering, 2021, 18(3): 1097-1108. 1st author, corresponding author. http://dx.doi.org/10.1109/TASE.2020.2996018
[19] Yang, Xiong; Zhu, Yuanheng; Dong, Na; Wei, Qinglai. Decentralized Event-Driven Constrained Control Using Adaptive Critic Designs. IEEE Transactions on Neural Networks and Learning Systems, 2021. 2nd author.
[20] Zhu, Yuanheng; He, Haibo; Zhao, Dongbin. LMI-Based Synthesis of String-Stable Controller for Cooperative Adaptive Cruise Control. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(11): 4516-4525. 1st author. http://dx.doi.org/10.1109/TITS.2019.2935510
[21] Zhu, Yuanheng; Zhao, Dongbin; He, Haibo. Synthesis of Cooperative Adaptive Cruise Control With Feedforward Strategies. IEEE Transactions on Vehicular Technology, 2020, 69(4): 3615-3627. 1st author, corresponding author. http://dx.doi.org/10.1109/TVT.2020.2974932
[22] Shao, Kun; Zhu, Yuanheng; Tang, Zhentao; Zhao, Dongbin. Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward. 2020 International Joint Conference on Neural Networks (IJCNN), 2020. 2nd author.
[23] Liu, Minsong; Zhu, Yuanheng; Zhao, Dongbin. An Improved Minimax-Q Algorithm Based on Generalized Policy Iteration to Solve a Chaser-Invader Game. 2020 International Joint Conference on Neural Networks (IJCNN), 2020. 2nd author.
[24] Zhu, Yuanheng; Zhao, Dongbin; He, Haibo. Invariant Adaptive Dynamic Programming for Discrete-Time Optimal Control. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(11): 3959-3971. 1st author. http://dx.doi.org/10.1109/TSMC.2019.2911900
[25] Zhu, Yuanheng; Zhao, Dongbin; Li, Xiangjun; Wang, Ding. Control-Limited Adaptive Dynamic Programming for Multi-Battery Energy Storage Systems. IEEE Transactions on Smart Grid, 2019, 10(4): 4235-4244. 1st author. https://www.webofscience.com/wos/woscc/full-record/WOS:000472577500065
[26] Shao, Kun; Zhu, Yuanheng; Zhao, Dongbin. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning. IEEE Transactions on Emerging Topics in Computational Intelligence, 2019, 3(1): 73-84. 2nd author. http://dx.doi.org/10.1109/TETCI.2018.2823329
[27] Zhu, Yuanheng; He, Haibo; Zhao, Dongbin; Hou, Zhongsheng. Optimal Pedestrian Evacuation in Building with Consecutive Differential Dynamic Programming. 2019 International Joint Conference on Neural Networks (IJCNN), 2019. 1st author, corresponding author.
[28] Zhu, Yuanheng. 基于深度和强化学习对开源赛车仿真器的视觉驾驶 (vision-based driving of an open-source racing simulator with deep and reinforcement learning). Journal of Ambient Intelligence and Humanized Computing, 2019. 1st author.
[29] Zhu, Yuanheng; Zhao, Dongbin; Zhong, Zhiguang. Adaptive Optimal Control of Heterogeneous CACC System With Uncertain Dynamics. IEEE Transactions on Control Systems Technology, 2019, 27(4): 1772-1779. 1st author.
[30] Tang, Zhentao; Shao, Kun; Zhu, Yuanheng; Li, Dong; Zhao, Dongbin; Huang, Tingwen. A Review of Computational Intelligence for StarCraft AI. 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018: 1167-1173. 3rd author. http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800159
[31] Zhu, Yuanheng; Zhao, Dongbin. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. Artificial Intelligence Review, 2018, 49(4): 531-547. 1st author. https://link.springer.com/article/10.1007/s10462-017-9548-4
[32] Li, Dong; Zhao, Dongbin; Zhang, Qichao; Zhu, Yuanheng. An Autonomous Driving Experience Platform with Learning-Based Functions. 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018: 1174-1179. 4th author. http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800160
[33] Zhu, Yuanheng; Li, Nannan; Shao, Kun; Zhao, Dongbin. Learning battles in ViZDoom via deep reinforcement learning. 2018. 1st author. http://ir.ia.ac.cn/handle/173211/23364
[34] Zhu, Yuanheng; Zhao, Dongbin; Yang, Xiong; Zhang, Qichao. Policy Iteration for H-infinity Optimal Control of Polynomial Nonlinear Systems via Sum of Squares Programming. IEEE Transactions on Cybernetics, 2018, 48(2): 500-509. 1st author, corresponding author. https://www.webofscience.com/wos/woscc/full-record/WOS:000422925700005
[35] Shao, Kun; Zhao, Dongbin; Zhu, Yuanheng; Zhang, Qichao. Visual Navigation with Actor-Critic Deep Reinforcement Learning. 2018 International Joint Conference on Neural Networks (IJCNN), 2018. 3rd author. https://webofscience.clarivate.cn/wos/woscc/full-record/WOS:000585967404004
[36] Zhu, Yuanheng; Zhao, Dongbin; He, Haibo; Ji, Junhong. Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming. IEEE Transactions on Industrial Electronics, 2017, 64(5): 4101-4109. 1st author, corresponding author. https://www.webofscience.com/wos/woscc/full-record/WOS:000399674000064
[37] Yang, Xiong; He, Haibo; Liu, Derong; Zhu, Yuanheng. Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems. IET Control Theory and Applications, 2017, 11(14): 2307-2316. 4th author. https://www.webofscience.com/wos/woscc/full-record/WOS:000409425700015
[38] Zhang, Qichao; Zhao, Dongbin; Zhu, Yuanheng. Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs. Neurocomputing, 2017, 238: 377-386. 3rd author. http://dx.doi.org/10.1016/j.neucom.2017.01.076
[39] Zhu, Yuanheng; Zhao, Dongbin; Shao, Kun. Cooperative Reinforcement Learning for Multiple Units Combat in StarCraft. 2017. 1st author. http://ir.ia.ac.cn/handle/173211/15399
[40] Tang, Zhentao; Shao, Kun; Zhao, Dongbin; Zhu, Yuanheng. Progress of deep reinforcement learning: from AlphaGo to AlphaGo Zero (深度强化学习进展: 从AlphaGo到AlphaGo Zero). Control Theory & Applications, 2017, 34(12): 1529-1546. 4th author. http://lib.cqvip.com/Qikan/Article/Detail?id=7000480876
[41] Zhu, Yuanheng; Zhao, Dongbin; Li, Xiangjun. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(3): 714-725. 1st author, corresponding author. https://www.webofscience.com/wos/woscc/full-record/WOS:000395980500020
[42] Zhang, Qichao; Zhao, Dongbin; Zhu, Yuanheng. Event-Triggered H-infinity Control for Continuous-Time Nonlinear System via Concurrent Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017, 47(7): 1071-1081. 3rd author. https://www.webofscience.com/wos/woscc/full-record/WOS:000404354600004
[43] Zhu, Yuanheng; Zhao, Dongbin; Li, Xiangjun. Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics. IET Control Theory and Applications, 2016, 10(12): 1339-1347. 1st author.
[44] Zhu, Yuanheng; Chen, Xi; Zhao, Dongbin; Zhang, Qichao. Model-free reinforcement learning for nonlinear zero-sum games with simultaneous explorations. 2016. 1st author. http://ir.ia.ac.cn/handle/173211/14340
[45] Tang, Zhentao; Shao, Kun; Zhao, Dongbin; Zhu, Yuanheng. Move Prediction in Gomoku Using Deep Learning. 2016. 4th author. http://ir.ia.ac.cn/handle/173211/15673
[46] Zhao, Dongbin; Wang, Haitao; Shao, Kun; Zhu, Yuanheng. Deep Reinforcement Learning with Experience Replay Based on SARSA. 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 2016. 4th author. http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000400488300013
[47] Zhao, Dongbin; Zhang, Qichao; Wang, Ding; Zhu, Yuanheng. Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics. IEEE Transactions on Cybernetics, 2016, 46(3): 854-865. 4th author, corresponding author. https://www.webofscience.com/wos/woscc/full-record/WOS:000370963500023
[48] Zhao, Dongbin; Zhu, Yuanheng. A probably-approximately-correct reinforcement learning algorithm for continuous-state control problems (概率近似正确的强化学习算法解决连续状态空间控制问题). Control Theory & Applications, 2016, 33(12): 1603-1613. 2nd author. http://lib.cqvip.com/Qikan/Article/Detail?id=7000119656
[49] Zhu, Yuanheng; Zhao, Dongbin; He, Haibo; Ji, Junhong. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. Cognitive Computation, 2015, 7(6): 763-771. 1st author. http://ir.ia.ac.cn/handle/173211/10525
[50] Zhao, Dongbin; Zhu, Yuanheng. MEC-A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(2): 346-356. 2nd author. http://www.irgrid.ac.cn/handle/1471x/980893
[51] Zhu, Yuanheng; Zhao, Dongbin. A data-based online reinforcement learning algorithm satisfying probably approximately correct principle. Neural Computing & Applications, 2015, 26(4): 775-787. 1st author. http://www.irgrid.ac.cn/handle/1471x/980902
[52] Zhao, Dongbin; Zhu, Yuanheng. Model-Free Adaptive Algorithm for Optimal Control of Continuous-Time Nonlinear System. 2015. 2nd author. http://ir.ia.ac.cn/handle/173211/15282
[53] Li, Dong; Zhao, Dongbin; Zhu, Yuanheng; Xia, Zhongpu. Thermal Comfort Control Based on MEC Algorithm for HVAC Systems. 2015 International Joint Conference on Neural Networks (IJCNN), 2015. 3rd author.
[54] Zhu, Yuanheng; Zhao, Dongbin; Liu, Derong. Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems. Neurocomputing, 2015, 149: 124-131. 1st author. http://dx.doi.org/10.1016/j.neucom.2013.11.055
[55] Zhu, Yuanheng; Zhao, Dongbin. A data-based online reinforcement learning algorithm with high-efficient exploration. 2014. 1st author. http://ir.ia.ac.cn/handle/173211/15283
[56] Zhao, Dongbin; Hu, Zhaohui; Xia, Zhongpu; Alippi, Cesare; Zhu, Yuanheng; Wang, Ding. Full-range adaptive cruise control based on supervised adaptive dynamic programming. Neurocomputing, 2014, 125: 57-67. 5th author. http://dx.doi.org/10.1016/j.neucom.2012.09.034
[57] Zhu, Yuanheng; Zhao, Dongbin; He, Haibo. An high-efficient online reinforcement learning algorithm for continuous-state systems. World Congress on Intelligent Control and Automation (WCICA 2014), 2014: 581-586. 1st author. http://www.irgrid.ac.cn/handle/1471x/973405
[58] Zhu, Yuanheng; Zhao, Dongbin. Online reinforcement learning for continuous-state systems. Frontiers of Intelligent Control and Information Processing, 2014. 1st author. http://ir.ia.ac.cn/handle/173211/15280
[59] Zhao, Dongbin; Zhu, Yuanheng. Online Model-Free RLSPI Algorithm for Nonlinear Discrete-Time Non-affine Systems. 2013. 2nd author. http://ir.ia.ac.cn/handle/173211/15281
[60] Zhu, Yuanheng; Zhao, Dongbin; He, Haibo. Integration of fuzzy controller with adaptive dynamic programming. 10th World Congress on Intelligent Control and Automation (WCICA 2012), 2012: 310-315. 1st author. http://www.irgrid.ac.cn/handle/1471x/973407
[61] Zhao, Dongbin; Zhu, Yuanheng; He, Haibo. Neural and Fuzzy Dynamic Programming for Under-actuated Systems. 2012 International Joint Conference on Neural Networks (IJCNN), 2012. 2nd author. http://www.irgrid.ac.cn/handle/1471x/973390

Research Activities

Research Projects
(1) Research on general-purpose decision-making large models for multi-agent systems, PI, local program, 2024-10 to 2027-09
(2) Fundamental theory of multi-agent adversarial games in open environments, PI, national program, 2023-01 to 2027-12
(3) Theory and methods of multi-agent reinforcement learning for adversarial games in dynamically changing scenarios, PI, local program, 2023-01 to 2025-12
(4) Multi-task multi-agent reinforcement learning: theory and applications, participant, national program, 2022-01 to 2026-12
(5) Multi-agent deep reinforcement learning, PI, Chinese Academy of Sciences program, 2022-01 to 2024-12

Collaborations

Project partners

电科院, 航天二院, Huawei, 沈飞601, 航天二院未来实验室, 超参数


Students

Graduated students

刘元勋  Master's student  081203 Computer Applied Technology

黄上京  Master's student  081104 Pattern Recognition and Intelligent Systems

Current students

傅宇千  Ph.D. student  081104 Pattern Recognition and Intelligent Systems

陈文章  Master's student  081101 Control Theory and Control Engineering

左斌斌  Master's student  081101 Control Theory and Control Engineering

赵梓轩  Master's student  081203 Computer Applied Technology

Co-supervised students

Student     Degree            Period              Destination after graduation

邵坤        M.S.-Ph.D.        2014.9-2019.7       Huawei

唐振韬      Direct Ph.D.      2016.9-2022.7       Huawei

李伟凡      Ph.D.             2018.9-2023.7       智能研究院

刘元勋      Master's          2020.9-2023.7       Ministry of Foreign Affairs

胡光政      Ph.D.             2019.9-2024.7       Meituan

刘民颂      M.S.-Ph.D.        2018.9-2024.7       Academy of Military Sciences

黄上京      Master's          2021.9-2024.7       Kuaishou

柴嘉骏      Direct Ph.D.      2019.9-2025.7       Meituan (Beidou Program)

陈文章      Master's          2022.9-2025.7       Intel

陆润宇      Direct Ph.D.      2021.9-present

赵子杰      Direct Ph.D.      2022.9-present

徐凯旋      Direct Ph.D.      2023.9-present