Basic Information

Yuanheng Zhu  Male    Institute of Automation, Chinese Academy of Sciences
Email: yuanheng.zhu@ia.ac.cn
Mailing address: No. 95 Zhongguancun East Road, Beijing
Postal code: 100190

Research Areas

Intelligent game decision-making, deep reinforcement learning, multi-agent reinforcement learning, game AI

Admissions

I admit 1-2 master's students each year. Applicants with backgrounds in artificial intelligence, automatic control, computer science, electronics, or mathematics are welcome to apply and may contact me by email at yuanheng.zhu@ia.ac.cn.

Admission Programs
081104 - Pattern Recognition and Intelligent Systems
081101 - Control Theory and Control Engineering
Admission Research Directions
Multi-agent reinforcement learning, multi-agent adversarial games, deep reinforcement learning

Education

2010-09 to 2015-07   Institute of Automation, Chinese Academy of Sciences   Ph.D.
2006-09 to 2010-07   Nanjing University   Bachelor's degree

工作经历

2017-10~现在, 中国科学院自动化研究所, 副研究员
2017-12~2018-12,美国罗德岛大学, 访问学者
2015-07~2017-10,中国科学院自动化研究所, 助理研究员

Professional Service
2024-01-01 to present   IEEE Transactions on Games, Associate Editor
2024-01-01 to present   IEEE Computational Intelligence Society, Education Competition Subcommittee Chair
2022-12-31 to 2028-04-30   IEEE Transactions on Neural Networks and Learning Systems, Associate Editor
2022-08-20 to 2022-08-23   2022 IEEE Conference on Games, Program Co-Chair
2019-04-22 to present   Technical Committee on Data-Driven Control, Learning and Optimization, Chinese Association of Automation, Member
2017-09-29 to present   Technical Committee on Adaptive Dynamic Programming and Reinforcement Learning, Chinese Association of Automation, Member

Courses Taught

Reinforcement Learning
Intelligent Control

Patents and Awards

  • The project "Efficient Deep Reinforcement Learning Algorithms and Optimality Analysis" won the Second Prize (Natural Science) of the Beijing Science and Technology Award (ranked 2nd).

  • The project "Learning Control Theory and Methods for Intelligent Unmanned Systems under Constrained Conditions" won the Second Prize of the Tianjin Natural Science Award (ranked 2nd).

  • Three papers received journal paper awards: the 2022 IEEE TASE Best Paper Award (the sole award that year), the 2022 IEEE TETCI Outstanding Paper Award (the sole award that year), and the 2017 Annual Outstanding Paper Award of Control Theory & Applications.

  • Led team members to win first prizes or championships seven times in international and domestic competitions, including the CoG Fighting Game AI Competition, the RoboMaster AI Challenge, the China "AI+" Innovation and Entrepreneurship Competition, and the SSCAIT StarCraft ladder.

  • Selected for Stanford University's list of the world's top 2% scientists (2022, 2023); IEEE Senior Member; member of the Youth Innovation Promotion Association of the Chinese Academy of Sciences.

Awards
(1) Tianjin Science and Technology Award (Natural Science), Second Prize, provincial level, 2023
(2) IEEE TETCI Outstanding Paper Award, other, 2022
(3) IEEE Transactions on Automation Science and Engineering Best Paper Award, other, 2022
(4) Excellent Graduate Course of the University of Chinese Academy of Sciences (school level), institute/university level, 2022
(5) Excellent Graduate Course of the University of Chinese Academy of Sciences (university level), institute/university level, 2022
(6) Beijing Science and Technology Award (Natural Science), Second Prize, provincial level, 2022
(7) Annual Outstanding Paper Award of Control Theory & Applications, other, 2017
Patents
(1) Multi-robot cooperative confrontation method, apparatus, electronic device, and storage medium, invention patent, 2022, 3rd inventor, Patent No.: CN113894780A

(2) Reinforcement-learning-based multi-agent control method and apparatus for changing environments, invention patent, 2021, 1st inventor, Patent No.: CN113837348A

(3) Missile guidance method and apparatus based on reinforcement learning, invention patent, 2021, 1st inventor, Patent No.: CN113239472A

(4) Multi-agent deep reinforcement learning method and system based on counterfactual returns, invention patent, 2020, 3rd inventor, Patent No.: CN111105034A

(5) Cooperative adaptive cruise control method for heterogeneous platoons based on acceleration feedforward, granted patent, 2021, 1st inventor, Patent No.: CN110888322B

(6) Lane-keeping method and system for intelligent driving, granted patent, 2019, 5th inventor, Patent No.: CN109466552A

(7) Optimal control method, system, and storage medium for multi-battery energy storage systems, invention patent, 2019, 1st inventor, Patent No.: CN109245196A

(8) Robust tracking control method for spring-mass-damper systems, invention patent, 2018, 3rd inventor, Patent No.: CN108303876A

(9) Method and system for detecting abnormal charging/discharging behavior of energy storage batteries, granted patent, 2016, 3rd inventor, Patent No.: CN106154180A

(10) Data-based Q-function adaptive dynamic programming method, invention patent, 2013, 2nd inventor, Patent No.: CN103217899A

(11) Control method for a coal gasifier, invention patent, 2012, 5th inventor, Patent No.: CN102799748A

(12) Fuzzy adaptive dynamic programming method, invention patent, 2012, 2nd inventor, Patent No.: CN102645894A

Publications

   
Papers
[1] Hu, Guangzheng, Zhu, Yuanheng, Li, Haoran, Zhao, Dongbin. FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE. 2024, 2nd author, corresponding author, http://dx.doi.org/10.1109/TETCI.2024.3383454.
[2] Li, Haoran, Zhang, Yaocheng, Wen, Haowei, Zhu, Yuanheng, Zhao, Dongbin. Stabilizing Diffusion Model for Robotic Control with Dynamic Programming and Transition Feasibility. IEEE Transactions on Artificial Intelligence[J]. 2024, 4th author, corresponding author.
[3] Li, Boyu, Li, Haoran, Zhu, Yuanheng, Zhao, Dongbin. MAT: Morphological Adaptive Transformer for Universal Morphology Policy Learning. IEEE Transactions on Cognitive and Developmental Systems[J]. 2024, 3rd author.
[4] Li, Luntong, Zhu, Yuanheng. Boosting On-Policy Actor-Critic With Shallow Updates in Critic. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS. 2024, 2nd author, corresponding author, http://dx.doi.org/10.1109/TNNLS.2024.3378913.
[5] Zhu, Yuanheng, Li, Weifan, Zhao, Mengchen, Hao, Jianye, Zhao, Dongbin. Empirical Policy Optimization for n-Player Markov Games. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2023, 1st author, 53(10): 6443-6455, http://dx.doi.org/10.1109/TCYB.2022.3179775.
[6] Chai, Jiajun, Zhu, Yuanheng, Zhao, Dongbin. NVIF: Neighboring Variational Information Flow for Cooperative Large-Scale Multiagent Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS. 2023, 2nd author, corresponding author, http://dx.doi.org/10.1109/TNNLS.2023.3309608.
[7] Liu, Minsong, Li, Luntong, Hao, Shuai, Zhu, Yuanheng, Zhao, Dongbin. Soft Contrastive Learning With Q-Irrelevance Abstraction for Reinforcement Learning. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS[J]. 2023, 4th author, 15(3): 1463-1473, http://dx.doi.org/10.1109/TCDS.2022.3218940.
[8] Chai, Jiajun, Chen, Wenzhang, Zhu, Yuanheng, Yao, ZongXin, Zhao, Dongbin. A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2023, 3rd author, corresponding author, 53(9): 5417-5429, http://dx.doi.org/10.1109/TSMC.2023.3270444.
[9] Chai, Jiajun, Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin, Ma, Zhe, Sun, Kewu, Ding, Jishiyu. UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2023, 3rd author, 34(4): 2093-2104, http://dx.doi.org/10.1109/TNNLS.2021.3105869.
[10] Tang, Zhentao, Zhu, Yuanheng, Zhao, Dongbin, Lucas, Simon M. Enhanced Rolling Horizon Evolution Algorithm With Opponent Model Learning: Results for the Fighting Game AI Competition. IEEE TRANSACTIONS ON GAMES[J]. 2023, 2nd author, 15(1): 5-15, http://dx.doi.org/10.1109/TG.2020.3022698.
[11] Hu, Guangzheng, Li, Haoran, Liu, Shasha, Zhu, Yuanheng, Zhao, Dongbin. NeuronsMAE: A Novel Multi-Agent Reinforcement Learning Environment for Cooperative and Competitive Multi-Robot Tasks. 2023 International Joint Conference on Neural Networks (IJCNN). 2023, 4th author.
[12] Zhu, Yuanheng, Zhao, Dongbin. Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2022, 1st author, 33(3): 1228-1241, http://dx.doi.org/10.1109/TNNLS.2020.3041469.
[13] Tang, Zhentao, Liang, Rongqin, Zhu, Yuanheng, Zhao, Dongbin. Intelligent decision-making methods for real-time fighting games. Control Theory & Applications. 2022, 3rd author, 39(6): 969-985, https://d.wanfangdata.com.cn/periodical/kzllyyy202206001.
[14] Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin. Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target. COMPLEX & INTELLIGENT SYSTEMS[J]. 2022, 2nd author, corresponding author, 8(2): 1205-1216, http://dx.doi.org/10.1007/s40747-021-00577-6.
[15] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Optimal Feedback Control of Pedestrian Flow in Heterogeneous Corridors. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING[J]. 2021, 1st author, corresponding author, 18(3): 1097-1108, http://dx.doi.org/10.1109/TASE.2020.2996018.
[16] Yang, Xiong, Zhu, Yuanheng, Dong, Na, Wei, Qinglai. Decentralized Event-Driven Constrained Control Using Adaptive Critic Designs. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2021, 2nd author.
[17] Hu, Guangzheng, Zhu, Yuanheng, Zhao, Dongbin, Zhao, Mengchen, Hao, Jianye. Event-Triggered Communication Network With Limited-Bandwidth Constraint for Multi-Agent Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2021, 2nd author, corresponding author, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000732283100001.
[18] Zhu, Yuanheng, He, Haibo, Zhao, Dongbin. LMI-Based Synthesis of String-Stable Controller for Cooperative Adaptive Cruise Control. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS[J]. 2020, 1st author, 21(11): 4516-4525, http://dx.doi.org/10.1109/TITS.2019.2935510.
[19] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Synthesis of Cooperative Adaptive Cruise Control With Feedforward Strategies. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY[J]. 2020, 1st author, corresponding author, 69(4): 3615-3627, http://dx.doi.org/10.1109/TVT.2020.2974932.
[20] Shao, Kun, Zhu, Yuanheng, Tang, Zhentao, Zhao, Dongbin. Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2020, 2nd author.
[21] Liu, Minsong, Zhu, Yuanheng, Zhao, Dongbin. An Improved Minimax-Q Algorithm Based on Generalized Policy Iteration to Solve a Chaser-Invader Game. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2020, 2nd author.
[22] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Invariant Adaptive Dynamic Programming for Discrete-Time Optimal Control. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2020, 1st author, 50(11): 3959-3971, http://dx.doi.org/10.1109/TSMC.2019.2911900.
[23] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun, Wang, Ding. Control-Limited Adaptive Dynamic Programming for Multi-Battery Energy Storage Systems. IEEE TRANSACTIONS ON SMART GRID[J]. 2019, 1st author, 10(4): 4235-4244, https://www.webofscience.com/wos/woscc/full-record/WOS:000472577500065.
[24] Shao, Kun, Zhu, Yuanheng, Zhao, Dongbin. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE[J]. 2019, 2nd author, 3(1): 73-84, http://dx.doi.org/10.1109/TETCI.2018.2823329.
[25] Zhu, Yuanheng, He, Haibo, Zhao, Dongbin, Hou, Zhongsheng. Optimal Pedestrian Evacuation in Building with Consecutive Differential Dynamic Programming. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2019, 1st author.
[26] Zhu, Yuanheng. Vision-based driving in the open racing car simulator with deep and reinforcement learning. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING. 2019, 1st author.
[27] Zhu, Yuanheng, Zhao, Dongbin, Zhong, Zhiguang. Adaptive Optimal Control of Heterogeneous CACC System With Uncertain Dynamics. IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY[J]. 2019, 1st author, 27(4): 1772-1779.
[28] Tang, Zhentao, Shao, Kun, Zhu, Yuanheng, Li, Dong, Zhao, Dongbin, Huang, Tingwen, Sundaram, S. A Review of Computational Intelligence for StarCraft AI. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI). 2018, 3rd author, 1167-1173, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800159.
[29] Zhu, Yuanheng, Zhao, Dongbin. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. ARTIFICIAL INTELLIGENCE REVIEW[J]. 2018, 1st author, 49(4): 531-547, https://www.webofscience.com/wos/woscc/full-record/WOS:000426912500004.
[30] Li, Dong, Zhao, Dongbin, Zhang, Qichao, Zhu, Yuanheng, Sundaram, S. An Autonomous Driving Experience Platform with Learning-Based Functions. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI). 2018, 4th author, 1174-1179, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800160.
[31] Zhu, Yuanheng, Li, Nannan, Shao, Kun, Zhao, Dongbin. Learning battles in ViZDoom via deep reinforcement learning. 2018, 1st author, http://ir.ia.ac.cn/handle/173211/23364.
[32] Zhu, Yuanheng, Zhao, Dongbin, Yang, Xiong, Zhang, Qichao. Policy Iteration for H-infinity Optimal Control of Polynomial Nonlinear Systems via Sum of Squares Programming. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2018, 1st author, corresponding author, 48(2): 500-509, https://www.webofscience.com/wos/woscc/full-record/WOS:000422925700005.
[33] Shao, Kun, Zhao, Dongbin, Zhu, Yuanheng, Zhang, Qichao. Visual Navigation with Actor-Critic Deep Reinforcement Learning. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2018, 3rd author.
[34] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, Ji, Junhong. Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS[J]. 2017, 1st author, corresponding author, 64(5): 4101-4109, https://www.webofscience.com/wos/woscc/full-record/WOS:000399674000064.
[35] Yang, Xiong, He, Haibo, Liu, Derong, Zhu, Yuanheng. Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems. IET CONTROL THEORY AND APPLICATIONS[J]. 2017, 4th author, 11(14): 2307-2316, https://www.webofscience.com/wos/woscc/full-record/WOS:000409425700015.
[36] Zhang, Qichao, Zhao, Dongbin, Zhu, Yuanheng. Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs. NEUROCOMPUTING[J]. 2017, 3rd author, 238: 377-386, http://dx.doi.org/10.1016/j.neucom.2017.01.076.
[37] Zhu, Yuanheng, Zhao, Dongbin, Shao, Kun. Cooperative Reinforcement Learning for Multiple Units Combat in StarCraft. 2017, 1st author, http://ir.ia.ac.cn/handle/173211/15399.
[38] Tang, Zhentao, Shao, Kun, Zhao, Dongbin, Zhu, Yuanheng. Progress in deep reinforcement learning: from AlphaGo to AlphaGo Zero. Control Theory & Applications[J]. 2017, 4th author, 34(12): 1529-1546, http://lib.cqvip.com/Qikan/Article/Detail?id=7000480876.
[39] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2017, 1st author, corresponding author, 28(3): 714-725, https://www.webofscience.com/wos/woscc/full-record/WOS:000395980500020.
[40] Zhang, Qichao, Zhao, Dongbin, Zhu, Yuanheng. Event-Triggered H-infinity Control for Continuous-Time Nonlinear System via Concurrent Learning. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2017, 3rd author, 47(7): 1071-1081, https://www.webofscience.com/wos/woscc/full-record/WOS:000404354600004.
[41] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun. Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics. IET CONTROL THEORY AND APPLICATIONS[J]. 2016, 1st author, 10(12): 1339-1347.
[42] Zhu, Yuanheng, Chen, Xi, Zhao, Dongbin, Zhang, Qichao. Model-free reinforcement learning for nonlinear zero-sum games with simultaneous explorations. 2016, 1st author, http://ir.ia.ac.cn/handle/173211/14340.
[43] Tang, Zhentao, Shao, Kun, Zhao, Dongbin, Zhu, Yuanheng. Move Prediction in Gomoku Using Deep Learning. 2016, 4th author, http://ir.ia.ac.cn/handle/173211/15673.
[44] Zhao, Dongbin, Wang, Haitao, Shao, Kun, Zhu, Yuanheng. Deep Reinforcement Learning with Experience Replay Based on SARSA. PROCEEDINGS OF 2016 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI). 2016, 4th author, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000400488300013.
[45] Zhao, Dongbin, Zhang, Qichao, Wang, Ding, Zhu, Yuanheng. Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2016, 4th author, corresponding author, 46(3): 854-865, https://www.webofscience.com/wos/woscc/full-record/WOS:000370963500023.
[46] Zhao, Dongbin, Zhu, Yuanheng. Probably approximately correct reinforcement learning algorithms for continuous-state-space control problems. Control Theory & Applications[J]. 2016, 2nd author, 33(12): 1603-1613, http://lib.cqvip.com/Qikan/Article/Detail?id=7000119656.
[47] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, Ji, Junhong. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. COGNITIVE COMPUTATION[J]. 2015, 1st author, 7(6): 763-771, http://ir.ia.ac.cn/handle/173211/10525.
[48] Zhao, Dongbin, Zhu, Yuanheng. MEC-A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2015, 2nd author, 26(2): 346-356, http://www.irgrid.ac.cn/handle/1471x/980893.
[49] Zhu, Yuanheng, Zhao, Dongbin. A data-based online reinforcement learning algorithm satisfying probably approximately correct principle. NEURAL COMPUTING & APPLICATIONS[J]. 2015, 1st author, 26(4): 775-787, http://www.irgrid.ac.cn/handle/1471x/980902.
[50] Zhao, Dongbin, Zhu, Yuanheng. Model-Free Adaptive Algorithm for Optimal Control of Continuous-Time Nonlinear System. 2015, 2nd author, http://ir.ia.ac.cn/handle/173211/15282.
[51] Li, Dong, Zhao, Dongbin, Zhu, Yuanheng, Xia, Zhongpu. Thermal Comfort Control Based on MEC Algorithm for HVAC Systems. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2015, 3rd author.
[52] Zhu, Yuanheng, Zhao, Dongbin, Liu, Derong. Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems. NEUROCOMPUTING[J]. 2015, 1st author, 149: 124-131, http://dx.doi.org/10.1016/j.neucom.2013.11.055.
[53] Zhu, Yuanheng, Zhao, Dongbin. A data-based online reinforcement learning algorithm with high-efficient exploration. 2014, 1st author, http://ir.ia.ac.cn/handle/173211/15283.
[54] Zhao, Dongbin, Hu, Zhaohui, Xia, Zhongpu, Alippi, Cesare, Zhu, Yuanheng, Wang, Ding. Full-range adaptive cruise control based on supervised adaptive dynamic programming. NEUROCOMPUTING[J]. 2014, 5th author, 125: 57-67, http://dx.doi.org/10.1016/j.neucom.2012.09.034.
[55] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. An high-efficient online reinforcement learning algorithm for continuous-state systems. IEEE WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA). 2014, 1st author, 581-586, http://www.irgrid.ac.cn/handle/1471x/973405.
[56] Zhu, Yuanheng, Zhao, Dongbin. Online reinforcement learning for continuous-state systems. FRONTIERS OF INTELLIGENT CONTROL AND INFORMATION PROCESSING. 2014, 1st author, http://ir.ia.ac.cn/handle/173211/15280.
[57] Zhao, Dongbin, Zhu, Yuanheng. Online Model-Free RLSPI Algorithm for Nonlinear Discrete-Time Non-affine Systems. 2013, 2nd author, http://ir.ia.ac.cn/handle/173211/15281.
[58] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Integration of fuzzy controller with adaptive dynamic programming. IEEE WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA). 2012, 1st author, 310-315, http://www.irgrid.ac.cn/handle/1471x/973407.
[59] Zhao, Dongbin, Zhu, Yuanheng, He, Haibo. Neural and Fuzzy Dynamic Programming for Under-actuated Systems. INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2012, 2nd author, http://www.irgrid.ac.cn/handle/1471x/973390.
Books
(1) Decision-Making and Control Technology for Intelligent Connected Vehicles, China Communications Press Co., Ltd., 2023-03, 3rd author
(2) Game Artificial Intelligence Methods, Science Press, 2024-02, 2nd author

Research Activities

Research Projects
(1) Multi-task-oriented multi-agent reinforcement learning theory and applications, participant, national project, 2022-01 to 2026-12
(2) Multi-agent deep reinforcement learning, principal investigator, Chinese Academy of Sciences program, 2022-01 to 2024-12
(3) Fundamental theory of multi-agent adversarial games in open environments, principal investigator, national project, 2023-01 to 2027-12
Conference Presentations
(1) Optimal Pedestrian Evacuation in Building with Consecutive Differential Dynamic Programming, 2019-07-14
(2) Driving Control with Deep and Reinforcement Learning in The Open Racing Car Simulator, 2018-12-13
(3) Convolutional fitted Q iteration for vision-based control problems, 2016-07-24
(4) Model-free adaptive algorithm for optimal control of continuous-time nonlinear system, 2015-11-27
(5) A data-based online reinforcement learning algorithm with high-efficient exploration, 2014-12-09
(6) An high-efficient online reinforcement learning algorithm for continuous-state systems, 2014-06-29
(7) Online Model-Free RLSPI Algorithm for Nonlinear Discrete-Time Non-affine Systems, 2013-11-03

Collaboration

Project Partner Institutions

China Electric Power Research Institute, the Second Academy of CASIC, Huawei, Shenyang Aircraft Design Institute (601 Institute), the Future Laboratory of the Second Academy of CASIC, and Hyperparameter Technology


Supervised Students

Current Students

Liu Yuanxun   Master's student   081203 - Computer Applied Technology

Li Boyu   Master's student   081101 - Control Theory and Control Engineering

Fu Yuqian   Ph.D. student   081104 - Pattern Recognition and Intelligent Systems

Chen Wenzhang   Master's student   081101 - Control Theory and Control Engineering

Co-supervised Students

Student           Degree program    Period             Post-graduation destination

Shao Kun          Master-Ph.D.      2014.9-2019.7      Huawei

Tang Zhentao      Direct Ph.D.      2016.9-2022.7      Huawei

Li Weifan         Ph.D.             2018.9-2023.7      AI research institute

Liu Yuanxun       Master's          2020.9-2023.7      Ministry of Foreign Affairs

Hu Guangzheng     Ph.D.             2019.9-2024.7      Meituan

Liu Minsong       Master-Ph.D.      2018.9-2024.7      Academy of Military Sciences

Huang Shangjing   Master's          2021.9-2024.7      Kuaishou

Chai Jiajun       Direct Ph.D.      2019.9-present

Lu Runyu          Direct Ph.D.      2021.9-present

Zhao Zijie        Direct Ph.D.      2022.9-present

Xu Kaixuan        Direct Ph.D.      2023.9-present