Basic Information

Zhu Yuanheng (朱圆恒)   Male    Institute of Automation, Chinese Academy of Sciences
Email: yuanheng.zhu@ia.ac.cn
Address: 95 Zhongguancun East Road, Beijing
Postal code: 100190

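Research Areas

Deep reinforcement learning, multi-agent reinforcement learning, game artificial intelligence, intelligent driving, crowd evacuation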

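Admissions Information

I admit 1-2 master's students each year. Applicants with a background in automatic control, computer science, electronics, or mathematics are welcome to contact me by email at yuanheng.zhu@ia.ac.cn.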

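Admission Major
081101 - Control Theory and Control Engineering
Admission Research Directions
Reinforcement learning, adaptive dynamic programming, deep reinforcement learning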

Education

2010-09--2015-07   Institute of Automation, Chinese Academy of Sciences   Ph.D.
2006-09--2010-07   Nanjing University   Bachelor's degree

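Work Experience

2017-10~present, Institute of Automation, Chinese Academy of Sciences, Associate Researcher
2017-12~2018-12, University of Rhode Island, USA, Visiting Scholar
2015-07~2017-10, Institute of Automation, Chinese Academy of Sciences, Assistant Researcher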

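Professional Service
2020-01-01-present, IEEE Computational Intelligence Society, Chair of the Summer School Committee
2019-04-23-present, Chinese Association of Automation, Technical Committee on Data-Driven Control, Learning and Optimization, Member
2017-09-30-present, Chinese Association of Automation, Technical Committee on Adaptive Dynamic Programming and Reinforcement Learning, Member
2016-01-01-2016-12-31, IEEE Computational Intelligence Society, Chair of the Travel Grant Committee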

Courses Taught

Reinforcement Learning
Intelligent Control

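Patents and Awards

(1) Zhu Yuanheng (1/1); Chinese Academy of Sciences Government-Sponsored Overseas Study Program, Chinese Academy of Sciences, 2016.

(2) Zhu Yuanheng (3/9); 2017 Outstanding Paper Award of Control Theory & Applications, Editorial Board of Control Theory & Applications, 2018 (Zhao Dongbin*; Shao Kun; Zhu Yuanheng; Li Dong; Chen Yaran; Wang Haitao; Liu Derong; Zhou Tong; Wang Chenghong).
(3) Zhu Yuanheng (3/4); 2019 China AI+ Innovation and Entrepreneurship Competition, First Prize, Chinese Association for Artificial Intelligence, Advisor, 2019 (Chen Yaran; Zhang Qichao; Zhu Yuanheng; Zhao Dongbin).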

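Patents
[1] Zhu Yuanheng, Li Weifan, Xiong Hua, Zhao Dongbin. Missile guidance method and device based on reinforcement learning. CN: CN113239472A, 2021-08-10.
[2] Zhu Yuanheng, Zhao Dongbin. Cooperative adaptive cruise control method for heterogeneous vehicle platoons based on acceleration feedforward. CN: CN110888322B, 2021-04-13.
[3] Zhao Dongbin, Li Dong, Zhang Qichao, Chen Yaran, Zhu Yuanheng. Lane-keeping method and system for intelligent driving. CN: CN109466552B, 2020-07-28.
[4] Zhao Dongbin, Shao Kun, Zhu Yuanheng. Multi-agent deep reinforcement learning method and system based on counterfactual reward. CN: CN111105034A, 2020-05-05.
[5] Zhao Dongbin, Bu Li, Zhu Yuanheng, Li Xiangjun. Method and system for detecting abnormal charging/discharging behavior of energy storage batteries. CN: CN106154180B, 2019-02-05.
[6] Wang Ding, Zhang Qichao, Zhu Yuanheng. Robust tracking control method for a spring-mass-damper system. CN: CN108303876A, 2018-07-20.
[7] Zhao Dongbin, Zhu Yuanheng, Liu Derong. Data-based Q-function adaptive dynamic programming method. CN: CN103217899A, 2013-07-24.
[8] Zhao Dongbin, Wang Bin, Liu Derong, Wei Qinglai, Zhu Yuanheng, Su Yongsheng. Control method for a coal gasifier. CN: CN102799748A, 2012-11-28.
[9] Zhao Dongbin, Zhu Yuanheng. Fuzzy adaptive dynamic programming method. CN: CN102645894A, 2012-08-22.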

Publications

Published Papers
[1] Zhu, Yuanheng, He, Haibo, Zhao, Dongbin. LMI-Based Synthesis of String-Stable Controller for Cooperative Adaptive Cruise Control. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS[J]. 2020, 21(11): 4516-4525, https://www.webofscience.com/wos/woscc/full-record/WOS:000587709700003.
[2] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Synthesis of Cooperative Adaptive Cruise Control With Feedforward Strategies. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY[J]. 2020, 69(4): 3615-3627, https://www.webofscience.com/wos/woscc/full-record/WOS:000530284400009.
[3] Shao, Kun, Zhu, Yuanheng, Tang, Zhentao, Zhao, Dongbin. Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2020.
[4] Zhu, Yuanheng. Online minimax Q-network learning algorithm for solving two-player zero-sum Markov games. IEEE Transactions on Neural Networks and Learning Systems. 2020.
[5] Liu, Minsong, Zhu, Yuanheng, Zhao, Dongbin. An Improved Minimax-Q Algorithm Based on Generalized Policy Iteration to Solve a Chaser-Invader Game. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2020.
[6] Zhu, Yuanheng. Synthesis of Cooperative Adaptive Cruise Control With Feedforward Strategies. IEEE Transactions on Vehicular Technology. 2020.
[7] Zhu, Yuanheng. Enhanced rolling horizon evolutionary algorithm and opponent modeling. IEEE Transactions on Games. 2020.
[8] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Invariant Adaptive Dynamic Programming for Discrete-Time Optimal Control. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2020, 50(11): 3959-3971, https://www.webofscience.com/wos/woscc/full-record/WOS:000578826300003.
[9] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun, Wang, Ding. Control-Limited Adaptive Dynamic Programming for Multi-Battery Energy Storage Systems. IEEE TRANSACTIONS ON SMART GRID[J]. 2019, 10(4): 4235-4244, https://www.webofscience.com/wos/woscc/full-record/WOS:000472577500065.
[10] Shao, Kun, Zhu, Yuanheng, Zhao, Dongbin. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE[J]. 2019, 3(1): 73-84, http://dx.doi.org/10.1109/TETCI.2018.2823329.
[11] Zhu, Yuanheng. LMI-Based Synthesis of String-Stable Controller for Cooperative Adaptive Cruise Control. IEEE Transactions on Intelligent Transportation Systems. 2019.
[12] Zhu, Yuanheng. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning. IEEE Transactions on Emerging Topics in Computational Intelligence. 2019.
[13] Zhu, Yuanheng, He, Haibo, Zhao, Dongbin, Hou, Zhongsheng. Optimal Pedestrian Evacuation in Building with Consecutive Differential Dynamic Programming. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2019.
[14] Zhu, Yuanheng. Vision-based driving in The Open Racing Car Simulator using deep and reinforcement learning. Journal of Ambient Intelligence and Humanized Computing. 2019.
[15] Zhu, Yuanheng. Control-Limited Adaptive Dynamic Programming for Multi-Battery Energy Storage Systems. IEEE Transactions on Smart Grid. 2019.
[16] Zhu, Yuanheng. Invariant Adaptive Dynamic Programming for Discrete-Time Optimal Control. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2019.
[17] Zhu, Yuanheng, Zhao, Dongbin, Zhong, Zhiguang. Adaptive Optimal Control of Heterogeneous CACC System With Uncertain Dynamics. IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY[J]. 2019, 27(4): 1772-1779.
[18] Zhu, Yuanheng. Adaptive Optimal Control of Heterogeneous CACC System With Uncertain Dynamics. IEEE Transactions on Control Systems Technology. 2019.
[19] Tang Zhentao, Shao Kun, Zhu Yuanheng, Li Dong, Zhao Dongbin, Huang Tingwen, Sundaram S. A Review of Computational Intelligence for StarCraft AI. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI). 2018, 1167-1173, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800159.
[20] Zhu, Yuanheng, Zhao, Dongbin. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. ARTIFICIAL INTELLIGENCE REVIEW[J]. 2018, 49(4): 531-547, https://www.webofscience.com/wos/woscc/full-record/WOS:000426912500004.
[21] Li Dong, Zhao Dongbin, Zhang Qichao, Zhu Yuanheng, Sundaram S. An Autonomous Driving Experience Platform with Learning-Based Functions. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI). 2018, 1174-1179, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800160.
[22] Zhu, Yuanheng, Zhao, Dongbin, Yang, Xiong, Zhang, Qichao. Policy Iteration for H infinity Optimal Control of Polynomial Nonlinear Systems via Sum of Squares Programming. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2018, 48(2): 500-509, https://www.webofscience.com/wos/woscc/full-record/WOS:000422925700005.
[23] Yuanheng Zhu, Nannan Li, Kun Shao, Dongbin Zhao. Learning battles in ViZDoom via deep reinforcement learning. 2018, http://ir.ia.ac.cn/handle/173211/23364.
[24] Shao, Kun, Zhao, Dongbin, Zhu, Yuanheng, Zhang, Qichao. Visual Navigation with Actor-Critic Deep Reinforcement Learning. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2018.
[25] Zhu, Yuanheng. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. Artificial Intelligence Review. 2018.
[26] Zhu Yuanheng, Zhang Qichao, Zhao Dongbin, Li Dong. An Autonomous Driving Experience Platform with Learning-Based Functions. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI). 2018, 1174-1179, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800160.
[27] Yuanheng Zhu, Qichao Zhang, Dongbin Zhao, Kun Shao. Visual navigation with Actor-Critic deep reinforcement learning. 2018, http://ir.ia.ac.cn/handle/173211/23365.
[28] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, Ji, Junhong. Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS[J]. 2017, 64(5): 4101-4109, https://www.webofscience.com/wos/woscc/full-record/WOS:000399674000064.
[29] Yang, Xiong, He, Haibo, Liu, Derong, Zhu, Yuanheng. Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems. IET CONTROL THEORY AND APPLICATIONS[J]. 2017, 11(14): 2307-2316, https://www.webofscience.com/wos/woscc/full-record/WOS:000409425700015.
[30] Zhu, Yuanheng. Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming. IEEE Transactions on Industrial Electronics. 2017.
[31] Yang, Xiong, He, Haibo, Liu, Derong, Zhu, Yuanheng. Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems. IET CONTROL THEORY AND APPLICATIONS[J]. 2017, 11(14): 2307-2316, https://www.webofscience.com/wos/woscc/full-record/WOS:000409425700015.
[32] Zhang, Qichao, Zhao, Dongbin, Zhu, Yuanheng. Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs. NEUROCOMPUTING[J]. 2017, 238(*): 377-386, http://dx.doi.org/10.1016/j.neucom.2017.01.076.
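[33] Zhu, Yuanheng. Policy Iteration for H infinity Optimal Control of Polynomial Nonlinear Systems via Sum of Squares Programming. IEEE Transactions on Cybernetics. 2017.
[34] Zhu, Yuanheng. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data. IEEE Transactions on Neural Networks and Learning Systems. 2017.
[35] Tang Zhentao, Shao Kun, Zhao Dongbin, Zhu Yuanheng. Recent progress of deep reinforcement learning: from AlphaGo to AlphaGo Zero. Control Theory & Applications[J]. 2017, 34(12): 1529-1546, http://lib.cqvip.com/Qikan/Article/Detail?id=7000480876.
[36] Zhu, Yuanheng. Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs. Neurocomputing. 2017.
[37] Zhu Yuanheng, Zhao Dongbin, Shao Kun. Cooperative Reinforcement Learning for Multiple Units Combat in StarCraft. 2017, http://ir.ia.ac.cn/handle/173211/15399.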
[38] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2017, 28(3): 714-725, https://www.webofscience.com/wos/woscc/full-record/WOS:000395980500020.
[39] Zhu, Yuanheng. Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems. IET Control Theory & Applications. 2017.
[40] Zhang, Qichao, Zhao, Dongbin, Zhu, Yuanheng. Event-Triggered H-infinity Control for Continuous-Time Nonlinear System via Concurrent Learning. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2017, 47(7): 1071-1081, https://www.webofscience.com/wos/woscc/full-record/WOS:000404354600004.
[41] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun. Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics. IET CONTROL THEORY AND APPLICATIONS[J]. 2016, 10(12): 1339-1347.
[42] Zhu Yuanheng, Chen Xi, Zhao Dongbin, Zhang Qichao. Model-free reinforcement learning for nonlinear zero-sum games with simultaneous explorations. 2016, http://ir.ia.ac.cn/handle/173211/14340.
[43] Tang Zhentao, Shao Kun, Zhao Dongbin, Zhu Yuanheng. Move Prediction in Gomoku Using Deep Learning. 2016, http://ir.ia.ac.cn/handle/173211/15673.
[44] Zhu, Yuanheng. Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics. IET Control Theory & Applications. 2016.
[45] Zhao, Dongbin, Zhang, Qichao, Wang, Ding, Zhu, Yuanheng. Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2016, 46(3): 854-865, https://www.webofscience.com/wos/woscc/full-record/WOS:000370963500023.
[46] Zhao Dongbin, Zhu Yuanheng. Probably approximately correct reinforcement learning algorithm for solving control problems in continuous state spaces. Control Theory & Applications. 2016, 33(12): 1603-1613, http://lib.cqvip.com/Qikan/Article/Detail?id=7000119656.
[47] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, Ji, Junhong. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. COGNITIVE COMPUTATION[J]. 2015, 7(6): 763-771, http://ir.ia.ac.cn/handle/173211/10525.
[48] Zhu, Yuanheng. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. Cognitive Computation. 2015.
[49] Zhao, Dongbin, Zhu, Yuanheng. MEC-A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2015, 26(2): 346-356, http://www.irgrid.ac.cn/handle/1471x/980893.
[50] Zhu, Yuanheng, Zhao, Dongbin. A data-based online reinforcement learning algorithm satisfying probably approximately correct principle. NEURAL COMPUTING & APPLICATIONS[J]. 2015, 26(4): 775-787, http://www.irgrid.ac.cn/handle/1471x/980902.
[51] Zhu, Yuanheng. MEC-A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems. IEEE Transactions on Neural Networks and Learning Systems. 2015.
[52] Dongbin Zhao, Yuanheng Zhu. Model-Free Adaptive Algorithm for Optimal Control of Continuous-Time Nonlinear System. 2015, http://ir.ia.ac.cn/handle/173211/15282.
[53] Zhu, Yuanheng. Near-optimal online reinforcement learning for continuous-state systems. 2015, http://www.irgrid.ac.cn/handle/1471x/977322.
[54] Zhu, Yuanheng, Zhao, Dongbin, Liu, Derong. Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems. NEUROCOMPUTING[J]. 2015, 149: 124-131, http://dx.doi.org/10.1016/j.neucom.2013.11.055.
[55] Zhu, Yuanheng. A data-based online reinforcement learning algorithm satisfying probably approximately correct principle. Neural Computing and Applications. 2015.
[56] Li Dong, Xia Zhongpu, Zhu Yuanheng, Zhao Dongbin. Thermal Comfort Control Based on MEC Algorithm for HVAC System. 2015, http://ir.ia.ac.cn/handle/173211/15667.
[57] Zhu, Yuanheng. Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems. Neurocomputing. 2015.
[58] Yuanheng Zhu, Dongbin Zhao. A data-based online reinforcement learning algorithm with high-efficient exploration. 2014, http://ir.ia.ac.cn/handle/173211/15283.
[59] Zhao, Dongbin, Hu, Zhaohui, Xia, Zhongpu, Alippi, Cesare, Zhu, Yuanheng, Wang, Ding. Full-range adaptive cruise control based on supervised adaptive dynamic programming. NEUROCOMPUTING[J]. 2014, 125: 57-67, http://dx.doi.org/10.1016/j.neucom.2012.09.034.
[60] Yuanheng Zhu, Dongbin Zhao, Haibo He. An high-efficient online reinforcement learning algorithm for continuous-state systems. IEEE World Congress on Intelligent Control and Automation (WCICA). 2014, 581-586, http://www.irgrid.ac.cn/handle/1471x/973405.
[61] Yuanheng Zhu, Dongbin Zhao. Online reinforcement learning for continuous-state systems. Frontiers of Intelligent Control and Information Processing. 2014, http://ir.ia.ac.cn/handle/173211/15280.
[62] Dongbin Zhao, Yuanheng Zhu. Online Model-Free RLSPI Algorithm for Nonlinear Discrete-Time Non-affine Systems. 2013, http://ir.ia.ac.cn/handle/173211/15281.
[63] Yuanheng Zhu, Dongbin Zhao, Haibo He. Integration of fuzzy controller with adaptive dynamic programming. IEEE World Congress on Intelligent Control and Automation (WCICA). 2012, 310-315, http://www.irgrid.ac.cn/handle/1471x/973407.
[64] Zhao, Dongbin, Zhu, Yuanheng, He, Haibo. Neural and Fuzzy Dynamic Programming for Under-actuated Systems. International Joint Conference on Neural Networks (IJCNN). 2012, http://www.irgrid.ac.cn/handle/1471x/973390.

Research Activities

Research Projects
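( 1 ) Game decision-making under incomplete information, subproject: deep reinforcement learning algorithms jointly driven by knowledge and data, Participant, National level, 2020-01--2022-12
( 2 ) Research and implementation of collective-intelligence stimulation and aggregation for "swarm" multi-agent systems, subproject: real-time collective-intelligence reasoning and adversarial techniques for "swarm" systems, Participant, National level, 2020-01--2023-05
( 3 ) Optimal control of nonlinear systems with unknown models based on event-triggered adaptive dynamic programming, Principal Investigator, National level, 2017-01--2019-12
( 4 ) Theory, methods, and applications of deep adaptive dynamic programming, Participant, National level, 2016-01--2019-12
( 5 ) Three-dimensional guidance laws for single-missile and dual-missile pursuit of maneuvering targets based on deep reinforcement learning, Principal Investigator, Academy level, 2019-11--2020-11
( 6 ) Autonomous coordinated control strategies for multiple energy storage control units, Principal Investigator, Academy level, 2018-01--2018-10
( 7 ) Multi-source data fusion and analysis device for energy storage systems, Principal Investigator, Academy level, 2016-05--2016-11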
Conferences Attended
(1) Optimal Pedestrian Evacuation in Building with Consecutive Differential Dynamic Programming   2019-07-14
(2) Driving Control with Deep and Reinforcement Learning in The Open Racing Car Simulator   2018-12-13
(3) Convolutional fitted Q iteration for vision-based control problems   2016-07-24
(4) Model-free adaptive algorithm for optimal control of continuous-time nonlinear system   2015-11-27
(5) A data-based online reinforcement learning algorithm with high-efficient exploration   2014-12-09
(6) An high-efficient online reinforcement learning algorithm for continuous-state systems   2014-06-29
(7) Online Model-Free RLSPI Algorithm for Nonlinear Discrete-Time Non-affine Systems   2013-11-03

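Collaboration

Collaborating Organizations

China Electric Power Research Institute (电科院), Second Academy of China Aerospace Science and Industry Corporation (航天二院), Huawei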


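Students Supervised

Co-supervised Students

Student            Degree program                  Period              Destination after graduation

Shao Kun           Combined master's-Ph.D.         2014.9/2019.7       Huawei

Tang Zhentao       Direct-entry Ph.D.              2016.9/present

Li Weifan          Regular Ph.D.                   2018.9/present

Hu Guangzheng      Regular Ph.D.                   2019.9/present

Liu Minsong        Combined master's-Ph.D.         2018.9/present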