Basic Information

Zhu Yuanheng  Male    Institute of Automation, Chinese Academy of Sciences
Email: yuanheng.zhu@ia.ac.cn
Mailing address: 95 Zhongguancun East Road, Beijing
Postal code: 100190

Research Areas

Intelligent game decision-making, deep reinforcement learning, multi-agent reinforcement learning, game AI

Admissions Information

I admit one or two master's students each year. Applicants with backgrounds in automatic control, computer science, electronics, or mathematics are welcome to apply and may contact me by email at yuanheng.zhu@ia.ac.cn.

Admission Program
081101 - Control Theory and Control Engineering
Admission Research Directions
Multi-agent reinforcement learning, multi-agent game confrontation, deep reinforcement learning

Education

2010-09--2015-07   Institute of Automation, Chinese Academy of Sciences   Ph.D.
2006-09--2010-07   Nanjing University   Bachelor's degree

Work Experience

2017-10 to present, Institute of Automation, Chinese Academy of Sciences, Associate Professor
2017-12 to 2018-12, University of Rhode Island, USA, Visiting Scholar
2015-07 to 2017-10, Institute of Automation, Chinese Academy of Sciences, Assistant Professor

Professional Service
2022-08-21 to 2022-08-24, 2022 IEEE Conference on Games, Program Co-Chair
2022-01-01 to 2022-12-31, IEEE CIS Content Creation Subcommittee, Subcommittee Chair
2022-01-01 to 2022-12-31, IEEE Transactions on Neural Networks and Learning Systems, Associate Editor
2020-01-01 to present, IEEE Computational Intelligence Society, Summer School Committee Chair
2019-04-23 to present, Technical Committee on Data-Driven Control, Learning and Optimization, Chinese Association of Automation, Member
2017-09-30 to present, Technical Committee on Adaptive Dynamic Programming and Reinforcement Learning, Chinese Association of Automation, Member
2016-01-01 to 2016-12-31, IEEE Computational Intelligence Society, Travel Grant Committee Chair

Courses Taught

Reinforcement Learning
Intelligent Control

Patents and Awards

(1) Zhu Yuanheng (1/1); CAS-sponsored overseas study program, Chinese Academy of Sciences, 2016.

(2) Zhu Yuanheng (3/9); 2017 Outstanding Paper Award of Control Theory & Applications, Editorial Board of Control Theory & Applications, 2018 (Zhao Dongbin*; Shao Kun; Zhu Yuanheng; Li Dong; Chen Yaran; Wang Haitao; Liu Derong; Zhou Tong; Wang Chenghong).

(3) Zhu Yuanheng (3/4); First Prize, 2019 China AI+ Innovation and Entrepreneurship Competition, Chinese Association for Artificial Intelligence, as advisor, 2019 (Chen Yaran; Zhang Qichao; Zhu Yuanheng; Zhao Dongbin).

(4) Zhu Yuanheng (2/10); Second Prize, Natural Science Award of the 2022 Beijing Science and Technology Awards.

Patents

(1) Multi-robot cooperative confrontation method, apparatus, electronic device, and storage medium, invention patent, 2022, 3rd inventor, patent no.: CN113894780A

(2) Reinforcement-learning-based multi-agent control method and apparatus for changing environments, invention patent, 2021, 1st inventor, patent no.: CN113837348A

(3) Reinforcement-learning-based missile guidance method and apparatus, invention patent, 2021, 1st inventor, patent no.: CN113239472A

(4) Cooperative adaptive cruise control method for heterogeneous platoons based on acceleration feedforward, granted patent, 2021, 1st inventor, patent no.: CN110888322B

(5) Multi-agent deep reinforcement learning method and system based on counterfactual reward, invention patent, 2020, 3rd inventor, patent no.: CN111105034A

(6) Lane-keeping method and system for intelligent driving, granted patent, 2019, 5th inventor, patent no.: CN109466552A

(7) Optimal control method, system, and storage medium for multi-battery energy storage systems, invention patent, 2019, 1st inventor, patent no.: CN109245196A

(8) Robust tracking control method for spring-mass-damper systems, invention patent, 2018, 3rd inventor, patent no.: CN108303876A

(9) Method and system for detecting abnormal charging/discharging behavior of energy storage batteries, granted patent, 2016, 3rd inventor, patent no.: CN106154180A

(10) Data-based Q-function adaptive dynamic programming method, invention patent, 2013, 2nd inventor, patent no.: CN103217899A

(11) Control method for coal gasifiers, invention patent, 2012, 5th inventor, patent no.: CN102799748A

(12) Fuzzy adaptive dynamic programming method, invention patent, 2012, 2nd inventor, patent no.: CN102645894A

Publications

Published Papers
[1] Chai, Jiajun, Chen, Wenzhang, Zhu, Yuanheng, Yao, ZongXin, Zhao, Dongbin. A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS. 2023, http://dx.doi.org/10.1109/TSMC.2023.3270444.
[2] Chai, Jiajun, Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin, Ma, Zhe, Sun, Kewu, Ding, Jishiyu. UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2023, 34(4): 2093-2104, http://dx.doi.org/10.1109/TNNLS.2021.3105869.
[3] Hu, Guangzheng, Li, Haoran, Liu, Shasha, Zhu, Yuanheng, Zhao, Dongbin. NeuronsMAE: A Novel Multi-Agent Reinforcement Learning Environment for Cooperative and Competitive Multi-Robot Tasks. 2023 International Joint Conference on Neural Networks (IJCNN). 2023.
[4] Liu, Minsong, Li, Luntong, Shao, Shuai, Zhu, Yuanheng, Zhao, Dongbin. Soft Contrastive Learning with Q-irrelevance Abstraction for Reinforcement Learning. IEEE Transactions on Cognitive and Developmental Systems[J]. 2022.
[5] Zhu, Yuanheng, Li, Weifan, Zhao, Mengchen, Hao, Jianye, Zhao, Dongbin. Empirical Policy Optimization for n-Player Markov Games. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2022, http://dx.doi.org/10.1109/TCYB.2022.3179775.
[6] Zhu, Yuanheng, Zhao, Dongbin. Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2022, 33(3): 1228-1241, http://dx.doi.org/10.1109/TNNLS.2020.3041469.
[7] Tang Zhentao, Liang Rongqin, Zhu Yuanheng, Zhao Dongbin. Intelligent decision-making methods for real-time fighting games. Control Theory & Applications. 2022, 39(6): 969-985, https://d.wanfangdata.com.cn/periodical/kzllyyy202206001.
[8] Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin. Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target. COMPLEX & INTELLIGENT SYSTEMS[J]. 2022, 8(2): 1205-1216, http://dx.doi.org/10.1007/s40747-021-00577-6.
[9] Chai, Jiajun, Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin, Ma, Zhe, Sun, Kewu, Ding, Jishiyu. UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2021, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000733450200001.
[10] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Optimal Feedback Control of Pedestrian Flow in Heterogeneous Corridors. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING[J]. 2021, 18(3): 1097-1108, http://dx.doi.org/10.1109/TASE.2020.2996018.
[11] Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin. Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target. COMPLEX & INTELLIGENT SYSTEMS[J]. 2021.
[12] Yang, Xiong, Zhu, Yuanheng, Dong, Na, Wei, Qinglai. Decentralized Event-Driven Constrained Control Using Adaptive Critic Designs. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2021.
[13] Hu, Guangzheng, Zhu, Yuanheng, Zhao, Dongbin, Zhao, Mengchen, Hao, Jianye. Event-Triggered Communication Network With Limited-Bandwidth Constraint for Multi-Agent Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2021, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000732283100001.
[14] Tang Zhentao, Zhu Yuanheng, Zhao Dongbin, Lucas Simon M. Enhanced Rolling Horizon Evolution Algorithm with Opponent Model Learning: Results for the Fighting Game AI Competition. 2020, http://arxiv.org/abs/2003.13949.
[15] Zhu, Yuanheng, He, Haibo, Zhao, Dongbin. LMI-Based Synthesis of String-Stable Controller for Cooperative Adaptive Cruise Control. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS[J]. 2020, 21(11): 4516-4525, https://www.webofscience.com/wos/woscc/full-record/WOS:000587709700003.
[16] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Synthesis of Cooperative Adaptive Cruise Control With Feedforward Strategies. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY[J]. 2020, 69(4): 3615-3627, https://www.webofscience.com/wos/woscc/full-record/WOS:000530284400009.
[17] Shao, Kun, Zhu, Yuanheng, Tang, Zhentao, Zhao, Dongbin. Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2020.
[18] Zhu Yuanheng. Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games. IEEE Transactions on Neural Networks and Learning Systems. 2020.
[19] Liu, Minsong, Zhu, Yuanheng, Zhao, Dongbin. An Improved Minimax-Q Algorithm Based on Generalized Policy Iteration to Solve a Chaser-Invader Game. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2020.
[20] Zhu Yuanheng. Synthesis of Cooperative Adaptive Cruise Control With Feedforward Strategies. IEEE Transactions on Vehicular Technology. 2020.
[21] Zhu Yuanheng. Enhanced Rolling Horizon Evolution Algorithm with Opponent Model Learning. IEEE Transactions on Games. 2020.
[22] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Invariant Adaptive Dynamic Programming for Discrete-Time Optimal Control. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2020, 50(11): 3959-3971, http://dx.doi.org/10.1109/TSMC.2019.2911900.
[23] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun, Wang, Ding. Control-Limited Adaptive Dynamic Programming for Multi-Battery Energy Storage Systems. IEEE TRANSACTIONS ON SMART GRID[J]. 2019, 10(4): 4235-4244, https://www.webofscience.com/wos/woscc/full-record/WOS:000472577500065.
[24] Shao, Kun, Zhu, Yuanheng, Zhao, Dongbin. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE[J]. 2019, 3(1): 73-84, http://dx.doi.org/10.1109/TETCI.2018.2823329.
[25] Zhu Yuanheng. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning. IEEE Transactions on Emerging Topics in Computational Intelligence. 2019.
[26] Zhu Yuanheng. LMI-Based Synthesis of String-Stable Controller for Cooperative Adaptive Cruise Control. IEEE Transactions on Intelligent Transportation Systems. 2019.
[27] Zhu, Yuanheng, He, Haibo, Zhao, Dongbin, Hou, Zhongsheng. Optimal Pedestrian Evacuation in Building with Consecutive Differential Dynamic Programming. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2019.
[28] Zhu Yuanheng. Driving Control with Deep and Reinforcement Learning in The Open Racing Car Simulator. Journal of Ambient Intelligence and Humanized Computing. 2019.
[29] Zhu Yuanheng. Control-Limited Adaptive Dynamic Programming for Multi-Battery Energy Storage Systems. IEEE Transactions on Smart Grid. 2019.
[30] Zhu Yuanheng. Invariant Adaptive Dynamic Programming for Discrete-Time Optimal Control. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2019.
[31] Zhu, Yuanheng, Zhao, Dongbin, Zhong, Zhiguang. Adaptive Optimal Control of Heterogeneous CACC System With Uncertain Dynamics. IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY[J]. 2019, 27(4): 1772-1779.
[32] Zhu Yuanheng. Adaptive Optimal Control of Heterogeneous CACC System With Uncertain Dynamics. IEEE Transactions on Control Systems Technology. 2019.
[33] Li Dong, Zhao Dongbin, Zhang Qichao, Zhu Yuanheng, Sundaram S. An Autonomous Driving Experience Platform with Learning-Based Functions. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI). 2018, 1174-1179, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800160.
[34] Yuanheng Zhu, Nannan Li, Kun Shao, Dongbin Zhao. Learning battles in ViZDoom via deep reinforcement learning. 2018, http://ir.ia.ac.cn/handle/173211/23364.
[35] Zhu, Yuanheng, Zhao, Dongbin, Yang, Xiong, Zhang, Qichao. Policy Iteration for H infinity Optimal Control of Polynomial Nonlinear Systems via Sum of Squares Programming. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2018, 48(2): 500-509, https://www.webofscience.com/wos/woscc/full-record/WOS:000422925700005.
[36] Shao, Kun, Zhao, Dongbin, Zhu, Yuanheng, Zhang, Qichao. Visual Navigation with Actor-Critic Deep Reinforcement Learning. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2018.
[37] Zhu Yuanheng, Zhang Qichao, Zhao Dongbin, Li Dong. An Autonomous Driving Experience Platform with Learning-Based Functions. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI). 2018, 1174-1179, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800160.
[38] Zhu Yuanheng. Comprehensive Comparison of Online ADP Algorithms for Continuous-Time Optimal Control. Artificial Intelligence Review. 2018.
[39] Yuanheng Zhu, Qichao Zhang, Dongbin Zhao, Kun Shao. Visual navigation with Actor-Critic deep reinforcement learning. 2018, http://ir.ia.ac.cn/handle/173211/23365.
[40] Tang Zhentao, Shao Kun, Zhu Yuanheng, Li Dong, Zhao Dongbin, Huang Tingwen, Sundaram S. A Review of Computational Intelligence for StarCraft AI. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI). 2018, 1167-1173, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800159.
[41] Zhu, Yuanheng, Zhao, Dongbin. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. ARTIFICIAL INTELLIGENCE REVIEW[J]. 2018, 49(4): 531-547, https://www.webofscience.com/wos/woscc/full-record/WOS:000426912500004.
[42] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, Ji, Junhong. Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS[J]. 2017, 64(5): 4101-4109, https://www.webofscience.com/wos/woscc/full-record/WOS:000399674000064.
[43] Yang, Xiong, He, Haibo, Liu, Derong, Zhu, Yuanheng. Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems. IET CONTROL THEORY AND APPLICATIONS[J]. 2017, 11(14): 2307-2316, https://www.webofscience.com/wos/woscc/full-record/WOS:000409425700015.
[44] Zhang, Qichao, Zhao, Dongbin, Zhu, Yuanheng. Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs. NEUROCOMPUTING[J]. 2017, 238(*): 377-386, http://dx.doi.org/10.1016/j.neucom.2017.01.076.
[46] Zhu Yuanheng. Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming. IEEE Transactions on Industrial Electronics. 2017.
[47] Zhu Yuanheng. Policy Iteration for H-infinity Optimal Control of Polynomial Nonlinear Systems via Sum of Squares Programming. IEEE Transactions on Cybernetics. 2017.
[48] Zhu Yuanheng. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data. IEEE Transactions on Neural Networks and Learning Systems. 2017.
[49] Zhu Yuanheng, Zhao Dongbin, Shao Kun. Cooperative Reinforcement Learning for Multiple Units Combat in StarCraft. 2017, http://ir.ia.ac.cn/handle/173211/15399.
[50] Zhu Yuanheng. Data-Driven Adaptive Dynamic Programming for Continuous-Time Fully Cooperative Games with Partially Constrained Inputs. Neurocomputing. 2017.
[51] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2017, 28(3): 714-725, https://www.webofscience.com/wos/woscc/full-record/WOS:000395980500020.
[52] Tang Zhentao, Shao Kun, Zhao Dongbin, Zhu Yuanheng. Progress in deep reinforcement learning: from AlphaGo to AlphaGo Zero. Control Theory & Applications[J]. 2017, 34(12): 1529-1546, http://lib.cqvip.com/Qikan/Article/Detail?id=7000480876.
[53] Zhu Yuanheng. Adaptive Dynamic Programming for Robust Neural Control of Unknown Continuous-Time Non-linear Systems. IET Control Theory & Applications. 2017.
[54] Zhang, Qichao, Zhao, Dongbin, Zhu, Yuanheng. Event-Triggered H-infinity Control for Continuous-Time Nonlinear System via Concurrent Learning. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2017, 47(7): 1071-1081, https://www.webofscience.com/wos/woscc/full-record/WOS:000404354600004.
[55] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun. Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics. IET CONTROL THEORY AND APPLICATIONS[J]. 2016, 10(12): 1339-1347.
[56] Zhu Yuanheng, Chen Xi, Zhao Dongbin, Zhang Qichao. Model-free reinforcement learning for nonlinear zero-sum games with simultaneous explorations. 2016, http://ir.ia.ac.cn/handle/173211/14340.
[57] Tang Zhentao, Shao Kun, Zhao Dongbin, Zhu Yuanheng. Move Prediction in Gomoku Using Deep Learning. 2016, http://ir.ia.ac.cn/handle/173211/15673.
[58] Zhao Dongbin, Wang Haitao, Shao Kun, Zhu Yuanheng. Deep Reinforcement Learning with Experience Replay Based on SARSA. PROCEEDINGS OF 2016 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI). 2016, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000400488300013.
[59] Zhu Yuanheng. Using Reinforcement Learning Techniques to Solve Continuous-Time Non-linear Optimal Tracking Problem without System Dynamics. IET Control Theory & Applications. 2016.
[60] Zhao, Dongbin, Zhang, Qichao, Wang, Ding, Zhu, Yuanheng. Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2016, 46(3): 854-865, https://www.webofscience.com/wos/woscc/full-record/WOS:000370963500023.
[61] Zhao Dongbin, Zhu Yuanheng. A probably approximately correct reinforcement learning algorithm for continuous-state-space control problems. Control Theory & Applications[J]. 2016, 33(12): 1603-1613, http://lib.cqvip.com/Qikan/Article/Detail?id=7000119656.
[62] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, Ji, Junhong. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. COGNITIVE COMPUTATION[J]. 2015, 7(6): 763-771, http://ir.ia.ac.cn/handle/173211/10525.
[63] Zhao, Dongbin, Zhu, Yuanheng. MEC-A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2015, 26(2): 346-356, http://www.irgrid.ac.cn/handle/1471x/980893.
[64] Zhu, Yuanheng, Zhao, Dongbin. A data-based online reinforcement learning algorithm satisfying probably approximately correct principle. NEURAL COMPUTING & APPLICATIONS[J]. 2015, 26(4): 775-787, http://www.irgrid.ac.cn/handle/1471x/980902.
[65] Zhu Yuanheng. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. Cognitive Computation. 2015.
[66] Zhu Yuanheng. MEC-A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems. IEEE Transactions on Neural Networks and Learning Systems. 2015.
[67] Zhao Dongbin, Zhu Yuanheng. Model-Free Adaptive Algorithm for Optimal Control of Continuous-Time Nonlinear System. 2015, http://ir.ia.ac.cn/handle/173211/15282.
[68] Li Dong, Zhao Dongbin, Zhu Yuanheng, Xia Zhongpu. Thermal Comfort Control Based on MEC Algorithm for HVAC Systems. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2015.
[69] Zhu, Yuanheng, Zhao, Dongbin, Liu, Derong. Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems. NEUROCOMPUTING[J]. 2015, 149: 124-131, http://dx.doi.org/10.1016/j.neucom.2013.11.055.
[70] Zhu Yuanheng. A Data-Based Online Reinforcement Learning Algorithm Satisfying Probably Approximately Correct Principle. Neural Computing and Applications. 2015.
[71] Li Dong, Xia Zhongpu, Zhu Yuanheng, Zhao Dongbin. Thermal Comfort Control Based on MEC Algorithm for HVAC System. 2015, http://ir.ia.ac.cn/handle/173211/15667.
[72] Zhu Yuanheng. Convergence Analysis and Application of Fuzzy-HDP for Nonlinear Discrete-Time HJB Systems. Neurocomputing. 2015.
[73] Yuanheng Zhu, Zhao Dongbin. A data-based online reinforcement learning algorithm with high-efficient exploration. 2014, http://ir.ia.ac.cn/handle/173211/15283.
[74] Zhao, Dongbin, Hu, Zhaohui, Xia, Zhongpu, Alippi, Cesare, Zhu, Yuanheng, Wang, Ding. Full-range adaptive cruise control based on supervised adaptive dynamic programming. NEUROCOMPUTING[J]. 2014, 125: 57-67, http://dx.doi.org/10.1016/j.neucom.2012.09.034.
[75] Yuanheng Zhu, Dongbin Zhao, Haibo He. An high-efficient online reinforcement learning algorithm for continuous-state systems. IEEE WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA). 2014, 581-586, http://www.irgrid.ac.cn/handle/1471x/973405.
[76] Yuanheng Zhu, Zhao Dongbin. Online reinforcement learning for continuous-state systems. FRONTIERS OF INTELLIGENT CONTROL AND INFORMATION PROCESSING. 2014, http://ir.ia.ac.cn/handle/173211/15280.
[77] Zhao Dongbin, Yuanheng Zhu. Online Model-Free RLSPI Algorithm for Nonlinear Discrete-Time Non-affine Systems. 2013, http://ir.ia.ac.cn/handle/173211/15281.
[78] Yuanheng Zhu, Dongbin Zhao, Haibo He. Integration of fuzzy controller with adaptive dynamic programming. IEEE WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA). 2012, 310-315, http://www.irgrid.ac.cn/handle/1471x/973407.
[79] Zhao, Dongbin, Zhu, Yuanheng, He, Haibo. Neural and Fuzzy Dynamic Programming for Under-actuated Systems. INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2012, http://www.irgrid.ac.cn/handle/1471x/973390.

Research Activities

Research Projects
(1) Game decision-making under incomplete information, subproject: deep reinforcement learning algorithms jointly driven by knowledge and data, Participant, National-level, 2020-01--2022-12
(2) Research and implementation of swarm-intelligence emergence and convergence in "swarm" multi-agent systems, subproject: real-time swarm-intelligence reasoning and confrontation techniques for "swarm" systems, Participant, National-level, 2020-01--2023-05
(3) Optimal control of model-unknown nonlinear systems based on event-driven adaptive dynamic programming, Principal Investigator, National-level, 2017-01--2019-12
(4) Theory, methods, and applications of deep adaptive dynamic programming, Participant, National-level, 2016-01--2019-12
(5) Three-dimensional guidance laws for single- and dual-missile interception of maneuvering targets based on deep reinforcement learning, Principal Investigator, Academy-level, 2019-11--2020-11
(6) Autonomous coordinated control strategies for multiple energy storage control units, Principal Investigator, Academy-level, 2018-01--2018-10
(7) Multi-source data fusion and analysis device for energy storage systems, Principal Investigator, Academy-level, 2016-05--2016-11
Conference Presentations
(1)Optimal Pedestrian Evacuation in Building with Consecutive Differential Dynamic Programming   2019-07-14
(2)Driving Control with Deep and Reinforcement Learning in The Open Racing Car Simulator   2018-12-13
(3)Convolutional fitted Q iteration for vision-based control problems   2016-07-24
(4)Model-free adaptive algorithm for optimal control of continuous-time nonlinear system   2015-11-27
(5)A data-based online reinforcement learning algorithm with high-efficient exploration   2014-12-09
(6)An high-efficient online reinforcement learning algorithm for continuous-state systems   2014-06-29
(7)Online Model-Free RLSPI Algorithm for Nonlinear Discrete-Time Non-affine Systems   2013-11-03

Collaborations

Project Collaborators

电科院, 航天二院, Huawei


Students Supervised

Current Students

Liu Yuanxun    Master's student    081203 - Computer Application Technology
Li Boyu        Master's student    081101 - Control Theory and Control Engineering
Fu Yuqian      Ph.D. student       081104 - Pattern Recognition and Intelligent Systems
Chen Wenzhang  Master's student    081101 - Control Theory and Control Engineering

Co-supervised Students

Student         Degree          Period            After graduation
Shao Kun        M.S.-Ph.D.      2014.9-2019.7     Huawei
Tang Zhentao    Direct Ph.D.    2016.9-2022.7     Huawei
Li Weifan       Ph.D.           2018.9-2023.7     智能研究院
Hu Guangzheng   Ph.D.           2019.9-present
Liu Minsong     M.S.-Ph.D.      2018.9-present
Chai Jiajun     Direct Ph.D.    2019.9-present
Lu Runyu        Direct Ph.D.    2021.9-present