Publications
[1] Fu, Yuqian, Zhu, Yuanheng, Chai, Jiajun, Zhao, Dongbin. LDR: Learning Discrete Representation to Improve Noise Robustness in Multiagent Tasks. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2025, 2nd author, corresponding author, 55(1): 513-525, http://dx.doi.org/10.1109/TSMC.2024.3487535.
[2] Hu, Guangzheng, Zhu, Yuanheng, Li, Haoran, Zhao, Dongbin. FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE[J]. 2024, 2nd author, corresponding author, 8(6): 4033-4045, http://dx.doi.org/10.1109/TETCI.2024.3383454.
[3] 李浩然, 张曜程, 温颢玮, 朱圆恒, 赵冬斌. Stabilizing Diffusion Model for Robotic Control with Dynamic Programming and Transition Feasibility. IEEE Transactions on Artificial Intelligence[J]. 2024, 4th author, corresponding author.
[4] 李博宇, 李浩然, 朱圆恒, 赵冬斌. MAT: Morphological Adaptive Transformer for Universal Morphology Policy Learning. IEEE Transactions on Cognitive and Developmental Systems[J]. 2024, 3rd author.
[5] Li, Luntong, Zhu, Yuanheng. Boosting On-Policy Actor-Critic With Shallow Updates in Critic. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS. 2024, 2nd author, corresponding author, http://dx.doi.org/10.1109/TNNLS.2024.3378913.
[6] 控制理论与应用 (Control Theory & Applications). 2024, corresponding author.
[7] IEEE Transactions on Emerging Topics in Computational Intelligence. 2024, corresponding author.
[8] IEEE Transactions on Artificial Intelligence. 2024, corresponding author.
[9] IEEE Transactions on Artificial Intelligence. 2024, corresponding author.
[10] IEEE Transactions on Cognitive and Developmental Systems. 2024, 3rd author.
[11] IEEE Transactions on Neural Networks and Learning Systems. 2024, corresponding author.
[12] Chai, Jiajun, Zhu, Yuanheng, Zhao, Dongbin. NVIF: Neighboring Variational Information Flow for Cooperative Large-Scale Multiagent Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2024, 2nd author, corresponding author, 35(12): 17829-17841, http://dx.doi.org/10.1109/TNNLS.2023.3309608.
[13] Zhu, Yuanyang, Wang, Zhi, Zhu, Yuanheng, Chen, Chunlin, Zhao, Dongbin. Discretizing Continuous Action Space With Unimodal Probability Distributions for On-Policy Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS. 2024, 3rd author, http://dx.doi.org/10.1109/TNNLS.2024.3446371.
[14] Zhu, Yuanheng, Li, Weifan, Zhao, Mengchen, Hao, Jianye, Zhao, Dongbin. Empirical Policy Optimization for n-Player Markov Games. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2023, 1st author, 53(10): 6443-6455, http://dx.doi.org/10.1109/TCYB.2022.3179775.
[15] Chai, Jiajun, Zhu, Yuanheng, Zhao, Dongbin. NVIF: Neighboring Variational Information Flow for Cooperative Large-Scale Multiagent Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS. 2023, 2nd author, corresponding author, http://dx.doi.org/10.1109/TNNLS.2023.3309608.
[16] Liu, Minsong, Li, Luntong, Hao, Shuai, Zhu, Yuanheng, Zhao, Dongbin. Soft Contrastive Learning With Q-Irrelevance Abstraction for Reinforcement Learning. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS[J]. 2023, 4th author, 15(3): 1463-1473, http://dx.doi.org/10.1109/TCDS.2022.3218940.
[17] 中国计算机学会通讯 (Communications of the CCF). 2023, corresponding author.
[18] IEEE Transactions on Neural Networks and Learning Systems. 2023, corresponding author.
[19] Hu, Guangzheng, Zhu, Yuanheng, Zhao, Dongbin, Zhao, Mengchen, Hao, Jianye. Event-Triggered Communication Network With Limited-Bandwidth Constraint for Multi-Agent Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2023, 2nd author, corresponding author, 34(8): 3966-3978, http://dx.doi.org/10.1109/TNNLS.2021.3121546.
[20] IEEE Transactions on Neural Networks and Learning Systems. 2023, corresponding author.
[21] Chai, Jiajun, Chen, Wenzhang, Zhu, Yuanheng, Yao, ZongXin, Zhao, Dongbin. A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2023, 3rd author, corresponding author, 53(9): 5417-5429, http://dx.doi.org/10.1109/TSMC.2023.3270444.
[22] Chai, Jiajun, Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin, Ma, Zhe, Sun, Kewu, Ding, Jishiyu. UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2023, 3rd author, 34(4): 2093-2104, http://dx.doi.org/10.1109/TNNLS.2021.3105869.
[23] Tang, Zhentao, Zhu, Yuanheng, Zhao, Dongbin, Lucas, Simon M. Enhanced Rolling Horizon Evolution Algorithm With Opponent Model Learning: Results for the Fighting Game AI Competition. IEEE TRANSACTIONS ON GAMES[J]. 2023, 2nd author, 15(1): 5-15, http://dx.doi.org/10.1109/TG.2020.3022698.
[24] Hu, Guangzheng, Li, Haoran, Liu, Shasha, Zhu, Yuanheng, Zhao, Dongbin. NeuronsMAE: A Novel Multi-Agent Reinforcement Learning Environment for Cooperative and Competitive Multi-Robot Tasks. 2023 International Joint Conference on Neural Networks (IJCNN). 2023, 4th author.
[25] Zhu, Yuanheng, Zhao, Dongbin. Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2022, 1st author, 33(3): 1228-1241, http://dx.doi.org/10.1109/TNNLS.2020.3041469.
[26] 唐振韬, 梁荣钦, 朱圆恒, 赵冬斌. Intelligent decision-making methods for real-time fighting games. 控制理论与应用 (Control Theory & Applications). 2022, 3rd author, 39(6): 969-985, https://d.wanfangdata.com.cn/periodical/kzllyyy202206001.
[27] Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin. Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target. COMPLEX & INTELLIGENT SYSTEMS[J]. 2022, 2nd author, corresponding author, 8(2): 1205-1216, http://dx.doi.org/10.1007/s40747-021-00577-6.
[28] 刘民颂, 李论通, 邵帅, 朱圆恒, 赵冬斌. Soft Contrastive Learning with Q-irrelevance Abstraction for Reinforcement Learning. IEEE Transactions on Cognitive and Developmental Systems[J]. 2022, 4th author.
[29] Zhu, Yuanheng, Li, Weifan, Zhao, Mengchen, Hao, Jianye, Zhao, Dongbin. Empirical Policy Optimization for n-Player Markov Games. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2022, 1st author, http://dx.doi.org/10.1109/TCYB.2022.3179775.
[30] Chai, Jiajun, Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin, Ma, Zhe, Sun, Kewu, Ding, Jishiyu. UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2021, 3rd author, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000733450200001.
[31] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Optimal Feedback Control of Pedestrian Flow in Heterogeneous Corridors. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING[J]. 2021, 1st author, corresponding author, 18(3): 1097-1108, http://dx.doi.org/10.1109/TASE.2020.2996018.
[32] Li, Weifan, Zhu, Yuanheng, Zhao, Dongbin. Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target. COMPLEX & INTELLIGENT SYSTEMS[J]. 2021, 2nd author, 12.
[33] Yang, Xiong, Zhu, Yuanheng, Dong, Na, Wei, Qinglai. Decentralized Event-Driven Constrained Control Using Adaptive Critic Designs. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2021, 2nd author.
[34] Hu, Guangzheng, Zhu, Yuanheng, Zhao, Dongbin, Zhao, Mengchen, Hao, Jianye. Event-Triggered Communication Network With Limited-Bandwidth Constraint for Multi-Agent Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2021, 2nd author, corresponding author, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000732283100001.
[35] Tang Zhentao, Zhu Yuanheng, Zhao Dongbin, Lucas Simon M. Enhanced Rolling Horizon Evolution Algorithm with Opponent Model Learning: Results for the Fighting Game AI Competition. 2020, 2nd author, http://arxiv.org/abs/2003.13949.
[36] Zhu, Yuanheng, He, Haibo, Zhao, Dongbin. LMI-Based Synthesis of String-Stable Controller for Cooperative Adaptive Cruise Control. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS[J]. 2020, 1st author, 21(11): 4516-4525, http://dx.doi.org/10.1109/TITS.2019.2935510.
[37] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Synthesis of Cooperative Adaptive Cruise Control With Feedforward Strategies. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY[J]. 2020, 1st author, corresponding author, 69(4): 3615-3627, http://dx.doi.org/10.1109/TVT.2020.2974932.
[38] Shao, Kun, Zhu, Yuanheng, Tang, Zhentao, Zhao, Dongbin, IEEE. Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2020, 2nd author.
[39] 朱圆恒. Online minimax Q network learning algorithm for solving two-player zero-sum Markov games. IEEE Transactions on Neural Networks and Learning Systems. 2020, 1st author.
[40] Liu, Minsong, Zhu, Yuanheng, Zhao, Dongbin, IEEE. An Improved Minimax-Q Algorithm Based on Generalized Policy Iteration to Solve a Chaser-Invader Game. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2020, 2nd author.
[41] 朱圆恒. Design of cooperative adaptive cruise control based on feedforward strategies. IEEE Transactions on Vehicular Technology. 2020, 1st author.
[42] 朱圆恒. Enhanced rolling horizon evolution algorithm and opponent modeling. IEEE Transactions on Games. 2020, 1st author.
[43] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo. Invariant Adaptive Dynamic Programming for Discrete-Time Optimal Control. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2020, 1st author, 50(11): 3959-3971, http://dx.doi.org/10.1109/TSMC.2019.2911900.
[44] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun, Wang, Ding. Control-Limited Adaptive Dynamic Programming for Multi-Battery Energy Storage Systems. IEEE TRANSACTIONS ON SMART GRID[J]. 2019, 1st author, 10(4): 4235-4244, https://www.webofscience.com/wos/woscc/full-record/WOS:000472577500065.
[45] Shao, Kun, Zhu, Yuanheng, Zhao, Dongbin. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE[J]. 2019, 2nd author, 3(1): 73-84, http://dx.doi.org/10.1109/TETCI.2018.2823329.
[46] 朱圆恒. StarCraft micromanagement control by combining reinforcement learning and curriculum transfer learning. IEEE Transactions on Emerging Topics in Computational Intelligence. 2019, 1st author.
[47] 朱圆恒. LMI-based design of a string-stable controller for cooperative adaptive cruise control systems. IEEE Transactions on Intelligent Transportation Systems. 2019, 1st author.
[48] Zhu, Yuanheng, He, Haibo, Zhao, Dongbin, Hou, Zhongsheng, IEEE. Optimal Pedestrian Evacuation in Building with Consecutive Differential Dynamic Programming. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2019, 1st author, corresponding author.
[49] 朱圆恒. Vision-based driving in an open-source racing car simulator using deep and reinforcement learning. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING. 2019, 1st author.
[50] 朱圆恒. Control-limited adaptive dynamic programming design for multi-battery storage systems. IEEE Transactions on Smart Grid. 2019, 1st author.
[51] 朱圆恒. Invariant adaptive dynamic programming for optimal control of discrete-time systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2019, 1st author.
[52] Zhu, Yuanheng, Zhao, Dongbin, Zhong, Zhiguang. Adaptive Optimal Control of Heterogeneous CACC System With Uncertain Dynamics. IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY[J]. 2019, 1st author, 27(4): 1772-1779.
[53] 朱圆恒. Adaptive optimal control of heterogeneous cooperative adaptive cruise control systems with uncertain dynamics. IEEE Transactions on Control Systems Technology. 2019, 1st author.
[54] Tang Zhentao, Shao Kun, Zhu Yuanheng, Li Dong, Zhao Dongbin, Huang Tingwen, Sundaram S. A Review of Computational Intelligence for StarCraft AI. 8th IEEE Symposium Series on Computational Intelligence (IEEE SSCI). 2018, 3rd author, 1167-1173, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800159.
[55] Zhu, Yuanheng, Zhao, Dongbin. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. Artificial Intelligence Review[J]. 2018, 1st author, 49(4): 531-547, https://link.springer.com/article/10.1007/s10462-017-9548-4.
[56] Li Dong, Zhao Dongbin, Zhang Qichao, Zhu Yuanheng, Sundaram S. An Autonomous Driving Experience Platform with Learning-Based Functions. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI). 2018, 4th author, 1174-1179, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800160.
[57] Yuanheng Zhu, Nannan Li, Kun Shao, Dongbin Zhao. Learning battles in ViZDoom via deep reinforcement learning. 2018, 1st author, http://ir.ia.ac.cn/handle/173211/23364.
[58] Zhu, Yuanheng, Zhao, Dongbin, Yang, Xiong, Zhang, Qichao. Policy Iteration for H infinity Optimal Control of Polynomial Nonlinear Systems via Sum of Squares Programming. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2018, 1st author, corresponding author, 48(2): 500-509, https://www.webofscience.com/wos/woscc/full-record/WOS:000422925700005.
[59] Shao, Kun, Zhao, Dongbin, Zhu, Yuanheng, Zhang, Qichao, IEEE. Visual Navigation with Actor-Critic Deep Reinforcement Learning. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2018, 3rd author, https://webofscience.clarivate.cn/wos/woscc/full-record/WOS:000585967404004.
[60] 朱圆恒. Comprehensive comparison of online adaptive dynamic programming methods for continuous-time optimal control. Artificial Intelligence Review. 2018, 1st author.
[61] Zhu Yuanheng, Zhang Qichao, Zhao Dongbin, Li Dong. An Autonomous Driving Experience Platform with Learning-Based Functions. 8th IEEE Symposium Series on Computational Intelligence (IEEE SSCI). 2018, 1st author, 1174-1179, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000459238800160.
[62] Yuanheng Zhu, Qichao Zhang, Dongbin Zhao, Kun Shao. Visual navigation with Actor-Critic deep reinforcement learning. 2018, 1st author, http://ir.ia.ac.cn/handle/173211/23365.
[63] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, Ji, Junhong. Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS[J]. 2017, 1st author, corresponding author, 64(5): 4101-4109, https://www.webofscience.com/wos/woscc/full-record/WOS:000399674000064.
[64] Yang, Xiong, He, Haibo, Liu, Derong, Zhu, Yuanheng. Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems. IET CONTROL THEORY AND APPLICATIONS[J]. 2017, 4th author, 11(14): 2307-2316, https://www.webofscience.com/wos/woscc/full-record/WOS:000409425700015.
[65] Zhang, Qichao, Zhao, Dongbin, Zhu, Yuanheng. Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs. NEUROCOMPUTING[J]. 2017, 3rd author, 238(*): 377-386, http://dx.doi.org/10.1016/j.neucom.2017.01.076.
[66] 朱圆恒. Event-driven optimal control of partially unknown, control-constrained systems via adaptive dynamic programming. IEEE Transactions on Industrial Electronics. 2017, 1st author.
[67] Yang, Xiong, He, Haibo, Liu, Derong, Zhu, Yuanheng. Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems. IET CONTROL THEORY AND APPLICATIONS[J]. 2017, 4th author, 11(14): 2307-2316, https://www.webofscience.com/wos/woscc/full-record/WOS:000409425700015.
[68] 朱圆恒. Policy iteration for H-infinity optimal control of polynomial nonlinear systems via sum-of-squares programming. IEEE Transactions on Cybernetics. 2017, 1st author.
[69] 朱圆恒. Solving unknown nonlinear zero-sum games with iterative adaptive dynamic programming based on online data. IEEE Transactions on Neural Networks and Learning Systems. 2017, 1st author.
[70] 朱圆恒. Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs. Neurocomputing. 2017, 1st author.
[71] 朱圆恒, 赵冬斌, 邵坤. Cooperative Reinforcement Learning for Multiple Units Combat in StarCraft. 2017, 1st author, http://ir.ia.ac.cn/handle/173211/15399.
[72] 唐振韬, 邵坤, 赵冬斌, 朱圆恒. Advances in deep reinforcement learning: from AlphaGo to AlphaGo Zero. 控制理论与应用 (Control Theory & Applications)[J]. 2017, 4th author, 34(12): 1529-1546, http://lib.cqvip.com/Qikan/Article/Detail?id=7000480876.
[73] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2017, 1st author, corresponding author, 28(3): 714-725, https://www.webofscience.com/wos/woscc/full-record/WOS:000395980500020.
[74] 朱圆恒. Adaptive dynamic programming for robust network control of unknown continuous-time nonlinear systems. IET Control Theory & Applications. 2017, 1st author.
[75] Zhang, Qichao, Zhao, Dongbin, Zhu, Yuanheng. Event-Triggered H-infinity Control for Continuous-Time Nonlinear System via Concurrent Learning. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS[J]. 2017, 3rd author, 47(7): 1071-1081, https://www.webofscience.com/wos/woscc/full-record/WOS:000404354600004.
[76] Zhu, Yuanheng, Zhao, Dongbin, Li, Xiangjun. Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics. IET CONTROL THEORY AND APPLICATIONS[J]. 2016, 1st author, 10(12): 1339-1347.
[77] Zhu Yuanheng, Chen Xi, Zhao Dongbin, Zhang Qichao. Model-free reinforcement learning for nonlinear zero-sum games with simultaneous explorations. 2016, 1st author, http://ir.ia.ac.cn/handle/173211/14340.
[78] Tang Zhentao, Shao Kun, Zhao Dongbin, Zhu Yuanheng. Move Prediction in Gomoku Using Deep Learning. 2016, 4th author, http://ir.ia.ac.cn/handle/173211/15673.
[79] Zhao Dongbin, Wang Haitao, Shao Kun, Zhu Yuanheng, IEEE. Deep Reinforcement Learning with Experience Replay Based on SARSA. PROCEEDINGS OF 2016 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI). 2016, 4th author, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000400488300013.
[80] 朱圆恒. Using reinforcement learning techniques to solve the continuous-time nonlinear optimal tracking problem with unknown system dynamics. IET Control Theory & Applications. 2016, 1st author.
[81] Zhao, Dongbin, Zhang, Qichao, Wang, Ding, Zhu, Yuanheng. Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics. IEEE TRANSACTIONS ON CYBERNETICS[J]. 2016, 4th author, corresponding author, 46(3): 854-865, https://www.webofscience.com/wos/woscc/full-record/WOS:000370963500023.
[82] 赵冬斌, 朱圆恒. Probably approximately correct reinforcement learning algorithm for continuous state space control problems. 控制理论与应用 (Control Theory & Applications)[J]. 2016, 2nd author, 33(12): 1603-1613, http://lib.cqvip.com/Qikan/Article/Detail?id=7000119656.
[83] Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, Ji, Junhong. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. COGNITIVE COMPUTATION[J]. 2015, 1st author, 7(6): 763-771, http://ir.ia.ac.cn/handle/173211/10525.
[84] Zhao, Dongbin, Zhu, Yuanheng. MEC-A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS[J]. 2015, 2nd author, 26(2): 346-356, http://www.irgrid.ac.cn/handle/1471x/980893.
[85] Zhu, Yuanheng, Zhao, Dongbin. A data-based online reinforcement learning algorithm satisfying probably approximately correct principle. NEURAL COMPUTING & APPLICATIONS[J]. 2015, 1st author, 26(4): 775-787, http://www.irgrid.ac.cn/handle/1471x/980902.
[86] 朱圆恒. Convergence proof of approximate policy iteration for undiscounted optimal control of discrete-time systems. Cognitive Computation. 2015, 1st author.
[87] 朱圆恒. MEC: a near-optimal online reinforcement learning algorithm for continuous deterministic systems. IEEE Transactions on Neural Networks and Learning Systems. 2015, 1st author.
[88] 赵冬斌, Yuanheng Zhu. Model-Free Adaptive Algorithm for Optimal Control of Continuous-Time Nonlinear System. 2015, 2nd author, http://ir.ia.ac.cn/handle/173211/15282.
[89] Li Dong, Zhao Dongbin, Zhu Yuanheng, Xia Zhongpu, IEEE. Thermal Comfort Control Based on MEC Algorithm for HVAC Systems. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2015, 3rd author.
[90] Zhu, Yuanheng, Zhao, Dongbin, Liu, Derong. Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems. NEUROCOMPUTING[J]. 2015, 1st author, 149: 124-131, http://dx.doi.org/10.1016/j.neucom.2013.11.055.
[91] 朱圆恒. A data-based online reinforcement learning algorithm satisfying the probably approximately correct principle. Neural Computing and Applications. 2015, 1st author.
[92] Li Dong, Xia Zhongpu, Zhu Yuanheng, Zhao Dongbin. Thermal Comfort Control Based on MEC Algorithm for HVAC System. 2015, 3rd author, http://ir.ia.ac.cn/handle/173211/15667.
[93] 朱圆恒. Convergence analysis and application of the fuzzy HDP method for nonlinear discrete-time HJB systems. Neurocomputing. 2015, 1st author.
[94] Yuanheng Zhu, 赵冬斌. A data-based online reinforcement learning algorithm with high-efficient exploration. 2014, 1st author, http://ir.ia.ac.cn/handle/173211/15283.
[95] Zhao, Dongbin, Hu, Zhaohui, Xia, Zhongpu, Alippi, Cesare, Zhu, Yuanheng, Wang, Ding. Full-range adaptive cruise control based on supervised adaptive dynamic programming. NEUROCOMPUTING[J]. 2014, 5th author, 125: 57-67, http://dx.doi.org/10.1016/j.neucom.2012.09.034.
[96] Yuanheng Zhu, Dongbin Zhao, Haibo He. An high-efficient online reinforcement learning algorithm for continuous-state systems. World Congress on Intelligent Control and Automation (WCICA 2014). 2014, 1st author, 581-586, http://www.irgrid.ac.cn/handle/1471x/973405.
[97] Yuanheng Zhu, 赵冬斌. Online reinforcement learning for continuous-state systems. FRONTIERS OF INTELLIGENT CONTROL AND INFORMATION PROCESSING. 2014, 1st author, http://ir.ia.ac.cn/handle/173211/15280.
[98] 赵冬斌, Yuanheng Zhu. Online Model-Free RLSPI Algorithm for Nonlinear Discrete-Time Non-affine Systems. 2013, 2nd author, http://ir.ia.ac.cn/handle/173211/15281.
[99] Yuanheng Zhu, Dongbin Zhao, Haibo He. Integration of fuzzy controller with adaptive dynamic programming. 10th World Congress on Intelligent Control and Automation (WCICA 2012). 2012, 1st author, 310-315, http://www.irgrid.ac.cn/handle/1471x/973407.
[100] Zhao, Dongbin, Zhu, Yuanheng, He, Haibo. Neural and Fuzzy Dynamic Programming for Under-actuated Systems. 2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). 2012, 2nd author, http://www.irgrid.ac.cn/handle/1471x/973390.