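Email: dongbin.zhao@ia.ac.cn
Address: Room 1005, Intelligence Building, 95 Zhongguancun East Road, Haidian District
Postal code: 100190
Research Areas
Deep reinforcement learning, multi-agent reinforcement learning, foundations of artificial intelligence
Intelligent driving, embodied intelligence, game AI, foundation model training, AI4S
Latest Results
Team results are updated monthly, and results from the past month are highlighted with a yellow background. For detailed introductions to these results, follow the WeChat official account 深度强化学习@CASIA.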

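Personnel Awards
- 2026, 赵冬斌, IEEE Computational Intelligence Society Distinguished Lecture Program
- 2025, 赵冬斌, University of Chinese Academy of Sciences “拾光奉献纪念奖” (Shiguang Dedication Memorial Award)
- 2025, 赵冬斌, Zhiyuan Scholar, Beijing Academy of Artificial Intelligence
- 2025, 赵冬斌, 2025 Outstanding Supervisor of the Chinese Academy of Sciences
- 2025, 陆润宇, National Scholarship for Doctoral Students
- 2025, 刘鑫, National Scholarship for Doctoral Students
- 2025, 柴嘉骏, President's Special Award of the Chinese Academy of Sciences (highest level; the only recipient in the institute that year; spoke as graduate representative at the UCAS / Institute of Automation commencement ceremonies)
- 2025, 柴嘉骏, Outstanding Graduate of the Institute of Automation, Chinese Academy of Sciences; Outstanding Graduate of Beijing
- 2025, 陆润宇, IEEE CIS Student Research Grant (6 to 9 recipients worldwide each year)
- 2025, Merit Student / Outstanding Student Leader of the Institute of Automation, Chinese Academy of Sciences: 方兴, 陈文章, 凃崧峻 / 刘学义, 田帅
- 2025, Merit Student / Outstanding Student Leader of the School of Artificial Intelligence, Chinese Academy of Sciences: 陆润宇, 赵子杰 / 徐凯旋
- 2025, Merit Student / Outstanding Student Leader of the University of Chinese Academy of Sciences (awarded during enrollment): 江震南, 邢泽斌, 陈庆 / 秦宇星
- 2025, 陈霆鸿, Beijing Natural Science Foundation Undergraduate Research Initiation Program
- 2025, 赵冬斌, Li Pei Outstanding Teacher Award, Chinese Academy of Sciences
- 2025, 赵冬斌, listed among the 2024 Stanford World's Top 2% Scientists, on both the career-long and the single-year scientific impact rankings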
Competition Awards
- 2025, ICCV NAVSIM v2 End-to-End Driving Challenge, 3rd place, 张启超, 郑宇鹏, 邢泽斌, 杨鹏轩.
- 2025, CVPR NAVSIM v2 End-to-End Driving Challenge, 4th place (1st among academic teams), 张启超, 郑宇鹏, 邢泽斌, 杨鹏轩. https://opendrivelab.com/challenge2025/
- 2025, ICRA ManiSkill Vitac Challenge, champion, with 秦宇星 participating.
Journal Papers (Accepted/Published)
- Yinfeng Gao, Deqing Liu, Yupeng Zheng, Qichao Zhang*, Da-Wei Ding*, Dongbin Zhao, “SoAD: Safety-oriented Value Estimation for Enhanced Closed-Loop End-to-End Autonomous Driving,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, DOI: 10.1109/TSMC.2026.3688954.
- YinFeng Gao#, Qichao Zhang#, Deqing Liu, Zhongpu Xia, Guang Li, Kun Ma, Guang Chen, Hangjun Ye, Long Chen, Dawei Ding*, Dongbin Zhao, “PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning,” IEEE Robotics and Automation Letters (RA-L), vol. 11, no. 5, pp. 5821-5828, May 2026. DOI:10.1109/LRA.2026.3675928.
- Yupeng Zheng, Zebin Xing, Qichao Zhang*, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao, “PlanAgent: a multi-modal large language agent for closed-loop vehicle motion planning,” IEEE Transactions on Cognitive and Developmental Systems (TCDS), accepted on January 31, 2026.
- Deqing Liu, YinFeng Gao, Qichao Zhang*, Yupeng Zheng, Xueyi Liu, Zhongpu Xia, Dongbin Zhao, “TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data,” IEEE Robotics and Automation Letters (RA-L), vol. 11, no. 2, pp. 1738–1745, 2026. (SCI Q1, IF 6). DOI: 10.1109/LRA.2025.3643264.
- Zebin Xing, Yupeng Zheng, Qichao Zhang*, Zhixing Ding, Pengxuan Yang, Songen Gu, Zhongpu Xia, Dongbin Zhao, “Mimir: Hierarchical Goal-Driven Diffusion With Uncertainty Propagation for End-to-End Autonomous Driving,” IEEE Robotics and Automation Letters (RA-L), vol. 11, no. 2, pp. 2178-2185, Feb. 2026. (SCI Q1, IF 6). DOI: 10.1109/LRA.2025.3641129. https://github.com/ZebinX/Mimir-Uncertainty-Driving
- Yuhui Chen, Haoran Li*, Zhennan Jiang, Haowei Wen, Dongbin Zhao, “TeViR: text-to-video reward with diffusion models for efficient reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics: Systems (TSMCA), vol. 56, no. 2, pp. 893–905, Feb. 2026. (SCI Q1, IF 9.1). DOI: 10.1109/TSMC.2025.3638818.
- Yuqian Fu, Yuanheng Zhu, Haoran Li, Zijie Zhao, Jiajun Chai, Dongbin Zhao*, “CPIG: leveraging consistency policy with intention guidance for multi-agent exploration,” IEEE Transactions on Cognitive and Developmental Systems (TCDS), vol.18, no.1, pp. 154-166, Feb. 2026. (SCI Q1, IF 5.0). DOI: 10.1109/TCDS.2025.3578001. https://github.com/fyqqyf/CPIG
- Ding Li, Qichao Zhang*, Dongfang Yang, Zhi Wang, Ren Fan, Dongbin Zhao, “IP3: Integrated path-guided prediction and planning for safe autonomous driving,” IEEE Transactions on Vehicular Technology (TVT), Vol. 74, No. 11, pp. 16729-16742, Nov. 2025. (SCI Q1, IF 6.1). DOI: 10.1109/TVT.2025.3576204. https://github.com/ld-av/IP3/.
- Yaran Chen, Chenguang Yang, Chaomin Luo, and Dongbin Zhao, “Guest Editorial: Special Issue on Embodied AI in Indoor Robotics: Bridging Perception, Interaction, and Autonomy,” IEEE Transactions on Cognitive and Developmental Systems (TCDS), Vol. 17, No. 5, pp. 1047-1149, Oct. 2025. (SCI Q1, IF 5.0). DOI: 10.1109/TCDS.2025.3595370.
- Yaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Jinrui Liu, Haoran Li, Dongbin Zhao*, and He Wang, “RoboGPT: an LLM-based long-term decision-making embodied agent for instruction following tasks,” IEEE Transactions on Cognitive and Developmental Systems (TCDS). Vol. 17, No. 5, pp. 1163-1174, Oct. 2025. (SCI Q1, IF 5.0). DOI: 10.1109/TCDS.2025.3543364. https://github.com/Cwb0106/RoboGPT.
- Yaran Chen, Xueyu Chen, Yu Han, Haoran Li, Dongbin Zhao, JingZhong Ji, Xu Wang*, Yong Zhou*, “Multimodal learning-based prediction for nonalcohol fatty liver disease,” Machine Intelligence Research (MIR), Vol. 22, No. 5, pp. 871-887, Oct. 2025. (SCI Q1, IF 6.4). DOI: 10.1007/s11633-024-1506-4.
- Boyu Li, Haobin Jiang, Ziluo Ding, Xinrun Xu, Haoran Li, Dongbin Zhao, Zongqing Lu*, “SELU: self-learning embodied multimodal large language models in unknown environments,” Transactions on Machine Learning Research (TMLR), 2025
- Runyu Lu, Yuanheng Zhu*, Dongbin Zhao, Yu Liu, You He, “Last-Iterate Convergence to Approximate Nash Equilibria in Multiplayer Imperfect Information Games,” IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Vol. 36, No. 8, pp. 13859-13873, Aug. 2025. (SCI Q1, IF 11.1). DOI: 10.1109/TNNLS.2024.3516693. https://github.com/lryforeal/IESL-implementation
- Zijie Zhao, Yuanheng Zhu*, Dongbin Zhao*, “Meta learning task representation in multi-agent reinforcement learning: from global inference to local inference,” IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Vol. 36, No. 8, pp. 14908-14921, Aug. 2025. (SCI Q1, IF 11.1). DOI: 10.1109/TNNLS.2025.3540758. https://github.com/zhaozijie2022/mg2l.
- Jiajun Chai, Zijie Zhao, Yuanheng Zhu, Dongbin Zhao*, “A Survey of Cooperative Multi-Agent Reinforcement Learning for Multi-Task Scenarios,” Artificial Intelligence Science and Engineering (AISE), Vol. 1, No. 2, pp. 89-121, 2025. DOI: 10.23919/AISE.2025.000008. Popular Article.
- Xin Liu, Yaran Chen*, Dongbin Zhao*, “Learning future representation with synthetic observations for sample-efficient reinforcement learning,” SCIENCE CHINA Information Sciences (SCIS), Vol. 68, No. 5, 150202: 1-18, May 2025. (SCI Q1, IF 7.3). https://doi.org/10.1007/s11432-024-4380-4.
- Haoran Li, Guangzheng Hu, Shasha Liu, Mingjun Ma, Yaran Chen, Dongbin Zhao*, “NeuronsGym: a hybrid framework and benchmark for robot tasks with Sim2Real policy learning,” IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), Vol. 9, No. 3, pp. 2491-2505, May 2025. (SCI Q1, IF 5.3). DOI: 10.1109/TETCI.2024.3488732. https://github.com/DRL-CASIA/NeuronsGym
- Xin Liu, Yaran Chen, Haoran Li, Dongbin Zhao*, “Balancing state exploration and skill diversity in unsupervised skill discovery,” IEEE Transactions on Cybernetics (TCyb), Vol. 55, No. 5, pp. 2234-2247, May 2025. (SCI Q1, IF 9.4). DOI: 10.1109/TCYB.2025.3548821. https://github.com/liuxin0824/ComSD
- Xin Liu, Yaran Chen*, Haoran Li, Boyu Li, Dongbin Zhao*, “Cross-domain random pretraining with prototypes for reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics: Systems (TSMCA), Vol. 55, No. 5, pp. 3601-3613, May 2025. (SCI Q1, IF 9.1). DOI: 10.1109/TSMC.2025.3541926. Popular Article. https://github.com/liuxin0824/CRPTpro
- Yuqian Fu, Yuanheng Zhu*, Jiajun Chai, Dongbin Zhao, “LDR: Learning discrete representation to improve noise robustness in multiagent tasks,” IEEE Transactions on Systems, Man, and Cybernetics: Systems (TSMCA), Vol. 55, No. 1, pp. 513-525, January 2025. (SCI Q1, IF 9.1). DOI: 10.1109/TSMC.2024.3487535.
- Nannan Li, Yaran Chen*, Dongbin Zhao, “Adaptive search for broad attention based vision transformers,” Neurocomputing, 611, 2025, 128696. (SCI Q1, IF 5.4). DOI: https://doi.org/10.1016/j.neucom.2024.128696. https://github.com/Bpumpkin/ASB
- Yuanheng Zhu, Shangjing Huang, Binbin Zuo, Dongbin Zhao*, Changyin Sun*, “Multi-task multi-agent reinforcement learning with task-entity transformers and value decomposition training,” IEEE Transactions on Automation Science and Engineering (TASE), Vol. 22, pp. 9164-9177, 2025. (SCI Q1, IF 6.4). DOI: 10.1109/TASE.2024.3501580. https://github.com/YuanhengZhu/TETQmix
- 李浩然,陈宇辉,崔文博,刘卫恒,刘锴,周明才,张正涛,赵冬斌*,面向具身操作的视觉-语言-动作模型综述,自动化学报. 2026, 52(1): 18-51. DOI: 10.16383/j.aas.c250394.
- 胡光政,朱圆恒,赵冬斌*,两团队零和博弈下熵引导的极小极大值分解强化学习方法,自动化学报, 2025, 51(4): 875-888. DOI: 10.16383/j.aas.c240258.
- 刘民颂,朱圆恒*,赵冬斌,基于Transformer的状态-动作-奖赏预测表征学习,自动化学报,2025, 51(1): 117-132. DOI: 10.16383/j.aas.c240230.
- 梁荣钦,朱圆恒*,赵冬斌,基于对手池的两人格斗游戏深度强化学习,控制理论与应用,2025, 42(2): 226-234. DOI: 10.7641/CTA.2024.30688. https://github.com/zhongqian97/TwoPlayerGameSelfPlayFramework
Conference Papers (Accepted/Published)
- Boyu Li, Chaoyi Xu, Haoqi Yuan, Xinrun Xu, Dongbin Zhao, Haoran Li*, Zongqing Lu*, “X-DiffVLA: X-Embodied Diffusion Action Heads for Vision-Language-Action Models,” RSS 2026.
- Jiangran Lyu, Kai Liu, Xuheng Zhang, Wenxuan Zhu, Tingrui Shen, Haoran Liao, Yusen Feng, Jiayi Chen, Jiazhao Zhang, Yifei Dong, Cui Wenbo, Senmao Qi, Shuo Wang, Yixin Zheng, Mi Yan, Xuesong Shi, Haoran Li, Dongbin Zhao, Ming-Yu Liu, Zhizheng Zhang, Li Yi, Yizhou Wang, He Wang*, “LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion,” RSS 2026.
- Yuan Liu, Haoran Li*, Shuai Tian, Yuxing Qin, Yupeng Zheng, Yongzhen Huang, Dongbin Zhao, “Towards Long-lived Robots: Continual Learning of VLA Models via Reinforcement Fine-tuning,” RSS 2026.
- Minghui Jia, Qichao Zhang, Ali Luo, Linjing Li, Shuo Ye, Hailing Lu, Wen Hou, Dongbin Zhao, “Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection,” ACL 2026 main.
- Jingbo Sun#, Wenyue Chong#, Songjun Tu, Qichao Zhang*, Yaocheng Zhang, Jiajun Chai, Xiaohan Wang, Wei Lin, Guojun Yin, Dongbin Zhao, “AutoSearch: Self-Decision-Driven Reinforcement Learning for Adaptive Search Depth in Agentic RAG,” ACL 2026 findings.
- Yaocheng Zhang, Haohuan Huang, Zijun Song, Zijie Zhao, Qichao Zhang, Yuanheng Zhu*, Dongbin Zhao, “CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic,” ACL 2026 findings.
- Bo Lv#, Jingbo Sun#, Jianwei Lv, Chen Tang, shaojie zhang, Nayu Liu, Guoxin Yu, Zihao Li, Qichao Zhang, Dongbin Zhao, Ping Luo, Yue Yu, “Beyond Query Memorization: Large Language Model Routing with Query Decomposition and Historical Matching,” ACL 2026 main.
- Jingbo Sun, Songjun Tu, Xing Fang, Qichao Zhang*, Haoran Li, Ke Chen, Dongbin Zhao, “Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning,” CVPR 2026. https://github.com/bofusun/SRCP
- Junli Wang, Yinan Zheng, Xueyi Liu, Zebin Xing, Pengfei Li, Kun Ma, Hangjun Ye, Guang Chen, Guang Li, Long Chen, Zhongpu Xia, Qichao Zhang*, “MeanFuser: Fast One-Step Multi-Modal Trajectory Generation and Adaptive Reconstruction via MeanFlow for End-to-End Driving,” CVPR 2026. https://github.com/wjl2244/MeanFuser
- Boyu Li, Siyuan He, Hang Xu, Haoqi Yuan, Yu Zang, Liwei Hu, Junpeng Yue, Zhenxiong Jiang, Pengbo Hu, Börje F. Karlsson, Dongbin Zhao, Yehui Tang, Zongqing Lu*, “Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots,” ICRA 2026.
- Yingting Zhou, Wenbo Cui, Weiheng Liu, Guixing Chen, Haoran Li*, Dongbin Zhao, “DiffuDepGrasp: Diffusion-based Depth Noise Modeling Empowers Sim-to-Real Robotic Grasping,” ICRA 2026. (CCF-B).
- Qichao Zhang, Xing Fang, Dongbin Zhao*, “ConsistencyPlanner: Real-time Planning with Fast-Sampling Consistency Models,” ICRA 2026. (CCF-B).
- Yupeng Zheng, Pengxuan Yang, Zhongpu Xia, Qichao Zhang*, Dongbin Zhao, “Preliminary Investigation into Data Scaling Laws for Imitation Learning-Based End-to-End Autonomous Driving,” ICRA 2026. (CCF-B).
- Wenbo Cui, Chengyang Zhao, Yuhui Chen, Haoran Li, Zhizheng Zhang, Dongbin Zhao, He Wang*, “CLAR: Learning 3D Representations for Robotic Manipulation by Fusing Masked Reconstruction with Multi-Level Contrastive Alignment,” ICRA 2026. (CCF-B).
- Runyu Lu, Ruochuan Shi, Yuanheng Zhu*, Dongbin Zhao, “R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability,” ICLR 2026. (top conference in the field).
- Zijie Zhao, Honglei Guo, Shengqian Chen, Kaixuan Xu, Bo Jiang, Yuanheng Zhu*, Dongbin Zhao, “Empowering Multi-Robot Cooperation via Sequential World Models,” ICLR 2026. (top conference in the field).
- Yuqian Fu, Tinghong Chen, Jiajun Chai, Xihuai Wang, Songjun Tu, Guojun Yin, Wei Lin, Qichao Zhang, Yuanheng Zhu, Dongbin Zhao, “SRFT: A single-stage method with supervised and reinforcement fine-tuning for reasoning,” ICLR 2026. (top conference in the field). https://arxiv.org/abs/2506.19767.
- Yixuan Li, Yuhui Chen, Mingcai Zhou, Haoran Li*, “QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models”, The 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Paphos, Cyprus, May 25-29, 2026. (CCF B) https://github.com/ucasmichael/QDepth-VLA.
- Jinrui Liu, Bingyan Nie, Boyu Li, Yaran Chen, Yuze Wang, Shunsen He, Haoran Li*, “RoboGPT-R1: Enhancing Robot Task Planning with Reinforcement Learning,” The 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Paphos, Cyprus, May 25-29, 2026. (CCF B)
- Pengxuan Yang, Ben Lu, Zhongpu Xia, Chao Han, Yinfeng Gao, Teng Zhang, Kun Zhan, XianPeng Lang, Yupeng Zheng, Qichao Zhang*, “WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving,” The 40th Annual AAAI Conference on Artificial Intelligence (AAAI), Singapore, Jan 20-27, 2026. (CCF A). https://github.com/pengxuanyang/WorldRFT.
- Xin Liu, Haoran Li*, Dongbin Zhao, “Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations,” NeurIPS 2025. (CCF A) https://github.com/liuxin0824/BCV-LR
- Songjun Tu, Jiahao Lin, Qichao Zhang*, Xiangyu Tian, Linjing Li, Xiangyuan Lan, Dongbin Zhao, “Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL,” NeurIPS 2025. (CCF A), https://github.com/ScienceOne-AI/AutoThink
- Runyu Lu, Peng Zhang, Ruochuan Shi, Yuanheng Zhu*, Dongbin Zhao, Yang Liu, Dong Wang, Cesare Alippi, “Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games,” NeurIPS 2025. (CCF A) https://github.com/Cahemgco/EPG_code
- Zijie Zhao, Zhongyue Zhao, Kaixuan Xu, Yuqian Fu, Jiajun Chai, Yuanheng Zhu*, Dongbin Zhao, “Learning and Planning Multi-Agent Tasks via a MoE-based World Model,” NeurIPS 2025. (CCF A) https://github.com/zhaozijie2022/m3w-marl
- Ruochuan Shi, Runyu Lu, Yuanheng Zhu*, Dongbin Zhao*, “ARAC: Adaptive Regularized Multi-Agent Soft Actor-Critic in Graph-Structured Adversarial Games,” DAI 2025 oral.
- Yuqian Fu, Yuanheng Zhu*, Jiajun Chai, Guojun Yin, Wei Lin, Qichao Zhang, Dongbin Zhao, “RLAE: Reinforcement Learning-Assisted Ensemble for LLMs,” EMNLP 2025 main (CCF B). https://github.com/fyqqyf/RLAE
- Weiheng Liu, Yuxuan Wan, Jilong Wang, Yuxuan Kuang, Haoran Li, Dongbin Zhao, Zhizheng Zhang, He Wang, “FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real,” CoRL 2025 oral. (top conference in the field).
- Xueyi Liu, Zuodong Zhong, Qichao Zhang*, Yuxin Guo, Yupeng Zheng, Junli Wang, Dongbin Zhao, “ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving,” CoRL 2025. (top conference in the field). https://github.com/Liuxueyi/ReasonPlan
- Shuai Tian, Haoran Li*, Dongbin Zhao, “Fast and Accurate Visuomotor Imitation Learning via 2D Consistency Flow Matching Policy,” ICONIP 2025. (CCF C)
- Songjun Tu, Qichao Zhang*, Linjing Li, Yuqian Fu, Nan Xu, Wei He, Xiangyuan Lan, Dongmei Jiang, Dongbin Zhao, “Enhancing LLM reasoning with iterative DPO: a comprehensive empirical investigation,” COLM 2025. https://github.com/TU2021/DPO-VP
- Shugao Liu, Qichao Zhang, Haoran Li*, Dongbin Zhao, “FusionNav: Enhancing Zero-Shot Object-Goal Navigation via 3D Semantic Fusion and Farsight Value Reasoning,” IEEE SMCC 2025. (CCF C)
- Yupeng Zheng, Pengxuan Yang, Zebin Xing, Yuhang Zheng, Pengfei Li, Yinfeng Gao, Qichao Zhang*, Teng Zhang, Zhongpu Xia, Peng Jia, XianPeng Lang, Dongbin Zhao, “World4Drive: Hierarchical Latent World Models for Perception-Free End-to-End Autonomous Driving,” ICCV 2025. (CCF A)
- Mengying Lin#, Shugao Liu#, Dingxi Zhang, Yaran Chen, Zhaoran Wang, Haoran Li*, Dongbin Zhao, “Advancing Object-Goal Navigation through LLM-enhanced Object Affinities Transfer,” 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). (CCF C)
- Runyu Lu, Yuanheng Zhu*, Dongbin Zhao, “Constrained exploitability descent: finding mixed-strategy Nash equilibrium by offline reinforcement learning,” ICML 2025. (CCF A)
- Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu*, Dongbin Zhao, “DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy,” ICML 2025. (CCF A) https://github.com/KaiXIIM/dipllm
- Yuhui Chen, Shuai Tian, Shugao Liu, Yingting Zhou, Haoran Li*, Dongbin Zhao, “Fine-tuning VLA models via Human-in-the-Loop Consistency Policy,” RSS 2025. https://github.com/cccedric/conrft.
- Yuanwen Chen, Haoran Li, Yaran Chen, Dongbin Zhao, “LeAffordNav: Enhancing open-vocabulary mobile manipulation with LLM-guided exploration and affordance-aware navigation,” ICME 2025. (CCF B) https://github.com/Cyuanwen/LeAffordNav.
- Pengxuan Yang, Yupeng Zheng, Kefei Zhu, Zebin Xing, Pengfei Li, Qichao Zhang*, Zhongpu Xia, Dongbin Zhao, “UncAD: Towards Safe End-to-end Autonomous Driving via Online Map Uncertainty,” ICRA 2025. (CCF B)
- Jiajun Chai, Yuqian Fu, Sicheng Li, Yuanheng Zhu*, Dongbin Zhao, “Empowering LLM Agents with zero-shot optimal decision-making through Q-learning,” ICLR 2025. (top conference in the field) https://github.com/laq2024/MLAQ.
- Jingbo Sun, Songjun Tu, Qichao Zhang*, Haoran Li, Xin Liu, Yaran Chen, Ke Chen, Dongbin Zhao, “Unsupervised zero-shot reinforcement learning via dual-value forward-backward representation,” ICLR 2025. https://github.com/bofusun/DVFB.
- Runyu Lu, Yuanheng Zhu*, Dongbin Zhao, “Divergence-regularized discounted aggregation: equilibrium finding in multiplayer partially observable stochastic games,” ICLR 2025. (top conference in the field)
- Yuqian Fu, Yuanheng Zhu*, Jian Zhao, Jiajun Chai, Dongbin Zhao, “INS: Interaction-aware synthesis to enhance offline multi-agent reinforcement learning,” ICLR 2025. (top conference in the field). https://github.com/fyqqyf/INS.
- Songjun Tu, Jingbo Sun, Qichao Zhang*, Yaocheng Zhang, Jia Liu, Ke Chen, Dongbin Zhao, “In-dataset trajectory return regularization for offline preference-based reinforcement learning,” AAAI 2025. (CCF A). https://github.com/TU2021/DTR.
- Jingbo Sun, Songjun Tu, Qichao Zhang*, Ke Chen, Dongbin Zhao*, “Salience-invariant consistent policy learning for generalization in visual reinforcement learning,” AAMAS 2025 oral. (CCF-B)
- Xing Fang, Qichao Zhang*, Haoran Li, Dongbin Zhao, “Consistency policy with categorical critic for autonomous driving,” AAMAS 2025 oral. (CCF-B)
- Yaocheng Zhang, Yuanheng Zhu*, Yuqian Fu, Songjun Tu, Dongbin Zhao, “Offline goal-conditioned reinforcement learning with elastic-subgoal diffused policy learning,” AAMAS 2025 oral. (CCF-B) https://github.com/zhyaoch/ESD.
- Songjun Tu, Qichao Zhang*, Dongbin Zhao, “Online preference-based reinforcement learning with self-augmented feedback from large language model,” AAMAS 2025 oral. (CCF-B) https://github.com/TU2021/RL-SaLLM-F.
Book Chapters
- 陈亚冉, 李楠楠, 丁子祥, 赵冬斌, 神经网络架构搜索 (Neural Architecture Search), Tsinghua University Press, published September 2025.
Team Member Talks
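- Mar. 13, 2026, Reinforcement Learning Post-Training for Embodied VLA Models, 智猩猩 Open Course, online, 李浩然.
- Mar. 21, 2026, A Conversation on Personal Growth and Industry Insights, 深蓝学院 “与优秀的人同行” (Walking with Outstanding People) series, Session 7, online, 夏中谱.
- Mar. 27, 2026, Exploring End-to-End Autonomous Driving Based on World Models, Closed-Door Seminar on High-Quality Development of Intelligent Connected Vehicles, 赵冬斌.
- Apr. 11, 2026, End-to-End Autonomous Driving Based on Imitation Learning and World Models, China Society of Automotive Engineers Frontier Seminar on Embodied-Intelligence Electric Vehicles, Beijing, 张启超.
- Apr. 25, 2026, Embodied Manipulation Models and Reinforcement Learning Methods, 2026 Symposium on Cognitive Systems and Information Processing and Technical Committee Annual Meeting, Fuzhou, 李浩然.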
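- Jan. 6, 2025, Exploration and Practice of AI Methods for High-Level Autonomous Driving, Zhongguancun Intelligent Connected Vehicle Innovation and Development Forum, Beijing, 赵冬斌.
- Jan. 11, 2025, From Reinforcement Learning to Large Models and Embodied Intelligence, Inaugural Meeting of the IEEE Computational Intelligence Society Zhengzhou Chapter & Computational Intelligence Frontier Forum, Zhengzhou, 赵冬斌.
- Jan. 14, 2025, Progress and Challenges of Supervised-Learning-Based End-to-End Autonomous Driving, 4th Global Autonomous Driving Summit, Beijing, 张启超.
- Feb. 7, 2025, Reinforcement Learning Assisted Large Models and Embodied Intelligence, 13th International Conference on Intelligent Control and Information Processing (ICICIP 2025), Abu Dhabi, UAE & Muscat, Oman, Dongbin Zhao.
- Mar. 22, 2025, Theory and Applications of Multi-Agent Reinforcement Learning for Multi-Task Scenarios, 4th Frontier Forum on Intelligent Optimization and Decision-Making, Beijing, 赵冬斌.
- Mar. 29, 2025, Post-Training of Vision-Language-Action Models Based on Reinforcement Learning, China Embodied AI Conference, Beijing, 李浩然.
- Apr. 26, 2025, High-Level Autonomous Driving Based on AI Methods, 2025 Chongqing Jiaotong University Frontier Forum on Neural Networks and Intelligent Control, Chongqing, 赵冬斌.
- Apr. 27, 2025, Reinforcement Learning Based on Generative Models, 2025 Southwest University Frontier Forum on Intelligent System Perception and Control, Chongqing, 赵冬斌.
- May 9, 2025, Advances in Reinforcement Learning Algorithms and Their Applications to Autonomous Driving, Pre-conference workshop on Reinforcement Learning and Adaptive Dynamic Programming, IEEE 14th Data Driven Control and Learning System Conference (DDCLS’25), Wuxi, China, Qichao Zhang.
- May 14, 2025, Deep Reinforcement Learning for Intelligent Industry Applications, Launch Forum of the 聚合智能 Industry Proof-of-Concept Laboratory, Beijing, 赵冬斌.
- May 24, 2025, Robot Embodied Intelligence Based on Reinforcement Learning, 3rd Shandong Provincial Conference on Computational Intelligence, Xuzhou, 赵冬斌.
- June 14, 2025, Multi-Agent Decision Intelligence in Open Environments, 4th Intelligent Decision-Making Forum, Intelligent Learning and Games Forum, Nanjing, 朱圆恒.
- June 14, 2025, Post-Training of Vision-Language-Action Models Based on Reinforcement Learning, 4th Intelligent Decision-Making Forum, Frontier Technologies of Embodied Intelligence Forum, Nanjing, 李浩然.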
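- July 8, 2025, Deep Reinforcement Learning and Embodied Intelligence, Symposium on Artificial Intelligence and Learning Systems, Fenghua, Ningbo, 赵冬斌.
- Aug. 2, 2025, Reinforcement Learning in Embodied Intelligence, 3rd Summit Forum on AI Large Model Technology, Hefei, 赵冬斌.
- Aug. 31, 2025, Large Models for Autonomous Driving, 嘉程创业流水席, Session 271, Beijing, 夏中谱.
- Sept. 20, 2025, VLA for Embodied Manipulation: Current Status and Outlook, 6th China Intelligent Robotics Annual Conference, Nantong, 赵冬斌.
- Sept. 20, 2025, Exploring the Deep Thinking Capabilities of Large Language Models, RL China 2025, Scientific Agents Forum, Beijing, 张启超.
- Sept. 21, 2025, Applications of Reinforcement Learning in Multimodal Embodied Large Models, RL China 2025, Multimodal Agents Forum, Beijing, 李浩然.
- Sept. 26, 2025, Multi-Agent Decision Intelligence in Open Environments, 13th China (Mianyang) Science and Technology City International Science and Technology Expo and New Quality Productive Forces AI Conference and Matchmaking Meeting, China Association of Productivity Promotion Centers, Mianyang, 朱圆恒.
- Sept. 28, 2025, Exploration and Practice of End-to-End Autonomous Driving, 2025 车机人 Innovation and Development Forum, Beijing, 赵冬斌.
- Sept. 28, 2025, “磐石筑基:从AI理论到系统实践” (Building a Solid Foundation: From AI Theory to System Practice), Large Language Model Reasoning Techniques, 张启超.
- Oct. 23, 2025, Practice and Exploration of End-to-End Autonomous Driving, 32nd Annual Conference of the China Society of Automotive Engineers, Chongqing, 赵冬斌.
- Oct. 24, 2025, End-to-End Autonomous Driving: From Imitation Learning to Reinforcement Learning, 2025 China Conference on Vehicle Control and Intelligence, Pre-conference Workshop on Trustworthy Autonomous Vehicles, Qingdao, 张启超.
- Oct. 29, 2025, Reinforcement Learning Empowering Embodied Intelligence, UCAS Graduate Lecture Series on Scientific Frontiers, Fall Semester 2025-2026, Beijing, 赵冬斌.
- Nov. 6, 2025, Practice and Exploration of Embodied Intelligence, AI Application Lecture Hall of the Beijing Software and Information Services Association, Beijing, 赵冬斌.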
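Admissions
Admission program 1: Control Theory and Control Engineering -- Swarm Intelligence and Adversarial Games
Admission program 2: Pattern Recognition -- Artificial Intelligence Theory and Methods
Admission Directions
Education
Overseas Study and Work
Work Experience
Employment History
Professional Service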
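2022-09-01-present, Chinese Association for Artificial Intelligence, Technical Committee on Intelligent Adaptive Cooperative Optimization Control, Secretary-General
2022-01-01-present, Chinese Association of Automation, Technical Committee on “Data-Driven, Learning and Optimization”, Vice Chair
2022-01-01-present, IEEE Computational Intelligence Magazine, Associate Editor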
2021-09-01-2022-08-31,IEEE Conference on Games, General Chair
2021-01-01-2021-07-22,The International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, July 18-22, 2021, Competition Chair
2020-07-19-2020-07-24,IEEE World Congress on Computational Intelligence (WCCI 2020), Glasgow, UK, July 19 -24, 2020, Awards Chair
2020-03-01-present, IEEE Transactions on Artificial Intelligence, Associate Editor
2020-01-01-2020-12-31,IEEE CIS Distinguished Lectures Program, Chair
2019-12-11-2019-12-16,The 10th International Conference on Intelligent Control and Information Processing (ICICIP 2019), Marrakesh, Morocco, Program Chair
2019-12-06-2019-12-09,IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2019), Xiamen, China, Program Chair
2019-07-13-2019-07-18,IEEE International Joint Conference on Neural Networks (IJCNN 2019), Budapest, Hungary, Program Co-Chair
2019-05-04-2019-05-06,IEEE International Conference on Computational Intelligence for Financial Engineering and Economics (CIFEr 2019), Shenzhen, China, General Co-Chair
2019-01-01-2019-12-31,IEEE CIS Technical Activities Strategy Planning Sub-Committee, Chair
2018-12-01-2018-12-04,The 25th International Conference on Neural Information Processing (ICONIP 2018), Siem Reap, Cambodia, Dec 1-4, 2018, Tutorial Chair
2018-11-18-2018-11-21,IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2018), Bangalore, India, Nov. 18 -21, 2018, Program Chair
2018-09-01-2019-08-31,IEEE Computational Intelligence Magazine special issue on “Deep Reinforcement Learning and Games”, Lead Guest Chair
2018-06-29-2018-07-06,2018 Eighth International Conference on Information Science and Technology (ICIST 2018), Cordoba, Granada, and Seville, Spain during June 30-July 6, 2018, Program Chair
2018-05-31-2018-12-31,IEEE Transactions on Neural Networks and Learning Systems special issue on “Deep Reinforcement Learning and Adaptive Dynamic Programming”, Lead Guest Editor
2018-03-01-present, IEEE Transactions on Cybernetics, Associate Editor
2017-11-26-2017-11-30,IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2017), Honolulu, Hawaii, USA, Program Chair
2017-11-13-2017-11-17,The 24th International Conference on Neural Information Processing (ICONIP 2017), Guangzhou, China, Program Chair
2017-07-05-2017-07-27,2017 IEEE CIS Summer School on Computational and Artificial Intelligence, Chair
2016-12-31-2017-12-31, IEEE Computational Intelligence Society Beijing Chapter, Chair
2016-12-05-2016-12-08,IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2016), Athens, Greece, Program Chair
2016-07-25-2016-07-29,IEEE World Congress on Computational Intelligence (WCCI 2016), Vancouver, Canada, Publicity Co-chair
2016-06-11-2016-06-14,The 13th World Congress on Intelligent Control and Automation (WCICA 2016), Guilin, China, Program Co-Chair
2015-10-15-2015-10-18,12th International Symposium on Neural Networks (ISNN 2015), Jeju, Korea, Program Co-Chair
2015-04-24-2015-04-26,The 5th International Conference on Information Science and Technology (ICIST 2015), Changsha, China, Program Chair
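2015-01-01-present, Artificial Intelligence Review, Associate Editor
2014-12-31-2016-12-31, IEEE Computational Intelligence Society Technical Committee on Adaptive Dynamic Programming and Reinforcement Learning, Chair
2014-12-31-2015-12-31, IEEE Computational Intelligence Society Travel Grants Committee, Chair
2014-12-31-2016-12-31, IEEE Computational Intelligence Society Multimedia Committee, Chair
2014-12-31-2016-12-31, IEEE Computational Intelligence Society Beijing Chapter, Vice Chair
2014-12-09-2014-12-12, IEEE Symposium Series on Computational Intelligence (SSCI 2014), Atlanta, USA, Poster Chair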
2014-07-06-2014-07-11,IEEE World Congress on Computational Intelligence (WCCI 2014), Beijing, China, Finance Co-Chair
2014-07-06-2014-07-11,IEEE CIS Summer School on Automated Computational Intelligence, Beijing, China, Chair
2013-12-31-2020-12-31,IEEE Computational Intelligence Magazine, Associate Editor
2013-06-09-2013-06-11,The 4th International Conference on Intelligent Control and Information Processing (ICICIP 2013), Beijing, China, Program Chair
2012-12-31-2014-12-30,IEEE CIS Newsletter, Editor
2012-07-11-2012-07-14,International Symposium on Neural Networks (ISNN 2012), Shenyang, China, Registration Chair
2012-07-11-2012-07-14,Brain Inspired Cognitive Systems (BICS 2012), Shenyang, China, Finance Chair
2011-12-31-2021-12-31,IEEE Transactions on Neural Networks and Learning Systems, Associate Editor
2011-11-01-present, Cognitive Computation, Associate Editor
2010-09-30-2019-12-31, IEEE Senior Member
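Courses Taught
Patents and Awards
Awards
Publications
Papers
Books
Research Activities
Research Projects
Conferences Attended
Student Supervision
Graduated Students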
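田艺  Master's student  081101-Control Theory and Control Engineering
胡朝辉  Master's student  081101-Control Theory and Control Engineering
戴钰桀  Doctoral student  081101-Control Theory and Control Engineering
苏永生  Master's student  081101-Control Theory and Control Engineering
张震  Doctoral student  081101-Control Theory and Control Engineering
王滨  Doctoral student  081101-Control Theory and Control Engineering
朱圆恒  Doctoral student  081101-Control Theory and Control Engineering
王海涛  Master's student  081101-Control Theory and Control Engineering
夏中谱  Doctoral student  081101-Control Theory and Control Engineering
张启超  Doctoral student  081101-Control Theory and Control Engineering
吕乐  Doctoral student  081101-Control Theory and Control Engineering
卜丽  Doctoral student  081101-Control Theory and Control Engineering
陈亚冉  Doctoral student  081101-Control Theory and Control Engineering
唐振韬  Doctoral student  081101-Control Theory and Control Engineering
邵坤  Doctoral student  081101-Control Theory and Control Engineering
李栋  Doctoral student  081101-Control Theory and Control Engineering
卢毅  Doctoral student  081101-Control Theory and Control Engineering
李浩然  Doctoral student  081101-Control Theory and Control Engineering
丁子祥  Doctoral student  081203-Computer Application Technology
刘育琦  Doctoral student  081101-Control Theory and Control Engineering
李伟凡  Doctoral student  081104-Pattern Recognition and Intelligent Systems
胡光政  Doctoral student  081203-Computer Application Technology
李楠楠  Doctoral student  081101-Control Theory and Control Engineering
王俊杰  Doctoral student  081101-Control Theory and Control Engineering
李丁  Doctoral student  081203-Computer Application Technology
刘民颂  Doctoral student  081101-Control Theory and Control Engineering
刘莎莎  Master's student  085410-Artificial Intelligence
马名骏  Master's student  085410-Artificial Intelligence
郭又天  Master's student  085211-Computer Technology
柴嘉骏  Doctoral student  081101-Control Theory and Control Engineering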
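Current Students
陆润宇  Doctoral student  081203-Computer Application Technology
范昌易  Master's student  085410-Artificial Intelligence
赵子杰  Doctoral student  081203-Computer Application Technology
傅宇千  Doctoral student  081104-Pattern Recognition and Intelligent Systems
徐凯旋  Doctoral student  081203-Computer Application Technology
田帅  Master's student  081104-Pattern Recognition and Intelligent Systems
刘鑫  Doctoral student  081104-Pattern Recognition and Intelligent Systems
江震南  Doctoral student  081203-Computer Application Technology
孙敬博  Doctoral student  081101-Control Theory and Control Engineering
陈宇辉  Doctoral student  081101-Control Theory and Control Engineering
陈庆  Master's student  085410-Artificial Intelligence
凃崧峻  Doctoral student  081101-Control Theory and Control Engineering
刘学义  Doctoral student  081101-Control Theory and Control Engineering
崔文博  Doctoral student  081101-Control Theory and Control Engineering
李博宇  Doctoral student  081101-Control Theory and Control Engineering
刘卫恒  Doctoral student  081104-Pattern Recognition and Intelligent Systems
李思成  Doctoral student  081101-Control Theory and Control Engineering
郑宇鹏  Doctoral student  081101-Control Theory and Control Engineering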