Publications
[1] Cao, Hang, Yuan, Liang, Zhang, He, Zhang, Yunquan, Wu, Baodong, Li, Kun, Li, Shigang, Zhang, Minghua, Lu, Pengqi, Xiao, Junmin. AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3-D Parallelization and Leap-Format. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS[J]. 2023, 34(3): 766-780 (4th author)
[2] 郭金鑫, 张广婷, 张云泉, 陈泽华, 贾海鹏. Cooley-Tukey FFT算法高性能实现与优化研究. 计算机科学与探索[J]. 2022, 16(6): 1304-1315, http://lib.cqvip.com/Qikan/Article/Detail?id=7107347841 (3rd author)
[3] 牟明任, 贾海鹏, 张云泉, 邓明森, 曲国远, 魏大洲, 张广婷. 基于ARM架构的中值滤波算法优化. 计算机工程与科学[J]. 2022, 44(10): 1738-1746, http://lib.cqvip.com/Qikan/Article/Detail?id=7108225462 (3rd author)
[4] 纪璎芮, 袁良, 张云泉. 红黑Gauss-Seidel Stencil并行性和局部性优化. 计算机科学[J]. 2022, 49(5): 363-370, http://lib.cqvip.com/Qikan/Article/Detail?id=7107076530 (3rd author)
[5] 韦存阳, 贾海鹏, 张云泉, 曲国远, 魏大洲, 张广婷. 基于ARMv8处理器的高性能图像处理算法实现与优化研究. 计算机工程与科学[J]. 2022, 44(10): 1711-1720, http://lib.cqvip.com/Qikan/Article/Detail?id=7108225459 (3rd author)
[6] Li, Kun, Yuan, Liang, Zhang, Yunquan, Chen, Gongwei. An Accurate and Efficient Large-Scale Regression Method Through Best Friend Clustering. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS[J]. 2022, 33(11): 3129-3140, http://dx.doi.org/10.1109/TPDS.2021.3134336 (3rd author)
[7] 陈岳涛, 邱柯妮, 陈莉, 贾海鹏, 张云泉, 肖利民, 刘磊. Smart Scheduler: an Adaptive NVM-Aware Thread Scheduling Approach on NUMA Systems. CCF Transactions on High Performance Computing (THPC)[J]. 2022 (5th author)
[8] 王麓涵, 贾海鹏, 张云泉, 张广婷. 基于ARM的图像几何变换算法库实现和优化技术研究. 计算机科学. 2022, 49(10): 10-17, https://d.wanfangdata.com.cn/periodical/jsjkx202210002 (3rd author)
[9] Cheng, Daning, Li, Shigang, Zhang, Hanping, Xia, Fen, Zhang, Yunquan. Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS[J]. 2021, 32(7): 1702-1712, http://dx.doi.org/10.1109/TPDS.2020.3048836 (5th author, corresponding author)
[10] 张云泉, 袁良, 袁国兴, 李希代. 2021年中国高性能计算机发展现状分析与展望. 数据与计算发展前沿[J]. 2021, 3(6): 98-107, http://www.jfdc.cnic.cn/CN/10.11871/jfdc.10-1649.2021.06.007 (1st author)
[11] Shang, Honghui, Liang, WanZhen, Zhang, Yunquan, Yang, Jinlong. Efficient parallel linear scaling method to get the response density matrix in all-electron real-space density-functional perturbation theory. COMPUTER PHYSICS COMMUNICATIONS[J]. 2021, 258, http://dx.doi.org/10.1016/j.cpc.2020.107613 (3rd author)
[12] 袁国兴, 张云泉, 袁良. 2021年中国高性能计算机发展现状分析. 计算机工程与科学[J]. 2021, 43(12): 2091-2097, http://lib.cqvip.com/Qikan/Article/Detail?id=7106227775 (2nd author)
[13] Shang, Honghui, Duan, Xiaohui, Li, Fang, Zhang, Libo, Xu, Zhiqian, Liu, Kan, Luo, Haiwen, Ji, Yingrui, Zhao, Wenxuan, Xue, Wei, Chen, Li, Zhang, Yunquan. Many-core acceleration of the first-principles all-electron quantum perturbation calculations. COMPUTER PHYSICS COMMUNICATIONS[J]. 2021, 267, http://dx.doi.org/10.1016/j.cpc.2021.108045 (12th author)
[14] 赵永浩, 贾海鹏, 张云泉, 张思佳. 基于SIMD的Square Root函数高性能实现与优化. 计算机工程与科学[J]. 2021, 43(4): 662-669, http://lib.cqvip.com/Qikan/Article/Detail?id=7104519623 (3rd author)
[15] Cheng Daning, Li Shigang, Zhang Yunquan. WP-SGD: Weighted parallel SGD for distributed unbalanced-workload training system. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING[J]. 2020, 145: 202-216, http://dx.doi.org/10.1016/j.jpdc.2020.06.011 (3rd author)
[16] 曹杭, 袁良, 黄珊, 张云泉, 徐勇军, 陆鹏起, 张广婷. 一种基于空间密铺的星型Stencil并行算法. 计算机研究与发展[J]. 2020, 57(12): 2621-2634, http://lib.cqvip.com/Qikan/Article/Detail?id=7103384456 (4th author)
[17] Shang, Honghui, Xu, Lei, Wu, Baodong, Qin, Xinming, Zhang, Yunquan, Yang, Jinlong. The dynamic parallel distribution algorithm for hybrid density-functional calculations in HONPAS package. COMPUTER PHYSICS COMMUNICATIONS[J]. 2020, 254, http://dx.doi.org/10.1016/j.cpc.2020.107204 (5th author)
[18] 周广庆, 张云泉, 姜金荣, 张贺, 吴保东, 曹杭, 王天一, 郝卉群, 朱家文, 袁良, 张明华. 地球系统模式CAS-ESM. 数据与计算发展前沿[J]. 2020, 2(1): 38-54, http://www.jfdc.cnic.cn/CN/10.11871/jfdc.issn.2096-742X.2020.01.004 (2nd author)
[19] Li, Kun, Li, Shigang, Huang, Shan, Chen, Yifeng, Zhang, Yunquan. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations. JOURNAL OF SUPERCOMPUTING[J]. 2020, 76(7): 5501-5520, https://www.webofscience.com/wos/woscc/full-record/WOS:000538267400033 (5th author)
[20] Li, Zhihao, Jia, Haipeng, Zhang, Yunquan, Chen, Tun, Yuan, Liang, Vuduc, Richard. Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS[J]. 2020, 31(8): 1925-1941, http://dx.doi.org/10.1109/TPDS.2020.2977629 (3rd author)
[21] 程大宁, 张汉平, 夏粉, 李士刚, 袁良, 张云泉. AccSMBO:一种基于超参梯度和元学习的SMBO加速算法. 计算机研究与发展[J]. 2020, 57(12): 2596-2609, http://lib.cqvip.com/Qikan/Article/Detail?id=7103384454 (6th author)
[22] 袁国兴, 张云泉, 袁良. 2020年中国高性能计算机发展现状分析. 计算机工程与科学[J]. 2020, 42(12): 2103-2108, http://lib.cqvip.com/Qikan/Article/Detail?id=7103580856 (2nd author)
[23] 尚子豪, 商红慧, 王东杰, 张云泉, 贺新福, 陈泽华, 王栋, 张广婷. 原子动力学蒙特卡洛程序OpenKMC在反应堆压力容器钢缺陷损伤研究中的优化与应用. 计算机工程与科学[J]. 2020, 42(12): 2151-2162, http://lib.cqvip.com/Qikan/Article/Detail?id=7103580862 (4th author)
[24] 王栋, 商红慧, 张云泉, 李琨, 贺新福, 贾丽霞. 原子动力学蒙特卡洛程序MISA-KMC在反应堆压力容器钢辐照损伤研究中的应用. 计算机科学[J]. 2020, 47(4): 30-35, http://lib.cqvip.com/Qikan/Article/Detail?id=7101330964 (3rd author)
[25] Chen, Daobi, Yuan, Liang, Zhang, Yunquan, Yan, Jingfu, Kahaner, David. HPC software capability landscape in China. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS[J]. 2020, 34(1): 115-153, https://www.webofscience.com/wos/woscc/full-record/WOS:000503082100010 (3rd author)
[26] 张云泉, 袁良, 袁国兴, 李希代. 2020年中国高性能计算机发展现状分析与展望. 数据与计算发展前沿[J]. 2020, 2(6): 1-10, http://www.jfdc.cnic.cn/CN/10.11871/jfdc.issn.2096-742X.2020.06.001 (1st author)
[27] 张云泉, 袁良, 袁国兴, 李希代. 2019年中国高性能计算机发展现状分析与展望. 数据与计算发展前沿[J]. 2020, 2(1): 18-26, http://www.jfdc.cnic.cn/CN/10.11871/jfdc.issn.2096-742X.2020.01.002 (1st author)
[28] 张云泉, 袁良, 陈一峯, 冯晓兵, 张贺. 高性能计算多层次不连续非线性可扩展现象研究. 计算机学报[J]. 2020, 43(6): 973-989, https://kns.cnki.net/KCMS/detail/detail.aspx?dbcode=CJFQ&dbname=CJFDLAST2020&filename=JSJX202006001&v=MDcyMjg3RGgxVDNxVHJXTTFGckNVUjdxZVp1ZHZGeURrVWJySUx6N0Jkckc0SE5ITXFZOUZaWVI4ZVgxTHV4WVM= (1st author)
[29] Qin, Xinming, Shang, Honghui, Xu, Lei, Hu, Wei, Yang, Jinlong, Li, Shigang, Zhang, Yunquan. The static parallel distribution algorithms for hybrid density-functional calculations in HONPAS package. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS[J]. 2020, 34(2): 159-168, http://dx.doi.org/10.1177/1094342019845046 (7th author)
[30] Yuan, Liang, Ding, Chen, Smith, Wesley, Denning, Peter, Zhang, Yunquan. A Relational Theory of Locality. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION[J]. 2019, 16(3), http://dx.doi.org/10.1145/3341109 (5th author)
[31] Li, Zhihao, Jia, Haipeng, Zhang, Yunquan, Liu, Shice, Li, Shigang, Wang, Xiao, Zhang, Hao. Efficient parallel optimizations of a high-performance SIFT on GPUs. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING[J]. 2019, 124: 78-91, http://dx.doi.org/10.1016/j.jpdc.2018.10.012 (3rd author)
[32] 袁国兴, 张云泉, 袁良. 2019年中国高性能计算机发展现状分析. 计算机工程与科学[J]. 2019, 41(12): 2095-2100, http://lib.cqvip.com/Qikan/Article/Detail?id=7100629190 (2nd author)
[33] Zhang, Di, Zhang, Yunquan, Niu, Qiang, Qiu, Xingbao. Mining concise patterns on graph-connected itemsets. NEUROCOMPUTING[J]. 2019, 336: 27-35, http://dx.doi.org/10.1016/j.neucom.2018.03.084 (2nd author)
[34] 张云泉. 2018年中国高性能计算机发展现状分析与展望. 计算机科学[J]. 2019, 46(1): 1-5, http://lib.cqvip.com/Qikan/Article/Detail?id=7001144965 (1st author)
[35] 郭鹏, 袁良, 张云泉, 黄珊. 基于空间密铺的并行Stencil算法. 计算机科学与探索[J]. 2019, 13(2): 181-194, http://lib.cqvip.com/Qikan/Article/Detail?id=7001186343 (3rd author)
[36] Li, Kun, Li, Shigang, Huang, Shan, Chen, Yifeng, Zhang, Yunquan. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations (vol 457, pg 235, 2020). JOURNAL OF SUPERCOMPUTING. 2019, 75(12): 8339-8340 (5th author)
[37] 陈暾, 李志豪, 贾海鹏, 张云泉. 基于ARMv8平台的多维FFT实现与优化研究. 计算机学报[J]. 2019, 42(11): 2384-2402, http://lib.cqvip.com/Qikan/Article/Detail?id=7100202299 (4th author)
[38] Guo, Bingli, Shang, Yu, Zhang, Yunquan, Li, Wenzhe, Yin, Shan, Zhang, Yongjun, Huang, Shanguo. Timeslot Switching-Based Optical Bypass in Data Center for Intrarack Elephant Flow With an Ultrafast DPDK-Enabled Timeslot Allocator. JOURNAL OF LIGHTWAVE TECHNOLOGY[J]. 2019, 37(10): 2253-2260, http://dx.doi.org/10.1109/JLT.2019.2901600 (3rd author)
[39] Li, Shigang, Zhang, Yunquan, Hoefler, Torsten. Cache-Oblivious MPI All-to-All Communications Based on Morton Order. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS[J]. 2018, 29(3): 542-555, https://www.webofscience.com/wos/woscc/full-record/WOS:000425173200005 (2nd author)
[40] 张云泉. 2017年中国高性能计算机发展现状分析与展望. 科研信息化技术与应用[J]. 2018, 9(1): 5-12, http://lib.cqvip.com/Qikan/Article/Detail?id=676032706 (1st author)
[41] 袁国兴, 张云泉, 袁良. 2018年中国高性能计算机发展现状分析. 计算机工程与科学[J]. 2018, 40(12): 2097-2102, http://lib.cqvip.com/Qikan/Article/Detail?id=7001036157 (2nd author)
[42] Xiao, Junmin, Li, Shigang, Wu, Baodong, Zhang, He, Li, Kun, Yao, Erlin, Zhang, Yunquan, Tan, Guangming. Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model. PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING. 2018 (7th author)
[43] 张云泉. 对当前人工智能热的冷思考. 高科技与产业化[J]. 2018, 14-17, http://lib.cqvip.com/Qikan/Article/Detail?id=675018244 (1st author)
[44] Wang, Xiao, Ma, Haipeng, Li, Zhihao, Zhang, Yunquan, Vaidya, J, Li, J. Implementation and Optimization of Multi-dimensional Real FFT on ARMv8 Platform. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT II. 2018, 11335: 338-353 (4th author)
[45] 王庆磊, 罗文慧, 邬玉良, 张云泉. 交通大数据应用分析及共享支撑平台设计. 信息技术与标准化[J]. 2018, 66-69, http://lib.cqvip.com/Qikan/Article/Detail?id=676233169 (4th author)
[46] Wu, Baodong, Li, Shigang, Zhang, Yunquan, Nie, Ningming. Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation. COMPUTER PHYSICS COMMUNICATIONS[J]. 2017, 211: 113-123, http://www.corc.org.cn/handle/1471x/2374191 (3rd author)
[47] Li, Zhihao, Jia, Haipeng, Zhang, Yunquan, IEEE. HartSift: A High-Accuracy and Real-Time SIFT based on GPU. 2017 IEEE 23RD INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS). 2017, 135-142 (3rd author)
[48] 张云泉. 中国高性能计算机发展现状分析与展望. 民主与科学[J]. 2017, 26-27, http://lib.cqvip.com/Qikan/Article/Detail?id=7000298988 (1st author)
[49] 李琨, 贾海鹏, 曹婷, 张云泉. 大规模集群上多维FFT算法的实现与优化研究. 计算机科学与探索[J]. 2017, 11(6): 863-874, http://lib.cqvip.com/Qikan/Article/Detail?id=7000227257 (4th author)
[50] 张迪, 张云泉, 张广治. 一种在图连接项集上发掘精简模式的方法. 中国传媒大学学报:自然科学版[J]. 2017, 25-30, http://lib.cqvip.com/Qikan/Article/Detail?id=66747166504849554851484854 (2nd author)
[51] 聂宁明, 胡长军, 张云泉, 贺新福, 张博尧, 李士刚. 材料微观结构演化大规模分子动力学软件比较. 计算机科学与探索[J]. 2017, 11(3): 355-364, http://lib.cqvip.com/Qikan/Article/Detail?id=7000132516 (3rd author)
[52] Wang Chenxi, Cao Ting, Zigman John, Lv Fang, Zhang Yunquan, Feng Xiaobing, Gao GR, Qian DP, Gao XB, Chapman B, Chen W. Efficient Management for Hybrid Memory in Managed Language Runtime. NETWORK AND PARALLEL COMPUTING. 2016, 9966: 29-42 (5th author)
[53] 逄仁波, 张云泉, 谭光明, 徐建良, 贾海鹏, 解庆春. 边缘海静力数值预报模式并行算法研究. 计算机科学[J]. 2016, 43(1): 14-17,29, http://lib.cqvip.com/Qikan/Article/Detail?id=667766682 (2nd author)
[54] Zhang, Yunquan, Zhang, JiLin. Workshop on high performance data intensive computing. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE. 2016, 28(6): 1695-1696, https://www.webofscience.com/wos/woscc/full-record/WOS:000374011400001 (1st author)
[55] 贾海鹏, 张云泉, 袁良, 李士刚. 基于OpenCL的Viola-Jones人脸检测算法性能优化研究. 计算机学报[J]. 2016, 39(9): 1775-1789, http://lib.cqvip.com/Qikan/Article/Detail?id=669845563 (2nd author)
[56] Wu Baodong, Li Shigang, Zhang Yunquan, Chen W, Yin G, Zhao G, Han Q, Jing W, Sun G, Lu Z. Optimizing Parallel Kinetic Monte Carlo Simulation by Communication Aggregation and Scheduling. BIG DATA TECHNOLOGY AND APPLICATIONS. 2016, 590: 282-297 (11th author)
[57] Zhang, Yunquan, Cao, Ting, Li, Shigang, Tian, Xinhui, Yuan, Liang, Jia, Haipeng, Vasilakos, Athanasios V. Parallel Processing Systems for Big Data: A Survey. PROCEEDINGS OF THE IEEE[J]. 2016, 104(11): 2114-2136, https://www.webofscience.com/wos/woscc/full-record/WOS:000386244000005 (1st author, corresponding author)
[58] Zhang, Yunquan, Li, Shigang, Yan, Shengen, Zhou, Huiyang. A Cross-Platform SpMV Framework on Many-Core Architectures. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION[J]. 2016, 13(4), http://dx.doi.org/10.1145/2994148 (1st author)
[59] Zhang, Yunquan, Li, Shigang, Yan, Shengen, Zhou, Huiyang. A Cross-Platform SpMV Framework on Many-Core Architectures. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION[J]. 2016, 13(4), https://www.webofscience.com/wos/woscc/full-record/WOS:000392416400002 (1st author)
[60] 吴保东, 张云泉, 李士刚, 贺新福, 周宇世强, 周宇世强. 面向RPV钢中富Cu团簇析出的KMC模拟算法研究. 第十七届全国科学计算与信息化会议暨智慧科研论坛. 2015, http://ir.ihep.ac.cn/handle/311005/211444 (2nd author)
[61] An Xiaojing, Jia Haipeng, Zhang Yunquan, IEEE. Optimized Password Recovery for Encrypted RAR on GPUs. 2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS). 2015, 591-598 (3rd author)
[62] Li ShiGang, Hu ChangJun, Zhang JunChao, Zhang YunQuan. Automatic tuning of sparse matrix-vector multiplication on multicore clusters. SCIENCE CHINA-INFORMATION SCIENCES[J]. 2015, 58(9), http://dx.doi.org/10.1007/s11432-014-5254-x (4th author)
[63] Zhu Xiaomin, Zhang Junchao, Yoshii Kazutomo, Li Shigang, Zhang Yunquan, Balaji Pavan, IEEE. Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations. 2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING. 2015, 1099-1106 (5th author)
[64] Fan Mengran, Jia Haipeng, Zhang Yunquan, An Xiaojing, Cao Ting, IEEE. Optimizing Image Sharpening Algorithm on GPU. 2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP). 2015, 230-239 (3rd author)
[65] Li ShiGang, Hu ChangJun, Zhang JunChao, Zhang YunQuan. Automatic tuning of sparse matrix-vector multiplication on multicore clusters. SCIENCE CHINA-INFORMATION SCIENCES[J]. 2015, 58(9), https://www.sciengine.com/doi/10.1007/s11432-014-5254-x (4th author)
[66] Li Shigang, Zhang Yunquan, Xiang Chunyang, Shi Lei, IEEE. Fast Convolution Operations on Many-Core Architectures. 2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS). 2015, 316-323 (2nd author)
[67] 安小景, 张云泉, 贾海鹏. 基于OpenCL的直方图生成算法优化方法研究. 计算机科学[J]. 2015, 42(11): 32-36, http://lib.cqvip.com/Qikan/Article/Detail?id=666686707 (2nd author)
[68] 詹科, 张云泉, 王婷, 郑晶晶, 张鹏. 基于Pthreads的并行DSRC压缩算法设计与实现. 计算机科学[J]. 2015, 42(1): 90-91,100, http://lib.cqvip.com/Qikan/Article/Detail?id=663510085 (2nd author)
[69] Liu, YiQun, Li, Yan, Zhang, YunQuan, Zhang, XianYi. Memory Efficient Two-Pass 3D FFT Algorithm for Intel(R) Xeon Phi(TM) Coprocessor. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY[J]. 2014, 29(6): 989-1002, https://www.webofscience.com/wos/woscc/full-record/WOS:000345382500005 (3rd author)
[70] Yan Shengen, Li Chao, Zhang Yunquan, Zhou Huiyang, Assoc Comp Machinery. yaSpMV: Yet Another SpMV Framework on GPUs. PPOPP'14: PROCEEDINGS OF THE 2014 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING. 2014, 107-118, http://dx.doi.org/10.1145/2555243.2555255 (3rd author)
[71] Xie Qingchun, Zhang Yunquan, Jia Haipeng, Lu Yongquan, IEEE. Research on Mahalanobis Distance Algorithm Optimization Based on OpenCL. 2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS). 2014, 84-91, http://dx.doi.org/10.1109/HPCC.2014.19 (2nd author)
[72] Liu Yiqun, Li Yan, Zhang Yunquan, Zhang Xianyi. Memory Efficient Two-Pass 3D FFT Algorithm for Intel(R) Xeon Phi(TM) Coprocessor. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY[J]. 2014, 29(6): 989-1002, http://sciencechina.cn/gw.jsp?action=detail.jsp&internal_id=5279753&detailType=1 (3rd author)
[73] Changmao Wu. Physically Based Parallel Ray Tracer for the Metropolis Light Transport Algorithm on the Tianhe-2 Supercomputer. 2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS). 2014, 444-453
[74] Changmao Wu. Large Scale Satellite Imagery Simulations with Physically Based Ray Tracing on Tianhe-1A Supercomputer. 2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC). 2013, 549-556
[75] Lama Palden, Li Yan, Aji Ashwin M, Balaji Pavan, Dinan James, Xiao Shucai, Zhang Yunquan, Feng Wuchun, Thakur Rajeev, Zhou Xiaobo, IEEE. pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments. 2013 IEEE 33RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS). 2013, 145-154 (7th author)
[76] Luo Tao, Liao Yin, Chen Guoliang, Zhang Yunquan, Hu X, Lin TY, Raghavan V, Wah B, BaezaYates R, Fox G, Shahabi C, Smith M, Yang Q, Ghani R, Fan W, Lempel R, Nambiar R. P-DOT: A Model of Computation for Big Data. 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA. 2013 (4th author)
[77] Wang Qian, Zhang Xianyi, Zhang Yunquan, Yi Qing. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs. 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013. 2013, http://ir.iscas.ac.cn/handle/311060/16662 (3rd author)
[78] Li, Yan, Zhang, YunQuan, Liu, YiQun, Long, GuoPing, Jia, HaiPeng. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY[J]. 2013, 28(1): 90-105, https://www.webofscience.com/wos/woscc/full-record/WOS:000314190600008 (2nd author, corresponding author)
[79] 张云泉. MPFFT:异构平台上性能自适应FFT框架. 计算机研究与发展. 2013 (1st author)
[80] Yan Shengen, Long Guoping, Zhang Yunquan. StreamScan: Fast scan algorithms for GPUs without global barrier synchronization. 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2013. 2013, 229-238, http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000324158900022&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=3a85505900f77cc629623c3f2907beab (3rd author)
[81] 袁国兴, 孙家昶, 张林波, 张云泉. 2013年中国高性能计算机发展现状分析及系统测评技术简析. 计算机工程与科学[J]. 2013, 35(11): 1, https://kns.cnki.net/KCMS/detail/detail.aspx?dbcode=CJFQ&dbname=CJFDHIS2&filename=JSJK201311001&v=MjQwNjk5TE5ybzlGWllSOGVYMUx1eFlTN0RoMVQzcVRyV00xRnJDVVI3cWZidVp0RkNybFViekJMejdCWmJHNEg= (4th author)
[82] 张云泉. 针对应用对角线矩阵特点的SpMV自适应性能优化. 计算机研究与发展. 2013 (1st author)
[83] Zhang Xianyi, Wang Qian, Zhang Yunquan. Model-driven level 3 blas performance optimization on loongson 3a processor. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS - ICPADS. 2012, 684-691 (3rd author)
[84] Yuan Liang, Ding Chen, Štefankovič Daniel, Zhang Yunquan. Modeling the locality in graph traversals. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING. 2012, 138-147, http://ir.iscas.ac.cn/handle/311060/15873 (4th author)
[85] Yuan Liang, Zhang Yunquan. A locality-based performance model for load-and-compute style computation. PROCEEDINGS - 2012 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, CLUSTER 2012. 2012, 566-571 (2nd author)
[86] Jia Haipeng, Zhang Yunquan, Long Guoping, Xu Jianliang, Yan Shengen, Li Yan, Kaklamanis C, Papatheodorou T, Spirakis PG. GPURoofline: A Model for Guiding Performance Optimizations on GPUs. EURO-PAR 2012 PARALLEL PROCESSING. 2012, 7484: 920-932 (2nd author)
[87] Sun Xiangzheng, Zhang Yunquan, Wang Ting, Zhang Xianyi, Yuan Liang, Rao Li. Optimizing spmv for diagonal sparse matrices on gpu. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING. 2011, 492-501, http://ir.iscas.ac.cn/handle/311060/16207 (2nd author)
[88] Sun Xiangzheng, Zhang Yunquan, Wang Ting, Long Guoping, Zhang Xianyi, Li Yan. Crsd: application specific auto-tuning of spmv for diagonal sparse matrices. LECTURE NOTES IN COMPUTER SCIENCE (INCLUDING SUBSERIES LECTURE NOTES IN ARTIFICIAL INTELLIGENCE AND LECTURE NOTES IN BIOINFORMATICS). 2011, 316-327, http://124.16.136.157/handle/311060/14335 (2nd author)
[89] 张云泉. Heterogeneous Multi-core Parallel SGEMM Performance Testing and Analysis on Cell/B.E Processor. IEEE NAS 2010. 2010 (1st author)
[90] 张云泉. BLAS库在多核处理器上的性能测试与分析. 软件学报. 2010 (1st author)
[91] 孙相征, 张云泉, 王宣强, 王磊. 数值软件自适应性能优化搜索过程评价技术研究. 计算机研究与发展[J]. 2010, 679-686, http://lib.cqvip.com/Qikan/Article/Detail?id=33523775 (2nd author)
[92] Wang Lei, Zhang Yunquan, Zhang Xianyi, Liu Fangfang. Accelerating linpack performance with mixed precision algorithm on cpu+gpgpu heterogeneous cluster. PROCEEDINGS - 10TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, CIT-2010, 7TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, ICESS-2010, SCALCOM-2010. 2010, 1169-1174, http://124.16.136.157/handle/311060/8642 (2nd author)
[93] 余元, 张云泉, 李会元. 一类非张量积区域快速傅立叶变换算法在国产并行机上的可扩展性测试. 数值计算与计算机应用[J]. 2010, 31(2): 123-130 (2nd author)
[94] 张云泉. 基于延迟隐藏因子的GPU计算模型. 软件学报. 2010 (1st author)
[95] 张云泉. LogGPH: A Parallel Computational Model with Hierarchical Communication Awareness. IEEE CSE 2010. 2010 (1st author)
[96] 袁娥, 张云泉, 刘芳芳, 孙相征. SpMV的自动性能优化实现技术及其应用研究. 计算机研究与发展[J]. 2009, 1117-1126, http://lib.cqvip.com/Qikan/Article/Detail?id=30839323 (2nd author)
[97] Yuxin Tang, Yunquan Zhang, Hu Chen. A parallel shortest path algorithm based on graph-partitioning and iterative correcting. COMPUTER SYSTEMS SCIENCE AND ENGINEERING[J]. 2009, 24(5): 351-360, http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000277952300007&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=3a85505900f77cc629623c3f2907beab (2nd author)
[98] Zhang Yunquan. Early Performance Evaluation of Dawning 5000A and DeepComp 7000. Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2009). 2009 (1st author)
[99] Zhang Yunquan. Memory Access Complexity Analysis of SpMV in RAM (h) Model. Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications. 2008 (1st author)
[100] Zhang Di, Zhang Yunquan, Liu Shengfei, Huang Xiaodi. Parallelization of fm-index. PROCEEDINGS - 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, HPCC 2008. 2008, 169-173, http://124.16.136.157/handle/311060/10742 (2nd author)
[101] 刘胜飞, 张云泉. 一种改进的OpenMP Guided 调度策略研究. 2008年全国高性能计算机学术年会论文集. 2008, 486, http://124.16.136.157/handle/311060/10780 (2nd author)
[102] Tang Yuxin, Zhang Yunquan, Chen Hu. A parallel shortest path algorithm based on graph-partitioning and iterative correcting. PROCEEDINGS - 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, HPCC 2008. 2008, 155-161, http://124.16.136.157/handle/311060/10708 (2nd author)
[103] Zhang Di, Zhang Yunquan, Chen Jing, Amati G, Carpineto C, Romano G. Efficient construction of FM-index using overlapping block processing for large scale texts. ADVANCES IN INFORMATION RETRIEVAL. 2007, 4425: 113-+ (2nd author)
[104] Zhang Yunquan. Models of Parallel Computation: A Survey and Classification. Frontiers of Computer Science in China, Springer. 2007 (1st author)
[105] Chen, GuoLiang, Sun, GuangZhong, Zhang, YunQuan, Mo, ZeYao. Study on parallel computing. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY[J]. 2006, 21(5): 665-673, http://sciencechina.cn/gw.jsp?action=detail.jsp&internal_id=2430013&detailType=1 (3rd author)
[106] 陈靖, 张云泉, 张林波, 袁伟. 一种新的MPI Allgather算法及其在万亿次机群系统上的实现与性能分析. 计算机学报[J]. 2006, 29(5): 808-814, http://lib.cqvip.com/Qikan/Article/Detail?id=21884373 (2nd author)
[107] Chen Jing, Zhang Linbo, Zhang Yunquan, Yuan Wei. Performance evaluation of allgather algorithms on terascale linux cluster with fast ethernet. PROCEEDINGS - EIGHTH INTERNATIONAL CONFERENCE ON HIGH-PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION, HPC ASIA 2005. 2005, 437-442, http://124.16.136.157/handle/311060/12590 (3rd author)
[108] 袁伟, 张云泉, 孙家昶, 李玉成. 国产万亿次机群系统NPB性能测试分析. 计算机研究与发展[J]. 2005, 42(6): 1079-1084, http://lib.cqvip.com/Qikan/Article/Detail?id=15707305 (2nd author)
[109] 张云泉. 面向高性能数值计算的并行计算模型DRAM(h). 计算机学报[J]. 2003, 26(12): 1660-1670, http://lib.cqvip.com/Qikan/Article/Detail?id=8809569 (1st author)
[110] 张云泉, 孙家昶, 唐志敏, 迟学斌. 数值计算程序的存储复杂性分析. 计算机学报[J]. 2000, 23(4): 362-373, http://lib.cqvip.com/Qikan/Article/Detail?id=4149067 (1st author)