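Basic Information
Ma Wenjing, Female, Master's Supervisor, Institute of Software
Email: wenjing@iscas.ac.cn
Mailing address: Building 5, Institute of Software, Chinese Academy of Sciences, No. 4 Zhongguancun South Fourth Street, Haidian District, Beijing
Postal code: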

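Admissions Information

Admission Majors
081202 - Computer Software and Theory
085211 - Computer Technology
Research Directions for Admission
Parallel computing, GPU computing, parallel code generation and optimization, parallel system performance optimization
Accelerator-based parallel computing, task scheduling for parallel systems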

Education

2005-09 to present, The Ohio State University, USA, Ph.D.
2000-09 to present, Nankai University, Bachelor's degree

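Work Experience

Employment History
2011-04 to present, Pacific Northwest National Laboratory, USA, Postdoc Research Associate
2004-07 to present, Tianjin Fankai Science and Trade Co., Ltd. (天津泛凯科贸有限公司), Software Engineer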

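Patents and Awards

Patents
[1] Liu Fangfang, Wu Lixin, Ma Wenjing, Wang Quan, Wang Zhijun, Sun Jiachang, Yang Chao. A structured coloring method for regular-grid problems on many-core platforms. China: CN110942504B, 2021-07-27.

[2] Li Leisheng, Ma Wenjing, Zhao Haitao, Sun Jiachang, Li Huiyuan. An HPL matrix update optimization method suited to complex heterogeneous systems. China: CN111913748A, 2020-11-10.

[3] Deng Siqi, Liu Chao, Long Guoping, Ma Wenjing. An automated testing tool and method for Web applications. China: CN106776343A, 2017-05-31.

[4] Deng Siqi, Du Changying, Ma Wenjing, Long Guoping. A traffic accident rate prediction system based on online variational Bayesian support vector regression. China: CN106339608A, 2017-01-18.

[5] Wu Zhenhua, Ma Wenjing, Long Guoping, Li Yucheng. A two-dimensional phase unwrapping method based on a heterogeneous acceleration platform. China: CN103942095A, 2014-07-23.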

Publications

Published Papers
[1] Liu Fangfang, Wang Zhijun, Wang Quan, Wu Lixin, Ma Wenjing, Yang Chao, Sun Jiachang. Parallel algorithm and efficient implementation of HPCG on domestic heterogeneous systems. Journal of Software (软件学报). 2021, 32(8): 2341-2351.
[2] Li Leisheng, Yang Wenhao, Ma Wenjing, Zhang Ya, Zhao Hui, Zhao Haitao, Li Huiyuan, Sun Jiachang. Optimization of HPL on complex heterogeneous computing systems. Journal of Software (软件学报). 2021, 32(8): 2307-2318.
[3] Jiang Lijuan, Yang Chao, Ma Wenjing. Enabling Highly Efficient Batched Matrix Multiplications on SW26010 Many-core Processor. ACM Transactions on Architecture and Code Optimization. 2020, 17(1), https://www.webofscience.com/wos/woscc/full-record/WOS:000582614800003.
[4] Ma Wenjing, Ao Yulong, Yang Chao, Williams Samuel. Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight. Cluster Computing: The Journal of Networks, Software Tools and Applications. 2020, 23(2): 493-507, https://www.webofscience.com/wos/woscc/full-record/WOS:000549737600007.
[5] Li Min, Yang Chao, Sun Qiao, Ma Wenjing, Cao Wenlong, Ao Yulong. Enabling Highly Efficient k-Means Computations on the SW26010 Many-Core Processor of Sunway TaihuLight. Journal of Computer Science and Technology. 2019, 34(1): 77-93, http://lib.cqvip.com/Qikan/Article/Detail?id=6100199649.
[6] Cai Ying, Yang Chao, Ma Wenjing, Ao Yulong. Extreme-scale Realistic Stencil Computations on Sunway TaihuLight with Ten Million Cores. 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). 2018, 566-571, http://dx.doi.org/10.1109/CCGRID.2018.00086.
[7] Cai Ying, Ao Yulong, Yang Chao, Ma Wenjing, Zhao Haitao. Extreme-Scale High-Order WENO Simulations of 3-D Detonation Wave with 10 Million Cores. ACM Transactions on Architecture and Code Optimization. 2018, 15(2), http://ir.iscas.ac.cn/handle/311060/19175.
[8] Ao Yulong, Yang Chao, Wang Xinliang, Xue Wei, Fu Haohuan, Liu Fangfang, Gan Lin, Xu Ping, Ma Wenjing. 26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight. 2017 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS). 2017, 535-544.
[9] Bu Ning, Niu Shuzi, Ma Wenjing, Long Guoping. A listwise multi-kernel similarity learning algorithm for similar App recommendation. Computer Systems & Applications (计算机系统应用). 2017, 26(1): 116-121, http://lib.cqvip.com/Qikan/Article/Detail?id=671037999.
[10] Ma Wenjing. Localized Fault Recovery for Nested Fork-Join Programs. IPDPS. 2017.
[11] Deng Siqi, Gao Kan, Du Changying, Ma Wenjing, Long Guoping, Li Yucheng. Online Variational Bayesian Support Vector Regression. 2016 International Joint Conference on Neural Networks (IJCNN). 2016, 3950-3957.
[12] Xue Pei, Li Tao, Zhao Kezhao, Dong Qiankun, Ma Wenjing, Wu J, Li L. GLDA: Parallel Gibbs Sampling for Latent Dirichlet Allocation on GPU. Advanced Computer Architecture, ACA 2016. 2016, 626: 97-107.
[13] Ma Wenjing, Cao Liangliang, Yu Lei, Long Guoping, Li Yucheng. GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring. ICMR'16: Proceedings of the 2016 ACM International Conference on Multimedia Retrieval. 2016, 39-46, http://dx.doi.org/10.1145/2911996.2911997.
[14] Ma Wenjing, Gao Kan, Long Guoping. Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs. Journal of Computer Science and Technology. 2016, 31(6): 1262-1274, http://lib.cqvip.com/Qikan/Article/Detail?id=670590270.
[15] Li Tao, Zhao Kezhao, Dong Qiankun, Leng Jiabing, Yang Yulu, Ma Wenjing. Data-Oriented Runtime Scheduling Framework on Multi-GPUs. 2016 IEEE TrustCom/BigDataSE/ISPA. 2016, 1311-1318.
[16] Chavarria-Miranda Daniel, Panyala Ajay, Ma Wenjing, Prantl Adrian, Krishnamoorthy Sriram. Global transformations for legacy parallel applications via structural analysis and rewriting. Parallel Computing. 2015, 43: 1-26, http://dx.doi.org/10.1016/j.parco.2015.01.001.
[17] Bu Ning, Yu Lei, Ma Wenjing, Du Changying, Niu Shuzi, Long Guoping, Liu X, Hsu R, Wang P, Xia F, Wang Y, Dong M, Deng Y. Detect Similar Mobile Applications with Transfer Learning. 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity). 2015, 856-859, http://dx.doi.org/10.1109/SmartCity.2015.175.
[18] Wu Zhenhua, Ma Wenjing, Long Guoping, Li Yucheng, Tang Qiuyan, Wang Zhongjie. High performance two-dimensional phase unwrapping on GPUs. 11th ACM International Conference on Computing Frontiers, CF 2014. 2014, http://ir.iscas.ac.cn/handle/311060/16597.
[19] Wenjing Ma, Sriram Krishnamoorthy, Oreste Villa, Karol Kowalski, Gagan Agrawal. Optimizing tensor contraction expressions for hybrid CPU-GPU execution. Cluster Computing. 2013, 16(1).
[20] Ma Wenjing. Parameterized micro-benchmarking: an auto-tuning approach for complex applications. ACM International Conference on Computing Frontiers. 2012.
[21] Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal. Compiler and runtime support for enabling reduction computations on heterogeneous systems. Concurrency and Computation: Practice and Experience. 2012.
[22] Ma Wenjing. Data-driven fault tolerance for work stealing computations. International Conference on Supercomputing (ICS). 2012.
[23] Ma Wenjing, Krishnamoorthy Sriram, Agrawal Gagan, Knoop J. Practical Loop Transformations for Tensor Contraction Expressions on Multi-level Memory Hierarchies. Compiler Construction. 2011, 6601: 266-+.
[24] Ma Wenjing. An integer programming framework for optimizing shared memory use on GPUs. International Conference on High Performance Computing. 2010.
[25] Ma Wenjing. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. International Conference on Supercomputing (ICS). 2010.
[26] Ma Wenjing. AUTO-GC: Automatic translation of data mining applications to GPU clusters. IPDPS workshops. 2010.
[27] Ma Wenjing. Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters. CLUSTER. 2010.

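Research Activities

Research Projects
(1) Research on work-stealing-based load balancing on heterogeneous systems, Principal Investigator, national level, 2014-01 to 2016-12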