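Basic Information
Ma Wenjing (马文静), female, master's supervisor, Institute of Software
Email: wenjing@iscas.ac.cn
Mailing address: Building 5, Institute of Software, Chinese Academy of Sciences, No. 4 Zhongguancun South Fourth Street, Haidian District, Beijing
Postal code: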

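Admissions Information

Admission Majors
081202 - Computer Software and Theory
085211 - Computer Technology
Admission Research Directions
Parallel computing, GPU computing, parallel code generation and optimization, performance optimization of parallel systems
Parallel computing techniques for accelerators, task scheduling for parallel systems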

Education

2005-09 to present: The Ohio State University, USA, Ph.D.
2000-09 to present: Nankai University, bachelor's degree

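Work Experience

Employment History
2011-04 to present: Pacific Northwest National Laboratory, USA, Postdoc Research Associate
2004-07 to present: Tianjin Fankai Science and Trade Co., Ltd. (天津泛凯科贸有限公司), Software Engineer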

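Patents and Awards

Patents
[1] Hu Yi, Yang Chao, Liu Fangfang, Ma Wenjing, Chen Daokun. High-performance implementation method and apparatus for dense matrix multiplication on the SW39000 processor. CN: CN113849771A, 2021-12-28.

[2] Hu Yi, Chen Daokun, Yang Chao, Liu Fangfang, Ma Wenjing. High-performance implementation method for a Level 1 and Level 2 BLAS function library on the SW26010-Pro processor. CN: CN113641956A, 2021-11-12.

[3] Li Leisheng, Ma Wenjing, Zhao Haitao, Sun Jiachang, Li Huiyuan. An HPL matrix update optimization method for complex heterogeneous systems. CN: CN111913748A, 2020-11-10.

[4] Liu Fangfang, Wu Lixin, Ma Wenjing, Wang Quan, Wang Zhijun, Sun Jiachang, Yang Chao. A structured coloring method for regular-grid problems on many-core platforms. CN: CN110942504B, 2021-07-27.

[5] Deng Siqi, Liu Chao, Long Guoping, Ma Wenjing. An automated testing tool and method for Web applications. CN: CN106776343A, 2017-05-31.

[6] Deng Siqi, Du Changying, Ma Wenjing, Long Guoping. A traffic accident rate prediction system based on online variational Bayesian support vector regression. CN: CN106339608A, 2017-01-18.

[7] Wu Zhenhua, Ma Wenjing, Long Guoping, Li Yucheng. A two-dimensional phase unwrapping method based on a heterogeneous acceleration platform. CN: CN103942095A, 2014-07-23.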

Publications

Published Papers
[1] ACM Transactions on Architecture and Code Optimization. 2023, 3rd author.
[2] Journal of Software. 2023, 4th author.
[3] Journal of Software. 2022, 5th author.
[4] Journal of Software. 2022, 3rd author.
[5] Liu Fangfang, Ma Wenjing, Zhao Yuwen, Chen Daokun, Hu Yi, Lu Qinglin, Yin Wanwang, Yuan Xinhui, Jiang Lijuan, Yan Hao, Li Min, Wang Hongsen, Wang Xinyu, Yang Chao. xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor. CCF Transactions on High Performance Computing[J]. 2022, 2nd author.
[6] Ma Wenjing, Liu Fangfang, Chen Daokun, Lu Qinglin, Hu Yi, Wang Hongsen, Yuan Xinhui. An optimized framework for Matrix Factorization on the New Sunway many-core Platform. ACM Transactions on Architecture and Code Optimization[J]. 2022, 1st author.
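[7] Li Leisheng, Yang Wenhao, Ma Wenjing, Zhang Ya, Zhao Hui, Zhao Haitao, Li Huiyuan, Sun Jiachang. Research on HPL optimization for complex heterogeneous computing systems. Journal of Software[J]. 2021, 3rd author.
[8] Li Leisheng, Yang Wenhao, Ma Wenjing, Zhang Ya, Zhao Hui, Zhao Haitao, Li Huiyuan, Sun Jiachang. Optimization of HPL for complex heterogeneous computing systems. Journal of Software[J]. 2021, 3rd author, 32(8): 2307-2318.
[9] Liu Fangfang, Wang Zhijun, Wang Quan, Wu Lixin, Ma Wenjing, Yang Chao, Sun Jiachang. Parallel HPCG algorithm and efficient implementation on a domestic heterogeneous system. Journal of Software[J]. 2021, 5th author, 32(8): 2341-2351, http://lib.cqvip.com/Qikan/Article/Detail?id=7105477914.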
[10] Jiang, Lijuan, Yang, Chao, Ma, Wenjing. Enabling Highly Efficient Batched Matrix Multiplications on SW26010 Many-core Processor. ACM Transactions on Architecture and Code Optimization[J]. 2020, 3rd author, 17(1), http://dx.doi.org/10.1145/3378176.
[11] Ma, Wenjing, Ao, Yulong, Yang, Chao, Williams, Samuel. Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight. Cluster Computing-The Journal of Networks Software Tools and Applications[J]. 2020, 1st author, 23(2): 493-507, https://www.webofscience.com/wos/woscc/full-record/WOS:000549737600007.
[12] Min Li, Chao Yang, Qiao Sun, WenJing Ma, WenLong Cao, YuLong Ao. Enabling Highly Efficient k-Means Computations on the SW26010 Many-Core Processor of Sunway TaihuLight. Journal of Computer Science and Technology (English edition)[J]. 2019, 4th author, 34(1): 77-93, http://lib.cqvip.com/Qikan/Article/Detail?id=6100199649.
[13] Wenjing Ma, Yulong Ao, Chao Yang, Samuel Williams. Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight. Cluster Computing. 2019, 1st author, http://kns.cnki.net/KCMS/detail/detail.aspx?QueryID=0&CurRec=1&recid=&FileName=SSJD55ADEBB656D3CEA5ADD0A45C6A48CD3A&DbName=SSJD_01&DbCode=SSJD&yx=&pr=&URLID=&bsm=.
[14] Li, Min, Yang, Chao, Sun, Qiao, Ma, WenJing, Cao, WenLong, Ao, YuLong. Enabling Highly Efficient k-Means Computations on the SW26010 Many-Core Processor of Sunway TaihuLight. Journal of Computer Science and Technology[J]. 2019, 4th author, 34(1): 77-93, http://sciencechina.cn/gw.jsp?action=detail.jsp&internal_id=6414059&detailType=1.
[15] Cai Ying, Yang Chao, Ma Wenjing, Ao Yulong. Extreme-scale Realistic Stencil Computations on Sunway TaihuLight with Ten Million Cores. 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). 2018, 3rd author, 566-571, http://dx.doi.org/10.1109/CCGRID.2018.00086.
[16] Cai, Ying, Ao, Yulong, Yang, Chao, Ma, Wenjing, Zhao, Haitao. Extreme-Scale High-Order WENO Simulations of 3-D Detonation Wave with 10 Million Cores. ACM Transactions on Architecture and Code Optimization[J]. 2018, 4th author, 15(2), http://ir.iscas.ac.cn/handle/311060/19175.
[17] Ao Yulong, Ma Wenjing, Yang Chao, Cai Ying. Extreme-scale Realistic Stencil Computations on Sunway TaihuLight with Ten Million Cores. 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. 2018, 2nd author, http://ir.iscas.ac.cn/handle/311060/19176.
[18] Ao, Yulong, Yang, Chao, Wang, Xinliang, Xue, Wei, Fu, Haohuan, Liu, Fangfang, Gan, Lin, Xu, Ping, Ma, Wenjing. 26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight. 2017 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS). 2017, 9th author, 535-544.
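[19] Bu Ning, Niu Shuzi, Ma Wenjing, Long Guoping. A listwise multi-kernel similarity learning algorithm for similar-app recommendation. Computer Systems & Applications[J]. 2017, 3rd author, 26(1): 116-121, http://lib.cqvip.com/Qikan/Article/Detail?id=671037999.
[20] Ma Wenjing. Localized Fault Recovery for Nested Fork-Join Programs. IPDPS. 2017, 1st author.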
[21] Deng, Siqi, Gao, Kan, Du, Changying, Ma, Wenjing, Long, Guoping, Li, Yucheng. Online Variational Bayesian Support Vector Regression. 2016 International Joint Conference on Neural Networks (IJCNN). 2016, 4th author, 3950-3957.
[22] Xue Pei, Li Tao, Zhao Kezhao, Dong Qiankun, Ma Wenjing, Wu J, Li L. GLDA: Parallel Gibbs Sampling for Latent Dirichlet Allocation on GPU. Advanced Computer Architecture, ACA 2016. 2016, 5th author, 626: 97-107.
[23] Ma Wenjing, Cao Liangliang, Yu Lei, Long Guoping, Li Yucheng. GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring. ICMR'16: Proceedings of the 2016 ACM International Conference on Multimedia Retrieval. 2016, 1st author, 39-46, http://dx.doi.org/10.1145/2911996.2911997.
[24] Ma, WenJing, Gao, Kan, Long, GuoPing. Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs. Journal of Computer Science and Technology[J]. 2016, 1st author, 31(6): 1262-1274, http://lib.cqvip.com/Qikan/Article/Detail?id=670590270.
[25] Li, Tao, Zhao, Kezhao, Dong, Qiankun, Leng, Jiabing, Yang, Yulu, Ma, Wenjing. Data-Oriented Runtime Scheduling Framework on Multi-GPUs. 2016 IEEE TrustCom/BigDataSE/ISPA. 2016, 6th author, 1311-1318.
[26] Chavarria-Miranda, Daniel, Panyala, Ajay, Ma, Wenjing, Prantl, Adrian, Krishnamoorthy, Sriram. Global transformations for legacy parallel applications via structural analysis and rewriting. Parallel Computing[J]. 2015, 3rd author, 43: 1-26, http://dx.doi.org/10.1016/j.parco.2015.01.001.
[27] Bu, Ning, Yu, Lei, Ma, Wenjing, Du, Changying, Niu, Shuzi, Long, Guoping. Detect Similar Mobile Applications with Transfer Learning. 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity). 2015, 3rd author, 856-859, http://dx.doi.org/10.1109/SmartCity.2015.175.
[28] Wu Zhenhua, Ma Wenjing, Long Guoping, Li Yucheng, Tang Qiuyan, Wang Zhongjie. High performance two-dimensional phase unwrapping on GPUs. 11th ACM International Conference on Computing Frontiers, CF 2014. 2014, 2nd author, http://ir.iscas.ac.cn/handle/311060/16597.
[29] Wenjing Ma, Sriram Krishnamoorthy, Oreste Villa, Karol Kowalski, Gagan Agrawal. Optimizing tensor contraction expressions for hybrid CPU-GPU execution. Cluster Computing. 2013, 1st author, 16(1).
[30] Ma Wenjing. Parameterized micro-benchmarking: an auto-tuning approach for complex applications. ACM International Conference on Computing Frontiers. 2012, 1st author.
[31] Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal. Compiler and runtime support for enabling reduction computations on heterogeneous systems. Concurrency and Computation: Practice and Experience. 2012, 2nd author.
[32] Ma Wenjing. Data-driven fault tolerance for work stealing computations. International Conference on Supercomputing (ICS). 2012, 1st author.
[33] Ma Wenjing, Krishnamoorthy Sriram, Agrawal Gagan, Knoop J. Practical Loop Transformations for Tensor Contraction Expressions on Multi-level Memory Hierarchies. Compiler Construction. 2011, 1st author, 6601: 266-+.
[34] Ma Wenjing. An integer programming framework for optimizing shared memory use on GPUs. International Conference on High Performance Computing. 2010, 1st author.
[35] Ma Wenjing. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. International Conference on Supercomputing (ICS). 2010, 1st author.
[36] Ma Wenjing. AUTO-GC: Automatic translation of data mining applications to GPU clusters. IPDPS workshops. 2010, 1st author.
[37] Ma Wenjing. Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters. CLUSTER. 2010, 1st author.

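Research Activities

Research Projects
(1) Research on work-stealing-based load balancing on heterogeneous systems, principal investigator, national level, 2014-01 to 2016-12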