General

Jie Liu, Institute of Software, Chinese Academy of Sciences

Research Directions:

Big Data and Machine Learning Systems: Distributed computing for big data, serverless architecture for big data and machine learning systems, EarthDataMiner platform for Earth big data mining and analysis, big data technologies for Sustainable Development Goals (SDGs).

Domain Decision Intelligence: Construction of domain knowledge graphs and semantic computing, machine learning and data mining algorithms for fields such as scientific computation, healthcare, aviation, law, remote sensing, and education, along with their distributed optimization.

Main Contributions:

Participated in the development of the "Sustainable Development Big Data Platform System." Led the team that built EarthDataMiner, overcoming key challenges in distributed computing over large-scale remote sensing imagery and in cloud services for interactive analysis. The platform lets scientists analyze and process remote sensing images and other scientific data online, and it supports end-to-end online computation of SDG indicators. With EarthDataMiner, scientists can develop SDG indicator calculation algorithms online and publish the results as web app tools accessible to users worldwide.

Achievements related to knowledge graphs and question-answering systems have been applied in domains such as healthcare, aviation (travel and aviation services), and law.

As the principal investigator, led 2 projects funded by the National Natural Science Foundation of China, 1 project under the National Key Research and Development Program, 1 project under the Chinese Academy of Sciences' technological innovation program, and 1 project under the major technology initiative of the Civil Aviation Administration of China.



Publications

   
Papers
    • (1) Survey on Third-Party Library Dependency Conflict Problems (第三方库依赖冲突问题研究综述), Journal of Software, 2022, corresponding author
      (2) Cloud-based storage and computing for remote sensing big data: a technical review, International Journal of Digital Earth, 2022, 7th author
      (3) Deep Active Learning Method for Question Intention Recognition, Journal of Chinese Information Processing, 2021, 3rd author
      (4) Meta-graph Embedding in Heterogeneous Information Network for Top-N Recommendation, IJCNN 2021, 2021, 3rd author
      (5) FaasRS: Remote Sensing Image Processing System on Serverless Platform, IEEE Computer Society Signature Conference on Computers, Software and Applications (COMPSAC), 2021, corresponding author
      (6) DeepCon: Contribution Coverage Testing for Deep Learning Systems, 28th International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2021, corresponding author
      (7) Semi-supervised emotion recognition in textual conversation via a context-augmented auxiliary training task, Information Processing and Management, 2021, corresponding author
      (8) Identity-linked Group Channel Pruning for Deep Neural Networks, International Joint Conference on Neural Networks (IJCNN), 2021, corresponding author
      (9) Semi-supervised emotion recognition in textual conversation via a context-augmented auxiliary training task, Information Processing & Management, 2021, corresponding author
      (10) Label Definitions Augmented Interaction Model for Legal Charge Prediction, 43rd European Conference on Information Retrieval (ECIR), 2021, corresponding author
      (11) EarthDataMiner: A Cloud-Based Big Earth Data Intelligence Analysis Platform, IOP Conference Series: Earth and Environmental Science 509 (1), 2020, 1st author
      (12) Current Situation and Trend of Intelligent Analysis Software for Scientific Big Data, Bulletin of Chinese Academy of Sciences, 2018, 2nd author
      (13) Distributed Stochastic Variance Reduction Gradient Descent Algorithm topkSVRG, Journal of Frontiers of Computer Science and Technology, 2018, 3rd author
      (14) Characterizing and diagnosing out of memory errors in MapReduce applications, The Journal of Systems and Software (JSS), 2018, 5th author
      (15) Survey on Parallel and Distributed Optimization Algorithms for Scalable Machine Learning, Journal of Software, 2018, 3rd author
      (16) Design and Implementation of Distributed Full-text Search Framework Based on Spark SQL, Computer Science, 2018, 3rd author
      (17) Fine-grained Patient Similarity Measuring using Deep Metric Learning, CIKM '17: Proceedings of the 2017 ACM Conference on Information and Knowledge Management, 2017, corresponding author
      (18) An Automated Development and Integration Method for Big Data Analysis Components (一种大数据分析组件的自动化开发集成方法), Computer Applications and Software, 2016, 4th author
      (19) Patient Similarity Based on Supervised Metric Learning of Multi-Margin, Computer Systems & Applications, 2016, 3rd author
      (20) Hug the Elephant: Migrating a Legacy Data Analytics Application to Hadoop Ecosystem, The 32nd IEEE International Conference on Software Maintenance and Evolution (ICSME, CCF B), 2016, 2nd author
      (21) Load Balancing Framework for Metadata Service of Distributed File Systems, Journal of Software, 2016, 3rd author
      (22) Plogs: Materializing Datalog Programs with MapReduce for Scalable Reasoning, 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), 2016, 2nd author
      (23) Dependency-Aware Parallel Materialization of Datalog Programs with Spark for Scalable Reasoning, 17th International Conference on Web Information System Engineering (WISE’16), 2016, 1st author
      (24) Method of Implement Machine Learning Analysis with Workflow Based on Spark Platform, Computer Systems & Applications, 2016, 2nd author
      (25) A Lightweight Evaluation Framework for Table Layouts in MapReduce Based Query Systems, The 17th Asia-Pacific Web Conference (APWeb, CCF C), 2015, 1st author
      (26) SmartHR: A Resume Query and Management System Based on Semantic Web, Computer Science, 2015, 4th author
      (27) Module Based Big Data Analysis Platform, Computer Science, 2014, 2nd author
      (28) Increment Based Data Transmission Technique for Cloud Storage Service, Computer Systems & Applications, 2014, 2nd author
      (29) Scalable Horn-Like Rule Inference of Semantic Data Using MapReduce, Knowledge Science, Engineering and Management, KSEM 2014, 2014, 2nd author
      (30) Mining user daily behavior patterns from access logs of massive software and websites, 5th Asia-Pacific Symposium on Internetware, Internetware 2013, 2013, 2nd author
      (31) FMEM: A Fine-grained Memory Estimator for MapReduce Jobs, The 10th International Conference on Autonomic Computing (ICAC, Core B), 2013, 2nd author
      (32) A distributed rule execution mechanism based on MapReduce in semantic web reasoning, Proceedings of the 5th Asia-Pacific Symposium on Internetware, 2013
      (33) A Distributed Cache Framework for Metadata Service of Distributed File System, The 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2013, 1st author
      (34) Consistent query answering based on repairing inconsistent attributes with nulls, 18th International Conference on Database Systems for Advanced Applications, DASFAA 2013, 2013, 1st author
      (35) Operation log based synchronization algorithm for cloud storage service with multiple clients, Computer Engineering and Design, 2013, 2nd author
      (36) A distributed cache framework for metadata service of distributed file systems, 2013 19th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2013, 2013, 2nd author
      (37) FlowS: A Fair Scheduling Method for MapReduce Dataflow, Computer Science, 2012, 2nd author
      (38) A fast and high throughput SQL query system for big data, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012, 2nd author
      (39) Performance Optimization of Mashup Through Data Flow Transformation, Journal of Chinese Computer Systems, 2011, 1st author
      (40) Novel Approach for Extracting XML Schema Definition Based on Content Model Graph, Computer Science, 2010, 2nd author
      (41) Consistent query answering based on virtual repairs with nulls, Application Research of Computers, 2009, 2nd author
      (42) ETL workflow analysis and verification using backwards constraint propagation, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, 1st author
      (43) ETL Workflow Analysis and Verification Using Backwards Constraint Propagation, Advanced Information Systems Engineering, Proceedings, 2009, corresponding author
      (44) Efficient Consistent Query Answering Based on Attribute Deletions, CSA 2008: International Symposium on Computer Science and its Applications, Proceedings, 2008, corresponding author
      (45) Question Answering over Freebase via Attentive RNN with Similarity Matrix based CNN, 2nd author
    • Published Books