FANG Fang, WANG Ya, WANG Shi, FU Jianhui, CAO Cungen. Knowledge Acquisition from Chinese Records of Cyber Attacks Based on a Framework of Semantic Taxonomy and Description [J]. Journal of Chinese Information Processing (中文信息学报), 2019, 33(4): 48-59.
Knowledge Acquisition from Chinese Records of Cyber Attacks Based on a Framework of Semantic Taxonomy and Description
FANG Fang1,2, WANG Ya1, WANG Shi1, FU Jianhui1, CAO Cungen1
1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: With the rapid development of computer technology, natural language processing has become an important direction in both computer science and artificial intelligence, and knowledge acquisition from text (KAT) is an important research topic in artificial intelligence. Most current work on text relies on keyword matching and machine learning methods, whose accuracy is not high. We present a method for acquiring knowledge from Chinese records of cyber attack events based on semantic grammar. First, we introduce a framework of semantic taxonomy and description (FSTD), built with reference to FrameNet, which extends and improves the taxonomy of basic sentence patterns in modern Chinese. Second, we focus on the design and formation of the "suffering" semantic category, which is the most common category in Chinese records of cyber attack events. We then apply the FSTD to the cyber security domain to build the cyber attack FSTD, and describe the problems encountered in building it, including determining the semantic roles in the grammar, handling compound sentences, analyzing sentences that contain the “的是” construction, and designing predicates. Finally, we extract cyber attack knowledge from a real corpus provided by a national security department; the experiments show that our method achieves high accuracy.