Journal of Chinese Information Processing ›› 2025, Vol. 39 ›› Issue (2) : 27-40.


  • 徐进1,辛欣1,2
Chinese Verb Occurrence State Dataset Construction

  • XU Jin1, XIN Xin1,2
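Determining whether the action denoted by a verb actually occurs in reality is an important problem in natural language understanding: it supports natural language processing applications such as event extraction and also contributes to a deeper understanding of language. Although the identification of verb occurrence states has an established research basis for English, related work for Chinese remains scarce. On the one hand, there is no annotation guideline for Chinese verb occurrence states; on the other hand, relevant Chinese corpora are lacking. To address the lack of an annotation guideline, this paper builds on the English guidelines, analyzes Chinese text from the People's Daily corpus, and, drawing on information such as temporal cue words and sentence patterns, summarizes an annotation guideline for Chinese verb occurrence states. To address the lack of corpora, this paper constructs a Chinese verb occurrence state dataset containing 5,430 sentences and 21,226 Chinese verb instances. Experiments show that neural network models are still not accurate enough when classifying cases that describe objective laws or that lack temporal cue words.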


Judging whether the action denoted by a verb actually occurs is an important issue in natural language understanding, with potential applications in event extraction. In contrast to the existing work for English, there is still little related work addressing this issue for Chinese. This paper analyzes the People's Daily corpus and summarizes labeling rules with reference to the practices in English. We then construct a dataset of Chinese verb occurrence states, comprising 5,430 sentences and 21,226 labelled Chinese verb instances. Experiments show that, for the neural model, cases describing objective rules and cases lacking time phrases are more difficult to predict than general cases.


中文动词实现状态 / 数据集构建

Chinese verb occurrence state / dataset construction


徐进,辛欣. 中文动词实现状态数据集构建. 中文信息学报. 2025, 39(2): 27-40
XU Jin, XIN Xin. Chinese Verb Occurrence State Dataset Construction. Journal of Chinese Information Processing. 2025, 39(2): 27-40


E-mail: 3120210994@bit.edu.cn
XIN Xin (born 1984), corresponding author, Ph.D., associate professor; main research area: natural language processing.
E-mail: xxin@bit.edu.cn


