Topic Clause Identification Based on Generalized Topic Theory

JIANG Yuru1,3, SONG Rou1,2

Journal of Chinese Information Processing ›› 2012, Vol. 26 ›› Issue (5): 114-120.
Survey


Abstract

The omission of topics at the head of Chinese punctuation clauses (PClauses) is one reason why Chinese machine translation and information extraction still have low accuracy. Starting from the Generalized Topic Theory and the characteristics of Chinese topic structure, this paper proposes a scheme for identifying the topic clause of a PClause. The scheme comprises two staged tasks: topic clause identification for a single PClause, and topic clause sequence construction for a series of PClauses. Identifying the topic clause of a PClause also recovers the topic missing from its head. This paper addresses the first task, mainly by means of semantic generalization and edit distance. In the open test, accuracy is 12.51 percentage points higher than the baseline. This result indicates that the Generalized Topic Theory is clearly effective for topic clause identification from a single PClause.
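The abstract names edit distance as one of the two techniques used, but it does not say how the distance is applied to PClauses. As a rough illustration only, the following sketch implements the standard Levenshtein distance over arbitrary sequences (characters or tokens). The helper `closest_candidate` and its use for picking a nearest candidate are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Standard Levenshtein distance: the minimum number of insertions,
    deletions, and substitutions needed to turn sequence a into sequence b."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items of a into an empty prefix costs i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y, or match
        prev = curr
    return prev[-1]


def closest_candidate(query: Sequence, candidates: list) -> Sequence:
    """Hypothetical helper (not from the paper): return the candidate
    with the smallest edit distance to the query."""
    return min(candidates, key=lambda c: edit_distance(query, c))


print(edit_distance("kitten", "sitting"))  # 3
```

Because the function accepts any sequence, the same code works on a Chinese string character by character or on a list of segmented words. Which unit the authors used is not stated in the abstract.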

Key words

punctuation clause / generalized topic / discourse structure / topic clause / topic clause identification

Cite this article

JIANG Yuru, SONG Rou. Topic Clause Identification Based on Generalized Topic Theory. Journal of Chinese Information Processing. 2012, 26(5): 114-120

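Funding

Supported by the National Natural Science Foundation of China (60872121, 60873013) and the Beijing Information Science and Technology University Foundation (J0725019). This paper grew out of the first author's doctoral thesis proposal. The authors thank Dong Zhendong, Huang Heyan, Liu Qun, Liu Chunnian, and Yang Erhong for their suggestions and comments.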