Adding Colon and Semicolon Label Feature to Chinese Comma Classification

LI Yancui, GU Jingjing, ZHOU Guodong

Journal of Chinese Information Processing, 2014, Vol. 28, Issue 5: 215-222.
Information Extraction and Text Mining

Abstract

Punctuation analysis plays an important role in sentence and discourse analysis, and the functional classification of the comma is both its focus and its main difficulty. This paper studies automatic classification of Chinese commas using the classification labels of colons and semicolons as additional features. We first describe a classification scheme for commas, colons, and semicolons, and then introduce the comma, colon, and semicolon corpora annotated under this scheme. Finally, we examine comma classification when colon labels, semicolon labels, and both together are added as features. Experimental results show that comma classification accuracy improves, to varying degrees, in all three cases.

Key words

Chinese comma classification / colon labels / semicolon labels / discourse analysis

Cite this article

LI Yancui, GU Jingjing, ZHOU Guodong. Adding Colon and Semicolon Label Feature to Chinese Comma Classification. Journal of Chinese Information Processing. 2014, 28(5): 215-222

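Funding

Frontier Technology Research Project of the National 863 Program of China (2012AA011102); General Program of the National Natural Science Foundation of China (61273320)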