GU Jingjing, ZHOU Guodong. A Research on Chinese Colon Annotation and Automatic Identification[J]. Journal of Chinese Information Processing, 2016, 30(3): 16-22.
A Research on Chinese Colon Annotation and Automatic Identification
GU Jingjing, ZHOU Guodong
School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract: With the progress of discourse analysis, research on punctuation has become an important entry point for analyzing and disambiguating discourse. Effectively identifying the role that a punctuation mark plays in a sentence will help the development of syntactic analysis, discourse analysis, and other natural language processing technologies. The main task of this paper is to annotate Chinese colons and to identify their roles automatically. We adopt a rule-based method and a maximum entropy method. The rule-based method is simpler and easier to implement. The maximum entropy method incorporates these rules into a statistical model and achieves better results in the experiments.