Word Structure Analysis Based on Cascaded CRFs

FANG Yan, ZHOU Guodong

Journal of Chinese Information Processing, 2015, Vol. 29, Issue 4: 1-7.
Syntactic and Semantic Analysis


Abstract

Traditional Chinese word segmentation identifies the boundary of each word, but it ignores the fact that the boundary between words and phrases in Chinese is unclear. In theory, linguists disagree on where word boundaries lie, so the segmentation standards of different corpora cannot be unified; in practice, no single standard fully meets the needs of specific applications. This paper presents a method based on cascaded CRF models that automatically analyzes word structure, obtaining both word boundaries and internal structure with high precision. Compared with traditional word segmentation, word structure analysis is more consistent with the fuzzy boundary between Chinese lexical and syntactic units, and it addresses both the inconsistency of corpus standards and the differing requirements of applications.
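
One way to picture why internal structure helps with inconsistent standards is that a single structured analysis can be cut at different depths to yield coarser or finer segmentations. The sketch below is only an illustration under stated assumptions: it does not reproduce the paper's cascaded CRF model, and the tree encoding, the `segment` function, and the example bracketing are all hypothetical.

```python
from typing import List, Union

# Hypothetical encoding of a word-structure tree (not taken from the paper):
# a leaf is a string of characters, an internal node is a list of subtrees.
Tree = Union[str, List["Tree"]]


def flatten(tree: Tree) -> str:
    """Concatenate all characters under a node."""
    if isinstance(tree, str):
        return tree
    return "".join(flatten(child) for child in tree)


def segment(tree: Tree, depth: int) -> List[str]:
    """Cut the tree at `depth`: 0 keeps the whole unit, larger values split finer."""
    if isinstance(tree, str) or depth == 0:
        return [flatten(tree)]
    pieces: List[str] = []
    for child in tree:
        pieces.extend(segment(child, depth - 1))
    return pieces


# Assumed bracketing for illustration only.
bank: Tree = ["中国", ["人民", "银行"]]

for depth in range(3):
    print(depth, segment(bank, depth))
```

Under this assumed bracketing, depth 0 keeps the whole string as one unit, while deeper cuts expose the smaller units that a finer-grained standard would treat as separate words.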

Key words

Chinese word segmentation / internal structure / annotation standard / cascaded CRFs

Cite this article

FANG Yan, ZHOU Guodong. Word Structure Analysis Based on Cascaded CRFs. Journal of Chinese Information Processing. 2015, 29(4): 1-7


Funding

Natural Science Foundation Youth Program (61202162); Ministry of Education Doctoral Program Fund for New Teachers (20123201120011)