Modern Chinese Conjunction Phrase Recognition Based on Usage

ZAN Hongying, ZHOU Lijuan, ZHANG Kunli

Journal of Chinese Information Processing, 2012, Vol. 26, Issue 6: 72-79.
Survey

Abstract

Conjunctions can connect words, phrases, clauses, sentences, and even sentence groups. A conjunction phrase is one kind of unit that conjunctions connect, and different conjunctions form conjunction phrases of different lengths and relations. Drawing on the conjunction usages recorded in a function word usage knowledge base, this paper constructs recognition rules for conjunction phrases and implements a rule-based recognizer. It also uses conjunction usage as a feature in a conditional random field model to implement a statistical recognizer. Experimental results show that the statistical method outperforms the rule-based method, and that conjunction usage is useful for conjunction phrase recognition.

Key words

conjunction phrase / conjunction usage / conditional random fields

Cite this article

ZAN Hongying, ZHOU Lijuan, ZHANG Kunli. Modern Chinese Conjunction Phrase Recognition Based on Usage. Journal of Chinese Information Processing. 2012, 26(6): 72-79

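Funding

Supported by the National Natural Science Foundation of China (60970083); the Open Project Fund of the National Laboratory of Pattern Recognition; and the Henan Province Outstanding Youth Fund for Science and Technology Innovation Talent (104100510026)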