Abstract
This paper proposes a prepositional phrase recognition method that incorporates simple noun phrase information. First, a CRF model recognizes the simple noun phrases in the corpus, and conversion rules correct the output so that it better matches the phrase patterns found inside prepositional phrases. Next, the recognized simple noun phrases are used to fuse the segmented words of each phrase into a single unit in the corpus. Finally, a multilayer CRFs model recognizes the prepositional phrases in the test corpus, and rules correct the result. The method achieves 93.02% precision, 92.95% recall, and a 92.99% F-measure on prepositional phrase recognition, 1.03 percentage points higher than the best previously published result. These results indicate that prepositional phrase recognition based on simple noun phrases is effective.
Key words: simple noun phrase recognition; CRF; word segmentation fusion
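The fusion step mentioned in the abstract, in which the words inside each recognized simple noun phrase are merged into one unit before prepositional phrase recognition, can be illustrated with a minimal sketch. It assumes the simple noun phrase recognizer emits one B/I/O tag per segmented word. The function name `fuse_simple_nps`, the tag scheme, the `NP` label, and the example sentence are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple


def fuse_simple_nps(
    words: List[str],
    pos_tags: List[str],
    np_tags: List[str],
    np_label: str = "NP",  # hypothetical label given to every fused unit
) -> List[Tuple[str, str]]:
    """Merge each B/I-tagged simple noun phrase into a single (text, label) unit.

    Words tagged O are kept unchanged with their original part-of-speech tag.
    An I tag that does not continue a phrase is treated like O.
    """
    fused: List[Tuple[str, str]] = []
    buffer: List[str] = []  # words of the noun phrase currently being built

    def flush() -> None:
        # Emit the buffered noun phrase, if any, as one fused unit.
        if buffer:
            fused.append(("".join(buffer), np_label))
            buffer.clear()

    for word, pos, tag in zip(words, pos_tags, np_tags):
        if tag == "B":
            flush()  # close a directly preceding phrase
            buffer.append(word)
        elif tag == "I" and buffer:
            buffer.append(word)
        else:
            flush()
            fused.append((word, pos))
    flush()  # a phrase may end at the end of the sentence
    return fused


# Illustrative input: a preposition, a three-word simple noun phrase, a localizer.
words = ["在", "经济", "发展", "过程", "中"]
pos_tags = ["p", "n", "vn", "n", "f"]
np_tags = ["O", "B", "I", "I", "O"]
result = fuse_simple_nps(words, pos_tags, np_tags)
```

After fusion, the sequence passed to the prepositional phrase recognizer is shorter, and the inside of the phrase is a single unit, which is the effect the abstract attributes to this step.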
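Funding
National Natural Science Foundation of China (61173100, 61173101, 61272375); 2013 Humanities and Social Sciences Research Planning Fund of the Ministry of Education (13YJAZH062)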