SANG Leyuan, HUANG Degen. The Chinese Preposition Phrase Recognition Based on Simple Noun Phrase[J]. Journal of Chinese Information Processing, 2015, 29(6): 8-12.
The Chinese Preposition Phrase Recognition Based on Simple Noun Phrase
SANG Leyuan, HUANG Degen
School of Computer Science and Technology, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
Abstract: This paper proposes a new approach that integrates simple noun phrase information into preposition phrase recognition. We first recognize simple noun phrases with a basic CRF model and filter them with conversion rules so that they fit the internal phrase patterns of preposition phrases. We then use the simple noun phrases to merge fragmented word segments in the corpus into complete phrases. Finally, we recognize preposition phrases with multilayer CRFs and correct the results with rules. The optimized model scores 1.03 points higher than the current best model, yielding 93.02% precision, 92.95% recall, and 92.99% F-measure.
Key words: simple noun phrase recognition; CRF; word segment fusion
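The merging step described in the abstract can be illustrated with a minimal sketch. It assumes that an upstream tagger (for example a CRF) has already labeled each segmented word with a B/I/O tag marking simple noun phrase boundaries. The function name `merge_simple_nps`, the tag names, and the merged part-of-speech label `NP` are hypothetical choices made for illustration; the paper's actual tagging scheme and conversion rules are not given here.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def merge_simple_nps(
    tokens: List[Token],
    bio_tags: List[str],
    merged_pos: str = "NP",  # hypothetical label for a merged phrase
) -> List[Token]:
    """Collapse each run of B/I-tagged words into a single token.

    Words tagged "O" (or a stray "I" with no preceding "B") are kept as-is.
    """
    if len(tokens) != len(bio_tags):
        raise ValueError("tokens and bio_tags must have the same length")

    merged: List[Token] = []
    current: List[str] = []  # words of the phrase being built

    def flush() -> None:
        # Emit the phrase collected so far as one token.
        if current:
            merged.append(("".join(current), merged_pos))
            current.clear()

    for (word, pos), tag in zip(tokens, bio_tags):
        if tag == "B":
            flush()
            current.append(word)
        elif tag == "I" and current:
            current.append(word)
        else:
            flush()
            merged.append((word, pos))
    flush()
    return merged


# Illustrative input: the POS tags and BIO labels are made up for this example.
example_tokens = [("在", "p"), ("大连", "ns"), ("理工", "n"), ("大学", "n"), ("学习", "v")]
example_tags = ["O", "B", "I", "I", "O"]
example_merged = merge_simple_nps(example_tokens, example_tags)
```

After merging, the preposition 在 is followed by a single noun phrase token rather than three fragments, which is the kind of shortened sequence a later preposition phrase tagger would receive under these assumptions.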