Chinese Maximal Noun Phrase Recognition Based on Mixed Strategy
QIAN Xiaofei1, HOU Min2
1. College of Liberal Arts, Shanghai University, Shanghai 200444, China; 2. Broadcast Media Language Branch, National Language Resources Monitoring and Research Center, Communication University of China, Beijing 100024, China
Abstract: This paper proposes a classifier ensemble method based on linguistic evaluation. It fuses the maximal noun phrase (MNP) recognition results of SVMs and cascaded CRFs based on the reduction method, using automatically acquired collocations and manually written evaluation rules. Deterministic rules are then applied to re-recognize the structures on which the classifiers are most error-prone. The method improves the resolution of boundary ambiguities involving consecutive verbs and prepositions as well as consecutive nouns. Experiments yield a precision of 89.30% and a recall of 89.62%; in particular, the F1-score on multi-word MNPs improves by 0.75% over the reduction method.
Key words: maximal noun phrase recognition; linguistic knowledge evaluation; classifier ensemble; rule
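The abstract reports precision and recall but does not state the overall F1-score. As a rough check, the sketch below computes it from the two reported figures, assuming the paper uses the standard balanced F1 (the harmonic mean of precision and recall). The function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall.

    Both arguments must be in the same unit (here, percent).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
overall_f1 = f1_score(89.30, 89.62)
print(f"{overall_f1:.2f}")  # about 89.46
```

Under that assumption, the overall F1-score is about 89.46%. This is a derived figure, not one reported by the authors.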