Chinese Maximal Noun Phrase Recognition Based on Reduction
QIAN Xiaofei1, HOU Min2
1. College of Liberal Arts, Shanghai University, Shanghai 200444, China; 2. National Broadcast Media Language Resources Monitoring & Research Center, Communication University of China, Beijing 100024, China
Abstract: This paper proposes an operational definition of the Maximal Noun Phrase (MNP) and analyzes its structural and distributional features. It then presents an MNP recognition method based on baseNP reduction, which exploits the structural characteristics of MNPs together with linguistic features such as initial definite references and semantic heads. The method eases the conflict between the long-distance dependencies of MNPs and the limited observation windows of classical models. Experiments show a precision of 88.68% and a recall of 89.21%. Reduction improves system performance across the board; on multiword MNPs in particular, it raises the F1-score by 1% and the optimal margin by 6%, demonstrating its effectiveness for complex MNP recognition.
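As a quick check on how the reported figures relate, the F1-score is conventionally the harmonic mean of precision and recall. The sketch below computes it from the two overall percentages quoted in the abstract. The function name `f1_score` is illustrative, and the resulting value is derived here rather than reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall from the abstract, in percent.
overall_f1 = f1_score(88.68, 89.21)
print(f"{overall_f1:.2f}")  # prints 88.94
```

Because the harmonic mean always lies between its two inputs, the derived value sits between the reported precision and recall.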