Abstract: This paper proposes a hybrid model for identifying Chinese base phrases. In the first step, we apply a memory-based learning (MBL) approach to the chunking of nine types of Chinese base phrases and compare the results obtained with different feature vectors. In a second series of experiments, we use grammar rules that represent the inner structures of base phrases and lexical information to correct the erroneous predictions from the first step. The experiments reported in this paper show competitive results: a precision of 95.2% and a recall of 93.7%.