Improvements on Head-Driven Probabilistic Parsing for Chinese
HE Liang1, DAI Xin-yu1, ZHOU Jun-sheng2, CHEN Jia-jun1
1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093, China; 2. Department of Computer Science, Nanjing Normal University, Nanjing, Jiangsu 210097, China
Abstract: Following an analysis of Dan Bikel’s parser, which is based on a head-driven statistical model, this paper presents several improvements to that parser for Chinese. First, a separate N-best POS-tagging module is added to strengthen morphological processing. Second, an independent BaseNP identification module is integrated as a further preprocessing step to reduce the complexity of Chinese parsing. In line with the characteristics of Chinese, several extended definitions of BaseNP are introduced, and they show that a suitable definition of BaseNP helps improve Chinese parsing performance. Finally, experiments on the refined Chinese statistical parser indicate that both the efficiency and the accuracy of Chinese parsing improve significantly.
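As a rough illustration of how BaseNP identification can reduce parsing complexity, the sketch below collapses each identified BaseNP into a single unit before the sentence reaches the parser, so the parser sees fewer items. It is not the paper's implementation: the function name `collapse_base_nps`, the `B-NP`/`I-NP`/`O` chunk tags, and the choice of the rightmost token as the head of a BaseNP are all assumptions made for this example.

```python
from typing import List, Tuple

Token = Tuple[str, str]              # (word, POS tag)
Unit = Tuple[str, str, List[Token]]  # (head word, label, covered tokens)


def collapse_base_nps(tokens: List[Token], chunk_tags: List[str]) -> List[Unit]:
    """Replace every BaseNP span by one unit; other tokens pass through.

    Hypothetical helper: chunk_tags uses an assumed B-NP / I-NP / O scheme,
    and the rightmost token of a span is assumed to be its head.
    """
    if len(tokens) != len(chunk_tags):
        raise ValueError("tokens and chunk_tags must have the same length")

    reduced: List[Unit] = []
    current: List[Token] = []  # tokens of the BaseNP being collected

    def flush() -> None:
        # Emit the collected BaseNP (if any) as a single unit.
        if current:
            head_word, _ = current[-1]
            reduced.append((head_word, "NP", list(current)))
            current.clear()

    for token, tag in zip(tokens, chunk_tags):
        if tag == "B-NP":
            flush()                # a new BaseNP closes the previous one
            current.append(token)
        elif tag == "I-NP" and current:
            current.append(token)  # continue the open BaseNP
        else:
            flush()                # outside any BaseNP: keep the token as is
            reduced.append((token[0], token[1], [token]))
    flush()
    return reduced


if __name__ == "__main__":
    sentence = [("中国", "NR"), ("经济", "NN"), ("发展", "VV"), ("迅速", "AD")]
    tags = ["B-NP", "I-NP", "O", "O"]
    for unit in collapse_base_nps(sentence, tags):
        print(unit)
```

In this sketch the four-token input becomes three units, with the two-word BaseNP treated as one item. A parser that consumes such units would need to re-expand the covered tokens when building the final tree.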