Abstract: Probabilistic context-free grammar (PCFG) makes poor use of sentence information during parsing. To address this deficiency, subsidiary context and lexical information are introduced, and two PCFG-based structural disambiguation methods are proposed. The proposed layered parsing strategy improves both accuracy and comprehensiveness at the cost of efficiency. Experimental results show that the Chinese syntactic parsing model based on subsidiary context and lexical information, which uses more sentence information, disambiguates better than PCFG.
Key words: Chinese syntactic parsing; probabilistic context-free grammar; subsidiary context; lexical information; layered parsing