Abstract
This paper examines the limitations of the independence assumptions of probabilistic context-free grammar (PCFG). To address them, it proposes the concept of syntactic-structure co-occurrence as a way to bring context information into the model, and gives a method for computing it. To work around the small size of the Chinese treebank, the parameters of the syntactic rules are estimated by iterating the Inside-Outside algorithm. On this basis, the paper presents a statistical top-down Chinese parser. In a closed test, the parser achieves a labeled precision of 88.1% and a labeled recall of 86.8%. The experimental results indicate that the method improves labeled precision and recall and deserves further study.
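The abstract reports labeled precision and labeled recall but does not describe the scoring procedure. As a hedged illustration only, the sketch below assumes PARSEVAL-style scoring, in which a predicted constituent counts as correct when its label and span both match a gold constituent. The function name `labeled_precision_recall`, the `(label, start, end)` tuple representation, and the toy brackets are assumptions made for this example and do not come from the paper.

```python
from collections import Counter


def labeled_precision_recall(gold, predicted):
    """Score predicted constituents against gold constituents.

    Both arguments are lists of (label, start, end) tuples.
    This representation is an assumption made for illustration.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a bracket is correct only if both its
    # label and its span match a gold bracket.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


# Toy brackets for a four-word sentence (hypothetical data).
gold = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)]
pred = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4), ("NP", 2, 4), ("ADVP", 1, 2)]

precision, recall = labeled_precision_recall(gold, pred)
print(f"labeled precision = {precision:.2f}, labeled recall = {recall:.2f}")
```

In this toy case two predicted brackets match the gold set, so precision is computed over five predicted brackets and recall over four gold brackets. In a closed test as described in the abstract, the test sentences would be drawn from the same data used for training.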
Key words
artificial intelligence /
natural language processing /
statistical parsing /
probabilistic context-free grammar /
automatic analysis of Chinese
Funding
Supported by the National High Technology Research and Development Program of China (863 Program), grant 2002AA117010