Method for Layered Chinese Parsing Based on Subsidiary Context and Lexical Information

CHEN Gong, LUO Senlin, CHEN Kaijiang, FENG Yang, PAN Limin

Journal of Chinese Information Processing, 2012, Vol. 26, Issue 1: 9-16.
Review

Abstract

Context-free grammar models make insufficient use of sentence information in syntactic parsing. To address this, the paper adds subsidiary structural context and part of the lexical information to the probabilistic context-free grammar (PCFG) model and proposes two PCFG-based phrase-structure disambiguation methods for resolving structural ambiguity. It also introduces a layered parsing algorithm that trades some time efficiency for higher parsing accuracy while preserving the comprehensiveness of the parse results. Experimental results show that the proposed Chinese parsing methods exploit more sentence information and disambiguate better than the plain PCFG model.
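For readers new to the baseline, the sketch below shows Viterbi CKY decoding for a plain PCFG, the model that the two proposed methods extend. It is not the authors' method. The grammar, its probabilities, the example sentence, and the function name `viterbi_cky` are hypothetical, and the grammar is assumed to be in Chomsky normal form. Because each rule probability ignores the words it covers, this toy grammar attaches "with chopsticks" to "fish" rather than to "eats". The abstract targets this kind of structural ambiguity with subsidiary context and lexical information.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy PCFG in Chomsky normal form (illustration only).
# Binary rules: (parent, left_child, right_child) -> P(rule | parent)
BINARY: Dict[Tuple[str, str, str], float] = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
    ("PP", "P", "NP"): 1.0,
}
# Lexical rules: (preterminal, word) -> P(word | preterminal)
LEXICAL: Dict[Tuple[str, str], float] = {
    ("NP", "he"): 0.2,
    ("NP", "fish"): 0.3,
    ("NP", "chopsticks"): 0.1,
    ("V", "eats"): 1.0,
    ("P", "with"): 1.0,
}


def viterbi_cky(words: List[str], root: str = "S") -> Tuple[float, Optional[tuple]]:
    """Return (probability, tree) of the most probable parse, or (0.0, None)."""
    n = len(words)
    # chart[i][j] maps a label to (best log-probability, best subtree) over words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    # Length-1 spans come from the lexical rules.
    for i, w in enumerate(words):
        for (label, word), p in LEXICAL.items():
            if word == w:
                chart[i][i + 1][label] = (math.log(p), (label, w))

    # Longer spans combine two adjacent sub-spans split at k with a binary rule.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):
                left, right = chart[i][k], chart[k][j]
                for (parent, lc, rc), p in BINARY.items():
                    if lc in left and rc in right:
                        score = math.log(p) + left[lc][0] + right[rc][0]
                        # Keep only the highest-scoring analysis per label (Viterbi).
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, (parent, left[lc][1], right[rc][1]))

    if root not in chart[0][n]:
        return 0.0, None
    log_p, tree = chart[0][n][root]
    return math.exp(log_p), tree


if __name__ == "__main__":
    prob, tree = viterbi_cky(["he", "eats", "fish", "with", "chopsticks"])
    # The NP-attachment reading wins: "with chopsticks" modifies "fish".
    print(prob, tree)
```

The two competing analyses differ only in using VP -> VP PP (0.3) plus VP -> V NP (0.7) versus VP -> V NP (0.7) plus NP -> NP PP (0.4), so the choice depends on rule probabilities alone. Conditioning on richer structural context or on the words themselves is what lets an extended model prefer the instrumental reading.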

Key words

Chinese syntactic parsing / probabilistic context-free grammar / subsidiary context / lexical information / layered parsing

Cite this article

CHEN Gong, LUO Senlin, CHEN Kaijiang, FENG Yang, PAN Limin. Method for Layered Chinese Parsing Based on Subsidiary Context and Lexical Information. Journal of Chinese Information Processing. 2012, 26(1): 9-16.

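Funding

National 242 Project (2005C48); Basic Research Fund of Beijing Institute of Technology (20060142014); Graduate Innovation Project of Beijing Institute of Technology (GC200802)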