Abstract
This paper presents and compares three dependency parsing algorithms based on maximum entropy models; of the three, the Maximum Spanning Tree (MST) algorithm achieves the best results. The goal of the MST algorithm is to find a maximum spanning tree in a weighted directed graph. Each edge of the graph corresponds to a syntactic dependency relation, and the edge weights are obtained from a maximum entropy model. The training and test data come from the public corpora of the CoNLL 2008 Shared Task. The system achieves F1 scores of 87.42% and 80.8% on the WSJ and Brown test sets respectively, ranking sixth among the participating teams.
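To make the search step concrete, the following is a minimal sketch of finding a maximum spanning tree in a weighted directed graph. The abstract does not say which search procedure the authors used; the Chu-Liu/Edmonds algorithm is assumed here as one standard choice. The function names `find_cycle` and `max_arborescence`, the three-word example sentence, and the edge scores are all hypothetical; in the paper's setting the scores would come from the maximum entropy model rather than being written by hand.

```python
def find_cycle(head):
    """Return a list of nodes forming a cycle in the dep -> head map, or None."""
    for start in head:
        path, seen, node = [], set(), start
        while node in head and node not in seen:
            seen.add(node)
            path.append(node)
            node = head[node]
        if node in seen:  # walked back onto the current path: a cycle
            return path[path.index(node):]
    return None


def max_arborescence(scores, root=0):
    """Chu-Liu/Edmonds sketch.

    scores: dict {(head, dep): weight} with integer node ids.
    Assumes every node is reachable from root.
    Returns a dict mapping each non-root node to its chosen head.
    """
    nodes = {n for edge in scores for n in edge}
    # Step 1: greedily pick the best incoming edge for every non-root node.
    best = {}
    for (h, d), w in scores.items():
        if d == root or h == d:
            continue
        if d not in best or w > scores[(best[d], d)]:
            best[d] = h
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # Step 2: contract the cycle into a fresh node c and rescore its edges.
    cyc = set(cycle)
    c = max(nodes) + 1
    new_scores, origin = {}, {}
    for (h, d), w in scores.items():
        if h in cyc and d in cyc:
            continue
        if d in cyc:
            # Entering the cycle replaces the cycle edge that points at d.
            key, adj = (h, c), w - scores[(best[d], d)]
        elif h in cyc:
            key, adj = (c, d), w
        else:
            key, adj = (h, d), w
        if key not in new_scores or adj > new_scores[key]:
            new_scores[key] = adj
            origin[key] = (h, d)
    sub = max_arborescence(new_scores, root)
    # Step 3: expand the contracted node back into the original edges.
    result = {}
    for d, h in sub.items():
        orig_head, orig_dep = origin[(h, d)]
        result[orig_dep] = orig_head
    for d in cyc:
        if d not in result:  # keep cycle edges except the one that was broken
            result[d] = best[d]
    return result


# Hypothetical example: 0 = ROOT, 1 = "John", 2 = "saw", 3 = "Mary".
# The scores are illustrative only, not taken from the paper.
scores = {
    (0, 1): 9, (0, 2): 10, (0, 3): 9,
    (1, 2): 20, (2, 1): 30,
    (2, 3): 30, (3, 2): 0,
    (1, 3): 3, (3, 1): 11,
}
heads = max_arborescence(scores)
total = sum(scores[(h, d)] for d, h in heads.items())
print(heads, total)
```

In this toy graph the greedy step first creates a cycle between "John" and "saw"; contracting and re-expanding the cycle yields a tree in which "saw" attaches to ROOT and both other words attach to "saw".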
Key words
computer application /
Chinese information processing /
parsing /
maximum spanning tree /
maximum entropy
Funding
Supported by Natural Science Foundation projects (60435020, 90612005) and a National 863 High-Tech Program project (2006AA01Z197)