该文提出了一种全新的分层式依存句法分析方法。该方法以依存深度不大于1的依存层作为分析单位,自底向上构建句子的依存结构。在层内,通过穷尽搜索得到层最优子结构;在层与层之间,分析状态确定性地转移。依存层的引入,使该模型具有比典型的基于图的方法更低的算法复杂度,与基于转换的方法相比,又一定程度上缓解了确定性过程的贪婪性。此外,该方法使用典型序列标注模型进行层依存子结构搜索,证明了序列标注技术完全可以胜任句法分析等层次结构分析任务。实验结果显示,该文提出的分层式依存分析方法具有与主流方法可比的分析精度和非常高的分析效率,在宾州树库上可以达到每秒2 500个英语单词。
Abstract
A layer-based projective dependency parsing approach is presented. This novel approach works layer by layer in a bottom-up manner, in which the depth of token dependency is allowed no more than one. Inside the layer the dependency graphs are searched exhaustively while between the layers the parser state transfers deterministically. Taking the dependency layer as the parsing unit, the proposed parser has a lower computational complexity than graph-based models which search for a whole dependency graph, alleviating the error propagation in transition-based models to some extent. Furthermore, our parser adopts the sequence labeling models to find the optimal sub-graph of the layer, which demonstrates the sequence labeling techniques qualified for hierarchical structure analysis tasks. Experimental results indicate that the proposed approach offers desirable accuracies and especially a very fast parsing speed, with 2500 words per second for Penn Treebank.
Key wordsdependency parsing; dependency layer; sequence labeling
关键词
依存句法分析 /
依存层 /
序列标注
{{custom_keyword}} /
Key words
dependency parsing /
dependency layer /
sequence labeling
/
/
/
/
/
/
/
/
{{custom_keyword}} /
{{custom_sec.title}}
{{custom_sec.title}}
{{custom_sec.content}}
参考文献
[1] R. McDonald, K. Crammer and F. Pereira. Online large-margin training of dependency parsers[C]//Proc. of ACL. Ann Arbor, USA: 2005, 91-98.
[2] R. McDonald and F. Pereira. Online learning of approximate dependency parsing algorithms[C]//Proc. of EACL. Trento, Italy: 2006, 81-88.
[3] H. Yamada and Y. Matsumoto. Statistical dependency analysis with support vector machines[C]//Proc. of IWPT. Nancy, France: 2003, 195-206.
[4] J. Nivre and M. Scholz. Deterministic dependency parsing of English text[C]//Proc. of COLING. Switzerland: 2004, 64-70.
[5] R. McDonald and J. Nivre. Characterizing the errors of data-driven dependency parsing models[C]//Proc. of EMNLP-CoNLL. Prague, Czech: 2007, 122-131.
[6] F. Sha and F. Pereira. Shallow parsing with conditional random fields[C]//Proc. of NAACL. Edmonton, Canada: 2003, 213-220.
[7] 李珩, 朱靖波, 姚天顺. 基于SVM的中文组块分析[J]. 中文信息学报, 2004, 18(2): 1-7.
[8] Y. Wu, J. Yang and Y. Lee. Multilingual deterministic dependency parsing framework using modified finite Newton method support vector machines[C]//Proc. of EMNLP-CoNLL. Prague, Czech: 2007, 1175-1181.
[9] Y. Cheng, M. Asahara and Y Matsumoto. Multi-lingual dependency parsing at NAIST[C]//Proc. of CoNLL. New York City, USA: 2006, 191-195.
[10] J. Hall, J. Nilsson, J. Nivre, et. al. Single Malt or blended? A study in multilingual parser optimization[C]//Proc. of EMNLP-CoNLL. Prague, Czech: 2007, 933-939.
[11] M. P. Marcus, B. Santorini and M. A. Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank[J]. Computational Linguistics, 1993, 19(2): 313-330.
[12] J. Hall, J. Nivre and J. Nilsson. Discriminative classifiers for deterministic dependency parsing[C]//
Proc. of COLING-ACL. Sydney, Australia: 2006, 316-323.
[13] http://w3.msi.vxu.se/~nivre/research/Penn2Malt.html[CP/OL].
[14] A. Ratnaparkhi. MXPOST[CP/OL]. http://www.inf.ed.ac.uk/resources/nlp/local_doc/MXPOST.html.
[15] N. Xue, F. Xia, F. Chiou et, al. The Penn Chinese Treebank: phrase structure annotation of a large corpus[J]. Natural Language Engineering, 2005,11(2): 207-238.
[16] X. Duan, J. Zhao and B. Xu. Probabilistic models for action-based Chinese dependency parsing[C]//Proc.of ECML-PKDD. Warsaw, Poland: 2007, 559-566.
[17] J. Nivre, J. Hall and J. Nilsson. MaltParser: a data-driven parser-generator for dependency parsing[C]//Proc. of LREC. Genoa, Italy: 2006, 2216-2219.
[18] J. Nivre. Incrementality in deterministic dependency parsing[C]//Proc. of ACL. Barcelona, Spain: 2004, 50-57.
[19] R. McDonald. MSTParser[CP/OL]. http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html.
[20] Y. Cheng, M. Asahara and Y. Matsumoto. Machine learning-based dependency analyzer for Chinese[C]//Proc. of ICCC. Mauritius: 2005, 66-73.
[21] M. Wang, K. Sagae and T. Mitamura. A fast, accurate deterministic parser for Chinese[C]//Proc. of COLING-ACL. Sydney, Australia: 2006, 425-432.
[22] Y. Goldberg and M. Elhadad. SplitSVM: fast, space-efficient, non-heuristic, polynomial kernel computation for NLP applications[C]//Proc. of ACL. Columbus USA: 2008, 237-240.
[23] J. Hall, J. Nilsson and J. Nivre. MaltParser in the CoNLL shared task 2007-Multilingual track[EB/OL]. http://maltparser.org/conll/conll07/.
[24] K. Sagae and A. Lavie. Parser combination by reparsing[C]//Proc. of NAACL. New York City, USA: 2006, 129-132.
[25] J. Nivre and R. McDonald. Integrating graph-based and transition-based dependency parsers[C]//Proc. of ACL. Columbus, USA: 2008, 950-958.
{{custom_fnGroup.title_cn}}
脚注
{{custom_fn.content}}