A Survey of Syntactic Parsing Based on Statistical Learning

WU Weicheng1, ZHOU Junsheng1, QU Weiguang1,2

Journal of Chinese Information Processing, 2013, Vol. 27, Issue (3): 9-20.
Survey


Abstract

Syntactic parsing is one of the fundamental research problems in natural language processing. In recent years, parsing methods based on statistical learning models have attracted wide attention, and a variety of models and algorithms have been proposed. Starting from the type of learning model and algorithm employed, this paper systematically summarizes and classifies the mainstream and state-of-the-art approaches, focusing on analyzing and comparing the ideas behind each class of models and algorithms. It also surveys the current state of research on Chinese syntactic parsing. Finally, it outlines future directions and trends in syntactic parsing research, especially for Chinese.

Key words

syntactic parsing / statistical learning model / generative model / discriminative model / shift-reduce / data-oriented parsing

Cite this article

WU Weicheng, ZHOU Junsheng, QU Weiguang. A Survey of Syntactic Parsing Based on Statistical Learning. Journal of Chinese Information Processing. 2013, 27(3): 9-20


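Funding

Supported by the National Natural Science Foundation of China (61073119, 61272221); the Social Science Foundation of Jiangsu Province (12YYA002); the Natural Science Foundation of Jiangsu Province (BK2010547); and the Open Fund of the State Key Laboratory for Novel Software Technology, Nanjing University (KFKT2012B05).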