A Survey of Recent Developments in Syntactic Parsing

TU Kewei, LI Jun

Journal of Chinese Information Processing ›› 2020, Vol. 34 ›› Issue (7): 30-41.
Survey


Abstract

Syntactic parsing aims to analyze an input sentence and obtain its syntactic structure, and is one of the classic tasks in natural language processing. Current research on this task focuses on improving the accuracy of syntactic parsers through automatic learning from data. This paper surveys recent developments in syntactic parsing, reviewing and introducing new methods and findings published in 2018-2019 in three subareas (supervised parsing, unsupervised parsing, and cross-domain/cross-lingual parsing), and concludes with an analysis and outlook on the research prospects of the field.
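For illustration only (not drawn from the paper itself), the following minimal Python sketch shows what a dependency-style syntactic analysis of an input sentence looks like; it assumes the spaCy library and its en_core_web_sm English model are installed, neither of which is a system discussed in the survey:

# Minimal dependency-parsing illustration (assumes spaCy is installed and
# the en_core_web_sm model has been downloaded).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The parser assigns a syntactic structure to the sentence.")

# Each token receives a head token and a dependency label; together these
# arcs form the sentence's dependency tree.
for token in doc:
    print(f"{token.text:12s} --{token.dep_}--> {token.head.text}")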

Cite This Article

TU Kewei, LI Jun. A Survey of Recent Developments in Syntactic Parsing. Journal of Chinese Information Processing, 2020, 34(7): 30-41.

Funding

National Natural Science Foundation of China (61976139)