Implicit discourse relation recognition aims to automatically determine the semantic relation between arguments in the absence of explicit connectives. The challenge lies in the small scale of existing training data and its relatively limited semantic diversity. To address this, this paper builds a discourse relation classification model on a masked language model architecture. The motivations are: ① through self-supervised learning, a masked language model acquires a local language generation capability, i.e., the ability to "reconstruct the semantic representation of the masked region" on the basis of understanding the contextual semantics; ② mask reconstruction yields a data augmentation effect (implicit automatic data expansion), which helps improve the robustness of the discourse relation classification model. In particular, this paper proposes an interactive-attention-based masked language model: the method computes an interactive attention matrix between the arguments and uses it to dynamically select highly correlated keywords across the arguments for masking and reconstruction, producing more targeted data augmentation (augmenting non-key information has little effect on relation classification). Experiments on the Penn Discourse Treebank show that, compared with the baseline systems, the proposed method improves F1 by 3.21%, 6.46%, 2.74%, and 6.56% on the four top-level relations (Comparison, Contingency, Expansion, and Temporal), respectively.
Abstract
Implicit discourse relation recognition aims to determine the semantic relation between arguments in the absence of explicit connectives. The challenge lies in the small scale of the existing training data and the relatively limited semantic diversity it contains. To address this issue, this paper proposes a novel discourse relation recognition method based on an interactive-attention-based masked language model. The motivations are: ① a masked language model acquires local language generation capabilities through self-supervised learning, that is, the ability to "reconstruct the semantic representation of the masked region" on the basis of understanding the contextual semantics; ② mask reconstruction yields a data augmentation effect (potentially automatic data expansion) and improves the robustness of discourse relation recognition. Technically, the method calculates interactive-attention weights between the arguments and then selects keywords across the arguments for masking according to those weights. Experiments on the Penn Discourse Treebank 2.0 (PDTB 2.0) show that the proposed method increases the F1 score by 3.21%, 6.46%, 2.74%, and 6.56% on the four top-level relations (Comparison, Contingency, Expansion, and Temporal), respectively.
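The keyword-selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two arguments are given as matrices of token embeddings, uses plain dot-product attention for the interactive-attention matrix, and scores each token of the first argument by the total attention it receives from the second argument; the function names (`interactive_attention`, `select_mask_indices`) and the `top_k` parameter are hypothetical.

```python
import numpy as np

def interactive_attention(arg_a: np.ndarray, arg_b: np.ndarray) -> np.ndarray:
    """Attention matrix A[i, j]: how much token i of arg_a attends to token j of arg_b."""
    scores = arg_a @ arg_b.T                        # dot-product relevance scores
    scores -= scores.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax

def select_mask_indices(arg1: np.ndarray, arg2: np.ndarray, top_k: int = 2) -> np.ndarray:
    """Pick the top_k tokens of arg1 most attended to by arg2, as masking targets."""
    attn = interactive_attention(arg2, arg1)        # rows: arg2 tokens, cols: arg1 tokens
    saliency = attn.sum(axis=0)                     # total attention each arg1 token receives
    return np.argsort(saliency)[::-1][:top_k]       # indices of the most salient tokens

# Toy usage with random "embeddings": 6 tokens in arg1, 5 in arg2, 8 dimensions each.
rng = np.random.default_rng(0)
arg1 = rng.normal(size=(6, 8))
arg2 = rng.normal(size=(5, 8))
mask_indices = select_mask_indices(arg1, arg2, top_k=2)
```

The selected indices would then be replaced by the `[MASK]` token before the masked language model reconstructs them, so the augmentation concentrates on tokens that carry inter-argument relevance rather than masking uniformly at random.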
Keywords
implicit discourse relation /
interactive attention /
masked language model
Funding
Major Project of the Ministry of Science and Technology of China (2020YFB1313601); National Natural Science Foundation of China (62076174, 61773276)