Abstract
Self-supervised pre-training has brought large improvements across a wide range of natural language processing tasks. Although trained without structured supervision, the word representations learned by such pre-trained models still reflect linguistic structures such as syntactic dependencies and knowledge triples, which suggests that structural information is essential for deep language understanding. This paper explores how to introduce additional structured information into NLP tasks. Unlike previous work that imposes structured supervision directly on word representations, the proposed knowledge-driven encoder, the Prior-Driven Transformer (PDT), incorporates human-developed structures directly into the self-attention module of the model, enabling knowledge-driven information propagation and higher-level reasoning. Specifically, given an input sequence, PDT first extracts several types of structures, ranging from the syntax level to the knowledge level, with off-the-shelf tools developed in previous work; these structures are treated as priors on how humans extract information from text. PDT then converts each type of structure into a corresponding mask matrix and dynamically fuses them conditioned on the input context, question, and answer candidates. Finally, PDT performs self-attention only over the tokens covered by the fused mask matrix, producing structure-aware text representations for commonsense question answering. Extensive experiments on the CosmosQA and CommonsenseQA datasets show that PDT not only achieves consistent improvements over the pre-trained RoBERTa baseline, but also, for each individual sample, focuses on the structure that is most helpful for answering the question.
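The abstract does not include an implementation, but the core mechanism it describes, turning an extracted structure into a mask matrix and restricting self-attention to the positions that matrix covers, can be sketched as follows. This is a minimal illustrative approximation, not the authors' code: the use of spaCy for dependency parsing, the single-head attention, and the names dependency_mask and masked_self_attention are all assumptions introduced here for clarity.

```python
# Illustrative sketch (not the authors' implementation): build a syntax-level
# mask matrix from a dependency parse and restrict self-attention to it.
import torch
import torch.nn.functional as F
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed off-the-shelf dependency parser

def dependency_mask(sentence: str) -> torch.Tensor:
    """mask[i, j] = 1 if tokens i and j are linked by a dependency arc
    (or i == j), 0 otherwise: one possible syntax-level structure prior."""
    doc = nlp(sentence)
    mask = torch.eye(len(doc))
    for token in doc:
        if token.i != token.head.i:           # skip the root's self-loop
            mask[token.i, token.head.i] = 1.0
            mask[token.head.i, token.i] = 1.0
    return mask

def masked_self_attention(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Single-head self-attention that only attends where mask == 1.
    x: (seq_len, d_model); mask: (seq_len, seq_len)."""
    d = x.size(-1)
    scores = x @ x.transpose(0, 1) / d ** 0.5               # attention logits
    scores = scores.masked_fill(mask == 0, float("-inf"))   # hide uncovered pairs
    return F.softmax(scores, dim=-1) @ x                    # structure-aware outputs

sent = "The boy who lived came to visit."
m = dependency_mask(sent)
h = torch.randn(m.size(0), 64)   # stand-in for pre-trained token representations
out = masked_self_attention(h, m)
```

In the full model described in the abstract, several such matrices (from the syntax level up to the knowledge level) would be fused with weights conditioned on the context, question, and answer candidate before the attention step, and the attention would operate on pre-trained RoBERTa representations rather than random vectors.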
Key words
commonsense QA /
structured knowledge /
dynamic integration /
multiple choice QA
Funding
National Natural Science Foundation of China (61976233, 62006255)