Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (12): 27-35.
Language Analysis and Computation

Knowledge Enhanced Pre-trained Language Model for Textual Inference

XIONG Kai1, DU Li1, DING Xiao1, LIU Ting1, QIN Bing1, FU Bo2

Abstract

Although pre-trained language models achieve high performance on a large number of natural language processing tasks and exhibit strong semantic understanding, the knowledge contained in most pre-trained language models themselves is hardly sufficient to support more efficient textual inference. Focusing on enhancing pre-trained language models with rich knowledge for textual inference, we propose a framework that fuses graphs and graph-structured knowledge more deeply with the pre-trained language model. On two subtasks of textual inference, our framework outperforms a series of baseline methods, and the experimental results and analysis verify its effectiveness.
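The abstract describes the framework only at a high level. As a rough illustration of the general idea, and not the authors' implementation, the following is a minimal PyTorch sketch in which contextual token states from a pre-trained encoder such as BERT are fused with knowledge-node states from a graph encoder over an eventic or knowledge graph via cross-attention before classification. The class name, dimensions, pooling strategy, and the cross-attention fusion scheme are all illustrative assumptions.

# Minimal, hypothetical sketch of knowledge-enhanced fusion for textual inference.
# Not the paper's implementation: module names, sizes, and the fusion scheme are assumptions.
import torch
import torch.nn as nn


class KnowledgeFusionClassifier(nn.Module):
    """Fuses contextual token representations (e.g., from BERT) with
    knowledge-node representations (e.g., from a graph encoder) via
    cross-attention, then predicts an inference label."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 8, num_labels: int = 2):
        super().__init__()
        # Tokens attend to knowledge nodes: query = text states, key/value = graph node states.
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, text_states: torch.Tensor, node_states: torch.Tensor) -> torch.Tensor:
        # text_states: [batch, seq_len, hidden]   -- outputs of a pre-trained language model
        # node_states: [batch, num_nodes, hidden] -- graph-encoded knowledge representations
        fused, _ = self.cross_attn(text_states, node_states, node_states)
        fused = self.layer_norm(text_states + fused)  # residual fusion of text and knowledge
        pooled = fused.mean(dim=1)                    # simple mean pooling over tokens
        return self.classifier(pooled)                # [batch, num_labels]


if __name__ == "__main__":
    # Random tensors stand in for the outputs of the actual text and graph encoders.
    text_states = torch.randn(2, 16, 768)  # 2 examples, 16 tokens each
    node_states = torch.randn(2, 5, 768)   # 5 retrieved knowledge nodes per example
    model = KnowledgeFusionClassifier()
    print(model(text_states, node_states).shape)  # torch.Size([2, 2])

In practice, the stand-in tensors would come from the actual pre-trained language model and graph encoder, and the fusion module would be trained jointly with them on the downstream inference task.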

Key words

textual inference / eventic graph / knowledge graph / pre-trained language model

Cite this article

XIONG Kai, DU Li, DING Xiao, LIU Ting, QIN Bing, FU Bo. Knowledge Enhanced Pre-trained Language Model for Textual Inference. Journal of Chinese Information Processing, 2022, 36(12): 27-35.


Funding

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0101901); National Natural Science Foundation of China (62176079, 61976073)