XIONG Kai, DU Li, DING Xiao, LIU Ting, QIN Bing, FU Bo. Knowledge Enhanced Pre-trained Language Model for Textual Inference[J]. Journal of Chinese Information Processing, 2022, 36(12): 27-35.
Knowledge Enhanced Pre-trained Language Model for Textual Inference
XIONG Kai1, DU Li1, DING Xiao1, LIU Ting1, QIN Bing1, FU Bo2
1. Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, Heilongjiang 150006, China; 2. Fundamental Technology Center, China Construction Bank Financial Technology Co., Ltd., Beijing 100032, China
Abstract: Although pre-trained language models have achieved strong performance on a wide range of natural language processing tasks, the knowledge they capture is often insufficient to support more effective textual inference. Focusing on enhancing pre-trained language models with rich external knowledge for textual inference, we propose a framework that integrates knowledge-graph facts and graph structure into the pre-trained language model. Experiments on two textual inference subtasks show that our framework outperforms a series of baseline methods.
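To make the general idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual architecture): a pre-trained language model's sentence vector attends over embeddings of retrieved knowledge-graph nodes, and the fused representation scores a candidate answer. All module names, dimensions, and the fusion scheme here are illustrative assumptions.

```python
# Hypothetical sketch only: fuse a pre-trained LM sentence representation with
# embeddings of retrieved knowledge-graph nodes via attention, then score a
# candidate answer from the fused, knowledge-enhanced vector.
import torch
import torch.nn as nn


class KnowledgeFusion(nn.Module):
    def __init__(self, hidden_size: int = 768, kg_size: int = 128):
        super().__init__()
        self.kg_proj = nn.Linear(kg_size, hidden_size)  # map KG embeddings into the LM space
        self.score = nn.Linear(2 * hidden_size, 1)      # score one candidate per fused vector

    def forward(self, text_vec: torch.Tensor, kg_vecs: torch.Tensor) -> torch.Tensor:
        # text_vec: (batch, hidden)         e.g. the [CLS] vector from BERT/RoBERTa
        # kg_vecs:  (batch, nodes, kg_size) embeddings of retrieved graph nodes
        kg_h = self.kg_proj(kg_vecs)                              # (batch, nodes, hidden)
        attn = torch.softmax(
            torch.einsum("bh,bnh->bn", text_vec, kg_h), dim=-1
        )                                                         # attention over knowledge nodes
        kg_ctx = torch.einsum("bn,bnh->bh", attn, kg_h)           # knowledge summary vector
        fused = torch.cat([text_vec, kg_ctx], dim=-1)             # knowledge-enhanced representation
        return self.score(fused).squeeze(-1)                      # (batch,) plausibility score


# Toy usage: 2 examples, each with 5 retrieved knowledge nodes.
model = KnowledgeFusion()
scores = model(torch.randn(2, 768), torch.randn(2, 5, 128))
print(scores.shape)  # torch.Size([2])
```

In a multiple-choice setting such as COPA-style causal inference, one would compute such a score for each candidate and train with a cross-entropy loss over the candidates; this is a common pattern for knowledge-enhanced PLMs, not necessarily the authors' exact training objective.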