|
|
Knowledge Enhanced Pre-trained Language Model for Textual Inference
XIONG Kai1, DU Li1, DING Xiao1, LIU Ting1, QIN Bing1, FU Bo2
1. Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, Heilongjiang 150006, China; 2. Fundamental Technology Center, China Construction Bank Financial Technology Co., Ltd., Beijing 100032, China
|
|
Abstract Although pre-trained language models have achieved high performance on a wide range of natural language processing tasks, the knowledge they capture during pre-training is often insufficient to support effective textual inference. To enrich pre-trained language models with external knowledge for textual inference, we propose a framework that integrates knowledge-graph facts and graph structure information into the pre-trained language model. Experiments on two textual inference subtasks show that our framework outperforms a series of baseline methods.
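The abstract does not spell out the model architecture, so the following is only a minimal sketch, under our own assumptions, of one common way such a knowledge-enhanced framework can be realized: token representations from a pre-trained encoder are fused with knowledge-graph entity embeddings (for example, vectors pre-trained with TransE) through a gating layer before scoring a candidate answer. All module names, the entity-linking input (entity_ids), and the gating design are illustrative assumptions, not the authors' released implementation.

# Minimal sketch (assumed design, not the paper's code): fusing KG entity
# embeddings with a pre-trained language model for answer scoring.
import torch
import torch.nn as nn
from transformers import BertModel

class KnowledgeFusionScorer(nn.Module):
    def __init__(self, kg_vocab_size, kg_dim=100, plm_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(plm_name)
        hidden = self.encoder.config.hidden_size
        # Entity embeddings, e.g. pre-trained with TransE on the knowledge graph.
        self.entity_emb = nn.Embedding(kg_vocab_size, kg_dim, padding_idx=0)
        # Project entity vectors into the PLM hidden space, then fuse by gating.
        self.proj = nn.Linear(kg_dim, hidden)
        self.gate = nn.Linear(2 * hidden, hidden)
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, entity_ids):
        # entity_ids: one linked KG entity id per token (0 = no linked entity).
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_h = out.last_hidden_state                   # [B, T, H] contextual tokens
        ent_h = self.proj(self.entity_emb(entity_ids))    # [B, T, H] projected KG vectors
        g = torch.sigmoid(self.gate(torch.cat([token_h, ent_h], dim=-1)))
        fused = g * token_h + (1.0 - g) * ent_h           # knowledge-aware token states
        pooled = fused[:, 0]                              # representation at [CLS]
        return self.scorer(pooled).squeeze(-1)            # plausibility score per input

A per-token gate lets the model decide how much external knowledge to inject at each position; it is only one of several plausible fusion choices (concatenation and knowledge-aware attention are equally common in the cited knowledge-enhanced PLM literature).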
|
Received: 28 October 2021
|
|
|
|
|
|
|
|