自然语言预训练模型知识增强方法综述

孙毅,裘杭萍,郑雨,张超然,郝超

中文信息学报 (Journal of Chinese Information Processing), 2021, Vol. 35, Issue 7: 10-29
综述 (Review)

Knowledge Enhancement for Pre-trained Language Models: A Survey

SUN Yi, QIU Hangping, ZHENG Yu, ZHANG Chaoran, HAO Chao

摘要

将知识引入到依靠数据驱动的人工智能模型中是实现人机混合智能的一种重要途径。当前以BERT为代表的预训练模型在自然语言处理领域取得了显著的成功,但是由于预训练模型大多是在大规模非结构化的语料数据上训练出来的,因此可以通过引入外部知识在一定程度上弥补其在确定性和可解释性上的缺陷。该文针对预训练词嵌入和预训练上下文编码器两个预训练模型的发展阶段,分析了它们的特点和缺陷,阐述了知识增强的相关概念,提出了预训练词嵌入知识增强的分类方法,将其分为四类:词嵌入改造、层次化编解码过程、优化注意力和引入知识记忆。将预训练上下文编码器的知识增强方法分为任务特定和任务通用两大类,并根据引入知识的显隐性对其中任务通用的知识增强方法进行了进一步的细分。该文通过分析预训练模型知识增强方法的类型和特点,为实现人机混合的人工智能提供了模式和算法上的参考依据。

Abstract

Introducing knowledge into data-driven artificial intelligence models is an important way to realize human-machine hybrid intelligence. Pre-trained language models represented by BERT have achieved remarkable success in natural language processing. However, since these models are mostly trained on large-scale unstructured corpora, introducing external knowledge can compensate to some extent for their deficiencies in determinacy and interpretability. This paper analyzes the characteristics and limitations of the two stages in the development of pre-trained models, namely pre-trained word embeddings and pre-trained context encoders, and explains the concepts related to knowledge enhancement. Knowledge enhancement methods for pre-trained word embeddings are grouped into four categories: word embedding retrofitting, hierarchizing the encoding and decoding process, attention mechanism optimization, and the introduction of knowledge memory. Knowledge enhancement methods for pre-trained context encoders are divided into task-specific and task-agnostic methods, and the task-agnostic methods are further subdivided according to whether the knowledge they introduce is explicit or implicit. By summarizing and analyzing the types and characteristics of these knowledge enhancement methods, this survey provides a reference, in terms of patterns and algorithms, for realizing human-machine hybrid artificial intelligence.
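
To make the retrofitting category above concrete, here is a minimal sketch of the classic word-embedding retrofitting update, which pulls each vector toward its neighbours in a semantic lexicon while keeping it close to the original distributional embedding, in the spirit of Faruqui et al.'s method covered by this survey. The toy vocabulary, lexicon, and hyperparameters are illustrative assumptions, not settings taken from any surveyed paper.

```python
import numpy as np

def retrofit(embeddings, lexicon, iterations=10, alpha=1.0):
    """Iteratively pull each vector toward the average of its lexicon
    neighbours while keeping it close to its original embedding."""
    new_vecs = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            neighbours = [n for n in neighbours if n in new_vecs]
            if word not in new_vecs or not neighbours:
                continue
            beta = 1.0 / len(neighbours)              # uniform edge weights
            neighbour_sum = beta * sum(new_vecs[n] for n in neighbours)
            # closed-form coordinate update of the retrofitting objective
            new_vecs[word] = (neighbour_sum + alpha * embeddings[word]) / (
                beta * len(neighbours) + alpha)
    return new_vecs

# Toy usage: the two synonyms drift toward each other but keep part of
# their original distributional signal.
emb = {"happy": np.array([1.0, 0.0]),
       "glad": np.array([0.0, 1.0]),
       "sad": np.array([-1.0, 0.0])}
lex = {"happy": ["glad"], "glad": ["happy"]}
print(retrofit(emb, lex)["happy"])
```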

关键词

预训练语言模型 / 知识增强 / 预训练词嵌入 / 预训练上下文编码器

Key words

pre-trained language model / knowledge enhancement / pre-trained word embedding / pre-trained contextual encoder
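
Among the task-agnostic methods for pre-trained context encoders summarized in the abstract, one widely used strategy injects knowledge implicitly through the masking scheme itself, for example by masking whole entities rather than individual tokens, as in ERNIE-style pre-training. The sketch below illustrates only this masking step; the entity vocabulary, the greedy span matcher, and the masking probability are simplified assumptions for illustration.

```python
import random

MASK = "[MASK]"

def entity_spans(tokens, entity_vocab):
    """Greedily match known multi-token entities against the token sequence."""
    spans, i = [], 0
    while i < len(tokens):
        match = None
        for length in range(min(4, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + length]) in entity_vocab:
                match = (i, i + length)
                break
        if match:
            spans.append(match)
            i = match[1]
        else:
            i += 1
    return spans

def mask_entities(tokens, entity_vocab, prob=0.15, seed=0):
    """Mask whole entity spans so the model must recover the entity from
    context, instead of masking independent subword tokens."""
    rng = random.Random(seed)
    tokens = list(tokens)
    for start, end in entity_spans(tokens, entity_vocab):
        if rng.random() < prob:
            tokens[start:end] = [MASK] * (end - start)
    return tokens

entities = {("bob", "dylan"), ("new", "york")}
print(mask_entities("bob dylan wrote blowin in the wind".split(), entities, prob=1.0))
```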

引用本文 (Cite this article)

孙毅,裘杭萍,郑雨,张超然,郝超. 自然语言预训练模型知识增强方法综述. 中文信息学报. 2021, 35(7): 10-29
SUN Yi, QIU Hangping, ZHENG Yu, ZHANG Chaoran, HAO Chao. Knowledge Enhancement for Pre-trained Language Models: A Survey. Journal of Chinese Information Processing. 2021, 35(7): 10-29


基金 (Funding)

国防科技创新特区计划项目 (National Defense Science and Technology Innovation Special Zone Program, 1916311LZ001003); 装备发展部基金项目 (Equipment Development Department Fund, 6141B08010101)