BERT learns general linguistic regularities through self-supervised tasks such as masked language modeling and next sentence prediction, and has achieved good results on natural language understanding tasks. However, the next sentence prediction task does not directly model the semantic matching relationship between sentences, and the random masking strategy does not handle the key content of a sentence efficiently. To address these problems, this paper proposes a pre-training method based on dynamic word masking: sentence vector representations are first obtained from a pre-trained model, large-scale "sentence pair" pre-training data are then collected via approximate semantic similarity computation, and finally the important words are masked to train the masked language model. Experiments on four sentence matching datasets show that the proposed pre-training method improves both RBT3 and BERT-base, with average accuracy gains of 1.03% and 0.61%, respectively.
Abstract
Pre-trained models such as BERT achieve good results in natural language understanding tasks via a random masking strategy and a next sentence prediction task. To capture the semantic matching relationship between sentences, this paper proposes a pre-training method based on dynamic word masking. Large-scale sentence pairs are first obtained through sentence embeddings, and then the important words are masked to train a new kind of masked language model. Experiments on four datasets show that the proposed method improves the average accuracy of RBT3 and BERT-base by 1.03% and 0.61%, respectively.
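The abstract outlines a two-step data construction pipeline: mine semantically similar sentence pairs from sentence embeddings, then mask important words instead of random ones for continued masked-language-model training. The Python sketch below illustrates one way such a pipeline could look; the `embed` encoder and the IDF-based importance score are assumptions introduced for illustration, since the abstract does not specify how sentence similarity is approximated at scale or how word importance is measured.

```python
# A minimal sketch of the data construction described in the abstract, assuming:
#   - `embed` is any sentence encoder (e.g., mean-pooled BERT hidden states); it is a
#     hypothetical helper, not a component named by the paper;
#   - word "importance" is approximated here with IDF weights; the paper's actual
#     importance measure is not given in the abstract.
import math
import numpy as np

def mine_sentence_pairs(sentences, embed, threshold=0.85):
    """Collect pseudo sentence pairs whose embedding cosine similarity exceeds a threshold.

    A real large-scale run would replace the O(n^2) loop with approximate
    nearest-neighbor search, in the spirit of the abstract's approximate
    semantic similarity computation.
    """
    vecs = np.stack([embed(s) for s in sentences]).astype(np.float64)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)      # unit-normalize rows
    sims = vecs @ vecs.T                                      # pairwise cosine similarity
    return [(sentences[i], sentences[j])
            for i in range(len(sentences))
            for j in range(i + 1, len(sentences))
            if sims[i, j] >= threshold]

def idf_weights(tokenized_corpus):
    """Inverse document frequency as a crude word-importance score (an assumption)."""
    n = len(tokenized_corpus)
    df = {}
    for tokens in tokenized_corpus:
        for t in set(tokens):
            df[t] = df.get(t, 0) + 1
    return {t: math.log(n / (1 + c)) for t, c in df.items()}

def dynamic_mask(tokens, importance, mask_token="[MASK]", ratio=0.15):
    """Mask the most informative tokens rather than uniformly random ones."""
    k = max(1, int(len(tokens) * ratio))
    ranked = sorted(range(len(tokens)),
                    key=lambda i: importance.get(tokens[i], 0.0),
                    reverse=True)
    to_mask = set(ranked[:k])
    return [mask_token if i in to_mask else t for i, t in enumerate(tokens)]
```

Under this reading, the masked versions of the mined sentence pairs would then be used for continued masked-language-model pre-training of RBT3 or BERT-base before fine-tuning on the sentence matching datasets.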
Key words
sentence matching /
pre-trained model /
natural language understanding
Funding
National Key Research and Development Program of China (2017YFB1002101); National Natural Science Foundation of China (61533018, U1936207, 61976211, 61702512)