Multi-granular Interactive Inference Based Answer Selection

JIN Zhiling, ZHU Hongyu, SU Yulan, TANG Hongxuan, HONG Yu, ZHANG Min

Journal of Chinese Information Processing (中文信息学报), 2023, Vol. 37, Issue 1: 104-111, 120.
Question Answering and Dialogue


Abstract

Pre-trained language models have been widely applied to a variety of natural language processing tasks. Their built-in self-attention mechanism forms a unified semantic encoding over "text pairs", so that the input structure and computation mode of BERT are, in principle, suited to processing "target question and candidate answer" samples. However, directly applying language models such as BERT faces two limitations: 1) BERT does not emphasize independent semantic representations of word chunks, phrases, and clauses, so the matching process often misses semantic correlations at different granularities; 2) the multi-head attention mechanism in BERT cannot compute the interaction strength (correlation) between semantic structures of different granularities. To address these problems, this paper proposes a BERT-based multi-granularity interactive inference network, which encodes the linguistic information of questions and candidate answers at multiple semantic granularities, enriching inter-sentence semantic information and interactivity. In addition, this paper proposes a sentence-level encoding loss strategy to strengthen the weighting of key clauses during encoding. Experimental results on the WPQA dataset show that the method effectively improves answer selection performance for non-factoid questions.

Abstract

Pre-trained language models, e.g., BERT, have been widely used in many natural language processing tasks for their unified semantic representation of "text pairs" via the self-attention mechanism. However, there are two limitations in directly applying BERT to the answer selection task: 1) BERT does not capture independent semantic representations of word chunks, phrases, and clauses, so the matching process tends to lack information at different granularities; 2) the multi-head attention mechanism in BERT cannot calculate the correlation between semantic structures of different granularities. To address these issues, we propose a BERT-based multi-granularity interactive inference network. This method encodes the linguistic information of questions and answers through multi-granularity convolution to construct a high-order interaction tensor, which enriches the semantic information and the interactivity of questions and answers. In addition, we propose a sentence-level loss to emphasize key sentences in paragraph-level answers. Experiments on the WPQA dataset show that the proposed method effectively improves the performance of answer selection for non-factoid questions.
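The core idea described above can be illustrated with a minimal numerical sketch. This is not the paper's implementation: average pooling over sliding windows stands in for the learned multi-granularity convolutions, cosine similarity stands in for the learned interaction function, and all names and dimensions are illustrative assumptions.

```python
import numpy as np

def multi_granular_encode(tokens, kernel_sizes=(1, 2, 3)):
    """Slide a window of each kernel size over the token embeddings to get
    word-, phrase-, and clause-like granularities (average pooling here
    stands in for the paper's learned convolutions)."""
    grans = []
    for k in kernel_sizes:
        n = tokens.shape[0] - k + 1
        grans.append(np.stack([tokens[i:i + k].mean(axis=0) for i in range(n)]))
    return grans

def interaction_tensor(q_grans, a_grans):
    """Pairwise similarity maps between every question granularity and every
    answer granularity -- a toy stand-in for the high-order interaction
    tensor; cosine similarity replaces the learned interaction function."""
    maps = {}
    for i, q in enumerate(q_grans):
        for j, a in enumerate(a_grans):
            qn = q / np.linalg.norm(q, axis=1, keepdims=True)
            an = a / np.linalg.norm(a, axis=1, keepdims=True)
            maps[(i + 1, j + 1)] = qn @ an.T  # shape: (len_q_i, len_a_j)
    return maps

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))    # 6 question tokens, 8-dim embeddings (toy)
a = rng.normal(size=(10, 8))   # 10 answer tokens
maps = interaction_tensor(multi_granular_encode(q), multi_granular_encode(a))
print(maps[(1, 3)].shape)      # unigram-question x trigram-answer map: (6, 8)
```

In the actual model, the token embeddings would come from BERT and the similarity maps would feed a downstream inference network; the sketch only shows how multiple granularities multiply the number of interaction views between a question and a candidate answer.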

Keywords

answer selection / pre-trained model / multi-granularity encoding

Key words

answer selection / pre-trained model / multi-granularity encoding

Cite this article

JIN Zhiling, ZHU Hongyu, SU Yulan, TANG Hongxuan, HONG Yu, ZHANG Min. Multi-granular Interactive Inference Based Answer Selection. Journal of Chinese Information Processing, 2023, 37(1): 104-111, 120.


Funding

National Key Research and Development Program of the Ministry of Science and Technology (2017YFB1002104)