Multi-hop Question Generation Model Based on RoBERTa and Semantic Graph Feature Fusion

HU Jie (胡婕); GAO Shan (高珊); SUN Jie (孙洁); LIU Mengchi (刘梦赤)

doi:10.3969/j.issn.1003-0077.2026.05.013

Abstract

Existing multi-hop question generation models focus mainly on enhancing document representations and pay little attention to the semantic relationships between the answer and its context. As a result, they cannot accurately capture long-distance dependencies or fully understand global semantic information. To address this, this paper first uses the pre-trained model RoBERTa to encode the documents and answers, obtaining representation vectors and embeddings of the text. A semantic graph based on dependency parsing is then constructed to mine the rich semantic relationships between texts and to better understand the context. Finally, a maxout pointer decoder uses the captured semantic information to select appropriate content from the input sequence as components of the question, and reinforcement learning is introduced to integrate syntactic metrics as a reward that augments model training, so that multi-hop questions are generated more accurately. Experiments on the HotpotQA dataset show that the proposed model outperforms the baseline models on BLEU-1 to BLEU-4, METEOR, and ROUGE-L, with an effective improvement in overall performance.
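The pointer decoder mentioned in the abstract can be illustrated with a small sketch. It assumes the decoder follows the commonly used maxout pointer idea, in which a word that occurs several times in the input receives the maximum of its attention scores rather than their sum, which limits the bias toward frequently repeated words. The function names and the toy attention values below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def maxout_copy_scores(tokens: List[str], attention: List[float]) -> Dict[str, float]:
    """Collapse per-position attention scores into per-word copy scores.

    Assumption: repeated words keep only their largest score (maxout),
    instead of the summed score used by a plain pointer network.
    """
    if len(tokens) != len(attention):
        raise ValueError("tokens and attention must have the same length")
    scores: Dict[str, float] = {}
    for token, score in zip(tokens, attention):
        # Keep the maximum score seen so far for this word.
        if token not in scores or score > scores[token]:
            scores[token] = score
    return scores


def pick_copy_word(tokens: List[str], attention: List[float]) -> str:
    """Return the word with the highest maxout copy score."""
    scores = maxout_copy_scores(tokens, attention)
    return max(scores, key=scores.get)


# Toy input: "the" appears three times with small individual scores.
tokens = ["the", "film", "the", "director", "the"]
attention = [0.2, 0.3, 0.2, 0.25, 0.05]
copied = pick_copy_word(tokens, attention)
```

With summed scores, "the" would accumulate 0.45 and be copied; with the maxout rule its score stays at 0.2, so the content word "film" is selected instead.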
