Modeling Partial Order Relation in Math Expression by Contrastive Learning

HU Xingwu1, GUI Tao1, ZHANG Qi1, CHEN Yunwen2, GAO Xiang2

Journal of Chinese Information Processing, 2023, Vol. 37, Issue 4: 166-174.
Natural Language Processing Applications

Abstract

In the math word problem solving task, mainstream methods treat the target expression as a text sequence to be generated. Under this setting, the model ignores the partial order relations that an expression tree carries as a tree structure, such as the commutative and distributive laws. This both lowers the model's learning efficiency in generating expressions and weakens its generalization capability. To address this problem, this paper proposes a contrastive-learning-based method for modeling the partial order relation of expressions. Its core practice is to apply small perturbations to the expression tree during training, producing positive samples that are equivalent to the original expression and negative samples that are not, and then to use contrastive learning to minimize the distance between the original expression and its equivalent forms while maximizing the distance to the inequivalent negative samples. Comparative experiments on the public datasets Math23K and MAWPS show that the proposed method significantly outperforms the baseline models.
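The sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the tree representation, the function names (`swap`, `evaluate`, `make_samples`, `contrastive_loss`), and the margin value are all illustrative assumptions. The sketch swaps the operands of a tree node: under a commutative operator this yields an equivalent positive sample, while under a non-commutative operator it yields an inequivalent negative sample, and a margin-based objective then pulls positives close and pushes negatives apart.

```python
# Illustrative sketch only: an expression tree is a nested tuple
# (op, left, right) with numeric leaves.

def swap(expr):
    """Swap the two operands of a binary tree node."""
    op, left, right = expr
    return (op, right, left)

def evaluate(expr):
    """Recursively evaluate a nested-tuple expression tree."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    a, b = evaluate(left), evaluate(right)
    return {'+': a + b, '-': a - b, '*': a * b, '/': a / b}[op]

def make_samples(expr):
    """Perturb the root by swapping operands.
    Commutative op (+, *): the swap is equivalent -> positive sample.
    Non-commutative op (-, /): the swap is inequivalent -> negative sample.
    Returns (positive, negative); the unused slot is None."""
    if expr[0] in '+*':
        return swap(expr), None
    return None, swap(expr)

def contrastive_loss(d_pos, d_neg, margin=1.0):
    """Minimize distance to the equivalent sample, and push the
    inequivalent sample's distance beyond the margin."""
    return d_pos + max(0.0, margin - d_neg)

# Example: 5 - 2*3 = -1; swapping '-' operands gives 2*3 - 5 = 1,
# a different value, so the perturbed tree is a negative sample.
expr = ('-', 5, ('*', 2, 3))
pos, neg = make_samples(expr)
assert pos is None and evaluate(neg) != evaluate(expr)
```

In the paper's actual setting the distances `d_pos` and `d_neg` would be computed between learned embeddings of the expressions, and the perturbations extend beyond a single root swap (e.g., distributive-law rewrites); the sketch only shows the sampling contract and the shape of the objective.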

Key words

math word problem solving / contrastive learning / partial order relation

Cite this article

HU Xingwu, GUI Tao, ZHANG Qi, CHEN Yunwen, GAO Xiang. Modeling Partial Order Relation in Math Expression by Contrastive Learning. Journal of Chinese Information Processing, 2023, 37(4): 166-174.
