Modeling Partial Order Relation in Math Expression by Contrastive Learning

HU Xingwu1, GUI Tao1, ZHANG Qi1, CHEN Yunwen2, GAO Xiang2

Journal of Chinese Information Processing, 2023, Vol. 37, Issue 4: 166-174.
Natural Language Processing Applications

Abstract

In the math word problem solving task, mainstream methods treat the target expression as a text sequence to be generated. Under this setting, the model ignores the partial order relations that an expression tree carries as a tree structure, such as the commutative and distributive laws. This both lowers the model's learning efficiency in generating expressions and weakens its generalization capability. To address this problem, this paper proposes a contrastive-learning-based method for modeling the partial order relation of expressions. Its core practice is to apply small perturbations to the expression tree during training, producing positive samples that are equivalent to the original expression and negative samples that are not, and then to use contrastive learning to minimize the distance between the original expression and its equivalent forms while maximizing the distance to the inequivalent negative samples. Comparative experiments on the public datasets Math23K and MAWPS show that the proposed method significantly outperforms the baseline models.
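The sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the tree representation, the function names (`swap`, `evaluate`, `make_samples`, `contrastive_loss`), and the margin value are all illustrative assumptions. The sketch swaps the operands of a tree node: under a commutative operator this yields an equivalent positive sample, while under a non-commutative operator it yields an inequivalent negative sample, and a margin-based objective then pulls positives close and pushes negatives apart.

```python
# Illustrative sketch only: an expression tree is a nested tuple
# (op, left, right) with numeric leaves.

def swap(expr):
    """Swap the two operands of a binary tree node."""
    op, left, right = expr
    return (op, right, left)

def evaluate(expr):
    """Recursively evaluate a nested-tuple expression tree."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    a, b = evaluate(left), evaluate(right)
    return {'+': a + b, '-': a - b, '*': a * b, '/': a / b}[op]

def make_samples(expr):
    """Perturb the root by swapping operands.
    Commutative op (+, *): the swap is equivalent -> positive sample.
    Non-commutative op (-, /): the swap is inequivalent -> negative sample.
    Returns (positive, negative); the unused slot is None."""
    if expr[0] in '+*':
        return swap(expr), None
    return None, swap(expr)

def contrastive_loss(d_pos, d_neg, margin=1.0):
    """Minimize distance to the equivalent sample, and push the
    inequivalent sample's distance beyond the margin."""
    return d_pos + max(0.0, margin - d_neg)

# Example: 5 - 2*3 = -1; swapping '-' operands gives 2*3 - 5 = 1,
# a different value, so the perturbed tree is a negative sample.
expr = ('-', 5, ('*', 2, 3))
pos, neg = make_samples(expr)
assert pos is None and evaluate(neg) != evaluate(expr)
```

In the paper's actual setting the distances `d_pos` and `d_neg` would be computed between learned embeddings of the expressions, and the perturbations extend beyond a single root swap (e.g., distributive-law rewrites); the sketch only shows the sampling contract and the shape of the objective.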

Key words

math word problem solving / contrastive learning / partial order relation

Cite this article

HU Xingwu, GUI Tao, ZHANG Qi, CHEN Yunwen, GAO Xiang. Modeling Partial Order Relation in Math Expression by Contrastive Learning. Journal of Chinese Information Processing, 2023, 37(4): 166-174.
