Word-Pair Relevance Network for Sentence Alignment

DING Ying, LI Junhui, ZHOU Guodong

Journal of Chinese Information Processing, 2019, Vol. 33, Issue 7: 31-39.
Section: Language Resource Construction

Abstract

Sentence alignment provides high-quality aligned sentence pairs for cross-lingual natural language processing tasks. Inspired by the intuition that aligned sentence pairs usually contain a large number of aligned word pairs, this paper addresses sentence alignment by exploring the semantic interaction between word pairs within a neural network framework. Specifically, the proposed word-pair relevance network combines three similarity measures to capture the semantic relations between word pairs from different perspectives, and then integrates these relations to decide whether two sentences are aligned. Experimental results on monotonic and non-monotonic bitexts show that the proposed approach significantly improves sentence alignment performance.
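The abstract describes the network only at a high level: word pairs are compared with three similarity measures, and the resulting relations are fused into an aligned/not-aligned decision. It does not name the measures or the fusion layers. The following is a minimal, hypothetical Python sketch of that general idea. It assumes cosine similarity, dot product and negative Euclidean distance as the three measures, max-then-mean pooling in both directions as the fusion step, and a logistic output with hand-set weights. None of these choices is taken from the paper, and all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Grid = List[List[float]]


# --- Three illustrative word-pair similarity measures (assumed, not from the paper) ---
def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dot_product(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def neg_euclidean(u: Vector, v: Vector) -> float:
    """Negated L2 distance, so that a larger value means more similar."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


MEASURES: List[Callable[[Vector, Vector], float]] = [cosine, dot_product, neg_euclidean]


def relevance_tensor(src: List[Vector], tgt: List[Vector]) -> List[Grid]:
    """One len(src) x len(tgt) grid of word-pair scores per measure."""
    return [[[m(s, t) for t in tgt] for s in src] for m in MEASURES]


def pooled_features(tensor: List[Grid]) -> List[float]:
    """Hypothetical fusion step: for each measure, take every source word's
    best match (row max) and every target word's best match (column max),
    then average each direction. Yields 2 features per measure."""
    feats: List[float] = []
    for grid in tensor:
        src_best = [max(row) for row in grid]
        tgt_best = [max(col) for col in zip(*grid)]
        feats.append(sum(src_best) / len(src_best))
        feats.append(sum(tgt_best) / len(tgt_best))
    return feats


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def align_probability(src: List[Vector], tgt: List[Vector],
                      weights: Sequence[float], bias: float) -> float:
    """Score in (0, 1) that (src, tgt) is an aligned sentence pair."""
    feats = pooled_features(relevance_tensor(src, tgt))
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return sigmoid(z)


if __name__ == "__main__":
    # Toy 2-d "word embeddings" for a two-word source sentence.
    src = [[1.0, 0.0], [0.0, 1.0]]
    close_tgt = [[0.9, 0.1], [0.1, 0.9]]  # near-translations of each word
    far_tgt = [[-1.0, 0.0], [0.0, -1.0]]  # unrelated words
    # Hand-set weights: only the two cosine features contribute here.
    w, b = [2.0, 2.0, 0.0, 0.0, 0.0, 0.0], -1.0
    print(f"aligned={align_probability(src, close_tgt, w, b):.2f} "
          f"random={align_probability(src, far_tgt, w, b):.2f}")
    # prints: aligned=0.95 random=0.27
```

With these toy vectors the near-translation pair scores about 0.95 and the mismatched pair about 0.27. In a trained model the weights and bias would be learned from labeled sentence pairs rather than set by hand, and the word vectors would presumably come from cross-lingual embeddings; both points are assumptions of this sketch.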

Key words

sentence alignment / word-pair relevance network / neural network

Cite this article

DING Ying, LI Junhui, ZHOU Guodong. Word-Pair Relevance Network for Sentence Alignment. Journal of Chinese Information Processing, 2019, 33(7): 31-39.

Funding

National Natural Science Foundation of China (61401295, 61502149)