Translation Rule Pruning and Model Training with Semi-Forced Decoding and Variational Bayesian Inference

GAO Enting, DUAN Xiangyu, CHAO Jiayuan, ZHANG Min

Journal of Chinese Information Processing, 2014, Vol. 28, Issue 5: 141-147.
Machine Translation


Translation Rule Pruning and Model Training with Semi-Forced Decoding and Variational Bayesian Inference

  • GAO Enting1, DUAN Xiangyu2, CHAO Jiayuan2, ZHANG Min2

Abstract

Statistical machine translation (SMT) usually trains translation models with heuristics. Because these heuristics lack a solid theoretical foundation, the resulting models tend to be large and their parameters potentially inaccurate. To address both problems, this paper presents a training method based on variational Bayesian inference that aims to learn a compact translation model with more accurate translation probabilities. The method first aligns the training corpus by forced decoding and then estimates the model parameters with the variational Bayesian EM algorithm. Experimental results on the NIST Chinese-English translation data show that the method prunes more than 95% of the rules in syntax-based SMT and more than 76% in phrase-based SMT, while significantly improving the BLEU score.
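The abstract does not spell out the update equations, so the following is only a minimal sketch of the standard variational Bayesian M-step for a multinomial with a symmetric Dirichlet prior, which is one common way such training drives rarely used rules toward zero weight. The function names (`digamma`, `vb_m_step`, `prune`), the prior value `alpha`, the pruning threshold, and the toy counts are illustrative assumptions and are not taken from the paper. The expected rule counts are assumed to come from an E-step over forced-decoding derivations, which is not shown.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    # Shift x upward until the asymptotic expansion is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def vb_m_step(expected_counts, alpha):
    """Variational Bayesian M-step with a symmetric Dirichlet(alpha) prior.

    expected_counts maps a source side to {rule: expected count}.
    Returns sub-normalized rule weights exp(E[log theta]).
    """
    weights = {}
    for source, counts in expected_counts.items():
        total = sum(counts.values()) + alpha * len(counts)
        denom = math.exp(digamma(total))
        weights[source] = {
            rule: math.exp(digamma(count + alpha)) / denom
            for rule, count in counts.items()
        }
    return weights


def prune(weights, threshold):
    """Keep only rules whose weight reaches the (assumed) threshold."""
    return {
        source: {r: w for r, w in rules.items() if w >= threshold}
        for source, rules in weights.items()
    }


# Toy expected counts for one source phrase (hypothetical numbers).
toy_counts = {"src": {"a": 9.5, "b": 0.4, "c": 0.1}}
toy_weights = vb_m_step(toy_counts, alpha=0.01)
toy_kept = prune(toy_weights, threshold=1e-3)
```

With a prior `alpha` below 1, a rule with a small expected count receives a weight far below its relative frequency, so a simple threshold removes it. In this toy run, rule `c` falls below the threshold while `a` and `b` survive.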

Key words

machine translation / rule pruning / semi-forced decoding / variational Bayesian

Cite this article

GAO Enting, DUAN Xiangyu, CHAO Jiayuan, ZHANG Min. Translation Rule Pruning and Model Training with Semi-Forced Decoding and Variational Bayesian Inference. Journal of Chinese Information Processing. 2014, 28(5): 141-147


Funding

National Natural Science Foundation of China (61373095)