Abstract
This paper describes a hierarchical chunking-phrase based (HCPB) statistical translation model. The model not only conforms formally to synchronous context-free grammar but also incorporates English chunking (partial parsing) knowledge learned with conditional random fields (CRF). It can therefore be regarded as an effective combination of syntax-based and phrase-based translation models. The decoder of the HCPB system adapts the chart-based CKY algorithm and integrates a linear N-gram language model. Experiments so far focus on Chinese-English spoken language translation, with the International Workshop on Spoken Language Translation (IWSLT) as the benchmark: on the IWSLT 2005 evaluation test set, both the BLEU and NIST scores are higher than those of a statistical phrase-based translation system.
Key words
artificial intelligence /
machine translation /
hierarchical chunking-phrase based SMT /
conditional random fields /
chart-based CKY algorithm
Funding
National 863 Program of China (2006AA01Z194); Fujitsu cooperative project (K0604040)