Research Report of the 2005 Statistical Machine Translation Workshop

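XU Bo, SHI Xiao-dong, LIU Qun, ZONG Cheng-qing, PANG Wei, CHEN Zhen-biao, YANG Zhen-dong, WEI Wei, DU Jin-hua, CHEN Yi-dong, LIU Yang, XIONG De-yi, HOU Hong-xu, HE Zhong-jun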

Journal of Chinese Information Processing, 2006, Vol. 20, Issue 5: 3-11.


Current Statistical Machine Translation Research in China

  • XU Bo1,SHI Xiao-dong2,LIU Qun3,ZONG Cheng-qing1,PANG Wei1,CHEN Zhen-biao1,YANG Zhen-dong1,WEI Wei1,DU Jin-hua1,CHEN Yi-dong2,LIU Yang3,XIONG De-yi3,HOU Hong-xu3,HE Zhong-jun3

Abstract

From July 13 to 15, 2005, the Institute of Automation and the Institute of Computing Technology of the Chinese Academy of Sciences, together with the Department of Computer Science of Xiamen University, jointly held the first Statistical Machine Translation Workshop in China. This paper describes the systems tested by the participating institutions and analyzes their experimental results. The results show that although statistical machine translation research started late in China, it has progressed rapidly: the evaluated systems reached fairly good translation quality within a short period. Compared with the rule-based systems that took part in the "863" evaluations of previous years, their performance is still lower, but the gap is already small. Judging from the state of the art and the trends in international statistical machine translation research, we believe that statistical machine translation still has considerable room for improvement as data resources grow and computing hardware becomes more powerful. In the next few years, incorporating syntactic and semantic information into the mainstream phrase-based approach will become the trend in machine translation.

Key words

artificial intelligence / machine translation / statistical machine translation / phrase-based translation model / machine translation evaluation

Cite this article

XU Bo, SHI Xiao-dong, LIU Qun, ZONG Cheng-qing, PANG Wei, CHEN Zhen-biao, YANG Zhen-dong, WEI Wei, DU Jin-hua, CHEN Yi-dong, LIU Yang, XIONG De-yi, HOU Hong-xu, HE Zhong-jun. Current Statistical Machine Translation Research in China. Journal of Chinese Information Processing. 2006, 20(5): 3-11

Funding

Supported by the National Natural Science Foundation of China (60272041)