An Improved Syntactic Phrase Extraction Approach for Statistical Machine Translation
SUN Shuihua1,2, DING Peng1, HUANG Degen1
1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China; 2. College of Information and Engineering, Fujian University of Technology, Fuzhou, Fujian 350118, China
Abstract: The phrase table lies at the core of a phrase-based statistical machine translation system. Phrase tables extracted with heuristic methods suffer from incorrect word alignments, unaligned words, and the absence of syntactic information. This paper presents a bilingual syntactic phrase extraction method based on the expectation-maximization (EM) algorithm, which optimizes all parameters iteratively. Three techniques for integrating the bilingual syntactic phrases into a phrase-based machine translation system are examined: direct augmentation of the bilingual phrases, adding new features, and re-training. Experiments show that all three methods improve the BLEU score to varying degrees, with the largest gain, 0.64 BLEU, obtained by adding new features.