LIU Mengyi, YAO Liang, HONG Yu, LIU Hao, YAO Jianmin. Domain Adaptation of Reordering Model via Topic Information: Word Order in Translated Text across Domains[J]. Journal of Chinese Information Processing, 2017, 31(5): 50-58.
Domain Adaptation of Reordering Model via Topic Information: Word Order in Translated Text across Domains
LIU Mengyi, YAO Liang, HONG Yu, LIU Hao, YAO Jianmin
School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract: Research on domain adaptation (DA) for statistical machine translation (SMT) aims to adjust the translation model dynamically so that translation quality remains balanced and reliable across different domains. Existing work on translation model adaptation has made remarkable progress but neglects the reordering problem. This paper examines the translation samples in a large-scale bilingual corpus and finds that 36.17% of the samples exhibit clear word-order differences in their phrase-level translation pairs. We therefore propose a domain-adaptive reordering model that incorporates topic information, in order to capture how phrase reordering differs under different topic distributions. Experimental results show that translation systems equipped with the adaptive reordering model achieve clear performance improvements.
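The abstract does not spell out how topic information enters the reordering model. As a rough illustration only, the sketch below assumes one common setup: a lexicalized MSD (monotone/swap/discontinuous) reordering model as used in phrase-based SMT, whose orientation counts are split across topics in proportion to each training document's topic distribution θ (for example from LDA), and which scores a phrase pair for a test document by mixing the per-topic orientation distributions with that document's θ. The class `TopicReorderingModel`, the smoothing constant `alpha`, and the input format are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

ORIENTATIONS = ("M", "S", "D")  # monotone, swap, discontinuous


def orientation(prev_span, cur_span):
    """MSD orientation of the current phrase w.r.t. the previous phrase.

    Phrases are visited in target order; spans are inclusive source
    positions (start, end). The sentence start is the sentinel (-1, -1).
    """
    if prev_span[1] + 1 == cur_span[0]:
        return "M"  # current phrase continues right after the previous one
    if cur_span[1] + 1 == prev_span[0]:
        return "S"  # current phrase sits immediately before the previous one
    return "D"


class TopicReorderingModel:
    """Hypothetical topic-weighted MSD model (an assumed formulation)."""

    def __init__(self, num_topics, alpha=0.5):
        self.num_topics = num_topics
        self.alpha = alpha  # additive smoothing constant (assumed value)
        # counts[phrase_pair][topic][orientation] -> fractional count
        self.counts = defaultdict(
            lambda: [defaultdict(float) for _ in range(num_topics)]
        )

    def observe(self, segmentation, theta):
        """Add one training sentence.

        segmentation: list of (phrase_pair, (src_start, src_end)) in target order.
        theta: topic distribution of the sentence's document (sums to 1).
        """
        prev = (-1, -1)
        for pair, span in segmentation:
            o = orientation(prev, span)
            # Each observation is shared among topics by the document's theta.
            for k, weight in enumerate(theta):
                self.counts[pair][k][o] += weight
            prev = span

    def prob(self, pair, o, theta):
        """p(o | pair, theta) = sum_k theta_k * p(o | pair, topic k)."""
        total = 0.0
        for k, weight in enumerate(theta):
            table = self.counts[pair][k]
            denom = sum(table.values()) + self.alpha * len(ORIENTATIONS)
            total += weight * (table[o] + self.alpha) / denom
        return total


if __name__ == "__main__":
    model = TopicReorderingModel(num_topics=2)
    # Toy data (hypothetical): the pair ("x", "X") is monotone under topic 0
    # and swapped under topic 1.
    model.observe([(("x", "X"), (0, 0)), (("y", "Y"), (1, 1))], theta=[1.0, 0.0])
    model.observe([(("y", "Y"), (1, 1)), (("x", "X"), (0, 0))], theta=[0.0, 1.0])
    for theta in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
        print(theta, round(model.prob(("x", "X"), "S", theta), 3))
```

Because every training sentence contributes fractional counts to each topic, a phrase pair that is usually swapped in one domain receives a high swap probability only when the test document's topic distribution leans toward that domain. For a normalized θ, an unseen phrase pair falls back to a uniform 1/3 per orientation.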