Research on New Energy Patent Machine Translation Integrating Terminology Information
YOU Xindong1, YANG Haixiang1, CHEN Haitao2, SUN Tian1,2, LV Xueqiang1
1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;
2. School of Foreign Languages, Beijing Information Science and Technology University, Beijing 100192, China.
Abstract: Traditional neural machine translation (NMT) operates as a black box and cannot effectively incorporate terminology information. It is of practical significance to use terms provided by users to jointly train the NMT model. Accordingly, we propose a Transformer-based machine translation model for new energy patents that incorporates terminology information. The terminology information is fused by replacing the source term with the target term and by appending the target term after the source term. Experimental results on the Chinese-English task using a patent termbase from the new energy domain, together with a translation quality analysis on three datasets, show that the proposed patent translation model outperforms the Transformer baseline.
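The two term-fusion operations named in the abstract (replacing the source term with the target term, and appending the target term after the source term) can be illustrated as a source-side preprocessing step. What follows is a minimal sketch, not the authors' implementation. It assumes pre-segmented Chinese input, a termbase stored as a dictionary from segmented source terms to English target tokens, and greedy longest-match lookup. The names `find_terms`, `inject_terms` and `TermBase`, and the toy termbase entry, are hypothetical. Presumably the rewritten source sentence would be paired with its original English reference when training the model.

```python
from typing import Dict, List, Tuple

# Hypothetical termbase type: segmented Chinese source term -> English target tokens.
TermBase = Dict[Tuple[str, ...], List[str]]


def find_terms(tokens: List[str], termbase: TermBase) -> List[Tuple[int, int, List[str]]]:
    """Greedy longest-match search; returns non-overlapping (start, end, target) spans."""
    max_len = max((len(k) for k in termbase), default=0)
    spans: List[Tuple[int, int, List[str]]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so multi-word terms take priority.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in termbase:
                match = (i, i + n, termbase[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched term
        else:
            i += 1
    return spans


def inject_terms(tokens: List[str], termbase: TermBase, mode: str) -> List[str]:
    """mode='replace': swap each source term for its target term.
    mode='append': keep each source term and insert its target term right after it."""
    if mode not in ("replace", "append"):
        raise ValueError(f"unknown mode: {mode}")
    out: List[str] = []
    prev = 0
    for start, end, target in find_terms(tokens, termbase):
        out.extend(tokens[prev:start])     # untouched context before the term
        if mode == "append":
            out.extend(tokens[start:end])  # keep the original source term
        out.extend(target)                 # inject the target-language term
        prev = end
    out.extend(tokens[prev:])              # remaining context after the last term
    return out


if __name__ == "__main__":
    # Toy example (hypothetical): "提高 光伏 组件 的 效率" = "improve the efficiency of PV modules".
    termbase: TermBase = {("光伏", "组件"): ["photovoltaic", "module"]}
    src = ["提高", "光伏", "组件", "的", "效率"]
    print(" ".join(inject_terms(src, termbase, "replace")))  # 提高 photovoltaic module 的 效率
    print(" ".join(inject_terms(src, termbase, "append")))   # 提高 光伏 组件 photovoltaic module 的 效率
```

Longest-match lookup is one simple way to resolve overlapping termbase entries. A production system would also need consistent word segmentation between the termbase and the corpus.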