[1] ERDMANN G, GWINNUP J. Coverage and cynicism: The AFRL submission to the WMT 2018 parallel corpus filtering task[C]//Proceedings of the WMT, 2018: 872-876.
[2] ROSSENBACH N, ROSENDAHL J, KIM Y, et al. The RWTH Aachen University filtering system for the WMT parallel corpus filtering task[C]//Proceedings of the WMT, 2018: 946-954.
[3] KHAYRALLAH H, XU H N, KOEHN P. The JHU parallel corpus filtering systems for WMT[C]//Proceedings of the WMT, 2018: 896-899.
[4] WANG R, MARIE B, UTIYAMA M, et al. NICT's corpus filtering systems for the WMT parallel corpus filtering task[C]//Proceedings of the WMT, 2018: 963-967.
[5] SÁNCHEZ-CARTAGENA V M, BAÑÓN M, ORTIZ-ROJAS S, et al. Prompsit's submission to WMT parallel corpus filtering shared task[C]//Proceedings of the WMT, 2018: 955-962.
[6] BERNIER-COLBORNE G, LO C K. NRC parallel corpus filtering system for WMT[C]//Proceedings of the WMT, 2019: 252-260.
[7] KURFALI M, ÖSTLING R. Noisy parallel corpus filtering through projected word embeddings[C]//Proceedings of the WMT, 2019: 277-281.
[8] CHAUDHARY V, TANG Y Q, GUZMÁN F, et al. Low-resource corpus filtering using multilingual sentence embeddings[C]//Proceedings of the WMT, 2019: 261-266.
[9] HANGYA V, FRASER A. An unsupervised system for parallel corpus filtering[C]//Proceedings of the WMT, 2018: 882-887.
[10] PHAM M Q, CREGO J, SENELLART J. SYSTRAN participation to the WMT shared task on parallel corpus filtering[C]//Proceedings of the WMT, 2018: 934-938.
[11] LAMPLE G, CONNEAU A. Cross-lingual language model pretraining[J]. arXiv preprint arXiv:1901.07291, 2019.
[12] CONNEAU A, KHANDELWAL K, GOYAL N, et al. Unsupervised cross-lingual representation learning at scale[C]//Proceedings of the ACL, 2020: 8440-8451.
[13] BOJAR O, FEDERMANN C, FISHEL M, et al. Findings of the 2018 conference on machine translation (WMT18)[C]//Proceedings of the WMT, 2018: 272-303.
[14] BANERJEE S, LAVIE A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 2005: 65-72.
[15] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the NIPS, 2017: 5998-6008.
[16] BOJANOWSKI P, GRAVE E, JOULIN A, et al. Enriching word vectors with subword information[J]. arXiv preprint arXiv:1607.04606, 2017.
[17] CHOPRA S, HADSELL R, LECUN Y. Learning a similarity metric discriminatively, with application to face verification[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 539-546.
[18] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the European Conference on Computer Vision Workshops, 2016: 850-865.
[19] NECULOIU P, VERSTEEGH M, ROTARU M. Learning text similarity with Siamese recurrent networks[C]//Proceedings of the 1st Workshop on Representation Learning for NLP, 2016: 148-157.
[20] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the NAACL-HLT, 2019: 4171-4186.
[21] PETERS M, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]//Proceedings of the NAACL-HLT, 2018: 2227-2237.
[22] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[OL]. https://blog.openai.com/language-unsupervised, 2018.
[23] LEVENSHTEIN V I. Binary codes capable of correcting deletions, insertions and reversals[J]. Soviet Physics Doklady, 1966, 10(8): 707-710.
[24] KOEHN P, HOANG H, BIRCH A, et al. Moses: Open source toolkit for statistical machine translation[C]//Proceedings of the ACL, 2007: 177-180.
[25] JUNCZYS-DOWMUNT M, GRUNDKIEWICZ R, DWOJAK T, et al. Marian: Fast neural machine translation in C++[C]//Proceedings of the ACL, 2018: 116-121.
[26] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation[C]//Proceedings of the ACL, 2002: 311-318.
[27] AÇARÇİÇEK H, ÇOLAKOĞLU T, AKTAN HATİPOĞLU P E, et al. Filtering noisy parallel corpus using transformers with proxy task learning[C]//Proceedings of the WMT, 2020: 940-946.
[28] LI M X, XIANG Q Y, CHEN Z M, et al. A unified neural network for quality estimation of machine translation[J]. IEICE Transactions on Information and Systems, 2018, 101(9): 2417-2421.
[29] YANG Y F, CER D, AHMAD A, et al. Multilingual universal sentence encoder for semantic retrieval[C]//Proceedings of the ACL, 2020: 87-94.

TU Jie (1998—), M.S. candidate; main research interests: natural language processing and machine translation.
E-mail: jietu@jxnu.edu.cn

LI Maoxi (1977—), corresponding author, Ph.D., professor; main research interests: natural language processing and machine translation.
E-mail: mosesli@jxnu.edu.cn

QIU Bailian (1981—), Ph.D., lecturer; main research interests: computational linguistics and machine translation.
E-mail: qiubl@ecjtu.edu.cn