RUAN Chong, SHI Wenxian, LI Yanhao, WENG Yijia, HU Junfeng. Multi-translation Based Chinese Paraphrase: Evaluation Metric and Corpus[J]. Journal of Chinese Information Processing, 2018, 32(12): 67-73.
Abstract: Paraphrase corpora are fundamental to research on paraphrase phenomena, yet Chinese paraphrase corpora are scarcely available to the research community. In this paper, we collected multiple Chinese translations of the novel Jane Eyre and obtained roughly 50 000 parallel paraphrase sentences. From these, we extracted more than 9 000 lexical paraphrase pairs. We further modified METEOR, an automatic machine translation evaluation metric, to better evaluate the quality of Chinese paraphrases, and we provide a Chinese paraphrase evaluation dataset. In a closed test, the mined paraphrase knowledge showed higher quality than the Tongyici Cilin thesaurus.
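The abstract says METEOR was modified for Chinese paraphrase evaluation but does not describe the modification. The following is a hedged illustration of where mined lexical paraphrases could enter such a metric. It implements the original METEOR score of Banerjee and Lavie (2005) over pre-segmented Chinese tokens and adds a second matching stage driven by a paraphrase table. The `paraphrases` dictionary format, the greedy left-to-right alignment and the example word pair are assumptions made for this sketch. It is not the authors' implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

Alignment = List[Tuple[int, int]]  # (hypothesis index, reference index) pairs


def align(hyp: List[str], ref: List[str],
          paraphrases: Optional[Dict[str, Set[str]]] = None) -> Alignment:
    """Greedy one-to-one unigram alignment.

    Stage 0 links identical tokens; stage 1 links tokens listed as lexical
    paraphrases of each other (hypothetical table format: word -> set of words).
    Real METEOR picks the alignment with the fewest crossing links; this
    greedy pass is a simplification.
    """
    table = paraphrases or {}
    used_h: Set[int] = set()
    used_r: Set[int] = set()
    links: Alignment = []

    def is_match(h: str, r: str, stage: int) -> bool:
        if stage == 0:
            return h == r
        return r in table.get(h, set()) or h in table.get(r, set())

    for stage in (0, 1):
        for i, h in enumerate(hyp):
            if i in used_h:
                continue
            for j, r in enumerate(ref):
                if j not in used_r and is_match(h, r, stage):
                    links.append((i, j))
                    used_h.add(i)
                    used_r.add(j)
                    break
    return sorted(links)


def count_chunks(links: Alignment) -> int:
    """Count maximal runs of links that are contiguous in both sentences."""
    chunks = 0
    prev: Optional[Tuple[int, int]] = None
    for i, j in links:  # links are sorted by hypothesis index
        if prev is None or i != prev[0] + 1 or j != prev[1] + 1:
            chunks += 1
        prev = (i, j)
    return chunks


def meteor(hyp: List[str], ref: List[str],
           paraphrases: Optional[Dict[str, Set[str]]] = None) -> float:
    """Original METEOR: Fmean = 10PR / (R + 9P), penalty = 0.5 * (chunks / matches)^3."""
    links = align(hyp, ref, paraphrases)
    m = len(links)
    if m == 0:
        return 0.0
    precision = m / len(hyp)
    recall = m / len(ref)
    fmean = 10 * precision * recall / (recall + 9 * precision)
    penalty = 0.5 * (count_chunks(links) / m) ** 3
    return fmean * (1 - penalty)


if __name__ == "__main__":
    hyp = ["她", "非常", "高兴"]  # segmented hypothesis (illustrative)
    ref = ["她", "十分", "高兴"]  # segmented reference (illustrative)
    print(round(meteor(hyp, ref), 4))                        # exact matches only
    print(round(meteor(hyp, ref, {"非常": {"十分"}}), 4))     # with a paraphrase entry
```

With exact matching alone, the pair 非常/十分 is unmatched, so recall drops and the alignment splits into two chunks (score 1/3). A single paraphrase-table entry restores a full, contiguous alignment (score 53/54). This is the kind of gain a mined lexical paraphrase resource is meant to provide.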