Ethnic Language Processing and Cross Language Processing
LAI Hua, GAO Yumeng, HUANG Yuxin, YU Zhengtao, ZHANG Yongbing
2022, 36(3): 45-53,63.
Recently, text generation evaluation methods based on pre-trained language models have attracted attention. They assess the quality of generated text by computing the sub-word-level similarity between two sentences. However, for languages that contain many adhesive morphemes, such as Vietnamese and Thai, a single syllable or sub-word often does not carry complete meaning on its own, so sub-word-level matching cannot fully capture the semantic relationship between two sentences. We therefore propose a text generation evaluation method that combines multi-granularity features at the sub-word, syllable, and phrase levels. After obtaining text representations with multilingual BERT (mBERT), we introduce syllable-level and phrase-level semantic similarity to enhance a sub-word-level evaluation model. Experimental results on cross-language summarization, machine translation, and data screening show that the proposed metric correlates better with human judgments than the statistics-based metrics ROUGE and BLEU and the deep-semantic-matching metric BERTScore.
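The abstract does not say exactly how the sub-word, syllable, and phrase similarities are combined. The following is a minimal sketch, not the authors' implementation, built on several assumptions. Each granularity is scored with BERTScore-style greedy cosine matching. Syllable and phrase vectors are obtained by mean-pooling sub-word vectors. The three F1 scores are merged by a hypothetical weighted average. Toy 2-d vectors stand in for mBERT embeddings, and the span boundaries are supplied by hand; in practice a syllable segmenter and a phrase chunker would provide them. `greedy_f1`, `pool`, `multi_granularity_score`, and the `weights` parameter are illustrative names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def greedy_f1(cand: Sequence[Vector], ref: Sequence[Vector]) -> float:
    """BERTScore-style greedy matching: each unit is paired with its most similar counterpart."""
    if not cand or not ref:
        return 0.0
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    if precision + recall <= 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pool(vectors: Sequence[Vector], spans: Sequence[range]) -> List[Vector]:
    """Mean-pool sub-word vectors over index spans to get syllable or phrase vectors."""
    pooled = []
    for span in spans:
        members = [vectors[i] for i in span]
        dim = len(members[0])
        pooled.append([sum(v[d] for v in members) / len(members) for d in range(dim)])
    return pooled


def multi_granularity_score(
    cand_vecs: Sequence[Vector],
    ref_vecs: Sequence[Vector],
    cand_spans: Dict[str, Sequence[range]],
    ref_spans: Dict[str, Sequence[range]],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),  # hypothetical weights
) -> float:
    """Hypothetical combination: weighted average of sub-word, syllable and phrase F1."""
    subword = greedy_f1(cand_vecs, ref_vecs)
    syllable = greedy_f1(pool(cand_vecs, cand_spans["syllable"]),
                         pool(ref_vecs, ref_spans["syllable"]))
    phrase = greedy_f1(pool(cand_vecs, cand_spans["phrase"]),
                       pool(ref_vecs, ref_spans["phrase"]))
    w_sub, w_syl, w_phr = weights
    return (w_sub * subword + w_syl * syllable + w_phr * phrase) / (w_sub + w_syl + w_phr)


if __name__ == "__main__":
    # Made-up 2-d vectors standing in for mBERT sub-word embeddings.
    ref = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
    cand = [[1.0, 0.1], [0.7, 0.7], [0.1, 1.0]]
    # Hand-written spans: which sub-words form each syllable / phrase.
    ref_spans = {"syllable": [range(0, 2), range(2, 4)], "phrase": [range(0, 4)]}
    cand_spans = {"syllable": [range(0, 2), range(2, 3)], "phrase": [range(0, 3)]}
    print(round(multi_granularity_score(cand, ref, cand_spans, ref_spans), 4))
```

Mean pooling is only one way to build syllable and phrase representations. It lets a syllable that mBERT splits into several sub-words be compared as a single unit, which is the motivation the abstract gives for going beyond sub-word matching.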