Sentence Based Automatic Scoring Method for Chinese-English Oral Translation

LI Xinguang, CHEN Shuai, LONG Xiaolan

Journal of Chinese Information Processing ›› 2021, Vol. 35 ›› Issue (7): 54-62.
Machine Translation


Abstract

This paper proposes a sentence-based automatic scoring method for Chinese-English oral translation. Three main scoring parameters are used: semantic keywords, the general meaning of the sentence, and oral fluency. To improve the accuracy of keyword scoring, synonym discrimination is applied to identify synonyms among the keywords in a candidate's answer. At the sentence level, an unfolding recursive auto-encoder (URAE) neural network model is used to analyze how well the candidate has translated the general meaning of the sentence. Fluency is then scored from the speech rate (tempo) and the distribution of speech. Finally, the three parameters are combined in a weighted sum to give the overall translation quality score. Experimental results show that the scores produced by this method agree well with manual scores, meeting the intended design goal.
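The final weighted combination described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `SubScores`, `keyword_score` and `overall_score`, the normalization of each sub-score to [0, 1], the synonym lookup table, and the weights (0.4, 0.4, 0.2) are all hypothetical. The URAE sentence-meaning score and the fluency score are taken as given inputs.

```python
from dataclasses import dataclass


@dataclass
class SubScores:
    """Three sub-scores, each assumed to be normalized to [0, 1]."""
    keyword: float
    meaning: float
    fluency: float


def keyword_score(answer_words, keywords, synonyms):
    """Fraction of reference keywords matched by the answer, counting synonyms.

    `synonyms` maps a keyword to a set of accepted alternatives. It is a
    hypothetical stand-in for the paper's synonym discrimination step.
    """
    if not keywords:
        return 0.0
    answer = {w.lower() for w in answer_words}
    hits = 0
    for kw in keywords:
        accepted = {kw.lower()} | {s.lower() for s in synonyms.get(kw, ())}
        if answer & accepted:
            hits += 1
    return hits / len(keywords)


def overall_score(s, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of the three sub-scores; the default weights are illustrative only."""
    w_kw, w_mean, w_flu = weights
    if abs(w_kw + w_mean + w_flu - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_kw * s.keyword + w_mean * s.meaning + w_flu * s.fluency
```

In this sketch a candidate who says "purchased" where the reference keyword is "buy" still receives keyword credit, provided the synonym table lists it. The weights would in practice be tuned against manual scores.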

Key words

automatic scoring of Chinese-English oral translation / synonym discrimination / Unfolding Recursive Auto-Encoder (URAE) neural network / oral fluency

Cite this article

LI Xinguang, CHEN Shuai, LONG Xiaolan. Sentence Based Automatic Scoring Method for Chinese-English Oral Translation. Journal of Chinese Information Processing. 2021, 35(7): 54-62

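Funding

National Natural Science Foundation of China (61877013); 2019 Research Project of the China National Committee for Terminology in Science and Technology (WT2019006); Guangdong Province Special Fund for Science and Technology Innovation Strategy (Pdjh2021a0170, Pdjh2021b0176)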