WANG Ling, DAWA Yidemucao, WU Shouer Silamu. An Investigation Research on the Similarity of Uyghur Kazakh Kyrgyz and Mongolian Languages [J]. Journal of Chinese Information Processing (中文信息学报), 2013, 27(6): 180-187.
An Investigation Research on the Similarity of Uyghur Kazakh Kyrgyz and Mongolian languages
WANG Ling (2), DAWA Yidemucao (1, 2), WU Shouer Silamu (1, 2)
1. College of Information Science & Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China; 2. Xinjiang Laboratory of Multi-language Information Technology, Xinjiang University, Urumqi, Xinjiang 830046, China
Abstract: This paper investigates the similarity among agglutinative languages of the same family, taking as examples the Altaic languages Uyghur, Kazakh, Kyrgyz and Mongolian, which are used in different countries and regions. The cosine similarity measure is applied both to parallel texts and to acoustic features extracted from speech in which speakers of the different languages utter sentences with the same content. Experimental results show that word-by-word transformation between these languages is more feasible when the connection rules between a stem and its affixes (function words) are learned at the word level and common acoustic models are used. This avoids much of the heavy machine translation (MT) work otherwise required for resource-deficient languages, such as minority languages used in developing countries, and it also reduces costs.
Key words: same family and agglutinative language; parallel text; acoustic and prosody parameters; F0; similarity
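The cosine similarity measure named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the parallel texts or the acoustic features are turned into vectors, so the token-count representation and the function names `cosine_similarity` and `text_similarity` below are assumptions made only for illustration, not the authors' procedure. For two vectors u and v the measure is the dot product of u and v divided by the product of their Euclidean norms.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: an all-zero vector
        # is treated as having no similarity to anything.
        return 0.0
    return dot / (norm_u * norm_v)


def text_similarity(tokens_a, tokens_b):
    """Cosine similarity of token-count vectors (an assumed representation)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Build both count vectors over the union vocabulary so that
    # every position refers to the same token in both vectors.
    vocab = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a[t] for t in vocab]
    vec_b = [counts_b[t] for t in vocab]
    return cosine_similarity(vec_a, vec_b)
```

The same `cosine_similarity` function could in principle be applied to acoustic feature vectors, for example F0 values sampled at matching points of two utterances, provided both vectors have the same length. How the utterances would be aligned is not described in the abstract and is left open here.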