Efficient Hash Table Construction Algorithm for Uyghur Words in EBMT

TIAN Shengwei, Turgun Ibrahim, YU Long

Journal of Chinese Information Processing, 2009, Vol. 23, Issue 4: 124-129.
Review

Efficient Hash Algorithm for Uyhur Words in EBMT

  • TIAN Shengwei, Turgun Ibrahim, YU Long

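Summary

Example-based machine translation (EBMT) is an efficient machine translation method, and quickly finding candidate examples similar to the sentence to be translated in a massive example pattern base is one of its key technologies. Based on a statistical analysis of how letters are distributed in Uyghur words, this paper constructs an inverted-index hash table keyed on Uyghur words; under equal-probability conditions its average search length is 1.59. Using as weights the frequency with which colliding synonyms (words that hash to the same address) occur in a Uyghur corpus, it proposes a novel algorithm for resolving hash collisions, the synonym second optimal tree algorithm. Experiments show that the algorithm outperforms traditional sequential search and binary search by 27.5% and 21.8%, respectively, demonstrating high retrieval efficiency in EBMT.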
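As a rough illustration of the average search length figure, the sketch below computes it for a chained hash table under the equal-probability assumption, where every stored word is looked up equally often and the i-th word in a bucket costs i comparisons. The hash function `toy_hash` is a hypothetical stand-in and not the paper's function, which is designed from Uyghur letter statistics and is not given here.

```python
from collections import Counter


def toy_hash(word):
    """Hypothetical hash: sum of code points. NOT the paper's function."""
    return sum(ord(ch) for ch in word)


def average_search_length(keys, num_buckets, hash_fn=toy_hash):
    """Average comparisons per successful lookup, assuming equal lookup
    probability for every key and sequential probing inside each bucket."""
    if not keys:
        raise ValueError("keys must not be empty")
    # Count how many keys land in each bucket.
    bucket_sizes = Counter(hash_fn(k) % num_buckets for k in keys)
    # A bucket holding k keys costs 1 + 2 + ... + k comparisons in total.
    total = sum(k * (k + 1) // 2 for k in bucket_sizes.values())
    return total / len(keys)
```

A value close to 1 means almost every word is found on the first comparison; the reported 1.59 would correspond to short collision chains on average.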

Abstract

Efficiently retrieving candidate translation examples from a large-scale translation example base is a fundamental issue in the study of EBMT. This paper proposes a hash function for Uyghur, designed according to the distribution of Uyghur words and letters, which, under the equal-probability condition, yields an average search length of 1.59. To resolve collisions in the hash table, a new mechanism named the synonym second optimal tree is built, using the frequency of the colliding Uyghur words as weights. Experiments show that the proposed approach improves performance by 27.5% and 21.8% over sequential search and binary search, respectively.
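The collision mechanism can be sketched as follows, assuming that "second optimal tree" refers to the classical nearly optimal search tree: among the sorted colliding words, the root is the word that best balances the total weight on its left and right, and the rule is applied recursively. This is a minimal sketch under that assumption and not the authors' implementation; the names `Node`, `build_nearly_optimal_tree`, and `search` are hypothetical, and the textbook refinement that adjusts a low-weight root is omitted.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Optional


@dataclass
class Node:
    key: str
    weight: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_nearly_optimal_tree(items):
    """items: (word, frequency) pairs that collide in one hash bucket."""
    items = sorted(items)
    # prefix[i] is the sum of the first i weights.
    prefix = [0] + list(accumulate(w for _, w in items))

    def build(lo, hi):  # half-open index range [lo, hi)
        if lo >= hi:
            return None
        # Choose the root that minimises |weight(left) - weight(right)|.
        best = min(
            range(lo, hi),
            key=lambda i: abs((prefix[i] - prefix[lo]) - (prefix[hi] - prefix[i + 1])),
        )
        key, weight = items[best]
        return Node(key, weight, build(lo, best), build(best + 1, hi))

    return build(0, len(items))


def search(root, key):
    """Return (node or None, number of key comparisons)."""
    comparisons, node = 0, root
    while node is not None:
        comparisons += 1
        if key == node.key:
            return node, comparisons
        node = node.left if key < node.key else node.right
    return None, comparisons
```

With this construction, high-frequency words tend to sit near the root, so the expected number of comparisons weighted by corpus frequency is lower than for a plain sequential chain.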

Key words

computer application / Chinese information processing / EBMT / hash / average search length / second optimal tree

Cite this article

TIAN Shengwei, Turgun Ibrahim, YU Long. Efficient Hash Algorithm for Uyhur Words in EBMT. Journal of Chinese Information Processing. 2009, 23(4): 124-129

Funding

Supported by the National Natural Science Foundation of China (60663006)