Abstract
This paper proposes an intelligent input technique for digital-code-based Chinese character entry that improves the input rules and incorporates a language model while leaving the original coding scheme unchanged. It first discusses how to structure the character and word codebook so that it supports flexible and varied input modes. It then designs a dynamic self-study language model, with emphasis on the application and improvement of data smoothing algorithms within that model. Finally, a sample input method program is used to test input performance before and after the improvement. The experiments show that the technique not only reduces the average code length of the input method but also significantly improves the hit rate of the first candidate character.
Key words
computer application /
Chinese information processing /
Chinese character input /
digital code /
intelligent input /
dynamic self-study language model
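Funding
Supported by the Jiangsu Province High-Tech Research Project (BG2005020) and the Natural Science Fund of the Jiangsu Provincial Department of Education (04KKB320134).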