简繁转换系统的最根本问题是语言文字的应用问题。其核心是针对中国大陆、中国港澳台等不同应用环境的简繁汉字对应关系和术语对照表。这项工作是十分复杂的,不可能一蹴而就,需要分阶段逐步完成。首要工作是要做好大陆地区简繁对应关系的分解,研制出适用于大陆内部的简繁转换系统。这一系统应该包括六大步骤,其中“字境”概念的引入,为提高简繁转换的准确率提供了有力的支撑。
Abstract
Differences in language usage among mainland China, Hong Kong, Macao and Taiwan give rise to the problem of converting between Simplified and Traditional Chinese. At its core are the character correspondence tables and term-mapping tables that apply to each of these environments. Building them is a complex task that cannot be finished in one stroke and has to be completed in stages. The first task is to decompose the Simplified-Traditional character correspondences used in the mainland and to develop a conversion system suited to use within the mainland. Such a system should comprise six steps, and the introduction of the concept of "character context" (字境) provides strong support for improving conversion accuracy.
Key words: Simplified characters; Traditional characters; Simplified and Traditional correspondence; Simplified and Traditional conversion
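To make the role of context concrete, the following is a minimal sketch of why a plain character table is not enough: some Simplified characters correspond to more than one Traditional character, and the neighbouring characters decide which one is right. It is not the system described in the abstract, which does not spell out its six steps or a formal definition of 字境 here. The tables, cue words, and the names `ONE_TO_ONE`, `ONE_TO_MANY`, `covers` and `convert` are all assumptions made for illustration.

```python
# Hypothetical one-to-one table: Simplified characters with a single
# Traditional counterpart (a tiny sample, for illustration only).
ONE_TO_ONE = {"头": "頭", "现": "現", "条": "條", "见": "見"}

# Hypothetical one-to-many table: each entry has a default Traditional form
# plus context rules of the shape (Traditional form, cue words).
ONE_TO_MANY = {
    "发": {"default": "發", "rules": [("髮", ["头发", "理发", "发型"])]},
    "后": {"default": "後", "rules": [("后", ["皇后", "王后", "太后"])]},
    "面": {"default": "面", "rules": [("麵", ["面条", "面粉", "面包"])]},
}


def covers(text: str, cue: str, i: int) -> bool:
    """Return True if `cue` occurs in `text` at a position that includes index i."""
    for k, c in enumerate(cue):
        start = i - k
        if c == text[i] and start >= 0 and text.startswith(cue, start):
            return True
    return False


def convert(text: str) -> str:
    """Convert Simplified text to Traditional, using nearby characters as context."""
    out = []
    for i, ch in enumerate(text):
        entry = ONE_TO_MANY.get(ch)
        if entry is None:
            # Unambiguous (or unknown) character: direct lookup, else keep as is.
            out.append(ONE_TO_ONE.get(ch, ch))
            continue
        choice = entry["default"]
        for trad, cues in entry["rules"]:
            # A cue word overlapping this position overrides the default.
            if any(covers(text, cue, i) for cue in cues):
                choice = trad
                break
        out.append(choice)
    return "".join(out)


if __name__ == "__main__":
    for sample in ["头发", "发现", "皇后", "以后", "面条", "见面"]:
        print(sample, "->", convert(sample))
```

In this sketch the context is only a list of cue words around the ambiguous character; a production system would need far larger correspondence and term tables, which is the staged work the abstract describes.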
关键词
简化字 /
繁体字 /
简繁对应关系 /
简繁转换
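Funding
Key Project of the State Language Commission under the 12th Five-Year Plan, "汉语文本简繁转换系统研制" (Development of a Simplified-Traditional Conversion System for Chinese Text) (ZD1125-8)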