Abstract
Coded character set standards are the foundation of computer text processing. This paper proposes a 3-tuple abstraction for describing coded character sets and briefly reviews and summarizes the existing standards. The influential ISO 2022 standard and its derived standards are analyzed in depth, including the limitations of applying the ISO 2022 encoding mechanism in multilingual environments. The necessity of adopting the Universal Character Set (UCS) is explained, together with an analysis of UCS. After discussing the problems with existing classification methods, a new method for classifying coded character sets and their implementations is introduced and used to categorize the existing standards. The paper closes with an analysis and evaluation of the Chinese national standards related to Han character sets.
Key words
computer application /
Chinese information processing /
coded character set
Funding
Supported by the National 863 Program (2003AA1Z2110) and the Knowledge Innovation Program of the Chinese Academy of Sciences (KGCX2-SW-504)