Abstract
Computers interact with users largely through text input and output in many scripts, so the extent to which an operating system supports multiple scripts is one measure of its functionality. Because scripts differ considerably in their characteristics, text processing in a modern operating system is very complex to implement. This paper first defines the scope and content of operating-system text processing, covering text input and storage, text processing, and user interaction. Second, it outlines general models for text processing and discusses the possible technical approaches together with their advantages and disadvantages. Third, it analyzes how text processing is implemented in commonly used operating systems. Finally, it presents the challenges that text processing still faces.
Key words
computer application /
Chinese information processing /
overview /
text processing /
complex script /
font model
Funding
Supported by the National 863 Program of China (2003AA1Z2110) and the Knowledge Innovation Program of the Chinese Academy of Sciences (KGCX2-SW-504).