YU Shi-wen, DUAN Hui-ming, ZHU Xue-feng, SUN Bin. The Basic Processing of Contemporary Chinese Corpus at Peking University SPECIFICATION (Continued) [J]. Journal of Chinese Information Processing, 2002, 16(6): 59-65.
The Basic Processing of Contemporary Chinese Corpus at Peking University SPECIFICATION (Continued)
YU Shi-wen, DUAN Hui-ming, ZHU Xue-feng, SUN Bin
Department of Computer Science, Peking University; Institute of Computational Linguistics, Peking University
Abstract: The Institute of Computational Linguistics, Peking University, has completed the basic processing of a contemporary Chinese corpus of 27 million Chinese characters. In addition to word segmentation and part-of-speech tagging, the processing covers the tagging of proper nouns (person names, place names, organization names, and so on), morpheme subcategories, and special usages of verbs and adjectives. The success of this large-scale language-engineering project is attributed to the SPECIFICATION, which was drawn up beforehand and refined while in use. With this publication we introduce the SPECIFICATION and invite comments from experts and colleagues to help improve it.