熊 丹,陆 勤,罗凤珠,石定栩,赵天成. 基于语料库的明清小说人名与称谓研究[J]. 中文信息学报, 2015, 29(1): 19-27.
XIONG Dan, LU Qin, LUO Fengzhu, SHI Dingxu, ZHAO Tiancheng. A Corpus-Based Study on Personal Names and Terms of Address in Chinese Classical Novels. Journal of Chinese Information Processing, 2015, 29(1): 19-27.
A Corpus-Based Study on Personal Names and Terms of Address in Chinese Classical Novels
XIONG Dan1, LU Qin1, LUO Fengzhu2, SHI Dingxu3, ZHAO Tiancheng1
1. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; 2. Department of Chinese Linguistics & Literature, Yuan Ze University, Taiwan, China; 3. Department of Chinese & Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
Abstract: Personal names and terms of address are important classes of named entities, and recognizing them is an essential task in natural language processing. This paper presents a classification and annotation scheme for personal names and terms of address, designed from the perspective of named entity recognition and information extraction and applied to a corpus of four Chinese classical novels. Personal names and terms of address are divided into simple types and compound types. The compound type is further divided into four subtypes: fixed expressions, appositive constructions, subordinate constructions of affiliation, and other subordinate constructions. Based on full statistics from the annotated corpus, the paper also presents a comparative analysis of these types and of the characteristics of the four novels.