Abstract
Pre-Qin Chinese holds an important place in the study of the history of the Chinese language. However, previous research has never produced a structured lexical resource for Pre-Qin Chinese, which makes it difficult to meet the needs of ancient Chinese information processing and cross-language comparison. Internationally, wordnets for dozens of languages have been built on the synset architecture of the English WordNet, and they have become a foundational resource for multilingual natural language processing and cross-language comparison. This paper surveys the construction of wordnets in China and abroad, with a special focus on wordnets for ancient languages and for Chinese. It then presents in detail the construction and correction process of the WordNet for Pre-Qin ancient Chinese (PQAC-WN), which covers 43 591 words, 61 227 senses and 17 975 synsets. Through a cross-language comparison with the ancient Sanskrit WordNet, the paper analyzes the lexical commonalities and differences between the two ancient languages, providing a preliminary verification of the value of PQAC-WN.
Key words
WordNet /
Pre-Qin ancient Chinese /
cross-language comparison /
ancient Chinese information processing
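Funding
State Language Commission project (YB145-41); Key Project on Ancient Books Work (22GJK006); National Social Science Fund of China (21&ZD331, 22&ZD262); Jiangsu Provincial Social Science Fund (20JYB004)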