TIAN Yuanhe; LIU Yang. Lexical Knowledge Representation and Sense Prediction of Chinese Unknown Words[J]. Journal of Chinese Information Processing, 2016, 30(6): 26-34.
Lexical Knowledge Representation and Sense Prediction of Chinese Unknown Words
TIAN Yuanhe (1, 2); LIU Yang (2, 3)
1. Department of Chinese Language and Literature, Peking University, Beijing 100871, China;
2. Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing 100871, China;
3. Institute of Computational Linguistics, Peking University, Beijing 100871, China
Abstract: Previous research on sense prediction for Chinese unknown words has used lexical knowledge related to word formation, but has not treated it as a valuable form of knowledge representation. Building on morphemic concepts, this paper proposes a multi-level approach to knowledge representation for Chinese unknown words. A model based on a Bayesian network is also constructed to analyze the semantic word formation of Chinese unknown words and to predict their multi-level lexical knowledge effectively. This form of lexical knowledge representation is simple, intuitive, and easy to extend. Experimental results show that it is valuable for sense guessing of Chinese unknown words and can meet application needs at different levels.