Lexical Knowledge Representation and Sense Prediction of Chinese Unknown Words

TIAN Yuanhe; LIU Yang

Journal of Chinese Information Processing, 2016, Vol. 30, Issue 6: 26-34.
Review


Abstract

Previous work on sense prediction for Chinese unknown words has used word-formation knowledge only as a means of prediction, not as a valuable form of knowledge representation in its own right. Building on "morphemic concepts", this paper closely examines semantic word-formation knowledge in Chinese and proposes a "multi-level" scheme for representing the lexical knowledge of unknown words. For this scheme, a Bayesian network model is built to analyze the semantic word formation of Chinese unknown words automatically; the model effectively predicts their multi-level lexical knowledge. The representation is simple, intuitive, and easy to extend. Experiments show that it is of significant value for sense prediction of Chinese unknown words and can meet application needs at different levels.

Key words

Chinese unknown words / lexical knowledge representation / sense prediction / semantic word formation
 

Cite this article

TIAN Yuanhe; LIU Yang. Lexical Knowledge Representation and Sense Prediction of Chinese Unknown Words. Journal of Chinese Information Processing. 2016, 30(6): 26-34


Funding

National Social Science Fund of China (16BYY137); National Basic Research Program of China (2014CB340504); National Social Science Fund of China (12&ZD119)