Abstract
Word similarity computation is a key problem in natural language processing tasks such as machine translation and information retrieval. Traditional methods do not adequately account for the constraints that context places on word meaning, so they cannot effectively distinguish the differences in word similarity that arise when the context changes. This paper introduces the concept of the membership function from fuzzy mathematics to compute the fuzzy degree of significance of a word's context information and, combining it with a HowNet-based semantic similarity measure, proposes a context-based method for word similarity computation. Experimental results indicate that the approach can effectively distinguish semantically similar words according to their context.
Key words: computer application; Chinese information processing; context; fuzzy degree of significance; word similarity computation; membership function
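Funding
National Natural Science Foundation of China (60842005); Science and Technology Research Project of the Education Department of Liaoning Province (2007T140)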