Coarse-Grained Word Sense Disambiguation Using Features Described in the Lexicon

WU Yun-fang, JIN Peng, GUO Tao

Journal of Chinese Information Processing, 2007, Vol. 21, Issue 2: 1-8.
Review


Abstract

This paper presents a simple but effective feature-based approach to coarse-grained Chinese word sense disambiguation (WSD), using the attribute features with which the Grammatical Knowledge-base of Contemporary Chinese describes polysemous words. The test data are 4,996 homograph instances of 155 words from the sense-tagged People’s Daily corpus. A Naive Bayes (NB) classifier is also tested as a statistical method for comparison. At the homograph level, the feature-based approach achieves a precision of 90%, which is comparable to the NB classifier, but its recall is low. The approach has two main advantages: 1) it is not affected by the size of the sense-tagged corpus, and 2) it can disambiguate some specific words with a precision of 100%. The features appropriate for different parts of speech in Chinese WSD are also discussed. This paper demonstrates that sense features described in the lexicon are worth including in WSD.
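The trade-off described in the abstract (high precision, low recall) is typical of a matcher that answers only when lexicon features clearly identify a sense and abstains otherwise. The following is a minimal sketch of that idea, not the authors' implementation: the `LEXICON` entries, the feature names `pos` and `left_word`, the English toy word, and the definition of recall as correct answers over all instances are all assumptions made for illustration, and they do not reflect the actual schema of the Grammatical Knowledge-base of Contemporary Chinese.

```python
# Hypothetical lexicon: each sense lists attribute features that must all be
# observed in the context for that sense to be chosen. This is an illustrative
# stand-in, not the real dictionary format.
LEXICON = {
    "bank": {
        "bank/finance": {"pos": "n", "left_word": "savings"},
        "bank/river": {"pos": "n", "left_word": "river"},
    },
}


def disambiguate(word, context):
    """Return the unique sense whose features all match, else None (abstain)."""
    matches = [
        sense
        for sense, feats in LEXICON.get(word, {}).items()
        if all(context.get(key) == value for key, value in feats.items())
    ]
    return matches[0] if len(matches) == 1 else None


def precision_recall(instances):
    """Score (word, context, gold_sense) triples.

    Precision counts only answered instances; recall (by assumption here)
    is the number of correct answers divided by all instances.
    """
    answered = 0
    correct = 0
    for word, context, gold in instances:
        predicted = disambiguate(word, context)
        if predicted is not None:
            answered += 1
            correct += int(predicted == gold)
    precision = correct / answered if answered else 0.0
    recall = correct / len(instances) if instances else 0.0
    return precision, recall


# Toy data: two contexts carry a lexicon feature, two do not.
TOY = [
    ("bank", {"pos": "n", "left_word": "savings"}, "bank/finance"),
    ("bank", {"pos": "n", "left_word": "river"}, "bank/river"),
    ("bank", {"pos": "n", "left_word": "the"}, "bank/finance"),
    ("bank", {"pos": "n", "left_word": "a"}, "bank/river"),
]

if __name__ == "__main__":
    p, r = precision_recall(TOY)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Because the matcher needs no sense-tagged training data, its behaviour does not depend on corpus size. The cost is that contexts lacking any listed feature go unanswered, which lowers recall.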

Key words

artificial intelligence / natural language processing / feature / word sense / word sense disambiguation / Naive Bayes classifier

Cite this article

WU Yun-fang, JIN Peng, GUO Tao. Coarse-Grained Word Sense Disambiguation Using Features Described in the Lexicon. Journal of Chinese Information Processing. 2007, 21(2): 1-8


Funding

Supported by the National 973 Program of China (2004CB318102)