Abstract
Word sense disambiguation has long been regarded as one of the difficult problems in natural language processing. This paper uses the collocation examples provided by the machine-readable dictionary 《现代汉语辞海》 (XianDaiHanYuCiHai) as the initial collocation knowledge for polysemous words, and automatically enlarges the collocation set using suitable statistical and self-organizing methods. To safeguard learning quality, the length of the context window is increased gradually during learning. A multivariate maximum log-likelihood ratio word sense disambiguation algorithm that uses a collocation statistics table is proposed. Finally, the proposed method is evaluated experimentally, and the experiments show that the algorithm achieves relatively high accuracy.
Key words
natural language processing /
word sense disambiguation /
self-organizing method /
collocation
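The abstract names three ingredients: seed collocations taken from a dictionary, a context window that widens gradually during learning, and a maximum log-likelihood ratio decision over a collocation statistics table. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. The function names (`window`, `llr_scores`, `disambiguate`, `bootstrap`), the parameters (`alpha`, `margin`, `max_window`), the additive smoothing, the one-versus-rest form of the ratio, and the confidence margin are all assumptions introduced for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def window(tokens, idx, size):
    """Return the words within `size` positions of tokens[idx], excluding it."""
    return tokens[max(0, idx - size):idx] + tokens[idx + 1:idx + 1 + size]


def llr_scores(table, context, alpha=1.0):
    """Score each sense by a smoothed log-likelihood ratio summed over the context.

    `table` maps sense -> Counter of collocate counts (the collocation
    statistics table). The one-vs-rest ratio and add-alpha smoothing are
    assumptions, not details taken from the paper.
    """
    vocab = {w for counts in table.values() for w in counts} | set(context)
    totals = {s: sum(c.values()) for s, c in table.items()}
    scores = {}
    for sense in table:
        other_total = sum(t for s, t in totals.items() if s != sense)
        score = 0.0
        for w in context:
            own = table[sense][w]
            other = sum(table[s][w] for s in table if s != sense)
            p_own = (own + alpha) / (totals[sense] + alpha * len(vocab))
            p_other = (other + alpha) / (other_total + alpha * len(vocab))
            score += math.log(p_own / p_other)
        scores[sense] = score
    return scores


def disambiguate(table, context, margin=0.0):
    """Pick the sense with the maximum score; return None if the lead is too small."""
    scores = llr_scores(table, context)
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    gap = scores[best] - scores[ranked[1]] if len(ranked) > 1 else float("inf")
    return (best if gap > margin else None), scores


def bootstrap(seeds, corpus, target, max_window=3, margin=1.0):
    """Grow the collocation table, widening the window by one word per pass."""
    table = {sense: Counter(words) for sense, words in seeds.items()}
    for size in range(1, max_window + 1):
        learned = []
        for tokens in corpus:
            for i, tok in enumerate(tokens):
                if tok != target:
                    continue
                ctx = window(tokens, i, size)
                sense, _ = disambiguate(table, ctx, margin)
                if sense is not None:  # keep only confident decisions
                    learned.append((sense, ctx))
        # Update after the pass so decisions within a pass share one table.
        for sense, ctx in learned:
            table[sense].update(ctx)
    return table
```

In this sketch, `seeds` plays the role of the dictionary's collocation examples (one word list per sense), and `corpus` is a list of tokenized sentences. Starting with a narrow window means early additions come only from words adjacent to the target, which is one plausible reading of why the abstract widens the window gradually.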
References
[1] Hearst, Marti. Noun Homograph Disambiguation Using Local Context in Large Corpora. In: Proceedings, ARPA Human Language Technology Workshop, 1993
[2] Nancy Ide, Jean Veronis. Computational Linguistics Special Issue on Word Sense Disambiguation. Computational Linguistics, 1998, 1-42
[3] Yarowsky D. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In: Proceedings of 33rd Annual Meeting of ACL, Cambridge, Massachusetts, USA, 1995, 181-188
[4] Yarowsky D. Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. In: Proc. 32nd Annual Meeting of Association for Computational Linguistics, 1994, 88-95
[5] Ni Wenjie, Zhu Yiming, Gao Yunqi, et al. 现代汉语辞海 (XianDaiHanYuCiHai). Beijing: 人民中国出版社, 1994