LIU Feng-cheng, HUANG De-gen, JIANG Peng. Chinese Word Sense Disambiguation with AdaBoost.MH Algorithm[J]. Journal of Chinese Information Processing, 2006, 20(3): 8-15.
Chinese Word Sense Disambiguation with AdaBoost.MH Algorithm
LIU Feng-cheng, HUANG De-gen, JIANG Peng
Department of Computer Science and Technology, Dalian University of Technology
Abstract: This paper presents a supervised approach to Chinese word sense disambiguation based on the AdaBoost.MH learning algorithm. AdaBoost.MH boosts the accuracy of weak rules, here decision stumps (one-level decision trees), by repeatedly calling a weak learner and combining its outputs into a more accurate final rule. A simple stopping criterion is also presented. To extract more contextual information, we introduce a new knowledge source, semantic categorization, in addition to two classical knowledge sources: the part of speech of neighboring words and local collocations. The new knowledge source helps improve both the learning efficiency of the algorithm and the accuracy of disambiguation. Using these knowledge sources, AdaBoost.MH achieves 85.75% disambiguation accuracy in an open test on 6 typical polysemous words and 20 polysemous words from the SENSEVAL-3 Chinese corpus.
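The boosting loop described in the abstract can be illustrated with a minimal sketch. It follows the general confidence-rated AdaBoost.MH scheme with decision stumps over binary context features; it is not the authors' implementation. The feature strings (`pos-1=…`, `col=…`, `sem+1=…`), the sense labels, the smoothing constant, and the fixed number of rounds are all assumptions made for illustration. In particular, the paper's own stopping criterion is not specified in the abstract and is not reproduced here.

```python
import math

def train_adaboost_mh(examples, labels, rounds=5):
    """examples: list of (set of feature strings, sense); labels: all senses."""
    m, k = len(examples), len(labels)
    features = sorted(set().union(*(feats for feats, _ in examples)))
    # Y[i][l] is +1 if labels[l] is the correct sense of example i, else -1.
    Y = [[1 if sense == lab else -1 for lab in labels] for _, sense in examples]
    # Start from a uniform distribution over (example, label) pairs.
    D = [[1.0 / (m * k)] * k for _ in range(m)]
    eps = 1.0 / (m * k)  # smoothing term (an assumed value) to keep the log finite
    model = []
    for _ in range(rounds):  # fixed round count: an assumption, not the paper's criterion
        best = None
        for f in features:
            # W[b][l] = [weight with Y = -1, weight with Y = +1]; b = feature absent/present.
            W = [[[0.0, 0.0] for _ in range(k)] for _ in range(2)]
            for i, (feats, _) in enumerate(examples):
                b = 1 if f in feats else 0
                for l in range(k):
                    W[b][l][(Y[i][l] + 1) // 2] += D[i][l]
            # Normalization factor Z of this stump; smaller is better.
            z = 2.0 * sum(math.sqrt(W[b][l][0] * W[b][l][1])
                          for b in range(2) for l in range(k))
            if best is None or z < best[0]:
                best = (z, f, W)
        _, f, W = best
        # Confidence-rated prediction for each (block, label) pair.
        c = [[0.5 * math.log((W[b][l][1] + eps) / (W[b][l][0] + eps))
              for l in range(k)] for b in range(2)]
        model.append((f, c))
        # Reweight: pairs the stump gets wrong gain weight, then renormalize.
        total = 0.0
        for i, (feats, _) in enumerate(examples):
            b = 1 if f in feats else 0
            for l in range(k):
                D[i][l] *= math.exp(-Y[i][l] * c[b][l])
                total += D[i][l]
        for i in range(m):
            for l in range(k):
                D[i][l] /= total
    return model

def predict(model, labels, feats):
    """Sum the stump confidences per label and return the best-scoring sense."""
    scores = [0.0] * len(labels)
    for f, c in model:
        b = 1 if f in feats else 0
        for l in range(len(labels)):
            scores[l] += c[b][l]
    return labels[max(range(len(labels)), key=scores.__getitem__)]

# Toy data with hypothetical features: neighbor POS, collocation, semantic category.
labels = ["finance", "shore"]
train = [
    ({"pos-1=v", "col=open_account", "sem+1=money"}, "finance"),
    ({"pos-1=n", "col=open_account", "sem+1=money"}, "finance"),
    ({"pos-1=v", "col=river", "sem+1=water"}, "shore"),
    ({"pos-1=n", "col=river", "sem+1=water"}, "shore"),
]
model = train_adaboost_mh(train, labels)
print(predict(model, labels, {"col=open_account", "pos-1=v"}))
```

Each round picks the single feature test that minimizes the normalization factor, so informative collocation or semantic-category features tend to be selected before weakly informative part-of-speech features.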