ZHU Hong, LIU Yang, YU Shiwen. Researches on Word Sense Discrimination of Chinese Adjective[J]. Journal of Chinese Information Processing, 2009, 23(6): 19-26.
Researches on Word Sense Discrimination of Chinese Adjective
ZHU Hong, LIU Yang, YU Shiwen
Institute of Computational Linguistics, Peking University, Beijing 100871, China; Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China
Abstract: Lexical knowledge acquisition is a bottleneck for many tasks, such as word sense disambiguation and lexical knowledge base construction. This paper introduces an automatic word sense discrimination method for Chinese mid- and high-frequency adjectives. We employ the EM algorithm and exploit three kinds of features: Chinese characters, contextual bag-of-words, and host-attribute pairs, rather than the less reliable syntactic information. We further optimize the morphology selection by utilizing HowNet. The experimental results show that the discriminated senses differ from those listed in Chinese lexicons and could be used for lexicon modification and expansion, even for other types of Chinese words.
Key words: computer application; Chinese information processing; knowledge acquisition; word sense discrimination; feature selection; EM algorithm
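As a rough illustration of how EM can group the contexts of an ambiguous word into senses, the sketch below fits a mixture of multinomials over bag-of-words contexts. It is not the authors' implementation: the function name `em_sense_clusters`, the smoothing constant `alpha`, the deterministic initialisation, and the English toy contexts (standing in for occurrences of one hypothetical ambiguous adjective) are all assumptions made for the example. The character features, host-attribute pairs, and HowNet-based selection mentioned in the abstract are not modelled.

```python
import math
from collections import Counter


def em_sense_clusters(contexts, k=2, iters=30, alpha=0.1):
    """Cluster bag-of-words contexts into k groups with EM (multinomial mixture).

    Hypothetical helper for illustration only; returns one cluster index per context.
    """
    vocab = sorted({w for ctx in contexts for w in ctx})
    counts = [Counter(ctx) for ctx in contexts]
    n = len(counts)

    # Assumed deterministic soft start: context i leans toward cluster i % k.
    resp = []
    for i in range(n):
        row = [0.9 if j == i % k else 0.1 for j in range(k)]
        total = sum(row)
        resp.append([x / total for x in row])

    for _ in range(iters):
        # M-step: re-estimate smoothed cluster priors and word distributions.
        prior = [(sum(r[j] for r in resp) + alpha) / (n + k * alpha) for j in range(k)]
        word_prob = []
        for j in range(k):
            totals = {w: alpha for w in vocab}
            for r, c in zip(resp, counts):
                for w, cnt in c.items():
                    totals[w] += r[j] * cnt
            z = sum(totals.values())
            word_prob.append({w: t / z for w, t in totals.items()})

        # E-step: posterior over clusters for each context, computed in log space.
        resp = []
        for c in counts:
            logp = [
                math.log(prior[j])
                + sum(cnt * math.log(word_prob[j][w]) for w, cnt in c.items())
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(x - top) for x in logp]
            s = sum(unnorm)
            resp.append([u / s for u in unnorm])

    # Hard assignment: most probable cluster for each context.
    return [max(range(k), key=lambda j: r[j]) for r in resp]


# Toy contexts (invented): two senses, "difficult" vs. "solid", in mixed order.
contexts = [
    ["exam", "problem", "question"],
    ["rock", "stone", "surface"],
    ["exam", "problem"],
    ["stone", "rock"],
    ["rock", "stone", "wood"],
    ["problem", "exam", "task"],
]
labels = em_sense_clusters(contexts, k=2)
```

EM only finds a local optimum, so the outcome depends on the starting point. The fixed start here keeps the example reproducible; a real system would typically try several random starts and would need some way to choose the number of clusters `k`.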