WANG Jin, CHEN Qun-xiu. Clustering of Chinese Adjectives Based on the Machine Tractable Dictionary of Contemporary Chinese Predicate Adjectives[J]. Journal of Chinese Information Processing, 2007, 21(3): 40-46.
Clustering of Chinese Adjectives Based on the Machine Tractable Dictionary of Contemporary Chinese Predicate Adjectives
WANG Jin, CHEN Qun-xiu
State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract: This paper presents a method for grouping adjectives according to their distribution in corpora, based on the Machine Tractable Dictionary of Contemporary Chinese Predicate Adjectives. We describe how our system extracts three kinds of information for each adjective (the nouns it modifies, its synonyms, and its antonyms) and uses this knowledge to compute a similarity measure between two adjectives. The measure also draws on literal similarity and on the route weight from each adjective to another, which to some extent alleviates the sparse-data problem. We also show how a clustering algorithm can use these similarities to produce groups of adjectives, and we present the results our system produces for a sample set of adjectives.
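The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the weights, the threshold, and all function names are assumptions made for illustration, and the abstract does not give the actual formulas. Literal similarity is approximated here by character overlap, the synonym feature is reduced to a direct-link flag, and antonyms and route weights are left out.

```python
from itertools import combinations

# Hypothetical toy lexicon (not from the paper): adjective -> modified nouns, synonyms.
LEXICON = {
    "高兴": {"nouns": {"孩子", "心情"}, "synonyms": {"快乐"}},  # glad
    "快乐": {"nouns": {"孩子", "生活"}, "synonyms": {"高兴"}},  # happy
    "寒冷": {"nouns": {"天气", "冬天"}, "synonyms": {"冰冷"}},  # cold
    "冰冷": {"nouns": {"天气", "水"}, "synonyms": {"寒冷"}},    # icy
}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def literal_similarity(w1, w2):
    """Assumed stand-in for literal similarity: overlap of the words' characters."""
    return jaccard(set(w1), set(w2))


def similarity(w1, w2, lexicon, weights=(0.5, 0.3, 0.2)):
    """Weighted mix of shared nouns, a synonym link, and literal similarity.

    The weights are illustrative assumptions, not values from the paper.
    """
    e1, e2 = lexicon[w1], lexicon[w2]
    noun_sim = jaccard(e1["nouns"], e2["nouns"])
    syn_link = 1.0 if w2 in e1["synonyms"] or w1 in e2["synonyms"] else 0.0
    w_noun, w_syn, w_lit = weights
    return w_noun * noun_sim + w_syn * syn_link + w_lit * literal_similarity(w1, w2)


def average_link(c1, c2, lexicon):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(similarity(a, b, lexicon) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster(words, lexicon, threshold=0.4):
    """Greedy average-link agglomerative clustering, stopping below the threshold."""
    clusters = [{w} for w in words]
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average-link similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]], lexicon),
        )
        if average_link(clusters[i], clusters[j], lexicon) < threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


groups = cluster(list(LEXICON), LEXICON)
print(groups)
```

Mixing several knowledge sources in one score is what lets a pair with few shared nouns still be grouped, which is the sparse-data motivation the abstract mentions; a real system would need to tune the weights and stopping criterion on data.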