Abstract
Concept acquisition from corpora is an important research topic in natural language understanding. This paper presents a representation of noun concepts based on Chinese classifier words: each concept is modeled as a vector with one component per classifier word. We design and implement a weighting scheme that assigns each classifier word a weight within a concept. Clustering experiments then explore how much classifiers contribute to distinguishing noun semantics. The results indicate that the classifier-based representation is effective and separates most noun concept classes.
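As a rough illustration of the vector representation described in the abstract, the sketch below builds one vector per noun from classifier-noun pairs and compares nouns by cosine similarity. The abstract does not give the paper's weighting formula, so the TF-IDF-style weight used here is an assumption, and the co-occurrence pairs, function names, and variable names are all invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical classifier-noun pairs (invented for illustration, not corpus data).
PAIRS = [
    ("条", "河"), ("条", "河"), ("条", "路"), ("条", "鱼"),
    ("本", "书"), ("本", "书"), ("本", "词典"),
    ("部", "词典"), ("部", "书"),
    ("尾", "鱼"),
]


def build_vectors(pairs):
    """Return {noun: {classifier: weight}}.

    The weight is a TF-IDF-style stand-in (an assumption, not the paper's
    scheme): relative frequency of the classifier with the noun, scaled down
    for classifiers that combine with many different nouns.
    """
    counts = defaultdict(Counter)
    for classifier, noun in pairs:
        counts[noun][classifier] += 1

    n_nouns = len(counts)
    # Number of distinct nouns each classifier occurs with.
    noun_freq = Counter(c for vec in counts.values() for c in vec)

    vectors = {}
    for noun, vec in counts.items():
        total = sum(vec.values())
        vectors[noun] = {
            c: (k / total) * math.log(1 + n_nouns / noun_freq[c])
            for c, k in vec.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = build_vectors(PAIRS)
sim_book_dict = cosine(vectors["书"], vectors["词典"])   # share 本 and 部
sim_book_river = cosine(vectors["书"], vectors["河"])    # share no classifier
```

In this toy data, nouns that take the same classifiers end up close together, while nouns with no classifier in common have similarity zero. A pairwise similarity of this kind is the sort of input a clustering step could consume.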
Key words
Concept acquisition /
classifier-noun collocation /
classifier words /
clustering
Funding
National Natural Science Foundation of China (No. 61300152)