Abstract
Quantifying the semantic relations between words is an essential foundational task in natural language processing. Semantic relations between words fall broadly into three types: equivalence (synonymy), hyponymy, and relevance. Existing quantitative work has focused mainly on the equivalence relation, that is, on word similarity. In this paper, we propose a new approach to quantifying the semantic relevance relation between words. We construct a bipartite graph of lexical relevance relations and use it to compute and quantify indirect relevance between words. As a result, the approach can measure the relevance of word pairs that never co-occur in the corpus. Experimental results show that our approach is more feasible than computing relevance from mutual information alone. Moreover, for a given word, the approach yields a relatively reasonable trend in the quantified relevance of other words to it.
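The abstract does not give the paper's formulas. The sketch below is a hedged illustration of the contrast it draws, under stated assumptions. It first computes sentence-level pointwise mutual information (PMI), which has no value for a word pair that never co-occurs. It then scores such pairs through a word/context bipartite graph. The construction is an assumption chosen for illustration and is not the authors' method: edges carry positive PMI weights, and indirect relevance is taken as a two-step random-walk probability. The names `pmi_table`, `build_bipartite` and `indirect_relevance`, and the toy corpus, are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pmi_table(sentences):
    """Sentence-level PMI for every word pair that co-occurs at least once.

    P(x) and P(x, y) are estimated as the fraction of sentences containing x
    (respectively both x and y). Pairs that never co-occur get no entry,
    which is the gap mutual information alone cannot fill.
    """
    n = len(sentences)
    word_df = Counter()
    pair_df = Counter()
    for sent in sentences:
        vocab = sorted(set(sent))
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(vocab, 2))
    pmi = {}
    for pair, c_xy in pair_df.items():
        x, y = tuple(pair)
        pmi[pair] = math.log2(c_xy * n / (word_df[x] * word_df[y]))
    return pmi


def build_bipartite(pmi):
    """Hypothetical graph: left nodes are target words, right nodes are context words.

    An edge a -> v carries weight max(PMI(a, v), 0). PMI is symmetric,
    so the same weight is stored for v -> a.
    """
    graph = defaultdict(dict)
    for pair, value in pmi.items():
        if value > 0:
            x, y = tuple(pair)
            graph[x][y] = value
            graph[y][x] = value
    return graph


def indirect_relevance(graph, a, b):
    """Assumed score: probability of the walk a (left) -> v (right) -> b (left).

    Each step follows an edge with probability proportional to its weight,
    so words with many strong links do not dominate the score.
    """
    if a not in graph or b not in graph:
        return 0.0
    deg_a = sum(graph[a].values())
    score = 0.0
    for v, w_av in graph[a].items():
        w_vb = graph[v].get(b, 0.0)
        if w_vb:
            score += (w_av / deg_a) * (w_vb / sum(graph[v].values()))
    return score


# Toy corpus (hypothetical): "doctor" and "nurse" never share a sentence.
corpus = [
    ["doctor", "hospital"],
    ["nurse", "hospital"],
    ["doctor", "medicine"],
    ["nurse", "medicine"],
    ["car", "road"],
    ["bus", "road"],
]
pmi = pmi_table(corpus)
print(frozenset({"doctor", "nurse"}) in pmi)   # False: no direct PMI value
graph = build_bipartite(pmi)
print(indirect_relevance(graph, "doctor", "nurse"))  # 0.5, via hospital and medicine
print(indirect_relevance(graph, "doctor", "car"))    # 0.0, no shared context word
```

In this sketch, a pair without co-occurrence gets a non-zero score only when the two words share at least one context word. This matches, in spirit, the abstract's claim that indirect relevance can be measured for pairs absent from the corpus. Other path scores, such as the maximum over intermediate words or longer walks, would be equally plausible choices.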
Key words
computer application /
Chinese information processing /
semantic relation between words /
relevance relation /
mutual information /
bipartite graph /
measurement method
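Funding
Supported by the Major Program of the National Natural Science Foundation of China (60496326) and the Science and Technology Program of the Jiangxi Provincial Department of Education ([2006]178).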