LI Sujian, SONG Tao, GAO Jie, YAO Pengyue, LI Wenjie. A Method of Lexical Domain Analysis Based on Usage Discrepancy[J]. Journal of Chinese Information Processing, 2009, 23(6): 72-79.
A Method of Lexical Domain Analysis Based on Usage Discrepancy
LI Sujian¹, SONG Tao¹, GAO Jie², YAO Pengyue¹, LI Wenjie³
1. Institute of Computational Linguistics, Peking University, Beijing 100871, China; 2. Foreign Language Department, Heze University, Heze, Shandong 274105, China; 3. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Abstract: Domain knowledge is usually represented through domain lexicons, which makes domain analysis of terms and term components a natural task. In this paper, we propose a novel domain analysis method based on the discrepancy of lexical usage. Starting from word segmentation results, we introduce a link analysis method to compute the usage degree of each word in several typical domain corpora. Then, by analyzing the discrepancy of word usage across domains, we acquire the domain term components, namely those with larger usage discrepancy. Experiments on several domains, including military and entertainment, show that the method achieves better results than the commonly used tf×idf and Bootstrapping methods.
Key words: artificial intelligence; natural language processing; domain analysis; domain term; domain term component; link analysis; usage discrepancy
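The two steps named in the abstract (a link-analysis usage degree per domain, followed by a cross-domain discrepancy) can be illustrated with a minimal sketch. The abstract does not specify the link analysis algorithm or the discrepancy formula, so everything below is an assumption made for illustration: a PageRank-style iteration over a graph linking adjacent words in segmented sentences stands in for the usage degree, and the discrepancy is taken as a word's score in the target domain minus its mean score in the other domains. The function names `usage_scores` and `usage_discrepancy` and the toy corpora are hypothetical.

```python
from collections import defaultdict


def usage_scores(sentences, damping=0.85, iterations=50):
    """PageRank-style score per word (an assumed stand-in for 'usage degree')."""
    # Link every pair of adjacent words in each segmented sentence (undirected).
    neighbors = defaultdict(set)
    for words in sentences:
        for left, right in zip(words, words[1:]):
            if left != right:
                neighbors[left].add(right)
                neighbors[right].add(left)
    vocab = sorted(neighbors)
    if not vocab:
        return {}
    n = len(vocab)
    scores = {w: 1.0 / n for w in vocab}
    for _ in range(iterations):
        updated = {}
        for w in vocab:
            # Each neighbor shares its score equally among its own links.
            incoming = sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            updated[w] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


def usage_discrepancy(domain_scores, target):
    """Assumed measure: target-domain score minus mean score in other domains."""
    others = [d for d in domain_scores if d != target]
    if not others:
        raise ValueError("need at least one other domain for comparison")
    result = {}
    for word, score in domain_scores[target].items():
        background = sum(domain_scores[d].get(word, 0.0) for d in others) / len(others)
        result[word] = score - background
    return result


# Toy, already-segmented corpora (invented data, for illustration only).
corpora = {
    "military": [
        ["army", "deploys", "missile"],
        ["missile", "hits", "target"],
        ["army", "tests", "missile"],
    ],
    "entertainment": [
        ["singer", "releases", "album"],
        ["album", "hits", "chart"],
        ["singer", "tests", "microphone"],
    ],
}
domain_scores = {name: usage_scores(sents) for name, sents in corpora.items()}
discrepancy = usage_discrepancy(domain_scores, "military")
ranked = sorted(discrepancy.items(), key=lambda kv: kv[1], reverse=True)
```

In this toy setup, words that occur in both corpora (such as "hits") receive a small discrepancy, while words central to only one domain rank near the top of `ranked`. Whether the paper uses this graph construction or this difference measure is not stated in the abstract.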