1. Computer and Control Engineering Faculty, North University of China, Taiyuan, Shanxi 030051, China; 2. Applied Linguistics Institute, Beijing Language and Culture University, Beijing 100083, China
Abstract: Words, as the smallest semantic units, have a complex relationship with text domains. In particular, it is often difficult to assign an exact domain to commonly used words. In real applications, however, it is not always necessary to establish a clear-cut relationship between a word and a domain; satisfactory results can be achieved by quantifying the domain property of words instead. In this paper, we propose an unsupervised method for quantifying the domain property of words, based on word association information in a large-scale corpus. We validate the proposed domain property values by comparing them against the classical TF*IDF measure in a topic detection application.