Hierarchical Method for Chinese Document Classification

Zhan Xuegang, Lin Hongfei, Yao Tianshun

Journal of Chinese Information Processing, 1999, Vol. 13, Issue (6): 21-26.
Review

Abstract

Existing statistical document classification systems often ignore the hierarchical structure of the predefined topic categories. This makes it difficult to decide which category a document belongs to when the candidate categories are similar. In this article, we propose a method based on the vector space model that classifies documents top-down, level by level, following the hierarchical structure of the categories. The purpose is to improve classification precision. In addition, using a concept dictionary (thesaurus), we map the synonyms and lower-level concepts found in a document to single concept words. These concept words form a very small feature set, which reduces the dimension of the feature vector space and therefore the computation required by the classification system. Finally, by analyzing the category hierarchy, we compress the feature vectors, which reduces the computation from another aspect.
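The two ideas in the abstract, mapping words to concept words and classifying top-down through the category hierarchy, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the concept dictionary `CONCEPT_OF`, the hierarchy `CHILDREN`, the per-category centroid vectors `CENTROIDS`, and the function names are all hypothetical, and cosine similarity against category centroids is assumed as the vector space comparison.

```python
import math
from collections import Counter

# Hypothetical concept dictionary: surface word -> concept word.
CONCEPT_OF = {
    "match": "game", "tournament": "game",
    "striker": "football_term", "penalty": "football_term",
    "dunk": "basketball_term", "rebound": "basketball_term",
    "shares": "security", "bond": "security",
    "loan": "credit", "mortgage": "credit",
}

# Hypothetical category hierarchy: parent -> list of child categories.
CHILDREN = {
    "root": ["sports", "economy"],
    "sports": ["football", "basketball"],
    "economy": ["stocks", "banking"],
}

# Hypothetical centroid vectors over concept words, one per category.
# Each centroid keeps only the concepts useful at its level.
CENTROIDS = {
    "sports": {"game": 1.0, "football_term": 1.0, "basketball_term": 1.0},
    "economy": {"security": 1.0, "credit": 1.0},
    "football": {"game": 1.0, "football_term": 2.0},
    "basketball": {"game": 1.0, "basketball_term": 2.0},
    "stocks": {"security": 1.0},
    "banking": {"credit": 1.0},
}


def to_concept_vector(tokens, concept_of):
    """Count concept words; tokens missing from the dictionary are dropped."""
    return Counter(concept_of[t] for t in tokens if t in concept_of)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify_top_down(vec, children, centroids, root="root"):
    """Descend from the root, choosing the most similar child at each level."""
    path, node = [], root
    while children.get(node):
        node = max(children[node], key=lambda c: cosine(vec, centroids[c]))
        path.append(node)
    return path


tokens = ["the", "striker", "scored", "a", "penalty", "in", "the", "match"]
doc_vec = to_concept_vector(tokens, CONCEPT_OF)
print(classify_top_down(doc_vec, CHILDREN, CENTROIDS))  # ['sports', 'football']
```

In this sketch the document is compared with only two centroids per level instead of with every leaf category at once, and the vectors are indexed by a handful of concept words rather than by every distinct word. Keeping only level-relevant concepts in each centroid is one possible reading of the feature vector compression mentioned in the abstract, not a confirmed detail of the paper.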

Key words

Document classification / Vector space model / Topic category hierarchy

Cite this article

Zhan Xuegang, Lin Hongfei, Yao Tianshun. Hierarchical Method for Chinese Document Classification. Journal of Chinese Information Processing. 1999, 13(6): 21-26
