A Log-Likelihood-Ratio-Test-Based Feature Word Selection Approach in Text Categorization

Li Guochen

Journal of Chinese Information Processing, 1999, Vol. 13, Issue 4: 17-22.
Review


Abstract

This paper applies the log-likelihood ratio test to feature word selection in text categorization. Traditional methods conduct the frequency test, the concentration (salience) test, and the dispersion (distribution) test independently of one another. The proposed approach instead uses a covariance matrix to coordinate the relationships among these statistics, so that they are integrated into a single whole. Experiments show that this approach is superior to the traditional approach, in which the frequency, concentration, and dispersion tests are conducted independently.
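The abstract does not state the test statistic itself. As a rough illustration only, the sketch below assumes that each candidate word is described by a three-dimensional vector of (frequency, concentration, dispersion) scores, and that feature words and non-feature words are each modeled by a multivariate Gaussian with a full covariance matrix. The log-likelihood ratio between the two models then scores a word. The Gaussian assumption, all function names, and the toy data are hypothetical and are not taken from the paper.

```python
import math

def mean_and_cov(rows):
    """Sample mean vector and (unbiased) covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov

def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def gaussian_log_density(x, mu, cov):
    """Log density of a multivariate Gaussian with full covariance."""
    d = len(x)
    low = cholesky(cov)
    # Forward substitution: solve low * y = (x - mu).
    y = []
    for i in range(d):
        partial = sum(low[i][k] * y[k] for k in range(i))
        y.append((x[i] - mu[i] - partial) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    mahalanobis_sq = sum(v * v for v in y)
    return -0.5 * (d * math.log(2 * math.pi) + log_det + mahalanobis_sq)

def log_likelihood_ratio(x, feature_model, background_model):
    """Positive values favor the (assumed) feature-word model."""
    return (gaussian_log_density(x, *feature_model)
            - gaussian_log_density(x, *background_model))

# Hypothetical (frequency, concentration, dispersion) scores, made up
# purely for illustration; they are not data from the paper.
feature_words = [
    [0.80, 0.90, 0.70], [0.70, 0.80, 0.80], [0.90, 0.70, 0.60],
    [0.60, 0.85, 0.75], [0.75, 0.75, 0.90],
]
other_words = [
    [0.20, 0.30, 0.20], [0.30, 0.20, 0.30], [0.10, 0.25, 0.15],
    [0.25, 0.35, 0.10], [0.15, 0.15, 0.25],
]

feature_model = mean_and_cov(feature_words)
background_model = mean_and_cov(other_words)

candidate = [0.70, 0.85, 0.70]
score = log_likelihood_ratio(candidate, feature_model, background_model)
print("keep as feature word" if score > 0 else "discard", score)
```

Because the off-diagonal covariance entries enter the density, the three statistics are evaluated jointly rather than thresholded one at a time, which is the contrast with independent tests that the abstract draws.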

Key words

Text categorization / Feature word selection / Log-likelihood ratio test

Cite this article

Li Guochen. A Log-Likelihood-Ratio-Test-Based Feature Word Selection Approach in Text Categorization. Journal of Chinese Information Processing. 1999, 13(4): 17-22
