Review
XU Yan, WANG Bin, LI Jin-tao, SUN Chun-ming
2008, 22(1): 44-50.
Feature selection (FS) plays an important role in text categorization (TC). Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization, and existing experiments show that IG is one of the most effective of these methods. In this paper, we propose a feature selection method based on rough set theory. According to rough set theory, knowledge about a universe of objects may be defined as classifications based on certain properties of the objects; that is, rough set theory assumes that knowledge is the ability to partition objects. We quantify this ability to classify objects and call the resulting amount the knowledge quantity. Building on this quantification, we introduce the notion of “knowledge gain” and propose a knowledge gain-based feature selection method (the KG method). Experiments on the NewsGroup and OHSUMED collections show that KG performs better than IG, especially under extremely aggressive feature reduction.
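For readers unfamiliar with the IG baseline that KG is compared against, the following is a minimal sketch of information-gain feature selection. It is not the authors' KG method, whose knowledge-quantity formula is not given in this abstract. The sketch assumes binary term presence, with each document represented as a set of terms, and computes IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]. The function names `entropy`, `information_gain`, and `select_features` are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """IG of a term, assuming binary presence/absence (hypothetical helper).

    IG(t) = H(C) - [P(t) * H(C | t) + P(not t) * H(C | not t)]
    """
    n = len(docs)
    h_c = entropy(list(Counter(labels).values()))
    # Split the class labels by whether the document contains the term.
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    conditional = 0.0
    for subset in (with_t, without_t):
        if subset:
            conditional += len(subset) / n * entropy(list(Counter(subset).values()))
    return h_c - conditional


def select_features(docs, labels, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


if __name__ == "__main__":
    docs = [{"ball", "goal"}, {"ball", "team"}, {"vote", "party"}, {"vote", "law"}]
    labels = ["sport", "sport", "politics", "politics"]
    print(select_features(docs, labels, 2))  # ['ball', 'vote']
```

Here "ball" and "vote" each separate the two classes perfectly, so their IG equals the full class entropy of 1 bit. A very small `k` relative to the vocabulary corresponds to the "extremely aggressive reduction" setting mentioned above.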