FANG Zhi-fei, LIN Hong-fei, YANG Zhi-hao, ZHAO Jing
2006, 20(2): 26-34.
Genre is a category defined by external criteria, so genre classification differs from content-based classification. This paper presents a new mechanism for the automatic classification of Chinese text genres. Features, which are the essential factor in the mechanism, are described in two ways: as word sets, such as affective words and political words derived from related dictionaries and corpus statistics, and as rules, such as document identifiers and items. A parametric-distribution approach, based on the correlation and variance of the features, is applied to evaluate candidate features across genres and to extract those used for genre classification. A Support Vector Machine is then used as the learning algorithm to build the classifier. An experiment on a text corpus consisting of five genres shows that the mechanism can improve classification precision.
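The two feature types and the variance-based feature evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the word sets, the two regular-expression rules, the function names, and the scoring formula (variance of per-genre means divided by mean within-genre variance) are all assumptions chosen for illustration, and the SVM training step is omitted.

```python
import re
from statistics import mean, pvariance

# Hypothetical word sets; the paper derives its own from dictionaries
# and corpus statistics, and does not list them in the abstract.
AFFECTIVE_WORDS = {"高兴", "悲伤", "愤怒"}
POLITICAL_WORDS = {"政府", "政策", "选举"}

# Hypothetical rule features: a document-identifier line and numbered items.
DOC_ID_RULE = re.compile(r"^文号[:：]", re.MULTILINE)
ITEM_RULE = re.compile(r"^[ \t]*\d+[.、]", re.MULTILINE)


def extract_features(tokens, raw_text):
    """Map one document (pre-segmented tokens plus raw text) to named features."""
    n = max(len(tokens), 1)  # avoid division by zero on empty documents
    return {
        # Word-set features: share of tokens that fall in each set.
        "affective_ratio": sum(t in AFFECTIVE_WORDS for t in tokens) / n,
        "political_ratio": sum(t in POLITICAL_WORDS for t in tokens) / n,
        # Rule features: presence of an identifier line, count of numbered items.
        "has_doc_id": 1.0 if DOC_ID_RULE.search(raw_text) else 0.0,
        "item_count": float(len(ITEM_RULE.findall(raw_text))),
    }


def separation_score(values_by_genre):
    """Assumed scoring: variance of per-genre means over mean within-genre variance.

    A higher score means the feature separates the genres more clearly.
    """
    genre_means = [mean(v) for v in values_by_genre.values()]
    between = pvariance(genre_means)
    within = mean(pvariance(v) for v in values_by_genre.values())
    return between / (within + 1e-9)  # small constant guards against zero variance
```

In a full pipeline, features with high scores would be kept and the resulting vectors passed to an SVM implementation for training. The abstract does not specify the SVM kernel or parameters.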