Review
LU Song, LI Xiao-li, BAI Shuo, WANG Shi
2000, 14(6): 8-13,20.
Text representation is a fundamental problem in information retrieval tasks such as text retrieval, automatic summarization, and search engines. tf.idf (term frequency, inverse document frequency), one of the term-weighting schemes in the Vector Space Model, is a popular text representation that yields good results in information retrieval. The proportion in which terms are distributed across a text collection is one of the most important factors in expressing the content of a text, but tf.idf cannot capture it. This paper therefore proposes an improved approach, tf.idf.IG, which remedies this defect using Information Gain from information theory. Used as one factor in the term-weighting scheme, the Information Gain of a term effectively weights the proportion of its distribution. In text classification, tf.idf.IG outperforms the original tf.idf.
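The abstract does not give the weighting formula, so the following is only a minimal sketch of one plausible reading: each term's tf.idf weight is multiplied by that term's Information Gain with respect to the class labels. The product form, the presence/absence definition of Information Gain, the function names, and the toy corpus are all assumptions made for illustration, not the authors' exact method.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """IG of a term's presence/absence with respect to the class labels.

    Assumption: IG is defined on document-level presence, as is common in
    feature selection; the paper may define it differently.
    """
    n = len(docs)
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    h_class = entropy(list(Counter(labels).values()))
    h_cond = sum(
        len(part) / n * entropy(list(Counter(part).values()))
        for part in (with_term, without_term)
        if part
    )
    return h_class - h_cond


def tf_idf_ig(docs, labels):
    """Return one {term: weight} dict per document, weight = tf * idf * IG."""
    n = len(docs)
    vocab = {t for doc in docs for t in doc}
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    ig = {t: information_gain(t, docs, labels) for t in vocab}
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) * ig[t] for t in tf})
    return weights


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["ball", "goal", "team"],
    ["ball", "team", "score"],
    ["stock", "market", "team"],
    ["stock", "price", "market"],
]
labels = ["sport", "sport", "finance", "finance"]
weights = tf_idf_ig(docs, labels)
```

In this toy corpus, "ball" appears only in the sport documents, so its Information Gain is high, while "team" is spread across both classes and is weighted down. Plain tf.idf reflects only how many documents contain a term, not how those documents are distributed over the classes.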