Review
WANG Bin, PAN Wen-feng
2005, 19(5): 3-12.
The volume of junk email on the Internet has grown tremendously in the past few years and is causing serious problems. Content-based filtering is one of the mainstream technologies in use so far. This paper provides an overview of the state of the art in this research field, including benchmark corpora, evaluation methods, and filtering approaches. Many filtering approaches, including Ripper, Decision Trees, Rough Sets, Rocchio, Boosting, Bayes, kNN, SVM, and Winnow, are discussed and compared. The experimental results show that some approaches, such as Boosting, Flexible Bayes, SVM, and Winnow, can achieve very good results on research corpora. However, much more work is needed before these approaches are ready for practical use.