Abstract: This paper proposes a kernel-based nominal data classification (KNDC) method built on a new distance definition and a simple method for computing inner products. Its insensitivity to outliers and its ability to classify unbalanced data in real datasets are further analyzed. Computing the inner product of nominal data is difficult and is often regarded as the bottleneck of SVM. KNDC has a lower computational complexity than SVM on nominal datasets, and its validity is discussed. Experimental results on standard datasets show that the proposed method performs promisingly compared with other methods.

Key words: kernel-based classification method; nominal dataset; dissimilarity measure; inner product calculation