Cluster and Association Analysis of Natural Languages
CHEN Zhenning1, CHEN Zhenyu2
1. School of Humanities, Zhejiang University, Hangzhou, Zhejiang 310058, China; 2. Department of Chinese Language and Literature, Fudan University, Shanghai 200433, China
Abstract: Cluster analysis is the task of grouping a set of objects according to the associations among them. Cluster and association analysis are built on similarity measures, which often involve absolute similarity with the symmetry property. But most rules found in natural languages are inclined and have asymmetrical forms. We describe the asymmetrical associations among features with a parameter called Probability Entailment, i.e. the conditional probability. We then define the Domination Relation, the Tight Relation, the Control Center, and the Midway Island. A clustering strategy based on inclined similarity measures is presented to deal with issues such as false isolated points, data sparsity and family iconicity.
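To illustrate why a conditional probability gives an asymmetrical measure, the following minimal sketch estimates it in both directions over a small toy data set. The function name `probability_entailment`, the feature labels "A" and "B", and the data are all hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own definitions may differ in detail.

```python
from typing import List, Optional, Set


def probability_entailment(items: List[Set[str]], a: str, b: str) -> Optional[float]:
    """Estimate P(b | a): the share of items bearing feature a that also bear b.

    Returns None when no item bears feature a, since the conditional
    probability is undefined in that case.
    """
    with_a = [features for features in items if a in features]
    if not with_a:
        return None
    return sum(1 for features in with_a if b in features) / len(with_a)


# Toy data (assumed): each set lists the features observed on one object.
items = [
    {"A", "B"},
    {"A", "B"},
    {"B"},
    {"B"},
]

a_to_b = probability_entailment(items, "A", "B")  # every item with A also has B
b_to_a = probability_entailment(items, "B", "A")  # only half the items with B have A
print(a_to_b, b_to_a)
```

In this toy data, feature A always co-occurs with B, but B occurs without A half of the time, so the two directions yield different values. A symmetric similarity measure would assign the pair a single score and lose that difference.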