Information Extraction and Text Mining
HONG Zhuangzhuang, HUANG Zhaohua, WAN Zhongbao, ZHANG Wei, GAO Mengxi
2020, 34(2): 56-62.
Domain texts are characterized by complex structure, high similarity, and dynamic change. Because they mix continuous and discrete data, existing knowledge discovery methods mine text rules inefficiently. To address this issue, this paper proposes a text rule mining method based on the Gaussian Mixture Model (GMM) and rough set theory. First, the method constructs an information table according to the attribute types of the target data. Then, the GMM clustering algorithm is applied to the continuous data, which discretizes the data and reduces the number of states, and a decision table is generated. Finally, rough set theory is used to reduce the attributes of the decision table, and decision rules are extracted from the reduced table. Experimental results show that the proposed method has higher precision and stronger attribute reduction ability, achieving an average precision and F score of 95.0% and 95.7%, respectively.
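The three steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the attribute names (`len_state`, `has_kw`, `label`), the small hand-written one-dimensional EM routine, and the backward-elimination reduction strategy are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import defaultdict

def gmm_discretize(values, k=2, iters=50):
    """Fit a 1-D Gaussian mixture by EM; return a component index per value."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: means spread evenly over the range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    var = [max((hi - lo) ** 2 / (4 * k * k), 1e-6)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, var)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var[j] = max(sum(r[j] * (x - means[j]) ** 2
                             for r, x in zip(resp, values)) / nj, 1e-6)
            weights[j] = nj / len(values)
    # Relabel components by ascending mean so that state 0 is the lowest.
    rank = {j: i for i, j in enumerate(sorted(range(k), key=lambda j: means[j]))}
    return [rank[max(range(k), key=lambda j: r[j])] for r in resp]

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Rough-set dependency degree: share of rows in the positive region."""
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({rows[i][decision] for i in b}) == 1)
    return pos / len(rows)

def reduct(rows, attrs, decision):
    """Drop each attribute whose removal leaves the dependency unchanged."""
    full = dependency(rows, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [b for b in kept if b != a]
        if trial and dependency(rows, trial, decision) == full:
            kept = trial
    return kept

def extract_rules(rows, attrs, decision):
    """Emit one (conditions, decision) rule per consistent equivalence class."""
    out = set()
    for b in partition(rows, attrs):
        decisions = {rows[i][decision] for i in b}
        if len(decisions) == 1:
            out.add((tuple((a, rows[b[0]][a]) for a in attrs), decisions.pop()))
    return sorted(out)

# Hypothetical toy data: one continuous attribute, one discrete, one decision.
lengths = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
has_kw = [1, 0, 1, 0, 1, 0]
labels = ["A", "A", "A", "B", "B", "B"]

states = gmm_discretize(lengths, k=2)            # step 2: discretize
rows = [{"len_state": s, "has_kw": h, "label": y}
        for s, h, y in zip(states, has_kw, labels)]
kept = reduct(rows, ["len_state", "has_kw"], "label")   # step 3: reduce
extracted = extract_rules(rows, kept, "label")          # step 3: rules
print(states, kept, extracted)
```

On this toy table the discretized length alone determines the label, so the reduction step discards `has_kw`. In practice a library mixture model (for example scikit-learn's `GaussianMixture`) would likely replace the hand-written EM loop, and the number of components would need to be chosen per attribute.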