Information Extraction and Text Mining
MA Huifang, WANG Shuang, LI Miao, LI Ning
2019, 33(9): 69-78.
Keyword extraction is an important technique for web page retrieval, knowledge comprehension, document classification, and related tasks. This paper proposes a novel keyword extraction method that combines graph structure with node association (GSNA) and is able to locate keywords without a corpus. First, frequent closed itemsets are mined and strong association rules are generated. Second, an association graph is constructed from these rules: the head and the body of each rule serve as nodes, an edge exists if and only if there is a strong association rule between the two nodes, and the lift of the rule is used as the edge weight. Third, three node factors (graph structure, node semantics, and associations) are unified in a single keyword extraction framework for random walking. Finally, a trustworthy semantic clustering algorithm is employed to avoid semantic overlap among terms. Three experiments conducted on Chinese and English data sets show that GSNA is effective for keyword extraction.
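
The graph-construction and random-walk steps can be illustrated with a minimal sketch. It is not the authors' implementation, and it simplifies in several labeled ways: it mines only pairwise rules instead of frequent closed itemsets, it runs a plain lift-weighted PageRank instead of the three-factor random walk, and it omits the semantic clustering step. The function names, thresholds, and toy transactions are hypothetical. Lift is computed in the usual way, as P(head, body) / (P(head) * P(body)).

```python
from collections import defaultdict
from itertools import combinations


def strong_pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return {(head, body): lift} for pairwise rules head -> body.

    Simplification (assumption): only 2-item rules are mined, not the
    frequent closed itemsets described in the abstract.
    """
    n = len(transactions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for transaction in transactions:
        items = sorted(set(transaction))
        for item in items:
            item_count[item] += 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1

    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Test both directions, since confidence is not symmetric.
        for head, body in ((a, b), (b, a)):
            confidence = count / item_count[head]
            if confidence >= min_confidence:
                p_head = item_count[head] / n
                p_body = item_count[body] / n
                rules[(head, body)] = support / (p_head * p_body)
    return rules


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes by a random walk whose transitions follow edge weights.

    Simplification (assumption): a standard weighted PageRank stands in
    for the paper's walk over structure, semantics, and associations.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_weight = defaultdict(float)
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        # Spread the rank of nodes without outgoing edges uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight.get(v, 0.0) == 0.0)
        for node in nodes:
            new_rank[node] += damping * dangling / len(nodes)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy "sentences as transactions" (hypothetical data).
    sentences = [
        ["graph", "keyword", "rule"],
        ["graph", "keyword"],
        ["keyword", "rule"],
        ["graph", "keyword", "walk"],
        ["walk", "cluster"],
    ]
    rules = strong_pair_rules(sentences)
    ranks = weighted_pagerank(rules)
    for term, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(f"{term}: {score:.3f}")
```

Treating each sentence of a single document as a transaction is one way to read the claim that keywords can be located without a corpus, since all co-occurrence statistics then come from the document itself; the abstract does not say how transactions are formed, so this is an assumption.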