Review
MA Li, JIAO Licheng, BAI Lin, ZHOU Yafu, DONG Luobing
(Institute of Intelligence Information Processing, Xidian University, Xi’an, Shaanxi 7007, China;
Information Center, Xi’an Institute of Post and Telecommunications, Xi’an, Shaanxi 7006, China;
Library, Xidian University, Xi’an, Shaanxi 7007, China)
2009, 23(3): 121-129.
In this paper, a new algorithm based on the small-world network model is proposed for extracting compound keywords from Chinese documents. Using a k-nearest-neighbor coupled graph, a Chinese document is first represented as a network in which each node represents a term and each edge represents the co-occurrence of two terms. Then two variables, the clustering coefficient increment and the average path length increment, are introduced to measure the importance of each term and to generate the candidate keyword set. Using factors such as how the parts of speech of any two terms in a sentence combine and whether any two terms of the candidate set are adjacent, some related words in the candidate set are merged into compound keywords. The experimental results show that the algorithm is effective and accurate in comparison with manual keyword extraction from the same documents. The semantic representation of a document by compound keywords is far clearer than that given by a set of single keywords, which makes the document easier to comprehend.
Key words: computer application; Chinese information processing; small world network; term network graph; average shortest path length increment; average clustering coefficient increment; compound keywords
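
The term-network step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: terms are linked to the k terms that follow them in the same sentence, each increment is taken as the change in the network-wide average when a node is removed, and path lengths are averaged only over reachable pairs. The function names (`build_term_graph`, `increments`, and so on) are hypothetical. How the two increments are combined into a ranking, and how candidates are merged into compound keywords, is not specified in the abstract and is left out.

```python
from collections import deque
from itertools import combinations


def build_term_graph(sentences, k=2):
    """Link each term to the k terms that follow it in the same sentence.

    `sentences` is a list of already-segmented term lists (assumed input).
    """
    graph = {}
    for terms in sentences:
        for i, term in enumerate(terms):
            graph.setdefault(term, set())
            for other in terms[i + 1:i + 1 + k]:
                if other != term:
                    graph[term].add(other)
                    graph.setdefault(other, set()).add(term)
    return graph


def avg_clustering(graph):
    """Mean local clustering coefficient; nodes with < 2 neighbours count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(graph)


def avg_path_length(graph):
    """Mean BFS distance over ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def without(graph, node):
    """Copy of the graph with `node` and its edges removed."""
    return {u: nbrs - {node} for u, nbrs in graph.items() if u != node}


def increments(graph):
    """Map each term to (clustering increment, path length increment).

    Each increment is the value after removing the term minus the value
    before (an assumed definition, chosen for illustration).
    """
    c0, l0 = avg_clustering(graph), avg_path_length(graph)
    result = {}
    for node in graph:
        reduced = without(graph, node)
        result[node] = (avg_clustering(reduced) - c0,
                        avg_path_length(reduced) - l0)
    return result
```

Calling `increments(build_term_graph(sentences, k=2))` on a list of segmented sentences returns one pair of increments per term. Terms whose removal changes either average the most would be natural candidates under this reading.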