Co-occurrence Word Retrieval Based on the Lexical Attraction and Repulsion Model

GUO Feng, LI Shao-zi, ZHOU Chang-le, LIN Ying, LI Sheng-rui

Journal of Chinese Information Processing, 2004, Vol. 18, Issue 6: 17-23.

Abstract

Co-occurrence word extraction plays an important role in information mining and natural language processing. Traditional extraction methods, however, rely on a single statistic, so their results are imprecise and require substantial manual curation. This paper presents a co-occurrence word extraction algorithm based on the lexical attraction and repulsion model and improves its performance by combining several commonly used statistics. In an open test, the extracted co-occurrence words achieved a user-interest rate of 60.87%. When applied to a web-based co-occurrence word retrieval system, the algorithm achieved relatively good results in both speed and extraction precision.
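The abstract does not give the algorithm's formulas, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes an already word-segmented token list (Chinese text would first need segmentation). It estimates attraction or repulsion between two words as the ratio of observed to expected occurrences at each distance, a simplified empirical stand-in for the distance modelling in Beeferman, Berger and Lafferty's 1997 lexical attraction and repulsion model, not their parametric model. It then combines three common statistics (pointwise mutual information, t-score, and peak attraction ratio) by averaging each candidate's rank. The function names, the window size, the `min_count` threshold, and the rank-averaging scheme are all illustrative assumptions.

```python
import math
from collections import Counter


def distance_profile(tokens, w1, w2, max_dist=5):
    """Observed/expected ratio of w2 occurring exactly d tokens after w1.

    A ratio above 1 at distance d suggests attraction at that distance;
    a ratio below 1 suggests repulsion.
    """
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == w1:
            for d in range(1, max_dist + 1):
                if i + d < n and tokens[i + d] == w2:
                    observed[d] += 1
    profile = {}
    for d in range(1, max_dist + 1):
        # Expected count if w1 and w2 were placed independently.
        expected = (n - d) * (freq[w1] / n) * (freq[w2] / n)
        profile[d] = observed[d] / expected if expected > 0 else 0.0
    return profile


def window_counts(tokens, query, window):
    """Count tokens occurring within `window` positions of each `query`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == query:
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def rank_positions(scores):
    """Map each key to its 0-based rank when sorted by descending score."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {k: r for r, k in enumerate(ordered)}


def co_occurrence_words(tokens, query, window=3, min_count=2):
    """Hypothetical ranking of co-occurrence words for `query`.

    Each candidate is scored by PMI, t-score and peak attraction ratio;
    the final order is by the average of its ranks under the three.
    """
    n = len(tokens)
    freq = Counter(tokens)
    span = 2 * window
    pmi, tscore, attraction = {}, {}, {}
    for w, obs in window_counts(tokens, query, window).items():
        if w == query or obs < min_count:
            continue
        expected = freq[query] * freq[w] * span / n
        pmi[w] = math.log2(obs / expected)
        tscore[w] = (obs - expected) / math.sqrt(obs)
        # Check both orders, since the distance profile is directional.
        after = max(distance_profile(tokens, query, w, window).values())
        before = max(distance_profile(tokens, w, query, window).values())
        attraction[w] = max(after, before)
    ranks = [rank_positions(s) for s in (pmi, tscore, attraction)]
    return sorted(pmi, key=lambda w: (sum(r[w] for r in ranks) / len(ranks), w))
```

Rank averaging is used here because the three statistics live on different scales; it is one simple way to combine them, and the paper may weight or combine its statistics differently.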

Key words

computer application / Chinese information processing / co-occurrence words / lexical attraction and repulsion model / co-occurrence distance

Cite this article

GUO Feng, LI Shao-zi, ZHOU Chang-le, LIN Ying, LI Sheng-rui. Co-occurrence Word Retrieval Based on the Lexical Attraction and Repulsion Model. Journal of Chinese Information Processing, 2004, 18(6): 17-23.

Funding

Natural Science Foundation of Fujian Province (A0310009); Key Science and Technology Program of Fujian Province (2001J005)