Retrospective Topic Identification Model for Short Text Information Flow

ZHOU Hong, LIU Jinling, WANG Xingong

Journal of Chinese Information Processing, 2015, Vol. 29, Issue 1: 111-117.
Section: Information Extraction and Text Mining


Abstract (Chinese version)

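In recent years, short text information flows have been widely used in public, mass-participation media; while openly transmitting information, they also carry rich and highly valuable information resources. This paper proposes a retrospective topic identification model. It improves the term-weighting method so as to extract keywords with strong topic-discriminating power, and during clustering it uses the BIC (Bayesian information criterion) value as the criterion for merging topic clusters, which improves clustering accuracy. Dividing the stream into time segments and removing isolated points improves the efficiency of the algorithm. Experimental results show that the method effectively improves both the accuracy and the efficiency of topic detection on short text information flows.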
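The abstract names BIC as the cluster-merging criterion but does not give the model behind it. As a minimal sketch under stated assumptions, the code below models every cluster as one spherical Gaussian over dense message vectors (the paper's actual feature representation and likelihood are not described here), and greedily merges the pair of clusters whose union scores a BIC at least as high as the two clusters kept apart. The function names and the variance floor `min_var` are hypothetical, chosen only for illustration.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _log_likelihood(points, min_var=1e-3):
    """Log-likelihood of points under one spherical Gaussian fitted by ML.

    min_var is a hypothetical floor that keeps singleton clusters finite.
    """
    n, d = len(points), len(points[0])
    mu = [sum(p[i] for p in points) / n for i in range(d)]
    ss = sum(_sq_dist(p, mu) for p in points)
    var = max(ss / (n * d), min_var)
    return -0.5 * n * d * math.log(2 * math.pi * var) - ss / (2 * var)


def bic(clusters):
    """BIC of a hard partition, one spherical Gaussian per cluster."""
    n = sum(len(c) for c in clusters)
    d = len(clusters[0][0])
    k = len(clusters)
    # cluster likelihoods plus the log mixture weight of each assigned point
    ll = sum(_log_likelihood(c) + len(c) * math.log(len(c) / n) for c in clusters)
    params = k * (d + 1) + (k - 1)  # means, one variance each, mixture weights
    return ll - 0.5 * params * math.log(n)


def merge_gain(c1, c2):
    """BIC(one merged cluster) - BIC(two clusters); >= 0 favours merging."""
    return bic([c1 + c2]) - bic([c1, c2])


def bic_agglomerate(clusters):
    """Greedily merge the best-scoring pair until no merge keeps BIC from dropping."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gain = merge_gain(clusters[i], clusters[j])
                if gain >= 0 and (best is None or gain > best[0]):
                    best = (gain, i, j)
        if best is None:
            break
        _, i, j = best
        clusters[i] += clusters[j]  # j > i, so index i stays valid after del
        del clusters[j]
    return clusters
```

Starting from singleton clusters, two tight, well-separated groups of points end up as two clusters, because merging across the gap inflates the shared variance far more than the saved parameters are worth. The greedy pairwise search costs roughly cubic time in the number of clusters, which is tolerable only because clustering would run on small per-time-segment batches.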

Abstract (English version)

In recent years, short text information flows have become widespread in public media. For this kind of data, a retrospective topic identification model is presented with an improved term-weight estimation. It uses the BIC value as the criterion for merging clusters, which improves clustering accuracy. Dividing the stream into time segments and removing isolated information points further improves the efficiency of the algorithm. The experimental results show that this method achieves good accuracy and efficiency in topic detection on short text information flows.
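A minimal sketch of the two efficiency steps, time-segment division and isolated-point removal, follows. It assumes each message is a `(timestamp, vector)` pair, that segments are fixed-width windows, and that a message counts as isolated when no other message in its segment reaches a cosine similarity of `threshold`. The window width, the 0.3 default, and this definition of "isolated" are hypothetical choices, not the paper's.

```python
import math
from collections import defaultdict


def split_by_time(messages, window):
    """Group (timestamp, vector) pairs into consecutive fixed-width time windows."""
    buckets = defaultdict(list)
    for ts, vec in messages:
        buckets[int(ts // window)].append((ts, vec))
    # return windows in chronological order; empty windows never appear
    return [buckets[key] for key in sorted(buckets)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def drop_isolated(segment, threshold=0.3):
    """Keep messages whose most similar neighbour in the segment reaches threshold."""
    kept = []
    for i, (ts, vi) in enumerate(segment):
        best = max(
            (cosine(vi, vj) for j, (_, vj) in enumerate(segment) if j != i),
            default=0.0,  # a lone message has no neighbour and is dropped
        )
        if best >= threshold:
            kept.append((ts, vi))
    return kept
```

For example, with a one-hour window, messages at 0 s, 10 s and 20 s fall into the first segment, and the one at 20 s is dropped if its vector shares no terms with the other two. Each surviving segment can then be clustered on its own, so that pairwise similarity work stays inside a window instead of spanning the whole stream.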


Key words

short text / information flow / topic identification / clustering

Cite this article

ZHOU Hong, LIU Jinling, WANG Xingong. Retrospective Topic Identification Model for Short Text Information Flow. Journal of Chinese Information Processing. 2015, 29(1): 111-117

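Funding

Hebei Province Science and Technology Support Program (10213581); Huai'an Social Development Project (HASZ2012046); Huai'an Science and Technology Support Program (Industry) Project (HAG2012086)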