Topic Evolutionary Analysis for Dynamic Topic Number

FANG Ying, HUANG Heyan, XIN Xin, WEI Xiaochi, ZHUANG Kun

Journal of Chinese Information Processing, 2014, Vol. 28, Issue (3): 142-149.
Information Extraction and Text Mining


Abstract

Topic evolution analysis automatically tracks how topics change over time and is of considerable value in both applications and research. The ILDA (Infinite Latent Dirichlet Allocation) model extends the LDA (Latent Dirichlet Allocation) model with a Dirichlet process. Besides inferring the latent variables, it can, more importantly, update the hyperparameters and change the number of topics dynamically. In existing topic evolution studies, the number of topics must be specified in advance and cannot change; a method based on the ILDA model addresses exactly this limitation. The resulting topic evolution analysis system groups the documents of each time period into topics, links topics across adjacent periods, and computes sub-topic strength in chronological order. Experiments show that the dynamic parameter updating of the ILDA model meets practical needs and that the topic evolution analysis process is complete and feasible.
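The claim that a Dirichlet process lets the number of topics vary, instead of fixing it in advance, can be illustrated with the Chinese restaurant process (CRP), the sequential sampling view of a Dirichlet process. This is a minimal illustration only, not the ILDA Gibbs sampler used in the paper; the function name `crp_assignments` and the concentration parameter `alpha` are illustrative choices. Each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to `alpha`, so the number of clusters is an outcome of sampling rather than an input.

```python
import random
from collections import Counter


def crp_assignments(n_items: int, alpha: float, seed: int = 0) -> list[int]:
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Illustrative sketch: the number of clusters is not fixed in advance;
    it is whatever the sampling produces and tends to grow with n_items
    and with the concentration parameter alpha.
    """
    rng = random.Random(seed)
    labels: list[int] = []
    counts: list[int] = []  # counts[k] = number of items already in cluster k
    for _ in range(n_items):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster ("topic")
        else:
            counts[k] += 1
        labels.append(k)
    return labels


if __name__ == "__main__":
    for alpha in (0.5, 5.0):
        labels = crp_assignments(1000, alpha, seed=42)
        print(f"alpha={alpha}: {len(set(labels))} clusters",
              Counter(labels).most_common(3))
```

A larger `alpha` tends to produce more clusters for the same number of items (the expected count grows roughly like `alpha * log(1 + n / alpha)`), which is the property that lets a DP-based topic model settle on a different number of topics for each time period.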
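The abstract names two steps that follow per-period topic inference: linking topics between adjacent periods and computing topic strength over time. Below is a minimal sketch of one common way to do both. The abstract does not specify the paper's measures, so cosine similarity between topic-word distributions, the hypothetical threshold of 0.5, and strength defined as the mean document-topic proportion within a period are all assumptions. Because adjacent periods may have different numbers of topics, a topic can have no successor (it fades), one successor, or several (it splits), and a topic with no predecessor is newly emerging.

```python
import math

# A topic-word distribution stored sparsely: word -> probability.
Vector = dict[str, float]


def cosine(p: Vector, q: Vector) -> float:
    """Cosine similarity between two sparse word distributions."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def link_topics(prev: list[Vector], curr: list[Vector],
                threshold: float = 0.5) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for every topic i of period t and topic j of
    period t+1 whose similarity reaches the (assumed) threshold."""
    links = []
    for i, p in enumerate(prev):
        for j, q in enumerate(curr):
            s = cosine(p, q)
            if s >= threshold:
                links.append((i, j, s))
    return links


def topic_strength(doc_topic: list[list[float]]) -> list[float]:
    """Assumed strength of each topic in one period: its mean proportion
    over the period's documents (each row is one document's topic mixture)."""
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    return [sum(row[k] for row in doc_topic) / n_docs for k in range(n_topics)]


if __name__ == "__main__":
    # Toy data: period 1 has 2 topics, period 2 has 3 (the third is new).
    period1 = [{"neural": 0.6, "network": 0.4}, {"kernel": 0.7, "svm": 0.3}]
    period2 = [{"neural": 0.5, "network": 0.3, "deep": 0.2},
               {"kernel": 0.6, "svm": 0.4},
               {"bayesian": 0.8, "prior": 0.2}]
    # Links (0, 0) and (1, 1); topic 2 of period 2 has no predecessor.
    print(link_topics(period1, period2))
    print(topic_strength([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]))
```

Repeating `topic_strength` for each period and following the links from one period to the next yields a per-topic strength curve in chronological order.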

Key words

topic model / nonparametric mixture model / Dirichlet process / topic evolution

Cite this article

FANG Ying, HUANG Heyan, XIN Xin, WEI Xiaochi, ZHUANG Kun. Topic Evolutionary Analysis for Dynamic Topic Number. Journal of Chinese Information Processing, 2014, 28(3): 142-149.


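Funding

National Basic Research Program of China (973 Program) (2013CB329605, 2013CB29606); National Natural Science Foundation of China (61202244); Young Backbone Teacher Support Program of Shangqiu Normal University (2013GGJS013)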