Pseudo Topic Analysis Based on Topic Network

YAN Rong, GAO Guanglai

Journal of Chinese Information Processing, 2018, 32(12): 100-108.
Section: Information Extraction and Text Mining

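Abstract

Traditional unsupervised topic modeling methods describe text semantics abstractly through mutually independent topic variables. They ignore the structure and relationships implicit within each topic, and this coarse-grained topic analysis aggravates the effect of the "forced topic" problem on text modeling. This paper studies the internal structure of communities in a topic network. By combining the semantic coupling relationships within topics with the network topology, it proposes a pseudo topic analysis method for identifying and interpreting topics. The method represents the semantic features of text from the perspective of network structure, which remedies the inability of statistical topic analysis methods to characterize the semantic structure of text.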
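The abstract does not give the algorithm itself. What follows only illustrates the general idea it describes: a single statistical topic can hide internal community structure once its words are viewed as a network. The sketch rests on several assumptions. The topic's words and documents are given. Edges are document co-occurrence counts. Communities come from a deterministic label-propagation variant, which is swapped in for illustration and is not necessarily the method used in the paper. Newman-Girvan modularity scores the split. All function names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: documents assigned to one coarse topic whose top words
# mix two senses of "apple" (fruit vs. company).
DOCS = [
    {"apple", "fruit", "juice"},
    {"apple", "orchard", "fruit"},
    {"fruit", "juice", "orchard"},
    {"apple", "iphone", "stock"},
    {"iphone", "stock", "market"},
    {"apple", "market", "stock"},
]
VOCAB = ["apple", "fruit", "juice", "orchard", "iphone", "stock", "market"]


def build_cooccurrence_graph(docs, vocab):
    """Undirected weighted graph over the topic's words; an edge weight counts
    the documents in which both words occur."""
    vocab = set(vocab)
    weights = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        present = sorted(vocab.intersection(doc))
        for u, v in combinations(present, 2):
            weights[u][v] += 1
            weights[v][u] += 1
    # Include every vocabulary word as a node, even if it has no edges.
    return {w: dict(weights[w]) for w in sorted(vocab)}


def label_propagation(graph, max_iter=100):
    """Deterministic asynchronous label propagation: each node adopts the label
    with the largest total edge weight among its neighbours (ties go to the
    alphabetically smallest label). Returns a sorted list of communities."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # an isolated node keeps its own label
            scores = defaultdict(int)
            for nbr, w in graph[node].items():
                scores[labels[nbr]] += w
            best = max(scores.values())
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def modularity(graph, communities):
    """Newman-Girvan modularity Q = sum_c [L_c / m - (d_c / 2m)^2]."""
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    if two_m == 0:
        return 0.0
    q = 0.0
    for comm in communities:
        # Summing A_uv over ordered pairs counts each internal edge twice (= 2 L_c).
        internal = sum(graph[u].get(v, 0) for u in comm for v in comm)
        degree_sum = sum(sum(graph[n].values()) for n in comm)
        q += internal / two_m - (degree_sum / two_m) ** 2
    return q


if __name__ == "__main__":
    g = build_cooccurrence_graph(DOCS, VOCAB)
    comms = label_propagation(g)
    print(comms)  # [['apple', 'fruit', 'juice', 'orchard'], ['iphone', 'market', 'stock']]
    print(f"{modularity(g, comms):.3f}")  # 0.253
```

In this toy case the coarse topic splits into a fruit community and a company/finance community with Q ≈ 0.253. Keeping all words in one community gives Q = 0. The ambiguous word "apple" is forced into a single community. Letting a word belong to both senses would require an overlapping community-detection method instead.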

Key words

pseudo topic analysis / topic network / text understanding

Cite this article

YAN Rong, GAO Guanglai. Pseudo Topic Analysis Based on Topic Network. Journal of Chinese Information Processing, 2018, 32(12): 100-108.


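Funding

National Natural Science Foundation of China (61662053); Natural Science Foundation of Inner Mongolia (2018MS06025); High-Level Talent Program of Inner Mongolia University (21500-5175128)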