Abstract
Automatically mining the topics of scientific literature and summarizing its development trends and latest research directions is of substantial value to researchers. This paper proposes a method for topic discovery and trend analysis. The method first uses the LDA topic model to extract topics from scientific literature. It then computes the strength and impact of each topic. Finally, it analyzes the trends of hot versus cold topics and of high-impact versus low-impact topics. The proposed methods for computing topic strength and impact can be applied to any document collection. Experiments on the ACL Anthology reveal how the field of computational linguistics has developed in the past, and comparison experiments against other methods confirm that the proposed strength and impact computations are correct and feasible.
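The abstract does not give the formulas for topic strength and impact, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes a fitted LDA model has already produced a document-topic matrix `theta` (each row a distribution over topics), and that every document has a known publication year and, for the impact score, a citation count. Topic strength in a year is taken as the average of `theta` over that year's documents, which is the empirical per-year estimate used by, e.g., Hall, Jurafsky and Manning (2008). `topic_impact` is a hypothetical citation-weighted topic share. Hot and cold topics are separated by the sign of a least-squares slope of strength over years. All function names are illustrative.

```python
from collections import defaultdict


def topic_strength(theta, years):
    """Per-year topic strength: mean of theta[d][k] over documents d published in that year.

    theta: per-document topic distributions from a fitted LDA model (rows assumed to sum to 1).
    years: publication year of each document, aligned with theta.
    Returns {year: [strength of topic 0, topic 1, ...]} with years in ascending order.
    """
    num_topics = len(theta[0])
    totals = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for row, year in zip(theta, years):
        counts[year] += 1
        for k, p in enumerate(row):
            totals[year][k] += p
    return {y: [s / counts[y] for s in totals[y]] for y in sorted(totals)}


def topic_impact(theta, citations):
    """Hypothetical impact score: each document's citations are split across topics
    in proportion to theta, then normalized so the scores sum to 1."""
    num_topics = len(theta[0])
    impact = [0.0] * num_topics
    for row, cites in zip(theta, citations):
        for k, p in enumerate(row):
            impact[k] += p * cites
    total = sum(impact)
    return [x / total for x in impact] if total > 0 else impact


def trend_slope(strength_by_year, k):
    """Least-squares slope of topic k's strength over years: > 0 rising (hot), < 0 falling (cold)."""
    xs = list(strength_by_year)
    ys = [strength_by_year[y][k] for y in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / var if var else 0.0  # a single year carries no trend


if __name__ == "__main__":
    theta = [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]
    years = [2000, 2000, 2001, 2001]
    strength = topic_strength(theta, years)
    print(strength)  # {2000: [0.625, 0.375], 2001: [0.125, 0.875]}
    print([trend_slope(strength, k) for k in range(2)])  # [-0.5, 0.5]
    print(topic_impact(theta, [8, 0, 0, 8]))  # [0.375, 0.625]
```

In this toy example the weight shifts from topic 0 to topic 1, so topic 0 comes out as cooling and topic 1 as heating. A real analysis would also test whether each slope differs significantly from zero before labeling a topic hot or cold.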
Key words: topic model; trend analysis
Funding
Supported by the National Natural Science Foundation of China (60873134)