Identifying Controversial Articles in Wikipedia

CHANG Tianshu, LIN Hongfei

Journal of Chinese Information Processing, 2014, Vol. 28, Issue 4: 76-83.
Information Extraction and Text Mining



Abstract

As the number of Wikipedia articles and contributing editors keeps growing, many articles are edited by users who hold conflicting views on the same entry. This paper aims to identify controversial articles in Wikipedia. Drawing on the edit history that Wikipedia provides, it builds on traditional mining methods by characterizing user roles with special attributes and incorporating them into the ranking model, and it examines how such users contribute to the discovery of controversial articles. Experiments were conducted on a dataset of 16,745 Wikipedia articles. In addition to the conventional precision, recall, and F-measure (PRF) and NDCG metrics, the paper presents a more intuitive evaluation of the ranking results. The proposed methods show a substantial improvement over the baseline models.

Key words

Wikipedia / Controversy Rank / Social Network Analysis

Cite this article

CHANG Tianshu, LIN Hongfei. Identifying Controversial Articles in Wikipedia. Journal of Chinese Information Processing. 2014, 28(4): 76-83


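Funding

National Natural Science Foundation of China (60673039, 60973068); National Social Science Foundation of China (08BTQ025); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education; and the Specialized Research Fund for the Doctoral Program of Higher Education (20090041110002, 20110041110034)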