ZHANG Hengcai, LU Feng, QIU Peiyuan. Extracting Traffic Information from MicroBlog Based on D-S Evidence Theory[J]. Journal of Chinese Information Processing, 2015, 29(2): 170-178.
Extracting Traffic Information from MicroBlog Based on D-S Evidence Theory
ZHANG Hengcai, LU Feng, QIU Peiyuan
State Key Lab of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
Abstract: Micro-blog messages often contain a large amount of real-time traffic information and are expected to become an important data source for urban traffic services. This paper proposes an approach based on D-S evidence theory for extracting traffic information from massive micro-blog messages, addressing the data fusion problem caused by their highly dynamic, uncertain, and ambiguously worded nature. First, an evaluation index system is built for the traffic information collected from micro-blog messages, and its accuracy is improved with a Wikipedia semantic model. Second, a basic probability assignment function is defined for micro-blog messages using word similarity. Finally, D-S evidence theory is used to judge and fuse the extracted traffic information through evidence combination and decision. An experiment on the Beijing road network and the Sina Micro-blog platform shows that the approach can effectively judge the reliability of the traffic information contained in massive micro-blog messages and make full use of the content posted by different micro-blog users. It is also more accurate than a traditional text clustering algorithm.
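As a rough illustration of the evidence-combination step mentioned in the abstract, the sketch below applies Dempster's rule of combination to two mass functions, as if two micro-blog messages described the same road segment. It is not the authors' implementation: the traffic states, the mass values, and the names `combine_evidence`, `CONGESTED`, `FREE`, and `UNKNOWN` are all hypothetical, and the paper's word-similarity-based basic probability assignment is not reproduced here.

```python
from itertools import product

# Hypothetical frame of discernment for one road segment (an assumption,
# not the paper's evaluation index system).
CONGESTED = frozenset({"congested"})
FREE = frozenset({"free"})
UNKNOWN = CONGESTED | FREE  # the whole frame: "cannot tell"


def combine_evidence(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict term K.
            conflict += mass_a * mass_b
    if 1.0 - conflict < 1e-12:
        raise ValueError("evidence is totally conflicting; the rule is undefined")
    # Normalise by 1 - K so the fused masses sum to 1 again.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Illustrative mass values for two messages about the same road segment.
m1 = {CONGESTED: 0.6, FREE: 0.1, UNKNOWN: 0.3}
m2 = {CONGESTED: 0.5, FREE: 0.2, UNKNOWN: 0.3}

fused = combine_evidence(m1, m2)
# Simple decision rule (also an assumption): pick the hypothesis with the largest mass.
decision = max(fused, key=fused.get)

for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

With these illustrative inputs the conflict term is 0.17, and the fused mass on the congested state rises to 0.63 / 0.83, which is higher than in either message alone. This shows how agreeing reports from different users reinforce each other.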