Automatic Event Labeling for Traffic Information Extraction from Microblogs
QIU Peiyuan1,2, ZHANG Hengcai1, YU Li1,2, LU Feng1
1 State Key Lab of Resources and Environmental Information System, IGSNRR, CAS, Beijing 100101, China; 2 University of Chinese Academy of Sciences, Beijing 100101, China
Abstract: Microblog messages often contain a large amount of real-time traffic information that can complement sensor-based traffic data collection. In this paper, we propose an automatic event labeling method for extracting traffic information from microblog messages. Specifically, we apply spatial relation identification between geographic entities to event extraction in order to determine the spatial elements of traffic event messages. First, a conditional random field (CRF) model labels the event roles in the message text. Second, support vector machine (SVM) models tag the relations between roles and the relations between elements. Experiments on Sina microblogs show that both the precision and the recall of the proposed approach exceed 90%, outperforming the well-known pattern matching method.
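The data flow between the two stages can be illustrated with a minimal sketch. It is not the implementation used in the paper: the BIO tag names (`EVENT`, `ROAD`, `LANDMARK`), the helper names `decode_spans` and `candidate_pairs`, and the English example tokens are all assumptions made for illustration, and no model is trained here. The sketch only shows how role labels of the kind a CRF might emit can be grouped into role spans, and how those spans can be paired into candidates that a relation classifier such as an SVM would then accept or reject.

```python
from itertools import combinations


def decode_spans(tokens, tags):
    """Group BIO-tagged tokens into (role, text) spans.

    The tags stand in for the output of a sequence labeler (a CRF in the
    paper); the tag names themselves are hypothetical.
    """
    spans, role, start = [], None, 0
    # A trailing "O" sentinel flushes a span that runs to the end of the message.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = role is not None and tag == "I-" + role
        if role is not None and not continues:
            spans.append((role, " ".join(tokens[start:i])))
            role = None
        if tag.startswith("B-"):
            role, start = tag[2:], i
    return spans


def candidate_pairs(spans):
    """Return every unordered pair of role spans.

    In a two-stage setup, each pair would be passed to a relation
    classifier (an SVM in the paper) to decide whether a relation holds.
    """
    return list(combinations(spans, 2))


# Hypothetical message, tokenized and tagged by hand for illustration.
tokens = ["Jam", "on", "Zhongguancun", "Street", "near", "Haidian", "Bridge"]
tags = ["B-EVENT", "O", "B-ROAD", "I-ROAD", "O", "B-LANDMARK", "I-LANDMARK"]

spans = decode_spans(tokens, tags)
pairs = candidate_pairs(spans)
```

Here `spans` holds three role spans (the event, the road, and the landmark), and `pairs` holds the three candidate role pairs that a second-stage classifier would have to judge.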