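In recent years, extracting and processing temporal information in Chinese has become increasingly important. However, few researchers have addressed the recognition and analysis of the temporal information that corresponds to events in Chinese text. The aim of this paper is to determine the mapping between temporal information and event information in text. Unlike traditional rule-based approaches, this paper uses a machine learning method, transformation-based error-driven learning, to determine the temporal expression corresponding to each event; this learning algorithm can acquire and refine rules automatically. With the transformation rule set obtained from training, the system's time-event mapping error rate decreased by 9.74%. The experimental results show that the system improves substantially on the rule-based approach.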
Abstract
In recent years, temporal information extraction and processing have received increasing attention. Nevertheless, only a few researchers have investigated how to recognize the temporal expression that corresponds to an event in Chinese text. This paper investigates both temporal information extraction and the determination of the mapping between an event and its temporal expression. In contrast to many other techniques, we use a machine learning method, the transformation-based error-driven learning algorithm, to determine the time-event mapping. The method can acquire the analytical rules automatically. The system first builds an initial time-event tagger. Then, through machine learning, it obtains a patch rule set that improves the performance of the initial tagger. Using the patch rule set, the system achieves a 6.5% decrease in error rate for time-event mapping determination. The experiment indicates that transformation-based error-driven learning is a good patch for the rule-based method.
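The two-stage procedure described in the abstract (an initial tagger followed by learned patch rules) can be illustrated with a minimal sketch of transformation-based error-driven learning. Everything concrete here is an assumption made for illustration and is not taken from the paper: the feature names (`has_time_in_clause`, `verb_type`), the two mapping labels (`same_clause`, `previous_clause`), the single rule template "if feature = value and current tag = old, change the tag to new", and the toy data.

```python
def apply_rule(rule, feats, tags):
    """Apply one patch rule to every instance and return the new tag list."""
    feat, value, old, new = rule
    return [new if (f.get(feat) == value and t == old) else t
            for f, t in zip(feats, tags)]


def count_errors(tags, gold):
    """Number of instances whose current tag differs from the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(feats, init_tags, gold, min_gain=1):
    """Greedily learn an ordered list of patch rules.

    Each round proposes candidate rules from the instances that are still
    wrong, scores every candidate by its net error reduction on the whole
    training set, keeps the best one, and stops when no candidate reduces
    the error count by at least min_gain.
    """
    tags = list(init_tags)
    learned = []
    while True:
        # Hypothetical rule template: (feature, value, old_tag, new_tag).
        candidates = {
            (feat, value, t, g)
            for f, t, g in zip(feats, tags, gold) if t != g
            for feat, value in f.items()
        }
        base = count_errors(tags, gold)
        best, best_gain = None, 0
        for rule in sorted(candidates):
            gain = base - count_errors(apply_rule(rule, feats, tags), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None or best_gain < min_gain:
            break
        tags = apply_rule(best, feats, tags)
        learned.append(best)
    return learned, tags


# Toy training data (invented): one feature dict per event.
feats = [
    {"has_time_in_clause": "yes", "verb_type": "action"},
    {"has_time_in_clause": "no", "verb_type": "action"},
    {"has_time_in_clause": "no", "verb_type": "state"},
    {"has_time_in_clause": "yes", "verb_type": "state"},
    {"has_time_in_clause": "no", "verb_type": "action"},
]
gold = ["same_clause", "previous_clause", "previous_clause",
        "same_clause", "previous_clause"]

# Stand-in for the initial tagger: map every event to a time expression
# in its own clause.
init_tags = ["same_clause"] * len(feats)

rules, patched = learn_rules(feats, init_tags, gold)
print(rules)
print(count_errors(init_tags, gold), "->", count_errors(patched, gold))
```

Scoring each candidate by net error reduction, rather than by the number of corrections alone, penalizes rules that fix some events while breaking others, which is what lets the learned rules act as a patch on top of an existing tagger.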
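Key words
computer application /
Chinese information processing /
temporal information processing /
transformation-based error-driven learning /
information extraction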
Key words
computer application /
Chinese information processing /
temporal information processing /
transformation-based error-driven learning /
information extraction
Funding
Natural Science Foundation project (69975008); 863 Program project (2001AA114210)