Abstract
This paper proposes a general algorithm for Time Expression Recognition (TER) based on regular expressions. The algorithm builds its rules from “Basic Time Units”, which improves recall. It then prunes the rule set with an error-driven method, which reduces the noise introduced by the training corpus and improves precision. Together, the two techniques improve the overall performance of the system: in experiments on the ACE07 Chinese corpus, the method significantly outperforms the baseline system, reaching an F-score of 89.9%. The algorithm is also adaptable and extensible, and with further refinement it could be applied more broadly.
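The two ideas in the abstract, rules composed from basic time units and error-driven pruning of those rules, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the unit inventory `BASIC_UNITS`, the rule format (a tuple of unit names), the function names, and the pruning criterion are hypothetical and are not taken from the paper, whose actual rules and thresholds are not given here.

```python
import re

# Hypothetical "basic time units"; the paper's real inventory is not shown here.
BASIC_UNITS = {
    "YEAR": r"\d{4}年",
    "MONTH": r"\d{1,2}月",
    "DAY": r"\d{1,2}日",
    "NUM": r"\d+",  # deliberately noisy unit, used to show pruning
}


def compile_rule(rule):
    """Concatenate the patterns of the units named in a rule into one regex."""
    return re.compile("".join(BASIC_UNITS[unit] for unit in rule))


def recognize(text, rules):
    """Return non-overlapping (start, end) spans, preferring longer matches."""
    candidates = []
    for rule in rules:
        for m in compile_rule(rule).finditer(text):
            candidates.append((m.start(), m.end()))
    # Earlier start first; among equal starts, the longer span first.
    candidates.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    spans, last_end = [], 0
    for start, end in candidates:
        if start >= last_end:
            spans.append((start, end))
            last_end = end
    return spans


def prune_rules(rules, corpus):
    """Assumed error-driven criterion: keep a rule only if it yields at least
    one correct span and no more wrong spans than correct ones on the corpus."""
    kept = []
    for rule in rules:
        pattern = compile_rule(rule)
        right = wrong = 0
        for text, gold in corpus:
            for m in pattern.finditer(text):
                if (m.start(), m.end()) in gold:
                    right += 1
                else:
                    wrong += 1
        if right > 0 and right >= wrong:
            kept.append(rule)
    return kept


rules = [("YEAR", "MONTH", "DAY"), ("MONTH", "DAY"), ("NUM",)]
# Toy training corpus: (sentence, set of gold time-expression spans).
corpus = [
    ("会议于2007年3月5日举行,共有120人参加", {(3, 12)}),
    ("他3月5日到达", {(1, 5)}),
]
kept = prune_rules(rules, corpus)
print(kept)                         # the noisy ("NUM",) rule is dropped
print(recognize(corpus[0][0], rules))  # [(3, 12), (17, 20)]
print(recognize(corpus[0][0], kept))   # [(3, 12)]
```

In this toy setting the broad rule set also tags "120" as a time expression, and pruning removes the rule responsible for that error while keeping the date rules. This mirrors the recall-then-precision trade-off the abstract describes, without claiming to reproduce the paper's actual procedure.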
Key words: computer application; Chinese information processing; time expression recognition; basic time unit; Timex2; error-driven; regular expression
Funding
Supported by the National Natural Science Foundation of China (60503070)