Abstract
This paper proposes a method for time expression recognition that combines statistics with rules. First, by analyzing the word form, part of speech, and context of time expressions in Chinese text, Conditional Random Fields (CRF) are applied to recognize time units rather than whole time expressions, which avoids inaccurate boundary localization of Chinese time expressions. Second, candidate trigger words are obtained automatically from the training corpus and scored with an evaluation function, and the correct trigger words are selected to refine the trigger word thesaurus. Finally, rules for locating time expression boundaries are formulated based on the time trigger word thesaurus and the time affix word thesaurus. Experimental results show that the F1 value reaches 98.31% in an open test.
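The second and third steps of the abstract (scoring candidate trigger words, then locating boundaries by rule) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the evaluation function or the boundary rules, so `score_candidates` uses a simple relative-frequency score as a stand-in, `locate_expressions` uses a single "merge adjacent items" rule, and the time-unit flags that a CRF tagger would produce are supplied by hand.

```python
from collections import Counter


def score_candidates(time_contexts, other_contexts, threshold=0.8, min_count=2):
    """Select candidate trigger words that occur mostly next to time units.

    ASSUMPTION: the score (share of occurrences adjacent to a time unit) and
    both cut-offs are hypothetical; the paper's evaluation function is not
    stated in the abstract.
    """
    near_time = Counter(time_contexts)   # words seen next to a time unit
    elsewhere = Counter(other_contexts)  # the same vocabulary seen elsewhere
    kept = set()
    for word, n in near_time.items():
        score = n / (n + elsewhere[word])
        if n >= min_count and score >= threshold:
            kept.add(word)
    return kept


def locate_expressions(tokens, unit_flags, triggers, affixes):
    """Return (start, end) token spans of time expressions, end exclusive.

    ASSUMPTION: one illustrative rule only. A run of adjacent time units,
    trigger words, and affix words forms an expression if it contains at
    least one time unit. unit_flags[i] is True where the (hypothetical)
    CRF tagger marked tokens[i] as a time unit.
    """
    spans = []
    start = None
    has_unit = False
    for i in range(len(tokens) + 1):  # the extra step closes a trailing run
        inside = i < len(tokens) and (
            unit_flags[i] or tokens[i] in triggers or tokens[i] in affixes
        )
        if inside:
            if start is None:
                start, has_unit = i, False
            has_unit = has_unit or unit_flags[i]
        elif start is not None:
            if has_unit:
                spans.append((start, i))
            start = None
    return spans


# Hand-made example data (not taken from the paper's corpus).
triggers = score_candidates(
    time_contexts=["以前", "以前", "以前", "的", "到达"],
    other_contexts=["的", "的", "的", "到达"],
)
tokens = ["他", "去年", "3月", "底", "以前", "到达", "北京"]
unit_flags = [False, True, True, False, False, False, False]
spans = locate_expressions(tokens, unit_flags, triggers, affixes={"底"})
expressions = ["".join(tokens[s:e]) for s, e in spans]
```

With this toy input, only "以前" passes the frequency cut-offs, and the two tagged units are extended to the right across the affix "底" and the trigger "以前", giving the single expression "去年3月底以前". Tagging small units first and merging afterwards keeps the statistical model from having to learn long, variable boundaries directly, which is the motivation the abstract gives for recognizing time units rather than whole expressions.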
Key words
CRF /
rule /
time trigger word /
time affix word
Funding
National Natural Science Foundation of China (61173100, 61173101, 61272375)