Abstract
Hedges are used to express uncertainty. Information introduced by a hedge is hedge information: it signals that the author does not back up the statement with facts. Detecting Chinese hedge information is therefore important for Chinese factual information extraction. Hedge information detection comprises two subtasks: identifying hedged sentences and detecting the in-sentence scope of each hedge cue. The lack of a Chinese hedge scope corpus has held back research on Chinese hedge information detection. This paper formulates phrase-structure-based annotation rules for Chinese hedge scope and uses them to construct a Chinese hedge scope corpus. The annotated corpus is then statistically analyzed. The corpus provides resource support for research on Chinese hedge information detection.
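To make the two subtasks concrete, the following is a minimal sketch of how a single cue-and-scope annotation might be represented. The `HedgeAnnotation` class, its field names, and the character-offset layout are all assumptions made for illustration; they are not the corpus's actual format. The scope boundaries in the example sentence are likewise chosen only to show the idea and are not derived from the paper's annotation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HedgeAnnotation:
    """One hedge cue and its in-sentence scope (hypothetical layout).

    All offsets are half-open character ranges [start, end) into the sentence.
    """
    cue_start: int
    cue_end: int
    scope_start: int
    scope_end: int


def is_hedged(annotations):
    """Subtask 1: treat a sentence as hedged if it carries at least one cue."""
    return len(annotations) > 0


def cue_text(sentence, ann):
    """Return the surface form of the hedge cue."""
    return sentence[ann.cue_start:ann.cue_end]


def scope_text(sentence, ann):
    """Subtask 2: return the text covered by the cue's scope."""
    return sentence[ann.scope_start:ann.scope_end]


# Illustrative sentence, roughly "He may come tomorrow."
sentence = "他可能明天来。"
# Cue "可能" ("may") occupies [1, 3); the scope span is illustrative only.
ann = HedgeAnnotation(cue_start=1, cue_end=3, scope_start=1, scope_end=6)

print(is_hedged([ann]), cue_text(sentence, ann), scope_text(sentence, ann))
```

Storing offsets rather than copied substrings keeps the annotation tied to the original sentence, which is one common choice for scope corpora; a token-index or constituent-based representation would be an equally reasonable alternative.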
Key words
Chinese hedge scope /
annotation rules /
corpus
Funding
National Natural Science Foundation of China (61272375)