Abstract
This paper constructs a set of heuristic rules for answering six types of questions (time, person, location, number, entity, and description) in Chinese question answering for reading comprehension (QARC). Each rule is assigned a weight, the weights are tuned using an orthogonal array, and a method is given for scoring each candidate answer sentence against the applicable rules. Experiments on CRCC v1.1, the Chinese reading comprehension corpus developed at Shanxi University, yield a HumSent accuracy of 83.09% over the whole corpus. For comparison with the maximum entropy (ME) method of [10], the rule weights were tuned on exactly the same training set used in [10] and evaluated on the same test set. The resulting HumSent accuracy is 81.13%, about 1% higher than that of the ME method, and for all six question types the HumSent accuracy of the proposed method is no lower than that of the ME method.
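The scoring and tuning scheme summarized in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the four toy rules, the English example tokens, the candidate weight levels, the use of the standard L9(3^4) orthogonal array, and the choice of picking each weight by the best mean accuracy per level. The paper's actual rules, array, and selection procedure may differ.

```python
# Hypothetical sketch: weighted heuristic rules scored per candidate sentence,
# with weights picked via an orthogonal array. Not the paper's actual rules.

# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, 3 levels (0-indexed).
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

TIME_WORDS = {"yesterday", "today", "monday", "1999"}  # toy lexicon (assumed)
PREPOSITIONS = {"in", "on", "at"}                      # toy lexicon (assumed)

# Each toy rule takes (question tokens, sentence tokens) and returns a bool.
def shares_one_word(q, s):
    return len(set(q) & set(s)) >= 1

def shares_two_words(q, s):
    return len(set(q) & set(s)) >= 2

def has_time_word(q, s):
    return any(tok in TIME_WORDS for tok in s)

def has_preposition(q, s):
    return any(tok in PREPOSITIONS for tok in s)

RULES = [shares_one_word, shares_two_words, has_time_word, has_preposition]

def score(question, sentence, rules, weights):
    """Sum the weights of all rules that fire for this candidate sentence."""
    return sum(w for rule, w in zip(rules, weights) if rule(question, sentence))

def answer(question, sentences, rules, weights):
    """Return the index of the highest-scoring candidate sentence."""
    return max(range(len(sentences)),
               key=lambda i: score(question, sentences[i], rules, weights))

def humsent_accuracy(examples, rules, weights):
    """Fraction of questions whose chosen sentence is the human-marked one."""
    hits = sum(answer(q, sents, rules, weights) == gold
               for q, sents, gold in examples)
    return hits / len(examples)

def tune_weights(examples, rules, levels):
    """Run the 9 weight settings in L9, then pick, for each rule, the level
    with the best mean accuracy over the runs that used that level."""
    accs = [humsent_accuracy(examples, rules, [levels[lv] for lv in run])
            for run in L9]
    chosen = []
    for factor in range(len(rules)):
        means = []
        for lv in range(len(levels)):
            vals = [a for run, a in zip(L9, accs) if run[factor] == lv]
            means.append(sum(vals) / len(vals))
        chosen.append(levels[means.index(max(means))])
    return chosen

question = "when did the team win".split()
sentences = [
    "the team trained hard".split(),
    "the team did win on monday".split(),
    "fans were happy".split(),
]
examples = [(question, sentences, 1)]  # gold answer sentence is index 1
tuned = tune_weights(examples, RULES, levels=[1, 2, 4])
```

With four rules at three candidate levels each, the orthogonal array needs 9 evaluations instead of the 81 required by an exhaustive grid, which is the usual motivation for this kind of design.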
Key words: computer application; Chinese information processing; reading comprehension; question answering; heuristic rules; orthogonal array
References
[1] E. Charniak. Toward a Model of Children’s Story Comprehension[D]. Massachusetts Institute of Technology, 1972.
[2] Lynette Hirschman, Marc Light, Eric Breck, and John D. Burger. Deep Read: A Reading Comprehension System[C]//Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. College Park, Maryland, 1999: 325-332.
[3] Ellen Riloff, Michael P. Thelen. A Rule-based Question Answering System for Reading Comprehension Tests[C]//ANLP/NAACL-2000 Workshop on Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems. Seattle, Washington, 2000: 13-19.
[4] Eugene Charniak, Yasemin Altun, Rodrigo de Salvo Braz, et al. Reading Comprehension Programs in a Statistical-Language-Processing Class[C]//Proceedings of the ANLP/NAACL 2000 Workshop on Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems. Seattle, Washington, 2000: 1-5.
[5] Hwee Tou Ng, Leong Hwee Teo, Jennifer Lai Pheng Kwan. A Machine Learning Approach to Answering Questions for Reading Comprehension Tests[C]//Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 2000.
[6] Du Yongping. Research on Key Techniques of Question Answering Based on a Pattern Knowledge Base[D]. PhD dissertation. Fudan University, 2005. (in Chinese)
[7] Kui Xu, Helen Meng and Fuliang Weng. A Maximum Entropy Framework that Integrates Word Dependencies and Grammatical Relations for Reading Comprehension[C]//Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL. 2006: 185-188.
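[8] Wang Kaihua, Li Jihong, Zhang Guohua, Wang Ruibo. Research on Chinese Reading Comprehension Question Answering System Technology Based on the Maximum Entropy Model[C]//Frontiers of Content Computing: Research and Applications (CNCCL-2007). Beijing: Tsinghua University Press, 2007: 643-648. (in Chinese)
[9] Zhang Na, Li Jihong. Construction of a Chinese Reading Comprehension Corpus Based on Semantic Annotation[C]//Frontiers of Content Computing: Research and Applications (CNCCL-2007). Beijing: Tsinghua University Press, 2007: 338-343. (in Chinese)
[10] Li Jihong, Wang Ruibo, Wang Kaihua, Li Guochen. Question Answering Technology for Chinese Reading Comprehension Based on the Maximum Entropy Model[J]. Journal of Chinese Information Processing, 2008(6): 55-62. (in Chinese)
[11] Three-Stage Design Group, Chinese Association for Applied Statistics. Three-Stage Design for Computable Items[M]. Beijing: Peking University Press, 1985. (in Chinese)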
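Funding
Supported by the National Natural Science Foundation of China (60873128) and the National Social Science Foundation of China, Youth Project (07CYY022).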