Abstract
Sentence ordering is a key technique in multi-document automatic summarization and directly affects the fluency and readability of the resulting summary. Temporal information processing is the bottleneck for ordering quality: because accurate temporal information is hard to obtain, traditional sentence ordering strategies sidestep the problem and, as a result, cannot achieve stable, high-quality orderings. To address this, the paper starts from temporal information processing and first proposes algorithms for Chinese temporal information extraction, semantic computation, and temporal reasoning. Building on these algorithms, and drawing on the idea of classical majority ordering and on sentence relatedness computation, it then proposes a sentence ordering algorithm based on temporal information. Experiments show that this algorithm clearly outperforms the classical majority ordering algorithm and the chronological ordering algorithm.
Key words: computer application; Chinese information processing; multi-document automatic summarization; sentence ordering; Chinese temporal information processing
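The abstract names majority ordering as one of the two baselines but does not spell it out. The following is a minimal sketch of that baseline as it is commonly described, not the algorithm proposed in this paper: count how often one theme precedes another across the source documents, then repeatedly emit the theme whose outgoing precedence weight most exceeds its incoming weight. The function name `majority_order`, the input format (one list of theme labels per source document, each label appearing at most once per list), and the tie-breaking rule (first-seen theme wins) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def majority_order(source_orders):
    """Greedy majority ordering over themes (illustrative sketch).

    source_orders: list of lists; each inner list gives the order in which
    themes appear in one source document (assumed input format).
    """
    # weight[a][b] = number of source documents in which theme a precedes b
    weight = defaultdict(lambda: defaultdict(int))
    themes = []  # themes in first-seen order, used for deterministic ties
    for order in source_orders:
        for theme in order:
            if theme not in themes:
                themes.append(theme)
        # combinations() keeps list order, so a always precedes b here
        for a, b in combinations(order, 2):
            weight[a][b] += 1

    remaining = list(themes)
    result = []
    while remaining:
        def score(t):
            # outgoing minus incoming weight, restricted to unplaced themes
            out_w = sum(weight[t][u] for u in remaining if u != t)
            in_w = sum(weight[u][t] for u in remaining if u != t)
            return out_w - in_w

        best = max(remaining, key=score)  # ties go to the first-seen theme
        result.append(best)
        remaining.remove(best)
    return result


if __name__ == "__main__":
    docs = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
    print(majority_order(docs))
```

Because this baseline uses only the relative positions of themes in the sources, it ignores explicit time expressions in the text, which is the gap the abstract says a temporal-information-based ordering is meant to close.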
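Funding
Supported by the National Natural Science Foundation of China (60803092) and the Harbin Institute of Technology Science and Technology Innovation Fund (IMQQ29080001)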