复述技术研究综述

刘挺,李维刚,张宇,李生

Journal of Chinese Information Processing (中文信息学报), 2006, Vol. 20, Issue 4: 27-34.

A Survey on Paraphrasing Technology

  • LIU Ting, LI Wei-gang, ZHANG Yu, LI Sheng

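Abstract (translated from the Chinese)

Paraphrase is a fairly common phenomenon in natural language, and it reflects the diversity of language in a concentrated way. Paraphrase research mainly studies synonymy at the level of phrases and sentences. The steady development and maturation of the underlying natural language processing technologies has made paraphrase research feasible, and the topic is attracting growing attention. For English and Japanese, paraphrasing technology has been applied successfully in information retrieval, question answering, information extraction, automatic summarization, machine translation, and other fields, where it has effectively improved system performance. This paper surveys in detail the latest progress on the construction of paraphrase instance corpora, the extraction of paraphrase rules, and paraphrase generation, and briefly introduces our preliminary work on Chinese paraphrasing. The final part discusses the difficulties of paraphrasing technology and its future directions, and summarizes the paper.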

Abstract

Paraphrase is a common phenomenon in natural language that captures core aspects of linguistic variability. The study of paraphrase concerns the synonymy of phrases and sentences. With the development of the foundational technologies of natural language processing, research on paraphrase has recently received growing attention. Paraphrasing technology has been applied in many NLP fields, such as information retrieval, question answering, information extraction, automatic text summarization, machine translation, and text watermarking, to improve the performance of these systems. This paper surveys the following aspects of paraphrasing technology: paraphrase corpus construction, paraphrase rule extraction, paraphrase generation, and paraphrase evaluation. Some of our own work on paraphrasing is also introduced briefly. The last section points out some challenges and future directions of paraphrasing technology.

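Keywords (translated from the Chinese)

artificial intelligence / natural language processing / survey / sentence paraphrasing / paraphrase corpus / paraphrase extraction / paraphrase generation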

Key words

artificial intelligence / natural language processing / overview / sentence paraphrasing / paraphrase corpus / paraphrase extraction / paraphrase generation

Cite this article

刘挺,李维刚,张宇,李生. 复述技术研究综述. 中文信息学报. 2006, 20(4): 27-34
LIU Ting, LI Wei-gang, ZHANG Yu, LI Sheng. A Survey on Paraphrasing Technology. Journal of Chinese Information Processing. 2006, 20(4): 27-34

Funding

Supported by the National Natural Science Foundation of China (60435020; 60503072; 60575042)