Phrasal Paraphrase Acquisition Based on Bilingual Corpus

LI Wei-gang, LIU Ting, LI Sheng

Journal of Chinese Information Processing, 2007, Vol. 21, Issue (5): 112-117.
Review

Abstract

This paper proposes a method for acquiring phrasal paraphrase examples from a bilingual corpus; it is particularly effective at extracting paraphrases of ambiguous phrases. The input is a bilingual phrase pair, which constrains the sense of the phrase. Candidate paraphrases are then extracted from a word-aligned bilingual corpus by a bi-directional extraction model, which compares the semantic consistency between each candidate phrase and the input phrase to decide whether the candidate is kept as a final paraphrase example. Experimental results show that the method reaches an overall precision of about 60%.
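The abstract does not spell out the bi-directional model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that phrase pairs have already been extracted from a word-aligned corpus, that the Chinese side of the input pair acts as the sense-constraining pivot, and that "consistency" can be approximated by requiring both conditional relative frequencies (pivot to candidate and candidate to pivot) to exceed a threshold. The names `build_tables`, `extract_paraphrases`, `PAIRS`, the threshold value, and the toy data (with romanized Chinese phrases) are all hypothetical.

```python
from collections import Counter

def build_tables(phrase_pairs):
    """Count aligned (english, chinese) phrase pairs and per-side totals."""
    pair_counts = Counter(phrase_pairs)
    en_totals, zh_totals = Counter(), Counter()
    for (en, zh), n in pair_counts.items():
        en_totals[en] += n
        zh_totals[zh] += n
    return pair_counts, en_totals, zh_totals

def extract_paraphrases(phrase_pairs, input_pair, threshold=0.3):
    """Return English paraphrase candidates for the English side of
    input_pair, restricted to the sense fixed by its Chinese side.

    Assumption: a candidate is kept only if it is likely in BOTH
    directions, i.e. p(candidate | pivot) and p(pivot | candidate)
    are each at least `threshold`.
    """
    pair_counts, en_totals, zh_totals = build_tables(phrase_pairs)
    en_in, zh_in = input_pair
    kept = []
    for (en, zh), n in pair_counts.items():
        if zh != zh_in or en == en_in:
            continue  # only phrases sharing the pivot, excluding the input itself
        p_en_given_zh = n / zh_totals[zh]  # pivot -> candidate
        p_zh_given_en = n / en_totals[en]  # candidate -> pivot
        if p_en_given_zh >= threshold and p_zh_given_en >= threshold:
            kept.append((en, p_en_given_zh * p_zh_given_en))
    return sorted(kept, key=lambda item: -item[1])

# Toy aligned phrase pairs (hypothetical); "bank" is ambiguous between
# "yinhang" (financial bank) and "hean" (river bank).
PAIRS = (
    [("bank", "yinhang")] * 3
    + [("financial institution", "yinhang")] * 2
    + [("bank", "hean")] * 2
    + [("river bank", "hean")] * 2
    + [("shore", "hean")] * 1
    + [("shore", "haian")] * 4
)

for pair in [("bank", "hean"), ("bank", "yinhang")]:
    print(pair, extract_paraphrases(PAIRS, pair))
```

In this toy setting the same English phrase yields different paraphrases depending on the Chinese side of the input pair, which is the role the abstract assigns to the bilingual input. The candidate "shore" is rejected for the pivot "hean" because it mostly aligns to a different Chinese phrase, so the candidate-to-pivot direction falls below the threshold.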

Key words

computer application / Chinese information processing / paraphrase example / paraphrase acquisition / phrasal paraphrase / bilingual corpus

Cite this article

LI Wei-gang, LIU Ting, LI Sheng. Phrasal Paraphrase Acquisition Based on Bilingual Corpus. Journal of Chinese Information Processing. 2007, 21(5): 112-117


Funding

National Natural Science Foundation of China (60503072, 60575042, 60435020)