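Verb subcategorization is a finer-grained classification of verbs according to their syntactic behavior, and it consists of a head verb and a series of arguments. Research on it has produced good results for several individual languages, including English and Chinese, but cross-lingual studies remain scarce. This paper proposes a method, based on an active learning strategy, for automatically acquiring correspondences between the arguments of English and Chinese verb subcategorization frames. The method works on a bilingual parallel corpus and requires almost no prior linguistic knowledge. We then incorporate these correspondences into a statistical machine translation (SMT) system. Experimental results show that the SMT system incorporating the English-Chinese argument correspondences performs markedly better, which demonstrates the validity of the automatically extracted correspondences and suggests a new research direction for SMT.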
Abstract
Verb subcategorization is a finer classification of verbs based on their syntactic behavior; a subcategorization frame (SCF) is composed of a verb and several arguments. It has recently attracted substantial research for individual languages such as English and Chinese, whereas cross-lingual subcategorization still demands more systematic effort. We present a novel method, based on active learning, for obtaining SCF argument correspondences between Chinese and English. The method finds new correspondences from bilingual parallel sentence pairs with almost no prior linguistic knowledge. We also integrate these correspondences into a statistical machine translation (SMT) system. Experimental results show that the SMT system combined with bilingual argument correspondences improves significantly, which indicates the validity of the automatically obtained argument correspondences.
Key words: artificial intelligence; machine translation; verb subcategorization; cross-lingual argument correspondence; automatic acquisition; statistical machine translation
Funding
Supported by the National Natural Science Foundation of China (60773069, 60973169)