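The verb subcategorization frame (SCF) plays an indispensable role in research on syntactic parsing, semantic role labeling, and related tasks. Acquiring SCF information first requires a standard and complete set of SCF types. For English, a widely agreed set of SCF types already exists, whereas no standard set of verb SCF types exists for Chinese. This paper proposes a semi-automatic scheme that combines linguistic knowledge with statistical methods to acquire a set of SCF types for Chinese verbs, and builds a preliminary type set that agrees with the statistical results and is largely consistent with linguistic theory. Experiments show that a type set incorporating linguistic theory depends less on the corpus and is more complete than one derived entirely from corpus analysis.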
Abstract
Verb subcategorization plays an important role in syntactic parsing, semantic role labeling, and related tasks. A sufficiently complete set of subcategorization frame types is critical for subcategorization acquisition. A commonly agreed set of subcategorization frame types exists for English, but no standard set has yet been established for Chinese verbs. In this paper we apply a semi-supervised method to subcategorization frame type acquisition that combines linguistic theory with a statistical algorithm. First, we create a set of seed subcategorization patterns according to linguistic theory. A semi-supervised machine learning method is then applied to the corpus to extend the seeds. Compared with a purely corpus-based method for acquiring subcategorization frame types, our method achieves better precision and coverage.
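The seed-then-extend idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the learning algorithm, so the frequency threshold used here is only a stand-in assumption, not the authors' method. The function name `extend_seed_frames`, the parameter `min_count`, the frame labels, and the toy data are all hypothetical.

```python
from collections import Counter


def extend_seed_frames(seed_frames, observed_frames, min_count=2):
    """Return the seed frame types plus any observed frame type that
    occurs at least `min_count` times in the corpus sample.

    Assumption: a simple count threshold stands in for the statistical
    filtering step, which the abstract does not describe in detail.
    """
    counts = Counter(observed_frames)
    accepted = set(seed_frames)  # linguistically motivated seeds are always kept
    for frame, n in counts.items():
        if frame not in accepted and n >= min_count:
            accepted.add(frame)
    return accepted


# Hypothetical toy data: each frame is a tuple of slot labels.
seeds = {("NP", "V"), ("NP", "V", "NP")}
observed = [
    ("NP", "V", "NP"),
    ("NP", "V", "NP", "NP"),
    ("NP", "V", "NP", "NP"),
    ("NP", "V", "PP"),  # seen only once, so it is filtered out
]

frame_types = extend_seed_frames(seeds, observed, min_count=2)
print(sorted(frame_types))
```

Seeds are kept even when they never appear in the sample, which mirrors the abstract's claim that linguistic knowledge reduces dependence on the corpus.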
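Keywords
computer application /
Chinese information processing /
verb subcategorization frame /
type set /
combination of linguistics and statistical methods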
Key words
computer application /
Chinese information processing /
verb’s subcategorization frame /
lexicon /
hybrid of linguistic theory and statistical algorithms
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60503071)