JI Tie-liang, SUN Wei-wei, SUI Zhi-fang. The Acquisition of Chinese Verb’s Subcategorization Frame Types Based on Linguistic Theory and Statistical Algorithm [J]. Journal of Chinese Information Processing (中文信息学报), 2007, 21(5): 118-125.
The Acquisition of Chinese Verb’s Subcategorization Frame Types Based on Linguistic Theory and Statistical Algorithm (语言学与统计方法结合建立汉语动词SCF类型集)
JI Tie-liang (冀铁亮), SUN Wei-wei (孙薇薇), SUI Zhi-fang (穗志方)
Institute of Computational Linguistics, Peking University, Beijing 100871, China
Abstract: Verb subcategorization is an essential issue that plays an important role in syntactic parsing, semantic role labeling, and related tasks. An adequate set of subcategorization frame (SCF) types is critical for subcategorization acquisition. A set of SCF types has largely been agreed upon for English, but no standard SCF type set yet exists for Chinese verbs. In this paper we apply a semi-supervised method for SCF type acquisition that combines linguistic theory with a statistical algorithm. First, we create a set of seed subcategorization patterns according to linguistic theory. A semi-supervised machine learning method is then applied to the corpus to extend the seeds. Compared with a corpus-based SCF type acquisition method, our method achieves better precision and coverage.
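The abstract does not specify the learning algorithm, so the following is only an illustrative sketch of the general seed-and-extend idea, not the authors' method. It assumes frames are represented as tuples of category labels and that a candidate frame is accepted when it is frequent enough in the corpus; the function name `extend_seed_frames`, the thresholds, and the toy frames are all hypothetical.

```python
from collections import Counter


def extend_seed_frames(seed_frames, observed_frames, min_count=2, min_rel_freq=0.05):
    """Return the seed frames plus corpus frames that pass a frequency filter.

    seed_frames: iterable of frames (tuples of labels) chosen by hand,
        standing in for the linguistically motivated seeds.
    observed_frames: iterable of frames seen in the corpus, one per verb occurrence.
    min_count, min_rel_freq: hypothetical thresholds, not taken from the paper.
    """
    counts = Counter(observed_frames)
    total = sum(counts.values())
    accepted = set(seed_frames)
    for frame, count in counts.items():
        if frame in accepted:
            continue  # seeds are kept regardless of corpus frequency
        # Accept an unseen frame only if it is frequent in absolute and relative terms.
        if count >= min_count and count / total >= min_rel_freq:
            accepted.add(frame)
    return accepted


# Toy data, invented for illustration only.
seeds = {("NP", "V"), ("NP", "V", "NP")}
observed = (
    [("NP", "V", "NP")] * 5
    + [("NP", "V", "NP", "NP")] * 3
    + [("NP", "V", "PP")] * 1
    + [("NP", "V")] * 4
)
frame_types = extend_seed_frames(seeds, observed)
```

With this toy input the ditransitive frame is added to the seed set, while the frame seen only once is filtered out as likely noise. A real system would replace the simple frequency filter with the semi-supervised learner the paper describes.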