Abstract
A transformation-based method is applied to tag the syntactic function of each word in a Chinese sentence. The system takes as input a sentence that has been segmented into words and tagged with parts of speech, and outputs the dependency relation of every word. To this end, a Chinese dependency formalism consisting of 44 dependency relations was first designed, and a corpus of 1300 sentences was annotated with dependency relations in an efficient man-machine interactive mode. Of these sentences, 1100 serve as the training corpus from which the tagging rules are acquired, and the remaining 200 are used for testing. Seventeen kinds of transformation templates were designed, and 60 ordered tagging transformations were acquired with the transformation-based learning algorithm. To improve robustness and coverage at test time, each new word is initially annotated with the most frequent dependency relation for its part of speech. The method is simple and easy to implement, and the experiment shows preliminarily satisfactory results.
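The tagging procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the relation labels (`SUBJ`, `HEAD`, `OBJ`), the part-of-speech codes, the single `next_pos` rule template, and the function names `learn_pos_defaults` and `tag` are all assumptions made for illustration, and the error-driven step that learns the ordered rules from the training corpus is not shown. The sketch covers only the two steps the abstract states: initial annotation of each word with the most frequent relation for its part of speech, followed by application of the transformations in their learned order.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical template: change from_rel to to_rel when the
    next word has part of speech next_pos."""
    from_rel: str
    to_rel: str
    next_pos: str


def learn_pos_defaults(training):
    """Map each POS tag to the dependency relation it carries most often."""
    counts = defaultdict(Counter)
    for sentence in training:
        for _word, pos, relation in sentence:
            counts[pos][relation] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def tag(sentence, pos_defaults, rules, fallback="UNK"):
    """Initial annotation by POS default, then ordered rule application."""
    pos_tags = [pos for _word, pos in sentence]
    relations = [pos_defaults.get(pos, fallback) for pos in pos_tags]
    for rule in rules:  # rules are applied in their learned order
        for i in range(len(relations) - 1):
            if relations[i] == rule.from_rel and pos_tags[i + 1] == rule.next_pos:
                relations[i] = rule.to_rel
    return relations


# Toy data (illustrative only, not taken from the paper's corpus).
training = [
    [("我", "r", "SUBJ"), ("看", "v", "HEAD"), ("书", "n", "OBJ")],
    [("他", "r", "SUBJ"), ("买", "v", "HEAD"), ("书", "n", "OBJ")],
    [("猫", "n", "SUBJ"), ("睡", "v", "HEAD")],
]
defaults = learn_pos_defaults(training)
rules = [Rule(from_rel="OBJ", to_rel="SUBJ", next_pos="v")]
result = tag([("狗", "n"), ("吃", "v"), ("肉", "n")], defaults, rules)
# result == ["SUBJ", "HEAD", "OBJ"]
```

In this toy run all three words are unseen in training, so each receives the default relation for its part of speech (`OBJ`, `HEAD`, `OBJ`), and the single rule then corrects the first noun to `SUBJ` because a verb follows it.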
Key words
Transformation-based learning algorithm /
Chinese /
Syntactic tagging /
Dependency relation