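Chinese functional chunks describe the basic skeleton of a sentence and form an important bridge between syntactic structure and semantic description. This paper proposes two functional chunk parsing models, a boundary recognition model and a sequence labeling model, and implements each with a different machine learning method. By fusing the outputs of the two models so as to exploit their complementarity, automatic recognition of the four typical functional chunks of Chinese sentences (subject, predicate, object, and adverbial) reaches a performance above 80%. Experimental results show that machine learning algorithms based on local lexical context can accurately recognize most functional chunks from different perspectives, and that complex clauses and multi-verb constructions are the main sources of recognition difficulty.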
Abstract
Chinese functional chunks are defined as a series of non-overlapping, non-nested skeleton segments of a sentence, representing the implicit grammatical relations between the sentence-level predicates and their arguments. In this paper, we propose two statistical models for parsing the four main functional chunks in a sentence. In the chunk boundary detection model, we build SVM-based sub-models for detecting SP (subject-predicate) and PO (predicate-object) boundaries. In the sequence labeling model, we formulate chunking as a sequence labeling problem and base our model on the CRF algorithm. By introducing some revision rules, we build a combined parsing model that integrates the advantages of both statistical models and achieves best F-scores of 82.93%, 86.58%, 78.46% and 86.64% for the subject, predicate, object and adverbial functional chunks, respectively. Experimental results show that complex clauses and serial verb structures are the main recognition difficulties.
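To make the sequence labeling formulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a BIO-style tag encoding and the hypothetical chunk labels `S`, `P`, `O` and `D` (subject, predicate, object, adverbial); the abstract does not state which tag set or label names the paper uses. Because functional chunks are non-overlapping and non-nested, each word receives exactly one tag, and a chunk-level F-score can be computed by exact span matching.

```python
from typing import List, Tuple

# A chunk is (label, start, end): word indices, end exclusive.
# Labels S/P/O/D are illustrative assumptions, not taken from the paper.
Chunk = Tuple[str, int, int]


def chunks_to_tags(n_words: int, chunks: List[Chunk]) -> List[str]:
    """Encode non-overlapping, non-nested chunks as one BIO tag per word."""
    tags = ["O"] * n_words  # "O" here means "outside any chunk"
    for label, start, end in chunks:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def tags_to_chunks(tags: List[str]) -> List[Chunk]:
    """Decode BIO tags back into chunks; stray I- tags are ignored."""
    chunks: List[Chunk] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last chunk
        if label is not None and tag != "I-" + label:
            chunks.append((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return chunks


def f_score(gold: List[Chunk], pred: List[Chunk]) -> float:
    """Chunk-level F1: a predicted chunk counts only on exact match."""
    correct = len(set(gold) & set(pred))
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-word sentence: subject, adverbial, predicate, object.
gold = [("S", 0, 1), ("D", 1, 2), ("P", 2, 3), ("O", 3, 5)]
tags = chunks_to_tags(5, gold)
# A prediction that misplaces the predicate-object boundary.
pred = [("S", 0, 1), ("D", 1, 2), ("P", 2, 4), ("O", 4, 5)]
```

In this sketch a single misplaced predicate-object boundary invalidates both adjacent chunks under exact matching, which is one way to see why boundary errors weigh heavily on per-chunk F-scores.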
Key words
computer application /
Chinese information processing /
functional chunk /
boundary recognition model /
sequence labeling model /
model merging
Funding
Supported by the National Natural Science Foundation of China (60573185; 60520130299)