Base Chunk Scheme for the Chinese Language

Zhou Qiang (周强)

Journal of Chinese Information Processing, 2007, Vol. 21, Issue 3: 21-27.
Review

Abstract

Chunk parsing is an important technique in natural language processing, and it depends on a sound and effective chunk description scheme. Drawing on the results and experience of earlier work, this paper proposes a topology-based base chunk scheme for the Chinese language. By introducing lexical cohesion relationships to determine three basic topological structures, the scheme yields clear criteria for judging the internal cohesion of a base chunk and links its syntactic form to its semantic content. The scheme greatly simplifies the automatic extraction of a base-chunk-annotated corpus and the corresponding lexical cohesion knowledge base from TCT, an existing large-scale syntactically annotated Chinese treebank. This lays a foundation for further research on Chinese base chunk parsing, on lexical cohesion knowledge acquisition, and on how the two can evolve interactively.

Key words

computer application / Chinese information processing / base chunk / partial parsing / corpus annotation / lexical knowledge acquisition

Cite this article

Zhou Qiang. Base Chunk Scheme for the Chinese Language. Journal of Chinese Information Processing. 2007, 21(3): 21-27
