A Survey on Chinese Chunk Parsing

LI Yegang1,2, HUANG Heyan1

Journal of Chinese Information Processing, 2013, Vol. 27, Issue 3: 1-9.
Survey


Abstract

Chunk parsing, a representative form of shallow parsing, meets the demand of many language information processing systems for syntactic information. As a subtask, it also serves as a bridge between lexical analysis and full syntactic parsing and semantic analysis, providing strong support for deeper analysis of sentences. It has therefore attracted considerable research attention. This paper surveys research progress in four areas: the definition and classification of chunks, chunk identification methods, chunk annotation and evaluation, and the analysis of chunk-internal relations. Finally, it discusses the open problems in chunk parsing and outlines directions for future work.

Key words

Chinese information processing / shallow parsing / chunk parsing / chunk identification



Funding

Supported by the National Natural Science Foundation of China (61132009, 61201352)