Automatic Recognition of the Usage of Common Modern Chinese Adverbs

ZHANG Kunli, ZHAO Dan, ZAN Hongying, CHAI Yumei

Journal of Chinese Information Processing, 2012, Vol. 26, Issue 6: 65-72.
Review


Abstract

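Adverbs, with their complex and diverse functions and usages, have attracted the attention of many researchers. Building on a three-in-one knowledge base consisting of an adverb usage dictionary, an adverb usage rule base and an adverb usage corpus, this paper first studies the automatic recognition of adverb usages with a rule-based method, reaching an accuracy of 84.86% on the adverbs in the People's Daily corpus. It then applies statistical methods, comparing different feature templates, different context windows and different models, to recognize the usages of common adverbs in the corpus. Experimental results show that the statistical methods perform well on the automatic recognition of adverb usages.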
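The rule-based step can be pictured with a toy sketch. Everything concrete in it is an assumption for illustration: the PKU/People's Daily-style POS tags (`d` adverb, `v` verb, `m` numeral), the two rules for 就 and their labels `就_1`/`就_2` are hypothetical and are not entries from the authors' rule base. Rules are tried in order and the first match wins. The `accuracy` helper assumes that a figure such as 84.86% is the share of adverb tokens whose predicted usage matches the gold label.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A token is (word, POS tag); PKU/People's Daily-style tags are assumed ("d" = adverb).
Token = Tuple[str, str]


@dataclass
class UsageRule:
    """Hypothetical rule: if `adverb` occurs and `condition` holds, assign `label`."""
    adverb: str
    label: str
    condition: Callable[[List[Token], int], bool]


def next_pos(tokens: List[Token], i: int) -> Optional[str]:
    """POS tag of the token after position i, or None at the end of the sentence."""
    return tokens[i + 1][1] if i + 1 < len(tokens) else None


# Illustrative rules only; they are NOT taken from the paper's rule base.
RULES = [
    UsageRule("就", "就_1", lambda toks, i: next_pos(toks, i) == "v"),
    UsageRule("就", "就_2", lambda toks, i: next_pos(toks, i) == "m"),
]


def label_adverbs(tokens: List[Token], rules: List[UsageRule]) -> List[Tuple[int, str, Optional[str]]]:
    """Label every adverb token; the first matching rule wins, unmatched adverbs get None."""
    results = []
    for i, (word, pos) in enumerate(tokens):
        if pos != "d":
            continue
        label = next((r.label for r in rules if r.adverb == word and r.condition(tokens, i)), None)
        results.append((i, word, label))
    return results


def accuracy(predicted: List[Optional[str]], gold: List[str]) -> float:
    """Share of adverb tokens whose predicted usage label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    sentence = [("他", "r"), ("就", "d"), ("来", "v"), ("了", "u")]
    print(label_adverbs(sentence, RULES))  # [(1, '就', '就_1')]
```

An ordered, first-match rule list keeps conflicts between rules explicit; a real rule base would need far richer context conditions than the next token's tag.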

Abstract

The adverbs of modern Chinese play complex syntactic roles, and their usages are highly individualized. As a result, much research has focused on adverb usage. In this paper, we introduce a triune knowledge base of contemporary Chinese adverbs that we have built, consisting of a usage dictionary, a usage rule base and a usage corpus. Based on this knowledge base, we first design usage rules to label adverb usages in the People's Daily corpus automatically, achieving an accuracy of 84.86%. We then adopt a statistical strategy to label the usages of common adverbs, comparing different feature templates, context window sizes and models. Experiments show that the statistical methods produce good results for the automatic recognition of adverb usages.
Key words: automatic recognition of adverb usage; adverb usage rule; conditional random fields; maximum entropy; support vector machine
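The statistical models consume features drawn from a context window around each adverb. The sketch below is an illustration under stated assumptions, not the paper's feature template: it emits binary word and POS features for every offset in a symmetric window of size `k`, pads positions outside the sentence with a hypothetical `<PAD>` symbol, and adds one combined feature joining the POS tags on either side of the adverb. Varying `k` mirrors the way the experiments vary context window size.

```python
from typing import Dict, List, Tuple

# A token is (word, POS tag), as in a POS-tagged corpus.
Token = Tuple[str, str]

# Hypothetical padding symbol for positions outside the sentence.
PAD: Token = ("<PAD>", "<PAD>")


def window_features(tokens: List[Token], i: int, k: int) -> Dict[str, int]:
    """Binary features for the token at position i, using a symmetric window of size k."""
    feats: Dict[str, int] = {}
    for off in range(-k, k + 1):
        j = i + off
        # Check bounds explicitly so a negative j never wraps around to the sentence end.
        word, pos = tokens[j] if 0 <= j < len(tokens) else PAD
        feats[f"w[{off}]={word}"] = 1
        feats[f"p[{off}]={pos}"] = 1
    # Hypothetical combined template: POS of the previous and next token together.
    prev_pos = tokens[i - 1][1] if i - 1 >= 0 else PAD[1]
    next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else PAD[1]
    feats[f"p[-1]|p[1]={prev_pos}|{next_pos}"] = 1
    return feats


if __name__ == "__main__":
    sentence = [("他", "r"), ("就", "d"), ("来", "v"), ("了", "u")]
    print(len(window_features(sentence, 1, 1)))  # 7
    print(len(window_features(sentence, 1, 2)))  # 11
```

Feature dicts in this form can be vectorised with scikit-learn's `DictVectorizer` and passed to `LogisticRegression` (a maximum-entropy classifier) or `LinearSVC`. A CRF toolkit such as CRF++ expresses the same window through templates like `U00:%x[-1,0]` instead.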

Cite this article

ZHANG Kunli, ZHAO Dan, ZAN Hongying, CHAI Yumei. Automatic Recognition of Usage of Common Modern Chinese Common Adverbs. Journal of Chinese Information Processing. 2012, 26(6): 65-72

Funding

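Supported by the National Natural Science Foundation of China (60970083); the Open Project Fund of the State Key Laboratory of Pattern Recognition; and the Henan Province Outstanding Youth Fund for Science and Technology Innovation Talents (104100510026).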