Abstract
This paper proposes a convolution tree kernel-based approach to unsupervised Chinese entity relation extraction. The method represents each candidate relation instance as a shortest path-enclosed tree, computes the similarity between trees with a convolution tree kernel, and groups the instances into clusters using hierarchical clustering. Experiments on the ACE RDC 2005 Chinese benchmark corpus show that the approach achieves an F-measure of up to 60.1 on unsupervised relation extraction, indicating that convolution tree kernel-based unsupervised Chinese entity relation extraction is effective.
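The pipeline summarized in the abstract can be illustrated with a minimal sketch. It assumes the Collins and Duffy formulation of the convolution tree kernel, a decay factor `lam`, average-link agglomerative clustering, and a similarity cut-off `threshold`; none of these settings are stated in the abstract, and the three toy trees are hand-written stand-ins for shortest path-enclosed trees, not corpus data.

```python
from itertools import combinations
from math import sqrt

# A tree is a nested tuple (label, child, ...); a leaf word is a plain string.

def nodes(tree):
    """Yield every internal node of a nested-tuple tree."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """Grammar production at a node: its label plus its children's labels."""
    kids = tuple(c[0] if isinstance(c, tuple) else ("leaf", c) for c in node[1:])
    return (node[0],) + kids

def delta(n1, n2, lam):
    """Decay-weighted count of common subtrees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple) and isinstance(c2, tuple):
            score *= 1.0 + delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.4):
    """Convolution tree kernel: sum of delta over all node pairs."""
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))

def similarity(t1, t2, lam=0.4):
    """Kernel normalized to [0, 1] so that tree size does not dominate."""
    return tree_kernel(t1, t2, lam) / sqrt(
        tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))

def cluster(trees, threshold=0.5, lam=0.4):
    """Average-link agglomerative clustering; returns lists of tree indices."""
    sim = [[similarity(a, b, lam) for b in trees] for a in trees]
    clusters = [[i] for i in range(len(trees))]

    def link(c1, c2):
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[a], clusters[b]) < threshold:
            break  # no remaining pair is similar enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters

# Hypothetical toy trees (E1 and E2 stand for entity mentions).
t1 = ("NP", ("NR", "E1"), ("NN", "president"))
t2 = ("NP", ("NR", "E1"), ("NN", "chairman"))
t3 = ("VP", ("VV", "located"), ("PP", ("P", "in"), ("NR", "E2")))

print(cluster([t1, t2, t3]))  # [[0, 1], [2]]
```

The two noun-phrase trees share their top production and the entity subtree, so they merge into one cluster, while the verb-phrase tree shares no production with them and stays alone. In practice the threshold or the target number of clusters would need tuning on the data.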
Key words: computer application; Chinese information processing; entity relation extraction; convolution tree kernel; unsupervised learning; hierarchical clustering
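Funding
Supported by the National Natural Science Foundation of China (60873150, 60970056, 90920004) and the Natural Science Foundation of Jiangsu Province (BK2008160).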