Abstract
Entity relation extraction is an important research topic in information extraction. This paper applies two feature-vector-based machine learning algorithms, Winnow and the Support Vector Machine (SVM), to entity relation extraction on the training data of the 2004 ACE (Automatic Content Extraction) evaluation. Appropriate feature selection was performed for both algorithms. Both performed best when the two words to the left and right of each entity were selected as features, with weighted average F-Scores of 73.08% for Winnow and 73.27% for SVM. Thus, given the same feature set, different learning algorithms differ little in final performance on entity relation recognition. Automatic approaches to entity relation extraction should therefore concentrate on finding good features.
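The context-word features and the weighted average F-Score mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `window_features` and `weighted_f_score`, the `<PAD>` token, and the feature-string format are hypothetical. The sketch also assumes that "two words" means a window of two tokens on each side of the entity, and that the weighting is by the number of gold instances per relation type; the abstract does not define either.

```python
from collections import Counter, defaultdict


def window_features(tokens, start, end, prefix, width=2):
    """Context-word features for an entity spanning tokens[start:end].

    Assumption: `width` tokens are taken on each side of the entity.
    Positions that fall outside the sentence are filled with "<PAD>".
    """
    feats = []
    for k in range(1, width + 1):
        left_idx = start - k        # k-th token to the left of the entity
        right_idx = end + k - 1     # k-th token to the right of the entity
        left = tokens[left_idx] if left_idx >= 0 else "<PAD>"
        right = tokens[right_idx] if right_idx < len(tokens) else "<PAD>"
        feats.append(f"{prefix}_L{k}={left}")
        feats.append(f"{prefix}_R{k}={right}")
    return feats


def weighted_f_score(gold, pred):
    """Average of per-class F1, weighted by gold-label frequency.

    Assumption: this support-weighted average stands in for the
    weighting used in the paper, which the abstract does not define.
    """
    if not gold:
        return 0.0
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    total = 0.0
    for label, support in Counter(gold).items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        total += support * f1
    return total / len(gold)


# Toy example: one entity ("president") in a six-token sentence.
tokens = ["the", "president", "of", "France", "visited", "Berlin"]
features = window_features(tokens, 1, 2, "E1")
score = weighted_f_score(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

The resulting feature strings would be turned into a sparse binary vector and passed unchanged to either learner, which is what makes a like-for-like comparison of Winnow and SVM on one feature set possible.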
Key words
computer application /
Chinese information processing /
entity relation extraction /
ACE evaluation /
feature selection
Funding
Supported by the National Natural Science Foundation of China (60435020)