|
|
Classifying Named Entities on Chinese Wikipedia |
XU Zhihao1,2,HUI Haotian1,2,QIAN Longhua1,2,ZHU Qiaoming1,2 |
1.Natural Language Processing Lab of Soochow University,Suzhou,Jiangsu 215006,China; 2. School of Computer Science & Technology,Soochow University,Suzhou,Jiangsu 215006,China |
|
|
Abstract Classifying Wikipedia Entities is of great significance to NLP and machine learning. This paper presents a machine learning based method to classify the Chinese Wikipedia articles. Besides using semi-structured data and non-structured text as basic features, we also extend to use Chinese-oriented features and semantic features in order to improve the classification performance. The experimental results on a manually tagged corpus show that the additional features significantly boost the entity classification performance with the overall F1-measure as high as 96% on the ACE entity type hierarchy and 95% on the extended entity type hierarchy.
|
Received: 08 July 2015
|
|
|
|
|
[1] Nothman J, Curran J R, Murphy T. Transforming Wikipedia into named entity training data[C]//Proceedings of the Australian Language Technology Workshop. 2008: 124-132. [2] Nothman J. Learning named entity recognition from Wikipedia[D]. The University of Sydney Australia 7, 2008. [3] Bunescu R C, Pasca M. Using Encyclopedic Knowledge for Named entity Disambiguation[C]//Proceedings of the EACL. 2006, 6: 9-16. [4] Zirn C, Nastase V, Strube M. Distinguishing between instances and classes in the wikipedia taxonomy[M]. Springer Berlin Heidelberg, 2008. [5] Toral A, Munoz R. A proposal to automatically build and maintain gazetteers for Named Entity Recognition by using Wikipedia[J]. NEW TEXT Wikis and blogs and other dynamic text sources, 2006, 56. [6] Bhole A, Fortuna B, Grobelnik M, et al. Extracting named entities and relating them over time based on wikipedia[J]. Informatica (Slovenia), 2007, 31(4): 463-468. [7] Tardif S, Curran J R, Murphy T. Improved text categorisation for Wikipedia named entities[C]//Proceedings of the Australasian Language Technology Association Workshop 2009. 2009: 104. [8] Dakka W, Cucerzan S. Augmenting Wikipedia with
|
|
|
|