Synonymous Expansion Based Entity Attribute Extraction via Online Encyclopedia
LIU Qian1,2, LIU Bingyang1,2, HE Min3, WU Dayong 1, LIU Yue 1, CHENG Xueqi 1
1. CAS Key Laboratory of Network Data Science & Technology, Institute of Computing Technology,
Chinese Academy of Sciences, Beijing 100190,China;
2. University of Chinese Academy of Sciences, Beijing 100049,China;
3. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Abstract:Entity attribute extraction is fundamental to information extraction and knowledge base construction. This paper proposes an approach to open-domain entity attributes extraction from the online encyclopedia. The method collects potential attribute phrases through a combination of the web page structure and the domain independent patterns. Then, the acquired attribute patterns are expanded by synonymous expansions, which in turn helps to obtain a set of synonymous attributes. Experimental results show that the proposed approach boosts the coverage of extracted attributes without losing the precision.
[1] Popescu A-M, Etzioni O. Extracting product features and opinions from reviews[M]Natural language processing and text mining. Springer London, 2007: 9-28.
[2] Pasca M, Van Durme B, Garera N. The role of documents vs. queries in extracting class attributes from text[C]//Proceedings of CIKM. Lisbon, Portugal, 2007: 485-494.
[3] Pasca M. Attribute extraction from conjectural queries[C]//Proceedings of COLING 2012. India, 2012: 2177-2190.
[4] Tokunaga K, Kazama J, Torisawa K. Automatic discovery of attribute words from Web documents[C]//Proceedings of the Natural Language Processing-IJCNLP 2005. Jeju Island, Korea, 2005: 106-118.
[5] Raju S, Pingali P, Varma V. An unsupervised approach to product attribute extraction[M]. Advances in Information Retrieval. Springer Berlin Heidelberg, 2009: 796-800.
[6] Lee T, Wang Z, Wang H, et al. Attribute Extraction and Scoring: A Probabilistic Approach[C]//Proceedings of ICDE. Brisbane, Australia, 2013: 194-205.
[7] Van Durme B, Qian T, Schubert L. Class-driven attribute extraction[C]//Proceedings of the 22nd International Conference on Computational Linguistics. Manchester, UK, 2008: 921-928.
[8] Ravi S, Paca M. Using structured text for large-scale attribute extraction[C]//Proceedings of CIKM. Napa Valley, California, 2008: 1183-1192.
[9] Lin D, Zhao S, Qin L, et al. Identifying synonyms among distributionally similar words[C]//Proceedings of IJCAI. Acapulco, Mexico, 2003: 1492-1493.
[10] Turney P. Mining the web for synonyms: PMI-IR versus LSA on TOEFL[C]//Proceedings of the 12th European Conference on Machine Learning. Freiburg, Germany, 2001: 491-502.
[11] Witten I, Milne D. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links[C]//Proceeding of AAAI Workshop on Wikipedia and Artificial Intelligence. Chicago, USA, 2008: 25-30.
[12] Kim S, Toutanova K, Yu H. Multilingual named entity recognition using parallel data and metadata from Wikipedia[C]//Proceedings of ACL,Korea, 2012: 694-702.
[13] Han X, Zhao J. Structural semantic relatedness: a knowledge-based method to named entity disambiguation[C]//Proceedings of ACL, Sweden, 2010: 50-59.
[14] Suchanek F M, Kasneci G, Weikum G. Yago: a core of semantic knowledge[C]//Proceedings of WWW, Canada, 2007: 697-706.
[15] 叶正,林鸿飞,苏绥,等. 基于支持向量机的人物属性抽取[J]. 计算机研究与发展, 2007, 44: 271-275.
[16] 卢汉,曹存根,王石. 基于元性质的数量型属性值自动提取系统的实现[J]. 计算机研究与发展, 2010, 47(10): 1741-1748.