Abstract: Baidu Baike contains a large amount of knowledge about named entities, link relationships, and category information. To recognize product (or brand) names in open text, we propose a graph-based method that discovers product names from a few seeds. We incorporate the “related entry” and “open category” structures of Baidu Baike to reinforce the similarity measures. Applied to 1.3 million entries, the method achieves satisfactory results for product name mining.
Key words: brand name mining; semi-supervised learning; graph method