Tibetan Person Attribute Extraction Based on SVM and Pattern
ZHU Zhen1,2, SUN Yuan1,2
(1. School of Information Engineering, Minzu University of China, Beijing 100081, China;
2. Minority Languages Branch, National Language Resource and Monitoring Research Center,
Minzu University of China, Beijing 100081, China)
Abstract: This paper proposes an approach to Tibetan person attribute extraction that combines SVM with patterns. The pattern system is built from linguistic rules over Tibetan language features that carry clear semantic information, such as case-auxiliary words and particular verbs. A machine learning approach based on SVM is then introduced to build a hierarchical classification strategy. Experimental results indicate a significant improvement in person attribute extraction.
Key words: person attribute extraction; Tibetan language processing; SVM; hierarchical classifier