Abstract: This paper proposes an approach to base noun phrase (BaseNP) identification based on rough sets. The approach divides BaseNP identification into two sequential subtasks, tagging and identification, and treats BaseNP tagging as a decision-making problem that can be solved with rough set theory. It is therefore characterized by feature reduction and rule optimization. The paper first briefly introduces the rough sets-based rule learning method and the relevant algorithms, then describes the flow charts of BaseNP tagging and identification, and puts forward a solution to instance collision to improve the performance of BaseNP identification. Finally, it gives the detailed experimental steps and results, together with a comparison with some representative similar systems. Based on an analysis of the results, the paper also points out directions for further improvement of the approach.
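To make the idea of feature reduction in a rough-set decision table concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a toy decision table in which each word is an instance, the condition attributes are part-of-speech tags of the previous, current, and next word, and the decision attribute is an inside/outside BaseNP tag. The attribute names (`prev`, `pos`, `next`, `tag`), the `I`/`O` tag set, the toy rows, and the greedy reduction strategy are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy decision table: one row per word.
# Condition attributes: POS of previous, current, and next word.
# Decision attribute: "I" (inside a BaseNP) or "O" (outside).
rows = [
    {"prev": "VB", "pos": "DT", "next": "NN", "tag": "I"},
    {"prev": "DT", "pos": "NN", "next": "VB", "tag": "I"},
    {"prev": "NN", "pos": "VB", "next": "DT", "tag": "O"},
    {"prev": "IN", "pos": "DT", "next": "JJ", "tag": "I"},
    {"prev": "JJ", "pos": "NN", "next": ".", "tag": "I"},
    {"prev": "NN", "pos": "IN", "next": "DT", "tag": "O"},
    {"prev": "NN", "pos": "VB", "next": "NN", "tag": "O"},
]

def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Fraction of rows lying in classes that carry a single decision value
    (the size of the positive region divided by the table size)."""
    consistent = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)

def reduce_features(rows, attrs, decision):
    """Greedy reduction (an assumed strategy): drop each attribute in turn
    if the dependency degree of the remaining attributes is unchanged."""
    full = dependency(rows, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [b for b in kept if b != a]
        if trial and dependency(rows, trial, decision) == full:
            kept = trial
    return kept

if __name__ == "__main__":
    print(reduce_features(rows, ["prev", "pos", "next"], "tag"))
```

On this toy table the current word's part of speech alone separates the two tags, so the other two attributes are dropped. Rows that agree on every kept attribute but disagree on the tag lower the dependency degree, which is why `next` by itself is not sufficient here.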