Abstract: This paper presents the design and implementation of the Tianwang Fame System. It focuses on the factors and algorithms that affect how well a named entity matches Chinese webpages, that is, the evaluation of webpage relevance to celebrities. To address the shortcomings of current search engines, the project aims to improve the quality of web information services and to strengthen personalized services. Built on the Tianwang Search Engine of Peking University, the Fame System adopts new techniques in Natural Language Processing, especially Chinese information extraction tailored to the features of webpage information. The paper proposes a new method for evaluating the relevance of webpages against the attributes of named entities. This method optimizes the ordering of search results and improves the service quality of the Tianwang Fame System.