Review
Shen Dayang, Sun Maosong
1999, 13(2): 25-33.
This paper proposes PersonIndexer, a prototype system that automatically generates an index of Chinese personal information on the Internet. Preliminary experiments on all HTML texts from two CERNET web sites indicate that the average recall and precision are 97.8% and 61.9% for Chinese names, 100% and 64.5% for Chinese names in Pinyin form, and 94.5% and 92.1% for Chinese organization names, while the recall and precision for email addresses, telephone numbers, and fax numbers are about 100%. We believe that integrating Chinese NLP techniques oriented toward large-scale running text with Internet information retrieval techniques will become a hot research topic in Chinese information processing in the near future.
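As a rough illustration of how recall and precision figures like those above are computed for an extraction task, the following minimal sketch extracts email-like strings from an HTML fragment and scores them against a hand-labeled set. The regular expression, the function names, and the sample addresses are all assumptions made for this example; the abstract does not describe the rules PersonIndexer actually uses.

```python
import re

# Hypothetical, simplified pattern for email-like strings.
# This is an illustrative assumption, not the rule set used by PersonIndexer.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return the set of email-like strings found in the text."""
    return set(EMAIL_RE.findall(text))


def recall_precision(extracted, gold):
    """Recall = correct / |gold|; precision = correct / |extracted|."""
    correct = len(extracted & gold)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(extracted) if extracted else 0.0
    return recall, precision


# Made-up HTML fragment and gold annotations, for illustration only.
html = (
    '<p>Contact: <a href="mailto:li@example.edu.cn">li@example.edu.cn</a>, '
    "wang@example.edu.cn</p>"
)
gold = {"li@example.edu.cn", "wang@example.edu.cn", "zhao@example.edu.cn"}

found = extract_emails(html)
recall, precision = recall_precision(found, gold)
print(f"recall={recall:.3f} precision={precision:.3f}")
```

Here every extracted address is correct, so precision is high, while one labeled address never appears in the fragment, so recall is lower. The name-extraction figures reported in the abstract show the opposite pattern: nearly all true names are found, but many extracted candidates are not names.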