2002 Volume 16 Issue 6 Published: 16 December 2002
  

  • ZHANG Yu-qi, ZHOU Qiang
    2002, 16(6): 2-9.
    This paper proposes a hybrid model for identifying Chinese base phrases. In the first step, we apply a memory-based learning (MBL) approach to chunking nine types of Chinese base phrases and compare the results obtained with different feature vectors. In a second series of experiments, we use grammar rules that represent the internal structure of base phrases, together with lexical information, to correct incorrect predictions from the first step. The experiments reported in this paper show competitive results: a precision of 95.2% and a recall of 93.7%.
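Memory-based learning, as implemented in tools such as TiMBL, stores every training instance and labels each new token by its nearest stored neighbours. The following is a minimal sketch of that first step. It is not the authors' system. The ±1 POS-tag window, the IOB-style chunk tag names and `k` are illustrative assumptions, and the rule-based correction step is not shown.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A stored instance: (feature vector, chunk tag).
Instance = Tuple[Tuple[str, ...], str]


def window_features(tags: Sequence[str], i: int, size: int = 1) -> Tuple[str, ...]:
    """POS tags in a +/-size window around position i, padded with '_'."""
    return tuple(
        tags[j] if 0 <= j < len(tags) else "_"
        for j in range(i - size, i + size + 1)
    )


def overlap_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """IB1-style overlap metric: the number of feature positions that differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedChunker:
    def __init__(self, k: int = 1) -> None:
        self.k = k
        self.memory: List[Instance] = []

    def train(self, sentences: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> None:
        # Lazy learning: "training" only stores every instance in memory.
        for tags, chunks in sentences:
            for i, chunk in enumerate(chunks):
                self.memory.append((window_features(tags, i), chunk))

    def predict(self, tags: Sequence[str]) -> List[str]:
        result = []
        for i in range(len(tags)):
            feats = window_features(tags, i)
            # Rank stored instances by distance; majority vote over the k nearest.
            nearest = sorted(self.memory, key=lambda inst: overlap_distance(feats, inst[0]))[: self.k]
            result.append(Counter(label for _, label in nearest).most_common(1)[0][0])
        return result


# Hypothetical toy data: POS tags with IOB-style base-phrase tags.
chunker = MemoryBasedChunker(k=1)
chunker.train([(["a", "n", "v", "n"], ["B-NP", "I-NP", "O", "B-NP"])])
print(chunker.predict(["a", "n", "v", "n"]))  # ['B-NP', 'I-NP', 'O', 'B-NP']
```

Because nothing is generalized at training time, swapping feature vectors (for example, adding words alongside POS tags) only changes `window_features`. That suits the feature-vector comparison the abstract describes.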
  • WANG Hou-feng
    2002, 16(6): 10-18.
    Anaphora occurs throughout discourse and dialogue. Its high frequency makes anaphora resolution a key problem in discourse processing, and one that is attracting a growing number of researchers. This article discusses several issues in anaphora resolution, such as basic concepts, special referring phenomena, and the knowledge required for resolution. Some typical computational models of anaphora resolution and implementation techniques are also presented.
  • LIU Bin, HUANG Tie-jun, CHENG Jun, GAO Wen
    2002, 16(6): 19-25.
    Automatic text classification is the task of assigning pre-defined category labels to documents. To improve classification performance, this article puts forward a multi-level feature selection method and a kernel-based distance-weighted KNN algorithm. We extract statistical text features at three levels: Chinese characters, a common word list, and a professional word list. Together these capture more of the statistical character of the document set. The kernel-based distance-weighted KNN algorithm addresses the multi-peak distribution problem and the overlapping-boundary problem of the sample set, as well as the problem of making precise classifier decisions. In practice, the Internet and text databases provide many pre-classified training samples, but some of them are unsuitable for training the classifier. We address this problem with an analysis of sample weights. Experiments with the system show the effectiveness of the method.
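One plausible reading of a kernel-based distance-weighted KNN is sketched below. Distances are measured in the feature space induced by an RBF kernel, and each of the k nearest neighbours votes with a weight inversely proportional to its distance. The kernel choice, `gamma`, `k`, the weighting function and the toy two-dimensional vectors are assumptions for illustration. The paper's multi-level features and sample weight analysis are not modelled.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, category label)


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    # Distance in the kernel-induced feature space:
    # ||phi(x) - phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y)
    d2 = rbf_kernel(x, x, gamma) - 2 * rbf_kernel(x, y, gamma) + rbf_kernel(y, y, gamma)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def weighted_knn(train: List[Sample], query: Sequence[float], k: int = 3,
                 gamma: float = 0.5, eps: float = 1e-9) -> str:
    neighbours = sorted(train, key=lambda s: kernel_distance(query, s[0], gamma))[:k]
    scores: Dict[str, float] = defaultdict(float)
    for vec, label in neighbours:
        # Closer neighbours cast larger votes (inverse-distance weighting).
        scores[label] += 1.0 / (kernel_distance(query, vec, gamma) + eps)
    return max(scores, key=scores.get)


# Hypothetical toy document vectors for two categories.
TRAIN: List[Sample] = [
    ((0.0, 0.0), "sports"), ((0.1, 0.0), "sports"),
    ((5.0, 5.0), "finance"), ((5.1, 5.0), "finance"), ((5.0, 5.2), "finance"),
]
print(weighted_knn(TRAIN, (0.05, 0.0)))  # sports
```

With an RBF kernel, the kernel distance grows monotonically with Euclidean distance. The neighbour ranking is therefore unchanged, and the kernel affects only the vote weights. Other kernels would change both.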
  • HUANG Ke, MA Shao-ping
    2002, 16(6): 26-32.
    Word segmentation is an important step in Chinese natural language processing. This paper explores the classification of Chinese web pages based on statistical word segmentation. We first automatically construct a list of two-character (bigram) words from the training web pages. The texts of the test web pages are then segmented with this word list, and the pages are classified based on the segmentation results. Experiments show that statistical word segmentation can effectively improve classification precision. Based on the experimental results, we analyze the influence of statistical word segmentation on Chinese web page classification. Single Chinese characters and words play different roles in web page classification, and we also analyze the reason for this difference.
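The statistical construction of a two-character word list, and its use for segmentation, can be sketched as follows. This illustration rests on assumptions. A plain frequency threshold (`min_count`) stands in for whatever statistic the paper uses, and greedy left-to-right matching is only one simple segmentation strategy. The resulting tokens (listed words plus leftover single characters) would then serve as classification features.

```python
from collections import Counter
from typing import Iterable, List, Set


def is_cjk(ch: str) -> bool:
    # Basic CJK Unified Ideographs block only; a simplification.
    return "\u4e00" <= ch <= "\u9fff"


def build_bigram_wordlist(texts: Iterable[str], min_count: int = 2) -> Set[str]:
    """Keep adjacent character pairs seen at least min_count times in training text."""
    counts: Counter = Counter()
    for text in texts:
        for i in range(len(text) - 1):
            pair = text[i : i + 2]
            if is_cjk(pair[0]) and is_cjk(pair[1]):
                counts[pair] += 1
    return {pair for pair, c in counts.items() if c >= min_count}


def segment(text: str, wordlist: Set[str]) -> List[str]:
    # Greedy left-to-right: take a listed two-character word if possible, else one character.
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i : i + 2] in wordlist:
            tokens.append(text[i : i + 2])
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical training snippets.
wordlist = build_bigram_wordlist(["体育新闻报道", "体育比赛新闻"])
print(sorted(wordlist))               # ['体育', '新闻']
print(segment("新体育", wordlist))     # ['新', '体育']
```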
  • HE Hong-zhao, HE Pi-lian, GAO Jian-feng, HUANG Chang-ning
    2002, 16(6): 33-38+46.
    Term mismatch between queries and documents is a fundamental problem in Chinese Information Retrieval (IR), and it reduces the effectiveness of retrieval. Query expansion can alleviate this problem to some degree. However, experiments show that common query expansion in IR does not produce stable retrieval results. In this paper, we propose and implement context-based query expansion, which chooses expansion words according to the context of the query. Experiments on TREC-9 show that context-based query expansion is the more effective method: compared with common query expansion, it achieves a statistically significant improvement.
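A minimal sketch of context-based expansion is shown below. It assumes that candidate terms come from a set of feedback documents. Each candidate is scored by its co-occurrence with all query terms together, not with each term separately. The scoring formula, `smoothing`, `n_terms` and the toy documents are hypothetical choices, not the authors' method.

```python
from typing import List, Sequence, Set


def expand_query(query: Sequence[str], feedback_docs: Sequence[Set[str]],
                 n_terms: int = 2, smoothing: float = 0.1) -> List[str]:
    query_terms = set(query)
    candidates = set().union(*feedback_docs) - query_terms

    def context_score(cand: str) -> float:
        # Product over query terms: a candidate that never co-occurs with some
        # query term is heavily penalised, so the whole query acts as context.
        score = 1.0
        for q in query_terms:
            co = sum(1 for doc in feedback_docs if cand in doc and q in doc)
            score *= co + smoothing
        return score

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda c: (-context_score(c), c))
    return list(query) + ranked[:n_terms]


docs = [
    {"apple", "computer", "mac"},
    {"apple", "computer", "mac", "laptop"},
    {"apple", "fruit", "pie"},
    {"apple", "fruit"},
]
print(expand_query(["apple", "computer"], docs))  # ['apple', 'computer', 'mac', 'laptop']
```

Here "fruit" co-occurs with "apple" as often as "mac" does, but never with "computer". The context-based score therefore ranks it below "mac" and "laptop". Expansion that considers "apple" alone could not make this distinction.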
  • LIU Xiao-dong, ZHANG Lei
    2002, 16(6): 39-46.
    Expert systems are one of the most important research areas in Artificial Intelligence. Their main components are knowledge bases and inference engines. Most of the knowledge in a knowledge base is expressed as “IF-THEN” statements. In knowledge graphs, a new form of knowledge representation, “IF-THEN” statements are tied together by causal operators (CAU relations). In this paper, we select a number of Chinese operators with a “CAU” meaning and investigate them, with the goal of building knowledge bases for expert systems.
  • ZHENG Shi-fu, LIU Ting, QIN Bing, LI Sheng
    2002, 16(6): 47-53.
    Question answering is an active research field in Natural Language Processing that draws on many kinds of NLP technology. This paper introduces the current state of research and the methods commonly used in question answering. In general, a question-answering system consists of three parts: Question Analysis, Information Retrieval, and Answer Extraction. This paper describes in detail the main functions of these three parts and the approaches commonly used in each. Finally, it introduces the evaluation of question-answering systems.
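The three-part architecture can be illustrated with a toy English pipeline. Everything here is a simplifying assumption:

- Question analysis is reduced to a few question-word rules that predict an answer type.
- Retrieval ranks passages by keyword overlap.
- Answer extraction applies a regular expression for the predicted type.

Real systems use far richer components at each stage.

```python
import re
from typing import List, Optional, Sequence, Tuple

STOPWORDS = {"when", "what", "year", "was", "were", "is", "the", "a", "of",
             "in", "how", "many", "much", "did", "does"}

ANSWER_PATTERNS = {
    "YEAR": re.compile(r"\b(?:1\d{3}|20\d{2})\b"),
    "NUMBER": re.compile(r"\b\d+(?:\.\d+)?\b"),
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def analyze_question(question: str) -> Tuple[str, List[str]]:
    """Question analysis: predict an answer type and keep content keywords."""
    q = question.lower()
    if q.startswith("when") or "what year" in q:
        answer_type = "YEAR"
    elif q.startswith(("how many", "how much")):
        answer_type = "NUMBER"
    else:
        answer_type = "OTHER"
    return answer_type, [w for w in tokenize(q) if w not in STOPWORDS]


def retrieve(keywords: Sequence[str], passages: Sequence[str], top_n: int = 2) -> List[str]:
    """Information retrieval: rank passages by keyword overlap."""
    kw = set(keywords)
    return sorted(passages, key=lambda p: -len(kw & set(tokenize(p))))[:top_n]


def extract_answer(answer_type: str, passages: Sequence[str]) -> Optional[str]:
    """Answer extraction: the first span matching the expected answer type."""
    pattern = ANSWER_PATTERNS.get(answer_type)
    if pattern is None:
        return None
    for p in passages:
        m = pattern.search(p)
        if m:
            return m.group(0)
    return None


def answer(question: str, passages: Sequence[str]) -> Optional[str]:
    answer_type, keywords = analyze_question(question)
    return extract_answer(answer_type, retrieve(keywords, passages))


PASSAGES = [
    "Peking University was founded in 1898.",
    "The corpus has 27 million characters.",
    "TREC-9 was held in 2000.",
]
print(answer("When was Peking University founded?", PASSAGES))  # 1898
```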
  • LU Jian-jiang, QIAN Pei-de
    2002, 16(6): 54-58.
    This paper mainly studies the design and implementation of automatic verification of the code tables of Chinese input methods. It presents the concept and design of the rule base and explains the working principle of the verification system. It then discusses in detail the design scheme and working procedure of rule-base-driven automatic verification. Finally, from the perspective of practical application, it presents a strategy for integrating the automatic verification system with the Chinese input method.
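A rule-base-driven verifier for an input-method code table might look like the sketch below. The specific rules and the `Entry` layout are hypothetical: keys are limited to `a` to `z`, code length is 1 to 4, and duplicate entries are not allowed. An actual rule base would encode the input method's own coding scheme. Keeping the rules as data, separate from the checking loop, lets the rule base be extended without changing the verifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Entry:
    code: str  # key sequence typed by the user
    word: str  # character or word the code produces


Rule = Tuple[str, Callable[[Entry], bool]]

# Hypothetical rule base: (description, predicate that must hold).
RULES: List[Rule] = [
    ("code uses only keys a-z", lambda e: re.fullmatch(r"[a-z]+", e.code) is not None),
    ("code length is 1-4", lambda e: 1 <= len(e.code) <= 4),
    ("word is non-empty", lambda e: len(e.word) > 0),
]


def verify(table: Sequence[Entry], rules: Sequence[Rule] = RULES) -> List[Tuple[int, str]]:
    """Return (line number, violated rule) pairs for every problem found."""
    errors: List[Tuple[int, str]] = []
    seen = set()
    for lineno, entry in enumerate(table, start=1):
        for name, check in rules:
            if not check(entry):
                errors.append((lineno, name))
        if entry in seen:
            errors.append((lineno, "duplicate entry"))
        seen.add(entry)
    return errors


# Hypothetical code-table entries.
table = [Entry("abc", "词"), Entry("ab1", "字"), Entry("abcde", "表"), Entry("abc", "词")]
for problem in verify(table):
    print(problem)
```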
  • YU Shi-wen, DUAN Hui-ming, ZHU Xue-feng, SUN Bin
    2002, 16(6): 59-65.
    The Institute of Computational Linguistics, Peking University, has completed the basic processing of a contemporary Chinese corpus of 27 million Chinese characters. In addition to word segmentation and part-of-speech tagging, the processing includes tagging of proper nouns (person names, place names, organization names, and so on), morpheme subcategories, and special usages of verbs and adjectives. The success of this large-scale language engineering project is attributed to the SPECIFICATION, which was drawn up beforehand and refined during use. Through this publication we introduce the SPECIFICATION and invite comments from experts and colleagues to help improve it.