Journal of Chinese Information Processing

Automatic Identification of Chinese Base Phrases

ZHANG Yu-qi, ZHOU Qiang

2002, 16(6): 2-9.

This paper proposes a hybrid model for identifying Chinese base phrases. In the first step, we apply a memory-based learning (MBL) approach to chunking nine types of Chinese base phrases and compare the results obtained with different feature vectors. In the second series of experiments, we use grammar rules that represent the inner structures of base phrases, together with lexical information, to correct wrong predictions from the first step. The experiments reported in this paper show competitive results: precision is 95.2% and recall is 93.7%.
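
The memory-based step can be pictured with a minimal sketch. The abstract does not give the feature set, the distance metric, or the tag inventory, so everything below is an assumption for illustration: a three-position part-of-speech window as the feature vector, a plain overlap distance, and hypothetical chunk tags such as `B-NP`.

```python
from collections import Counter

def overlap_distance(a, b):
    """Count the feature positions where two instances disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)

def mbl_classify(memory, features, k=1):
    """Label `features` by majority vote among the k closest stored instances."""
    # sorted() is stable, so equally distant instances keep their stored order.
    nearest = sorted(memory, key=lambda item: overlap_distance(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical memory: (previous POS, current POS, next POS) -> chunk tag.
memory = [
    (("<s>", "a", "n"), "B-NP"),
    (("a", "n", "v"), "I-NP"),
    (("n", "v", "n"), "B-VP"),
    (("v", "n", "</s>"), "B-NP"),
    (("<s>", "n", "v"), "B-NP"),
]

# The unseen window differs from the second stored instance in one position only.
tag = mbl_classify(memory, ("a", "n", "u"), k=1)
```

Nothing is generalised at training time; all instances are kept and the work happens at lookup, which is why the choice of feature vector matters so much in this family of methods.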

Survey: Computational Models and Technologies in Anaphora Resolution

WANG Hou-feng

2002, 16(6): 10-18.

Anaphora occurs throughout discourse and dialogue. Its high frequency makes anaphora resolution a key problem in discourse processing, one that is attracting the attention of a growing number of researchers. This article discusses several issues in anaphora resolution, including basic concepts, special referring phenomena, and the knowledge needed for anaphora resolution. Some typical computational models of anaphora resolution and implementation techniques are also presented.

A New Statistical-based Method in Automatic Text Classification

LIU Bin, HUANG Tie-jun, CHENG Jun, GAO Wen

2002, 16(6): 19-25.

Automatic text classification is the task of assigning pre-defined category labels to documents. To improve classification performance, this article puts forward a multi-level feature selection method and a kernel-based distance-weighted KNN algorithm. We extract statistical text features at three levels: Chinese characters, a common word list, and a professional word list, which together capture more of the statistical characteristics of the document set. The kernel-based weighted KNN algorithm addresses the multi-peak distribution problem and the overlapping-boundary problem of the sample set, as well as the classifier's precise decision problem. In practice, the Internet and text databases provide many pre-classified training samples, but some of them are not suitable for training the classifier. We use sample weightiness analysis to address this problem. The experimental system shows the effectiveness of the method.
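
A minimal sketch of distance-weighted KNN voting may help here. The abstract does not specify the kernel or the distance measure, so the Gaussian kernel, the Euclidean distance, the `bandwidth` parameter, and the two-dimensional toy vectors below are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gaussian_kernel(d, bandwidth=1.0):
    """Turn a distance into a vote weight that decays with distance."""
    return math.exp(-(d * d) / (2.0 * bandwidth * bandwidth))

def weighted_knn(train, query, k=3, bandwidth=1.0):
    """Pick the label whose k nearest neighbours carry the largest total weight."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    scores = defaultdict(float)
    for vector, label in neighbours:
        scores[label] += gaussian_kernel(euclidean(vector, query), bandwidth)
    return max(scores, key=scores.get)

# Hypothetical document vectors: one close "sports" sample, two distant "finance" samples.
train = [
    ((0.0, 0.0), "sports"),
    ((3.0, 0.0), "finance"),
    ((3.0, 0.5), "finance"),
]

label = weighted_knn(train, (0.5, 0.0), k=3)
```

With `k=3` an unweighted majority vote would choose "finance" (two neighbours against one), whereas the kernel weights let the single nearby "sports" sample dominate. This is the kind of boundary case that distance weighting is meant to handle.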

Chinese Web Page Classification Based On Statistical Word Segmentation

HUANG Ke, MA Shao-ping

2002, 16(6): 26-32.

Word segmentation is an important step in Chinese natural language processing. This paper explores the problem of classifying Chinese web pages based on statistical word segmentation. We first automatically construct a Chinese word list of binary words from the training web pages. The texts of the test web pages are then segmented with this word list, and the pages are classified based on the segmentation results. Experiments show that statistical word segmentation can effectively improve classification precision. Based on the experimental results, we analyze the influence of statistical word segmentation on Chinese web page classification. Single Chinese characters and words play different roles in web page classification, and the reason for this difference is also analyzed.
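
The two steps described above (build a word list from training pages, then segment test text with it) can be sketched as follows. The abstract does not state the statistic used to select words or the matching strategy, so this sketch makes loud assumptions: "binary words" is read as two-character strings, selection is by a raw frequency threshold (`min_count`), and segmentation is a greedy left-to-right match. The sample strings are invented.

```python
from collections import Counter

def build_bigram_wordlist(texts, min_count=2):
    """Collect two-character strings that occur at least `min_count` times."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - 1):
            counts[text[i:i + 2]] += 1
    return {word for word, count in counts.items() if count >= min_count}

def segment(text, wordlist):
    """Greedy left-to-right segmentation into listed two-character words or single characters."""
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in wordlist:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical training pages and test sentence.
training_pages = ["北京大学", "北京天气", "清华大学"]
wordlist = build_bigram_wordlist(training_pages)
tokens = segment("北京的大学", wordlist)
```

Here only "北京" and "大学" reach the threshold, so the test string is split into two words and one single character. The resulting mix of words and single characters is the kind of feature set whose roles the abstract says are analyzed.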

Query Expansion Based on the Context in Chinese Information Retrieval

HE Hong-zhao, HE Pi-lian, GAO Jian-feng, HUANG Chang-ning

2002, 16(6): 33-38+46.

Term mismatch between queries and documents is a fundamental problem in Chinese Information Retrieval (IR) that reduces the effectiveness of retrieval results. Query expansion can address this problem to some degree. However, experiments show that common query expansion does not yield stable retrieval results. In this paper, we propose and implement context-based query expansion, which chooses the expansion words according to the context of the query. Experimental results on TREC-9 show that context-based query expansion is a smarter method: compared with common query expansion, it achieves a statistically significant improvement.

“CAU” Words and the Analysis by Means of Knowledge Graphs

LIU Xiao-dong, ZHANG Lei

2002, 16(6): 39-46.

Expert systems form one of the most important research areas in Artificial Intelligence. The main parts of an expert system are the knowledge base and the inference engine. In the knowledge base, the main knowledge is expressed by “IF-THEN” statements. In knowledge graphs, a new form of knowledge representation, the “IF-THEN” statements are tied up with causal operators (CAU-relations). In this paper, we pick out some Chinese operators with “CAU” meaning and investigate them. The goal is to build knowledge bases for expert systems.

Overview of Question-Answering

ZHENG Shi-fu, LIU Ting, QIN Bing, LI Sheng

2002, 16(6): 47-53.

Question-Answering is an active research field in Natural Language Processing that draws on many kinds of NLP technology. This paper introduces the current state of research and the methods commonly used in Question-Answering. In general, a Question-Answering system consists of three parts: Question Analysis, Information Retrieval, and Answer Extraction. This paper describes in detail the main functions of these three parts and the common approaches used in each. Finally, it introduces the evaluation of Question-Answering systems.

The Study of Designing Automatic Verify Check for Code Table of Chinese Input Method

LU Jian-jiang, QIAN Pei-de

2002, 16(6): 54-58.

This paper studies the design and implementation of an automatic verification check for the code tables of Chinese input methods. It presents the concept and design of the rule base and explains the working principle of the verification system. It then discusses in detail the design scheme and working procedure of automatic verification based on the rule base. Finally, it presents a strategy for integrating the automatic verification system with the Chinese input method from the standpoint of practical application.

The Basic Processing of Contemporary Chinese Corpus at Peking University SPECIFICATION

YU Shi-wen, DUAN Hui-ming, ZHU Xue-feng, SUN Bin

2002, 16(6): 59-65.

The Institute of Computational Linguistics, Peking University, has completed the basic processing of a contemporary Chinese corpus of 27 million Chinese characters. In addition to word segmentation and part-of-speech tagging, the processing involves the tagging of proper nouns (person names, place names, organization names and so on), morpheme subcategories, and the special usages of verbs and adjectives. The success of this large-scale language engineering project is attributed to the SPECIFICATION, which was drawn up beforehand and refined while in use. We introduce the SPECIFICATION in this publication and invite comments from experts and colleagues for its improvement.

2002 Volume 16 Issue 6 Published: 16 December 2002