2001 Volume 15 Issue 6 Published: 15 December 2001
  

  • ZHENG Jia-heng,QIAN Yi-li,LI Jing
    2001, 15(6): 2-7,27.
    As ideographs rich in semantic content, Chinese characters form a closed set of limited size, while Chinese words form an open, unlimited system. Following the idea of "character-sense elementalization and word-sense combinationalization", this paper studies word-sense combination, taking the character sense as its starting point. First, it builds a database of character senses and word senses by automatically extracting the sense combinations of two-character words from three major dictionaries. It then defines the combination types by computing semantic relatedness. The authors hope this paper provides a reference for research on the sense combination of two-character words.
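A minimal sketch of the typing step just described, under heavy assumptions: character senses are represented as feature-tag sets, "semantic relatedness" is taken to be the Dice coefficient, and the thresholds and type labels are invented for illustration; the paper's actual sense database and formula are not shown.

```python
def relatedness(sense_a: set, sense_b: set) -> float:
    """Dice coefficient between two character-sense feature sets (assumed measure)."""
    if not sense_a or not sense_b:
        return 0.0
    return 2 * len(sense_a & sense_b) / (len(sense_a) + len(sense_b))

def combination_type(sense_a: set, sense_b: set) -> str:
    """Map a relatedness score to a coarse combination type (labels are illustrative)."""
    r = relatedness(sense_a, sense_b)
    if r >= 0.5:
        return "synonymous combination"   # senses largely overlap
    elif r > 0.0:
        return "related combination"      # senses share some features
    return "disjoint combination"         # senses contribute independently

# Hypothetical sense entries for the two characters of a two-character word:
sense_1 = {"motion", "forward", "foot"}
sense_2 = {"motion", "path"}
print(combination_type(sense_1, sense_2))  # -> "related combination"
```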
  • HOU Jun,WANG Zuo-ying
    2001, 15(6): 8-13.
    This paper proposes a hybrid semantic- and word-based language model. The model's performance is tested on semantic tagging and Mandarin speech recognition and compared with traditional N-gram and semantic language models. The hybrid model better describes the relation between semantics and words and achieves lower perplexity on the tagging corpus. In Mandarin speech recognition, it performs better and requires less memory than the word-based trigram model.
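One common way to hybridize word and semantic evidence is linear interpolation of a word bigram with a semantic-class bigram, and the sketch below illustrates that scheme on a toy corpus. The class map, the counts, and the interpolation weight are illustrative assumptions, not the paper's actual model.

```python
from collections import Counter

# Toy corpus and a hypothetical word -> semantic-class map.
corpus = "he buys tea she buys coffee he drinks tea".split()
sem_class = {"he": "PRON", "she": "PRON", "buys": "VERB",
             "drinks": "VERB", "tea": "DRINK", "coffee": "DRINK"}

word_uni = Counter(corpus)
word_bi = Counter(zip(corpus, corpus[1:]))
cls = [sem_class[w] for w in corpus]
cls_uni = Counter(cls)
cls_bi = Counter(zip(cls, cls[1:]))
cls_words = Counter((sem_class[w], w) for w in corpus)

def p_word(w, h):
    """Plain word-bigram probability P(w | h)."""
    return word_bi[(h, w)] / word_uni[h] if word_uni[h] else 0.0

def p_class(w, h):
    """Class-bigram probability P(c(w) | c(h)) * P(w | c(w))."""
    ch, cw = sem_class[h], sem_class[w]
    p_cc = cls_bi[(ch, cw)] / cls_uni[ch] if cls_uni[ch] else 0.0
    p_wc = cls_words[(cw, w)] / cls_uni[cw] if cls_uni[cw] else 0.0
    return p_cc * p_wc

def p_hybrid(w, h, lam=0.6):  # lam is an assumed interpolation weight
    return lam * p_word(w, h) + (1 - lam) * p_class(w, h)

# The word bigram (drinks, coffee) is unseen, but the class model generalizes:
print(p_hybrid("coffee", "drinks"))
```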
  • LI Rong,LIU Shao-hui,YE Shi-wei,SHI Zhong-zhi
    2001, 15(6): 14-19.
    This paper presents an algorithm combining Support Vector Machines (SVM) and k-Nearest Neighbor (k-NN) to resolve ambiguities in Chinese word segmentation. We treat ambiguous segmentation as a classification problem and propose a vector representation for ambiguous strings; solutions are found by supervised learning. After the ambiguities are selected and classified by hand, the high-frequency ambiguities are used to train the SVM, and test ambiguities are then classified by the mixed algorithm. Experiments show that the accuracy reaches 91.6% for crossing ambiguities and that the algorithm is highly stable.
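A minimal sketch of one plausible SVM/k-NN mixture for this task, using scikit-learn as a modern stand-in (an assumption; the paper predates it). The feature vectors and the fallback rule are illustrative: here the SVM decides when its decision margin is large, and k-NN decides otherwise.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical feature vectors for ambiguous strings (e.g. context features);
# labels 0/1 encode the two candidate segmentations.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8],
              [0.1, 0.9], [0.5, 0.45], [0.45, 0.5]])
y = np.array([0, 0, 1, 1, 0, 1])

svm = SVC(kernel="linear").fit(X, y)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

def classify(x, margin=0.5):
    """Trust the SVM when it is confident; otherwise defer to k-NN."""
    x = np.asarray(x).reshape(1, -1)
    if abs(svm.decision_function(x)[0]) >= margin:
        return int(svm.predict(x)[0])
    return int(knn.predict(x)[0])

print(classify([0.85, 0.15]))  # far from the boundary -> SVM decides
print(classify([0.49, 0.50]))  # near the boundary -> k-NN decides
```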
  • CHEN Yi-dong,LI Tang-qiu,HONG Qing-yang,ZHENG Xu-ling
    2001, 15(6): 20-27.
    This paper describes a model of English lexical selection based on example comparison and aided by semantic pattern matching. First, we argue the importance of lexical selection in the English-generation stage of a Chinese-English machine translation system. We then compare several lexical-selection tactics and put forward our model. Finally, we describe the structure of the generation lexicon and the algorithm of our method in detail. The paper also briefly introduces the semantic knowledge resource we use, HowNet.
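A minimal sketch of example-based lexical selection under assumed structures: each English candidate for a Chinese verb carries example collocations in a generation lexicon, and the candidate whose examples best match the input's semantic pattern wins. The lexicon entries and slot features are invented for illustration and are not HowNet data.

```python
# Hypothetical generation-lexicon entry for a Chinese verb such as "开",
# whose English rendering depends on the object.
lexicon = {
    "open":  [{"object": "door"}, {"object": "window"}],
    "drive": [{"object": "car"}, {"object": "truck"}],
    "hold":  [{"object": "meeting"}, {"object": "party"}],
}

def similarity(pattern: dict, example: dict) -> float:
    """Fraction of slots in the input pattern matched by a stored example."""
    hits = sum(1 for k, v in pattern.items() if example.get(k) == v)
    return hits / len(pattern) if pattern else 0.0

def select(pattern: dict) -> str:
    """Pick the candidate whose best example is most similar to the input."""
    return max(lexicon,
               key=lambda w: max(similarity(pattern, e) for e in lexicon[w]))

print(select({"object": "meeting"}))  # -> "hold"
```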
  • LIN Hong-fei,WANG Jian-feng
    2001, 15(6): 28-33.
    Applying multilingual text categorization to share information sources on the Internet is essential to knowledge discovery. This paper presents a model for bilingual text categorization. It uses a text feature extraction mechanism to obtain the features of classes and texts, and generates their feature vectors through word-translation rules based on concept expansion. It then uses Latent Semantic Indexing to integrate the bilingual texts at the semantic level, and classifies texts by computing the semantic similarity between texts and classes. The model achieves high categorization precision and is independent of machine translation and manual tagging.
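A minimal sketch of the LSI step: assuming translation-based concept expansion has already mapped both languages into one shared term space, texts and class centroids are projected into a latent semantic space and classified by cosine similarity. The toy matrix is illustrative, and scikit-learn is an assumed stand-in for the paper's own SVD implementation.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Rows = documents, columns = shared (bilingual) terms.
term_doc = np.array([
    [2, 1, 0, 0],   # class A training text
    [1, 2, 0, 1],   # class A training text
    [0, 0, 2, 1],   # class B training text
    [0, 1, 1, 2],   # class B training text
], dtype=float)

svd = TruncatedSVD(n_components=2).fit(term_doc)
docs = svd.transform(term_doc)
classes = {"A": docs[:2].mean(axis=0), "B": docs[2:].mean(axis=0)}

# Project a new text into the latent space and pick the nearest class.
new_text = svd.transform(np.array([[0, 0, 1, 2.0]]))
best = max(classes, key=lambda c:
           cosine_similarity(new_text, classes[c].reshape(1, -1))[0, 0])
print(best)  # -> "B"
```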
  • JIN Xiang-yu,SUN Zheng-xing,ZHANG Fu-yan
    2001, 15(6): 34-40.
    This paper presents a domain-independent, dictionary-free lexical acquisition model. It introduces a self-increasing algorithm to acquire co-occurrence patterns of Chinese characters, and applies criteria such as support and confidence to filter those patterns into lexical items. Experiments show that the model acquires high-frequency lexical items effectively and efficiently, without a dictionary or supervised learning on the corpus. It is particularly suited to frequency-sensitive but time-critical Chinese information processing applications, such as real-time automatic Chinese text classification systems.
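A minimal sketch of support/confidence filtering on character co-occurrence: count adjacent character pairs, then keep pairs whose corpus frequency (support) and mutual binding strength (confidence) clear thresholds. The exact formulas, thresholds, and the toy corpus are illustrative assumptions.

```python
from collections import Counter

corpus = "中国人民银行中国人民大学人民银行发行人民币"
chars = Counter(corpus)
pairs = Counter(zip(corpus, corpus[1:]))
total = sum(pairs.values())

MIN_SUPPORT = 2 / total   # assumed thresholds for this toy corpus
MIN_CONFIDENCE = 0.6

candidates = []
for (a, b), n in pairs.items():
    support = n / total
    # Confidence: the pair must be a strong binding in *both* directions.
    confidence = min(n / chars[a], n / chars[b])
    if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
        candidates.append(a + b)

# Frequent, strongly bound pairs survive:
print(candidates)  # -> ['中国', '人民', '银行'] ("China", "people", "bank")
```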
  • ZHANG Min,MA Shao-ping
    2001, 15(6): 41-47.
    Motivated by the needs of information retrieval on ancient Chinese books, we carried out statistical analyses of ancient Chinese on a large-scale corpus. First, we propose a method for combining corpora from different fields. Using this method, we analyzed statistics of ancient Chinese over more than 35,000,000 characters. The results show that the commonly used characters are concentrated, while the frequencies of the remaining characters fall off exponentially. We then give further analyses of bigrams and compare modern with ancient Chinese. Based on usage frequency, Chinese characters are divided into four groups. Finally, these statistics are applied in an information retrieval system for ancient Chinese books.
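A minimal sketch of the kind of coverage statistic behind such a frequency grouping: rank characters by frequency and report how many top-ranked characters are needed to cover given fractions of the corpus. The cutoffs (90/99/99.9%) and the toy text are illustrative; the paper defines its own four frequency bands on the 35-million-character corpus.

```python
from collections import Counter

def coverage_bands(text, cutoffs=(0.90, 0.99, 0.999)):
    """Return the rank at which cumulative frequency first reaches each cutoff."""
    freqs = Counter(text).most_common()
    total = sum(n for _, n in freqs)
    bands, covered, band_sizes = list(cutoffs), 0, []
    for rank, (_, n) in enumerate(freqs, start=1):
        covered += n
        while bands and covered / total >= bands[0]:
            band_sizes.append(rank)
            bands.pop(0)
    return band_sizes

# On a real corpus the first band is tiny relative to the full inventory,
# reflecting the concentration of common characters.
print(coverage_bands("the quick brown fox jumps over the lazy dog" * 3))
```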
  • JIN Ling,WU Wen-hu,ZHENG Fang,WU Gen-qing
    2001, 15(6): 48-53.
    Proposed in this paper is a novel language model based on the traditional N-gram model into which inter-word distance information is integrated; the model is therefore referred to as the distance-weighted statistical language model. The model takes the relationship between non-adjacent words into consideration, following the principle that words closer in distance are more closely related. A distance-weighting function integrates this information so as to improve the model's prediction ability. Experimental results show that, compared with the original N-gram model, the proposed model reduces the word error rate of a Chinese whole-sentence IME system.
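A minimal sketch of distance weighting: besides adjacent bigrams, non-adjacent history words contribute co-occurrence evidence discounted by a decaying weight f(d). The exponential decay and the normalization below are illustrative assumptions about the paper's weighting function, not its actual form.

```python
from collections import Counter
import math

corpus = "i drink hot tea i drink hot coffee you drink cold tea".split()
uni = Counter(corpus)

# Co-occurrence counts of (history word, target word) at each distance d.
MAX_D = 3
co = Counter()
for i, w in enumerate(corpus):
    for d in range(1, MAX_D + 1):
        if i - d >= 0:
            co[(corpus[i - d], w, d)] += 1

def weight(d, decay=0.7):
    """Closer words weigh more (assumed exponential decay)."""
    return math.exp(-decay * (d - 1))

def p_distance_weighted(w, history):
    """Score w given the last MAX_D history words, distance-weighted."""
    num = den = 0.0
    for d, h in enumerate(reversed(history[-MAX_D:]), start=1):
        num += weight(d) * co[(h, w, d)]
        den += weight(d) * uni[h]
    return num / den if den else 0.0

# "drink" two positions back still supports "tea":
print(p_distance_weighted("tea", ["you", "drink", "cold"]))
```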
  • ZHANG Lei,LI Xue-liang,LIU Xiao-dong
    2001, 15(6): 54-59.
    Knowledge graph theory is a new method of knowledge representation. In this paper, we compare the knowledge graph ontology with other ontologies, such as Aristotle's, Kant's, and Peirce's, and find that knowledge graph theory is more primitive than the others. On the basis of this comparison, we also study the classification of logic words in natural language processing. The logic words are classified into two kinds according to their different structures in knowledge graphs, and for each kind we analyze the corresponding word graphs in knowledge graph form. In this way, the idea that "structure is meaning" is expressed more clearly.
  • ZHAI Ling-hui,MA Shao-ping
    2001, 15(6): 60-65.
    This paper describes the data conversion problem that arises when CICS on an ES/9000 mainframe (running the MVS/VSE operating system) communicates with CICS on an RS/6000 minicomputer (running the AIX operating system). We discuss why Chinese characters in EBCDIC and in ASCII cannot be converted to each other by CICS alone. We then provide two solutions: one combines a CICS program, a Java program, and CICS configuration; the other combines a Java program with CICS configuration.
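A minimal sketch of why byte-wise conversion tables break on Chinese: round-tripping Latin text between Unicode/ASCII and single-byte EBCDIC works per byte, but Chinese in host code pages (e.g. IBM CP935) is a double-byte encoding with shift-out/shift-in framing that no single-byte table can map. Python's standard library ships the single-byte EBCDIC codec 'cp500' but not CP935, so the double-byte step is shown only as a comment.

```python
latin = "CICS DATA"
ebcdic = latin.encode("cp500")   # Unicode/ASCII -> single-byte EBCDIC
print(ebcdic.hex())              # different byte values from ASCII...
print(ebcdic.decode("cp500"))    # ...but a lossless round trip

# Chinese text has no per-byte EBCDIC equivalent:
# "汉字".encode("cp500") raises UnicodeEncodeError. A host code page such as
# CP935 (not in the stdlib) plus explicit SO/SI handling is required, which
# is the gap the paper's Java-based conversion layer fills.
```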