2008 Volume 22 Issue 3 Published: 16 June 2008
  

  • Review
    YUAN Yu-lin
    2008, 22(3): 3-15.
    Abstract: This paper first introduces the architecture and shortcomings of mainstream semantic resources such as WordNet, VerbNet, PropBank and FrameNet. It then demonstrates how to map word senses and semantic frames across these resources so that they can be linked, unified and made to complement one another. Finally, it presents a new trend in deep semantic representation and annotation aimed at automatic inference: moving from the argument structure of verbs to the propositional structure of sentences; to events and the event relations among related verbs, sentences and different parts of speech (e.g., verbs and event-denoting nouns); and to the anaphoric relations between deictics, pronouns and empty categories and their event-denoting antecedents.
  • Review
    JIN Peng, WU Yun-fang, YU Shi-wen
    2008, 22(3): 16-23.
    Abstract: The bottleneck in word sense disambiguation (WSD) is the lack of large-scale, high-quality sense-annotated corpora. This paper surveys several sense-annotated corpora for English, Chinese and Japanese in terms of corpus coverage, dictionary, tokens, word types and inter-annotator agreement. For automatic and semi-automatic construction, it focuses on bootstrapping methods and approaches based on word-aligned parallel corpora. Finally, it points out some open issues in building sense-annotated corpora and suggests possible solutions.
  • Review
    CHEN Yi, ZHOU Qiang, YU Hang
    2008, 22(3): 24-31, 43.
    Abstract: Through an experimental analysis of how the length and structure of functional chunks relate to parser performance, this paper shows that long, structurally complex functional chunks are the major difficulty in parsing. It therefore proposes a new hierarchical functional chunk scheme and automatically generates a new functional chunk bank from the Tsinghua Chinese Treebank (TCT). Further analysis of the length and structure distributions in the new chunk bank indicates that the new functional chunks are shorter and structurally simpler, which should help improve the performance of functional chunk parsers.
  • Review
    WANG Bao-xun, WANG Xiao-long, LIU Bing-quan, LI Peng
    2008, 22(3): 32-36, 114.
    Abstract: This paper presents an unsupervised learning strategy for identifying variants of biomedical terms. A minimum edit distance algorithm and a character-matching algorithm are first applied to identify morphological variants and abbreviations as candidate variants of a given term. A system similarity model is then introduced to measure the semantic context of each candidate variant. The method requires no linguistic knowledge or labor-intensive corpora, and the experiment indicates a significant improvement in recall with reasonable precision.
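    Note: the minimum edit distance step mentioned in this abstract can be illustrated with a short sketch. The code below is not the authors' implementation; it is the textbook unit-cost Levenshtein distance, and the helper `candidate_variants`, the `max_distance` threshold and the sample vocabulary are hypothetical names and values chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidate_variants(term, vocabulary, max_distance=2):
    """Hypothetical helper: words within max_distance edits of term.

    The threshold is an assumption for this sketch, not a value
    taken from the paper.
    """
    scored = [(edit_distance(term, w), w) for w in vocabulary if w != term]
    return [w for d, w in sorted(scored) if d <= max_distance]


if __name__ == "__main__":
    print(candidate_variants("tumor", ["tumour", "tumors", "kinase"]))
```

    A filter like this only proposes spelling-level candidates; the abstract describes a separate semantic-context step for deciding which candidates are true variants.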
  • Review
    WANG Zhi-min
    2008, 22(3): 37-43.
    Abstract: This paper explores the similarity features of Chinese noun metaphors and presents a metaphorical inference approach that uses external metaphorical similarities, dictionary information and the maximum entropy model. To detect Chinese noun metaphors of the pattern ‘n+n’, a dictionary with one semantic class per word is built by tailoring the Chinese Concept Dictionary (CCD) through human-machine interaction. This demonstrates the validity of the noun metaphor knowledge base and prepares for later identification experiments. The experiments indicate that the maximum entropy model, the metaphorical similarity and the dictionary all contribute to the successful identification of noun metaphors.
  • Review
    XI Bin, QIAN Long-hua, ZHOU Guo-dong, ZHU Qiao-ming, QIAN Pei-de
    2008, 22(3): 44-49, 63.
    Abstract: Semantic relation extraction is an important field in information extraction research. The current feature-vector-based approach to semantic relation extraction can hardly be improved simply by mining new features. This paper presents a novel method that combines diverse basic lexical, syntactic and semantic features to form new combined features. The experiments show that these combined features improve both the precision and the recall of SVM-based relation extraction. The F-scores for the 7 major types and 23 subtypes in the ACE 2004 corpus reach 66.6% and 59.50%, respectively.
  • Review
    JI Duo, WANG Zhi-chao, CAI Dong-feng, ZHANG Gui-ping
    2008, 22(3): 50-55.
    Abstract: The agglomerative hierarchical clustering algorithm is known for its strong performance in partitioning a data set by repeatedly merging similar clusters. The method used to compute the distance between clusters is the key factor affecting its performance. This paper proposes a new method for calculating cluster distance based on the Gaussian distribution. To improve the accuracy of the calculation, the method takes into account properties of the clusters themselves, such as cluster size and data distribution. Experimental results on different text sets show that the proposed method effectively improves the performance of hierarchical clustering.
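    Note: the merge loop that this abstract refers to can be sketched as follows. The Gaussian-based cluster distance proposed in the paper is not described in the abstract and is not reproduced here; this sketch substitutes plain average linkage over Euclidean distance to show where a cluster distance function plugs in. All function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between two clusters of point indices.

    Stand-in for the cluster distance; the paper's Gaussian-based
    distance would replace this function.
    """
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, k, cluster_distance=average_linkage):
    """Merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(k, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(agglomerative(pts, 2))
```

    Passing the distance as a parameter keeps the merge loop unchanged when a different cluster distance, such as one that accounts for cluster size and spread, is tried.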
  • Review
    LI Hong-mei, DING Zhen-guo, ZHOU Shui-sheng, ZHOU Li-hua
    2008, 22(3): 56-63.
    Abstract: Most search engines return ranked lists of document snippets, which makes it difficult for users to find relevant information. One remedy is to group the snippets returned by the search engine into clusters, which helps users navigate the results of a query at the topic level and locate relevant information quickly and efficiently. This paper first introduces several key requirements for Web search results clustering methods and a classification of these methods. It then examines the major clustering algorithms and their current improvements, and discusses the evaluation of clustering quality. Finally, it summarizes future developments in clustering search engine results.
  • Review
    ZENG Yi-ling, XU Hong-bo, BAI Shuo
    2008, 22(3): 64-66, 122.
    Abstract: The explosive growth of Internet information makes it harder for people to reach the information that is useful to them. To detect the most important aspects of this vast body of information and manage it accordingly, a key phrase extraction algorithm based on multi-level concatenation of segmented words is proposed. Supported by a customized noise library and filtering strategies, the algorithm can extract key phrases from large amounts of Internet data. A carefully designed clustering algorithm is then applied so that key phrases describing the same event are correctly grouped together. Experiments on real Internet data demonstrate the efficiency of the algorithms.
  • Review
    YAO Tian-fang, CHENG Xi-wen, XU Fei-yu, Hans USZKOREIT, WANG Rui
    2008, 22(3): 71-80.
    Abstract: Opinion mining is a novel and important research topic that aims to automatically acquire useful opinionated information and knowledge from subjective texts. The technique has many real-world applications, such as e-commerce, business intelligence, information monitoring, public opinion polling, e-learning, newspaper and publication compilation, and business management. In this paper, we define opinion mining and describe the motivation for this research. We then survey the state of the art in opinion mining in terms of four subtasks: topic extraction, holder identification, claim extraction and sentiment analysis, followed by an overview of several existing systems. In addition, we provide a specific analysis of Chinese opinion mining. Finally, we summarize opinion mining research.
  • Review
    ZHANG Jin, WANG Xiao-lei, XU Hong-bo
    2008, 22(3): 81-88.
    Abstract: Evaluation has long been of interest to the automatic summarization community because it effectively drives progress in summarization. After discussing the background of summarization evaluation and its unsettled issues, this paper briefly introduces and comments on current summarization evaluation methods. It then provides a detailed analysis of the key technologies in existing evaluation methods. Finally, it presents some directions for future research.
  • Review
    WANG Hao-chang, ZHAO Tie-jun
    2008, 22(3): 89-98.
    Abstract: The 21st century is the era of biology, and more than 600,000 academic papers are published annually in this field. The challenge for researchers is how to acquire relevant knowledge automatically and effectively from the huge body of biomedical literature. To address this issue, biomedical text mining has become a new branch of bioinformatics and has made great progress. This survey introduces the main approaches and achievements in this research, including machine learning methods for named entity recognition, abbreviation and synonym recognition, and relation extraction, as well as the construction of relevant resources, international evaluations and academic gatherings. Some domestic research is briefly described and, finally, prospective developments in the near future are anticipated.
  • Review
    LONG Yan-hua, GUO Wu, DAI Li-rong
    2008, 22(3): 99-104.
    Abstract: The Support Vector Machine (SVM) has been widely used in text-independent speaker verification systems. However, the algorithm often suffers from imbalanced training data because the data from target speakers are insufficient, and this imbalance can severely degrade performance in applications. The choice of impostors therefore directly affects the performance of the entire system. In this paper, we propose two strategies for selecting impostor samples in SVM training. Experiments show that the proposed methods efficiently address the imbalance problem and significantly improve system performance compared with traditional methods based on random data selection. On the NIST 2004 benchmark, the methods reduce the equal error rate of the speaker verification system from 0.093 to 0.068.
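    Note: the equal error rate reported in this abstract is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (target speakers rejected). The sketch below shows one simple way to estimate it from verification scores by sweeping thresholds; it is not the NIST evaluation procedure, and the function name and sample scores are invented for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The estimate
    is the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: a target-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Invented scores, for illustration only.
    targets = [0.9, 0.8, 0.7, 0.4]
    impostors = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(targets, impostors))
```

    With few trials the two error curves may not cross exactly, which is why this sketch averages the two rates at the closest threshold.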
  • Review
    Arzugul·xerip
    2008, 22(3): 105-109.
    Abstract: Adopting complex feature theory, this paper makes a tentative study of the diversity of verbal suffixes in the Uyghur language. Uyghur suffixes can be classified into three types: derivational morphemes, inflectional morphemes and derivational-inflectional morphemes. These suffixes exhibit complex characteristics in terms of classification, grammatical form, aspect, tense, person, number and affixation conditions. Different affixation rules apply when suffixes are attached to the root or the stem of a verb. This paper focuses on verbal suffixes and their variant forms, their affixation conditions, and their complex features and representation. It further demonstrates the unification of feature structures using the example of the direct-statement verb in the past tense.
  • Review
    Mayire Yibulayin, Mijiti Abulimiti, Askar Hamdulla
    2008, 22(3): 110-114.
    Abstract: Error detection and ranking are important issues in language analysis. This paper summarizes common spelling errors according to the phonetic and lexical features of Uighur and discusses the corresponding solutions. It also presents and implements a minimum edit distance based approach to Uighur spelling checking and correction, which integrates Uighur morphological structure to improve the accuracy and speed of correction ranking. The method has already been applied in areas such as automatic Uighur proofreading and multilingual text retrieval. An experiment on texts from university journals published in Xinjiang achieves an accuracy of 85%.
  • Review
    LIN Min, SONG Rou
    2008, 22(3): 115-123.
    Abstract: The main problem with current Chinese character glyph descriptions is the lack of a formal description that is both computable and able to cover all possible Chinese characters. This paper proposes a grid description approach for Chinese characters. Experimental results indicate that it can not only describe all possible Chinese character skeletons, including miswritten characters, but also strongly support the automatic extraction and computation of glyph features at different granularities, such as strokes, radicals and structural relations. The method therefore establishes a reliable basis for a variety of applications based on Chinese character glyph computing.
  • Review
    LIU Han-meng, RUI Jian-wu, BAI Zhen-long, WU Jian
    2008, 22(3): 124-128.
    Abstract: Standard conformance testing of software products is an important way to measure their quality and performance. Following the principle of software product usability, this article analyzes and defines the meaning and content of standard conformance testing for Tibetan fonts on the basis of the Tibetan character set standards and glyph standards. A solution for testing the standard conformance of Tibetan fonts is proposed, a corresponding algorithm is designed, and an automatic testing program for Tibetan fonts is implemented. The experimental results show that the solution is feasible and valid. The approach can also be used to test fonts of other scripts.