2014 Volume 28 Issue 3 Published: 10 March 2014
  

  • Language Analysis and Generation
  • Language Analysis and Generation
    WEI Xue, YUAN Yulin
    2014, 28(3): 1-10.
    This paper proposes a rule-based approach to interpreting Chinese ‘N+N’ compounds automatically. The procedure is: 1) establishing semantic class patterns for noun compounds according to the semantic classification in the Semantic Knowledge-base of Contemporary Chinese; 2) revealing the semantic relation between the nouns in ‘N+N’ compounds by taking the Agentive Role or Telic Role of one of the nouns as the paraphrasing verb; 3) designing one or more interpretation templates for every semantic class pattern, and building a database of ‘N+N’ combinations that records the semantic class patterns and the paraphrasing verbs; 4) building a Noun_Verb database, which contains the Agentive Role and/or Telic Role of each noun, using HowNet. Based on these two databases, a mechanism is implemented that generates interpretations of Chinese noun compounds automatically.
  • Language Analysis and Generation
    XU Fan1, ZHU Qiaoming2, ZHOU Guodong2, WANG Mingwen1
    2014, 28(3): 11-21.
    This paper systematically explores the impact of cohesion theory on Discourse Coherence Modeling (DCM). Unlike the state-of-the-art supervised entity-based and discourse relation-based grid models, our unsupervised model shows the importance to DCM of the theme-rheme structure, a cohesion theory from systemic-functional grammar, and the appropriateness of a theme- and coreference-based filtering mechanism for discourse consistency in DCM. Evaluation on three publicly available benchmark data sets via sentence ordering and summary coherence rating tasks shows the effectiveness of both theme-rheme structure and coreference resolution in DCM. It also shows that our system significantly outperforms the state-of-the-art systems.
  • Language Analysis and Generation
    JI Cui, LU Dawei, SONG Rou
    2014, 28(3): 22-27.
    Chinese is a topic-prominent language. In Chinese discourse, a single topic can be discussed at length, but there can also be changes in topic. This paper focuses on a specific kind of topic change named the New Branch Topic, in which part of the comment on the original topic addresses a new topic, while the new topic and its comments cannot form a sentence with the original topic. This paper discusses the capacity of verbs to address an object as a New Branch Topic, classifies the verbs according to their semantic categories, and lists the semantic distribution statistics of all verbs with this function in Fortress Besieged.
  • Language Analysis and Generation
    CHEN Zhongshuai, LIU Yang, YU Xiaohui
    2014, 28(3): 28-35.
    This paper analyses the sentiment orientation of English sentences with modality. Such sentences are widely used in English and make up a significant proportion of typical review corpora. Due to the unique characteristics of modality, it is challenging for a general sentiment analysis system to handle these sentences. This paper identifies them with the help of POS tagging and presents a new modal feature that has rarely been discussed in previous studies. To further improve accuracy, we develop a novel method that effectively combines phrases sharing similar modal meanings. The experimental results show that the F-score of the proposed method is 4% and 7% higher than that of classic methods in two-class and three-class sentiment orientation classification, respectively.
  • Language Analysis and Generation
    SONG Yijun1, WANG Ruibo1, LI Jihong1, LI Guochen2
    2014, 28(3): 36-47.
    Given a predicate word and its frame, semantic role labeling for Chinese FrameNet can be divided into two steps: boundary identification of semantic roles and classification of semantic roles. In this paper, these tasks are formalized as a word-level sequence labeling problem using the IOB2 strategy. We apply a conditional random field model to automatic labeling experiments, with the word as the basic tagging unit. We extract 15 new base-chunk features by automatically parsing the sentences with the base chunk parser of Tsinghua University, and the features are mapped onto the word sequence. Experiments show that the overall F1-value of semantic role labeling increases by nearly 1% over the baseline, which is significant at the 0.05 level under a t-test.
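The IOB2 formalization mentioned in this abstract can be illustrated with a minimal sketch. The function name `spans_to_iob2`, the English example sentence and the role labels are assumptions made for illustration only; they are not taken from the paper or from Chinese FrameNet data.

```python
from typing import List, Tuple


def spans_to_iob2(n_words: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert role spans (start, end_exclusive, role) into one IOB2 tag per word."""
    tags = ["O"] * n_words  # words outside every role span stay "O"
    for start, end, role in spans:
        tags[start] = f"B-{role}"  # first word of a span always gets B-
        for i in range(start + 1, end):
            tags[i] = f"I-{role}"  # remaining words of the span get I-
    return tags


# Hypothetical example: the predicate "gave" is left untagged ("O").
words = ["the", "teacher", "gave", "the", "student", "a", "book"]
spans = [(0, 2, "Donor"), (3, 5, "Recipient"), (5, 7, "Theme")]
tags = spans_to_iob2(len(words), spans)
# tags == ["B-Donor", "I-Donor", "O", "B-Recipient", "I-Recipient", "B-Theme", "I-Theme"]
```

With this encoding, boundary identification and role classification collapse into predicting one tag per word, which is the kind of input a sequence labeler such as a conditional random field works on.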
  • Language Analysis and Generation
    CHEN Xueli1, LI Ru1,2, WANG Sai1, WANG Zhiqiang1
    2014, 28(3): 48-54.
    The low coverage of Chinese FrameNet leaves many lexical units unknown and restricts frame semantic analysis for Chinese. To identify frames for unknown lexical units, this paper proposes two methods based on Tongyici CiLin: the Average Semantic Similarity method and the Maximum Entropy-based (ME-based) method, both of which combine static and dynamic features. Experiments show that the two methods can effectively identify the frames of unknown lexical units: the accuracy of the similarity-based method is 78.61% when the Top-4 candidates are considered; the Top-1 accuracy of the ME-based method on the same test set is 87.29% (and 75% on a separate set of news texts).
  • Information Retrieval and Social Computing
  • Information Retrieval and Social Computing
    WANG Xiaoming, WANG Li, YANG Jingzong
    2014, 28(3): 55-61.
    Microblogs are widely used nowadays, and their user interaction structure is complex. This paper proposes a novel method for analyzing the properties of microblog information diffusion networks. We first define the information source. Then information diffusion networks for six different topic events are visualized and analyzed. The information diffusion network is modeled as a directed acyclic graph, and three motif structures are defined to represent information scattering, information gathering and information transmitting, respectively. According to the Spearman rank correlation coefficient, the distributions of the three motif structures are quite different from each other. As for the evolution of the information diffusion network, it is found that the information scattering structure is the most numerous at each snapshot.
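For readers unfamiliar with the Spearman rank correlation coefficient named in this abstract, here is a minimal sketch of the tie-free formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)). The abstract does not say exactly which quantities were ranked, so the motif counts below are hypothetical and the helper names are assumptions.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Return 1-based ranks; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation for two equal-length, tie-free samples."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical motif counts over five snapshots (not data from the paper).
scattering = [12, 30, 45, 80, 95]
gathering = [9, 7, 6, 4, 2]
rho = spearman(scattering, gathering)  # -1.0: the two rankings are exactly reversed
```

A value near 1 means two series rise and fall together in rank, while a value near -1 or 0 indicates that they behave differently.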
  • Information Retrieval and Social Computing
    LI Heyuan1,2, YU Xiaoming1, LIU Yue1, CHENG Xueqi1, CHENG Gong3
    2014, 28(3): 62-67.
    Micro-blogs have changed the way people obtain information. However, micro-blogs have been infiltrated by a large amount of spam, which poses a challenge to normal users. In this paper, we study spam in Chinese micro-blogs. We analyze the behavior of spam users and propose 7 new features for detecting them. Then, we describe how to apply these features to spammer detection with an SVM classifier. The experimental results indicate that the accuracy and recall of the proposed method are satisfactory.
  • Information Retrieval and Social Computing
    WAN Shengxian1,2, GUO Jiafeng1, LAN Yanyan1, CHENG Xueqi1
    2014, 28(3): 68-74.
    Tweet popularity prediction in social networks is very important for applications such as information recommendation and viral marketing. This paper proposes a new approach to tweet popularity prediction based on propagation simulation. The maximum entropy model is first used to learn the probabilities of users' retweeting behaviors, and then the independent cascade model is used to simulate the diffusion processes of tweets in a real social network. This approach benefits from using more information about social network structure and users. Experiments on a Twitter dataset show that our approach is better in both precision and stability than the baselines.
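The simulation step described in this abstract can be sketched as a standard independent cascade run. In the paper the retweet probabilities are learned with a maximum entropy model; in this sketch they are simply hard-coded, and the toy follower graph, the function names and the averaging over repeated runs are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Set, Tuple


def simulate_cascade(
    followers: Dict[str, List[str]],
    retweet_prob: Dict[Tuple[str, str], float],
    seed_user: str,
    rng: random.Random,
) -> Set[str]:
    """One independent-cascade run: each newly activated user gets a single
    chance to activate each of its not-yet-activated followers."""
    activated = {seed_user}
    frontier = [seed_user]
    while frontier:
        next_frontier = []
        for user in frontier:
            for follower in followers.get(user, []):
                if follower in activated:
                    continue
                if rng.random() < retweet_prob.get((user, follower), 0.0):
                    activated.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    return activated


def predict_popularity(followers, retweet_prob, seed_user, runs=1000, seed=0) -> float:
    """Average number of retweeting users (excluding the author) over many runs."""
    rng = random.Random(seed)
    total = sum(
        len(simulate_cascade(followers, retweet_prob, seed_user, rng)) - 1
        for _ in range(runs)
    )
    return total / runs


# Hypothetical graph: user -> followers, with assumed edge retweet probabilities.
followers = {"a": ["b", "c"], "b": ["d"]}
retweet_prob = {("a", "b"): 1.0, ("a", "c"): 0.0, ("b", "d"): 1.0}
popularity = predict_popularity(followers, retweet_prob, "a", runs=100)  # 2.0
```

Because the toy probabilities are 0 or 1, every run activates exactly "b" and "d"; with learned probabilities the average would vary with the number of runs and the random seed.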
  • Information Retrieval and Social Computing
    HUO Shuai, ZHANG Min, LIU Yiqun, MA Shaoping, JIN Yijiang, RU Liyun
    2014, 28(3): 75-80.
    Search engines are committed to helping people find target information accurately and quickly, so the evaluation of search performance has become more vital. This paper deals with performance evaluation for rare queries, which has received little attention. First, three types of features are extracted after analyzing the characteristics of rare queries. Second, the correlation of the features is analyzed and different combinations of features are tested. Then, two data balancing approaches are proposed to alleviate the serious imbalance of the data set. Finally, the evaluation method for rare queries is put forward and then improved. The experimental results show that the proposed evaluation approach is effective, and that it identifies non-relevant results with encouraging precision.
  • Machine Translation
  • Machine Translation
    LI Liangyou, GONG Zhengxian, ZHOU Guodong
    2014, 28(3): 81-91.
    With the development of machine translation, automatic evaluation methods have received more and more attention. Since so many related methods and technologies have been proposed, it is a big challenge to organize and describe them under a scientific classification. This paper focuses on three types of methods: checkpoint-based methods, string-matching methods and machine learning-based methods. It enumerates several representative approaches for each type, describing the principles of the metrics and analyzing their advantages and shortcomings. In addition, evaluation with limited references is introduced as a special category, which plays an important role in increasing the degree of automation as well as boosting performance. Furthermore, some well-known evaluation metric campaigns are introduced. Finally, we show the trends in current research on automatic evaluation and point out some relevant problems for future study.
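As one concrete instance of the string-matching family mentioned in this abstract, the sketch below computes clipped n-gram precision, the core quantity of BLEU-style metrics. The abstract does not name the specific metrics it surveys, so this choice and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate: Sequence[str], references: List[Sequence[str]], n: int) -> float:
    """Fraction of candidate n-grams matched in the references, where each
    n-gram is credited at most as often as it occurs in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()
precision = clipped_precision(candidate, [reference], 1)  # 2/7: "the" is credited only twice
```

Clipping is what keeps a degenerate candidate that repeats a common word from scoring perfectly, which plain n-gram precision would allow.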
  • Minority Language Information Processing
  • Minority Language Information Processing
    ZHU Jie1,2, LI Tianrui1, LIU Shengjiu1
    2014, 28(3): 92-98.
    As a fundamental issue in text processing, spelling check is used in a wide range of fields, such as word processing, character recognition, voice recognition and search engines. Based on the word formation rules of Tibetan phonetic features, the paper proposes an algorithm for spell-checking Tibetan syllables via a simplified model of Tibetan syllable rules. Results of two experiments verify the effectiveness of the algorithm. Without considering the special cases of Tibetan syllables, the spelling error checking accuracy reaches 99.8%.
  • Minority Language Information Processing
    TashiGyal1, DuoLa2
    2014, 28(3): 99-103.
    According to the actual needs of Tibetan natural language processing, this paper adopts the complex feature set and functional unification for the formal description of Tibetan sentences. In light of modern linguistic theory, this paper explores a frame representation for the functional unification of Tibetan lexical, syntactic and semantic rules.
  • Minority Language Information Processing
    Bianba wangdui, Zhuoga, CHEN Yanli, WU Qiang
    2014, 28(3): 104-111.
    To implement a Tibetan sorting algorithm, the construction elements that compose a Tibetan syllable must first be recognized, after which sorting can be carried out according to their priority. Through a study of Tibetan morpheme structure, spelling rules and grammar rules, a novel algorithm is designed for recognizing the construction elements of modern Tibetan. The algorithm takes into account ambiguity, double vowels and abbreviations in special Tibetan syllables. In addition, corresponding processing is adopted to guarantee correct recognition in the Tibetan Standard of China. Tests show that the algorithm meets the practical demands of Tibetan construction element recognition.
  • Minority Language Information Processing
    Riyiman Tursun, Wushour Silamu
    2014, 28(3): 112-115.
    This paper presents a system for Uyghur online handwritten word recognition. According to the characteristics of Uyghur word handwriting, the system adopts a strategy based on multiple classifier combination, using a Gaussian Mixture Model for the static image and a Hidden Markov Model for the dynamic writing trajectory of the handwritten word, respectively. The combination of multiple classifiers improves the recognition accuracy effectively. In the preliminary experiments, our system achieves an accuracy of 97% and 99%, respectively.
  • Speech Recognition and Analysis
  • Speech Recognition and Analysis
    ZHANG Lianhai, CHEN Bin, QU Dan, LI Bicheng
    2014, 28(3): 116-122.
    To address the unreliability of burst spectrum features, a Chinese stop detection method based on the energy change rate is proposed. The energy change rate features are first acquired from Seneff's auditory spectrum and then transformed by the Fisherface approach. Finally, a KNN classifier is used to perform stop detection. Tested by leave-one-out cross validation, the method shows high stability and good generalization: the accuracy is 96.39% for clean speech and 88.07% for noisy speech at an SNR of 10 dB.
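The evaluation protocol in this abstract, a KNN classifier tested by leave-one-out cross validation, can be sketched as follows. The two-dimensional feature vectors and labels are hypothetical stand-ins; the auditory-spectrum feature extraction and the Fisherface transform are not reproduced here, and the function names are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def knn_predict(train: List[Sample], query: Sequence[float], k: int) -> str:
    """Majority label among the k training samples closest to the query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples: List[Sample], k: int = 3) -> float:
    """Hold out each sample in turn, classify it using all the others."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical, well-separated toy features (not data from the paper).
samples = [
    ((0.0, 0.1), "stop"), ((0.1, 0.0), "stop"), ((0.2, 0.1), "stop"),
    ((1.0, 1.1), "other"), ((1.1, 1.0), "other"), ((0.9, 1.0), "other"),
]
accuracy = leave_one_out_accuracy(samples, k=3)  # 1.0 on this toy set
```

Leave-one-out uses every sample for testing exactly once, which makes it a common choice when the labeled data set is small.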
  • Speech Recognition and Analysis
    ZHOU Xuewen, HU He
    2014, 28(3): 123-128.
    This paper presents an automatic labeling/retrieving system for acoustic parameters. By using the system, phonetic analysts may dramatically reduce errors in labeling and retrieving acoustic parameters, improve working efficiency, ensure the repeatability and verifiability of phonetic data, and promote standardization in establishing acoustic parameter databases.
  • Speech Recognition and Analysis
    WU Qian, WANG Bei
    2014, 28(3): 129-135.
    This paper investigates the effects of topic transition type and sentence length on pause, final lengthening and pitch reset at prosodic phrase boundaries between two clauses. Each discourse contains two sentences. The second sentence is manipulated to control length (long vs. short) and topic transition type (continuation, elaboration or shift). The results from twenty native speakers show that: 1) Both topic transition and sentence length have significant effects on pause duration and pitch reset, but not on pre-boundary lengthening, with no interaction between them. More specifically, a longer pause and a larger pitch reset occur when the second sentence is long. Pause duration and pitch reset increase more under topic shift than under topic elaboration and continuation. 2) A weak negative correlation is found between pause duration and pre-boundary lengthening, and a weak positive correlation between pause duration and pitch reset. 3) Compared with male speakers, female speakers use both pitch and duration variation to mark topic transition type in a more systematic way. These results suggest that the effect of sentence length on acoustic cues at intonational phrase boundaries is probably articulatory, whereas that of topic transition type is communicative.
  • Information Extraction and Text Mining
  • Information Extraction and Text Mining
    ZHOU Hongzhao, HOU Mingwu, HOU Min, TENG Yonglin
    2014, 28(3): 136-141.
    Comparison is a common way of expressing which of several things is better, or whether they are identical (or similar) in some aspect. Automatically identifying comparative sentences and extracting the elements being compared is a novel and practical research topic in the sentiment analysis field. Based on the interdependence between comparative sentences and comparative elements, we propose a method that accomplishes the two identification tasks simultaneously. According to the semantic classification of words and comparative sentences, we construct a lexicon system consisting of a domain lexicon, a sentiment lexicon, a mark lexicon and a common lexicon, and then build a rule base for comparative sentence identification and comparative element extraction. On the test corpus published by the Fourth Chinese Opinion Analysis Evaluation (COAE2012), the experiments demonstrate a promising evaluation result for the proposed method.
  • Information Extraction and Text Mining
    FANG Ying1,2, HUANG Heyan1, XIN Xin1, WEI Xiaochi1, ZHUANG Kun1
    2014, 28(3): 142-149.
    Topic evolution analysis, which tracks topic change trends, is of significance in both application and research. On the basis of the LDA (Latent Dirichlet Allocation) model, the ILDA (Infinite Latent Dirichlet Allocation) model is enhanced with a Dirichlet process. The ILDA model can not only acquire the latent variables, but also update the hyperparameters and change the number of topics dynamically. In existing topic evolution systems, the number of topics is pre-defined and cannot change. The method based on the ILDA model aims to resolve this by enabling the following: different topics for classification in each cycle, topic association between adjacent cycles, and sub-topic strength calculation over the time sequence. The experiments show that the dynamic updating of the parameters meets practical demands, resulting in a satisfactory topic evolution analysis.