Content of Language Analysis and Generation in our journal
  • Language Analysis and Generation
    XIAO Yonglei, LIU Shenghua, LIU Yue, CHENG Xueqi, ZHAO Wenjing, REN Yan, WANG Yuping
    . 2014, 28(4): 21-28.
    The emergence of social media services has led to a large amount of short text, such as tweets and reviews, being generated every day. Mining such data is attracting growing interest from both industry and academia, and it has already become an important source of information for marketing, stock prediction, etc. However, mining short text is non-trivial because the text is extremely sparse and lacks context. We therefore propose to enrich short text content by automatically identifying semantically related concepts in open knowledge bases such as Wikipedia. First, through linkable pruning, concept linking and disambiguation, important n-grams in tweets are linked to their related Wikipedia concepts. Second, NMF (non-negative matrix factorization) is used to factorize the concept-document matrix to obtain each concept's semantic neighbors, and tweets are then expanded with these related concepts. Experiments on the TREC 2011 tweet collection and Wikipedia 2011 show that our approach is effective.
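The NMF step in the abstract above can be illustrated with a minimal sketch. It assumes the classic multiplicative-update rules for the squared-error objective and ranks neighbors by cosine similarity between rows of the concept factor; the abstract specifies neither choice, and the function names (`nmf`, `neighbors`) and the toy matrix are purely illustrative.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize a non-negative matrix V (concepts x documents) as W @ H.

    Assumption: multiplicative updates; the paper's solver may differ.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

def neighbors(w, index, top_k=3):
    """Rank the other concepts by cosine similarity of their rows in W."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
        return dot / (norm or 1.0)
    others = [i for i in range(len(w)) if i != index]
    return sorted(others, key=lambda i: cosine(w[index], w[i]), reverse=True)[:top_k]

# Toy concept-document matrix: rows are concepts, columns are documents.
V = [[3.0, 2.0, 0.0, 0.0],
     [2.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
W, H = nmf(V, rank=2)
print(neighbors(W, 0))
```

Concepts that co-occur in the same documents tend to load on the same latent factor, which is why comparing rows of `W` is one plausible way to read off "semantic neighbors" for expansion.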
  • Language Analysis and Generation
    JIANG Zhipeng, GUAN Yi, DONG Xishuang
    . 2014, 28(4): 29-36.
    Hierarchical parsing is a simple and fast method of full syntactic analysis that can be decomposed into three stages: POS tagging, chunking and parse tree construction. In this paper, chunking is further divided into base chunking and complex chunking, and a conditional random field model is adopted for sequence labeling instead of a maximum entropy model. To address error accumulation, a particularly serious problem in hierarchical parsing, this paper presents a simple and practical method for error prediction and collaborative correction: errors predicted in one layer are tracked into the next layer, and the prediction scores of the two layers are combined to correct the errors collaboratively. The experimental results show that hierarchical parsing with error correction achieves almost the same precision as mainstream Chinese parsers.
  • Language Analysis and Generation
    WEI Xue, YUAN Yulin
    . 2014, 28(3): 1-10.
    This paper proposes a rule-based approach to interpreting Chinese ‘N+N’ compounds automatically. The procedure is as follows: 1) Establishing the semantic class patterns for noun compounds according to the semantic classification in the Semantic Knowledge-base of Contemporary Chinese. 2) Revealing the semantic relation between the nouns in ‘N+N’ compounds by taking the Agentive Role or Telic Role of one of the nouns as the paraphrasing verb. 3) Designing one or more interpretation templates for every semantic class pattern, and building a database of ‘N+N’ combinations to record the semantic class patterns and the paraphrasing verbs. 4) Building a Noun_Verb database, which contains the Agentive Role and/or Telic Role of each noun, using HowNet. Based on these two databases, a mechanism is finally implemented to generate interpretations of Chinese noun compounds automatically.
  • Language Analysis and Generation
    XU Fan1, ZHU Qiaoming2, ZHOU Guodong2, WANG Mingwen1
    . 2014, 28(3): 11-21.
    This paper systematically explores the impact of cohesion theory in Discourse Coherence Modeling (DCM). Different from the state-of-the-art supervised entity-based and discourse relation-based grid models, our unsupervised model shows the importance of the theme-rheme structure, a cohesion theory of systemic-functional grammar, to DCM, and the appropriateness of theme and coreference based filtering mechanism to discourse consistency in DCM. Evaluation on three publicly available benchmark data sets via sentence ordering and summary coherence rating tasks shows the effectiveness of both theme-rheme structure and coreference resolution in DCM. It also shows that our system significantly outperforms the state-of-the-art ones.
  • Language Analysis and Generation
    JI Cui, LU Dawei, SONG Rou
    . 2014, 28(3): 22-27.
    Chinese is a topic-prominent language. In Chinese discourse, a single topic can be discussed at length, but the topic can also change. This paper focuses on a specific kind of topic change called the New Branch Topic, in which part of the comment on the original topic introduces a new topic, while the new topic and its comments cannot form a sentence with the original topic. This paper discusses the capacity of verbs to introduce an object as a New Branch Topic, classifies the verbs according to their semantic categories, and lists the semantic distribution statistics of all verbs with this function in Fortress Besieged.
  • Language Analysis and Generation
    CHEN Zhongshuai, LIU Yang, YU Xiaohui
    . 2014, 28(3): 28-35.
    This paper analyses the sentiment orientation of English sentences with modality. Such sentences are widely used in English and make up a significant proportion of typical review corpora. Due to the unique characteristics of modality, it is challenging for a general sentiment analysis system to handle these sentences. This paper identifies these sentences with the help of POS tagging and presents a new modal feature that has rarely been discussed in previous studies. To further improve accuracy, we develop a novel method that effectively combines phrases sharing similar modal meanings. The experimental results show that the F-score of the proposed method is 4% and 7% higher than that of classic methods in two-class and three-class sentiment orientation classification, respectively.
  • Language Analysis and Generation
    SONG Yijun1, WANG Ruibo1, LI Jihong1, LI Guochen2
    . 2014, 28(3): 36-47.
    Given a predicate word and its frame, semantic role labeling for Chinese FrameNet can be divided into two steps: identifying the boundaries of semantic roles and classifying the semantic roles. In this paper, both tasks are formalized as a word sequence labeling problem through the IOB2 strategy. We apply a conditional random field model to automatic labeling experiments with the word as the basic tagging unit. We extract 15 new base-chunk features by automatically parsing sentences with the base chunk parser of Tsinghua University, and these features are mapped onto the word sequence. Experiments show that the overall F1 value of semantic role labeling increases by nearly 1% over the baseline, which is significant at the 0.05 level under the t-test.
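The IOB2 strategy mentioned in the abstract above turns role spans into one tag per word, so that boundary identification and role classification become a single sequence labeling task. The sketch below only illustrates the encoding and decoding; the tokens and role names (`Agent`, `Theme`) are hypothetical placeholders and are not taken from Chinese FrameNet or from the paper's data.

```python
def to_iob2(num_tokens, spans):
    """Encode role spans as IOB2 tags.

    spans: list of (start, end_exclusive, role) over word positions.
    The first word of every span gets B-role, the rest get I-role,
    and words outside any span get O.
    """
    tags = ["O"] * num_tokens
    for start, end, role in spans:
        tags[start] = "B-" + role
        for i in range(start + 1, end):
            tags[i] = "I-" + role
    return tags

def from_iob2(tags):
    """Decode IOB2 tags back into (start, end_exclusive, role) spans."""
    spans = []
    start = role = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        # Close the open span unless this tag continues it.
        if role is not None and tag != "I-" + role:
            spans.append((start, i, role))
            start = role = None
        if tag.startswith("B-"):
            start, role = i, tag[2:]
    return spans

# Hypothetical example: five words, two role spans around a predicate.
tokens = ["the", "teacher", "bought", "three", "books"]
spans = [(0, 2, "Agent"), (3, 5, "Theme")]
tags = to_iob2(len(tokens), spans)
print(list(zip(tokens, tags)))
```

A sequence labeler such as a CRF then predicts one of these tags per word, and `from_iob2` recovers both the boundaries and the role labels from its output.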
  • Language Analysis and Generation
    CHEN Xueli1, LI Ru1,2, WANG Sai1, WANG Zhiqiang1
    . 2014, 28(3): 48-54.
    The low coverage of Chinese FrameNet leads to many unknown lexical units and restricts frame semantic analysis for Chinese. In order to identify frames for unknown lexical units, this paper proposes two methods based on Tongyici CiLin: the Average Semantic Similarity method and the Maximum Entropy (ME-based) method, both of which combine static and dynamic features. Experiments show that the two methods can effectively identify the frames of unknown lexical units: the accuracy of the similarity-based method is 78.61% when the Top-4 candidates are considered; the Top-1 accuracy of the ME-based method on the same test set is 87.29% (and 75% on a separate set of news texts).
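The idea of ranking candidate frames by average similarity and keeping the Top-k, as in the abstract above, can be sketched as follows. This is an illustration only: the similarity function (fraction of shared leading characters of a hierarchical code), the code strings, and the frame names are all made-up assumptions, not the paper's Tongyici CiLin measure or Chinese FrameNet data.

```python
def prefix_similarity(code_a, code_b):
    """Toy similarity: fraction of leading characters two codes share."""
    shared = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        shared += 1
    return shared / max(len(code_a), len(code_b))

def rank_frames(unknown_code, frame_units, top_k=4):
    """Rank frames by the average similarity between the unknown lexical
    unit and the known lexical units of each frame; return the top_k frames."""
    scores = {
        frame: sum(prefix_similarity(unknown_code, c) for c in codes) / len(codes)
        for frame, codes in frame_units.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical frames, each listing the codes of its known lexical units.
frame_units = {
    "FrameA": ["Aa01", "Aa02"],
    "FrameB": ["Bb01", "Bc02"],
    "FrameC": ["Aa01", "Bb01"],
}
print(rank_frames("Aa03", frame_units, top_k=2))  # → ['FrameA', 'FrameC']
```

Evaluating with Top-k candidates, as the abstract does with Top-4, counts a prediction as correct when the true frame appears anywhere in the returned list.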
  • Language Analysis and Generation
    PENG Weiming, SONG Jihua, YU Shiwen
    . 2014, 28(2): 1-7.
    This paper compares the Sentence-based Diagram Treebank with existing lexical specifications in terms of word segmentation units and POS tagging, revealing the disconnect between automatic lexical analysis and parsing in current Chinese information processing. It describes the parsing strategy for some special structures, such as nonce formations and idioms, in the Diagram Treebank, as well as their linguistic rationale. It also explores the implementation of Chinese word class theories such as “For All Words, the Word-class Is Based on the Sentence” and “Referentiality” in Chinese information processing.
  • Language Analysis and Generation
    WANG Qian, LUO Senlin, HAN Lei, PAN Limin
    . 2014, 28(2): 8-16.
    According to modern Chinese semantics, there are 4 semantic types (single, complex, compound and multiple). Because it captures the overall sentential semantic structure, sentential semantic type recognition is an important step in parsing the whole sentential semantic structure. This paper proposes a method for recognizing the 4 semantic types based on predicates and sentential semantic type chunks. The method first identifies some single-semantic-type sentences by the number of predicates in each sentence. For the remaining sentences, the C4.5 algorithm is applied to obtain the maximum number of sentential-semantic-type chunks of predicates in the sentential semantic structure, and the sentential semantic type of each sentence is then identified by combining this with the top sentence node in the syntactic structure. The experimental data contains 10221 sentences chosen from the Beijing Forest Studio-Chinese Tag Corpus. The accuracy of sentential semantic type recognition reaches 97.6% in the open test.
  • Language Analysis and Generation
    CHE Tingting, HONG Yu, ZHOU Xiaopei, YAN Weirong, YAO Jianmin, ZHU Qiaoming
    . 2014, 28(2): 17-27.
    The functional connective is a word feature that directly expresses the internal semantic relations, structural characteristics and contextual development trend of discourse units. Based on functional connectives, this paper puts forward a method for predicting implicit discourse relations. First, the method mines functional connectives at the word and phrase level, and assigns the functional connectives to discourse relation categories. Second, it builds a concept model for each type of functional connective to describe the attributes of the arguments it connects, and establishes a mapping system between argument concepts and discourse relations. Finally, implicit discourse semantic relations are predicted by using a statistical strategy to recognize the concept model of the arguments together with the “concept-relation” mapping system. The experimental results show that the prediction method, which constructs concept models based on functional connectives, achieves significant performance improvements compared with the existing classification method based on supervised learning.
  • Language Analysis and Generation
    ZHANG Muyu, QIN Bing, LIU Ting
    . 2014, 28(2): 28-36.
    Discourse relations are an important part of discourse semantic analysis. This paper analyses the differences between Chinese and English discourse, then presents in detail the first Chinese discourse relation taxonomy, based on research on English discourse relations. To examine the rationality of the hierarchy, we conduct annotation experiments on Chinese internet news texts and analyse all the difficulties encountered during data annotation, together with their resolutions, to lay a foundation for future discourse semantic analysis.
  • Language Analysis and Generation
    WU Zuoyan, WANG Yu
    . 2014, 28(2): 37-43.
    A new measure based on Hierarchical Network of Concepts (HNC) theory is put forward to compute semantic similarity in natural language. Based on the coding rules and the mapping theory embodied in the concept expression form at the vocabulary relation level of HNC, the method integrates the concept's connotation, external features, classification and symbol combination to calculate semantic similarity. The method is compared with currently popular HowNet-based similarity methods against subjective human judgments. Experiments show that the method performs well and can distinguish the differences between words more accurately.