Content of Morphological, Syntactic and Semantic Analysis/Application in our journal
  • Morphological Syntactic and Semantic Analysis/Application
    GUO Zhen, ZHANG Yujie, SU Chen, XU Jinan
    . 2014, 28(6): 1-8.
    Recent work on joint Chinese word segmentation, POS tagging and dependency parsing has two key problems. First, character-based word segmentation and word-based dependency parsing are not well integrated in the transition-based framework. Second, current joint models suffer from a shortage of annotated corpora. To resolve the first problem, we use the internal structure of words to transform the conventional word-based dependency tree into a character-based dependency tree, and then propose a novel character-level joint model for the three tasks. For Chinese word segmentation, we design four transition actions, Shift_S, Shift_B, Shift_M and Shift_E, through which the features used in previous research can also be integrated into the model. To resolve the second problem, we propose a novel semi-supervised joint model that exploits n-gram features and dependency subtree features from a partially annotated corpus. Experimental results on the Chinese Treebank show that our joint model achieves F1-scores of 98.31%, 94.84% and 81.71% for Chinese word segmentation, POS tagging and dependency parsing, respectively, outperforming the pipeline model on the three tasks by 0.92%, 1.77% and 3.95%, respectively. In particular, the F1-scores for word segmentation and POS tagging are the best among published results so far.
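A minimal sketch, for illustration only, of how the four segmentation actions named above can encode a segmentation one character at a time. It assumes the usual single/begin/middle/end reading of Shift_S, Shift_B, Shift_M and Shift_E; the function names are hypothetical, and the POS and dependency actions of the authors' joint model are not shown.

```python
from typing import List


def words_to_actions(words: List[str]) -> List[str]:
    """Map a segmented sentence to one segmentation action per character.

    Hypothetical helper: assumes Shift_S/B/M/E mean single, begin, middle
    and end of a word. POS-tag and dependency actions are not modelled.
    """
    actions: List[str] = []
    for word in words:
        if not word:
            raise ValueError("empty word")
        if len(word) == 1:
            actions.append("Shift_S")  # single-character word
        else:
            actions.append("Shift_B")  # first character of a longer word
            actions.extend(["Shift_M"] * (len(word) - 2))  # inner characters
            actions.append("Shift_E")  # last character
    return actions


def actions_to_words(chars: str, actions: List[str]) -> List[str]:
    """Rebuild the word sequence from characters and their actions."""
    if len(chars) != len(actions):
        raise ValueError("need exactly one action per character")
    words: List[str] = []
    current = ""
    for ch, act in zip(chars, actions):
        current += ch
        if act in ("Shift_S", "Shift_E"):  # a word is complete here
            words.append(current)
            current = ""
    if current:
        raise ValueError("action sequence ends inside a word")
    return words
```

For example, the segmentation 我 / 喜欢 / 北京大学 maps to Shift_S, Shift_B, Shift_E, Shift_B, Shift_M, Shift_M, Shift_E, and `actions_to_words` recovers the three words from the seven characters. Because every character receives exactly one action, segmentation decisions can be interleaved with the other transitions of a character-level parser.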
  • Morphological Syntactic and Semantic Analysis/Application
    ZHANG Haibo, CAI Qiawu, JIANG Wenbin, LV Yajuan, LIU Qun
    . 2014, 28(6): 9-17.
    To address error propagation in the traditional morphological analysis method, which runs voice harmony restoration and morphological segmentation as a pipeline, this paper presents a unified approach that combines voice harmony restoration and morphological segmentation. It uses an integrated label that encodes both voice harmony restoration and morphological segmentation. Experiments show that the proposed method improves precision and alleviates the error propagation of the traditional morphological analysis method.
  • Morphological Syntactic and Semantic Analysis/Application
    LI Guochen, DANG Shuaibing, WANG Ruibo, LI Jihong
    . 2014, 28(6): 18-25.
    Chinese base-chunk identification is an important task for automatic syntactic and semantic analysis. A widely used strategy is to transform it into a word-level sequence labeling problem and to solve it with models such as CRFs. Although this approach has achieved the best results in many open evaluations, its practical application is limited by the accuracy of Chinese word segmentation systems and the sparsity of Chinese word features. Therefore, this paper presents a base-chunk identification model based on deep neural networks, which takes Chinese characters as both the tagging unit and the raw input layer. Moreover, C&W and word2vec distributed representations of Chinese characters are learned with unsupervised models and used as the initial input parameters of the deep neural network to improve training. Experimental results show that the precision, recall and F-measure of our final identification model reach 80.74%, 73.80% and 77.12%, respectively, using a five-layer neural network with a feature window of [-3, 3] and word2vec distributed representations.
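The character-level input with a [-3, 3] feature window can be sketched as follows. This assumes a hypothetical `<PAD>` symbol at sentence boundaries and an embedding table (for example C&W or word2vec character vectors) supplied as a plain dictionary of equal-length vectors; unknown characters fall back to a zero vector. It illustrates the input layer only, not the authors' network.

```python
from typing import Dict, List, Sequence

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence


def char_windows(sentence: str, left: int = 3, right: int = 3) -> List[List[str]]:
    """Return, for each character, the characters at offsets -left..+right."""
    padded = [PAD] * left + list(sentence) + [PAD] * right
    width = left + right + 1
    return [padded[i:i + width] for i in range(len(sentence))]


def window_vector(window: List[str], table: Dict[str, Sequence[float]], dim: int) -> List[float]:
    """Concatenate the embeddings of a window; unknown symbols map to zeros."""
    vec: List[float] = []
    for ch in window:
        vec.extend(table.get(ch, [0.0] * dim))
    return vec
```

With a window of [-3, 3], each character yields 7 symbols, so the first network layer sees a vector of 7 × dim values per character. Working on characters rather than words is what removes the dependence on a word segmenter mentioned in the abstract.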
  • Morphological Syntactic and Semantic Analysis/Application
    ZHAO Min, PENG Weiming, SONG Jihua, YANG Tianxin
    . 2014, 28(6): 26-33.
    An efficient and convenient tagging tool plays a crucial role in Treebank construction. Based on an analysis of existing syntax tagging tools for the Sentence Pattern Structure, this paper designs a graphical, human-computer interactive syntax tagging tool that integrates part-of-speech tagging and semantic tagging. From a practical perspective, the paper illustrates the new working mode and the functions of this tool in the Treebank building project.
  • Morphological Syntactic and Semantic Analysis/Application
    LIU Hongchao, ZHAN Weidong
    . 2014, 28(6): 34-40.
    This paper introduces the development of the Construction Database for Contemporary Chinese, an NLP-oriented language resource. Taking the construction “A+Yi(One)+X, B+Yi(One)+Y” as an example, the authors describe the framework of the ongoing project. The construction can be divided into different subcategories according to its meanings. Six of these form-meaning pairs are discussed in detail with respect to their components and the interpretation of the construction meaning.
  • Morphological Syntactic and Semantic Analysis/Application
    ZHENG Lijuan, SHAO Yanqiu, YANG Erhong
    . 2014, 28(6): 41-47.
    Chinese has flexible word order: the same meaning can be expressed by sentences with different orders. The projectivity assumption of the traditional dependency tree cannot fully account for some Chinese phenomena. This paper demonstrates and analyses the existence of non-projective phenomena in Chinese using a semantic dependency graph database of 10,000 Chinese sentences. To give a linguistic induction and explanation grounded in a deep understanding of semantics, the paper summarizes seven situations in which non-projectivity occurs: sentences with a clause as object, comparative sentences, sentences with a clause as predicate, sentences with two or more events, pronouns, verb-complement phrases used as predicates, and note phrases or sentences. These findings are of substantial significance for automatic semantic dependency tagging.
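A dependency structure is projective when, with the words laid out in sentence order and an artificial ROOT at position 0, no two arcs cross. The sketch below, with hypothetical function names, checks that condition on a plain list of (head, dependent) arcs, so it also applies to graphs in which a word has several heads. It is an illustration of the standard crossing-arc test, not the authors' annotation tooling.

```python
from itertools import combinations
from typing import List, Tuple

Arc = Tuple[int, int]  # (head, dependent); words start at 1, ROOT is 0


def crossing_arcs(arcs: List[Arc]) -> List[Tuple[Arc, Arc]]:
    """Return every pair of arcs whose spans partially overlap (cross)."""
    found: List[Tuple[Arc, Arc]] = []
    for a1, a2 in combinations(arcs, 2):
        l1, r1 = sorted(a1)
        l2, r2 = sorted(a2)
        # Two spans cross when exactly one endpoint of one lies strictly inside the other.
        if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
            found.append((a1, a2))
    return found


def is_projective(arcs: List[Arc]) -> bool:
    """A structure is projective when no pair of arcs crosses."""
    return not crossing_arcs(arcs)
```

For example, the arcs (0, 2), (2, 1), (2, 4), (4, 3) are projective, while (0, 2), (2, 1), (1, 3) are not, because the span 1 to 3 of the last arc crosses the ROOT arc 0 to 2. Returning the crossing pairs, rather than only a yes/no answer, makes it easier to inspect which constructions produce the non-projectivity.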
  • Morphological Syntactic and Semantic Analysis/Application
    SHI Jiao, LI Ru, WANG Zhiqiang
    . 2014, 28(6): 48-55.
    Based on the theory of frame semantics, Chinese core frame semantic analysis extracts the core frame semantic representation of a sentence in order to analyze its semantic content. We solve this problem with a three-stage learning model. Taking the different characteristics of the subtasks into consideration, we use a Maximum Entropy model to identify the core target in its sentential context and to predict the frame for that target, and a Conditional Random Field model to label the frame elements defined in Chinese FrameNet. Experimental results on 10,831 example sentences show that the F-scores of core target identification and frame element identification reach 99.51% and 59.01%, respectively, and frame identification reaches an accuracy of 84.73%.
  • Morphological Syntactic and Semantic Analysis/Application
    WANG Zhen, CHANG Baobao, SUI Zhifang
    . 2014, 28(6): 56-61.
    Semantic role labeling is an important task in Chinese natural language processing. The mainstream method today performs semantic role labeling with feature-based statistical machine learning, which depends heavily on manually designed features. This paper investigates semantic role labeling based on deep neural networks, which can learn features automatically. Experimental results show that our algorithm is promising, although it does not yet match conventional machine learning methods that use manually designed features.
  • Morphological Syntactic and Semantic Analysis/Application
    ZHOU Junsheng, QU Weiguang, XU Juhong, LONG Yi, ZHU Yaobang
    . 2014, 28(6): 62-69.
    Natural Language Interfaces (NLIs) to Geographical Information Systems (GISs) have not received much attention in computational linguistics, despite the potential value of such systems for GIS users. This paper presents a pilot study of implementing Chinese NLIs to GISs based on semantic parsing. First, we design a formal meaning representation language (MRL) for a specific GIS application and develop a corresponding corpus. Second, we translate natural language questions into GIS queries in the MRL using semantic parsing. In particular, we propose a semantic parsing approach based on a latent structural perceptron with a hybrid tree. Evaluation results on the developed corpus show that the proposed methods significantly outperform the baseline approaches and, more importantly, that it is feasible to build such NLIs to GISs using semantic parsing.
  • Morphological Syntactic and Semantic Analysis/Application
    HUANG Peijie, HUANG Qiang, WU Xiupeng, WU Guisheng, GUO Qingwen, CHEN Nanting, CHEN Chuping
    . 2014, 28(6): 70-78.
    To address the problems that the diversity and flexibility of the Chinese language cause in question understanding, this paper adopts the strategy of “getting semantic knowledge based on grammar question type structure” and proposes a question understanding method for Chinese spoken dialogue systems that combines grammar and semantics. First, we build a hand-crafted grammar base that is independent of the domain and application. Second, utterances are reduced to their basic sentence structure through sentence compression. Then, question type pattern recognition determines the unique question type pattern of the utterance, which in turn determines the appropriate semantic organization method, query strategy and response mode. In parallel, we extract the relevant semantic information from the source utterance according to a domain knowledge base. Afterwards, the extracted semantic information is converted into well-organized semantic knowledge according to the corresponding question type pattern, which completes question understanding. The proposed method is implemented in a Chinese dialogue system for mobile phone shopping guidance. Test results demonstrate the efficiency of our approach.
  • Morphological Syntactic and Semantic Analysis/Application
    ZHANG Yangsen, TANG Anjie, ZHANG Zewei
    . 2014, 28(6): 79-84.
    Most errors in political news are semantic errors. Based on an analysis of how political errors are expressed in news text, we summarize the types of political errors found in newspapers and build the corresponding knowledge bases for political error detection. Based on a study of the linguistic features of political news, a formal model for detecting political errors is presented. A strategy combining rules and statistics is used to proofread semantic errors in political news. The results, a recall of 65.5% and an accuracy of 80.5%, show that the method has good application prospects.
  • Morphological, Syntactic and Semantic Analysis/Application
    QIAN Yili, FENG Zhiru
    . 2014, 28(5): 32-38.
    This paper proposes a Chinese prosodic phrase prediction method based on a CRF model over Chinese chunks, which reflect shallow syntactic information. The chunk definition and its tagging algorithm are described first, and then the CRF is applied to the chunk-annotated corpus to predict prosodic phrase boundaries. Experimental results show that, after the chunk structure is labeled, the F-score of the CRF model for prosodic phrase identification increases by nearly 10%.
  • Morphological, Syntactic and Semantic Analysis/Application
    SUN Jing, FANG Yan, DING Bin, ZHOU Guodong
    . 2014, 28(5): 39-45.
    This paper proposes a different kind of lexical analysis, one that analyzes the internal structure of words, and presents a word structure analysis method based on an extended word tag set. First, we describe the characteristics of word-internal structure. By treating the prefixes and suffixes within words as special words, we identify the internal structure of words through the detection of prefixes and suffixes. We convert the identification of word-internal structure into a sequence tagging problem and adopt a CRF model with the extended word tag set to perform word structure analysis. Experiments show that the method achieves higher accuracy both in overall performance and in identifying each layer of structure.
  • Morphological, Syntactic and Semantic Analysis/Application
    LIU Dongming, YANG Erhong
    . 2014, 28(5): 46-50.
    The word, as the smallest semantic unit, has a complex relationship with text domains. In particular, it is often difficult to define the exact domain of commonly used words. In fact, real applications do not always need a clear relationship between a word and a domain; satisfactory results can instead be achieved by quantifying the domain property of words. In this paper, we propose an unsupervised method for quantifying the domain property of words based on word association information in a large-scale corpus. We validate the proposed domain-property values by comparing them against the classical TF*IDF measure in a topic detection application.
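The abstract does not give the formula of the proposed domain-property measure, so only the classical TF*IDF baseline it is compared against is sketched here. The sketch assumes one common variant, tf(w, d) = count(w, d) / |d| and idf(w) = log(N / df(w)), where N is the number of documents and df(w) the number of documents containing w; smoothed idf or log-scaled tf are equally common, and the paper may use a different one.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score every word of every tokenized document with tf * idf."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word at most once per document
    scores: List[Dict[str, float]] = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()})
    return scores
```

A word that occurs in every document gets idf = log(1) = 0, which is one reason TF*IDF handles commonly used words poorly; a graded, corpus-wide domain score is an alternative to that all-or-nothing behavior.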
  • Morphological, Syntactic and Semantic Analysis/Application
    YU Dong, XUN Endong
    . 2014, 28(5): 51-59.
    This paper introduces a knowledge-based unsupervised method for acronym term disambiguation. Word embeddings are used to represent the semantics of acronym terms. In the first stage of disambiguation, highly similar documents are clustered and used as training data. Each cluster corresponds to one interpretation of an acronym term, so it can be regarded as a semantic tag. The word embeddings are then trained several times, and the semantic relation between two words is calculated as the average cosine similarity of their vectors. In the second stage, the paper proposes feature word expansion and linearly weighted semantic similarity to improve system performance. By calculating semantic similarities between documents and interpretations, implicit semantics can be mined as new feature words, and the feature words are linearly weighted by their semantic similarity to a specific interpretation. Experimental results on 25 acronym terms show that feature word expansion improves the system F-score by 4% and semantic weighting adds a further 2%, yielding a final system F-score of 89.40%.
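The averaged similarity described above can be sketched as follows, assuming each training run yields a dictionary from words to vectors; the names `cosine` and `avg_cosine` are hypothetical, and the number of runs is not given in the abstract. Averaging over runs presumably smooths out the variation that random initialization introduces into any single embedding.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def avg_cosine(runs: List[Dict[str, Vector]], w1: str, w2: str) -> float:
    """Average cosine of two words over the runs in which both words appear."""
    sims = [cosine(run[w1], run[w2]) for run in runs if w1 in run and w2 in run]
    return sum(sims) / len(sims) if sims else 0.0
```

For example, if the two words are parallel in one run and orthogonal in another, `avg_cosine` returns 0.5. Skipping runs where a word is missing, rather than counting them as zero, is a design choice of this sketch.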
  • Morphological, Syntactic and Semantic Analysis/Application
    WANG Meng, YU Shiwen
    . 2014, 28(5): 60-65.
    Concept acquisition from corpora has become increasingly important in NLP. This paper presents a new concept representation based on classifier words. Concepts are modeled as vectors with one component for each classifier word. We propose a weighting scheme that assigns each classifier word a weight in a concept. We then conduct clustering experiments to identify concept similarities, and the results show that classifier words can categorize most concept classes.
  • Morphological, Syntactic and Semantic Analysis/Application
    JIA Yuxiang, WANG Haoshi, ZAN Hongying, YU Shiwen, WANG Zhimin
    . 2014, 28(5): 66-73.
    Selectional preference describes the semantic preference of a predicate for its arguments. It is an important type of lexical knowledge that can be applied to the syntactic and semantic analysis of natural languages. This paper studies the automatic acquisition of Chinese selectional preferences and proposes a HowNet-based method and an LDA (Latent Dirichlet Allocation)-based method. A comparative study shows that the former acquires knowledge that is easier to understand, while the latter performs better in application. The two methods are complementary and may be combined in processing.