2014 Volume 28 Issue 6 Published: 10 June 2014
  

  • Morphological Syntactic and Semantic Analysis/Application
    GUO Zhen, ZHANG Yujie, SU Chen, XU Jinan
    2014, 28(6): 1-8.
    Recent work on joint word segmentation, POS tagging, and dependency parsing for Chinese faces two key problems. First, character-based word segmentation and word-based dependency parsing are not well integrated in the transition-based framework. Second, current joint models suffer from insufficient annotated corpora. To resolve the first problem, we transform conventional word-based dependency trees into character-based dependency trees using the internal structure of words, and then propose a novel character-level joint model for the three tasks. For Chinese word segmentation, we design four transition actions, Shift_S, Shift_B, Shift_M and Shift_E, through which the features used in previous research can also be integrated into the model. To resolve the second problem, we propose a novel semi-supervised joint model that exploits n-gram features and dependency subtree features from a partially annotated corpus. Experimental results on the Chinese Treebank show that our joint model achieves F1-scores of 98.31%, 94.84% and 81.71% for Chinese word segmentation, POS tagging, and dependency parsing, respectively, outperforming the pipeline model on the three tasks by 0.92%, 1.77% and 3.95%, respectively. In particular, the F1-scores for word segmentation and POS tagging are the best among the results published so far.
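The four segmentation actions can be read as character-level B/M/E/S tags. The following is a minimal sketch, assuming each action consumes exactly one input character; the function name `decode_actions` is hypothetical, and the POS-tagging and dependency actions of the joint model are left out.

```python
def decode_actions(chars, actions):
    """Rebuild words from characters and their Shift_* actions.

    Assumes one action per character: Shift_S (single-character word),
    Shift_B (word begin), Shift_M (word middle), Shift_E (word end).
    """
    if len(chars) != len(actions):
        raise ValueError("need exactly one action per character")
    words, current = [], ""
    for ch, act in zip(chars, actions):
        if act == "Shift_S":
            if current:
                raise ValueError("Shift_S inside an unfinished word")
            words.append(ch)
        elif act == "Shift_B":
            if current:
                raise ValueError("Shift_B inside an unfinished word")
            current = ch
        elif act == "Shift_M":
            if not current:
                raise ValueError("Shift_M without a preceding Shift_B")
            current += ch
        elif act == "Shift_E":
            if not current:
                raise ValueError("Shift_E without a preceding Shift_B")
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown action {act!r}")
    if current:
        raise ValueError("action sequence ends inside a word")
    return words


print(decode_actions("我爱北京天安门",
                     ["Shift_S", "Shift_S", "Shift_B", "Shift_E",
                      "Shift_B", "Shift_M", "Shift_E"]))
# ['我', '爱', '北京', '天安门']
```

Because each action commits one character to a word position, a segmentation decision is available at every step, which is what lets character-level features be scored inside a single transition sequence.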
  • Morphological Syntactic and Semantic Analysis/Application
    ZHANG Haibo, CAI Qiawu, JIANG Wenbin, LV Yajuan, LIU Qun
    2014, 28(6): 9-17.
    Traditional morphological analysis performs voice harmony restoration and morphological segmentation as a pipeline, which suffers from error propagation. To solve this problem, this paper presents a unified approach that combines voice harmony restoration and morphological segmentation, using an integrated label that encodes both tasks. Experiments show that the proposed method improves precision and alleviates the error propagation of the traditional morphological analysis method.
  • Morphological Syntactic and Semantic Analysis/Application
    LI Guochen, DANG Shuaibing, WANG Ruibo, LI Jihong
    2014, 28(6): 18-25.
    Chinese base-chunk identification is an important task for automatic syntactic and semantic analysis. A widely used strategy is to transform it into a word-level sequence labeling problem and solve it with models such as CRFs. Although this strategy has achieved the best results in many open evaluations, its practical application is limited by the accuracy of Chinese word segmentation systems and the sparsity of Chinese word features. This paper therefore presents a base-chunk identification model based on deep neural networks that take Chinese characters as both the tagging unit and the raw input. Moreover, C&W and word2vec distributed representations of Chinese characters are learned with unsupervised models and used as the initial input parameters of the deep neural network to improve training. Experimental results show that the precision, recall and F-measure of our final identification model reach 80.74%, 73.80% and 77.12%, respectively, with a five-layer neural network, a feature window of size [-3, 3] and word2vec distributed representations.
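A feature window of size [-3, 3] means each character is represented together with the three characters on either side of it. A minimal sketch of building such windows, assuming a hypothetical `<PAD>` symbol for positions outside the sentence; in a C&W-style network the embeddings of the seven window characters would then be concatenated to form the input layer.

```python
PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def char_windows(sentence, left=3, right=3):
    """Return, for each character, its context window [i - left, i + right]."""
    chars = list(sentence)
    padded = [PAD] * left + chars + [PAD] * right
    # Character i of the sentence sits at index i + left in `padded`,
    # so its window is the slice padded[i : i + left + right + 1].
    return [padded[i : i + left + right + 1] for i in range(len(chars))]


windows = char_windows("中文分块")
for w in windows:
    print(w)
```

Each window has a fixed length of seven, so the network's input layer has a fixed size regardless of sentence length.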
  • Morphological Syntactic and Semantic Analysis/Application
    ZHAO Min, PENG Weiming, SONG Jihua, YANG Tianxin
    2014, 28(6): 26-33.
    An efficient and convenient tagging tool plays a crucial role in Treebank construction. Based on an analysis of existing syntactic tagging tools for the Sentence Pattern Structure, this paper designs a graphical, human-computer interactive syntactic tagging tool that integrates part-of-speech tagging and semantic tagging. From a practical perspective, the paper illustrates the tool's new working mode and its functions in a Treebank building project.
  • Morphological Syntactic and Semantic Analysis/Application
    LIU Hongchao, ZHAN Weidong
    2014, 28(6): 34-40.
    This paper introduces the development of the Construction Database for Contemporary Chinese, an NLP-oriented language resource. Taking the construction “A+Yi(One)+X, B+Yi(One)+Y” as an example, the authors describe the framework of the ongoing project. The construction can be divided into subcategories according to meaning, and six of its form-meaning pairs are discussed in detail with respect to their components and the interpretation of the construction meaning.
  • Morphological Syntactic and Semantic Analysis/Application
    ZHENG Lijuan, SHAO Yanqiu, YANG Erhong
    2014, 28(6): 41-47.
    Chinese has a flexible word order, so the same meaning can be expressed by different sentences. Traditional dependency trees, which assume projectivity, cannot adequately handle some Chinese phenomena. Using a semantic dependency graph database of 10,000 Chinese sentences, this paper demonstrates and analyzes the existence of non-projective phenomena in Chinese. To give a linguistic induction and explanation grounded in a deep understanding of semantics, the paper summarizes seven situations in which non-projectivity occurs: sentences with a clause as object, comparative sentences, sentences with a clause as predicate, sentences with two or more events, pronouns, verb-complement phrases used as predicates, and note phrases or sentences. This is of substantial significance for automatic semantic dependency tagging.
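Projectivity is easiest to state on a tree: an arc from head h to dependent d is projective when every word lying between h and d is a descendant of h. The sketch below checks this condition for a single-headed dependency tree; the semantic dependency graphs studied in the paper can give a word more than one head, which this check does not cover, and the head-array encoding is an assumption of the sketch.

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[k - 1] is the head of token k (tokens are numbered 1..n);
    0 denotes the artificial root. Assumes the input is a well-formed tree.
    """
    n = len(heads)

    def dominated_by(h, t):
        # Walk up the head chain from token t until reaching h or the root.
        while t != 0:
            if t == h:
                return True
            t = heads[t - 1]
        return h == 0  # the root dominates every token

    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = min(head, dep), max(head, dep)
        # Every token strictly between head and dependent must descend from head.
        for between in range(lo + 1, hi):
            if not dominated_by(head, between):
                return False
    return True


print(is_projective([2, 0, 2]))     # 1 <- 2 -> 3: projective
print(is_projective([3, 4, 0, 1]))  # arcs 3->1 and 4->2 cross
```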
  • Morphological Syntactic and Semantic Analysis/Application
    SHI Jiao, LI Ru, WANG Zhiqiang
    2014, 28(6): 48-55.
    Based on the theory of frame semantics, Chinese core frame semantic analysis extracts the core frame semantic representation of a sentence in order to analyze its semantic content. We solve this problem with a three-stage learning model. Taking the different characteristics of the subtasks into account, we use a Maximum Entropy model to identify the core target in its sentential context and to predict the frame for the core target, and a Conditional Random Field model to label the frame elements defined in Chinese FrameNet. Experimental results on 10,831 example sentences show that the F-scores of core target identification and frame element identification reach 99.51% and 59.01%, respectively, and frame identification reaches an accuracy of 84.73%.
  • Morphological Syntactic and Semantic Analysis/Application
    WANG Zhen, CHANG Baobao, SUI Zhifang
    2014, 28(6): 56-61.
    Semantic role labeling is an important task in Chinese natural language processing. The current mainstream approach uses feature-based statistical machine learning, which depends heavily on manually designed features. This paper investigates semantic role labeling based on deep neural networks, which can learn features automatically. Experimental results show that our algorithm is promising, although it does not yet match the performance of conventional machine learning methods with manually designed features.
  • Morphological Syntactic and Semantic Analysis/Application
    ZHOU Junsheng, QU Weiguang, XU Juhong, LONG Yi, ZHU Yaobang
    2014, 28(6): 62-69.
    Natural Language Interfaces (NLIs) to Geographical Information Systems (GISs) have not received much attention in computational linguistics, despite the potential value of such systems for GIS users. This paper presents a pilot study of implementing Chinese NLIs to GISs based on semantic parsing. First, we design a formal meaning representation language (MRL) for a specific GIS application and build a corresponding corpus. Second, we translate natural language questions into GIS queries in the MRL using semantic parsing. In particular, we propose a semantic parsing approach based on a latent structural perceptron with a hybrid tree. Evaluation results on the developed corpus show that the proposed methods significantly outperform the baseline approaches and, more importantly, demonstrate that it is feasible to build such NLIs to GISs using semantic parsing.
  • Morphological Syntactic and Semantic Analysis/Application
    HUANG Peijie, HUANG Qiang, WU Xiupeng, WU Guisheng, GUO Qingwen, CHEN Nanting, CHEN Chuping
    2014, 28(6): 70-78.
    To address the problems that the diversity and flexibility of Chinese cause for question understanding, this paper adopts the strategy of “getting semantic knowledge based on grammar question type structure” and proposes a question understanding method for Chinese spoken dialogue systems that combines grammar and semantics. First, we build a hand-crafted grammar base that is independent of domain and application. Second, utterances are simplified to their basic sentence structure through sentence compression. Then question type pattern recognition is applied to determine the unique question type pattern of the utterance, which corresponds to the appropriate semantic organization method, query strategy and response mode. In addition, we extract the relevant semantic information from the source utterance according to a domain knowledge base. Afterwards, the extracted semantic information is converted into well-organized semantic knowledge according to the corresponding question type pattern to complete question understanding. The proposed method is implemented in a Chinese dialogue system for mobile phone shopping guidance. Test results demonstrate the efficiency of our approach.
  • Morphological Syntactic and Semantic Analysis/Application
    ZHANG Yangsen, TANG Anjie, ZHANG Zewei
    2014, 28(6): 79-84.
    Most errors in political news are semantic errors. Based on an analysis of how political errors are expressed in news texts, we summarize the types of political errors found in newspapers and build the corresponding knowledge bases for political error detection. Drawing on a study of the linguistic features of political news, we present a formal model for detecting political errors, and we proofread semantic errors in political news with a strategy that combines rules and statistics. The method achieves a recall of 65.5% and an accuracy of 80.5%, indicating good application prospects.
  • Language Resources Construction
    WANG Mengxiang, WANG Houfeng, LIU Yang, RAO Qi
    2014, 28(6): 85-94.
    This paper uses the verbal semantic hierarchy as the core element to construct a four-level verb classification system. Drawing on generative lexicon theory, semantic frame theory and construction grammar, we describe the attributes and combinatory features of verbs in terms of event structure, semantic frame, qualia structure and syntactic format. The aim of this research is to construct a Chinese Verb Library (CVL) that can be used to adequately explain and describe in depth “V+V” constructions, “V+N” constructions and other creative collocations. Experimental results show that our work can support syntactic analysis and semantic role labeling, and in particular can reveal implicit predicate relationships.
  • Language Resources Construction
    XIAO Guozheng, GAO Jinglian, SHUANG Wenting, JI Donghong, GUO Tingting, WU Hongmiao
    2014, 28(6): 95-100.
    By reconstructing the intension and extension of the linguistic term “lexeme”, this paper proposes and discusses the renewable construction of language resources on the basis of natural and artificial parallel language resources. Through a renewable “resource-construct-resource” mode, the research aims to facilitate the rapid development of multi-type, high-coverage language resources and to advance the theory of language application.
  • Discourse Analysis
    DING Bin, KONG Fang, LI Sheng, ZHOU Guodong
    2014, 28(6): 101-106.
    Discourse relations can be expressed explicitly or implicitly. This paper focuses on explicit discourse relations, which are signaled by discourse connectives. We propose an explicit discourse relation parsing platform consisting of connective identification and sense classification. Using 500 texts from the Chinese Discourse TreeBank corpus (CTB), we annotate an explicit discourse relation corpus. Based on this corpus, we build a maximum entropy connective identifier that takes the headwords of connectives into account, achieving an F1 of 66.79%. We also propose a sense classifier based on the connective itself and its context, which achieves an F1 of 91.92%.
  • Discourse Analysis
    SHANG Ying, SONG Rou, LU Dawei
    2014, 28(6): 107-113.
    The Topic Sufficient Sentence (TSS) is defined on the basis of Generalized Topic Structure (GTS) theory, and sentence-formability is one of its important features. Using a large-scale corpus covering different language styles, this paper studies the sentence-formability of TSSs. We find that a small number of TSSs cannot form sentences. By analyzing and classifying these cases, we present methods to ensure the sentence-formability of TSSs. This not only helps refine GTS theory but also improves the performance of application systems that use TSSs.
  • Discourse Analysis
    REN Han, WAN Jing, WU Hongmiao, FENG Wenhe
    2014, 28(6): 114-119.
    This paper introduces a co-training approach to recognizing textual entailment. To address the lack of entailment data, the approach employs a small labeled entailment dataset together with a large unlabeled one for co-training. Two different views, a rewriting view and an assessing view, are proposed to measure structural and non-structural entailment relations, and two classifiers, a semantic tree kernel based classifier and a statistical feature based classifier, are trained under the two views, respectively. For prediction, a global classifier is built and trained on the results of co-training. Experiments show that the co-training based approach performs well with a small training dataset.
  • Discourse Analysis
    GU Jinghang, ZHU Suyang, QIAN Longhua, ZHU Qiaoming
    2014, 28(6): 120-128.
    Personal family networks are an important component of social networks, and extracting personal family relationships automatically is of great importance. This paper proposes a novel method that extends family relation extraction via within-document coreference resolution, improving the recall of the constructed family networks. In addition, a new evaluation metric is devised to evaluate the performance of personal family networks more reasonably. Experimental results on the large-scale Gigaword corpus show that our method extracts accurate family relations while increasing the recall of family networks, thereby laying a foundation for social network analysis.
  • Sentiment Analysis and Social Computation
    YANG Liang, ZHANG Shaowu, LIN Hongfei, SONG Yanxue
    2014, 28(6): 129-136.
    Word emotion disambiguation is vital to sentiment analysis. After discussing the differences between word emotion disambiguation and word sense disambiguation, we select multi-emotional words both automatically and manually. From the perspective of sentiment analysis, we propose a word emotion disambiguation method based on graph ranking, which builds directed meaning graphs according to semantic relations and iteratively selects the highest-weighted sense of the given word as the output. Results on a MicroBlog corpus and an emotional corpus show that our method outperforms both the method based on part of speech and emotional frequencies and the method based on a Bayesian model.
  • Sentiment Analysis and Social Computation
    SONG Hongwei, HE Yu, FU Guohong
    2014, 28(6): 137-142.
    Subjectivity recognition plays an important role in many opinion mining systems, such as sentiment classifiers and opinion summarization systems. In this paper, we present a sentiment density based fuzzy set classifier for Chinese subjectivity classification. We first employ the odds ratio technique to extract subjective cues from training data. Then, we calculate sentiment density from the extracted subjective cues to represent sentence subjectivity. Finally, we implement a triangular fuzzy set classifier that uses sentiment density as its feature for subjectivity classification. Two experiments on the NTCIR-6 Chinese opinion data show the feasibility of the proposed method.
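The three steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes add-alpha smoothed sentence frequencies for the odds ratio, defines sentiment density as the share of tokens that are subjective cues, and uses two hand-set triangular fuzzy sets (`HYPOTHETICAL_SETS`) where a real system would estimate the parameters from training data.

```python
from collections import Counter


def odds_ratios(subj_sents, obj_sents, alpha=0.5):
    """Odds ratio of each word as a subjectivity cue.

    P(w | class) is estimated as the smoothed fraction of that class's
    sentences containing w (add-alpha smoothing is an assumption here).
    """
    def sent_freq(sents):
        return Counter(w for s in sents for w in set(s))

    df_s, df_o = sent_freq(subj_sents), sent_freq(obj_sents)
    n_s, n_o = len(subj_sents), len(obj_sents)
    scores = {}
    for w in set(df_s) | set(df_o):
        p = (df_s[w] + alpha) / (n_s + 2 * alpha)  # P(w | subjective)
        q = (df_o[w] + alpha) / (n_o + 2 * alpha)  # P(w | objective)
        scores[w] = (p * (1 - q)) / ((1 - p) * q)
    return scores


def sentiment_density(sentence, cues):
    """Hypothetical definition: share of tokens that are subjective cues."""
    return sum(w in cues for w in sentence) / len(sentence) if sentence else 0.0


def triangular(x, a, b, c):
    """Triangular fuzzy membership with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets over sentiment density: (a, b, c) for each class.
HYPOTHETICAL_SETS = {"objective": (-0.4, 0.0, 0.4), "subjective": (0.1, 0.6, 1.1)}


def classify(density, fuzzy_sets):
    """Pick the class whose fuzzy set gives the highest membership."""
    return max(fuzzy_sets, key=lambda label: triangular(density, *fuzzy_sets[label]))


subj = [["我", "很", "喜欢"], ["太", "棒", "了"]]
obj = [["今天", "是", "周一"], ["我", "去", "北京"]]
scores = odds_ratios(subj, obj)
cues = {w for w, r in scores.items() if r > 1.0}  # keep words favouring "subjective"
density = sentiment_density(["我", "喜欢", "北京"], cues)
print(classify(density, HYPOTHETICAL_SETS))  # subjective
```

A word that occurs equally often in both classes (here "我") gets an odds ratio of exactly 1 and is not kept as a cue.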
  • Sentiment Analysis and Social Computation
    LI Yancui, LIN Liyuan, ZHOU Guodong
    2014, 28(6): 143-149.
    This paper investigates the application of supervised learning methods to multi-document opinion summarization. Using a corpus collected from Amazon, we extract text features, a PageRank feature, opinion features and review quality features, and then generate multi-document opinion summaries with a supervised learning method. Experimental results show that the supervised learning method achieves significantly higher ROUGE scores than an unsupervised learning method, and that the opinion features and review quality features are helpful for summarization.
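ROUGE-N recall counts how many reference n-grams also occur in the candidate summary, clipping repeated n-grams. A minimal sketch of that computation, assuming pre-tokenized input; the official ROUGE toolkit adds options such as stemming and stopword removal that are omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Clipped n-gram overlap divided by the total reference n-gram count."""
    cand = ngrams(candidate, n)
    overlap = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # An n-gram is credited at most as often as it appears in the candidate.
        overlap += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0


summary = "the battery life is great".split()
refs = [["battery", "life", "is", "great"], ["great", "screen"]]
print(rouge_n_recall(summary, refs, n=1))
print(rouge_n_recall(summary, refs, n=2))
```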
  • Sentiment Analysis and Social Computation
    WANG Jingjing, LI Shoushan, HUANG Lei
    2014, 28(6): 150-155.
    This paper investigates classifying Chinese Microblog users as male or female from the information they provide. Although some researchers have worked on gender classification, research on Chinese gender classification is still lacking. First, we propose a classification method that uses either user names or messages (sent by the users) to recognize male and female users, adopting different types of features (e.g., character and word features). Second, Bayes rule is employed to combine the two classifiers trained on user names and on messages, so that predictions draw on knowledge from both. Experimental results demonstrate that the proposed approach performs well on gender classification and that the combined method outperforms the individual classifiers trained only on user names or only on messages.
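If user names and messages are assumed conditionally independent given gender, Bayes rule gives P(c | name, msg) ∝ P(c | name) · P(c | msg) / P(c). A minimal sketch of that combination under this independence assumption; the function name and the example probabilities are hypothetical.

```python
def combine_posteriors(p_name, p_msg, prior):
    """Combine two classifiers' class posteriors with Bayes rule.

    Assumes the user-name and message evidence are conditionally independent
    given the class, so P(c | name, msg) is proportional to
    P(c | name) * P(c | msg) / P(c).
    """
    scores = {c: p_name[c] * p_msg[c] / prior[c] for c in prior}
    z = sum(scores.values())  # normalize so the posteriors sum to 1
    return {c: s / z for c, s in scores.items()}


posterior = combine_posteriors(
    p_name={"male": 0.6, "female": 0.4},  # hypothetical name-classifier output
    p_msg={"male": 0.3, "female": 0.7},   # hypothetical message-classifier output
    prior={"male": 0.5, "female": 0.5},
)
print(max(posterior, key=posterior.get))  # female
```

Dividing by the prior once keeps the class prior from being counted twice, since each classifier's posterior already includes it.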
  • Sentiment Analysis and Social Computation
    SUN Chengjie, LIN Lei, LIU Bingquan
    2014, 28(6): 156-161.
    Dialogue act classification for online forum posts can indicate the role of a post in a thread, which helps reconstruct the conversational relations in a thread and improve forum retrieval. This paper proposes a weakly supervised learning method for dialogue act classification of online forum posts, which treats the task as a sequence labeling problem over threads. The proposed approach learns the dialogue act classification model from feature constraints and unlabeled data. It achieves accuracies of 75.6% and 60.7% on the CNET and edX data sets, respectively, outperforming a supervised CRF model.
  • Sentiment Analysis and Social Computation
    FAN Xi, XU Hongbo, LIANG Ying
    2014, 28(6): 162-168.
    Real-name registration faces great difficulties in social networks, and this is a worldwide issue. Some users use multiple IDs (usually called “sock puppets”) to publish disharmonious views for illicit purposes such as starting or spreading rumors, so it is important to find a way to identify these users. In this paper, we extract features from text data and social relation data, and train a novel vector space model over combinations of different IDs to detect sock-puppet relations. In experiments on forum data, we achieve a classification precision of 93%, which verifies the effectiveness of the proposed method.
  • Information Extraction and Text Mining
    WU Qiong, HUANG Degen
    2014, 28(6): 169-174.
    This paper proposes a generic algorithm for time expression recognition that combines rules with statistics. After analyzing a set of linguistic features of time expressions, such as lexical features and context information, we apply Conditional Random Fields (CRFs) to recognize time units rather than whole time expressions, so as to avoid the boundary localization problems of Chinese time expressions. In addition, candidate trigger words are automatically obtained from the test corpus, and the trigger thesaurus is refined with a designed score function. Finally, rules for time expression boundary localization are formulated based on a time trigger thesaurus and a time affix word thesaurus. Experimental results show that the F1 value reaches 98.31% in an open test.
  • Information Extraction and Text Mining
    LIU Huihui, WANG Suge, ZHAO Celi
    2014, 28(6): 175-182.
    Identifying default comment objects and comment attributes is important for opinion mining on review texts with multiple objects and multiple attributes. This paper proposes a new method for this issue. First, a rule set for default item identification is constructed to obtain a candidate set of default items. We then treat default item identification as a binary classification problem, select lexical and dependency parsing features, and employ the C4.5 decision tree algorithm to train a classification model that judges the candidate default items in the test data. Experimental results show that the F-value with the dependency syntactic feature set is about 2% higher than with the lexical feature set. Compared with either single feature set, integrating the lexical and dependency parsing feature sets increases accuracy and F-value by up to 10% and 5%, respectively.
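C4.5 chooses splits by gain ratio: information gain divided by the split information of the feature. A minimal sketch for one discrete feature, with hypothetical toy values; real C4.5 also handles continuous attributes, missing values and pruning, which are omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for one discrete feature.

    values[i] is the feature value of example i and labels[i] its class.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes features with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]  # 1 = candidate is a true default item (toy data)
print(gain_ratio(["SBV", "SBV", "VOB", "VOB"], labels))  # hypothetical dependency-relation feature
print(gain_ratio(["a", "b", "c", "d"], labels))          # identifier-like feature
```

Both features separate the classes perfectly, but the identifier-like feature scores 0.5 instead of 1.0 because of its larger split information, which is the correction C4.5 adds over plain information gain.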
  • Information Extraction and Text Mining
    GUO Xiyue, HE Tingting, HU Xiaohua, CHEN Qianjun
    2014, 28(6): 183-189.
    Identifying the relation features between named entities is the key to named entity relation extraction. Traditional methods usually choose lexical and other surface features, which have already been well studied. This paper proposes a novel Chinese named entity relation extraction method that adds syntactic and semantic features such as dependency parsing, core predicate verbs and semantic role labeling. Experiments with SVM on a real news text corpus indicate that this method significantly improves the F1 value.
  • Information Extraction and Text Mining
    LI Shengdong, LV Xueqiang, SHI Shuicai, SUN Jun
    2014, 28(6): 190-193.
    Based on the definition and characteristics of topic detection, this paper analyzes the advantages and disadvantages of the traditional incremental clustering algorithm and the K-means algorithm, and proposes an adaptive incremental K-means algorithm for topic detection. Experimental results show that the new algorithm improves topic detection performance.
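The abstract does not describe the adaptive mechanism, so the sketch below shows only the generic single-pass incremental clustering scheme that such algorithms start from: each incoming document joins its most similar cluster if the cosine similarity reaches a threshold, otherwise it opens a new cluster, and centroids are updated as running means. The fixed `threshold` value and the sparse term-weight dictionaries are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def single_pass_cluster(docs, threshold=0.3):
    """Cluster a document stream in one pass.

    Each document joins the most similar centroid if the similarity is at
    least `threshold`; otherwise it starts a new cluster. Centroids are kept
    as running means of their members' vectors.
    """
    centroids, sizes, assignment = [], [], []
    for doc in docs:
        best, best_sim = -1, -1.0
        for k, c in enumerate(centroids):
            sim = cosine(doc, c)
            if sim > best_sim:
                best, best_sim = k, sim
        if best_sim >= threshold:
            sizes[best] += 1
            c, m = centroids[best], sizes[best]
            # Running-mean update: c <- c + (doc - c) / m
            for t in set(c) | set(doc):
                c[t] = c.get(t, 0.0) + (doc.get(t, 0.0) - c.get(t, 0.0)) / m
            assignment.append(best)
        else:
            centroids.append(dict(doc))
            sizes.append(1)
            assignment.append(len(centroids) - 1)
    return assignment, centroids


stream = [
    {"earthquake": 1, "sichuan": 1},
    {"earthquake": 1, "rescue": 1},
    {"election": 1, "vote": 1},
]
assignment, centroids = single_pass_cluster(stream)
print(assignment)  # [0, 0, 1]
```

The result depends on document order and on the threshold, which are the weaknesses an adaptive variant would aim to reduce.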
  • Information Extraction and Text Mining
    HE Xiangqing, LIU Ying
    2014, 28(6): 194-200.
    We selected literary prose written by Ziqing Zhu, Zengqi Wang and Liangcheng Liu as our corpora and used text clustering to mine new stylistic features from the perspective of rhythm and tempo. The experimental results show that n-grams based on the vowel of the last character of each sentence, n-grams based on clause length, punctuation, and sentence length can all successfully distinguish the articles of the three authors. Specifically, Liangcheng Liu preferred vowels with a higher tongue position. Ziqing Zhu focused on a few specific rhymes, whereas the rhymes used by Liu and Wang are more varied than Zhu's. Wang's clauses are the shortest, and he paid more attention to the order of sentence patterns. Liu alternated long and short sentences, and his tempo is changeable, whereas Zhu's sentence lengths vary less.
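As an illustration of the clause-length n-grams, here is a minimal sketch, assuming clauses are delimited by common Chinese and ASCII punctuation (`CLAUSE_DELIMS` is a hypothetical choice) and measured in characters; the abstract does not say how clauses were segmented or whether lengths were bucketed.

```python
import re
from collections import Counter

# Hypothetical clause delimiters: common Chinese and ASCII punctuation.
CLAUSE_DELIMS = r"[，。！？；：,.!?;:]"


def clause_lengths(text):
    """Lengths, in characters, of the non-empty clauses of `text`."""
    return [len(c.strip()) for c in re.split(CLAUSE_DELIMS, text) if c.strip()]


def length_ngrams(text, n=2):
    """Counts of n-grams over the sequence of clause lengths."""
    lengths = clause_lengths(text)
    return Counter(tuple(lengths[i : i + n]) for i in range(len(lengths) - n + 1))


sample = "春天来了，花开了。我们去公园散步吧！"
print(clause_lengths(sample))  # [4, 3, 8]
print(length_ngrams(sample))
```

Normalized, such counts give each article a vector over length patterns that a clustering algorithm can compare across authors.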
  • Information Extraction and Text Mining
    XIONG Jiao, WANG Mingwen, LI Maoxi, WAN Jianyi
    2014, 28(6): 201-207.
    Graph models have been widely applied to document summarization, with sentences as graph nodes and the similarities between sentences as edge weights. However, such models neglect knowledge about terms and documents. In this paper, we propose a tri-layer graph model based on terms, sentences and documents that makes full use of this knowledge when computing sentence similarity. Experimental results on the DUC2003 and DUC2004 data sets show that the proposed model outperforms the state-of-the-art LexRank model and the Document Sensitive Ranking model.
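For comparison with the tri-layer model, the single-layer sentence graph used by LexRank-style baselines can be sketched as a damped power iteration over a sentence similarity matrix. This sketch uses plain term-frequency cosine similarity (LexRank proper uses an idf-modified cosine) and spreads the score of isolated sentences uniformly; it does not model the term and document layers proposed in the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Term-frequency cosine similarity of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank_scores(sentences, damping=0.85, iters=200, tol=1e-10):
    """Damped power iteration over a weighted sentence similarity graph."""
    n = len(sentences)
    sim = [[cosine(sentences[i], sentences[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Score held by sentences with no similar neighbour is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if row_sums[i] == 0)
        new = []
        for j in range(n):
            incoming = sum(scores[i] * sim[i][j] / row_sums[i]
                           for i in range(n) if row_sums[i] > 0)
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


sents = [["graph", "model", "summarization"],
         ["graph", "model", "ranking"],
         ["weather", "today"]]
print(lexrank_scores(sents))  # the two related sentences outrank the isolated one
```

The highest-scoring sentences would then be selected for the summary, usually with a redundancy check.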
  • Information Extraction and Text Mining
    REN Bin, CHE Wanxiang, LIU Ting
    2014, 28(6): 208-215.
    For social media text mining, traditional lexicon-based methods suffer from low accuracy and difficult lexicon acquisition. This paper proposes a dependency parsing-based text mining method that acquires information from social media text using matching rules. The method works without lexicons, and experimental results show a substantial increase in accuracy over the lexicon-based method. Using the dependency parsing-based method, we conducted an analysis of eating habits on Weibo text and obtained results by gender, region and time, including cross-analysis results, which are presented as word clouds.
  • Information Extraction and Text Mining
    LIU Lijia, GUO Jianyi, ZHOU Lanjiang, YU Zhengtao, SHAO Fa, ZHANG Jinpeng
    2014, 28(6): 216-222.
    To address complex relation patterns and low relation extraction performance in unstructured free text, this paper proposes an approach that extracts entity attribute relations from unstructured free text using a BP neural network optimized with the LM algorithm. The procedure consists of corpus preprocessing, named entity recognition (including instances, attributes and attribute values) with a CRF model, construction of a BP neural network over domain features, and application of the LM algorithm to extract the corresponding relations. Compared with SVM, the optimized artificial neural network is more suitable for multi-class classification problems and achieves higher recognition accuracy. Several groups of tests show that the method achieves good results in entity attribute relation extraction, with an improvement of 12.8% in F-score.