2015 Volume 29 Issue 6 Published: 15 December 2015
  

  • Review
    YU Jiangde,HU Shunyi,YU Zhengtao
    2015, 29(6): 1-7.
    To integrate multiple sources of information without the error accumulation of the pipeline approach, a unified character-based tagging approach is proposed for Chinese lexical analysis, covering word segmentation, part-of-speech tagging and named entity recognition. Treating Chinese lexical analysis as a character sequence tagging problem, the approach integrates three kinds of information into each character tag: word position, part of speech and named entity. After the tagging process, a maximum entropy model is applied to complete the three subtasks. The closed evaluation is performed on the PKU corpus from Bakeoff 2007, and the results show an F-score of 96.4% on word segmentation, 95.3% on POS tagging and 90.3% on named entity recognition.
    Key words Chinese lexical analysis; maximum entropy model; trinity; character-based tagging
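The combined character tag described in the abstract can be illustrated with a small sketch. This is not the authors' implementation or tag set; the tag format (word position joined with POS and NE labels) is an assumption for illustration only.

```python
# Illustrative sketch (not the paper's exact scheme): encode each character
# with a combined word-position/POS/NE tag, then decode the words back.

def encode(words):
    """words: list of (word, pos, ne) triples -> characters and combined tags."""
    chars, tags = [], []
    for word, pos, ne in words:
        for i, ch in enumerate(word):
            if len(word) == 1:
                position = "S"          # single-character word
            elif i == 0:
                position = "B"          # word-initial character
            elif i == len(word) - 1:
                position = "E"          # word-final character
            else:
                position = "M"          # word-internal character
            chars.append(ch)
            tags.append(f"{position}-{pos}-{ne}")
    return chars, tags

def decode(chars, tags):
    """Recover (word, pos, ne) triples from the combined character tags."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        position, pos, ne = tag.split("-")
        buf += ch
        if position in ("S", "E"):      # word boundary reached
            words.append((buf, pos, ne))
            buf = ""
    return words

triples = [("北京", "ns", "LOC"), ("是", "v", "O"), ("首都", "n", "O")]
chars, tags = encode(triples)
assert decode(chars, tags) == triples
```

A sequence model such as maximum entropy then only has to predict one combined tag per character, which is how the three subtasks are unified.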
       
       
       
  • Review
    SANG Leyuan, HUANG Degen
    2015, 29(6): 8-12.
    This paper proposes a new approach that integrates simple noun phrase information into preposition phrase recognition. We recognize simple noun phrases with a basic CRF model, and filter the phrases with conversion rules to adapt them to the phrase patterns found inside preposition phrases. Then we use the simple noun phrases to merge fragmented segmentation units into complete phrases in our corpus. Finally, we recognize preposition phrases with multilayer CRFs, and use rules to correct the result. The optimized model yields 93.02% precision, 92.95% recall, and 92.99% F-measure, 1.03 points higher than the current best model.
    Key words simple noun phrase recognition; CRF; participle fusion
       
       
       
  • Review
    FENG Wenhe
    2015, 29(6): 13-22.
    Complex sentence relationship analysis is usually based on classification; lacking a unified logic, it faces many divergences. This paper proposes a feature structure to describe the complex sentence relationship: a tuple of [Feature, Value] pairs. It presents a preliminary set of feature structures for Chinese complex sentences and demonstrates them in specific applications. Compared with the classification mechanism, feature structure analysis is reflective, and its determination is accurate and easy, which makes it promising for resource construction and computational research on the deep semantic analysis of complex sentences.
    Key words complex sentence relationship; feature structure; semantic analysis
       
       
       
  • Review
    TANG Gongbo,YU Dong,XUN Endong
    2015, 29(6): 23-29.
    Word sense disambiguation (WSD) is a classical issue in natural language processing. In this paper, we train a language model with the sememe information in HowNet, which can represent word semantics, so as to learn the semantic features of words automatically and improve the efficiency of feature learning. We then represent words by vectors of sememes, using the contexts of polysemous words as features, and disambiguate each polysemous word by computing the cosine similarity between its sense vectors and the feature vector. We choose SENSEVAL-3 as the test set, and achieve 37.7% precision, better than other unsupervised methods on the same test data.
    Key words word embedding; HowNet; WSD; unsupervised methods
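The disambiguation step (cosine similarity between sense vectors and an averaged context vector) can be sketched as follows. The sense labels and all vector values here are invented toy data, not HowNet sememe vectors.

```python
# Toy sketch of the disambiguation step: each sense of a polysemous word is a
# vector in sememe space, the context is averaged into a feature vector, and
# the sense with the highest cosine similarity wins. All vectors are invented.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical sememe-space vectors for two senses of one polyseme.
senses = {
    "bank/finance": [0.9, 0.1, 0.0],
    "bank/river":   [0.0, 0.2, 0.9],
}

def disambiguate(context_vectors):
    # Average the context word vectors into a single feature vector.
    n = len(context_vectors)
    feature = [sum(vec[i] for vec in context_vectors) / n for i in range(3)]
    return max(senses, key=lambda s: cosine(senses[s], feature))

context = [[0.8, 0.2, 0.1], [0.7, 0.0, 0.2]]   # "money"-like context words
assert disambiguate(context) == "bank/finance"
```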
       
       
       
  • Review
    ZHENG Lijuan, SHAO Yanqiu
    2015, 29(6): 30-37.
    Semantic analysis of sentences is essential to language study, and it is also the major bottleneck restricting the large-scale application of language information technology at present. Based on the study of deep semantic analysis methods, we propose a new semantic analysis method, the semantic dependency graph, and construct a corpus of 30,000 sentences. Furthermore, we study the semantic sentence patterns of pure pivotal sentences in the corpus, and construct a system of semantic sentence patterns based on the semantic dependency graph. We also summarize the corresponding relations between sentence patterns and semantic sentence patterns to provide an automatic semantic parsing system with a corresponding knowledge base.
    Key words semantic sentence patterns; semantic analysis; semantic dependency graph; pivotal sentence
       
       
       
  • Review
    ZHANG Nianxin, SONG Zuoyan
    2015, 29(6): 38-45.
    This paper analyzes the qualia modification relationships of disyllabic adjective-noun compounds in Mandarin both quantitatively and qualitatively. It reveals that an adjective morpheme selectively constrains the qualia roles of the noun morpheme. Generally, when the adjective morpheme modifies the formal role or the constitutive role of the noun morpheme, a noun needs to be added in the process of meaning construction; when it modifies the agentive role, the telic role or the conventionalized attribute, a verb needs to be added. Furthermore, both qualia structure and conceptual blending contribute to the meaning construction of adjective-noun compounds. If an adjective morpheme activates more than one qualia role or qualia value, polysemy or ambiguity arises.
    Key words adjective-noun compound; meaning construction; generative lexicon theory; qualia structure; conceptual blending theory
       
       
       
  • Review
    ZHAO Yiyi,LIU Haitao
    2015, 29(6): 46-53.
    Network technology provides a new perspective for linguistics in the age of big data. Network methods are applied to language networks to explore their structural laws and the evolution of their functions. This article reviews the development of complex networks from graph theory and the primary mathematical models of social and language networks, aiming to separate the individual traits of language networks from the general characteristics of complex networks, and to provide references for multi-level studies of language networks.
    Key words language networks; network technology; network evolution; complex network characteristics; graph theory
       
       
       
  • Review
    TAN Xiaoping, YANG Lijiao, SU Jingjie
    2015, 29(6): 54-61.
    Grammar is a key and difficult issue in TCSL (teaching Chinese as a second language). However, knowledge bases and corpora for grammar teaching in TCSL are few, and cannot meet the demands of the development of TCSL. This paper proposes a grammar description framework for TCSL based on the three-plane theory and teaching grammar theory, and completes a grammar knowledge base with 121 grammar points. It then annotates the grammar points in 95,592 sentences, covering 580 basic forms and 233 semantic categories. Finally, the paper discusses the application of the knowledge base and corpus in TCSL.
    Key words grammar points; knowledge base; annotation; corpus; TCSL
       
       
       
  • Review
    HU Renfen, ZHU Qi, YANG Lijiao
    2015, 29(6): 62-68.
    In teaching Chinese as a second language, each text in a textbook has a specific topic. The topic represents the core content of each lesson and is closely related to other linguistic knowledge such as vocabulary and syntax. This paper introduces a hierarchical topic bank with 4 level-1 topics, 23 level-2 topics and 246 level-3 topics. The authors manually labeled 5,457 texts from 197 classic Chinese textbooks based on the topic bank and built a topic corpus of over 120 million sentences. To offer a comprehensive reference on topic information, syntactic constructions and HSK word-level information are also extracted as supplementary knowledge for topic labeling.
    Key words Chinese as a second language; topic; corpus
       
       
       
  • Review
    LI Fajie,YU Zhengtao,GUO Jianyi,LI Ying,ZHOU Lanjiang
    2015, 29(6): 69-74.
    To leverage rich and mature Chinese corpora for building a Vietnamese dependency treebank, this paper presents an approach to Vietnamese dependency treebank construction via a Chinese-Vietnamese bilingual corpus with word alignments. Based on the word alignment information, Chinese dependency parses are mapped into Vietnamese dependency structures. Experimental results show that this approach simplifies the manual collection and annotation of a Vietnamese treebank, saving both manpower and time, and that its accuracy improves significantly over machine learning methods.
    Key words Vietnamese dependency treebank; Chinese dependency parsing; word alignment
       
       
       
  • Review
    SONG Jiaying, HE Yu, FU Guohong
    2015, 29(6): 75-82.
    In this paper we incorporate opinion element normalization into the PolarityRank algorithm and thus propose a semi-supervised approach to Chinese domain-specific sentiment lexicon expansion. We first extract a set of attribute-evaluation pairs from product reviews. To reduce complexity and noise in sentiment lexicon expansion, we exploit the Jaccard coefficient and rules to normalize the extracted product attributes and their associated evaluations, respectively. Finally, we modify the PolarityRank algorithm to automatically recognize domain-specific dynamic polar words outside the original sentiment lexicon. Experimental results on product reviews in the car and mobile-phone domains show that using the expanded domain-specific dynamic polar words helps improve polarity classification performance.
    Key words sentiment analysis; sentiment lexicon expansion; PolarityRank; opinion element normalization
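The Jaccard-based normalization of attribute mentions can be sketched as a greedy character-set clustering. The mentions, the 0.5 threshold, and the greedy strategy are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of the normalization idea: treat each extracted attribute string as a
# set of characters and merge mentions whose Jaccard coefficient reaches a
# threshold. Strings and the 0.5 threshold are illustrative only.

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def normalize(mentions, threshold=0.5):
    """Greedily cluster mentions; the first member names the cluster."""
    clusters = {}                      # canonical form -> list of variants
    for m in mentions:
        for canon in clusters:
            if jaccard(m, canon) >= threshold:
                clusters[canon].append(m)
                break
        else:
            clusters[m] = [m]
    return clusters

mentions = ["屏幕", "屏幕分辨率", "分辨率", "电池", "电池续航"]
clusters = normalize(mentions)
# e.g. "分辨率" joins the "屏幕分辨率" cluster; "电池续航" joins "电池"
```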
       
       
       
  • Review
    ZHOU Huiwei,YANG Huan,ZHANG Jing,KANG Shiyong,HUANG Degen
    2015, 29(6): 83-89.
    Hedges are usually used to express uncertainty and possibility; when authors cannot back up their statements, they use hedges to mark information as uncertain. To avoid extracting uncertain statements as factual information, uncertain information should be distinguished from factual information. However, the lack of Chinese hedge corpora has limited research on Chinese hedges. This paper discusses the categorization of Chinese hedges and introduces the design and construction of a 24,000-sentence Chinese hedge corpus in the biomedical and Wikipedia domains. We calculate agreement rates for the corpus and reveal the domain and genre dependency of hedges. The corpus is of great significance for research on Chinese hedge detection and Chinese information extraction, and also supports linguists studying semantic and pragmatic hedges.
    Key words Chinese hedge; categorization; corpus; agreement analysis
       
       
       
  • Review
    GAO Shengxiang, YU Zhengtao, LONG Wenxu, DING Wei, YAN Chunting
    2015, 29(6): 90-97.
    Aiming at Chinese-Vietnamese bilingual news event storyline analysis, a generative model for event storyline is proposed based on global/local word pairs’ co-occurrence distribution. Firstly, the detected news topic word distribution was used as global words to characterize a global event, Then time, person, place and other event elements in the news segment divided by certain time granularity are used as local words. The are co-occurrence of global and local words is analyzed and used as supervised information, with RCRP algorithm and bilingual aligned words together, which are integrated into a bilingual topic model to get sub-topic distribution under corresponding time slice. Finally, by the sub-topic distribution representing the developing process of an event, a generative model to storyline was constructed. On Chinese-Vietnamese mixed news set crawled from the internet, the comparative experiments of storyline generation are conducted, proving that the proposed bilingual news storyline is model got better effect than the other methods.
    Key words Chinese-Vietnamese; news event storyline; global/local co-occurrence words; sub-topic distribution; bilingual topic model
       
       
       
  • Review
    LV Guoying,SU Na,LI Ru,WANG Zhiqiang,CHAI Qinghua
    2015, 29(6): 98-109.
    Frame semantics is introduced to Chinese discourse analysis, which includes three subtasks: discourse segmentation, discourse structure modeling and discourse relation recognition. First, a Chinese discourse coherence framework and a corresponding corpus are built based on frame semantics. Then two maximum entropy classifiers are applied to recognize the relations between discourse units and the classes of discourse relations, based on lexical features, dependency parse features, syntactic parse features, target features and frame semantic features. Finally, we use the probability of a relation existing between discourse units to generate the discourse structure with a greedy bottom-up method. Experimental results show that frame semantics can segment discourse units effectively, and that frame semantic features improve the performance of discourse structure construction and discourse relation recognition.
    Key words discourse units; discourse structure; discourse relation; greedy bottom-up method
       
       
       
  • Review
    ZHU Shanshan, HONG Yu, DING Siyuan, YAO Jianmin, ZHU Qiaoming
    2015, 29(6): 110-118.
    Implicit discourse relation recognition is an important subtask in discourse analysis. Most existing studies assume a balance between the numbers of positive and negative samples, and employ random under-sampling to keep the training data balanced. In reality, however, the training data is imbalanced, which hurts the recognition of implicit discourse relations. To solve this problem, we propose a novel implicit discourse relation recognition method based on frame semantic vectors. First, we represent each argument as a frame semantic vector using the FrameNet resource, and then mine a number of effective discourse relation samples from external data resources based on this representation. Finally, we add the mined samples to the original training set and run experiments on the extended set. Evaluation on the Penn Discourse Treebank (PDTB) shows that the proposed method performs better than current mainstream imbalanced classification methods.
    Key words implicit discourse recognition; imbalanced data; frame semantic vectors
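The contrast the abstract draws, discarding majority samples versus mining extra minority samples, can be shown with toy counts. All labels, counts and identifiers below are invented for illustration.

```python
# Minimal sketch of the data-balancing contrast the paper addresses: random
# under-sampling discards majority examples, while the paper's alternative
# *adds* externally mined positive examples. Counts are toy values.
import random

random.seed(0)
majority = [("arg_pair_%d" % i, 0) for i in range(90)]        # negatives
minority = [("arg_pair_%d" % i, 1) for i in range(90, 100)]   # positives

# Random under-sampling: shrink the majority class to the minority size.
undersampled = random.sample(majority, len(minority)) + minority

# Expansion instead: suppose extra positive samples were mined externally.
mined = [("mined_pair_%d" % i, 1) for i in range(30)]
expanded = majority + minority + mined

assert len(undersampled) == 20              # balanced, but much smaller
assert sum(y for _, y in expanded) == 40    # larger, less skewed training set
```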
       
       
       
  • Review
    REN Han,SHENG Yaqi,FENG Wenhe,LIU Maofu
    2015, 29(6): 119-126.
    This paper analyzes the defects in current entailment recognition approaches based on classification strategies and proposes a novel approach to recognizing textual entailment based on a knowledge topic model. The assumption is that if two texts have an entailment relation, they should share the same or a similar topic distribution. The approach builds an LDA model to estimate semantic similarities between each text and hypothesis, which provide the evidence for judging the entailment relation. We also employ three knowledge bases to improve the precision of Gibbs sampling. Experiments show that the knowledge topic model improves the performance of textual entailment recognition systems.
    Key words recognizing textual entailment; topic model; entailment classification; inference knowledge
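The core assumption, that an entailing pair shares a topic distribution, amounts to comparing two discrete distributions. A sketch with a symmetric divergence measure follows; the three distributions are invented examples, and the paper's actual similarity estimate may differ.

```python
# Sketch of the similarity test behind the entailment assumption: if text and
# hypothesis share a topic distribution, a divergence measure between the two
# distributions should be small. Distributions here are invented examples.
import math

def jensen_shannon(p, q):
    """Symmetric, bounded divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

text_topics       = [0.70, 0.20, 0.10]
hypothesis_topics = [0.65, 0.25, 0.10]
unrelated_topics  = [0.05, 0.15, 0.80]

close = jensen_shannon(text_topics, hypothesis_topics)
far   = jensen_shannon(text_topics, unrelated_topics)
assert close < far    # similar topics -> lower divergence -> entailment cue
```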
       
       
       
  • Review
    YU Ningyi,AO Gaoqi,XUN Endong
    2015, 29(6): 127-134.
    Information extraction from ancient Chinese benefits language monitoring and corpus construction. This paper treats the tagging of ancient Chinese in a mixed corpus as a short text classification task, and applies both rule-based and statistical methods. For the rule-based methods, the paper considers the effect of function words and constructions in ancient Chinese. For the statistical methods, we conduct experiments with N-gram, Naive Bayes, Maximum Entropy and Decision Tree models. Experiments indicate that the unigram model outperforms the others with an F-value of 0.98. The research also provides evidence for the view of Chinese evolution as a continuum.
    Key words ancient Chinese tagging; text classification; rule-based model; statistics-based model
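The statistical side of this task can be sketched as a tiny character-unigram Naive Bayes classifier. The four training sentences are an illustrative toy set, nothing like the paper's corpus, and the unigram features are the only part taken from the abstract.

```python
# Toy sketch of the statistical method: a character-unigram Naive Bayes model
# separating ancient from modern Chinese short texts. Training data is toy.
import math
from collections import Counter

train = [
    ("学而时习之不亦说乎", "ancient"),
    ("吾日三省吾身", "ancient"),
    ("今天天气很好", "modern"),
    ("我们一起去学习", "modern"),
]

counts = {"ancient": Counter(), "modern": Counter()}
for text, label in train:
    counts[label].update(text)
vocab = set().union(*counts.values())

def classify(text):
    def log_prob(label):
        c, total = counts[label], sum(counts[label].values())
        # Add-one smoothing over the shared character vocabulary.
        return sum(math.log((c[ch] + 1) / (total + len(vocab))) for ch in text)
    return max(counts, key=log_prob)

assert classify("不亦乐乎") == "ancient"
```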
       
       
       
  • Review
    MO Peng, HU Po, HUANG Xiangji, HE Tingting
    2015, 29(6): 135-140.
    Text summarization and keyword extraction are two important research topics in natural language processing (NLP); both generate concise information to describe the gist of a text. Although the two tasks have similar objectives, they are usually studied independently and their association is seldom considered. Based on graph-based ranking methods, some collaborative extraction methods have been proposed that capture the associations between sentences, between words, and between sentences and words, generating both the summary and the keywords in an iteratively reinforced framework. However, most existing models are limited to binary relations between sentences and words, ignoring potentially important higher-order relationships among different text units. In this paper, we propose a new collaborative extraction method based on hypergraphs: sentences are modeled as hyperedges and words as vertices, and the summary and keywords are generated by exploiting the higher-order information between sentences and words in the unified hypergraph. Experiments on the Weibo-oriented Chinese news summarization task of NLPCC 2015 demonstrate that the proposed method is feasible and effective.
    Key words hypergraph; document summarization; keyword extraction; collaborative extraction
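The sentences-as-hyperedges, words-as-vertices construction can be shown with a minimal incidence structure. The degree-based scores below are a crude illustrative proxy, not the ranking algorithm of the paper.

```python
# Sketch of the hypergraph construction: sentences are hyperedges over their
# word vertices, and simple degree-based scores illustrate how co-membership
# (a word shared by many sentences) can rank both kinds of units.
from collections import defaultdict

sentences = [
    ["summarization", "extracts", "sentences"],
    ["keyword", "extraction", "extracts", "words"],
    ["sentences", "contain", "words"],
]

incidence = defaultdict(set)            # word vertex -> hyperedge ids
for edge_id, words in enumerate(sentences):
    for w in words:
        incidence[w].add(edge_id)

# Vertex score: number of hyperedges (sentences) containing the word.
word_score = {w: len(edges) for w, edges in incidence.items()}
# Hyperedge score: total score of its vertices (a crude saliency proxy).
sentence_score = [sum(word_score[w] for w in s) for s in sentences]
```

A real system would iterate these mutually reinforcing scores to convergence instead of taking raw degrees.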
       
       
       
  • Review
    RAO Gaoqi,YU Dong,XUN Endong
    2015, 29(6): 141-149.
    The features of a text are often exhibited by its terms and phrases, and their unsupervised extraction can support various natural language processing tasks. We propose a "cluster-verification" method to obtain a lexicon from raw corpora by combining a latent topic model with natural annotations: topic modeling clusters strings, and natural annotations in the raw corpus filter and optimize the result. The resulting lexicon shows high accuracy and describes the domains and writing styles of texts well. Experiments on six kinds of domain corpora show promising results for classifying their domains and writing styles.
    Key words natural annotation; natural chunk; latent topic model; domain feature; stylistic features
       
       
       
  • Review
    CHEN Jiang,LIU Wei,CHAO Wenhan,WANG Lihong
    2015, 29(6): 150-158.
    Microblog forwarding is an important channel of information dissemination, and forwarding prediction is of great value for analyzing microblog influence and topics. Existing methods of microblog forwarding prediction mostly focus on microblog and user attributes. In this paper, a microblog forwarding prediction method based on hot topics is proposed. We quantitatively analyze the impact of hot content and transmission tendency on users' forwarding behavior, and then introduce hot-topic features such as forwarding interest, forwarding activity and behavior pattern. Finally, we establish a hot-topic oriented microblog forwarding prediction model based on a classification algorithm. Experimental results on real data show that the accuracy of this method reaches 96.6%, a maximum improvement of 12.14%.
    Key words microblog forward; forwarding prediction; hot topic
       
       
       
  • Review
    LIU Longfei, YANG Liang, ZHANG Shaowu, LIN Hongfei
    2015, 29(6): 159-165.
    Chinese micro-blog sentiment analysis aims to discover users' attitudes towards hot events. The task is challenged by heavy noise, many new words and abbreviations, flexible collocations, and the limited context of short texts. This paper explores the feasibility of Chinese micro-blog sentiment analysis with convolutional neural networks (CNN). To avoid task-specific features, character-level and word-level embeddings are adopted. On the COAE 4th task corpus, the character-level CNN achieves a sentiment prediction accuracy (binary positive/negative classification) of 95.42%, slightly better than the word-level CNN's 94.65%. The results show that convolutional neural networks are promising for Chinese micro-blog sentiment analysis.
    Key words deep learning; sentiment analysis; convolutional neural networks; word embedding
       
       
       
  • Review
    JIANG Shengyi,HUANG Weijian,CAI Maoli,WANG Lianxi
    2015, 29(6): 166-171.
    This paper explores a method to build social emotional lexicons from microblogs and applies it to analyzing social emotions in public events. First, small-scale standard emotional lexicons are manually collected as the basic emotional lexicon. Then word2vec, a tool based on deep learning, is used for incremental learning on corpora of social events on microblogs to expand the basic emotional lexicon, and the final lexicon is filtered by HowNet and by experts. The paper then compares emotional analysis based on the generated lexicon with SVM classification, demonstrating a 13.9% increase in average precision and a 1.5% increase in recall. Finally, the proposed method is verified by emotional analysis of different social events with the generated lexicon.
    Key words microblogging; social emotions; lexicon; emotional analysis
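The expansion step, adding embedding-space neighbors of seed emotion words, can be sketched with tiny made-up vectors standing in for word2vec output. The words, vectors and the 0.95 threshold are all illustrative assumptions.

```python
# Sketch of the expansion step: starting from a small seed lexicon, words near
# the seeds in embedding space (here, tiny made-up vectors standing in for
# word2vec output) are added as candidates, then filtered. Values are toy.
import math

embeddings = {
    "高兴": [0.9, 0.1],  "开心": [0.85, 0.15],  # happy-ish words
    "愤怒": [0.1, 0.9],  "生气": [0.15, 0.85],  # angry-ish words
    "桌子": [0.5, 0.5],                          # neutral noun
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def expand(seed, threshold=0.95):
    expanded = set(seed)
    for word, vec in embeddings.items():
        if any(cosine(vec, embeddings[s]) >= threshold for s in seed):
            expanded.add(word)
    return expanded

assert "开心" in expand({"高兴"})     # near neighbor joins the lexicon
assert "桌子" not in expand({"高兴"}) # neutral word stays out
```

In the paper the candidates are additionally filtered by HowNet and by experts; here the threshold alone plays that role.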
       
       
       
  • Review
    CHEN Zhao,XU Ruifeng,GUI Lin,LU Qin
    2015, 29(6): 172-178.
    Recently, classification approaches based on word embeddings and convolutional neural networks have achieved good results in sentiment classification. Such approaches rely mainly on the contextual features of word embeddings, without considering the polarity of the words themselves, and make no use of manually compiled sentiment lexicon resources. To address these problems, this paper proposes a novel sentiment classification method that incorporates an existing sentiment lexicon into convolutional neural networks. The words in a text are abstractly represented using existing sentiment words, convolutional neural networks extract sequence features from the abstracted word embeddings, and the sequence features are then applied to sentiment classification. Evaluation on the Chinese Opinion Analysis Evaluation 2014 dataset shows that the proposed approach outperforms both the convolutional neural network model with word embedding features and Naive Bayes Support Vector Machines.
    Key words convolutional neural networks; sentiment analysis; word sentiment sequence features
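The abstraction step, rewriting tokens into polarity symbols before the CNN sees them, can be shown in a few lines. The three-entry lexicon and the symbol names are illustrative, not the paper's scheme.

```python
# Sketch of the word-abstraction idea: before feeding a sentence to the CNN,
# each word found in a sentiment lexicon is replaced by a polarity symbol, so
# the network sees sentiment *sequences* rather than raw tokens. Toy lexicon.
lexicon = {"喜欢": "POS", "讨厌": "NEG", "不": "NOT"}

def abstract(tokens):
    return [lexicon.get(tok, "OTHER") for tok in tokens]

# "I don't like it" -> a NOT-POS sequence the classifier can generalize over.
assert abstract(["我", "不", "喜欢", "它"]) == ["OTHER", "NOT", "POS", "OTHER"]
```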
       
       
       
  • Review
    CHU Xiaomin, WANG Zhongqing, ZHU Qiaoming, ZHOU Guodong
    2015, 29(6): 179-184.
    Social tags are an important means of organizing information in the Web 2.0 era, and tag recommendation can help users collect, search and share online resources effectively. Previous approaches focus on a single type of textual information, e.g. the summary of a movie. In practice, however, various types of textual information are available: a movie has both summary and comment information, and different types reflect different aspects of the movie. We therefore propose a novel approach that combines summary and comment information to recommend tags, using different ensemble learning approaches to incorporate the two sources. Experimental results show that the proposed approach using multiple types of information outperforms approaches using a single type in tag recommendation tasks.
    Key words natural language processing; social tags; ensemble learning
       
       
       
  • Review
    WANG Mingwen, FU Cuiqin, XU Fan, HONG Huan
    2015, 29(6): 185-192.
    Departing from the traditional bag-of-words model with its term-independence assumption, we present a graph model based on word co-occurrence relationships. Our model describes the distributional differences of terms between subjective and non-subjective sentence sets via term co-occurrence and syntactic information, and integrates an indegree-based term weighting method. Evaluation on a benchmark dataset shows the importance of the term co-occurrence graph model, and that it significantly outperforms the bag-of-words model in subjective sentence identification.
    Key words word co-occurrence; graph model; subjective sentence identification; feature value; supervised learning
       
       
       
  • Review
    ZHAO Mingzhen, CHENG Liangxi, LIN Hongfei
    2015, 29(6): 193-202.
    When mining adverse drug reactions (ADRs) from user comments on healthcare social networks, it is important to recognize novel ADR expressions and normalize them, since people describe adverse reactions in different ways, and new adverse reactions emerge with the launch of new drugs and the diversity of drug users. This paper uses a Conditional Random Field (CRF) model to recognize adverse reaction entities, and proposes a normalization method for the recognized entities. The effectiveness of the mining method is verified by comparing the mined known ADRs with database records, and a list of potential ADRs sorted by comment frequency is obtained. Experimental results indicate that the CRF model can identify both known and novel adverse reaction entities, and that normalization aggregates and merges the entities, which benefits ADR discovery.
    Key words adverse drug reaction; user comment; text mining; entity normalization
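The normalization step, merging surface variants of the same reaction and pooling their frequencies, can be sketched with a generic string-similarity measure. `difflib`'s ratio and the 0.5 threshold stand in for whatever measure the paper actually uses; the terms and counts are illustrative.

```python
# Sketch of the normalization step: surface variants of the same adverse
# reaction are merged when their string similarity is high enough, and their
# comment frequencies are pooled. Measure, threshold and data are assumed.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.5):
    return SequenceMatcher(None, a, b).ratio() >= threshold

def merge(entities):
    merged = {}                         # canonical form -> total frequency
    for name, freq in entities:
        for canon in merged:
            if similar(name, canon):
                merged[canon] += freq
                break
        else:
            merged[name] = freq
    return merged

# "头疼" and "头痛" are variant spellings of "headache"; "恶心" is "nausea".
entities = [("头疼", 5), ("头痛", 3), ("恶心", 4)]
merged = merge(entities)
```

Pooling frequencies this way is what makes the frequency-sorted candidate list in the abstract meaningful.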
       
       
       
  • Review
    LI Yachao, JIANG Jing, JIA Yangji, YU Hongzhi
    2015, 29(6): 203-207.
    TIP-LAS is an open-source toolkit for Tibetan word segmentation and POS tagging. The toolkit implements Tibetan word segmentation based on syllable tagging with a CRF model, and integrates a maximum entropy model with syllable features for Tibetan POS tagging. The system achieves good results in experiments. The source code is shared on the Internet, together with the experimental corpus.
    Key words Tibetan; word segmentation; part of speech tagging; conditional random fields; maximum entropy
       
       
       
  • Review
    Azragul,Alim Murat, Yusup Abaydula
    2015, 29(6): 208-212.
    Modern Uyghur noun stem identification is a fundamental issue in natural language processing. Morphological analysis is first introduced, especially its role in identifying the part of speech of words. The paper then describes the POS scheme of Uyghur, the morphological characteristics of Uyghur nouns, suffix ambiguity and the disambiguation rules. An algorithm for identifying new nouns in modern Uyghur is proposed, including feature selection (features within and between words) and parameter estimation. The experiment is carried out on a corpus of Uyghur physics textbooks for junior and senior middle schools.
    Key words modern Uyghur; morphological analysis; noun stems recognition
       
       
       
  • Review
    Luobsang Karten,YANG Yuanyuan,ZHAO Xiaobing
    2015, 29(6): 213-219.
    Tibetan word segmentation is an essential task in Tibetan language processing. In this paper, a CRF model is trained on a 35.1M manually annotated Tibetan corpus. The CRF segmentation results are then post-processed by rules targeting errors such as mis-segmented non-Tibetan characters, mis-recognized Tibetan adhesion words, mis-segmented stop words, and unregistered words. An open test demonstrates an accuracy of 96.11%, a recall of 96.03%, and an F-score of 96.06%.
    Key words Tibetan; word segmentation; CRFs; knowledge fusion
       
       
       
  • Review
    ZHU Zhen,SUN Yuan
    2015, 29(6): 220-227.
    This paper proposes an SVM and pattern based approach to Tibetan person attribute extraction. The pattern system is built from language rules over Tibetan features with clear semantic information, such as case-auxiliary words and particular verbs. A machine learning approach via SVM is then introduced to build a hierarchical classification strategy. Experimental results indicate a significant improvement in person attribute extraction.
    Key words person attributes extraction; tibetan language processing; SVM; hierarchy classifier