2017 Volume 31 Issue 3 Published: 15 June 2017
  

  • Article
    FANG Qian; DOU Yongxiang; WANG Bangjin
    2017, 31(3): 1-8.
    Social media has become a new form of online media, and detecting and studying the communities within it helps reveal the patterns and characteristics of information transmission and sharing. In this paper, a knowledge map focused on community detection in social media was built with tools such as CiteSpace, SATI and UCINET, based on data from the Web of Science. From the perspectives of cited references, keywords and burst terms, the research status, knowledge evolution process, research hotspots and research fronts were analyzed using visual charts.
  • Article
    YANG Yang; LIN Hongfei; YANG Liang; REN Juwei
    2017, 31(3): 9-16.
    The study of politics has long been a hot research topic in the social sciences, covering political theory, comparative politics, public policy, and international politics. From the moral philosophy and legal theory of traditional politics, to the scientific methodology and quantitative analysis of behavioral politics, and further to the involvement of natural science researchers, research methods in politics have kept developing and evolving. After a brief summary of previous methods in political science research, this paper discusses the origin, definition and development of computational political science in the Internet age, especially in the era of big data. It reviews progress in political orientation analysis, opinion recognition, conflict point detection, election prediction and the visualization of political analysis.
  • Article
    HUANG Sisi; ZHAN Weidong
    2017, 31(3): 17-24.
    A construction is prototypically a linguistic unit whose meaning cannot be derived directly from the meanings of its parts. This paper concentrates on constructions in cyber language, analyzing their emergence, expansion and solidification. The emergence of a new construction is often attributed either to the special context of language usage or to the intentional violation of conventional expressions. The expansion of a construction covers expansion within the same category and expansion across different categories. The degree of solidification can be measured along three dimensions: productivity, schematicity, and compositionality.
  • Article
    JIANG Zhenchao; LI Lishuang; HUANG Degen
    2017, 31(3): 25-31.
    In natural language processing tasks, distributed word representations have succeeded in capturing semantic regularities and have been used as extra features. However, most word representation models are based on a shallow context window, which is not enough to express the meaning of words. The essence of word meaning lies in word relations, which consist of three elements: relation type, relation direction and related items. In this paper, we leverage a large set of unlabeled texts to make explicit the semantic regularities that emerge in word relations, including dependency relations and context relations, and put forward a novel architecture for computing continuous vector representations. We define three different top layers in the neural network architecture, corresponding to relation type, relation direction and related words, respectively. Different from other models, the relation model can use deep syntactic information to train word representations. Tested on a word analogy task and a protein-protein interaction extraction task, the results show that the relation model performs overall better than others in capturing semantic regularities.
  • Article
    HU Hao; LI Ping; CHEN Kaiqi
    2017, 31(3): 32-40.
    With the rapid development of the Internet, Chinese short text has become increasingly important. Mining valuable information from massive short texts has become a very important and challenging task in Chinese natural language processing. However, traditional methods designed for long texts often perform poorly on short texts due to syntactic and semantic sparsity. This paper proposes a stroke-based Chinese word embedding method combined with deep learning for short text similarity calculation. The method combines Chinese character composition and Pinyin attributes, mapping each Chinese character to a 32-dimensional vector. A convolutional neural network is then used to extract the semantics of each short text and calculate similarity. Experimental results show that, compared with existing short text similarity calculation methods, the proposed method greatly improves performance and accuracy.
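As a rough illustration of the pipeline just described (assuming a PyTorch implementation, which the abstract does not specify): character indices are embedded into 32-dimensional vectors, a one-dimensional convolution with max pooling yields a fixed-size text vector, and two short texts are compared by cosine similarity. Layer sizes and names below are illustrative, not the authors' configuration.

```python
# Minimal sketch (not the paper's exact model): character vectors feed a 1-D
# convolution, max pooling yields a fixed-size text vector, and cosine
# similarity compares two short texts. Dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShortTextEncoder(nn.Module):
    def __init__(self, vocab_size=5000, char_dim=32, n_filters=64, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, char_dim)    # 32-dim character vectors
        self.conv = nn.Conv1d(char_dim, n_filters, kernel, padding=1)

    def forward(self, char_ids):                           # (batch, seq_len)
        x = self.embed(char_ids).transpose(1, 2)           # (batch, char_dim, seq_len)
        h = F.relu(self.conv(x))                           # (batch, n_filters, seq_len)
        return h.max(dim=2).values                         # max pooling over positions

encoder = ShortTextEncoder()
a = torch.randint(0, 5000, (1, 10))   # two toy character-id sequences
b = torch.randint(0, 5000, (1, 12))
sim = F.cosine_similarity(encoder(a), encoder(b))          # similarity score
print(float(sim))
```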
  • Article
    JIA Yuxiang ; ZAN Hongying ; FAN Ming; YU Shiwen; WANG Zhimin
    2017, 31(3): 41-47.
    In metaphors, abstract things are usually described in terms of concrete things. If we can decide whether a word is concrete or abstract, we can provide useful clues for automatic metaphor recognition. This paper proposes a cross-lingual knowledge transfer method to adapt English word abstractness knowledge to Chinese. We then propose a metaphor recognition method based on word abstractness and analyze in detail the relation between word abstractness and metaphor. Experimental results show that the cross-lingual knowledge transfer method is feasible for measuring Chinese word abstractness, that the abstractness-based metaphor recognition method achieves a high precision score, and that it can improve the efficiency of metaphor extraction from real texts.
  • Article
    ZHANG Huaping; LI Hengxun; LI Qingmin
    2017, 31(3): 48-54.
    The rapid development of Internet commerce and various social networking applications produces a large amount of user comment information. To process this information quickly, sentiment and polarity analysis has emerged. An emotion dictionary is the basis for all kinds of sentiment polarity recognition algorithms. To build a comprehensive emotional dictionary with rational weights, this paper proposes an automatic emotion weight (AEW) algorithm to mine potential emotional words and estimate their emotion weights, with the advantages of domain independence and good scalability. The method uses a special type of co-occurrence, based on Bayesian theory, to recognize unknown emotion words, and judges sentiment polarity according to the value of the emotion weight. We verify the theoretical research by three empirical analyses of data from JD.com, douban.com and dianping.com, achieving a precision of about 90%.
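A toy sketch of the co-occurrence idea behind such weighting (the paper's exact Bayesian formulation is not reproduced here): a candidate word's weight is estimated from how often it co-occurs with seed positive versus seed negative words, with a positive score suggesting positive polarity. The seed sets, toy reviews and smoothing are illustrative.

```python
# Illustrative sketch only: estimate a polarity weight for a candidate word from
# its co-occurrence with seed sentiment words, in the spirit of the AEW idea.
import math

def emotion_weight(word, reviews, pos_seeds, neg_seeds):
    pos_co = neg_co = pos_total = neg_total = 0
    for review in reviews:                       # each review: a list of tokens
        tokens = set(review)
        has_pos = bool(tokens & pos_seeds)
        has_neg = bool(tokens & neg_seeds)
        pos_total += has_pos
        neg_total += has_neg
        if word in tokens:
            pos_co += has_pos
            neg_co += has_neg
    # add-one smoothing; positive score -> positive polarity, negative -> negative
    return (math.log((pos_co + 1) / (pos_total + 1))
            - math.log((neg_co + 1) / (neg_total + 1)))

reviews = [["手机", "很", "好用"], ["物流", "太", "差"], ["好用", "推荐"]]
print(emotion_weight("好用", reviews, pos_seeds={"推荐"}, neg_seeds={"差"}))
```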
  • Article
    ZHANG Huaping; SHANG Jianyun
    2017, 31(3): 55-61.
    With the development of the Internet, social media has become an important channel for information transmission. Focusing on the characteristics of the informal language found across the many domains of social media, this paper proposes a social media-oriented open-domain new word detection method. The approach runs in linear time with reduced memory usage, which enables real-time processing of the large volumes of data produced by social media. An experiment on a 6.6 GB social media corpus shows a processing speed of 2.6 MB/s on an ordinary PC, as well as 87.2% precision.
  • Article
    WAN Shengxian; LAN Yanyan ; GUO Jiafeng ; XU Jun; PANG Liang; CHENG Xueqi
    2017, 31(3): 62-68.
    Deep learning has shown great benefits for natural language processing in recent years. Models such as recurrent neural networks (RNNs) have been proposed to extract text representations, which can be applied to text classification. Long short-term memory (LSTM) is an advanced kind of RNN with special neural cells. An LSTM accepts a sequence of words from a sentence, scans over the whole sequence, and outputs a representation of the sentence. However, customary practice uses only the last representation the LSTM produces for classification, ignoring all other intermediate representations. A clear drawback is that this cannot efficiently capture local features that are very important for determining a sentence's class label. In this paper, we propose local bidirectional long short-term memory to deal with this problem, including MaxBiLSTM and ConvBiLSTM. MaxBiLSTM applies a max pooling operation, and ConvBiLSTM applies a convolution operation followed by max pooling, over all intermediate representations generated by the bidirectional LSTM. Experimental results on two public datasets for text classification show that local bidirectional LSTM, especially ConvBiLSTM, outperforms bidirectional LSTM consistently and reaches state-of-the-art performance.
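A minimal sketch of the MaxBiLSTM variant described above, assuming a PyTorch implementation: a bidirectional LSTM produces a hidden state at every position, and max pooling over all of these intermediate states (rather than keeping only the last one) feeds the classifier. Dimensions are illustrative, not the paper's settings.

```python
# Sketch of the MaxBiLSTM idea: BiLSTM over the word sequence, then max-pool
# over ALL intermediate hidden states instead of keeping only the last one.
import torch
import torch.nn as nn

class MaxBiLSTM(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, word_ids):                      # (batch, seq_len)
        states, _ = self.bilstm(self.embed(word_ids)) # (batch, seq_len, 2*hidden)
        pooled = states.max(dim=1).values             # max over all positions
        return self.fc(pooled)                        # class scores

model = MaxBiLSTM()
logits = model(torch.randint(0, 10000, (4, 20)))      # a toy batch of 4 sentences
print(logits.shape)                                   # torch.Size([4, 2])
```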
  • Article
    LI Yang; GUO Xiaomin; WANG Suge; LIANG Jiye
    2017, 31(3): 69-76.
    To utilize car comments efficiently, this paper extracts collocations from review texts and proposes a 5-tuple based evaluation representation of objects. With the fuzzy formal context of car comments built in this way, we define the intension fuzzy concept and the intension fuzzy concept lattice. We design algorithms for constructing a fuzzy formal context and an intension fuzzy concept lattice, and illustrate how to conduct knowledge discovery based on an intension fuzzy concept lattice with a real example.
  • Article
    ZHOU Huiwei; YANG Huan; XU Junli; ZHANG Jing; KANG Shiyong
    2017, 31(3): 77-85.
    Hedges are usually used to express uncertainty; hedge information indicates that authors do not back up their statements with facts. Chinese hedge information detection is of great significance for Chinese factual information extraction. It contains two subtasks: identifying hedge cues and detecting their in-sentence scopes. The lack of a Chinese hedge scope corpus has limited research on Chinese hedge scope detection. This paper first manually crafts syntactic rules for Chinese hedge scope annotation, then constructs a Chinese hedge scope corpus, and finally analyzes the corpus statistically. The corpus provides strong support for Chinese uncertainty detection.
  • Article
    FENG Wenhe; LI Yancui; REN Han; ZHOU Guodong
    2017, 31(3): 86-93.
    The Chinese-English discourse treebank (CEDT) is a parallel corpus annotated with aligned discourse structure information for Chinese and English. Its core task is alignment annotation guided by the principle of structure and relation alignment. With the corresponding annotation platform, we manually annotate the corpus, propose evaluation methods for the alignment annotation, and give an evaluation analysis covering segmentation, structure, relation, connective, relation role and center alignment. Experimental results show that the alignment annotation strategy is a feasible and efficient method of building the CEDT.
  • Article
    XU Jing; YANG Xiaoping
    2017, 31(3): 94-100.
    To accurately identify the clues of a given topic from a large number of Web news items, a topic clue mining method based on the conditional random fields (CRF) model is proposed. First, according to the identification rules for topic sentences, relevant features are extracted and used in the CRF model to obtain candidate topic sentences. Then, lexical chains of topic clues are built in chronological order using lexical weights. Finally, semantically similar clue chains are merged so that the whole development context of the news can be described. Experimental results show that the proposed method achieves good performance on topic clue sentence extraction, and the obtained topic clue chains can clearly show the development trend of network news.
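A hedged sketch of the candidate-topic-sentence step: each sentence of a news item is labeled TOPIC or OTHER with a linear-chain CRF. The feature names are illustrative stand-ins for the identification rules the paper describes, and sklearn_crfsuite is only one possible CRF implementation, not necessarily the authors' toolkit.

```python
# Sketch: label each sentence of a news item as TOPIC / OTHER with a CRF.
import sklearn_crfsuite

def sentence_features(doc, i):
    sent = doc[i]
    return {
        "position": i,                      # early sentences often carry the topic
        "length": len(sent),
        "has_time_word": any(w in sent for w in ("今日", "昨日", "日前")),
        "is_first": i == 0,
    }

def doc_to_features(doc):
    return [sentence_features(doc, i) for i in range(len(doc))]

docs = [["某市今日发布新政策", "市民反应不一", "专家表示影响有限"]]
labels = [["TOPIC", "OTHER", "OTHER"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit([doc_to_features(d) for d in docs], labels)
print(crf.predict([doc_to_features(docs[0])]))
```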
  • Article
    HE Min; LIU Wei; LIU Yue; WANG Lihong; BAI Shuo; CHENG Xueqi
    2017, 31(3): 101-108.
    To address the sparse content of microblogs and the difficulty of judging content relatedness, a feature-driven microblog topic detection method is proposed. Meaningful strings are extracted as dynamic microblog features. The author influence and document influence of features are defined according to the structural relations of microblogs, and together with content statistics they form the attribute sets. A logistic regression model is used to classify features into key features and noise features. A modified nearest-neighbor clustering method then derives topics by clustering the key features, using the mutual information between key features as the distance measure. Experiments on microblog data show that the proposed method remarkably improves both accuracy and recall.
  • Article
    WANG Hongwei; MENG Yuan
    2017, 31(3): 109-117.
    Faced with hundreds of thousands of online reviews, helpfulness-related review features help consumers identify high-quality reviews to support decision-making. Based on the information adoption model, this paper examines four sets of useful features, seventeen features in total, in the camera and mobile phone domains. With logistic ridge regression and decision tree models as baselines, the paper investigates the GBDT model for review quality classification and feature reduction, which reveals feature contributions as the basis for identifying key features. The experimental results show that timeliness, reviewer ranking, the number of key product features, and review length are the key features influencing review quality, forming the optimized feature set for the GBDT model.
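A hedged sketch of the GBDT step with scikit-learn (one possible implementation; the paper's toolkit is not stated): a gradient-boosted classifier is trained on review features, and its feature importances indicate each feature's contribution. The feature names and toy data are illustrative stand-ins for the seventeen features studied.

```python
# Sketch: GBDT classifier over review features plus per-feature contributions.
from sklearn.ensemble import GradientBoostingClassifier

feature_names = ["timeliness", "reviewer_ranking", "key_feature_count", "word_count"]
X = [[3, 120, 5, 80],
     [30, 4500, 1, 15],
     [1, 60, 7, 200],
     [45, 9000, 0, 10]]
y = [1, 0, 1, 0]                      # 1 = helpful review, 0 = not helpful

gbdt = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
for name, importance in zip(feature_names, gbdt.feature_importances_):
    print(name, round(importance, 3))
```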
  • Article
    LIU Quan; ZHANG Ming
    2017, 31(3): 118-124.
    With the growth of social media such as Sina Weibo and Renren, online influence maximization has attracted more and more attention from both industry and research institutions. Maximizing the scope of influence under resource constraints is crucial for the healthy development of marketing strategies such as viral marketing. If we can more accurately determine the domain a user belongs to using the heterogeneous information in social networks, and further maximize influence spread on a per-domain basis, traditional influence research conducted from a general perspective, or from the single perspective of structure or content, can benefit greatly. In this paper we propose a domain influence maximization model, which first divides users into different domains using their social relations and Weibo content, and then maximizes domain influence with a greedy-DP hybrid algorithm. Experiments on Sina Weibo show that our model greatly reduces the time cost of traditional models with little loss in accuracy.
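For context, a sketch of the standard greedy seed-selection loop that influence maximization methods build on, with a simple independent-cascade simulation; the paper's domain partitioning and greedy-DP hybrid are not reproduced here. The graph, activation probability and simulation count are illustrative.

```python
# Sketch: greedy seed selection under an independent-cascade spread model.
import random
import networkx as nx

def simulate_ic(graph, seeds, p=0.1, rounds=100):
    total = 0
    for _ in range(rounds):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.successors(u):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / rounds                      # expected spread

def greedy_seeds(graph, k):
    seeds = []
    for _ in range(k):
        best = max((n for n in graph if n not in seeds),
                   key=lambda n: simulate_ic(graph, seeds + [n]))
        seeds.append(best)
    return seeds

g = nx.gnp_random_graph(50, 0.05, directed=True)
print(greedy_seeds(g, 3))
```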
  • Article
    HE Xufeng; CHEN Ling; CHEN Gencai; QIAN Kun; WU Yong; WANG Jingchang
    2017, 31(3): 125-133.
    Considering that different collections contribute differently to the final search results, an LDA topic model based collection selection method is proposed for distributed information retrieval. First, the method acquires representation information for each collection by query-based sampling. Second, an LDA topic model is used to estimate the relevance between the query and a document. Third, a method based on both terms and topics estimates the relevance between the query and the sample documents, from which the relevance between the query and each collection is estimated. Finally, the M collections with the highest relevance are selected for retrieval. Experimental results demonstrate that the proposed method improves the accuracy and recall of search results.
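A small sketch of the topic-level relevance idea using gensim (an assumed toolkit, not confirmed by the paper): an LDA model is trained on sampled documents, and a document's relevance to the query is scored by the inner product of their topic distributions. The smoothing and the combination with term-level scores described in the paper are omitted.

```python
# Sketch: LDA topic distributions as a query-document relevance signal.
from gensim import corpora, models

samples = [["machine", "learning", "model"],
           ["football", "match", "score"],
           ["neural", "network", "training"]]
dictionary = corpora.Dictionary(samples)
corpus = [dictionary.doc2bow(doc) for doc in samples]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

def topic_relevance(query_tokens, doc_tokens):
    q = dict(lda.get_document_topics(dictionary.doc2bow(query_tokens), minimum_probability=0))
    d = dict(lda.get_document_topics(dictionary.doc2bow(doc_tokens), minimum_probability=0))
    return sum(q[t] * d[t] for t in q)         # inner product of topic distributions

print(topic_relevance(["learning", "model"], samples[0]))
```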
  • Article
    ZHANG Shijun; WANG Yongheng
    2017, 31(3): 134-139.
    With the development of social networks, a large amount of valuable information is produced by their users. This paper focuses on a personalized recommender system based on matrix factorization. To improve the recommender system, we study users' social relationships and implicit feedback. We add to the matrix factorization objective a social regularization term, a demographic information term, and the user's consumption records as item latent factor biases. Experiments indicate that the proposed method reduces RMSE by 0.259475 compared with the SVD algorithm.
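A minimal numpy sketch of matrix factorization with an added social regularization term, the general form that such models extend; the demographic term and implicit-feedback item bias mentioned in the abstract are omitted, and all hyperparameters are illustrative.

```python
# Sketch: SGD matrix factorization with a social regularization update.
import numpy as np

def train_mf(ratings, friends, n_users, n_items, k=8, lr=0.01, reg=0.1, beta=0.1, epochs=50):
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:                       # observed (user, item, rating) triples
            err = r - U[u] @ V[i]
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])
        for u, f in friends:                          # social ties pull friends' factors together
            U[u] -= lr * beta * (U[u] - U[f])
    return U, V

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)]
friends = [(0, 1)]
U, V = train_mf(ratings, friends, n_users=2, n_items=2)
print(U @ V.T)                                        # predicted rating matrix
```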
  • Article
    LIU Xiong; ZHANG Yu; ZHANG Weinan; LIU Ting
    2017, 31(3): 140-146.
    Question answering systems have long been one of the hot research areas in natural language processing. To enhance the ability of question answering systems to analyze complex factoid questions, we present a novel method to decompose complex factoid questions: a tree kernel based support vector machine recognizes the decomposition category of a question, and a dependency parsing based method generates the decomposition results. Evaluation on the high-quality question decomposition corpus we built shows that our method recognizes question decomposition categories with high performance and generates sub-question series of high quality, especially for nested-type questions.
  • Article
    ZHANG Guiping; ZHAI Shunlong; WANG Peiyan
    2017, 31(3): 147-155.
    This paper proposes a method that combines topics and behavior to describe user interest. On the one hand, from the topic perspective, a topic vector model is constructed to reflect the user's interest in topics. On the other hand, from the behavior perspective, a score matrix model is constructed to reflect the user's interest as expressed by behavior. Based on the two user models, two document recommendation methods are constructed and then combined by linear weighting, as sketched below. Experimental results show that the proposed method outperforms both the collaborative filtering recommendation method and the content-based recommendation method.
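The linear weighting mentioned above amounts to a simple score merge; the toy sketch below uses placeholder scores for the topic-based and behavior-based recommenders, and alpha is an assumed weight.

```python
# Sketch: merge topic-based and behavior-based document scores linearly.
def combined_score(topic_score, behavior_score, alpha=0.6):
    return alpha * topic_score + (1 - alpha) * behavior_score

docs = {"d1": (0.8, 0.2), "d2": (0.4, 0.9)}           # (topic score, behavior score)
ranked = sorted(docs, key=lambda d: combined_score(*docs[d]), reverse=True)
print(ranked)                                          # documents ranked by combined interest
```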
  • Article
    BAO Feilong; GAO Guanglai; WANG Hongwei; LU Min
    2017, 31(3): 156-162.
    Cyrillic Mongolian and Traditional Mongolian are used in Mongolia and China, respectively. Converting Cyrillic Mongolian to Traditional Mongolian will not only bring more convenience to exchanges between the two countries, but also has great significance for the scientific, cultural and educational development of Mongolian. This paper proposes a highly efficient Cyrillic Mongolian to Traditional Mongolian conversion method. It adopts a rule-based approach to convert in-vocabulary words and a statistical model to convert out-of-vocabulary words. A large proportion of Cyrillic Mongolian words correspond to more than one candidate in Traditional Mongolian, which is resolved by an N-gram language model. Experimental results show that the word error rate is as low as 4.12%, meeting practical requirements.
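A toy sketch of the disambiguation step for multi-candidate words: among the Traditional Mongolian candidate sequences, the one with the best bigram language-model score is chosen. The tokens, probabilities and back-off value below are invented for illustration only.

```python
# Sketch: pick the best candidate sequence under a toy bigram language model.
import math
from itertools import product

bigram_logp = {("<s>", "A1"): -0.2, ("<s>", "A2"): -1.5,
               ("A1", "B1"): -0.3, ("A2", "B1"): -0.9}

def score(sequence, unk=-5.0):
    tokens = ["<s>"] + list(sequence)
    return sum(bigram_logp.get(pair, unk) for pair in zip(tokens, tokens[1:]))

candidates = [["A1", "A2"], ["B1"]]        # per-word candidate lists from the rule/OOV step
best = max(product(*candidates), key=score)
print(best)                                # ('A1', 'B1')
```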
  • Article
    JIANG Tao; YUAN Bin; YU Hongzhi; JIA Yangji
    2017, 31(3): 163-169.
    While most Chinese or English micro-blogs are written in a single language, nearly 80% of Tibetan micro-blogs mix Tibetan and Chinese text. If sentiment orientation analysis targets only Tibetan or only Chinese, the analysis is partial and fails to achieve its goal. According to the expression features of Tibetan micro-blogs, this paper puts forward a multi-feature sentiment analysis algorithm built upon features such as emotional words, part-of-speech sequences, sentence information and emoticons. When dealing with Tibetan micro-blogs, the algorithm takes the emotional information of Chinese into consideration and improves sentiment analysis with the help of bilingual information. The experimental results indicate that the sentiment analysis accuracy for monolingual Tibetan expressions is 79.8%, which is boosted to 82.8% after taking into consideration the features of Chinese emotional words and Chinese punctuation.
  • Article
    DU Hui; XU Xueke; WU Dayong ; LIU Yue; YU Zhihua ; CHENG Xueqi
    2017, 31(3): 170-176.
    We present a method for sentiment classification based on sentiment-specific word embedding (SSWE). A word embedding is a distributed vector representation of a word with fixed length in a real-valued space. Algorithms for learning word embeddings, such as word2vec, obtain this representation from large unannotated corpora without considering sentiment information. We refine the initial word embeddings with sentiment information and obtain sentiment-specific word embeddings that contain both syntactic and sentiment information. Text representations are then built from the sentiment-specific word embeddings, and the sentiment polarity of texts is obtained through machine learning approaches. Experiments show that the presented algorithm performs better than sentiment classification methods that model texts by words, N-grams or word2vec word embeddings.
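A minimal sketch of the classification stage only: a text is represented by averaging its word vectors and a linear classifier predicts polarity. The tiny vectors below merely stand in for sentiment-specific embeddings; the embedding refinement step itself is not shown.

```python
# Sketch: average word vectors as the text representation, then classify polarity.
import numpy as np
from sklearn.linear_model import LogisticRegression

embeddings = {"好": np.array([0.9, 0.1]), "差": np.array([-0.8, 0.2]),
              "手机": np.array([0.1, 0.5])}

def text_vector(tokens):
    return np.mean([embeddings[t] for t in tokens if t in embeddings], axis=0)

X = [text_vector(t) for t in (["手机", "好"], ["手机", "差"])]
y = [1, 0]                                     # 1 = positive, 0 = negative
clf = LogisticRegression().fit(X, y)
print(clf.predict([text_vector(["好"])]))      # expected: positive (1)
```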
  • Article
    ZHOU Wen; OUYANG Chunping; YANG Xiaohua; LIU Zhiming; ZHANG Shuqing; RAO Jie
    2017, 31(3): 177-183.
    Based on the principle of verb valency and dependency parsing, this paper proposes to treat the emotional dependency tuple (EDT) as the basic unit of Chinese emotional expression. An EDT consists of core words (several selected categories of content words expressing emotion in the sentence), the modifiers attached to the core words, and the degree or negation words attached to either the core words or the modifiers. EDTs are extracted from parsed sentences, and a sentiment classification model based on emotional dependency tuples is established. Evaluated on the web news corpus released by COAE2014, the proposed method outperforms the semi-supervised algorithm (K-means) and produces results comparable to supervised classification algorithms (KNN, SVM).
  • Article
    SUN Xiao; GAO Fei; REN Fuji
    2017, 31(3): 184-190.
    This work investigates the deep features of social news that can influence people's emotions. Three feature compression methods are used to extract shallow features at the granularities of unigram words, bigram words and themes. A support vector machine is used to select the optimal shallow features for the three granularities, with optimal F1_macro scores of 60.5%, 62.1% and 63.3%, respectively. A Deep Belief Network (DBN) model is then introduced to train on and abstract the optimal shallow features; the optimal F1_macro scores of DBN3 are 61.4%, 63.5% and 66.1%, respectively. The experimental results show that the deep features abstracted by the Deep Belief Network carry more semantic information and perform better than shallow features in determining the influence of social news on people's emotions.
  • Article
    WAN Shengxian; LAN Yanyan; GUO Jiafeng; CHENG Xueqi
    2017, 31(3): 191-197.
    Sentiment analysis (SA) is important in many applications such as commercial business and political elections. The state-of-the-art methods for SA are based on shallow machine learning models. These methods depend heavily on feature engineering, yet features for Weibo SA are difficult to extract manually. Deep learning (DL) can learn hierarchical representations from raw data automatically and has been applied to SA. Recently proposed DL techniques have shown that deep models can be trained successfully given enough supervised data. In Weibo SA, however, supervised data are usually too scarce, while large-scale distant supervision data are easy to obtain. In this paper, we propose to pre-train deep models with distant supervision and use supervised data to fine-tune them. This approach takes advantage of distant supervision to learn good initial models, while the supervised data improve the models and correct the errors brought by distant supervision. Experimental results on a Sina Weibo dataset show that we can train deep models with small-scale supervised data and obtain better results than shallow models.
  • Article
    ZU Kunlin; ZHAO Mingwei; GUO Kai; LIN Hongfei
    2017, 31(3): 198-204.
    The reliability of social network information has received considerable attention in recent years. Malicious rumors may cause social panic and even trigger a crisis of confidence. In China, the rapid growth of the Sina Weibo user base paves the way for the spread of rumors, so cleaning up rumors on Sina Weibo in a timely manner has significant practical value for a harmonious society. We treat the rumor detection task as a classification problem and propose a method that uses the emotional tendencies of micro-blog comments as a feature. The experimental results show that the sentiment of comments brings a considerable improvement.
  • Article
    JI Qingbin; KANG Qian; LI Deyu; WANG Suge
    2017, 31(3): 205-212.
    The study of community structure helps reveal the relationship between network structure and function, and community detection is essential to community structure research. A bridgeness index based on clustering degree is defined in this paper and applied to community detection. The proposed algorithm includes two parts: splitting and merging. The splitting algorithm identifies inter-community edges by bridgeness and decomposes the network by iteratively removing inter-community edges until the community structure is discovered. The merging algorithm merges communities according to the community connection strength, so that the hierarchical nesting of communities is revealed. Experiments on six social networks show that the proposed algorithm can effectively detect interesting communities in the whole network, with accuracy close to or even better than that of the classical algorithms.
  • Article
    WANG Yashen; HUANG Heyan; FENG Chong
    2017, 31(3): 213-222.
    This article demonstrates that the modularity maximization problem can be transformed into a minimum-cut graph partitioning problem, and proposes an efficient algorithm for detecting community structure. Meanwhile, we combine modularity theory with a popular statistical inference method in two respects: (i) transforming the statistical model into the null model of modularity maximization; (ii) adapting the objective function of the statistical inference method for our optimization. The experiments we conducted show that the proposed algorithm is highly effective and stable in discovering community structure in both real-world networks and synthetic networks.