Content of Ethnic Language and Cross-Language Information Processing in our journal

  • Ethnic Language and Cross Language Information Processing
    HUANG Xiaohui, LI Jing
    . 2018, 32(5): 49-55.
    A recurrent neural network (RNN) and the connectionist temporal classification (CTC) algorithm are applied to acoustic modeling for Tibetan speech recognition, enabling end-to-end model training. Based on the relationship between the input and output of the acoustic model, a time-domain convolution over the output sequence of the hidden layer is introduced to reduce the time-domain expansion of the network’s hidden layers. Experimental results show that the RNN model achieves better performance in Lhasa Tibetan phoneme recognition than traditional acoustic models based on the hidden Markov model, and that the RNN acoustic model with time-domain convolution trains and decodes more efficiently while maintaining the same recognition performance.
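As a rough illustration of how a CTC-trained model turns per-frame outputs into a label sequence, the sketch below applies the standard CTC collapsing rule (merge consecutive repeats, then drop blanks) to a best-path decode. It is not the authors' system: the blank index, the label ids, and the frame scores are all assumptions made for the example, and the time-domain convolution is not shown.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores):
    """Best-path CTC decoding for a list of per-frame label scores."""
    # Pick the highest-scoring label in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # CTC collapsing rule: merge consecutive repeats, then remove blanks.
    merged = [label for label, _ in groupby(best_path)]
    return [label for label in merged if label != BLANK]


# Five frames over three labels (0 = blank; 1 and 2 are hypothetical phonemes).
frames = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank, separates two occurrences of label 1
    [0.10, 0.60, 0.30],  # label 1, a new occurrence
    [0.10, 0.20, 0.70],  # label 2
]
decoded = ctc_greedy_decode(frames)
print(decoded)  # [1, 1, 2]
```

The blank frame is what lets the same phoneme appear twice in a row in the output; without it, the two runs of label 1 would be merged into one.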
  • Ethnic Language and Cross Language Information Processing
    QIN Yue, YU Long, TIAN Shengwei, FENG Guanjun, Turgun Ibrahim, Askar Hamdulla, ZHAO Jianguo
    . 2018, 32(5): 56-64.
    Adopting a deep learning approach, this paper applies a Stacked Denoising Autoencoder (SDAE) to zero pronoun anaphora resolution in Uyghur. First, word embeddings trained on a large-scale unlabeled Uyghur corpus are used as semantic features of candidate antecedents and zero pronouns. Second, based on the characteristics of Uyghur, we extract 14 hand-crafted features for zero pronoun resolution. Experimental results show that, compared with a Stacked Autoencoder (SAE), SVM and ANN, the SDAE improves the F value by 4.450%, 10.032% and 8.140%, respectively.
  • Ethnic Language and Cross Language Information Processing
    HU Wei, YU Long, TIAN Shengwei, Turgun Ibrahim, FENG Guanjun, Askar Hamdulla
    . 2018, 32(5): 65-73.
    Accompanying relationships between events are common in the Uyghur language. This paper proposes a method based on a deep belief network (DBN) to identify the accompanying relationship between Uyghur events. Based on the characteristics of the Uyghur language, the paper extracts 12 features derived from event structure information. It also uses word embeddings to calculate the semantic similarity between the two trigger words. Experiments show that the precision, recall and F value of the proposed method reach 81.89%, 84.32% and 82.48%, respectively, outperforming a support vector machine (SVM).
  • Ethnic Language Processing and Cross Language Processing
    LIU Ruolan, NIAN Mei, Maierhaba Aisaiti
    . 2018, 32(3): 49-54.
    Emotion words are a fundamental resource for accurately analyzing opinions expressed in Uyghur. We investigate the automatic expansion of web emotion words on the basis of an existing Uyghur sentiment lexicon. First, we summarize the collocation rules of conjunctions, degree adverbs and sentiment words by analyzing the linguistic features of Uyghur emotional expression. Based on these rules, we design an algorithm that extracts candidate emotion words from an emotional corpus to form a candidate sentiment lexicon. Finally, treating the Internet as a very large corpus, we design a search-engine-based polarity discrimination algorithm that reuses the characteristics of Uyghur conjunctions and combines the established emotion lexicon with a Uyghur antonym dictionary. The polarity of each candidate word is decided by the score the algorithm computes, and the word is then added to the emotion lexicon. Experimental results show that, compared with the unexpanded lexicon, the extended lexicon significantly improves the accuracy and recall of sentiment orientation analysis for Uyghur sentences.
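The abstract does not give the scoring formula, so the following is only a minimal sketch of one way conjunction evidence could vote on a candidate word's polarity. It assumes that a coordinating conjunction links words of the same polarity and an adversative conjunction links words of opposite polarity. The seed lexicon, the conjunction table, and the observed triples are hypothetical English placeholders standing in for Uyghur data and search-engine evidence.

```python
# Hypothetical seed lexicon: word -> polarity (+1 positive, -1 negative).
SEED_LEXICON = {"good": 1, "bad": -1}

# Assumed conjunction behavior: coordinating keeps polarity, adversative flips it.
CONJUNCTION_SIGN = {"and": 1, "but": -1}


def polarity_score(candidate, observations, lexicon=SEED_LEXICON):
    """Sum polarity votes from (left_word, conjunction, right_word) triples."""
    score = 0
    for left, conjunction, right in observations:
        sign = CONJUNCTION_SIGN.get(conjunction)
        if sign is None:
            continue  # unknown conjunction: no evidence either way
        if left == candidate and right in lexicon:
            score += sign * lexicon[right]
        elif right == candidate and left in lexicon:
            score += sign * lexicon[left]
    return score


def classify(score):
    """Map a score to a polarity label; zero means the evidence is undecided."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"


# Hypothetical collocations observed for the candidate word "pleasant".
observed = [
    ("good", "and", "pleasant"),  # same polarity as "good": +1
    ("pleasant", "but", "bad"),   # opposite polarity to "bad": +1
    ("bad", "and", "pleasant"),   # same polarity as "bad": -1
]
score = polarity_score("pleasant", observed)
print(score, classify(score))
```

A real system would weight each vote, for instance by hit counts, and would also consult the antonym dictionary mentioned in the abstract; both are omitted here.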
  • Ethnic Language Processing and Cross Language Processing
    LIU Jiao, CUI Rongyi, ZHAO Yahui
    . 2018, 32(3): 55-63.
    This paper studies cross-lingual document similarity measurement among Chinese, English, and Korean. First, a document vector in one language is mapped into another language using co-occurrence information. Latent Semantic Analysis is then employed to compensate for the deficiencies caused by polysemy across languages. Finally, the cosine similarity between two documents is calculated in the same space with equivalent semantic information. The method does not rely on a pre-existing external dictionary or knowledge base; instead, it uses a parallel corpus to establish lexical relationships among Chinese, English, and Korean. Experiments show that co-occurrence mapping contributes substantially to document similarity measurement, yielding 95% accuracy in translation retrieval.
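The mapping and comparison steps can be sketched as follows, under stated assumptions: document vectors are sparse term-weight dictionaries, and the co-occurrence table (in practice estimated from a parallel corpus) is a small hand-written placeholder. The Latent Semantic Analysis step is omitted, and all terms and weights are hypothetical.

```python
import math


def map_vector(doc_vec, cooccurrence):
    """Project a source-language term-weight vector into the target vocabulary."""
    mapped = {}
    for src_term, weight in doc_vec.items():
        for tgt_term, strength in cooccurrence.get(src_term, {}).items():
            mapped[tgt_term] = mapped.get(tgt_term, 0.0) + weight * strength
    return mapped


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical co-occurrence strengths from romanized Chinese terms to English terms.
cooccurrence = {
    "yuyan": {"language": 1.0},
    "chuli": {"processing": 0.8, "treatment": 0.2},
}

zh_doc = {"yuyan": 2.0, "chuli": 1.0}
en_docs = {
    "doc_a": {"language": 2.0, "processing": 1.0},
    "doc_b": {"weather": 3.0},
}

mapped = map_vector(zh_doc, cooccurrence)
similarities = {name: cosine(mapped, vec) for name, vec in en_docs.items()}
best_match = max(similarities, key=similarities.get)
print(best_match, similarities)
```

Translation retrieval then amounts to picking the target-language document with the highest similarity to the mapped query document.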
  • Ethnic Language Processing and Cross Language Processing
    TANG Liang, XI Yaoyi, PENG Bo, LIU Xiangwei, YI Mianzhu
    . 2018, 32(3): 64-70.
    This paper proposes a novel word-vector-based cross-lingual event retrieval method for Vietnamese and Chinese. First, semantic feature vectors of the Chinese event keywords are computed from word vectors. Then, the corresponding Vietnamese translation vectors are computed. Finally, cross-language keyword alignment is obtained from the similarity between the two semantic feature vector spaces. An input query can thus be mapped into the other language, realizing cross-lingual event retrieval. Experiments on a Vietnamese-Chinese bilingual corpus of South China Sea events demonstrate the effectiveness of the method.
  • Ethnic Language Processing and Cross Language Processing
    LONG Congjun, Douge Tsiring, LIU Huidan
    . 2018, 32(3): 71-76.
    With the development of information technology, the Tibetan language has come into wide use on the Internet. To address the problem of transliterating Chinese text into Tibetan, this paper collects texts from five Tibetan websites and examines how consistent the transliterated forms are. After analyzing the causes of inconsistent transliteration between Chinese and Tibetan, it proposes several transliteration principles. It also suggests that relevant government agencies actively promote the standardization of transliteration norms.