2002 Volume 16 Issue 1 Published: 15 February 2002
  

  • TAO Jian-hua,ZHAO Sheng,CAI Lian-hong
    2002, 16(1): 2-7.
    The paper describes methods for Chinese prosodic hierarchy analysis and prosody modeling that are based on statistical algorithms. It also describes the prosody cost function and the corresponding training method for its parameters. Furthermore, the interaction among the prosodic features is analyzed with respect to its influence on the speech unit selection procedure. Based on these, a Chinese syllable unit selection model was built for the spontaneous speech synthesis system. Tests show that the method is well suited to building a speech synthesis system and considerably improves the naturalness of the synthesized speech.
  • ZHANG Lei,HAN Ji-qing,WANG Cheng-fa,ZHANG Wen-xiang
    2002, 16(1): 8-13.
    Analysis of speech under stress reveals that different dimensions of the MFCC feature differ in their sensitivity to G-force. In general, the lower dimensions are more sensitive to stress and the higher dimensions less so. A new approach, the weighted MFCC feature, is therefore proposed for recognition under G-force. Using the weighted feature to emphasize the influence of the higher dimensions improves the performance of the recognition system. To obtain the weights, a new method named maximum relative entropy weights is proposed, in which the initial weights are linear weights. For a small-vocabulary speaker-dependent system, the recognition rates of these methods are better than that of the traditional multi-style training method. Among them, maximum relative entropy weights performs best, with an 89.9% recognition rate, an improvement of 13.1% over the multi-style training method.
  • LIU Ya-bin,LI Ai-jun
    2002, 16(1): 14-19,54.
    From the perspective of language development, spontaneous speech is an ancient, commonly used, and typical form of language. From the 1950s to the 1980s, research focused on read speech in three fields: acoustics, psychology, and physiology. In the last 10 years, research on spontaneous speech has become increasingly important for applied speech technology and the associated theories. Spontaneous speech, as opposed to read speech, remains one of the unresolved problems faced by many speech recognition systems. Read and spontaneous speech in Chinese differ in many linguistic and phonetic aspects, such as prosodic and segmental variability, turn-taking, discourse topics, and paralinguistic phenomena. This paper gives some illustrations and then describes research on read and spontaneous speech based on analysis of the annotated read speech corpus ASCCD and the spontaneous speech corpora CASS and CADCC.
  • ZHANG Xin-yan,WANG Fan,ZHENG Fang,XU Ming-xing,WU Wen-hu
    2002, 16(1): 20-25.
    In this paper, we propose a new method that integrates sub-band information into features via both sub-band weighting and spectral subtraction for robust speech recognition. In this method, only simple online noise estimation and sub-band processing, with the sub-bands defined by the filter banks of the common MFCC calculation, are added to the traditional MFCC calculation algorithm to obtain the robust MFCC, without any prior knowledge of the noise. Furthermore, other robust methods applied after the feature extraction step can be used together with this method to obtain high recognition performance in adverse environments. Experiments show that the new robust MFCC yields good recognition results compared with the traditional feature. For example, at 5 to 10 dB SNR, it reduces the error rate by over 20% compared with the traditional MFCC.
  • CHEN Liang,ZHANG Xiong-wei
    2002, 16(1): 26-31.
    Based on Takens' theorem, the time-delay method is used in this paper to reconstruct the phase space of the speech signal. In the high-dimensional phase space, the similar sequence repeatability (RPT) of speech is calculated. Then, according to the RPT difference between voiced and unvoiced speech, speech phonemes are classified. The proposed method provides a new approach to the study of speech recognition.
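
The time-delay reconstruction mentioned in this abstract can be illustrated with a minimal sketch. The function name `delay_embed` and the parameters `dim` (embedding dimension) and `tau` (delay in samples) are assumptions chosen for illustration; the abstract does not give the authors' parameter values or define how RPT is computed, so only the embedding step is shown.

```python
def delay_embed(signal, dim, tau):
    """Build delay vectors [x[i], x[i+tau], ..., x[i+(dim-1)*tau]].

    Hypothetical helper: dim and tau are illustrative parameters,
    not values taken from the paper.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    # Number of complete delay vectors that fit inside the signal.
    n_vectors = len(signal) - (dim - 1) * tau
    if n_vectors <= 0:
        return []
    return [
        [signal[i + k * tau] for k in range(dim)]
        for i in range(n_vectors)
    ]


# Toy input standing in for speech samples.
samples = [0, 1, 2, 3, 4, 5]
vectors = delay_embed(samples, dim=3, tau=2)
```

Each row of `vectors` is one point in the reconstructed phase space; any repeatability measure would then be computed over these points.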
  • SHAN Wei,YAO Hong-xun,GAO Wen
    2002, 16(1): 32-37.
    This paper describes an approach to classifying continuous mouth shapes obtained from image sequences of Chinese vowel and consonant pronunciation. Based on an audiovisual bimodal database, we present a classification method called Two-Step Classification. First, we locate the lips and extract features using an adaptive chromatic filter model. Then, relying on the chosen features, we classify the mouth shape sequences into 15 categories. The purpose of mouth shape classification is to determine the number of states, shrink the search space, and speed up convergence for lipreading recognition.
  • JIA Lei,MU Xiang-yu,XU Bo
    2002, 16(1): 38-43.
    Speaker change point detection based on the BIC criterion is the most widely used method for speaker change detection in broadcast segmentation. Although its author asserts that the method is threshold-free, the requirement that the BIC value of a change point be above 0 is too strict for some short utterances. Because speakers differ from each other, the BIC values of two different speakers spread over a large range in our tests. In this paper, a speaker change detection method based on the entropy change trend is used to locate the change point in a sliding window of fixed length. The entropy change trend is tested for every hypothesized speaker change point in the window. This trend detection avoids the threshold, which makes the proposed method applicable to different kinds of speaker change, including changes in short utterances.
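
For context, the baseline BIC test that this abstract criticizes can be sketched as follows. This is a commonly used formulation with single Gaussians, reduced here to one-dimensional features for brevity; it is not the entropy-trend method the paper proposes, whose details the abstract does not give. The names `delta_bic`, `penalty_weight`, and `var_floor` are illustrative assumptions.

```python
import math


def variance(xs):
    """Maximum-likelihood variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def delta_bic(window, split, penalty_weight=1.0, var_floor=1e-8):
    """Delta-BIC for a hypothesized change point at index `split`.

    Compares one Gaussian over the whole window against two Gaussians,
    one per side. A value above 0 favors a speaker change.
    Assumes 1-D features; names and defaults are illustrative.
    """
    left, right = window[:split], window[split:]
    n, n1, n2 = len(window), len(left), len(right)
    if n1 < 2 or n2 < 2:
        raise ValueError("each side needs at least two frames")
    # Log-likelihood gain from modeling the two sides separately.
    gain = 0.5 * (
        n * math.log(max(variance(window), var_floor))
        - n1 * math.log(max(variance(left), var_floor))
        - n2 * math.log(max(variance(right), var_floor))
    )
    # The split model has 2 extra free parameters (mean and variance).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


# Toy data: two clearly different segments versus one homogeneous one.
changed = [0.0, 0.1, -0.1, 0.1, -0.1, 0.0, 0.1, -0.1,
           5.0, 5.1, 4.9, 5.1, 4.9, 5.0, 5.1, 4.9]
same = [0.1, -0.1] * 8
```

Because the penalty grows only with `log(n)` while the gain depends on how well separated the two sides are, the zero-crossing rule behaves differently for short and long windows, which is the sensitivity the abstract points to.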
  • HU Wei-xiang,XU Bo,HUANG Tai-yi
    2002, 16(1): 44-49.
    Based on a large speech corpus (ASCCD) with prosodic structure labels, this paper presents statistical results on the acoustic parameters of prosodic boundaries. We study syllable duration, intensity, and pitch at the boundary and select a series of acoustic parameters to train a CART. The results show that the parameters characterize the acoustic features of prosodic boundaries and that the trained CART can classify different boundaries efficiently. It is therefore possible to train a statistical model for prosodic boundary location in Mandarin, which is very important for both speech recognition and synthesis.
  • LV Ping,WANG Zuo-ying,LU Da-jin
    2002, 16(1): 50-54.
    A novel speaker adaptation method named maximum likelihood model interpolation (MLMI) is proposed. The basic idea of MLMI is to compute the speaker-adapted (SA) model of a test speaker as a linear convex combination of a set of speaker-dependent (SD) models according to the maximum likelihood (ML) criterion. This method makes use of the correlation among speech units. Two concrete algorithms, mean linear interpolation and matrix linear interpolation, are then given. Experiments show that 3 adaptation utterances give a significant performance improvement.
  • CHEN Jun-yan,LI Juan-zi,WANG Zuo-ying
    2002, 16(1): 55-60.
    To build a more accurate and robust voice command system, a novel Word Graph Expansion algorithm for voice command understanding is presented in this paper. Experimental results show that this algorithm performs much better than the commonly adopted N-best algorithm while maintaining high computational efficiency. An error tolerance method is also put forward to improve the robustness of the voice command understanding module; it further decreases the understanding error rate (UER) to 16.6%, with computational efficiency almost unchanged compared with the case without error tolerance.
  • WU Gen-qing,ZHENG Fang,JIN Ling,WU Wen-hu
    2002, 16(1): 61-66.
    This paper proposes an online incremental language model adaptation method, which differs from the traditional offline language model adaptation method. Online incremental adaptation raises two problems: how to design a flexible framework for online adaptation, and how to adjust the model parameters incrementally according to the corpus collected online. In our application platform, the whole model is divided into two parts: the background model and the user model. An effective storage structure, combined with a parameter look-ahead technique, accelerates model access; a dynamic weighting MAP method is proposed to adjust the parameters in the user model. Experiments show that the method achieves a comparative reduction in Chinese character error rate in Chinese Pinyin-to-Hanzi translation.
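
The split into a background model and a user model described in this abstract can be sketched with a standard MAP-style estimate. This is a simplified illustration under stated assumptions: it uses unigrams only and a fixed prior weight, whereas the paper's dynamic weighting scheme and storage structure are not specified in the abstract. The class name `UserAdaptedUnigram` and the parameter `prior_weight` are hypothetical.

```python
from collections import Counter


class UserAdaptedUnigram:
    """Unigram model adapted online from user text (illustrative only).

    The background distribution acts as a prior; user counts pull the
    estimate toward the user's own usage as more text is observed.
    """

    def __init__(self, background, prior_weight=10.0):
        self.background = dict(background)  # word -> background probability
        self.prior_weight = prior_weight    # fixed here; an assumption
        self.counts = Counter()             # user model: counts seen online
        self.total = 0

    def observe(self, words):
        """Incrementally update the user model with newly collected words."""
        for word in words:
            self.counts[word] += 1
            self.total += 1

    def prob(self, word):
        """MAP estimate: (user count + weight * prior) / (total + weight)."""
        prior = self.background.get(word, 0.0)
        return (self.counts[word] + self.prior_weight * prior) / (
            self.total + self.prior_weight
        )


model = UserAdaptedUnigram({"a": 0.5, "b": 0.5}, prior_weight=2.0)
before = model.prob("a")
model.observe(["a", "a"])
after = model.prob("a")
```

With no user data the estimate equals the background probability; each call to `observe` shifts it toward the user's observed frequencies without retraining the background model.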