Journal of Chinese Information Processing

Study of Chinese Speech Synthesis System Based on Statistic Prosody Model

TAO Jian-hua,ZHAO Sheng,CAI Lian-hong

2002, 16(1): 2-7.

This paper describes statistical methods for Chinese prosodic hierarchy analysis and prosody modeling. It also describes the prosody cost function and the corresponding training method for its parameters. Furthermore, the interaction among prosodic features is analyzed with respect to its influence on the speech unit selection procedure. Based on these, a Chinese syllable unit selection model was built for the spontaneous speech synthesis system. Tests show that the method is well suited to building a speech synthesis system and considerably improves the naturalness of the synthesized speech.

Recognition of Speech under G-force Based on the Weighted Feature

ZHANG Lei,HAN Ji-qing,WANG Cheng-fa,ZHANG Wen-xiang

2002, 16(1): 8-13.

Based on an analysis of speech under stress, we find that different dimensions of the MFCC feature differ in their sensitivity to G-force. In general, the lower dimensions are more sensitive to stress and the higher dimensions less so. A new approach, the weighted MFCC feature, is therefore proposed for recognition under G-force. Weighting the feature to emphasize the influence of the higher dimensions improves the performance of the recognition system. To obtain the weights, a new method called maximum relative entropy weights is proposed, in which the initial weights are linear weights. For a small-vocabulary speaker-dependent system, the recognition rates of these methods are better than that of the traditional multi-style training method. Among them, maximum relative entropy weights performs best, with an 89.9% recognition rate, an improvement of 13.1% over the multi-style training method.
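
The weighting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the form of the linear weights, so the ramp w_k = k/D used here is an assumption, and the maximum relative entropy refinement is not shown. The names `linear_weights` and `apply_weights` are hypothetical.

```python
def linear_weights(num_coeffs):
    """Assumed linear ramp: weight k/D for the k-th of D coefficients (k = 1..D).

    Higher cepstral dimensions get larger weights, matching the abstract's
    statement that the higher dimensions are emphasized.
    """
    if num_coeffs < 1:
        raise ValueError("num_coeffs must be positive")
    return [(k + 1) / num_coeffs for k in range(num_coeffs)]


def apply_weights(mfcc_frame, weights):
    """Scale each MFCC coefficient of one frame by its weight."""
    if len(mfcc_frame) != len(weights):
        raise ValueError("frame and weights must have the same length")
    return [w * c for w, c in zip(weights, mfcc_frame)]


# Toy frame with four equal coefficients, so the effect of the ramp is visible.
weights = linear_weights(4)
weighted = apply_weights([4.0, 4.0, 4.0, 4.0], weights)
```

With equal input coefficients, the weighted frame grows with the dimension index, so the dimensions the abstract describes as less stress-sensitive contribute more to any distance computed on the feature.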

Comparative Analysis Between Read and Spontaneous Speech

LIU Ya-bin,LI Ai-jun

2002, 16(1): 14-19,54.

From the perspective of language development, spontaneous speech is an ancient, commonly used and typical form of language. From the 1950s to the 1980s, our research focused on read speech in three fields: acoustics, psychology and physiology. In the last 10 years, research on spontaneous speech has become increasingly important for applied speech technology and the associated theories. Handling spontaneous rather than read speech is one of the unresolved problems faced by many speech recognition systems. Read and spontaneous speech in Chinese differ in many linguistic and phonetic aspects, such as prosodic and segmental variability, turn-taking, discourse topics and paralinguistic phenomena. This paper gives some illustrations and then describes research on read and spontaneous speech based on analysis of the annotated read speech corpus ASCCD and the spontaneous speech corpora CASS and CADCC.

Integrating Sub-band Information into Feature Extraction for Robust Speech Recognition

ZHANG Xin-yan,WANG Fan,ZHENG Fang,XU Ming-xing,WU Wen-hu

2002, 16(1): 20-25.

In this paper, we propose a new method that integrates sub-band information into features via both sub-band weighting and spectral subtraction for robust speech recognition. Only simple on-line noise estimation and sub-band processing, where the sub-bands are those defined by the filter banks of the common MFCC calculation, are added to the traditional MFCC algorithm to obtain the robust MFCC, without any prior knowledge of the noise. Furthermore, other robust methods applied after the feature extraction step can be used together with this method to obtain high recognition performance in adverse environments. Experiments show that the new robust MFCC yields good recognition results compared with the traditional feature. For example, at 5 to 10 dB SNR, it reduces the error rate by over 20% compared with the traditional MFCC.
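
As a rough illustration of spectral subtraction applied per filter-bank band, the sketch below uses the textbook form with a spectral floor. It is not the paper's algorithm: the abstract does not specify the noise estimator or the sub-band weighting, so averaging the first few frames as noise, the `over_subtraction` and `floor` parameters, and both function names are assumptions.

```python
def estimate_noise(band_energy_frames, num_noise_frames):
    """Average the leading frames per band, assuming they contain only noise."""
    head = band_energy_frames[:num_noise_frames]
    if not head:
        raise ValueError("need at least one frame for the noise estimate")
    num_bands = len(head[0])
    return [sum(frame[b] for frame in head) / len(head) for b in range(num_bands)]


def subtract_noise(frame, noise, over_subtraction=1.0, floor=0.01):
    """Subtract the noise energy in each band, never going below floor * energy."""
    if len(frame) != len(noise):
        raise ValueError("frame and noise must have the same number of bands")
    return [max(e - over_subtraction * n, floor * e) for e, n in zip(frame, noise)]


# Toy filter-bank energies: two bands, first two frames treated as noise only.
frames = [[1.0, 2.0], [3.0, 2.0], [10.0, 2.0]]
noise = estimate_noise(frames, num_noise_frames=2)
cleaned = subtract_noise(frames[2], noise)
```

The floor keeps every band energy positive, which matters because the MFCC calculation takes a logarithm of the filter-bank energies afterwards.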

Study of Speech Characteristics Based on Phase Space Reconstruction

CHEN Liang,ZHANG Xiong-wei

2002, 16(1): 26-31.

Based on Takens' theorem, the time-delay method is used in this paper to reconstruct the phase space of the speech signal. In the high-dimensional phase space, the similar sequence repeatability (RPT) of speech is calculated. Speech phonemes are then classified according to the difference in RPT between voiced and unvoiced speech. The proposed method provides a new approach to the study of speech recognition.
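
The time-delay reconstruction mentioned in this abstract can be sketched as follows. This shows only the standard delay embedding; the RPT measure is not defined in the abstract and is not implemented here. The function name `delay_embed` and the toy values of `dim` and `tau` are illustrative assumptions.

```python
def delay_embed(signal, dim, tau):
    """Build delay vectors (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).

    dim is the embedding dimension and tau is the delay in samples.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(signal) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal is too short for this dim and tau")
    return [
        tuple(signal[i + k * tau] for k in range(dim))
        for i in range(n_vectors)
    ]


# Toy signal: seven samples embedded in 3 dimensions with a delay of 2.
samples = [0, 1, 2, 3, 4, 5, 6]
vectors = delay_embed(samples, dim=3, tau=2)
# vectors == [(0, 2, 4), (1, 3, 5), (2, 4, 6)]
```

Each tuple is one point in the reconstructed phase space; in practice the embedding dimension and delay would be chosen from the signal rather than fixed by hand.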

Sequence Mouth Shape Classification for Speechreading

SHAN Wei,YAO Hong-xun,GAO Wen

2002, 16(1): 32-37.

This paper describes an approach to classifying continuous mouth shapes obtained from image sequences of Chinese vowel and consonant pronunciation. Based on an audiovisual bimodal database, we present a classification method called Two-Step Classification. First, we locate the lips and extract features using an adaptive chromatic filter model. Then, relying on the chosen features, we classify the mouth shape sequences into 15 categories. The purpose of mouth shape classification is to determine the number of states, shrink the search space and speed up convergence for lipreading recognition.

Broadcasting Segmentation

JIA Lei,MU Xiang-yu,XU Bo

2002, 16(1): 38-43.

Speaker change point detection based on the BIC criterion is the most widely used method for speaker change detection in broadcast segmentation. Although its author asserts that this method is threshold-free, the requirement that the BIC value at a change point be above 0 is too strict for some short utterances. Because speakers differ from one another, the BIC values for pairs of different speakers are spread over a large range in our tests. In this paper, a speaker change detection method based on the entropy change trend is used to locate the change point in a sliding window of fixed length. The entropy change trend is tested for every hypothesized speaker change point in the window. This trend detection avoids the threshold, which makes the proposed method able to detect different kinds of speaker change, including speaker changes in short utterances.
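
For context, the conventional BIC test that this abstract takes as its baseline can be sketched as below. This is not the entropy-trend method the paper proposes, which the abstract does not specify. The sketch assumes one-dimensional Gaussian features for brevity (real systems use multivariate cepstral vectors), and `log_variance`, `delta_bic` and `penalty_weight` are hypothetical names.

```python
import math


def log_variance(xs, var_floor=1e-8):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, var_floor))


def delta_bic(window, split, penalty_weight=1.0):
    """Delta BIC for splitting a window of scalar features at index split.

    Compares one Gaussian for the whole window against one Gaussian per side.
    A value above 0 favors a speaker change at the split point.
    """
    left, right = window[:split], window[split:]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("each side of the split needs at least two samples")
    n = len(window)
    gain = 0.5 * (
        n * log_variance(window)
        - len(left) * log_variance(left)
        - len(right) * log_variance(right)
    )
    # For dimension d = 1 the extra model has d + d(d+1)/2 = 2 free parameters.
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


# Toy data: a homogeneous window and a window whose mean jumps halfway through.
same_speaker = [0.1, -0.1] * 10
two_speakers = [0.1, -0.1] * 5 + [5.1, 4.9] * 5
```

Calling `delta_bic(two_speakers, 10)` gives a positive value and `delta_bic(same_speaker, 10)` a negative one. The penalty term grows with the window length while the gain shrinks for short segments, which is one way to see why a fixed decision point at 0 can miss changes in short utterances.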

An Experimental Study on Prosodic Boundary in Chinese Mandarin

HU Wei-xiang,XU Bo,HUANG Tai-yi

2002, 16(1): 44-49.

Based on a large speech corpus (ASCCD) with prosodic structure labels, this paper presents statistical results on the acoustic parameters of prosodic boundaries. We study syllable duration, intensity and pitch at the boundary and select a series of acoustic parameters to train a CART. The results show that the parameters characterize the acoustic features of prosodic boundaries and that the trained CART can classify different boundaries efficiently. It is therefore possible to train a statistical model for prosodic boundary location in Mandarin, which is very important for both speech recognition and synthesis.

A Speaker Adaptation Algorithm Based on Matrix Linear Interpolation

LV Ping,WANG Zuo-ying,LU Da-jin

2002, 16(1): 50-54.

A novel speaker adaptation method named maximum likelihood model interpolation (MLMI) is proposed. The basic idea of MLMI is to compute the speaker-adapted (SA) model of a test speaker as a convex linear combination of a set of speaker-dependent (SD) models according to the maximum likelihood (ML) criterion. This method makes use of the correlation between speech units. Two concrete algorithms, mean linear interpolation and matrix linear interpolation, are then given. Experiments show that 3 adaptation utterances can give a significant performance improvement.
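
The convex combination at the core of MLMI can be illustrated on Gaussian mean vectors. This sketch covers only the interpolation step under given weights; the maximum likelihood estimation of the weights and the matrix variant are not described in the abstract and are not shown. The function name `interpolate_means` is hypothetical.

```python
def interpolate_means(sd_means, weights, tol=1e-9):
    """Convex combination of speaker-dependent mean vectors.

    sd_means: one mean vector per SD model, all of the same dimension.
    weights:  non-negative interpolation weights that sum to 1.
    """
    if len(sd_means) != len(weights):
        raise ValueError("need exactly one weight per SD model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(sd_means[0])
    return [
        sum(w * mean[d] for w, mean in zip(weights, sd_means))
        for d in range(dim)
    ]


# Two toy SD models with 2-dimensional means, weighted 0.25 and 0.75.
adapted_mean = interpolate_means([[0.0, 2.0], [4.0, 6.0]], [0.25, 0.75])
# adapted_mean == [3.0, 5.0]
```

Because the weights are constrained to be non-negative and to sum to 1, the adapted mean always lies inside the region spanned by the SD means, which keeps the adapted model plausible even with very little adaptation data.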

A Voice Command Understanding and Error Tolerance Algorithm Based on Word Graph Expansion

CHEN Jun-yan,LI Juan-zi,WANG Zuo-ying

2002, 16(1): 55-60.

To build a more accurate and robust voice command system, this paper presents a novel Word Graph Expansion algorithm for voice command understanding. Experimental results show that this algorithm performs much better than the commonly adopted N-best algorithm while maintaining high computational efficiency. An error tolerance method is also put forward to improve the robustness of our voice command understanding module; it further decreases the understanding error rate (UER) to 16.6%, with computational efficiency almost unchanged compared with the case without error tolerance.

An Online Incremental Language Model Adaptation

WU Gen-qing,ZHENG Fang,JIN Ling,WU Wen-hu

2002, 16(1): 61-66.

In this paper, an online incremental language model adaptation method is proposed, in contrast to the traditional offline language model adaptation method. Online incremental adaptation raises two problems: first, how to design a flexible framework for online adaptation; second, how to adjust the model parameters incrementally according to the corpus collected online. In our application platform, the whole model is divided into two parts: the background model and the user model. An effective storage structure, combined with a parameter look-ahead technique, accelerates access; a dynamic weighting MAP method is proposed to adjust the parameters in the user model. Experiments show that it achieves a comparable Chinese character error rate reduction in Chinese Pinyin-to-Hanzi conversion.

2002 Volume 16 Issue 1 Published: 15 February 2002