Noise Subspace Fuzzy C-means Clustering for Robust Speech Re(6)

Date: 2025-07-11

Abstract. In this paper a fuzzy C-means (FCM) based approach for speech/non-speech discrimination is developed to build an effective voice activity detection (VAD) algorithm. The proposed VAD method is based on a soft-decision clustering approach built ove

[Fig. 2. VAD operation. Top: decision function and threshold versus frames. Bottom: input signal and VAD decision versus time (secs).]

decision on speech processing systems [4]. The experimental framework and the objective performance tests conducted to evaluate the proposed algorithm are described in this section. ROC curves are used to evaluate the proposed VAD. These plots fully describe the VAD error rate and show the trade-off between the speech and non-speech error probabilities as the threshold γ varies. The Spanish SpeechDat-Car database [15] was used in the analysis. This database contains recordings in a car environment from close-talking and hands-free microphones. Utterances from the close-talking device, with an average SNR of about 25 dB, were labeled as speech or non-speech for reference, while the VAD was evaluated on the hands-free microphone. Thus, the speech and non-speech hit rates (HR1, HR0) were determined as a function of the decision threshold γ for each of the VADs tested. Figure 3 shows the ROC curves in the most unfavorable conditions (high-speed, good road) with a 5 dB average SNR. It can be seen that increasing the number of observation vectors m improves the performance of the proposed FCM-VAD. The best results are obtained for m = 8, while increasing the number of observations beyond this value yields no additional improvement. The proposed VAD outperforms Sohn's VAD [3], which assumes a single-observation likelihood ratio test (LRT) in the decision rule together with an HMM-based hangover mechanism, as well as standardized VADs such as G.729 and AMR [2, 1]. It also improves on recently reported methods [3, 6, 5, 7]. Thus, the proposed VAD achieves improved speech/non-speech hit rates when compared to the most relevant algorithms to date.
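
The hit-rate computation behind such ROC curves can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `hit_rates` and `roc_sweep`, the per-frame inputs, and the convention that a frame is classified as speech when its decision function exceeds γ are all assumptions made here for illustration.

```python
def hit_rates(scores, is_speech, gamma):
    """Return (HR0, HR1) as fractions for one threshold gamma.

    scores    -- per-frame decision function values (hypothetical input)
    is_speech -- per-frame reference labels, True for speech
    Assumption: a frame is classified as speech when score > gamma.
    """
    speech_total = sum(1 for ref in is_speech if ref)
    nonspeech_total = len(is_speech) - speech_total
    if speech_total == 0 or nonspeech_total == 0:
        raise ValueError("reference must contain both speech and non-speech frames")

    # Speech frames that the detector also marks as speech.
    speech_hits = sum(
        1 for s, ref in zip(scores, is_speech) if ref and s > gamma
    )
    # Non-speech frames that the detector also marks as non-speech.
    nonspeech_hits = sum(
        1 for s, ref in zip(scores, is_speech) if not ref and s <= gamma
    )
    return nonspeech_hits / nonspeech_total, speech_hits / speech_total


def roc_sweep(scores, is_speech, gammas):
    """Evaluate (gamma, HR0, HR1) at each threshold to trace the trade-off."""
    return [(g, *hit_rates(scores, is_speech, g)) for g in gammas]


# Toy data, invented purely to exercise the functions.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [False, False, True, True]
for gamma, hr0, hr1 in roc_sweep(toy_scores, toy_labels, [0.3, 0.5]):
    print(f"gamma={gamma}: HR0={hr0:.2f}, HR1={hr1:.2f}")
```

Raising γ in this sketch trades speech hits for non-speech hits, which is the trade-off the ROC curve traces as the threshold varies.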
Table 1 shows the recognition performance for the Spanish SDC database under the different training/test mismatch conditions (HM: high mismatch, MM: medium mismatch, WM: well matched) when WF and FD are performed on the base system [8]. The proposed VAD outperforms all the algorithms used for reference, yielding relevant improvements in speech recognition.
