Noise Subspace Fuzzy C-means Clustering for Robust Speech Re(2)

Date: 2026-01-17

Abstract. In this paper a fuzzy C-means (FCM) based approach for speech/non-speech discrimination is developed to build an effective voice activity detection (VAD) algorithm. The proposed VAD method is based on a soft-decision clustering approach built ove

The different approaches include those based on energy thresholds, pitch detection, spectrum analysis, zero-crossing rate, periodicity measures, or combinations of different features. Speech/pause discrimination can be described as an unsupervised learning problem. Clustering is one solution to this problem: the data are divided into groups whose members are related “in some sense”. Despite the simplicity of clustering algorithms, there is increasing interest in the use of clustering methods in pattern recognition, image processing and information retrieval [9, 10]. Clustering has a rich history in other disciplines [11] such as machine learning, biology, psychiatry, psychology, archaeology, geology, geography, and marketing. Cluster analysis, also called data segmentation, has a variety of goals, all related to grouping or segmenting a collection of objects into subsets or “clusters” such that the objects within each cluster are more closely related to one another than to objects assigned to different clusters. Cluster analysis is also used to form descriptive statistics to ascertain whether or not the data consist of a set of distinct subgroups, each group representing objects with substantially different properties.

2 A suitable model for VAD

Let $x(n)$ be a discrete-time signal. Denote by $y_j$ a frame of the signal containing the elements

$$\{x_i^j\} = \{x(i + j \cdot D)\}, \quad i = 1, \ldots, L \qquad (1)$$

where $D$ is the window shift and $L$ is the number of samples in each frame. Consider the set of $2 \cdot m + 1$ frames $\{y_{l-m}, \ldots, y_l, \ldots, y_{l+m}\}$ centered on frame $y_l$, and denote by $Y(s, j)$, $j = l-m, \ldots, l, \ldots, l+m$, their respective Discrete Fourier Transforms (DFT):

$$Y_j(\omega_s) \equiv Y(s, j) = \sum_{n=0}^{N_{FFT}-1} x(n + j \cdot D) \cdot \exp(-\mathrm{j} \cdot n \cdot \omega_s) \qquad (2)$$

where $\omega_s = \frac{2\pi \cdot s}{N_{FFT}}$, $0 \le s \le N_{FFT} - 1$, and $N_{FFT}$ is the number of points, or resolution, used in the DFT (if $N_{FFT} > L$, the frame is padded with zeros before the DFT is taken). In the exponent, $\mathrm{j}$ denotes the imaginary unit, not the frame index $j$.
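The framing and DFT of Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `frame_dft` and its parameters are hypothetical, indexing is 0-based (the paper writes i = 1...L), and each frame is truncated to L samples and zero-padded to `n_fft` points. The DFT is written as a direct sum so that it mirrors Eq. (2) term by term.

```python
import cmath
import math


def frame_dft(x, j, D, L, n_fft):
    """DFT of frame j as in Eq. (2); hypothetical helper, 0-based indexing."""
    if n_fft < L:
        raise ValueError("n_fft must be at least the frame length L")
    # Frame j: L samples starting at offset j * D (window shift D).
    frame = [x[n + j * D] for n in range(L)]
    # Zero padding up to n_fft points, as the text describes for n_fft > L.
    frame += [0.0] * (n_fft - L)
    spectrum = []
    for s in range(n_fft):
        omega_s = 2.0 * math.pi * s / n_fft  # omega_s = 2*pi*s / N_FFT
        # Direct O(n_fft^2) evaluation of the sum in Eq. (2).
        spectrum.append(
            sum(frame[n] * cmath.exp(-1j * n * omega_s) for n in range(n_fft))
        )
    return spectrum


# Example: a cosine at DFT bin 2, analysed on the frame starting at sample 4.
signal = [math.cos(2.0 * math.pi * 2 * n / 8) for n in range(16)]
Y = frame_dft(signal, j=1, D=4, L=8, n_fft=8)
```

The direct sum is chosen for readability only. In practice a fast transform would normally be used, for example `numpy.fft.fft(frame, n=n_fft)`, which also performs the zero padding.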
The energies for the $l$-th frame, $E(k, l)$, in $K$ subbands ($k = 0, 1, \ldots, K-1$), are computed by means of

$$E(k, l) = \frac{K}{N_{FFT}} \sum_{s=s_k}^{s_{k+1}-1} |Y(s, l)|^2, \qquad s_k = \left\lfloor \frac{N_{FFT}}{2K}\, k \right\rfloor, \quad k = 0, 1, \ldots, K-1 \qquad (3)$$

where an equally spaced subband assignment is used and $\lfloor \cdot \rfloor$ denotes the “floor” function. Hence, the signal energy is averaged over $K$ subbands, which gives a suitable representation of the input signal for VAD [12]: the observation vector at frame $l$, $\mathbf{E}(l) = (E(0, l), \ldots, E(K-1, l))^T$. The VAD decision rule is formulated over a sliding multiple observation (MO) window consisting of $2m+1$ observation vectors around the frame $l$ for which the decision is being made, as shown in the following sections. This strategy of exploiting “long term information” has given very good results in several VAD approaches, such as [8].
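As a rough illustration of Eq. (3) and of the multiple observation window, the sketch below computes the K subband energies of one frame spectrum and then collects the 2m+1 observation vectors around a frame. The names `subband_energies` and `mo_window` are hypothetical, and the K/N_FFT normalisation and the band edges s_k = floor(N_FFT · k / (2K)) are taken as read from the extracted equation, so they should be treated as assumptions rather than a verified transcription.

```python
def subband_energies(Y_l, K):
    """Subband energies E(k, l) of one frame spectrum, following Eq. (3)."""
    n_fft = len(Y_l)
    # Band edges s_k = floor(n_fft * k / (2K)) for k = 0..K, so the K bands
    # together cover bins 0 .. n_fft/2 - 1 (equally spaced assignment).
    edges = [(n_fft * k) // (2 * K) for k in range(K + 1)]
    return [
        (K / n_fft) * sum(abs(Y_l[s]) ** 2 for s in range(edges[k], edges[k + 1]))
        for k in range(K)
    ]


def mo_window(E, l, m):
    """Return the 2m+1 observation vectors E(l-m), ..., E(l+m)."""
    if l - m < 0 or l + m >= len(E):
        raise IndexError("MO window extends beyond the available frames")
    return E[l - m : l + m + 1]


# Example: a flat spectrum with 16 bins split into K = 4 subbands.
flat = [1 + 0j] * 16
energies = subband_energies(flat, 4)

# Example: the window of 2m+1 = 5 vectors centred on frame 5.
vectors = [[float(i)] for i in range(10)]
window = mo_window(vectors, l=5, m=2)
```

How the window is handled at the start and end of the signal is not specified in the text; raising an error there is just one possible choice.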
