Convolutional Neural Networks for Speech Recognition
Date: 2025-04-07
A walkthrough of the IEEE Transactions paper on applying convolutional neural networks (CNNs) to speech recognition.
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 10, OCTOBER 2014
Convolutional Neural Networks for Speech Recognition
Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu
Organization of the paper:
- Introduction
- Deep Neural Networks
- Convolutional Neural Networks
- CNN with limited weight sharing
- Experiments
- Conclusions
Deep Neural Networks
Generally speaking, a deep neural network (DNN) refers to a feedforward neural network with more than one hidden layer. Each hidden layer has a number of units (or neurons), each of which takes all outputs of the lower layer as input, multiplies them by a weight vector, sums the result, and passes it through a non-linear activation function such as a sigmoid or tanh.
All neuron activations in each layer can be represented in the following matrix form:
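In a standard notation (the symbols below are illustrative rather than the paper's exact ones), with o^(l-1) the activation vector of the layer below, W^(l) the weight matrix, b^(l) the bias vector, and sigma(.) the element-wise activation function, one common way to write this is:

```latex
\mathbf{o}^{(l)} = \sigma\!\left(\mathbf{W}^{(l)}\,\mathbf{o}^{(l-1)} + \mathbf{b}^{(l)}\right),
\qquad l = 1, \dots, L-1
```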
For a multi-class classification problem, the posterior probability of each class can be estimated using an output softmax layer:
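Using the same illustrative notation, with w_s and b_s the output-layer weight vector and bias for class s, the softmax posterior is commonly written as:

```latex
p(s \mid \mathbf{x}) =
  \frac{\exp\!\left(\mathbf{w}_s^{\top}\,\mathbf{o}^{(L-1)} + b_s\right)}
       {\sum_{s'} \exp\!\left(\mathbf{w}_{s'}^{\top}\,\mathbf{o}^{(L-1)} + b_{s'}\right)}
```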
In the hybrid DNN-HMM model, the DNN output layer computes the state posterior probabilities which are divided by the states’ priors to estimate the observation likelihoods. Because of the increased model complexity of DNNs, a pretraining algorithm is often needed, which initializes all weight matrices prior to the above backpropagation algorithm.
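Concretely (again in illustrative notation), the emission score used by the HMM decoder is the scaled likelihood

```latex
p(\mathbf{x}_t \mid s) \;\propto\; \frac{p(s \mid \mathbf{x}_t)}{p(s)}
```

where p(s | x_t) is the DNN's softmax output for state s and p(s) is the state prior, typically estimated from the frequency of each state in the training alignments.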
One popular method to pretrain DNNs uses the restricted Boltzmann machine (RBM). An RBM has a set of hidden units that are used to compute a better feature representation of the input data. After learning, all RBM weights can be used as a good initialization for one DNN layer.
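As a rough illustration of this procedure, here is a minimal NumPy sketch of greedy layer-wise pretraining with binary-binary RBMs trained by one step of contrastive divergence (CD-1); the function names, hyperparameters, and the CD-1 choice are assumptions made for the sketch, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.01, epochs=10, rng=None):
    """Train one binary-binary RBM with CD-1; return its weights and hidden biases."""
    rng = rng or np.random.default_rng(0)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)   # visible biases
    b_h = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(data @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens).
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 approximation to the log-likelihood gradient.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

def pretrain_dnn(data, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining: each RBM's hidden activations feed the next RBM.
    The learned weights then initialize the corresponding DNN layers before backpropagation."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden, **kwargs)
        weights.append((W, b_h))
        x = sigmoid(x @ W + b_h)   # propagate to obtain the input for the next RBM
    return weights
```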
Convolutional Neural Networks
The convolutional neural network (CNN) can be regarded as a variant of the standard neural network. Instead of using fully connected hidden layers as described in the preceding section, the CNN introduces a special network structure, which consists of convolution and pooling layers.
Convolutional Neural Networks:
- Organization of the Input Data to the CNN
- Convolution Ply
- Pooling Ply
- Learning Weights in the CNN
- Treatment of Energy Features
- The Overall CNN Architecture
- Benefits of CNNs for ASR
Organization of the Input Data to the CNN
In using the CNN for pattern recognition, the input data need to be organized as a number of feature maps to be fed into the CNN. We need to use inputs that preserve locality in both axes of frequency and time. MFSC features will be used to represent each speech frame, along with their deltas and delta-deltas, in order to describe the acoustic energy distribution in each of several different frequency bands.
As for time, a single window of input to the CNN consists of a wide context of consecutive frames. As for frequency, the conventional use of MFCCs presents a major problem because the discrete cosine transform projects the spectral energies onto a new basis that may not maintain locality.
In this paper, we shall use the log-energy computed directly from the mel-frequency spectral coefficients (i.e., with no DCT), which we will denote as MFSC features.
[Figure: two ways to organize the input features — as a number of one-dimensional (1-D) feature maps, or as three 2-D feature maps (static, delta, and delta-delta).]
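As a rough illustration of this input organization (a sketch only — parameter values such as 40 mel bands and a 15-frame context window are assumptions, and librosa is used here for the mel filterbank), the static MFSC features and their deltas and delta-deltas can be stacked into three 2-D feature maps per context window:

```python
import numpy as np
import librosa

# Load a waveform (the path and sample rate are placeholders).
y, sr = librosa.load("utterance.wav", sr=16000)

# MFSC: log mel-filterbank energies, i.e. no DCT is applied.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
static = librosa.power_to_db(mel)              # shape: (n_mels, n_frames)
delta = librosa.feature.delta(static, order=1)
delta2 = librosa.feature.delta(static, order=2)

# Organize each 15-frame context window as three 2-D feature maps
# (static, delta, delta-delta), each spanning frequency x time.
context = 15
windows = [
    np.stack([static[:, t:t + context],
              delta[:, t:t + context],
              delta2[:, t:t + context]])        # shape: (3, n_mels, context)
    for t in range(static.shape[1] - context + 1)
]
```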
Convolution Ply
Every input feature map is connected to many feature maps in the convolution ply based on a number of local weight matrices. The mapping can be represented as the well-known convolution operation in signal processing.
Each unit of one feature map in the convolution ply can be computed as:
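In a notation consistent with the surrounding description (the indices are illustrative), with o_{i,n} the n-th unit of the i-th input feature map, w_{i,j,n} the local weight connecting it to the j-th convolution feature map, w_{0,j} a bias, F the filter (local window) size, and I the number of input feature maps, a standard way to write the m-th unit q_{j,m} of the j-th feature map is:

```latex
q_{j,m} = \sigma\!\left(\sum_{i=1}^{I}\sum_{n=1}^{F} o_{i,\,n+m-1}\, w_{i,j,n} + w_{0,j}\right),
\qquad j = 1, \dots, J
```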
This can be written in a more concise matrix form:
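Treating each feature map as a vector and writing * for the convolution operation, the same computation can be expressed (again as a sketch of the standard form) as:

```latex
\mathbf{q}_j = \sigma\!\left(\sum_{i=1}^{I} \mathbf{o}_i * \mathbf{w}_{i,j}\right),
\qquad j = 1, \dots, J
```

The NumPy sketch below mirrors the element-wise formula above for 1-D feature maps along frequency; the array shapes and names are assumptions made for illustration.

```python
import numpy as np

def convolution_ply(O, W, w0, sigma=np.tanh):
    """Sketch of the convolution ply (names are illustrative).
    O:  input feature maps, shape (I, B)    -- I maps over B frequency bands
    W:  local weights,      shape (I, J, F) -- J output maps, filter size F
    w0: biases,             shape (J,)
    Returns Q with shape (J, B - F + 1): each output map is a 1-D
    convolution of the input maps along the frequency axis."""
    I, B = O.shape
    _, J, F = W.shape
    Q = np.empty((J, B - F + 1))
    for j in range(J):
        for m in range(B - F + 1):
            # q_{j,m} = sigma( sum_i sum_n o_{i,n+m-1} * w_{i,j,n} + w_{0,j} )
            Q[j, m] = sigma(np.sum(O[:, m:m + F] * W[:, j, :]) + w0[j])
    return Q
```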