A TWO-STAGE ALGORITHM FOR ENHANCEMENT OF REVERBERANT SPEECH(3)

Date: 2026-04-24

Room reverberation causes two perceptual distortions on clean speech: coloration and long-term reverberation. These two effects correspond to two physical variables: signal-to-reverberant energy ratio (SRR) and reverberation time, respectively. Based on thi

Table I. The systematic results of reverberant speech enhancement for speech utterances of four female and four male speakers randomly selected from the TIMIT database. All signals are sampled at 8 kHz. Columns: Speaker/Gender (Female #1 to #4, Male #1 to #4, Average). Rows, all in dB: $SNR_{fw}^{rev}$, $SNR_{fw}^{YM}$, $SNR_{fw}^{proc}$, $SNR_{fw}^{YM} - SNR_{fw}^{rev}$, $SNR_{fw}^{proc} - SNR_{fw}^{rev}$.

short-term phase spectrum of enhanced speech is set to that of inverse-filtered speech, and the processed speech is reconstructed from the short-term magnitude and phase spectrum.
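The recombination of magnitude and phase described above can be sketched for a single frame as follows. This is a minimal illustration, assuming a one-sided FFT per frame; the helper name `recombine_frame` and the 64-sample frame are hypothetical, and windowing and overlap-add across frames are omitted.

```python
import numpy as np

def recombine_frame(power_est, z_frame):
    """Rebuild one time frame from an estimated power spectrum and the
    phase of the inverse-filtered frame (hypothetical helper)."""
    z_spec = np.fft.rfft(z_frame)                     # spectrum of the inverse-filtered frame
    magnitude = np.sqrt(np.maximum(power_est, 0.0))   # power -> magnitude
    spec = magnitude * np.exp(1j * np.angle(z_spec))  # keep the inverse-filtered phase
    return np.fft.irfft(spec, n=len(z_frame))

rng = np.random.default_rng(0)
z = rng.standard_normal(64)                           # toy inverse-filtered frame
z_power = np.abs(np.fft.rfft(z)) ** 2

same = recombine_frame(z_power, z)                    # unmodified power spectrum
quieter = recombine_frame(0.25 * z_power, z)          # power scaled by 1/4
```

With the power spectrum left unmodified the frame is recovered, and scaling the power by 1/4 halves the amplitude while leaving the phase untouched.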

3. RESULTS AND DISCUSSIONS

A corpus of speech utterances from eight speakers, four females and four males, ... Informal listening tests show that the proposed algorithm achieves substantial reduction of reverberation and introduces few audible artifacts. To illustrate typical performance, we show the enhancement results in Fig. 2. Fig. 2(a) and (c) show the clean and the reverberant signal, and Fig. 2(b) and (d) show the corresponding spectrograms, respectively. The reverberant signal is produced by convolving the clean signal with the room impulse response function in Fig. 1(a), with T60 = 0.3 s. As can be seen, while the clean signal has fine harmonic structure and silence gaps between the words, the reverberant speech is smeared and its harmonic structure is elongated.
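Producing a reverberant signal by convolution can be sketched as below. The room impulse response of Fig. 1(a) is not reproduced here, so this sketch substitutes a synthetic one: exponentially decaying white noise whose amplitude envelope falls by 60 dB at T60 = 0.3 s. The function names, the noise model, and the toy click signal are assumptions for illustration only.

```python
import numpy as np

FS = 8000  # sampling rate in Hz, as stated for Table I

def synthetic_rir(t60=0.3, fs=FS, seed=0):
    """Hypothetical stand-in RIR: white noise under an exponential envelope
    chosen so that the amplitude falls by 60 dB after t60 seconds."""
    n = int(round(t60 * fs))
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / t60)   # 20*log10(envelope) = -60 dB at t = t60
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(n) * envelope
    h[0] = 1.0                            # direct path
    return h

def reverberate(clean, rir):
    """Convolve the clean signal with the room impulse response."""
    return np.convolve(clean, rir)

clean = np.zeros(FS)
clean[100] = 1.0                          # a single click as a toy "clean" signal
rir = synthetic_rir()
reverberant = reverberate(clean, rir)
```

For the click input, the reverberant output is simply a delayed copy of the impulse response, which makes the smearing in time easy to see.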

To put our performance in perspective, we compare with a recent one-microphone reverberant speech enhancement algorithm proposed by Yegnanarayana and Murthy [16]. We refer to this algorithm as the YM algorithm. The YM algorithm applies weights to the LP residual so that it more closely resembles the damped sinusoidal patterns of the LP residual from clean speech. Fig. 2(e) and (f) show the processed speech using the YM algorithm and its spectrogram, respectively. As can be seen, spectral structure is clearer and some silence gaps are attenuated. The processed speech using our algorithm and its spectrogram are shown in Fig. 2(g) and (h). As can be seen, the effects of reverberation have been significantly reduced in the processed speech. The smearing is lessened and many silence gaps are clearer. The figure clearly shows that our algorithm enhances the reverberant speech more than does the YM algorithm. An audio demonstration can also be found at http://www.cse.ohio-state.edu/~dwang/demo/WuReverb.html.

Quantitative comparisons are obtained from the speech utterances of the eight speakers separately, utilizing frequency-weighted segmental SNR [14], and presented in Table I. $SNR_{fw}^{rev}$,

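Speaker/Gender                           Female #1  Female #2  Female #3  Female #4    Male #1    Male #2    Male #3    Male #4    Average
$SNR_{fw}^{rev}$ (dB)                        -3.64      -3.51      -3.86      -4.12      -3.86      -3.33      -3.30      -3.50      -3.64
$SNR_{fw}^{YM}$ (dB)                         -3.06      -3.05      -3.19      -3.29      -2.65      -2.68      -2.53      -2.76      -2.90
$SNR_{fw}^{proc}$ (dB)                        0.92       0.74      -0.20       0.73      -0.92       1.77       1.20      -0.13       0.51
$SNR_{fw}^{YM} - SNR_{fw}^{rev}$ (dB)         0.58       0.46       0.68       0.83       1.21       0.65       0.76       0.75       0.74
$SNR_{fw}^{proc} - SNR_{fw}^{rev}$ (dB)       4.56       4.25       3.66       4.84       2.94       5.10       4.49       3.38       4.15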

The shift delay $\rho$ indicates the relative delay of the late-impulse components. The distinction between early and late reflections for speech is commonly set at a delay of 50 ms in a room impulse response function [11]. This delay reflects the properties of speech and is independent of reverberation characteristics. Consequently, it translates to approximately 7 frames for a shift interval of 8 ms, and we choose $\rho = 7$ as a result. Finally, the scaling factor $\gamma$ specifies the relative strength of the late-impulse components after inverse filtering, and we set it to 0.32.
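The frame count quoted above follows from dividing the 50 ms boundary by the 8 ms shift interval. A small check, assuming the result is rounded up to a whole frame (the text only says "approximately 7"; the constant names are illustrative):

```python
import math

EARLY_LATE_BOUNDARY_MS = 50   # early/late reflection boundary from the text
FRAME_SHIFT_MS = 8            # shift interval from the text

ratio = EARLY_LATE_BOUNDARY_MS / FRAME_SHIFT_MS   # 6.25 frames
rho = math.ceil(ratio)                            # rounding up is an assumption
print(ratio, rho)
```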

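Considering the shape of the equalized impulse response, we choose an asymmetrical smoothing function as the Rayleigh distribution:

$$
w(i) =
\begin{cases}
\dfrac{i+a}{a^2} \exp\left( -\dfrac{(i+a)^2}{2a^2} \right) & \text{if } i > -a \\
0 & \text{otherwise}
\end{cases}
\tag{2}
$$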

where we choose $a = 5$, which controls the span of the smoothing function. This smoothing function goes down to zero on the left side quickly but tails off slowly on the right side; the right side of the smoothing function resembles the shape of reverberation tails in equalized impulse responses.
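The shape of this smoothing function can be checked numerically. The sketch below evaluates the shifted Rayleigh form with a = 5 over an arbitrary range of frame offsets; the function name `rayleigh_smoothing` and the evaluation range are illustrative choices, not part of the original.

```python
import numpy as np

def rayleigh_smoothing(i, a=5.0):
    """Shifted Rayleigh smoothing function: nonzero only for i > -a."""
    i = np.asarray(i, dtype=float)
    x = i + a
    w = (x / a**2) * np.exp(-(x**2) / (2.0 * a**2))
    return np.where(i > -a, w, 0.0)

frames = np.arange(-10, 31)            # frame offsets; the range is arbitrary
w = rayleigh_smoothing(frames)
peak_offset = frames[np.argmax(w)]     # offset at which w is largest
left, right = rayleigh_smoothing(-4), rayleigh_smoothing(4)
```

The maximum falls at offset 0, and the value four frames to the right of the peak exceeds the value four frames to the left, which is the asymmetry the text describes.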

Assuming the early- and late-impulse components are approximately uncorrelated, the power spectrum of the early-impulse components can be estimated by subtracting the power spectrum of the late-impulse components from that of the inverse-filtered speech. The results are further used as an estimate of the power spectrum of original speech. Specifically, spectral subtraction [7] is employed to estimate the power spectrum of original speech $|\tilde{S}_x(k;i)|^2$:

$SNR_{fw}^{YM}$, and $SNR_{fw}^{proc}$ represent the frequency-weighted segmental SNR values of reverberant speech, the processed speech using the YM algorithm, and the processed speech using our algorithm, respectively. The SNR gains by employing the YM algorithm and our algorithm are denoted by $SNR_{fw}^{YM} - SNR_{fw}^{rev}$ and $SNR_{fw}^{proc} - SNR_{fw}^{rev}$, respectively. As can be seen, the YM algorithm

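$$
|\tilde{S}_x(k;i)|^2 = |S_z(k;i)|^2 \cdot \max\left[ \frac{|S_z(k;i)|^2 - \gamma\, w(i-\rho) * |S_z(k;i)|^2}{|S_z(k;i)|^2},\ \varepsilon \right]
\tag{3}
$$

where $*$ denotes convolution over the frame index $i$, and $\varepsilon = 0.001$ is the floor and corresponds to the maximum attenuation of 30 dB.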
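One way to read the spectral subtraction step is sketched below. It assumes the late-impulse power is obtained by convolving each frequency bin of the inverse-filtered power spectrogram with the scaled, delayed smoothing function along the frame axis. The array layout (bins by frames), the kernel length `span`, and all function names are assumptions made for this sketch.

```python
import numpy as np

def rayleigh_smoothing(i, a=5.0):
    """Shifted Rayleigh smoothing function, nonzero only for i > -a."""
    i = np.asarray(i, dtype=float)
    x = i + a
    return np.where(i > -a, (x / a**2) * np.exp(-(x**2) / (2.0 * a**2)), 0.0)

def spectral_subtraction(power_z, gamma=0.32, rho=7, a=5.0, eps=1e-3, span=40):
    """power_z: array (n_bins, n_frames) of inverse-filtered power spectra.
    Returns an estimate of the original-speech power spectra."""
    # Kernel gamma * w(j - rho) over lags j = 0..span-1. With a = 5 and
    # rho = 7 the kernel is zero for j <= 2, so only past frames contribute.
    lags = np.arange(span)
    kernel = gamma * rayleigh_smoothing(lags - rho, a)
    n_frames = power_z.shape[1]
    # Late-impulse power: convolution along the frame axis, bin by bin.
    late = np.stack([np.convolve(row, kernel)[:n_frames] for row in power_z])
    safe = np.maximum(power_z, np.finfo(float).tiny)   # guard against division by zero
    gain = np.maximum((power_z - late) / safe, eps)    # floor the gain at eps
    return power_z * gain

steady = np.ones((1, 60))              # toy input: constant power in one bin
out = spectral_subtraction(steady)

burst = np.zeros((1, 20))              # toy input: loud frame, then a weak one
burst[0, 0], burst[0, 5] = 1000.0, 1.0
floored = spectral_subtraction(burst)
```

In the burst example, the weak frame that follows the loud one is pushed down to the floor, since 10·log10(0.001) = −30 dB, while the loud frame itself is left unchanged because no earlier frames contribute late-impulse power.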

Natural speech utterances contain silent gaps, and reverberation fills some of the gaps right after high-intensity speech sections. We identify these silent gaps by examining the energy of inverse-filtered speech and the energy reduction ratio after spectral subtraction in a time frame. For identified silent frames, all frequency bins are attenuated by 30 dB. Finally, the

obtains an average SNR gain of 0.74 dB, compared with 4.15 dB for our algorithm.
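The two average gains can be recomputed from the per-speaker values reported in Table I. The snippet below is only an arithmetic check on those published numbers; the variable names are illustrative.

```python
import numpy as np

# Frequency-weighted segmental SNR values (dB) from Table I,
# ordered Female #1 to #4, then Male #1 to #4.
snr_rev = np.array([-3.64, -3.51, -3.86, -4.12, -3.86, -3.33, -3.30, -3.50])
snr_ym = np.array([-3.06, -3.05, -3.19, -3.29, -2.65, -2.68, -2.53, -2.76])
snr_proc = np.array([0.92, 0.74, -0.20, 0.73, -0.92, 1.77, 1.20, -0.13])

gain_ym = snr_ym - snr_rev        # gain of the YM algorithm per speaker
gain_proc = snr_proc - snr_rev    # gain of the proposed algorithm per speaker

print(f"average YM gain:       {gain_ym.mean():.2f} dB")
print(f"average proposed gain: {gain_proc.mean():.2f} dB")
```

Individual differences computed this way can deviate from the tabulated gains by 0.01 dB, since the table entries are already rounded to two decimals.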

Although our algorithm is designed for enhancing reverberant speech using one microphone, it is straightforward to extend it to multi-microphone scenarios. Many inverse filtering algorithms, such as the algorithm by Gillespie et al. [8], were originally proposed using multiple microphones. After inverse filtering using multiple microphones, the second stage of our algorithm, the spectral subtraction method, can be utilized for reducing long-term reverberation effects.
