A TWO-STAGE ALGORITHM FOR ENHANCEMENT OF REVERBERANT SPEECH
Date: 2025-03-09
Room reverberation causes two perceptual distortions in clean speech: coloration and long-term reverberation. These two effects correspond to two physical variables: the signal-to-reverberant energy ratio (SRR) and the reverberation time, respectively. Based on thi
The short-term phase spectrum of the enhanced speech is set to that of the inverse-filtered speech, and the processed speech is reconstructed from the short-term magnitude and phase spectra.
3. RESULTS AND DISCUSSIONS
A corpus of speech utterances from eight speakers, four females and four males, […]rmal listening tests show that the proposed algorithm achieves a substantial reduction of reverberation and has few audible artifacts. To illustrate typical performance, we show the enhancement results in Fig. 2. Figs. 2(a) and (c) show the clean and the reverberant signals, and Figs. 2(b) and (d) show the corresponding spectrograms. The reverberant signal is produced by convolving the clean signal with the room impulse response in Fig. 1(a), which has T60 = 0.3 s. As can be seen, while the clean signal has fine harmonic structure and silent gaps between words, the reverberant speech is smeared and its harmonic structure is elongated in time.
To put our performance in perspective, we compare it with a recent one-microphone reverberant speech enhancement algorithm proposed by Yegnanarayana and Murthy [16], which we refer to as the YM algorithm. The YM algorithm applies weights to the LP residual so that the weighted residual more closely resembles the damped sinusoidal patterns of the LP residual of clean speech. Figs. 2(e) and (f) show the speech processed by the YM algorithm and its spectrogram, respectively. As can be seen, the spectral structure is clearer and some silent gaps are attenuated. The speech processed by our algorithm and its spectrogram are shown in Figs. 2(g) and (h). As can be seen, the effects of reverberation have been significantly reduced in the processed speech: the smearing is lessened and many silent gaps are clearer. The figure clearly shows that our algorithm enhances the reverberant speech more than the YM algorithm does. An audio demonstration can also be found at http://www.cse.ohio-state.edu/~dwang/demo/WuReverb.html.
Quantitative comparisons are obtained separately for the speech utterances of the eight speakers using the frequency-weighted segmental SNR [14] and are presented in Table I.
Table I. Systematic results of reverberant speech enhancement for speech utterances of four female and four male speakers randomly selected from the TIMIT database. All signals are sampled at 8 kHz.
Speaker/Gender       Female #1 Female #2 Female #3 Female #4   Male #1   Male #2   Male #3   Male #4   Average
SNR_fw^rev (dB)          -3.64     -3.51     -3.86     -4.12     -3.86     -3.33     -3.30     -3.50     -3.64
SNR_fw^YM (dB)           -3.06     -3.05     -3.19     -3.29     -2.65     -2.68     -2.53     -2.76     -2.90
SNR_fw^proc (dB)          0.92      0.74     -0.20      0.73     -0.92      1.77      1.20     -0.13      0.51
ΔSNR_fw^YM (dB)           0.58      0.46      0.68      0.83      1.21      0.65      0.76      0.75      0.74
ΔSNR_fw^proc (dB)         4.56      4.25      3.66      4.84      2.94      5.10      4.49      3.38      4.15
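As an arithmetic check on Table I, the two gain rows are the per-speaker differences SNR_fw^YM − SNR_fw^rev and SNR_fw^proc − SNR_fw^rev, and the Average column is the mean over the eight speakers. The short sketch below (Python with NumPy; the variable names are illustrative only) recomputes both from the first three rows of the table.

```python
import numpy as np

# Rows of Table I in dB; columns are Female #1-#4 followed by Male #1-#4.
snr_rev = np.array([-3.64, -3.51, -3.86, -4.12, -3.86, -3.33, -3.30, -3.50])
snr_ym = np.array([-3.06, -3.05, -3.19, -3.29, -2.65, -2.68, -2.53, -2.76])
snr_proc = np.array([0.92, 0.74, -0.20, 0.73, -0.92, 1.77, 1.20, -0.13])

# Per-speaker SNR gains relative to the unprocessed reverberant speech.
gain_ym = snr_ym - snr_rev      # ΔSNR_fw^YM
gain_proc = snr_proc - snr_rev  # ΔSNR_fw^proc

# Averages over the eight speakers (the Average column of Table I).
for name, row in [("SNR_fw^rev", snr_rev), ("SNR_fw^YM", snr_ym),
                  ("SNR_fw^proc", snr_proc), ("dSNR_fw^YM", gain_ym),
                  ("dSNR_fw^proc", gain_proc)]:
    print(f"{name:13s} average = {row.mean():5.2f} dB")
```

The recomputed averages match the Average column (0.74 dB for the YM algorithm and 4.15 dB for the proposed one). Individual gains can differ from the tabulated ones by 0.01 dB (for example, Female #3 gives −3.19 − (−3.86) = 0.67 versus 0.68 in the table), presumably because the table was computed from unrounded SNR values.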
The shift delay ρ indicates the relative delay of the late-impulse components. The distinction between early and late reflections for speech is commonly set at a delay of 50 ms in a room impulse response [11]. This delay reflects the properties of speech and is independent of the reverberation characteristics. With a frame shift of 8 ms, it translates to approximately 7 frames, and we therefore choose ρ = 7. Finally, the scaling factor γ specifies the relative strength of the late-impulse components after inverse filtering, and we set it to 0.32.
Considering the shape of the equalized impulse response, we choose an asymmetric smoothing function in the form of a Rayleigh distribution:
$$
w(i) = \begin{cases}
\dfrac{i+a}{a^{2}} \exp\!\left(-\dfrac{(i+a)^{2}}{2a^{2}}\right), & \text{if } i > -a,\\[4pt]
0, & \text{otherwise},
\end{cases}
\qquad (2)
$$
where we choose a = 5; a controls the span of the smoothing function. This smoothing function falls to zero quickly on the left side but tails off slowly on the right side, and its right side resembles the shape of the reverberation tails in equalized impulse responses.
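To make the shape of Eq. (2) concrete, here is a minimal NumPy sketch of w(i); the function name rayleigh_window is hypothetical. With a = 5 the peak falls at i = 0 (where i + a equals a, the mode of the Rayleigh density), the left side reaches zero at i = −a, and the right side decays slowly.

```python
import numpy as np


def rayleigh_window(i, a=5):
    """w(i) from Eq. (2): (i + a) / a^2 * exp(-(i + a)^2 / (2 a^2)) for i > -a, else 0."""
    i = np.asarray(i, dtype=float)
    x = i + a
    return np.where(i > -a, x / a**2 * np.exp(-x**2 / (2 * a**2)), 0.0)


i = np.arange(-10, 31)   # frame offsets
w = rayleigh_window(i)   # a = 5, as chosen in the text
print(i[np.argmax(w)])   # → 0 (peak where i + a = a)
print(rayleigh_window(-4) < rayleigh_window(4))   # → True (slower decay on the right)
```

Because w(i) is a Rayleigh density in i + a sampled at integer frames, its values sum to approximately one (about 0.997 for a = 5), so the overall level of the late-impulse estimate is governed mainly by the scaling factor γ.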
Assuming that the early- and late-impulse components are approximately uncorrelated, the power spectrum of the early-impulse components can be estimated by subtracting the power spectrum of the late-impulse components from that of the inverse-filtered speech. The result is further used as an estimate of the power spectrum of the original speech. Specifically, spectral subtraction [7] is employed to estimate the power spectrum of the original speech, $|\tilde{S}_x(k;i)|^2$, as given in Eq. (3).
In Table I, SNR_fw^rev, SNR_fw^YM, and SNR_fw^proc represent the frequency-weighted segmental SNR values of the reverberant speech, the speech processed by the YM algorithm, and the speech processed by our algorithm, respectively. The SNR gains obtained by the YM algorithm and by our algorithm are denoted by ΔSNR_fw^YM = SNR_fw^YM − SNR_fw^rev and ΔSNR_fw^proc = SNR_fw^proc − SNR_fw^rev, respectively.
$$
|\tilde{S}_x(k;i)|^{2} = |S_z(k;i)|^{2} \max\left[ \frac{|S_z(k;i)|^{2} - \gamma\, w(i-\rho) \ast |S_z(k;i)|^{2}}{|S_z(k;i)|^{2}},\; \varepsilon \right],
\qquad (3)
$$
where ∗ denotes convolution along the frame index i, and ε = 0.001 is the floor, which corresponds to a maximum attenuation of 30 dB.
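A minimal NumPy sketch of Eq. (3) follows, assuming the inverse-filtered speech is already available as a complex STFT matrix S_z of shape (frequency bins, frames); the STFT itself, the function names, and the kernel length n_taps are assumptions, not taken from the paper. The late-impulse power γ w(i − ρ) ∗ |S_z(k;i)|² is obtained by filtering each bin's power trajectory along the frame axis with the shifted window, using γ = 0.32, ρ = 7, a = 5, and ε = 0.001 as stated in the text. The phase of S_z is kept, as the paper does when reconstructing the processed speech.

```python
import numpy as np


def rayleigh_window(i, a=5):
    """Smoothing function w(i) of Eq. (2); zero for i <= -a."""
    i = np.asarray(i, dtype=float)
    x = i + a
    return np.where(i > -a, x / a**2 * np.exp(-x**2 / (2 * a**2)), 0.0)


def late_reverb_power(power, gamma=0.32, rho=7, a=5, n_taps=60):
    """Estimate gamma * w(i - rho) * |S_z(k;i)|^2, the convolution running over frames.

    power: array of shape (n_bins, n_frames) holding |S_z(k;i)|^2.
    n_taps (hypothetical) truncates the kernel; w is negligible beyond ~10a frames.
    """
    kernel = rayleigh_window(np.arange(n_taps) - rho, a)  # kernel[j] = w(j - rho)
    n_frames = power.shape[1]
    # Causal filtering of each bin: late[k, i] = sum_j kernel[j] * power[k, i - j]
    late = np.array([np.convolve(row, kernel)[:n_frames] for row in power])
    return gamma * late


def spectral_subtract(S_z, gamma=0.32, rho=7, a=5, eps=1e-3):
    """Apply Eq. (3) to the complex STFT S_z of the inverse-filtered speech.

    Returns (gain, S_hat): gain = max((|S_z|^2 - late) / |S_z|^2, eps) per bin and frame,
    and S_hat, whose power is |S_z|^2 * gain and whose phase is that of S_z.
    """
    power = np.abs(S_z) ** 2
    late = late_reverb_power(power, gamma, rho, a)
    # (|S_z|^2 - late) / |S_z|^2, guarded against division by zero in empty bins
    ratio = 1.0 - late / np.maximum(power, 1e-12)
    gain = np.maximum(ratio, eps)  # eps = 0.001 caps the attenuation at 30 dB
    # Magnitude sqrt(gain) * |S_z| with the phase of S_z is simply S_z * sqrt(gain)
    return gain, S_z * np.sqrt(gain)
```

With a stationary toy input (S_z = 1 in every bin and frame), the gain is exactly 1 in the first three frames, because w(j − ρ) vanishes for j ≤ ρ − a = 2, and it settles near 1 − 0.32 × 0.997 ≈ 0.68 once the whole window is covered. STFT analysis and synthesis (window, 8 ms shift, overlap-add) are left out and would have to be supplied separately.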
Natural speech utterances contain silent gaps, and reverberation fills some of the gaps right after high-intensity speech sections. We identify these silent gaps by examining, in each time frame, the energy of the inverse-filtered speech and the energy reduction ratio after spectral subtraction. For identified silent frames, all frequency bins are attenuated by 30 dB.
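The text does not give the exact decision rule for silent frames, so the sketch below uses two hypothetical thresholds (energy_floor_db and max_keep_ratio, chosen only for illustration) on the two quantities the text names: the frame energy of the inverse-filtered speech and the energy reduction ratio after spectral subtraction, which is taken here, as an assumption, to be the ratio of frame energy after subtraction to frame energy before it. Frames flagged as silent are attenuated by 30 dB in every bin, an amplitude factor of 10^(−30/20) ≈ 0.032 (a power factor of 0.001, the same value as ε).

```python
import numpy as np


def attenuate_silent_frames(S_z, S_hat, energy_floor_db=-40.0, max_keep_ratio=0.3,
                            attenuation_db=30.0):
    """Attenuate frames judged to be silent gaps filled by reverberation.

    S_z: complex STFT of the inverse-filtered speech, shape (n_bins, n_frames).
    S_hat: complex STFT after spectral subtraction, same shape.
    energy_floor_db and max_keep_ratio are HYPOTHETICAL thresholds for illustration.
    """
    e_in = np.sum(np.abs(S_z) ** 2, axis=0)     # frame energy of the inverse-filtered speech
    e_out = np.sum(np.abs(S_hat) ** 2, axis=0)  # frame energy after spectral subtraction
    ref = max(float(np.max(e_in)), 1e-12)
    e_in_db = 10.0 * np.log10(np.maximum(e_in, 1e-12) / ref)  # relative to loudest frame
    keep_ratio = e_out / np.maximum(e_in, 1e-12)               # fraction of energy retained
    # Hypothetical rule: quiet frame AND most of its energy attributed to late reverberation
    silent = (e_in_db < energy_floor_db) & (keep_ratio < max_keep_ratio)
    out = S_hat.copy()
    out[:, silent] *= 10.0 ** (-attenuation_db / 20.0)  # 30 dB in every frequency bin
    return out, silent
```

In practice the thresholds would need tuning on real recordings; the sketch only shows where the two frame-level measures and the fixed 30 dB attenuation enter.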
As can be seen in Table I, the YM algorithm obtains an average SNR gain of 0.74 dB, compared with 4.15 dB for our algorithm.
Although our algorithm is designed for enhancing reverberant speech using one microphone, it is straightforward to extend it to multi-microphone scenarios. Many inverse filtering algorithms, such as the algorithm by Gillespie et al. [8], were originally proposed for multiple microphones. After inverse filtering with multiple microphones, the second stage of our algorithm (the spectral subtraction method) can be used to reduce long-term reverberation effects.