A TWO-STAGE ALGORITHM FOR ENHANCEMENT OF REVERBERANT SPEECH(2)
Date: 2025-03-09
Room reverberation causes two perceptual distortions on clean speech: coloration and long-term reverberation. These two effects correspond to two physical variables: signal-to-reverberant energy ratio (SRR) and reverberation time, respectively. Based on thi
determined by signal-to-reverberant energy ratio (SRR), which is the ratio between the energy traveling directly from a source to a listener and the energy of all acoustic reflections reaching the listener, and in turn, it is determined by talker-to-microphone distance. Shorter talker-to-microphone distance results in higher SRR and less spectral deviation, hence, less coloration.
Consequently, we propose a two-stage model to deal with two types of degradation, coloration and long-term reverberation, in a reverberant environment. In the first stage, our model estimates an inverse filter to reduce coloration effects in order to increase SRR. The second stage employs spectral subtraction to minimize the influence of long-term reverberation.
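As a minimal sketch of the SRR definition above, the ratio can be computed directly from an impulse response. The text does not say how the direct-path energy is separated from the reflections, so the function below assumes it is the energy within a few samples of the largest tap; the name srr_db and the parameter direct_halfwidth are hypothetical.

```python
import numpy as np

def srr_db(h, direct_halfwidth=0):
    """Signal-to-reverberant energy ratio of an impulse response, in dB.

    Assumption (not from the paper): the direct-path energy is taken from
    the samples within `direct_halfwidth` samples of the largest-magnitude
    tap; every other tap counts as an acoustic reflection.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    lo = max(peak - direct_halfwidth, 0)
    hi = min(peak + direct_halfwidth + 1, len(h))
    direct = np.sum(h[lo:hi] ** 2)
    reflections = np.sum(h ** 2) - direct
    return 10.0 * np.log10(direct / reflections)

# Toy response: one direct tap followed by two weaker reflections.
h_toy = np.array([0.0, 1.0, 0.5, 0.5])
print(round(srr_db(h_toy), 2))  # 3.01
```

Scaling the reflections down while keeping the direct tap fixed, as happens when the talker moves closer to the microphone, raises the value returned by this function.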
3. INVERSE FILTERING
In the first stage of our algorithm, we derive an inverse filter to reduce the reverberation effects; this stage is adapted from a multi-microphone inverse filtering algorithm proposed by Gillespie et al. [8]. An FIR inverse filter of the room impulse response is estimated by maximizing the kurtosis of the linear prediction (LP) residual of speech, utilizing a block frequency-domain adaptive filter. Then, inverse-filtered speech is obtained by convolving the inverse filter with reverberant speech.
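The objective being maximized can be illustrated with a short sketch. It only evaluates the kurtosis of an LP residual; it does not implement the block frequency-domain adaptive filter. The synthetic signals, the LP order of 10, and the helper names lp_residual and kurtosis are assumptions made for illustration, not details from the paper.

```python
import numpy as np

def lp_residual(x, order=10):
    """LP residual via the autocorrelation method (order is an assumed value)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Solve the normal equations R a = r[1:] for the predictor coefficients.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Prediction-error filter 1 - sum_k a_k z^-k applied to x.
    return np.convolve(x, np.concatenate(([1.0], -a)))[:n]

def kurtosis(e):
    """Normalized fourth moment (equals 3 for a Gaussian signal)."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    return np.mean(e ** 4) / np.mean(e ** 2) ** 2

# Synthetic stand-in for speech: a sparse excitation through a one-pole filter.
rng = np.random.default_rng(0)
n = 20000
excitation = rng.standard_normal(n) * (rng.random(n) < 0.05)
dry = np.convolve(excitation, 0.9 ** np.arange(200))[:n]

# Synthetic "room": a direct tap followed by an exponentially decaying random tail.
taps = np.arange(400)
room = 0.5 * rng.standard_normal(400) * np.exp(-taps / 40.0)
room[0] = 1.0
reverberant = np.convolve(dry, room)[:n]

k_dry = kurtosis(lp_residual(dry))
k_rev = kurtosis(lp_residual(reverberant))
print(k_dry > k_rev)
```

In this toy setting the residual of the dry signal is much peakier than that of the reverberant one, which is the property a kurtosis-maximizing inverse filter tries to restore.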
A typical result from the first stage of our algorithm is shown in Fig. 1. Fig. 1(a) illustrates a room impulse response function (T60 = 0.3 s) generated by the image model of Allen and Berkley [1]. The equalized impulse response, the result of the room impulse response in Fig. 1(a) convolved with the obtained inverse filter, is shown in Fig. 1(b). As can be seen, the equalized impulse response is far more impulse-like than the room impulse response. In fact, the SRR value of the room impulse response is -9.8 dB in comparison with 2.4 dB for that of the equalized impulse response.
However, the above inverse filtering method does not improve on the tail part of reverberation. Fig. 1(c) and (d) show the energy decay curves of the room impulse response and the equalized impulse response, respectively. As can be seen, except for the first 50 ms, the energy decay patterns are almost identical, and thus the estimated reverberation times are almost the same, around 0.3 s. While the coloration distortion is reduced due to the increase of SRR, the degradation due to reverberation tails is not alleviated. In other words, the effect of inverse filtering is similar to that of moving the sound source closer to the receiver. In the next section, we introduce the second stage of our algorithm to reduce the effects of long-term reverberation.
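The energy decay curves and reverberation times discussed above can be sketched with Schroeder backward integration. The example uses a synthetic, purely exponential impulse response rather than an image-model response, assumes the response starts at sample 0, and uses hypothetical function names and a 16 kHz sampling rate.

```python
import numpy as np

def energy_decay_curve_db(h):
    """Schroeder backward integration of h^2, normalized to 0 dB at the start."""
    h = np.asarray(h, dtype=float)
    edc = np.cumsum(h[::-1] ** 2)[::-1]          # energy remaining from each sample onward
    edc = np.maximum(edc, np.finfo(float).tiny)  # guard against log10(0)
    return 10.0 * np.log10(edc / edc[0])

def reverberation_time(h, fs, drop_db=60.0):
    """Time (s) at which the decay curve first falls by drop_db, or None."""
    edc_db = energy_decay_curve_db(h)
    below = np.nonzero(edc_db <= -drop_db)[0]
    return below[0] / fs if below.size else None

# Synthetic exponential decay designed to reach -60 dB after 0.3 s.
fs = 16000                      # assumed sampling rate
t60_true = 0.3
n = np.arange(fs)               # one second of samples
decay = 3.0 * np.log(10.0) / (t60_true * fs)
h_toy = np.exp(-decay * n)

t60_est = reverberation_time(h_toy, fs)
print(round(t60_est, 2))  # 0.3
```

Because the backward integral is dominated by the tail of the response, changing only the first few tens of milliseconds of h leaves the crossing point of the -60 dB line nearly unchanged, which matches the observation above.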
4. SPECTRAL SUBTRACTION
Late reflections in a room impulse response function smear speech spectrum and degrade speech intelligibility and quality. Likewise, an equalized impulse response can be decomposed into two parts: early and late impulses. Resembling the effects of the late reflections in a room impulse response, the late impulses have deleterious effects on the quality of inverse-filtered speech; by estimating the effects of the late impulses and subtracting them, we can expect to enhance the speech quality.
In a previous version of this algorithm, Wu and Wang [15] propose a one-stage method to enhance the reverberant speech by estimating and subtracting effects of late reflections.
Fig. 1. (a) A room impulse response function generated by the image model in an office-size room. (b) The equalized impulse response derived from the reverberant speech generated by the room impulse response in (a) as the result of the first stage of our algorithm. (c) Energy decay curve computed from the room impulse response function in (a). (d) Energy decay curve computed from the equalized impulse response in (b). Each curve is calculated using the Schroeder integration method. The horizontal dotted line represents the -60 dB energy decay level. The left dashed lines indicate the starting times of the impulse responses and the right dashed lines the times at which the decay curves cross -60 dB. The horizontal axis of each panel is time (ms).

The smearing effects of late impulses lead to the smoothing of the signal spectrum in the time domain. Therefore, we assume that the power spectrum of late-impulse components is a smoothed and shifted version of the power spectrum of the inverse-filtered speech z(t):

$$|S_l(k;i)|^2 = \gamma\, w(i-\rho) * |S_z(k;i)|^2, \qquad (1)$$

where $|S_z(k;i)|^2$ and $|S_l(k;i)|^2$ are, respectively, the short-term power spectra of the inverse-filtered speech and the late-impulse components. Indexes k and i refer to frequency bin and time frame, respectively. The symbol * denotes convolution in the time domain and w(i) is a smoothing function. The short-term speech spectrum is obtained by using Hamming windows of length 16 ms with 8 ms overlap for short-term Fourier analysis.
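A minimal sketch of Eq. (1) follows, assuming the smoothing function is supplied as a finite array w[0], w[1], ... (zero elsewhere) and that the shift ρ is a non-negative whole number of frames. The values of γ, ρ and w in the toy check are placeholders, not the paper's settings, and the function names are hypothetical.

```python
import numpy as np

def stft_power(z, fs, win_ms=16.0, hop_ms=8.0):
    """Short-term power spectrum |S_z(k;i)|^2, shape (frames, bins)."""
    n_win = int(round(fs * win_ms / 1000.0))   # 16 ms Hamming window
    hop = int(round(fs * hop_ms / 1000.0))     # 8 ms overlap -> 8 ms hop
    window = np.hamming(n_win)
    n_frames = 1 + (len(z) - n_win) // hop
    frames = np.stack([z[i * hop : i * hop + n_win] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def late_impulse_power(Sz2, w, gamma, rho):
    """Eq. (1): gamma * w(i - rho) convolved with |S_z(k;i)|^2 along frames i."""
    Sz2 = np.asarray(Sz2, dtype=float)
    n_frames = Sz2.shape[0]
    Sl2 = np.zeros_like(Sz2)
    for j, wj in enumerate(w):
        lag = rho + j                          # tap w(j) acts at a delay of rho + j frames
        if lag >= n_frames:
            break
        Sl2[lag:] += wj * Sz2[: n_frames - lag]
    return gamma * Sl2

# Shape check on one second of noise at an assumed 16 kHz sampling rate.
z = np.random.default_rng(0).standard_normal(16000)
Sz2 = stft_power(z, fs=16000)
print(Sz2.shape)  # (124, 129)

# Toy check: energy in frame 0 only, placeholder w, gamma and rho.
toy = np.zeros((10, 3))
toy[0] = 1.0
Sl2_toy = late_impulse_power(toy, w=[0.5, 0.5], gamma=0.2, rho=3)
```

In the toy check, the energy placed in frame 0 reappears, scaled by γ and spread by w, starting ρ frames later, which is the "smoothed and shifted" behavior the equation describes.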