High-Productivity Stream Programming For High-Performance Sy(2)

Published: 2021-06-05

Applications that are structured around some notion of a “stream” are increasingly prevalent in common computing practices, and there is evidence that streaming media applications already consume a substantial fraction of the computation cycles on consu

Presentation Outline: The presentation will describe the StreamIt language and its salient features [9, 10]. We will focus on the hierarchical nature of the language, highlighting modularity, malleability, and portability. In addition, the talk will provide an overview of the StreamIt compiler infrastructure, which was released publicly on our website (http://cag.csail.mit.edu/streamit). We will describe the compiler in the context of three research thrusts: automating domain-specific DSP optimizations, targeting distributed communication-exposed architectures, and performing cache-aware optimizations.

First, we will present a set of domain-specific optimizations for linear sections of the stream graph [5, 2]. A computation is linear if each of its outputs can be represented as an affine combination of its inputs (e.g., FIR filters, expanders, compressors, FFTs). The StreamIt compiler recognizes linear computation using a simple dataflow analysis. It then exploits the linear properties to perform algebraic simplification and to translate linear computations into the frequency domain (when profitable). These transformations yield an average speedup of 4.5× on a Pentium 3.

Second, we will describe our backend support for tiled and multicore processors, and distributed computing platforms [3]. We use the MIT Raw architecture as an evaluation vehicle for the former, and a cluster of Pentium 3 processors interconnected with a high-speed network as an evaluation vehicle for the latter. To achieve good performance on these parallel targets, the compiler includes phases for work estimation, load balancing, layout, and communication scheduling. The load balancing stage utilizes a novel dynamic-programming algorithm that can be extended to consider a range of hierarchical cost functions. When targeting a 16-tile Raw machine, the compiler achieves an average speedup of 16× compared to a 1-tile Raw machine, and an average speedup of 9× compared to a Pentium 3. The compiler yields similar performance gains on the Pentium 3 cluster.
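To make the notion of a linear computation from the first thrust concrete, the following is a minimal Python sketch (not StreamIt code and not the compiler's implementation). It assumes a sliding-window FIR filter and shows one kind of algebraic simplification: two FIR filters connected in a pipeline collapse into a single equivalent FIR filter whose weights are the convolution of the two weight vectors. The function names `fir` and `combine` are hypothetical, chosen only for illustration.

```python
def fir(weights, xs):
    """Apply a sliding-window FIR filter: y[n] = sum_k weights[k] * xs[n + k].

    Each output is a linear combination of a window of inputs, which is
    what makes the filter "linear" in the sense used above.
    """
    taps = len(weights)
    return [
        sum(w * xs[n + k] for k, w in enumerate(weights))
        for n in range(len(xs) - taps + 1)
    ]


def combine(first, second):
    """Collapse a pipeline of two FIR filters into one equivalent filter.

    Assumption for this sketch: `first` runs before `second`. The combined
    weight vector is the discrete convolution of the two weight vectors.
    """
    combined = [0] * (len(first) + len(second) - 1)
    for i, a in enumerate(first):
        for j, b in enumerate(second):
            combined[i + j] += a * b
    return combined


# Illustrative data only; these values do not come from the paper.
xs = [1, 2, 3, 4, 5, 6]
first = [1, 2]
second = [3, 0, 1]

two_stage = fir(second, fir(first, xs))      # run the pipeline as written
one_stage = fir(combine(first, second), xs)  # run the collapsed filter
```

Both `two_stage` and `one_stage` hold the same values, so the pipeline can be replaced by the single filter. Whether such a replacement (or a move to the frequency domain) actually pays off depends on the filter sizes, which is why the text qualifies these transformations with "when profitable".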

Third, we will present several cache-aware optimizations to improve instruction and data locality, and improve register allocation and scheduling freedom [7]. The optimizations are founded upon a simple and intuitive model that quantifies the temporal locality of a streaming program. The cache-aware optimizations in the StreamIt compiler yield a 249% average speedup (over unoptimized code) for our streaming benchmark suite on a StrongARM 1110 processor. The optimizations also yield a 154% speedup on a Pentium 3 and a 152% speedup on an Itanium 2.

REFERENCES

[1] H. Abelson and G. Sussman. Structure and Interpretation of Computer Programs. MIT Press, 1985.

[2] S. Agarwal. Linear state-space analysis and optimization of StreamIt programs. Master’s thesis, MIT CSAIL, August 2004.

[3] M. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In ASPLOS, 2002.

[4] K. Kuo, R. Rabbah, and S. Amarasinghe. A productive programming environment for stream computing. In Workshop on Productivity and Performance in High-End Computing, San Francisco, CA, Feb 2005.

[5] A. A. Lamb, W. Thies, and S. Amarasinghe. Linear analysis and optimization of stream programs. In ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.

[6] S. Rixner, W. J. Dally, U. J. Kapasi, B. Khailany, A. Lopez-Lagunas, P. R. Mattson, and J. D. Owens. A Bandwidth-Efficient Architecture for Media Processing. In HPCA, Dallas, TX, November 1998.

[7] J. Sermulins, W. Thies, R. Rabbah, and S. Amarasinghe. Cache aware optimization of stream programs. In Languages, Compilers, and Tools for Embedded Systems, Chicago, June 2005.

[8] R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.

[9] W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In Proc. of the Int. Conf. on Compiler Construction (CC), 2002.

[10] W. Thies, M. Karczmarek, J. Sermulins, R. Rabbah, and S. Amarasinghe. Teleport messaging for distributed stream programs. In Symposium on Principles and Practice of Parallel Programming, Chicago, Illinois, June 2005.

[11] E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal. Baring it all to software: Raw machines. IEEE Computer, 30(9):86–93, 1997.