High-Productivity Stream Programming For High-Performance Sy(2)

Date: 2025-04-04

Applications that are structured around some notion of a "stream" are increasingly prevalent in common computing practice, and there is evidence that streaming media applications already consume a substantial fraction of the computation cycles on consu

Presentation Outline: The presentation will describe the StreamIt language and its salient features [9, 10]. We will focus on the hierarchical nature of the language, highlighting modularity, malleability, and portability. In addition, the talk will provide an overview of the StreamIt compiler infrastructure, which was released publicly on our website (http://cag.csail.mit.edu/streamit). We will describe the compiler in the context of three research thrusts: automating domain-specific DSP optimizations, targeting distributed communication-exposed architectures, and performing cache-aware optimizations.

First, we will present a set of domain-specific optimizations for linear sections of the stream graph [5, 2]. A computation is linear if each of its outputs can be represented as an affine combination of its inputs (e.g., FIR filters, expanders, compressors, FFTs). The StreamIt compiler recognizes linear computation using a simple dataflow analysis. It then exploits the linear properties to perform algebraic simplification and to translate linear computations into the frequency domain (when profitable). These transformations yield an average speedup of 4.5× on a Pentium 3.

Second, we will describe our backend support for tiled and multicore processors, and distributed computing platforms [3]. We use the MIT Raw architecture as an evaluation vehicle for the former, and a cluster of Pentium 3 processors interconnected with a high-speed network as an evaluation vehicle for the latter. To achieve good performance on these parallel targets, the compiler includes phases for work estimation, load balancing, layout, and communication scheduling. The load balancing stage utilizes a novel dynamic-programming algorithm that can be extended to consider a range of hierarchical cost functions. When targeting a 16-tile Raw machine, the compiler achieves an average speedup of 16× compared to a 1-tile Raw machine, and an average speedup of 9× compared to a Pentium 3. The compiler yields similar performance gains on the Pentium 3 cluster.
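The linearity property behind the first thrust above can be made concrete with a small sketch. The Python below is only an illustration written for this summary, not StreamIt code and not the compiler's actual analysis; the function names `fir` and `combine` are hypothetical. It shows one kind of algebraic simplification that linearity permits: two FIR filters connected in a pipeline compute the same outputs as a single FIR filter whose weights are derived from the two originals.

```python
def fir(weights, samples):
    """Sliding-window FIR filter: output[i] = sum_k weights[k] * samples[i + k].

    Every output is a linear combination of the inputs (an affine
    combination with a zero constant term).
    """
    n = len(weights)
    return [
        sum(w * samples[i + k] for k, w in enumerate(weights))
        for i in range(len(samples) - n + 1)
    ]


def combine(first, second):
    """Weights of one FIR filter equivalent to running `first`, then `second`.

    Substituting the first filter's definition into the second gives
    combined[m] = sum over all j + k == m of first[k] * second[j].
    """
    combined = [0] * (len(first) + len(second) - 1)
    for j, b in enumerate(second):
        for k, a in enumerate(first):
            combined[j + k] += a * b
    return combined


if __name__ == "__main__":
    a, b = [1, 2], [3, 1]
    x = [1, 0, 2, 5, 4]
    two_stage = fir(b, fir(a, x))      # pipeline of two filters
    one_stage = fir(combine(a, b), x)  # single collapsed filter
    print(two_stage, one_stage)
```

Collapsing the pair removes the intermediate stream between the two filters. Whether the combined filter is actually cheaper depends on the filter sizes, which is presumably why a compiler would apply such rewrites selectively.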

Third, we will present several cache-aware optimizations to improve instruction and data locality, and improve register allocation and scheduling freedom [7]. The optimizations are founded upon a simple and intuitive model that quantifies the temporal locality of a streaming program. The cache-aware optimizations in the StreamIt compiler yield a 249% average speedup (over unoptimized code) for our streaming benchmark suite on a StrongARM 1110 processor. The optimizations also yield a 154% speedup on a Pentium 3 and a 152% speedup on an Itanium 2.
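To give a feel for why scheduling affects locality in a stream program, the following toy model may help. It is an assumption made for illustration and is not the locality model of [7]: it assumes a pipeline in which every filter consumes and produces one item per firing, and it treats each change of filter as a potential instruction-cache reload. The names `scaled_schedule` and `code_switches` are hypothetical.

```python
def scaled_schedule(filters, steady_iterations, scale):
    """Firing order when each filter runs up to `scale` times in a row.

    Assumes a pipeline where every filter pops one item and pushes one
    item per firing, so running a batch of the upstream filter first
    always leaves enough input for the same batch downstream.
    """
    order = []
    done = 0
    while done < steady_iterations:
        batch = min(scale, steady_iterations - done)
        for name in filters:
            order.extend([name] * batch)
        done += batch
    return order


def code_switches(order):
    """Count how often consecutive firings belong to different filters."""
    return sum(1 for prev, cur in zip(order, order[1:]) if prev != cur)


if __name__ == "__main__":
    for scale in (1, 2, 4):
        order = scaled_schedule(["A", "B", "C"], 4, scale)
        # Up to `scale` items are buffered between neighbours at once.
        print(scale, "".join(order), code_switches(order))
```

In this toy model a larger batch means fewer switches between filters (better instruction locality) but more items buffered between neighbouring filters (a larger data footprint), so the two kinds of locality pull in opposite directions.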

REFERENCES

[1] H. Abelson and G. Sussman. Structure and Interpretation of Computer Programs. MIT Press, 1985.

[2] S. Agarwal. Linear state-space analysis and optimization of StreamIt programs. Master's thesis, MIT CSAIL, August 2004.

[3] M. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In ASPLOS, 2002.

[4] K. Kuo, R. Rabbah, and S. Amarasinghe. A productive programming environment for stream computing. In Workshop on Productivity and Performance in High-End Computing, San Francisco, CA, Feb 2005.

[5] A. A. Lamb, W. Thies, and S. Amarasinghe. Linear analysis and optimization of stream programs. In ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.

[6] S. Rixner, W. J. Dally, U. J. Kapasi, B. Khailany, A. Lopez-Lagunas, P. R. Mattson, and J. D. Owens. A Bandwidth-Efficient Architecture for Media Processing. In HPCA, Dallas, TX, November 1998.

[7] J. Sermulins, W. Thies, R. Rabbah, and S. Amarasinghe. Cache aware optimization of stream programs. In Languages, Compilers, and Tools for Embedded Systems, Chicago, June 2005.

[8] R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.

[9] W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In Proc. of the Int. Conf. on Compiler Construction (CC), 2002.

[10] W. Thies, M. Karczmarek, J. Sermulins, R. Rabbah, and S. Amarasinghe. Teleport messaging for distributed stream programs. In Symposium on Principles and Practice of Parallel Programming, Chicago, Illinois, June 2005.

[11] E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal. Baring it all to software: Raw machines. IEEE Computer, 30(9):86–93, 1997.
