L1 Cache and TLB Enhancements to the RAMpage Memory Hierarchy
Published: 2021-06-06
Abstract. The RAMpage hierarchy moves main memory up a level to replace the lowest-level cache by an equivalent-sized SRAM main memory, with a TLB caching page translations for that main memory. This paper illustrates how more aggressive components higher
main memory, 6.25% of the memory is mapped by the TLB. If the TLB has 512 entries, the TLB maps 50% of the memory. By comparison, with a 128B page, a 64-entry TLB only maps about 0.2% of the memory, and a big increase in the size of the TLB is likely to have a significant effect.
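The coverage figures above follow from a simple ratio: TLB entries times page size, divided by the size of the memory being mapped. The sketch below reproduces them under two assumptions that are not stated in this excerpt but are consistent with the quoted percentages: a 4 MB SRAM main memory, and a 4 KB page for the 6.25% and 50% cases. The function name `tlb_coverage` is illustrative only.

```python
KB = 1024
MB = 1024 * KB

# Assumption: a 4 MB SRAM main memory, inferred from the quoted percentages.
SRAM_BYTES = 4 * MB


def tlb_coverage(entries: int, page_bytes: int, memory_bytes: int) -> float:
    """Fraction of memory mapped by a TLB in which each entry maps one page."""
    return min(1.0, entries * page_bytes / memory_bytes)


# (entries, page size in bytes); the 4 KB page size is an assumption.
cases = [(64, 4 * KB), (512, 4 * KB), (64, 128)]

for entries, page_bytes in cases:
    pct = 100 * tlb_coverage(entries, page_bytes, SRAM_BYTES)
    print(f"{entries:>3} entries x {page_bytes:>4} B pages: {pct:.2f}% mapped")
```

With these assumed sizes, the three cases come out at 6.25%, 50% and roughly 0.2%, which matches the text and shows why small SRAM pages leave so much room for a larger TLB to help.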
The effect on a conventional architecture of increasing TLB size is not as significant because it maps DRAM pages (fixed at 4KB), not SRAM pages. Further, variation across L2 block sizes should not be related to TLB size.
4 Results
This section presents results of simulations, with some discussion. The main focus is on differences introduced by changes over previous simulations, but some advantages of RAMpage, as previously described, should be evident again from these new results. Presentation of results is broken down into effects of increasing L1 cache size, and effects of increasing TLB size, since these improvements have very different effects on the hierarchies modelled. Results are presented for 3 cases: the conventional 2-level cache with a DRAM main memory, and RAMpage with and without context switches on misses.
The remainder of this section presents the effects of L1 changes, then the effects of TLB changes, followed by a summary.
4.1 Increasing L1 Size
Fig. 1 shows how miss rates of the L1 instruction and data caches vary as their size increases for both RAMpage with context switches on misses and the standard hierarchy. (RAMpage without switches on misses follows the same trend as the standard hierarchy.) As cache sizes increase, the miss rate decreases, initially fairly rapidly. The trend is similar for all models.
Execution times are plotted in fig. 2, normalised to the best execution time at each CPU speed. As expected, larger caches decrease execution times by reducing capacity misses, as evident from the reduced miss rates, with limits to the benefits as L1 scales up. The best overall effect is from the combination of RAMpage with context switches on misses and increasing the size of L1. The fastest variation achieves a speedup of 10.7 over the slowest configuration. Comparing a given hierarchy's slowest (1GHz, 32KB L1) and fastest case (8GHz, 256KB total L1) results in a speedup of 6.12 for the conventional hierarchy, 6.5 for RAMpage without switches on misses and 9.9 for switches on misses. For the slowest CPU and smallest L1, RAMpage with switches on misses has a speedup of 1.08 over the conventional hierarchy, rising to 1.74 with the fastest CPU and biggest L1. For RAMpage without switches on misses, the scaling up of improvement over the conventional hierarchy is not as strong: for the slowest CPU with least aggressive L1, RAMpage has a speedup of 1.03, as opposed to 1.11 for the fastest CPU with largest L1. So, whether by comparison with a conventional architecture or by
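The speedups quoted in this subsection can be cross-checked against each other, since a speedup is a ratio of execution times and ratios compose by multiplication. The sketch below is an illustrative consistency check, not part of the original analysis. It assumes that the overall figure of 10.7 compares RAMpage with switches on misses at its fastest point against the conventional hierarchy at its slowest point. All variable and function names are hypothetical.

```python
# Speedup of each hierarchy from its slowest case (1GHz, 32KB L1) to its
# fastest case (8GHz, 256KB total L1), as quoted in the text.
scaling = {
    "conventional": 6.12,
    "rampage_no_switch": 6.5,
    "rampage_switch": 9.9,
}

# Speedup over the conventional hierarchy at the slowest CPU and smallest L1,
# as quoted in the text.
vs_conventional_slowest = {
    "rampage_no_switch": 1.03,
    "rampage_switch": 1.08,
}


def overall_speedup(name: str) -> float:
    """Slowest conventional time divided by the fastest time of `name`."""
    return vs_conventional_slowest[name] * scaling[name]


def vs_conventional_fastest(name: str) -> float:
    """Speedup of `name` over conventional when both are at their fastest."""
    return overall_speedup(name) / scaling["conventional"]


for name in vs_conventional_slowest:
    print(
        f"{name}: overall {overall_speedup(name):.2f}, "
        f"vs conventional at fastest {vs_conventional_fastest(name):.2f}"
    )
```

Under that assumption the derived values land close to the quoted 10.7, 1.74 and 1.11. The small gaps are plausibly due to rounding in the published figures.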