L1 Cache and TLB Enhancements to the RAMpage Memory Hierarchy
Abstract. The RAMpage hierarchy moves main memory up a level to replace the lowest-level cache by an equivalent-sized SRAM main memory, with a TLB caching page translations for that main memory. This paper illustrates how more aggressive components higher
Since the mid-1980s, CPU speeds have improved at a rate of 50-100% per year, while DRAM latency has only improved at around 7% per year [12]. If predictions of the memory wall [30] are correct, DRAM latency will become a serious limiting factor in performance improvement. Attempts at working around the memory wall are becoming increasingly common [9], but the fundamental underlying DRAM and CPU latency trends continue [27].
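The cited growth rates compound, which is what turns the gap into a wall rather than a constant overhead. The small C program below is only an illustrative calculation, not taken from the paper: it assumes 60% per year for CPU speed (the midpoint of the cited 50-100% range) against the cited 7% per year for DRAM.

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    const double cpu_rate  = 0.60;  /* assumed midpoint of the cited 50-100% per year */
    const double dram_rate = 0.07;  /* ~7% per year, as cited */
    for (int years = 0; years <= 15; years += 5) {
        double cpu  = pow(1.0 + cpu_rate, years);
        double dram = pow(1.0 + dram_rate, years);
        /* The ratio cpu/dram is the factor by which the speed gap has widened. */
        printf("after %2d years: CPU x%9.1f, DRAM x%5.2f, gap x%9.1f\n",
               years, cpu, dram, cpu / dram);
    }
    return 0;
}
```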
2.2 The RAMpage Approach
RAMpage is based on the notion that DRAM, while still orders of magnitude faster than disk, is increasingly starting to display one attribute of a peripheral: there is time to do other work while waiting for it [24], particularly if relatively large units are moved between the DRAM and SRAM levels. In RAMpage, the lowest-level cache is managed as the main memory (i.e., as a paged, virtually-addressed memory), with disk as a secondary paging device. The RAMpage main memory page table is inverted, to minimize its size. An inverted page table has another benefit: no TLB miss can result in a DRAM reference, unless the reference causing the TLB lookup is not in any of the SRAM layers [22].
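As a rough illustration of why the inverted organisation keeps the page table small and SRAM-resident, the C sketch below shows one conventional way an inverted page table lookup can be structured: one entry per SRAM page frame, chained through a hash anchor table. The sizes and names (ipt_entry, ipt_lookup, NUM_FRAMES) are assumptions for illustration, not the paper's implementation.

```c
#include <stdint.h>

#define NUM_FRAMES 4096              /* assumed number of SRAM page frames */

struct ipt_entry {
    uint64_t vpn;                    /* virtual page number held by this frame */
    int      asid;                   /* owning address-space id */
    int      next;                   /* next frame in the hash chain, or -1 */
    int      valid;
};

static struct ipt_entry ipt[NUM_FRAMES];
static int hash_anchor[NUM_FRAMES]; /* hash bucket -> first frame in its chain */

void ipt_init(void) {
    for (int i = 0; i < NUM_FRAMES; i++) {
        hash_anchor[i] = -1;
        ipt[i].valid = 0;
    }
}

/* Returns the SRAM frame holding (asid, vpn), or -1 on a miss.  The table has
 * one entry per frame, so its size tracks the SRAM main memory rather than the
 * virtual address space, and the walk itself never needs to touch DRAM. */
int ipt_lookup(int asid, uint64_t vpn) {
    int frame = hash_anchor[(vpn ^ (uint64_t)asid) % NUM_FRAMES];
    while (frame != -1) {
        if (ipt[frame].valid && ipt[frame].vpn == vpn && ipt[frame].asid == asid)
            return frame;            /* hit: physical frame found in SRAM */
        frame = ipt[frame].next;
    }
    return -1;                       /* miss: the software pager takes over */
}
```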
RAMpage is intended to have the following advantages:
– fast hits – a hit physically addresses an SRAM memory
– full associativity – full associativity through paging avoids the slower hits of hardware full associativity
– software-managed paging – replacement can be as sophisticated as needed
– TLB misses to DRAM minimized – as explained above
– pinning in SRAM – critical OS data and code can be pinned in SRAM
– hardware simplicity – the complexity of a cache controller is removed from the lowest level of SRAM
– context switches on misses to DRAM – the CPU can be kept busy
These advantages come at the cost of slower misses because of software miss-handling, and the need to make operating system changes. However, the latter problem could be avoided by adding hardware support for the model.
The RAMpage approach has in the past been shown to scale well in the face of the growing CPU-DRAM speed gap, particularly with context switches on misses. The effect of context switches on misses is that, provided there is work available for the CPU, waiting for DRAM can effectively be eliminated [21]. Context switches on misses have the most significant effect.
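The following C sketch shows the general shape of a software miss handler that implements context switches on misses to DRAM; the helper routines (allocate_sram_frame, start_dram_to_sram_copy, pick_ready_process, switch_to) are hypothetical placeholders, not the paper's or any particular operating system's interfaces.

```c
#include <stdint.h>

struct process;                          /* details do not matter for the sketch */

/* Hypothetical hooks, declared only so the sketch reads as a whole. */
extern int  allocate_sram_frame(void);   /* may invoke the replacement policy */
extern void start_dram_to_sram_copy(uint64_t vpn, int frame);
extern struct process *pick_ready_process(void);
extern void switch_to(struct process *next);

/* Invoked when a reference misses every SRAM level.  Copying a relatively
 * large page from DRAM takes many cycles, so the handler does not spin on the
 * transfer: it starts the copy and hands the CPU to other ready work. */
void rampage_miss_handler(uint64_t vpn) {
    int frame = allocate_sram_frame();   /* software-managed replacement */
    start_dram_to_sram_copy(vpn, frame); /* asynchronous block transfer */
    struct process *next = pick_ready_process();
    if (next)
        switch_to(next);                 /* hide the DRAM latency with other work */
    /* otherwise there is no ready work and the CPU must wait for the copy */
}
```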
2.3 Alternatives
Approaches to addressing the memory wall can loosely (with some overlaps) be grouped into latency tolerance and miss reduction. Some approaches to latency tolerance include prefetch, critical word first, memory compression, write buffering, non-blocking caches, and simultaneous multithreading (SMT).
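As a concrete instance of one of these latency-tolerance techniques, the fragment below sketches software prefetch using the GCC/Clang __builtin_prefetch intrinsic; the prefetch distance of 16 elements is an arbitrary illustrative choice, not a value from the paper.

```c
#include <stddef.h>

/* Sum an array, requesting data 16 elements ahead of use so the memory fetch
 * overlaps with the additions already in flight. */
long sum_with_prefetch(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0 /* read */, 1 /* modest locality */);
        sum += a[i];
    }
    return sum;
}
```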