L1 Cache and TLB Enhancements to the RAMpage Memory Hierarch(5)
发布时间:2021-06-06
发布时间:2021-06-06
Abstract. The RAMpage hierarchy moves main memory up a level to replace the lowest-level cache by an equivalent-sized SRAM main memory, with a TLB caching page translations for that main memory. This paper illustrates how more aggressive components higher
3ExperimentalApproach
Thissectionoutlinestheapproachtothereportedexperiments.Thesimula-tionstrategyisexplained,followedbysomedetailofsimulationparameters;inconclusion,expected ndingsarediscussed.
3.1SimulationStrategy
Arangeofvariationsonastandard2-levelhierarchyiscomparedtosimilarvariationsonRAMpage,withandwithoutcontextswitchesonmisses.RAMpagewithoutcontextswitchesonmissestoconveysthee ectsofaddingassociativity(withanoperatingsystem-stylereplacementstrategy).AddingcontextswitchesonmissesshowsthevalueofalternativeworkonamisstoDRAM.Simulationsaretrace-driven,anddonotmodelthepipeline.ProcessorspeedisinGHz,representinginstructionissueratewithoutmisses,notclockspeed.
Ignoringthepipelinelevelneglectse ectslikebranchesandthepotentialforotherimprovementslikenon-blockingcaches.However,theresultsbeinglookedforherearerelativelylargeimprovements,soinaccuraciesofthiskindareunlikelytobesigni cant.Whatisimportantisthee ectastheCPU-DRAMspeedgapincreases,andthesimulationisofsu cientaccuracytocapturesuche ects,ashasbeendemonstratedinpreviouswork.
3.2SimulationParameters
Parametersaresimilartopreviouspublishedworktomakeresultscomparable.ThefollowingparametersarecommonacrossRAMpageandtheconventionalhierarchy.ThisrepresentsthebaselinebeforenewL1andTLBvariations:–L1cache–16Kbyteseachofdataandinstructioncache,physicallytaggedandindexed,direct-mapped,32-byteblocksize,1-cyclereadhit,12-cyclepenaltyformissestoL2(orRAMpageSRAMmainmemory);fordatacache:perfectwritebu ering(zeroe ectivehittime),writeback(12-cyclepenalty;9cyclesforRAMpage:noL2tagtoupdate),writeallocateonmiss–TLB–64entries,fullyassociative,randomreplacement
–DRAMlevel–DirectRambus[8]withoutpipelining:50nsbefore rstrefer-encestarted,thereafter2bytesevery1.25ns–pagingofDRAM–invertedpagetable:sameorganizationasRAMpagemainmemoryforsimplicity,in niteDRAMwithnomissestodisk–TLBandL1datahitsfullypipelined–onlytimeforL1dorTLBreplace-mentsormaintaininginclusioncostedas“hits”
Thesamememorytimingisusedasinearliersimulations.AlthoughfasterDRAMhassincebecomeavailable,thetimingcanbeseenasrelativetoapar-ticularCPU-DRAMspeedgap,andthe gurescanaccordinglyberescaled.Contextswitchesaremodelledbyinterleavingatraceoftext-bookcode.Acontextswitchistakenevery500,000references,thoughRAMpagewithcontextswitchesonmissesalsotakesacontextswitchonamisstoDRAM.TLBmissesarehandledbyatraceofpagetablelookupcode,withvariationsontimeforalookupbasedonprobablevariationsinprobesintoaninvertedpagetable[22].
上一篇:多讲话者声学网络中的助听系统
下一篇:保安辞职信 辞职报告通用范本