The SimpleScalar tool set, version 2.0 (5)
Date: 2025-05-04
under Contract DABT63-95-C-0127 and ARPA order no. D346. The current support for this work comes from a variety of sources, to all of which we are indebted.
Horwitz et al. [3] formally described an optimal algorithm that includes writes; however, only MIN is implemented in the simulator.
We have included the Cheetah engine as a stand-alone library, which is built and resides in the libcheetah/ directory. sim-cheetah accepts the following command-line arguments, in addition to those listed at the beginning of Section 4:

  -refs [inst | data | unified]   specify which reference stream to analyze.
  -C [fa | sa | dm]               fully associative, set associative, or direct-mapped cache.
  -R [lru | opt]                  replacement policy.
  -a <sets>                       log base 2 minimum bound on number of sets to simulate simultaneously.
  -b <sets>                       log base 2 maximum bound on set number.
  -l <line>                       cache line size (in bytes).
  -n <assoc>                      maximum associativity to analyze (in log base 2).
  -in <interval>                  cache size interval to report when simulating fully associative caches.
  -M <size>                       maximum cache size of interest.
  -C <size>                       cache size for direct-mapped analyses.

  -pcstat <stat>                  where <stat> is the integer counter that you wish to profile by text address.

To generate the statistics for the profile, follow this example:

  sim-profile -pcstat sim_num_insn test-math >&! test-math.out
  objdump -dl test-math >! test-math.dis
  textprof.pl test-math.dis test-math.out sim_num_insn_by_pc
We show a segment of the text profile output in Figure 4. Make sure that “objdump” is the version created when compiling the binutils. Also, the first line of textprof.pl must be changed to reflect your system’s path to Perl (which must be installed on your system for you to use this script). As an aside, note that “-taddrprof” is equivalent to “-pcstat sim_num_insn”.
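To make the idea of profiling an integer counter by text address concrete, the following is a minimal sketch, not the code of the SimpleScalar stats package. It assumes that each time an instruction executes, the change in the counter since the previous sample is charged to the current program counter. The names pc_profile, pc_profile_sample, and PROFILE_SLOTS, and the text base and instruction size used in the usage note, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PROFILE_SLOTS 1024          /* hypothetical number of profiled instructions */

/* Hypothetical per-address profile of one integer counter. */
struct pc_profile {
  uint32_t text_base;               /* assumed start of the text segment */
  uint32_t insn_bytes;              /* assumed fixed instruction size */
  uint64_t last_value;              /* counter value at the previous sample */
  uint64_t by_pc[PROFILE_SLOTS];    /* counter increments charged to each address */
};

/* Charge the counter's growth since the last sample to the instruction at pc. */
void pc_profile_sample(struct pc_profile *p, uint32_t pc, uint64_t counter_now)
{
  assert(pc >= p->text_base);
  uint32_t slot = (pc - p->text_base) / p->insn_bytes;
  assert(slot < PROFILE_SLOTS);
  p->by_pc[slot] += counter_now - p->last_value;
  p->last_value = counter_now;
}
```

With an instruction counter, every sample adds one to the slot of the executing instruction, which yields an execution profile by text address. With a different counter (for example, a user-registered miss counter), the same routine attributes the events to the instructions that caused them.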
4.4 Out-of-order processor timing simulation
The most complicated and detailed simulator in the distribution, by far, is sim-outorder (the main code file for which is sim-outorder.c, about 3500 lines long). This simulator supports out-of-order issue and execution, based on the Register Update Unit [5]. The RUU scheme uses a reorder buffer to automatically rename registers and hold the results of pending instructions. Each cycle the reorder buffer retires completed instructions in program order to the architected register file.
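The in-order retirement described above can be illustrated with a small circular reorder buffer. This is a simplified sketch under stated assumptions, not the data structure used in sim-outorder.c: it omits renaming and dependence tracking and shows only that results may complete out of order while the register file is updated in program order. All names (rob, rob_alloc, rob_complete, rob_retire, ROB_SIZE) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ROB_SIZE 8                  /* hypothetical buffer capacity */

struct rob_entry {
  int dest_reg;                     /* architected destination register */
  int value;                        /* result held until retirement */
  bool completed;                   /* set when the result is written back */
};

struct rob {
  struct rob_entry e[ROB_SIZE];
  int head;                         /* oldest instruction in flight */
  int tail;                         /* next free slot */
  int count;
};

/* Allocate an entry in program order; returns its index, or -1 if full. */
int rob_alloc(struct rob *r, int dest_reg)
{
  if (r->count == ROB_SIZE)
    return -1;
  int idx = r->tail;
  r->e[idx].dest_reg = dest_reg;
  r->e[idx].value = 0;
  r->e[idx].completed = false;
  r->tail = (r->tail + 1) % ROB_SIZE;
  r->count++;
  return idx;
}

/* Record a result; entries may complete in any order. */
void rob_complete(struct rob *r, int idx, int value)
{
  assert(idx >= 0 && idx < ROB_SIZE);
  r->e[idx].value = value;
  r->e[idx].completed = true;
}

/* Retire up to width completed entries from the head, in program order. */
int rob_retire(struct rob *r, int regfile[], int width)
{
  int retired = 0;
  while (retired < width && r->count > 0 && r->e[r->head].completed) {
    regfile[r->e[r->head].dest_reg] = r->e[r->head].value;
    r->head = (r->head + 1) % ROB_SIZE;
    r->count--;
    retired++;
  }
  return retired;
}
```

If a younger instruction completes first, rob_retire returns 0 until the oldest entry is also complete, so the architected register file never reflects an instruction ahead of its predecessors.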
The processor’s memory system employs a load/store queue. Store values are placed in the queue if the store is speculative. Loads are dispatched to the memory system when the addresses of all previous stores are known. Loads may be satisfied either by the memory system or by an earlier store value residing in the queue, if their addresses match. Speculative loads may generate cache misses, but speculative TLB misses stall the pipeline until the branch condition is known.
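The load-handling rule in this paragraph can be sketched as a single lookup over the stores that precede a load in program order. This is an illustrative sketch, not the queue logic of sim-outorder.c: it assumes that all accesses have the same size and alignment, so that an address match means the whole value can be forwarded. The names store_entry, load_result, and lsq_try_load are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum load_result { LOAD_STALL, LOAD_FORWARDED, LOAD_FROM_MEMORY };

/* Hypothetical queue entry for a store that precedes the load. */
struct store_entry {
  bool addr_known;                  /* has the store address been computed? */
  uint32_t addr;
  int32_t value;
};

/* stores[0..n-1] are the earlier stores, oldest first. */
enum load_result lsq_try_load(const struct store_entry *stores, int n,
                              uint32_t load_addr, int32_t *out)
{
  int i;

  /* The load may not proceed while any earlier store address is unknown. */
  for (i = 0; i < n; i++)
    if (!stores[i].addr_known)
      return LOAD_STALL;

  /* Search youngest first so the most recent matching store wins. */
  for (i = n - 1; i >= 0; i--) {
    if (stores[i].addr == load_addr) {
      *out = stores[i].value;
      return LOAD_FORWARDED;
    }
  }
  return LOAD_FROM_MEMORY;          /* no match: the memory system supplies it */
}
```

Searching from the youngest store backwards matters when two earlier stores write the same address: the load must observe the later of the two values.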
We depict the simulated pipeline of sim-outorder in Figure 5. The main loop of the simulator, located in sim_main(), is structured as follows:
  ruu_init();
  for (;;) {
    ruu_commit();
    ruu_writeback();
    lsq_refresh();
    ruu_issue();
    ruu_dispatch();
    ruu_fetch();
  }
Both of these simulators are ideal for performing high-level cache studies that do not take access time of the caches into account (e.g., studies that are concerned only with miss rates). To measure the effect of cache organization upon the execution time of real programs, however, the timing simulator described in Section 4.4 must be used.
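As an illustration of a miss-rate-only study, the sketch below computes miss counts for every fully associative LRU cache size in one pass over a reference stream, using LRU stack distances. It is a textbook technique shown under stated assumptions, not the algorithm implemented in libcheetah, and it does not model the MIN policy. The names lru_stack, lru_access, lru_misses, and MAX_LINES are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINES 64                /* hypothetical bound on tracked cache lines */

struct lru_stack {
  unsigned long line[MAX_LINES];    /* line[0] is the most recently used */
  size_t depth;                     /* number of valid entries */
  unsigned long hits_at[MAX_LINES]; /* references found at each stack depth */
  unsigned long cold;               /* references not found in the stack */
};

/* Record one reference to a cache line and move it to the top of the stack. */
void lru_access(struct lru_stack *s, unsigned long line_addr)
{
  size_t i, pos = s->depth;

  for (i = 0; i < s->depth; i++) {
    if (s->line[i] == line_addr) {
      pos = i;
      break;
    }
  }
  if (pos < s->depth) {
    s->hits_at[pos]++;              /* hit in any cache with more than pos lines */
  } else {
    s->cold++;
    if (s->depth < MAX_LINES)
      s->depth++;
    pos = s->depth - 1;             /* new slot, or overwrite the LRU entry */
  }
  for (i = pos; i > 0; i--)         /* shift younger entries down by one */
    s->line[i] = s->line[i - 1];
  s->line[0] = line_addr;
}

/* Misses for a fully associative LRU cache holding cache_lines lines. */
unsigned long lru_misses(const struct lru_stack *s, size_t cache_lines)
{
  unsigned long m = s->cold;
  for (size_t d = cache_lines; d < MAX_LINES; d++)
    m += s->hits_at[d];
  return m;
}
```

Because a reference at stack depth d hits in every LRU cache with more than d lines, a single histogram answers the miss-count question for all sizes up to MAX_LINES, which is why such studies need no timing model.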
4.3 Profiling
The distribution comes with a functional simulator that produces voluminous and varied profile information. sim-profile can generate detailed profiles on instruction classes and addresses, text symbols, memory accesses, branches, and data segment symbols.
sim-profile takes the following command-line arguments, which toggle the various profiling features:

  -iclass      instruction class profiling (e.g., ALU, branch).
  -iprof       instruction profiling (e.g., bnez, addi).
  -brprof      branch class profiling (e.g., direct, calls, conditional).
  -amprof      addr. mode profiling (e.g., displaced, R+R).
  -segprof     load/store segment profiling (e.g., data, heap).
  -tsymprof    execution profile by text symbol (functions).
  -dsymprof    reference profile by data segment symbol.
  -taddrprof   execution profile by text address.
  -all         turn on all profiling listed above.
Three of the simulators (sim-profile, sim-cache, and sim-outorder) support text segment profiles for statistical integer counters. The supported counters include any added by users, so long as they are correctly “registered” with the SimpleScalar stats package included with the simulator code (see Section 4.5). To use the counter profiles, simply add the -pcstat command-line flag.
This loop is executed once for each target (simulated) machine cycle. By walking the pipeline in reverse, inter-stage latch synchronization can be handled correctly with only one pass through each stage. When the target program terminates with an exit() system call, the simulator performs a longjmp() to main() to generate the statistics.
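The benefit of walking the pipeline in reverse can be seen in a toy three-stage model. This is a sketch for illustration only and bears no relation to the stage functions of sim-outorder.c: each latch holds at most one instruction number, and all names (toy_pipe, toy_init, toy_cycle, and the stage functions) are hypothetical.

```c
#include <assert.h>

#define EMPTY (-1)

struct toy_pipe {
  int fetch_out;                    /* latch between fetch and execute */
  int exec_out;                     /* latch between execute and commit */
  int next_insn;                    /* next instruction number to fetch */
  int committed;                    /* instructions retired so far */
  int last_committed;               /* number of the last retired instruction */
};

void toy_init(struct toy_pipe *p)
{
  p->fetch_out = EMPTY;
  p->exec_out = EMPTY;
  p->next_insn = 0;
  p->committed = 0;
  p->last_committed = EMPTY;
}

void toy_commit(struct toy_pipe *p)
{
  if (p->exec_out != EMPTY) {
    p->last_committed = p->exec_out;
    p->committed++;
    p->exec_out = EMPTY;            /* frees the latch for the stage upstream */
  }
}

void toy_execute(struct toy_pipe *p)
{
  if (p->fetch_out != EMPTY && p->exec_out == EMPTY) {
    p->exec_out = p->fetch_out;
    p->fetch_out = EMPTY;
  }
}

void toy_fetch(struct toy_pipe *p)
{
  if (p->fetch_out == EMPTY)
    p->fetch_out = p->next_insn++;
}

/* One simulated cycle: the last stage runs first, the first stage runs last. */
void toy_cycle(struct toy_pipe *p)
{
  toy_commit(p);
  toy_execute(p);
  toy_fetch(p);
}
```

Each stage drains its input latch before the upstream stage refills it, so an instruction advances exactly one stage per call to toy_cycle and no second copy of the latches is needed. Calling the stages in forward order instead would let a newly fetched instruction pass through every stage within a single simulated cycle.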
The fetch stage of the pipeline is implemented in ruu_fetch(). The fetch unit models the machine instruction bandwidth, and takes the following inputs: the program counter, the predictor state, and misprediction detection from the branch execution unit(s). Each cycle, it fetches instructions from only one I-cache line (and it blocks on an I-cache miss until the miss