The SimpleScalar Tool Set, Version 2.0(6)

Date: 2025-07-11

under Contract DABT63-95-C-0127 and ARPA order no. D346. The current support for this work comes from a variety of sources, to all of which we are indebted.


00401a10: ( 13, 0.01): <strtod+220> addiu $a1[5],$zero[0],1
strtod.c:79
00401a18: ( 13, 0.01): <strtod+228> bc1f 00401a30 <strtod+240>
strtod.c:87
00401a20: : <strtod+230> addiu $s1[17],$s1[17],1
00401a28: : <strtod+238> j 00401a58 <strtod+268>
strtod.c:89
00401a30: ( 13, 0.01): <strtod+240> mul.d $f2,$f20,$f4
00401a38: ( 13, 0.01): <strtod+248> addiu $v0[2],$v1[3],-48
00401a40: ( 13, 0.01): <strtod+250> mtc1 $v0[2],$f0

Figure 4. Sample output from text segment statistical profile
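Lines in a listing like this one can be picked apart mechanically. The following is a minimal Python sketch, assuming the layout visible in the sample: an address, an optional execution count and percentage in parentheses, a symbol plus hexadecimal offset, and the disassembled instruction. The reading of the two statistics fields, the dictionary keys, and the helper name parse_profile_line are assumptions made for illustration and are not part of the tool set.

```python
import re

# Layout inferred only from the sample in Figure 4 (an assumption):
#   <address>: ( <count>, <percent>): <symbol+offset> <instruction>
# Instructions with no statistics have an empty field between the colons.
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{8}):\s*"
    r"(?:\(\s*(?P<count>\d+),\s*(?P<pct>\d+\.\d+)\))?\s*:\s*"
    r"<(?P<sym>\w+)\+(?P<off>[0-9a-f]+)>\s+(?P<inst>.*)$"
)

def parse_profile_line(line):
    """Return a dict for an instruction line, or None for any other line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    count = m.group("count")
    return {
        "addr": int(m.group("addr"), 16),
        "count": int(count) if count is not None else 0,
        "percent": float(m.group("pct")) if count is not None else 0.0,
        "symbol": m.group("sym"),
        "offset": int(m.group("off"), 16),  # offsets advance by 8, so hex
        "inst": m.group("inst"),
    }
```

Source-line markers such as strtod.c:79 do not match the pattern, so the helper returns None for them and a caller can treat them as annotations for the instructions that follow.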

Figure 5. Pipeline for sim-outorder

completes). After fetching the instructions, it places them in the dispatch queue, and probes the line predictor to obtain the correct cache line to access in the next cycle.

The code for the dispatch stage of the pipeline resides in ruu_dispatch(). This routine is where instruction decoding and register renaming are performed. The function uses the instructions in the input queue filled by the fetch stage, a pointer to the active RUU, and the rename table. Once per cycle, the dispatcher takes as many instructions as possible (up to the dispatch width of the target machine) from the fetch queue and places them in the scheduler queue. This routine is the one in which branch mispredictions are noted. (When a misprediction occurs, the simulator uses speculative state buffers, which are managed with a copy-on-write policy.) The dispatch routine enters and links instructions into the RUU and the load/store queue (LSQ), and splits memory operations into two separate instructions (the addition to compute the effective address and the memory operation itself).
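The per-cycle behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in ruu_dispatch(): the dictionary fields (op, tag, is_mem), the name dispatch_cycle, and the choice to count a split memory operation once against the dispatch width are all hypothetical, and decoding, renaming, queue capacity, and misprediction handling are left out.

```python
from collections import deque

def dispatch_cycle(fetch_queue, ruu, lsq, dispatch_width):
    """Move up to dispatch_width instructions from the fetch queue
    into the RUU (and LSQ for memory operations)."""
    dispatched = 0
    while fetch_queue and dispatched < dispatch_width:
        inst = fetch_queue.popleft()
        if inst["is_mem"]:
            # A memory operation is split in two: the effective-address
            # addition goes to the RUU, the access itself to the LSQ.
            ruu.append({"op": "addr_add", "tag": inst["tag"]})
            lsq.append({"op": inst["op"], "tag": inst["tag"]})
        else:
            ruu.append({"op": inst["op"], "tag": inst["tag"]})
        dispatched += 1  # assumption: a split pair counts once
    return dispatched
```

Instructions that do not fit within the width stay in the fetch queue and are picked up by the next call, which matches the "as many as possible, once per cycle" wording.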

The issue stage of the pipeline is contained in ruu_issue() and lsq_refresh(). These routines model instruction wakeup and issue to the functional units, tracking register and memory dependences. Each cycle, the scheduling routines locate the instructions for which the register inputs are all ready. The issue of ready loads is stalled if there is an earlier store with an unresolved effective address in the load/store queue. If the address of the earlier store matches that of the waiting load, the store value is forwarded to the load. Otherwise, the load is sent to the memory system.
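The three outcomes for a ready load (stall, forward, go to memory) can be expressed as a small decision function. The Python sketch below is a simplified model, not the logic of lsq_refresh(): the entry fields (kind, addr, value), the name load_action, and the use of None for an unresolved address are hypothetical, and access sizes and store-data readiness are ignored.

```python
def load_action(lsq, load_index):
    """Decide what happens to the load at lsq[load_index].

    lsq is ordered oldest first. Returns ("stall", None),
    ("forward", value) or ("memory", None).
    """
    load = lsq[load_index]
    older_stores = [e for e in lsq[:load_index] if e["kind"] == "store"]

    # Any earlier store whose address is still unknown blocks the load.
    if any(s["addr"] is None for s in older_stores):
        return ("stall", None)

    # Otherwise the youngest earlier store to the same address
    # supplies the value directly.
    for s in reversed(older_stores):
        if s["addr"] == load["addr"]:
            return ("forward", s["value"])

    # No conflict: the load is sent to the memory system.
    return ("memory", None)
```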

The execute stage is also handled in ruu_issue(). Each cycle, the routine gets as many ready instructions as possible from the scheduler queue (up to the issue width). The functional units' availability is also checked, and if they have available access ports, the instructions are issued. Finally, the routine schedules writeback events using the latency of the functional units (memory operations probe the data cache to obtain the correct latency of the operation). Data TLB misses stall the issue of the memory operation, are serviced in the commit stage of the pipeline, and currently assume a fixed latency. The functional units' latencies are hardcoded in the definition of fu_config[] in sim-outorder.c.
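As a rough illustration of the issue step, the Python sketch below issues ready instructions while both the issue width and a free functional unit allow it, and schedules a writeback event at the current cycle plus the unit latency. The names issue_cycle, free_units, and latency, the unit classes, and any latency values passed in are hypothetical; they are not taken from fu_config[], and cache probing and TLB misses are not modeled.

```python
import heapq

def issue_cycle(ready_queue, free_units, latency, event_queue, now, issue_width):
    """Issue ready instructions and schedule their writeback events.

    ready_queue: list of {"tag": int, "fu": str}, oldest first.
    free_units:  dict mapping unit class to free ports this cycle.
    event_queue: heap of (completion_cycle, tag) tuples.
    Returns the tags issued this cycle.
    """
    issued = []
    waiting = []
    for inst in ready_queue:
        fu = inst["fu"]
        if len(issued) < issue_width and free_units.get(fu, 0) > 0:
            free_units[fu] -= 1
            # The writeback event fires after the unit's latency.
            heapq.heappush(event_queue, (now + latency[fu], inst["tag"]))
            issued.append(inst["tag"])
        else:
            waiting.append(inst)  # retried in a later cycle
    ready_queue[:] = waiting
    return issued
```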

The writeback stage resides in ruu_writeback(). Each cycle it scans the event queue for instruction completions. When it finds a completed instruction, it walks the dependence chain of instruction outputs to mark instructions that are dependent on the completed instruction. If a dependent instruction is waiting only for that completion, the routine marks it as ready to be issued. The writeback stage also detects branch mispredictions; when it determines that a branch misprediction has occurred, it rolls the state back to the checkpoint, discarding the erroneously issued instructions.
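The wakeup part of this stage amounts to decrementing a count of outstanding inputs for each consumer of a completed result. A minimal Python sketch follows, assuming a heap of (completion_cycle, tag) events, a consumers map from a producer tag to its dependent tags, and a pending_inputs counter per instruction; all of these names are hypothetical, and misprediction recovery is omitted.

```python
import heapq

def writeback_cycle(event_queue, now, consumers, pending_inputs):
    """Process completions due by cycle `now`.

    Returns the tags of instructions that became ready to issue
    because their last outstanding input just completed.
    """
    newly_ready = []
    while event_queue and event_queue[0][0] <= now:
        _, done = heapq.heappop(event_queue)
        # Walk the dependents of the completed instruction.
        for dep in consumers.get(done, []):
            pending_inputs[dep] -= 1
            if pending_inputs[dep] == 0:
                newly_ready.append(dep)
    return newly_ready
```

An instruction with two outstanding inputs is reported only after the second producer completes, which corresponds to "waiting only for that completion" in the text.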

ruu_commit() handles the instructions from the writeback stage that are ready to commit. This routine does in-order committing of instructions, updating of the data caches (or memory) with store values, and data TLB miss handling. The routine keeps retiring instructions at the head of the RUU that are ready to commit until the head instruction is one that is not ready. When an instruction is committed, its result is placed into the architected register file, and the RUU/LSQ resources devoted to that instruction are reclaimed.
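In-order commit from the head of the RUU can be sketched as a loop over a queue. The Python below is an illustration only: the entry fields (completed, dest, value) and the name commit_cycle are hypothetical, and stores, data TLB miss handling, and any per-cycle limit are not modeled.

```python
from collections import deque

def commit_cycle(ruu, regfile):
    """Retire completed instructions from the head of the RUU in order.

    ruu is a deque, oldest entry first. Stops at the first entry that
    has not completed, even if younger entries are done.
    """
    committed = 0
    while ruu and ruu[0]["completed"]:
        entry = ruu.popleft()  # reclaim the RUU slot
        if entry["dest"] is not None:
            # The result becomes architected state.
            regfile[entry["dest"]] = entry["value"]
        committed += 1
    return committed
```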

sim-outorder runs about an order of magnitude slower than sim-fast. In addition to the arguments listed at the beginning of Section 4, sim-outorder uses the following command-line arguments:

Specifying the processor core

-fetch:ifqsize <size>

set the fetch width to be <size> instructions. Must be a power of two. The default is 4.

-fetch:speed <ratio>

set the ratio of the front end speed relative to the execution core (allowing <ratio> times as many instructions to be fetched as decoded per cycle).

-fetch:mplat <cycles>

set the branch misprediction latency. The default is 3 cycles.

-decode:width <insts>

set the decode width to be <insts>, which must be a power of two. The default is 4.

-issue:width <insts>

set the maximum issue width in a given cycle. Must be a power of two. The default is 4.

-issue:inorder

force the simulator to use in-order issue. The default is false.