FTB Item and Complete Prediction Result Interface

The basic prediction unit of BPU is the branch prediction block, but what is the structure of a branch prediction block? Why can a branch prediction block guide the execution of so many subsequent instructions? This section will describe the core data structure for generating branch prediction blocks—the FTB item, and the complete prediction result structure generated from the FTB item.

FTB item

The FTB item is the core data structure of branch prediction blocks in Xiangshan. It stores the information needed to generate a branch prediction block. When BPU performs predictions, the initial branch prediction block is first generated from a read-out FTB item. Then, this branch prediction block is passed to subsequent predictors, which read and modify the information to generate the final prediction result.

Therefore, to understand the structure of a branch prediction block, we first need to understand the structure of an FTB item. An FTB item corresponds to a branch prediction block, and its general structure is as follows:

First, it is necessary to clarify one piece of information: whether it is a branch prediction block or an FTB item, the number of instructions they can contain is set to a specific limit (16 RVC instructions in the current version of Xiangshan), called the maximum prediction length. This means that if we need to record the position of an instruction within the branch prediction block, we can use a fixed-length bit vector to specify the offset of this instruction relative to the starting address of the prediction block.

The determinant of the branch prediction block’s execution process is the information about the branch instructions. The other instructions are considered ordinary instructions and do not affect the program’s execution flow. Therefore, in a branch prediction block, we only need to record the positions of the branch instructions, while the positions of ordinary instructions are not our concern.

Therefore, the FTB item defines two types of branch instruction slotsbrSlots and tailSlot, used to store the branch instructions within the branch prediction block. In the current version of Xiangshan, brSlots contains only one slot, while tailSlot is a separate slot, totaling two slots.

Within the instructions of the maximum prediction length, if a branch instruction appears, the FTB item will record it in the corresponding slot and mark the slot as valid. If too many branch instructions appear, reaching the capacity limit of the FTB item, the excess branch instructions will be handed over to the next FTB item for storage. In each slot, we record the offset of a branch instruction relative to the starting address of the prediction block and information such as its jump target address.

The Unique tailSlot

In RISC-V, branch instructions are mainly divided into two types: conditional branches and unconditional jumps. Therefore, for a branch prediction block, it will contain at most one unconditional jump instruction. Because once this instruction is executed, the program’s execution flow will change, and subsequent instructions will no longer be executed. Hence, we define a type of slot called tailSlot specifically for storing this unconditional jump instruction. As for conditional branch instructions, they are stored in brSlots.

As its name suggests, tailSlot is located in the last slot of the entire prediction block. This is also because once the unconditional jump instruction is filled, the program will definitely jump, and subsequent instructions will be handled by other prediction blocks, so we do not need to care about the subsequent instructions. However, among the instructions before the unconditional jump instruction, we need to care about whether there are conditional branch instructions, because conditional branch instructions may or may not jump. Therefore, we need to record the relevant information of the conditional branch instructions.

tailSlot Sharing

Consider a situation: if no unconditional jump instructions appear from the starting PC of the prediction block to the maximum prediction length, but two conditional branch instructions appear instead, the tailSlot will be idle, and the second conditional branch instruction cannot be stored, causing space waste.

To solve this problem, Xiangshan adopts a method of setting a sharing mark. We can directly store the second branch instruction into the tailSlot and set the sharing mark to true, indicating that the second conditional branch instruction shares the tailSlot of the unconditional jump instruction. This way, the space of the tailSlot is effectively utilized.

The isCall, isRet, and isJalr fields in the prediction block serve the tailSlot. If the tailSlot records an unconditional jump instruction, these fields will further indicate the type of the jump instruction. There is also a field in the FTB item called always_taken, which records whether each conditional branch instruction stored in each slot is always predicted to jump. If so, subsequent predictors can directly adopt this prediction result.

Through the FTB item, we can know the instruction situation in a branch prediction block, including the position and type of branch instructions. This information will be handed over to subsequent predictors, which will predict more accurate jump targets, whether to jump, and other information.

Complete Structure of FTB Item (FTBEntry)

Interface Definition: src/main/scala/xiangshan/frontend/FTB.scala

This section describes the complete structural definition of the FTB item, containing the following signals:

  • valid: Whether the FTB entry is valid.
    • Interface Type: Bool
  • brSlots: Conditional branch instruction slots.
    • Interface Type: Vec(numBrSlot, new FtbSlot(BR_OFFSET_LEN)) (see FtbSlot for interface list)
  • tailSlot: Unconditional jump instruction slot.
    • Interface Type: new FtbSlot(JMP_OFFSET_LEN, Some(BR_OFFSET_LEN)) (see FtbSlot for interface list)
  • pftAddr: Unconditional jump instruction slot
    • Interface Type: UInt(log2Up(PredictWidth).W)
  • carry: Unconditional jump instruction slot.
    • Interface Type: Bool()
  • isCall: The instruction in the unconditional jump instruction slot is a call instruction.
    • Interface Type: Bool()
  • isRet: The instruction in the unconditional jump instruction slot is a ret instruction.
    • Interface Type: Bool()
  • isJalr: The instruction in the unconditional jump instruction slot is a jalr instruction.
    • Interface Type: Bool()
  • last_may_be_rvi_call: The instruction in the unconditional jump instruction slot may be an RVI type call instruction signal.
    • Interface Type: Bool()
  • always_taken: Whether each branch instruction in the prediction block is always predicted as Taken.
    • Interface Type: Vec(numBr, Bool())

Explanation: pftAddr and carry

Here, pftAddr stands for Partial Fallthrough Address. Fallthrough Address means that if there is no jump in the prediction block, the program will sequentially execute to the address reached. In other words, if the offset of an unconditional jump instruction is 5, then the offset corresponding to the Fallthrough Address is 6. This signal is mainly used to get the return address of the program after a function call, and this concept can be understood as the end address of the prediction block.

Partial means part of the address, which is determined by the address representation method. Here, the address representation method is as follows:

  pc: | ... |<-- log(predictWidth) -->|<-- log(instBytes) -->|
           ^                         ^
           |                         |
           carryPos                  instOffsetBits

pftAddr only records the middle offset part (the part with a length of log(predictWidth)), and a complete PC can be generated by combining it with the current PC. However, a carry might occur, so a carry bit is recorded separately. carryPos is the position in the instruction address within the prediction block where a carry might occur.

Additionally, last_may_be_rvi_call is an auxiliary signal for this address, indicating that the instruction in the unconditional jump instruction slot is an RVI type call instruction. Since pftAddr assumes the instruction length as the compressed instruction length by default when calculating, the end address is increased by only 2 bytes. If the actual call instruction is not a compressed instruction, it will lead to an incorrect return address calculation. RAS will correct this error based on this signal.

Branch Prediction Slot (FTBSlot)

Interface Definition: src/main/scala/xiangshan/frontend/FTB.scala

This interface defines the slot in the FTB entry:

  • offset: The offset of the instruction in the slot relative to the start address of the prediction block.
    • Interface Type: UInt(log2Ceil(PredictWidth).W)
  • lower: The lower bits of the jump target address.
    • Interface Type: UInt(offsetLen.W)
    • Note: The lower is set to 12 or 20 bits because the addressing capability of branch instructions is 12 bits, while the addressing capability of jump instructions is 20 bits.
  • tarStat: Whether the high bits of the PC after the jump are incremented or decremented.
    • Interface Type: UInt(TAR_STAT_SZ.W) (TAR_STAT_SZ = 2)
    • Note: The jump target address is calculated from the high bits of the current PC, the tarStat, and the lower field. The lower field directly stores the lower bits of the jump target address. The high bits of the current PC are adjusted according to the tarStat, then concatenated with the lower to get the actual jump target address. The tarStat can take three values: 0 - no increment or decrement, 1 - increment, 2 - decrement.
  • sharing: Indicates that a conditional branch instruction is stored in an unconditional jump instruction slot.
    • Interface Type: Bool()
  • valid: Indicates whether the slot is valid.
    • Interface Type: Bool()

Full Branch Prediction

  • Interface Definition: src/main/scala/xiangshan/frontend/FrontendBundle.scala

This interface defines the complete branch prediction results, included in the prediction results of each pipeline stage.

The full branch prediction result interface is similar to the FTB entry and is initially generated from the FTB entry. Two slots are split into individual signals: slot_valids, targets, offsets, is_br_sharing, etc. Additionally, several fields are added such as br_taken_mask, jalr_target to facilitate the provision of precise prediction results by subsequent predictors. The hit indicates whether an FTB entry is hit, i.e., the PC in the current prediction round indexed to an FTB entry.

Full Interface List:

  • hit: Indicates whether the FTB entry is hit.
  • Interface Type: Bool()
  • slot_valids: Indicates whether each slot in the FTB entry is valid.
  • Interface Type: Vec(totalSlot, Bool())
  • targets: The jump target address corresponding to each slot.
  • Interface Type: Vec(totalSlot, UInt(VAddrBits.W))
  • offsets: The offset of the instruction in each slot relative to the start address of the prediction block.
  • Interface Type: Vec(totalSlot, UInt(log2Ceil(PredictWidth).W))
  • fallThroughAddr: The end address of the prediction block.
  • Interface Type: UInt(VAddrBits.W)
  • fallThroughErr: Indicates that the pftAddr recorded in the FTB entry is incorrect.
  • Interface Type: Bool()
  • is_jal: Indicates that the prediction block contains a jal instruction.
  • Interface Type: Bool()
  • is_jalr: Indicates that the prediction block contains a jalr instruction.
  • Interface Type: Bool()
  • is_call: Indicates that the prediction block contains a call instruction.
  • Interface Type: Bool()
  • is_ret: Indicates that the prediction block contains a ret instruction.
  • Interface Type: Bool()
  • last_may_be_rvi_call: Indicates that the instruction in the unconditional jump instruction slot might be an RVI type call instruction.
  • Interface Type: Bool()
  • is_br_sharing: Indicates that the last slot (tailSlot) stores a conditional branch instruction.
  • Interface Type: Bool()
  • br_taken_mask: The branch prediction result, with each bit corresponding to a branch (slot), indicating whether the branch is predicted to be taken.
  • Interface Type: Vec(numBr, Bool())
  • jalr_target: The jump target of the jalr instruction in the prediction block.
  • Interface Type: UInt(VAddrBits.W)
Last modified September 13, 2024: Update the picture of BPU Top. (431c050)