ITTAGE Branch Predictor

Function Introduction

For ordinary conditional branch instructions, the predictor only needs to predict the direction: taken or not taken. For indirect jumps, such as call/jump instructions, it must also predict where to jump to (Target). ITTAGE (Indirect Target TAGE) extends TAGE so that it can predict jump target addresses.

The main difference between ITTAGE and TAGE is that the T0 and Tn tables additionally store Target PC data. During prediction, ITTAGE selects the Target from the matching entry with the longest history as the prediction result, and uses a 2-bit saturating counter to decide whether to output this result or fall back to an alternate prediction result. For details of the TAGE predictor, see TAGE-SC Branch Predictor.

Kunming Lake ITTAGE Branch Predictor

In the Kunming Lake BPU, several predictors are cascaded to produce a prediction, so the implementation of each sub-predictor differs from the original predictor, mainly in its default prediction result.

Basic Functionality

ITTAGE’s basic functionality is similar to the TAGE branch predictor, but with the following differences:

  1. Each entry gains a Target field that holds the predicted jump target address.
  2. The saturating counter ctr no longer provides the prediction direction; it only decides whether the entry's result is output (that is, it expresses prediction confidence).
  3. Since there is only one indirect jump instruction in each branch prediction block, ITTAGE only considers one instruction.

Pipeline

ITTAGE contains three pipeline stages: the first stage calculates the index, the second stage uses the index to read the result from the SRAM table, and the third stage processes what was read and decides whether to output it.

  1. Cycle 0, s0: Input of the first pipeline stage, generally pc and folded history.

Operation of the first pipeline stage: Calculate the index. Output through registers to s1.

  2. Cycle 1, s1: Input of the second pipeline stage, the index and other data calculated in the first stage.

Operation of the second pipeline stage: Access SRAM, read prediction information. Output through registers to s2.

  3. Cycle 2, s2: Input of the third pipeline stage, the original prediction information read from SRAM in the second stage.

Operation of the third pipeline stage: Process the original prediction information, decide whether to output the prediction result.

  4. Cycle 3, s3: Prediction result ready, the prediction result can now be used.

Data Structure

In the Kunming Lake implementation, the table structure of T0 and Tn is as follows:

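| Predictor | Function | Entry composition | Number of entries |
| --- | --- | --- | --- |
| Base predictor T0 | Outputs the prediction result when the results of all other predictors are invalid | Virtual table; does not exist. The prediction result of the upstream predictor FTB is used directly as the entry result | Virtual table; does not exist. The FTB result is used directly as the indexed result |
| Prediction tables T1-T2 | For each prediction block input, all Tn tables make a prediction. Among all valid results, the one with the longest history is selected as the original prediction information. The history length is determined by the input H | target: 41 bits; valid: 1 bit; tag: 9 bits; ctr: 2 bits; us: 1 bit (usefulness counter) | 256 entries |
| Prediction tables T3-T5 | Same as T1-T2 | Same as T1-T2 | 512 entries |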
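
As a rough illustration of the entry layout listed in the table, the following Python sketch models one Tn entry and the table sizes. The class name `IttageEntry` and the helper functions are hypothetical names chosen for this example; only the field widths and entry counts come from the table.

```python
from dataclasses import dataclass

# Field widths taken from the table above.
TARGET_BITS = 41
VALID_BITS = 1
TAG_BITS = 9
CTR_BITS = 2
US_BITS = 1


@dataclass
class IttageEntry:
    """Hypothetical software model of one Tn entry."""
    valid: bool = False
    tag: int = 0      # 9-bit partial tag
    ctr: int = 0      # 2-bit confidence counter
    us: int = 0       # 1-bit usefulness
    target: int = 0   # 41-bit jump target

    @staticmethod
    def bit_width() -> int:
        return TARGET_BITS + VALID_BITS + TAG_BITS + CTR_BITS + US_BITS


# Entry counts from the table: T1-T2 hold 256 entries, T3-T5 hold 512.
TABLE_ENTRIES = {1: 256, 2: 256, 3: 512, 4: 512, 5: 512}


def make_tables():
    """Build empty T1..T5 tables (T0 is virtual and has no storage)."""
    return {n: [IttageEntry() for _ in range(size)]
            for n, size in TABLE_ENTRIES.items()}


def total_storage_bits() -> int:
    """Raw entry storage implied by the widths and sizes above."""
    return sum(size * IttageEntry.bit_width() for size in TABLE_ENTRIES.values())
```

Under these figures each entry occupies 54 bits, which gives a quick way to estimate the storage of the five tables; any extra SRAM overhead in the real design is not modeled here.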

T0 and Tn Table Retrieval Method

The retrieval method is the same as in the TAGE branch predictor; only the configuration of each table differs.

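| Table | FH length | FH1 length | FH2 length | Recent history length (number of bits used from GH) |
| --- | --- | --- | --- | --- |
| T1 | 4 bits | 4 bits | 4 bits | Low 4 bits: the most recent 4 bits of history are folded into FH, FH1, and FH2 |
| T2 | 8 bits | 8 bits | 8 bits | Low 8 bits: the most recent 8 bits of history are folded into FH, FH1, and FH2 |
| T3 | 9 bits | 9 bits | 8 bits | Low 13 bits: the most recent 13 bits of history are folded into FH, FH1, and FH2 |
| T4 | 9 bits | 9 bits | 8 bits | Low 16 bits: the most recent 16 bits of history are folded into FH, FH1, and FH2 |
| T5 | 9 bits | 9 bits | 8 bits | Low 32 bits: the most recent 32 bits of history are folded into FH, FH1, and FH2 |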

The rest of the process (computation method and formulas) is similar to the TAGE-SC branch predictor.
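
To make the folding configuration more concrete, here is a minimal Python sketch. It assumes the common TAGE-style fold, in which the most recent bits of global history are cut into chunks of the folded width and XORed together; the exact formula is the one in the TAGE-SC documentation, so treat this as an illustration only. `FOLD_CONFIG`, `fold_history`, and `folded_histories` are hypothetical names.

```python
# (recent history length, FH bits, FH1 bits, FH2 bits), copied from the table above.
FOLD_CONFIG = {
    "T1": (4, 4, 4, 4),
    "T2": (8, 8, 8, 8),
    "T3": (13, 9, 9, 8),
    "T4": (16, 9, 9, 8),
    "T5": (32, 9, 9, 8),
}


def fold_history(ghist: int, hist_len: int, folded_len: int) -> int:
    """XOR-fold the newest hist_len bits of ghist into folded_len bits.

    Assumption: plain chunk-wise XOR folding, not taken from the RTL.
    """
    recent = ghist & ((1 << hist_len) - 1)   # keep only the newest bits
    mask = (1 << folded_len) - 1
    folded = 0
    while recent:
        folded ^= recent & mask              # XOR in the next chunk
        recent >>= folded_len
    return folded


def folded_histories(ghist: int, table: str):
    """Return (FH, FH1, FH2) for one table under the assumed fold."""
    hist_len, fh, fh1, fh2 = FOLD_CONFIG[table]
    return tuple(fold_history(ghist, hist_len, n) for n in (fh, fh1, fh2))
```

For T1 and T2 the history is no longer than the folded width, so under this assumption the fold simply returns the history bits unchanged; folding only takes effect from T3 upward.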

Alternate Predictor

When the prediction given by a Tn table has insufficient “prediction confidence,” the predictor falls back to the result of an “alternate predictor.” This process is similar to TAGE; for details, see the corresponding part of the TAGE documentation. Unlike TAGE, ITTAGE’s ctr does not give the prediction direction; it only determines whether to output the result (prediction confidence). A ctr of 2'b00 is treated as weak confidence, in which case the alternate prediction result is chosen:

  1. If multiple tables are hit, output the Target from the second-longest history table entry.
  2. Otherwise, output the T0 Target (FTB Target).

Prediction Process

The prediction process is similar to TAGE, but ITTAGE has an additional step to decide whether to output the prediction result based on ctr. The specific process is as follows:

  1. When the ctr of the ITTAGE table entry is not 2'b00, output Target.
  2. When the ctr of the ITTAGE table entry is 2'b00, output the alternate prediction result:
    1. If there is a second-longest history (the second table is also hit), output the Target of the second-longest.
    2. Otherwise, output the FTB Target.
  3. When the ITTAGE table entry is not hit, output the T0 Target (FTB Target).
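
The selection steps above can be sketched in Python as follows. The `Hit` record and `select_target` function are hypothetical names for this example, and the sketch assumes that a higher table number means a longer history, as in the T1 to T5 configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """A tag-matching entry read from table Tn (hypothetical model)."""
    table: int    # 1..5; assumed: larger number = longer history
    ctr: int      # 2-bit confidence counter
    target: int   # stored jump target


def select_target(hits: List[Hit], ftb_target: int) -> int:
    """Pick the output target from the hitting entries and the FTB target."""
    if not hits:
        return ftb_target                 # step 3: no hit, use T0 (FTB)
    ordered = sorted(hits, key=lambda h: h.table, reverse=True)
    provider = ordered[0]                 # longest-history hit
    if provider.ctr != 0b00:
        return provider.target            # step 1: confident provider
    if len(ordered) > 1:
        return ordered[1].target          # step 2.1: second-longest hit
    return ftb_target                     # step 2.2: fall back to FTB
```

Note that, following the steps as written, the second-longest entry is used without checking its own ctr.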

Training Process

This process is similar to TAGE, with the following differences:

  1. Table entry updates (original prediction data):
    1. ctr:
      1. If the predicted address matches the actual address, increment the ctr counter of the corresponding provider table entry by 1.
      2. If the predicted address does not match the actual address, decrement the ctr counter of the corresponding provider table entry by 1.
      3. ITTAGE uses ctr to decide whether to adopt the jump target of this prediction. If multiple tables are hit and the ctr of the longest history table is 0, the alternate prediction logic (the second-longest history table or T0) is adopted. An update always updates the longest history table, and also updates the alternate prediction table if the alternate prediction was adopted.
    2. target:
      1. When the ctr of the table entry to be updated is 0 during this prediction, directly store the actual final jump result in the target, overwriting it.
      2. When applying for a new table entry, directly store the actual final jump result in the target.
      3. Otherwise, do not modify the target.
    3. usefulness:
      1. When the provider’s prediction is correct but the alternate prediction is incorrect, set the provider’s usefulness to 1.
      2. If the alternate prediction has weak confidence and is correct, set the provider’s usefulness to 1. If the alternate prediction has weak confidence and is incorrect, set the provider’s usefulness to 0.
    4. New table entry:
      1. Each time a confident prediction from the longest history table is incorrect (that is, the error is not caused by using the alternate prediction), try to randomly allocate an entry in a table with a longer history. An entry can be allocated only if the usefulness of the corresponding entry is 0.
      2. If the usefulness of every corresponding entry in the longer history tables is nonzero, the allocation fails.
  2. Reset useful bit:
    1. Each time a prediction error occurs and a new table entry is requested, increment tickCtr (an 8-bit saturating counter used to reset all usefulness) if the allocation fails, and decrement tickCtr if it succeeds.
    2. When tickCtr reaches its maximum value, set all usefulness in ITTAGE to 0 and reset tickCtr to 0.
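
The ctr, target, allocation, and tickCtr rules above can be approximated with the following Python sketch. It is a simplified model under stated assumptions, not the RTL: `Entry`, `update_provider`, `allocatable_tables`, and `UsefulReset` are hypothetical names, the caller is assumed to supply the address that was predicted, and the usefulness update and the random choice among candidate tables are left out.

```python
from dataclasses import dataclass
from typing import Dict, List

CTR_MAX = 3     # 2-bit saturating confidence counter
TICK_MAX = 255  # 8-bit saturating tickCtr


@dataclass
class Entry:
    ctr: int = 0
    target: int = 0
    us: int = 0


def update_provider(entry: Entry, predicted: int, actual: int) -> None:
    """Apply the ctr and target rules to the provider entry."""
    ctr_at_prediction = entry.ctr
    if predicted == actual:
        entry.ctr = min(CTR_MAX, entry.ctr + 1)
    else:
        entry.ctr = max(0, entry.ctr - 1)
    # The target is overwritten only if ctr was 0 at prediction time.
    if ctr_at_prediction == 0:
        entry.target = actual


def allocatable_tables(provider_table: int, us_by_table: Dict[int, int]) -> List[int]:
    """Longer-history tables whose indexed entry has usefulness 0."""
    return [t for t, us in sorted(us_by_table.items())
            if t > provider_table and us == 0]


class UsefulReset:
    """tickCtr bookkeeping for the global usefulness reset."""

    def __init__(self) -> None:
        self.tick = 0

    def on_allocation(self, success: bool, entries: List[Entry]) -> bool:
        """Update tickCtr; return True if all usefulness bits were cleared."""
        if success:
            self.tick = max(0, self.tick - 1)
        else:
            self.tick = min(TICK_MAX, self.tick + 1)
        if self.tick == TICK_MAX:
            for e in entries:
                e.us = 0
            self.tick = 0
            return True
        return False
```

An empty list from `allocatable_tables` corresponds to a failed allocation, which is the case that pushes tickCtr toward the reset threshold.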

Interface List

Direction Width Signal Remarks
input clock
input reset
input [40:0] io_in_bits_s0_pc_3 PC used for prediction
input [7:0] io_in_bits_folded_hist_3_hist_14_folded_hist T2 folded history
input [8:0] io_in_bits_folded_hist_3_hist_13_folded_hist T3 folded history
input [3:0] io_in_bits_folded_hist_3_hist_12_folded_hist T1 folded history
input [8:0] io_in_bits_folded_hist_3_hist_10_folded_hist T5 folded history
input [8:0] io_in_bits_folded_hist_3_hist_6_folded_hist T4 folded history
input [7:0] io_in_bits_folded_hist_3_hist_4_folded_hist T3 folded history
input [7:0] io_in_bits_folded_hist_3_hist_3_folded_hist T5 folded history
input [7:0] io_in_bits_folded_hist_3_hist_2_folded_hist T4 folded history
input [40:0] io_in_bits_resp_in_0_s3_full_pred_0_jalr_target
input [40:0] io_in_bits_resp_in_0_s3_full_pred_1_jalr_target
input [40:0] io_in_bits_resp_in_0_s3_full_pred_2_jalr_target
input [40:0] io_in_bits_resp_in_0_s3_full_pred_3_jalr_target
output [40:0] io_out_s3_full_pred_0_jalr_target
output [40:0] io_out_s3_full_pred_1_jalr_target
output [40:0] io_out_s3_full_pred_2_jalr_target
output [40:0] io_out_s3_full_pred_3_jalr_target
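output [222:0] io_out_last_stage_meta Only bits [100:0] are valid; they carry ITTAGE's meta information
input io_s0_fire_3 s0 stage enable signal
input io_s1_fire_3 s1 stage enable signal
input io_s2_fire_0 s2 stage enable signal; the copies below are the same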
input io_s2_fire_1
input io_s2_fire_2
input io_s2_fire_3
input io_update_valid Whether to perform an update
input [40:0] io_update_bits_pc pc (index) of the prediction block to update
input [7:0] io_update_bits_spec_info_folded_hist_hist_14_folded_hist T2 history passed in at update time
input [8:0] io_update_bits_spec_info_folded_hist_hist_13_folded_hist T3 history passed in at update time
input [3:0] io_update_bits_spec_info_folded_hist_hist_12_folded_hist T1 history passed in at update time
input [8:0] io_update_bits_spec_info_folded_hist_hist_10_folded_hist T5 history passed in at update time
input [8:0] io_update_bits_spec_info_folded_hist_hist_6_folded_hist T4 history passed in at update time
input [7:0] io_update_bits_spec_info_folded_hist_hist_4_folded_hist T3 history passed in at update time
input [7:0] io_update_bits_spec_info_folded_hist_hist_3_folded_hist T5 history passed in at update time
input [7:0] io_update_bits_spec_info_folded_hist_hist_2_folded_hist T4 history passed in at update time
input [3:0] io_update_bits_ftb_entry_tailSlot_offset offset of the FTB entry to update
input io_update_bits_ftb_entry_tailSlot_sharing Whether the FTB entry to update is a conditional branch
input io_update_bits_ftb_entry_tailSlot_valid Whether the tailSlot to update is enabled
input io_update_bits_ftb_entry_isRet Whether tailSlot is a Ret instruction
input io_update_bits_ftb_entry_isJalr Whether tailSlot is a Jalr instruction
input io_update_bits_cfi_idx_valid valid signal for the index of the control-flow instruction within the prediction block
input [3:0] io_update_bits_cfi_idx_bits Index of the control-flow instruction within the prediction block
input io_update_bits_jmp_taken The unconditional jump instruction in the prediction block was taken
input io_update_bits_mispred_mask_2 Whether the prediction was wrong
input [222:0] io_update_bits_meta Bits [222:25] of the meta information output at prediction time, i.e. {25'h0, _ubtb_io_out_last_stage_meta[5:0], _tage_io_out_last_stage_meta[87:0], _ftb_io_out_last_stage_meta[2:0], _ittage_io_out_last_stage_meta[100:0]}
input [40:0] io_update_bits_full_target Jump target of the prediction block (start address of the next prediction block)
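
The concatenation given in the remark for io_update_bits_meta can be unpacked with a short Python sketch. It assumes the usual Verilog convention that the leftmost element of a concatenation is the most significant, so the ITTAGE part sits in the low 101 bits. `META_FIELDS` and `split_update_meta` are hypothetical names for this example.

```python
# (name, width) pairs listed from the least significant end, i.e. the
# reverse of the order in which the concatenation is written.
META_FIELDS = [
    ("ittage", 101),
    ("ftb", 3),
    ("tage", 88),
    ("ubtb", 6),
    ("zero_pad", 25),
]


def split_update_meta(meta: int) -> dict:
    """Split the 223-bit update meta word into per-predictor fields."""
    out = {}
    shift = 0
    for name, width in META_FIELDS:
        out[name] = (meta >> shift) & ((1 << width) - 1)
        shift += width
    return out


# The widths add up to the 223-bit port width.
TOTAL_META_BITS = sum(width for _, width in META_FIELDS)
```

Only the `ittage` field is consumed by ITTAGE itself; the other fields belong to the sub-predictors named in the concatenation.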

Pass-through Signals Without Functional Impact

These signals are only passed through and do not affect ITTAGE's behavior.
Direction Width Signal Remarks
input io_in_bits_resp_in_0_s2_full_pred_0_br_taken_mask_0 Input from FTB; passed through unchanged to the output, including jalr_target
input io_in_bits_resp_in_0_s2_full_pred_0_br_taken_mask_1
input io_in_bits_resp_in_0_s2_full_pred_0_slot_valids_0
input io_in_bits_resp_in_0_s2_full_pred_0_slot_valids_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_0_targets_0
input [40:0] io_in_bits_resp_in_0_s2_full_pred_0_targets_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_0_jalr_target
input [3:0] io_in_bits_resp_in_0_s2_full_pred_0_offsets_0
input [3:0] io_in_bits_resp_in_0_s2_full_pred_0_offsets_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_0_fallThroughAddr
input io_in_bits_resp_in_0_s2_full_pred_0_is_br_sharing
input io_in_bits_resp_in_0_s2_full_pred_0_hit
input io_in_bits_resp_in_0_s2_full_pred_1_br_taken_mask_0
input io_in_bits_resp_in_0_s2_full_pred_1_br_taken_mask_1
input io_in_bits_resp_in_0_s2_full_pred_1_slot_valids_0
input io_in_bits_resp_in_0_s2_full_pred_1_slot_valids_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_1_targets_0
input [40:0] io_in_bits_resp_in_0_s2_full_pred_1_targets_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_1_jalr_target
input [3:0] io_in_bits_resp_in_0_s2_full_pred_1_offsets_0
input [3:0] io_in_bits_resp_in_0_s2_full_pred_1_offsets_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_1_fallThroughAddr
input io_in_bits_resp_in_0_s2_full_pred_1_is_br_sharing
input io_in_bits_resp_in_0_s2_full_pred_1_hit
input io_in_bits_resp_in_0_s2_full_pred_2_br_taken_mask_0
input io_in_bits_resp_in_0_s2_full_pred_2_br_taken_mask_1
input io_in_bits_resp_in_0_s2_full_pred_2_slot_valids_0
input io_in_bits_resp_in_0_s2_full_pred_2_slot_valids_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_2_targets_0
input [40:0] io_in_bits_resp_in_0_s2_full_pred_2_targets_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_2_jalr_target
input [3:0] io_in_bits_resp_in_0_s2_full_pred_2_offsets_0
input [3:0] io_in_bits_resp_in_0_s2_full_pred_2_offsets_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_2_fallThroughAddr
input io_in_bits_resp_in_0_s2_full_pred_2_is_jalr Information used by the RAS module; passed through
input io_in_bits_resp_in_0_s2_full_pred_2_is_call
input io_in_bits_resp_in_0_s2_full_pred_2_is_ret
input io_in_bits_resp_in_0_s2_full_pred_2_last_may_be_rvi_call
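input io_in_bits_resp_in_0_s2_full_pred_2_is_br_sharing Input from FTB; passed through unchanged to the output, including jalr_target. fallThroughErr indicates that the pftAddr recorded in the FTB entry is wrong. It is generated by checking whether the prediction block end address represented by pftAddr is greater than the start address of the prediction block; if it is smaller, an error has occurred and this signal is asserted. This can happen when the pc indexes the wrong FTB entry. FTQ uses this signal; it is unrelated to ITTAGE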
input io_in_bits_resp_in_0_s2_full_pred_2_hit
input io_in_bits_resp_in_0_s2_full_pred_3_br_taken_mask_0
input io_in_bits_resp_in_0_s2_full_pred_3_br_taken_mask_1
input io_in_bits_resp_in_0_s2_full_pred_3_slot_valids_0
input io_in_bits_resp_in_0_s2_full_pred_3_slot_valids_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_3_targets_0
input [40:0] io_in_bits_resp_in_0_s2_full_pred_3_targets_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_3_jalr_target
input [3:0] io_in_bits_resp_in_0_s2_full_pred_3_offsets_0
input [3:0] io_in_bits_resp_in_0_s2_full_pred_3_offsets_1
input [40:0] io_in_bits_resp_in_0_s2_full_pred_3_fallThroughAddr
input io_in_bits_resp_in_0_s2_full_pred_3_fallThroughErr
input io_in_bits_resp_in_0_s2_full_pred_3_is_br_sharing
input io_in_bits_resp_in_0_s2_full_pred_3_hit
input io_in_bits_resp_in_0_s3_full_pred_0_br_taken_mask_0 All passed through, except jalr_target, which may be modified
input io_in_bits_resp_in_0_s3_full_pred_0_br_taken_mask_1
input io_in_bits_resp_in_0_s3_full_pred_0_slot_valids_0
input io_in_bits_resp_in_0_s3_full_pred_0_slot_valids_1
input [40:0] io_in_bits_resp_in_0_s3_full_pred_0_targets_0
input [40:0] io_in_bits_resp_in_0_s3_full_pred_0_targets_1
input [40:0] io_in_bits_resp_in_0_s3_full_pred_0_fallThroughAddr
input io_in_bits_resp_in_0_s3_full_pred_0_fallThroughErr
input io_in_bits_resp_in_0_s3_full_pred_0_is_br_sharing
input io_in_bits_resp_in_0_s3_full_pred_0_hit
input io_in_bits_resp_in_0_s3_full_pred_1_br_taken_mask_0 Same as above
input io_in_bits_resp_in_0_s3_full_pred_1_br_taken_mask_1
input io_in_bits_resp_in_0_s3_full_pred_1_slot_valids_0
input io_in_bits_resp_in_0_s3_full_pred_1_slot_valids_1
input [40:0] io_in_bits_resp_in_0_s3_full_pred_1_targets_0
input [40:0] io_in_bits_resp_in_0_s3_full_pred_1_targets_1
input [40:0] io_in_bits_resp_in_0_s3_full_pred_1_fallThroughAddr
input io_in_bits_resp_in_0_s3_full_pred_1_fallThroughErr
input io_in_bits_resp_in_0_s3_full_pred_1_is_br_sharing
input io_in_bits_resp_in_0_s3_full_pred_1_hit
input io_in_bits_resp_in_0_s3_full_pred_2_br_taken_mask_0 Same as above
input io_in_bits_resp_in_0_s3_full_pred_2_br_taken_mask_1
input io_in_bits_resp_in_0_s3_full_pred_2_slot_valids_0
input io_in_bits_resp_in_0_s3_full_pred_2_slot_valids_1
input [40:0] io_in_bits_resp_in_0_s3_full_pred_2_targets_0
input [40:0] io_in_bits_resp_in_0_s3_full_pred_2_targets_1
input [40:0] io_in_bits_resp_in_0_s3_full_pred_2_fallThroughAddr
input io_in_bits_resp_in_0_s3_full_pred_2_fallThroughErr
input io_in_bits_resp_in_0_s3_full_pred_2_is_jalr
input io_in_bits_resp_in_0_s3_full_pred_2_is_call
input io_in_bits_resp_in_0_s3_full_pred_2_is_ret
input io_in_bits_resp_in_0_s3_full_pred_2_is_br_sharing
input io_in_bits_resp_in_0_s3_full_pred_2_hit
input io_in_bits_resp_in_0_s3_full_pred_3_br_taken_mask_0 Same as above
input io_in_bits_resp_in_0_s3_full_pred_3_br_taken_mask_1
input io_in_bits_resp_in_0_s3_full_pred_3_slot_valids_0
input io_in_bits_resp_in_0_s3_full_pred_3_slot_valids_1
input [40:0] io_in_bits_resp_in_0_s3_full_pred_3_targets_0
input [40:0] io_in_bits_resp_in_0_s3_full_pred_3_targets_1
input [3:0] io_in_bits_resp_in_0_s3_full_pred_3_offsets_0
input [3:0] io_in_bits_resp_in_0_s3_full_pred_3_offsets_1
input [40:0] io_in_bits_resp_in_0_s3_full_pred_3_fallThroughAddr
input io_in_bits_resp_in_0_s3_full_pred_3_fallThroughErr
input io_in_bits_resp_in_0_s3_full_pred_3_is_br_sharing
input io_in_bits_resp_in_0_s3_full_pred_3_hit
input io_in_bits_resp_in_0_last_stage_ftb_entry_valid Passed through to the output unmodified; the source is FTB
input [3:0] io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_offset
input [11:0] io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_lower
input [1:0] io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_tarStat
input io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_sharing
input io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_valid
input [3:0] io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_offset
input [19:0] io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_lower
input [1:0] io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_tarStat
input io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_sharing
input io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_valid
input [3:0] io_in_bits_resp_in_0_last_stage_ftb_entry_pftAddr
input io_in_bits_resp_in_0_last_stage_ftb_entry_carry
input io_in_bits_resp_in_0_last_stage_ftb_entry_isCall
input io_in_bits_resp_in_0_last_stage_ftb_entry_isRet
input io_in_bits_resp_in_0_last_stage_ftb_entry_isJalr
input io_in_bits_resp_in_0_last_stage_ftb_entry_last_may_be_rvi_call
input io_in_bits_resp_in_0_last_stage_ftb_entry_always_taken_0
input io_in_bits_resp_in_0_last_stage_ftb_entry_always_taken_1
output io_out_s2_full_pred_0_br_taken_mask_0 Input value passed through unchanged; input prefix: io_in_bits_resp_in_
output io_out_s2_full_pred_0_br_taken_mask_1
output io_out_s2_full_pred_0_slot_valids_0
output io_out_s2_full_pred_0_slot_valids_1
output [40:0] io_out_s2_full_pred_0_targets_0
output [40:0] io_out_s2_full_pred_0_targets_1
output [40:0] io_out_s2_full_pred_0_jalr_target
output [3:0] io_out_s2_full_pred_0_offsets_0
output [3:0] io_out_s2_full_pred_0_offsets_1
output [40:0] io_out_s2_full_pred_0_fallThroughAddr
output io_out_s2_full_pred_0_is_br_sharing
output io_out_s2_full_pred_0_hit
output io_out_s2_full_pred_1_br_taken_mask_0
output io_out_s2_full_pred_1_br_taken_mask_1
output io_out_s2_full_pred_1_slot_valids_0
output io_out_s2_full_pred_1_slot_valids_1
output [40:0] io_out_s2_full_pred_1_targets_0
output [40:0] io_out_s2_full_pred_1_targets_1
output [40:0] io_out_s2_full_pred_1_jalr_target
output [3:0] io_out_s2_full_pred_1_offsets_0
output [3:0] io_out_s2_full_pred_1_offsets_1
output [40:0] io_out_s2_full_pred_1_fallThroughAddr
output io_out_s2_full_pred_1_is_br_sharing
output io_out_s2_full_pred_1_hit
output io_out_s2_full_pred_2_br_taken_mask_0
output io_out_s2_full_pred_2_br_taken_mask_1
output io_out_s2_full_pred_2_slot_valids_0
output io_out_s2_full_pred_2_slot_valids_1
output [40:0] io_out_s2_full_pred_2_targets_0
output [40:0] io_out_s2_full_pred_2_targets_1
output [40:0] io_out_s2_full_pred_2_jalr_target
output [3:0] io_out_s2_full_pred_2_offsets_0
output [3:0] io_out_s2_full_pred_2_offsets_1
output [40:0] io_out_s2_full_pred_2_fallThroughAddr
output io_out_s2_full_pred_2_is_jalr
output io_out_s2_full_pred_2_is_call
output io_out_s2_full_pred_2_is_ret
output io_out_s2_full_pred_2_last_may_be_rvi_call
output io_out_s2_full_pred_2_is_br_sharing
output io_out_s2_full_pred_2_hit
output io_out_s2_full_pred_3_br_taken_mask_0
output io_out_s2_full_pred_3_br_taken_mask_1
output io_out_s2_full_pred_3_slot_valids_0
output io_out_s2_full_pred_3_slot_valids_1
output [40:0] io_out_s2_full_pred_3_targets_0
output [40:0] io_out_s2_full_pred_3_targets_1
output [40:0] io_out_s2_full_pred_3_jalr_target
output [3:0] io_out_s2_full_pred_3_offsets_0
output [3:0] io_out_s2_full_pred_3_offsets_1
output [40:0] io_out_s2_full_pred_3_fallThroughAddr
output io_out_s2_full_pred_3_fallThroughErr
output io_out_s2_full_pred_3_is_br_sharing
output io_out_s2_full_pred_3_hit
output io_out_s3_full_pred_0_br_taken_mask_0 See the input with the corresponding prefix
output io_out_s3_full_pred_0_br_taken_mask_1
output io_out_s3_full_pred_0_slot_valids_0
output io_out_s3_full_pred_0_slot_valids_1
output [40:0] io_out_s3_full_pred_0_targets_0
output [40:0] io_out_s3_full_pred_0_targets_1
output [40:0] io_out_s3_full_pred_0_fallThroughAddr
output io_out_s3_full_pred_0_fallThroughErr
output io_out_s3_full_pred_0_is_br_sharing
output io_out_s3_full_pred_0_hit
output io_out_s3_full_pred_1_br_taken_mask_0 See the input with the corresponding prefix
output io_out_s3_full_pred_1_br_taken_mask_1
output io_out_s3_full_pred_1_slot_valids_0
output io_out_s3_full_pred_1_slot_valids_1
output [40:0] io_out_s3_full_pred_1_targets_0
output [40:0] io_out_s3_full_pred_1_targets_1
output [40:0] io_out_s3_full_pred_1_fallThroughAddr
output io_out_s3_full_pred_1_fallThroughErr
output io_out_s3_full_pred_1_is_br_sharing
output io_out_s3_full_pred_1_hit
output io_out_s3_full_pred_2_br_taken_mask_0 See the input with the corresponding prefix
output io_out_s3_full_pred_2_br_taken_mask_1
output io_out_s3_full_pred_2_slot_valids_0
output io_out_s3_full_pred_2_slot_valids_1
output [40:0] io_out_s3_full_pred_2_targets_0
output [40:0] io_out_s3_full_pred_2_targets_1
output [40:0] io_out_s3_full_pred_2_fallThroughAddr
output io_out_s3_full_pred_2_fallThroughErr
output io_out_s3_full_pred_2_is_jalr
output io_out_s3_full_pred_2_is_call
output io_out_s3_full_pred_2_is_ret
output io_out_s3_full_pred_2_is_br_sharing
output io_out_s3_full_pred_2_hit
output io_out_s3_full_pred_3_br_taken_mask_0 See the input with the corresponding prefix
output io_out_s3_full_pred_3_br_taken_mask_1
output io_out_s3_full_pred_3_slot_valids_0
output io_out_s3_full_pred_3_slot_valids_1
output [40:0] io_out_s3_full_pred_3_targets_0
output [40:0] io_out_s3_full_pred_3_targets_1
output [3:0] io_out_s3_full_pred_3_offsets_0
output [3:0] io_out_s3_full_pred_3_offsets_1
output [40:0] io_out_s3_full_pred_3_fallThroughAddr
output io_out_s3_full_pred_3_fallThroughErr
output io_out_s3_full_pred_3_is_br_sharing
output io_out_s3_full_pred_3_hit
output io_out_last_stage_ftb_entry_valid Input value passed through unchanged
output [3:0] io_out_last_stage_ftb_entry_brSlots_0_offset
output [11:0] io_out_last_stage_ftb_entry_brSlots_0_lower
output [1:0] io_out_last_stage_ftb_entry_brSlots_0_tarStat
output io_out_last_stage_ftb_entry_brSlots_0_sharing
output io_out_last_stage_ftb_entry_brSlots_0_valid
output [3:0] io_out_last_stage_ftb_entry_tailSlot_offset
output [19:0] io_out_last_stage_ftb_entry_tailSlot_lower
output [1:0] io_out_last_stage_ftb_entry_tailSlot_tarStat
output io_out_last_stage_ftb_entry_tailSlot_sharing
output io_out_last_stage_ftb_entry_tailSlot_valid
output [3:0] io_out_last_stage_ftb_entry_pftAddr
output io_out_last_stage_ftb_entry_carry
output io_out_last_stage_ftb_entry_isCall
output io_out_last_stage_ftb_entry_isRet
output io_out_last_stage_ftb_entry_isJalr
output io_out_last_stage_ftb_entry_last_may_be_rvi_call
output io_out_last_stage_ftb_entry_always_taken_0
output io_out_last_stage_ftb_entry_always_taken_1

For the other meta information, see the documentation of the corresponding sub-predictor:

_ubtb_io_out_last_stage_meta

_tage_io_out_last_stage_meta

_ftb_io_out_last_stage_meta

ittage_io_out_last_stage_meta[100:0]

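| Bits | Signal | Remarks |
| --- | --- | --- |
| 100 | s3_provided | Whether a result is available |
| [99:97] | s3_provider | Table entry that provides the result |
| 96 | s3_altProvided | Whether an alternate prediction entry is available |
| [95:93] | s3_altProvider | Table entry that provides the alternate prediction |
| 92 | resp_meta_altDiffers | Whether the alternate prediction has weak confidence (FTB does not count) |
| 91 | s3_providerU | useful bit of the main prediction |
| [90:89] | s3_providerCtr | Confidence given by the main prediction |
| [88:87] | s3_altProviderCtr | Confidence given by the alternate prediction |
| 86 | resp_meta_allocate_valid_r | A free entry is available for allocation |
| [85:83] | resp_meta_allocate_bits_r | Which table the entry is allocated in |
| 82 | s3_tageTaken_dup_3 | Always true when FTB is not used; also true when FTB is used |
| [81:41] | s3_providerTarget | Jump address given by the main prediction |
| [40:0] | s3_altProviderTarget | Jump address given by the alternate prediction |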
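
As an illustration of the bit layout in the table above, the following Python sketch decodes a 101-bit ITTAGE meta value into named fields, for example when checking waveforms or writing a reference model. `ITTAGE_META_LAYOUT` and `decode_ittage_meta` are hypothetical names; the bit positions are copied from the table.

```python
# (signal name, high bit, low bit), copied from the table above.
ITTAGE_META_LAYOUT = [
    ("s3_provided", 100, 100),
    ("s3_provider", 99, 97),
    ("s3_altProvided", 96, 96),
    ("s3_altProvider", 95, 93),
    ("resp_meta_altDiffers", 92, 92),
    ("s3_providerU", 91, 91),
    ("s3_providerCtr", 90, 89),
    ("s3_altProviderCtr", 88, 87),
    ("resp_meta_allocate_valid_r", 86, 86),
    ("resp_meta_allocate_bits_r", 85, 83),
    ("s3_tageTaken_dup_3", 82, 82),
    ("s3_providerTarget", 81, 41),
    ("s3_altProviderTarget", 40, 0),
]


def decode_ittage_meta(meta: int) -> dict:
    """Extract every field of the 101-bit ITTAGE meta word."""
    fields = {}
    for name, hi, lo in ITTAGE_META_LAYOUT:
        width = hi - lo + 1
        fields[name] = (meta >> lo) & ((1 << width) - 1)
    return fields


def layout_width() -> int:
    """Total number of bits covered by the layout."""
    return sum(hi - lo + 1 for _, hi, lo in ITTAGE_META_LAYOUT)
```

The field widths sum to 101, matching the [100:0] range of the ITTAGE portion of the meta output.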
Last modified September 13, 2024: Update the picture of BPU Top. (431c050)