ITTAGE Branch Predictor
Function Introduction
For ordinary conditional branch instructions, the predictor only needs to predict whether the branch is taken or not taken. For indirect jumps, such as call/jump instructions, it must also predict where the jump goes (the Target). ITTAGE (Indirect Target TAGE) was introduced so that TAGE can predict jump addresses.
The main difference between ITTAGE and TAGE is that the T0 and Tn tables additionally store Target PC data. During prediction, ITTAGE selects the Target from the matching entry with the longest history as the prediction result, and uses a 2-bit saturating counter to decide whether to output this result or fall back to an alternate prediction. For details of the TAGE predictor, see TAGE-SC Branch Predictor.
Kunming Lake ITTAGE Branch Predictor
In the Kunming Lake BPU, multiple predictors are cascaded to produce a prediction, so each sub-predictor is implemented somewhat differently from the original standalone predictor, mainly in its default prediction result.
Basic Functionality
ITTAGE’s basic functionality is similar to the TAGE branch predictor, but with the following differences:
- Each entry gains a Target field that holds the predicted jump target address.
- The saturating counter ctr no longer provides the prediction direction; it only decides whether to output the result (it acts purely as prediction confidence).
- Since there is only one indirect jump instruction in each branch prediction block, ITTAGE only considers one instruction.
Pipeline
ITTAGE has three pipeline stages: the first calculates the index, the second uses the index to read the SRAM tables, and the third processes the data read out and decides whether to output the prediction.
- Cycle 0, s0: Input of the first pipeline stage, generally pc and folded history.
  Operation of the first pipeline stage: calculate the index. Output through registers to s1.
- Cycle 1, s1: Input of the second pipeline stage, the index and other data calculated in the first stage.
  Operation of the second pipeline stage: access SRAM and read the prediction information. Output through registers to s2.
- Cycle 2, s2: Input of the third pipeline stage, the original prediction information read from SRAM in the second stage.
  Operation of the third pipeline stage: process the original prediction information and decide whether to output the prediction result.
- Cycle 3, s3: Prediction result ready, the prediction result can now be used.
Data Structure
In the Kunming Lake implementation, the table structure of T0 and Tn is as follows:
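Predictor | Role | Entry format | Number of entries |
---|---|---|---|
Base predictor T0 | Outputs the prediction when none of the other predictors produces a valid result | Virtual table; it does not physically exist. The prediction result of the upstream predictor FTB is used directly as the entry result | Virtual table; it does not physically exist. The upstream FTB result is used directly as the indexed result |
Prediction tables T1-T2 | For each input prediction block, all Tn tables make a prediction; among all valid results, the one with the longest history is selected as the original prediction information. The history length is determined by the input H | target: 41 bits; valid: 1 bit; tag: 9 bits; ctr: 2 bits; us: 1 bit (usefulness counter) | 256 entries |
Prediction tables T3-T5 | Same as T1-T2 | Same as T1-T2 | 512 entries |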
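
As a rough sanity check on the table above, the entry layout can be modeled in a few lines of Python. This is only an illustrative sketch: the class name `ITTageEntry` and the helper `total_storage_bits` are hypothetical, and only the field widths and entry counts come from the table.

```python
from dataclasses import dataclass

# Field widths as listed in the table; everything else here is illustrative.
TARGET_BITS, TAG_BITS, CTR_BITS = 41, 9, 2

# Entries per physical table (T0 is virtual and has no storage).
TABLE_ENTRIES = {"T1": 256, "T2": 256, "T3": 512, "T4": 512, "T5": 512}


@dataclass
class ITTageEntry:
    valid: bool = False   # 1 bit
    tag: int = 0          # 9-bit partial tag
    ctr: int = 0          # 2-bit saturating confidence counter
    useful: bool = False  # 1-bit usefulness ("us")
    target: int = 0       # 41-bit jump target

    def bits(self) -> int:
        """Width of one entry: valid + tag + ctr + useful + target."""
        return 1 + TAG_BITS + CTR_BITS + 1 + TARGET_BITS


def total_storage_bits() -> int:
    """Sum of entry bits over all physical tables T1..T5."""
    return sum(n * ITTageEntry().bits() for n in TABLE_ENTRIES.values())
```

Under these widths each entry is 54 bits wide, and the five tables together hold 2048 entries.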
T0 and Tn Table Retrieval Method
The retrieval method is consistent with the TAGE branch predictor, only differing in the configuration options of each table.
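Table | FH length | FH1 length | FH2 length | Recent history length (number of GH bits used) |
---|---|---|---|---|
T1 | 4 bits | 4 bits | 4 bits | Lowest 4 bits: the most recent 4 bits of history are folded into FH, FH1, and FH2 |
T2 | 8 bits | 8 bits | 8 bits | Lowest 8 bits: the most recent 8 bits of history are folded into FH, FH1, and FH2 |
T3 | 9 bits | 9 bits | 8 bits | Lowest 13 bits: the most recent 13 bits of history are folded into FH, FH1, and FH2 |
T4 | 9 bits | 9 bits | 8 bits | Lowest 16 bits: the most recent 16 bits of history are folded into FH, FH1, and FH2 |
T5 | 9 bits | 9 bits | 8 bits | Lowest 32 bits: the most recent 32 bits of history are folded into FH, FH1, and FH2 |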
Other processes (computation method and computation formula) are similar to the TAGE-SC branch predictor.
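
To make the folding configuration more concrete, here is a minimal Python sketch. It assumes that folding means XOR-ing consecutive chunks of the recent history down to the folded width, which is the usual TAGE-style scheme; the exact formula used by the hardware is described in the TAGE-SC documentation and may differ in detail. The names `fold_history`, `FOLD_CONFIG`, and `folded_for` are hypothetical.

```python
# Per table: (recent history length, FH width, FH1 width, FH2 width),
# copied from the configuration table above.
FOLD_CONFIG = {
    "T1": (4, 4, 4, 4),
    "T2": (8, 8, 8, 8),
    "T3": (13, 9, 9, 8),
    "T4": (16, 9, 9, 8),
    "T5": (32, 9, 9, 8),
}


def fold_history(ghist: int, hist_len: int, folded_len: int) -> int:
    """Fold the lowest hist_len bits of ghist into folded_len bits by XOR.

    Assumption: chunks of folded_len bits are XOR-ed together, low chunk first.
    """
    recent = ghist & ((1 << hist_len) - 1)
    mask = (1 << folded_len) - 1
    folded = 0
    while recent:
        folded ^= recent & mask
        recent >>= folded_len
    return folded


def folded_for(table: str, ghist: int):
    """Return (FH, FH1, FH2) for one table under the XOR-folding assumption."""
    hist_len, fh, fh1, fh2 = FOLD_CONFIG[table]
    return tuple(fold_history(ghist, hist_len, w) for w in (fh, fh1, fh2))
```

For T1 and T2 the history is no longer than the folded width, so under this assumption the folded value is simply the recent history itself.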
Alternate Predictor
When the prediction given by a Tn table has insufficient prediction confidence, the predictor falls back to an alternate prediction. This process is similar to TAGE; for details, refer to the corresponding part of the TAGE documentation. Unlike TAGE, ITTAGE’s ctr does not give the prediction direction; it only determines whether to output the result (prediction confidence). A ctr of 2'b00 is treated as weak confidence, and the alternate prediction result is chosen as follows:
- If multiple tables are hit, output the Target from the second-longest history table entry.
- Otherwise, output the T0 Target (FTB Target).
Prediction Process
The prediction process is similar to TAGE, but ITTAGE has an additional step to decide whether to output the prediction result based on ctr. The specific process is as follows:
- When the ctr of the ITTAGE table entry is not 2'b00, output Target.
- When the ctr of the ITTAGE table entry is 2'b00, output the alternate prediction result:
  - If there is a second-longest history (a second table is also hit), output the Target of the second-longest.
  - Otherwise, output the FTB Target.
- When the ITTAGE table entry is not hit, output the T0 Target (FTB Target).
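
The selection rules above can be summarized in a short Python sketch. It is a behavioral model only, not the hardware implementation; the `Hit` record and the function `select_target` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    table: int   # table number 1..5; a larger number means a longer history
    ctr: int     # 2-bit confidence counter read from the entry
    target: int  # target stored in the entry


def select_target(hits: List[Hit], ftb_target: int) -> int:
    """Pick the output target from the tag-matching entries and the FTB target."""
    if not hits:
        # No Tn table hit: fall back to T0, i.e. the FTB target.
        return ftb_target
    ordered = sorted(hits, key=lambda h: h.table, reverse=True)
    provider = ordered[0]
    if provider.ctr != 0:
        # Longest-history entry is confident enough: use its target.
        return provider.target
    if len(ordered) > 1:
        # Weak confidence: use the second-longest-history entry.
        return ordered[1].target
    # Weak confidence and no second hit: use the FTB target.
    return ftb_target
```

For example, a single hit with ctr equal to 0 yields the FTB target, while the same hit with ctr equal to 1 yields the entry's own target.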
Training Process
This process is similar to TAGE, with the following differences:
- Table entry updates (original prediction data):
  - ctr:
    - If the predicted address matches the actual address, increment the ctr of the corresponding provider table entry by 1.
    - If the predicted address does not match the actual address, decrement the ctr of the corresponding provider table entry by 1.
    - ITTAGE uses ctr to decide whether to adopt the jump target of a prediction. If multiple tables are hit and the ctr of the longest-history table is 0, the alternate prediction logic is used (the second-longest-history table or T0). An update always updates the longest-history table, and also updates the alternate prediction table if the alternate prediction was adopted.
  - target:
    - If the ctr of the entry being updated was 0 at prediction time, overwrite target with the actual jump target.
    - When allocating a new table entry, store the actual jump target in target.
    - Otherwise, do not modify target.
  - usefulness:
    - When the provider’s prediction is correct but the alternate prediction is incorrect, set the provider’s usefulness to 1.
    - If the alternate prediction has weak confidence and is correct, set the provider’s usefulness to 1. If the alternate prediction has weak confidence and is incorrect, set the provider’s usefulness to 0.
- New table entry:
  - Whenever a confident prediction from the longest-history table is incorrect (that is, the error was not caused by using the alternate prediction), try to allocate an entry in a randomly chosen table with a longer history. Allocation requires that the usefulness of the corresponding entry is 0.
  - If the usefulness of the corresponding entries in all longer-history tables is not 0, the allocation fails.
- ctr:
- Reset useful bit:
  - Each time a misprediction occurs and a new table entry is requested, increment tickCtr (an 8-bit saturating counter used to reset all usefulness bits) if the allocation fails, and decrement tickCtr if it succeeds.
  - When tickCtr reaches its maximum value, set all usefulness bits in ITTAGE to 0 and reset tickCtr to 0.
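
The update rules above can be sketched as three small Python helpers. This is a simplified behavioral model under stated assumptions: the function names are hypothetical, the target overwrite is evaluated on the ctr value seen before the update, and the random choice among allocation candidates is left out so that the code stays deterministic.

```python
CTR_MAX = 3     # 2-bit saturating counter
TICK_MAX = 255  # 8-bit saturating counter


def update_entry(ctr: int, target: int, actual_target: int):
    """Return (new_ctr, new_target) for the provider entry."""
    correct = target == actual_target
    # Overwrite the target only if the entry had ctr == 0 at prediction time.
    new_target = actual_target if ctr == 0 else target
    # Saturating increment on a correct target, decrement otherwise.
    new_ctr = min(ctr + 1, CTR_MAX) if correct else max(ctr - 1, 0)
    return new_ctr, new_target


def allocation_candidates(provider: int, useful: dict) -> list:
    """Longer-history tables whose indexed entry has usefulness 0.

    `useful` maps table number -> usefulness bit of the indexed entry.
    An empty result means the allocation fails.
    """
    return [t for t in sorted(useful) if t > provider and useful[t] == 0]


def update_tick(tick: int, alloc_failed: bool):
    """Return (new_tick, reset_all_useful) after an allocation attempt."""
    tick = min(tick + 1, TICK_MAX) if alloc_failed else max(tick - 1, 0)
    if tick == TICK_MAX:
        # Saturated: clear every usefulness bit and restart the counter.
        return 0, True
    return tick, False
```

A real implementation would pick one table at random from the candidate list; how that choice is made is not specified here.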
Interface List
Interface type | Bit width | Signal name | Remarks |
---|---|---|---|
input | | clock | |
input | | reset | |
input | [40:0] | io_in_bits_s0_pc_3 | PC used for prediction |
input | [7:0] | io_in_bits_folded_hist_3_hist_14_folded_hist | T2 folded history |
input | [8:0] | io_in_bits_folded_hist_3_hist_13_folded_hist | T3 folded history |
input | [3:0] | io_in_bits_folded_hist_3_hist_12_folded_hist | T1 folded history |
input | [8:0] | io_in_bits_folded_hist_3_hist_10_folded_hist | T5 folded history |
input | [8:0] | io_in_bits_folded_hist_3_hist_6_folded_hist | T4 folded history |
input | [7:0] | io_in_bits_folded_hist_3_hist_4_folded_hist | T3 folded history |
input | [7:0] | io_in_bits_folded_hist_3_hist_3_folded_hist | T5 folded history |
input | [7:0] | io_in_bits_folded_hist_3_hist_2_folded_hist | T4 folded history |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_0_jalr_target | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_1_jalr_target | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_2_jalr_target | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_3_jalr_target | |
output | [40:0] | io_out_s3_full_pred_0_jalr_target | |
output | [40:0] | io_out_s3_full_pred_1_jalr_target | |
output | [40:0] | io_out_s3_full_pred_2_jalr_target | |
output | [40:0] | io_out_s3_full_pred_3_jalr_target | |
output | [222:0] | io_out_last_stage_meta | Bits [100:0] are valid and carry the ITTAGE meta information |
input | | io_s0_fire_3 | s0 stage enable signal |
input | | io_s1_fire_3 | s1 stage enable signal |
input | | io_s2_fire_0 | s2 stage enable signal; all four copies are the same |
input | | io_s2_fire_1 | |
input | | io_s2_fire_2 | |
input | | io_s2_fire_3 | |
input | | io_update_valid | Whether to perform an update |
input | [40:0] | io_update_bits_pc | PC index of the prediction block to update |
input | [7:0] | io_update_bits_spec_info_folded_hist_hist_14_folded_hist | History passed in for the T2 update |
input | [8:0] | io_update_bits_spec_info_folded_hist_hist_13_folded_hist | History passed in for the T3 update |
input | [3:0] | io_update_bits_spec_info_folded_hist_hist_12_folded_hist | History passed in for the T1 update |
input | [8:0] | io_update_bits_spec_info_folded_hist_hist_10_folded_hist | History passed in for the T5 update |
input | [8:0] | io_update_bits_spec_info_folded_hist_hist_6_folded_hist | History passed in for the T4 update |
input | [7:0] | io_update_bits_spec_info_folded_hist_hist_4_folded_hist | History passed in for the T3 update |
input | [7:0] | io_update_bits_spec_info_folded_hist_hist_3_folded_hist | History passed in for the T5 update |
input | [7:0] | io_update_bits_spec_info_folded_hist_hist_2_folded_hist | History passed in for the T4 update |
input | [3:0] | io_update_bits_ftb_entry_tailSlot_offset | Offset of the FTB entry to update |
input | | io_update_bits_ftb_entry_tailSlot_sharing | Whether the FTB entry to update is a conditional branch |
input | | io_update_bits_ftb_entry_tailSlot_valid | Whether the tailSlot to update is enabled |
input | | io_update_bits_ftb_entry_isRet | Whether the tailSlot is a Ret instruction |
input | | io_update_bits_ftb_entry_isJalr | Whether the tailSlot is a Jalr instruction |
input | | io_update_bits_cfi_idx_valid | Valid signal for the index of the control-flow instruction within the prediction block |
input | [3:0] | io_update_bits_cfi_idx_bits | Index of the control-flow instruction within the prediction block |
input | | io_update_bits_jmp_taken | An unconditional jump instruction in the prediction block was taken |
input | | io_update_bits_mispred_mask_2 | Whether the prediction was wrong |
input | [222:0] | io_update_bits_meta | Bits [222:25] of the meta information emitted at prediction time, i.e. {25h0, _ubtb_io_out_last_stage_meta[5:0] ,_tage_io_out_last_stage_meta[87:0] ,_ftb_io_out_last_stage_meta[2:0], _ittage_io_out_last_stage_meta[100:0]} |
input | [40:0] | io_update_bits_full_target | Jump target of the prediction block (start address of the next prediction block) |
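
The concatenation given for io_update_bits_meta can be read as a bit layout. The Python sketch below assumes standard Verilog concatenation order (the leftmost item occupies the most significant bits), so the ITTAGE meta sits in bits [100:0]; the function names `pack_meta` and `ittage_meta` are hypothetical.

```python
# Fields listed from least significant to most significant, with widths
# taken from the concatenation {25h0, ubtb[5:0], tage[87:0], ftb[2:0], ittage[100:0]}.
FIELDS = [("ittage", 101), ("ftb", 3), ("tage", 88), ("ubtb", 6)]
ITTAGE_META_BITS = 101


def pack_meta(ittage: int, ftb: int, tage: int, ubtb: int) -> int:
    """Build the 223-bit meta word; the top 25 bits stay zero."""
    values = {"ittage": ittage, "ftb": ftb, "tage": tage, "ubtb": ubtb}
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def ittage_meta(word: int) -> int:
    """Extract the ITTAGE part, bits [100:0], from the meta word."""
    return word & ((1 << ITTAGE_META_BITS) - 1)
```

The four field widths add up to 198 bits, which together with the 25 zero bits gives the 223-bit width of the signal.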
Pass-through signals
These signals are passed through ITTAGE and do not affect its behavior.
Interface type | Bit width | Signal name | Remarks |
---|---|---|---|
input | io_in_bits_resp_in_0_s2_full_pred_0_br_taken_mask_0 | Input from FTB; passed through unchanged to the output, including jalr_target | |
input | io_in_bits_resp_in_0_s2_full_pred_0_br_taken_mask_1 | ||
input | io_in_bits_resp_in_0_s2_full_pred_0_slot_valids_0 | ||
input | io_in_bits_resp_in_0_s2_full_pred_0_slot_valids_1 | ||
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_0_targets_0 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_0_targets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_0_jalr_target | |
input | [3:0] | io_in_bits_resp_in_0_s2_full_pred_0_offsets_0 | |
input | [3:0] | io_in_bits_resp_in_0_s2_full_pred_0_offsets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_0_fallThroughAddr | |
input | io_in_bits_resp_in_0_s2_full_pred_0_is_br_sharing | ||
input | io_in_bits_resp_in_0_s2_full_pred_0_hit | ||
input | io_in_bits_resp_in_0_s2_full_pred_1_br_taken_mask_0 | ||
input | io_in_bits_resp_in_0_s2_full_pred_1_br_taken_mask_1 | ||
input | io_in_bits_resp_in_0_s2_full_pred_1_slot_valids_0 | ||
input | io_in_bits_resp_in_0_s2_full_pred_1_slot_valids_1 | ||
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_1_targets_0 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_1_targets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_1_jalr_target | |
input | [3:0] | io_in_bits_resp_in_0_s2_full_pred_1_offsets_0 | |
input | [3:0] | io_in_bits_resp_in_0_s2_full_pred_1_offsets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_1_fallThroughAddr | |
input | io_in_bits_resp_in_0_s2_full_pred_1_is_br_sharing | ||
input | io_in_bits_resp_in_0_s2_full_pred_1_hit | ||
input | io_in_bits_resp_in_0_s2_full_pred_2_br_taken_mask_0 | ||
input | io_in_bits_resp_in_0_s2_full_pred_2_br_taken_mask_1 | ||
input | io_in_bits_resp_in_0_s2_full_pred_2_slot_valids_0 | ||
input | io_in_bits_resp_in_0_s2_full_pred_2_slot_valids_1 | ||
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_2_targets_0 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_2_targets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_2_jalr_target | |
input | [3:0] | io_in_bits_resp_in_0_s2_full_pred_2_offsets_0 | |
input | [3:0] | io_in_bits_resp_in_0_s2_full_pred_2_offsets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_2_fallThroughAddr | |
input | io_in_bits_resp_in_0_s2_full_pred_2_is_jalr | Information used by the RAS module; passed through | |
input | io_in_bits_resp_in_0_s2_full_pred_2_is_call | ||
input | io_in_bits_resp_in_0_s2_full_pred_2_is_ret | ||
input | io_in_bits_resp_in_0_s2_full_pred_2_last_may_be_rvi_call | ||
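input | io_in_bits_resp_in_0_s2_full_pred_2_is_br_sharing | Input from FTB; passed through unchanged to the output, including jalr_target. fallThroughErr indicates that the pftAddr recorded in the FTB entry is wrong. It is generated by checking whether the prediction block end address represented by pftAddr is greater than the start address of the prediction block; if it is smaller, an error has occurred and this signal is asserted. This can happen when pc indexes the wrong FTB entry. The FTQ uses this signal; it is unrelated to ITTAGE | |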
input | io_in_bits_resp_in_0_s2_full_pred_2_hit | ||
input | io_in_bits_resp_in_0_s2_full_pred_3_br_taken_mask_0 | ||
input | io_in_bits_resp_in_0_s2_full_pred_3_br_taken_mask_1 | ||
input | io_in_bits_resp_in_0_s2_full_pred_3_slot_valids_0 | ||
input | io_in_bits_resp_in_0_s2_full_pred_3_slot_valids_1 | ||
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_3_targets_0 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_3_targets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_3_jalr_target | |
input | [3:0] | io_in_bits_resp_in_0_s2_full_pred_3_offsets_0 | |
input | [3:0] | io_in_bits_resp_in_0_s2_full_pred_3_offsets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s2_full_pred_3_fallThroughAddr | |
input | io_in_bits_resp_in_0_s2_full_pred_3_fallThroughErr | ||
input | io_in_bits_resp_in_0_s2_full_pred_3_is_br_sharing | ||
input | io_in_bits_resp_in_0_s2_full_pred_3_hit | ||
input | io_in_bits_resp_in_0_s3_full_pred_0_br_taken_mask_0 | Everything is passed through except jalr_target, which may be modified | |
input | io_in_bits_resp_in_0_s3_full_pred_0_br_taken_mask_1 | ||
input | io_in_bits_resp_in_0_s3_full_pred_0_slot_valids_0 | ||
input | io_in_bits_resp_in_0_s3_full_pred_0_slot_valids_1 | ||
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_0_targets_0 | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_0_targets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_0_fallThroughAddr | |
input | io_in_bits_resp_in_0_s3_full_pred_0_fallThroughErr | ||
input | io_in_bits_resp_in_0_s3_full_pred_0_is_br_sharing | ||
input | io_in_bits_resp_in_0_s3_full_pred_0_hit | ||
input | io_in_bits_resp_in_0_s3_full_pred_1_br_taken_mask_0 | Same as above | |
input | io_in_bits_resp_in_0_s3_full_pred_1_br_taken_mask_1 | ||
input | io_in_bits_resp_in_0_s3_full_pred_1_slot_valids_0 | ||
input | io_in_bits_resp_in_0_s3_full_pred_1_slot_valids_1 | ||
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_1_targets_0 | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_1_targets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_1_fallThroughAddr | |
input | io_in_bits_resp_in_0_s3_full_pred_1_fallThroughErr | ||
input | io_in_bits_resp_in_0_s3_full_pred_1_is_br_sharing | ||
input | io_in_bits_resp_in_0_s3_full_pred_1_hit | ||
input | io_in_bits_resp_in_0_s3_full_pred_2_br_taken_mask_0 | Same as above | |
input | io_in_bits_resp_in_0_s3_full_pred_2_br_taken_mask_1 | ||
input | io_in_bits_resp_in_0_s3_full_pred_2_slot_valids_0 | ||
input | io_in_bits_resp_in_0_s3_full_pred_2_slot_valids_1 | ||
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_2_targets_0 | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_2_targets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_2_fallThroughAddr | |
input | io_in_bits_resp_in_0_s3_full_pred_2_fallThroughErr | ||
input | io_in_bits_resp_in_0_s3_full_pred_2_is_jalr | ||
input | io_in_bits_resp_in_0_s3_full_pred_2_is_call | ||
input | io_in_bits_resp_in_0_s3_full_pred_2_is_ret | ||
input | io_in_bits_resp_in_0_s3_full_pred_2_is_br_sharing | ||
input | io_in_bits_resp_in_0_s3_full_pred_2_hit | ||
input | io_in_bits_resp_in_0_s3_full_pred_3_br_taken_mask_0 | Same as above | |
input | io_in_bits_resp_in_0_s3_full_pred_3_br_taken_mask_1 | ||
input | io_in_bits_resp_in_0_s3_full_pred_3_slot_valids_0 | ||
input | io_in_bits_resp_in_0_s3_full_pred_3_slot_valids_1 | ||
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_3_targets_0 | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_3_targets_1 | |
input | [3:0] | io_in_bits_resp_in_0_s3_full_pred_3_offsets_0 | |
input | [3:0] | io_in_bits_resp_in_0_s3_full_pred_3_offsets_1 | |
input | [40:0] | io_in_bits_resp_in_0_s3_full_pred_3_fallThroughAddr | |
input | io_in_bits_resp_in_0_s3_full_pred_3_fallThroughErr | ||
input | io_in_bits_resp_in_0_s3_full_pred_3_is_br_sharing | ||
input | io_in_bits_resp_in_0_s3_full_pred_3_hit | ||
input | io_in_bits_resp_in_0_last_stage_ftb_entry_valid | Passed through to the output unmodified; the source is FTB | |
input | [3:0] | io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_offset | |
input | [11:0] | io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_lower | |
input | [1:0] | io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_tarStat | |
input | io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_sharing | ||
input | io_in_bits_resp_in_0_last_stage_ftb_entry_brSlots_0_valid | ||
input | [3:0] | io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_offset | |
input | [19:0] | io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_lower | |
input | [1:0] | io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_tarStat | |
input | io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_sharing | ||
input | io_in_bits_resp_in_0_last_stage_ftb_entry_tailSlot_valid | ||
input | [3:0] | io_in_bits_resp_in_0_last_stage_ftb_entry_pftAddr | |
input | io_in_bits_resp_in_0_last_stage_ftb_entry_carry | ||
input | io_in_bits_resp_in_0_last_stage_ftb_entry_isCall | ||
input | io_in_bits_resp_in_0_last_stage_ftb_entry_isRet | ||
input | io_in_bits_resp_in_0_last_stage_ftb_entry_isJalr | ||
input | io_in_bits_resp_in_0_last_stage_ftb_entry_last_may_be_rvi_call | ||
input | io_in_bits_resp_in_0_last_stage_ftb_entry_always_taken_0 | ||
input | io_in_bits_resp_in_0_last_stage_ftb_entry_always_taken_1 | ||
output | io_out_s2_full_pred_0_br_taken_mask_0 | Passed through unchanged from the input value; input prefix: io_in_bits_resp_in_ | |
output | io_out_s2_full_pred_0_br_taken_mask_1 | ||
output | io_out_s2_full_pred_0_slot_valids_0 | ||
output | io_out_s2_full_pred_0_slot_valids_1 | ||
output | [40:0] | io_out_s2_full_pred_0_targets_0 | |
output | [40:0] | io_out_s2_full_pred_0_targets_1 | |
output | [40:0] | io_out_s2_full_pred_0_jalr_target | |
output | [3:0] | io_out_s2_full_pred_0_offsets_0 | |
output | [3:0] | io_out_s2_full_pred_0_offsets_1 | |
output | [40:0] | io_out_s2_full_pred_0_fallThroughAddr | |
output | io_out_s2_full_pred_0_is_br_sharing | ||
output | io_out_s2_full_pred_0_hit | ||
output | io_out_s2_full_pred_1_br_taken_mask_0 | ||
output | io_out_s2_full_pred_1_br_taken_mask_1 | ||
output | io_out_s2_full_pred_1_slot_valids_0 | ||
output | io_out_s2_full_pred_1_slot_valids_1 | ||
output | [40:0] | io_out_s2_full_pred_1_targets_0 | |
output | [40:0] | io_out_s2_full_pred_1_targets_1 | |
output | [40:0] | io_out_s2_full_pred_1_jalr_target | |
output | [3:0] | io_out_s2_full_pred_1_offsets_0 | |
output | [3:0] | io_out_s2_full_pred_1_offsets_1 | |
output | [40:0] | io_out_s2_full_pred_1_fallThroughAddr | |
output | io_out_s2_full_pred_1_is_br_sharing | ||
output | io_out_s2_full_pred_1_hit | ||
output | io_out_s2_full_pred_2_br_taken_mask_0 | ||
output | io_out_s2_full_pred_2_br_taken_mask_1 | ||
output | io_out_s2_full_pred_2_slot_valids_0 | ||
output | io_out_s2_full_pred_2_slot_valids_1 | ||
output | [40:0] | io_out_s2_full_pred_2_targets_0 | |
output | [40:0] | io_out_s2_full_pred_2_targets_1 | |
output | [40:0] | io_out_s2_full_pred_2_jalr_target | |
output | [3:0] | io_out_s2_full_pred_2_offsets_0 | |
output | [3:0] | io_out_s2_full_pred_2_offsets_1 | |
output | [40:0] | io_out_s2_full_pred_2_fallThroughAddr | |
output | io_out_s2_full_pred_2_is_jalr | ||
output | io_out_s2_full_pred_2_is_call | ||
output | io_out_s2_full_pred_2_is_ret | ||
output | io_out_s2_full_pred_2_last_may_be_rvi_call | ||
output | io_out_s2_full_pred_2_is_br_sharing | ||
output | io_out_s2_full_pred_2_hit | ||
output | io_out_s2_full_pred_3_br_taken_mask_0 | ||
output | io_out_s2_full_pred_3_br_taken_mask_1 | ||
output | io_out_s2_full_pred_3_slot_valids_0 | ||
output | io_out_s2_full_pred_3_slot_valids_1 | ||
output | [40:0] | io_out_s2_full_pred_3_targets_0 | |
output | [40:0] | io_out_s2_full_pred_3_targets_1 | |
output | [40:0] | io_out_s2_full_pred_3_jalr_target | |
output | [3:0] | io_out_s2_full_pred_3_offsets_0 | |
output | [3:0] | io_out_s2_full_pred_3_offsets_1 | |
output | [40:0] | io_out_s2_full_pred_3_fallThroughAddr | |
output | io_out_s2_full_pred_3_fallThroughErr | ||
output | io_out_s2_full_pred_3_is_br_sharing | ||
output | io_out_s2_full_pred_3_hit | ||
output | io_out_s3_full_pred_0_br_taken_mask_0 | See the input with the corresponding prefix | |
output | io_out_s3_full_pred_0_br_taken_mask_1 | ||
output | io_out_s3_full_pred_0_slot_valids_0 | ||
output | io_out_s3_full_pred_0_slot_valids_1 | ||
output | [40:0] | io_out_s3_full_pred_0_targets_0 | |
output | [40:0] | io_out_s3_full_pred_0_targets_1 | |
output | [40:0] | io_out_s3_full_pred_0_fallThroughAddr | |
output | io_out_s3_full_pred_0_fallThroughErr | ||
output | io_out_s3_full_pred_0_is_br_sharing | ||
output | io_out_s3_full_pred_0_hit | ||
output | io_out_s3_full_pred_1_br_taken_mask_0 | See the input with the corresponding prefix | |
output | io_out_s3_full_pred_1_br_taken_mask_1 | ||
output | io_out_s3_full_pred_1_slot_valids_0 | ||
output | io_out_s3_full_pred_1_slot_valids_1 | ||
output | [40:0] | io_out_s3_full_pred_1_targets_0 | |
output | [40:0] | io_out_s3_full_pred_1_targets_1 | |
output | [40:0] | io_out_s3_full_pred_1_fallThroughAddr | |
output | io_out_s3_full_pred_1_fallThroughErr | ||
output | io_out_s3_full_pred_1_is_br_sharing | ||
output | io_out_s3_full_pred_1_hit | ||
output | io_out_s3_full_pred_2_br_taken_mask_0 | See the input with the corresponding prefix | |
output | io_out_s3_full_pred_2_br_taken_mask_1 | ||
output | io_out_s3_full_pred_2_slot_valids_0 | ||
output | io_out_s3_full_pred_2_slot_valids_1 | ||
output | [40:0] | io_out_s3_full_pred_2_targets_0 | |
output | [40:0] | io_out_s3_full_pred_2_targets_1 | |
output | [40:0] | io_out_s3_full_pred_2_fallThroughAddr | |
output | io_out_s3_full_pred_2_fallThroughErr | ||
output | io_out_s3_full_pred_2_is_jalr | ||
output | io_out_s3_full_pred_2_is_call | ||
output | io_out_s3_full_pred_2_is_ret | ||
output | io_out_s3_full_pred_2_is_br_sharing | ||
output | io_out_s3_full_pred_2_hit | ||
output | io_out_s3_full_pred_3_br_taken_mask_0 | See the input with the corresponding prefix | |
output | io_out_s3_full_pred_3_br_taken_mask_1 | ||
output | io_out_s3_full_pred_3_slot_valids_0 | ||
output | io_out_s3_full_pred_3_slot_valids_1 | ||
output | [40:0] | io_out_s3_full_pred_3_targets_0 | |
output | [40:0] | io_out_s3_full_pred_3_targets_1 | |
output | [3:0] | io_out_s3_full_pred_3_offsets_0 | |
output | [3:0] | io_out_s3_full_pred_3_offsets_1 | |
output | [40:0] | io_out_s3_full_pred_3_fallThroughAddr | |
output | io_out_s3_full_pred_3_fallThroughErr | ||
output | io_out_s3_full_pred_3_is_br_sharing | ||
output | io_out_s3_full_pred_3_hit | ||
output | io_out_last_stage_ftb_entry_valid | Passed through unchanged from the input value | |
output | [3:0] | io_out_last_stage_ftb_entry_brSlots_0_offset | |
output | [11:0] | io_out_last_stage_ftb_entry_brSlots_0_lower | |
output | [1:0] | io_out_last_stage_ftb_entry_brSlots_0_tarStat | |
output | io_out_last_stage_ftb_entry_brSlots_0_sharing | ||
output | io_out_last_stage_ftb_entry_brSlots_0_valid | ||
output | [3:0] | io_out_last_stage_ftb_entry_tailSlot_offset | |
output | [19:0] | io_out_last_stage_ftb_entry_tailSlot_lower | |
output | [1:0] | io_out_last_stage_ftb_entry_tailSlot_tarStat | |
output | io_out_last_stage_ftb_entry_tailSlot_sharing | ||
output | io_out_last_stage_ftb_entry_tailSlot_valid | ||
output | [3:0] | io_out_last_stage_ftb_entry_pftAddr | |
output | io_out_last_stage_ftb_entry_carry | ||
output | io_out_last_stage_ftb_entry_isCall | ||
output | io_out_last_stage_ftb_entry_isRet | ||
output | io_out_last_stage_ftb_entry_isJalr | ||
output | io_out_last_stage_ftb_entry_last_may_be_rvi_call | ||
output | io_out_last_stage_ftb_entry_always_taken_0 | ||
output | io_out_last_stage_ftb_entry_always_taken_1 |
For the other meta information, see the documentation of the corresponding sub-predictor:
_ubtb_io_out_last_stage_meta
_tage_io_out_last_stage_meta
_ftb_io_out_last_stage_meta
ittage_io_out_last_stage_meta[100:0]
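Bits | Signal name | Remarks |
---|---|---|
100 | s3_provided | Whether a result is available |
[99:97] | s3_provider | Table entry that provides the result |
96 | s3_altProvided | Whether an alternate prediction entry exists |
[95:93] | s3_altProvider | Table entry that provides the alternate prediction |
92 | resp_meta_altDiffers | Whether the alternate prediction has weak confidence (FTB does not count) |
91 | s3_providerU | useful bit of the main prediction |
[90:89] | s3_providerCtr | Confidence given by the main prediction |
[88:87] | s3_altProviderCtr | Confidence given by the alternate prediction |
86 | resp_meta_allocate_valid_r | Whether a free table entry is available for allocation |
[85:83] | resp_meta_allocate_bits_r | Which table the entry is allocated in |
82 | s3_tageTaken_dup_3 | Always true when the FTB is not used; also true when the FTB is used |
[81:41] | s3_providerTarget | Jump address given by the main prediction |
[40:0] | s3_altProviderTarget | Jump address given by the alternate prediction |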
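
When inspecting waveforms or writing a testbench, it can be convenient to split the 101-bit ITTAGE meta word into its fields. The following Python sketch does this using the bit ranges from the table above; the dictionary `ITTAGE_META_FIELDS` and the function `decode_ittage_meta` are hypothetical helper names.

```python
# name -> (high bit, low bit), copied from the meta table above.
ITTAGE_META_FIELDS = {
    "s3_provided": (100, 100),
    "s3_provider": (99, 97),
    "s3_altProvided": (96, 96),
    "s3_altProvider": (95, 93),
    "resp_meta_altDiffers": (92, 92),
    "s3_providerU": (91, 91),
    "s3_providerCtr": (90, 89),
    "s3_altProviderCtr": (88, 87),
    "resp_meta_allocate_valid_r": (86, 86),
    "resp_meta_allocate_bits_r": (85, 83),
    "s3_tageTaken_dup_3": (82, 82),
    "s3_providerTarget": (81, 41),
    "s3_altProviderTarget": (40, 0),
}


def decode_ittage_meta(meta: int) -> dict:
    """Split the ITTAGE meta word (bits [100:0]) into named integer fields."""
    fields = {}
    for name, (hi, lo) in ITTAGE_META_FIELDS.items():
        width = hi - lo + 1
        fields[name] = (meta >> lo) & ((1 << width) - 1)
    return fields
```

Usage: pass the low 101 bits of io_out_last_stage_meta (or of io_update_bits_meta) as `meta` and read the fields by name, for example `decode_ittage_meta(meta)["s3_providerCtr"]`.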