Introduction to the Timing of Xiangshan Branch Prediction Unit
Single-Cycle Prediction without Bubble
uFTB is the only predictor in the Xiangshan BPU that can generate a prediction result in a single cycle. The figure below shows the prediction process of uFTB. The s0_pc is sent from the top level of the BPU, and when the s1 stage is active, s1_pc holds the value that s0_pc had in the previous cycle. In other words, the value of s0_pc moves down the pipeline.
When the s1 stage is active, uFTB receives the s1_fire signal in the current cycle and generates a prediction result from the s1_pc address in that same cycle. The new PC value can be obtained from this prediction result.
As shown in the figure, the top level of the BPU extracts the next PC value from the s1 prediction result channel and sends it to npc_Gen (the new PC generator), which generates the s0_pc for the next cycle.
In the next cycle, uFTB receives the new PC value and starts generating the prediction block for it. Therefore, using only the s1 stage, prediction blocks can be generated at a rate of one block per cycle.
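The feedback loop described above can be sketched as a small cycle-by-cycle model. This is only an illustration, not Xiangshan code: the table contents, the block size FETCH_BYTES, and the fall-through behaviour on a table miss are all assumptions made for the example.

```python
# Toy model of a single-cycle predictor feeding npc_Gen.
# FETCH_BYTES and UFTB_TABLE are made-up illustration values.
FETCH_BYTES = 0x20
UFTB_TABLE = {0x1000: 0x2000, 0x2000: 0x1000}  # block start PC -> predicted target


def uftb_predict(s1_pc):
    """Return the predicted next PC for the block starting at s1_pc.

    On a table miss this sketch simply falls through to the next block
    (an assumption of the example).
    """
    return UFTB_TABLE.get(s1_pc, s1_pc + FETCH_BYTES)


def run(start_pc, cycles):
    """Return the s1_pc seen in each cycle."""
    s0_pc = start_pc
    trace = []
    for _ in range(cycles):
        s1_pc = s0_pc                # s1_pc takes last cycle's s0_pc
        trace.append(s1_pc)
        s0_pc = uftb_predict(s1_pc)  # result is fed back as the next s0_pc
    return trace


print([hex(pc) for pc in run(0x1000, 4)])
```

Because the prediction for s1_pc is available in the same cycle, every loop iteration (one cycle) produces one prediction block.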
Prediction Result Redirection
However, except for uFTB
, other predictors require 2-3 cycles to generate prediction results. How to utilize their prediction results? And how to generate the prediction result redirection signal?
As shown in the figure, Predictor 2, which takes two cycles to generate a prediction result, outputs its result to the s2 prediction result channel in the s2 stage. After the top level of the BPU receives this result, it extracts the jump target address (target) of the prediction block and connects it to npc_Gen.
At this point, the signals connected to npc_Gen include both the prediction result for the old PC generated by s2 and the prediction result for the new PC generated by s1. Which one should be used for the new PC?
As mentioned earlier, the BPU compares the s2 prediction result with the s1 prediction result from the previous cycle. If the two differ, s1 made a wrong prediction, and the s1 prediction of the current cycle, which was generated from that wrong result, is wrong as well. In that case, npc_Gen uses the target provided by s2 as the new s0_pc.
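The selection rule above amounts to a two-input priority choice. The following is a minimal sketch under simplifying assumptions: the function name and arguments are hypothetical, and only the target address is compared, whereas a real design may compare more fields of the prediction result.

```python
def npc_gen(s1_target, s2_target, s1_target_prev, s2_valid):
    """Choose the next s0_pc (hypothetical helper, not the real npc_Gen).

    s1_target      -- target predicted by s1 in the current cycle
    s2_target      -- target predicted by s2 in the current cycle, for the
                      block that s1 predicted in the previous cycle
    s1_target_prev -- what s1 predicted for that same block last cycle
    s2_valid       -- whether the s2 stage holds a valid prediction
    """
    # s2 overrides s1 only when it is valid and disagrees with the older s1 result.
    s2_redirect = s2_valid and (s2_target != s1_target_prev)
    next_pc = s2_target if s2_redirect else s1_target
    return next_pc, s2_redirect


# s2 agrees with last cycle's s1 result: keep following s1.
print(npc_gen(0x3000, 0x2000, 0x2000, True))
# s2 disagrees: its target becomes the new s0_pc.
print(npc_gen(0x3000, 0x2400, 0x2000, True))
```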
This process is shown in the pipeline structure diagram as follows:
The Diff comparator compares the prediction result of the s2 stage with the s1 prediction result from the previous cycle and generates a diff signal, which guides npc_Gen in generating the next PC. The diff signal also indicates that the s1 prediction result was incorrect, so the BPU can use it directly as the redirect for the s2 prediction result channel to the FTQ, instructing the FTQ to overwrite the earlier prediction result.
The diff signal is also sent to each predictor through the s2_redirect interface to guide the predictors to update their states.
Furthermore, when a prediction result redirection occurs in the s2 stage, meaning that the prediction result of the s1 channel was incorrect, the s2 stage cannot continue predicting. The s2_fire signal of the predictor pipeline must be invalidated until the corrected prediction result flows in.
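To make the effect of an s2 redirection on the pipeline concrete, here is a toy two-stage simulation. It is a sketch under stated assumptions: the FAST and SLOW tables and STEP are invented, the comparison covers only target addresses, and the invalidated s2_fire is modelled as the s2 stage being empty in the cycle after the redirect.

```python
# Made-up predictor contents: the fast (s1) predictor is wrong about 0x100.
FAST = {0x100: 0x200}
SLOW = {0x100: 0x300}
STEP = 0x20  # hypothetical fall-through distance


def fast(pc):
    return FAST.get(pc, pc + STEP)


def slow(pc):
    return SLOW.get(pc, pc + STEP)


def simulate(start_pc, cycles):
    """Return (cycle, s1_pc, s2_fire, s2_redirect) for each cycle."""
    s1_pc = start_pc
    s2 = None  # (pc, target predicted by s1), or None when s2 holds nothing valid
    log = []
    for cycle in range(cycles):
        s1_target = fast(s1_pc)
        s2_fire = s2 is not None
        s2_redirect = s2_fire and slow(s2[0]) != s2[1]
        log.append((cycle, s1_pc, s2_fire, s2_redirect))
        if s2_redirect:
            # The block now in s1 is on the wrong path: drop it, so s2 has
            # nothing to fire on next cycle, and restart from the s2 target.
            s1_pc, s2 = slow(s2[0]), None
        else:
            s1_pc, s2 = s1_target, (s1_pc, s1_target)
    return log


for entry in simulate(0x100, 4):
    print(entry)
```

In this run the redirect appears in cycle 1, s2_fire is low in cycle 2 while the corrected PC 0x300 passes through s1, and s2 fires again in cycle 3.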
The prediction result redirection of the s3 stage works in a similar way. Its pipeline structure diagram is as follows. The details are left for you to analyze.
Redirection Requests and Other Information Generation
An external redirection request occurs only when the prediction information of all three stages is incorrect. In that case, npc_Gen receives the PC address carried by the redirection request. Because a redirection request implies that all three stages have predicted incorrectly, the fire signals of all three stages must be invalidated. npc_Gen then uses the PC to be restored from the redirection request to restart prediction.
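The handling of an external redirection request can be sketched as follows. The function name and the dictionary representation of the fire signals are hypothetical and serve only to illustrate the rule that a redirect clears every stage and overrides the predicted PC.

```python
def apply_redirect(fires, predicted_pc, redirect_pc=None):
    """Return (fire signals, next s0_pc) after considering an external redirect.

    fires        -- e.g. {"s1_fire": True, "s2_fire": True, "s3_fire": False}
    predicted_pc -- next PC chosen from the stage predictions
    redirect_pc  -- PC carried by the redirection request, or None if absent
    """
    if redirect_pc is None:
        return dict(fires), predicted_pc
    # All in-flight predictions are assumed wrong: invalidate every stage
    # and restart prediction from the restored PC.
    return {stage: False for stage in fires}, redirect_pc


fires = {"s1_fire": True, "s2_fire": True, "s3_fire": True}
print(apply_redirect(fires, 0x2000))
print(apply_redirect(fires, 0x2000, redirect_pc=0x8000))
```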
Other information, such as the generation of the global history and the PC, follows the same principle and is maintained based on the prediction information of each stage. The global history generates a new branch history based on the prediction results of each stage.
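As a rough illustration of maintaining the global history by the same principle, the sketch below keeps a shift-register history and picks the next value from the stage whose prediction wins. HIST_LEN, the one-bit-per-block update, and the function names are assumptions for the example; the real design may track history differently.

```python
HIST_LEN = 8  # hypothetical history length in bits


def shift_in(ghist, taken):
    """Shift one taken/not-taken bit into the history, keeping HIST_LEN bits."""
    return ((ghist << 1) | int(taken)) & ((1 << HIST_LEN) - 1)


def next_history(s1_hist, s1_taken, s2_hist, s2_taken, s2_redirect):
    """Select the next speculative history, mirroring the PC selection.

    s1_hist / s2_hist are the histories seen by the blocks currently in
    s1 and s2. When s2 redirects, the history is rebuilt from the s2 view.
    """
    if s2_redirect:
        return shift_in(s2_hist, s2_taken)
    return shift_in(s1_hist, s1_taken)


print(bin(next_history(0b0011, True, 0b0001, False, s2_redirect=False)))
print(bin(next_history(0b0011, True, 0b0001, False, s2_redirect=True)))
```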
Pipeline Control Signals
After learning about the specific process of the pipeline, you should understand the pipeline control signals in the predictor interface, as follows:
- s0_fire, s1_fire, s2_fire, s3_fire: indicate whether each stage of the pipeline is working.
- s2_redirect, s3_redirect: indicate whether a prediction result redirection has occurred.
- s1_ready, s2_ready, s3_ready: sent from the predictor to the top level of the BPU, indicating whether each stage of the pipeline is ready.
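One way to picture how these signals relate is the common valid/ready convention, sketched below. This is an assumption for illustration only: the StageCtrl class, its valid bit, and the exact fire condition are hypothetical and are not taken from the Xiangshan source.

```python
from dataclasses import dataclass


@dataclass
class StageCtrl:
    valid: bool  # stage holds a prediction request (hypothetical bookkeeping bit)
    ready: bool  # stage can accept a new request (the idea behind sN_ready)


def fire(stage, next_ready=True, redirect=False):
    """A stage 'fires' when it holds valid data, the next stage can accept it,
    and no redirection from a later stage invalidates it (assumed rule)."""
    return stage.valid and next_ready and not redirect


s1 = StageCtrl(valid=True, ready=True)
s2 = StageCtrl(valid=True, ready=False)

print(fire(s1, next_ready=s2.ready))  # s1 stalls because s2 is not ready
print(fire(s2))                       # s2 can fire
print(fire(s2, redirect=True))        # a redirect suppresses the fire
```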
Conclusion
By now, you should understand the basic design principles, external interaction logic, internal structure, timing, etc., of the Xiangshan Branch Prediction Unit, and have a rough understanding of the working principle of BPU. Xiangshan’s BPU is no longer mysterious to you.
Next, you can read the Important Structures and Interfaces Document and combine it with the source code of the Xiangshan BPU to form a more detailed understanding of the BPU. Once you clearly understand its working principle and signal details, you can start your verification work!