Introduction to the Timing of Xiangshan Branch Prediction Unit

The timing design of the three-stage pipeline is the essence of the Xiangshan BPU. This section introduces how the prediction result redirection signal is generated, how a new PC is generated from the prediction results, and how the prediction results of the three channels (s1, s2, and s3) are handled.

Single-Cycle Prediction without Bubble

uFTB is the only predictor in the Xiangshan BPU that can produce a prediction result within a single cycle. The figure below shows the uFTB prediction process. s0_pc is supplied by the top level of the BPU, and when the s1 stage is active, s1_pc holds the value that s0_pc had in the previous cycle. In other words, the value of s0_pc moves down the pipeline.

When the s1 stage is active, uFTB receives the s1_fire signal and, in that same cycle, generates a prediction result from the address in s1_pc. The new PC value is taken from this prediction result.

As shown in the figure, the top level of the BPU extracts the next PC from the s1 prediction result channel and sends it to npc_Gen (the new PC generator), which produces the s0_pc of the next cycle.

In the next cycle, uFTB gets the new PC value and starts generating the prediction block for the new PC value. Therefore, with only the s1 stage, the prediction block can be generated at a rate of one block per cycle.
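The loop described above can be sketched as a small cycle-level model. This is only an illustration under stated assumptions: the table contents, the fall-through distance, and the names `UFTB`, `uftb_predict`, and `run_s1_only` are hypothetical, and the model assumes the start PC is already latched into s1_pc.

```python
# Hypothetical uFTB contents: block start PC -> predicted start PC of the next block.
UFTB = {0x1000: 0x1020, 0x1020: 0x2000, 0x2000: 0x1000}
BLOCK_BYTES = 0x20  # assumed fall-through distance when the table has no entry


def uftb_predict(pc):
    """Single-cycle lookup: return the predicted next PC for the block at pc."""
    return UFTB.get(pc, pc + BLOCK_BYTES)


def run_s1_only(start_pc, cycles):
    """Model the s1-only loop; returns one (cycle, s1_pc, next_pc) tuple per cycle."""
    s1_pc = start_pc  # assumption: start_pc is already held in s1_pc
    trace = []
    for cycle in range(cycles):
        next_pc = uftb_predict(s1_pc)  # s1 stage predicts within this cycle
        trace.append((cycle, s1_pc, next_pc))
        s0_pc = next_pc                # npc_Gen forwards the s1 result as s0_pc
        s1_pc = s0_pc                  # clock edge: s0_pc moves down into s1_pc
    return trace


for cycle, pc, nxt in run_s1_only(0x1000, 4):
    print(f"cycle {cycle}: s1_pc={pc:#x} -> next s0_pc={nxt:#x}")
```

Every iteration of the loop produces one prediction block, which is the bubble-free behavior the text describes.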

Prediction Result Redirection

However, all predictors other than uFTB need 2 to 3 cycles to produce a prediction result. How are their results used, and how is the prediction result redirection signal generated?

As shown in the figure, Predictor 2, which takes two cycles to generate a prediction result, outputs its result to the s2 prediction result channel in the s2 stage. After the top level of the BPU receives the prediction result, it extracts the jump target address (target) of the prediction block and connects it to npc_Gen.

At this point, npc_Gen receives both the prediction result generated by s2 for the older PC and the prediction result generated by s1 for the newer PC. Which one should be used for the new PC?

As mentioned earlier, the BPU compares the prediction result of s2 with the prediction result that s1 produced in the previous cycle. If the two differ, s1 made a wrong prediction, and the current-cycle s1 prediction, which was generated from that wrong result, is naturally wrong as well. In that case, npc_Gen uses the target provided by s2 as the new s0_pc.
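The selection rule can be written as a small function. This is a minimal sketch, not the actual npc_Gen logic: the function name and parameters are hypothetical, and it assumes that comparing jump targets alone is enough to detect a mismatch.

```python
def npc_gen_select(s1_target, s2_valid, s2_target, prev_s1_target):
    """Choose the next s0_pc from the s1 and s2 prediction results.

    s1_target      -- target predicted by s1 in the current cycle (newer PC)
    s2_valid       -- whether the s2 stage holds a valid prediction this cycle
    s2_target      -- target predicted by s2 for the older PC
    prev_s1_target -- what s1 predicted for that same older PC one cycle earlier
    Returns (next_s0_pc, diff).
    """
    # diff is raised only when s2 is valid and disagrees with the earlier s1 result.
    diff = s2_valid and s2_target != prev_s1_target
    # On a mismatch the current s1 result is on the wrong path, so s2 wins.
    next_s0_pc = s2_target if diff else s1_target
    return next_s0_pc, diff


print(npc_gen_select(0x160, True, 0x300, 0x140))  # s2 disagrees with s1
print(npc_gen_select(0x160, True, 0x140, 0x140))  # s2 confirms s1
```

When s2 confirms the earlier s1 result, the s1 target of the current cycle is used and no cycle is lost.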

This process is shown in the pipeline structure diagram as follows:

The Diff comparator compares the prediction result of the s2 stage with the s1 prediction result from the previous cycle and generates a diff signal that guides npc_Gen in generating the next PC. The diff signal also indicates that the s1-stage prediction was incorrect, so the BPU can use it directly as the redirect indication on the s2 prediction result channel sent to the FTQ, instructing the FTQ to overwrite the earlier prediction result.

The diff signal is also sent to each predictor through the s2_redirect interface to guide the predictors to update their states.

Furthermore, an s2-stage prediction result redirection means that the prediction on the s1 channel was incorrect, so the s2 stage cannot continue predicting from it. The s2_fire signal of the predictor pipeline must be invalidated while the s2 stage waits for the corrected prediction result to flow in.
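To make the effect of an s2 redirection on the pipeline concrete, here is a hedged two-stage model. The tables `FAST` and `SLOW` are hypothetical stand-ins for a single-cycle predictor and a two-cycle predictor, and representing an invalidated s2_fire as an empty s2 slot in the following cycle is a modeling assumption, not a description of the actual RTL.

```python
# Hypothetical predictor tables: block start PC -> predicted next PC.
FAST = {0x100: 0x120, 0x120: 0x140, 0x140: 0x160, 0x300: 0x320, 0x320: 0x340}
SLOW = dict(FAST)
SLOW[0x120] = 0x300  # the slower predictor disagrees for the block at 0x120


def simulate(start_pc, cycles):
    """Return (cycle, s1_pc, s2_pc, s2_redirect, s0_pc) for each cycle."""
    s1 = start_pc  # PC currently held in the s1 stage
    s2 = None      # (pc, earlier s1 target) held in s2, or None for a bubble
    trace = []
    for cycle in range(cycles):
        s1_target = FAST[s1]
        # Diff: s2 result versus what s1 predicted for the same block last cycle.
        s2_redirect = s2 is not None and SLOW[s2[0]] != s2[1]
        if s2_redirect:
            s0_pc = SLOW[s2[0]]  # npc_Gen takes the s2 target
            next_s2 = None       # wrong-path block in s1 must not enter s2
        else:
            s0_pc = s1_target
            next_s2 = (s1, s1_target)
        trace.append((cycle, s1, s2[0] if s2 else None, s2_redirect, s0_pc))
        s1, s2 = s0_pc, next_s2  # clock edge
    return trace


for row in simulate(0x100, 5):
    print(row)
```

In this model the redirection in cycle 2 leaves the s2 stage empty in cycle 3, which is the one-cycle bubble paid for the wrong s1 prediction.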

The prediction result redirection of the s3 stage works in a similar way. Its pipeline structure diagram is as follows. The detailed handling is left for you to analyze.

Redirection Requests and Other Information Generation

An external redirection request occurs only when the prediction information of all three stages is incorrect. Because such a request implies that all three stages have predicted incorrectly, the fire signals of all three stages must be invalidated. npc_Gen then takes the PC to be restored from the redirection request and restarts prediction from it.
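Putting the three sources of a new PC together, npc_Gen can be thought of as a priority selector. The sketch below assumes the ordering external redirection, then s3, then s2, then s1, which follows from the text (a later stage overrides an earlier one). The function name and the use of `None` to mean "no override this cycle" are illustrative choices.

```python
def npc_gen(s1_target, s2_override=None, s3_override=None, redirect_pc=None):
    """Return (next_s0_pc, source) using an assumed fixed priority.

    Each override argument is None when that source is inactive this cycle.
    """
    if redirect_pc is not None:
        # External redirection: all three stages were wrong, restart from here.
        return redirect_pc, "redirect"
    if s3_override is not None:
        return s3_override, "s3"
    if s2_override is not None:
        return s2_override, "s2"
    return s1_target, "s1"


print(npc_gen(0x160))
print(npc_gen(0x160, s2_override=0x300))
print(npc_gen(0x160, s2_override=0x300, redirect_pc=0x8000))
```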

Other information, such as the global history, is generated on the same principle as the PC and is maintained from the prediction information of each stage. The global history, for example, produces a new branch history from the prediction result of each stage.
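As a rough illustration of the same principle applied to the global history, the sketch below models the history as a plain shift register and picks the candidate from the highest-priority source. Both the 8-bit width and the simple shift-register form are assumptions made for brevity; the names `shift_in` and `select_history` are hypothetical.

```python
HIST_BITS = 8  # assumed width, chosen only to keep the example small
MASK = (1 << HIST_BITS) - 1


def shift_in(history, taken):
    """Append one branch outcome to the history (newest bit at the LSB)."""
    return ((history << 1) | int(taken)) & MASK


def select_history(s1_hist, s2_hist=None, s3_hist=None, redirect_hist=None):
    """Pick the next global history; None means that source is inactive."""
    for candidate in (redirect_hist, s3_hist, s2_hist):
        if candidate is not None:
            return candidate
    return s1_hist


base = 0b0101
s1_guess = shift_in(base, True)    # s1 predicted the branch as taken
s2_fixed = shift_in(base, False)   # s2 later predicts the same branch as not taken
print(bin(select_history(s1_guess)))
print(bin(select_history(s1_guess, s2_hist=s2_fixed)))
```

Each stage computes its candidate from the history that was current when its block entered the pipeline, so an override replaces the speculative history instead of appending to it.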

Pipeline Control Signals

Having walked through the pipeline, you should now be able to understand the pipeline control signals in the predictor interface:

s0_fire, s1_fire, s2_fire, s3_fire: indicate whether each stage of the pipeline is working.
s2_redirect, s3_redirect: indicate whether a prediction result redirection has occurred.
s1_ready, s2_ready, s3_ready: sent from the predictor to the top level of the BPU, indicating whether each stage of the pipeline is ready.
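One way to picture how a predictor consumes the fire signals is as enables on its PC registers: each register captures the value from the previous stage only when that stage fires. The class below is a hypothetical software model of that idea, not the predictor's real interface, and it leaves out the redirect and ready signals.

```python
class PredictorPcRegs:
    """Toy model of per-stage PC registers enabled by the fire signals."""

    def __init__(self):
        self.s1_pc = None
        self.s2_pc = None
        self.s3_pc = None

    def tick(self, s0_pc, s0_fire, s1_fire, s2_fire):
        """Advance one clock edge; all registers update simultaneously."""
        # Compute every next value from the current state before committing.
        next_s1 = s0_pc if s0_fire else self.s1_pc
        next_s2 = self.s1_pc if s1_fire else self.s2_pc
        next_s3 = self.s2_pc if s2_fire else self.s3_pc
        self.s1_pc, self.s2_pc, self.s3_pc = next_s1, next_s2, next_s3


regs = PredictorPcRegs()
regs.tick(0x100, s0_fire=True, s1_fire=False, s2_fire=False)
regs.tick(0x120, s0_fire=True, s1_fire=True, s2_fire=False)
regs.tick(0x140, s0_fire=False, s1_fire=False, s2_fire=True)  # s0 and s1 stalled
print(regs.s1_pc, regs.s2_pc, regs.s3_pc)
```

A stage whose fire signal is low simply holds its PC, which is how a stall or an invalidated fire keeps wrong-path data from moving down the pipeline in this model.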

Conclusion

By now, you should understand the basic design principles, external interaction logic, internal structure, timing, etc., of the Xiangshan Branch Prediction Unit, and have a rough understanding of the working principle of BPU. Xiangshan’s BPU is no longer mysterious to you.

Next, you can read the Important Structures and Interfaces Document and combine it with the source code of Xiangshan BPU to form a more detailed understanding of BPU. When you clearly understand the working principle and signal details of BPU, you can start your verification work!
