Introduction to the Xiangshan Branch Prediction Unit Structure

This section describes the structure of the Xiangshan Branch Prediction Unit (BPU). It covers how the BPU integrates multiple predictors and multiple pipelines, how the internal sub-predictors are organized and what interface they expose, how the BPU interacts with the Composer, and how the sub-predictors are connected to one another.

How Does the BPU Integrate Internal Sub-predictors?

We already know that the Xiangshan BPU uses multiple predictors and multiple pipelines. To support multiple pipelines, the BPU exposes a three-channel result output interface. But how does it support multiple predictors? To answer that, we need to look at the internal structure of the BPU.

The above figure is the BPU architecture diagram from the Xiangshan documentation. Currently, we only need to focus on one piece of information: all internal sub-predictors are encapsulated in a structure called Composer. The BPU only needs to interact with Composer.

What is Composer? Let’s first look at how Composer and the sub-predictors are defined in the Xiangshan code.

Composer and the five sub-predictors have one thing in common: they all inherit from the BasePredictor base class, and the interface is defined in BasePredictor. In other words, Composer and the five sub-predictors all have the same interface. The top-level BPU can therefore treat Composer as if it were a single sub-predictor, without worrying about how the internal sub-predictors are connected.
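The idea of a shared base class can be illustrated with a small behavioral sketch. This is plain Python, not the Chisel code used by Xiangshan. The `predict` method, the dictionary-based prediction, the `FallThroughGuess` class, and the `bpu_top` function are all hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BasePredictor(ABC):
    """Toy stand-in for the shared base class: one interface for every predictor."""

    @abstractmethod
    def predict(self, pc: int, pred_in: dict) -> dict:
        """Take an incoming prediction and return a (possibly refined) one."""


class FallThroughGuess(BasePredictor):
    """Hypothetical sub-predictor that fills in a fall-through target if none is set."""

    def predict(self, pc: int, pred_in: dict) -> dict:
        out = dict(pred_in)
        out.setdefault("target", pc + 4)
        return out


class Composer(BasePredictor):
    """Wraps several predictors but exposes the same interface as each of them."""

    def __init__(self, components: list[BasePredictor]):
        self.components = components

    def predict(self, pc: int, pred_in: dict) -> dict:
        pred = pred_in
        for component in self.components:
            pred = component.predict(pc, pred)
        return pred


def bpu_top(predictor: BasePredictor, pc: int) -> dict:
    """The top level depends only on the BasePredictor interface."""
    return predictor.predict(pc, {})
```

Because `bpu_top` only relies on the base-class interface, `bpu_top(FallThroughGuess(), pc)` and `bpu_top(Composer([FallThroughGuess()]), pc)` are interchangeable from the caller's point of view.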

Sub-predictor Interface

Next, we will look at what the sub-predictor interface looks like. This interface will involve the interaction between Composer and the top-level BPU, as well as the interaction between each sub-predictor and Composer.

Let’s take Composer as an example to illustrate the structure of the sub-predictor interface.

As shown in the figure above, the three-channel prediction results of Composer are output directly to the outside of the BPU. A second set of three-channel prediction results is connected from inside the BPU into Composer. However, since the prediction results are generated by Composer itself, the BPU passes an empty prediction result to Composer. The point of this arrangement is to let each sub-predictor act as a “processor”: it takes the incoming prediction results, processes them, and outputs the processed prediction results.

Next, the top-level BPU will provide the information needed for prediction to the pipeline. First come the PC and the branch history records (including global history and global folding history). Then the BPU connects its own pipeline control signals to the corresponding pipeline control signals of Composer. Finally, the BPU connects the externally input redirect request interface and update interface directly to Composer.

In the end, a simple definition of the sub-predictor interface can be given (for detailed definitions, please refer to the interface documentation):

  • in
    • (s1, s2, s3) Prediction information input
    • s0_pc PC to be predicted
    • ghist Global branch history
    • folded_hist Global folding history
  • out (s1, s2, s3) Prediction information output
  • Pipeline control signals
    • s0_fire, s1_fire, s2_fire, s3_fire Whether the corresponding pipeline stage is working
    • s2_redirect, s3_redirect Redirect signals when a prediction error is discovered in the subsequent pipeline stage
    • s1_ready, s2_ready, s3_ready Whether the corresponding pipeline stage of the sub-predictor is ready
  • update Update request
  • redirect Redirect request
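The list above can be written down as a set of record types. The sketch below is a Python model and not the actual Chisel bundle definition. The signal names come from the list, while the class names, the field types, the integer encoding of the histories, and the name `inp` (used because `in` is a Python keyword) are assumptions made for illustration. Signal directions are deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StagePredictions:
    """One prediction per channel; None stands for an empty prediction."""
    s1: Optional[Any] = None
    s2: Optional[Any] = None
    s3: Optional[Any] = None


@dataclass
class PredictorInput:
    preds: StagePredictions = field(default_factory=StagePredictions)  # (s1, s2, s3) input
    s0_pc: int = 0         # PC to be predicted
    ghist: int = 0         # global branch history (placeholder encoding)
    folded_hist: int = 0   # global folding history (placeholder encoding)


@dataclass
class PipelineControl:
    s0_fire: bool = False      # whether the corresponding stage is working
    s1_fire: bool = False
    s2_fire: bool = False
    s3_fire: bool = False
    s2_redirect: bool = False  # a later stage found a prediction error
    s3_redirect: bool = False
    s1_ready: bool = False     # whether the sub-predictor's stage is ready
    s2_ready: bool = False
    s3_ready: bool = False


@dataclass
class PredictorIO:
    inp: PredictorInput = field(default_factory=PredictorInput)
    out: StagePredictions = field(default_factory=StagePredictions)
    ctrl: PipelineControl = field(default_factory=PipelineControl)
    update: Optional[Any] = None    # update request (payload not modeled)
    redirect: Optional[Any] = None  # redirect request (payload not modeled)
```

A freshly constructed `PredictorIO()` carries empty predictions on all three channels, which matches the empty prediction result that the BPU passes to Composer.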

Connection Between Sub-predictors

We now know that every sub-predictor has the same interface as Composer, and we also know how Composer is connected to the top-level BPU. This section explains how the sub-predictors are connected inside Composer.

The figure above shows how the sub-predictors are connected inside Composer. The three-channel prediction results entering Composer are first processed by uFTB. The output of uFTB is then processed in turn by TAGE-SC, FTB, ITTAGE, and RAS. The output of RAS is connected to the prediction result output of Composer, which Composer passes directly to the outside of the BPU.

For other signals, because the interfaces between Composer and each sub-predictor are the same, they are directly connected to the corresponding interfaces of each predictor by Composer, without much additional processing.
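The two connection styles described above (chaining for prediction results, fan-out for everything else) can be sketched as a small Python model. The predictor order is taken from the text. The `SubPredictor` class, its `process` and `update` methods, and the list-based "prediction" that simply records which predictors touched it are hypothetical and serve only to make the wiring visible.

```python
class SubPredictor:
    """Hypothetical stand-in for one sub-predictor; it only records what it sees."""

    def __init__(self, name):
        self.name = name
        self.seen_pc = None
        self.updates = []

    def process(self, s0_pc, pred_in):
        self.seen_pc = s0_pc
        # A real predictor would refine the prediction; here we append a trace entry.
        return pred_in + [self.name]

    def update(self, request):
        self.updates.append(request)


class Composer:
    ORDER = ["uFTB", "TAGE-SC", "FTB", "ITTAGE", "RAS"]

    def __init__(self):
        self.components = [SubPredictor(name) for name in self.ORDER]

    def process(self, s0_pc, pred_in):
        pred = pred_in
        for component in self.components:
            # Same PC goes to every component; each output feeds the next input.
            pred = component.process(s0_pc, pred)
        return pred

    def update(self, request):
        # Non-prediction signals are simply passed to every component.
        for component in self.components:
            component.update(request)


composer = Composer()
trace = composer.process(0x80000000, [])
composer.update("update-0")
```

After running this, `trace` lists the predictors in chain order, and every component has seen the same PC and the same update request.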

Prediction Result Interface Connection

The prediction results of the sub-predictors are chained: the prediction result output of one predictor is the prediction result input of the next. Note that this chain is purely combinational; there are no registers between the predictors.

As shown in the figure above, taking the s1 channel as an example, the path from the input to the output of the last predictor consists only of combinational logic, with no registers in between. Registers exist only between the s1, s2, and s3 channels.

Therefore, adding more sub-predictors does not increase the number of cycles required for a prediction. It only increases the combinational delay that must fit within each cycle.
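A rough arithmetic model makes this trade-off concrete. The function below is an illustrative simplification, not a description of real Xiangshan timing: it assumes each sub-predictor adds a fixed combinational delay to a channel, and the delay values in picoseconds are made up for the example.

```python
def prediction_timing(delays_ps, num_stages=3):
    """Return (cycles, per_cycle_delay_ps) for a chain of sub-predictors.

    delays_ps: hypothetical combinational delay that each sub-predictor
    adds within one channel, in picoseconds.
    """
    cycles = num_stages                    # registers sit only between s1, s2, s3
    per_cycle_delay_ps = sum(delays_ps)    # predictors in one channel are chained
    return cycles, per_cycle_delay_ps


four = prediction_timing([100, 200, 150, 50])
five = prediction_timing([100, 200, 150, 50, 80])
```

In this model, going from four to five sub-predictors leaves the cycle count unchanged, while the per-cycle delay grows by the delay of the added predictor.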

Last modified September 13, 2024: Update the picture of BPU Top. (431c050)