In this verification documentation crowdsourcing task, you will:

  1. Get to know the CPL2 module of the Xiangshan Kunming Lake architecture
  2. Understand the design methodology of the Xiangshan L2Cache
  3. Deepen your knowledge of the RISC-V instruction set architecture

We welcome your participation! (Register here, QQ group: 1038700912)

CPL2 (CHI-CoupledL2) is a non-blocking level-2 cache memory. Its main functions include: receiving prefetch requests from L1, supporting BOP prefetching, supporting Temporal prefetching, and supporting the merging of memory access requests and prefetch requests.

In this phase of the task, a total of 14 subtasks were released, covering 20 submodules of CPL2.

Specification Drafting

For this verification documentation crowdsourcing task, please:

  1. Read and understand the corresponding Xiangshan module code
  2. Complete the documentation work based on our provided template
  3. Submit a PR to the repository

Contribution Submission

Please fork the XiangShan-Design-Doc repository, complete the documentation work, and initiate a PR for submission when all deliverables are ready.

Task Difficulty

Task difficulty is determined by a combination of factors, including comprehension complexity and workload. Generally:

  • Difficulty 1-3: Simple tasks
  • Difficulty 4-7: Moderate tasks
    (May involve heavier workload or require time to understand hidden requirements)
  • Difficulty 8-10: Challenging tasks
    (Typically combine significant workload with high comprehension complexity)

Reward Information

Participants will receive varying monetary awards based on:

  • Task difficulty level (1-10 scale)
  • Completion quality assessment

Task Details

The CPL2 design documentation is available for download here. Below are detailed descriptions of each subtask in this phase:

Task 1.1: SinkA & RequestBuffer Submodules

SinkA processes requests from Bus Channel A and the prefetcher, converts them into the internal task format, and then sends them to RequestBuffer. RequestBuffer buffers Channel A requests that are temporarily blocked, while allowing Channel A requests that meet the release conditions or do not need to be blocked to enter the main pipeline first.
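The buffering policy above can be illustrated with a small behavioral model. This is only a sketch: the class name, the `Task` fields, the `blocked_sets` argument, and the rule that a request is blocked whenever its set index is in `blocked_sets` are all assumptions made for illustration. The actual blocking and release conditions are defined in the RTL.

```python
from collections import namedtuple

# Hypothetical internal task format: only the fields this sketch needs.
Task = namedtuple("Task", ["source", "set_index", "tag"])


class RequestBufferModel:
    """Toy model: hold blocked Channel A tasks, let unblocked ones pass."""

    def __init__(self, entries=4):
        self.entries = entries      # assumed capacity, not the RTL value
        self.buffer = []            # blocked tasks, oldest first

    def accept(self, task, blocked_sets):
        """Offer a task. Returns the task if it may enter the pipeline now,
        or None if it was buffered. Raises if the buffer is full."""
        if task.set_index not in blocked_sets:
            return task             # no blocking needed: bypass the buffer
        if len(self.buffer) >= self.entries:
            raise OverflowError("request buffer full")
        self.buffer.append(task)
        return None

    def release(self, blocked_sets):
        """Return the oldest buffered task whose set is no longer blocked."""
        for i, task in enumerate(self.buffer):
            if task.set_index not in blocked_sets:
                return self.buffer.pop(i)
        return None
```

A request to a blocked set is held until `release` is called with that set no longer blocked, while requests to other sets pass straight through.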

Expected Task Difficulty: 4/10

Task 1.2: SinkC & ReleaseBuf Submodules

SinkC receives requests from Bus Channel C. ReleaseBuf is used to temporarily store data released by the upstream Cache or data to be released by L2.

Expected Task Difficulty: 4/10

Task 1.3: GrantBuffer Submodule

GrantBuffer receives tasks from MainPipe and forwards them based on task type.

Expected Task Difficulty: 4/10

Task 1.4: TXREQ, TXRSP & TXDAT Submodules

The TXREQ module receives requests destined for the REQ channel from both the MainPipe and MSHR modules, arbitrates between them, buffers them in a queue, and finally transmits them to the CHI TXREQ bus channel.

The TXRSP module receives requests destined for the SRSP channel from both the MainPipe and MSHR modules, arbitrates between them, buffers them in a queue, and finally transmits them to the CHI TXRSP bus channel.

The TXDAT module unconditionally receives requests destined for the WDAT channel from the MainPipe module, buffers them in a queue, and finally transmits them to the CHI TXDAT bus channel.
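The arbitrate, queue, and transmit flow shared by TXREQ and TXRSP can be sketched as a cycle-level model. The fixed priority of MainPipe over MSHR, the queue depth, and all names below are assumptions for illustration only. The real arbitration policy should be taken from the RTL.

```python
from collections import deque


class TxChannelModel:
    """Toy model of a TX channel: arbitrate two sources, queue, then send."""

    def __init__(self, depth=16):
        self.depth = depth          # assumed queue depth
        self.queue = deque()

    def cycle(self, mainpipe_req=None, mshr_req=None, bus_ready=True):
        """Advance one cycle. Returns (sent, granted_source)."""
        # Transmit the head of the queue if the bus can take it.
        sent = self.queue.popleft() if (self.queue and bus_ready) else None
        # Assumed fixed-priority arbitration: MainPipe wins over MSHR.
        granted = None
        if len(self.queue) < self.depth:
            if mainpipe_req is not None:
                self.queue.append(mainpipe_req)
                granted = "mainpipe"
            elif mshr_req is not None:
                self.queue.append(mshr_req)
                granted = "mshr"
        return sent, granted
```

A TXDAT-like channel, which takes requests only from MainPipe, corresponds to always passing `mshr_req=None`.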

Expected Task Difficulty: 3/10

Task 1.5: RXSNP, RXRSP & RXDAT Submodules

RXSNP processes requests from Bus Channel B, converts them into the internal task format, and sends them to RequestArb.

The RXRSP module accepts data-less response messages from the RXRSP channel and directly delivers the responses to MSHRCtl, using the txnID in the message to identify the mshrID.

The RXDAT module accepts data-bearing response messages from the RXDAT channel, stores the data in RefillBuf while simultaneously sending the responses to MSHRCtl, using the txnID to identify the mshrID.
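The routing of responses by txnID can be sketched as follows. The sketch assumes that the txnID is used directly as the mshrID, and the dictionary-based message format, the `mshr_inbox` and `refill_buf` containers, and the function names are hypothetical.

```python
def rx_rsp(msg, mshr_inbox):
    """Data-less response: route straight to the MSHR named by txnID."""
    mshr_id = msg["txnID"]          # assumption: txnID is the mshrID
    mshr_inbox.setdefault(mshr_id, []).append(msg["opcode"])
    return mshr_id


def rx_dat(msg, mshr_inbox, refill_buf):
    """Data response: store the data and notify the same MSHR."""
    mshr_id = msg["txnID"]          # assumption: txnID is the mshrID
    refill_buf[mshr_id] = msg["data"]
    mshr_inbox.setdefault(mshr_id, []).append(msg["opcode"])
    return mshr_id
```

In this model the MSHR with ID 3 would see both a data-less response and a data response if two messages arrive with `txnID` equal to 3, and the data would be found in `refill_buf[3]`.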

Expected Task Difficulty: 4/10

Task 1.6: MSHRCtl Submodule

The MSHRCtl module is primarily responsible for allocating MSHRs to requests, delivering information from RXRSP/RXDAT/SourceC channel controllers to target MSHRs, and arbitrating MSHR requests to dispatch them to corresponding TXREQ/TXDAT channel controllers or MainPipe (RequestArb).

The internal MSHR submodule allocates an MSHR entry when either:

  • An L2 cache miss occurs, or
  • A cache hit requires a change to the metadata permissions

Each allocated MSHR:

  1. Records the request state
  2. Receives responses from channel controllers and directories
  3. Issues control requests based on current state
  4. Is released once all control requests have completed
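The allocation rule and the lifecycle listed above can be summarized in a small model. This is a sketch under simplifying assumptions: control requests are opaque labels, each request receives exactly one response, and the class and method names are hypothetical.

```python
def needs_mshr(hit, needs_perm_change):
    """Allocation rule from the text: a miss, or a hit that must change
    its metadata permissions."""
    return (not hit) or needs_perm_change


class MSHREntry:
    """Toy MSHR: tracks outstanding control requests until all complete."""

    def __init__(self):
        self.valid = False
        self.to_issue = []          # control requests not yet sent
        self.waiting = set()        # sent, response not yet received

    def allocate(self, control_requests):
        """Record the request state as a list of control requests to run."""
        self.valid = True
        self.to_issue = list(control_requests)
        self.waiting = set()

    def issue(self):
        """Issue the next control request, if any."""
        if not self.to_issue:
            return None
        req = self.to_issue.pop(0)
        self.waiting.add(req)
        return req

    def on_response(self, req):
        """Record a response; release the entry once nothing is outstanding."""
        self.waiting.discard(req)
        if not self.to_issue and not self.waiting:
            self.valid = False
```

The entry stays valid while any control request is still unissued or awaiting its response, and frees itself after the last response arrives.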

Expected Task Difficulty: 6/10

Task 1.7: RefillBuf Submodule

RefillBuf temporarily stores data returned from downstream caches.

Expected Task Difficulty: 4/10

Task 1.8: RequestArb Submodule

RequestArb arbitrates which request may proceed to the main pipeline, permitting only one request per clock cycle.
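The one-request-per-cycle rule can be pictured as a simple priority pick. The priority order used in the example (MSHR first, then Channel C, then Channel A) is purely an assumption for illustration, as is the function name. The real order and any additional blocking conditions are part of what this subtask should document.

```python
def arbitrate(candidates):
    """Pick at most one request per cycle.

    `candidates` is an ordered list of (source_name, request_or_None);
    earlier entries have higher priority (an assumed order).
    Returns (source_name, request), or None if nothing is pending.
    """
    for name, req in candidates:
        if req is not None:
            return name, req
    return None


# Example cycle: the MSHR has nothing pending, so Channel C wins over Channel A.
winner = arbitrate([("mshr", None), ("sinkC", "c0"), ("sinkA", "a0")])
```

Requests that lose arbitration in a cycle are simply offered again in the next one.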

Expected Task Difficulty: 5/10

Task 1.9: MainPipe Submodule

The non-blocking main pipeline receives and processes all channel requests and MSHR requests. In doing so it performs the associated read/write operations on the Directory, DataStorage, RefillBuf, and ReleaseBuf, generates responses, and allocates MSHRs.

Expected Task Difficulty: 9/10

Task 1.10: Directory Submodule

The Directory handles read requests by checking whether the requested data block is stored in L2 (hit/miss). On a hit, it returns the metadata of that data block. On a miss, it selects either an invalid way or a way to be replaced, and returns the metadata of the data in that way. Once the request has been processed, the new directory information is written back to update the Directory.
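The hit/miss flow described above can be sketched for a single cache set. The per-way dictionary layout, the opaque `meta` field, and the `pick_victim` callback standing in for the replacement policy are all assumptions for illustration.

```python
def directory_read(ways, tag, pick_victim):
    """Look up `tag` in one set.

    ways: list of dicts {"valid": bool, "tag": int, "meta": object}.
    pick_victim: callable(ways) -> way index (assumed replacement policy).
    Returns (hit, way_index, entry).
    """
    # Hit: a valid way holds the requested tag.
    for i, w in enumerate(ways):
        if w["valid"] and w["tag"] == tag:
            return True, i, w
    # Miss: prefer an invalid way.
    for i, w in enumerate(ways):
        if not w["valid"]:
            return False, i, w
    # Miss with all ways valid: ask the replacement policy for a victim.
    i = pick_victim(ways)
    return False, i, ways[i]


def directory_write(ways, way_index, tag, meta):
    """Write the new directory information after the request completes."""
    ways[way_index] = {"valid": True, "tag": tag, "meta": meta}
```

On a miss the caller learns which way will be refilled and what it currently holds, which is what it needs to decide whether the old block must be released first.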

Expected Task Difficulty: 3/10

Task 1.11: DataStorage Submodule

The DataStorage module is responsible for the storage and read/write operations of cache data, implemented using single-port SRAM.

Expected Task Difficulty: 3/10

Task 1.12: Prefetcher Submodule

The Prefetcher module receives prefetch training data from the L2 Cache and sends the training data to the BOP module for prefetch training. It contains three submodules: PrefetchReceiver, PrefetchQueue, and BOP (BestOffsetPrefetch).

The PrefetchReceiver module has a single function: it converts incoming L1 DCache prefetch requests into signals that conform to the L2 Cache prefetch request format (converting recv_addr to the L2 prefetch request format).

The PrefetchQueue module instantiates a 16-entry (inflightEntries) register-based circular queue for buffering prefetch requests.

The BOP module receives training data from the Prefetcher module to perform best-offset prefetching. It contains two submodules: RecentRequestTable (which records recent memory access requests related to BOP prefetching) and OffsetScoreTable (which trains and records scores for the candidate offsets to determine the current best offset).
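The interplay between the two BOP tables can be sketched with a heavily simplified model of best-offset training. It is not the XiangShan implementation: the candidate offsets, table size, `score_max`, and `round_max` values are assumptions, addresses are cache-line numbers, and the recent-request table is filled directly with accessed lines rather than on refill.

```python
class BestOffsetSketch:
    """Toy best-offset trainer with a recent-request table and score table."""

    def __init__(self, offsets=(1, 2, 3, 4), rr_entries=16,
                 score_max=4, round_max=8):
        self.offsets = list(offsets)         # assumed candidate offsets
        self.rr = [None] * rr_entries        # RecentRequestTable, direct-mapped
        self.score_max = score_max
        self.round_max = round_max
        self.best_offset = self.offsets[0]
        self._reset_learning()

    def _reset_learning(self):
        self.scores = {d: 0 for d in self.offsets}   # OffsetScoreTable
        self.test_idx = 0
        self.rounds = 0

    def rr_insert(self, line_addr):
        self.rr[line_addr % len(self.rr)] = line_addr

    def rr_hit(self, line_addr):
        return self.rr[line_addr % len(self.rr)] == line_addr

    def train(self, line_addr):
        """Test one candidate offset per access; return the prefetch line."""
        d = self.offsets[self.test_idx]
        if self.rr_hit(line_addr - d):       # would offset d have covered this?
            self.scores[d] += 1
        self.test_idx += 1
        if self.test_idx == len(self.offsets):
            self.test_idx = 0
            self.rounds += 1
        # End the learning phase on a saturated score or too many rounds.
        if self.scores[d] >= self.score_max or self.rounds >= self.round_max:
            self.best_offset = max(self.offsets, key=lambda o: self.scores[o])
            self._reset_learning()
        return line_addr + self.best_offset


# Usage: a stream with a stride of two cache lines.
bop = BestOffsetSketch()
for addr in range(0, 200, 2):
    bop.train(addr)
    bop.rr_insert(addr)
```

For this strided stream the offset of 2 collects score fastest and is selected as the best offset, so subsequent prefetches target the line two ahead of each access.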

Expected Task Difficulty: 8/10

Task 1.13: MMIOBridge Submodule

The MMIOBridge module receives MMIO requests from the core-side TileLink bus, converts them into ReadNoSnp/WriteNoSnp requests in CHI protocol, and returns responses via the TileLink bus Channel D upon completion of MMIO transactions.
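The protocol conversion can be pictured as a lookup from the incoming TileLink opcode to the CHI request and the eventual Channel D response. The mapping below is an illustrative assumption based on the usual TileLink uncached operations (Get, PutFullData, PutPartialData). Which opcodes the bridge actually accepts, and how it handles partial writes, should be confirmed against the RTL.

```python
# Assumed mapping: TileLink Channel A opcode -> CHI request opcode.
TL_TO_CHI = {
    "Get": "ReadNoSnp",
    "PutFullData": "WriteNoSnp",
    "PutPartialData": "WriteNoSnp",
}

# TileLink Channel D response returned once the MMIO transaction completes.
TL_D_RESPONSE = {
    "Get": "AccessAckData",
    "PutFullData": "AccessAck",
    "PutPartialData": "AccessAck",
}


def bridge_mmio(tl_opcode):
    """Return (chi_request, tilelink_d_response) for an MMIO access."""
    if tl_opcode not in TL_TO_CHI:
        raise ValueError(f"unsupported MMIO opcode: {tl_opcode}")
    return TL_TO_CHI[tl_opcode], TL_D_RESPONSE[tl_opcode]
```

A read therefore travels as Get, ReadNoSnp, then AccessAckData, and a write as a Put, WriteNoSnp, then AccessAck.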

Expected Task Difficulty: 6/10

Task 1.14: LinkMonitor Submodule

The LinkMonitor module converts messages from the Valid-Ready handshake protocol to the L-Credit-based handshake protocol, while also maintaining the power states of the links in both the TX and RX directions.
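The difference between the two handshakes can be shown with a toy transmitter model: under Valid-Ready a transfer happens when both sides agree in the same cycle, whereas under a credit scheme the sender may transmit only while it holds a credit previously granted by the receiver. The class name, the default cap of 15 held credits, and the single-cycle behavior are assumptions for illustration. Link power-state handling is not modeled.

```python
class CreditLinkTx:
    """Toy transmitter side of a credit-based link."""

    def __init__(self, max_credits=15):
        self.max_credits = max_credits   # assumed cap on held credits
        self.credits = 0                 # none until the receiver grants some

    def ready(self):
        """Valid-Ready view: upstream may fire only while a credit is held."""
        return self.credits > 0

    def cycle(self, valid, credit_grant):
        """Advance one cycle. Returns True if a flit was sent."""
        flit_sent = bool(valid) and self.ready()
        if flit_sent:
            self.credits -= 1            # each flit consumes one credit
        if credit_grant:
            if self.credits >= self.max_credits:
                raise OverflowError("more credits granted than allowed")
            self.credits += 1            # credit granted by the receiver
        return flit_sent
```

In this model a credit granted in one cycle can be spent from the next cycle onward, so a pending request waits exactly until a credit is available.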

Expected Task Difficulty: 5/10