tinysys part six: fetch!
Fetch, and more
The initial design goal was not to have a split fetch/decode front end, but a single fetch unit that does more than what fetch is traditionally supposed to do. The fetch unit in tinysys currently handles the following:
- Fetch instructions
- Decode instructions
- Inject the interrupt handler header/footer sequence on IRQ or ecall/ebreak
- Handle CPU reset requests (in coordination with the CSR unit)
- Wait for interrupts (yes, WFI runs here!)
- Wait for branch address resolution by execute unit
- Handle JAL instructions (simply jump to the new PC and resume fetching)
- Handle and wait for instruction cache flush
Upon reading an instruction, fetch decodes it and pushes it into a FIFO that the execute logic pulls from, which lets the fetch unit keep running independently. For IRQs, ecall and ebreak, the header and footer code lives in a small ROM and is decoded and injected without bothering the execute unit, which allows for faster interrupt handling.
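To make the decoupling concrete, here is a small behavioral model in Python (a sketch, not the tinysys RTL). Fetch turns each instruction into a self-contained packet and pushes it into a FIFO, and execute only ever pops from that FIFO. On an IRQ, fetch pushes a pre-decoded entry sequence instead. The `Packet`, `Fetch` and `Execute` classes and the `irq_hdr*` ROM contents are hypothetical placeholders, and tagging ROM packets with the interrupted PC is an assumption of this model.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Self-contained FIFO entry: execute needs nothing else from fetch."""
    pc: int
    op: str  # stand-in for the full set of decoded fields


# Hypothetical pre-decoded IRQ entry sequence. In tinysys the real
# header/footer code lives in a small ROM inside the fetch unit.
IRQ_ENTRY_ROM = ["irq_hdr0", "irq_hdr1"]


class Fetch:
    def __init__(self, program):
        self.program = program  # pc -> mnemonic, i.e. already "decoded"
        self.pc = 0
        self.fifo = deque()     # the only link between fetch and execute

    def step(self, irq=False):
        if irq:
            # Inject the ROM sequence; execute never hears about the IRQ.
            # Tagging with the interrupted PC is an assumption of this model.
            self.fifo.extend(Packet(self.pc, op) for op in IRQ_ENTRY_ROM)
        elif self.pc in self.program:
            self.fifo.append(Packet(self.pc, self.program[self.pc]))
            self.pc += 4


class Execute:
    def __init__(self, fifo):
        self.fifo = fifo
        self.retired = []

    def step(self):
        # Pull one packet per step if one is ready; otherwise idle.
        if self.fifo:
            self.retired.append(self.fifo.popleft().op)


fetch = Fetch({0: "addi", 4: "lw", 8: "sw"})
execute = Execute(fetch.fifo)
fetch.step()
fetch.step(irq=True)
fetch.step()
for _ in range(4):
    execute.step()
print(execute.retired)  # ['addi', 'irq_hdr0', 'irq_hdr1', 'lw']
```

Note that fetch ran three steps ahead before execute pulled anything; the FIFO is what allows the two sides to run at their own pace.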
Why decode in fetch?
Most tutorials will tell you that the first two pipeline stages are fetch, then decode. This works well for most designs; however, tinysys requires slightly different behavior, where the PC and the decoded instruction are packed together so that execute doesn't need any connection to the fetch unit apart from the FIFO (and a branch decoded signal going back).
This is done by using a wide FIFO that stores the PC, the decoded instruction, the immediate and any other relevant information, forming a fully self-contained packet as shown below:
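```verilog
// Pack everything execute needs into a single FIFO entry
ififodin <= {
    csroffset,
    sysop, func3, func7,
    selectimmedasrval2,
    instrOneHotOut,
    bluop, aluop,
    rs1, rs2, rs3, rd,   // register indices
    immed, prevPC};      // sign-extended immediate and PC
```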
That’s 132 bits of information. It might look very odd to do this, but if you check the IRQ entry and exit handlers, you’ll notice that we do the same thing there, injecting one fully self-contained instruction into the FIFO every 2 clocks.
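To see what such a packet looks like as a plain bit vector, the Python sketch below mimics Verilog's `{a, b, ...}` concatenation, where the first field lands in the most significant bits. The field names come from the `ififodin` snippet, but the widths are hypothetical: they are not taken from the tinysys source and were picked only so that they add up to the 132 bits quoted here.

```python
# Field order follows the ififodin concatenation (first = most significant).
# Widths are HYPOTHETICAL, chosen only so they sum to 132 bits; the real
# tinysys widths may differ.
FIELDS = [
    ("csroffset", 12),
    ("sysop", 3), ("func3", 3), ("func7", 7),
    ("selectimmedasrval2", 1),
    ("instrOneHotOut", 14),
    ("bluop", 4), ("aluop", 4),
    ("rs1", 5), ("rs2", 5), ("rs3", 5), ("rd", 5),
    ("immed", 32), ("prevPC", 32),
]
TOTAL_BITS = sum(width for _, width in FIELDS)


def pack(values):
    """Concatenate fields MSB-first, like Verilog {csroffset, ..., prevPC}."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value:#x} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a packed word back into named fields (LSB field first)."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


entry = {name: 0 for name, _ in FIELDS}
entry.update(rd=10, immed=0xFFFFFFF0, prevPC=0x1000)
word = pack(entry)
print(TOTAL_BITS, unpack(word) == entry)  # 132 True
```

Unpacking on the execute side is just the reverse slicing, so execute needs nothing from fetch beyond the packet itself to interpret an entry.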
Another reason we decode here is that we’d like to handle certain instructions specially, such as cache discard or WFI, or detect illegal instructions ahead of time. While we’re at it, we don’t stop at the opcode (the lower 7 bits) but go ahead and generate the sign-extended immediate as well, so that we can, for example, handle the JAL instruction entirely in the fetch unit and go directly to the next address to fetch from.
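For a sense of how little work the JAL case needs once decode lives in fetch, here is a Python sketch of the relevant pieces: extract the opcode (the lower 7 bits), rebuild the sign-extended J-type immediate from its scrambled RV32I bit positions, and compute the next fetch address. The bit layout follows the RISC-V base ISA; the function names are hypothetical, and the sketch only covers the redirect (the text does not say how tinysys performs the rd = pc + 4 link write).

```python
OPCODE_JAL = 0b1101111  # RV32I JAL major opcode


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def j_immediate(inst):
    """Reassemble the J-type immediate stored as imm[20|10:1|11|19:12]."""
    imm = ((((inst >> 31) & 0x1) << 20)
           | (((inst >> 12) & 0xFF) << 12)
           | (((inst >> 20) & 0x1) << 11)
           | (((inst >> 21) & 0x3FF) << 1))
    return sign_extend(imm, 21)  # bit 0 is always zero


def next_fetch_pc(pc, inst):
    """Next address to fetch: JAL is resolved here, without waiting on execute."""
    if (inst & 0x7F) == OPCODE_JAL:
        return (pc + j_immediate(inst)) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF


print(hex(next_fetch_pc(0x100, 0x010000EF)))  # jal ra, 16 -> 0x110
print(hex(next_fetch_pc(0x100, 0xFF9FF06F)))  # j -8       -> 0xf8
print(hex(next_fetch_pc(0x100, 0x00000013)))  # nop (addi) -> 0x104
```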
Why is WFI in the fetch unit?
One oddity of tinysys is the way it handles WFI (wait for interrupt).
Since the fetch unit handles interrupts itself, it makes sense for it to know that we’re waiting for one “in the future” and to resume fetching entirely within this unit, instead of having to talk back and forth between fetch and execute.
WFI is implemented so that fetch briefly stalls, for 16 clocks or until an interrupt occurs, whichever comes first, and then resumes. This lets the execute unit keep going ahead of us and execute whatever was left in the instruction FIFO so that, ideally, execution never stalls and resumes from an interrupt event without delay.
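The WFI behavior can be sketched as a cycle-level Python model: fetch stalls until either an interrupt arrives or 16 clocks pass, and during that time execute keeps retiring whatever was already queued. `simulate_wfi` and its one-packet-per-clock execute rate are assumptions made for illustration, not taken from tinysys.

```python
from collections import deque

WFI_TIMEOUT = 16  # fetch gives up waiting after 16 clocks, per the text


def simulate_wfi(fifo, irq_at=None):
    """Cycle-level sketch of a WFI in fetch while execute keeps draining.

    fifo:   deque of packets already queued ahead of the WFI
    irq_at: clock (counted from the WFI) when an IRQ arrives, or None
    Returns (stall_clocks, retired, woke_by_irq).
    """
    retired = []
    clock = 0
    while clock < WFI_TIMEOUT:
        if irq_at is not None and clock >= irq_at:
            return clock, retired, True   # IRQ: fetch resumes right away
        if fifo:                          # execute is not blocked by the stall
            retired.append(fifo.popleft())
        clock += 1
    return clock, retired, False          # timeout: resume fetching anyway


print(simulate_wfi(deque(["a", "b", "c", "d"]), irq_at=2))  # (2, ['a', 'b'], True)
print(simulate_wfi(deque(["a", "b"])))                       # (16, ['a', 'b'], False)
```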
Future
The current set of features embedded in the fetch unit will be expanded with a few more tricks in the future, for instance branch prediction. As the execute unit is a few clocks behind, we’d like to keep fetching non-stop here and let execute ‘reset’ our instruction FIFO if we were wrong (introducing only a few clocks of delay for a branch misprediction).
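The planned mispredict recovery could look roughly like the Python sketch below. Fetch keeps running past branches, and when execute resolves a branch the other way, it clears the instruction FIFO and restarts fetch at the correct address. The class, its method names and the always-not-taken prediction are hypothetical stand-ins; the text does not say which predictor tinysys will use.

```python
from collections import deque


class SpeculativeFrontEnd:
    """Hypothetical sketch: fetch never stops at a branch, and execute
    flushes the FIFO and redirects fetch when the prediction was wrong."""

    def __init__(self):
        self.fifo = deque()  # holds fetched PCs in this simplified model
        self.pc = 0
        self.flushes = 0

    def fetch(self, count):
        # Keep fetching sequentially, i.e. always predict "not taken"
        # (a trivial policy chosen only for this example).
        for _ in range(count):
            self.fifo.append(self.pc)
            self.pc += 4

    def resolve_branch(self, branch_pc, taken, target):
        """Called by execute once the branch at branch_pc is resolved."""
        actual = target if taken else branch_pc + 4
        predicted = branch_pc + 4
        if actual != predicted:
            self.fifo.clear()  # discard the wrong-path instructions
            self.pc = actual   # restart fetch at the correct address
            self.flushes += 1
            return True
        return False


fe = SpeculativeFrontEnd()
fe.fetch(4)                      # FIFO holds PCs 0, 4, 8, 12
branch_pc = fe.fifo.popleft()    # execute pulls the branch at PC 0
mispredicted = fe.resolve_branch(branch_pc, taken=True, target=0x40)
fe.fetch(2)
print(mispredicted, [hex(p) for p in fe.fifo])  # True ['0x40', '0x44']
```

On a correct prediction nothing in the FIFO is touched; only the wrong-path entries are thrown away on a mispredict.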
Early decoding in the fetch unit could also be used to route to a debug module: fetch would inject instructions coming from that module instead of from memory, implementing debug aids without touching the instruction memory or I$.
Next time
Now that you know a few of my reasons for putting more work onto the fetch unit, we can move on to some more interesting subjects, such as plans for a new GPU. Until then, do not leave this stage’s work to the next.