RISC-V CPU & CUSTOM GPU ON AN FPGA PART 5 – OUR FIRST CPU STATE MACHINE

Welcome to part 5 of the series. In this one we’ll be making a state machine and running a simulation of it to make sure it works.

RISC-V CPU States

The RISC-V CPU has a very interesting model which makes implementing it quite easy. Memory access is limited to load and store instructions, and all the other instructions can only operate on registers. This is great, since when we’re building a state machine we do not wish to hit any conflicting memory access patterns that might interfere with instruction fetching and/or memory read/write operations.

Instruction fetching is also made easier for us. Since we’re not implementing the compressed instruction set, every instruction is exactly 4 bytes long, so instruction addresses are always a multiple of 4 bytes and sequential fetches advance the address by 4.
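
As a rough sketch of what this means in hardware, sequential fetching boils down to adding 4 to a program counter. The module name pcsketch, the advance strobe and the zero reset vector are all assumptions made up for this illustration and are not part of the design yet:

```systemverilog
// pcsketch.sv (hypothetical, for illustration only)
`timescale 1ns / 1ps

module pcsketch(
	input clock,
	input reset,
	// Assumed strobe: high for one clock when the current instruction is done
	input advance,
	output logic [31:0] PC = 32'd0);

always @(posedge clock) begin
	if (reset) begin
		// Assumed reset vector
		PC <= 32'd0;
	end else if (advance) begin
		// No compressed instructions: every instruction is 4 bytes,
		// so the next sequential instruction is always at PC + 4
		PC <= PC + 32'd4;
	end
end

endmodule
```

Branches and jumps would load a different value into PC instead, but that is a topic for later.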

However, some instructions (besides memory accesses) take several cycles to complete, and the CPU needs to stall while they execute.

Let’s start by looking at the core of the steps necessary to run one instruction given these constraints.

  • FETCH
  • DECODE
  • STALL
  • EXEC
  • LOAD
  • STORE
  • RETIRE

The CPU only needs to travel across a few of these for any given instruction, since an instruction won’t load, store and stall for a math operation all at once. Since the instructions are ‘reduced’, they’re much simpler, and we’ll most often end up travelling across a subset, for instance FETCH/DECODE/EXEC/LOAD/RETIRE.

In reality, there will be many more states in between, specialized to what we’re waiting for or what we’re storing/loading, but this is essentially the core of it.
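
To make the idea of travelling across a subset more concrete, here is one possible way to write down the steps and the choice of the next step. This is only a sketch: the package name, the nextstep function and the is_slow / done / is_load / is_store flags are hypothetical, and the exact ordering (for instance, stalling before EXEC) is an assumption rather than the final design:

```systemverilog
// statesketch.sv (hypothetical, for illustration only)
package statesketch;

typedef enum logic [2:0] {
	FETCH, DECODE, STALL, EXEC, LOAD, STORE, RETIRE
} cpustep_t;

// Pick the next step for the current instruction.
// is_slow, is_load and is_store are assumed decoder outputs,
// done is an assumed "multi-cycle unit has finished" flag.
function automatic cpustep_t nextstep(
	input cpustep_t current,
	input logic is_slow,
	input logic done,
	input logic is_load,
	input logic is_store);
	case (current)
		FETCH:   return DECODE;
		DECODE:  if (is_slow) return STALL; else return EXEC;
		STALL:   if (done) return EXEC; else return STALL;
		EXEC: begin
			if (is_load) return LOAD;
			else if (is_store) return STORE;
			else return RETIRE;
		end
		LOAD:    return RETIRE;
		STORE:   return RETIRE;
		RETIRE:  return FETCH;
		default: return FETCH;
	endcase
endfunction

endpackage
```

With these assumptions a load instruction visits FETCH/DECODE/EXEC/LOAD/RETIRE, while a plain register-to-register instruction skips LOAD and goes straight from EXEC to RETIRE.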

Our First State Machine

Now let’s try to model this in our project from the previous tutorial step. We first need to create a new SystemVerilog design source and name it riscvcpu, as we did in the previous tutorial. The contents of this file will be a template for now:

// riscvcpu.sv
`timescale 1ns / 1ps

module riscvcpu(
	input clock,
	input reset,
	output logic [3:0] diagnosis = 4'b0000);

// Number of bits for the one-hot encoded CPU state
`define CPUSTAGECOUNT           4

// Bit indices for one-hot encoded CPU state
`define CPUFETCH		0
`define CPUDECODE		1
`define CPUEXEC			2
`define CPURETIREINSTRUCTION	3

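// Start in FETCH: with no bits set, no case item below would
// ever match and the state machine would never leave zero
logic [`CPUSTAGECOUNT-1:0] cpustate = 4'b0001;

always @(posedge clock) begin
	if (reset) begin
		// Return to FETCH with the LEDs off
		cpustate <= 4'b0001;
		diagnosis <= 4'b0000;
	end else begin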

		// Clear the state bits for next clock
		cpustate <= 4'b0000;

		// Selected state can now set the bit for the
		// next state for the next clock, which will
		// override the above zero-set.
		case (1'b1)
			cpustate[`CPUFETCH]: begin
				// Turn LED off on instruction start
				diagnosis[0] <= 1'b0;
				cpustate[`CPUDECODE] <= 1'b1;
			end
			cpustate[`CPUDECODE]: begin
				// TODO:
				cpustate[`CPUEXEC] <= 1'b1;
			end
			cpustate[`CPUEXEC]: begin
				// TODO:
				cpustate[`CPURETIREINSTRUCTION] <= 1'b1;
			end
			cpustate[`CPURETIREINSTRUCTION]: begin
				// Turn the LED on when retired
				diagnosis[0] <= 1'b1;
				cpustate[`CPUFETCH] <= 1'b1;
			end
		endcase
	end
end

endmodule

The above code might look a little awkward to software programmers, especially the strange case (1'b1) statement. In short, this is a common way to decode a one-hot state: the case statement compares the constant 1 against each of the mutually exclusive bits of cpustate, and takes the branch whose bit is set. For instance, cpustate[`CPUFETCH] refers to bit 0 of the CPU state, and if that bit is a 1, the case statement will pick that path. Please refer to this link regarding one-hot encoding and how it works.
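
To see the case (1'b1) idiom in isolation, here is a minimal sketch that turns a one-hot value back into the index of the bit that is set. The module and port names are made up for this example and are not used anywhere else in the series:

```systemverilog
// onehotsketch.sv (hypothetical, for illustration only)
`timescale 1ns / 1ps

module onehotsketch(
	// Expected to have exactly one bit set
	input [3:0] state,
	// Index of the bit that was set
	output logic [1:0] index);

always_comb begin
	// Default value, used if no bit is set at all
	index = 2'd0;
	// Compare the constant 1'b1 against each bit in turn;
	// the first bit that equals 1 selects its branch
	case (1'b1)
		state[0]: index = 2'd0;
		state[1]: index = 2'd1;
		state[2]: index = 2'd2;
		state[3]: index = 2'd3;
	endcase
end

endmodule
```

Note that if no bit is set, none of the branches are taken, which is why a one-hot state should never be left at all zeros. SystemVerilog also has a built-in $onehot() system function which returns true when exactly one bit of its argument is set, and that can be handy for checking this in simulation.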

We also need to instantiate this in our top module, which we’ll change to the following:

// nekotop.sv
`timescale 1ns / 1ps

module nekotop(
	// Input clock
	input CLK_I,
	// Reset on lower panel, rightmost button
	input RST_I,
	// 4 monochrome LEDs
	output [3:0] led
);

wire [3:0] diagnosis;

riscvcpu mycpu( .clock(CLK_I), .reset(RST_I), .diagnosis(diagnosis) );

assign led = diagnosis;

endmodule

Simulating the design

So far we’ve been directly testing things on the device, since the design was so simple it would not be that hard to find where a mistake would lie. As the design grows, though, we’ll need to start using the simulation tools to catch any errors before we go ahead and spend time synthesizing them.

To get started, head over to the Project Manager panel, hit the Run Simulation button, and select Run Behavioral Simulation.

Since we have not written any special testbench code, the default simulation setup won’t have the reset and clock signals set up correctly. To fix this, right click on RST_I in the Objects panel, choose Force Constant, type in a zero and hit OK. For CLK_I, right click and select Force Clock, then use 0 for the leading edge value, 1 for the trailing edge value, and 10ns for the period, and hit OK.
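
Forcing signals by hand has to be repeated every time the simulation is restarted, so as an alternative a small testbench can drive both signals for us. The following is a minimal sketch, assuming a hypothetical simulation source named nekotop_tb.sv that is set as the top module for simulation; the 4-clock reset pulse and the 32-clock run length are arbitrary choices made for this example:

```systemverilog
// nekotop_tb.sv (hypothetical simulation-only source)
`timescale 1ns / 1ps

module nekotop_tb();

logic clk = 1'b0;
logic rst = 1'b1;
wire [3:0] led;

// Device under test
nekotop dut( .CLK_I(clk), .RST_I(rst), .led(led) );

// 10ns period clock: toggle every 5ns
always #5 clk = ~clk;

initial begin
	// Hold reset for a few clocks, then release it.
	// Nonblocking assignment avoids racing with the
	// clocked logic that samples rst on the same edge.
	repeat (4) @(posedge clk);
	rst <= 1'b0;
	// Let the state machine run for a while, then stop
	repeat (32) @(posedge clk);
	$finish;
end

endmodule
```

Since this version ends by itself via $finish, there is no need to stop the simulation by hand.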

After this, select the mycpu entry from the Scope panel, find the ‘cpustate’ value in the Objects panel and drag it onto the simulation panel, under diagnosis.

At this point, we can use the Run All button in the toolbar (or press F3) to run the simulation. Don’t forget to stop it after a few seconds using the Break button (F5). This will yield an image as follows, which you can zoom in and out of using CTRL+mouse wheel:

As we can tell from the above image, cpustate sets bit 0, bit 1, bit 2, bit 3 and then bit 0 again. One thing that we can now see here (as mentioned in previous parts of this series) is that diagnosis is assigned while bit 3 is set (cpustate==8), but the new value only becomes visible on the next clock (cpustate==1), because assignments inside the clocked always block take effect at the clock edge. This one-clock delay is why we sometimes need continuous assignment statements instead.

Another thing of note is that the led state, using the assign statement, is set to the value of diagnosis as soon as it changes, since they’re wired together without any registers / storage / clocking delaying the value transfer.
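
The difference between the two kinds of assignment can be boiled down to the following sketch. The module and signal names are hypothetical and only serve to put a clocked and a continuous assignment side by side:

```systemverilog
// delaysketch.sv (hypothetical, for illustration only)
`timescale 1ns / 1ps

module delaysketch(
	input clock,
	input [3:0] value,
	output logic [3:0] registered = 4'b0000,
	output [3:0] wired);

// Clocked: 'registered' picks up 'value' at the clock edge,
// so a change in 'value' shows up one clock later
always @(posedge clock) begin
	registered <= value;
end

// Continuous: 'wired' follows 'registered' right away,
// with no clock or storage in between
assign wired = registered;

endmodule
```

In our design, diagnosis plays the role of registered and led plays the role of wired.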

This concludes part 5. In the next part we’ll flesh out our device more and look at instruction fetching.