RISC-V CPU & CUSTOM GPU ON AN FPGA PART 11 – PROGRAMMING FOR NEKOICHI

BEFORE WE BEGIN

In this part we’ll talk about the RISC-V compiler toolchain we’ll be using, and upload a pre-existing terminal application to NekoIchi so we can interact with it.

But before we go further, and since the files are getting too large to post here, we need to grab them from a git repository. Please head over to https://github.com/ecilasun/nekoichiarticle and use the following command to pull the files for all parts of the series:

git clone https://github.com/ecilasun/nekoichiarticle.git

Run it from a folder of your choice with sufficient free disk space, since this is where we’ll be working on all parts of the series from now on.

NOTE: Starting with this part, the project has a nicely arranged include file, cpuops.vh, which contains all the instruction groups and CPU states, so we can refer to them from the decoder and the CPU itself without having to pollute our code with inline defines.

Building the RISC-V tool chain

Ideally I would not recommend building your own compiler toolchain from scratch, but to get the compiler to produce the library variations we need throughout this series, we’ll have to. It is not as scary as it sounds.

We will be working specifically with riscv-gnu-toolchain, so we need to grab its source code from its git repository. Let’s get started.

Step one: Install the prerequisites
If you’re on Ubuntu 20.04.* as listed in the first article, use the following command to install the prerequisites first:

sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev

Step two: Pull the source code from github
Make sure you have about 7 Gigabytes of disk space available, and run the following command from a directory of your choice:

git clone https://github.com/riscv/riscv-gnu-toolchain

Step three: Build
Run the following command sequence to build the RISC-V toolchain. This builds the compiler and tool binaries, as well as the libraries that go with each architecture variation of the device we’ll build. As you may notice, none of the architectures include atomic instructions or double-precision floats, as we don’t need them. The only ones we’re concerned about initially are the rv32i and rv32im variants, with the F extension added for later versions. It’s also worth mentioning that NekoIchi doesn’t support compressed instructions, to keep the circuitry and decoder simple, so those are omitted from our variations.

cd riscv-gnu-toolchain/
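# Only needed when rebuilding; a fresh clone has no Makefile until ./configure runs, so an error here can be ignored
make clean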
./configure --prefix=/opt/riscv --with-multilib-generator="rv32i-ilp32--;rv32im-ilp32--;rv32imf-ilp32f--"
make
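Each entry in the --with-multilib-generator list above starts with an architecture string followed by an ABI name, with the two trailing fields left empty. As a rough sketch (plain shell, independent of the toolchain; the function name list_multilibs is made up for illustration), this is how the string maps to the -march/-mabi pairs we’ll be able to compile for:

```shell
# The same multilib list that is passed to ./configure above
MULTILIBS="rv32i-ilp32--;rv32im-ilp32--;rv32imf-ilp32f--"

# Hypothetical helper: split the list on ';' and print the first two
# '-'-separated fields (architecture and ABI) of each entry as compiler flags.
list_multilibs() {
    printf '%s\n' "$1" | tr ';' '\n' | while IFS='-' read -r arch abi _rest; do
        echo "-march=$arch -mabi=$abi"
    done
}

list_multilibs "$MULTILIBS"
# prints:
#   -march=rv32i -mabi=ilp32
#   -march=rv32im -mabi=ilp32
#   -march=rv32imf -mabi=ilp32f
```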

For the full build instructions, please refer to https://github.com/riscv/riscv-gnu-toolchain
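Note that make installs into the --prefix directory as it builds, so /opt/riscv needs to be writable by your user (or make needs to be run with sudo). Once the build finishes, a quick sanity check could look like the sketch below. It assumes the /opt/riscv prefix used above; RISCV_PREFIX is just a convenience variable introduced here, not something the toolchain itself reads.

```shell
# Assumes the --prefix used above; override RISCV_PREFIX if you chose another one
RISCV_PREFIX="${RISCV_PREFIX:-/opt/riscv}"

# Make the freshly built tools visible to this shell session
export PATH="$RISCV_PREFIX/bin:$PATH"

if command -v riscv64-unknown-elf-gcc >/dev/null 2>&1; then
    # Prints one line per library variation the compiler was built with
    riscv64-unknown-elf-gcc --print-multi-lib
else
    echo "riscv64-unknown-elf-gcc not found under $RISCV_PREFIX/bin"
fi
```

To make the PATH change permanent, the export line can be added to your shell startup file.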

Building the uploader tool: riscvtool

For the next step, we’ll be needing our ELF binary uploader tool, which also comes with some samples to test our device. The BIOS file we have sits on the UART port waiting for incoming executable packages to arrive, writes them to the required memory address, and branches to the entry point. We will need riscvtool to send the files for us.

Step one: Run the following on the command line to pull the riscvtool project:

git clone https://github.com/ecilasun/riscvtool.git

Step two: Compile riscvtool
Use the following command sequence to produce the riscvtool binary in the ./build/release directory:

cd riscvtool
python waf --out='build/release' configure
python waf build -v
# Use this to 'clean' the build if you need to rebuild from scratch
# python waf clean

Step three: Build the RISC-V examples and the ROM images
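Simply run the supplied shell script to generate all example code and the ROM images. It invokes the RISC-V compiler we built earlier, so the toolchain binaries need to be on your PATH: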

./build.sh

Making NekoIchi persistent on Arty

If you recall, we have to first upload our design to the Arty board to test things, and if something goes wrong or hangs, we’ll have to re-do that step during software development. This is not really convenient, so we need to make a few changes to the previous project files.

Go to the part11 folder, where you’ll find a slightly different version of the project files from part10, with one minor change: we now generate a device programming file. This is done with the -bin_file setting in the Write Bitstream section of the Implementation settings panel, which you can access by clicking the gear icon in the toolbar. Since this is already done ahead of time, all we need to do right now is click Generate Bitstream (which will also synthesize and implement the design) and wait a few minutes.

After the synthesis, implementation, and bitstream generation steps are complete, select Open Hardware Manager when asked to.

To be able to upload persistent data to the Arty board that can run every time we power the board, we’ll need this generated .bin file:

.../part11/nekotutorial.runs/impl_1/nekotop.bin

At this point, if you do not see a configuration device called ‘s25fl128sxxxxxx0-spi-x1_x2_x4’ in Hardware Manager, we’ll need to add one to be able to upload this binary file. To do this, in the Hardware Manager, right click on the xc7a100t part and select Add Configuration Memory Device… Then type this part number into the search box and select it from the list, and hit OK.

s25fl128sxxxxxx0-spi-x1_x2_x4

Please note that if your Arty A7-100 board is a different revision, the configuration device might be slightly different. Please check the Arty A7-100t reference manual for any changes/updates to the Quad-SPI Flash part number.

Once the part is in place, you can either accept the prompt to program it right away, or choose Program Configuration Memory Device… later by right clicking on the s25fl128 part in the Hardware Manager.

In either case, you’ll be presented with the programming dialog, where we’ll navigate to the generated binary file mentioned earlier.

Once you select OK in this dialog box, the board’s Quad-SPI Flash device will be programmed with the permanent version of NekoIchi. This means any time you hit the device reconfiguration button, you’ll see NekoIchi boot instantaneously on the UART port. This will make repeated testing much easier before we get our debugging hardware aid and software ready.

NOTE: Make sure to reboot the device either by unplugging-replugging the USB cable or by using the upper-left-hand corner reprogram / reset switch. Otherwise, post-programming, your board will sit in limbo, not doing much.

Uploading our first test: miniterm

If you’ve run the build.sh script, you’ll now have a few RISC-V .elf files on your hard drive in the riscvtool root directory. One of these is called miniterm.elf.

Make sure you’ve compiled riscvtool first, and that NekoIchi is programmed onto the Arty board and the board is connected. We’ll need to start a terminal program (see the previous part) attached to the USB port of our Arty board, to act as a monitor for observing the loading process and interacting with the loaded program.

After riscvtool and examples are built, the terminal is ready and connected, and Arty is running NekoIchi, run the following command:

# Replace /dev/ttyUSB5 with the port your Arty board is connected to
./build/release/riscvtool miniterm.elf -sendelf 0x10000 /dev/ttyUSB5

This instructs riscvtool to upload miniterm.elf at base address 0x10000 over the given serial port. If everything goes well, you should see output similar to this (the port name will match your own; this capture shows COM4):

Program PADDR 0x00010000 relocated to 0x00010000
Executable entry point is at 0x0001067C (new relative entry point: 0x0001067C)
Sending ELF binary over COM4 @115200 bps
SEND: '.text' @0x00010074 len:00004F64 off:00000074...done (0x00004F62+0xC bytes written)
SEND: '.rodata' @0x00014FD8 len:00001C20 off:00004FD8...done (0x00001C1E+0xC bytes written)
SEND: '.eh_frame' @0x00017BF8 len:0000071C off:00006BF8...done (0x0000071A+0xC bytes written)
SEND: '.init_array' @0x00018314 len:00000008 off:00007314...done (0x00000006+0xC bytes written)
SEND: '.fini_array' @0x0001831C len:00000004 off:0000731C...done (0x00000002+0xC bytes written)
SEND: '.data' @0x00018320 len:00000870 off:00007320...done (0x0000086E+0xC bytes written)
SEND: '.got' @0x00018B90 len:00000030 off:00007B90...done (0x0000002E+0xC bytes written)
SEND: '.sdata' @0x00018BC0 len:00000010 off:00007BC0...done (0x0000000E+0xC bytes written)
SEND: '.sbss' @0x00018BD0 len:00000014 off:00007BD0...done (0x00000012+0xC bytes written)
SKIP: '.bss' @0x00018BE4 len:0000037C off:00007BD0
Branching to 0x0001067C

If instead you’re presented with an error similar to the following:

Program PADDR 0x00010000 relocated to 0x00010000
Executable entry point is at 0x0001067C (new relative entry point: 0x0001067C)
Error 2 from open: No such file or directory

it means you’ve used the wrong ttyUSB device. Check the USB devices on your computer and select the right one, using this command to list the available ttyUSB ports:

dmesg | grep ttyUSB
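If dmesg output has already scrolled away, listing the device nodes directly works too. The sketch below also wraps the upload command from earlier in a small pre-flight check, so a mistyped file or port name is caught before riscvtool runs. Both function names are hypothetical helpers introduced here, not part of riscvtool, and the wrapper assumes it is run from the riscvtool root directory.

```shell
# List candidate serial ports; the optional directory argument defaults to /dev
list_serial_ports() {
    dir="${1:-/dev}"
    for port in "$dir"/ttyUSB*; do
        # When nothing matches, the unexpanded pattern stays in $port, so test for existence
        [ -e "$port" ] && echo "$port"
    done
    return 0
}

# Hypothetical wrapper around the riscvtool invocation shown earlier:
# refuses to run when the ELF file or the port does not exist.
send_elf() {
    elf="$1"; port="$2"; base="${3:-0x10000}"
    [ -f "$elf" ] || { echo "ELF not found: $elf" >&2; return 1; }
    [ -e "$port" ] || { echo "port not found: $port" >&2; return 1; }
    ./build/release/riscvtool "$elf" -sendelf "$base" "$port"
}

list_serial_ports
```

With that in place, an upload would look like: send_elf miniterm.elf /dev/ttyUSB5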

If everything went well, the terminal window now shows output from miniterm.

Make sure the terminal window is active, type ‘help’ (without the quotes), and hit enter. You should be presented with the help output.

To test further, we can use the dump command to see the first 256 bytes starting at the binary load address, 0x10000.

This means all of our CPU parts are working as expected, and the UART communication is working as well.

You may notice we have a couple of other commands in there: dir and load. These are not functional yet, and if you use them you should be presented with an error message (not ready), as the SDCard reader (SPI interface) is not yet built.

P.S: When uploading other modules, take care: not all of them will work at this point without the math/floating point device and GPU. The only one that apparently works with the limited subset we’ve implemented so far is the miniterm app, so please try to stay within the confines of UART communications until the next articles.

Building a new program for NekoIchi

At this point it’s worth mentioning that NekoIchi aims to be easy to use, so its build process for regular ELF binaries is pretty straightforward. Notice the pattern in all applications produced by the build.sh script:

riscv64-unknown-elf-g++ -o ${APPNAME}.elf test/${APPSOURCE}.cpp test/${UTILITIES}.cpp -fno-builtin -mcmodel=medany -std=c++11 -Wall -Ofast -march=rv32imf -mabi=ilp32f -ffunction-sections -fdata-sections -Wl,-gc-sections -fPIC -lgcc -lm

# $APPNAME: name of the output ELF
# $APPSOURCE: samples are usually made of one .cpp file with this name
# $UTILITIES: these are the individual libraries we'll be developing as we go, such as the utils library, console library, SDCARD/FAT libraries we use from external sources, our diskio interface, and anything else from any third party code

Simply conforming to the same pattern will let you experiment with NekoIchi until we jump into our next article and start looking at video output and the GPU. Go ahead and try anything you think possible using UART reads and writes, and get a feel for the current (and temporary) limitations of NekoIchi, for instance the lack of debugger support (which we’ll add in upcoming parts).

Looking at the compiler options, perhaps the most important ones here are -march=rv32imf and -mabi=ilp32f. NekoIchi is going to be an rv32imf variant of RISC-V in the end, so we aim to generate machine code that uses the hardware instructions of this architecture, since we don’t want to run software routines for a simple float division. But for now, note that if you are experimenting with float math, you’ll need to use these options instead until we get all devices up and running:

# Limited integer math / float functionality until we get the M and F extensions ready
-march=rv32i -mabi=ilp32
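To avoid retyping the long flag list while switching between the two targets, the pattern can be wrapped in a small shell function. This is only a sketch: neko_build_cmd is a hypothetical helper (it is not part of build.sh), and test/hello.cpp and test/utils.cpp are placeholder file names. It prints the command instead of running it, so you can inspect it first.

```shell
# Hypothetical helper: prints (does not run) a build command following the build.sh pattern.
# Arguments: <march> <mabi> <appname> <source files...>
neko_build_cmd() {
    march="$1"; mabi="$2"; app="$3"; shift 3
    echo "riscv64-unknown-elf-g++ -o $app.elf $*" \
        "-fno-builtin -mcmodel=medany -std=c++11 -Wall -Ofast" \
        "-march=$march -mabi=$mabi" \
        "-ffunction-sections -fdata-sections -Wl,-gc-sections -fPIC -lgcc -lm"
}

# Final target, once the M and F extensions are available in hardware
neko_build_cmd rv32imf ilp32f hello test/hello.cpp test/utils.cpp

# Interim target for the current state of the CPU
neko_build_cmd rv32i ilp32 hello test/hello.cpp test/utils.cpp
```

Piping the output into sh (for example: neko_build_cmd rv32i ilp32 hello test/hello.cpp | sh) runs the printed command from the riscvtool root directory.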

The -ffunction-sections and -fdata-sections options tell the compiler to place each function and each data item in its own section, so that the linker, given -Wl,-gc-sections, can garbage collect unused function and data sections from the final binary, reducing its size.

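The -mcmodel=medany option selects the medium-any code model, in which the program and its statically defined symbols must fit within a single 2 GiB address range that can sit anywhere in the address space. That is more than enough for NekoIchi, which has a small memory space.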

And of course, we wish to run a somewhat proper C++ on our device, so we use the -std=c++11 flag for our binaries.

For further details on more RISC-V compiler options, please refer to the RISC-V Options page of the online GCC manual.

Next time

This concludes part 11. In the next part, we’ll start adding the most fun bit: our framebuffer architecture and our simple GPU, which we’ll attach to our device router to see some output. For that part, you’ll need the 1BitSquared Digital Video Interface PMOD, which is the default video output device used in the NekoIchi source code. You could choose to go with Digilent’s VGA PMOD, use a different board with built-in HDMI (such as the Arty Z7), or not use video output at all, but at this point I have the capacity to support only the default video interface. As NekoIchi evolves, we might look into supporting more video output options.