tinysys part five: video scan-out from SDRAM

Recap of arbitration

As you will recall from part four, we have an arbiter that sits between the SDRAM and the devices that require access to it. It serves read and write requests separately, and gives each device a somewhat fair share of the access.

We can visualize that mechanism as a diagram like so:

Thanks to the arbiter, the two CPUs (their data and instruction caches), the DMA, and the video processing unit are all talking to the memory, ideally without blocking each other from doing important work.

The green block marked ‘scan-out’ is what we’ll focus on in this part. We’ll see what we can do to make sure it wraps up quickly without interrupting CPU or DMA access too much.

Scan-out hardware

The scan-out hardware is the unit that’s responsible for converting the values in video memory to a coherent image on screen.

In other words, this unit continuously reads from memory, just in time to catch each scanline as it is drawn, so it can provide a stable image.

The scan-out is done using a regular 640×480 video signal with 12bpp color data, which is fed to an external video chip. This chip is a Lattice SII164CTG64, which provides a stable DVI 1.0 compatible output signal that is then fed to a compatible LCD panel over an HDMI connector. Note that the signal itself is DVI rather than full HDMI, so there’s no audio channel.

To avoid over-taxing the arbiter, the VPU employs an intermediate cache, which I call the ‘scanline cache’. Provided the ‘video output enable’ register allows video output, it works as follows:

1) The VPU scan-out logic waits until the vertical cursor (Y coordinate) reaches scanline 524
2) Once we've reached scanline 524 and are at pixel 640 (X coordinate), we kick off work to fill the scanline cache
3) The VPU then loops until it has done 20, 40 or 80 burst reads to fill the cache, depending on video width and color mode
4) The VPU then loops around and refills the same cache at each new end-of-row
5) Once the scanline reaches 479, we stop and go back to step one

Scanline 524 is actually the last scanline of the 640×480 video mode, which has 525 lines in total once blanking is included. Since we wish to be one scanline early and catch the first displayable scanline, this works to our advantage.
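The steps above can be sketched as a small counter-driven state machine. This is only an illustration of the idea, not the code in vpu.sv: every module, port and signal name below is made up for the example, and it assumes a single clock domain plus a `burst_done` pulse from the arbiter side for each completed burst read.

```systemverilog
// Hypothetical sketch; none of these names are taken from vpu.sv.
module scanout_fill_sketch (
	input  logic       clk,
	input  logic       rst,
	input  logic       scanenable,  // 'video output enable' register bit
	input  logic [9:0] video_x,     // horizontal counter, 0..799
	input  logic [9:0] video_y,     // vertical counter, 0..524
	input  logic [6:0] burstlen,    // number of burst reads minus one
	input  logic       burst_done,  // pulses once per completed burst read
	output logic       filling      // high while the cache is being filled
);
	logic [6:0] remaining;
	logic       at_eol_q;

	// End of the visible part of the row
	wire at_eol = (video_x == 10'd640);
	// Row 524 prefetches row 0; rows 0..478 prefetch rows 1..479
	wire fill_row = (video_y == 10'd524) || (video_y < 10'd479);

	always_ff @(posedge clk) begin
		at_eol_q <= at_eol;
		if (rst) begin
			filling   <= 1'b0;
			remaining <= '0;
		end else if (!filling) begin
			// Steps 1, 2 and 4: start one fill per row, on the rising edge of at_eol
			if (scanenable && fill_row && at_eol && !at_eol_q) begin
				filling   <= 1'b1;
				remaining <= burstlen;
			end
		end else if (burst_done) begin
			// Step 3: count bursts down; the fill ends after burstlen + 1 bursts
			if (remaining == '0)
				filling <= 1'b0;
			else
				remaining <= remaining - 1'b1;
		end
	end
endmodule
```

Step 5 falls out of `fill_row`: from scanline 479 up to 523 no fill is started, so the logic simply idles until scanline 524 comes around again.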

The VPU employs a trick in 320×240 pixel modes where it displays the scanline cache twice (it’s actually still in 640×480 mode during this), so we modify the above logic to fill the cache on odd scanlines only, which saves bandwidth.
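One way to picture the odd-scanline rule is as a change to the condition that decides whether a row starts a fill. The sketch below is an assumption about how that could look, with hypothetical names (`is480`, `fill_row`), and it takes the pre-roll row to be 523 in 240-line modes and 524 in 480-line modes.

```systemverilog
// Hypothetical sketch; names are illustrative and not taken from vpu.sv.
module fill_row_sketch (
	input  logic       is480,    // 1: 480-line mode, 0: 240-line (doubled) mode
	input  logic [9:0] video_y,  // vertical counter, 0..524
	output logic       fill_row  // 1 when the end of this row should start a fill
);
	always_comb begin
		if (is480)
			// Every row: 524 prefetches row 0, rows 0..478 prefetch rows 1..479
			fill_row = (video_y == 10'd524) || (video_y < 10'd479);
		else
			// Odd rows only: 523, then 1, 3, ... 477; each fill is shown on two rows
			fill_row = video_y[0] && ((video_y == 10'd523) || (video_y < 10'd479));
	end
endmodule
```

In the 240-line case this starts 240 fills per frame instead of 480, which is where the bandwidth saving comes from.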

Once we have arbitration rights, a 20, 40 or 80 burst read from SDRAM is quite fast: we’ll be done within 5 to 10 video pixels’ worth of clock time, after which the scanline cache can emit pixels at its own pace without further taxing the memory.
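To make "at its own pace" a bit more concrete, here is a rough sketch of a scanline cache as a small dual-clock memory: wide words are written on the memory side, and single pixels are read on the video side. This is not the tinysys implementation. The 128-bit word width, the byte order within a word, the 8bpp-only read path and all names are assumptions made for the example.

```systemverilog
// Hypothetical sketch; widths, byte order and names are assumptions.
module scanline_cache_sketch (
	input  logic         wclk,     // memory-side clock
	input  logic         we,       // write one burst word
	input  logic [6:0]   waddr,    // 0..79, one entry per burst word
	input  logic [127:0] wdata,
	input  logic         rclk,     // pixel clock
	input  logic [9:0]   pixel_x,  // 0..639, addressing 8bpp pixels
	output logic [7:0]   pixel
);
	logic [127:0] mem [0:79];
	logic [127:0] rword;
	logic [3:0]   rsel;

	// Fill side: written in a short burst at the end of a row
	always_ff @(posedge wclk)
		if (we) mem[waddr] <= wdata;

	// Display side: one registered read per pixel clock
	always_ff @(posedge rclk) begin
		rword <= mem[pixel_x[9:4]];  // 16 pixels per 128-bit word
		rsel  <= pixel_x[3:0];
	end

	// Pick the byte for this pixel out of the word that was read
	assign pixel = rword[rsel*8 +: 8];
endmodule
```

The point of the split is that the read port never touches the arbiter: after the fill, the video side only reads from this local memory.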

This sort of looks like the following in VGA timing terms:

The red region at the end of the visible scanline is where the 20, 40 or 80 burst reads take place. As you can see from this, the rest of the time we’re only servicing I$/D$ requests, which are just 4 bursts wide (that is, 512 bits) and are issued only if we miss the cache.
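The burst counts can be sanity-checked against the size of a scanline. If a 4-burst cache fill is 512 bits, one burst moves 128 bits. Assuming the scan-out bursts are the same width (an assumption, although the numbers line up), a small simulation-only check might look like this, with hypothetical names throughout:

```systemverilog
// Hypothetical simulation-only check; names are illustrative.
module burstcount_check_sketch;
	// 512 bits per 4-burst cache fill gives 128 bits per burst
	localparam int BITS_PER_BURST = 512 / 4;

	// Number of bursts needed to fetch one scanline
	function automatic int bursts_per_line(input int width, input int bpp);
		return (width * bpp) / BITS_PER_BURST;
	endfunction

	initial begin
		assert (bursts_per_line(320, 8)  == 20) else $error("320x240 8bpp");
		assert (bursts_per_line(640, 8)  == 40) else $error("640x480 8bpp");
		assert (bursts_per_line(320, 16) == 40) else $error("320x240 16bpp");
		assert (bursts_per_line(640, 16) == 80) else $error("640x480 16bpp");
		$finish;
	end
endmodule
```

So 20, 40 and 80 bursts correspond to 320, 640 and 1280 bytes, which is exactly one scanline in each of the four modes.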

The chances of an I$/D$ or DMA access coinciding with these reads are minimal, but it does still happen, so imagine that red strip as a bit of a wavy line instead of a straight one. Since each scanline is 800 pixel clocks long and only 640 of those are visible, we have 160 video pixels’ worth of time (at 25MHz) to finish our work, so this still works quite nicely without a visible delay.
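As a quick worked check of that budget, using only the figures above (25MHz pixel clock, 800 pixel clocks per scanline of which 640 are visible, and a fill that takes 5 to 10 pixel times):

```latex
\[
\begin{aligned}
t_{\text{pixel}} &= \frac{1}{25\,\text{MHz}} = 40\,\text{ns} \\
t_{\text{blank}} &= (800 - 640) \times 40\,\text{ns} = 160 \times 40\,\text{ns} = 6.4\,\mu\text{s} \\
t_{\text{fill}}  &\approx (5 \text{ to } 10) \times 40\,\text{ns} = 200 \text{ to } 400\,\text{ns}
\end{aligned}
\]
```

In other words, an uncontended fill uses roughly 3% to 6% of the horizontal blanking time, which leaves plenty of slack for the occasional stall behind an I$/D$ or DMA request.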

Preparing for scan-out

The VPU has to spin quickly over scanlines without missing a beat, so we prepare most of the data ahead of time. We need to store the last scanline, adjusted for video mode height (240 or 480), the burst read count minus one, and some other flags that will come in handy during this operation.

// We need to start reading at 523 or 524 depending on 240 or 480 scanlines
lastscanline <= mode[1] ? 10'd524 : 10'd523;

// Set up burst count to 20 / 40 / 80 depending on video mode
unique case ({mode[2], mode[1]})
	2'b00: burstlen <= 'd19;	// 320*240 8bpp
	2'b01: burstlen <= 'd39;	// 640*480 8bpp
	2'b10: burstlen <= 'd39;	// 320*240 16bpp
	2'b11: burstlen <= 'd79;	// 640*480 16bpp
endcase

Please see the vpu.sv source code for the full code of the output state machine.

What’s next?

This part only goes into some surface detail on how the video scan-out logic works. I still recommend reading the source code in the GitHub repo to make more sense of it.

Next time we’ll go back to visit the fetch unit and go into a bit more detail.