Tiny-FPGA-BX-Game-SoC Pacman


Here is portable Pacman working:

There are a few changes in video_vga.v and ili9341.v to make this work, and here is the C code:


This has made my night! :). Well done!

The refresh rate looks awesome too! Is that using a serial connection?



No, it is using @fabien’s ili9341 8-bit interface code. It could probably be made twice as fast: there are idle cycles between the writes, and I think it should be possible to get rid of those.

Are you getting anywhere with a version with more colours?


I’m just starting to put together some graphics and C code for a trial version right now… it’s pretty much as I described previously… 16-colour global palette (from 256 colours), 16x4-colour sub-palette entries, 80x32 tile map that’s infinitely scrollable, etc.
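For anyone trying to picture the two-level palette, here is a minimal C model of the lookup. It is only a sketch of the scheme as described: the struct layout and the names (palette_t, resolve_colour) are made up for illustration and are not taken from the actual Verilog or firmware.

```c
#include <stdint.h>

/* Hypothetical model of the two-level palette; all names are illustrative. */
typedef struct {
    uint8_t global_palette[16];  /* each entry picks 1 of 256 possible colours */
    uint8_t sub_palette[16][4];  /* each entry is a 4-bit index into global_palette */
} palette_t;

/* Resolve a 2-bit pixel value through sub-palette `sub` to an 8-bit colour. */
static uint8_t resolve_colour(const palette_t *p, unsigned sub, unsigned pixel)
{
    /* Mask the inputs so out-of-range values cannot index outside the tables. */
    uint8_t global_index = p->sub_palette[sub & 15u][pixel & 3u] & 15u;
    return p->global_palette[global_index];
}
```

So a tile or sprite only ever stores 2 bits per pixel plus a 4-bit sub-palette number, and the two small tables turn that into one of the 256 colours.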

It’s tight with the verilog though. Really tight.

In order to get things to route in a semi-reasonable amount of time (or perhaps at all) I’ve disabled the gpio and i2c code. I suspect we might even get to a point where we need to group pins in the constraint file according to usage and locate them proximally on the floorplan so that we’re not routing signals further than we need to etc.

I’m currently keeping my fingers crossed that this is going to route so I can move on to debugging all of the bugs I’ve just added… :grimacing:

After packing:
IOs          22 / 63
GBs          0 / 8
  GB_IOs     0 / 8
LCs          7168 / 7680
  DFF        1977
  CARRY      1135
  DFF PASS   1381
BRAMs        32 / 32
WARMBOOTs    0 / 1
PLLs         0 / 1

  promoted CLK$2, 2073 / 2073
  promoted $abc$70264$n5, 966 / 966
  promoted $abc$70264$n4754, 66 / 66
  promoted $abc$70264$n4740, 62 / 62
  promoted $abc$70264$n4875, 56 / 56
  promoted $abc$70264$n4742, 52 / 53
  promoted $abc$70264$n3, 31 / 31
  promoted $abc$70264$n7, 20 / 20
  promoted 8 nets
    3 sr/we
    4 cen/wclke
    1 clk
  8 globals
    3 sr/we
    4 cen/wclke
    1 clk
  realized 0, 1
  initial wire length = 142257
  at iteration #50: temp = 14.3114, wire length = 130111
  at iteration #100: temp = 7.34662, wire length = 98011
  at iteration #150: temp = 3.58276, wire length = 68617
  at iteration #200: temp = 1.15914, wire length = 51073
  at iteration #250: temp = 0.000503927, wire length = 44604
  final wire length = 44268

After placement:
PIOs       24 / 63
PLBs       959 / 960
BRAMs      32 / 32

  place time 74.86s
  pass 1, 1820 shared.
  pass 2, 1608 shared.
  pass 3, 2003 shared.
  pass 4, 2281 shared.
  pass 5, 2359 shared.
  pass 6, 2482 shared.
  pass 7, 2659 shared.
  pass 8, 2748 shared.
  pass 9, 2752 shared.
  pass 10, 2674 shared.
  pass 11, 2524 shared.
  pass 12, 2368 shared.
  pass 13, 2270 shared.
  pass 14, 2143 shared.
  pass 15, 1977 shared.
  pass 16, 1865 shared.
  pass 17, 1762 shared.
  pass 18, 1732 shared.
  ... <yawn .. looks at watch..  gawd, it's almost midnight!> ...

… it’s taking ages though.

I know that some people are keen on simulators for building/developing more complex verilog. I wonder whether it’d even be possible to simulate something like this with the peripherals etc? Would it even help with the development cycle? I’m not so sure.

Anyway, I’ve got the mother-in-law over this weekend so I’m not sure how much I’ll get done.


… it’s still routing … not looking good… :confused:

  pass 331, 121 shared.
  pass 348, 145 shared.
  pass 349, 117 shared.
  pass 350, 133 shared.
  pass 351, 132 shared.
  pass 352, 111 shared.
  pass 353, 100 shared.
  pass 354, 96 shared.
  pass 355, 101 shared.
  pass 356, 119 shared.

ETA… boo :frowning:

fatal error: failed to route
../../hdl/tiny_soc.mk:8: recipe for target 'hardware.asc' failed
make: *** [hardware.asc] Error 1

I’ll push it back up the top of the hill and let it roll down again to see what happens.


There is a demo of the start of a Mario platformer as well. I don’t plan to do any more on this until @gundy manages to get me more colours, better scrolling etc:


And a Gameboy (or a Gateboy) needs Tetris:


“I don’t plan to do any more on this until @gundy manages to get me more colours, better scrolling etc”

No pressure then, aye :smiley:

I burnt about 3 days figuring out that disabling DUAL_PORT_REGS actually breaks a lot of stuff… it seemed like it might be a good way to free up a BRAM or two and a few gates… but it actually causes the CPU to freeze up for some currently inexplicable reason… oh well… it probably wouldn’t have been such an issue if I hadn’t forgotten about the fact that I’d done it. LED debugging really sucks btw.

Anyway, I’ve just managed to verify that the 2-bit-per-pixel textures (256 of them) and the palettes seem to be working as expected:

Most peripherals are currently disabled. Also there are no sprites as yet. The sprites seemed to be what was taking the lion’s share of the gates, and causing me the most routing hassle, so I’m going to have a cuppa and do some pondering about ways to make them more efficient.

I’m still using VGA output (through a digilent PMOD connector) as up until now it’s been easiest for me:

… but I guess I should probably think about getting the LCD to work here now too.

Oh, another thing that I looked at was plotting the floorplan of the FPGA and mapping where the TinyFPGA BX pins fall in relation to each other. For the most part it seems like Luke has done a pretty good job of keeping the left-hand pins on the BX board to the left-hand side of the floorplan, and vice-versa, and pins that are close together on the BX pinout are also reasonably close together on the floorplan. This seems like good knowledge to have when trying to minimise wire spans.


I do have multi-colour sprites working:

But I’m really not happy with how they work, and I’m having trouble justifying the gate complexity of the current implementation. It’s affecting the place-and-route, and it’s affecting our ability to get timing closure at 16MHz (I’m currently struggling to get much above 14MHz).

At the moment I’m instantiating a separate module/set of gates for each sprite (x8), and those modules are figuring out whether the raster is currently in the sprite’s bounding box or not, and if so, what address in sprite memory should be used. These addresses drive the lookup in sprite RAM, which is laid out as 2 bits per address (i.e. each read only pulls back the 2 bits for the current pixel location).

I think I’ll be better off changing things a little:

  • changing the sprite RAM to a 32-bit memory (32-bits is enough for a full 16-pixel line of a sprite at 2bpp)
  • moving to a state-machine where, in the horizontal refresh, I use a clock cycle for each sprite to check if the vertical position matches the sprite, and if so, load a full 32-bits worth of sprite data (enough for the line) from sprite RAM into a temporary storage shift register.
  • then render the sprites directly from these registers as the line is being rastered.
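The steps above can be modelled in C roughly as follows. This is a sketch under assumptions, not the real state machine: it assumes sprites are 16 lines tall as well as 16 pixels wide, that column 0 sits in the lowest two bits of the word, and that an all-zero word means "nothing to draw". All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPRITE_SIZE 16u  /* assumed: sprites are 16x16 pixels */

/* True when raster line `y` crosses a sprite whose top edge is at `sprite_y`. */
static bool sprite_on_line(unsigned sprite_y, unsigned y)
{
    return y >= sprite_y && y - sprite_y < SPRITE_SIZE;
}

/* One hsync step for one sprite: return the 32-bit word to latch into its
 * line register, or 0 when the sprite does not touch this line. */
static uint32_t fetch_sprite_line(const uint32_t sprite_ram[SPRITE_SIZE],
                                  unsigned sprite_y, unsigned y)
{
    return sprite_on_line(sprite_y, y) ? sprite_ram[y - sprite_y] : 0u;
}

/* Pull pixel `col` (0..15) out of a latched line.
 * Assumed packing: column 0 in bits 1:0, column 15 in bits 31:30. */
static unsigned sprite_pixel(uint32_t line, unsigned col)
{
    return (line >> (2u * (col & 15u))) & 3u;
}
```

The point of the change is that the per-sprite comparison and the RAM read happen once per line during the blanking period, and the raster only has to shift bits out of a register.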

It’s going to be quite a big refactor though, so may take a while. Hopefully it’ll be worth it.



Good to see you neglected your mother-in-law and continued working on the 2bpp sprites :grinning:

I am looking forward to seeing it all come together. I am sure you will get there (despite the pressure).

One thing I have been contemplating is how to have a system with multiple games on it.

I was wondering about putting a menu system in front of the games, which selects a game and reads it from an SD card and then writes the images to flash memory and transfers control to them.

I have most of the SD card stuff working already on BlackSoC and the rest should be similar to what Luke’s boot loader does. If we kept the graphics simple, things should fit. We could use the SD card reader on the back of the LCD screen.

The flash memory already has Luke’s bootloader, a hardware.bin and firmware.bin. This would require it to have another copy of a hardware.bin and a firmware.bin.

Is there any reason why this could not work?


Haha… Perhaps I’m not up for the “best son-in-law” award this quarter, but she left happy so all’s well :slight_smile:

Re: the SD card… here’s a random question… SD cards use SPI for comms, right?

Could we treat it as just another flash ROM? If we were prepared to forgo any kind of filesystem (using a raw block based layout instead), it seems that we could perhaps use a lightly modified spimemio module to swap in different blocks from the SD card into address space, at, say, 0x00070000 onwards…

That way we could have the “boot” firmware at 0x50000, which displays a menu. If a game exists then it can be swapped in to address space at 0x70000 and we can just jump directly to it; if not, display a “sorry, no SD card available” prompt?

It’d be easy enough to modify the game code to work from 0x70000. It should be easy enough to create some hooks for adding a fixed offset / bank # to reads from the SD card flash too, so that the memory at location 0x70000 in the SoC can be mapped to, say, 0x00000000, 0x01000000, 0x02000000 on the SD card (for games 1,2,3). At 16MB spacing between images, you’d be able to store 256 game images even on a small 4GB SD card.
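To make the bank arithmetic concrete, here is a small C sketch of the address translation. The constants come from the numbers above; the function names are invented for the example, and the real hook would live in the (modified) spimemio Verilog rather than in C.

```c
#include <stdint.h>

#define GAME_WINDOW_BASE 0x00070000u  /* SoC address where the game is mapped */
#define SD_BANK_SIZE     0x01000000u  /* 16MB spacing between game images */

/* Translate a SoC address inside the game window to a byte offset on the card.
 * Assumes soc_addr >= GAME_WINDOW_BASE and an offset smaller than one bank. */
static uint32_t sd_address(uint32_t soc_addr, uint32_t bank)
{
    return bank * SD_BANK_SIZE + (soc_addr - GAME_WINDOW_BASE);
}

/* Number of whole banks that fit on a card of `card_bytes` bytes. */
static uint64_t sd_bank_count(uint64_t card_bytes)
{
    return card_bytes / SD_BANK_SIZE;
}
```

The 256 figure works out if "4GB" is read as 2^32 bytes; a card with exactly 4,000,000,000 usable bytes would hold slightly fewer 16MB banks.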

If you were to run the games directly from SD, there’d be no need to “reflash” the FPGA. Games could exit back to the main screen just by jumping back to 0x50000.

I guess potentially we could even make the SD card address space block-writable so you can save games etc.

Of course, the problem would be we’d need a (hopefully small) tool to manage the games on a card. It could essentially just be a wrapper around something like ‘dd’, and maybe some sort of metadata for use with the menu system.



Who is going to write the 256 games? We need a bigger development team.


Yes, SD cards use SPI, and I suspect what you are proposing would work, although I have not looked at the detail of spimemio.

However, there are some advantages of writing to the SPI flash. It would allow us to reconfigure the Ice40 with different versions of hardware.bin for different games, so, for example, we could support different graphics modes for different games like the SNES. Also, as you say, it would support easier SD card setup. I was considering using FAT32 with all the games in the root directory, but if there are lots of games, subdirectories corresponding to submenus would be possible.


“However, there are some advantages of writing to the SPI flash. It would allow us to reconfigure the Ice40 with different versions of hardware.bin for different games, so, for example, we could support different graphics modes for different games like the SNES.”

Good point, well made. The ability to customise the hardware for the game is a pretty unique selling point here… although it’s fun trying to build a CPU and peripherals that can “do it all”, it’s kind of clear to me now that there are limits to what can be made to fit, and if we can specialise the hardware for the application then it’s a big win :slight_smile: An example might be that you could use the device as a music creator/tracker, but in that case you’d probably want to pare back the video support to something completely minimal and use the extra gates/RAM for things like filters and wavetable synthesis :slight_smile:

This sounds like a great idea.


So… a bit of a progress update…

I reckon I’ve now figured out every possible weird and wonderful way that you can make sprites not work… it’s been a bit of an interesting exercise in figuring out how terrible I am at figuring logic out :slight_smile: but I finally got around to getting something up and running …

Current status (with VGA):

✓ sprites with 3-colour (+ transparency)
✓ 256 tiles with 4-colours (each screen location has 8-bits for tile index, and 4-bits index into sub-palette)
✓ palettes and sub-palettes (there’s a 16-colour global palette with each of the 16 colours chosen from 256 possible, and 16 4-colour palettes to be used for tiles & sprites, where each 4-colour swatch is chosen from the global palette)
✓ sprites now support transparency (so you can see through one sprite to the other sprites & background below)
✓ sprites support being flipped along both x & y axis
✓ sprites can be moved smoothly off-screen (sprite X position 0 is fully off the left of the screen, 16 is fully visible).
✓ 64x32 infinitely scrollable tiled background
✓ background scrolling locked to vsync
✓ small windowed (non-scrolling) area of up to 4 lines, for things like the score
✕ Audio
✕ I2C
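For the sprite positioning and flipping items in the list above, this is roughly how the X register maps to screen columns, modelled in C. It assumes 16-pixel-wide sprites and only shows the X axis; the names are hypothetical and the hardware does this with counters and shift registers rather than function calls.

```c
#include <stdbool.h>

#define SPRITE_W 16  /* sprite width in pixels */

/* Screen X of the sprite's left edge: register value 0 is fully off the
 * left of the screen, register value 16 puts the left edge at column 0. */
static int sprite_screen_x(int sprite_x_reg)
{
    return sprite_x_reg - SPRITE_W;
}

/* Which sprite column (0..15) to show at screen column `x`,
 * or -1 if the sprite does not cover that column. */
static int sprite_column(int sprite_x_reg, int x, bool flip_x)
{
    int col = x - sprite_screen_x(sprite_x_reg);
    if (col < 0 || col >= SPRITE_W)
        return -1;
    return flip_x ? (SPRITE_W - 1 - col) : col;
}
```

Offsetting the register by the sprite width is what lets a sprite slide in from the left edge without needing negative coordinates.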

Next on my list is porting it all across to the LCD driver (I’ve got an ILI9341 here that I can put in parallel mode for testing), and then I have to see if I can fit any of the audio stuff back in :grimacing:

I keep telling myself to “keep it simple”, and then I keep thinking “screw it - I’ve saved a few gates; what else can I add?” :wink:



A suggestion / thought… What if the sprites actually just use the tile memory instead of their own dedicated memory? Could be a good saving to be had there? Sprite definition would then be just a list of x tiles instead…?


That’s definitely plausible - even more so now that I’m reading sprite data in during hsync rather than as the line is being rastered. Previously, by keeping the sprite and tile RAM separate, I was able to have data streaming in to the rasterer from both sprite and tile memory at the same time. Now the sprite data comes directly from cache registers so the separate RAM is a little less of a concern.

I understand the combined sprite/tile map is how the NES did things, and the talk that @Fabien posted previously about how NES games work covered some of that quite well. It was really cool to see some of the work that went into packing as much as possible into the small space available!

On the other hand, the NES had limitations that we don’t. We’re not actually very data constrained - we’ve got multiple hundreds of kilobytes of flash that we can use for storing sprites (and textures), and swapping out sprites as required with new ones from ROM isn’t particularly difficult.

Combining the sprite and tile RAM also has the potential to complicate the sprite lookup logic - essentially adding another layer of indirection (which itself needs to be stored somewhere). I’m not against the idea, but I’d want to make sure that the extra complexity in terms of gates etc was going to be worth it first :slight_smile: If we ever get to the point where we need a spare block of RAM for something important then it’ll definitely be worth looking into :slight_smile:



I am starting to look at what is needed for an SD card menu. As a starting point I got the SPI OLED code working as it uses the same SPI master code that I plan to use. It was nearly working but the mechanism that holds off setting iomem_ready until the SPI transfer is complete was not quite working as we didn’t need it for anything else. The SPI operations need to be synchronous, stalling the processor until they complete, or the driving code would need to change a lot. Anyway that is now working in my fork.

So I plan to now get the SD master code working the same way and use the C code from the icotools project to drive the SD card. I have code that reads a FAT32 file system.

The issue that I have now is that we are running out of easily accessible pins in the LCD version. There are 24 easily accessible pins. We use 2 for the uart, 2 for audio, 14 for the LCD and at least 6 for buttons, which already adds up to 24. 4 pins are needed for the SD card, and I would like to use the one at the back of the LCD display.

As @Fabien points out, one of the LCD pins (read edge) is not needed. We could also, at a pinch, drop the LCD backlight pin by just connecting the backlight to the 3.3V pin. We could stop using the uart RX pin, as there is not much likely use for that. We could connect the buttons in a grid rather than having a pin for each button, but I think we need 8 buttons, and even as a grid those need at least 6 pins. So that is 24 + 4 for the SD card - 3 saved = 25 pins against 24 available, and we are still at least one pin short.
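A quick way to sanity-check the pin counting is a few lines of C. This is just arithmetic on the numbers in this post; matrix_pins and pins_needed are made-up helper names, and the matrix figure ignores practical details (a plain grid without diodes can misread some combinations of three or more simultaneous presses).

```c
/* Fewest pins (rows + cols) for a button grid with rows * cols >= buttons,
 * falling back to one pin per button when a grid does not help. */
static unsigned matrix_pins(unsigned buttons)
{
    unsigned best = buttons;
    for (unsigned rows = 1; rows <= buttons; rows++) {
        unsigned cols = (buttons + rows - 1) / rows;  /* ceiling division */
        if (rows + cols < best)
            best = rows + cols;
    }
    return best;
}

/* Total pins for one configuration of the peripherals discussed above. */
static unsigned pins_needed(unsigned uart, unsigned audio, unsigned lcd,
                            unsigned buttons, unsigned sd)
{
    return uart + audio + lcd + buttons + sd;
}
```

With the uart cut to TX only and the LCD down to 12 pins, pins_needed(1, 2, 12, matrix_pins(8), 4) still comes to 25, one more than the 24 easy pins.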

We could use the pins on the back of the board but that either means soldering wires to them or using pogo pins on the PCB that Fabien is making.

Any suggestions on the best solution to this?


You also don’t need the LCD chip-select line - it’s always set to 0

   assign ncs = 0;


So I’ve got my LCD basically working here now too…

It took quite a lot of head-scratching to get there…

I bet you guys thought you could trick me with your fancy flipping of the data-bus:

.dout({lcd_D0, lcd_D1, lcd_D2, lcd_D3, lcd_D4, lcd_D5, lcd_D6, lcd_D7});

… and other such wizardry :wink:

On the upside, after the debugging that was required to figure that out, I now know far more about the ILI9341 than I ever wanted to :).
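In case it saves someone else the head-scratching: in a Verilog concatenation the leftmost item is the most significant bit, so with the connection above bit 7 of dout ends up on lcd_D0 and bit 0 on lcd_D7. When comparing bytes seen on the pins against datasheet command values, a little bit-reversal helper is handy. This C version is only an illustration (reverse8 is my own name for it, nothing in the repo):

```c
#include <stdint.h>

/* Reverse the bit order of one byte: bit 7 <-> bit 0, bit 6 <-> bit 1, ... */
static uint8_t reverse8(uint8_t v)
{
    v = (uint8_t)(((v & 0xF0u) >> 4) | ((v & 0x0Fu) << 4));  /* swap nibbles */
    v = (uint8_t)(((v & 0xCCu) >> 2) | ((v & 0x33u) << 2));  /* swap bit pairs */
    v = (uint8_t)(((v & 0xAAu) >> 1) | ((v & 0x55u) << 1));  /* swap neighbours */
    return v;
}
```

Applying it twice gives back the original byte, so the same helper works in both directions.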

I’ve made a few changes along the way. For example, I’ve cut back the init sequence in my version to pretty much what the Adafruit library does… which is approx 1/4 the length of the original sequence. The code that Fabien had seemed to send a number of commands that, as far as I could tell, weren’t even valid ILI9341 commands, and did some stuff that was IMO unnecessary (eg. setting gamma curves).

Some of the timings were off too - mainly because the “delay counter” register that was used wasn’t wide enough to accommodate some of the delays (eg. a 120ms delay at 16MHz is 1,920,000 cycles, which requires at least a 21-bit counter, but the counter in use was only 20 bits - I extended it a bit further to support the 500ms delay that the Adafruit library uses).
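The counter widths are easy to check with a couple of lines of C. This is only the arithmetic, with helper names I have made up; it is not code from the LCD driver.

```c
#include <stdint.h>

/* Clock cycles in `ms` milliseconds at `clk_hz`. */
static uint64_t delay_cycles(uint64_t clk_hz, uint64_t ms)
{
    return clk_hz * ms / 1000u;
}

/* Bits needed for a counter that must be able to hold `max_count`. */
static unsigned counter_bits(uint64_t max_count)
{
    unsigned bits = 0;
    while (max_count != 0) {
        bits++;
        max_count >>= 1;
    }
    return bits;
}
```

At 16MHz, 120ms works out to 21 bits and 500ms (8,000,000 cycles) to 23 bits, whereas a 20-bit counter tops out at 1,048,575 cycles, or about 65ms.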

I’m also using nextpnr for the place-and-route side of things now. After trialling it for a while, I’ve been pretty happy with the results.

It’s time to go and get some vitamin D. Good luck with the SD card support!!! :slight_smile: