USB Communication



So for what it’s worth, I seem to have got a version of the USB code to be more stable under P&R. Not saying it’s perfect, but definitely more stable (I have yet to make a bad run of it, but that doesn’t mean it’s perfect). Here’s an update:

I’m not entirely pleased with it, however. What I did was set about trying to get rid of latches in the synthesized logic. I haven’t removed them all, but I did end up specializing the arbitration logic and this both improved the frequency estimation from nextpnr and led to stable bitstreams from arachne.

Note that nextpnr is not yet generating a functional bitstream. Also note that there are /still/ latches being generated around the arbiters, however this is due to how they are being used (I believe). If I have some time, or if the instability re-appears, I’ll dig into it more and try and get rid of more latches in the design.



Another update on this. I updated the usb1.tar with my latest, where I went through and changed everything so verilator lint would not complain anymore. /Unfortunately/ I spoke too soon: it continues to be unreliable, and the result depends on which random seed is provided to the P&R tool. I’m going to keep hacking on it. Yosys is still reporting some latches being generated around the arbiter, and nextpnr reports a clock rate (27 MHz) below the called-for frequency. So there are plenty of places still to look for removing the instability…
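For anyone who wants to see how seed-dependent their own build is, here is a rough Python sketch that runs arachne-pnr once per seed and records which runs exit cleanly. The file names (top.blif, pins.pcf) and the 8k/cm81 device and package options are assumptions for a TinyFPGA BX style project, so adjust them to match your Makefile. The helper names are made up for illustration.

```python
import subprocess


def arachne_cmd(seed, blif="top.blif", pcf="pins.pcf", prefix="top"):
    """Build one arachne-pnr command line for the given seed.

    File names and the 8k/cm81 target are assumptions, not taken from
    the thread; only the -s seed flag is mentioned there.
    """
    return [
        "arachne-pnr",
        "-d", "8k",          # device (assumed: iCE40 8k)
        "-P", "cm81",        # package (assumed)
        "-s", str(seed),     # the random seed under test
        "-p", pcf,           # pin constraints
        "-o", f"{prefix}_seed{seed}.asc",
        blif,
    ]


def sweep_seeds(seeds, run=subprocess.run):
    """Run P&R once per seed; map seed -> True if the tool exited with 0."""
    results = {}
    for seed in seeds:
        proc = run(arachne_cmd(seed), capture_output=True, text=True)
        results[seed] = proc.returncode == 0
    return results
```

Calling sweep_seeds(range(1, 11)) leaves one .asc file per seed. A clean exit only means place and route finished; each bitstream still has to be loaded onto the board to find out whether the USB device actually shows up.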

Thanks for the link to the other USB core written in Migen. I’m deep enough down the rabbit hole of this one that I’m going to keep plunking away at it removing the various bobbles tools report… at least for now.


Very good. I will wait a little while to see if you make more progress before I try to use it. I might have to learn Migen to try the new implementation sometime but it would be good to see this Verilog one improved.


I’ve been plodding along with this and I’ve got a bit further with it.

Re: latches, etc., there’s a Pull Request on the Bootloader repo that seems to straighten out a lot of the yosys issues -

Re: Arachne, etc. I was surprised to learn that layout is not influenced by timing concerns. When I heard that I switched immediately to NextPNR (which lays out in such a way as to minimize time delays!) and have had much better results.

I have given the UART a pipeline-style frontend and am just doing some more testing on it. All looks very good, although inherent in the USB design seems to be a limit to a maximum of 32 (if memory serves) bytes per transfer. This was good for bootloading but may cause problems in other areas. Has anyone else run into this? Is there a simple fix? My lack of USB internal skills is embarrassing!


Have you done any more on this? I tried your usb directory from the usb1.tar file and it worked for my Hello World example but did not seem to receive data in the uart and tones examples.


Do you have your code anywhere that I could try it?

I also see on Twitter that Trammell Hudson has been doing a lot of work on the bootloader code to make it work on the Fomu board.

USB ACM FIFO and RISCV core demo

I’ve put my work into a repo here -

Since changing to NextPNR and making a few other tweaks, I have built several miniprojects with this code and have not seen the dreaded “module doesn’t boot” or “module doesn’t create a USB device” problems that the arachne pnr tool presented. Fingers crossed.

As you can read in the repo’s readme, there is one major bug remaining (that I can see) and one major issue.

The bug is that the USB side of the implementation doesn’t like transfers of more than 32 bytes. Try to send longer messages and the interface locks up (although it does repair itself for the next call). The original implementation had no need for longer transfers, and I’m guessing this may be a rather simple fix. However, I’m very inexperienced with USB internals, so I wasn’t able to immediately see what was wrong.
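Until the bug is fixed, one possible host-side workaround is to split long messages into pieces of at most 32 bytes before writing them. This is only a sketch: the function names are hypothetical, and the write argument stands for whatever callable sends bytes to the port (for example the write method of an open serial port object). Whether each small write really becomes its own USB transfer depends on the OS driver, so a flush or a short pause between pieces may also be needed.

```python
def chunks(data: bytes, size: int = 32):
    """Yield successive pieces of data, each at most size bytes long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(data), size):
        yield data[start:start + size]


def send_chunked(write, data: bytes, size: int = 32) -> int:
    """Send data through write() one piece at a time.

    write is any callable taking a bytes object (an assumption about
    the host-side API). Returns the total number of bytes handed over.
    """
    sent = 0
    for piece in chunks(data, size):
        write(piece)
        sent += len(piece)
    return sent
```

With this, an 80-byte message goes out as pieces of 32, 32 and 16 bytes. It works around the limit rather than fixing it in the core.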

The major issue is that for my needs I replaced the original front end with a streaming pipeline front end for maximum transfer speed (one transfer per clock). This may not suit all users because implementing these streaming pipelines is a head-exploding nightmare. Feel free to create an alternative. I realize that a better approach would have been to just work with the existing repo, keeping the existing interface then switching to a pipeline, but at the time I really thought it wasn’t going to work, and I wanted to get some pipeline practice so we end up here.

Finally, I am far from an expert Verilogger (or anything else FPGA), so please, any comments or criticisms will be happily received.


Hi Lawrie-

I don’t have any useful update yet. I did a lot of hacking on it to get it to pass timing at 48 MHz with nextpnr. But in that process I also broke something along the way, so that it now works on Mac OS X but not on Windows. I did observe that the inconsistencies in P&R largely, but not entirely, went away the closer I got to meeting timing.

I’m going to pick up David’s latest patch he just posted and see how that works for me. If it’s stable I might just run with that for my class and abandon my efforts. I’m teaching a hardware design class with TinyFPGA BX, so if it’s stable, we’ll know because about 15-20 design teams will be using it… will let you know…



Hi David-

Thank you so much for the code. I put it to work here on my stuff for my class and so far it’s awesome. It’s smaller and I have yet to get a bad P&R result. No more -s flag and hope randomness (so far). I’m having all the students update their projects to it. We’ll see how it does on a wider sample here shortly.

Lawrie: I’m going to abandon my code and stick with this one for now. If I get a spare moment I might make sure it passes verilator lint, but that’s not going to change any of the functionality / timing of the design.

Thanks all for your help!



You are welcome! Hopefully one of your smart students can fix the 32 byte limit bug!


Hey Mark… If you or your students come up with any interesting examples, perhaps you would consider adding them to the repo? I’m (possibly obviously) paranoid that the pipeline interface is not convenient to use. Let me know!


Unfortunately, the class is a basic build your 5 stage pipelined CPU. The USB UART is there to just pump bytes back to their machines for debugging. I wrapped the interface up for them and push bytes back to some python code running on their laptops that gives them a cycle by cycle display of the action inside their CPU.

So… probably not much in the way of cool examples will come out of it :( I did observe the 32 byte bug you mentioned when I was initially integrating it. But because I’m using python to grab bytes off the USB port and print them to the screen, I need to slow down the rate of transmission significantly anyway. The debugger I gave the students sends only a couple hundred bytes a second back to them for now. (When they get all done I may up the clock rate on their designs…)

The interface itself looks great to me. I have some other research projects going on that we may use it in. If anything cool comes out of that in the next year I’ll let you know.

Thanks again!


For anyone using the pipelined USB UART, I fixed a few nasty bugs, including the previous 32 character packet size limit. So now it seems to work pretty well even on long contiguous messages. There was a back pressure bug on the input that got fixed too.

It still has the pipeline interface… (-:


Does the work being done in this topic allow the TinyFPGA BX to have the same USB serial capability that I would get if I hooked up an FT232R to it? Usually I use FPGA boards as general purpose GPIO -> USB instruments, and I can get a 3 Mbaud COM port with an FT232R on the board. Does the …usbserial repo provide the same or similar functionality?


The intention is to gain access to the USB port in regular designs for GPIO and other needs. It seemed so sad to not be able to use the USB port in this way…

The resulting port is only a USB 1.0 Full Speed port, if I understand correctly, so 1.5MB/s is the theoretical limit. I have yet to push it really hard, but the bootloader (upon which the serial code is based) is pretty quick and reports 145kB/s - and it has other stuff to do beyond just transferring.

One other thing to know is that it has been working in other weird clock setups. For example, one of my examples seems to run well when I set the (solitary) PLL to 96 or 192 MHz and just divide the clock down to 48 MHz. This gives some flexibility to do data acquisition at the higher frequency, as long as you’re quick.
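As a quick check on which PLL settings divide cleanly to the 48 MHz the USB logic needs, here is a small Python sketch. It assumes the division is done with a chain of divide-by-two (toggle flip-flop) stages, which is only one way to do it and not necessarily what the example design uses. The function name is made up.

```python
def divider_stages(pll_mhz: int, target_mhz: int = 48) -> int:
    """Number of divide-by-two stages needed to get from pll_mhz to target_mhz.

    Raises ValueError when the ratio is not a whole power of two, since
    a plain chain of toggle flip-flops cannot produce it.
    """
    ratio, remainder = divmod(pll_mhz, target_mhz)
    if remainder or ratio < 1 or ratio & (ratio - 1):
        raise ValueError(
            f"{pll_mhz} MHz is not a power-of-two multiple of {target_mhz} MHz"
        )
    return ratio.bit_length() - 1


for mhz in (96, 192):
    print(mhz, "MHz ->", divider_stages(mhz), "divide-by-two stage(s)")
```

96 MHz needs one stage and 192 MHz needs two. Something like 144 MHz would need a divide-by-three counter instead, and the function rejects it.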

@lawrie.griffiths, the original adapter of the bootloader to the USB SERIAL context may have other thoughts too.


Full speed is actually 12 Mbit/s. That’s the speed the USB bootloader operates at. The bootloader’s throughput is actually limited by the write speed of the SPI flash itself, so using the USB core without the SPI interface should give good performance.


I have a board design with a parallel ADC interface that pushes data out at a rate of 70 Mbit/s. I can divorce this board from its DSP companion to capture raw data in alternative ways. I like the idea of trying the TinyFPGA BX to do this because it is small and lightweight. Is the BX hardware capable of USB 2.0 ‘high speed’, or of some rate between ‘full speed’ and ‘high speed’? If it is capable of higher speed, would a solution be possible with a python interface? I understand that python isn’t optimal for this type of application, but I would only need 10~100 ms blocks of data at a time.


I think 12Mbps (1.5MBps) is likely to be the upper limit for the USB system on the board as it is.

If you can do some processing on the board that results in less data you might be OK. Is there some filtering needed? Could you do a 1bit-style adaptation?

You could strap on one of the USB 2.0 FTDI modules in FIFO mode (480Mbps) but then, maybe you’d be better off connecting the FTDI and the ADC directly.
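To put rough numbers on the 70 Mbit/s ADC versus the 12 Mbit/s Full Speed link, here is a back-of-the-envelope Python sketch. It uses the raw line rate only and ignores USB protocol overhead, so real transfers will take longer. The function names are just for illustration.

```python
FULL_SPEED_BPS = 12_000_000   # USB Full Speed raw line rate, bits per second
ADC_BPS = 70_000_000          # ADC output rate quoted in the thread


def block_bytes(duration_ms: int, rate_bps: int = ADC_BPS) -> int:
    """Bytes produced by the ADC during a capture of duration_ms."""
    return rate_bps * duration_ms // 8000


def min_transfer_s(n_bytes: int, link_bps: int = FULL_SPEED_BPS) -> float:
    """Lower bound on transfer time at the raw line rate (no overhead)."""
    return n_bytes * 8 / link_bps


for ms in (10, 100):
    size = block_bytes(ms)
    print(f"{ms} ms capture: {size} bytes, at least {min_transfer_s(size):.3f} s to send")

# Fraction of the raw limit reached by the bootloader's reported 145 kB/s
print(f"bootloader utilisation: {145_000 / (FULL_SPEED_BPS / 8):.1%}")
```

A 100 ms capture is 875,000 bytes and needs at least about 0.58 s on the wire, so the data would have to be buffered somewhere on the board first and then drained at almost six times slower than it arrived.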


It’s a 40-bit parallel interface from three ADCs, all synchronized to one ADC clock and then framed into 4k size packets that are synchronized to a framing clock. Which brings us back to hooking the ADC interface to an FPGA, then hooking the FPGA to a USB PHY.


I am wondering how hard it would be to interface to the PC as a low-speed USB HID. Basically I am wondering if I can build a “smart keyboard” (when plugged in it is a keyboard, otherwise a mobile computer) with the TinyFPGA at its core.

It seems to be doable as there are AVRs that do it completely in software, but trying to read the USB HID docs is quite challenging at first sight for a beginner.

I figure that it is not that hard to implement PS/2, even as a beginner, and maybe use a cheap 1 EUR converter, but if there is already a USB plug why not use that? It would take more time of course, and maybe for no gain, as even the bootloader would need to be rewritten to use some switch to indicate whether it is in flashing mode or keyboard mode…