USB Communication


I have been looking at doing more examples of using the USB code in user designs, but one thing that holds me back is that the code is very fragile. Some builds with very minor changes just stop working, to the point that the board no longer functions as a USB device. The exact same code sometimes builds and runs fine if I build it on my Windows machine but not on my Linux machine, or vice versa. This is probably caused by different versions of yosys or arachne-pnr.

One thing I noticed is that timing analysis does not work for the bootloader or for my USB examples derived from it. It says that the designs will run at 0 MHz.

I just tried building the original bootloader and my usb examples using nextpnr instead of arachne-pnr, after upgrading to the latest version of nextpnr and yosys.

The bootloader would not build with nextpnr, which reported combinatorial loops. I added the --force flag and that got a bit further, but the build still failed because it did not pass a timing check. I had to set --freq 1, i.e. a required speed of 1 MHz, to get it to build. With that, my examples worked.

It would be a lot of work to investigate the cause of these problems, and with Luke doing a completely new implementation of USB for the EX, it is probably not worth it.

Another thing that would be good is to be able to support USB device classes other than CDC ACM, such as HID devices for mice and keyboards, MIDI input devices, or audio output devices. That looks quite feasible, but again a lot of work, and probably not worth it on this implementation.


Thanks for all the work you put in!

One thing I noticed as I was looking over the code is that top.v has a signal resetn which it uses, and usb_uart.v also has a resetn. But internally usb_uart.v uses reset (no -n) only. There is no reset = ~resetn anywhere, so the two don’t seem to be connected, and the USB logic never gets reset. I wonder if this is what was causing the bad behavior under some conditions.

I added wire reset = !resetn; at line 74 and have had no problems running it.

Of course I might be completely wrong and be missing something critical… (-:,
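For readers following along, the change described above can be sketched as below. This is only an illustration under assumptions: the module name and the counter are hypothetical stand-ins, not the contents of usb_uart.v. Only the names resetn and reset and the one-line wire declaration come from the post.

```verilog
// Hypothetical stand-in for a module that receives an active-low reset
// but whose internal logic is written against an active-high reset.
module reset_wiring_sketch (
    input  wire       clk,
    input  wire       resetn,   // active-low reset, as passed down from top.v
    output reg  [3:0] counter   // placeholder for "the internal logic"
);
    // The one-line fix from the post: derive the active-high reset
    // from the active-low port so the two are actually connected.
    wire reset = !resetn;

    always @(posedge clk) begin
        if (reset)
            counter <= 4'd0;            // logic now really gets reset
        else
            counter <= counter + 4'd1;
    end
endmodule
```

Whether the missing connection explains the instability is not established in this thread; the sketch only shows what connecting the two signals looks like.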


I’ll give that a go, and see if it makes a difference. I set reset to 0 as that is what the original bootloader code did.


FWIW I see this same behavior with the code you provided.

It is, unfortunately, fragile as you say, and dependent on the place and route tool. For instance, if I change the random seed the placer uses (the -s flag), the design either works or it doesn’t.

Have you made any progress tracking down the instability?

I’ve been working on removing the combinatorial loop, but I’m only half done with that (there are two badnesses there).



I have not looked at this any more for the reasons I said, but I would be interested to see how you get on removing the combinatorial loops.

Both @lukevalenty and Tim ‘mithro’ Ansell for the FOMU project seem to be working on a new USB implementation - see


There are two things that must be done to break the loop. One is to change the rising/falling edge detectors to use two flip-flops instead of one. The other is to introduce a clocked register stage (a one-cycle delay) around the data_done and ack logic. Here’s my current working version.

That being said, it still doesn’t remove the dependence on the place and route outcome: some runs work and some don’t. You still need to apply different -s flags to the P&R tool until you get a version that works :frowning:

I’m going to push this through nextpnr and see what it thinks now that the loop is cracked.

NOTE: I’m not sure if the way I broke the loop leads to a correct USB protocol state machine. It “works for me”, but like you I only ask it to do the USB setup and basic serial I/O. YMMV.
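As a rough illustration of the first loop-breaking change mentioned above (edge detectors built from two flip-flops instead of one), here is a minimal sketch. The module and signal names are hypothetical and this is not the code from the linked working version. The idea is that both inputs to the edge comparison are registered, so there is no purely combinational path from the watched signal to the edge outputs.

```verilog
// Hypothetical two-flip-flop edge detector; names are illustrative only.
module edge_detect_sketch (
    input  wire clk,
    input  wire sig,      // signal whose edges we want to detect
    output wire rising,   // high for one clock after a 0 -> 1 transition
    output wire falling   // high for one clock after a 1 -> 0 transition
);
    reg sig_q  = 1'b0;    // sig delayed by one clock
    reg sig_qq = 1'b0;    // sig delayed by two clocks

    always @(posedge clk) begin
        sig_q  <= sig;
        sig_qq <= sig_q;
    end

    // Compare the two registered copies rather than the raw input,
    // so the outputs depend only on flip-flop outputs.
    assign rising  =  sig_q & ~sig_qq;
    assign falling = ~sig_q &  sig_qq;
endmodule
```

The cost is one extra clock of latency on every detected edge, which is presumably part of why the protocol state machine needs rechecking after such a change.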



So for what it’s worth, I seem to have got a version of the USB code to be more stable under P&R. Not saying it’s perfect, but definitely more stable (I have yet to make a bad run of it, but that doesn’t mean it’s perfect). Here’s an update:

I’m not entirely pleased with it, however. What I did was set about trying to get rid of latches in the synthesized logic. I haven’t removed them all, but I did end up specializing the arbitration logic and this both improved the frequency estimation from nextpnr and led to stable bitstreams from arachne.

Note that nextpnr is not yet generating a functional bitstream. Also note that there are /still/ latches being generated around the arbiters, however this is due to how they are being used (I believe). If I have some time, or if the instability re-appears, I’ll dig into it more and try and get rid of more latches in the design.
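For context on the latch warnings discussed above: a common way for synthesis to infer a latch is a combinational always block that leaves an output unassigned on some path. The sketch below is an assumption-laden illustration of the usual fix (a default assignment at the top of the block). The two-requester grant logic is hypothetical and is not the arbiter from the USB code.

```verilog
// Hypothetical fixed-priority grant logic, used only to show the pattern.
module grant_sketch (
    input  wire       req_a,
    input  wire       req_b,
    output reg  [1:0] grant
);
    always @* begin
        // Default assignment: grant has a value on every path through
        // the block, so no storage element (latch) is implied.
        grant = 2'b00;
        if (req_a)
            grant = 2'b01;      // requester A has priority
        else if (req_b)
            grant = 2'b10;
    end
endmodule
```

Without the default line, the case where neither request is high would leave grant holding its previous value, which is exactly what a latch does.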



Another update on this. I updated the usb1.tar with my latest, where I went through and changed everything so verilator lint would not complain anymore. /Unfortunately/ I spoke too soon: it continues to be unreliable and depends on what random seed is provided to the P&R tool. I’m going to keep hacking on it. Yosys is still reporting some latches being generated around the arbiter. nextpnr reports a clock rate (27 MHz) below the called-for frequency. So there are plenty of places still to look for removing the instability…

Thanks for the link to the other USB core written in Migen. I’m deep enough down the rabbit hole of this one that I’m going to keep plunking away at it removing the various bobbles tools report… at least for now.


Very good. I will wait a little while to see if you make more progress before I try to use it. I might have to learn Migen to try the new implementation sometime but it would be good to see this Verilog one improved.


I’ve been plodding along with this and I’ve got a bit further with it.

Re: latches, etc., there’s a Pull Request on the Bootloader repo that seems to straighten out a lot of the yosys issues -

Re: Arachne, etc. I was surprised to learn that its layout is not influenced by timing concerns. When I heard that I switched immediately to NextPNR (which lays out in such a way as to minimize time delays!) and have had much better results.

I have given the UART a pipeline-style frontend and am just doing some more testing on it. All looks very good, although inherent in the USB design seems to be a limit to a maximum of 32 (if memory serves) bytes per transfer. This was good for bootloading but may cause problems in other areas. Has anyone else run into this? Is there a simple fix? My lack of USB internal skills is embarrassing!


Have you done any more on this? I tried your usb directory from the usb1.tar file and it worked for my Hello World example but did not seem to receive data in the uart and tones examples.


Do you have your code anywhere that I could try it?

I also see on Twitter that Trammell Hudson has been doing a lot of work on the bootloader code to make it work on the Fomu board.

USB ACM FIFO and RISCV core demo

I’ve put my work into a repo here -

Since changing to NextPNR and making a few other tweaks, I have built several miniprojects with this code and have not seen the dreaded “module doesn’t boot” or “module doesn’t create a USB device” problems that the arachne pnr tool presented. Fingers crossed.

As you can read in the repo’s readme, there is one major bug remaining (that I can see) and one major issue.

The bug is that the USB side of the implementation doesn’t like transfers of more than 32 bytes. Try to send longer messages and the interface locks up (although it does repair itself for the next call). For the original implementation there was no need for longer transfers, and I’m guessing this may be a rather simple fix; however, I’m very inexperienced with USB internals, so I wasn’t able to immediately see what was wrong.

The major issue is that for my needs I replaced the original front end with a streaming pipeline front end for maximum transfer speed (one transfer per clock). This may not suit all users because implementing these streaming pipelines is a head-exploding nightmare. Feel free to create an alternative. I realize that a better approach would have been to just work with the existing repo, keeping the existing interface then switching to a pipeline, but at the time I really thought it wasn’t going to work, and I wanted to get some pipeline practice so we end up here.

Finally, I am far from an expert Verilogger (or anything else FPGA), so please, any comments or criticisms will be happily received.
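For anyone who has not met this style of interface, a streaming pipeline stage with back pressure often looks something like the sketch below. This is a generic illustration, not code from the repo: the valid/ready handshake, the port names, and the module itself are all assumptions made for the example.

```verilog
// Hypothetical single register stage with a valid/ready handshake.
module pipe_stage_sketch #(
    parameter W = 8                  // data width, assumed one byte here
) (
    input  wire         clk,
    input  wire         reset,
    // upstream side
    input  wire [W-1:0] in_data,
    input  wire         in_valid,
    output wire         in_ready,
    // downstream side
    output reg  [W-1:0] out_data,
    output reg          out_valid,
    input  wire         out_ready
);
    // The stage can accept a word when it is empty, or when the word it
    // holds is being taken downstream in this same cycle.
    assign in_ready = !out_valid || out_ready;

    always @(posedge clk) begin
        if (reset) begin
            out_valid <= 1'b0;
        end else if (in_ready) begin
            out_valid <= in_valid;   // becomes empty if nothing is offered
            out_data  <= in_data;
        end
        // Otherwise downstream is stalled: hold out_data and out_valid.
    end
endmodule
```

When the downstream side keeps out_ready high, a stage like this moves one word per clock; when out_ready drops, the stall propagates upstream through in_ready. Getting that stall path right on every stage is where most of the difficulty lies.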


Hi Lawrie-

I don’t have any useful update yet. I did a lot of hacking on it to get it to pass timing at 48 MHz with nextpnr. But in that process I also broke something along the way, so that it now works on Mac OS X but not on Windows. I did observe that the inconsistencies in P&R largely, but not entirely, went away the closer I got to meeting timing.

I’m going to pick up David’s latest patch he just posted and see how that works for me. If it’s stable I might just run with that for my class and abandon my efforts. I’m teaching a hardware design class with TinyFPGA BX, so if it’s stable, we’ll know because about 15-20 design teams will be using it… will let you know…



Hi David-

Thank you so much for the code. I put it to work here on my stuff for my class and so far it’s awesome. It’s smaller and I have yet to get a bad P&R result. No more -s flag and hope randomness (so far). I’m having all the students update their projects to it. We’ll see how it does on a wider sample here shortly.

Lawrie: I’m going to abandon my code and stick with this one for now. If I get a spare moment I might make sure it passes verilator lint, but that’s not going to change any of the functionality / timing of the design.

Thanks all for your help!



You are welcome! Hopefully one of your smart students can fix the 32 byte limit bug!


Hey Mark… If you or your students come up with any interesting examples, perhaps you would consider adding them to the repo? I’m (possibly obviously) paranoid that the pipeline interface is not convenient to use. Let me know!


Unfortunately, the class is a basic “build your own 5-stage pipelined CPU” course. The USB UART is there just to pump bytes back to their machines for debugging. I wrapped the interface up for them and push bytes back to some Python code running on their laptops that gives them a cycle-by-cycle display of the action inside their CPU.

So… probably not much in the way of cool examples will come out of it :frowning: I did observe the 32 byte bug you mentioned when I was initially integrating it. But because I’m using Python to grab bytes off the USB port and print them to the screen, I need to slow down the rate of transmission significantly anyway. The debugger I gave the students sends only a couple hundred bytes a second back to their machines for now. (When they get all done I may up the clock rate on their designs…)

The interface itself looks great to me. I have some other research projects going on that we may use it in. If anything cool comes out of that in the next year I’ll let you know.

Thanks again!


For anyone using the pipelined USB UART, I fixed a few nasty bugs, including the previous 32 character packet size limit. So now it seems to work pretty well even on long contiguous messages. There was a back pressure bug on the input that got fixed too.

It still has the pipeline interface… (-:,


Does the work being done in this topic allow the TinyFPGA BX to have the same USB serial capability that I would get if I hooked up an FT232R to it? Usually I use FPGA boards as general-purpose GPIO -> USB instruments, and I can get a 3 Mbaud COM port with an FT232R on the board. Does the …usbserial repo provide the same or similar functionality?