USB Communication


#21

I have been looking at doing more examples of using the USB code in user designs, but one thing that holds me back is that the code is very fragile. Some builds with very minor changes just stop working, to the point of no longer enumerating as a USB device. The exact same code sometimes builds and runs fine if I build it on my Windows machine but not on my Linux machine, or vice versa. This is probably caused by different versions of yosys or arachne-pnr.

One more thing I noticed is that timing analysis does not work for the bootloader or my USB examples derived from it. It says that the designs will run at 0 MHz.

I just tried building the original bootloader and my usb examples using nextpnr instead of arachne-pnr, after upgrading to the latest version of nextpnr and yosys.

The bootloader would not build with nextpnr, which reported combinatorial loops. I added the --force flag and that got a bit further, but it still failed because it did not pass a timing check. I had to set --freq 1, i.e. a required speed of 1 MHz, to get it to build. With that, my examples worked.

It would be a lot of work to investigate the cause of these problems, and with Luke doing a completely new implementation of USB for the EX, it is probably not worth it.

Another thing that would be good is support for USB device classes other than CDC ACM, such as HID devices for mice and keyboards, MIDI input devices, or audio output devices. That looks quite feasible, but again it is a lot of work, and probably not worth it on this implementation.


#22

Thanks for all the work you put in!

One thing I noticed as I was looking over the code is that top.v has a signal resetn which it uses, and usb_uart.v also has a resetn. But internally usb_uart.v uses only reset (no -n). There is no reset = ~resetn anywhere, so the two don’t seem to be connected and the USB logic never gets reset. I wonder if this is what was causing the bad behavior under some conditions.

I added wire reset = !resetn; at line 74 and have had no problems running it.
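For anyone following along, here is a minimal sketch of what that one-line fix amounts to. The module below is a hypothetical, cut-down stand-in for usb_uart.v: the module name, the clock name and the counter are assumptions for illustration only, and only the resetn/reset wiring reflects the change described above.

```verilog
// Hypothetical stand-in for usb_uart.v; only the reset wiring matters here.
module usb_uart_reset_sketch (
    input  wire       clk,      // assumed clock name
    input  wire       resetn,   // active-low reset driven from top.v
    output reg  [7:0] counter   // placeholder logic so the reset has a visible effect
);
    // Derive the active-high reset that the internal logic uses
    // from the active-low port, so the two are actually connected.
    wire reset = !resetn;

    always @(posedge clk) begin
        if (reset)
            counter <= 8'd0;            // held at zero while resetn is low
        else
            counter <= counter + 8'd1;  // free-runs once reset is released
    end
endmodule
```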

Of course I might be completely wrong and be missing something critical… (-:


#23

I’ll give that a go, and see if it makes a difference. I set reset to 0 as that is what the original bootloader code did.


#24

FWIW I see this same behavior with the code you provided.

It is, unfortunately, fragile as you say, and dependent on the place and route tool. For instance, depending on the random seed (-s flag) that the placer uses, it either works or it doesn’t.

Have you made any progress tracking down the instability?

I’ve been working on removing the combinatorial loop, but I’m only half done with that (there are two badnesses there).

Thanks!


#25

I have not looked at this any more for the reasons I said, but I would be interested to see how you get on removing the combinatorial loops.

Both @lukevalenty and Tim ‘mithro’ Ansell for the FOMU project seem to be working on a new USB implementation - see https://github.com/mithro/valentyusb


#26

There are two things that must be done to break the loop. One is to change the rising/falling edge detectors to use two flipflops instead of one. The other is to introduce a clock cycle edge around the data_done and ack logic. Here’s my current working version. http://www.cs.washington.edu/homes/oskin/usb.tar
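To illustrate the first of those two changes, here is a minimal sketch of a rising/falling edge detector built from two flip-flops. It is not taken from the tarball; the module and signal names are hypothetical, and it shows only the general technique of comparing two registered samples so that the outputs no longer depend combinationally on the raw input.

```verilog
// Hypothetical two-flip-flop edge detector (names are illustrative only).
module edge_detect_sketch (
    input  wire clk,
    input  wire sig_in,    // signal whose edges we want to detect
    output wire rising,    // high for one clock after a 0 -> 1 transition
    output wire falling    // high for one clock after a 1 -> 0 transition
);
    reg sig_q  = 1'b0;     // sig_in as sampled one clock ago
    reg sig_qq = 1'b0;     // sig_in as sampled two clocks ago

    always @(posedge clk) begin
        sig_q  <= sig_in;
        sig_qq <= sig_q;
    end

    // Both outputs are functions of registered values only, so there is
    // no combinational path from sig_in to rising or falling.
    assign rising  =  sig_q & ~sig_qq;
    assign falling = ~sig_q &  sig_qq;
endmodule
```

The cost is one extra clock of latency on the edge pulses compared with a single-flop detector that compares the live input against one registered copy.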

That being said, it still doesn’t remove the working/non-working based on place and route outcome. You still need to apply different -s flags to the P&R tool until you get a version that works :(

I’m going to push this through nextpnr and see what it thinks now that the loop is cracked.

NOTE: I’m not sure if the way I broke the loop leads to a correct USB protocol state machine. It “works for me” but, like you, I only ask it to do the USB setup and basic serial I/O. YMMV.


#27

Lawrie,

So for what it’s worth, I seem to have got a version of the USB code to be more stable under P&R. Not saying it’s perfect, but definitely more stable (I have yet to make a bad run of it, but that doesn’t mean it’s perfect). Here’s an update: http://www.cs.washington.edu/homes/oskin/usb1.tar

I’m not entirely pleased with it, however. What I did was set about trying to get rid of latches in the synthesized logic. I haven’t removed them all, but I did end up specializing the arbitration logic and this both improved the frequency estimation from nextpnr and led to stable bitstreams from arachne.

Note that nextpnr is not yet generating a functional bitstream. Also note that there are /still/ latches being generated around the arbiters, however this is due to how they are being used (I believe). If I have some time, or if the instability re-appears, I’ll dig into it more and try and get rid of more latches in the design.
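As a general illustration of where such latches tend to come from (this is not the arbiter from the tarball, and all names below are hypothetical): a combinational always block infers a latch whenever some path through it leaves an output unassigned. Giving every output a default value at the top of the block is one common way to avoid that.

```verilog
// Hypothetical two-input fixed-priority arbiter, written so that
// no latch is inferred for grant.
module arb_sketch (
    input  wire [1:0] req,
    output reg  [1:0] grant
);
    always @* begin
        // Default assignment: every path through the block assigns grant,
        // so the synthesizer has no reason to hold a previous value.
        grant = 2'b00;
        if (req[0])
            grant = 2'b01;      // requester 0 has priority
        else if (req[1])
            grant = 2'b10;
    end
endmodule
```

Without the default line, the case where neither request is set would leave grant unassigned, and the tools would have to infer a latch to hold its previous value.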

Enjoy.


#28

Another update on this. I updated usb1.tar with my latest, where I went through and changed everything so that verilator lint no longer complains. /Unfortunately/ I spoke too soon: it continues to be unreliable and depends on what random seed is provided to the P&R tool. I’m going to keep hacking on it. Yosys is still reporting some latches being generated around the arbiter, and nextpnr reports a clock rate (27 MHz) below the requested frequency. So there are plenty of places still to look for removing the instability…

Thanks for the link to the other USB core written in Migen. I’m deep enough down the rabbit hole of this one that I’m going to keep plunking away at it removing the various bobbles tools report… at least for now.


#29

Very good. I will wait a little while to see if you make more progress before I try to use it. I might have to learn Migen to try the new implementation sometime but it would be good to see this Verilog one improved.


#30

I’ve been plodding along with this and I’ve got a bit further with it.

Re: latches, etc., there’s a Pull Request on the Bootloader repo that seems to straighten out a lot of the yosys issues - https://github.com/tinyfpga/TinyFPGA-Bootloader/pull/21

Re: Arachne, etc. I was surprised to learn that arachne-pnr’s layout is not influenced by timing concerns. When I heard that I switched immediately to nextpnr (which lays out in such a way as to minimize time delays!) and have had much better results.

I have given the UART a pipeline-style frontend and am just doing some more testing on it. All looks very good, although inherent in the USB design seems to be a limit to a maximum of 32 (if memory serves) bytes per transfer. This was good for bootloading but may cause problems in other areas. Has anyone else run into this? Is there a simple fix? My lack of USB internal skills is embarrassing!