Timing Analysis leads to wisdom


#1

New Verilog engineer here. I’m writing some code that seems to fail timing (probably a bit out of my depth). As an exercise, I thought I could create a very simple design and see what timing constraints exist on it. Here’s what I see:

Code

    module top (
        input CLK,     // 16MHz clock
        output PIN_6,  // OUT
        input  PIN_13, // IN
        output LED,    // User/boot LED next to power LED
        output USBPU   // USB pull-up resistor
    );

        assign USBPU = 0;
        assign LED = 0;

        assign PIN_6 = PIN_13;

    endmodule

Then the timing output

    icetime topological timing analysis report
    ==========================================

    Info: max_span_hack is enabled: estimate is conservative.

    Report for critical path:
    -------------------------

            pre_io_0_3_0 (PRE_IO) [clk] -> DIN0: 0.307 ns
         0.307 ns net_262 (PIN_13$2)
            odrv_0_3_262_307 (Odrv12) I -> O: 0.796 ns
            t4 (Span12Mux_v12) I -> O: 0.796 ns
            t3 (Span12Mux_v12) I -> O: 0.796 ns
            t2 (Sp12to4) I -> O: 0.662 ns
            t1 (Span4Mux_h4) I -> O: 0.465 ns
            t0 (LocalMux) I -> O: 0.486 ns
            inmux_0_23_2555_2548 (IoInMux) I -> O: 0.382 ns
         4.690 ns net_2548 (PIN_13$2)
            pre_io_0_23_1 (PRE_IO) DOUT0 [setup]: 0.103 ns
         4.793 ns io_pad_0_23_1_din

    Resolvable net names on path:
         0.307 ns ..  4.690 ns PIN_13$2

    Total number of logic levels: 1
    Total path delay: 4.79 ns (208.64 MHz)

It’s little surprise that my designs don’t want to run faster than 200MHz when even the simplest assignment won’t run faster than 208MHz.

More experienced people:

  • what am I doing wrong that makes my simplest assignment so slow?
  • are there special techniques I can employ to somehow compel this assignment to run faster?
  • in general, should I have hope that my more ambitious code can run at 200MHz?

#2

Can I ask what sort of speed you were expecting? The iCE40 datasheet shows, for example, that the global buffer network inside the chip runs at a maximum of 275MHz, so it’s unlikely you’re ever going to get much faster than that with a device like the iCE40.

That said, in your example, you’re driving pin 6 from the input on pin 13. Those pins aren’t adjacent to each other in the floorplan of the device, so there’ll be some routing overhead (the spans and muxes that are in your critical path).

I’d be interested to know if things are any faster if you assign pin10 to the pin13 value instead, as pins 10/13 are right next to each other in the layout.
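For reference, the adjacent-pin variant would look something like the sketch below. It is the same module as in the first post with only the output port changed. The port name PIN_10 is an assumption: it has to match whatever name your pin constraint file gives to pin 10.

```verilog
// Sketch of the adjacent-pin experiment: identical to the original module,
// except the output is PIN_10 (assumed port name) instead of PIN_6.
module top (
    input  CLK,     // 16MHz clock (unused here, kept from the original)
    output PIN_10,  // OUT, now next to the input pin in the layout
    input  PIN_13,  // IN
    output LED,     // User/boot LED next to power LED
    output USBPU    // USB pull-up resistor
);

    assign USBPU = 0;
    assign LED = 0;

    // Purely combinational pass-through from one pad to the other
    assign PIN_10 = PIN_13;

endmodule
```

The logic is unchanged, so any difference in the reported path delay should come from placement and routing alone.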

Depending on how far you want to push things, this may end up being a matter of expectation management. These aren’t super-high-speed devices, although they are more than capable for a great many interesting use cases :)

D.


#3

Thanks for the quick and clear answer. I didn’t think of pin adjacency. Indeed, when I try PIN 10 with PIN 13, as you predicted, things look a lot more like what I had hoped to see. See below: 548MHz!

Meanwhile I will certainly manage my expectations for my projects, and keep things chip-appropriate.

    icetime topological timing analysis report
    ==========================================

    Report for critical path:
    -------------------------

            pre_io_0_3_0 (PRE_IO) [clk] -> DIN0: 0.307 ns
         0.307 ns net_262 (PIN_13$2)
            odrv_0_3_262_141 (Odrv4) I -> O: 0.548 ns
            t0 (LocalMux) I -> O: 0.486 ns
            inmux_0_3_283_269 (IoInMux) I -> O: 0.382 ns
         1.723 ns net_269 (PIN_13$2)
            pre_io_0_3_1 (PRE_IO) DOUT0 [setup]: 0.103 ns
         1.826 ns io_pad_0_3_1_din

    Resolvable net names on path:
         0.307 ns ..  1.723 ns PIN_13$2

    Total number of logic levels: 1
    Total path delay: 1.83 ns (547.56 MHz)

#4

There’s another layer of learning here.

I was astonished to read that arachne-pnr does its work with no regard for timing considerations. This is a big deal. It means that if your design is pushing things a bit w.r.t. timing (i.e. you’re getting up to a lot each clock cycle), then whether it works can depend on other, non-timing-related considerations: your design might sometimes work, and other times not work at all.
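To make "getting up to a lot each clock cycle" concrete, here is a minimal sketch. It is not taken from anyone's actual design, and the module and signal names are made up for illustration. The first module does three additions between one register stage and the next; the second splits the same sum across two clock cycles so that each cycle only has to fit one addition.

```verilog
// Hypothetical example: sum of four 8-bit inputs.

// All the additions must settle within a single clock period.
module sum4_single_cycle (
    input  wire       clk,
    input  wire [7:0] a, b, c, d,
    output reg  [9:0] sum
);
    always @(posedge clk)
        sum <= a + b + c + d;
endmodule

// Same result, one clock cycle later, with at most one addition
// between any two registers.
module sum4_pipelined (
    input  wire       clk,
    input  wire [7:0] a, b, c, d,
    output reg  [9:0] sum
);
    reg [8:0] ab, cd;           // partial sums, one bit wider than the inputs

    always @(posedge clk) begin
        ab  <= a + b;           // stage 1
        cd  <= c + d;           // stage 1
        sum <= ab + cd;         // stage 2, uses last cycle's partial sums
    end
endmodule
```

The pipelined version costs extra registers and a cycle of latency, but it leaves the placer and router more slack per cycle, which matters most when the tool is not optimizing for timing at all.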

To avoid this sad state of affairs, migrate immediately to NextPNR. Not only does it do a much better job of placing and routing to minimize time delays, it also has a gorgeous GL-rendered graphical output.

Until recently, the developers of NextPNR weren’t actively recommending that people use it because they wanted to knock some more bugs out of it, but in the last few weeks they’ve changed their position and now recommend that people switch.