Strange Clock Divider Issues


#1

Hi All,

I’ve been getting started with the A series boards, and I came across an issue a while back that I’ve been unable to resolve.

Fundamentally, I’m trying to chain two clock divider modules that I wrote, and I’m getting weird artefacts from the second in the chain.

screenshot removed due to 1-image limit (see comment for filename)

I intended to run the OSCH clock at ~2 MHz, into clkdiv1 (a divide by 8… 4 × 2), and then into clkdiv2 (again, divide by 8).

Instead, I’m seeing something come out of clkdiv2 (~1 MHz) around where I’d expect to see its transitions.

I’ve tried adding a global reset, with no luck:

I decided to backtrack and have found another approach that works for my current project, but I’d still really like to understand what’s going on here…

Example projects with bitstreams are here:

Any advice would be greatly appreciated.

Attie


#2

Hopefully this isn’t too offensive, but how’s your soldering? Have you tried using different pins to see if you see the same effect?


#3

No offence taken, but my soldering is good / isn’t the problem.

Just to be really sure, I’ve duplicated and then moved the signals onto other pins, and see exactly the same thing.

I’ve successfully used this board to drive an RGB LED Matrix - repurposing these pins to demonstrate the issue here.

If you had time (and the stuff / inclination) to put the bitstream into something and give it a test, I’d really appreciate at least confirmation that it’s not just happening for me… Rebuilding as well would be helpful.


#4

:+1: I thought it was worth a check…

I don’t have an A-series board to test with unfortunately - maybe someone else here can help with that.

It’s probably not all that useful to try modifying it/running it on a BX, but I can give that a go tomorrow if you’re interested.


#5

That would be awesome if you’d be up for it!

I’ve been through my divider module (it’s tiny) and cannot see why this would be happening.

/* simple clock divider
   counts CLK_DIV_COUNT rising edges of clk_in, then toggles clk_out,
   so clk_out runs at clk_in / (2 * CLK_DIV_COUNT)
   reset is active-high (asynchronous) */
module clock_divider #(
	parameter CLK_DIV_WIDTH = 8,
	parameter CLK_DIV_COUNT = 0
) (
	input reset,
	input clk_in,
	output reg clk_out
);
	reg [CLK_DIV_WIDTH - 1:0] clk_count;

	always @(posedge clk_in, posedge reset) begin
		if (reset) begin
			clk_out <= 1'b0;
			clk_count <= 'b0;
		end
		else begin
			if (clk_count == (CLK_DIV_COUNT - 1)) begin
				clk_out <= ~clk_out;
				clk_count <= 'b0;
			end
			else begin
				clk_count <= clk_count + 1;
			end
		end
	end

endmodule

Strangely, inverting the input to clkdiv2 makes it run correctly… but that’s a weird hack, and it results in a phase issue…

The following gives the result seen in the original screenshot.

clock_divider #(
	.CLK_DIV_COUNT('d4)
) clkdiv2 (
	.reset(global_reset),
	.clk_in(clk_div1),
	.clk_out(clk_div2)
);

While this gives a better result:

clock_divider #(
	.CLK_DIV_COUNT('d4)
) clkdiv2 (
	.reset(global_reset),
	.clk_in(~clk_div1),
	.clk_out(clk_div2)
);
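For anyone wanting to check the RTL in isolation, here is a minimal simulation sketch of the two chained dividers. The testbench name `tb_chain`, the 500 ns clock period (standing in for the ~2 MHz OSCH) and the run length are assumptions for illustration, and it assumes a generic simulator such as Icarus Verilog. An ideal simulation has no routing skew, so this only checks the logic, not what the hardware does.

```verilog
`timescale 1ns / 1ps

// Condensed copy of the divider from this post (default count set to 4 here).
module clock_divider #(
    parameter CLK_DIV_WIDTH = 8,
    parameter CLK_DIV_COUNT = 4
) (
    input reset,
    input clk_in,
    output reg clk_out
);
    reg [CLK_DIV_WIDTH - 1:0] clk_count;

    always @(posedge clk_in, posedge reset) begin
        if (reset) begin
            clk_out <= 1'b0;
            clk_count <= 'b0;
        end
        else if (clk_count == (CLK_DIV_COUNT - 1)) begin
            clk_out <= ~clk_out;
            clk_count <= 'b0;
        end
        else begin
            clk_count <= clk_count + 1;
        end
    end
endmodule

// Hypothetical testbench: tb_chain and the timing values are assumptions.
module tb_chain;
    reg clk = 1'b0;
    reg reset;
    wire clk_div1, clk_div2;
    integer div1_edges = 0;  // clk_div1 rising edges seen since the last clk_div2 rise

    always #250 clk = ~clk;  // 500 ns period, roughly 2 MHz

    clock_divider #(.CLK_DIV_COUNT(4)) clkdiv1 (
        .reset(reset), .clk_in(clk), .clk_out(clk_div1)
    );
    clock_divider #(.CLK_DIV_COUNT(4)) clkdiv2 (
        .reset(reset), .clk_in(clk_div1), .clk_out(clk_div2)
    );

    always @(posedge clk_div1) div1_edges = div1_edges + 1;

    always @(posedge clk_div2) begin
        $display("%0d ns: clk_div2 rose after %0d clk_div1 rising edges",
                 $time, div1_edges);
        div1_edges = 0;
    end

    initial begin
        reset = 1'b0;
        #10 reset = 1'b1;    // pulse the asynchronous reset so nothing starts as X
        #100 reset = 1'b0;
        #200000 $finish;     // run for about 200 us
    end
endmodule
```

If the logic is sound, every rise of clk_div2 after the first should be reported 8 clk_div1 rising edges after the previous one (the first comes after 4, because the counter starts from reset). A clean result here would point away from the divider code and towards how the derived clock is routed on the device.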


#6

The clocking resources for the MachXO2 are described in the following Lattice tech note:

http://www.latticesemi.com/-/media/LatticeSemi/Documents/ApplicationNotes/MO/MachXO2sysCLOCKPLLDesignandUsageGuide.ashx?document_id=39080

I think you’ll find what you’re looking for here. Maybe there’s a built-in clock divider you could use. Or maybe you need to explicitly feed your clock signal through a global clock buffer first.


#7


I tried running your clock divider code on a BX. The code snippet I used is below. This was compiled using yosys/arachne-pnr.

module top (
    input CLK,
    output USBPU,
    output PIN_14,
    output PIN_15,
    output PIN_16);
    
    assign USBPU = 0;
    
    wire clk_div1;
    wire clk_div2;
    
    clock_divider #(
    	.CLK_DIV_COUNT('d4)
    ) clkdiv1 (
    	.reset(1'b0),
    	.clk_in(CLK),
    	.clk_out(clk_div1)
    );
    
    clock_divider #(
    	.CLK_DIV_COUNT('d4)
    ) clkdiv2 (
    	.reset(1'b0),
    	.clk_in(clk_div1),
    	.clk_out(clk_div2)
    ); 
    
    assign PIN_14 = CLK;
    assign PIN_15 = clk_div1;
    assign PIN_16 = clk_div2;
    
endmodule

The results as captured from the pins (using a Digilent Digital Discovery) look as expected:

I think that means that your divider code is probably okay, and perhaps Luke’s suggestion might be the path to enlightenment :slight_smile:

D.


#8

@lukevalenty thanks for the link. I’ve read parts of that document already, but I was hoping to avoid “standard” modules, in favour of learning things from the ground up… if possible - surely a simple divider like this shouldn’t require use of one of their modules.

@gundy thanks very much for trying it out! I really appreciate it.

It’s weird things like this that got me really frustrated last time I tried playing with FPGAs. If this doesn’t work reliably, then who’s to say that any logic will stand straight… :thinking:


#9

Generating clocks is always special, especially on FPGAs. You might be able to get away without using special modules for some FPGAs and some tool chains, but there is no guarantee.

Clock signals need to be distributed to all elements such that every element sees the rising edge at the same time. This requires a balanced clock tree. The clock signal starts in the middle and is propagated evenly to all parts of the chip in a tree-like net of wires.

If your clock doesn’t use a balanced clock tree and uses general routing resources instead, then there is no guarantee that the clock rising edge will occur at the exact same time for all logical elements.

Your second clock divider has several register bits that all need the first generated clock. If the rising edge of that clock reaches different registers at different times, very strange things will happen.

Usually synthesis tools are able to detect clocks and automatically route them over a global clock tree. Sometimes you need to add constraints to tell the synthesis tool which nets are clocks.

You should try adding some constraints for all the clocks you use to explicitly tell the synthesis tools how to handle them. This is pretty standard for all digital design.
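A different way to sidestep the skew problem, not something proposed earlier in this thread, is to keep every register on the one real clock and turn the dividers into clock-enable generators, so no derived clock ever travels over general routing. Below is a minimal sketch under that assumption. The module names `enable_divider` and `enable_chain_example`, and the ports `enable_in` and `tick`, are hypothetical names for illustration.

```verilog
// Hypothetical module: emits a one-cycle pulse on tick for every DIV
// clk cycles in which enable_in was high.
module enable_divider #(
    parameter WIDTH = 8,
    parameter DIV = 8
) (
    input reset,
    input clk,
    input enable_in,   // count only when high; tie to 1'b1 for the first stage
    output reg tick    // high for exactly one clk cycle per DIV enabled cycles
);
    reg [WIDTH - 1:0] count;

    always @(posedge clk, posedge reset) begin
        if (reset) begin
            count <= 'b0;
            tick <= 1'b0;
        end
        else begin
            tick <= 1'b0;                 // default: no pulse this cycle
            if (enable_in) begin
                if (count == (DIV - 1)) begin
                    count <= 'b0;
                    tick <= 1'b1;
                end
                else begin
                    count <= count + 1;
                end
            end
        end
    end
endmodule

// Hypothetical top level: two stages chained through enables, one clock.
module enable_chain_example (
    input reset,
    input clk,
    output tick_div8,    // pulses once every 8 clk cycles
    output tick_div64    // pulses once every 64 clk cycles
);
    enable_divider #(.DIV(8)) stage1 (
        .reset(reset), .clk(clk), .enable_in(1'b1), .tick(tick_div8)
    );
    enable_divider #(.DIV(8)) stage2 (
        .reset(reset), .clk(clk), .enable_in(tick_div8), .tick(tick_div64)
    );
endmodule
```

The trade-off is that `tick` is a single-cycle pulse rather than a 50% duty-cycle square wave, so downstream logic uses it as an enable (`if (tick_div64) ...` inside an `always @(posedge clk)` block) instead of as a clock.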


#10

I would be interested to know if using a different approach with the divider makes a difference - eg. using something like below:

module clock_divider #(
    parameter CLK_DIV_WIDTH = 8,
    parameter CLK_DIV_COUNT = 2
) (
    input reset,
    input clk_in,
    output clk_out
);
    localparam FULL_SCALE = 2**CLK_DIV_WIDTH;
    localparam CLK_DIV_ADD = FULL_SCALE / (CLK_DIV_COUNT * 2);  // integer division truncates
    
    reg [CLK_DIV_WIDTH - 1:0] clk_count;

    always @(posedge clk_in, posedge reset) begin
        if (reset) begin
            // clk_out is driven by the assign below, so only the counter is reset here
            clk_count <= 'b0;
        end
        else begin
            clk_count <= clk_count + CLK_DIV_ADD;
        end
    end

    assign clk_out = clk_count[CLK_DIV_WIDTH-1];

endmodule

… it seems that the above approach might require fewer logic gates. Rather than doing a compare across all of the bits of the counter as well as an add, it’s doing the add but manages to avoid the compare.

A downside of that kind of approach is that dividing the clock frequency by numbers that don’t go evenly into 256 (ie. aren’t powers of 2) will result in output clocks that have some jitter in their high/low times.

Another approach might be to count down instead of up, and test for zero instead. That way you can use the NOR reduction operator (I think) across the clk_count register to check if all bits are zero eg. ~|(clk_count). Although I’m not entirely sure how that would synthesize, I suspect the logic chain would be a little smaller than your current approach too.
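A minimal sketch of that count-down idea, assuming the same parameters and ports as the original divider. The module name `clock_divider_down` is hypothetical, and whether it really synthesizes smaller will depend on the tool, since a compare against a constant is already cheap.

```verilog
// Hypothetical variant of clock_divider: counts down and reloads at zero.
module clock_divider_down #(
    parameter CLK_DIV_WIDTH = 8,
    parameter CLK_DIV_COUNT = 4
) (
    input reset,
    input clk_in,
    output reg clk_out
);
    reg [CLK_DIV_WIDTH - 1:0] clk_count;

    always @(posedge clk_in, posedge reset) begin
        if (reset) begin
            clk_out <= 1'b0;
            clk_count <= CLK_DIV_COUNT - 1;
        end
        else if (~|clk_count) begin
            // NOR reduction: true only when every bit of clk_count is zero
            clk_out <= ~clk_out;
            clk_count <= CLK_DIV_COUNT - 1;
        end
        else begin
            clk_count <= clk_count - 1;
        end
    end
endmodule
```

With CLK_DIV_COUNT set to 4 the counter runs 3, 2, 1, 0 and toggles clk_out on the edge where it sees zero, so the output toggles every 4 input edges, the same as the count-up version. It still produces a derived clock, so any clock-routing concerns apply to it equally.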

D.


#11

Thank you both very much for your input! I really appreciate it… I’ll have a play when I next get a chance. :+1:


#12

attie - I had similar issues, using Lattice’s MachXO2 dev board. First, the pins should go through an IO block. There are special considerations when the pin is also used as a clock. Check out the Library Primitives Guide for the MachXO2 IO modules.