Distributed RAM vs dedicated RAM vs Block RAM


#1

Sorry to ask such a newb question, but I see Distributed RAM, Dedicated RAM and Block RAM mentioned in various specs and I’m curious what is the difference between them?

When might I use one vs. another?

Thanks for any insight!


#2

Actually, as someone with plenty of FPGA experience but minimal Lattice experience specifically, I’d LOVE to hear an answer from someone who actually knows what they’re talking about!

I looked into it quickly myself because you piqued my curiosity.

As background, FPGAs generally consist of a soup of different functional blocks sprinkled around the physical die. The big ones are LUTs for implementing your logic, flip-flops next to them for storing state, and RAM blocks (called block RAM or BRAM) for implementing proper memories. For most applications, you can’t do it all in flip-flops alone.

So, LUT memory is the closest to your implemented logic, and if you describe a simple flip-flop in Verilog to store a bit, that’s what will get used. That only scales so far: if you tried to implement kilobytes of storage that way, you’d pretty quickly either run out of physical cells (placement resources exhausted) or the routing would get too complicated to make it all happen.

If you need to implement a legit memory, say for a small frame buffer or something, then BRAM comes in. It’s not RIGHT next to your logic, but it’s still on-chip. BRAM has dedicated placement and routing and such at the silicon level, so meeting timing constraints is much easier. As long as your module gets its address and data lines routed to a given BRAM block with acceptable timing, the memory itself will definitely work.

Then if you need MORE memory than that, you probably need an off-chip SRAM or DRAM.

Compared to a more traditional processor, the analogy is something like LUT RAM/flip-flops = registers, BRAM = cache, off-chip SRAM/DRAM = program memory.

OK, so what about the specific words you mention? BRAM is BRAM. Distributed RAM: the MachXO2 memory guide specifically mentions “Distributed RAM.” But skimming it, I still don’t know enough to say whether it’s a different name for LUT memory, or something else. My guess is the latter. It looks like they have some small-scale RAM resources available closer to the logic, but more organized than general-purpose FFs. Would love a simple explainer there. Moreover, the iCE40 Memory Usage Guide makes no mention of this at all. So I suppose the smaller device families just don’t have that feature.

Finally, “Dedicated RAM” - I can find no official mention of that in documentation, other than in informal reference to BRAM, which is, of course, an on-chip primitive dedicated to providing a big block of RAM. So I suppose “dedicated RAM” is an informal reference in whatever you’re reading to “a real RAM device, somewhere” rather than “a bank of flip-flops you’re storing bits on in your logic.”

Hopefully that’s helpful - I’d love details and corrections from someone who actually knows in the Lattice context!


#3

Thank you for the very helpful response! I’m really new to FPGA and it seems like every time I read up about it, I come across 25 more acronyms. :)


#4

Distributed RAM in MachXO2/3 and ECP5 devices is, as you suspected, a different thing from the flip-flops paired with each LUT. It serves as an intermediate solution between flip-flops and block RAM. The iCE40 family doesn’t have this kind of memory.

It’s a lot less flexible than block RAM, but it makes sense when you don’t need more than a few bytes: you get a 16x4 RAM and need 6 LUTs for that (on the plus side, it’s 64 bits instead of the 6 bits you would get using just the flip-flops!). The details are in the MachXO2 Family Datasheet, just after the architecture overview in the first few pages.
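To put numbers on that trade-off, here is a quick Python sketch of the arithmetic. The 16x4 size and the 6-LUT cost come from the post above; the one-flip-flop-per-LUT figure is an assumption implied by the "6 bits" comparison, and the function names are made up for illustration.

```python
# Figures quoted in the post: one 16x4 distributed RAM costs 6 LUTs.
RAM_DEPTH = 16   # addressable words
RAM_WIDTH = 4    # bits per word
LUTS_USED = 6    # LUTs consumed by one 16x4 distributed RAM

# Assumption: one flip-flop is paired with each LUT.
FLIP_FLOPS_PER_LUT = 1


def distributed_ram_bits(depth=RAM_DEPTH, width=RAM_WIDTH):
    """Bits stored when the LUTs are configured as distributed RAM."""
    return depth * width


def flip_flop_bits(luts=LUTS_USED, ffs_per_lut=FLIP_FLOPS_PER_LUT):
    """Bits stored if the same LUTs only contribute their flip-flops."""
    return luts * ffs_per_lut


ram_bits = distributed_ram_bits()
ff_bits = flip_flop_bits()
print(f"distributed RAM: {ram_bits} bits, flip-flops only: {ff_bits} bits")
```

The catch is access: flip-flops can all be read in parallel, while the distributed RAM only exposes one 4-bit word per address at a time.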


#5

Since you seem to know from experience - have you personally used it and why? Instantiating BRAM is a whole thing, like “Yes, this is my single or dual port RAM, it is this big, I know exactly what it’s for and how I plan to talk to it”. Most other resources are quite a bit more subtle. Is distributed RAM kind of like a, “oh yeah, I guess I should implement that register as distributed ram since it’s kinda big and I instantiate this module 100 times…” or what?


#6

You usually instantiate it like BRAM, just using different primitives. It only has single-port and dual-port options at 16x4 (or it can be ROM, from 16x1 up to 128x1, using from 1 to 8 LUTs of a PFU).

It’s actually possible for the toolchain to infer it from Verilog if you’re careful, but I don’t feel secure enough to rely on that.

As to why and when to use it: mostly when you need way less than the size of an EBR block (in MachXO2 the BRAMs are made of EBR blocks of 8/9 kbit each). You might use it for the register file of a processor, a small stack, a buffer for a serial port or keyboard, etc.
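As a rough way to check whether a memory is "way less than an EBR", the sketch below estimates the cost of building it from 16x4 distributed RAMs and compares that with how little of one EBR it would fill. It assumes 6 LUTs per 16x4 primitive (from the earlier post) and a 9 kbit EBR, and it ignores the extra decode and mux logic that memories deeper than 16 words would need. The helper names are hypothetical.

```python
import math

DIST_RAM_DEPTH = 16        # words per distributed RAM primitive
DIST_RAM_WIDTH = 4         # bits per word
LUTS_PER_DIST_RAM = 6      # figure quoted earlier in the thread
EBR_BITS = 9 * 1024        # assumption: one EBR holds 9 kbit


def dist_ram_cost(depth, width):
    """Return (primitives, luts) for a depth x width memory tiled from
    16x4 distributed RAMs. Extra address decode/mux logic is not counted."""
    if depth < 1 or width < 1:
        raise ValueError("depth and width must be positive")
    rows = math.ceil(depth / DIST_RAM_DEPTH)
    cols = math.ceil(width / DIST_RAM_WIDTH)
    primitives = rows * cols
    return primitives, primitives * LUTS_PER_DIST_RAM


def ebr_utilisation(depth, width):
    """Fraction of a single EBR that the same memory would occupy."""
    return (depth * width) / EBR_BITS


for depth, width in [(16, 8), (16, 16)]:
    prims, luts = dist_ram_cost(depth, width)
    used = ebr_utilisation(depth, width)
    print(f"{depth}x{width}: {prims} primitives, {luts} LUTs, "
          f"{used:.1%} of one EBR")
```

A 16x8 register file comes out at two primitives (12 LUTs) while filling under 2% of an EBR, which is the kind of case where distributed RAM is the better fit.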

In a simple 8/16-bit processor with 1-operand instructions (the other operand being the accumulator), I would put the accumulator, flags, program counter, stack counter and index register(s) in regular flip-flops. The other registers (let’s say 16 general-purpose registers of 8/16 bits each) would go in distributed RAM, and the RAM, ROM (and microcode ROM, if applicable) in block RAM. You could also use the UFM (user flash memory) for the ROM. In this scenario you can easily encode the address of a general-purpose register with 4 bits in the opcode of ALU instructions and use those 4 bits directly as the address for the distributed RAM block.
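Here is a small behavioural model in Python of that last idea, just to show the addressing. The opcode layout (register number in the low 4 bits) is a made-up example and not any particular instruction set, and the class is a software stand-in for the 16-entry distributed RAM, not hardware code.

```python
class RegisterFile:
    """Software stand-in for a 16-entry register file in distributed RAM."""

    def __init__(self, width_bits=8):
        self.mask = (1 << width_bits) - 1   # keep values to the register width
        self.cells = [0] * 16               # 16 words, addressed by 4 bits

    def read(self, addr):
        return self.cells[addr & 0xF]

    def write(self, addr, value):
        self.cells[addr & 0xF] = value & self.mask


def register_field(opcode):
    """Hypothetical layout: the low 4 bits of the opcode select the register."""
    return opcode & 0x0F


regs = RegisterFile(width_bits=8)
opcode = 0x3A                                 # low nibble 0xA selects register 10
regs.write(register_field(opcode), 0x1FF)     # value is truncated to 8 bits
print(register_field(opcode), hex(regs.read(register_field(opcode))))
```

Because the 4-bit field is used as the RAM address directly, no decode logic sits between the instruction register and the register file.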


#7

VERY clarifying, thank you!