fpgacpu.org - CNets and General LFSR Counters

Subject: XBLOX vs. "CNets", lfsr dividers, etc.
Date: 28 Nov 1995 00:00:00 GMT
Newsgroups: comp.arch.fpga,comp.lsi.cad

Perhaps this is of interest.  I have created an experimental set of C++
classes called "CNets" that let me specify structural (not behavioural)
designs which ultimately emit XNF primitives such as nets, gates, and
flipflops.  Using this specification language I can build up higher
level modules, the most sophisticated of which is a pipelined 32-bit
RISC processor plus on-chip peripherals such as boot ROM, UART, and
DRAM controller, in ~65% of an XC4010.

I use this approach in preference to both schematic capture and to
synthesis from HDLs.  It is more flexible, more powerful, and more
reusable than schematic capture and yet still allows me to precisely
and explicitly control primitive instantiation and placement, which can
be tricky using synthesis.  Not to mention this approach is much more
affordable than HDL synthesis.

(By the way, "CNets" does NOT deliver FPGA device independence, but it
could be adapted to that purpose.  I'm not convinced device
independence is such a panacea anyway -- for instance, either you
design to exploit distributed SRAMs and 3-state buses or you don't, and
if your current target device doesn't implement them you probably
should not require them.)

For instance, here is my universal linear feedback shift register
divider, which is known to work nicely for simple divisors like
n==(25000000/9600):

// emit an lfsr counter and decoder to divide by n
//
// See "Efficient Shift Registers, LFSR Counters, and
// Long Pseudo-Random Sequence Generators", Peter Alfke,
// Xilinx App Note, Aug. 1995
//
void lfsr_div(Net out, Net ce, Net reset, unsigned n) {
    ...
    // feedback taps for a maximal-length lfsr, indexed by width in bits;
    // unused tap entries are 0
    static unsigned taps[32][4] = {
        { 0 }, { 0 }, { 0 }, { 3, 2 },
        { 4, 3 }, { 5, 3 }, { 6, 5 }, { 7, 6 },
        { 8, 6, 5, 4 }, { 9, 5 }, { 10, 7 }, { 11, 9 },
        { 12, 6, 4, 1 }, { 13, 4, 3, 1 }, { 14, 5, 3, 1 }, { 15, 14 },
        { 16, 15, 13, 4 }, { 17, 14 }, { 18, 11 }, { 19, 6, 2, 1 },
        { 20, 17 }, { 21, 19 }, { 22, 21 }, { 23, 18 },
        { 24, 23, 22, 17 }, { 25, 22 }, { 26, 6, 2, 1 }, { 27, 5, 2, 1 },
        { 28, 25 }, { 29, 27 }, { 30, 6, 4, 1 }, { 31, 28 }
    };

    // choose appropriate width counter: smallest bits with n < 2^bits
    // (bits is declared outside the loop because it is used afterwards)
    check(n <= (1U << 30));
    unsigned bits = 1;
    while (n >= (1U << bits))
        bits++;
    check((1U << (bits-1)) <= n && n < (1U << bits));

    // determine bit pattern of terminal state (after n-1 clockings of lfsr)
    unsigned w = 0;
    for (unsigned i = 1; i < n; i++) {
        unsigned in = 0;
        for (unsigned j = 0; j < 4 && taps[bits][j]; j++)
            in ^= (w >> (taps[bits][j] - 1)) & 1;
        w = ((w << 1) & ((1U << bits) - 1)) ^ !in;
        check(w != 0);
    }

    // emit shift register and gates to recognize terminal state
    bus(lfsr, bits+1);
    out = lfsr(bits,1) == w;
    lfsr[0] = gnd;
    net(lfsr_in) = nomap(xnor(lfsr[taps[bits][0]], lfsr[taps[bits][1]],
                              lfsr[taps[bits][2]], lfsr[taps[bits][3]]));
    net(lfsr_reset) = out | reset;
    ff(lfsr[1], lfsr_in & ~lfsr_reset, ce);
    for (unsigned i = 2; i <= bits; i++)
        ff(lfsr[i], lfsr[i-1] & ~lfsr_reset, ce);
}

In case it's not perfectly obvious :-), the statements that emit the shift
register do the following:

    bus(lfsr, bits+1);        -- declare lfsr to be a bus of bits+1 nets
    out = lfsr(bits,1) == w;  -- emit AND gate(s) to recognize the
                                  word 'w' in bits (bits..1) of bus lfsr
    lfsr[0] = gnd;            -- set lfsr[0] to gnd; unused entries of
                                  taps[bits] are 0 and so select lfsr[0],
                                  which leaves the following XNOR unchanged

    -- emit an XNOR of up to 4 inputs taking various taps from the lfsr
        shift register flipflop outputs.  'nomap' suppresses the default FMAP:
    net(lfsr_in) = nomap(xnor(lfsr[taps[bits][0]], lfsr[taps[bits][1]],
                              lfsr[taps[bits][2]], lfsr[taps[bits][3]]));

    -- set lfsr_reset to be the OR of out and reset signals:
    net(lfsr_reset) = out | reset;

    -- emit a flipflop driving lfsr[1] whose D is the AND of lfsr_in and
        NOT lfsr_reset and whose clock enable is 'ce':
    ff(lfsr[1], lfsr_in & ~lfsr_reset, ce);

    -- emit the rest of the flipflops; each D input is the Q output of the
        previous 'flop ANDed with NOT lfsr_reset, and each clock enable
        is 'ce':
    for (unsigned i = 2; i <= bits; i++)
        ff(lfsr[i], lfsr[i-1] & ~lfsr_reset, ce);
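
As a sanity check on the divider, here is a minimal software model of the
emitted circuit. It is only a sketch and not part of CNets: the helper names
width_for, step, terminal_state, and measured_period are made up for
illustration, and the model assumes 'ce' is always asserted and the external
'reset' is never asserted. The tap table is copied from the listing above.

```cpp
#include <cassert>

// Tap table copied from the listing above, indexed by register width in bits.
static const unsigned taps[32][4] = {
    { 0 }, { 0 }, { 0 }, { 3, 2 },
    { 4, 3 }, { 5, 3 }, { 6, 5 }, { 7, 6 },
    { 8, 6, 5, 4 }, { 9, 5 }, { 10, 7 }, { 11, 9 },
    { 12, 6, 4, 1 }, { 13, 4, 3, 1 }, { 14, 5, 3, 1 }, { 15, 14 },
    { 16, 15, 13, 4 }, { 17, 14 }, { 18, 11 }, { 19, 6, 2, 1 },
    { 20, 17 }, { 21, 19 }, { 22, 21 }, { 23, 18 },
    { 24, 23, 22, 17 }, { 25, 22 }, { 26, 6, 2, 1 }, { 27, 5, 2, 1 },
    { 28, 25 }, { 29, 27 }, { 30, 6, 4, 1 }, { 31, 28 }
};

// Hypothetical helper: smallest width such that n < 2^bits.
unsigned width_for(unsigned n) {
    unsigned bits = 1;
    while (n >= (1U << bits))
        bits++;
    return bits;
}

// Hypothetical helper: one clock of the XNOR-feedback shift register.
// Bit k-1 of w models the flipflop driving lfsr[k].
unsigned step(unsigned w, unsigned bits) {
    unsigned in = 0;
    for (unsigned j = 0; j < 4 && taps[bits][j]; j++)
        in ^= (w >> (taps[bits][j] - 1)) & 1;
    return ((w << 1) & ((1U << bits) - 1)) | (in ^ 1U);
}

// State reached after n-1 clocks starting from all zeros.
unsigned terminal_state(unsigned n) {
    unsigned bits = width_for(n), w = 0;
    for (unsigned i = 1; i < n; i++) {
        w = step(w, bits);
        assert(w != 0);   // mirrors check(w != 0) in the listing
    }
    return w;
}

// Clock the modelled circuit from all zeros until it returns to all zeros,
// and report how many clocks that took.
unsigned measured_period(unsigned n) {
    unsigned bits = width_for(n), term = terminal_state(n);
    unsigned w = 0, clocks = 0;
    do {
        bool out = (w == term);        // decoder recognizes terminal state
        w = out ? 0 : step(w, bits);   // & ~lfsr_reset zeroes every 'flop
        clocks++;
    } while (w != 0);
    return clocks;
}
```

With n = 25000000/9600 (2604 in integer arithmetic), width_for gives 12 bits
and measured_period returns 2604, i.e. 'out' is asserted once every n clocks.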

Anyway, the salient ideas are:
* extend a real programming language with circuit specification datatypes
* employ structural specification, close to the FPGA primitive elements

Is anyone else using a similar approach?

Jan Gray