The Myriad Uses of Block RAM
Usenet Postings
Subject: Xilinx Virtex announced, what to do with big blocks of RAM
Date: Tue, 27 Oct 1998 11:40:13 -0800
X-Unsent: 1
[ed: note, never posted]

See http://www.xilinx.com/products/virtex.htm. Now both Altera and Xilinx offer products with large blocks of embedded RAM. What shall we do with it?

Simply put, if a function is, has, or can be implemented using a few KB or less of RAM or ROM, it can probably be implemented using embedded RAM or ROM in a suitable programmable logic device. So here are some obvious potential applications:

* register file; many-hundred-bit-word register file; vector register file; windowed register file, fixed or variable size, optionally tiled;
* multiple register file contexts, including user/kernel/interrupt handler shadow contexts, or multiple threads' contexts;
* operand/data stacks, control (incl. return address) stacks; stack elements referenced either directly or only at the top of stack; unified locals+operands stack; storing one or more activation records; systems which automatically store and reload same, burst or trickle-back;
* m-read n-write multiported versions of these register files or stacks, via the embedded RAM's inherent multiport access, or time-multiplexed access, or replicated copies, or supporting multiple concurrent writes using { replication + each write updates only one replica and updates 'which replica valid' state + read access selects valid replica } [ed: see the first sketch after the posting];
* hybrid schemes with small fast multiported register files/stacks using fine grained embedded RAMs, backed by larger multiple frame/context storage using large embedded RAM blocks, providing fast call/return and/or fast context switching;
* multiplier, divider, and/or trigonometric lookup tables, or partial lookup tables, coefficient lookup tables, interpolant estimate tables;
* control stores: wide encoded state machines, microcode, nanocode (multiple-level structures), possibly writeable;
* branch prediction mechanisms including branch target address caches, branch target instruction caches, return address caches, branch history tables;
* instruction buffers; loop buffers; on-chip instruction and/or data cache data, direct mapped or small-n-way set associative; cache tags (for on- or off-chip cache data) [ed: see the second sketch after the posting]; MOESI style cache coherence bits; snoop tags; schemes combining and/or concurrently accessing data+tags in the same RAM block; victim buffers; write buffers and write-accumulation structures; predecoded, decompressed, or canonicalized instruction caches; any of these optionally preinitialized;
* segmentation registers, translation lookaside buffers, and other per segment, page, or region memory mapping state to real address, present, valid, and/or dirty bits or state, including direct mapped entries, sequentially probed entries, with grouped, random, sequential, linked list, and other such line replacement policies/mechanisms;
* per-task state tables, including priorities, task state, next-task info, attributes, and masks; fast dedicated thread local storage;
* debug support tables including breakpoint code address and/or count registers, breakpoint data address and/or value registers, nonsequential IP history, branch taken/not taken history, memory access history; dynamically reconfiguring an FPGA processor to insert such debugging features on demand;
* on-chip RAM or scratchpad RAM; multiple banks of same supporting multiported access, or interleaving; optionally preinitialized; on-chip ROM; use of these for interpreter or emulator code or data;
* use of on-chip RAM to buffer, exchange, or manage data between on- or off-chip processor and on- or off-chip peripherals or coprocessors;
* DMA staging buffers; off-chip memory multiple-outstanding-transaction buffers/management;
* I/O buffers/FIFOs/queues in general, linear, circular, or linked list;
* on-chip/off-chip memory/peripheral controller's table mapping addresses to peripheral selects and wait state control timings;
* DRAM open page registers;
* garbage collection support: read and write barriers via page table attribute bits or region table address checks; card marking bit array (one bit per memory tile of 256 bytes or so) [ed: see the third sketch after the posting];
* on-chip message/cell buffers; queues; virtual channel message/cell buffers; node/address-to-info maps; use of same for message passing or shared memory multiprocessors and packet/cell switched network interconnect fabrics;
* buffers for temporary storage of messages pending segmentation, and buffers for subsequent reassembly of cells into messages;
* audio input or output buffers or delay lines; envelopes; audio samples, wave tables; tone generators;
* video line input or output buffers or delay lines; sprite or overlay storage; stipples; stencils; character or pattern generator ROM;
* graphics: display list, render command queue, vertex lists; transformation matrices or stacks of same; per span or chunk compositing buffer; rendering buffers for accumulating the current scan line's spans' colour, alpha, and/or Z information; texture cache;
* DCT and IDCT support (8x8 pixel blocks, quant coeff tables, Huffman tables (?));
* RAMDAC colour LUTs mapping colour index to colour, or mapping colour to gamma corrected colour; VGA, XGA, etc. emulation state;
* bus interface configuration state memory, incl. PCI configuration memory; peripheral device command FIFO, response FIFO;
* self-diagnosis: storage of one or more samples of selected, captured readback data, readback bit skip counts;
* some of the above replicated on-chip, or shared amongst multiple on-chip clients, including multiple processors;
* some combination of the above stored together in a single embedded RAM block or a bank of same;
* hybrid uses of large embedded RAM blocks together with smaller distributed RAM blocks to achieve large storage capacity with highly multiported access to a subset of that storage;

and, lest we forget,

* arbitrary lookup tables for functions of 8, 9, ... 12 input bits yielding up to 16, 8, ... 1 outputs.

Should be fun...

(Absent from my list above are those uses of small RAMs that require content addressability and/or heavy multiporting: reservation stations, out-of-order instruction issue/retire queues, fully associative TLBs, fully associative caches, some compression algorithms, IP routing, etc. I do not expect to see content addressable RAM as an embedded block any time soon.)

"It's a good time to be us,"

Jan Gray
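A few of the items above are concrete enough to sketch in a handful of lines. First, the "multiple concurrent writes via replication" scheme for register files. The following is a minimal behavioral C model, not HDL, with an illustrative register count and width that are not from the posting: each array would map to its own RAM, two full replicas each absorb one write port, and a small "which replica valid" table, updated on every write, steers every read to the replica holding the newest value.

    /* Behavioral model of a 2-write-port register file built from two
     * single-write replicas plus a "which replica valid" table.
     * Each array below would occupy its own RAM in hardware. */
    #include <stdint.h>

    #define NREGS 32

    static uint32_t replica0[NREGS];        /* written only by write port 0 */
    static uint32_t replica1[NREGS];        /* written only by write port 1 */
    static uint8_t  valid_replica[NREGS];   /* 0 or 1: which replica is newest */

    /* Write port 0: update replica 0 and mark it as holding the newest value. */
    void write_port0(unsigned r, uint32_t value)
    {
        replica0[r] = value;
        valid_replica[r] = 0;
    }

    /* Write port 1: update replica 1 and mark it as holding the newest value. */
    void write_port1(unsigned r, uint32_t value)
    {
        replica1[r] = value;
        valid_replica[r] = 1;
    }

    /* Any number of read ports: consult the valid table and select the
     * replica that was written most recently. */
    uint32_t read_port(unsigned r)
    {
        return valid_replica[r] ? replica1[r] : replica0[r];
    }

Note that in hardware the small valid table itself still takes one write per write port, so it would likely live in fine-grained distributed RAM or flip-flops while the wide replicas use the large blocks, and simultaneous writes to the same register need a defined priority.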
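Second, the "cache tags for on- or off-chip cache data" item amounts to indexing a tag RAM and comparing. A sketch under assumed parameters (a 4 KB direct-mapped cache with 16-byte lines and 32-bit addresses, none of which appear in the posting); the tag and valid arrays model a tag RAM held in one embedded RAM block, with the cache data elsewhere.

    /* Direct-mapped cache tag check: 4 KB cache, 16-byte lines => 256 sets. */
    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_BITS 4                       /* 16-byte lines */
    #define SET_BITS  8                       /* 256 sets */
    #define NSETS     (1u << SET_BITS)

    static uint32_t tags[NSETS];              /* tag RAM */
    static bool     valid[NSETS];             /* per-line valid bits */

    /* Returns true on a hit; *set_out is the index into the data RAM. */
    bool cache_lookup(uint32_t addr, unsigned *set_out)
    {
        unsigned set = (addr >> LINE_BITS) & (NSETS - 1);
        uint32_t tag = addr >> (LINE_BITS + SET_BITS);
        *set_out = set;
        return valid[set] && tags[set] == tag;
    }

    /* On a miss, the line is (re)filled and its tag recorded. */
    void cache_fill(uint32_t addr)
    {
        unsigned set = (addr >> LINE_BITS) & (NSETS - 1);
        tags[set]  = addr >> (LINE_BITS + SET_BITS);
        valid[set] = true;
    }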
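Third, the garbage collection item: card marking with one bit per 256-byte tile is exactly the kind of shallow bit array that fits an embedded RAM. A hedged C model of the write barrier follows; the heap size and the names are illustrative assumptions, and addr is taken to be an offset within the heap.

    /* Card-marking write barrier: one mark bit per 256-byte tile of heap.
     * cards[] models the bit array; the collector later scans only the
     * marked cards for updated pointers. */
    #include <stdint.h>

    #define CARD_SHIFT 8                      /* 256-byte tiles */
    #define HEAP_BYTES (1u << 20)             /* 1 MB heap (illustrative) */
    #define NCARDS     (HEAP_BYTES >> CARD_SHIFT)

    static uint8_t cards[NCARDS / 8];         /* packed, one bit per card */

    /* Called on every pointer store at heap offset addr (addr < HEAP_BYTES). */
    void write_barrier_mark(uint32_t addr)
    {
        uint32_t card = addr >> CARD_SHIFT;
        cards[card >> 3] |= (uint8_t)(1u << (card & 7));
    }

    /* Collector scan helper: was this card dirtied since the last collection? */
    int card_is_marked(uint32_t card)
    {
        return (cards[card >> 3] >> (card & 7)) & 1;
    }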
Copyright © 2000, Gray Research LLC. All rights reserved.