

#### Virtex<sup>™</sup>-4 Source Synchronous Interface Advantage

High-Performance Source-Synchronous Interfaces Made Easy

#### We Asked Our Customers:

#### What are your challenges?

- Shorter design time, faster obsolescence
- More competition, increasing cost pressure
- Demanding complexity and performance
- Power consumption and thermal issues
- Signal integrity problems caused by faster I/O
- Implementing source-synchronous and memory interfaces

Today's seminar addresses **source-synchronous interfaces**.



## Agenda

- Background
- Source Synchronous Design Challenges & Solutions
- Building SFI-4.1/ SPI-4.2 applications
- Summary

#### **Moore Meets Einstein**



- Speed doubles every 5 years, but the speed of light never changes



#### System Synchronous vs. Source Synchronous





Dedicated source-synchronous clock for each datapath



#### **System Interconnect Trends**



Note: Interconnect bandwidth = number of data lines × signaling rate per line
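As a quick illustration of the note above, the sketch below multiplies the number of data lines by the per-line rate. The function name is hypothetical, and the example figures (16 data channels at 622.08 Mbps) are the SFI-4.1 numbers quoted later in this deck.

```python
def interconnect_bandwidth_mbps(data_lines: int, rate_per_line_mbps: float) -> float:
    """Aggregate bandwidth = number of data lines x signaling rate per line."""
    return data_lines * rate_per_line_mbps


# 16 data lines at 622.08 Mbps each (SFI-4.1 figures from this deck)
sfi4_mbps = interconnect_bandwidth_mbps(16, 622.08)
print(f"{sfi4_mbps / 1000:.2f} Gbps")  # prints 9.95 Gbps
```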



#### Source-Synchronous Interfaces

#### Key Characteristics

- Point-to-point connections instead of buses
- Higher chip-to-chip speed
  - SDR: 700 MHz clock
  - DDR: 500 MHz clock
    - 1 Gbps data rate
- Higher reliability
  - Minimizes problems of skew and jitter

#### **Applications**

- Networking/Telecom
  - SPI-4.2 / SFI-4 / XSBI
  - RapidIO<sup>™</sup>
  - NPSI (CSIX)
  - Utopia IV
- Memory
  - DDR SDRAM
  - DDR 2 SDRAM
  - QDR II SRAM
  - RLDRAM II
  - FCRAM II

#### Increasing Bandwidth Reduces System Timing Margin



#### Effective Data Valid Window Shrinks Faster than the Bit Period



## Agenda

- Background
- Source Synchronous Design Challenges & Solutions
- Building SFI-4.1/ SPI-4.2 applications
- Summary

## Challenges

1. Data capture at high speeds
2. Managing clock speeds up to 700 MHz
3. PCB layout challenge
   - I/O placement flexibility
   - Channel-to-channel skew
4. Implementing multiple interfaces



#### Virtex-4 I/Os Simplify Design With Built-In Critical Circuits



#### ChipSync Circuitry in Every I/O!



Source Synchronous Interfacing Made Easy, Page 12

#### #1: Data Capture at High Speeds






## **Precise Clock to Data Centering**

- Virtex-4 FPGA solution with ChipSync™ IDELAY
  - "Run time" centering of data to clock during initialization
  - 64 tap delays with 75 ps resolution
  - Maximizing design margins for higher system reliability
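To put the tap figures above in perspective, here is a small arithmetic sketch. It assumes tap 0 adds no delay, so the largest added delay is 63 steps of 75 ps; the function names are hypothetical and the tap count and resolution are the values quoted on the slide.

```python
TAP_PS = 75     # tap resolution quoted on the slide
NUM_TAPS = 64   # taps per IDELAY quoted on the slide


def max_added_delay_ps(num_taps: int = NUM_TAPS, tap_ps: int = TAP_PS) -> int:
    """Largest added delay, assuming tap 0 contributes no delay."""
    return (num_taps - 1) * tap_ps


def taps_per_bit(bit_rate_mbps: float, tap_ps: int = TAP_PS) -> float:
    """How many tap steps span one bit period at the given per-pin data rate."""
    bit_period_ps = 1e6 / bit_rate_mbps
    return bit_period_ps / tap_ps


print(max_added_delay_ps())            # 4725 ps of adjustment range
print(round(taps_per_bit(1000.0), 2))  # about 13.33 taps per bit at 1 Gbps
```

Under these assumptions the delay line spans several bit periods at 1 Gbps, which is what lets the sample point be moved to the middle of the eye at run time.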






#### Not available in any other FPGA, ASIC or ASSP



#### #2: Managing Clock Speeds Up to 700 MHz

- Clock distribution with minimal skew & duty cycle distortion
  - Up to 32 fully differential Global clock distribution networks
  - 4 fully differential IO clock distribution networks per bank
- Ability to forward clocks
  - FPGA can serve as a precision-aligned clock distributor:
    - One 500 MHz clock in, 32 LVDS clocks out at 500 MHz, with less than 50 ps of skew



#### **#3: PCB Layout Challenges**

- Layout constraints can result in trace length differences
- Propagation delays for connectors may not be available



#### Too Much Skew Means Words Misaligned After Bits Aligned








#### Easy Word Alignment with Bitslip










#### #4: Implementing Multiple Interfaces

- Multiple Unique clock domains
- Clock management
  - Synthesis, distribution
- IO Placement
  - Breakout
  - Board floorplan



#### Abundant Clock Resources Support Multiple Clock Domains



- Two Regional Clock nets per region
- 8-24 clock regions per device
- Up to 4 Clock-Capable I/Os per bank

- I/O Clock nets or general interconnect can drive Regional Clock nets
- Regional Buffer can divide I/O Clock rate

#### 2X The Resources for Flexible Clock Management

| Feature                    | Stratix-II                                            | Virtex-4                    |
|----------------------------|-------------------------------------------------------|-----------------------------|
| Clock inputs: Differential | 16                                                    | 32                          |
| Clock inputs: Single-ended | 16                                                    | 32                          |
| Clock regions              | 4 quadrants                                           | 8 – 24 regions              |
| Clock circuits             | 4 EPLLs, 8 FPLLs                                      | 20 DCMs + 8 PMCDs           |
| Global clocks              | 16 total, 16 per quadrant                             | 32 total, 8 per region      |
| Regional Clocks            | 32 total, 8 per quadrant                              | 16 – 48 total, 2 per region |
| I/O clocks                 | 0                                                     | 36 – 68, 4 per I/O bank     |
| Total dedicated clocks     | 48                                                    | 48 - 80                     |
| I/O Banks                  | 8 general banks and up to 4 smaller banks, restricted | 8 – 16 full featured        |

 Enables much easier implementation of multiple interfaces within the same chip

# Simpler PCB Design With Flexible I/O & Banking Rules



- All Virtex-4 I/Os can be used for source synchronous design
- 9-17 I/O banks per device



- Stratix-II offers a restrictive choice of banks and standards for source synchronous design



#### Virtex-4 Source-Synchronous Resource Summary

| Resource                     | Quantity                   |
|------------------------------|----------------------------|
| ChipSync blocks              | One per I/O                |
| Clock Regions                | 8-24                       |
| I/O Banks                    | 9-17                       |
| SelectIO pins                | 240-960                    |
| Clock-Capable I/Os           | 18-68                      |
| Regional Clocks              | 16-48 (2 per Clock Region) |
| I/Os accessible by I/O Clock | 95                         |
| Max Channels aligned         | 95                         |

# Highest Performance, Precision & Flexibility

| Feature                            | Stratix-II                                           | Virtex-4                                                 |
|------------------------------------|------------------------------------------------------|----------------------------------------------------------|
| I/O clock & data<br>alignment      | $45^{\circ}$ steps, clock only                       | 75 ps, 64 taps for both data & clock                     |
| Parallel I/O SERDES                | Left & right banks only                              | All I/Os                                                 |
| Maximum I/O speed (by speed grade) | 622 Mbps/ 844 Mbps/ 1 Gbps<br>(input), 1 Gbps output | 800 Mbps/ 900 Mbps/ 1 Gbps for<br>all inputs and outputs |

- Finer delay tap resolution independent of process, voltage and temperature
- Allows precision clock and data alignment
- Relaxes PCB design and improves design margin
- Higher performance in slowest parts cuts system cost



## Agenda

- Background
- Source Synchronous Design Challenges & Solutions
- Building SFI-4.1/ SPI-4.2 applications
- Summary

### **Application Example**



#### A 10 Gigabit OC-192 Line Card


#### SFI-4 Design Made Easier With Virtex-4

- New I/O clock resources (BUFIO & BUFR) for receiver clock network
  - Easier to recover the forwarded clock for data sampling
- Dedicated ChipSync<sup>™</sup> circuitry to achieve 700MHz SDR
  - ISERDES/OSERDES: make serial-to-parallel data conversion easier
  - IDELAY: precise clock-to-data alignment to accurately capture data within a small data-valid window
- FIFO16 for clock domain changing

#### Implementing SFI-4.1 in Virtex-4<sup>™</sup>




SFI-4.1 Specification:

- Clock Frequency: 622.08 MHz
- Clock Duty Cycle: 45/55
- 20-80% Rise/Fall Times: 100-300 ps
- Data Valid Window: 600 ps
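The specification figures above can be turned into a rough timing budget. The sketch below assumes SDR operation (one bit per clock period) and reuses the 75 ps IDELAY tap resolution quoted earlier in the deck; the function names are hypothetical and the result is illustrative arithmetic, not a characterized margin.

```python
TAP_PS = 75  # IDELAY tap resolution quoted earlier in the deck


def sdr_bit_period_ps(clock_mhz: float) -> float:
    """SDR carries one bit per clock period, so bit period = clock period."""
    return 1e6 / clock_mhz


def window_budget(clock_mhz: float = 622.08,
                  data_valid_ps: float = 600.0,
                  tap_ps: float = TAP_PS) -> dict:
    """Relate the data valid window to the bit period and to IDELAY taps."""
    bit_period = sdr_bit_period_ps(clock_mhz)
    return {
        "bit_period_ps": bit_period,
        "valid_fraction": data_valid_ps / bit_period,
        "taps_in_window": data_valid_ps / tap_ps,
        "margin_each_side_ps": data_valid_ps / 2,
    }


for name, value in window_budget().items():
    print(f"{name}: {value:.2f}")
```

With these inputs the valid window covers a little over a third of the roughly 1607.5 ps bit period and is 8 taps wide, so a sample point placed at its center has about 4 taps of room on either side.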

#### **Utilization**

- 63 slices / 4 BlockRAMs
- 34 LVDS I/O pairs
- 3 Global Clock Buffers / 2 BUFIO/BUFR pairs

#### Implementing SFI-4 Receiver in Virtex-4

- Blocks used for receive:
  - Recovered clock and its network
    - BUFIO High Speed Clock distribution (serial-side)
    - BUFR Lower Speed Clock distribution (parallel-side, fabric)
  - Recovered data
    - ISERDES
  - ISERDES\_ALIGNMENT\_PROCESS
    - Clock-data training algorithm state machine
  - Interface-to-Core Synchronization
    - FIFO16



#### **SFI-4 Receiver Interface**



#### **Virtex-4 SFI-4 Design Features**

- 700MHz SDR LVDS Transmit/Receive
- 1 clock pair, 16 data channels
- 4 to 1 Serialization / 1 to 4 De-serialization
- Clock-Data Alignment
  - Bus alignment: no training pattern required
- Can also be used for XSBI and other high-speed single-data-rate LVDS applications
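The 4:1 serialization and 1:4 de-serialization listed above can be pictured with a small software model. This is only a behavioral sketch: the function names are hypothetical, and the bit ordering (first bit received goes first in the word) is an assumption rather than the documented ISERDES/OSERDES ordering.

```python
def deserialize_1_to_4(bits):
    """Group a serial bit stream into 4-bit parallel words (assumed order: first bit first)."""
    if len(bits) % 4 != 0:
        raise ValueError("bit stream length must be a multiple of 4")
    return [tuple(bits[i:i + 4]) for i in range(0, len(bits), 4)]


def serialize_4_to_1(words):
    """Inverse of deserialize_1_to_4: flatten 4-bit words back into a serial stream."""
    return [bit for word in words for bit in word]


def parallel_clock_mhz(serial_clock_mhz: float, factor: int = 4) -> float:
    """The fabric-side clock runs at the serial clock rate divided by the SERDES factor."""
    return serial_clock_mhz / factor


stream = [1, 0, 1, 1, 0, 0, 1, 0]
words = deserialize_1_to_4(stream)
assert serialize_4_to_1(words) == stream
print(words, parallel_clock_mhz(700.0))
```

With 16 data channels each de-serialized 1:4, the fabric sees a 64-bit word at one quarter of the interface clock rate, for example 175 MHz for a 700 MHz SDR clock.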



#### ISERDES\_ALIGNMENT\_PROCESS

- SFI-4 uses Bus-Alignment
  - Align clock and data using IDELAY on each data lane
  - Data-agnostic, non-destructive training technique:
    - Assumptions:
      - Clock and data are edge-aligned at the pins of the FPGA
      - Clock toggling at startup for several milliseconds before data is sent
    - Train to clock (1,0 pattern)
      - Find center of sampling window for the ISERDES in the clock IOB
      - Move data to optimal location (determined for Clock ISERDES)
- Implementation fully characterized and verified



#### Bus Alignment: Clock Training Circuit



## **Bus Alignment Algorithm**

- Begin incrementing the delay on the clock until a 1-to-0 change is detected at the Q1 output
- Begin counting tap delays and continue incrementing until another 1-to-0 change is detected at the Q1 output. The count gives the data valid window width in tap delays
- Subtract half of that window width from the final tap-delay value to find the center of the window
- Increment all data channels by that amount
- Data-to-clock alignment is complete
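The steps above can be sketched in software as follows. This is a simulation under stated assumptions, not the reference design's state machine: `model_q1` is a hypothetical stand-in for the hardware (it models the sampled clock as a square wave with an arbitrary 400 ps offset and a 1607.5 ps period), and `bus_align` is a hypothetical name for the search loop.

```python
TAP_PS = 75     # tap resolution quoted in this deck
NUM_TAPS = 64   # taps per IDELAY quoted in this deck


def model_q1(tap, skew_ps=400.0, period_ps=1607.5):
    """Hypothetical hardware model: sampled clock value at a given IDELAY tap."""
    phase = (tap * TAP_PS + skew_ps) % period_ps
    return 1 if phase >= period_ps / 2 else 0


def bus_align(sample, num_taps=NUM_TAPS):
    """Return (final_tap, window_taps, center_tap) following the bullet steps."""
    edges = []
    prev = sample(0)
    for tap in range(1, num_taps):
        cur = sample(tap)
        if prev == 1 and cur == 0:      # 1-to-0 change detected
            edges.append(tap)
            if len(edges) == 2:         # second change closes the window
                break
        prev = cur
    if len(edges) < 2:
        raise ValueError("two 1-to-0 changes not found within the tap range")
    first_tap, final_tap = edges
    window_taps = final_tap - first_tap           # window width in tap delays
    center_tap = final_tap - window_taps // 2     # final value minus half the width
    return final_tap, window_taps, center_tap


final_tap, window_taps, center_tap = bus_align(model_q1)
data_lane_taps = [center_tap] * 16   # apply the same setting to all 16 data lanes
print(final_tap, window_taps, center_tap)  # prints 38 21 28
```

In this model the window comes out at 21 taps, close to one 622.08 MHz clock period divided by 75 ps, which is what the counting step is expected to measure.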

#### Implementing SFI-4 Transmitter in Virtex-4

- Blocks used for transmit:
  - Transmit clock and its network
    - BUFIO High Speed Clock distribution (serial-side)
      - This clock must come from an external reference (high quality) at full rate and be connected to a "clock-capable I/O"
      - The clock-capable I/O has a dedicated connection to the BUFIO
    - BUFR Lower Speed Clock distribution (parallel-side, fabric)
  - Transmit data
    - OSERDES
  - Interface-to-Core Synchronization
    - FIFO16: Moves data from Core clock domain (Global Clock buffer) to the interface clock domain (BUFR)



# **SFI-4 Transmitter Interface**




### SPI-4.2 in Virtex-4




## Xilinx SPI-4.2 Core Overview

- Fully compliant with OIF-SPI4-02.1 specification
- Ideal solution for POS, ATM, and Ethernet apps
  - Supports OC-192 line rates of 10 Gbps and beyond
  - Supports static and dynamic alignment
  - Point-to-point interface, symmetrical operation
  - 16-bit data bus using DDR / LVDS pin pairs
  - Common FIFO interface
    - Enables easy bridging
- Supports all Virtex-4 devices



## **Source Synchronous Clocking**




## Dynamic Phase Alignment (DPA) Advantages

- Independent sample point determination for each bit
  - Bit to Bit skew & Clock distribution skew removed from timing budget, improved system timing margin
  - Supports higher-speed interfaces (> 700 Mbps per pin pair)
  - Removes need for rigorous trace length matching on PCB
- Recovered data re-aligned to reform the data bus
  - Removes skew or sampling induced bus misalignment
  - SPI-4.2 training pattern used as a reference pattern
- Virtex-4 DPA function only requires ~360 slices
  - Less than 50% of Virtex-II/Virtex-II Pro DPA solution size
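Independent sample-point determination per bit can be illustrated with a generic eye-scan technique: sweep the delay taps on one lane, record which taps capture the reference pattern correctly, and pick the middle of the longest passing run. This is a sketch of that general idea, not the Xilinx DPA state machine; the function name and the pass/fail lists are hypothetical.

```python
def center_of_longest_run(stable):
    """Return the tap index at the middle of the longest run of passing taps."""
    best_start, best_len = 0, 0
    start = None
    # A trailing False closes any run that reaches the last tap.
    for i, ok in enumerate(list(stable) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_len == 0:
        raise ValueError("no passing taps found")
    return best_start + (best_len - 1) // 2


# Two lanes with different skew get different sample points.
lane_a = [False] * 5 + [True] * 9 + [False] * 4
lane_b = [True] * 4 + [False] * 3 + [True] * 7
print(center_of_longest_run(lane_a), center_of_longest_run(lane_b))  # prints 9 10
```

Because each lane is centered on its own eye, bit-to-bit skew no longer has to be absorbed by a single shared sample point.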

## **SPI-4.2 Core Implementation**



### SPI-4.2 Performance:

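- -10/-11/-12 speed grades: 622-700 Mbps, static alignment
- -10 speed grade: 622-800 Mbps, dynamic alignment
- -11 speed grade: 622-900 Mbps, dynamic alignment
- -12 speed grade: 622 Mbps to 1+ Gbps, dynamic alignment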

#### **Resources:**

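- -10/-11/-12 speed grades, static alignment: 2700 slices / 12 BlockRAMs
- -10 speed grade, dynamic alignment: 3050 slices / 12 BlockRAMs
- -11 speed grade, dynamic alignment: 3650 slices / 12 BlockRAMs
- -12 speed grade, dynamic alignment: 3650 slices / 12 BlockRAMs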

### 4VLX25 Utilization Example:

- 25% of slices for Static
- 34% of slices for Dynamic
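The utilization percentages can be cross-checked against the slice counts listed under Resources. The sketch assumes the 4VLX25 contains 10,752 slices, a figure that is not stated on the slide; the function name is hypothetical.

```python
LX25_SLICES = 10752  # assumption: XC4VLX25 slice count, not stated on the slide


def utilization_pct(core_slices: int, device_slices: int = LX25_SLICES) -> float:
    """Percentage of device slices consumed by the core."""
    return 100 * core_slices / device_slices


print(round(utilization_pct(2700)))  # static core: prints 25
print(round(utilization_pct(3650)))  # dynamic core (-11/-12 figure): prints 34
```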



# **SPI-4.2 DPA Major Components**

- ISERDES
  - Delay chain
  - Bitslip module
  - Serial to Parallel Converter (1:4)
- Data recovery (IDELAY chain) state machine
  - Moves center of data eye for each bit separately to align with the clock edge using SPI-4.2 training pattern
- Bus de-skew (Bitslip/word alignment) state machine
  - Aligns channels using SPI-4.2 training patterns
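Word alignment with Bitslip can be modeled as rotating the de-serialized word until it matches a known reference. The sketch below is a simplification: it treats one Bitslip operation as a one-bit left rotation of a 4-bit word, which may not match the hardware's exact shift sequence, and `expected` is a hypothetical reference word rather than the actual SPI-4.2 training pattern.

```python
def bitslip(word: int, width: int = 4) -> int:
    """Simplified model of one Bitslip operation: rotate left by one bit."""
    mask = (1 << width) - 1
    return ((word << 1) & mask) | (word >> (width - 1))


def slips_to_align(received: int, expected: int, width: int = 4) -> int:
    """Count the Bitslip operations needed before the word matches the reference."""
    word = received
    for slips in range(width):
        if word == expected:
            return slips
        word = bitslip(word, width)
    raise ValueError("word cannot be aligned to the reference by rotation")


print(slips_to_align(0b0110, 0b1100))  # prints 1
print(slips_to_align(0b1001, 0b1100))  # prints 3
```

A de-skew state machine would run a search like this per channel, with each channel slipping independently until all of them present the reference word on the same parallel clock cycle.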



## SPI-4.2 Design Made Easier With Virtex-4

| Benefit                              | Enabled by                                                                     |
|--------------------------------------|--------------------------------------------------------------------------------|
| 1 Gbps/pin data rates                | Virtex-4 embedded SERDES                                                       |
| Reduced FPGA resources (35% smaller) | Embedded DPA, new Sink core, 64-bit UI                                         |
| Flexible pin-outs                    | Not pin-locked: complete pin-out freedom                                       |
| Low power                            | Uses dedicated circuitry                                                       |
| 4+ cores in a single device          | Abundant clock resources                                                       |
| Accurate data capture                | 200 MHz IDELAYCTRL clock; calibrated 75 ps taps, independent of PVT variations |



### Virtex-4 ML450 FPGA Source-Synchronous Interfaces Toolkit

- Supports all major differential I/O standards
  - SPI-4.2, SFI-4/XSBI, RapidIO<sup>™</sup>, HyperTransport<sup>™</sup>, NPSI (CSIX), Utopia IV
- 1 Gbps Double Data Rate (DDR) and 700 MHz Single Data Rate (SDR) performance
- Includes tools for debugging and fine tuning of SSIO designs
  - Bit error rate tester pinpoints problem channel(s) on LVDS bus
  - Link diagnostics for troubleshooting



#### **ML450 Development Board**



1 Gbps DDR transmitter

# Agenda

- Background
- Source Synchronous Design Challenges & Solutions
- Building SFI-4.1/ SPI-4.2 applications
- Summary



## Virtex-4 Solves SSIO Challenges

- Ensuring reliable data capture at high speeds
  - ChipSync built into every I/O: Clock-to-data centering at "run time"
- Managing clock speeds up to 700 MHz
  - Multiple differential clock distribution networks
  - Clock forwarding with minimal skew and duty cycle distortion
- Simplifying PCB layout
  - IDELAY and BITSLIP in every I/O as part of ChipSync
  - Data agnostic bus alignment and intrusive bit alignment
- Implementing multiple interfaces
  - Abundant clock resources
  - Flexible I/O and banking rules

### Source Synchronous Interfaces Made Easy



# How to Get Started

- Access latest Virtex-4 source synchronous design solutions on <u>www.xilinx.com/connectivity</u>
  - IP Cores: SPI-4.2, RapidIO
  - Application Notes: SFI-4, XSBI
  - ML450 Source Synchronous Interfaces Toolkit
    - Board level solution including: reference designs, schematic & gerber files
- Contact your local FAE for an on-site demo

### Accelerate Your Design Cycle



## Backup



## Can Drive DDR Output Data With One Clock



### SAME\_EDGE

- Simplifies setup and hold requirements
- Higher performance
- Faster time-to-performance

# Versatile SelectIO<sup>™</sup>

- Every I/O is Homogeneous
  - Input & output are specified at the same frequency
- Supports 32 I/O standards including:
  - LVCMOS (3.3-V, 2.5-V, 1.8-V, 1.5-V)
  - LVPECL
  - PCI, PCI-X
  - GTL, GTL+
  - HSTL (1.8 V, 1.5 V; Classes I, II, III, IV)
    - Supports differential HSTL
  - SSTL (2.5 V, 1.8 V; Classes I, II)
    - Supports differential SSTL
  - LVDS, Bus LVDS, Extended LVDS
  - HyperTransport<sup>™</sup> (LDT)

Easier and More Flexible I/O Design!



# ISERDES Manages Incoming Data

- Frequency division
  - Data width to 10 bits
- Dynamic signal alignment
  - Bit alignment
  - Word alignment
  - Clock alignment
  - Supports Dynamic Phase Alignment (DPA)





# **OSERDES Simplifies Frequency Multiplication**

- Two separate SERDES included
  - Data SERDES: 2, 3, 4, 5, 6, 7, 8, 10 bits
  - Three-state SERDES: 1, 2, 4 bits
    - Ideal for memories



