



Maximizing Productivity Using Simplified DSP Design Flow



# **FPGAs for DSP Applications**

#### Benefits of FPGAs

- 10x more DSP Throughput than DSP Processors
- Cost-effective for Multi-Channel Applications
- Flexible Hardware Implementation
- System Integration Benefits

### Challenges

- Designing with FPGAs Is Difficult








## **DSP Design Flow Challenges**

- System-Level Development & Verification
- Software/Hardware Co-Development
- Design Optimization





## System Development & Verification Challenges

- Multi-Platform
  - Development across Different Tools
- Modeling Accuracy
  - Floating-Point Simulation & Fixed-Point Implementation Incompatible
- Conversion
  - Manual Translation from System Level to Hardware





# **Multi-Platform Challenges**

- Lack of Integrated Design Environment
- Cannot Optimize During System Development Stage
  - Lacking Details on Underlying Architecture
- Risks During Implementation Stage
  - Ambiguous Interpretation of Specification







# **Modeling Accuracy**

### Floating-Point Models

- Commonly used for Simulation
- Most Efficient & Quickest Solution for Early Analysis

#### Fixed-Point Models

- Commonly used for Implementation
- Suffers from Finite Word Length Effect
- Need Truncation, Rounding & Saturation

### **Simulation**



#### Implementation





# **Conversion Challenges**

- System Design to Hardware Implementation
  - Requires HDL Knowledge
  - Create RTL Model
  - Create Simulation Testbench
- Complex Conversion Rules
  - Bit Propagation
  - Multi-Rate Systems
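
As a rough illustration of the bit propagation rules mentioned above, the sketch below tracks how the word length grows through full-precision signed additions, multiplications and accumulations. The `Fmt` class and the helper names are hypothetical; the rules shown are the common worst-case growth rules, not necessarily the ones a particular tool applies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fmt:
    """Signed fixed-point format: integer bits (including sign) and fractional bits."""
    int_bits: int
    frac_bits: int

    @property
    def width(self):
        return self.int_bits + self.frac_bits

def add_fmt(a, b):
    # Align the binary points, then allow one extra integer bit for the carry.
    return Fmt(max(a.int_bits, b.int_bits) + 1, max(a.frac_bits, b.frac_bits))

def mul_fmt(a, b):
    # A full-precision signed product needs the sum of both widths.
    return Fmt(a.int_bits + b.int_bits, a.frac_bits + b.frac_bits)

def accumulate_fmt(a, n_terms):
    # Summing n_terms values grows the integer part by ceil(log2(n_terms)) bits.
    return Fmt(a.int_bits + (n_terms - 1).bit_length(), a.frac_bits)

if __name__ == "__main__":
    sample = Fmt(1, 7)   # 8-bit input sample (assumed format)
    coeff = Fmt(1, 7)    # 8-bit coefficient (assumed format)
    product = mul_fmt(sample, coeff)
    total = accumulate_fmt(product, 16)  # e.g. a 16-tap filter
    print(product, product.width)
    print(total, total.width)
```

With these assumed formats, an 8-bit by 8-bit product is 16 bits wide, and accumulating 16 such products needs 20 bits before any truncation or rounding is applied.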







# Software/Hardware Co-Development Challenges

- System Partitioning
  - Trade-off Flexibility Versus Performance
  - Need Multiple Design Iterations to Find Optimal Solution
- Integration of IP & Custom Logic
  - Different Bus Interfaces
- Software/Hardware Dependency
  - Frequent Updates to Header Files & Drivers





# **Design Optimization Challenges**

#### C/C++ Coding

- Inefficient Compiler
- Needs to Be Tailored for DSP Specific Architectural Features
- Assembly Coding
  - Need Understanding of Specific Machine Instruction Set for Specific Processor for Optimization
  - Systems Getting Larger & More Complex
    - Not Feasible for Hand-Coding
  - May Not Be Sufficient for Certain Intensive Number-Crunching Requirements





# **Addressing Challenges**

#### System-Level Development and Verification

- DSP Builder Tool
  - System Integration
  - Bit-True & Cycle Accurate Models
  - Automatic Translation into Hardware
- Hardware/Software Integration
  - SOPC Builder Tool and Nios Processor
    - C-based design flow
- Design Optimization
  - Hardware Acceleration
  - Flexibility in System Partitioning





### **Traditional DSP Design Flow in FPGA**

System Algorithm Design & FPGA Design Separated



### Integration Using DSP Builder

System Algorithm Design and FPGA Design Integrated



## **Bit-True & Cycle-Accurate Models**

- DSP Builder Provides Bit-True & Cycle-Accurate Simulink Blocks
- Ideal for System-Level Simulation
- Benefits
  - High-Level Abstraction
  - Don't Model Hardware Detail Involving Unnecessary Data Path Calculations
  - Faster than RTL Simulation
  - Most Important Prior to Architecture Mapping
  - Accurate Hardware Results
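
To make the terms concrete, here is a minimal sketch of what a bit-true, cycle-accurate model looks like in general. It is not DSP Builder code: the `RegisteredFir` class, the 8-bit input width and the 18-bit output width are assumptions for illustration. "Bit-true" means every value is held as an integer wrapped to its hardware width; "cycle-accurate" means registers update once per call to `clock`, so latency matches the hardware.

```python
def wrap(value, bits):
    """Keep the low `bits` bits and interpret them as two's complement."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

class RegisteredFir:
    """FIR filter model with a registered delay line and a registered output."""

    def __init__(self, coeffs, in_bits=8, out_bits=18):
        self.coeffs = list(coeffs)
        self.taps = [0] * len(self.coeffs)  # delay-line registers
        self.out_reg = 0                    # output register
        self.in_bits = in_bits
        self.out_bits = out_bits

    def clock(self, sample):
        """Advance one clock edge and return the output register after the edge."""
        # Combinational sum computed from the register contents before the edge.
        acc = sum(c * t for c, t in zip(self.coeffs, self.taps))
        # On the edge, all registers update together.
        self.out_reg = wrap(acc, self.out_bits)
        self.taps = [wrap(sample, self.in_bits)] + self.taps[:-1]
        return self.out_reg

if __name__ == "__main__":
    fir = RegisteredFir([1, 2, 1])
    print([fir.clock(s) for s in [64, 0, 0, 0, 0]])
```

Because the model only tracks register contents at clock edges, it avoids the event-level detail of an RTL simulation while still producing the same values on the same cycles.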





## **Automatic Hardware Translation**

#### **Creates HDL Code**

    -- multirate.vhd (excerpt)
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    library altlink;
    use altlink.Altrithm.all;

    library lpm;
    use lpm.lpm_components.all;

    entity multirate is
      port(
        clock     : in  std_logic;
        sclr      : in  std_logic := '0';
        iAltBuss  : in  std_logic_vector(7 downto 0);
        oAltBus1s : out std_logic_vector(9 downto 0);
        oAltBus2s : out std_logic_vector(7 downto 0));
    end multirate;

    architecture a of multirate is

      signal SAAltBus10 : std_logic_vector(9 downto 0);
      signal SAAltBus20 : std_logic_vector(7 downto 0);
      signal A0W        : std_logic_vector(7 downto 0);
      signal A1W        : std_logic_vector(7 downto 0);
      signal A2W        : std_logic_vector(7 downto 0);
      signal A3W        : std_logic_vector(7 downto 0);
      signal A4W        : std_logic_vector(7 downto 0);
      signal A5W        : std_logic;
      signal A6W        : std_logic;
      signal A7W        : std_logic;
      signal sclr_u9    : std_logic;

    begin

**HDL Synthesis** 



Synplicity





#### Creates Simulation Testbench



#### Download Design to DSP Development Kits






## **Accelerated Path to Co-Design**

- SOPC Builder Tool
  - Combines Existing Soft & Hard IP Blocks & Associated Software
  - Generates Interfaces between Hardware & Software
  - Solves Problem of Linking IP Cores from Several Vendors
  - Available in Quartus II Software
- Supports Existing Altera Intellectual Property (IP) & ARM<sup>®</sup>-Based Excalibur & Nios<sup>®</sup> Embedded Processors
- Allows Flexibility for Changes to Software/Hardware Partitioning







# Hardware/Software DSP Design Flow



## **Hardware Acceleration**

- Implement Computationally Intensive & Repetitive Tasks in Hardware
  - Filters, Encoders/Decoders
- Examples in DSP Processors
  - TI TMS320C6416
    - VCP Viterbi Coprocessor
      - 350 Voice Channels at 12.2 kbps
  - Motorola MSC8102
    - EFCOP Enhanced Filter Coprocessor
      - 4 Processors at 300 MHz

Dedicated Hardware Accelerators Are Inflexible
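
Why it pays to move only the computationally intensive, repetitive tasks into hardware can be estimated with Amdahl's law. The sketch below is a generic back-of-the-envelope calculation; the function name `overall_speedup` and the example numbers (80% of runtime accelerated 20x) are hypothetical, not measurements from any device named above.

```python
def overall_speedup(accel_fraction, accel_factor):
    """Amdahl's law.

    accel_fraction: share of the original runtime spent in the accelerated task (0..1)
    accel_factor:   how many times faster that task runs in hardware
    """
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / accel_factor)

if __name__ == "__main__":
    # Hypothetical workload: a filter kernel takes 80% of the runtime
    # and runs 20x faster once moved into hardware.
    print(round(overall_speedup(0.8, 20), 2))
    # Even an arbitrarily fast accelerator is capped by the software remainder.
    print(round(overall_speedup(0.8, 1e9), 2))
```

The second call shows the limit: with 20% of the runtime left in software, the overall gain cannot exceed 5x, which is one reason flexibility in choosing what to accelerate matters.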





# **Optimization Using FPGAs**

#### **DSP Processors**



- Fixed CPU Architecture
- Fixed Memory Structure
- Fixed Bus Structure
- Predefined Hardware Accelerator Blocks
- Few MAC Blocks



#### **FPGAs**

- Customizable CPU Structure
- Customizable Memory Structure
- Customizable Bus Structure
- User-Defined Hardware Accelerator Blocks
- Large Number of MAC Blocks



# **Hardware Acceleration in FPGA**

- Two Implementation Options
  - Custom Peripheral
  - Custom Instruction
- Custom Peripheral
  - Interface to Nios through Avalon Bus
- Custom Instruction
  - Adds Customized Logic to Nios ALU
  - Generates C & Assembly Macros





## **Custom Instructions**








### Performance Using Custom Instructions

| Floating-Point Operation (32-Bit Data) | Software Library (CPU Clock Cycles) | Custom Instruction (CPU Clock Cycles) | Speed Increase |
|----------------------------------------|-------------------------------------|---------------------------------------|----------------|
| Multiplication a × b                   | 2874                                | 19                                    | 151x           |
| Multiply & Negate −(a × b)             | 3147                                | 19                                    | 165x           |
| Absolute \|a\|                         | 1769                                | 18                                    | 98x            |
| Negate −(a)                            | 284                                 | 19                                    | 15x            |

Note: These Performance Calculations Are Compiler-Dependent. Measured Using the Cygnus Compiler Included in Version 2.1 of the Nios Embedded Processor
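
The speed increase column is the ratio of the two cycle counts. A short sketch that recomputes it from the figures in the table above (the dictionary layout and variable names are only illustrative):

```python
# (software library cycles, custom instruction cycles), taken from the table
cycles = {
    "Multiplication": (2874, 19),
    "Multiply & Negate": (3147, 19),
    "Absolute": (1769, 18),
    "Negate": (284, 19),
}

# Speed increase = software cycles / custom-instruction cycles
speedup = {op: sw / ci for op, (sw, ci) in cycles.items()}

if __name__ == "__main__":
    for op, ratio in speedup.items():
        print(f"{op}: {ratio:.1f}x")
```

The computed ratios land within one unit of the whole-number figures quoted in the table.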





## **Acceleration in DSP Builder**

#### **Custom Instruction**

Algorithms Developed in DSP Builder Become Integral Part of ALU

#### **Custom Peripheral**

Algorithms Developed in DSP Builder Can Be Connected to Nios Processor as Peripheral



## Summary

- Integrated Design Platform for Efficient DSP Design Flow
  - DSP Builder Tool
  - Accurate Modeling
  - Seamless Flow from System to Hardware

- Versatile Tool for Software/Hardware Integration
  - SOPC Builder and Nios Embedded Processor
  - Easy System Partitioning
- Hardware Acceleration for Design Optimization
  - Hardware Flexibility
- DSP Design with FPGAs Becoming Easier



