CoreFIR Finite Impulse Response (FIR) Filter Generator
Product Summary
Intended Use
• Finite Impulse Response (FIR) Filter for Actel FPGAs
Core Deliverables
• Evaluation Version
  – RTL Code of a Sample Filter and Compiled RTL Simulation Model Fully Supported in the Actel Libero® Integrated Design Environment (IDE)
• RTL Version
  – A Microsoft Windows® Binary Executable of the CoreFIR Generator
  – VHDL FIR Module
  – VHDL Test Harness
Key Features
• Core Generator
  – Executable File Outputs Register Transfer Level (RTL) Code and Testbench Based on Input Parameters
  – Self-Checking – Executable Tests Generated Output against Algorithm
• Distributed Arithmetic (DA) Algorithm
  – Multiplier-Free Computation
  – Low Cost
  – Optimized for Actel FPGAs
• Folding Architecture to Minimize Design Size
  – Serialized Computation when System Clock Rate is Faster than the Data Sample Rate
• Efficient Structure Using Embedded RAMs
  – Lookup Tables Utilize Embedded RAMs
  – On-Chip DA Lookup Table Generator for FPGA with Embedded RAMs
  – Embedded RAMs Initialized as DA Lookup Table
  – DA Lookup Table ROM Synthesis for FPGA without Embedded RAMs
• Multiple DA Lookup Tables to Split Large Number of Taps
• Actel FPGA-Optimized RTL Code
• Supports 2 to 128 Taps
• 1- to 32-Bit Input Data and Coefficient Precision
Supported Families
• Fusion
• ProASIC3/E
• ProASIC PLUS®
• Axcelerator®
• RTAX-S
• SX-A
• RTSX-S
Synthesis and Simulation Support
• Synthesis: Synplicity®, Synopsys® (Design Compiler™/FPGA Compiler®/FPGA Express™), Exemplar™
• Simulation: OVI-Compliant Verilog Simulators and VITAL-Compliant VHDL Simulators
Contents
Device Utilization and Performance
FIR Filter Using Distributed Arithmetic Algorithm
General Description
Functional Block Description
I/O Signal Description
CoreFIR Generator Parameters
FIR Filter with Large Number of Taps
Clock and Reset
Input and Output Timing
Appendix I: Sample Configuration File
Ordering Information
List of Changes
Datasheet Categories
December 2005
© 2005 Actel Corporation
v3.0
CoreFIR
Device Utilization and Performance
CoreFIR generates FIR filters in many configurations. Table 1 lists typical utilization and performance data for filters generated with the configurations listed in Table 2. The Configuration column in Table 1 refers to the configuration numbers defined in Table 2.
Table 1 • CoreFIR Device Utilization and Performance
                      Cells or Tiles                     Utilization
Family        Config  Comb.   Seq.   Total   RAM Blocks  Device     Total  Performance (MHz)
Fusion        1       454     129    583     0           AFS060     38%    69
Fusion        2       1410    375    1784    0           AFS250     29%    56
Fusion        3       3080    679    3759    0           AFS250     61%    52
Fusion        4       5511    935    6446    8           AFS600     47%    45
Fusion        5       7089    1708   8797    0           AFS600     64%    40
Fusion        6       24356   3718   28074   45          AFS1500    73%    31
ProASIC3      1       454     129    583     0           A3P060     38%    69
ProASIC3      2       1410    375    1785    0           A3P125     58%    56
ProASIC3      3       3080    679    3759    0           A3P1000    15%    52
ProASIC3      4       5511    935    6446    8           A3P1000    26%    45
ProASIC3      5       7089    1708   8797    0           A3P1000    36%    40
ProASIC3      6       24356   3718   28074   45          A3P1500    73%    31
ProASIC PLUS  1       558     116    674     0           APA075     22%    29
ProASIC PLUS  2       2054    427    2481    0           APA150     40%    19
ProASIC PLUS  3       3540    661    4201    0           APA1000    8%     19
ProASIC PLUS  4       6391    872    7271    8           APA1000    13%    17
ProASIC PLUS  5       8775    1606   10381   0           APA750     32%    13
Axcelerator   1       229     148    377     0           AX125      19%    174
Axcelerator   2       693     478    1171    0           AX250      28%    110
Axcelerator   3       1231    719    1950    0           AX250      46%    111
Axcelerator   4       2249    852    3101    4           AX500      38%    74
Axcelerator   5       3129    1704   4833    0           AX1000     27%    73
Axcelerator   6       9132    3355   12487   32          AX2000     39%    46
RTAX-S        1       229     148    377     0           RTAX1000S  2%     114
RTAX-S        2       693     478    1171    0           RTAX1000S  6%     76
RTAX-S        3       1231    719    1950    0           RTAX1000S  11%    66
RTAX-S        4       2249    852    3101    4           RTAX1000S  17%    41
RTAX-S        5       3129    1704   4833    0           RTAX1000S  27%    45
RTAX-S        6       9132    3355   12487   32          RTAX2000S  39%    29
SX-A          1       386     159    545     0           A54SX16A   38%    112
SX-A          2       1115    480    1595    0           A54SX72A   26%    64
SX-A          3       1831    727    2558    0           A54SX72A   42%    63
RTSX-S        1       381     159    540     0           RT54SX32S  19%    52
RTSX-S        2       1115    480    1595    0           RT54SX72S  26%    36
RTSX-S        3       1831    727    2558    0           RT54SX72S  42%    36
Notes:
1. The data above are obtained by typical synthesis and place-and-route methods. Other core parameters can result in different
utilization and performance.
2. Cell (tile) count may vary depending on the actual coefficient values.
Table 2 • Test Configurations
Configuration  nbits_input  nbits_coef  ntaps  fpga_family       coef_fixed
1              8            16          8      All               1
2              16           16          16     All               1
3              12           15          32     All               1
4              12           15          32     AX, RTAX-S, APA   0
5              16           15          64     All               1
6              16           16          128    AX, RTAX-S, APA   0
FIR Filter Using Distributed Arithmetic Algorithm
Distributed Arithmetic Algorithm Overview
FIR filters are used in applications that require exact linear phase response. Typical applications for a FIR filter include image processing, digital audio, digital communication, and biomedical signal processing. A FIR filter is defined in EQ 1:
$$y[n] = \sum_{n=0}^{\mathrm{ntaps}-1} c[n] \times x[n] \qquad \text{(EQ 1)}$$
where c[n] = h[ntaps - n - 1] and h is the impulse response. The term ntaps is short for number of taps.
In summary, direct computation of one FIR output point requires ntaps multiplications and (ntaps - 1) additions.
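For reference, the direct computation can be written out as a short Python sketch. It is illustrative only, not CoreFIR code: the function name `fir_direct` and the coefficient and sample values are hypothetical, and `window[n]` plays the role of x[n] in EQ 1 (the sample currently held in tap n).

```python
def fir_direct(coefs, window):
    """One FIR output point computed directly as in EQ 1: sum of c[n] * x[n].

    coefs  -- c[0] .. c[ntaps-1] (hypothetical integer coefficients)
    window -- x[0] .. x[ntaps-1], the samples currently held in the taps
    Returns (y, number_of_multiplications, number_of_additions).
    """
    if len(coefs) != len(window) or not coefs:
        raise ValueError("coefs and window must be non-empty and the same length")
    y = coefs[0] * window[0]          # first product needs no addition
    mults, adds = 1, 0
    for c, x in zip(coefs[1:], window[1:]):
        y += c * x                    # each further tap: one multiply, one add
        mults += 1
        adds += 1
    return y, mults, adds


print(fir_direct([3, -1, 4, 2], [5, 9, 2, 6]))  # (26, 4, 3) for ntaps = 4
```

With ntaps = 4 the sketch performs 4 multiplications and 3 additions, matching the count above.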
Distributed Arithmetic (DA) is a well-known method for eliminating multipliers from the multiply-and-accumulate (MAC) structures that implement digital signal processing (DSP) functions. DA trades memory for combinatorial logic, resulting in an efficient implementation in FPGAs. DA also makes it easy to serialize the input, which further reduces cost when the FIR data rate is low compared to the system clock, a common scenario in FIR applications.
Each input sample of a FIR filter can be expressed as a weighted sum of its bits, as shown in EQ 2:
$$x[n] = \sum_{b=0}^{\mathrm{nbits\_in}-1} x[n][b] \times 2^{b} \qquad \text{(EQ 2)}$$
where x[n][b] is the bth bit of x[n] and nbits_in is the number of input bits. The resulting output of the FIR filter is shown in EQ 3:
$$y[n] = \sum_{n=0}^{\mathrm{ntaps}-1} c[n]\, x[n] = \sum_{n=0}^{\mathrm{ntaps}-1} c[n] \sum_{b=0}^{\mathrm{nbits\_in}-1} x[n][b]\, 2^{b} \qquad \text{(EQ 3)}$$
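EQ 2 and EQ 3 can be checked numerically with a small Python sketch. It assumes unsigned inputs, which is what EQ 2 describes (a two's-complement input would need a negative weight on its most significant bit); the helper names `bits`, `from_bits`, and `fir_eq3` and the sample values are hypothetical.

```python
def bits(x, nbits_in):
    """Return [x[n][0], ..., x[n][nbits_in-1]], least significant bit first (EQ 2).

    Assumes x is an unsigned integer that fits in nbits_in bits.
    """
    if not 0 <= x < (1 << nbits_in):
        raise ValueError("x must be an unsigned nbits_in-bit value")
    return [(x >> b) & 1 for b in range(nbits_in)]


def from_bits(bit_list):
    """Rebuild x = sum of x[b] * 2**b from its bits (EQ 2)."""
    return sum(bit << b for b, bit in enumerate(bit_list))


def fir_eq3(coefs, window, nbits_in):
    """Right-hand side of EQ 3: sum over taps of c[n] * sum over b of x[n][b] * 2**b."""
    return sum(c * sum(xb << b for b, xb in enumerate(bits(x, nbits_in)))
               for c, x in zip(coefs, window))


print(bits(11, 4))                               # [1, 1, 0, 1]
print(fir_eq3([3, -1, 4, 2], [5, 9, 2, 6], 4))   # 26, same as the direct sum
```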
Changing the order of summation gives the result shown in EQ 4:
$$y[n] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b} \sum_{n=0}^{\mathrm{ntaps}-1} c[n]\, x[n][b] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b}\, T(X[b]) \qquad \text{(EQ 4)}$$
where:
$$T(X[b]) = \sum_{n=0}^{\mathrm{ntaps}-1} c[n]\, x[n][b]$$
and X[b] is the collection of the bth bits of the ntaps different taps.
Because each x[n][b] can only be 0 or 1, there are only 2^ntaps different values of T. If T is precalculated and stored in a RAM or ROM, the FIR computation becomes nbits_in table lookups addressed by X[b] plus nbits_in - 1 additions; the weighting by 2^b is a shift. Multiplication operations are eliminated.
In summary, computing one FIR output point with DA requires nbits_in table lookups and (nbits_in - 1) additions.
The cost of eliminating the multiplications is a memory block that stores 2^ntaps precomputed values.
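The following Python sketch builds the table T for every possible address and evaluates EQ 4 with one lookup per bit, then compares the result with the direct sum. The function names, the coefficient values, and the address bit ordering (address bit n holds x[n][b]) are assumptions made for illustration; inputs are assumed unsigned as in EQ 2.

```python
def build_da_table(coefs):
    """Precompute T(X) for all 2**ntaps addresses (EQ 4).

    Assumed address layout: bit n of the address is x[n][b], so
    T[addr] is the sum of the coefficients whose address bit is 1.
    """
    ntaps = len(coefs)
    return [sum(c for n, c in enumerate(coefs) if (addr >> n) & 1)
            for addr in range(1 << ntaps)]


def da_address(window, b):
    """Form X[b]: gather bit b of every tap sample into one address."""
    return sum(((x >> b) & 1) << n for n, x in enumerate(window))


def fir_da(table, window, nbits_in):
    """EQ 4: y = sum over b of 2**b * T(X[b]), using only lookups, shifts, and adds."""
    y = 0
    for b in range(nbits_in):
        y += table[da_address(window, b)] << b   # << b is the 2**b weighting
    return y


coefs, window = [3, -1, 4, 2], [5, 9, 2, 6]
table = build_da_table(coefs)
print(len(table))                                  # 16 entries = 2**ntaps
print(fir_da(table, window, 4))                    # 26
print(sum(c * x for c, x in zip(coefs, window)))   # 26, direct sum for comparison
```

Negative table entries are handled because Python's `<<` on a negative integer still multiplies by 2**b; in hardware the shifted value would be sign-extended.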
Table lookup and addition can be serialized because table T is the same for every b. If one table lookup and one addition can be completed in one clock cycle, the whole computation finishes in nbits_in cycles. Serializing the FIR creates a further opportunity to reduce the size of the design, which is key to an efficient FPGA design.
The expression x[n][b] represents the bth bit of input x[n]. In the example of Figure 1, all 0th bits of inputs x[n] to x[n-3] are fed into the lookup table as the address in the first cycle; all 1st bits of inputs x[n] to x[n-3] are fed into the lookup table in the second cycle; all 2nd bits in the third cycle; and all 3rd bits in the fourth cycle. The shifter shifts each lookup-table output by its bit weight before it enters the adder, which accumulates the final result.
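The cycle-by-cycle behavior described above can be modeled in Python as a sketch. Each loop iteration stands for one clock cycle of Figure 1; `window[k]` is assumed to hold the sample labeled x[n-k] in the figure, `coefs[k]` is its coefficient, and the address bit ordering is a hypothetical choice. Inputs are assumed unsigned.

```python
def serial_da_fir(coefs, window, nbits_in):
    """Model the bit-serial DA FIR of Figure 1, one loop iteration per cycle.

    Returns (final register value, list of lookup addresses, one per cycle).
    """
    ntaps = len(coefs)
    # Lookup table with 2**ntaps words (16 words when ntaps = 4).
    table = [sum(c for k, c in enumerate(coefs) if (addr >> k) & 1)
             for addr in range(1 << ntaps)]
    reg = 0                                   # accumulator register ("Reg")
    addresses = []
    for b in range(nbits_in):                 # cycle b + 1 handles bit b
        # Bit b of x[n], x[n-1], ..., x[n-3] forms the table address.
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(window))
        addresses.append(addr)
        shifted = table[addr] << b            # "Shifter": weight by 2**b
        reg = reg + shifted                   # "Adder" feeding back into "Reg"
    return reg, addresses


print(serial_da_fir([3, -1, 4, 2], [5, 9, 2, 6], 4))  # (26, [3, 12, 9, 2])
```

After four cycles the register holds 26, the same value as the direct sum, and each of the four addresses selects one of the 16 table words. Hardware implementations often shift the accumulator right by one bit per cycle instead of shifting the table output left, which keeps the adder narrower; the sketch follows the description of a shifter on the lookup-table output.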
Example Design of a FIR Filter Using DA
An example of a FIR with four taps (ntaps = 4) and four-bit inputs (nbits_in = 4) is shown in Figure 1.
[Figure 1 diagram: inputs x[n][0..3], x[n-1][0..3], x[n-2][0..3], x[n-3][0..3]; blocks: Lookup Table, Shifter, Adder, Reg, Flow Control]
Figure 1 • Example Implementation of a Bit-Serialized FIR Using DA
The serialized DA implementation in Figure 1 uses a 16-word (2^4) lookup table and takes four clock cycles to compute one FIR output point.
Storage and Large Number of Taps
As shown in the previous section, the size of the lookup table is 2^ntaps, which grows exponentially with ntaps. A design with a large number of taps therefore needs several lookup tables. Let ntaps = p × q. If the taps are split into p groups, each group has q taps, and the FIR becomes as shown in EQ 5:
$$y[n] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b} \sum_{n=0}^{\mathrm{ntaps}-1} c[n]\, x[n][b] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b} \sum_{n=0}^{pq-1} c[n]\, x[n][b] \qquad \text{(EQ 5)}$$
Splitting the summation over ntaps into two levels gives the result shown in EQ 6:
$$y[n] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b} \sum_{i=0}^{p-1} \sum_{j=0}^{q-1} c[iq+j]\, x[iq+j][b] \qquad \text{(EQ 6)}$$
Refer to "FIR Filter with Large Number of Taps" for further information.
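A Python sketch of EQ 6 shows how splitting the taps keeps each table small. It assumes ntaps is an exact multiple of q, unsigned inputs, and the same hypothetical address bit ordering as a single table (address bit j holds the bit of tap i*q + j); the names `fir_da_split` and `table_words` are illustrative only.

```python
def build_table(coefs):
    """DA table for one group of taps: 2**len(coefs) words."""
    return [sum(c for n, c in enumerate(coefs) if (addr >> n) & 1)
            for addr in range(1 << len(coefs))]


def fir_da_split(coefs, window, nbits_in, q):
    """EQ 6: split ntaps = p * q taps into p groups of q taps each.

    Group i has its own 2**q-word table for taps i*q .. i*q + q - 1; for each
    bit b the p table outputs are summed, then weighted by 2**b.
    """
    ntaps = len(coefs)
    if ntaps % q:
        raise ValueError("ntaps must equal p * q")
    p = ntaps // q
    tables = [build_table(coefs[i * q:(i + 1) * q]) for i in range(p)]
    y = 0
    for b in range(nbits_in):
        partial = 0
        for i in range(p):                        # inner sums over j, per group i
            group = window[i * q:(i + 1) * q]
            addr = sum(((x >> b) & 1) << j for j, x in enumerate(group))
            partial += tables[i][addr]
        y += partial << b
    return y


def table_words(ntaps, q):
    """Total lookup-table words: p * 2**q with p = ntaps // q."""
    return (ntaps // q) * (1 << q)


print(table_words(16, 16))   # 65536 words in one table
print(table_words(16, 4))    # 64 words in four 16-word tables
coefs = [3, -1, 4, 2, 7, 0, -5, 1]
window = [5, 9, 2, 6, 15, 0, 8, 3]
print(fir_da_split(coefs, window, 4, q=4))   # 94, same as the direct sum
```

The price of the smaller tables is p - 1 extra additions per bit to combine the group outputs.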
General Description
CoreFIR is an Actel FPGA-optimized RTL generator that produces finite impulse response filters. The generated filters implement the DA algorithm, which eliminates multiplication to produce faster and smaller designs. When the target FPGA has embedded RAM blocks, the generator uses them as DA lookup tables to further reduce the size of the design. The generator also reads the user's system clock rate and data sample rate and explores a folding or serial architecture to reduce size further, especially when the system clock rate is much greater than the data sample rate. The generator automatically switches to multiple DA lookup tables when the requested FIR filter has a large number of taps.
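The trade-off between clock rate and sample rate can be illustrated with a back-of-the-envelope Python sketch. This is a hypothetical sizing rule, not the CoreFIR generator's actual decision logic: it assumes that processing k input bits per clock in a DA filter uses k lookup-table copies (one per bit), and simply finds the smallest k that completes all nbits_in bits within one sample period. The function name `bits_per_clock` is illustrative.

```python
import math


def bits_per_clock(f_clk_hz, f_sample_hz, nbits_in):
    """Hypothetical folding estimate for a DA FIR (illustration only).

    Clock cycles available per input sample = floor(f_clk / f_sample).
    Returns how many input bits must be handled per clock so that all
    nbits_in bits are processed within one sample period.
    """
    if f_sample_hz <= 0 or f_clk_hz < f_sample_hz:
        raise ValueError("require f_clk_hz >= f_sample_hz > 0")
    cycles_per_sample = f_clk_hz // f_sample_hz
    return math.ceil(nbits_in / cycles_per_sample)


print(bits_per_clock(100_000_000, 1_000_000, 16))   # 1: fully bit-serial
print(bits_per_clock(40_000_000, 10_000_000, 16))   # 4 bits per clock
print(bits_per_clock(50_000_000, 50_000_000, 16))   # 16: fully parallel
```

In this model, k bits per clock costs k table copies, so a larger clock-to-sample ratio allows a smaller design.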
Figure 2 shows the functional block diagram of a generated FIR filter design. More complex designs may contain multiple lookup tables, accumulators, or control sections.
[Figure 2 diagram: datai, Coefficients, Input Buffers, DA Lookup Tables (RAMs or ROM), DA LUT Generator, Control, Shifter, Accumulator, datao]
Figure 2 • Functional Block Diagram
Functional Block Description
The functional blocks shown in Figure 2 illustrate the architecture of the generated FIR filter using the DA algorithm.