IAY0340 - Exercise #8 - FIR sample solution

Coefficients

The given coefficients in this sample are random. In your case you should've already calculated your own coefficients according to the exercise. All the samples are using the same coefficients as shown below. For further reading of scheduling operations, binding etc. see lecture notes.

Table 1. Sample coefficients
Coefficients Coefficient value
c1 & c9 0,125
c2 & c8 0,25
c3 & c7 -0,75
c4 & c6 1,25
c5 1,0

Algorithm

Figure 1. 9-tap Finite Impulse Response (FIR) filter

According to Figure 1 the formula to calculate data_out is:
data_out = 0,125 * di-0 + 0,25 * di-1 - 0,75 * di-2 + 1,25 * di-3 + 1,0 * di-4 + 1,25 * di-5 - 0,75 * di-6 + 0,25 * di-7 + 0,125 * di-8

Alltogether 9 multiplications and 8 additions/subtractions.

Reduction

After reduction of the equation we get:
data_out = 0,125 * (di-0 + di-8) + 0,25 * (di-1 + di-7) - 0,75 * (di-2 + di-6) + 1,25 * (di-3 + di-5) + 1,0 * di-4

Alltogether 4 multiplications and 8 additions/subtractions. The multiplication with 1,0 is figurative.

Scheduling

Due to the fact that multiplication is too expensive (time!!!) we should try to substitute it with some other operation. So we use shift-add trees to substitute multiplication with shifting and adding.

data_out = ((di-0 + di-8) >> 3) + ((di-1 + di-7) >> 2) - ((di-2 + di-6) - ((di-2 + di-6) >> 2)) + (di-3 + di-5) + ((di-3 + di-5) >> 2) + di-4

Alltogether 12 additions/subtractions. After common sub-expression eliminations (no point to add di-2 + di-6 and di-3 + di-5 twice) 8 additions and 2 subtractions which means 10 operations. The operations are scheduled among 4 steps. The result of every operation is saved always in the same register (reg1, reg2, ..., reg4). Multiplexers in adder/ subtractor inputs are not optimized.

Table 2. Binding
Step add #1 add #2 add #3 sub #4
1 reg1 = di-1 + di-7 reg2 = di-2 + di-6 reg3 = di-3 + di-5 -
2 reg1 = di-0 + di-8 reg2 = di-4 + (reg1 >> 2) reg3 = reg3 + (reg3 >> 2) reg4 = reg2 - (reg2 >> 2)
3 reg1 = (reg1 >> 3) + reg2 - - reg4 = reg3 - reg4
4 reg1 = reg1 + reg4 - - -

Sample codes

In the next part 4 similar designs were eleborated which all have been tested with three different synthesise strategies. The sample codes are all shown in Table 3 both in VHDL and SystemC.

Adder/subtracter code

Most of the students need an adder-subtractor to be able to fulfill the required RTL task. It is highly recommended to use the code example on lecture slide 55. Be aware that the code there is a sequential code. Also you should understand the principle of the code to be able to implement it later. Solutions which produce more than one adder in case of adder-subtractor synthesis will be rejected!

Designs

Table 3. Designs
# Design description VHDL Code SystemC
Code
1 First design which more or less corresponds to the behavioral RTL. Functions add1, add2, add3 and sub1 are used to force the synthesizer to use the same modules. However it didn't work and in the best case the filter had 4293 gates (2032 for combinational and 2261 for registers). Design#1 Design#1 header file
Design#1 cpp file
2 Much better result can be achieved when computational units are coded as separate processes. The result introduces a pseudo finite state machine (FSM) which in turn makes the structure way more confusing. However the results of the synthesis are way better - 3474 gates (1192 for combinational and 2282 for registers). Design#2 Design#2 header file
Design#2 cpp file
3 The best result was achieved when compared to the solution 2 FSM state transition function and registers were added. Result of the synthesis - 3248 gates (1181 for combinational and 2067 for registers). Design#3 Design#3 header file
Design#3 cpp file
4 Last solution is even more precise description of the last RTL where in the data part single adders/subtractors and multiplexers are presented in detail. The results are practically the same as in case of the solution 3. Area - 3248 gates (1182 for combinational and 2066 for registers). Design#4 Design#4 header file
Design#4 cpp file
5 Piplined, scheduling & binding Design#5 Design#5 header file
Design#5 cpp file
6 Out-of-order calculations, scheduling & binding Design#6 Design#6 header file
Design#6 cpp file
7 Extra optimization on MUX Design#7 Design#7 header file
Design#7 cpp file