IAY0340 - Exercise #8 - FIR sample solution

Coefficients

The given coefficients in this sample are random. In your case you should've already calculated your own coefficients according to the exercise. All the samples are using the same coefficients as shown below. For further reading of scheduling operations, binding etc. see lecture notes.

Table 1. Sample coefficients
Coefficients Coefficient value
c1 & c9 0,125
c2 & c8 0,25
c3 & c7 -0,75
c4 & c6 1,25
c5 1,0

Algorithm

Figure 1. 9-tap Finite Impulse Response (FIR) filter

According to Figure 1 the formula to calculate data_out is:
data_out = 0,125 * di-0 + 0,25 * di-1 - 0,75 * di-2 + 1,25 * di-3 + 1,0 * di-4 + 1,25 * di-5 - 0,75 * di-6 + 0,25 * di-7 + 0,125 * di-8

Alltogether 9 multiplications and 8 additions/subtractions.

Reduction

After reduction of the equation we get:
data_out = 0,125 * (di-0 + di-8) + 0,25 * (di-1 + di-7) - 0,75 * (di-2 + di-6) + 1,25 * (di-3 + di-5) + 1,0 * di-4

Alltogether 4 multiplications and 8 additions/subtractions. The multiplication with 1,0 is figurative.

Scheduling

Due to the fact that multiplication is too expensive (time!!!) we should try to substitute it with some other operation. So we use shift-add trees to substitute multiplication with shifting and adding.

data_out = ((di-0 + di-8) >> 3) + ((di-1 + di-7) >> 2) - ((di-2 + di-6) - ((di-2 + di-6) >> 2)) + (di-3 + di-5) + ((di-3 + di-5) >> 2) + di-4

Alltogether 12 additions/subtractions. After common sub-expression eliminations (no point to add di-2 + di-6 and di-3 + di-5 twice) 8 additions and 2 subtractions which means 10 operations. The operations are scheduled among 4 steps. The result of every operation is saved always in the same register (reg1, reg2, ..., reg4). Multiplexers in adder/ subtractor inputs are not optimized.

Table 2. Binding
Step add #1 add #2 add #3 sub #4
1 reg1 = di-1 + di-7 reg2 = di-2 + di-6 reg3 = di-3 + di-5 -
2 reg1 = di-0 + di-8 reg2 = di-4 + (reg1 >> 2) reg3 = reg3 + (reg3 >> 2) reg4 = reg2 - (reg2 >> 2)
3 reg1 = (reg1 >> 3) + reg2 - - reg4 = reg3 - reg4
4 reg1 = reg1 + reg4 - - -

Sample codes

In the next part 4 similar designs were eleborated which all have been tested with three different synthesise strategies. The sample codes and synthesise strategies are all shown. To sum up the results check Table 5.

Adder/subtracter code

Most of the students need an adder-subtractor to be able to fulfill the required RTL task. It is highly recommended to use the code example on lecture slide 51. Be aware that the code there is a sequential code. Also you should understand the principle of the code to be able to implement it later. Solutions which produce more than one adder in case of adder-subtractor synthesis will be rejected!

Designs

Table 3. Designs
# Design description Code
1 Firstly VHDL code which more or less corresponds to the behavioral RTL. Functions add1, add2, add3 and sub1 are used to force the synthesizer to use the same modules. However it didn't work and in the best case the filter had 4293 gates (2032 for combinational and 2261 for registers). Delay was less than 20 ns (19,98). Design#1
2 Much better result can be achieved when computational units are coded as separate processes. The result introduces a pseudo finite state machine (FSM) which in turn makes the structure way more confusing. However the results of the synthesis are way better - 3474 gates (1192 for combinational and 2282 for registers). Delay was the same as in case of solution 1 - 19,98 ns. Design#2
3 The best result was achieved when compared to the solution 2 FSM state transition function and registers were added. Result of the synthesis - 3248 gates (1181 for combinational and 2067 for registers). Delay 19,99 ns. Design#3
4 Last solution is even more precise description of the last RTL where in the data part single adders/subtractors and multiplexers are presented in detail. The results are practically the same as in case of the solution 3. Area - 3248 gates (1182 for combinational and 2066 for registers). Delay 19,99 ns. Design#4
5 Piplined, scheduling & binding Design#5
6 Out-of-order calculations, scheduling & binding Design#6
7 Extra optimization on MUX Design#7

Synthesise strategies

The codes for the strategies are shown although as proposed in the Design Vision guide all should be doable using GUI.

Table 4. Synthesise strategies
Strategy #1 Strategy #2 Strategy #3
Initial transition, flattening hierarchy, high optimization. Initial transition, flattening hierarchy, medium optimization. Initial medium optimization, hierarchy is retained.
create_clock -name "clk" -period 20 -waveform {"0" "10"} {"clk"}
compile -map_effort low
ungroup -all -flatten
set_max_area 2000
compile -map_effort high
report_area
report_timing
set_max_area 2000
compile -map_effort low
ungroup -all -flatten
compile -map_effort medium
report_area
report_timing
set_max_area 2000
compile -map_effort medium
report_area
report_timing

Summary

As can be seen on Table 5 the best results are provided by synthesis strategy #3. In case of RTL codes design 3 and design 4 have total area the same however delay in case of design 4 is 5 ns smaller. While examining the design codes it looks simpler in case of design 3 than design 4.

Table 5. Summary
Design Strategy Intermediate area [comb.+reg.=total] Final area [comb.+reg.=total] Differnece Delay [ns]
Design 1 #1 2328 + 2074 = 4402 1929 + 2069 = 3998 +20% 19,98
#2 2285 + 2071 = 4356 2064 + 2072 = 4136 +24% 19,60
#3 2248 + 2073 = 4321 +30% 19,99
Design 2 #1 1359 + 2275 = 3634 1387 + 2276 = 3663 +9,9% 19,99
#2 1340 + 2275 = 3615 1397 + 2276 = 3673 +10,2% 19,99
#3 1322 + 2275 = 3597 +8% 19,99
Design 3 #1 1308 + 2062 = 3370 1354 + 2063 = 3417 +2,6% 20,00
#2 1301 + 2062 = 3363 1382 + 2063 = 3445 +3.4% 20,00
#3 1270 + 2062 = 3332 100% 19,98
Design 4 #1 1381 + 2062 = 3443 1393 + 2062 = 3455 +3,7% 19,99
#2 1360 + 2062 = 3422 1449 + 2062 = 3511 +5,4% 19,99
#3 1354 + 2062 = 3416 +2,5% 20,00
Design 5 #1 1145 + 2345 = 3490 1094 + 2346 = 3440 +3,2% 19,98
#2 1123 + 2345 = 3468 1128 + 2346 = 3474 +4,3% 19,98
#3 1096 + 2345 = 3441 +3,3% 19,75
Design 6 #1 1166 + 1950 = 3116 1143 + 1951 = 3094 -7,1% 19,99
#2 1137 + 1950 = 3087 1166 + 1951 = 3117 -6,5% 19,98
#3 1125 + 1950 = 3075 -7,7% 19,65
Design 7 #1 1142 + 1950 = 3092 1116 + 1951 = 3067 -8,0% 19,92
#2 1114 + 1950 = 3064 1145 + 1951 = 3096 -7,1% 19,99
#3 1104 + 1951 = 3055 -8,3% 19,93