The given coefficients in this sample are random. In your case you should've already calculated your own coefficients according to the exercise. All the samples are using the same coefficients as shown below. For further reading of scheduling operations, binding etc. see lecture notes.
Coefficients | Coefficient value |
---|---|
c1 & c9 | 0,125 |
c2 & c8 | 0,25 |
c3 & c7 | -0,75 |
c4 & c6 | 1,25 |
c5 | 1,0 |
Figure 1. 9-tap Finite Impulse Response (FIR) filter
According to Figure 1 the formula to calculate data_out is:
data_out = 0,125 * di-0 + 0,25 * di-1 - 0,75 * di-2 +
1,25 * di-3 + 1,0 * di-4 + 1,25 * di-5 - 0,75 * di-6 +
0,25 * di-7 + 0,125 * di-8
Alltogether 9 multiplications and 8 additions/subtractions.
After reduction of the equation we get:
data_out = 0,125 * (di-0 + di-8) + 0,25 * (di-1
+ di-7) - 0,75 * (di-2 + di-6) + 1,25 * (di-3 + di-5)
+ 1,0 * di-4
Alltogether 4 multiplications and 8 additions/subtractions. The multiplication with 1,0 is figurative.
Due to the fact that multiplication is too expensive (time!!!) we should try to substitute it with some other operation. So we use shift-add trees to substitute multiplication with shifting and adding.
Alltogether 12 additions/subtractions. After common sub-expression eliminations (no point to add di-2 + di-6 and di-3 + di-5 twice) 8 additions and 2 subtractions which means 10 operations. The operations are scheduled among 4 steps. The result of every operation is saved always in the same register (reg1, reg2, ..., reg4). Multiplexers in adder/ subtractor inputs are not optimized.
Step | add #1 | add #2 | add #3 | sub #4 |
---|---|---|---|---|
1 | reg1 = di-1 + di-7 | reg2 = di-2 + di-6 | reg3 = di-3 + di-5 | - | 2 | reg1 = di-0 + di-8 | reg2 = di-4 + (reg1 >> 2) | reg3 = reg3 + (reg3 >> 2) | reg4 = reg2 - (reg2 >> 2) |
3 | reg1 = (reg1 >> 3) + reg2 | - | - | reg4 = reg3 - reg4 |
4 | reg1 = reg1 + reg4 | - | - | - |
In the next part 4 similar designs were eleborated which all have been tested with three different synthesise strategies. The sample codes are all shown in Table 3 both in VHDL and SystemC.
Most of the students need an adder-subtractor to be able to fulfill the required RTL task. It is highly recommended to use the code example on lecture slide 55. Be aware that the code there is a sequential code. Also you should understand the principle of the code to be able to implement it later. Solutions which produce more than one adder in case of adder-subtractor synthesis will be rejected!
# | Design description | VHDL Code | SystemC Code |
---|---|---|---|
1 | First design which more or less corresponds to the behavioral RTL. Functions add1, add2, add3 and sub1 are used to force the synthesizer to use the same modules. However it didn't work and in the best case the filter had 4293 gates (2032 for combinational and 2261 for registers). | Design#1 | Design#1 header file Design#1 cpp file |
2 | Much better result can be achieved when computational units are coded as separate processes. The result introduces a pseudo finite state machine (FSM) which in turn makes the structure way more confusing. However the results of the synthesis are way better - 3474 gates (1192 for combinational and 2282 for registers). | Design#2 | Design#2 header file Design#2 cpp file |
3 | The best result was achieved when compared to the solution 2 FSM state transition function and registers were added. Result of the synthesis - 3248 gates (1181 for combinational and 2067 for registers). | Design#3 | Design#3 header file Design#3 cpp file |
4 | Last solution is even more precise description of the last RTL where in the data part single adders/subtractors and multiplexers are presented in detail. The results are practically the same as in case of the solution 3. Area - 3248 gates (1182 for combinational and 2066 for registers). | Design#4 | Design#4 header file Design#4 cpp file |
5 | Piplined, scheduling & binding | Design#5 | Design#5 header file Design#5 cpp file |
6 | Out-of-order calculations, scheduling & binding | Design#6 | Design#6 header file Design#6 cpp file |
7 | Extra optimization on MUX | Design#7 | Design#7 header file Design#7 cpp file |