Although it is widely known that FFTs provide a fast way to implement long convolutions or correlations (a correlation is just a convolution with the impulse response reversed and conjugated), using them requires longer and more complex code. So the question arises: when is it worth it? Just how long does a convolution have to be before using FFTs provides significant savings? For some reason, this issue is not addressed in most DSP/FFT literature, despite its importance to a DSP system designer. To get a feel for the answer, we shall consider the implementation of a complex 1D FIR filter on an 'infinite' 1D complex data stream by two methods: direct convolution, and Radix 2 FFT convolution using 50% overlap between FFTs.
We shall assume that the filter contains 2^{p}+1 taps (its impulse response is 2^{p}+1 samples long). This length has been chosen to ensure that when it is convolved with 2^{p} signal points, the resulting sequence has length (2^{p}+1)+(2^{p})-1 = 2^{p+1}, which fits a Radix 2 FFT comfortably.
In each case, we shall quantify the algorithm in terms of the number of complex additions N_{A}(p) and the number of complex multiplies N_{M}(p) required to generate 2^{p} output points.
Direct convolution can be performed point by point or in blocks. In either case each output point will require 2^{p}+1 complex multiplies and 2^{p} complex additions. Therefore, to generate 2^{p} output points:

N_{M}(p) = 2^{p}(2^{p}+1) = 4^{p} + 2^{p}

N_{A}(p) = 2^{p} x 2^{p} = 4^{p}
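As a concrete illustration, the direct method for one block might be coded as follows (a minimal sketch; the function and variable names are ours, not from the original text):

```python
def direct_fir(block, taps):
    """Directly convolve one block of complex samples with the filter taps.

    Assumes 'block' has been extended with len(taps) - 1 history samples
    from the previous block, so every output point is fully formed.
    Each output costs len(taps) complex multiplies and len(taps) - 1
    complex adds, as counted in the text.
    """
    n_taps = len(taps)
    out = []
    for i in range(len(block) - n_taps + 1):
        acc = 0 + 0j
        for k in range(n_taps):
            # y[n] = sum_k h[k] * x[n - k]
            acc += taps[k] * block[i + n_taps - 1 - k]
        out.append(acc)
    return out
```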
This scheme works as follows:
The following optimisations are assumed:
We may note that the saving presented by optimisation 2 above effectively cancels out the cost of step 5, so we are left with two 2^{p+1} point FFTs (or IFFTs) and 2^{p+1} complex multiplies. We have derived the number of butterflies required by a 2^{p} point Radix 2 FFT:

N_{B}(p) = p x 2^{p-1}
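A minimal sketch of the overlap scheme, using NumPy. We assume the standard overlap-save arrangement here (the original's "50% overlap" could equally be implemented as overlap-add); all names are illustrative:

```python
import numpy as np

def fft_fir(x, taps, p):
    """Filter a complex stream x with taps (length 2**p + 1)
    using 2**(p + 1)-point FFTs with 50% overlap (overlap-save)."""
    n_fft = 2 ** (p + 1)
    step = 2 ** p                      # new points consumed per FFT
    H = np.fft.fft(taps, n_fft)        # filter spectrum, computed once in advance
    hist = np.zeros(step, dtype=complex)
    y = []
    for start in range(0, len(x), step):
        new = x[start:start + step]
        new = np.pad(new, (0, step - len(new)))   # zero-pad a short final block
        block = np.concatenate([hist, new])       # 2**(p + 1) points
        Y = np.fft.ifft(np.fft.fft(block) * H)    # FFT, multiply, IFFT
        y.append(Y[step:])             # discard the 2**p wrapped points,
        hist = new                     # keep the 2**p valid ones
    return np.concatenate(y)[:len(x)]
```

The first 2^{p} points of each circular result are corrupted by wrap-around, which is why each 2^{p+1}-point FFT yields only 2^{p} new output points.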
The number of 'trivial' butterflies (those whose twiddle factor is 1 or -j, requiring no multiplication) is, for p > 2:

N_{T}(p) = 3 x 2^{p-1} - 2
Each non-trivial butterfly requires 2 complex additions and 1 complex multiply; a trivial butterfly needs only the additions. So, for a single 2^{p+1} point FFT (p > 1):

N_{M} = (p-2) x 2^{p} + 2

N_{A} = (p+1) x 2^{p+1}
So for the entire algorithm (two 2^{p+1} point FFTs (or IFFTs) and 2^{p+1} complex multiplies):

N_{M}(p) = 2[(p-2) x 2^{p} + 2] + 2^{p+1} = (p-1) x 2^{p+1} + 4

N_{A}(p) = 2(p+1) x 2^{p+1} = (p+1) x 2^{p+2}
To make the comparison simpler, we shall assume that real multiplies and adds take the same time (as they often do), and combine the figures for N_{A}(p) and N_{M}(p) as N_{OP}(p) operations on real numbers. A complex multiply takes 4 real multiplies and 2 real adds (6 operations), and a complex add takes 2 real adds, so:

N_{OP}(p) = 6 x N_{M}(p) + 2 x N_{A}(p)
So for direct convolution we have:

N_{OP}(p) = 6 x 2^{p}(2^{p}+1) + 2 x 4^{p} = 8 x 4^{p} + 6 x 2^{p}
For FFT convolution we have:

N_{OP}(p) = 6[(p-1) x 2^{p+1} + 4] + 2(p+1) x 2^{p+2} = (5p-1) x 2^{p+2} + 24
The ratio of these two is what we really want:

Ratio(p) = (8 x 4^{p} + 6 x 2^{p}) / ((5p-1) x 2^{p+2} + 24)
This tells us (roughly) how much slower direct convolution (or correlation) is compared to FFT convolution. Let's tabulate a few values:
p | FIR Taps (=2^{p}+1) | (Direct/FFT) Ratio |
3 | 9 | 1.2 |
4 | 17 | 1.7 |
5 | 33 | 2.7 |
6 | 65 | 4.5 |
7 | 129 | 7.6 |
8 | 257 | 13.2 |
9 | 513 | 23.3 |
10 | 1025 | 41.8 |
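The table can be reproduced with a short script based on the operation counts described above (a sketch; the function names are ours):

```python
def direct_ops(p):
    # 2**p outputs, each needing 2**p + 1 complex multiplies (6 real ops)
    # and 2**p complex adds (2 real ops each).
    return 6 * 2**p * (2**p + 1) + 2 * 4**p

def fft_ops(p):
    # Two 2**(p + 1)-point FFTs plus 2**(p + 1) complex multiplies.
    mults = 2 * ((p - 2) * 2**p + 2) + 2**(p + 1)
    adds = 2 * (p + 1) * 2**(p + 1)
    return 6 * mults + 2 * adds

for p in range(3, 11):
    print(p, 2**p + 1, round(direct_ops(p) / fft_ops(p), 1))
```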
For large p, a good approximation is:

Ratio(p) ≈ 2^{p+1} / (5p - 1)
So it would appear that even for filters as short as 17 taps, FFT based convolution looks attractive (in terms of the number of arithmetic operations). However, a few notes of caution are appropriate:

1. These figures count only arithmetic. Loads, stores, index arithmetic and bit-reversed addressing are ignored, and on many processors these can dominate the running time.

2. The FFT method processes data in blocks of 2^{p} points, so it introduces a processing delay of at least one block and requires extra memory for the FFT buffers and the stored filter spectrum.

3. The longer, more complex FFT code is harder to write, debug and maintain than a simple FIR loop.
As usual, if you really want to know which method is fastest for your particular application on your particular processor, the best advice is to try them both.
©1999 - Engineering Productivity Tools Ltd.