
In this section we describe a generalisation of the basic FFT that can be applied to a sample set whose size is any composite (non-prime) number. We shall refrain from describing an algorithm based on this idea in detail; instead, only the mathematical basis of the algorithm will be described. The only real complexity in this algorithm is the indexing scheme, which is far more difficult than simple bit reversal (which means bit reversed addressing is of no use here).

Mixed radix algorithms are rarely used for signal processing. For most signal processing applications, you're far better off designing your system with Radix 2 FFTs in mind (for example, choose your sampling frequency so that a Radix 2 FFT will give you the resolution you want from a spectrum analysis).

There are circumstances where the use of this method can bring worthwhile performance benefits. Sometimes you don't have the flexibility to design your system for Radix 2 FFTs. In the worst case, forcing the use of a radix 2 algorithm will double the FFT size required. This loss of performance is compounded in multi-dimensional FFTs. Two dimensional image processing is a good example: images tend to come in a variety of sizes, and the worst case of doubling the row and column transform sizes will result in a factor of 4 increase in the total FFT size. (Things get even worse for three dimensional transforms.)

## The Maths.

Suppose the number of samples is a composite number:

N = Nx * Ny

The DFT can therefore be written:

F(k) = sum(n=0..N-1) f(n) * TN(n*k)

where N = Nx*Ny and TN(m) = exp(-j*2*pi*m/N) denotes the usual N point twiddle factor (so, for example, TN(0) = 1).

We can imagine arranging the samples of f as a 2D array, of Ny rows by Nx columns. For example, suppose N=21, Ny=3, Nx=7:

f(0)   f(1)   f(2)   f(3)   f(4)   f(5)   f(6)
f(7)   f(8)   f(9)   f(10)  f(11)  f(12)  f(13)
f(14)  f(15)  f(16)  f(17)  f(18)  f(19)  f(20)

Of course, the choice of 3 rows by 7 columns is arbitrary; we could just as easily have chosen 7 rows by 3 columns. However, we'll stick with this because it's as good as any other. Given such a decomposition, we can define a new 2D sample set:

h(x, y) = f(Nx*y + x),   x = 0..Nx-1, y = 0..Ny-1

(x indexes the columns and y the rows, so consecutive samples of f run along the rows).

In our example, we would have:

h(x, y) = f(7*y + x)

so, for example, h(2, 1) = f(9).

We're also going to need a corresponding 2D definition of F(k). For reasons which will become apparent, we'll define this as:

H(kx, ky) = F(Ny*kx + ky),   kx = 0..Nx-1, ky = 0..Ny-1

Note that if we adhere to the convention of 'x' subscript for horizontal (column) indices and 'y' subscript for vertical (row) indices, then this definition of H implies that in our example F has been decomposed as follows:

F(0)  F(3)  F(6)  F(9)   F(12)  F(15)  F(18)
F(1)  F(4)  F(7)  F(10)  F(13)  F(16)  F(19)
F(2)  F(5)  F(8)  F(11)  F(14)  F(17)  F(20)

(consecutive values of F run down the columns, whereas consecutive samples of f ran along the rows).

So for example, we would have:

H(2, 1) = F(3*2 + 1) = F(7)

We can now re-write the original DFT as:

H(kx, ky) = sum(x=0..Nx-1) sum(y=0..Ny-1) h(x, y) * TN((Nx*y + x)*(Ny*kx + ky))

The complex exponential factor above can be simplified:

TN((Nx*y + x)*(Ny*kx + ky)) = TN(N*y*kx) * TNy(y*ky) * TNx(x*kx) * TN(x*ky)

(where TNy and TNx denote the corresponding Ny point and Nx point twiddle factors).

Periodicity tells us that the first of the above factors is always 1; the next two remind us of 2D DFT's (don't they?); the last must be some kind of twiddle factor. This is the reason H was defined the 'opposite' way round from h. Had it not been, we wouldn't have got this result.

So the expression for the DFT has reduced to:

H(kx, ky) = sum(x=0..Nx-1) TNx(x*kx) * TN(x*ky) * sum(y=0..Ny-1) h(x, y) * TNy(y*ky)

The inner sum is simply the DFT of each column of the array, so we have:

h'(x, ky) = sum(y=0..Ny-1) h(x, y) * TNy(y*ky)

H(kx, ky) = sum(x=0..Nx-1) [ h'(x, ky) * TN(x*ky) ] * TNx(x*kx)

This final sum is simply the DFT of each row (of h') after first multiplying by the twiddle factors. The only difference between this and a straightforward 2D DFT is that an additional step (multiplication by twiddle factors) has been introduced between the column and row DFTs.
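The three steps just described (column DFTs, twiddle factor multiplication, row DFTs) can be sketched in a few lines of Python. This is only an illustration, not an optimised implementation: the function name is our own, and NumPy's FFT is borrowed for the short row and column transforms.

```python
import numpy as np

def mixed_radix_dft(f, Nx, Ny):
    """DFT of Nx*Ny samples via column DFTs, twiddle factors, row DFTs.

    Output ordering follows the text: F(Ny*kx + ky) = H(kx, ky).
    """
    N = Nx * Ny
    # Arrange the samples as Ny rows by Nx columns: h[y, x] = f(Nx*y + x).
    h = np.asarray(f, dtype=complex).reshape(Ny, Nx)
    # Step 1: DFT each column (length-Ny transforms).
    hp = np.fft.fft(h, axis=0)                      # hp[ky, x] = h'(x, ky)
    # Step 2: element-by-element multiplication by the twiddle factors
    # TN(x*ky) = exp(-j*2*pi*x*ky/N).
    ky = np.arange(Ny)[:, None]
    x = np.arange(Nx)[None, :]
    hp = hp * np.exp(-2j * np.pi * x * ky / N)
    # Step 3: DFT each row (length-Nx transforms).
    H = np.fft.fft(hp, axis=1)                      # H[ky, kx] = H(kx, ky)
    # Read the result out column-wise: F[Ny*kx + ky] = H(kx, ky).
    return H.T.reshape(N)
```

For the N=21 example, `mixed_radix_dft(f, 7, 3)` agrees with a direct 21 point DFT of f.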

It's easy to see how this process can be reversed to generate an inverse DFT. All you do is inverse DFT each row, multiply the resulting array by the conjugates of the twiddle factors, then inverse DFT each column. This process is applied recursively, of course.
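A matching sketch of the inverse (again, the name and the use of NumPy are our own choices; `np.fft.ifft` applies a 1/length scaling on each axis, and the two axes together supply the overall 1/N of the inverse DFT):

```python
import numpy as np

def mixed_radix_idft(F, Nx, Ny):
    """Inverse of the decomposed DFT: inverse DFT each row, multiply by
    the conjugate twiddle factors, then inverse DFT each column."""
    N = Nx * Ny
    # Undo the output ordering: H[ky, kx] = F(Ny*kx + ky).
    H = np.asarray(F, dtype=complex).reshape(Nx, Ny).T
    # Step 1: inverse DFT each row (over kx).
    hp = np.fft.ifft(H, axis=1)
    # Step 2: multiply by the conjugates of the twiddle factors.
    ky = np.arange(Ny)[:, None]
    x = np.arange(Nx)[None, :]
    hp = hp * np.exp(+2j * np.pi * x * ky / N)
    # Step 3: inverse DFT each column (over ky), then read out row-wise.
    h = np.fft.ifft(hp, axis=0)
    return h.reshape(N)
```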

Well, that's about all the maths dealt with. Of course, the basic idea of decomposition can be applied recursively if Nx and/or Ny are also composite numbers. The only real implementation difficulty is keeping track of where various terms are in the array. Not only have we got to take account of the inherent index reversal between the 'time' and 'frequency' domains which results from the decomposition, but also the permutations introduced by the row and column transforms (which will depend on the algorithm used). In particular, during twiddle factor multiplication, the permutation resulting from each column transform needs to be reflected by the same permutation of each twiddle factor column.

## How much time will this save?

This question can't be answered accurately without knowing how long the individual row and column transforms take. If we suppose that a 'brute force' DFT is used for the rows and columns, then we can compare the number of complex multiplies required with the number that would be required by brute force with no decomposition. In this analysis, we have discounted the trivial multiplies (by 1) and assumed that the twiddle factors are not 'optimised out' (as they can be sometimes).

For an Nx*Ny point 1D 'brute force' transform, the number of complex multiplies is:

(Nx*Ny - 1)^2

By decomposing, we can reduce this to:

Nx*(Ny - 1)^2 + Ny*(Nx - 1)^2 + (Nx - 1)*(Ny - 1)

(the column transforms, the row transforms, and the non-trivial twiddle factors respectively).

The ratio of 'brute force' to decomposed is therefore:

(Nx*Ny - 1)^2 / ( Nx*(Ny - 1)^2 + Ny*(Nx - 1)^2 + (Nx - 1)*(Ny - 1) )

For example, a 143 point transform could be decomposed using Nx=13, Ny=11. The corresponding ratio is approximately 6.7 (the brute force transform requires about 7 times more complex multiplies). This assumes that brute force was used for the 13 and 11 point transforms. An alternative which is useful for prime numbered transforms is the convolution method (which will be described later).
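To see where the 6.7 comes from, the counts can be tabulated directly. A small sketch under the counting conventions stated above (multiplies by 1 discounted, brute force sub-transforms):

```python
def brute_force_multiplies(N):
    # Direct DFT: every matrix entry except the k=0 row and the n=0
    # column is a non-trivial complex multiply.
    return (N - 1) ** 2

def decomposed_multiplies(Nx, Ny):
    # Nx brute force column transforms, Ny brute force row transforms,
    # plus the non-trivial twiddle factors.
    return Nx * (Ny - 1) ** 2 + Ny * (Nx - 1) ** 2 + (Nx - 1) * (Ny - 1)

ratio = brute_force_multiplies(143) / decomposed_multiplies(13, 11)
print(round(ratio, 1))  # -> 6.7
```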

We have discussed the DIT and DIF Radix 2 algorithms in some detail. The general decomposition of composite number transforms has provided an opportunity to consider an alternative, the so-called Radix 4 algorithms. These can result in slightly fewer complex multiplies, and are often used in dedicated FFT hardware or in software for 'register rich' processor architectures (typical of modern RISC processors).

Consider a 16 point transform (p=4, N = 2^p = 16). The number of butterflies for both the DIT and DIF algorithms was calculated earlier as:

p * 2^(p-1)   (= 32 for p=4)

Of these, some are trivial (require no complex multiplication):

3 * 2^(p-1) - 2   (= 22 for p=4)

The total number of complex additions for Radix 2 is therefore given by:

2 * p * 2^(p-1) = p * 2^p

(two complex additions per butterfly).

Similarly, the total number of complex multiplies for Radix 2 is:

p * 2^(p-1) - (3 * 2^(p-1) - 2) = (p - 3) * 2^(p-1) + 2

(one complex multiply per non-trivial butterfly).

For a 16 point transform (p=4), we get:

additions:  4 * 2^4 = 64
multiplies: (4 - 3) * 2^3 + 2 = 10

Now consider a 16 point transform which has been decomposed into a 4*4 array. This will require eight 4 point transforms (4 rows and 4 columns, each of size 4) and 16 twiddle factor multiplications (including the trivial ones). The matrix form of a 4 point transform was provided at the beginning of this document (disregarding scaling):

[ F(0) ]   [ 1   1   1   1 ] [ f(0) ]
[ F(1) ] = [ 1  -j  -1   j ] [ f(1) ]
[ F(2) ]   [ 1  -1   1  -1 ] [ f(2) ]
[ F(3) ]   [ 1   j  -1  -j ] [ f(3) ]

This can be evaluated using only 8 complex additions (you can check this yourself). So the total number of additions for a 16 point transform using this method is 8*8=64, which is the same as the radix 2 FFT.
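Here is one way of arranging those 8 additions (our own arrangement; the factors of +/-j are implemented as a swap of real and imaginary parts with a sign change, so they are not counted as multiplies):

```python
def dft4(f0, f1, f2, f3):
    # 4-point DFT using 8 complex additions and no true multiplications.
    a = f0 + f2          # 4 additions form the half sums and differences
    b = f0 - f2
    c = f1 + f3
    d = f1 - f3
    # 4 more additions; +/-1j*d is just swap-and-negate, not a multiply.
    return (a + c, b - 1j * d, a - c, b + 1j * d)
```

For example, `dft4(1, 2, 3, 4)` returns `(10, -2+2j, -2, -2-2j)`, matching the direct 4 point DFT.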

Between the column and row transforms, we must perform element by element multiplication by the twiddle factors T16(x*ky):

1      1        1        1
1   T16(1)   T16(2)   T16(3)
1   T16(2)   T16(4)   T16(6)
1   T16(3)   T16(6)   T16(9)

The permutation of the columns resulting from the 4 point FFT (probably bit reversed) has been disregarded here for simplicity. (The ordering of the multiplications makes no difference to the number required.) Of these twiddle factors, the topmost row and leftmost column are all +1, and T16(2*2) = -j, so we are left with only 8 non-trivial complex multiplies. (2 fewer than the Radix 2 algorithm requires.)

This idea can be generalised to derive the number of complex multiplies required for a 2^p point pure Radix 4 algorithm. (The FFT size is an integral power of 4, so p is even.) It is fairly easy to show that for a Radix 4 FFT, the number of (non-trivial) complex multiplies must satisfy the following recurrence relation:

M(p) = 4 * M(p-2) + 3 * 2^(p-2) - 4,   with M(2) = 0

(four size N/4 sub-transforms, plus (3/4)*N - 4 non-trivial twiddle factor multiplies).

This has solution:

M(p) = (3*p/8 - 13/12) * 2^p + 4/3

(check: for p=4 this gives 8, as found above).

Compare this with the Radix 2 result:

(p - 3) * 2^(p-1) + 2

For large p, the ratio of these is approximated by:

(3*p/8) / (p/2) = 3/4

So the Radix 4 algorithm reduces the number of complex multiplies required by about 25%. Note that the Radix 4 formula is only defined for even p. (Of course, if p is odd, you can still use mostly Radix 4 passes with one Radix 2 pass.)
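The two counts can be compared numerically. A sketch using the formulas above (our reconstruction of the counting conventions; p must be even for the Radix 4 case):

```python
def radix2_multiplies(p):
    # Non-trivial complex multiplies for an N = 2**p Radix 2 FFT.
    return (p - 3) * 2 ** (p - 1) + 2

def radix4_multiplies(p):
    # Recurrence: M(p) = 4*M(p-2) + 3*2**(p-2) - 4, with M(2) = 0.
    if p == 2:
        return 0
    return 4 * radix4_multiplies(p - 2) + 3 * 2 ** (p - 2) - 4

print(radix4_multiplies(4), radix2_multiplies(4))               # -> 8 10
print(round(radix4_multiplies(20) / radix2_multiplies(20), 2))  # -> 0.75
```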

Another slight advantage of a Radix 4 algorithm is that with fixed point arithmetic, the most common scaling factor (root N) can be implemented simply by shifting the results right one place (dividing by 2) after each pass.