Pubs / Discrete cosine transform algorithms for FPGA

Discrete cosine transform algorithms for FPGA devices

Bibtex:

@mastersthesis{babic03,
  author = {Domagoj Babi\'c},
  title = {{Discrete cosine transform algorithms for FPGA devices}},
  school = {Faculty of Electrical Engineering and Computing of Zagreb
  University},
  year = {2003},
}

Summary: Polynomial transform was presented as an efficient way to map multidimensional transforms (like DCT and DFT) and convolutions to onedimensional ones. Resulting algorithms have significantly lower computational complexity, especially for convolutions and DCT. If optimal one-dimensional DCT would be available, it would be possible to obtain optimal multidimensional algorithms by using polynomial transforms. As demonstrated in the thesis, polynomial transform based DCT can be implemented in significantly less logic resources than row-column distributed arithmetic algorithm. Polynomial transform computation is performed as the series of additions and subtractions, so the finite-register length effects can be avoided resulting in higher accuracy. Second dimension of distributed arithmetic DCT implementation was more problematic due to larger input word length. The resulting longer carry chain, especially the one in the large accumulators at the output, was the major obstacle to achieving higher operating frequency. The implementation of polynomial transform DCT avoids this obstacle by using a butterfly of adders and no accumulators are needed at the output. The accuracy of both implementations has been extensively measured and detailed results were reported, more detailed then available in the previous work. Once the strong foundations for comparison were built, other factors, like resource usage, maximum achievable frequency, latency and throughput have been compared. Another contribution of the thesis is the proposal of designing the butterfly stage of such highly complex algorithms by scheduling the computation from the last stage, where the designer has no flexibility. This approach has yielded a very efficient FPGA implementation, superior to others reported.