Effros, Michelle (2000) PPM performance with BWT complexity: a fast and effective data compression algorithm. Proceedings of the IEEE, 88 (11). pp. 1703-1712. ISSN 0018-9219 http://resolver.caltech.edu/CaltechAUTHORS:EFFprocieee00
See Usage Policy.
Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:EFFprocieee00
This paper introduces a new data compression algorithm. The goal underlying this new code design is to achieve a single lossless compression algorithm with the excellent compression ratios of the prediction by partial mapping (PPM) algorithms and the low complexity of codes based on the Burrows Wheeler Transform (BWT). Like the BWT-based codes, the proposed algorithm requires worst case O(n) computational complexity and memory; in contrast, the unbounded-context PPM algorithm, called PPM*, requires worst case O(n2) computational complexity. Like PPM*, the proposed algorithm allows the use of unbounded contexts. Using standard data sets for comparison, the proposed algorithm achieves compression performance better than that of the BWT-based codes and comparable to that of PPM*. In particular, the proposed algorithm yields an average rate of 2.29 bits per character (bpc) on the Calgary corpus; this result compares favorably with the 2.33 and 2.34 bpc of PPM5 and PPM* (PPM algorithms), the 2.43 bpc of BW94 (the original BWT-based code), and the 3.64 and 2.69 bpc of compress and gzip (popular Unix compression algorithms based on Lempel-Ziv (LZ) coding techniques) on the same data set. The given code does not, however, match the best reported compression performance-2.12 bpc with PPMZ9-listed on the Calgary corpus results web page at the time of this publication. Results on the Canterbury corpus give a similar relative standing. The proposed algorithm gives an average rate of 2.15 bpc on the Canterbury corpus, while the Canterbury corpus web page gives average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7, 2.23 bpc for BZIP2 (a popular BWT-based code), and 3.31 and 2.53 bpc for compress and gzip, respectively.
|Additional Information:||© Copyright 2000 IEEE. Reprinted with permission. Manuscript received February 15, 2000; revised August 8, 2000. This work was partially supported by NSF Award CCR-9909026.|
|Subject Keywords:||Burrows Wheeler Transform, lossless source coding, prediction by partial mapping algorithm, suffix trees, text compression, source code design, source coding theory, transforms for source coding|
|Usage Policy:||No commercial reproduction, distribution, display or performance rights in this work are provided.|
|Deposited By:||Archive Administrator|
|Deposited On:||29 Nov 2006|
|Last Modified:||26 Dec 2012 09:19|
Repository Staff Only: item control page