Published 2000 | Version public
Book Section - Chapter Open

PPM performance with BWT complexity: a new method for lossless data compression

Abstract

This work combines a new fast context-search algorithm with the lossless source coding models of PPM to achieve a lossless data compression algorithm with the linear context-search complexity and memory of BWT and Ziv-Lempel codes and the compression performance of PPM-based algorithms. Both sequential and nonsequential encoding are considered. The proposed algorithm yields an average rate of 2.27 bits per character (bpc) on the Calgary corpus, comparing favorably to the 2.33 and 2.34 bpc of PPM5 and PPM* and the 2.43 bpc of BW94 but not matching the 2.12 bpc of PPMZ9, which, at the time of this publication, gives the greatest compression of all algorithms reported on the Calgary corpus results page. The proposed algorithm gives an average rate of 2.14 bpc on the Canterbury corpus. The Canterbury corpus Web page gives average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7, and 2.23 bpc for BZIP2 (a BWT-based code) on the same data set.

Additional Information

© Copyright 2002 IEEE. Reprinted with permission. This material is based upon work partially supported by NSF Award No. CCR-9909026.

Files

EFFdcc00.pdf

Files (236.5 kB)

Name Size Download all
md5:0ba8daff5d65ac84b2796fe175643a1d
236.5 kB Preview Download

Additional details

Identifiers

Eprint ID
7402
Resolver ID
CaltechAUTHORS:EFFdcc00

Dates

Created
2007-02-12
Created from EPrint's datestamp field
Updated
2021-11-08
Created from EPrint's last_modified field