CaltechAUTHORS
  A Caltech Library Service

Universal lossless source coding with the Burrows Wheeler transform

Effros, Michelle and Visweswariah, Karthik and Kulkarni, Sanjeev R. and Verdú, Sergio (2002) Universal lossless source coding with the Burrows Wheeler transform. IEEE Transactions on Information Theory, 48 (5). pp. 1061-1081. ISSN 0018-9448. http://resolver.caltech.edu/CaltechAUTHORS:EFFieeetit02

[img]
Preview
PDF
See Usage Policy.

743Kb

Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:EFFieeetit02

Abstract

The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources


Item Type:Article
Additional Information:“©2002 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.” Manuscript received July 16, 1999; revised December 27, 2001. Posted online: 2002-08-07. This paper is based on work at the California Institute of Technology supported in part by the National Science Foundation under CAREER Grant MIP-9501977, a grant from the Powell Foundation, and donations through the Intel 2000 Technology for Education Program. Work at Princeton University was supported in part by ODDR&E MURI through the Army Research Office under Grant DAAD19-00-1-0466, by the National Science Foundation under Grant NCR-9523805, and by the National Science Foundation KDI under Contract ECS-9873451. The authors wish to thank D. Linde and J. Kahn for helpful discussions related to this work.
Subject Keywords:Burrows Wheeler Transform (BWT), rate of convergence, redundancy, text compression, universal noiseless source coding, universal source coding, transforms for source coding
Record Number:CaltechAUTHORS:EFFieeetit02
Persistent URL:http://resolver.caltech.edu/CaltechAUTHORS:EFFieeetit02
Alternative URL:http://dx.doi.org/10.1109/18.995542
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:1330
Collection:CaltechAUTHORS
Deposited By: Archive Administrator
Deposited On:10 Jan 2006
Last Modified:26 Dec 2012 08:43

Repository Staff Only: item control page