A Caltech Library Service

Heavy-Tailed Distributions, Generalized Source Coding and Optimal Web Layout Design

Zhu, Xiaoyun and Yu, Jie and Doyle, John (2000) Heavy-Tailed Distributions, Generalized Source Coding and Optimal Web Layout Design. California Institute of Technology , Pasadena, CA. (Unpublished)

See Usage Policy.


Use this Persistent URL to link to this item:


The design of robust and reliable networks and network services has become an increasingly challenging task in today's Internet world. To achieve this goal, understanding the characteristics of Internet traffic plays a more and more critical role. Empirical studies of measured traffic traces have led to the wide recognition of self-similarity in network traffic. Moreover, a direct link has been established between the self-similar nature of measured aggregate network traffic and the underlying heavy-tailed distributions of the Web traffic at the source level. This report provides a natural and plausible explanation for the origin of heavy tails in Web traffic by introducing a series of simplified models for optimal Web layout design with varying levels of realism and analytic tractability. The basic approach is to view the minimization of the average file download time as a generalization of standard source coding for data compression, but with the design of the Web layout rather than the codewords. The results, however, are quite different from standard source coding, as all assumptions produce power law distributions for a wide variety of user behavior models. In addition, a simulation model of more complex Web site layouts is proposed, with more detailed hyperlinks and user behavior. The throughput of a Web site can be maximized by taking advantage of information on user access patterns and rearranging (splitting or merging) files on the Web site accordingly, with a constraint on available resources. A heuristic optimization on random graphs is formulated, with user navigation modeled as Markov Chains. Simulations on different classes of graphs as well as more realistic models with simple geometries in individual Web pages all produce power law tails in the resulting size distributions of the files transferred from the Web sites. This again verifies our conjecture that heavy-tailed distributions result naturally from the tradeoff between the design objective and limited resources, and suggests a methodology for aiding in the design of high-throughput Web sites.

Item Type:Report or Paper (Technical Report)
Doyle, John0000-0002-1828-2486
Group:Control and Dynamical Systems Technical Reports
Record Number:CaltechCDSTR:2000.001
Persistent URL:
Usage Policy:You are granted permission for individual, educational, research and non-commercial reproduction, distribution, display and performance of this work in any format.
ID Code:28038
Deposited By: Imported from CaltechCDSTR
Deposited On:16 Jul 2006
Last Modified:26 Nov 2019 11:15

Repository Staff Only: item control page