arXiv:1007.1033v2 [cs.IT] 3 Sep 2010
A Theory of Network Equivalence
Part I: Point-to-Point Channels

Ralf Koetter, Fellow, IEEE, Michelle Effros, Fellow, IEEE, and Muriel Médard, Fellow, IEEE
Abstract

A family of equivalence tools for bounding network capacities is introduced. Part I treats networks built from point-to-point channels. Part II generalizes the technique to networks containing wireless channels such as broadcast, multiple access, and interference channels. The main result of Part I is roughly as follows. Given a network of noisy, independent, memoryless point-to-point channels, a collection of demands can be met on the given network if and only if it can be met on another network where each noisy channel is replaced by a noiseless bit pipe with throughput equal to the noisy channel capacity. This result was known previously for the case of a single-source multicast demand. The result given here treats general demands, including, for example, multiple unicast demands, and applies even when the achievable rate region for the corresponding demands is unknown in both the noisy network and its noiseless counterpart.

Keywords: Capacity, network coding, equivalence, component models
I. INTRODUCTION
The study of network communications has two natural facets reflecting different approaches to thinking about networks. On the one hand, networks are considered in the graph theoretic setup consisting of nodes connected by links. The links are typically not noisy channels but noise-free bit pipes that can be used error free up to a certain capacity. Typical concepts include information flows and routing issues. On the other hand, multiterminal information theory addresses information transmission through networks by studying noisy channels, or rather the stochastic relationship between input and output signals at devices in a network. Here the questions typically concern fundamental limits of communication. The capacity regions of broadcast, multiple access, and interference channels are all examples of questions that are addressed in the context of multiterminal information theory. These questions appear to have no obvious equivalent in networks consisting of error free bit pipes. Nevertheless, these two views of networking are two natural facets of the same problem, namely communication through networks. This paper explores the relationship between these two worlds.

R. Koetter was with the Technical University of Munich. M. Effros is with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 USA, e-mail: effros@caltech.edu. M. Médard is with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA, e-mail: medard@mit.edu.

Submitted to IEEE Transactions on Information Theory April 14, 2010.
Establishing viable bridges between these two areas proves to be surprisingly fertile. For example, questions about feedback in multiterminal systems are quite nicely expressed in networks of error free bit pipes. Separation issues, in particular separation between network coding and channel coding, have natural answers, revealing many network capacity problems as combinatorial rather than statistical, even when communication occurs across networks of noisy channels. Most importantly, bounding general network capacities reduces to solving a central network coding problem described as follows: Given a network of error free, rate-constrained bit pipes, is a given set of demands (e.g., a collection of unicast and multicast connections) simultaneously satisfiable or not? In certain situations, most notably a single multicast demand, this question is solved, and the answer is easily characterized [1]. Unfortunately, the general case is wide open and suspected to be hard. (Currently, NP-hardness is only established for linear network coding [2].) While it appears that fully characterizing the combinatorial network coding problem is out of reach [3], moderate size networks can be solved quite efficiently, and there are algorithms available that, with running time that is exponential in the number of nodes, treat precisely this problem for general demands [4], [5], [6]. The possibility of characterizing, in principle, the rate region of a combinatorial network coding problem will be a cornerstone for our investigations.
The combinatorial nature of the network coding problem creates a situation not unlike issues in complexity theory. In that field, since precise expressions of how difficult a problem is in absolute terms are difficult to derive, research is instead devoted to showing that one problem is essentially as difficult as another (even though precise characterizations are not available for either). Inspired by this analogy, we here take a similar approach, characterizing the relationship between arbitrary network capacity problems and the central combinatorial network coding capacity problem. This characterization is, in fact, all we need if we want to address separation issues in networks. It also opens the door to other questions, such as degree-of-freedom or high signal-to-noise ratio analyses, which reveal interesting insights.
It is interesting to note the variety of new tools generated in recent years for studying network capacities (e.g., [1], [7], [8], [4], [9], [10], [11], [5], [12], [3]). The reduction of a network information theoretic question to its combinatorial essence is also at the heart of some of these publications (see, e.g., [12]). Our approach is very different in terms of technique and also results, focusing not on the solution of network capacities when good outer bounds are available but on proving relationships between capacity regions even (or especially) when these capacity regions remain inaccessible using available analytical techniques. Nonetheless, we believe it to be no coincidence that the reduction of a problem to its combinatorial essence plays a central role in a variety of techniques for studying network capacities.
II. INTUITION AND SUMMARY OF RESULTS
The goal of finding capacities for general networks under general demands is currently out of reach. Establishing connections between the networking and information theoretic views of network communications simplifies the task by allowing us to identify both the stochastic and the combinatorial facets of the communication problem and to apply the appropriate tools to each. For example, consider a network of independent, memoryless, noisy point-to-point channels. To derive the multicast capacity of this network, Borade [7] and Song, Yeung, and Cai [13] first find the noisy network's cut-set outer bound and then demonstrate the achievability of that bound by applying a multicast network code over point-to-point channels made reliable using independent channel coding on each point-to-point channel. The resulting separation theorem establishes one tight connection between the two natural views of communication networks. This paper considers whether similar connections can be established for general demands. Relating the capacity of stochastic networks to the network coding capacity allows us to apply analytical and computational tools from the network coding literature (e.g., [4], [5], [6]) to bound the capacity of networks of stochastic channels.
While it is tempting to believe that the separation result derived for a single-source multicast demand in [7], [13] should also apply under general demands, it is clear that the proof technique does not. That is, first establishing a tight outer bound and then showing that that outer bound can be achieved by separate network and channel coding is not a feasible strategy for treating all possible demand types over all possible network topologies. The proof is further complicated by the observation that joint channel and network codes have a variety of clear advantages over separated codes even when separated strategies suffice to achieve the network capacity. Example 1 illustrates one such advantage, showing that operating channels above their respective capacities can improve communication reliability across the network as a whole. It remains to be determined whether operating channels above their capacities can also increase the achievable rate region for cases beyond the single-source multicast demand studied in [7], [13].

Fig. 1. (a) The network discussed in the comparison of separate network and channel coding to joint network and channel coding in Example 1. (b) A pair of $(2^{nR}, n)$ channel codes; each is used to reliably transmit $nR$ bits over $n$ uses of a single channel in the separated strategy. (c) A single $(2^{(2n)R}, 2n)$ channel code; this is used to reliably transmit information across $n$ uses of the pair of channels in the joint coding strategy. The joint coding strategy achieves twice the error exponent by operating each channel at roughly twice its capacity.
Example 1: Consider the problem of establishing a unicast connection over the two-node network shown in Figure 1(a). Node 1 transmits a pair of channel inputs $x^{(1)} = (x^{(1,1)}, x^{(1,2)})$. Node 2 receives a pair of channel outputs $y^{(2)} = (y^{(2,1)}, y^{(2,2)})$. The inputs and outputs are stochastically related through a pair of independent but identical channels; thus

$$p(y^{(2,1)}, y^{(2,2)} \mid x^{(1,1)}, x^{(1,2)}) = p(y^{(2,1)} \mid x^{(1,1)})\, p(y^{(2,2)} \mid x^{(1,2)})$$

for all $(x^{(1,1)}, x^{(1,2)}, y^{(2,1)}, y^{(2,2)}) \in \mathcal{X}^{(1,1)} \times \mathcal{X}^{(1,2)} \times \mathcal{Y}^{(2,1)} \times \mathcal{Y}^{(2,2)}$, while

$$p(y^{(2,1)} \mid x^{(1,1)}) = p(y^{(2,2)} \mid x^{(1,2)})$$

for all $(x^{(1,1)}, y^{(2,1)}) = (x^{(1,2)}, y^{(2,2)})$. For each rate $R < C = \max_{p(x)} I(X; Y)$ and each blocklength $n$, we compare two strategies for reliably communicating from node 1 to node 2. The first (see Figure 1(b)) is an optimal separate network and channel code that reliably communicates across each channel using an optimal $(2^{nR}, n)$ channel code. The second strategy (see Figure 1(c)) applies a single optimal $(2^{2nR}, 2n)$ channel code across the pair of channels, sending the first $n$ symbols of each codeword across the first channel and the remaining $n$ symbols across the second channel. The decoder observes the outputs of both channels and reliably decodes using its blocklength-$2n$ channel decoder. Using this approach, each channel has $2^{2nR}$ possible inputs. Thus when $R$ is close to $C$, this joint channel and network code operates each channel at roughly twice its capacity, making reliable transmission across each channel alone impossible. Since the joint code operates a $2n$-dimensional code over $n$ time steps, it achieves a better error exponent than the separated code.
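The error exponent advantage in Example 1 can be made numerical. The following sketch is an illustration we add here (the example itself gives no computation): it evaluates Gallager's random coding exponent $E_r(R) = \max_{0 \le \rho \le 1} [E_0(\rho) - \rho R]$ for a binary symmetric channel with uniform input and compares the separated strategy's per-channel bound of roughly $2^{-n E_r(R)}$ with the joint blocklength-$2n$ code's bound of roughly $2^{-2n E_r(R)}$; the crossover probability and blocklength are arbitrary choices for illustration.

```python
import math

def E0(rho, p):
    """Gallager's E0 for a BSC(p) with uniform input, in bits."""
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(p**s + (1.0 - p)**s)

def random_coding_exponent(R, p, steps=10_000):
    """E_r(R) = max over rho in [0,1] of E0(rho) - rho * R (grid search)."""
    return max(E0(k / steps, p) - (k / steps) * R for k in range(steps + 1))

p = 0.1                                                 # crossover probability
C = 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)   # BSC capacity, ~0.531
R = 0.9 * C                                             # rate just below capacity
n = 100
Er = random_coding_exponent(R, p)

# Separated strategy: each channel runs its own (2^{nR}, n) code, so each
# link's error probability is bounded by about 2^{-n E_r(R)}.
separate_bound = 2.0 ** (-n * Er)
# Joint strategy: one (2^{2nR}, 2n) code over the pair doubles the effective
# blocklength at the same per-use rate, giving roughly 2^{-2n E_r(R)}.
joint_bound = 2.0 ** (-2 * n * Er)

print(f"C = {C:.3f} bits, E_r({R:.3f}) = {Er:.4f}")
print(f"separate ~ {separate_bound:.3e}, joint ~ {joint_bound:.3e}")
```

Doubling the blocklength doubles the exponent, matching the factor-of-two claim in the caption of Figure 1.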
Our main result is roughly as follows. An arbitrary collection of demands can be met on a network of noisy, independent, memoryless point-to-point channels if and only if the same demands can be met on another network where each noisy channel is replaced by a noiseless bit pipe of the corresponding capacity. This result agrees with [7], [13] in the case of multicast demands.
Our proof introduces a new technique for bounding the capacity region of one network in terms of the capacity region of another network. Critically, this approach can be employed even when the capacity regions of both networks are unknown. We prove equivalence by first showing that the rate region for the noiseless bit-pipe network is a subset of that for the network of noisy channels and then showing that the rate region for the network of noiseless bit pipes is a superset of that for the network of noisy channels. In each case, we show the desired relationship by demonstrating that codes that can be operated reliably on one network can be operated with similar error probability on the other network. Codes for the bit-pipe network can be operated across the network of noisy channels using an independent channel code across each channel. Operating codes for the network of noisy channels across the bit-pipe network is more difficult since networks of noisy channels allow a far richer algorithmic behavior than networks of noiseless bit pipes. While it is known that a noiseless bit pipe of a given throughput can emulate any discrete memoryless channel of lesser capacity [14], applying this result seems to be difficult. Difficulties arise with continuous random variables, timing questions, and proving continuity of rate regions in the channel statistics. Worst of all, since we do not know which strategy achieves the network capacity, we must be able to emulate all of them. We therefore prove our main claim directly, without exploiting [14]. We use a source coding argument to show that we can emulate each noisy channel across the corresponding noiseless bit pipe to sufficient accuracy that any code designed for the network of noisy channels can be operated across the noiseless bit-pipe network with a similar error probability.

It is important to note that the given approach does not require knowing the rate region of either network or what the optimal codes look like, and it never answers the question of whether a particular rate point is in the rate region or not. The proofs only demonstrate that any rate point in the interior of the rate region for one network must also be in the interior of the rate region for the other network.
The given relationship between networks of noisy point-to-point channels and networks of noiseless bit pipes has a number of surprisingly powerful consequences. For example, it demonstrates that at its core characterizing network capacity is a combinatorial problem rather than a probabilistic one: Shannon's channel coding theorem tells us everything that we need to know about the noise in independent, point-to-point channels. Understanding the relationship between the two facets of network communications likewise lends insight into a variety of network information theoretic questions. For example, the classical result that feedback does not increase the capacity of a point-to-point channel now can be proven in two ways. The first is the classical information theoretic argument that shows that the channel has no information that is useful to the transmitter that the transmitter does not already know. The second observes that the min-cut between the transmitter and the receiver in the equivalent network is the same with or without feedback; therefore feedback does not increase capacity. While both proofs lead to the same well-known result, the latter is easier to generalize. For example, the following result is an immediate consequence of the given network equivalence and the well-known characterization of the multicast capacity in network coding [1]. Given any network of noisy, memoryless, point-to-point channels and any multicast demand, feedback increases the multicast capacity if and only if it increases the min-cut on the equivalent deterministic network. Likewise, since capacities are known for a variety of network coding problems [8], we can immediately determine whether feedback increases the achievable rate regions for a variety of other demand types (e.g., multiple-source multicast demands, single-source non-overlapping demands, and single-source non-overlapping plus multicast demands).
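The min-cut argument can be checked mechanically on an equivalent deterministic network. The sketch below is our illustration (the network topology and capacities are invented for the example, not taken from the paper): it computes the source-to-sink min-cut of a small bit-pipe network via Edmonds-Karp max flow, with and without a feedback link from the sink back to the source, and confirms that the feedback link leaves the min-cut, and hence the capacity, unchanged.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow, which equals the s-t min-cut."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return total
        # Find the bottleneck along the path, then augment.
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

# Bit-pipe network: node 0 -> {1, 2} -> 3, each link of capacity 1.
cap = [[0, 1, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
no_fb = max_flow([row[:] for row in cap], 0, 3)
cap[3][0] = 10    # add a high-rate feedback link from sink to source
with_fb = max_flow([row[:] for row in cap], 0, 3)
print(no_fb, with_fb)   # both min-cuts equal 2: feedback does not help here
```

Feedback links point against every source-to-sink cut, so they cannot enlarge its capacity; this is the deterministic-network face of the classical feedback result.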
III. THE SETUP
Our notation is similar to that of Cover and Thomas [15, Section 15.10]. A multiterminal network is defined by a vertex set $V = \{1, \ldots, m\}$ with associated random variables $X^{(v)} \in \mathcal{X}^{(v)}$, which are transmitted from node $v$, and $Y^{(v)} \in \mathcal{Y}^{(v)}$, which are received at node $v$. The alphabets $\mathcal{X}^{(v)}$ and $\mathcal{Y}^{(v)}$ may be discrete or continuous. They may also be vectors or scalars. For example, if node $v$ transmits information over $k$ binary symmetric channels, then $\mathcal{X}^{(v)} = \{0,1\}^k$. The network is assumed to be memoryless and characterized by a conditional probability distribution

$$p(y \mid x) = p(y^{(1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(m)}).$$

Note that for continuous random variables this assumption implies that we restrict our attention to cases where this conditional distribution (in this case a conditional probability density function) exists. A code of blocklength $n$ operates the network over $n$ time steps with the goal of communicating, for each distinct pair of nodes $u$ and $v$, a message $W^{(u \to v)} \in \mathcal{W}^{(u \to v)} \stackrel{\text{def}}{=} \{1, \ldots, 2^{nR^{(u \to v)}}\}$ from source node $u$ to sink node $v$. The messages $W^{(u \to v)}$ are independent and uniformly distributed by assumption (the proof also goes through unchanged if the same message is available at more than one node in the network). The vector of messages $W^{(u \to v)}$ is denoted by $W$. The constant $R^{(u \to v)}$ is called the rate of the transmission, and the vector of rates $R^{(u \to v)}$ is denoted by $R$. Since no message is required from a node $u$ to itself, $R^{(u \to u)} = 0$, and $R$ is treated as an $m(m-1)$-dimensional vector. By [16], for any network coding problem with generic demands, we can construct a multiple unicast problem such that the given demands can be met in the original network if and only if the unicast demands can be met in the constructed network. This argument generalizes immediately to the network model presented here. Therefore, there is no loss of generality (and considerable simplification of notation) in describing messages for all node pairs ($(u, v) \in \{1, \ldots, m\}^2$ such that $u \neq v$) rather than messages for all possible multicasts ($(u, B)$ with $u \in \{1, \ldots, m\}$ and $B \subseteq \{1, \ldots, m\} \setminus \{u\}$).

Fig. 2. An $m$-node network containing a channel $p(y^{(j,1)} \mid x^{(i,1)})$ from node $i$ to node $j$. Here $x^{(i)} = (x^{(i,1)}, x^{(i,2)})$, $y^{(j)} = (y^{(j,1)}, y^{(j,2)})$, and the distribution $p(y^{(1)}, \ldots, y^{(j-1)}, y^{(j,2)}, y^{(j+1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(i-1)}, x^{(i,2)}, x^{(i+1)}, \ldots, x^{(m)})$ on the remaining channel outputs given the remaining channel inputs is arbitrary.
We denote the random variable transmitted by node $v$ at time $t$ as $X_t^{(v)}$ and the full vector of time-$t$ transmissions by all nodes as $X_t$. We likewise denote the random variable received by node $v$ at time $t$ by $Y_t^{(v)}$ and the full vector of time-$t$ channel outputs by $Y_t$. A network is written as a triple

$$\left( \prod_{v=1}^{m} \mathcal{X}^{(v)},\; p(y \mid x),\; \prod_{v=1}^{m} \mathcal{Y}^{(v)} \right) \qquad (1)$$

with the additional constraint that random variable $X_t^{(v)}$ is a function of random variables $\{Y_1^{(v)}, \ldots, Y_{t-1}^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)}\}$ alone.
While this characterization is very general, it does not exploit any information about the network's structure. Later discussion treats networks made entirely from point-to-point channels, but we begin by considering a network that is completely arbitrary except for its inclusion of a point-to-point channel from node $i$ to node $j$, as shown in Figure 2, that is independent of the remainder of the network distribution.
Precisely, the conditional distribution on all channel outputs given all channel inputs factors as

$$p(y \mid x) = p(y^{(j,1)} \mid x^{(i,1)})\, p(y^{(1)}, \ldots, y^{(j-1)}, y^{(j,2)}, y^{(j+1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(i-1)}, x^{(i,2)}, x^{(i+1)}, \ldots, x^{(m)}),$$

where $\mathcal{X}^{(i)} = \mathcal{X}^{(i,1)} \times \mathcal{X}^{(i,2)}$, $\mathcal{Y}^{(j)} = \mathcal{Y}^{(j,1)} \times \mathcal{Y}^{(j,2)}$, $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} \mid x^{(i,1)}), \mathcal{Y}^{(j,1)})$ is the point-to-point channel, and

$$\left( \mathcal{X}^{(i,2)} \times \prod_{v \neq i} \mathcal{X}^{(v)},\; p(y^{(1)}, \ldots, y^{(j-1)}, y^{(j,2)}, y^{(j+1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(i-1)}, x^{(i,2)}, x^{(i+1)}, \ldots, x^{(m)}),\; \mathcal{Y}^{(j,2)} \times \prod_{v \neq j} \mathcal{Y}^{(v)} \right)$$

is the remainder of the network. As mentioned previously, for continuous-alphabet channels we restrict our attention to networks for which the above conditional probability density functions exist. We also restrict our attention to channels for which the input distribution that achieves the capacity of channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} \mid x^{(i,1)}), \mathcal{Y}^{(j,1)})$ in isolation has a probability density function. This includes most of the continuous channels studied in the literature.
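For a discrete channel, the capacity and capacity-achieving input distribution of the isolated point-to-point channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} \mid x^{(i,1)}), \mathcal{Y}^{(j,1)})$ can be computed numerically. The sketch below is our illustration (the paper does not prescribe an algorithm): it runs the standard Blahut-Arimoto iteration on a channel given as a transition matrix, here a binary symmetric channel chosen as an arbitrary test case.

```python
import math

def blahut_arimoto(W, iters=500):
    """Capacity (bits) and optimal input law for transition matrix W,
    where W[x][y] = p(y|x); standard Blahut-Arimoto iteration."""
    nx, ny = len(W), len(W[0])
    q = [1.0 / nx] * nx                       # current input distribution
    for _ in range(iters):
        # Output marginal under q: col[y] = sum_x q(x) W(y|x).
        col = [sum(q[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        r = []
        for x in range(nx):
            # r[x] = q(x) * exp(sum_y W(y|x) log(W(y|x)/col[y])),
            # computed in one log-domain sum.
            e = sum(W[x][y] * math.log(q[x] * W[x][y] / col[y])
                    for y in range(ny) if W[x][y] > 0)
            r.append(math.exp(e))
        Z = sum(r)
        q = [v / Z for v in r]                # normalize to the next q
    # Capacity = mutual information I(X;Y) under the converged q.
    col = [sum(q[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    C = sum(q[x] * W[x][y] * math.log2(W[x][y] / col[y])
            for x in range(nx) for y in range(ny) if W[x][y] > 0)
    return C, q

# Binary symmetric channel, crossover 0.1: capacity is 1 - h(0.1), about 0.531.
C, q = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(f"capacity = {C:.3f} bits, input law = {q}")
```

For symmetric channels the iteration simply confirms the uniform input; for asymmetric transition matrices the same routine finds the nonuniform optimizer.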
The notation $X = (X^{(i,1)}, X^{-(i,1)})$ and $Y = (Y^{(j,1)}, Y^{-(j,1)})$ is sometimes useful to succinctly distinguish the input and output of the point-to-point channel from the remainder of the network channel inputs and outputs. Using this notation, an $m$-node network containing an independent point-to-point channel from node $i$ to node $j$ is written as

$$N = \left( \mathcal{X}^{-(i,1)} \times \mathcal{X}^{(i,1)},\; p(y^{(j,1)} \mid x^{(i,1)})\, p(y^{-(j,1)} \mid x^{-(i,1)}),\; \mathcal{Y}^{-(j,1)} \times \mathcal{Y}^{(j,1)} \right). \qquad (2)$$

Figure 1(a) shows one example where the remainder of the network is itself a point-to-point channel. In this paper we want to investigate some information theoretic aspects of replacing the factor $p(y^{(j,1)} \mid x^{(i,1)})$.
Remark 1: The given definitions are sufficiently general to model a wide variety of memoryless channel types. For example, the distribution $p(y^{-(j,1)} \mid x^{-(i,1)})$ may model wireless components like broadcast, multiple access, and interference channels. If $\mathcal{X}^{(i,1)}$ and $\mathcal{Y}^{(j,1)}$ are vector alphabets, then the channel from node $i$ to node $j$ is a point-to-point MIMO channel. In some situations it is important to be able to embed the transmissions of various nodes in a schedule which may or may not depend on the messages to be sent and the symbols that were received in the network. It is straightforward to model such a situation in the above setup by including in the input and output alphabets symbols for the case when nothing was sent on a particular node input. In this way we can assume that at each time $t$ random variables $X_t^{(v)}$ and $Y_t^{(v)}$ are given.
Definition 1: Let a network

$$N \stackrel{\text{def}}{=} \left( \prod_{v=1}^{m} \mathcal{X}^{(v)},\; p(y \mid x),\; \prod_{v=1}^{m} \mathcal{Y}^{(v)} \right)$$

be given. A blocklength-$n$ solution $S(N)$ to this network is defined as a set of encoding and decoding functions

$$X_t^{(v)} : (\mathcal{Y}^{(v)})^{t-1} \times \prod_{v'=1}^{m} \mathcal{W}^{(v \to v')} \to \mathcal{X}^{(v)}$$

$$\hat{W}^{(u \to v)} : (\mathcal{Y}^{(v)})^{n} \times \prod_{v'=1}^{m} \mathcal{W}^{(v \to v')} \to \mathcal{W}^{(u \to v)}$$

mapping $(Y_1^{(v)}, \ldots, Y_{t-1}^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)})$ to $X_t^{(v)}$ for each $v \in V$ and $t \in \{1, \ldots, n\}$, and mapping $(Y_1^{(v)}, \ldots, Y_n^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)})$ to $\hat{W}^{(u \to v)}$ for each $u, v \in V$. The solution $S(N)$ is called a $(\lambda, R)$-solution, denoted $(\lambda, R)$-$S(N)$, if

$$\Pr\bigl(W^{(u \to v)} \neq \hat{W}^{(u \to v)}\bigr) < \lambda$$

for all source and sink pairs $u, v$ using the specified encoding and decoding functions.
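As a deliberately tiny instance of Definition 1, the sketch below (our illustration, not a construction from the paper) realizes a blocklength-5 solution on the simplest possible network: a single binary symmetric channel with crossover probability 0.1 from node 1 to node 2, a one-bit message so that $R^{(1 \to 2)} = 1/5$, a repetition encoder, and a majority-vote decoder. A Monte Carlo run estimates $\Pr(W \neq \hat{W})$, whose analytic value is about 0.0086, so this code is a $(\lambda, R)$-solution for any $\lambda > 0.01$.

```python
import random

random.seed(0)  # reproducible Monte Carlo estimate

p, n = 0.1, 5   # BSC crossover probability and blocklength

def encode(w):
    """Encoder X^{(1)}: repeat the message bit W^{(1->2)} n times."""
    return [w] * n

def channel(x):
    """n independent uses of the BSC p(y^{(2,1)}|x^{(1,1)}): flip w.p. p."""
    return [b ^ (random.random() < p) for b in x]

def decode(y):
    """Decoder for W-hat^{(1->2)}: majority vote over the n outputs."""
    return int(sum(y) > n / 2)

trials, errors = 100_000, 0
for _ in range(trials):
    w = random.randrange(2)                 # W uniform on the message set
    if decode(channel(encode(w))) != w:
        errors += 1

lam_hat = errors / trials                   # estimate of Pr(W != W-hat)
print(f"estimated error probability: {lam_hat:.4f}")
```

Definition 2 then asks for such solutions at every $\lambda > 0$ for all sufficiently large blocklengths, which the repetition family provides at this (low) rate.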
Definition 2: The rate region $R(N) \subset \mathbb{R}_{+}^{m(m-1)}$ of a network $N$ is the closure of all rate vectors $R$ such that for any $\lambda > 0$ and all $n$ sufficiently large, there exists a $(\lambda, R)$-$S(N)$ solution of blocklength $n$. We use $\mathrm{int}(R(N))$ to denote the interior of rate region $R(N)$.
The goal of this paper is not to give the capacity regions of networks with respect to various demands, which is an intractable problem. Rather, we wish to develop equivalence relationships between capacity regions of networks. Given the existence of a solution $(\lambda, R)$-$S(N)$ of some blocklength $n$ for a network $N$, we will try to derive statements about the existence of a solution $(\lambda', R')$-$S(N')$ of some blocklength $n'$ for a network $N'$.
To make this precise, consider a memoryless network $N$ containing an independent channel from node $i$ to node $j$. Then

$$p(y \mid x) = p(y^{(j,1)} \mid x^{(i,1)})\, p(y^{-(j,1)} \mid x^{-(i,1)}).$$
Let another network $N'$ be given with random variables $(\tilde{X}^{(i,1)}, \tilde{Y}^{(j,1)})$ replacing $(X^{(i,1)}, Y^{(j,1)})$ in $N$. We have replaced the point-to-point channel characterized by $p(y^{(j,1)} \mid x^{(i,1)})$ with another point-to-point channel characterized by $\tilde{p}(\tilde{y}^{(j,1)} \mid \tilde{x}^{(i,1)})$. When $I(X^{(i,1)}; Y^{(j,1)}) < I(\tilde{X}^{(i,1)}; \tilde{Y}^{(j,1)})$, we want to prove that the existence of a $(\lambda, R)$-$S(N)$ solution implies the existence of a $(\lambda', R')$-$S(N')$ solution, where $\lambda'$ can be made arbitrarily small if $\lambda$ can. Since node $j$ need not decode $Y^{(j,1)}$, channel capacity is not necessarily a relevant characterization of the channel's behavior. For example a Gaussian channel from