arXiv:1007.1033v2 [cs.IT] 3 Sep 2010
A Theory of Network Equivalence
Part I: Point-to-Point Channels

Ralf Koetter, Fellow, IEEE, Michelle Effros, Fellow, IEEE, and Muriel Médard, Fellow, IEEE
Abstract

A family of equivalence tools for bounding network capacities is introduced. Part I treats networks built from point-to-point channels. Part II generalizes the technique to networks containing wireless channels such as broadcast, multiple access, and interference channels. The main result of part I is roughly as follows. Given a network of noisy, independent, memoryless point-to-point channels, a collection of demands can be met on the given network if and only if it can be met on another network where each noisy channel is replaced by a noiseless bit pipe with throughput equal to the noisy channel capacity. This result was known previously for the case of a single-source multicast demand. The result given here treats general demands – including, for example, multiple unicast demands – and applies even when the achievable rate region for the corresponding demands is unknown in both the noisy network and its noiseless counterpart.

Keywords: Capacity, network coding, equivalence, component models
I. INTRODUCTION
The study of network communications has two natural facets reflecting different approaches to thinking about networks. On the one hand, networks are considered in the graph theoretic setup consisting of nodes connected by links. The links are typically not noisy channels but noise-free bit pipes that can be used error free up to a certain capacity. Typical concepts include information flows and routing issues. On
R. Koetter was with the Technical University of Munich. M. Effros is with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 USA e-mail: effros@caltech.edu. M. Médard is with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA e-mail: medard@mit.edu.

Submitted to IEEE Transactions on Information Theory April 14, 2010. DRAFT
the other hand, multiterminal information theory addresses information transmission through networks by studying noisy channels, or rather the stochastic relationship between input and output signals at devices in a network. Here the questions typically concern fundamental limits of communication. The capacity regions of broadcast, multiple access, and interference channels are all examples of questions that are addressed in the context of multiterminal information theory. These questions appear to have no obvious equivalent in networks consisting of error free bit pipes. Nevertheless, these two views of networking are two natural facets of the same problem, namely communication through networks. This paper explores the relationship between these two worlds.
Establishing viable bridges between these two areas proves to be surprisingly fertile. For example, questions about feedback in multiterminal systems are quite nicely expressed in networks of error free bit-pipes. Separation issues – in particular separation between network coding and channel coding – have natural answers, revealing many network capacity problems as combinatorial rather than statistical, even when communication occurs across networks of noisy channels. Most importantly, bounding general network capacities reduces to solving a central network coding problem described as follows: Given a network of error free rate-constrained bit pipes, is a given set of demands (e.g., a collection of unicast and multicast connections) simultaneously satisfiable or not? In certain situations, most notably a single multicast demand, this question is solved, and the answer is easily characterized [1]. Unfortunately, the general case is wide open and suspected to be hard. (Currently, NP hardness is only established for linear network coding [2].) While it appears that fully characterizing the combinatorial network coding problem is out of reach [3], moderate size networks can be solved quite efficiently, and there are algorithms available that, with running time that is exponential in the number of nodes, treat precisely this problem for general demands [4], [5], [6]. The possibility of characterizing, in principle, the rate region of a combinatorial network coding problem will be a cornerstone for our investigations.
The combinatorial nature of the network coding problem creates a situation not unlike issues in complexity theory. In that case, since precise expressions as to how difficult a problem is in absolute terms are difficult to derive, research is instead devoted to showing that one problem is essentially as difficult as another one (even though precise characterizations are not available for either). Inspired by this analogy, we here take a similar approach, characterizing the relationship between arbitrary network capacity problems and the central combinatorial network coding capacity problem. This characterization is, in fact, all we need if we want to address separation issues in networks. It also opens the door to other questions, such as degree-of-freedom or high signal to noise ratio analyses, which reveal interesting insights.
It is interesting to note the variety of new tools generated in recent years for studying network capacities (e.g., [1], [7], [8], [4], [9], [10], [11], [5], [12], [3]). The reduction of a network information theoretic question to its combinatorial essence is also at the heart of some of these publications (see, e.g., [12]). Our approach is very different in terms of technique and also results, focusing not on the solution of network capacities when good outer bounds are available but on proving relationships between capacity regions even (or especially) when these capacity regions remain inaccessible using available analytical techniques. Nonetheless, we believe it to be no coincidence that the reduction of a problem to its combinatorial essence plays a central role in a variety of techniques for studying network capacities.
II. INTUITION AND SUMMARY OF RESULTS
The goal of finding capacities for general networks under general demands is currently out of reach. Establishing connections between the networking and information theoretic views of network communications simplifies the task by allowing us to identify both the stochastic and the combinatorial facets of the communication problem and to apply the appropriate tools to each. For example, consider a network of independent, memoryless, noisy point-to-point channels. To derive the multicast capacity of this network, Borade [7] and Song, Yeung, and Cai [13] first find the noisy network's cut-set outer bound and then demonstrate the achievability of that bound by applying a multicast network code over point-to-point channels made reliable using independent channel coding on each point-to-point channel. The resulting separation theorem establishes one tight connection between the two natural views of communication networks. This paper considers whether similar connections can be established for general demands. Relating the capacity of stochastic networks to the network coding capacity allows us to apply analytical and computational tools from the network coding literature (e.g., [4], [5], [6]) to bound the capacity of networks of stochastic channels.
While it is tempting to believe that the separation result derived for a single-source multicast demand in [7], [13] should also apply under general demands, it is clear that the proof technique does not. That is, first establishing a tight outer bound and then showing that that outer bound can be achieved by separate network and channel coding is not a feasible strategy for treating all possible demand types over all possible network topologies. The proof is further complicated by the observation that joint channel and network codes have a variety of clear advantages over separated codes even when separated strategies suffice to achieve the network capacity. Example 1 illustrates one such advantage, showing that operating channels above their respective capacities can improve communication reliability across the network as
Fig. 1. (a) The network discussed in the comparison of separate network and channel coding to joint network and channel coding in Example 1. (b) A pair of $(2^{nR}, n)$ channel codes; each is used to reliably transmit $nR$ bits over $n$ uses of a single channel in the separated strategy. (c) A single $(2^{(2n)R}, 2n)$ channel code; this is used to reliably transmit information across $n$ uses of the pair of channels in the joint coding strategy. The joint coding strategy achieves twice the error exponent by operating each channel at roughly twice its capacity.
a whole. It remains to be determined whether operating channels above their capacities can also increase the achievable rate region for cases beyond the single-source multicast demand studied in [7], [13].
Example 1: Consider the problem of establishing a unicast connection over the two-node network shown in Figure 1(a). Node 1 transmits a pair of channel inputs $x^{(1)} = (x^{(1,1)}, x^{(1,2)})$. Node 2 receives a pair of channel outputs $y^{(2)} = (y^{(2,1)}, y^{(2,2)})$. The inputs and outputs are stochastically related through a pair of independent but identical channels, thus
$$p(y^{(2,1)}, y^{(2,2)} \mid x^{(1,1)}, x^{(1,2)}) = p(y^{(2,1)} \mid x^{(1,1)}) \, p(y^{(2,2)} \mid x^{(1,2)})$$
for all $(x^{(1,1)}, x^{(1,2)}, y^{(2,1)}, y^{(2,2)}) \in \mathcal{X}^{(1,1)} \times \mathcal{X}^{(1,2)} \times \mathcal{Y}^{(2,1)} \times \mathcal{Y}^{(2,2)}$, while
$$p(y^{(2,1)} \mid x^{(1,1)}) = p(y^{(2,2)} \mid x^{(1,2)})$$
for all $(x^{(1,1)}, y^{(2,1)}) = (x^{(1,2)}, y^{(2,2)})$. For each rate $R < C = \max_{p(x)} I(X; Y)$ and each blocklength $n$, we compare two strategies for reliably communicating from node 1 to node 2. The first (see Figure 1(b)) is an optimal separate network and channel code that reliably communicates across each channel using an optimal $(2^{nR}, n)$ channel code. The second strategy (see Figure 1(c)) applies a single optimal $(2^{2nR}, 2n)$ channel code across the pair of channels, sending the first $n$ symbols of each codeword across the first channel and the remaining $n$ symbols across the second channel. The decoder observes the outputs of both channels and reliably decodes using its blocklength-$2n$ channel decoder. Using this approach, each channel has $2^{2nR}$ possible inputs. Thus when $R$ is close to $C$, this joint channel and network code operates each channel at roughly twice its capacity – making reliable transmission across each channel alone impossible. Since the joint code operates a $2n$-dimensional code over $n$ time steps, it achieves a better error exponent than the separated code.
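As a quick numerical illustration of the capacity expression $C = \max_{p(x)} I(X;Y)$ used in Example 1 (an illustrative sketch only; the choice of a binary symmetric channel with crossover probability 0.1 is ours, not part of the example), one can maximize the mutual information over input distributions directly:

```python
import math

def mutual_information(px0, crossover):
    """I(X;Y) in bits for a BSC with input distribution (px0, 1 - px0)."""
    p = crossover
    # Joint distribution p(x, y) induced by the input distribution and channel.
    joint = {(0, 0): px0 * (1 - p), (0, 1): px0 * p,
             (1, 0): (1 - px0) * p, (1, 1): (1 - px0) * (1 - p)}
    py = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    px = {0: px0, 1: 1 - px0}
    return sum(q * math.log2(q / (px[x] * py[y]))
               for (x, y), q in joint.items() if q > 0)

def bsc_capacity(crossover, grid=10001):
    """Approximate C = max over p(x) of I(X;Y) by a grid search on p(x)."""
    return max(mutual_information(i / (grid - 1), crossover)
               for i in range(grid))

p = 0.1
print(bsc_capacity(p))  # ≈ 0.531, attained by the uniform input distribution
```

The grid search recovers the well-known closed form $C = 1 - H(p)$ for the binary symmetric channel; for $p = 0.1$ this gives roughly 0.531 bits per channel use.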
Our main result is roughly as follows. An arbitrary collection of demands can be met on a network of noisy, independent, memoryless point-to-point channels if and only if the same demands can be met on another network where each noisy channel is replaced by a noiseless bit pipe of the corresponding capacity. This result agrees with [7], [13] in the case of multicast demands.
Our proof introduces a new technique for bounding the capacity region of one network in terms of the capacity region of another network. Critically, this approach can be employed even when the capacity regions of both networks are unknown. We prove equivalence by first showing that the rate region for the noiseless bit-pipe network is a subset of that for the network of noisy channels and then showing that the rate region for the network of noiseless bit pipes is a superset of that for the network of noisy channels. In each case, we show the desired relationship by demonstrating that codes that can be operated reliably on one network can be operated with similar error probability on the other network. Codes for the bit-pipe network can be operated across the network of noisy channels using an independent channel code across each channel. Operating codes for the network of noisy channels across the bit-pipe network is more difficult since networks of noisy channels allow a far richer algorithmic behavior than networks of noiseless bit pipes. While it is known that a noiseless bit-pipe of a given throughput can emulate any discrete memoryless channel of lesser capacity [14], applying this result seems to be difficult. Difficulties arise with continuous random variables, timing questions, and proving continuity of rate regions in the channel statistics. Worst of all, since we do not know which strategy achieves the network capacity, we must be able to emulate all of them. We therefore prove our main claim directly, without exploiting [14]. We use a source coding argument to show that we can emulate each noisy channel across the corresponding noiseless bit pipe to sufficient accuracy that any code designed for the network of noisy channels can be operated across the noiseless bit-pipe network with a similar error probability.
It is important to note that the given approach does not require knowing the rate region of either network or what the optimal codes look like, and it never answers the question of whether a particular rate point is in the rate region or not. The proofs only demonstrate that any rate point in the interior of the rate region for one network must also be in the interior of the rate region for the other network.
The given relationship between networks of noisy point-to-point channels and networks of noiseless bit-pipes has a number of surprisingly powerful consequences. For example, it demonstrates that at its core characterizing network capacity is a combinatorial problem rather than a probabilistic one: Shannon's channel coding theorem tells us everything that we need to know about the noise in independent, point-to-point channels. Understanding the relationship between the two facets of network communications likewise lends insight into a variety of network information theoretic questions. For example, the classical result that feedback does not increase the capacity of a point-to-point channel now can be proven in two ways. The first is the classical information theoretic argument that shows that the channel has no information that is useful to the transmitter that the transmitter does not already know. The second observes that the min-cut between the transmitter and the receiver in the equivalent network is the same with or without feedback; therefore feedback does not increase capacity. While both proofs lead to the same well-known result, the latter is easier to generalize. For example, the following result is an immediate consequence of the given network equivalence and the well-known characterization of the multicast capacity in network coding [1]. Given any network of noisy, memoryless, point-to-point channels and any multicast demand, feedback increases the multicast capacity if and only if it increases the min-cut on the equivalent deterministic network. Likewise, since capacities are known for a variety of network coding problems [8], we can immediately determine whether feedback increases the achievable rate regions for a variety of other demand types (e.g., multiple-source multicast demands, single-source non-overlapping demands, and single-source non-overlapping plus multicast demands).
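The min-cut observation above is easy to check mechanically on small examples. The following sketch (the four-node topology and its capacities are invented for illustration) computes the source-to-sink max flow, which equals the min cut, with a basic Edmonds–Karp search, and confirms that adding a feedback edge from the receiver back toward the transmitter leaves the forward min-cut unchanged:

```python
from collections import deque, defaultdict

def max_flow(cap, s, t):
    """Edmonds-Karp max flow; cap is {u: {v: capacity}} on a directed graph."""
    # Residual capacities, with reverse entries initialized to 0.
    res = defaultdict(lambda: defaultdict(int))
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] += c
            res[v][u] += 0
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in res[u]:
                if v not in parent and res[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Recover the path, find its bottleneck, and augment.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
        flow += bottleneck

# A toy network of bit pipes: node 1 -> {2, 3} -> node 4.
forward = {1: {2: 3, 3: 2}, 2: {4: 2}, 3: {4: 4}}
with_feedback = {1: {2: 3, 3: 2}, 2: {4: 2}, 3: {4: 4}, 4: {1: 10}}
print(max_flow(forward, 1, 4))        # min cut = 4
print(max_flow(with_feedback, 1, 4))  # still 4: feedback adds no forward capacity
```

The feedback edge can carry information from node 4 back to node 1, but it crosses every source-to-sink cut in the wrong direction, so the min-cut and hence the deterministic-network capacity are unchanged.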
III. THE SETUP
Our notation is similar to that of Cover and Thomas [15, Section 15.10]. A multiterminal network is defined by a vertex set $V = \{1, \ldots, m\}$ with associated random variables $X^{(v)} \in \mathcal{X}^{(v)}$ which are transmitted from node $v$ and $Y^{(v)} \in \mathcal{Y}^{(v)}$ which are received at node $v$. The alphabets $\mathcal{X}^{(v)}$ and $\mathcal{Y}^{(v)}$ may be discrete or continuous. They may also be vectors or scalars. For example, if node $v$ transmits information over $k$ binary symmetric channels, then $\mathcal{X}^{(v)} = \{0,1\}^k$. The network is assumed to be memoryless and characterized by a conditional probability distribution
$$p(y \mid x) = p(y^{(1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(m)}).$$
Note that for continuous random variables this assumption implies that we restrict our attention to cases where this conditional distribution (in this case a conditional probability density function) exists. A code of blocklength $n$ operates the network over $n$ time steps with the goal of communicating, for each distinct pair of nodes $u$ and $v$, message $W^{(u \to v)} \in \mathcal{W}^{(u \to v)} \stackrel{\mathrm{def}}{=} \{1, \ldots, 2^{nR^{(u \to v)}}\}$ from source node $u$ to sink node $v$. The messages $W^{(u \to v)}$ are independent and uniformly distributed by assumption (the proof also goes through unchanged if the same message is available at more than one
Fig. 2. An $m$-node network containing a channel $p(y^{(j,1)} \mid x^{(i,1)})$ from node $i$ to node $j$. Here $x^{(i)} = (x^{(i,1)}, x^{(i,2)})$, $y^{(j)} = (y^{(j,1)}, y^{(j,2)})$, and the distribution $p(y^{(1)}, \ldots, y^{(j-1)}, y^{(j,2)}, y^{(j+1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(i-1)}, x^{(i,2)}, x^{(i+1)}, \ldots, x^{(m)})$ on the remaining channel outputs given the remaining channel inputs is arbitrary.
node in the network). The vector of messages $W^{(u \to v)}$ is denoted by $W$. The constant $R^{(u \to v)}$ is called the rate of the transmission, and the vector of rates $R^{(u \to v)}$ is denoted by $R$. Since no message is required from a node $u$ to itself, $R^{(u \to u)} = 0$, and $R$ is treated as an $m(m-1)$-dimensional vector. By [16], for any network coding problem with generic demands, we can construct a multiple unicast problem such that the given demands can be met in the original network if and only if the unicast demands can be met in the constructed network. This argument generalizes immediately to the network model presented here. Therefore, there is no loss of generality (and considerable simplification of notation) in describing messages for all node pairs ($(u, v) \in \{1, \ldots, m\}^2$ such that $u \neq v$) rather than messages for all possible multicasts ($(u, B)$ with $u \in \{1, \ldots, m\}$ and $B \subseteq \{1, \ldots, m\} \setminus u$).
We denote the random variable transmitted by node $v$ at time $t$ as $X_t^{(v)}$ and the full vector of time-$t$ transmissions by all nodes as $X_t$. We likewise denote the random variable received by node $v$ at time $t$ by $Y_t^{(v)}$ and the full vector of time-$t$ channel outputs by $Y_t$. A network is written as a triple
$$\left( \prod_{v=1}^{m} \mathcal{X}^{(v)}, \; p(y \mid x), \; \prod_{v=1}^{m} \mathcal{Y}^{(v)} \right) \qquad (1)$$
with the additional constraint that random variable $X_t^{(v)}$ is a function of random variables $\{Y_1^{(v)}, \ldots, Y_{t-1}^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)}\}$ alone.
While this characterization is very general, it does not exploit any information about the network's structure. Later discussion treats networks made entirely from point-to-point channels, but we begin by considering a network that is completely arbitrary except for its inclusion of a point-to-point channel from node $i$ to node $j$, as shown in Figure 2, that is independent of the remainder of the network distribution. Precisely, the conditional distribution on all channel outputs given all channel inputs factors as
$$p(y \mid x) = p(y^{(j,1)} \mid x^{(i,1)}) \, p(y^{(1)}, \ldots, y^{(j-1)}, y^{(j,2)}, y^{(j+1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(i-1)}, x^{(i,2)}, x^{(i+1)}, \ldots, x^{(m)}),$$
where $\mathcal{X}^{(i)} = \mathcal{X}^{(i,1)} \times \mathcal{X}^{(i,2)}$, $\mathcal{Y}^{(j)} = \mathcal{Y}^{(j,1)} \times \mathcal{Y}^{(j,2)}$, $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} \mid x^{(i,1)}), \mathcal{Y}^{(j,1)})$ is the point-to-point channel, and
$$\Bigl( \mathcal{X}^{(i,2)} \times \prod_{v \neq i} \mathcal{X}^{(v)}, \; p(y^{(1)}, \ldots, y^{(j-1)}, y^{(j,2)}, y^{(j+1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(i-1)}, x^{(i,2)}, x^{(i+1)}, \ldots, x^{(m)}), \; \mathcal{Y}^{(j,2)} \times \prod_{v \neq j} \mathcal{Y}^{(v)} \Bigr)$$
is the remainder of the network. As mentioned previously, for continuous-alphabet channels we restrict our attention to networks for which the above conditional probability density functions exist. We also restrict our attention to channels for which the input distribution that achieves the capacity of channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} \mid x^{(i,1)}), \mathcal{Y}^{(j,1)})$ in isolation has a probability density function. This includes most of the continuous channels studied in the literature.
The notation $X = (X^{(i,1)}, X^{-(i,1)})$ and $Y = (Y^{(j,1)}, Y^{-(j,1)})$ is sometimes useful to succinctly distinguish the input and output of the point-to-point channel from the remainder of the network channel inputs and outputs. Using this notation, an $m$-node network containing an independent point-to-point channel from node $i$ to node $j$ is written as
$$N = \left( \mathcal{X}^{(i,1)} \times \mathcal{X}^{-(i,1)}, \; p(y^{(j,1)} \mid x^{(i,1)}) \, p(y^{-(j,1)} \mid x^{-(i,1)}), \; \mathcal{Y}^{(j,1)} \times \mathcal{Y}^{-(j,1)} \right). \qquad (2)$$
Figure 1(a) shows one example where the remainder of the network is itself a point-to-point channel. In this paper we want to investigate some information theoretic aspects of replacing factor $p(y^{(j,1)} \mid x^{(i,1)})$.
Remark 1: The given definitions are sufficiently general to model a wide variety of memoryless channel types. For example, the distribution $p(y^{(j,1)} \mid x^{(i,1)})$ may model wireless components like broadcast, multiple access, and interference channels. If $\mathcal{X}^{(i,1)}$ and $\mathcal{Y}^{(j,1)}$ are vector alphabets, then the channel from node $i$ to node $j$ is a point-to-point MIMO channel. In some situations it is important to be able to embed the transmissions of various nodes in a schedule which may or may not depend on the messages to be sent and the symbols that were received in the network. It is straightforward to model such a situation in the above setup by including in the input and output alphabets symbols for the case when nothing was sent on a particular node input. In this way we can assume that at each time $t$ random variables $X_t^{(v)}$ and $Y_t^{(v)}$ are given.
Definition 1: Let a network
$$N \stackrel{\mathrm{def}}{=} \left( \prod_{v=1}^{m} \mathcal{X}^{(v)}, \; p(y \mid x), \; \prod_{v=1}^{m} \mathcal{Y}^{(v)} \right)$$
be given. A blocklength-$n$ solution $\mathcal{S}(N)$ to this network is defined as a set of encoding and decoding functions:
$$X_t^{(v)} : (\mathcal{Y}^{(v)})^{t-1} \times \prod_{u=1}^{m} \mathcal{W}^{(v \to u)} \to \mathcal{X}^{(v)}$$
$$\hat{W}^{(u \to v)} : (\mathcal{Y}^{(v)})^{n} \times \prod_{u'=1}^{m} \mathcal{W}^{(v \to u')} \to \mathcal{W}^{(u \to v)}$$
mapping $(Y_1^{(v)}, \ldots, Y_{t-1}^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)})$ to $X_t^{(v)}$ for each $v \in V$ and $t \in \{1, \ldots, n\}$ and mapping $(Y_1^{(v)}, \ldots, Y_n^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)})$ to $\hat{W}^{(u \to v)}$ for each $u, v \in V$. The solution $\mathcal{S}(N)$ is called a $(\lambda, R)$-solution, denoted $(\lambda, R)$-$\mathcal{S}(N)$, if $\Pr(W^{(u \to v)} \neq \hat{W}^{(u \to v)}) < \lambda$ for all source and sink pairs $u, v$ using the specified encoding and decoding functions.
Definition 2: The rate region $\mathcal{R}(N) \subseteq \mathbb{R}_+^{m(m-1)}$ of a network $N$ is the closure of all rate vectors $R$ such that for any $\lambda > 0$ and all $n$ sufficiently large, there exists a $(\lambda, R)$-$\mathcal{S}(N)$ solution of blocklength $n$. We use $\mathrm{int}(\mathcal{R}(N))$ to denote the interior of rate region $\mathcal{R}(N)$.
The goal of this paper is not to give the capacity regions of networks with respect to various demands, which is an intractable problem. Rather, we wish to develop equivalence relationships between capacity regions of networks. Given the existence of a solution $(\lambda, R)$-$\mathcal{S}(N)$ of some blocklength $n$ for a network $N$, we will try to imply statements for the existence of a solution $(\tilde{\lambda}, R)$-$\mathcal{S}(\tilde{N})$ of some blocklength $\tilde{n}$ for a network $\tilde{N}$.
To make this precise, consider a memoryless network $N$ containing an independent channel from node $i$ to node $j$. Then
$$p(y \mid x) = p(y^{(j,1)} \mid x^{(i,1)}) \, p(y^{-(j,1)} \mid x^{-(i,1)}).$$
Let another network $\tilde{N}$ be given with random variables $(\tilde{X}^{(i,1)}, \tilde{Y}^{(j,1)})$ replacing $(X^{(i,1)}, Y^{(j,1)})$ in $N$. We have replaced the point-to-point channel characterized by $p(y^{(j,1)} \mid x^{(i,1)})$ with another point-to-point channel characterized by $\tilde{p}(\tilde{y}^{(j,1)} \mid \tilde{x}^{(i,1)})$. When $I(X^{(i,1)}; Y^{(j,1)}) < I(\tilde{X}^{(i,1)}; \tilde{Y}^{(j,1)})$, we want to prove that the existence of a $(\lambda, R)$-$\mathcal{S}(N)$ solution implies the existence of a $(\tilde{\lambda}, R)$-$\mathcal{S}(\tilde{N})$ solution, where $\tilde{\lambda}$ can be made arbitrarily small if $\lambda$ can. Since node $j$ need not decode $Y^{(j,1)}$, channel capacity is not necessarily a relevant characterization of the channel's behavior. For example a Gaussian channel from
Fig. 3. The 3-fold stacked network $\underline{N}$ for the network $N$ in Figure 1(a).
$i$ to $j$ might contribute a real-valued estimation of the input random variable; a binary erasure channel that replaces it cannot immediately deliver the same functionality.
Our proof does not invent a coding scheme. Instead, we demonstrate a technique for operating any coding scheme for $N$ on the network $\tilde{N}$. Since there exists a coding scheme for $N$ that achieves any point in the interior of $\mathcal{R}(N)$, showing that we can operate all codes for $N$ on $\tilde{N}$ proves that $\mathcal{R}(N) \subseteq \mathcal{R}(\tilde{N})$. We do not know the form of an optimal code for $N$. Therefore, our method must work for all possible codes on $N$. For example, it must succeed even when the code for $N$ is time-varying. As a result, we cannot apply typicality arguments across time. We introduce instead a notion of stacking in order to exploit averaging arguments across multiple uses of the network rather than trying to apply such arguments across time.
IV. STACKED NETWORKS AND STACKED SOLUTIONS
An $N$-fold stacked network $\underline{N}_N$ is the network $N$ repeated $N$ times. That is, $\underline{N}_N$ has $N$ copies of each vertex $v \in \{1, \ldots, m\}$ and $N$ copies of the channel $p(y \mid x)$. Figure 3 shows the 3-fold stacked network for the network in Figure 1. We abuse notation by simplifying $\underline{N}_N$ to $\underline{N}$ throughout, specifying the number of layers in the stack ($N$) by context. Eventually, $N$ will be allowed to grow without bound in order to exploit asymptotic typicality arguments. The $N$-fold stacked network is used to deliver $N$ independent messages $W^{(u \to v)}$ from each transmitter node $u$ to each receiver node $v$. All copies of a node can, at each time $t$, collaborate in determining their channel inputs $\underline{X}_t^{(v)}$. Likewise, all copies of a node $v$ can collaborate in reconstructing messages $\underline{W}^{(u \to v)}$. This potential for collaboration across the layers of the stack seems to make the $N$-fold stacked network $\underline{N}$ considerably more powerful than the network $N$ from which it was derived. However, the increase in the number of degrees of freedom in a stacked network solution is accompanied by an increased burden in the reconstruction constraint. A code for the stacked network is successful only if it decodes without error in every layer. This becomes difficult as $N$ grows without bound.
Since the $N$-fold stacked network contains $N$ copies of $N$, it does not meet the definition of a network (for example, its vertex set is a multiset and not a set$^1$). Thus new definitions are required. We carry over notation and variable definitions from the network $N$ to the stacked network $\underline{N}$ by underlining the variable names. So for any distinct $u, v \in \{1, \ldots, m\}$, $\underline{W}^{(u \to v)} \in \underline{\mathcal{W}}^{(u \to v)} \stackrel{\mathrm{def}}{=} (\mathcal{W}^{(u \to v)})^N$ is the $N$-dimensional vector of messages that the $N$ copies of node $u$ send to the corresponding copies of node $v$, and $\underline{X}_t^{(v)} \in \underline{\mathcal{X}}^{(v)} \stackrel{\mathrm{def}}{=} (\mathcal{X}^{(v)})^N$ and $\underline{Y}_t^{(v)} \in \underline{\mathcal{Y}}^{(v)} \stackrel{\mathrm{def}}{=} (\mathcal{Y}^{(v)})^N$ are the $N$-dimensional vectors of channel inputs and channel outputs, respectively, for node $v$ at time $t$. The variables in the $\ell$-th layer of the stack are denoted by an argument $\ell$; for example $\underline{W}^{(u \to v)}(\ell)$ is the message from node $u$ to node $v$ in the $\ell$-th layer of the stack and $\underline{X}_t^{(v)}(\ell)$ is the layer-$\ell$ channel input from node $v$ at time $t$. Since $\underline{W}^{(u \to v)}$ is an $N$-dimensional vector of messages, when $W^{(u \to v)} \in \mathcal{W}^{(u \to v)} \stackrel{\mathrm{def}}{=} \{1, \ldots, 2^{nR^{(u \to v)}}\}$ in $N$, $\underline{W}^{(u \to v)} \in \underline{\mathcal{W}}^{(u \to v)} \stackrel{\mathrm{def}}{=} \{1, \ldots, 2^{nR^{(u \to v)}}\}^N$ in $\underline{N}$. We therefore define the rate $R^{(u \to v)}$ for a stacked network to be $(\log |\underline{\mathcal{W}}^{(u \to v)}|)/(nN)$; this normalization makes rate regions in a network and its corresponding stacked network comparable.
Definition 3: Let a network
$$N \stackrel{\mathrm{def}}{=} \left( \prod_{v=1}^{m} \mathcal{X}^{(v)}, \; p(y \mid x), \; \prod_{v=1}^{m} \mathcal{Y}^{(v)} \right)$$
be given. Let $\underline{N}$ be the $N$-fold stacked network for $N$. A blocklength-$n$ solution $\underline{\mathcal{S}}(\underline{N})$ to this network is defined as a set of encoding and decoding functions
$$\underline{X}_t^{(v)} : (\underline{\mathcal{Y}}^{(v)})^{t-1} \times \prod_{u=1}^{m} \underline{\mathcal{W}}^{(v \to u)} \to \underline{\mathcal{X}}^{(v)}$$
$$\underline{\hat{W}}^{(u \to v)} : (\underline{\mathcal{Y}}^{(v)})^{n} \times \prod_{u'=1}^{m} \underline{\mathcal{W}}^{(v \to u')} \to \underline{\mathcal{W}}^{(u \to v)}$$
mapping $(\underline{Y}_1^{(v)}, \ldots, \underline{Y}_{t-1}^{(v)}, \underline{W}^{(v \to 1)}, \ldots, \underline{W}^{(v \to m)})$ to $\underline{X}_t^{(v)}$ for each $t \in \{1, \ldots, n\}$ and $v \in \{1, \ldots, m\}$ and mapping $(\underline{Y}_1^{(v)}, \ldots, \underline{Y}_n^{(v)}, \underline{W}^{(v \to 1)}, \ldots, \underline{W}^{(v \to m)})$ to $\underline{\hat{W}}^{(u \to v)}$ for each $u, v \in \{1, \ldots, m\}$. The solution $\underline{\mathcal{S}}(\underline{N})$ is called a $(\lambda, R)$-solution for $\underline{N}$, denoted $(\lambda, R)$-$\underline{\mathcal{S}}(\underline{N})$, if the encoding and decoding functions imply $\Pr(\underline{W}^{(u \to v)} \neq \underline{\hat{W}}^{(u \to v)}) < \lambda$ for all source and sink pairs $u, v$.
$^1$The vertex set is a multiset since it contains $N$ copies of each element of $\{1, \ldots, m\}$.
Definition 4: The rate region $\mathcal{R}(\underline{N}) \subseteq \mathbb{R}_+^{m(m-1)}$ of a stacked network $\underline{N}$ is the closure of all rate vectors $R$ such that a $(\lambda, R)$-$\underline{\mathcal{S}}(\underline{N})$ solution exists for any $\lambda > 0$ and all $N$ sufficiently large.
Theorem 1, below, shows that the rate regions for a network $N$ and its corresponding stacked network $\underline{N}$ are identical. That result further demonstrates that the error probability for the stacked network can be made to decay exponentially in the number of layers $N$. The proof builds a blocklength-$n$ solution for network $\underline{N}$ by first using a channel code to map each message $\underline{W}^{(u \to v)} \in \underline{\mathcal{W}}^{(u \to v)} \stackrel{\mathrm{def}}{=} \{1, \ldots, 2^{nR^{(u \to v)}}\}^N$ to a message in alphabet $\tilde{\mathcal{W}}^{(u \to v)} \stackrel{\mathrm{def}}{=} \{1, \ldots, 2^{n\tilde{R}^{(u \to v)}}\}^N$ for some $\tilde{R}^{(u \to v)} > R^{(u \to v)}$ and then applying the same blocklength-$n$ solution for network $N$ independently in each layer of the stack. We call such a solution a stacked solution.
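To see why a code across the layers can drive the error probability down exponentially in $N$, suppose (an illustrative model with invented numbers, not the construction used in the proof) that each layer's blocklength-$n$ solution fails independently with probability $q$ and that the code mapping messages into the larger alphabet can correct any pattern of at most a fraction $\tau > q$ of layer failures. The residual error probability is then a binomial tail, which the following sketch evaluates exactly:

```python
from math import comb

def stack_error_prob(N, q, tau):
    """P(more than tau*N of N independent layers fail), each failing w.p. q."""
    thresh = int(tau * N)
    return sum(comb(N, k) * q**k * (1 - q)**(N - k)
               for k in range(thresh + 1, N + 1))

q, tau = 0.05, 0.2   # per-layer failure rate and correctable fraction (invented)
probs = [stack_error_prob(N, q, tau) for N in (10, 50, 100, 200)]
print(probs)  # shrinks rapidly as the number of layers N grows
assert all(a > b for a, b in zip(probs, probs[1:]))
```

Since $\tau > q$, the tail probability decays exponentially in $N$ by a standard Chernoff bound, which is the qualitative behavior claimed for the stacked solutions of Theorem 1.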
Definition 5: Let a network $N \stackrel{\mathrm{def}}{=} \left( \prod_{v=1}^{m} \mathcal{X}^{(v)}, \; p(y \mid x), \; \prod_{v=1}^{m} \mathcal{Y}^{(v)} \right)$ be given. Let $\underline{N}$ be the $N$-fold stacked network for $N$. A blocklength-$n$ stacked solution $\underline{\mathcal{S}}(\underline{N})$ to network $\underline{N}$ is defined as a set of mappings
$$\tilde{W}^{(u \to v)} : \underline{\mathcal{W}}^{(u \to v)} \to \tilde{\mathcal{W}}^{(u \to v)}$$
$$X_t^{(v)} : (\mathcal{Y}^{(v)})^{t-1} \times \prod_{u=1}^{m} \tilde{\mathcal{W}}^{(v \to u)} \to \mathcal{X}^{(v)}$$
$$\hat{\tilde{W}}^{(u \to v)} : (\mathcal{Y}^{(v)})^{n} \times \prod_{u'=1}^{m} \tilde{\mathcal{W}}^{(v \to u')} \to \tilde{\mathcal{W}}^{(u \to v)}$$
$$\hat{W}^{(u \to v)} : \tilde{\mathcal{W}}^{(u \to v)} \to \underline{\mathcal{W}}^{(u \to v)}$$
such that
$$\tilde{W}^{(u \to v)} = \tilde{W}^{(u \to v)}(\underline{W}^{(u \to v)})$$
$$\underline{X}_t^{(v)}(\ell) = X_t^{(v)}\bigl( \underline{Y}_1^{(v)}(\ell), \ldots, \underline{Y}_{t-1}^{(v)}(\ell), \tilde{W}^{(v \to 1)}(\ell), \ldots, \tilde{W}^{(v \to m)}(\ell) \bigr)$$
$$\hat{\tilde{W}}^{(u \to v)}(\ell) = \hat{\tilde{W}}^{(u \to v)}\bigl( \underline{Y}_1^{(v)}(\ell), \ldots, \underline{Y}_n^{(v)}(\ell), \tilde{W}^{(v \to 1)}(\ell), \ldots, \tilde{W}^{(v \to m)}(\ell) \bigr)$$
$$\underline{\hat{W}}^{(u \to v)} = \hat{W}^{(u \to v)}\bigl( \hat{\tilde{W}}^{(u \to v)} \bigr)$$
for each $u, v \in \{1, \ldots, m\}$, $t \in \{1, \ldots, n\}$, and $\ell \in \{1, \ldots, N\}$. The solution $\underline{\mathcal{S}}(\underline{N})$ is called a stacked $(\lambda, R)$-solution, denoted $(\lambda, R)$-$\underline{\mathcal{S}}(\underline{N})$, if the specified mappings imply $\Pr(\underline{W}^{(u \to v)} \neq \underline{\hat{W}}^{(u \to v)}) < \lambda$ for all source and sink pairs $u, v$.
Theorem 1: The rate regions $\mathcal{R}(N)$ and $\mathcal{R}(\underline{N})$ are identical, and for each $R \in \mathrm{int}(\mathcal{R}(\underline{N}))$, there exists a sequence of $(2^{-N\delta}, R)$-$\underline{\mathcal{S}}(\underline{N})$ stacked solutions for $\underline{N}$ for some $\delta > 0$.
Fig. 4. A blocklength-$n$ solution $\underline{\mathcal{S}}(\underline{N})$ for network $\underline{N}$ can be operated with the same error probability over $nN$ time steps in $N$. (a) Inputs and outputs at time $t$ of the $N$ copies of node $v$ in $\underline{N}$. (b) Inputs and outputs of node $v$ at times $(t-1)N+1, \ldots, tN$ in $N$. Vectors $(X_{(t-1)N+1}^{(v)}, \ldots, X_{tN}^{(v)})^T$ and $(Y_{(t-1)N+1}^{(v)}, \ldots, Y_{tN}^{(v)})^T$ in $N$ play the same role as vectors $\underline{X}_t^{(v)}$ and $\underline{Y}_t^{(v)}$ in $\underline{N}$.
Proof. We first show that $\mathcal{R}(\underline{\mathcal{N}}) \subseteq \mathcal{R}(\mathcal{N})$. Perhaps surprisingly, this turns out to be the easier part of the proof. Let $R \in \operatorname{int}(\mathcal{R}(\underline{\mathcal{N}}))$. Then for any $\lambda \in (0, 1]$, there exists a $(\lambda, R)$-$S(\underline{\mathcal{N}})$ solution to the stacked network $\underline{\mathcal{N}}$. Let $n$ be the blocklength of $S(\underline{\mathcal{N}})$. The argument that follows uses $S(\underline{\mathcal{N}})$ to build a blocklength-$nN$ $(\lambda, R)$-$S(\mathcal{N})$ solution for network $\mathcal{N}$. Roughly, the operations performed at time $t$ by the $N$ copies of node $v$ in $S(\underline{\mathcal{N}})$ are performed by the single copy of node $v$ at times $(t-1)N + 1, \ldots, tN$ in $S(\mathcal{N})$, as shown in Figure 4. This gives the desired result since the error probability and rate of $S(\mathcal{N})$ on $\mathcal{N}$ equal the error probability and rate of $S(\underline{\mathcal{N}})$ on $\underline{\mathcal{N}}$.
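As an aside (not part of the original argument), the time-sharing schedule of Figure 4 amounts to a simple index map: the transmission of layer $\ell$ at stack time $t$ is carried by the single network in time slot $(t-1)N + \ell$. A minimal sketch:

```python
def slot(t, layer, N):
    """Time slot in the single network that carries stack time t of the
    given layer (1-based indices), per the interleaving of Figure 4."""
    return (t - 1) * N + layer

# With N = 3 layers, stack time t = 2 occupies slots 4, 5, 6.
print([slot(2, layer, 3) for layer in (1, 2, 3)])  # -> [4, 5, 6]
```

Over $t = 1, \ldots, n$ the slots $1, \ldots, nN$ are each used exactly once, which is why the rate and error probability carry over unchanged.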
To make the argument formal, for each $(u, v)$, let
$$f^{(u\to v)}: \{1, \ldots, 2^{NnR^{(u\to v)}}\} \to \{1, \ldots, 2^{nR^{(u\to v)}}\}^N$$
be the natural one-to-one mapping from a single sequence of $NnR^{(u\to v)}$ bits to $N$ consecutive subsequences each of $nR^{(u\to v)}$ bits. Let $g^{(u\to v)}$ be the inverse of $f^{(u\to v)}$. We use $f^{(u\to v)}$ to map messages from the message alphabet of the rate-$R^{(u\to v)}$, blocklength-$Nn$ code $S(\mathcal{N})$ to the message alphabet for the $N$-layer, rate-$R^{(u\to v)}$, blocklength-$n$ code $S(\underline{\mathcal{N}})$. The mapping is one-to-one since in each scenario the total number of bits transmitted from node $u$ to node $v$ is $NnR^{(u\to v)}$.
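For concreteness, here is a small sketch (illustrative only; integer message indices stand in for bit strings) of the bijection $f^{(u\to v)}$ and its inverse $g^{(u\to v)}$, which regroup one $NnR^{(u\to v)}$-bit message into $N$ sub-messages of $nR^{(u\to v)}$ bits each:

```python
def f(w, N, bits_per_layer):
    """Split one (N * bits_per_layer)-bit message index into N
    consecutive bits_per_layer-bit sub-messages, most significant first."""
    base = 1 << bits_per_layer
    digits = []
    for _ in range(N):
        digits.append(w % base)
        w //= base
    return tuple(reversed(digits))

def g(parts, bits_per_layer):
    """Inverse of f: reassemble the sub-messages into a single index."""
    w = 0
    for d in parts:
        w = (w << bits_per_layer) | d
    return w

w = 0b101101                # 6 bits: N = 3 layers of 2 bits each
parts = f(w, 3, 2)          # (0b10, 0b11, 0b01)
assert g(parts, 2) == w     # g inverts f, so the map is one-to-one
```

The bijection simply reads the message as $N$ base-$2^{nR^{(u\to v)}}$ digits, matching the bit count $NnR^{(u\to v)}$ on both sides.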
For each $t \in \{1, \ldots, n\}$, let
\begin{align*}
\underline{X}^{(v)}(t) &= \big(X^{(v)}_{(t-1)N+1}, \ldots, X^{(v)}_{tN}\big)^T \\
\underline{Y}^{(v)}(t) &= \big(Y^{(v)}_{(t-1)N+1}, \ldots, Y^{(v)}_{tN}\big)^T
\end{align*}
denote vectors containing the channel inputs and outputs at node $v$ for $N$ consecutive time steps beginning at time $(t-1)N + 1$. This is a simple blocking of symbols into vectors, with superscript $T$ denoting vector transpose. We define the solution $S(\mathcal{N})$ as
\begin{align*}
\underline{X}^{(v)}(t) &= \underline{X}_t^{(v)}\big(\underline{Y}^{(v)}(1), \ldots, \underline{Y}^{(v)}(t-1), f^{(v\to 1)}(W^{(v\to 1)}), \ldots, f^{(v\to m)}(W^{(v\to m)})\big) \\
\hat{W}^{(u\to v)} &= g^{(u\to v)}\Big(\hat{\underline{W}}^{(u\to v)}\big(\underline{Y}^{(v)}(1), \ldots, \underline{Y}^{(v)}(n), f^{(v\to 1)}(W^{(v\to 1)}), \ldots, f^{(v\to m)}(W^{(v\to m)})\big)\Big).
\end{align*}
Since $S(\mathcal{N})$ satisfies the causality constraints and applies precisely the mappings from $S(\underline{\mathcal{N}})$ to $\mathcal{N}$, the solution $S(\mathcal{N})$ achieves the same rate and error probability on $\mathcal{N}$ as the solution $S(\underline{\mathcal{N}})$ achieves on $\underline{\mathcal{N}}$.
For the converse, the job is more difficult. A solution $(\lambda, R)$-$S(\mathcal{N})$ needs to achieve an error probability of at most $\lambda$ for every $(u, v)$ pair in a network. A solution $(\lambda, R)$-$S(\underline{\mathcal{N}})$ also needs to achieve an error probability of at most $\lambda$ for each $(u, v)$, but here the error event is a union over errors in each of the $N$ layers, with $N$ growing arbitrarily large.
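The difficulty can be made concrete with a back-of-the-envelope calculation (illustrative numbers, not from the original text): if each of the $N$ layers independently errs with probability $\lambda$, then some layer errs with probability $1 - (1-\lambda)^N$, which tends to 1 as $N$ grows. This is why the stacked solution cannot simply repeat a $(\lambda, R)$ code in each layer but must also code across layers.

```python
# Probability that at least one of N independent layers errs, when each
# layer errs with probability lam (illustrative value).
lam = 0.01
for N in (1, 10, 100, 1000):
    p_any_error = 1 - (1 - lam) ** N
    print(N, p_any_error)
```

Even a modest per-layer error probability of 0.01 leaves essentially no error-free stack once $N$ reaches the thousands.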
Let $R \in \operatorname{int}(\mathcal{R}(\mathcal{N}))$, and fix some $\tilde{R} \in \operatorname{int}(\mathcal{R}(\mathcal{N}))$ for which $\tilde{R}^{(u\to v)} > R^{(u\to v)}$ for all $(u, v)$. We use a solution of rate $\tilde{R}$ on $\mathcal{N}$ to build a stacked solution of rate $R$ on $\underline{\mathcal{N}}$. Set
$$\rho = \min_{u,v}\big(\tilde{R}^{(u\to v)} - R^{(u\to v)}\big).$$
For any $p \in [0, 1]$, let $h(p) \overset{\text{def}}{=} -p \log p - (1-p)\log(1-p)$ be the binary entropy function. For reasons that will become clear later, we wish to find constants $\lambda$ and $n$ satisfying
$$\Big(\max_{u,v} \tilde{R}^{(u\to v)}\Big)\lambda + h(\lambda)/n < \rho$$
such that there exists a $(\lambda, \tilde{R})$-$S(\mathcal{N})$ solution of blocklength $n$. This is possible because $\tilde{R} \in \operatorname{int}(\mathcal{R}(\mathcal{N}))$ implies that for any $\lambda \in (0, 1]$ and all $n$ sufficiently large there exists a blocklength-$n$ $(\lambda, \tilde{R})$-$S(\mathcal{N})$ solution. We therefore meet the desired constraint by choosing $\lambda$ to be small (say $\lambda = \rho/(2 \max_{u,v} \tilde{R}^{(u\to v)})$) and then choosing $n$ sufficiently large. The chosen $n$ will be the blocklength of our code for all values of $N$.
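Such a pair $(\lambda, n)$ exists because $h(\lambda) \to 0$ as $\lambda \to 0$ and $h(\lambda)/n$ vanishes as $n$ grows. A quick numerical check of this two-step choice (with illustrative values for $\rho$ and $\max_{u,v} \tilde{R}^{(u\to v)}$, not taken from the original text) might read:

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

rho, R_max = 0.1, 2.0        # illustrative values only
lam = rho / (2 * R_max)      # step 1: choose lam so that R_max * lam = rho / 2
n = 1
while R_max * lam + h(lam) / n >= rho:
    n += 1                   # step 2: grow n until the constraint holds
print(lam, n)                # the chosen lam and the smallest adequate n
```

The first choice pins the linear term at $\rho/2$; any sufficiently large $n$ then drives the $h(\lambda)/n$ term below the remaining $\rho/2$.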
Fix a $(\lambda, \tilde{R})$-$S(\mathcal{N})$ solution of blocklength $n$. For the $(\lambda, \tilde{R})$-$S(\mathcal{N})$ solution, denote the message set by $\tilde{\mathcal{W}}^{(u\to v)} \overset{\text{def}}{=} \{1, \ldots, 2^{n\tilde{R}^{(u\to v)}}\}$, and let $\check{W}^{(u\to v)}$ and $\hat{\check{W}}^{(u\to v)}$ be the message and its reconstruction, respectively, using the fixed $(\lambda, \tilde{R})$-$S(\mathcal{N})$ solution. We use $S(\mathcal{N})$ as the solution applied independently in each layer of our stacked solution.