On the Benefit of Cooperation in Relay Networks
Oliver Kosut, Michelle Effros, Michael Langberg
Abstract—This work addresses the cooperation facilitator (CF) model, in which network nodes coordinate through a rate-limited communication device. For independent multiple-access channel (MAC) encoders, the CF model is known to show significant rate benefits, even when the rate of cooperation is negligible. Specifically, the benefit in MAC sum-rate, as a function of the cooperation rate $C_{CF}$, sometimes has an infinite slope at $C_{CF} = 0$. This work studies the question of whether cooperation through a CF can yield similar infinite-slope benefits when applied to internal network encoders in which dependence among MAC transmitters can be established without the help of the CF. Towards this end, this work studies the CF model when applied to relay nodes of a single-source, single-terminal, diamond network consisting of a broadcast channel followed by a MAC. In the relay channel with orthogonal receiver components, careful generalization of the partial-decode-forward/compress-forward lower bound to the CF model yields sufficient conditions for an infinite-slope benefit. Additional results include derivation of a family of diamond networks for which the infinite-slope rate benefit derives directly from the properties of the corresponding MAC component when studied in isolation.
I. INTRODUCTION
The information theory and communication literatures ap-
proach the goal of improving network communication perfor-
mance in a variety of ways. While some studies investigate
how to get the best possible performance out of existing
networks, others seek better designs for future networks. In
practice, the way that networks improve over time is some-
where in between — a combination of adding new resources
and making better use of what is already there. We here seek
new tools for guiding that process, focusing on the questions
of whether and where small changes to an existing network
can have a big impact on network capacity.
One example of a network in which incremental network
modifications can achieve radical network improvement, in-
troduced in [1], employs the multiple-access channel (MAC)
and a node called a cooperation facilitator (CF). In practice, the
CF is any communicating device that can receive information
from multiple transmitters. In any MAC for which dependent
channel inputs from the transmitters would yield a higher
mutual information between the MAC’s inputs and output than
is achievable with the independent channel inputs employed in
calculating the MAC capacity, adding a small communication
link from the CF to either or both of the transmitters yields
O. Kosut is with the School of Electrical, Computer and Energy Engineering at Arizona State University. Email: okosut@asu.edu
M. Effros is with the Department of Electrical Engineering at the California Institute of Technology. Email: effros@caltech.edu
M. Langberg is with the Department of Electrical Engineering at the University at Buffalo (State University of New York). Email: mikel@buffalo.edu
This work is supported in part by NSF grants CCF-1817241, CCF-1908725, and CCF-1909451.
a disproportionately large capacity improvement [1]. Specifically, the curve describing the improvement in MAC sum-capacity as a function of the capacity $C_{CF}$ of the cooperation-enabling CF output link has slope infinity at $C_{CF} = 0$ [1, Theorem 3]. In some cases, even a single bit — not rate 1, but a single bit no matter what the blocklength — suffices to change the network capacity [2], [3].
Since the infinite-slope improvement in the MAC-capacity
results from creating dependence where none could otherwise
be observed, it is tempting to believe that the infinite-slope
phenomenon cannot occur either in cases where dependence
is already attainable or where dependence is not critical to
attaining the best possible performance. In this paper, we
explore these two intuitions — seeking to understand whether
incremental changes can achieve disproportionate channel
benefits in these scenarios.
Toward this end, we investigate a single coding framework
where both scenarios can arise. We pose this framework as a
diamond network in which a single transmitter communicates
to a collection of relays, and the relays work independently to
transmit information to a shared receiver. Since the communi-
cation goal in the diamond network is to transmit information
from a single transmitter at the start of the diamond network
to a single receiver at its end, dependence at the relays may
be available naturally; we investigate whether this availability
precludes the possibility of incremental change with dispropor-
tionate impact. When the links from relays to the receiver are
independent, point-to-point channels, the resulting degenerate
MAC fails to meet the prior condition specifying that input
dependence should increase sum capacity; we investigate
whether this failure precludes the desired small cost, large
benefit tradeoff to incremental network modifications.
The rest of this paper is organized as follows. In Section II,
we set up the problem of the diamond relay network with $N$ relay nodes and a cooperation facilitator (Fig. 1), which allows
us to pose our main question about the power of cooperation
in a relay network. Our results focus on two special cases
of this network. The first, covered in Section III, is the relay
channel with orthogonal receiver components (Fig. 2). Here,
we present an achievability bound for the CF problem, as well
as sufficient conditions for the infinite-slope phenomenon to
occur. In Section IV, we explore a 3-relay example (Fig. 3)
that allows us to exploit the results of [1] on the MAC to
demonstrate the infinite-slope phenomenon in a larger network
with only one source.
II. PROBLEM SETUP

Notation: For any integer $k$, $[k]$ denotes the set $\{1, 2, \ldots, k\}$. Capital letters (e.g., $X$) denote random variables, lower-case letters (e.g., $x$) denote realizations of the corresponding
Fig. 1. General diamond relay network with $N$ nodes and a cooperation facilitator (CF).
Fig. 2. Relay channel with orthogonal receiver components and a cooperation facilitator, a special case of the general diamond relay network.
variable, and calligraphic letters (e.g., $\mathcal{X}$) denote the corresponding alphabet. Vectors are denoted with superscripts (e.g., $x^n = (x_1, \ldots, x_n)$). We use standard notation for mutual information and entropy.
A diamond relay network with $N$ relay nodes and a cooperation facilitator (CF)—shown in Fig. 1—is given by a broadcast channel $p(y_1, \ldots, y_N \mid x)$, followed by a MAC $p(y \mid x_1, \ldots, x_N)$. An $(n, R)$ code for the diamond relay network is composed of
• an encoder $f : [2^{nR}] \to \mathcal{X}^n$,
• a CF-encoder $f_{CF} : \prod_{j \in [N]} \mathcal{Y}_j^n \to [2^{nC_{CF}}]$,
• a relay encoder $f_j : \mathcal{Y}_j^n \times [2^{nC_{CF}}] \to \mathcal{X}_j^n$ for each $j \in [N]$,
• a decoder $g : \mathcal{Y}^n \to [2^{nR}]$.
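To fix ideas, the following toy instance (entirely our own construction: noiseless channels, an idle CF, and $N = 2$) wires the four maps $f$, $f_{CF}$, $f_j$, $g$ together and checks that the resulting $(n, R)$ code decodes every message correctly.

```python
# Toy (n, R) code for the diamond relay network with N = 2 relays:
# noiseless broadcast and MAC, and an idle CF (C_CF = 0).
# All function bodies are illustrative, not from the paper.
n = 4                     # blocklength; [2^{nR}] with R = 1 gives 16 messages

def f(m):                 # encoder f : [2^{nR}] -> X^n (binary expansion)
    return [(m >> i) & 1 for i in range(n)]

def broadcast(x_n):       # p(y1, y2 | x): both relays observe x noiselessly
    return list(x_n), list(x_n)

def f_CF(y1_n, y2_n):     # CF-encoder: C_CF = 0, so K carries no information
    return 0

def relay(yj_n, k):       # relay encoder f_j: forward the observation, ignore K
    return list(yj_n)

def mac(x1_n, x2_n):      # p(y | x1, x2): output exposes both relay inputs
    return (tuple(x1_n), tuple(x2_n))

def g(y_n):               # decoder g : Y^n -> [2^{nR}], reads relay 1's component
    x1_n, _ = y_n
    return sum(b << i for i, b in enumerate(x1_n))

def run(m):               # one end-to-end use of the network
    y1_n, y2_n = broadcast(f(m))
    k = f_CF(y1_n, y2_n)
    return g(mac(relay(y1_n, k), relay(y2_n, k)))

errors = sum(run(m) != m for m in range(2 ** n))   # count decoding errors
```

Since every channel here is noiseless, $P_e^{(n)} = 0$; the point is only to make the interfaces of the four maps concrete.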
The message $M$ is assumed to be uniformly drawn from $[2^{nR}]$. Encoded message $X^n = f(M)$ is transmitted by the encoder into the broadcast channel, which outputs $Y_j^n$ at relay $j$, $j \in [N]$. The CF observes all the outputs of the broadcast channel, and encodes $K = f_{CF}(Y_1^n, \ldots, Y_N^n)$, which is sent to each relay. Relay $j$ encodes $X_j^n = f_j(Y_j^n, K)$ and transmits it into the MAC. Finally, the output signal $Y^n$ is received and decoded to $\hat{M} = g(Y^n)$. The overall probability of error is given by $P_e^{(n)} = P(M \neq \hat{M})$. We say a rate $R$ is achievable if there exists a sequence of $(n, R)$ codes with $P_e^{(n)} \to 0$. The capacity $C(C_{CF})$ is the supremum of all achievable rates for a given CF capacity $C_{CF}$. This function is non-decreasing in $C_{CF}$, and so its derivative $C'(C_{CF})$ is non-negative.

We are interested in characterizing $C(C_{CF})$, but more specifically, we are focused on the following question:

Main question: For a given network, is $C'(0) = \infty$?
III. RELAY CHANNEL WITH ORTHOGONAL RECEIVER COMPONENTS
The first special case of the diamond relay network that we focus on is the relay channel with orthogonal receiver components. Here, we specialize the general model described above in several ways. First, we assume there are only $N = 2$ relay nodes, and further we assume that the received signal at the decoder is made up of orthogonal components, one from each relay. That is, $Y = (Y_a, Y_b)$, where the MAC model factors as

$$p(y_a, y_b \mid x_1, x_2) = p(y_a \mid x_1)\, p(y_b \mid x_2). \quad (1)$$
Given this factorization, the capacity of the overall network depends on the channels from $X_1$ to $Y_a$ and from $X_2$ to $Y_b$ only through their capacities [4]. Thus, we can simplify the problem by replacing these noisy channels with rate-limited bit-pipes of capacities $C_1$ and $C_0$ from each relay to the decoder. Finally, we assume that $C_1 = \infty$; i.e., we assume that Relay 1 is able to transmit all of its information (consisting of $Y_1^n$ as well as the CF signal $K$) directly to the decoder. These simplifications yield the network model shown in Fig. 2. Note that this network only has one relay node, so it makes sense to call it a relay channel model rather than a diamond network model. We have also relabelled $Y_2$ as $Y_r$ to emphasize that it is the relay's received signal; this also makes the notation consistent with [5], [6].
A. Main Achievability Result
Theorem 1: Consider a relay channel with orthogonal receiver components with broadcast channel distribution $p(y_r, y_1 \mid x)$, capacity $C_0$ from relay to destination, and CF capacity $C_{CF}$. Rate $R$ is achievable if

$$R \le I(U; Y_r) + \min\{ I(X; Y_1, Y_r \mid U),\ I(X; Y_1, V \mid U) \}, \quad (2)$$
$$R \le \min\{ I(U; Y_1),\ I(U; Y_r) \} + I(X; Y_1 \mid U) + I(V; X, Y_1 \mid U) - I(Y_r; V \mid U) + C_0, \quad (3)$$
$$C_{CF} \ge I(X, Y_1; V \mid U, Y_r), \quad (4)$$

for some distribution

$$p(u, x)\, p(y_r, y_1 \mid x)\, p(v \mid x, y_1, y_r, u). \quad (5)$$

Proof: See Appendix A.
Remark 1: Thm. 1 reduces to the well-known combined partial-decode-forward/compress-forward lower bound for the standard problem (without a CF), which originated in [7]. For the relay channel with orthogonal receiver components, [5] showed that this classical bound can be written as follows: rate $R$ is achievable if

$$R \le I(U; Y_r) + I(X; Y_1, V \mid U), \quad (6)$$
$$R \le \min\{ I(U; Y_1),\ I(U; Y_r) \} + I(X; Y_1 \mid U) + C_0 - I(Y_r; V \mid U, X, Y_1) \quad (7)$$

for some distribution

$$p(u, x)\, p(y_r, y_1 \mid x)\, p(v \mid y_r, u). \quad (8)$$
In Thm. 1, removing the CF is equivalent to setting $C_{CF} = 0$. Thus, (4) implies the Markov chain $(X, Y_1) - (U, Y_r) - V$, which implies that the variables have a joint distribution that factors as (8). Thus, by the data processing inequality, $I(X; Y_1, V \mid U) \le I(X; Y_1, Y_r \mid U)$, so (2) becomes (6). In addition,

$$I(V; X, Y_1 \mid U) - I(Y_r; V \mid U) \quad (9)$$
$$= -H(V \mid U, X, Y_1) + H(V \mid U, Y_r) \quad (10)$$
$$= -H(V \mid U, X, Y_1) + H(V \mid U, Y_r, X, Y_1) \quad (11)$$
$$= -I(Y_r; V \mid U, X, Y_1), \quad (12)$$

so (3) becomes (7).
B. Sufficient Conditions for Infinite Slope
The following theorem provides a sufficient condition under which, given a starting achievable point for the partial-decode-forward/compress-forward bound without cooperation (i.e., the bound in (6)–(7)), the achievable rate from Thm. 1 with cooperation improves over the starting point and this improvement has infinite slope as a function of $C_{CF}$.
Theorem 2: Fix a distribution $p(u, x)\, p(v \mid y_r, u)$. Let $R$ be a rate satisfying the no-cooperation achievability conditions in (6)–(7) for this distribution. Suppose $I(X; Y_1, V \mid U) < I(X; Y_1, Y_r \mid U)$, and there does not exist $\lambda \in [0, 1]$ and $\gamma(u, x, y_1, y_r) \in \mathbb{R}$ for each $x, y_1, y_r, u$ such that

$$p(v \mid u, x, y_1) = p(v \mid u, y_1)^{\lambda}\, p(v \mid u, y_r)^{1-\lambda}\, \gamma(u, x, y_1, y_r) \quad (13)$$

for all $u, x, y_1, y_r, v$ where $p(u, x, y_1, y_r) > 0$ and $p(v \mid y_r, u) > 0$. Then

$$\lim_{C_{CF} \to 0} \frac{C(C_{CF}) - R}{C_{CF}} = \infty. \quad (14)$$

Proof: See Appendix B.
Remark 2: We note that there are two ways for (14) to hold: (1) $C(0) > R$; that is, the rate $R$, while achievable without cooperation, is smaller than the no-cooperation capacity of the relay channel; (2) $C(0) = R$ and $C'(0) = \infty$. Here, the rate $R$ is the no-cooperation capacity, so (14) indicates that the CF really can improve the capacity of the relay network in an infinite-slope manner. Thus, this latter case is the one we are particularly interested in, as it gives an affirmative answer to the Main Question. Unfortunately, for any problem instance for which a matching converse for the no-cooperation setting is unavailable, even if (14) holds, there is no way to know which situation we are in. Still, if $R$ represents the best-known achievable rate for a given network, (14) has a non-trivial consequence, showing that the state-of-the-art can be improved disproportionately by a small amount of cooperation.
While the condition in Thm. 2 is sometimes hard to verify,
the following corollary provides a simpler sufficient condition
for the same conclusion.
Corollary 3: Assume that $p(y_1, y_r \mid x) > 0$ for all letters $x, y_1, y_r$. Consider any distribution $p(u, x)\, p(v \mid u, y_r)$. Let $R$ be a rate satisfying (6)–(7) for this distribution. Then at least one of the following possibilities holds:
1) there exists a function $g : \mathcal{U} \times \mathcal{Y}_r \to \mathcal{V}$ where rate $R$ satisfies (6)–(7) with $V = g(U, Y_r)$;
2) (14) holds.
In particular, if $p(v \mid u, y_r)$ is non-deterministic, i.e., $0 < p(v \mid u, y_r) < 1$ for some $u, v, y_r$, then $R'_{PD/C}(0) = \infty$.
Proof: See Appendix C.
C. Example Relay Channels
For some relay channels, [8] showed that the compress-
forward bound achieves capacity. Thus, it is possible to
definitively answer the Main Question for these channels. The
following example illustrates one such channel.
Example 1: Let $X \in \{0, 1\}$, $Y_1 = X \oplus Z$, $Y_r = Z \oplus W$, where $\oplus$ indicates modulo-2 addition, $Z \sim \mathrm{Ber}(p)$, $W \sim \mathrm{Ber}(\delta)$, and $X, Z, W$ are mutually independent. For this channel, the capacity without cooperation is shown in [8] to be given by

$$C(0) = \max_{p(v \mid y_r):\ I(Y_r; V) \le C_0} 1 - H(Z \mid V). \quad (15)$$

Moreover, this rate is achieved by compress-forward by choosing $X \sim \mathrm{Ber}(1/2)$ and setting $p(v \mid y_r)$ to be the distribution achieving the maximum in (15). Corollary 3 applies to this channel, since $p(y_1, y_r \mid x) > 0$ for all $(x, y_1, y_r)$ as long as $p, \delta \in (0, 1)$. Moreover, the only deterministic distributions from $Y_r$ to $V$ are where either $V$ is a constant, or $V = Y_r$ (or equivalent). It is easy to see that as long as $0 < C_0 < H(Y_r) = H(p \oplus \delta)$, neither of these choices for $V$ is optimal. Therefore, in all non-trivial cases, $C'(0) = \infty$ for this channel.
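To make the maximization in (15) concrete, the sketch below (our own; the test-channel family $V = Y_r \oplus N$ with $N \sim \mathrm{Ber}(\beta)$ is an illustrative assumption, not the proven maximizer) evaluates the compress-forward rate $1 - H(Z \mid V)$ over the grid points $\beta$ that satisfy $I(Y_r; V) \le C_0$.

```python
import math

def h(t):                       # binary entropy in bits
    return 0.0 if t <= 0.0 or t >= 1.0 else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def conv(a, b):                 # P(A xor B = 1) for independent Ber(a), Ber(b)
    return a * (1 - b) + b * (1 - a)

p, delta, C0 = 0.1, 0.2, 0.5    # illustrative channel parameters (our choice)
yr = conv(p, delta)             # Y_r = Z + W is Ber(p conv delta)

best_rate = 0.0
for i in range(501):            # sweep the hypothetical extra-noise level beta
    beta = i / 1000.0
    if h(conv(yr, beta)) - h(beta) <= C0:    # constraint I(Y_r; V) <= C_0
        hzv = h(p) + h(conv(delta, beta)) - h(conv(p, conv(delta, beta)))  # H(Z|V)
        best_rate = max(best_rate, 1.0 - hzv)
```

The resulting `best_rate` is only a lower bound on $C(0)$, since the maximization in (15) ranges over all $p(v \mid y_r)$, not just this one-parameter family.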
The following relay channel example is one for which the
no-cooperation capacity is not known. However, we can verify
the sufficient condition from Thm. 2, thus showing that an
infinite-slope improvement is possible through cooperation.
Example 2: Let $X \in \{0, 1\}$, and let $p(y_1, y_r \mid x) = p(y_1 \mid x)\, p(y_r \mid x)$, where each of the two component channels is a binary erasure channel (BEC) with erasure probability $p$. An achievable rate for the no-cooperation case from (6)–(7) is given by taking $U = \emptyset$, $X$ to be uniform on $\{0, 1\}$, and $p(v \mid y_r)$ to be a channel that further erases any un-erased bit with probability $q$; that is,

$$p(v \mid y_r) = \begin{cases} 1 - q, & v = y_r \in \{0, 1\}, \\ q, & v = e,\ y_r \in \{0, 1\}, \\ 1, & v = y_r = e, \\ 0, & \text{otherwise}. \end{cases} \quad (16)$$
This leads to the achievable rate

$$R = \max_{q \in [0, 1]} \min\{ (1 - p)(1 + p(1 - q)),\ 1 - p - H((1 - p)(1 - q)) + (1 - p) H(q) + C_0 \} \quad (17)$$

where $H(\cdot)$ is the binary entropy function.
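Expression (17) is easy to evaluate numerically; the short script below (our own, with illustrative parameter choices) maximizes the min term over a grid of $q$.

```python
import math

def H(t):                        # binary entropy in bits, H(0) = H(1) = 0
    return 0.0 if t <= 0.0 or t >= 1.0 else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

p, C0 = 0.3, 0.2                 # illustrative erasure probability and bit-pipe rate

def objective(q):                # the min{., .} inside (17) for a given q
    term1 = (1 - p) * (1 + p * (1 - q))
    term2 = 1 - p - H((1 - p) * (1 - q)) + (1 - p) * H(q) + C0
    return min(term1, term2)

R = max(objective(i / 1000.0) for i in range(1001))   # coarse grid over q in [0, 1]
R_direct = 1 - p                 # rate with the relay fully compressed away (q = 1)
```

At $q = 1$ the relay contributes nothing and the objective reduces to $1 - p$, so the maximization can only improve on the direct link.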
Note that this channel does not satisfy the conditions of Corollary 3, since $p(y_1, y_r \mid x)$ is not always positive. Instead, we verify the sufficient condition of Thm. 2 directly. Suppose that there exist $\lambda$ and $\gamma$ satisfying (13). Note that $Y_1 - X - Y_r - V$ is a Markov chain, so $p(v \mid x, y_1) = p(v \mid x)$. For $x \in \{0, 1\}$ and any $y_1 \in \{x, e\}$,

$$\frac{p_{V|X}(x \mid x)}{p_{V|X}(e \mid x)} = \frac{(1 - p)(1 - q)}{1 - (1 - p)(1 - q)} \quad (18)$$
$$= \frac{p_{V|Y_1}(x \mid y_1)^{\lambda}\, (1 - q)^{1-\lambda}\, \gamma'(x, y_1, x)}{p_{V|Y_1}(e \mid y_1)^{\lambda}\, q^{1-\lambda}\, \gamma'(x, y_1, x)} \quad (19)$$
$$= \left( \frac{1 - q}{q} \right)^{1-\lambda} \left( \frac{p_{V|Y_1}(x \mid y_1)}{p_{V|Y_1}(e \mid y_1)} \right)^{\lambda} \quad (20)$$
$$= \begin{cases} \left( \dfrac{1 - q}{q} \right)^{1-\lambda} \left( \dfrac{(1 - p)(1 - q)}{1 - (1 - p)(1 - q)} \right)^{\lambda}, & y_1 = x, \\[2ex] \left( \dfrac{1 - q}{q} \right)^{1-\lambda} \left( \dfrac{\frac{1}{2}(1 - p)(1 - q)}{1 - (1 - p)(1 - q)} \right)^{\lambda}, & y_1 = e. \end{cases} \quad (21)$$

This cannot hold with equality for both $y_1 = x$ and $y_1 = e$ unless $p = 0$, $p = 1$, or $q = 1$. Therefore, except in these trivial cases, infinite-slope improvement occurs.
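The conclusion can be spot-checked numerically (our own sketch): for a representative non-trivial $(p, q)$, no $\lambda \in [0, 1]$ makes the right-hand side of (21) match the ratio (18) for both $y_1 = x$ and $y_1 = e$, since the two cases differ by the factor $(1/2)^\lambda$.

```python
p, q = 0.3, 0.4                              # representative non-trivial parameters
a = (1 - p) * (1 - q)                        # P(V = x | X = x)
lhs = a / (1 - a)                            # the ratio in (18)
base = (1 - q) / q

def rhs(lam, y1_erased):                     # the right-hand side of (21)
    pv_ratio = (0.5 * a if y1_erased else a) / (1 - a)
    return base ** (1 - lam) * pv_ratio ** lam

# smallest simultaneous mismatch over a fine grid of lambda in [0, 1]
gap = min(max(abs(rhs(l / 1000.0, False) - lhs),
              abs(rhs(l / 1000.0, True) - lhs)) for l in range(1001))
```

At $\lambda = 1$ the $y_1 = x$ case matches exactly, but the $y_1 = e$ case is then off by a factor of $2$, so the joint mismatch `gap` stays bounded away from zero.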
IV. 3-RELAY NETWORK EXAMPLE
In the analysis of Theorem 1 and $C'(0)$ for the orthogonal-receiver setting given in Sections III-A and III-B, relay cooperation is governed by the statistics of the broadcast channel $p(y_1, y_r \mid x)$ and, roughly speaking, is designed to "remove" from $Y_r^n$ message information that can be obtained at the receiver from $Y_1^n$. In this aspect, we say that the design of cooperation information looks backwards and is governed by the broadcast channel of the diamond network.

In this section, we study a forward form of cooperation that takes into account the MAC appearing in the second stage of the diamond network. For forward-looking cooperation, it is tempting to treat the MAC "in isolation", rather than as part of a larger network, and to design cooperation solely based on the MAC noise statistics, as done in [1], [3]. In general, designing cooperation by treating the MAC as an isolated component may not suffice to improve communication over the diamond network, because MAC encoders in the diamond network potentially hold dependent information resulting from the broadcast stage of communication. Nevertheless, in what follows, we present a family of 3-relay diamond networks for which the cooperation gain in the network as a whole derives directly from the MAC cooperation gain when studied in isolation; the latter is well understood and given in [1]. Our network family is described below and depicted in Figure 3.
Consider the diamond network defined by broadcast channel $(\mathcal{X}, p(y_0, y_1, y_2 \mid x), \mathcal{Y}_0, \mathcal{Y}_1, \mathcal{Y}_2)$ and MAC $(\mathcal{X}_0, \mathcal{X}_1, \mathcal{X}_2, q(y \mid x_0, x_1, x_2), \mathcal{Y})$. More specifically, as depicted in Figure 3, consider the case in which:
• $\mathcal{X} = \mathcal{Y}_0 = \mathcal{Y}_1 = \mathcal{X}_0 = \mathcal{X}_1 = \{0, 1\}$ and $\mathcal{Y}_2 = \mathcal{X}_2 = \{0, 1\}^2$;
• $\mathcal{Y} = \mathcal{Y}_W \times \mathcal{X}_2$ for a given memoryless 2-user binary MAC $W$: $(\mathcal{X}_0, \mathcal{X}_1, p_W(y_W \mid x_0, x_1), \mathcal{Y}_W)$;
• for any $x$, $p(Y_0, Y_1, Y_2 \mid x)$ induces $(Y_0, Y_1, Y_2)$ where $Y_0$, $Y_1$, $Z$ are independent Bernoulli(0.5) random variables and $Y_2 = (x \oplus Y_Z, Z)$;
• $X_0^n = f_0(Y_0^n)$, $X_1^n = f_1(Y_1^n)$, $X_2^n = f_2(Y_2^n)$ for relay encoders $f_0$, $f_1$, and $f_2$; and
• $q(y \mid x_0, x_1, x_2)$ for which $Y = (Y_W, x_2)$.

Fig. 3. The 3-relay diamond network of Claim 4.

As $Y^n$ holds the value of $X_2^n$, which in turn depends on $Y_2^n$ through $f_2$, we assume without loss of generality that $X_2^n = Y_2^n$.
Using the independent nature of relays $Y_0$ and $Y_1$, in Claim 4 below we tie the cooperation gain of the 2-transmitter MAC $W$ with the cooperation gain of the diamond network.
Claim 4: Let $C_{sum}(C_{CF})$ be the sum-capacity of the 2-transmitter MAC $W$ with user cooperation of rate $C_{CF}$. Then the capacity $C(C_{CF})$ of the diamond network satisfies $C'(0) = \infty$ if $C'_{sum}(0) = \infty$.

Proof: See Appendix D.
REFERENCES

[1] P. Noorzad, M. Effros, and M. Langberg, "The unbounded benefit of encoder cooperation for the k-user MAC," IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3655–3678, 2018.
[2] M. Langberg and M. Effros, "On the capacity advantage of a single bit," in IEEE Workshop on Network Coding and Applications (NetCod), 2016, pp. 1–6.
[3] O. Kosut, M. Effros, and M. Langberg, "Every bit counts: Second-order analysis of cooperation in the multiple-access channel," in IEEE International Symposium on Information Theory (ISIT), 2021, pp. 2214–2219.
[4] R. Koetter, M. Effros, and M. Médard, "A theory of network equivalence – Part I: Point-to-point channels," IEEE Transactions on Information Theory, vol. 57, pp. 972–995, Feb. 2011.
[5] A. El Gamal, A. Gohari, and C. Nair, "Achievable rates for the relay channel with orthogonal receiver components," in 2021 IEEE Information Theory Workshop (ITW), 2021, pp. 1–6.
[6] ——, "Strengthened cutset upper bound on the capacity of the relay channel and applications," in 2021 IEEE International Symposium on Information Theory (ISIT), 2021, pp. 1344–1349.
[7] T. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Transactions on Information Theory, vol. 25, no. 5, pp. 572–584, 1979.
[8] M. Aleksic, P. Razaghi, and W. Yu, "Capacity of a class of modulo-sum relay channels," IEEE Transactions on Information Theory, vol. 55, no. 3, pp. 921–930, 2009.
[9] A. El Gamal and Y. Kim, Network Information Theory. Cambridge University Press, 2011.
APPENDIX A
PROOF OF THEOREM 1
We denote $T_\epsilon^{(n)}$ as the robustly typical set. See [9] for the definition, as well as the formal statement of the packing lemma, which will be used in the proof.
We employ the following lemma, which is a slight variation
on the covering lemma from [9].
Lemma 5: Let $(U, X, \hat{X}) \sim p(u, x, \hat{x})$ and $\epsilon' < \epsilon$. Let $(u^n, x^n) \in T_{\epsilon'}^{(n)}(U, X)$ be a pair of fixed sequences, and let $\hat{X}^n(m)$, $m \in \mathcal{A}$, where $|\mathcal{A}| \ge 2^{nR}$, be random sequences, conditionally independent of each other, each uniformly distributed on $T_\epsilon^{(n)}(\hat{X} \mid u^n)$. Let $M^*$ be the smallest $m$ for which

$$\hat{X}^n(m) \in T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n). \quad (22)$$

If there is no such $m$, we say $M^*$ is undefined. Then,
1) there exists $\delta(\epsilon)$ that tends to zero as $\epsilon \to 0$ such that $\lim_{n \to \infty} P(M^* \text{ is undefined}) = 0$, if $R > I(X; \hat{X} \mid U) + \delta(\epsilon)$;
2) conditioning on the event that $M^*$ is defined, $\hat{X}^n(M^*)$ is uniformly distributed in $T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n)$.
Proof: For any $m \in \mathcal{A}$, we have

$$P((u^n, x^n, \hat{X}^n(m)) \in T_\epsilon^{(n)} \mid U^n = u^n, X^n = x^n) \quad (23)$$
$$= P((u^n, x^n, \hat{X}^n(m)) \in T_\epsilon^{(n)} \mid U^n = u^n) \quad (24)$$
$$= \sum_{\hat{x}^n \in T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n)} \frac{1}{|T_\epsilon^{(n)}(\hat{X} \mid u^n)|} \quad (25)$$
$$= \frac{|T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n)|}{|T_\epsilon^{(n)}(\hat{X} \mid u^n)|} \quad (26)$$
$$\ge 2^{-n(I(X; \hat{X} \mid U) + \delta(\epsilon))}. \quad (27)$$

The remainder of the proof of statement 1 follows from an argument identical to that of the standard covering lemma, i.e., [9, Lemma 3.3].
To prove statement 2, note first that, if $M^*$ is defined, then by definition $\hat{X}^n(M^*) \in T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n)$. Now, for any $\hat{x}^n \in T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n)$,

$$P(\hat{X}^n(M^*) = \hat{x}^n \mid M^* \text{ is defined}) \quad (28)$$
$$= \sum_{m \in \mathcal{A}} P(M^* = m \mid M^* \text{ is defined})\, P\big(\hat{X}^n(m) = \hat{x}^n \,\big|\, \hat{X}^n(m') \notin T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n) \text{ for all } m' < m,\ \hat{X}^n(m) \in T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n)\big) \quad (29)$$
$$= \sum_{m \in \mathcal{A}} P(M^* = m \mid M^* \text{ is defined})\, P\big(\hat{X}^n(m) = \hat{x}^n \,\big|\, \hat{X}^n(m) \in T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n)\big) \quad (30)$$
$$= \sum_{m \in \mathcal{A}} P(M^* = m \mid M^* \text{ is defined})\, \frac{1}{|T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n)|} \quad (31)$$
$$= \frac{1}{|T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n)|} \quad (32)$$

where (30) holds since the $\hat{X}^n(m)$ are independent for different $m$, and (31) since $\hat{X}^n(m)$ is uniform on $T_\epsilon^{(n)}(\hat{X} \mid u^n)$ and $T_\epsilon^{(n)}(\hat{X} \mid u^n, x^n) \subset T_\epsilon^{(n)}(\hat{X} \mid u^n)$. This proves statement 2.
We now proceed to the main proof of the theorem. The following argument combines partial-decode-forward/compress-forward strategies, as discussed in Remark 1. Fix rates $R_a, R_b, S$ to be determined, where $R_a + R_b = R$. Also fix small constants $0 < \epsilon' < \epsilon$. We construct a code as follows.
Codebook generation:
• For each $m_a \in [2^{nR_a}]$, generate $u^n(m_a) \sim \prod_{i=1}^n p_U(u_i)$.
• For each $m_a \in [2^{nR_a}]$, $m_b \in [2^{nR_b}]$, generate $x^n(m_a, m_b) \sim \prod_{i=1}^n p_{X|U}(x_i \mid u_i(m_a))$.
• For each $m_a \in [2^{nR_a}]$, $\ell \in [2^{nS}]$, $k \in [2^{nC_{CF}}]$, generate $v^n(m_a, \ell, k) \sim \mathrm{Unif}[T_\epsilon^{(n)}(V \mid u^n(m_a))]$, and corresponding source coding bins $m_0(m_a, \ell, k) \sim \mathrm{Unif}[2^{nC_0}]$.
• For each $m_a \in [2^{nR_a}]$, $y_r^n \in \mathcal{Y}_r^n$ and $k \in [2^{nC_{CF}}]$, let $\ell(m_a, y_r^n, k)$ be the smallest $\ell \in [2^{nS}]$ such that

$$(u^n(m_a), y_r^n, v^n(m_a, \ell, k)) \in T_\epsilon^{(n)}(U, Y_r, V). \quad (33)$$

If there is no such $\ell$, we say $\ell(m_a, y_r^n, k)$ is undefined.
Encoding: At the transmitter, given message $m = (m_a, m_b)$, send $x^n(m_a, m_b)$.
CF coding: At the CF, given $y_1^n$ and $y_r^n$, first find the unique pair $\hat{m}_a, \hat{m}_b$ such that

$$(u^n(\hat{m}_a), x^n(\hat{m}_a, \hat{m}_b), y_1^n, y_r^n) \in T_\epsilon^{(n)}(U, X, Y_1, Y_r). \quad (34)$$

Next, find the smallest $k \in [2^{nC_{CF}}]$ such that $\ell(\hat{m}_a, y_r^n, k)$ is defined, and

$$(u^n(\hat{m}_a), x^n(\hat{m}_a, \hat{m}_b), y_r^n, y_1^n, v^n(\hat{m}_a, \ell(\hat{m}_a, y_r^n, k), k)) \in T_\epsilon^{(n)}(U, X, Y_r, Y_1, V). \quad (35)$$

Send this $k$. If there is no such $k$, declare an error.
Relay coding: At the relay, given $y_r^n$ and $k$, first find $\hat{m}_a$ such that

$$(u^n(\hat{m}_a), y_r^n) \in T_\epsilon^{(n)}(U, Y_r). \quad (36)$$

Then let $\ell = \ell(\hat{m}_a, y_r^n, k)$, and send $m_0(\hat{m}_a, \ell, k)$. If $\ell(\hat{m}_a, y_r^n, k)$ is undefined, declare an error.
Decoding: At the decoder, given $y_1^n$, $m_0$, and $k$, find $\hat{m}_a, \hat{m}_b, \hat{\ell}$ such that

$$(u^n(\hat{m}_a), x^n(\hat{m}_a, \hat{m}_b), y_1^n, v^n(\hat{m}_a, \hat{\ell}, k)) \in T_\epsilon^{(n)}, \quad (37)$$
$$m_0(\hat{m}_a, \hat{\ell}, k) = m_0. \quad (38)$$
Error analysis: Throughout the error analysis, we assume without loss of generality that $m_a = m_b = 1$. To prove that $m_a$ is decoded correctly at both the CF and the relay, and that $m_b$ is decoded correctly at the CF, it suffices to consider the following error events:

$$E_1 = \{ (u^n(1), x^n(1, 1), y_r^n, y_1^n) \notin T_{\epsilon'}^{(n)}(U, X, Y_r, Y_1) \}, \quad (39)$$
$$E_2 = \{ (u^n(m_a), y_r^n) \in T_\epsilon^{(n)}(U, Y_r) \text{ for some } m_a \neq 1 \}, \quad (40)$$
$$E_3 = \{ (u^n(1), x^n(1, m_b), y_r^n, y_1^n) \in T_\epsilon^{(n)}(U, X, Y_r, Y_1) \text{ for some } m_b \neq 1 \}. \quad (41)$$

By the law of large numbers, $P(E_1) \to 0$. By the packing lemma, $P(E_2) \to 0$ and $P(E_3) \to 0$ if

$$R_a < I(U; Y_r), \quad (42)$$
$$R_b < I(X; Y_1, Y_r \mid U). \quad (43)$$
We now show that $\ell(1, y_r^n, k)$ is defined for most values of $k$. For each $k \in [2^{nC_{CF}}]$, define the event

$$E_4(k) = \{ (u^n(1), y_r^n, v^n(1, \ell, k)) \notin T_\epsilon^{(n)}(U, Y_r, V) \text{ for all } \ell \in [2^{nS}] \}. \quad (44)$$

Since we have already established that $P(E_1) \to 0$, by Lemma 5, $P(E_4(k)) \to 0$ if

$$S > I(Y_r; V \mid U). \quad (45)$$
Moreover, Lemma 5 asserts that, given $\ell(1, y_r^n, k)$ is defined, $v^n(1, \ell(1, y_r^n, k), k)$ is uniformly distributed on $T_\epsilon^{(n)}(V \mid u^n(1), y_r^n)$. Now consider the event

$$E_5 = \{ |\{ k : \ell(1, y_r^n, k) \text{ is defined} \}| < (1 - \epsilon) 2^{nC_{CF}} \}. \quad (46)$$

Since $\ell(1, y_r^n, k)$ is defined if and only if $E_4(k)$ does not occur, it is straightforward to show that $P(E_5) \to 0$ as $n \to \infty$. Now consider the error event in which the CF cannot find a value of $k$ to transmit, i.e.,

$$E_6 = \{ (u^n(1), x^n(1, 1), y_r^n, y_1^n, v^n(1, \ell(1, y_r^n, k), k)) \notin T_\epsilon^{(n)} \text{ for all } k \in [2^{nC_{CF}}] \}. \quad (47)$$

We may now apply Lemma 5 a second time to find that $P(E_6) \to 0$ if

$$C_{CF} > I(X, Y_1; V \mid U, Y_r). \quad (48)$$
Let us further assume without loss of generality that $k = 1$, $\ell(1, y_r^n, 1) = 1$, and $m_0(1, 1, 1) = 1$. Assuming that error events $E_1, E_2, E_3, E_6$ do not occur, the relay selects $\hat{m}_a = \ell = m_0 = 1$, and

$$(u^n(1), x^n(1, 1), y_r^n, y_1^n, v^n(1, 1, 1)) \in T_\epsilon^{(n)}. \quad (49)$$
Now consider the following decoding error events:

$$E_7 = \{ (u^n(1), x^n(1, m_b), y_1^n, v^n(1, 1, 1)) \in T_\epsilon^{(n)} \text{ for some } m_b \neq 1 \}, \quad (50)$$
$$E_8 = \{ (u^n(1), x^n(1, m_b), y_1^n, v^n(1, \ell, 1)) \in T_\epsilon^{(n)},\ m_0(1, \ell, 1) = 1 \text{ for some } m_b \neq 1, \ell \neq 1 \}, \quad (51)$$
$$E_9 = \{ (u^n(m_a), x^n(m_a, m_b), y_1^n, v^n(m_a, \ell, 1)) \in T_\epsilon^{(n)},\ m_0(m_a, \ell, 1) = 1 \text{ for some } m_a \neq 1, m_b, \ell \}. \quad (52)$$
Applying the packing lemma several times, $P(E_7) \to 0$ if

$$R_b < I(X; Y_1, V \mid U), \quad (53)$$

$P(E_8) \to 0$ if

$$R_b + S < I(X; Y_1 \mid U) + I(V; X, Y_1 \mid U) + C_0, \quad (54)$$

and $P(E_9) \to 0$ if

$$R_a + R_b + S < I(U, X; Y_1) + I(V; X, Y_1 \mid U) + C_0. \quad (55)$$
We now collect the various rate conditions required for all of the error event probabilities to vanish. It is advantageous for $S$ to be as small as possible; from the lower limit in (45), we may assume that $S$ is slightly larger than $I(Y_r; V \mid U)$. We have three conditions on $R_b$, namely (43), (53), and (54). Combining each of these with the condition on $R_a$ in (42), and recalling that $R = R_a + R_b$, we need
$$R < I(U; Y_r) + I(X; Y_1, Y_r \mid U), \quad (56)$$
$$R < I(U; Y_r) + I(X; Y_1, V \mid U), \quad (57)$$
$$R < I(U; Y_r) + I(X; Y_1 \mid U) + I(V; X, Y_1 \mid U) + C_0 - I(Y_r; V \mid U). \quad (58)$$

Furthermore, from (55) we need

$$R < I(U, X; Y_1) + I(V; X, Y_1 \mid U) - I(Y_r; V \mid U) + C_0 \quad (59)$$
$$= I(U; Y_1) + I(X; Y_1 \mid U) + I(V; X, Y_1 \mid U) - I(Y_r; V \mid U) + C_0. \quad (60)$$

Therefore, the conditions in the statement of the theorem imply that $R_a, R_b, S$ can be found such that all of the above conditions hold.
APPENDIX B
PROOF OF THEOREM 2
Under the starting distribution

$$p(u, x)\, p(y_1, y_r \mid x)\, p(v \mid u, y_r), \quad (61)$$

$I(X, Y_1; V \mid U, Y_r) = 0$. To show (14), we modify this distribution slightly, in a way that gives $I(X, Y_1; V \mid U, Y_r) > 0$, which corresponds to positive $C_{CF}$, while increasing the achieved rate. In particular, we leave $p(u, x)$ fixed, but change the conditional distribution for $v$ to

$$q(v \mid u, x, y_1, y_r) = p(v \mid u, y_r) + \alpha\, r(v \mid u, x, y_1, y_r) \quad (62)$$
where $\alpha \approx 0$. For a variable $A \subset \{U, X, Y_1, Y_r\}$, we further define $r(v \mid a)$, for example by

$$r(v \mid u, y_r) = \sum_{x, y_1} p(x, y_1 \mid u, y_r)\, r(v \mid u, x, y_1, y_r). \quad (63)$$

Thus $q(v \mid a) = p(v \mid a) + \alpha\, r(v \mid a)$. In order for $q$ to be a valid distribution, we need

$$\sum_v r(v \mid u, x, y_1, y_r) = 0 \text{ for all } u, x, y_1, y_r. \quad (64)$$
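The construction (62)–(64) can be sketched numerically (alphabet sizes and variable names are our own): draw an $r$ whose rows sum to zero, form $q$, and confirm that $q$ is a valid conditional distribution whose marginals satisfy $q(v \mid a) = p(v \mid a) + \alpha\, r(v \mid a)$, in the spirit of (63).

```python
import random
random.seed(0)

# One collapsed index a = (u, x, y1, yr) with 16 values, binary v;
# the sizes and the full-support choice of p are illustrative.
A, V = 16, 2
w = [random.random() for _ in range(A)]
s = sum(w)
p_a = [t / s for t in w]                               # p(u, x, y1, yr)
p_v = [[0.3, 0.7] for _ in range(A)]                   # conditional for v, full support

r = []                                                 # r(v | u, x, y1, yr) obeying (64)
for _ in range(A):
    z = [random.uniform(-1, 1) for _ in range(V)]
    m = sum(z) / V
    r.append([t - m for t in z])                       # each row now sums to zero

alpha = 0.01                                           # small perturbation, as in (62)
q = [[p_v[a][v] + alpha * r[a][v] for v in range(V)] for a in range(A)]

valid = all(abs(sum(row) - 1.0) < 1e-9 and min(row) >= 0.0 for row in q)

# marginal consistency q(v) = p(v) + alpha r(v), the pattern behind (63)
pv = [sum(p_a[a] * p_v[a][v] for a in range(A)) for v in range(V)]
rv = [sum(p_a[a] * r[a][v] for a in range(A)) for v in range(V)]
qv = [sum(p_a[a] * q[a][v] for a in range(A)) for v in range(V)]
marginal_ok = all(abs(qv[v] - (pv[v] + alpha * rv[v])) < 1e-9 for v in range(V))
```

Because $q$ is affine in $\alpha$ and marginalization is linear, every induced $q(v \mid a)$ inherits the same affine form, which is what the argument below exploits.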
Thus, these $r$ functions are not really distributions; instead they satisfy $\sum_v r(v \mid a) = 0$ for any variable $A$. Moreover, if $p(v \mid u, y_r) = 0$, then in order for $q$ to be valid, we need $r(v \mid u, x, y_1, y_r) \ge 0$; we here make the simplifying assumption that $r(v \mid u, x, y_1, y_r) = 0$ for any $u, y_r, v$ where $p(v \mid u, y_r) = 0$. This assumption has the following consequence. Suppose for some $(u, x, y_1, y_r, v)$, $p(u, x, y_1, y_r, v) = 0$. Recalling

$$p(u, x, y_1, y_r, v) = p(u, x, y_1, y_r)\, p(v \mid u, y_r), \quad (65)$$

it must be true that either $p(u, x, y_1, y_r)$ or $p(v \mid u, y_r)$ is zero. That is, if $p(u, x, y_1, y_r) > 0$, then $p(v \mid u, y_r) = 0$, so $r(v \mid u, x, y_1, y_r) = 0$ and hence $q(v \mid u, x, y_1, y_r) = 0$. In particular,

$$p(u, x, y_1, y_r)\, q(v \mid u, x, y_1, y_r) \quad (66)$$

vanishes if and only if $p(u, x, y_1, y_r, v)$ vanishes.
We are changing the distribution of $V$, but not of $U$, so many terms in the lower bounds in Thm. 1 do not change with $\alpha$. Define the following functions:

$$f_1(\alpha) = I_q(X; V \mid U, Y_1),$$
$$f_2(\alpha) = I_q(V; X, Y_1 \mid U) - I_q(Y_r; V \mid U),$$
$$C_{CF}(\alpha) = I_q(X, Y_1; V \mid U, Y_r).$$

Note that $f_1$ and $f_2$ include all the terms that change with $V$ in (2) and (3) respectively. Since by assumption $I(X; Y_1, V \mid U) < I(X; Y_1, Y_r \mid U)$, if $f_1$ increases with $\alpha$ then so does the right-hand side of (2). Thus, to prove (14), it is enough to have

$$C'_{CF}(0) = 0, \quad f'_1(0) > 0, \quad f'_2(0) > 0. \quad (67)$$
(67)
We first show that, given the assumptions we have already
made,
C
′
CF
(0) = 0
. We have
C
CF
(
α
) =
I
q
(
X,Y
1
;
V
|
U,Y
r
)
(68)
=
D
(
q
(
u,x,y
1
,y
r
,v
)
‖
q
(
u,x,y
1
,y
r
)
q
(
v
|
u,y
r
))
(69)
=
∑
u,x,y
1
,y
r
p
(
u,x,y
1
,y
r
)
D
(
q
(
v
|
u,x,y
1
,y
r
)
‖
q
(
v
|
u,y
r
))
.
(70)
Note that $p(v \mid x, y_1, y_r, u) = p(v \mid u, y_r)$, so $C_{CF}(0) = 0$. To find $C'_{CF}(\alpha)$, consider an arbitrary function of the form

$$f(\alpha) = D(p(x) + \alpha r_1(x) \,\|\, p(x) + \alpha r_2(x)) \quad (71)$$
$$= \sum_x (p(x) + \alpha r_1(x)) \log \frac{p(x) + \alpha r_1(x)}{p(x) + \alpha r_2(x)} \quad (72)$$

where $\sum_x r_i(x) = 0$ for $i = 1, 2$, and $r_1(x) = r_2(x) = 0$ whenever $p(x) = 0$.
Thus, the only relevant terms in the summation are those where $p(x) > 0$, so

$$f'(0) = \lim_{\alpha \to 0} f'(\alpha) \quad (73)$$
$$= \lim_{\alpha \to 0} \sum_{x : p(x) > 0} \left[ r_1(x) \log \frac{p(x) + \alpha r_1(x)}{p(x) + \alpha r_2(x)} + r_1(x) - r_2(x)\, \frac{p(x) + \alpha r_1(x)}{p(x) + \alpha r_2(x)} \right] \quad (74)–(75)$$
$$= \sum_{x : p(x) > 0} (r_1(x) - r_2(x)) \quad (76)$$
$$= 0. \quad (77)$$
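The computation (73)–(77) says that a KL divergence between two first-order perturbations of the same distribution is flat at $\alpha = 0$; a finite-difference check (our own) confirms this.

```python
import math, random
random.seed(1)

K = 6
w = [random.random() + 0.1 for _ in range(K)]
s = sum(w)
p = [t / s for t in w]                     # strictly positive pmf

def centred():                             # perturbation summing to zero over the alphabet
    z = [random.uniform(-1, 1) for _ in range(K)]
    m = sum(z) / K
    return [t - m for t in z]

r1, r2 = centred(), centred()

def f(a):                                  # D(p + a r1 || p + a r2), as in (71)-(72)
    return sum((p[i] + a * r1[i]) * math.log((p[i] + a * r1[i]) / (p[i] + a * r2[i]))
               for i in range(K))

h = 1e-4
slope = (f(h) - f(-h)) / (2 * h)           # central-difference estimate of f'(0)
```

The divergence itself is of order $\alpha^2$ near the origin, which is exactly why the cooperation cost $C_{CF}(\alpha)$ grows sublinearly while the rate terms grow linearly.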
To apply this analysis to the function $C_{CF}(\alpha)$ from (70), given any $u, x, y_1, y_r$, consider

$$D(q(v \mid u, x, y_1, y_r) \,\|\, q(v \mid u, y_r)). \quad (78)$$

Recall that

$$q(v \mid u, x, y_1, y_r) = p(v \mid u, y_r) + \alpha\, r(v \mid u, x, y_1, y_r), \quad (79)$$
$$q(v \mid u, y_r) = p(v \mid u, y_r) + \alpha\, r(v \mid u, y_r). \quad (80)$$

Moreover, we have made the assumption that $r(v \mid u, x, y_1, y_r) = 0$ whenever $p(v \mid u, y_r) = 0$, so we have a scenario matching the above assumptions on $f(\alpha)$. Thus, $C'_{CF}(0) = 0$.
We now consider the conditions under which $f'_1(0), f'_2(0) > 0$. Consider a variable $A \subset \{X, Y_1, Y_r, U\}$. Recalling the fact that if $p(a) > 0$ and $p(v \mid a) = 0$, then $q(v \mid a) = 0$, we may write

$$H_q(V \mid A) = -\sum_{a, v :\ q(v \mid a) > 0} p(a)\, q(v \mid a) \log q(v \mid a) \quad (81)$$
$$= -\sum_{\substack{u, x, y_1, y_r, v :\ p(u, x, y_1, y_r) > 0, \\ q(v \mid u, x, y_1, y_r) > 0}} p(u, x, y_1, y_r)\, q(v \mid u, x, y_1, y_r) \log q(v \mid a) \quad (82)$$
$$= -\sum_{u, x, y_1, y_r, v :\ p(u, x, y_1, y_r, v) > 0} p(u, x, y_1, y_r)\, q(v \mid u, x, y_1, y_r) \log q(v \mid a) \quad (83)$$
where we have used the fact that $p$ and $q$ have precisely the same support. Thus

$$\frac{d}{d\alpha} H_q(V \mid A) \quad (84)$$
$$= -\sum_{u, x, y_1, y_r, v :\ p(u, x, y_1, y_r, v) > 0} p(u, x, y_1, y_r)\, \frac{d}{d\alpha}\left[ q(v \mid u, x, y_1, y_r) \log q(v \mid a) \right] \quad (85)$$
$$= -\sum_{u, x, y_1, y_r, v :\ p(u, x, y_1, y_r, v) > 0} p(u, x, y_1, y_r) \left[ r(v \mid u, x, y_1, y_r) \log q(v \mid a) + q(v \mid u, x, y_1, y_r)\, \frac{r(v \mid a)}{q(v \mid a)} \right] \quad (86)$$
$$= -\sum_{u, x, y_1, y_r, v :\ p(u, x, y_1, y_r, v) > 0} p(u, x, y_1, y_r)\, r(v \mid u, x, y_1, y_r) \log q(v \mid a) - \sum_{a, v} p(a)\, r(v \mid a) \quad (87)$$
$$= -\sum_{u, x, y_1, y_r, v :\ p(u, x, y_1, y_r, v) > 0} p(u, x, y_1, y_r)\, r(v \mid u, x, y_1, y_r) \log q(v \mid a), \quad (88)$$

where the last term in (87) vanishes since $\sum_v r(v \mid a) = 0$.
In particular,

$$\left. \frac{d}{d\alpha} H_q(V \mid A) \right|_{\alpha = 0} = -\sum_{u, x, y_1, y_r, v :\ p(u, x, y_1, y_r, v) > 0} p(u, x, y_1, y_r)\, r(v \mid u, x, y_1, y_r) \log p(v \mid a). \quad (89)$$
Now we may easily write

$$f'_1(0) = \sum_{u, x, y_1, y_r, v :\ p(u, x, y_1, y_r, v) > 0} p(u, x, y_1, y_r)\, r(v \mid u, x, y_1, y_r) \cdot \log \frac{p(v \mid u, x, y_1)}{p(v \mid u, y_1)}, \quad (90)$$
$$f'_2(0) = \sum_{u, x, y_1, y_r, v :\ p(u, x, y_1, y_r, v) > 0} p(u, x, y_1, y_r)\, r(v \mid u, x, y_1, y_r) \cdot \log \frac{p(v \mid u, x, y_1)}{p(v \mid u, y_r)}. \quad (91)$$
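As a sanity check on (90) (our own, with $U$ degenerate and all alphabets binary, so the conditioning on $U$ disappears), one can compare the closed form against a central-difference derivative of $f_1(\alpha) = I_q(X; V \mid Y_1)$:

```python
import math, random
from itertools import product
random.seed(2)

B = [0, 1]
# strictly positive joint p(x, y1, yr) and a test channel p(v | yr); values are our own
w = {k: random.random() + 0.2 for k in product(B, B, B)}
s = sum(w.values())
p_xyz = {k: t / s for k, t in w.items()}
p_v = {0: [0.4, 0.6], 1: [0.7, 0.3]}                    # p(v | yr)

r = {}                                                  # r(v | x, y1, yr), rows sum to 0
for k in product(B, B, B):
    z = random.uniform(-1, 1)
    r[k] = [z, -z]

def joint(a):        # q-joint over (x, y1, yr, v) at perturbation size a, as in (62)
    return {k + (v,): p_xyz[k] * (p_v[k[2]][v] + a * r[k][v])
            for k in product(B, B, B) for v in B}

def I_X_V_given_Y1(j):                                  # f_1 evaluated on a joint, in nats
    t, m_y1, m_xy1, m_y1v = {}, {}, {}, {}
    for (x, y1, yr, v), mass in j.items():
        t[(x, y1, v)] = t.get((x, y1, v), 0.0) + mass   # marginalize out yr
    for (x, y1, v), mass in t.items():
        m_y1[y1] = m_y1.get(y1, 0.0) + mass
        m_xy1[(x, y1)] = m_xy1.get((x, y1), 0.0) + mass
        m_y1v[(y1, v)] = m_y1v.get((y1, v), 0.0) + mass
    return sum(mass * math.log(mass * m_y1[y1] / (m_xy1[(x, y1)] * m_y1v[(y1, v)]))
               for (x, y1, v), mass in t.items())

h = 1e-4
fd = (I_X_V_given_Y1(joint(h)) - I_X_V_given_Y1(joint(-h))) / (2 * h)

# closed form (90) with U degenerate
p_xy1, p_y1, p_vxy1, p_vy1 = {}, {}, {}, {}
for (x, y1, yr), mass in p_xyz.items():
    p_xy1[(x, y1)] = p_xy1.get((x, y1), 0.0) + mass
    p_y1[y1] = p_y1.get(y1, 0.0) + mass
    for v in B:
        p_vxy1[(x, y1, v)] = p_vxy1.get((x, y1, v), 0.0) + mass * p_v[yr][v]
        p_vy1[(y1, v)] = p_vy1.get((y1, v), 0.0) + mass * p_v[yr][v]

formula = sum(p_xyz[(x, y1, yr)] * r[(x, y1, yr)][v]
              * math.log((p_vxy1[(x, y1, v)] / p_xy1[(x, y1)])
                         / (p_vy1[(y1, v)] / p_y1[y1]))
              for x, y1, yr, v in product(B, B, B, B))
```

The finite-difference and closed-form values agree to within the discretization error of the central difference.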