Price Computation in Random Early Marking (REM)

Sanjeewa Athuraliya and Steven Low
Department of Electrical & Electronic Engineering
University of Melbourne, Australia
{sadsa,slow}@ee.mu.oz.au
Abstract - We proposed earlier a flow control algorithm derived from solving the dual of a welfare maximization problem. The algorithm however requires communication between network links and sources that is not achievable on the current Internet. We then extended the basic algorithm to a Random Early Marking (REM) scheme which can be implemented using only binary feedback. In this paper we propose a new price computation algorithm for REM and present simulation results to illustrate its superior performance over the previous versions.
I. INTRODUCTION
We have proposed in [8], [11] a flow control algorithm derived from solving the dual of a welfare maximization problem introduced in [6]. The algorithm however requires communication between network links and sources that is not achievable on the current Internet. In [9] we describe a Random Early Marking (REM) algorithm which is a practical implementation of the basic algorithm in [11] using binary feedback. It can be implemented, e.g., with the proposed explicit congestion notification (ECN) bit in the IP (Internet Protocol) header [3], [13].

The purpose of this paper is to propose and evaluate an enhancement to the REM algorithm of [9].
In the basic algorithm of [8], [11], each link computes a congestion measure we call a 'price' based on the aggregate rate of sources that go through the link. A source adjusts its transmission rate based on the sum of link prices in its path, which must be fed back to the source. The rate required by a link for price computation is the aggregate source rate, which is generally different from the offered rate at the link (see Section III-A), and hence also must be explicitly communicated from sources to links. REM consists of two extensions, one to simplify the price computation and the other the price feedback. In this paper we propose a new price computation scheme and illustrate its superior performance over the ones in [9].
The current TCP flow control relies on implicit feedback where a source must infer network congestion, after the fact, from observed round trip time and loss. To provide more timely feedback, link algorithms have recently been proposed for TCP which probabilistically mark packets based on local congestion [4], [5], [9]. Ideally the link algorithm, which feeds back congestion information, and the source algorithm, which adapts traffic rate to congestion, are designed jointly and work in concert to steer the network as a whole towards a (possibly moving) desirable operating point. The optimal reaction to a mark depends both on the objective of the flow control and the information a mark conveys [7], [5], [11], [9]. For RED [4] a mark is to be interpreted by a source as a request to reduce its rate. In contrast, in the optimization based approach of [7], [5] and [11], [9], a mark conveys the necessary information for a source to decide whether to increase or decrease its rate based on its own marginal utility. In [5] the number of marks per unit flow a source receives is the shadow price at a single bottleneck link, where the shadow price is defined as the marginal increment in expected loss at the link with a marginal increment in load. With REM, the marks allow a source to estimate the sum of shadow prices at the links in its path. Here, however, shadow price is defined to be the marginal increase in aggregate source utility with a marginal increase in link capacity. Simulation evaluation of REM and comparison with RED is reported in [9].
In Section II we review the model and the basic algorithm of [8], [11]. In Section III we present the REM algorithm and the new price computation scheme. In Section IV we present simulation results to demonstrate the superiority of the new scheme.
II. MODEL

A. The optimization problem and its dual
Consider a network that consists of a set L = {1, ..., L} of unidirectional links of capacity c_l, l ∈ L. The network is shared by a set S = {1, ..., S} of sources. Source s is characterized by four parameters (L(s), U_s, m_s, M_s). The path L(s) ⊆ L for a point-to-point connection is the set of links that source s uses, U_s : ℝ_+ → ℝ is a utility function, and m_s ≥ 0 and M_s < ∞ are the minimum and maximum transmission rates, respectively, required by source s. Source s attains a utility U_s(x_s) when it transmits at rate x_s that satisfies m_s ≤ x_s ≤ M_s. We assume U_s is increasing, strictly concave and twice continuously differentiable in its argument. For each link l let S(l) = {s ∈ S | l ∈ L(s)} be the set of sources that use link l. By definition l ∈ L(s) if and only if s ∈ S(l).

Our objective is to choose source rates x = (x_s, s ∈ S) so as
to:
\[
\max_{m_s \le x_s \le M_s} \; \sum_s U_s(x_s) \qquad (1)
\]
subject to
\[
\sum_{s \in S(l)} x_s \le c_l, \quad l = 1, \ldots, L. \qquad (2)
\]
The constraint (2) says that the aggregate source rate at any link l does not exceed the capacity. A unique maximizer exists since the objective function is strictly concave, and hence continuous, and the feasible solution set is compact.
Since the source rates x_s are coupled by the constraint (2), solving the primal problem (1-2) directly requires coordination among possibly all sources and is impractical in real networks. A distributed and decentralized solution is provided by its dual [11]:
\[
\min_{p \ge 0} \; D(p) = \sum_s B_s(p^s) + \sum_l p_l c_l
\]
where
\[
B_s(p^s) = \max_{m_s \le x_s \le M_s} \; U_s(x_s) - x_s p^s. \qquad (3)
\]
If we interpret p_l as the price per unit bandwidth at link l then p^s is the total price per unit bandwidth for all links in the path of s. Hence x_s p^s represents the bandwidth cost to source s when it transmits at rate x_s, and B_s(p^s) represents the maximum benefit s can achieve at the given price p^s. A source s can be induced to solve the maximization (3) by bandwidth charging. For each p, a unique maximizer of (3), denoted by x_s(p), exists since U_s is strictly concave. Moreover, by duality theory, there exists a dual optimal price p* ≥ 0 such that the rate vector (x_s(p*), s ∈ S) that is individually optimal, i.e., solves (3) for each s, is also socially optimal, i.e., solves (1-2).
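As a hedged worked example (our own, with illustrative constants not taken from the paper), consider a single link of capacity c shared by two sources with U_s(x_s) = a_s \log x_s and no rate bounds. Solving (1-2) directly gives x_s^* = a_s c / (a_1 + a_2). Alternatively, each source solving (3) at price p chooses x_s(p) = a_s / p, and the dual optimal price p^* = (a_1 + a_2)/c makes the individually optimal rates
\[
x_s(p^*) = \frac{a_s}{p^*} = \frac{a_s c}{a_1 + a_2}
\]
exactly the socially optimal ones, as the duality argument above asserts.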
B. Notations

Unless otherwise specified, z usually denotes a vector whose ith component is some z_i defined before z is introduced. For a scalar z, z^+ = max{z, 0}; for a vector z, z^+ is the vector whose ith component is z_i^+. Given a price vector p, p^s = Σ_{l∈L(s)} p_l is the sum of link prices along the path of s, sometimes referred to as a path price. Given a rate vector x, x^l = Σ_{s∈S(l)} x_s is the aggregate source rate at link l.
Let x_s(p) be the unique maximizer in (3). We will abuse notation and use x_s(·) both as a function of scalar price p ∈ ℝ_+ and of vector price p ∈ ℝ_+^L. When p is a scalar, by the Kuhn-Tucker theorem, x_s(p) is given by
\[
x_s(p) = \left[ U_s'^{-1}(p) \right]_{m_s}^{M_s} \qquad (4)
\]
where [z]_a^b = min{max{z, a}, b}. Here U_s'^{-1} is the inverse of U_s', which exists over the range [U_s'(M_s), U_s'(m_s)] when U_s is continuously differentiable and strictly concave. When p is a vector, x_s(p) = x_s(p^s) = x_s(Σ_{l∈L(s)} p_l). The meaning should be clear from the context. Let x(p) = (x_s(p), s ∈ S).
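For instance (an illustration of ours, matching the log utilities used later in Section IV), if U_s(x_s) = a_s \log x_s then U_s'(x_s) = a_s / x_s and (4) becomes
\[
x_s(p) = \left[ \frac{a_s}{p} \right]_{m_s}^{M_s},
\]
i.e., the unconstrained choice a_s/p truncated to the allowed range [m_s, M_s].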
C. Basic algorithm

In [11] we solve the dual problem using the gradient projection method (e.g., [12], [2]), where link prices are adjusted in the opposite direction to the gradient ∇D(t), whose components are ∇_l D(t) = c_l - x^l(t). We abuse notation and use x_s(·) both as a function of time t and as a function of price p(t) given by (4); the meaning should be clear from the context. Then the price computations and rate adjustments are given by:

Basic algorithm:
\[
p_l(t+1) = \left[ p_l(t) + \gamma \left( x^l(t) - c_l \right) \right]^+ \qquad (5)
\]
\[
x_s(t+1) = x_s(p^s(t)) \qquad (6)
\]
where γ > 0 is a step-size. Hence the prices are adjusted in proportion to excess demand.
The
algorithm takes the familiar
form of reactive
flow
control where each link computes a price
based on local source rates, and each source selects a rate based
on the price on
its
path.
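A minimal Python sketch of one synchronous iteration of (5)-(6) (our own illustration, not the authors' code; the dictionary-based data layout, the names and the inverse-marginal-utility callback are assumptions):

```python
def price_update(p, capacities, routes, rates, gamma):
    """(5): each link l moves its price in proportion to excess demand x^l(t) - c_l."""
    new_p = {}
    for l, c_l in capacities.items():
        x_l = sum(rates[s] for s, path in routes.items() if l in path)  # aggregate source rate x^l
        new_p[l] = max(p[l] + gamma * (x_l - c_l), 0.0)                 # projection onto p >= 0
    return new_p

def rate_update(p, routes, utils):
    """(6): each source s picks x_s(p^s(t)) via (4); utils[s] = (inverse of U_s', m_s, M_s)."""
    new_x = {}
    for s, path in routes.items():
        path_price = sum(p[l] for l in path)                 # p^s: sum of link prices on the path
        inv_marginal, m_s, M_s = utils[s]
        new_x[s] = min(max(inv_marginal(path_price), m_s), M_s)
    return new_x
```

For the log utility a_s log x_s, the callback could be lambda q: a_s / max(q, 1e-9), where the small floor is our own guard against a zero path price.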
In [11] we prove that the basic algorithm (5-6) converges to the optimal rates provided the utility functions are strictly concave increasing and their second derivatives are bounded away from zero. Specifically, if {(x(t), p(t))} is a sequence generated by (5-6) then any limit point (x*, p*) is primal-dual optimal. Moreover, provided that the sources and links perform their updates frequently enough, convergence is maintained even in an asynchronous environment where sources and links may compute and communicate at different times with different frequencies, and where feedback delays are substantial and time-varying.
III. RANDOM EARLY MARKING
Under the basic algorithm (5-6) a link l needs the aggregate source rate x^l(t) for price computation and a source s needs a scalar feedback of path price p^s(t) for rate adjustment. This communication requirement is not achievable on the current Internet. In this section, we explain how to perform price computation based on offered rate and buffer occupancy, thus eliminating the need for explicit communication from sources to links, and how to feed back the prices to sources using only a single bit. The combination is the REM algorithm.
A.
Price computation
Let x_{ls}(t) be the offered rate from source s at link l at time t. Let x̂^l(t) = Σ_{s∈S(l)} x_{ls}(t) be the aggregate offered rate at link l. This rate is generally different from the aggregate source
rate x^l(t) = Σ_{s∈S(l)} x_s(t) used in the basic algorithm, except in equilibrium [10, Lemma 2]. The source rate x^l(t) is not available at link l but the offered rate x̂^l(t) can be measured at link l. We assume each link has a large buffer so that no packets are lost. Let b_l(t) be the buffer backlog at link l at time t, and b_{ls}(t) be the fraction of b_l(t) that is from source s. The aggregate buffer occupancy evolves according to:
\[
b_l(t+1) = \left[ b_l(t) + \hat{x}^l(t) - c_l \right]^+. \qquad (7)
\]
We now present three algorithms for price computation. All three are based on the idea of approximating the gradient ∇_l D(t) = c_l - x^l(t) in carrying out the gradient projection algorithm (5). The first algorithm approximates the gradient by estimating the aggregate source rate x^l(t) by the offered rate x̂^l(t) (cf. (5)):
PC1:
\[
p_l(t+1) = \left[ p_l(t) + \gamma \left( \hat{x}^l(t) - c_l \right) \right]^+
\]
Multiplying both sides of (7) by the positive step-size γ, we see that the buffer process automatically performs the price computation PC1, provided that c_l is the true link capacity available to serve the sources in S(l) and that we identify p_l(t) with γ b_l(t). Our second algorithm thus simply sets the price to a fraction of the buffer occupancy:

PC2:
\[
p_l(t) = \gamma \, b_l(t)
\]
PC1 and PC2 are proposed, and their convergence to optimality proved, in [10]. Their performance in REM is evaluated and a comparison with RED is made in [9]. PC2 is simpler to implement as links do not need to measure the offered rates. It however does not scale: as the number of sources increases, the equilibrium price vector p*, and hence the equilibrium buffer vector b* = γ^{-1} p*, increases steadily. This not only necessitates large buffers in the network but, worse still, leads to large feedback delays. Algorithm PC1 can alleviate the problem by setting c_l in PC1 to be a fraction ρ ∈ (0, 1) of the true link capacity. Then in equilibrium, the offered rate x̂^l(t) = c_l is strictly less than the true link capacity and hence backlogs will clear. However, to be effective, ρ needs to be significantly less than 1, leading to low utilization.
These considerations motivate our new algorithm:

PC3:
\[
p_l(t+1) = \left[ p_l(t) + \gamma \left( b_l(t) + \hat{x}^l(t) - c_l \right) \right]^+
\]
where c_l can be the true link capacity. In equilibrium, the price p* stabilizes and hence (for a saturated link) b_l^* + x̂^{*l} = c_l. If the equilibrium buffer were nonzero, b_l^* > 0, then the offered rate would be strictly less than the capacity, x̂^{*l} < c_l, so by the buffer dynamics (7) the backlog would drain and b_l^* could not have been in equilibrium. Hence, by contradiction, we must have both zero buffer b_l^* = 0 and full utilization x̂^{*l} = c_l in equilibrium.
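The three rules are easy to compare side by side. The following sketch is ours (per-link scalars, one call per update period; all names are assumptions, not code from the paper), driven by the buffer recursion (7):

```python
def buffer_update(b, x_hat, c):
    """(7): b(t+1) = [b(t) + offered rate - capacity]^+."""
    return max(b + x_hat - c, 0.0)

def pc1(p, x_hat, c, gamma):
    """PC1: gradient step (5) with the measured offered rate in place of x^l(t)."""
    return max(p + gamma * (x_hat - c), 0.0)

def pc2(b, gamma):
    """PC2: the price is simply gamma times the buffer occupancy."""
    return gamma * b

def pc3(p, b, x_hat, c, gamma):
    """PC3: gradient step on the sum of backlog and rate mismatch."""
    return max(p + gamma * (b + x_hat - c), 0.0)
```

The contrast described above shows up directly here: pc2 can only sustain a positive price through a standing backlog, whereas pc3 can hold a positive equilibrium price with an empty buffer and a fully utilized link.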
B.
Price
feedback
The basic idea is for a source s to estimate the path price p^s(t) from binary feedback and adjust its rate according to (6) using the estimate p̂^s(t) in place of the true value p^s(t). We now describe the method for price feedback in a simplified synchronous model where time is slotted into update periods. Sources and links update their prices and rates at the beginning of each period. We assume that every source receives a sufficient number of (acknowledgment) packets in each period so that a reasonable estimation can be made.
On packet arrival in period t, if it is not marked, a link l marks it with a probability that is exponential in its current price p_l(t), independent of all other packets:
\[
m_l(t) = 1 - \phi^{-p_l(t)} \qquad (8)
\]
where φ > 1 is a constant. Hence the higher the price, the more likely packets are marked.
The end-to-end marking probability for each packet of source s is then
\[
1 - \prod_{l \in L(s)} \left( 1 - m_l(t) \right) = 1 - \phi^{-p^s(t)} \qquad (9)
\]
A mark is placed in the ECN bit of a packet en route to its destination and is carried back to its source in the ECN bit of the packet's acknowledgment, unmodified in the return path.
A source estimates m^s(t), and hence the price p^s(t), by the fraction of marked packets in period t. Suppose source s receives acknowledgments for packets 1, 2, ..., N(t) in period t. Let E_k(t) be 1 if the kth packet in period t is marked and 0 otherwise, k = 1, 2, ..., N(t). Let m̂^s(t) be an estimate of the end-to-end marking probability m^s(t):
\[
\hat{m}^s(t) = \frac{1}{N(t)} \sum_{k=1}^{N(t)} E_k(t)
\]
Then from (9) a price estimate is
\[
\hat{p}^s(t) = - \frac{\log\left( 1 - \hat{m}^s(t) \right)}{\log \phi} \qquad (10)
\]
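A compact Python sketch of this feedback path (again our own illustration; the function names and the guard against a fully marked sample are assumptions):

```python
import math
import random

PHI = 1.2  # the constant phi > 1 in (8); 1.2 is the value used in Section IV

def mark_probability(price):
    """Per-link marking probability (8): m_l(t) = 1 - phi^(-p_l(t))."""
    return 1.0 - PHI ** (-price)

def mark_packet(path_prices):
    """True if a packet crossing links with these prices ends up marked.
    Independent marking per link gives the end-to-end probability (9)."""
    return any(random.random() < mark_probability(p) for p in path_prices)

def estimate_path_price(marks):
    """Invert (9) as in (10): estimate p^s from one period's worth of ECN marks."""
    m_hat = sum(marks) / len(marks)       # fraction of marked packets
    m_hat = min(m_hat, 1.0 - 1e-9)        # avoid log(0) if every packet was marked
    return -math.log(1.0 - m_hat) / math.log(PHI)
```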
C. REM algorithm

Putting both the price computation and price feedback methods together yields the REM algorithm.
Link l's algorithm:
1. On packet arrival, if it is not marked, mark it with probability m_l(t) given by (8), independent of all other packets.
2. At the beginning of each period t, update the price using either PC1, PC2, or PC3.
Source s's algorithm:
1. At the end of each period t, estimate the path price p̂^s(t) from the fraction m̂^s(t) of its packets marked, using (10).
2. Compute a new rate x_s(t+1) = x_s(p̂^s(t)) for the next period, where x_s(·) is given by (4).
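Composing the pieces, here is a hedged sketch of one synchronous REM period at a single bottleneck link (our own composition; the 200-packet period, the 15 packets/ms cap taken from the access links of Section IV, and all identifiers are illustrative assumptions):

```python
import math
import random

PHI, GAMMA = 1.2, 0.005   # phi in (8) and the step-size gamma; values borrowed from Section IV

def rem_period(p, b, rates, weights, c, n_pkts=200):
    """One period: the link marks packets and updates its price with PC3; each source
    then estimates the (single-link) path price via (10) and re-picks its rate via (4),
    assuming the log utility a_s*log(x_s) so that x_s(price) = a_s/price."""
    x_hat = sum(rates)                                                    # aggregate offered rate
    marks = [random.random() < 1.0 - PHI ** (-p) for _ in range(n_pkts)]  # marking per (8)
    p_next = max(p + GAMMA * (b + x_hat - c), 0.0)                        # PC3 price update
    b_next = max(b + x_hat - c, 0.0)                                      # buffer recursion (7)
    m_hat = min(sum(marks) / n_pkts, 1.0 - 1e-9)                          # fraction marked
    p_est = -math.log(1.0 - m_hat) / math.log(PHI)                        # price estimate (10)
    next_rates = [min(a / max(p_est, 1e-9), 15.0) for a in weights]       # rate choice (4), capped
    return p_next, b_next, next_rates
```

Iterating this with c = 12 loosely mimics the bottleneck of Section IV, though without the propagation delays that drive the oscillations reported there.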
We
now
compare the performance of
PC1,
PC2
and
PC3
in
the REM algorithm.
IV.
PERFORMANCE
Our simulation model consists of four sources transferring data to a common destination, as shown in Figure 1. Each source is connected to a router via a link of capacity 15 packets/ms and then to the destination via a shared link of capacity 12 packets/ms. The propagation delays of these links are labeled in the figure. The single bottleneck link allows close observation of the behavior of REM in a simple environment. Notice that the bandwidth-delay product is very significant: if each packet is 1,000 bytes then the largest round trip bandwidth-delay product for the shared link is 156 kB. This should be compared with the small amount of buffering (averaging around 10 kB) under PC1 and PC3; see Figures 2(b) and 4(b).
The starting times of the sessions are staggered by 1 s: the first source, S1, is active from 0 s to 5 s, S2 from 0 s to 3 s, S3 from 1 s to 4 s, and S4 from 2 s to 5 s. The utility function of the REM sources was a_s log(x_s), where a_s was set to C, the bottleneck link capacity in packets/s. We used φ = 1.2 in (8).
Fig. 1. Network Topology.
Figure 2 shows the performance of REM under PC1 with γ set to 0.005. The thick lines show the equilibrium (optimal) values. The large oscillations when only a single source is active are due to the high sensitivity of the log utility function at low prices. As more sources activate, their windows converge to a neighborhood of the optimal values. The oscillation around the optimal values is mainly due to randomness in the price feedback mechanism. Here c_l was set to 65% of the true link capacity to avoid buildup of the buffer. Hence, as discussed in Section III-A, the link is not fully utilized. For example, until 2 s the window size oscillates around 52 packets while a window size of 80 is required for 100% utilization.
Figure 3 shows the results for PC2 with γ set to 0.01. This high value was chosen to keep the equilibrium buffer at a lower level. Large γ leads to large oscillations in price and window size. As the buffer builds up, the round trip delay and its fluctuation increase, preventing the window from converging.
Figure 4 shows the results for PC3 with γ set to 0.005. The price and the congestion window converge, achieving 100% utilization while the buffer remains at a low level.
V.
CONCLUSION
We have proposed a new price computation for the REM algorithm of [9] which overcomes the problems of low utilization or large buffer and delay suffered by the previous schemes. Simulation results confirm the superior performance of the new
scheme. REM, with extensive simulation studies illustrating its stability, robustness and fairness, is presented in [1].
REFERENCES
[1] Sanjeewa Athuraliya, Steven Low, and David Lapsley. Random Early Marking. Submitted for publication, http://www.ee.mu.oz.au/staff/slow/research/, January 2000.
[2] Dimitri P. Bertsekas and John N. Tsitsiklis. Parallel and distributed computation. Prentice-Hall, 1989.
[3] S. Floyd. TCP and Explicit Congestion Notification. ACM Computer Communication Review, 24(5), October 1994.
[4] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking, 1(4):397-413, August 1993.
[5] R. J. Gibbens and F. P. Kelly. Resource pricing and the evolution of congestion control. Automatica, 35, 1999.
[6] F. P. Kelly. Charging and rate control for elastic traffic. European Transactions on Telecommunications, 8:33-37, 1997. http://www.statslab.cam.ac.uk/~frank/elastic.html.
[7] Frank P. Kelly, Aman Maulloo, and David Tan. Rate control for communication networks: shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49(3):237-252, March 1998.
[8] David E. Lapsley and Steven H. Low. An optimization approach to ABR control. In Proceedings of the ICC, June 1998.
[9] David E. Lapsley and Steven H. Low. Random Early Marking for Internet Congestion Control. In Proceedings of IEEE Globecom'99, December 1999.
[10] Steven H. Low. Optimization flow control with on-line measurement. In Proceedings of the ITC, volume 16, June 1999.
[11] Steven H. Low and David E. Lapsley. Optimization flow control, I: basic algorithm and convergence. IEEE/ACM Transactions on Networking, 7(6):861-874, December 1999. http://www.ee.mu.oz.au/staff/slow/research/.
[12] David G. Luenberger. Linear and Nonlinear Programming, 2nd edition. Addison-Wesley, 1984.
[13] K. K. Ramakrishnan and S. Floyd. A Proposal to add Explicit Congestion Notification (ECN) to IP. Internet draft draft-kksjf-ecn-01.txt, July 1998.
Fig. 2. PC1: (a) Window; (b) Buffer.
Fig. 3. PC2: (a) Window; (b) Buffer.
Fig. 4. PC3: (a) Window; (b) Buffer.