An
Enhanced
Random
Early
Marking
Algorithm for Internet
Flow
Control
Sanjeewa Athuraliya David Lapsley Steven
Low
Department
of
EEE,
University
of Melbourne,
Australia
{
sadsa,laplsey,slow
}
@ee.mu.oz.au
Abstruct-
We
propose
earlier
an
optimization
based
flow
control
for
the
Internet
called
Random Early
Mark@
(REM).
In
this
paper
we
propose
and
evaluate
an
enhancement that
attempts
to
sped
up
the
convergence
of
REM
in
the face
of
large
feedback delays.
REM
can
be
regarded
as
an
implementation of
an
optimization
algorithm
in
a
distributed
network.
The basic
idea
is
to treat
the
optimization
algo-
rithm
as
a
discrete
time
system
and apply
linear
control
techniques
to
stabilize
its
transient.
We
show
that
the
modified
algorithm
is
stable
globally
and
converges
exponentially
locally.
This
algorithm
translates
into
an
enhanced
REM
scheme and
we
illustrate
the
performance
im-
provement
through
simulation.
Keywordr-Internet
flow
control, pricing,
optimization
flow
control,
marking,
REM,
feedback
delay
I.
INTRODUCTION
We
proposed
earlier
a
flow
control scheme for the
Inter-
net
called Random Early Marking
(REM)
[
11.
It is derived
from
an optimization model
where each source is
charac-
terized
by
a utility
function
that models
its
valuation
of
bandwidth
and
the goal is
to maximize aggregate
source
utility over their transmission
rates
subject to capacity con-
straints
[2].
131,
[4].
The
basic
flow
control algorithm
can
be regarded as a distributed computation performed
by
the
sources
and
links
to
minimize
the
dual
problem.
The
al-
gorithm
however
requires
communication between
sources
and
links.
This communication requirement is greatly sim-
plified
in
[51,
[l]
and
leads
to
REM,
a
binary feedback
scheme similar to Random Early Detection
(RED)
[6].
The
purpose
of
this
paper is
to propose an enhancement to
REM
that
attempts to significantly speed
up
its
convergence
in the
face
of
large feedback delays.
The
value
of
the
optimization
model presented
in
[2],
[4]
is twofold.
First,
though it
may
not
be
possible,
nor
criti-
cal,
that
optimality
is exactly attained
in a real
network,
the
optimization framework offers a means
to
explicitly steer
the
entire
network towards a
desirable operating
point.
Sec-
ond
it makes
possible
a systematic method to
design
and
refine practical
flow
control
schemes,
which
can be treated
simply
as
implementations
of
a certain
optimization
algo-
rithm, where modifications
to the
flow control mechanism
is guided
by
modifications
to
the optimization algorithm.
For
instance,
it is
well
known
that
Newton
algorithm has
The
first
two
authors
acknowledge
the
Australian
Commonwdth
GOV-
ernment
for
their
Australian
Postgraduate
Awards,
and
the
last
author
ac-
knowledges
the
support
of
the
Australian
Research
Council
under
grants
S499705
and
A49930405
much
faster convergence than
gradient
projection algorithm.
By
replacing the
gradient
projection algorithm
presented
in
[Z],
[4]
by
the
Newton
algorithm
we
derive
in
[7]
a practi-
cal
Newton-like
flow
control scheme
that can
be
proved
to
maintain optimality,
has the same
communication
require-
ment
as the original scheme
but
enjoys
a much
better
con-
vergence property.
This
paper provides another example
on
how
the
optimization
framework can be exploited
to
sys-
tematically refine
REM.
There is
a tremendous
literature
on
flow
control. The
works
closest
to
this paper are
optimization
based
[8],
[9],
[lo],
[ll],
1121,
[13],
[21,
[SI,
[l]
where
the
problem
is
formulated
as
one
of
optimizing
a social
welfare
and
the
flow
control mechanisms
are
derived as solutions
to the op-
timization problem. They differ
in
their choice of
objec-
tive
functions or their solution
approaches,
and
result in
rather different
flow
control
mechanisms
to
be
implemented
at the sources
and
the
network
links.
In particular
both
[
1
I],
[
121
and our
work
solve
the
same
optimization
problem
of
maximizing
aggregate
utility
over
source
transmission
rates.
The
two works
however
differ
in
their
solution approach,
which lead
to different
algorithms
and their
implementation
through
marking
[13],
[I].
See
[4]
for
a detailed compari-
son.
The
paper is
structured as
follows.
In
Section
I1 we
sum-
marize our
optimization
model and the
REM
algorithm.
In
Section
In
we
extend the model to include
feedback
de-
lay
and
derive the enhanced algorithm. In
Section
IV we
present preliminary simulation
results
to illustrate
the
per-
formance improvement.
We
conclude
in
Section
V
with
fu-
ture
work.
All
proofs are
omitted
and
can
be
found
in
a
forthcoming
full
paper.
11.
OPTIMIZATION
MODEL
AND
REM
Consider a network
that consists
of
a
set
L
=
{
1,
.
. .
,
L}
of
unidirectional
links
of
capacity
q.
1
E
L.
The
network
is
shared by a
set
S
=
(1,.
.
.
,
S)
of
sources.
Source
s
is characterized by
four
parameters
(L(s),U,,m,,
M8).
The
path
L(s)
C
L
is a subset
of
links that source
s
uses,
U,
:
R+
3
!R
is
a utility
function,
m,
2
0
and
M,
2
00
are
the minimum and maximum transmission rates,
respec-
tively,
required by
source
s.
Source
s
attains a
utility
U,
(z,)
when
it transmits at
rate
2,
that satisfies
m,
5
zs
5
M,.
0-7803-5880-5/00/$10.00
(c)
2000
IEEE
1425
IEEE
INFOCOM
2000
We
assume
U,
is increasing
and
strictly
concave
in its
argu-
ment. Let
1,
=
[m,,
MB]
denote the range
in which
source
rate
z8
must
lie
and
I
=
(I,,
s
E
S)
be
the
vector.
For
each
link
1
let
S(1)
=
{s
E
S
1
1
E
L(s))
be
the
set
of sources
that use link
1.
Note
that
1
E
L(s)
if and only
if
8
E
S(I).
Our objective
is to choose source
rates
z
=
(z,,
s
E
S)
so
as
to:
p:
maX2.€l,,sES
CV,(za)
(1)
8
subject
to
z8
5
q,
I
=
1,.
.
.
,
L.(2)
SES(1)
The
constraint
(2)
says
that
the
total
source
rate
at any
link
1
is less
than
the
capacity.
A
unique
maximizer,
called
the
primal optimal
solution, exists
since
the
objective function
is strictly
concave, and hence continuous,
and
the
feasible
solution set
is compact.
Though the objective function
is
separable
in
z8,
the
source
rates
z8
are coupled by the
constraint
(2).
Solv-
ing the
primal
problem
(1-2)
directly
requires coordination
among
possibly
all
sources
and
is impractical
in real
net-
works.
The
key to
a distributed
and
decentralized solution
is to
look
at its dual,
e.g.,
[
14,
Section
3.4.21,
[
151:
D:
min,>o
D~P)
=
c~,(Ps)
+CP~
(3)
8
1
where
ps
=
p1.
EL(,)
(4)
(5)
The first term
of
the
dual objective function
D(p)
is decom-
posed
into
S
separable subproblems
(4-5).
If we
interpret
pi
as
the
price
per
unit bandwidth
at
link
1
then
ps
is
the
total
price
per
unit
bandwidth
for all links
in the path
of
S.
Hence
zapa
represents
the bandwidth
cost
to source
s
when
it transmits
at rate
E,,
and
B8(p8)
represents the maximum
benefit
s
can achieve
at
the
given
price
pa.
We
shall
see
that
this
scalar
pa
summarizes
all
the congestion
informa-
tion
source
s
needs
to
know.
A
source
s
can
be
induced
to
solve
maximization
(4)
by
bandwidth charging.
For
each
p,
a unique maximizer,
denoted by
z@(p),
exists
since
U8
is
strictly
concave.
~
In
general
(z8(p),
s
E
S)
may
not
be primal
optimal,
but
by
the
duality theory,
there
exists
a
p*
3
0
such
that
(z8@),
s
E
S)
is indeed primal optimal. Hence
we
will
solve the dual problem
(3).
Given minimizing
prices
p*
the
primal
optimal source
rates
z*
=
z(p”)
can
be
obtained by
0-7803-5880-5/00/$10.00
(c)
2000
IEEE
1426
individual
sources
s:~
U:-’@)
ifUi(M8)
L
P
I
~i(m,)
if
UL(m8)
<
p
(6)
(2
if
U;(M8)
>
p
Here
is
the inverse of
U:,
which
exists
over the range
[U:(M8),
U;(m8)]
since
U;
is
continuous
and
U,
strictly
concave.
It is illustrated in
Figure
1.
Let
z(p)
=
(
z8
(p)
,
s
E
X8(p)
=
S)
“a,
Fig.
1.
Source
rate
zs
(p)
as
a
function
of
(scalar)
price
p.
In
[2],
[4]
we propose
to
solve
the
dual
problem
using
the
gradient projection algorithm
that
leads
to the following
optimization
flow
control algorithm:
Al:
Basic
OFC
Pl(t
+
1)
=
kl(t)
4-
r(z’(t)
-
a>]+
zs(t)
=
28Ols(t))
Here
z‘(t)
:=
CsEs(l)
z8(t)
is
the aggregate
source
rate
at
link
I
at time
t,
and
[2]+
=
max{z,O}.
Hence a link
raises
or
reduces
its
price accordingly
as the
demand
2’
(t)
is
greater
or
less
than
the
supply
cr
of
bandwidth.
A
source
raises
or
reduces
its
rate
accordingly
as
the
path
price
p8(t)
is low
or
high
(see
(6)).
It
is
shown
in
[4]
that
provided
all
utility functions
are
strictly
concave increasing
and
their
second
derivatives
are
bounded
away
from
zero,
the basic
OFC
(optimization
flow
control)
algorithm
A1
converges
to
yield
the
optimal rates
for
sufficiently
small
stepsize
7.
As
discussed there,
though
the optimization problem
is
formulated
as
a static problem
the
flow
control algorithm
naturally
adapts
to changing link
capacities
and
set
of sources
at
a
link.
simply
use
the current
link
capacity
q(t)
and
the
current
set
S(l;t)
of
sources
at
link
1.
Algorithm A1 requires communication
of
link prices
to
sources
and
source
rates
to
links,
and
hence cannot
be
im-
plemented
on the
Internet.
In
[SI
we
show
that a link can
We
abuse
notation
and
use
zs
(a)
both
as
a
function
of
scalar
price
p
E
R+
and
of
vector
price
p
E
WF’.
When
p
is
a
scalar,
zs(p)
is
given
by
(6).
When
p
is
a
vector,
zs@)
=
zS(ps)
=
z.Y(C~~~(~)PI).
The
meaning
should
be
clear
from
the
context.
IEEE
INFOCOM
2000
simply set
its
price
to
a fraction
of
the
buffer
occupancy.
This is equivalent
to the link estimating the aggregate
source
rate
z’(t)
by
the
measured aggregate
input
rate
$l(t)
at the
link and using this estimate
in
the calculation
of
the gra-
dient.
We
prove there that
this
approximate gradient
pro-
jection
algorithm
also converges
to
yield
the
optimal rates.
This simplification eliminates
the
need
for
explicit commu-
nication
from
sources to
links.
In
the
reversed
direction
we
propose
a method
in
[l]
that
communicates link prices to
sources
using
only binary feedback.
The
basic idea
is for
a
link
to mark
a packet
with
a probability that is exponential
to
its link price
pr
(t)
so
that the end
to end marking probability
of a
packet
is exponential
to
the
path price
ps
(t)
.
A source
can then estimate
the
end
to
end marking probability and
hence
p8(t).
This can
be
implemented using
the
proposed
ECN
(Explicit Congestion Notification) bit
in the
IP
header
[16],
[17].
Combining these
two
simplifications yields the
REM
algorithm
of
[
11,
where the marking probability
is
ex-
ponential
in
buffer occupancy.
A2:
REM
Link
Z’s
algorithm:
1.
Set
price
pl(t)
=
7bi(t)
where
bl(t)
is
the
(average)
buffer
occupancy
in period
t.
2.
For eachpacket
that
is
notmarked,
markit
with
theprob-
ability
Source
s’s
algorithm:
1.
Count
the fraction
rizs
(t)
of
packets received in period
t
that
are
muked,
and
estimate
the
path
price
by:
a”@)
=
-
log2(1
-
??P(t))
2.
Choose
a
new transmission
rate
zs(t
+
1)
for
the
next
period:
z,(t
+
1)
=
z8(lj8(t)).
111.
ENHANCED
ALGORITHM
The
model
and
algorithms
in the
last
section assume zero
feedback
delay.
In
this section
we
relax
this
assumption and
propose
an
enhanced algorithm.
Let
71,
and
r81
be the
(constant) delay from link
2
to
source
8
and
from
s
to
1,
respectively.
Let
2
=
maxp,t,,C~p~
+
~,1
I
s
E
S(2)
n
S(I’)}.
We
will
see
be-
low
that
d
is
the maximum time
for
a price change in link
I’
to
affect
the
price
at link
1
through
shared sources
8.
In
particular
-
if delays
are
symmetric,
Tis
=
T~Z
for all
I,
s,
then
d
is the maximum round trip delay
of
the
network.
The
basic
OFC
algorithm
A1
becomes:
4-
7(
zs(t
-
781)
-
cr)
(7)
SES(1)
I’
Hence link
2
computes the next price
using
the
delayed
source rates
z,(t
-
~~1)
and
source
s
computes
a new
rate
using delayed prices
pl(t
-
71,).
It
is
important to
note that
the algorithm described by
(7-8)
does not require
the
links
and sources to know their delays
719,
~,1.
This
algorithm
is
a special case
of
the
asynchronous
model
considered
in
[4]
where
links
and sources update
every period using
the
most recent
data.
According
to
[4,
Theorem
21
the
algo-
rithm
converges
to
yield
the
optimal source rates
provided
the
stepsize
7
is
small
enough.
Suppose there
is a unique dual optimal price
P*.~
This
would
be
the
case,
e.g., if
the utility functions are
U,
(5,)
=
a,
logz,
and every link
I
with
pf
>
0
has a
single-link
connection
s(l),
i.e.,
for
all
I,
there exists
s(l)
such
that
L(s(Z))
=
(1).
We
may assume without
loss
of gener-
ality
that
p;
>
0
for
all
1,
because links
1
with
p;
=
0
are
not
saturated in equilibrium and hence
can
be
omit-
ted
from consideration. Then provided
that
the
stepsize
is
sufficiently small
the
sequence
{p(t)}
generated
by
(7-8)
converges to
p*.
Moreover
for
all
sufficiently large
t,
we
have
pi(t)
>
0
for all
I,
and we can omit the
projection
operation
in price
computation
for
sufficiently
large
t.
As-
sume
also
that
U:-’(Ms)
=
0,
Ui-l(m,)
=
CO
so
that
5,
(p)
=
m(t+
1)
=
PI@)
4-
Y(
zs(t
-
Tsl)
-
4
(9)
(p).
Then
the
system
is described
by:
S€S(I)
We
will
regard
(9-10)
as
a discrete time system
to be stabi-
lized, where
the
states
are
the
link prices.
For
the
rest
of
the
section we
first
consider the linearized
system
of
(9-10)
in the
neighbourhood
of
the unique
equi-
librium
p*.
Then
we
will
design
a deadbeat controller for
the linearized system
to
speed
up
its convergence.
This
con-
trol
law
however
requires
a link to
know
the entire
network
2The
simulation
in
Section
IV
also
shows
what
might
happen
when
dual
optimal
prices
are
nonunique:
link
prices
may
oscillate
between
two
dual
optimal
limit
points
while
source
rates
converge
to
the
unique
primal
opti-
mal
vector.
0-7803-5880-5/00/$10.00
(c)
2000
IEEE
142
7
IEEE
INFOCOM
2000
topology
and
is therefore impractical.
Next
we
derive
an
approximate
control
law
that can
be
implemented
by
each
link using local
information. Finally
we
apply
the
control
law
to the
REM
algorithm.
Even though
the control
laws
are
derived
for a
linear
sys-
tem
around
the
equilibriump*
their
performance
in the
non-
linear setting
away
from
the equilibrium
is investigated
in
the
next
section
through simulation.
A.
Linearized
system
Let
p(t)
be
the
(2
+
l)L
dimensional
expanded ‘state’:
whereeachp(t-T),T
=
0,.
. .
,d,isaL-vector.
The(1,~)th
element
of
p(t)
is
pr(t
-
7).
Let
p*
be
the
(2
+
l)L
d’ imen-
sional vector
with
each
of
p(t
-
T),
7
=
0,
. . .
,
d,
replaced
by
the unique
equilibrium
p*
.
-
For
any
scalar
p
define
p8
(p)
by
1
=
-u;’(zs(p))
where
z(p)
is given
by
(6).
Then after
some manipulations
we
have
wherep8(t)
=
(~IEL(s)p~(t
-
TI#)).
Hencethefirstorder
term
in the
Taylor
expansion of
z8
(t)
is
Vjj~SW)@(t)
-
P*)
-Pep'">
c
l’EL(8)
Linearizing
(9-30)
around
the
equilibrium
p*
and
letting
$(t)
=
p(t)
-
p*
we
thus
have,
after rearranging,
$I(t
+
1)
=
?ji(t)
-
Y~bn~$v(t
-
71‘8
-
781)(11)
where
b1r1
=
CsEs(l)ns(l,)
pa@”).
The
last
equality uses
the
fact that
x1
(p*)
=
cr
by
complementary slackness, since
by
assumption
pf
>
0.
(1
1) makes the interdependence
of
link prices
apparent.
It says that the
new
price
pi(t
+
1)
at
link
1
depends not
only on the current price
at link
1,
but
also
on
past
prices
pp
(t
-
nt8
-
~~1)
at other links
1’.
This
1’
is because
in response
to
a price change
at link
l‘,
sources
s
E
S(1)
n
S(1’)
that traverse
both
links
I‘
and
I
adjust
their
rates,
which
then
affect the price
at link
1.
Hence there is
a
delay
of
Tits
+
r81
for
a price
update
at link
I’
to affect
the
price
at link
1
through
shared sources
S.
This
coupling
of
link prices
is described
by
blp
.
To
express
(1
1)
in matrix
form,
define the
L
x
L
matrices
A(T),
T
=
0,.
.
.
,
z,
by:
[A(7)111’
=
PSW)
l(7-1’8
+
781
=
7)
8cs(l)ns([I)
Then
we have,
defining
@(t)
=
E(t)
ij(t
-
1)
a
.
.
ij(t
-
This describes
the
local
behavior
of
the
gradient projec-
tion algorithm around
the (unique)
equilibrium
p*
when
the
feedback delays are nonzero.
Note
the difference between our
system
and
that
of
[18].
Their system
has
two
important features that
simplify sig-
nificantly
the controller design.
First
their control
(the
ex-
plicit rates)
are
calculated
at
the
network link
where
the
current as
well
as past
buffer levels
are
available. Second,
all sources receive the
same
explicit rate
(single congestion
node
case), except possibly
with
different
feedback
delay,
and
these past rates are also available
at the
link
for
calcu-
lation.
These
allow
a simple
proportional-plus-derivative
controller
and
lead
to the simple close
loop equations
(38)
and
(39)
in their paper.
In our case, current
source rates
are
not available
at a link;
moreover
a link
may
not
know
the
value
of
+rsl
and
hence
may
not
be
able
to use
past
source
rates in price adjustment (except the
most
recent one).
On
the
other hand
a link
always
has
the
most current price,
so
we
consider
a controller that
uses only past
values
of local
prices.
B.
Deadbeat
controller
Suppose
in
computing a
new
price
each link
1
averages
its
past
link prices
and
diagonally scale
the gradient:
0-7803-5880-5/00/$10.00
(c)
2000
IEEE
1428
IEEE INFOCOM
2000
Note
that
link
1
uses only the
most
recent source rates
and
this
does
not
require
it to know
the
value
of
r81.
The
goal
is
to
compute the averaging parameters
pi
(T)
and the scaling
parameter
71
in order to place
the
poles of
the
discrete time
system
(12)
at the origin,
i.e.,
deadbeat control
law.
In
equi-
librium both sides
of
(12)
must
be
p*
and
hence
p~
(T)
must
satisfy (since
CsEs(r)
2:
=
cr
in equilibrium)
-
d
Cpr(T)
=
1,
fordl
z
(13)
T=o
Following
similar derivation as in the last subsection,
the
linearized system around
p*
is
(
14)
where
M(T)
=
diag(pr(T)),
T
=
0
,...,a,
and
G
=
diag(y1)
are
L
x
L
diagonal matrices.
The
following result
explains
how
to
choose the averaging matrices
M(T)
and
the scaling matrix
G.
For simplicity
of
exposition
we
as-
sume
a
large network where
all
delays
are
nonzero,
qa
2
1,
T,~
2
1
(otherwise,
condition
(15)
below
is
replaced
by
det(X1-
M(0)
-t-
GA(0))
=
XL).
Theorem
1:
Suppose all delays
are
nonzero. Then
all
poles
of
(14)
are at the origin
if and only
if
M(T)
and
G
are chosen
to
satisfy
M(0)
=
0
(15)
-
det(M(.r)
-
GA(7))
=
0,
T
=
0,.
.
.
,
d
(16)
-
d
CM(7)
=
I
(17)
T=o
Moreover
the above conditions uniquely determine the
di-
m
Interestingly the theorem says that
if all delays
are
nonzero
then
current prices
m(t)
should not
be
used
in the
weighted
average
of past
prices.
Note
that
the
above theorem ensures rapid convergence
only around the unique equilibrium point.
p*.
In
order
to
choose
the averaging parameters
pi
(7)
to place all
poles
at
the origin,
we
have
given up
the
choice
of
stepsize
7
(which
now
becomes
a matrix).
This
loss
of
freedom
may
upset
global
stability.
This
is indeed observed
in
our
simulation,
presented
in
the
next section.
agonal matrices
M(T)
and
G.
C.
Appmxinlate
deadbeat
controller
Theorem
1
describes
how
to
choose the
averaging matri-
ces
M(T)
and the gain matrix
G
to enforce
rapid conver-
gence
towards
the equilibrium
p*
when
the
system
is
in a
neighborhood
of
p'.
However
the
equations
(15-16)
in the
theorem can only
be
solved centrally,
as the off-diagonal el-
ements
of the matrices
A(T)
imply that
a link
needs
to know
information on sources
at other links, and hence
is
imprac-
tical.
In
this subsection
we
derive
an
approximation
to the
control
law
of
the
last section that can
be
implemented
by
individual links
locally.
The
idea is
to approximate the
ma-
trices
A(T)
by
their diagonal
terms.
Let
B(7)
=
diag(P'(T),1
E
L)
be
the
L
x
L
diagonal
matrices whose diagonal elements
are
In
words
the
Ith
diagonal elements
P~(T)
of
B(7)
are the
sums
of
(f),
sum
over all sources
s
that
traverse link
I
and have
a
round trip delay
of
7.
With
this
approximation
there
is
enough degree
of
freedom,
provided
2
2
2,
that
we
can reduce the
gain
matrix
G
to
an
arbitrary
scalar
-y
>
0.
This means that
we
can
choose
7
to
be
sufficiently small
to
ensure global convergence; contrast this
with
the
exact
deadbeat control law
of
the
last subsection.
We
will
see
in
the
next section that
the
performance
of
this controller
can
be
better
than
that
of
the
exact deadbeat controller
whose
stepsize
y
is
fixed
by
the
condition
(13).
The
conditions corresponding
to
(15-17)
in
Theorem
I
reduce to the following choice
of
weights
pr
(7):
and
for all other
I
and
T,
p1(~)
are
chosen to
satisfy
d
r=O
This control law
is much simpler than
that expressed
in The-
orem
1:
a link
I
can choose
its
own weights
pl(~)
based
on
the information about sources that traverse link
1;
we
will
come
back
to this point
in
Section
V.
With
this approximate deadbeat controller the
corre-
0-7803-5880-5/00/$10.00
(c)
2000
IEEE
142
9
IEEE
INFOCOM
2000