of 5
High
Speed
Networks
OPTIMIZATION FLOW CONTROL WITH
NEWTON-LIKE
ALGORITHM
*
Sanjeewa Athuraliya
and Steven
Low
Department
of
EEE, University
of
Melbourne,
Parkville,
Vic
3052,
Australia
Abstract
We
proposed
earlier
an
optimization approach
to
reactive
flow
control
where
the objective
of
the control
is to
maxi-
mize
the aggregate
utility
of
all
sources over
their
transmis-
sion
rates.
The control mechanism
is
derived as
a gradi-
ent
projection
algorithm
to
solve the dual problem.
In
this
paper
we
extend
the algorithm
to
a scaled gradient projec-
tion.
The diagonal scaling matrix approximates
the
diago-
nal
terms
of
the Hessian
and
can
be
computed
at individual
links
using
the same information required
by
the unscaled
algorithm.
We
prove
the
convergence
of
the scaled algo-
rithm
and
present simulation
results
that
illustrate its
supe-
riority to the
unscaled algorithm.
1
Introduction
We
have
proposed previously
an
optimization approach
to
flow
control
where
the control mechanism
is
derived as
a
gradient projection algorithm
to
solve the dual
of
a global
optimization problem
[14,
13,
171.
The
solution
is decom-
posed
into
simple algorithms
that
are executed
at
individ-
ual
links
and
sources using
‘local’
information.
It is
well
known that Newton
method, where
the
gradient
is
scaled
by
the inverse
of
the second derivative matrix, typically en-
joys
a much
faster
convergence than gradient projection
al-
gorthim.
For
us, however,
the exact
Newton method will
require
non-local
information and hence cannot
be
easily
implemented
in
a large network.
The
purpose
of this
paper
is to
describe
an
approximate
Newton method to
solve the
dual
optimization problem using only diagonal
scaling,
and
illustrate its
behavior
with
preliminary simulation
results.
Specifically consider
a network
that
consists
of
a
set
L
of
unidirectional
links
of
capacities
cl,
1
E
L.
The
network
is
shared
by
a set
S
of
sources, where source
s
is characterized
by
a utility
function
U,(z,)
that
is concave increasing
in
its
transmission
rate
xg.
The
goal
is
to
calculate source
rates
that
maximize
the
sum
of
the
utilities
CsES
Us(z,)
over
x,
subject
to
capacity
constraints.
Solving
this
problem
centrally
would
require
not
only the knowledge
of
all
util-
ity
functions,
but worse
still,
complex coordination among
‘The
first author
acknowledges
the Australian Commonwealth
Gov-
ernment
for
their Australian
Postgraduate
Award, and
the
second
author
acknowledges the
support
of
the
Australian Research Council
under grant
S499705.
potentially
all
sources due
to
coupling
of
sources through
shared
links.
The
key
is to
solve the dual problem
that
de-
composes the
task
into
simple local computations
at indi-
vidual
links
and
sources.
The
algorithm
takes
the familiar form
of
reactive
flow
con-
trol. Based
on the
local
aggregate
source
rate
each link
1
E
L
calculates
a ‘price’
pl
for
a unit
of
bandwidth.
A
source
s
is fed back
the scalar price
ps
=
pl,
where
the sum
is
taken over all
links
that
s
uses,
and
it chooses
a transmis-
sion
rate
x,
that
maximizes
its
own
benefit
U,(z,)
-
psz,,
utility minus
the bandwidth
cost.
These individually
opti-
mal
rates
(x,
(p”)
s
E
S)
may
not
be
socially optimal for
a
general price
vector
(pl,
1
E
L),
i.e., they may
not maximize
the aggregate
utility.
The
algorithm
iteratively
approaches
a price
vector
(p;
1
E
L)
that
aligns individual
and
social
optimality.
The basic algorithm
to
solve the dual problem presented
in
[
141
is a gradient projection method.
A
preliminary proto-
type
based
on
this
algorithm
is
discussed
in
[13].
Its
con-
vergence
is
proved
in
[18]
in both
synchronous
and
asyn-
chronous
settings.
The
basic algorithm requires commu-
nication
of
link
prices
to
sources and source
rates to links
and
is
thus
not
implementatble
in
the current
Internet.
In
[15],
we
describe
a Random
Early Marking
(REM)
scheme
which can be regarded
as
a practical
implementation
of
the
basic algorithm
in
[14,
181
using binary feedback.
It can
be
implemented,
e.g.,
with
the proposed
explicit
congestion
notification
(ECN)
bit
in
the
IP
(Internet
Protocol) header
[6,201.
In
this
paper,
we
generalize the basic algorithm
of
gradient
projection
to an
approximate
Newton
algorithm
that
has
a
better
convergence property.
There
is a tremendous
literature
on flow
control,
including
early
schemes based
on
practical
experiences,
e.g.,
[lo,
71,
and
recent schemes based on control
theory,
e.g.,
[l,
3,
41.
Optimization based
flow
control
have been
proposed
in
[9,
5,
11,
12,
8,
14, 16,
181
All
these
works
motivate
flow
control
by
an
optimization problem
and
derive
their
control
mechanisms as solutions
to
the optimization problem. They
differ
in
their
choice
of
objective functions
or
their
solution
1264
0-7803-5796-5/99/$10.00
0
1999
IEEE
Global Telecommunications
Conference
-
Globecorn’99
High
Speed
Networks
approaches,
and
result
in rather different
flow
control
mech-
anisms
to
be implemented at the sources
and
the
network
links.
The
present paper is structured as follows.
In
Section
2
we
review
our optimization framework and describe the
New-
ton
like algorithm.
In
Section
3
we first
show
that the
al-
gorithm converges
and
then show,
through simulations,
that
it converges
significantly faster
than
the gradient projection
algortihm
of
[14,
161.
All
proofs are omitted due
to
space
limitation.
2
Model
and algorithm
2.1
Model
Consider a
network that
consists
of
a set
L
=
(1,.
. .
,
L}
of
unidirectional
links
of
capacities
cl,
1
E
L.
'
The
net-
work
is shared
by
a set
S
=
{
1,
.
. .
,
S}
of
sources. Source
s
is characterized
by
four parameters
(L(
s),
U,,
m,,
Ms).
The
path
L(s)
L
is a subset
of
links
that
source
s
uses,
U,
:
?J?+
?J?
is
a utility function,
m,
2
0
and
M,
5
CO
are
the
minimum
and
maximum transmission rates, respec-
tively,
required
by
source
s.
Source
s
attains a
utility
U,(z,)
when
it transmits
at rate
z,
that satisfies
m,
5
z,
5
Ms.
Let
I,
=
[m,,
M,]
denote the range in
which
source rate
2,
must lie
and
I
=
(I,,s
E
S)
be the
vector.
We
as-
sume
that
U,
is increasing,
strictly
concave,
and
twice con-
tinuously
differentiable
on
I,[m,,
M,].
For
each link
I
let
S(1)
=
{s
E
S
1
1
E
L(s))
be
the set
of
sources
that use
link
1.
Note that
1
E
L(s)
if and
only
ifs
E
S(1).
Our objective
is to choose source rates
z
=
(xs,
s
E
S)
so
as
to:
p:
max&€I.
CUs(zs)
(1)
S
subject to
z,
5
CI,
E
=
1,
...,
L.
(2)
SES(1)
The constraint (2) says
that
the aggregate source rate
at any
link
1
is less
than
the capacity.
A
unique
maximizer,
called
the
primal
optimal solution, exists since
the
objective func-
tion
is strictly concave, and hence continuous,
and
the
fea-
sible solution
set
is compact.
Though the objective function
is separable
in
z,,
the source
rates
2,
are coupled
by
the
constraint
(2).
Solving the
pri-
mal
problem
(1-2)
directly requires coordination
among
possibly
all sources
and
is impractical in
real networks.
The
key
to
a distributed
and
decentralized solution is to
look
at
We
abuse notation and use the
same
symbol
to
denote both a
set and
its
cardinality when there
is
no
danger
of
confusion.
its dual, e.g.,
[2,
Section
3.4.21, [19]:
The
first
term
of
the
dual objective function
D(p)
is decom-
posed
into
S
separable subproblems
(4-5).
If we
interpret
pi
as the price
per unit
bandwidth at link
1
then
ps
is the
total
price per
unit
bandwidth for
all
links
in
the
path
of
s.
Hence
zsps
represents the bandwidth cost to source
s
when
it
transmits at rate
z,,
and
Bs(pS)
represents the
maximum
benefit
s
can
achieve
at the given
price
ps.
A
source
s
can be
induced to
solve maximization (4)
by
bandwidth
charging.
For
each
p,
a unique
maximizer, denoted
by
z,(p),
exists
since
U,
is strictly concave.
In
general
(zs(p),
s
E
S)
may not be
primal
optimal,
but
by
the
duality
theory,
there exists a
p*
2
0
such that
(z,(p*),
s
E
S)
is indeed primal optimal. Hence
we
will
focus
on
solving the dual problem
(3).
Once
we
have
ob-
tained
the minimizing prices
p*
the
primal
optimal source
rates
x*
=
z(p*)
can
be obtained
by
individual
sources
s
by
solving (4), a simple maximization (see
below).
The im-
portant
point to note
is that,
given
p*,
individual sources
s
can
solve (4)
separately without
the
need to coordinate
with
other sources.
In
a sense
p*
serves as a coordination signal
that
aligns individual optimality
of
(4)
with
social optimal-
ity
of(1)2.
Indeed
the unique maximizer
z(p)
for (4)
can
be
given
ex-
plicitly,
from the
Kuhn-Tucker
theorem, in
terms
of
the
marginal
utility3:
where
[z]:
=
max{a,min{b,z}}.
Here
U:-'
is
the
in-
verse
of
U:,
which
exists over
the
range
[Ui(M,),
Ui(rn,)]
if
U:
is continuous
and
U,
strictly
concave. Let
z(p)
=
(ES(P),S
E
SI.
'Despite the notation, a source
s
does
not require the vector price
p,
but
only
a scalar
p"
=
clEL.(s)
pl
that represents the
sum
of
link prices
on
its path;
see
below.
3We
abuse notation and
use
zs
(.)
both
as
a function
of
scalar price
p
E
R+
and
of
vector price
p
E
d:'.
When
p
is a scalar,
zs
(p)
is given
by
(6).
When
p
is a vector,
zs(p)
=
z,(ps)
=
zs(clcL(s)pl).
The
meaning should be clear from the context.
Global
Telecommunications
Conference
-
Globecom'99
1265
--
High
Speed
Networks
2.2
Algorithm
gradient projection algorithm where
link
prices are adjusted
in
opposite direction
to
the gradient
VD(p(t)):
zs(t)
=
[U;-l(p"(t
-
l))]::
is
the source
rate
at
time
as each
link
Z
requires the
path
price
p"(t)
of
all
sources
s
E
S(1)
going
through
1.
Instead
we
use link price
pl(t)
as
a substitute:
In [14,
161
we
propose
to
solve the dual problem using the
t'
This
approximation
is
however
to
imp1ement
assumption,
the utility
functions are
stiictly
concave
and
hence
VD(p)
indeed
exists
with
its
Z-th
component
given
by:
Simulation
results
suggest
that using
pl
(t)
in
place
of
p"(t)
has a similar
behavior.
Finally
we
ensure
that
H(t)
is
strictly
positive
definite
by
making
its
diagonal
terms at
least
as large as
an
6
>
0:
(8)
where
z1
(p)
:=
CsEs(l)
z,
(p)
is the aggregate source
rate
at
link
1.
Hence the algorithm
is decentralized: each
link
1
can individually carry out the price adjustment
given
the
aggregate source
rates
d
(t)
at its
link,
and each source can
dD
dP1
-b)
=
Cl
-
.YP)
(10)
rnax(6,
ifk
=
z
otherwise
Hkl(t)
=
individually compute
its rate
using
(6)
given the scalar price
we
summarize.
PS(t)
=
C&L.(")
Pl(t).
It
is well known that Newton
method, where the gradient
is
scaled
by
the inverse
of
the Hessian,
*korithm:
At
update
times
t
=
1,2,.
.
.:
gradient
Projection
P(t
-k
1)
=
b(t)
-
r[v2D@(t))l-'VD(p(t))1+
(9)
Link
1's
algorithm:
typically
converges
much
than the gradient projection
algorithm
(7).
This
price adjustment
however
is
difficult
to
implement
in
a large network since the
Hessian
V2D(p)
computation cannot
be
distributed
to
individual
links,
as
a
link
may
require the
rates
or
the second
derivatives
of
utili-
ties
of
sources
at other links
[16,
Lemma21. This
is clearly
not
scalable.
We
propose instead an approximating positive
definite
diagonal scaling matrix
H(t)
that
can
be
computed
Given Source
rates
2,
(t),
s
E
S(l),
at
ljnk
1,
compute a new
price
Pl(t
+
1)
=
[P&)
+
rHi'(z"t)
-
Cl)]+.
wherezl(t)
=
Cscs(i)
zs(t)
andHll
is
given
bY
(10)-
Source
"'
at individual
links
using
the same information available
un-
der
the
gradient projection algorithm.
Given
path
price
p"(t)
=
&L(8)pl(t),
choose
a
new
source
rate
z,
(t
+
1)
:
zs(t
+
1)
=
arg
max
Us(zs)
-
ps(t)zs
First
H(t)
retains
only the diagonal terms
of
the
Hessian
and has
zero off-diagonal
terms.
Second the diagonal terms
are
approximated
by
finite
differences. From
(8)
the diago-
x.a.
=
[U;-'(P"(t))lmM,"
nal
terms are
3
Performance
In this
section
we
first
prove
that
the scaled gradient projec-
tion
algorithm
given
in
the
last
section converges. Then
we
illustrate
through simulation studies
that its
convergence
is
superior
to the unscaled algorithm.
zs(t)
-
zs(t
-
1)
=
-
c
p"(t)
-p"(t-
1)
3.1
Convergence
The
scaled algorithm generates a sequence
that
approaches
the optimal
rate allocation,
provided
the following condi-
tions
are
satisfied:
sES(1)
where
s',(P"(t))
is
the
total
derivative
of
the scalar func-
tion
xs(-)
evaluated
at
the
path
price
p"(t)
at
time
t,
and
1266
Global Telecommunications
Conference
-
Globecom'99
High
Speed
Networks
ht2
Unt3
UnkI
0
+p+
C1:
On
the
interval
I,
=
[m,,M,],
the
utility functions
U,
are increasing, strictly concave,
and twice
continu-
ously
differentiable.
C2: The curvatures
of
U,
are bounded
away
from zero
on
I,:
-Ui(x,)
2
l/~,
>
0
for all
x,
E
I,.
lbo-
m-
I.-
pi.
1:
sc.nca
I:
I
I
4,
01
I"
a-
These conditions
imply
the
VD
is Lipschitz
which
leads
to the
convergence
of
the
algorithm.
Define
L
:=
maxSEs
IL(s)I,
3
:=-maqEL
IS(l)l,
and
Z
:=
max
{E,,
s
E
S}.
In
words
X
is the length
of
a longest
path
used
by
the sources,
7
is the number
of
sources shar-
ing
a most
congested link, and
E
is the
upper bound
on
all
-
4Ji(X8).
Theorem
1
Suppose assumptions
CI-C2
hold and
the
step
size
satisjes
0
<
y
<
2~/%%.
Then
starting
from
any
initial
rates
m
5
x(0)
5
M
and
prices
p(0)
2
0,
every
limit
point
(x*,p*)
of
the
sequence
(x(t),p(t))
generated
by
the
algorithm are
primal-dual
optimal.
That
is,
x
*
gives
the
source
rates
that
maximize aggregate utility
and
p*
the
shadow bandwidth
prices.
Note
that
x*
is unique
but
p*
may
not be unique.
We
have
found from
our simulation experience that,
in
practice, a
step size
y
much
larger
than
the bound in the theorem can be
used,
e.g.,
in
the
simulation reported
below,
y
=
1.
More-
over
the
scaled algorithm seems
much
less sensitive to
y
than the unscaled
algorithm.
3.2
Simulation
results
We
now
present
simulation study carried out for the
net-
work in
Figure 1
shared
by
five
connections,
with
sources
Si
and
destinations
Di,
i
=
1,.
.
.
,
5.
Connection
SI-Dl
time
40s
S3
at time
OS,
S4
at time 120s,
S5
at time
160s.
Once
turned
on, sources S2,
S3,
and
S4
remained active
un-
til time
240s,
and
S5
turned
off
earliest at time 200s. This
enabled us to
observe the dynamic behaviour
of
the algo-
rithm
as demand for bandwidth varies. The
utility
functions
of
the sources
were
set
to
a,log(l
+
x,),
with
a,
equal
to
1
x
lo4,
5
x
lo4,
7
x
lo4,
6
x
lo4,
2
x
lo4
for sources
S
1 ,S2,S3,S4
and
S5
respectively.
Notice
that the
longest
connection
S1-Dl
was
set to
have
the smallest
marginal
utility.
The step size
y
used
to adjust the
link prices
was
set
to
1.
A new
link bandwidth price
was
calculated
every
1s. The target bandwidth was'set at 200
packets
per 1s
mea-
suring interval.
Figure 2
shows
the source rates for each source
under
the
unscaled
gradient projection algorithm.
From time
WOs,
only
source
S1
was
active. Its rate climbed steadily
to the
target
bandwidth
of
200
packetsh.
From time 40s, source
S2
became
active whose rate, after
an
initial overshoot, sta-
bilized to
about 167
packet&.
This squeezed source
Sl's
rate
to about
33
packetds.
At
these rates sources S1
and
S2
had
the same marginal
utility. At
times
OS,
120s,
and
160s
when
sources
S3,
S4,
S5
became-active, similar dynamics
were
observed.
Sl's
rate bounced
back to
200 packetds af-
ter
all other sources
had
turned
off.
Figure
3
shows the source rates under the scaled gradient
projection
algorithm. While
the
same
kind
of
interaction
among
the sources occurred as under
the unscaled
algo-
rithm,
we
see
that
the convergence
to
optimal rates
was
achieved much
faster under the scaled algorithm,
though
the
magnitude
of
rate fluctuation
was
also
much larger.
The
faster
convergence
rate implied less overloading of,
and
hence
much
less
buffer
requirement at, the links. Figure
4
shows
the
buffer
occupancy at each link under the
two
schemes. Figure
3
shows
the
source rates
under
the
scaled
gradient projection algorithm.
times
of
the other sources are staggered
with
S2
starting
at
Global
Telecommunications
Conference
-
Globecom'99
1267
High
Speed
Networks
--I
I
I.
I
Y)'mE-Imm
Figure
3:
Source
rates
under scaled gradient projection.
4
Conclusion
The
flow
control mechanism
of
[14,
161
is
derived
as a
gradient projection algorithm
to
solve a dual optimization
problem.
In
this
paper
we
have
extended the algorithm
to
a
scaled gradient projection, using a diagonal scaling
that
can
be
implemented
with
the same information as
that
is avail-
able under the basic algorithm
of
[14,
161.
We
have proved
the convergence
of
the algorithm and
have
presented sim-
ulation
results that illustrate its
superior performance com-
pared
to
the unscaled
algori
UD
:1
ax,
00
---
'-I'
IM
References
Im.
Figure
4:
Buffer
occupancy.
[l]
L.
Benmohamed and
S.
M.
Meerkov. Feedback
control
of
congestion
in
store-and-forward networks: the
case
of
a sin-
gle
congested node.
IEEWACM
Transactions
on
Network-
ing,
1(6):693-707,
December 1993.
[2]
Dimitri
P. Bertsekas and John
N.
Tsitsiklis.
Parallel
and
dis-
tributed computation.
Prentice-Hall, 1989.
[3]
Flavio Bonomi, Debasis Mitra, and
Judith
B.
Seery. Adaptive
algorithms
for
feedback-based
flow
control
in
high-speed
wide-area
ATM
networks.
IEEE Journal
on
Selected Areas
in
Communications,
13(7):
1267-1283,
September 1995.
[4]
S.
Chong,
R.
Nagarajan, and
Y.-T.
Wang.
Designing stable
ABR
flow
control with rate feedback and open
loop
control:
first order control case.
Performance
Evaluation,
34(4):
189-
206, December 1998.
[SI
Costas Courcoubetis, Vasilios A. Siris, and George
D.
Sta-
moulis. Integration
of
pricing and
flow
control
for
ABR
services
in
ATM
networks.
Proceedings
of
Globecom
'96,
November 1996.
[6]
S.
Floyd.
TCP
and Explicit Congestion Notification.
ACM
Computer Communication
Review,
24(5), October 1994.
[7]
S.
Floyd and
V.
Jacobson. Random early detection gateways
for congestion avoidance.
IEEELACM
Trans.
on
Networking,
1(4):397413,
August 1993.
[8]
R.
J. Gibbens and
F.
P. Kelly.
Resource pricing and
the
evo-
lution
of
congestion control.
Automatica,
35,
1999.
[9] Jamal Golestani and Supratik Bhattacharyya. End-to-end
congestion control
for
the Internet: A global optimization
framework.
In
Proceedings
of
International
Conj
on
Net-
work
Protocols
(ICNP),
October 1998.
[
101
V.
Jacobson. Congestion avoidance and control.
Proceedings
of
SIGCOMM'88,
ACM,
August 1988.
An
updated version
is available via
ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
[
111 F.
P. Kelly.
Charging and rate control
for
elastic traffic.
Eu-
ropean
Transactions
on
Telecommunications,
8:33-37,
1997.
http://www.statslab.cam.ac.uk/hanWelastic.html.
[
121
Frank
P. Kelly, Aman Maulloo, and David
Tan.
Rate control
for
communication networks: Shadow prices, proportional
fairness and stability.
Journal
of
Operations
Research Soci-
ety,
49(3):237-252,
March 1998.
[13] David
E.
Lapsley and Steven
H.
Low.
An
IP
Implementation
of Optimization Flow Control.
In
Proceedings
of
the
Globe-
com'98,
November
1998.
[14] David
E.
Lapsley and Steven
H.
Low. An
optimization ap-
proach
to
ABR control.
In
Proceedings
of
the
ICC,
June
1998.
[
151
David E. Lapsley and Steven
H.
Low.
Optimization
flow
con-
trol,
11:
REM (Random Early Marking) and enhancements.
To
be submitted for publication, 1999.
[16] David E. Lapsley and Steven
H.
Low.
Random Early Mark-
ing
for
Internet Congestion Control. In
Proceedings
of
IEEE
Globecom'99,
December 1999.
[17] Steven
H.
Low.
Optimization
flow
control with on-line mea-
surement.
In
Proceedings
of
the
ITC,
volume 16,
June
1999.
[
181
Steven
H.
Low and David E. Lapsley. Optimization
flow
con-
trol,
I:
basic algorithm and convergence.
IEEELACM
Tran-
sancations
on
Networking,
1999.
To
appear.
[
191
David
G.
Luenberger.
Linear and Nonlinear Programming,
2nd
Ed.
Addison-Wesley Publishing Company, 1984.
[20]
K.
K.
Ramakrishnan and
S.
Floyd. A Proposal to add Explicit
Congestion Notification (ECN) to IP. Internet draft draft-
kksjf-ecn-01 .txt, July 1998.
1268
Global Telecommunications
Conference
-
Globecom'99