Optimization flow control with newton-like algorithm - Global Telecommunications Conference, 1999. GLOBECOM '99

High

Speed

Networks

OPTIMIZATION FLOW CONTROL WITH

NEWTON-LIKE

ALGORITHM

*

Sanjeewa Athuraliya

and Steven

Low

Department

of

EEE, University

of

Melbourne,

Parkville,

Vic

3052,

Australia

Abstract

We

proposed

earlier

an

optimization approach

to

reactive

flow

control

where

the objective

of

the control

is to

maxi-

mize

the aggregate

utility

of

all

sources over

their

transmis-

sion

rates.

The control mechanism

is

derived as

a gradi-

ent

projection

algorithm

to

solve the dual problem.

In

this

paper

we

extend

the algorithm

to

a scaled gradient projec-

tion.

The diagonal scaling matrix approximates

the

diago-

nal

terms

of

the Hessian

and

can

be

computed

at individual

links

using

the same information required

by

the unscaled

algorithm.

We

prove

the

convergence

of

the scaled algo-

rithm

and

present simulation

results

that

illustrate its

supe-

riority to the

unscaled algorithm.

1

Introduction

We

have

proposed previously

an

optimization approach

to

flow

control

where

the control mechanism

is

derived as

a

gradient projection algorithm

to

solve the dual

of

a global

optimization problem

[14,

13,

171.

The

solution

is decom-

posed

into

simple algorithms

that

are executed

at

individ-

ual

links

and

sources using

‘local’

information.

It is

well

known that Newton

method, where

the

gradient

is

scaled

by

the inverse

of

the second derivative matrix, typically en-

joys

a much

faster

convergence than gradient projection

al-

gorthim.

For

us, however,

the exact

Newton method will

require

non-local

information and hence cannot

be

easily

implemented

in

a large network.

The

purpose

of this

paper

is to

describe

an

approximate

Newton method to

solve the

dual

optimization problem using only diagonal

scaling,

and

illustrate its

behavior

with

preliminary simulation

results.

Specifically consider

a network

that

consists

of

a

set

L

of

unidirectional

links

of

capacities

cl,

1

E

L.

The

network

is

shared

by

a set

S

of

sources, where source

s

is characterized

by

a utility

function

U,(z,)

that

is concave increasing

in

its

transmission

rate

xg.

The

goal

is

to

calculate source

rates

that

maximize

the

sum

of

the

utilities

CsES

Us(z,)

over

x,

subject

to

capacity

constraints.

Solving

this

problem

centrally

would

require

not

only the knowledge

of

all

util-

ity

functions,

but worse

still,

complex coordination among

‘The

first author

acknowledges

the Australian Commonwealth

Gov-

ernment

for

their Australian

Postgraduate

Award, and

the

second

author

acknowledges the

support

of

the

Australian Research Council

under grant

S499705.

potentially

all

sources due

to

coupling

of

sources through

shared

links.

The

key

is to

solve the dual problem

that

de-

composes the

task

into

simple local computations

at indi-

vidual

links

and

sources.

The

algorithm

takes

the familiar form

of

reactive

flow

con-

trol. Based

on the

local

aggregate

source

rate

each link

1

E

L

calculates

a ‘price’

pl

for

a unit

of

bandwidth.

A

source

s

is fed back

the scalar price

ps

=

pl,

where

the sum

is

taken over all

links

that

s

uses,

and

it chooses

a transmis-

sion

rate

x,

that

maximizes

its

own

benefit

U,(z,)

-

psz,,

utility minus

the bandwidth

cost.

These individually

opti-

mal

rates

(x,

(p”)

s

E

S)

may

not

be

socially optimal for

a

general price

vector

(pl,

1

E

L),

i.e., they may

not maximize

the aggregate

utility.

The

algorithm

iteratively

approaches

a price

vector

(p;

1

E

L)

that

aligns individual

and

social

optimality.

The basic algorithm

to

solve the dual problem presented

in

[

141

is a gradient projection method.

A

preliminary proto-

type

based

on

this

algorithm

is

discussed

in

[13].

Its

con-

vergence

is

proved

in

[18] 

in both

synchronous

and

asyn-

chronous

settings.

The

basic algorithm requires commu-

nication

of

link

prices

to

sources and source

rates to links

and

is

thus

not

implementatble

in

the current

Internet.

In

[15],

we

describe

a Random

Early Marking

(REM) 

scheme

which can be regarded

as

a practical

implementation

of

the

basic algorithm

in

[14,

181

using binary feedback.

It can

be

implemented,

e.g.,

with

the proposed

explicit

congestion

notification

(ECN)

bit

in

the

IP

(Internet

Protocol) header

[6,201.

In

this

paper,

we

generalize the basic algorithm

of

gradient

projection

to an

approximate

Newton

algorithm

that

has

a

better

convergence property.

There

is a tremendous

literature

on flow

control,

including

early

schemes based

on

practical

experiences,

e.g.,

[lo,

71,

and

recent schemes based on control

theory,

e.g.,

[l,

3,

41.

Optimization based

flow

control

have been

proposed

in

[9,

5,

11,

12,

8,

14, 16,

181

All

these

works

motivate

flow

control

by

an

optimization problem

and

derive

their

control

mechanisms as solutions

to

the optimization problem. They

differ

in

their

choice

of

objective functions

or

their

solution

1264

0-7803-5796-5/99/$10.00

0

1999

IEEE

Global Telecommunications

Conference

-

Globecorn’99

High

Speed

Networks

approaches,

and

result

in rather different

flow

control

mech-

anisms

to

be implemented at the sources

and

the

network

links.

The

present paper is structured as follows.

In

Section

2

we

review

our optimization framework and describe the

New-

ton

like algorithm.

In

Section

3

we first

show

that the

al-

gorithm converges

and

then show,

through simulations,

that

it converges

significantly faster

than

the gradient projection

algortihm

of

[14,

161.

All

proofs are omitted due

to

space

limitation.

2

Model

and algorithm

2.1

Model

Consider a

network that

consists

of

a set

L

=

(1,.

. .

,

L}

of

unidirectional

links

of

capacities

cl,

1

E

L.

'

The

net-

work

is shared

by

a set

S

=

{

1,

.

. .

,

S}

of

sources. Source

s

is characterized

by

four parameters

(L(

s),

U,,

m,,

Ms). 

The

path

L(s)

L

is a subset

of

links

that

source

s

uses,

U,

:

?J?+

?J?

is

a utility function,

m,

2

0

and

M,

5

CO

are

the

minimum

and

maximum transmission rates, respec-

tively,

required

by

source

s.

Source

s

attains a

utility

U,(z,)

when

it transmits

at rate

z,

that satisfies

m,

5

z,

5

Ms.

Let

I,

=

[m,,

M,]

denote the range in

which

source rate

2,

must lie

and

I

=

(I,,s

E

S)

be the

vector.

We

as-

sume

that

U,

is increasing,

strictly

concave,

and

twice con-

tinuously

differentiable

on

I,[m,,

M,].

For

each link

I

let

S(1)

=

{s

E

S

1

E

L(s))

be

the set

of

sources

that use

link

1.

Note that

1

E

L(s)

if and

only

ifs

E

S(1).

Our objective

is to choose source rates

z

=

(xs, 

s

E

S)

so

as

to:

p:

max&€I.

CUs(zs)

(1)

S

subject to

z,

5

CI,

E

=

1,

...,

L.

(2)

SES(1)

The constraint (2) says

that

the aggregate source rate

at any

link

1

is less

than

the capacity.

A

unique

maximizer,

called

the

primal

optimal solution, exists since

the

objective func-

tion

is strictly concave, and hence continuous,

and

the

fea-

sible solution

set

is compact.

Though the objective function

is separable

in

z,,

the source

rates

2,

are coupled

by

the

constraint

(2).

Solving the

pri-

mal

problem

(1-2)

directly requires coordination

among

possibly

all sources

and

is impractical in

real networks.

The

key

to

a distributed

and

decentralized solution is to

look

at

We

abuse notation and use the

same

symbol

to

denote both a

set and

its

cardinality when there

is

no

danger

of

confusion.

its dual, e.g.,

[2,

Section

3.4.21, [19]:

The

first

term

of

the

dual objective function

D(p)

is decom-

posed

into

S

separable subproblems

(4-5).

If we

interpret

pi

as the price

per unit

bandwidth at link

1

then

ps

is the

total

price per

unit

bandwidth for

all

links

in

the

path

of

s.

Hence

zsps

represents the bandwidth cost to source

s

when

it

transmits at rate

z,,

and

Bs(pS) 

represents the

maximum

benefit

s

can

achieve

at the given

price

ps.

A

source

s

can be

induced to

solve maximization (4)

by

bandwidth

charging.

For

each

p,

a unique

maximizer, denoted

by

z,(p),

exists

since

U,

is strictly concave.

In

general

(zs(p),

s

E

S)

may not be

primal

optimal,

but

by

the

duality

theory,

there exists a

p*

2

0

such that

(z,(p*),

s

E

S)

is indeed primal optimal. Hence

we

will

focus

on

solving the dual problem

(3).

Once

we

have

ob-

tained

the minimizing prices

p*

the

primal

optimal source

rates

x*

=

z(p*)

can

be obtained

by

individual

sources

s

by

solving (4), a simple maximization (see

below).

The im-

portant

point to note

is that,

given

p*,

individual sources

s

can

solve (4)

separately without

the

need to coordinate

with

other sources.

In

a sense

p*

serves as a coordination signal

that

aligns individual optimality

of

(4)

with

social optimal-

ity

of(1)2.

Indeed

the unique maximizer

z(p)

for (4)

can

be

given

ex-

plicitly,

from the

Kuhn-Tucker

theorem, in

terms

of

the

marginal

utility3:

where

[z]:

=

max{a,min{b,z}}.

Here

U:-'

is

the

in-

verse

of

U:,

which

exists over

the

range

[Ui(M,),

Ui(rn,)]

if

U:

is continuous

and

U,

strictly

concave. Let

z(p)

=

(ES(P),S 

E

SI.

'Despite the notation, a source

s

does

not require the vector price

p,

but

only

a scalar

p"

=

clEL.(s)

pl

that represents the

sum

of

link prices

on

its path;

see

below.

3We

abuse notation and

use

zs

(.)

both

as

a function

of

scalar price

p

E

R+

and

of

vector price

p

E

d:'. 

When

p

is a scalar,

zs

(p)

is given

by

(6).

When

p

is a vector,

zs(p)

=

z,(ps)

=

zs(clcL(s)pl).

The

meaning should be clear from the context.

Global

Telecommunications

Conference

-

Globecom'99

1265

--

High

Speed

Networks

2.2

Algorithm

gradient projection algorithm where

link

prices are adjusted

in

opposite direction

to

the gradient

VD(p(t)):

zs(t)

=

[U;-l(p"(t

-

l))]::

is

the source

rate

at

time

as each

link

Z

requires the

path

price

p"(t)

of

all

sources

s

E

S(1)

going

through

1.

Instead

we

use link price

pl(t)

as

a substitute:

In [14,

161

we

propose

to

solve the dual problem using the

t'

This

approximation

is

however

to

imp1ement

assumption,

the utility

functions are

stiictly

concave

and

hence

VD(p)

indeed

exists

with

its

Z-th

component

given

by:

Simulation

results

suggest

that using

pl

(t)

in

place

of

p"(t)

has a similar

behavior.

Finally

we

ensure

that

H(t)

is

strictly

positive

definite

by

making

its

diagonal

terms at

least

as large as

an

6

>

0:

(8)

where

z1

(p)

:=

CsEs(l)

z,

(p)

is the aggregate source

rate

at

link

1.

Hence the algorithm

is decentralized: each

link

1

can individually carry out the price adjustment

given

the

aggregate source

rates

d

(t)

at its

link,

and each source can

dD

dP1

-b)

=

Cl

-

.YP) 

(10)

rnax(6,

ifk

=

z

otherwise

Hkl(t)

=

individually compute

its rate

using

(6)

given the scalar price

we

summarize.

PS(t) 

=

C&L.(")

Pl(t).

It

is well known that Newton

method, where the gradient

is

scaled

by

the inverse

of

the Hessian,

*korithm:

At

update

times

t

=

1,2,.

.

.:

gradient

Projection

P(t

-k

1)

=

b(t)

-

r[v2D@(t))l-'VD(p(t))1+

(9)

Link

1's

algorithm:

typically

converges

much

than the gradient projection

algorithm

(7).

This

price adjustment

however

is

difficult

to

implement

in

a large network since the

Hessian

V2D(p)

computation cannot

be

distributed

to

individual

links,

as

a

link

may

require the

rates

or

the second

derivatives

of

utili-

ties

of

sources

at other links

[16,

Lemma21. This

is clearly

not

scalable.

We

propose instead an approximating positive

definite

diagonal scaling matrix

H(t)

that

can

be

computed

Given Source

rates

2,

(t),

s

E

S(l),

at

ljnk

1,

compute a new

price

Pl(t

+

1)

=

[P&) 

+

rHi'(z"t)

-

Cl)]+.

wherezl(t)

=

Cscs(i)

zs(t)

andHll

is

given

bY

(10)- 

Source

"'

at individual

links

using

the same information available

un-

der

the

gradient projection algorithm.

Given

path

price

p"(t)

=

&L(8)pl(t),

choose

a

new

source

rate

z,

(t

+

1)

:

zs(t

+

1)

=

arg

max

Us(zs) 

-

ps(t)zs

First

H(t)

retains

only the diagonal terms

of

the

Hessian

and has

zero off-diagonal

terms.

Second the diagonal terms

are

approximated

by

finite

differences. From

(8)

the diago-

x.a.

=

[U;-'(P"(t))lmM,"

nal

terms are

3

Performance

In this

section

we

first

prove

that

the scaled gradient projec-

tion

algorithm

given

in

the

last

section converges. Then

we

illustrate

through simulation studies

that its

convergence

is

superior

to the unscaled algorithm.

zs(t)

-

zs(t

-

1)

=

-

c

p"(t)

-p"(t-

1)

3.1

Convergence

The

scaled algorithm generates a sequence

that

approaches

the optimal

rate allocation,

provided

the following condi-

tions

are

satisfied:

sES(1)

where

s',(P"(t))

is

the

total

derivative

of

the scalar func-

tion

xs(-) 

evaluated

at

the

path

price

p"(t)

at

time

t,

and

1266

Global Telecommunications

Conference

-

Globecom'99

High

Speed

Networks

ht2

Unt3

UnkI

0

+p+

C1:

On

the

interval

I,

=

[m,,M,],

the

utility functions

U,

are increasing, strictly concave,

and twice

continu-

ously

differentiable.

C2: The curvatures

of

U,

are bounded

away

from zero

on

I,:

-Ui(x,) 

2

l/~,

>

0

for all

x,

E

I,.

lbo- 

m-

I.-

pi.

1:

sc.nca

I:

I

4,

01

I"

a-

These conditions

imply

the

VD

is Lipschitz

which

leads

to the

convergence

of

the

algorithm.

Define

L

:=

maxSEs

IL(s)I,

3

:=-maqEL

IS(l)l,

and

Z

:=

max

{E,,

s

E

S}.

In

words

X

is the length

of

a longest

path

used

by

the sources,

7

is the number

of

sources shar-

ing

a most

congested link, and

E

is the

upper bound

on

all

-

4Ji(X8).

Theorem

1

Suppose assumptions

CI-C2

hold and

the

step

size

satisjes

0

<

y

<

2~/%%.

Then

starting

from

any

initial

rates

m

5

x(0)

5

M

and

prices

p(0)

2

0,

every

limit

point

(x*,p*) 

of

the

sequence

(x(t),p(t))

generated

by

the

algorithm are

primal-dual

optimal.

That

is,

x

*

gives

the

source

rates

that

maximize aggregate utility

and

p*

the

shadow bandwidth

prices.

Note

that

x*

is unique

but

p*

may

not be unique.

We

have

found from

our simulation experience that,

in

practice, a

step size

y

much

larger

than

the bound in the theorem can be

used,

e.g.,

in

the

simulation reported

below,

y

=

1.

More-

over

the

scaled algorithm seems

much

less sensitive to

y

than the unscaled

algorithm.

3.2

Simulation

results

We

now

present

simulation study carried out for the

net-

work in

Figure 1

shared

by

five

connections,

with

sources

Si

and

destinations

Di,

i

=

1,.

.

,

5.

Connection

SI-Dl

time

40s

S3

at time

OS,

S4

at time 120s,

S5

at time

160s.

Once

turned

on, sources S2,

S3,

and

S4

remained active

un-

til time

240s,

and

S5

turned

off

earliest at time 200s. This

enabled us to

observe the dynamic behaviour

of

the algo-

rithm

as demand for bandwidth varies. The

utility

functions

of

the sources

were

set

to

a,log(l

+

x,), 

with

a,

equal

to

1

x

lo4,

5

x

lo4,

7

x

lo4, 

6

x

lo4,

2

x

lo4

for sources

S

1 ,S2,S3,S4

and

S5

respectively.

Notice

that the

longest

connection

S1-Dl

was

set to

have

the smallest

marginal

utility.

The step size

y

used

to adjust the

link prices

was

set

to

1.

A new

link bandwidth price

was

calculated

every

1s. The target bandwidth was'set at 200

packets

per 1s

mea-

suring interval.

Figure 2

shows

the source rates for each source

under

the

unscaled

gradient projection algorithm.

From time

WOs,

only

source

S1

was

active. Its rate climbed steadily

to the

target

bandwidth

of

200

packetsh.

From time 40s, source

S2

became

active whose rate, after

an

initial overshoot, sta-

bilized to

about 167

packet&.

This squeezed source

Sl's

rate

to about

33

packetds.

At

these rates sources S1

and

S2

had

the same marginal

utility. At

times

OS,

120s,

and

160s

when

sources

S3,

S4,

S5

became-active, similar dynamics

were

observed.

Sl's

rate bounced

back to

200 packetds af-

ter

all other sources

had

turned

off.

Figure

3

shows the source rates under the scaled gradient

projection

algorithm. While

the

same

kind

of

interaction

among

the sources occurred as under

the unscaled

algo-

rithm,

we

see

that

the convergence

to

optimal rates

was

achieved much

faster under the scaled algorithm,

though

the

magnitude

of

rate fluctuation

was

also

much larger.

The

faster

convergence

rate implied less overloading of,

and

hence

much

less

buffer

requirement at, the links. Figure

4

shows

the

buffer

occupancy at each link under the

two

schemes. Figure

3

shows

the

source rates

under

the

scaled

gradient projection algorithm.

times

of

the other sources are staggered

with

S2

starting

at

Global

Telecommunications

Conference

-

Globecom'99

1267

High

Speed

Networks

--I

I

I.

I

Y)'mE-Imm

Figure

3:

Source

rates

under scaled gradient projection.

4

Conclusion

The

flow

control mechanism

of

[14,

161

is

derived

as a

gradient projection algorithm

to

solve a dual optimization

problem.

In

this

paper

we

have

extended the algorithm

to

a

scaled gradient projection, using a diagonal scaling

that

can

be

implemented

with

the same information as

that

is avail-

able under the basic algorithm

of

[14,

161.

We

have proved

the convergence

of

the algorithm and

have

presented sim-

ulation

results that illustrate its

superior performance com-

pared

to

the unscaled

algori

UD

:1

ax,

00

---

'-I'

IM

References

Im.

Figure

4:

Buffer

occupancy.

[l]

L.

Benmohamed and

S.

M.

Meerkov. Feedback

control

of

congestion

in

store-and-forward networks: the

case

of

a sin-

gle

congested node.

IEEWACM

Transactions

on

Network-

ing,

1(6):693-707,

December 1993.

[2]

Dimitri

P. Bertsekas and John

N.

Tsitsiklis.

Parallel

and

dis-

tributed computation.

Prentice-Hall, 1989.

[3]

Flavio Bonomi, Debasis Mitra, and

Judith

B.

Seery. Adaptive

algorithms

for

feedback-based

flow

control

in

high-speed

wide-area

ATM

networks.

IEEE Journal

on

Selected Areas

in

Communications,

13(7):

1267-1283,

September 1995.

[4]

S.

Chong,

R.

Nagarajan, and

Y.-T.

Wang.

Designing stable

ABR

flow

control with rate feedback and open

loop

control:

first order control case.

Performance

Evaluation,

34(4):

189-

206, December 1998.

[SI

Costas Courcoubetis, Vasilios A. Siris, and George

D.

Sta-

moulis. Integration

of

pricing and

flow

control

for

ABR

services

in

ATM

networks.

Proceedings

of

Globecom

'96,

November 1996.

[6]

S.

Floyd.

TCP

and Explicit Congestion Notification.

ACM

Computer Communication

Review,

24(5), October 1994.

[7]

S.

Floyd and

V.

Jacobson. Random early detection gateways

for congestion avoidance.

IEEELACM

Trans.

on

Networking,

1(4):397413,

August 1993.

[8]

R.

J. Gibbens and

F.

P. Kelly.

Resource pricing and

the

evo-

lution

of

congestion control.

Automatica,

35,

1999.

[9] Jamal Golestani and Supratik Bhattacharyya. End-to-end

congestion control

for

the Internet: A global optimization

framework.

In

Proceedings

of

International

Conj

on

Net-

work

Protocols

(ICNP),

October 1998.

[

101

V.

Jacobson. Congestion avoidance and control.

Proceedings

of

SIGCOMM'88,

ACM,

August 1988.

An

updated version

is available via

ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. 

[

111 F.

P. Kelly.

Charging and rate control

for

elastic traffic.

Eu-

ropean

Transactions

on

Telecommunications,

8:33-37,

1997.

http://www.statslab.cam.ac.uk/hanWelastic.html. 

[

121

Frank

P. Kelly, Aman Maulloo, and David

Tan.

Rate control

for

communication networks: Shadow prices, proportional

fairness and stability.

Journal

of

Operations

Research Soci-

ety,

49(3):237-252,

March 1998.

[13] David

E.

Lapsley and Steven

H.

Low.

An

IP

Implementation

of Optimization Flow Control.

In

Proceedings

of

the

Globe-

com'98,

November

1998.

[14] David

E.

Lapsley and Steven

H.

Low. An

optimization ap-

proach

to

ABR control.

In

Proceedings

of

the

ICC,

June

1998.

[

151

David E. Lapsley and Steven

H.

Low.

Optimization

flow

con-

trol,

11:

REM (Random Early Marking) and enhancements.

To

be submitted for publication, 1999.

[16] David E. Lapsley and Steven

H.

Low.

Random Early Mark-

ing

for

Internet Congestion Control. In

Proceedings

of

IEEE

Globecom'99,

December 1999.

[17] Steven

H.

Low.

Optimization

flow

control with on-line mea-

surement.

In

Proceedings

of

the

ITC,

volume 16,

June

1999.

[

181

Steven

H.

Low and David E. Lapsley. Optimization

flow

con-

trol,

I:

basic algorithm and convergence.

IEEELACM

Tran-

sancations

on

Networking,

1999.

To

appear.

[

191

David

G.

Luenberger.

Linear and Nonlinear Programming,

2nd

Ed.

Addison-Wesley Publishing Company, 1984.

[20]

K.

Ramakrishnan and

S.

Floyd. A Proposal to add Explicit

Congestion Notification (ECN) to IP. Internet draft draft-

kksjf-ecn-01 .txt, July 1998.

1268

Global Telecommunications

Conference

-

Globecom'99