An Enhanced Random Early Marking Algorithm for Internet Flow Control - INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies

An

Enhanced

Random

Early

Marking

Algorithm for Internet

Flow

Control

Sanjeewa Athuraliya David Lapsley Steven

Low

Department

of

EEE,

University

of Melbourne,

Australia

{

sadsa,laplsey,slow

}

@ee.mu.oz.au

Abstruct-

We

propose

earlier

an

optimization

based 

flow

control

for

the

Internet

called

Random Early

Mark@

(REM).

In

this

paper

we

propose

and

evaluate

an

enhancement that

attempts

to

sped

up

the

convergence 

of

REM

in

the face

of

large

feedback delays.

REM

can

be

regarded

as

an

implementation of

an

optimization

algorithm

in

a

distributed

network.

The basic

idea

is

to treat

the

optimization

algo-

rithm

as

a

discrete 

time

system

and apply

linear

control

techniques

to

stabilize 

its

transient.

We

show

that

the

modified

algorithm

is

stable

globally 

and

converges

exponentially 

locally.

This 

algorithm 

translates

into

an

enhanced

REM

scheme and

we

illustrate

the

performance 

im-

provement

through

simulation. 

Keywordr-Internet

flow

control, pricing,

optimization

flow

control,

marking,

REM,

feedback

delay

I.

INTRODUCTION

We

proposed

earlier

a

flow

control scheme for the

Inter-

net

called Random Early Marking

(REM)

[

11.

It is derived

from 

an optimization model

where each source is

charac-

terized

by

a utility

function

that models

its

valuation

of

bandwidth

and

the goal is

to maximize aggregate

source

utility over their transmission

rates

subject to capacity con-

straints

[2]. 

131,

[4].

The

basic

flow 

control algorithm

can

be regarded as a distributed computation performed

by

the

sources

and

links

to

minimize

the

dual

problem.

The

al-

gorithm

however

requires

communication between

sources

and

links.

This communication requirement is greatly sim-

plified

in

[51,

[l]

and

leads

to

REM,

a

binary feedback

scheme similar to Random Early Detection

(RED)

[6].

The

purpose

of

this

paper is

to propose an enhancement to

REM

that

attempts to significantly speed

up

its

convergence

in the

face

of

large feedback delays.

The

value

of

the

optimization

model presented

in

[2],

[4]

is twofold.

First,

though it

may

not

be

possible,

nor

criti-

cal,

that

optimality

is exactly attained

in a real

network,

the

optimization framework offers a means

to

explicitly steer

the

entire

network towards a

desirable operating

point.

Sec- 

ond

it makes

possible

a systematic method to

design

and

refine practical

flow

control

schemes,

which

can be treated

simply

as

implementations

of

a certain

optimization

algo-

rithm, where modifications

to the

flow control mechanism

is guided

by

modifications

to

the optimization algorithm.

For

instance,

it is

well

known

that

Newton

algorithm has

The

first

two

authors

acknowledge

the

Australian

Commonwdth

GOV-

ernment

for

their

Australian

Postgraduate 

Awards,

and

the

last

author

ac-

knowledges

the

support

of

the

Australian

Research

Council 

under

grants 

S499705

and

A49930405

much

faster convergence than

gradient

projection algorithm.

By

replacing the

gradient

projection algorithm

presented

in

[Z], 

[4]

by

the

Newton

algorithm

we

derive

in

[7]

a practi-

cal

Newton-like

flow

control scheme

that can

be

proved

to

maintain optimality,

has the same

communication

require-

ment

as the original scheme

but

enjoys

a much

better

con-

vergence property.

This

paper provides another example

on

how

the

optimization

framework can be exploited

to

sys-

tematically refine

REM. 

There is

a tremendous

literature

on

flow

control. The

works

closest

to

this paper are

optimization

based

[8],

[9],

[lo],

[ll],

1121,

[13],

[21,

[SI, 

[l]

where

the

problem

is

formulated

as

one

of

optimizing

a social

welfare

and

the

flow

control mechanisms

are

derived as solutions

to the op-

timization problem. They differ

in

their choice of

objec-

tive

functions or their solution

approaches,

and

result in

rather different

flow

control

mechanisms

to

be

implemented

at the sources

and

the

network

links.

In particular

both

[

1

I],

[

121

and our

work

solve

the

same

optimization

problem

of

maximizing

aggregate

utility

over

source

transmission

rates.

The

two works

however

differ

in

their

solution approach,

which lead

to different

algorithms

and their

implementation

through

marking

[13],

[I].

See

[4]

for

a detailed compari-

son.

The

paper is

structured as

follows.

In

Section

I1 we

sum-

marize our

optimization

model and the

REM

algorithm.

In

Section

In

we

extend the model to include

feedback

de-

lay

and

derive the enhanced algorithm. In

Section

IV we

present preliminary simulation

results

to illustrate

the

per-

formance improvement.

We

conclude

in

Section

V

with

fu-

ture

work.

All

proofs are

omitted

and

can

be

found

in

a

forthcoming

full

paper.

11.

OPTIMIZATION 

MODEL

AND

REM

Consider a network

that consists

of

a

set

L

=

{

1,

.

. .

,

L}

of

unidirectional

links

of

capacity

q.

1

E

L.

The

network

is

shared by a

set

S

=

(1,.

.

,

S)

of

sources.

Source

s

is characterized by

four

parameters

(L(s),U,,m,,

M8).

The

path

L(s)

C

L

is a subset

of

links that source

s

uses,

U,

:

R+

3

!R

is

a utility

function,

m,

2

0

and

M,

2

00

are

the minimum and maximum transmission rates,

respec-

tively,

required by

source

s.

Source

s

attains a

utility

U,

(z,)

when

it transmits at

rate

2,

that satisfies

m,

5

zs

5

M,.

0-7803-5880-5/00/$10.00 

(c)

2000

IEEE

1425

IEEE

INFOCOM

2000

We

assume

U,

is increasing

and

strictly

concave

in its

argu-

ment. Let

1,

=

[m,,

MB]

denote the range

in which

source

rate

z8

must

lie

and

I

=

(I,,

s

E

S)

be

the

vector.

For

each

link

1

let

S(1)

=

{s

E

S

1

E

L(s))

be

the

set

of sources

that use link

1.

Note

that

1

E

L(s)

if and only

if

8

E

S(I).

Our objective

is to choose source

rates

z

=

(z,, 

s

E

S)

so

as

to:

p:

maX2.€l,,sES

CV,(za)

(1)

8

subject

to

z8

5

q,

I

=

1,.

.

,

L.(2)

SES(1)

The

constraint

(2)

says

that

the

total

source 

rate

at any

link

1

is less

than

the

capacity.

A

unique

maximizer,

called

the

primal optimal

solution, exists

since

the

objective function

is strictly

concave, and hence continuous,

and

the

feasible

solution set

is compact.

Though the objective function

is

separable

in

z8,

the

source

rates

z8

are coupled by the

constraint

(2).

Solv-

ing the

primal

problem

(1-2)

directly

requires coordination

among

possibly

all

sources

and

is impractical

in real

net-

works.

The

key to

a distributed

and

decentralized solution

is to

look

at its dual,

e.g.,

[

14,

Section

3.4.21,

[

151:

D:

min,>o

D~P)

=

c~,(Ps)

+CP~

(3)

8

1

where

ps

=

p1.

EL(,) 

(4)

(5)

The first term

of

the

dual objective function

D(p)

is decom-

posed

into

S

separable subproblems

(4-5).

If we

interpret

pi

as

the

price

per

unit bandwidth

at

link

1

then

ps

is

the

total

price

per

unit

bandwidth

for all links

in the path

of

S.

Hence

zapa

represents

the bandwidth

cost

to source

s

when

it transmits

at rate

E,,

and

B8(p8)

represents the maximum

benefit

s

can achieve

at

the

given

price

pa.

We

shall

see

that

this

scalar

pa

summarizes

all

the congestion

informa-

tion

source

s

needs

to

know.

A

source

s

can

be

induced

to

solve

maximization 

(4)

by

bandwidth charging.

For

each

p,

a unique maximizer,

denoted by

z@(p), 

exists

since

U8

is

strictly

concave.

~

In

general

(z8(p),

s

E

S)

may

not

be primal

optimal,

but

by

the

duality theory,

there

exists

a

p*

3

0

such

that

(z8@),

s

E

S)

is indeed primal optimal. Hence

we

will

solve the dual problem

(3).

Given minimizing

prices

p*

the

primal

optimal source

rates

z*

=

z(p”)

can

be

obtained by

0-7803-5880-5/00/$10.00

(c)

2000

IEEE

1426

individual

sources

s:~

U:-’@)

ifUi(M8)

L

P

I

~i(m,)

if

UL(m8)

<

p

(6)

(2

if

U;(M8)

>

p

Here

is

the inverse of

U:,

which

exists

over the range

[U:(M8),

U;(m8)]

since

U;

is

continuous

and

U,

strictly

concave.

It is illustrated in

Figure

1.

Let

z(p)

=

(

z8

(p)

,

s

E

X8(p)

=

S)

“a, 

Fig.

1.

Source

rate 

zs

(p)

as

a

function

of

(scalar)

price

p.

In

[2],

[4]

we propose

to

solve

the

dual

problem

using

the

gradient projection algorithm

that

leads

to the following

optimization

flow

control algorithm:

Al:

Basic

OFC

Pl(t

+

1)

=

kl(t)

4-

r(z’(t)

-

a>]+

zs(t)

=

28Ols(t))

Here

z‘(t)

:=

CsEs(l)

z8(t)

is

the aggregate

source

rate

at

link

I

at time

t,

and

[2]+

=

max{z,O}.

Hence a link

raises

or

reduces

its

price accordingly

as the

demand

2’ 

(t)

is

greater

or

less

than

the

supply

cr

of

bandwidth.

A

source 

raises

or

reduces

its

rate

accordingly

as

the

path

price

p8(t)

is low

or

high

(see

(6)).

It

is

shown

in

[4]

that

provided

all

utility functions

are

strictly

concave increasing

and

their

second

derivatives

are

bounded

away

from

zero,

the basic

OFC

(optimization

flow

control)

algorithm

A1

converges

to

yield

the

optimal rates

for

sufficiently

small

stepsize

7.

As

discussed there,

though

the optimization problem

is

formulated

as

a static problem

the

flow

control algorithm

naturally

adapts

to changing link

capacities

and

set

of sources

at

a

link.

simply

use

the current

link

capacity

q(t)

and

the

current

set

S(l;t)

of

sources

at

link

1.

Algorithm A1 requires communication

of

link prices

to

sources

and

source

rates

to

links,

and

hence cannot

be

im-

plemented

on the

Internet.

In

[SI

we

show

that a link can

We

abuse

notation

and

use

zs

(a)

both 

as

a

function

of

scalar

price

p

E

R+

and

of

vector

price 

p

E

WF’. 

When

p

is

a

scalar, 

zs(p)

is

given

by

(6). 

When

p

is

a

vector,

zs@) 

=

zS(ps) 

=

z.Y(C~~~(~)PI).

The

meaning 

should

be

clear

from 

the

context.

IEEE

INFOCOM

2000

simply set

its

price

to

a fraction

of

the

buffer

occupancy.

This is equivalent

to the link estimating the aggregate

source

rate

z’(t)

by

the

measured aggregate

input 

rate

$l(t)

at the

link and using this estimate

in

the calculation

of

the gra-

dient.

We

prove there that

this

approximate gradient

pro-

jection

algorithm

also converges

to

yield

the

optimal rates.

This simplification eliminates

the

need

for

explicit commu-

nication

from 

sources to

links.

In

the

reversed

direction

we

propose

a method

in

[l]

that

communicates link prices to

sources

using

only binary feedback.

The

basic idea

is for

a

link

to mark

a packet

with

a probability that is exponential

to

its link price

pr

(t)

so

that the end

to end marking probability

of a

packet

is exponential

to

the

path price

ps

(t)

.

A source

can then estimate

the

end

to

end marking probability and

hence

p8(t).

This can

be

implemented using

the

proposed

ECN

(Explicit Congestion Notification) bit

in the

IP

header

[16],

[17].

Combining these

two

simplifications yields the

REM

algorithm

of

[

11,

where the marking probability

is

ex-

ponential

in

buffer occupancy.

A2:

REM

Link 

Z’s

algorithm:

1.

Set

price

pl(t)

=

7bi(t)

where

bl(t)

is

the

(average)

buffer

occupancy

in period

t.

2.

For eachpacket

that

is

notmarked,

markit

with

theprob-

ability

Source

s’s 

algorithm:

1.

Count

the fraction

rizs

(t)

of

packets received in period

t

that

are

muked,

and

estimate

the

path

price

by:

a”@)

=

-

log2(1

-

??P(t))

2.

Choose

a

new transmission

rate

zs(t

+

1)

for

the

z,(t

+

1)

=

z8(lj8(t)).

111.

ENHANCED

ALGORITHM

The

model

and

algorithms

in the

last

section assume zero

feedback

delay.

In

this section

we

relax

this

assumption and

propose

an

enhanced algorithm.

Let

71,

and

r81

be the

(constant) delay from link

2

to

source

8

and

from

s

to

1,

respectively.

Let

2

=

maxp,t,,C~p~ 

+

~,1

I

s

E

S(2)

n

S(I’)}.

We

will

see

be-

low

that

d

is

the maximum time

for

a price change in link

I’

to

affect

the

price

at link

1

through 

shared sources

8.

In

particular

-

if delays

are

symmetric,

Tis

=

T~Z

for all

I,

s,

then

d

is the maximum round trip delay

of

the

network.

The

basic

OFC

algorithm

A1

becomes:

4-

7(

zs(t

-

781)

-

cr)

(7)

SES(1)

I’ 

Hence link

2

computes the next price

using

the

delayed

source rates

z,(t

-

~~1)

and

source

s

computes

a new

rate

using delayed prices

pl(t

-

71,).

It

is

important to

note that

the algorithm described by

(7-8)

does not require

the

links

and sources to know their delays

719,

~,1.

This

algorithm

is

a special case

of

the

asynchronous

model

considered

in

[4]

where

links

and sources update

every period using

the

most recent

data.

According

to

[4,

Theorem

21

the

algo-

rithm 

converges

to

yield

the

optimal source rates

provided

the

stepsize

7

is

small

enough.

Suppose there

is a unique dual optimal price

P*.~

This

would

be

the

case,

e.g., if

the utility functions are

U,

(5,)

=

a,

logz,

and every link

I

with

pf

>

0

has a

single-link

connection

s(l),

i.e.,

for

all

I,

there exists

s(l)

such

that

L(s(Z))

=

(1).

We

may assume without

loss

of gener-

ality

that

p;

>

0

for

all

1,

because links

1

with

p;

=

0

are

not

saturated in equilibrium and hence

can

be

omit-

ted

from consideration. Then provided

that

the

stepsize

is

sufficiently small

the

sequence

{p(t)}

generated

by

(7-8)

converges to

p*.

Moreover

for

all

sufficiently large

t,

we

have

pi(t)

>

0

for all

I,

and we can omit the

projection

operation

in price

computation

for

sufficiently

large

t.

As-

sume

also

that

U:-’(Ms)

=

0,

Ui-l(m,)

=

CO

so

that

5,

(p)

=

m(t+

1)

=

PI@) 

4-

Y(

zs(t

-

Tsl)

-

4

(9)

(p).

Then

the

system

is described

by:

S€S(I)

We

will

regard

(9-10)

as

a discrete time system

to be stabi-

lized, where

the

states

are

the

link prices.

For

the

rest

of

the

section we

first

consider the linearized

system

of

(9-10)

in the

neighbourhood

of

the unique

equi-

librium

p*.

Then

we

will

design

a deadbeat controller for

the linearized system

to

speed

up

its convergence.

This

con-

trol

law

however

requires

a link to

know

the entire

network

2The

simulation

in

Section

IV

also

shows

what

might

happen

when

dual

optimal

prices

are

nonunique:

link 

prices

may

oscillate 

between

two

dual

optimal

limit

points

while

source

rates 

converge

to

the

unique 

primal

opti-

mal

vector.

0-7803-5880-5/00/$10.00

(c)

2000

IEEE

142

7

IEEE

INFOCOM

2000

topology

and

is therefore impractical.

derive

an

approximate

control

law

that can

be

implemented

by

each

link using local

information. Finally

we

apply

the

control

law

to the

REM

algorithm.

Even though

the control

laws

are

derived

for a

linear

sys-

tem

around

the

equilibriump*

their

performance

in the

non-

linear setting

away

from

the equilibrium

is investigated

in

the

through simulation.

A.

Linearized

system

Let

p(t)

be

the

(2

+

l)L

dimensional

expanded ‘state’:

whereeachp(t-T),T

=

0,.

. .

,d,isaL-vector.

The(1,~)th

element

of

p(t)

is

pr(t

-

7).

Let

p*

be

the

(2

+

l)L

d’ imen-

sional vector

with

each

of

p(t

-

T),

7

=

0,

. . .

,

d,

replaced

by

the unique

equilibrium

p*

.

-

For

any

scalar

p

define

p8

(p)

by

1

=

-u;’(zs(p))

where

z(p) 

is given

by

(6).

Then after

some manipulations

we

have

wherep8(t)

=

(~IEL(s)p~(t

-

TI#)). 

Hencethefirstorder

term 

in the

Taylor

expansion of

z8

(t)

is

Vjj~SW)@(t)

-

P*)

-Pep'">

c

l’EL(8)

Linearizing

(9-30) 

around

the

equilibrium

p*

and

letting

$(t)

=

p(t)

-

p*

we

thus

have,

after rearranging,

$I(t

+

1)

=

?ji(t)

-

Y~bn~$v(t

-

71‘8

-

781)(11)

where

b1r1

=

CsEs(l)ns(l,)

pa@”).

The

last

equality uses

the

fact that

x1

(p*)

=

cr

by

complementary slackness, since

by

assumption

pf

>

0.

(1

1) makes the interdependence

of

link prices

apparent.

It says that the

new

price

pi(t

+

1)

at

link

1

depends not

only on the current price

at link

1,

but

also

on

past

prices

pp

(t

-

nt8

-

~~1)

at other links

1’.

This

1’

is because

in response

to

a price change

at link

l‘,

sources

s

E

S(1)

n

S(1’)

that traverse

both

links

I‘

and

I

adjust

their

rates,

which

then

affect the price

at link

1.

Hence there is

a

delay

of

Tits

+

r81

for

a price

update

at link

I’

to affect

the

price

at link

1

through

shared sources

S.

This

coupling

of

link prices

is described

by

blp

.

To

express

(1

1)

in matrix

form,

define the

L

x

L

matrices

A(T),

T

=

0,.

.

,

z,

by:

[A(7)111’

=

PSW)

l(7-1’8

+

781

=

7)

8cs(l)ns([I)

Then

we have,

defining

@(t)

=

E(t)

ij(t

-

1)

a

.

ij(t

-

This describes

the

local

behavior

of

the

gradient projec-

tion algorithm around

the (unique)

equilibrium

p*

when

the

feedback delays are nonzero.

Note

the difference between our

system

and

that

of

[18].

Their system

has

two

important features that

simplify sig-

nificantly

the controller design.

First

their control

(the

ex-

plicit rates)

are

calculated

at

the

network link

where

the

current as

well

as past

buffer levels

are

available. Second,

all sources receive the

same

explicit rate

(single congestion

node

case), except possibly

with

different

feedback

delay,

and

these past rates are also available

at the

link

for

calcu-

lation.

These

allow

a simple

proportional-plus-derivative 

controller

and

lead

to the simple close

loop equations

(38)

and

(39)

in their paper.

In our case, current

source rates

are

not available

at a link;

moreover

a link

may

not

know

the

value

of

+rsl 

and

hence

may

not

be

able

to use

past

source

rates in price adjustment (except the

most

recent one).

On

the

other hand

a link

always

has

the

most current price,

so

we

consider

a controller that

uses only past

values

of local

prices.

B.

Deadbeat

controller

Suppose

in

computing a

new

price

each link

1

averages

its

past

link prices

and

diagonally scale

the gradient:

0-7803-5880-5/00/$10.00

(c)

2000

IEEE

1428

IEEE INFOCOM

2000

Note

that

link

1

uses only the

most

recent source rates

and

this

does

not

require

it to know

the

value

of

r81.

The

goal

is

to

compute the averaging parameters

pi

(T)

and the scaling

parameter

71

in order to place

the

poles of

the

discrete time

system

(12)

at the origin,

i.e.,

deadbeat control

law.

In

equi- 

librium both sides

of

(12)

must

be

p*

and

hence

p~

(T)

must

satisfy (since

CsEs(r)

2:

=

cr

in equilibrium)

-

d

Cpr(T)

=

1,

fordl

z

(13)

T=o

Following

similar derivation as in the last subsection,

the

linearized system around

p*

is

(

14)

where

M(T)

=

diag(pr(T)), 

T

=

0

,...,a,

and

G

=

diag(y1)

are

L

x

L

diagonal matrices.

The

following result

explains

how

to

choose the averaging matrices

M(T)

and

the scaling matrix

G.

For simplicity

of

exposition

we

as-

sume

a

large network where

all

delays

are

nonzero,

qa

2

1,

T,~

2

1

(otherwise,

condition

(15)

below 

is

replaced

by

det(X1-

M(0)

-t-

GA(0))

=

XL).

Theorem

1:

Suppose all delays

are

nonzero. Then

all

poles

of

(14)

are at the origin

if and only

if

M(T)

and

G

are chosen

to

satisfy

M(0)

=

0

(15)

-

det(M(.r)

-

GA(7))

=

0,

T

=

0,.

.

,

d

(16)

-

d

CM(7)

=

I

(17)

T=o

Moreover

the above conditions uniquely determine the

di-

m

Interestingly the theorem says that

if all delays

are

nonzero

then

current prices

m(t)

should not

be

used

in the

weighted

average

of past

prices.

Note

that

the

above theorem ensures rapid convergence

only around the unique equilibrium point.

p*.

In

order

to

choose

the averaging parameters

pi

(7)

to place all

poles

at

the origin,

we

have

given up

the

choice

of

stepsize

7

(which

now

becomes

a matrix).

This

loss

of

freedom

may

upset

global

stability.

This

is indeed observed

in

our

simulation,

presented

in

the

next section.

agonal matrices

M(T)

and

G.

C.

Appmxinlate

deadbeat

controller

Theorem

1

describes

how

to

choose the

averaging matri-

ces

M(T)

and the gain matrix

G

to enforce

rapid conver-

gence

towards

the equilibrium

p*

when

the

system

is

in a

neighborhood

of

p'.

However

the

equations

(15-16)

in the

theorem can only

be

solved centrally,

as the off-diagonal el-

ements

of the matrices

A(T)

imply that

a link

needs

to know

information on sources

at other links, and hence

is

imprac-

tical.

In

this subsection

we

derive

an

approximation

to the

control

law

of

the

last section that can

be

implemented

by

individual links

locally.

The

idea is

to approximate the

ma-

trices

A(T)

by

their diagonal

terms.

Let

B(7)

=

diag(P'(T),1 

E

L)

be

the

L

x

L

diagonal

matrices whose diagonal elements

are

In

words

the

Ith

diagonal elements

P~(T)

of

B(7)

are the

sums

of

(f),

sum

over all sources

s

that

traverse link

I

and have

a

round trip delay

of

7.

With

this

approximation

there

is

enough degree

of

freedom, 

provided

2

2,

that

we

can reduce the

gain 

matrix

G

to

an

arbitrary

scalar

-y

>

0.

This means that

we

can

choose

7

to

be

sufficiently small

to

ensure global convergence; contrast this

with

the

exact

deadbeat control law

of

the

last subsection.

We

will

see

in

the

next section that

the

performance

of

this controller

can

be

better

than 

that

of

the

exact deadbeat controller

whose

stepsize

y

is

fixed 

by

the

condition

(13).

The

conditions corresponding

to

(15-17)

in

Theorem

I

reduce to the following choice

of

weights

pr

(7):

and

for all other

I

and

T,

p1(~)

are

chosen to

satisfy

d

r=O

This control law

is much simpler than

that expressed

in The-

orem

1:

a link

I

can choose

its

own weights

pl(~)

based

on

the information about sources that traverse link

1;

we

will

come

back

to this point

in

Section

V.

With

this approximate deadbeat controller the

corre-

0-7803-5880-5/00/$10.00

(c)

2000

IEEE

142

9

IEEE

INFOCOM

2000