arXiv:2002.08908v1 [cs.PF] 20 Feb 2020
Asymptotically Optimal Load Balancing in Large-scale Heterogeneous Systems with Multiple Dispatchers
Xingyu Zhou
Department of ECE
The Ohio State University
zhou.2055@osu.edu
Ness Shroff
Department of ECE and CSE
The Ohio State University
shroff.11@osu.edu
Adam Wierman
Department of Computing and Mathematical Sciences
California Institute of Technology
adamw@caltech.edu
Abstract
We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy traffic, respectively. These conditions directly imply delay optimality for many previous local-memory based policies in heavy traffic. Moreover, the results enable us to design new delay optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework directly resolves a recent open problem on how to design optimal load balancing schemes using delayed information.
1 Introduction
Load balancing, which is responsible for dispatching jobs to parallel servers, has attracted significant interest in recent years. This interest is motivated by the challenges of efficiently dispatching jobs in large-scale data centers and cloud applications, which are rapidly increasing in size. A good load balancing policy not only ensures high throughput by maximizing server utilization, but also improves the user experience by minimizing delay.
There have been numerous load balancing policies proposed in the literature. The most straightforward one is Join-Shortest-Queue (JSQ), which has been shown to enjoy optimal delay in both non-asymptotic (for homogeneous servers) and asymptotic regimes [22, 5, 4]. However, it is difficult to implement in today's large-scale data centers due to the large message overhead between the dispatcher and servers. As a result, alternative load balancing policies with low message overhead have been proposed. For example, the Power-of-$d$ policy [12] has been shown to achieve optimal average delay in heavy traffic with only $2d$ messages per arrival [10]. Another common load balancing policy is the pull-based Join-Idle-Queue (JIQ) [9, 16], which has been shown to outperform the Power-of-$d$ policy using less overhead. However, both Power-of-$d$ and JIQ mainly achieve good performance for systems with homogeneous servers. Recently, some works have considered heterogeneous servers and proposed flexible, low-message-overhead policies that achieve optimal delay in heavy traffic [29, 27]. However, only a single dispatcher is considered in these works. Theoretical analysis of load balancing with multiple dispatchers has so far mainly focused on the JIQ policy [13, 17], which has poor performance in heavy traffic and is even generally unstable for heterogeneous systems [29].

Note that heterogeneous systems with multiple dispatchers are now almost the default scenario in today's cloud infrastructures. On one hand, the heterogeneity comes from the use of multiple generations of CPUs and various types of devices [6]. On the other hand, given the massive amount of data, a scalable cloud infrastructure needs multiple dispatchers to increase both throughput and robustness [15].
Motivated by this, a recent work [1] proposes a new framework named Loosely-Shortest-Queue (LSQ) for designing load balancing policies for heterogeneous systems with multiple dispatchers. In particular, under this framework, each dispatcher keeps its own local, and possibly outdated, view of each server's queue length. Upon arrival, each dispatcher routes to the server with the shortest local view. A small amount of message overhead is used to update the local views. The authors establish sufficient conditions on the update scheme for the system to be stable. Moreover, extensive simulations show that LSQ policies significantly outperform well-known low-communication policies while using similar communication overhead, in both heterogeneous and homogeneous cases. However, no theoretical guarantees on the delay performance are provided. It is worth noting that the key challenge in establishing a delay performance guarantee for this framework is that it uses only possibly outdated local information to dispatch jobs. In fact, the problem of designing delay optimal load balancing schemes that only have access to delayed information was recently listed as an open problem in [8].
Inspired by this, in this paper, we are particularly interested in the following questions: Is it possible to establish delay performance guarantees for load balancing in heterogeneous systems with multiple dispatchers? If so, can these guarantees be achieved using only delayed information?
Contributions. To answer the questions above, we propose a general load balancing framework for heterogeneous systems with multiple dispatchers that uses only delayed (out-of-date) information about the system state. We call this framework Local-Estimation-Driven (LED); it generalizes the LSQ framework. Our main results provide sufficient conditions for LED policies to be both throughput optimal and delay optimal in heavy traffic. Our key contributions can be summarized as follows.
First, we introduce the LED framework for designing load balancing policies for heterogeneous systems with multiple dispatchers. In this framework, each dispatcher keeps its own local estimates of the queue lengths of all the servers, and makes its dispatching decisions purely based on these local estimates according to a certain dispatching strategy. The local estimates are updated infrequently via an update strategy that is based on communications between dispatchers and servers.
Second, we derive sufficient conditions for LED policies to be throughput optimal and delay optimal in heavy traffic. The importance of these sufficient conditions is three-fold: (i) It can be shown that previous local-memory based policies (e.g., LSQ) satisfy our sufficient conditions. As a result, we are able to show that they are not only throughput optimal (in a stronger sense) but also delay optimal in heavy traffic. (ii) The conditions allow us to design new delay optimal load balancing policies with zero dispatching delay and low message overhead that work for heterogeneous servers and multiple dispatchers. (iii) The conditions also provide a systematic approach for generalizing previous optimal policies to the case of multiple dispatchers and for exploring the trade-off between memory (i.e., local estimates) and message overhead.
Third, the LED framework resolves the open problem posed in [8], which asks how to design heavy-traffic delay optimal policies that only use delayed information. Our main results for LED policies not only demonstrate that it is possible to achieve optimal delay in heavy traffic via only delayed information, but also highlight conditions on the extent to which old information is useful. Moreover, they provide methods for using the delayed information to achieve optimality in heavy traffic. Interestingly, the LED framework also shows that, in the case of multiple dispatchers, inaccurate information can actually lead to improved performance.
To establish the main results, we need to address the following two key challenges. First, each dispatcher in our model has access only to delayed, outdated system information. Moreover, each dispatcher does not know the arrivals to servers from other dispatchers, since there is no communication between them. As a result, for throughput optimality, we have to carefully design our Lyapunov function, since the local estimates can also be unbounded. For delay optimality, we consider two queueing systems: a local-estimation system and the actual system. We then have to transfer the negative drift on the local-estimation system to the actual system, which requires establishing new bounds and analyzing sample paths.
Related work. The study of efficient load balancing algorithms has been a hot topic for a long time and spans different asymptotic regimes. The most extensively investigated policy might be Join-Shortest-Queue (JSQ), under which incoming jobs are always sent to the server with the shortest queue length. JSQ has been shown to be optimal in a stochastic order sense in the non-asymptotic regime for arbitrary arrival and non-decreasing failure rate identical service processes [22, 23]. In the heavy-traffic asymptotic regime, in which the normalized load approaches one and the number of servers is fixed, JSQ has been proved to achieve optimal delay even for heterogeneous servers, using both diffusion approximations in [5] and the recently proposed drift-based method in [4]. However, the optimality of JSQ comes at the cost of a large amount of communication between dispatchers and servers, which is particularly undesirable for large-scale data centers. Thus, some popular low-message-overhead alternative policies have been proposed, e.g., Power-of-$d$ and Join-Idle-Queue (JIQ). Under
Power-of-$d$, the dispatcher only needs to sample $d \ge 2$ servers and sends arrivals to the server with the shortest queue length among the $d$ samples. This simple policy has been shown to enjoy a doubly exponential decay rate in response time in the large-system asymptotic regime [12] and to achieve optimal delay in heavy traffic for homogeneous servers [3, 10]. Another low-message-overhead policy is JIQ (or the pull-based policy) [9, 16], under which arrivals are sent to one of the idle servers, if there are any, and to a randomly selected server otherwise. Compared to JSQ and Power-of-$d$, JIQ has the nice property of zero dispatching delay, since each arrival can be instantaneously routed rather than waiting for feedback from servers. Moreover, JIQ has been shown to outperform Power-of-$d$ with even smaller message overhead (at most one message per job). In particular, under JIQ, arriving jobs achieve asymptotically zero waiting time in the large-system regime, while under Power-of-$d$ they do not. An even stronger result suggests that, in the Halfin-Whitt asymptotic regime, JIQ achieves the same delay performance as JSQ [14]. Nevertheless, the performance of JIQ drops substantially in heavy traffic with a finite number of servers, even for homogeneous servers. In fact, it is not heavy-traffic delay optimal in this case [29]. Motivated by this, recent works have proposed alternative pull-based policies that not only enjoy all the nice features of JIQ but also achieve optimal delay in heavy traffic [29, 27]. However, these studies only consider the case of a single dispatcher.
Compared to the large literature on the single-dispatcher case, there are only a few works on the scenario of multiple dispatchers, and they mainly focus on the JIQ policy. In particular, [13] presents a new large-system asymptotic analysis of JIQ without the simplifying assumptions in [9]. The property of asymptotically zero waiting time under JIQ was generalized to the case of multiple dispatchers in [17]. However, the results for JIQ in [9, 13, 17] all assume that the loads at the various dispatchers are strictly equal. Without this assumption, [19] shows that the waiting time under JIQ no longer vanishes in the large-system regime, and two enhanced JIQ schemes are proposed. As mentioned earlier, although JIQ is a scalable choice for the multiple-dispatcher case, it is not delay optimal in heavy traffic for homogeneous servers and is not even generally stable for heterogeneous systems.
The case of heterogeneous systems with multiple dispatchers has received very little attention from the theoretical community so far. To the best of our knowledge, the framework proposed in [1] is the first attempt to study efficient load balancing schemes with a theoretical guarantee for the scenario of heterogeneous systems with multiple dispatchers. In particular, under the proposed Loosely-Shortest-Queue (LSQ) framework, each dispatcher independently keeps its own local view of server queue lengths and routes jobs to the shortest among them. Communication is used only to update the local views and make sure that they are not too far from the real queue lengths. The main contributions of [1] are sufficient conditions for any LSQ policy to achieve strong stability with low message overhead. Additionally, extensive simulations have been used to demonstrate its appeal. Nevertheless, a theoretical guarantee on the delay performance of LSQ policies remains an important unsolved question.
It is worth pointing out that the idea of using local memory to hold possibly old information for load balancing was also explored in two recent works [2, 20]. As we discuss later, these two proposed policies fall within our LED framework. Both works only consider a single dispatcher and homogeneous servers, which is also a special case of our model. Further, their analysis focuses on the large-system asymptotic regime where the number of servers goes to infinity, while our analysis deals with a finite number of servers.
2 System Model and Preliminaries
This section describes the system model and assumptions considered in this paper. Then, several necessary preliminaries are presented.
2.1 System model
We consider a discrete-time (i.e., time-slotted) load balancing system consisting of $M$ dispatchers and $N$ possibly-heterogeneous servers. Each server maintains an infinite-capacity FIFO queue. At each dispatcher, there is a local memory, through which the dispatcher can have some (possibly delayed) information about the system state. In each time-slot, each dispatcher routes its new incoming tasks to one of the servers, immediately upon arrival. Once a task joins a queue, it remains in that queue until its service is completed. Each server is assumed to be work conserving, i.e., a server is idle if and only if its corresponding queue is empty.
2.1.1 Arrivals
Let $A_m(t)$ denote the number of exogenous tasks that arrive at dispatcher $m$ at the beginning of time-slot $t$. We assume that $A_\Sigma(t) = \sum_{m=1}^{M} A_m(t)$ is an integer-valued random variable which is i.i.d. across time-slots. The mean and variance of $A_\Sigma(t)$ are denoted by $\lambda_\Sigma$ and $\sigma_\Sigma^2$, respectively. We further assume that there is a positive probability that $A_\Sigma(t)$ is zero. The allocation of the total arriving tasks among the $M$ dispatchers is allowed to follow any arbitrary policy that is independent of the system state. Note that, in contrast to previous works on multiple dispatchers [9, 13, 17], we do not require that the loads at all dispatchers be equal. We assume that there is a strictly positive probability for tasks to arrive at each dispatcher in any time-slot $t$. That is, there exists a strictly positive constant $p_0$ such that

$$\mathbb{P}(A_m(t) > 0) \ge p_0, \quad \forall (m, t) \in \mathcal{M} \times \mathbb{N}, \qquad (1)$$

where $\mathcal{M} = \{1, 2, \ldots, M\}$. Moreover, we assume that $A_m(t)$ is i.i.d. across time-slots with mean arrival rate denoted by $\lambda_m$. We further let $A_n^m(t)$ denote the number of new arrivals at server $n$ from dispatcher $m$ at the beginning of time-slot $t$. Let $A_n(t) = \sum_{m=1}^{M} A_n^m(t)$ be the total number of arriving tasks at server $n$ at the beginning of time-slot $t$.
2.1.2 Service
Let $S_n(t)$ denote the amount of service that server $n$ offers for queue $n$ in time-slot $t$. That is, $S_n(t)$ is the maximum number of tasks that can be completed by server $n$ in time-slot $t$. We assume that $S_n(t)$ is an integer-valued random variable which is i.i.d. across time-slots. We also assume that $S_n(t)$ is independent across servers as well as of the arrival process. The mean and variance of $S_n(t)$ are denoted by $\mu_n$ and $\nu_n^2$, respectively. Let $\mu_\Sigma \triangleq \sum_{n=1}^{N} \mu_n$ and $\nu_\Sigma^2 \triangleq \sum_{n=1}^{N} \nu_n^2$ denote the mean and variance of the hypothetical total service process $S_\Sigma(t) \triangleq \sum_{n=1}^{N} S_n(t)$. Let $\epsilon = \mu_\Sigma - \lambda_\Sigma$ characterize the distance between the arrival rate and the boundary of the capacity region.
2.1.3 Queue Dynamics
Let $Q_n(t)$ be the queue length of server $n$ at the beginning of time-slot $t$. Let $A_n(t)$ denote the number of tasks routed to queue $n$ at the beginning of time-slot $t$ according to the dispatching decision. Then the evolution of the length of queue $n$ is given by

$$Q_n(t+1) = Q_n(t) + A_n(t) - S_n(t) + U_n(t), \quad n = 1, 2, \ldots, N, \qquad (2)$$

where $U_n(t) = \max\{S_n(t) - Q_n(t) - A_n(t), 0\}$ is the unused service due to an empty queue.
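The dynamics in Eq. (2) are equivalent to the familiar truncated update $Q_n(t+1) = \max\{Q_n(t) + A_n(t) - S_n(t), 0\}$; the short sketch below (our own, with an illustrative function name) makes this concrete:

```python
def step(q, a, s):
    """One slot of Eq. (2): Q_n(t+1) = Q_n(t) + A_n(t) - S_n(t) + U_n(t),
    where U_n(t) = max{S_n(t) - Q_n(t) - A_n(t), 0} is the unused service."""
    u = max(s - q - a, 0)        # service wasted on an empty queue
    return q + a - s + u         # equivalently: max(q + a - s, 0)

# The unused-service term keeps queues non-negative:
assert step(q=0, a=1, s=3) == 0  # U_n(t) = 2, the queue stays empty
assert step(q=5, a=2, s=3) == 4  # U_n(t) = 0, the usual lindley-type update
```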
We do not assume any specific distribution for the arrival and service processes. Moreover, in contrast to previous works [29, 4], we do not require that the arrival and service processes have finite support. Instead, we only need their distributions to be light-tailed. More specifically, we assume that

$$\mathbb{E}\left[e^{\theta_1 A_\Sigma(t)}\right] \le D_1 \quad \text{and} \quad \mathbb{E}\left[e^{\theta_2 S_n(t)}\right] \le D_2, \qquad (3)$$

for each $n$, where the constants $\theta_1 > 0$, $\theta_2 > 0$, $D_1 < \infty$ and $D_2 < \infty$ are all independent of $\epsilon$.
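Condition (3) holds for common light-tailed distributions; for instance, a Binomial arrival process satisfies it with an explicit constant, since its moment generating function is finite for every $\theta$. The snippet below is our own sanity check with illustrative parameters, comparing the closed-form MGF of a Binomial(n, p) variable against a Monte Carlo estimate:

```python
import math
import random

random.seed(0)
n, p, theta = 8, 0.25, 0.5          # illustrative Binomial(n, p) arrival process
# Closed-form MGF of Binomial(n, p): E[exp(theta * A)] = (1 - p + p*e^theta)^n
mgf = (1 - p + p * math.exp(theta)) ** n
# Monte Carlo estimate of the same expectation:
samples = 200_000
est = sum(math.exp(theta * sum(random.random() < p for _ in range(n)))
          for _ in range(samples)) / samples
# Both are finite, so condition (3) holds here with D1 = mgf:
assert math.isfinite(mgf) and abs(est - mgf) / mgf < 0.02
```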
2.2 Local-Estimation-Driven (LED) framework
We are interested in the case where the local memory at each dispatcher $m$ stores an estimate of the queue length of each server $n$. In particular, we let $\tilde{Q}_n^m(t)$ be the local estimate of the queue length of server $n$ at dispatcher $m$ at the beginning of time-slot $t$ (before any arrivals and departures). More specifically, we introduce the following framework for load balancing.
Definition 1. A Local-Estimation-Driven (LED) policy is composed of the following components:

(a) Dispatching strategy: At the beginning of each time-slot, each dispatcher $m$ chooses one of the servers for its new arrivals purely based on its local estimates (i.e., the local queue length estimates $\tilde{Q}^m$).

(b) Update strategy: At the end of each time-slot, each dispatcher may update its local estimates, e.g., synchronize a local queue length estimate with the true queue length.
The definition of LED is broad, and it includes a variety of classical load balancing policies. For example, it includes the LSQ policy studied in [1], obtained by choosing the dispatching strategy under which new arrivals at each dispatcher are dispatched to the queue with the shortest local estimate. Moreover, it also includes two recent local-memory based policies in [2, 20] that were developed for the case of a single dispatcher and homogeneous servers.
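To make Definition 1 concrete, here is a minimal sketch of one LED dispatcher (entirely our own illustration; the class and method names are not from the paper). The dispatching strategy shown is the shortest-local-estimate rule used by LSQ, and the update strategy synchronizes each estimate with the true queue length independently with a fixed probability:

```python
import random

class LEDDispatcher:
    """Minimal sketch of Definition 1: dispatch purely on local estimates,
    then (possibly) synchronize them with the true queue lengths."""

    def __init__(self, n_servers, update_prob):
        self.est = [0] * n_servers      # local (possibly outdated) estimates
        self.update_prob = update_prob

    def dispatch(self, batch_size):
        """Dispatching strategy: send the batch to the server with the
        shortest local estimate (the LSQ rule, one delta-tilted choice)."""
        n = min(range(len(self.est)), key=lambda i: self.est[i])
        self.est[n] += batch_size       # track own arrivals (a common choice,
                                        # not required by Definition 1)
        return n

    def update(self, true_queues):
        """Update strategy: synchronize each estimate with the true queue
        length independently with probability update_prob."""
        for n, q in enumerate(true_queues):
            if random.random() < self.update_prob:
                self.est[n] = q
```

Several such dispatchers can serve the same queues without ever communicating with one another; each acts only on its own `est` vector.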
To study LED, we model the system as a discrete-time Markov chain $\{Z(t) = (Q(t), m(t)), t \ge 0\}$ with state space $\mathcal{Z}$, using the queue length vector $Q(t)$ together with the memory state $m(t) \triangleq (\tilde{Q}^1(t), \tilde{Q}^2(t), \ldots, \tilde{Q}^M(t))$.
We consider a set of load balancing systems $\{Z^{(\epsilon)}(t), t \ge 0\}$ parameterized by $\epsilon$, such that the mean arrival rate of the total exogenous arrival process $\{A_\Sigma^{(\epsilon)}(t), t \ge 0\}$ is $\lambda_\Sigma^{(\epsilon)} = \mu_\Sigma - \epsilon$. Note that the parameter $\epsilon$ characterizes the distance between the arrival rate and the boundary of the capacity region. We are interested in the throughput performance and the steady-state delay performance in the heavy-traffic regime under any LED policy.
A load balancing system is stable if the Markov chain $\{Z(t), t \ge 0\}$ is positive recurrent, and $Z = \{Q, m\}$ denotes the random vector whose distribution is the same as the steady-state distribution of $\{Z(t), t \ge 0\}$. We have the following definition.
Definition 2 (Throughput Optimality). A load balancing policy is said to be throughput optimal if, for any arrival rate within the capacity region, i.e., for any $\epsilon > 0$, the system is positive recurrent and all the moments of $\|Q^{(\epsilon)}\|$ are finite.

Note that this is a stronger definition of throughput optimality than that in [1, 21, 25] because, besides positive recurrence, it also requires all moments to be finite in steady state for any arrival rate within the capacity region.
To characterize the steady-state average delay performance in the heavy-traffic regime as $\epsilon$ approaches zero, by Little's law, it suffices to focus on the sum of all the queue lengths. First, recall the following fundamental lower bound on the expected sum of queue lengths in a load balancing system under any throughput optimal policy [4]. Note that this result was originally proved under the assumption of finite support on the service process (Lemma 5 in [4]); it can be generalized to service processes with light-tailed distributions via a careful analysis of the unused service, see our proof of Lemma 6.
Lemma 1. Given any throughput optimal policy and assuming that $(\sigma_\Sigma^{(\epsilon)})^2$ converges to a constant $\sigma_\Sigma^2$ as $\epsilon$ decreases to zero, then

$$\liminf_{\epsilon \downarrow 0} \; \epsilon \, \mathbb{E}\left[\sum_{n=1}^{N} Q_n^{(\epsilon)}\right] \ge \frac{\zeta}{2}, \qquad (4)$$

where $\zeta \triangleq \sigma_\Sigma^2 + \nu_\Sigma^2$.
The right-hand side of Eq. (4) is the heavy-traffic limit of a hypothetical single-server system with arrival process $A_\Sigma^{(\epsilon)}(t)$ and service process $\sum_{n=1}^{N} S_n(t)$ for all $t \ge 0$. This hypothetical single-server queueing system is often called the resource-pooled system. Since a task cannot be moved from one queue to another in the load balancing system, it is easy to see that the expected sum of queue lengths in the load balancing system is larger than the expected queue length in the resource-pooled system. However, if a policy achieves the lower bound in Eq. (4) in the heavy-traffic limit, then by Little's law this policy achieves the minimum average delay of the system in steady state, and is thus said to be heavy-traffic delay optimal, see [4, 10, 21, 24, 25, 29].
Definition 3 (Heavy-traffic Delay Optimality in Steady-state). A load balancing scheme is said to be heavy-traffic delay optimal in steady state if the steady-state queue length vector $Q^{(\epsilon)}$ satisfies

$$\limsup_{\epsilon \downarrow 0} \; \epsilon \, \mathbb{E}\left[\sum_{n=1}^{N} Q_n^{(\epsilon)}\right] \le \frac{\zeta}{2},$$

where $\zeta$ is defined in Lemma 1.
2.3 Dispatching Preference
In order to provide a unified way to specify the dispatching strategy in LED, we first introduce a concept called dispatching preference. In particular, let $P_n^m(t)$ be the probability that new arrivals at dispatcher $m$ are dispatched to server $n$ in time-slot $t$. We define $\beta_n^m(t) \triangleq P_n^m(t) - \mu_n/\mu_\Sigma$, which is the difference between the probabilities that server $n$ is chosen under the particular dispatching strategy and under random routing (weighted by service rate). Then, we have the following definition.
Definition 4 (Dispatching preference). Fix a dispatcher $m$, and let $\sigma_t(\cdot)$ be a permutation of $(1, 2, \ldots, N)$ that satisfies

$$\tilde{Q}_{\sigma_t(1)}^m(t) \le \tilde{Q}_{\sigma_t(2)}^m(t) \le \ldots \le \tilde{Q}_{\sigma_t(N)}^m(t).$$

The dispatching preference at dispatcher $m$ is an $N$-dimensional vector denoted by $\Delta^m(t)$, the $n$th component of which is given by $\Delta_n^m(t) \triangleq \beta_{\sigma_t(n)}^m(t)$.
In words, the dispatching preference at a dispatcher $m$ specifies, in a unified way, how servers with different local estimates are preferred, such that it is independent of the actual values of the local estimates; it depends only on their relative order. More specifically, fix a dispatcher $m$. By definition, the weighted random routing strategy has no preference for any server, and $\Delta_n^m(t) = 0$ for every $n$. On the other hand, if new arrivals are always dispatched to the server with the shortest local estimate (e.g., the LSQ policy), we have $\Delta_1^m(t) > 0$ and $\Delta_n^m(t) < 0$ for all $2 \le n \le N$. Thus, a positive value of $\Delta_n^m(t)$ means that the dispatching strategy has a preference for the server with the $n$th shortest local estimate. This observation directly motivates the following two definitions.
Definition 5 (Tilted dispatching strategy). A dispatching strategy adopted at dispatcher $m$ is said to be tilted if there exists a $k \in \{2, 3, \ldots, N\}$ such that for all $t$, $\Delta_n^m(t) \ge 0$ for all $n \le k$ and $\Delta_n^m(t) \le 0$ for all $n \ge k$.
Definition 6 ($\delta$-tilted dispatching strategy). A dispatching strategy adopted at dispatcher $m$ is said to be $\delta$-tilted if for all $t$ (i) it is a tilted dispatching strategy, and (ii) there exists a positive constant $\delta$ such that $\Delta_1^m(t) \ge \delta$ and $\Delta_N^m(t) \le -\delta$.
Remark 1. Note that similar definitions were first provided in [29] for the case of a single dispatcher with up-to-date information. Based on these definitions, sufficient conditions were presented for throughput and heavy-traffic optimality. However, these conditions cannot be directly applied to our model due to the following two major challenges. One is that, in our model, each dispatcher only has access to outdated information. The other is that each dispatcher has no knowledge of the arrivals at the servers coming from other dispatchers, since there is no communication between them. To handle these challenges, we have to develop new techniques.
We end this section by providing intuition behind the two definitions. To start, it can easily be seen that $\sum_{n=1}^{N} \Delta_n^m(t) = 0$ for all $m$ and $t$, via the definition of dispatching preference. Roughly speaking, a tilted dispatching strategy means that, compared to (weighted) random routing (which has no preference), the probabilities of choosing servers with shorter local estimates (the first $k$ shortest ones) are increased and, as a result, the probabilities of choosing servers with longer local estimates are reduced. This is why we call it tilted: more preference is given to queues with shorter local estimates. Therefore, a tilted dispatching strategy can be viewed as a strategy that is at least as 'good' as (weighted) random routing. On the other hand, a $\delta$-tilted dispatching strategy can be viewed as a strategy that is strictly better than (weighted) random routing. The reason is that, besides being tilted, it also requires a strictly positive preference for the server with the shortest local estimate.
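The tilted and $\delta$-tilted conditions are easy to check numerically for a given dispatching strategy. The helper below is our own illustrative sketch (the names are not from the paper): it builds $\Delta^m(t)$ from the routing probabilities, the service rates, and the local estimates, and verifies the $\delta$-tilted shape for shortest-local-estimate routing:

```python
def dispatching_preference(probs, mus, est):
    """Delta^m(t): beta_n = P_n - mu_n/mu_Sigma, re-indexed so that
    component n corresponds to the n-th shortest local estimate."""
    mu_sigma = sum(mus)
    order = sorted(range(len(est)), key=lambda n: est[n])
    return [probs[n] - mus[n] / mu_sigma for n in order]

# Shortest-local-estimate routing on a heterogeneous example:
mus, est = [1.0, 2.0, 1.0], [4, 1, 6]   # server 1 holds the shortest estimate
probs = [0.0, 1.0, 0.0]                 # all mass on the argmin estimate
delta = dispatching_preference(probs, mus, est)

assert abs(sum(delta)) < 1e-12          # the components always sum to zero
assert delta[0] > 0 and delta[-1] < 0   # delta-tilted: Delta_1 > 0, Delta_N < 0

# Weighted random routing has no preference at all:
rnd = dispatching_preference([m / sum(mus) for m in mus], mus, est)
assert all(abs(d) < 1e-12 for d in rnd)
```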
3 Main Results
In this section, we first present the sufficient conditions for LED policies to be throughput optimal and heavy-traffic delay optimal. Then, we explore several example policies within the LED framework to demonstrate its flexibility in designing new load balancing schemes.
3.1 Sufficient Conditions
Let us begin with the sufficient conditions for LED policies to be throughput optimal. In particular, we specify conditions on the dispatching strategy and update strategy that guarantee throughput optimality.

To state the theorem, we need the following notation. Let $I_n^m(t)$ be an indicator function which equals 1 if and only if the local estimate of server $n$'s queue length at dispatcher $m$ gets updated, i.e., the estimated queue length $\tilde{Q}_n^m(t)$ is set to the actual queue length $Q_n(t)$ at the end of time-slot $t$.
Theorem 1. Consider an LED policy. Suppose the dispatching strategy at each dispatcher is tilted and the update strategy guarantees that there exists a positive constant $p$ such that

$$\mathbb{E}\left[I_n^m(t) \mid Z(t) = Z\right] \ge p \qquad (5)$$

holds for all $Z$ and $(m, n, t) \in \mathcal{M} \times \mathcal{N} \times \mathbb{N}$. Then, this policy is throughput optimal, i.e., the system under this policy is positive recurrent with all moments bounded for any $\epsilon > 0$.
Proof. See Section 5.1.
Note that this theorem directly implies that LSQ is not only strongly stable but also ensures that the system has all moments bounded in steady state. Moreover, it suggests that any dispatching strategy that is at least as good as (weighted) random routing suffices to guarantee throughput optimality. Further, the update probability can be a function of the traffic load.

Now, we turn to the sufficient conditions for LED policies to be delay optimal in heavy traffic. In order to achieve delay optimality, we need stronger conditions on both the dispatching strategy and the update strategy.
Theorem 2. Consider an LED policy. Suppose the dispatching strategy at each dispatcher is $\delta$-tilted with a uniform lower bound $\delta > 0$ independent of $\epsilon$. Suppose the update strategy guarantees that there exists a positive constant $p$ (independent of $\epsilon$) such that

$$\mathbb{E}\left[I_n^m(t) \mid Z(t) = Z\right] \ge p \qquad (6)$$

holds for all $Z$ and $(m, n, t) \in \mathcal{M} \times \mathcal{N} \times \mathbb{N}$. Then, this policy is heavy-traffic delay optimal.
Proof. See Section 5.2.
This theorem not only establishes a delay performance guarantee for many previous local-memory based policies (e.g., LSQ in [1] and the low-message policies in [2, 20]), but also provides the flexibility to design new delay optimal load balancing policies for different scenarios with heterogeneous servers and multiple dispatchers, as discussed in the next section. More importantly, our results directly show that it is possible to use only delayed information to achieve delay optimality, which resolves one of the open problems listed in [8].
High-level proof idea. We end this section by providing drift-based intuition behind the technical proofs. In particular, consider two queueing systems: a local-estimation system and the actual system (i.e., the queue lengths at the servers). Throughput optimality requires the actual system to have a drift towards the origin. First, by the definition of a tilted dispatching strategy, there is an equivalent drift towards the origin on the local-estimation system. Then, the condition on the update strategy guarantees that the local-estimation system is never too far from the actual system. Hence, the actual system also has a drift towards the origin. Heavy-traffic delay optimality requires not only a drift towards the origin, but also a drift towards the line along which all queue lengths are equal. First, by the definition of a $\delta$-tilted dispatching strategy, there is a drift towards the line along which all the local estimates within a given dispatcher are equal. Then, by the condition on the update strategy, the drift on the local-estimation system can be transferred to a drift on the actual system, which yields delay optimality. Note that, in the current proof, in order to make this 'drift-transfer' process valid, we impose the condition that both $\delta$ and $p$ are independent of $\epsilon$; this is not strictly necessary, and both of them could possibly be particular functions of $\epsilon$ as in [26]. This relaxation could be an interesting future research direction.
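To illustrate the drift intuition, the toy simulation below (entirely our own construction, with illustrative parameters) runs two non-communicating dispatchers over four heterogeneous servers; each dispatches to its shortest local estimate, and each estimate is synchronized with probability $p = 0.2$, as in the conditions of Theorems 1 and 2. The time-averaged total queue length remains bounded even though every dispatching decision uses delayed information:

```python
import random

def binom(k, pr):
    """Sample a Binomial(k, pr) variate (a simple light-tailed process)."""
    return sum(random.random() < pr for _ in range(k))

random.seed(0)
N, M, T, p = 4, 2, 20000, 0.2
mus = [0.5, 0.5, 1.0, 2.0]              # heterogeneous mean service rates
lam = 0.9 * sum(mus) / M                # per-dispatcher arrival rate, eps = 0.4

q = [0] * N                             # actual queue lengths
est = [[0] * N for _ in range(M)]       # one local-estimate vector per dispatcher
total = 0
for t in range(T):
    arrivals = [0] * N
    for m in range(M):                  # dispatch on local estimates only
        a = binom(8, lam / 8)
        n = min(range(N), key=lambda i: est[m][i])
        est[m][n] += a                  # dispatcher tracks its own arrivals
        arrivals[n] += a
    for n in range(N):                  # Eq. (2) with unused service
        q[n] = max(q[n] + arrivals[n] - binom(4, mus[n] / 4), 0)
    for m in range(M):                  # infrequent synchronization, Eq. (5)/(6)
        for n in range(N):
            if random.random() < p:
                est[m][n] = q[n]
    total += sum(q)

print("time-averaged total queue length:", round(total / T, 2))
```

With the total load at 90% of capacity, the time average settles to a modest constant rather than growing with the horizon $T$, consistent with the drift argument above.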
3.2 Examples
To illustrate the applications of Theorems 1 and 2, in this section we introduce examples of LED policies that are both throughput optimal and heavy-traffic delay optimal. The flexibility provided by our sufficient conditions not only allows us to include previous policies as special cases, but also enables us to design new flexible policies.

Let us first introduce some typical $\delta$-tilted dispatching strategies.
Example 1 (Local–Join-Shortest-Queue (L-JSQ)). At the beginning of each time-slot $t$, the dispatcher forwards its arrivals to the server with the shortest local estimate, with ties broken arbitrarily. That is, consider dispatcher $m$; the chosen server is $i^* \in \arg\min_n \{\tilde{Q}_n^m\}$.
This dispatching strategy is the same as that in the LSQ policy in [1]. By the definition of dispatching preference, we can see that under L-JSQ, $\Delta_1^m(t) = 1 - \mu_{\sigma_t(1)}/\mu_\Sigma > 0$ and $\Delta_n^m(t) = -\mu_{\sigma_t(n)}/\mu_\Sigma < 0$ for $n \ge 2$. Hence, it is $\delta$-tilted even for heterogeneous servers, with $\delta = \mu_{\min}/\mu_\Sigma$ where $\mu_{\min} = \min_n \mu_n$.
Instead of always joining the server with the shortest local estimate, it is also possible to join a server whose queue length is below a threshold while still satisfying the condition of a $\delta$-tilted dispatching preference.
Example 2 (Local–Join-Below-Average (L-JBA)). At the beginning of each time-slot $t$, the dispatcher forwards its arrivals to a randomly chosen server whose local estimate is at or below the average local queue length estimate. That is, consider dispatcher $m$ with the average local estimate $\bar{Q}^m(t) = \frac{1}{N}\sum_n \tilde{Q}_n^m(t)$, and let $\mathcal{A} \triangleq \{n : \tilde{Q}_n^m(t) \le \bar{Q}^m(t)\}$. Then, for each $i \in \mathcal{A}$, $P_i^m(t) = \mu_i / \sum_{n \in \mathcal{A}} \mu_n$, and for $i \notin \mathcal{A}$, $P_i^m(t) = 0$.
It can be easily shown from the definition that L-JBA is also $\delta$-tilted. Note that, compared to L-JSQ, in the heterogeneous case it requires the dispatcher to know the service rate of each server, which can be easily obtained via the update strategies introduced next. This strategy is more flexible than L-JSQ since it does not require new arrivals to be sent only to the server with the shortest local estimate, which could be useful in scenarios with data locality. Moreover, some randomness in the dispatching strategy is also useful, as discussed in the next section.
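A minimal sketch of the L-JBA rule, again ours rather than the paper's. Python's `random.choices` with `weights` implements the rate-proportional selection $P^m_i = \mu_i / \sum_{n \in \mathcal{A}} \mu_n$:

```python
import random

def ljba_choose(local_estimates, service_rates):
    """L-JBA: pick a server whose local estimate is at or below the
    average estimate, with probability proportional to its service rate."""
    avg = sum(local_estimates) / len(local_estimates)
    below = [n for n, q in enumerate(local_estimates) if q <= avg]
    weights = [service_rates[n] for n in below]
    return random.choices(below, weights=weights)[0]

# Average estimate is 3.0, so only servers 1 and 3 are eligible here.
server = ljba_choose([4, 2, 6, 0], [1.0, 2.0, 1.0, 2.0])
```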
Further, it is possible to generalize many previous heavy-traffic delay optimal policies into the LED framework. For example, we can directly apply the Power-of-$d$ policy as our dispatching strategy.
Example 3 (Local–Power-of-$d$ (L-Pod)). At the beginning of each time-slot $t$, the dispatcher randomly chooses $d \ge 2$ servers and sends arrivals to the server that has the shortest local estimate among the $d$ servers.
It can be easily shown that L-Pod is tilted for homogeneous servers. Moreover, for a given $m$, we have $\Delta^m_1(t) = \frac{d-1}{N}$ and $\Delta^m_N(t) = -\frac{1}{N}$, and hence it is $\delta$-tilted with $\delta = \frac{1}{N}$.
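L-Pod differs from the classical Power-of-$d$ only in that the comparison is over local estimates, not freshly sampled queue lengths. A minimal sketch under that reading (function name ours):

```python
import random

def lpod_choose(local_estimates, d=2):
    """L-Pod: sample d >= 2 distinct servers uniformly at random and
    send arrivals to the sampled server with the smallest local estimate.
    No messages are exchanged at dispatch time."""
    sampled = random.sample(range(len(local_estimates)), d)
    return min(sampled, key=lambda n: local_estimates[n])

server = lpod_choose([5, 1, 3, 8], d=2)
```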
Now, let us turn to update strategies that satisfy the condition in Theorem 2. In particular, the update strategy can either be push-based (the dispatcher samples servers) or pull-based (servers report to dispatchers).
Definition 7 (Push-Update). If there are new arrivals, then at the end of the time-slot dispatcher $m$ samples $d$ distinct servers with a positive probability $\hat{p}$. Then, it updates the corresponding $d$ local estimates with the true values.
It has been shown in [1] that even for $d = 1$, the push-update strategy is guaranteed to satisfy the condition in Theorem 2.
Definition 8 (Pull-Update). At the end of each time-slot, for each server $n$, if there are completed tasks, then the server uniformly at random picks a dispatcher $m$ and then abides by one of the following two rules:

• If the server becomes idle (i.e., has no tasks), it sends $(n, 0)$ to dispatcher $m$.

• If not, it sends $(n, Q_n)$ to dispatcher $m$ with probability $\hat{p}$.
It has been shown in [1] that for any $\hat{p} > 0$, the pull-update strategy is guaranteed to satisfy the condition in Theorem 2.
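The server-side pull-update logic of Definition 8 can be sketched as follows. The function name and the tuple message format are our illustrative choices, not the paper's:

```python
import random

def pull_update_messages(server_id, queue_len, completed, num_dispatchers, p_hat):
    """One server's end-of-slot pull update (Definition 8).

    Returns a list of (dispatcher, (server_id, reported_len)) messages:
    - no message unless some tasks completed this slot;
    - a server that just became idle always reports (server_id, 0);
    - otherwise it reports its true queue length with probability p_hat.
    """
    if completed == 0:
        return []
    m = random.randrange(num_dispatchers)  # uniformly chosen dispatcher
    if queue_len == 0:
        return [(m, (server_id, 0))]
    if random.random() < p_hat:
        return [(m, (server_id, queue_len))]
    return []
```

Since at most one message is generated per slot per server with completed work, the worst-case overhead of 1 message per arrival follows directly.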
Now, having introduced both the dispatching strategy and the update strategy, we can combine them to obtain different LED policies that are delay optimal in heavy traffic. For example, we have L-JSQ-Push, L-JSQ-Pull, L-JBA-Push, and L-JBA-Pull for heterogeneous servers, as well as L-Pod-Push and L-Pod-Pull for homogeneous servers.
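To make the combination concrete, here is one possible shape of a full LED dispatcher pairing L-JSQ dispatching with a push update. The class name, and the choice to also count the dispatcher's own arrivals into its estimate, are our assumptions for illustration only:

```python
import random

class LedDispatcher:
    """Minimal sketch of one LED dispatcher: dispatching uses only
    local estimates; a push update refreshes d sampled entries with
    true queue lengths at the end of the slot."""

    def __init__(self, num_servers, d=1, p_hat=1.0):
        self.est = [0] * num_servers  # local estimates Q~^m
        self.d = d
        self.p_hat = p_hat

    def dispatch(self, num_jobs):
        """L-JSQ on local estimates; zero dispatching delay."""
        n = min(range(len(self.est)), key=lambda i: self.est[i])
        self.est[n] += num_jobs  # bookkeeping of our own arrivals (our choice)
        return n

    def push_update(self, true_queue_lengths):
        """End-of-slot push update (Definition 7), w.p. p_hat."""
        if random.random() < self.p_hat:
            for n in random.sample(range(len(self.est)), self.d):
                self.est[n] = true_queue_lengths[n]
```

Swapping `dispatch` for an L-JBA or L-Pod rule, or `push_update` for a pull-based handler, yields the other members of the LED family.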
We end this section by summarizing the contributions of the LED framework. (i) It covers previous policies. L-JSQ-Push (with $\hat{p} = 1$) and L-JSQ-Pull are the same as the LSQ policies considered in [1], which include the policies developed in both [2] and [20] as special cases. Thus, by Theorems 1 and 2, all these policies are throughput and heavy-traffic delay optimal. (ii) It allows randomness in dispatching. The randomness introduced in L-JBA and L-Pod is helpful when dealing with scenarios with an extremely low budget on message overhead, as discussed next. (iii) It enables trade-offs between memory and message overhead. For example, L-Pod-Push and L-Pod-Pull are good examples that trade memory for low message overhead. That is, if each dispatcher directly uses the traditional Power-of-$d$ without any memory, then at least 4 messages per arrival are needed to guarantee delay optimality in heavy traffic. In contrast, in both L-Pod-Push and L-Pod-Pull, the worst-case message overhead is just 1 per arrival. In addition, the message overhead can be further reduced by choosing a smaller value of $\hat{p}$ in the update strategy.
4 Discussion

Before moving to the proofs, we would like to discuss key features of and insights about LED, and point out possible refinements of LED.
4.1 Key features of LED

In this section, we highlight the key features of the LED framework, including low message overhead, zero dispatching delay, low computational complexity, and appealing performance across various loads.
Low message overhead. It should be noted that communication overhead occurs only during the update phase in LED policies. For the push-update strategy, the number of messages per arrival is at most $2d$ ($d$ can even be one). For the pull-update strategy, the number of messages per arrival is at most 1. In contrast, JSQ needs $2N$ messages per arrival and Power-of-$d$ needs at least 4 messages per arrival. Although JIQ has a worst-case message overhead comparable to that of LED policies, it is not stable for heterogeneous servers.
Zero dispatching delay. Another key feature of all LED policies is that there is zero dispatching delay. That is, the dispatcher can immediately route its new arrivals to the chosen server since the decision is made purely based on its local estimates. Moreover, the communication between dispatchers and servers happens only after the decision is made. This is in contrast to typical push-based policies like JSQ and Power-of-$d$, under which the dispatcher has to wait for the responses of sampled servers to make its dispatching decision, resulting in a non-zero dispatching delay.
Low computational complexity. In order to implement LED policies, each dispatcher has to keep an array of size $N$ for its local estimates. Such a space requirement is negligible in a modern cluster. Further, the operations required by the dispatching strategies of LED policies are very efficient. For example, in order to find the server with the minimal local estimate in L-JSQ, we can keep the array in a min-heap data structure. For L-JBA, we can maintain the average by using an efficient running-average algorithm. The simple L-Pod only needs random number generators.
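The running-average maintenance mentioned for L-JBA amounts to an $O(1)$ correction of the mean whenever a single local estimate changes, instead of re-summing all $N$ entries. A sketch (function name ours):

```python
def update_running_average(avg, old_value, new_value, num_servers):
    """O(1) maintenance of the average local estimate used by L-JBA:
    when one entry changes from old_value to new_value, shift the mean
    by the per-server difference instead of re-summing all N entries."""
    return avg + (new_value - old_value) / num_servers

avg = sum([4, 2, 6, 0]) / 4  # 3.0
avg = update_running_average(avg, old_value=6, new_value=2, num_servers=4)
# avg now equals sum([4, 2, 2, 0]) / 4 = 2.0
```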
Appealing performance across loads. Although the theoretical delay optimality of the LED framework holds in the heavy-traffic asymptotic regime, the family of LED policies includes efficient policies that significantly outperform alternative low-message-overhead policies with the same (or even smaller) amount of communication. For example, if the dispatching strategy adopts L-JSQ in LED, then it reduces to the LSQ policy proposed in [1], which has been shown via extensive simulations to enjoy good performance over a wide range of traffic loads in different scenarios.
As mentioned earlier, the class of heavy-traffic delay optimal LED policies is broad and includes flexible choices of different dispatching and update strategies for different application scenarios. The actual delay performance (outside the heavy-load scenario) varies with the particular choice of dispatching or update strategy under different scenarios. Thus, it is not possible to pick one particular LED policy that fits every circumstance, and that is also not the focus of this paper. Instead, we present some useful insights about the LED framework in the following. These insights could serve as guidance for the choice or design of new LED policies.
4.2 Useful insights from LED

The main trait of the LED framework is that only local, possibly delayed and inaccurate, information is used for making the dispatching decision. In the following, we present two useful insights about the use of inaccurate delayed information for load balancing.
Inaccurate information can improve performance. A big problem for load balancing with multiple dispatchers is herd behavior, which means that arrivals at different dispatchers join the same server. This often leads to poor delay performance in practice [18]. For example, JSQ used in the case of multiple dispatchers leads to serious herd behavior since all the dispatchers will route arrivals to the single shortest queue. In contrast, under the LED framework, each dispatcher may believe that a different queue is the shortest according to its own local estimates because these estimates are inaccurate and delayed. Thus, jobs at different dispatchers are sent to different queues that may not have the actual shortest length but still have relatively small queue lengths. This intuition is illustrated by Fig. 1. In particular, we consider a setup with 10 dispatchers and 100 heterogeneous servers. All the LED policies are configured to have the same average message overhead as Power-of-2. It can be seen that the LED policies are not only stable but also achieve much better performance compared to JSQ, which suffers from herd behavior in the multiple-dispatcher case.
Randomness is useful for heavily-delayed information. As mentioned earlier, the LED framework provides us with the possibility of exploring load balancing with extremely low message overhead by choosing a small value of $\hat{p}$ in the update strategy. As a result, the local information at each dispatcher will only be updated after a long time interval. In this case, if a deterministic dispatching strategy (e.g., L-JSQ) is adopted, it would again incur herd behavior (even in the single-dispatcher case) since all the arrivals during the long update interval will join the same queue. This is another motivation for considering L-JBA and L-Pod, which
Figure 1: Inaccurate information could improve performance in the multiple-dispatcher case. (The plot shows mean response time in time slots versus load $\rho$ from 0.3 to 1 for JSQ, L-JSQ-Pull, L-JSQ-Push, L-JBA-Pull, L-JBA-Push, JIQ, and Power-of-2.)

Figure 2: Randomness is useful for heavily-delayed information. (The plot shows mean response time in time slots versus load $\rho$ from 0.3 to 1 for L-JSQ-Push, L-JBA-Push, and L-Pod-Push.)
naturally introduce a certain level of randomness and hence help avoid herd behavior, as suggested by [11]. To illustrate this insight, we consider a setup with 10 dispatchers and 100 homogeneous servers. We compare the delay performance of L-JSQ-Push, L-Pod-Push, and L-JBA-Push with the update probability set to $\hat{p} = 0.01$ and $d = 2$. As shown in Fig. 2, both L-JBA-Push and L-Pod-Push outperform L-JSQ-Push, which suffers from herd behavior because of heavily-delayed information.
4.3 Refinements of LED

Our main results suggest that there is a large class of heavy-traffic delay optimal LED policies. On the one hand, this provides us with the flexibility to tailor our policy design for different application scenarios with different choices of dispatching and update strategies. On the other hand, it also suggests the need for refinements of LED beyond delay optimality in heavy traffic. To this end, we introduce two possible directions for refinement.
Degree of queue imbalance. As introduced in [28], the degree of queue imbalance is a refined metric to further distinguish heavy-traffic delay optimal policies. The idea is that, instead of looking at the average queue length (and hence average delay), the degree of queue imbalance measures the expected difference in queue lengths among the servers. By following the proof of Proposition 5.6 in [28], we can establish that the degree of queue imbalance of all heavy-traffic delay optimal LED policies is $O\left(\frac{1}{\delta^2 p^4}\right)$. Thus, even though by Theorem 2 any positive $\delta$ and $p$ are sufficient for delay optimality in heavy traffic, a dispatching strategy with a smaller $\delta$ or an update strategy with a smaller $p$ could affect the performance in practice.
Other asymptotic regimes. In this paper, we focus on the heavy-traffic asymptotic regime, where the number of servers is fixed and the load approaches one. As mentioned before, there are also other asymptotic regimes in the analysis of load balancing schemes. One possible direction is to extend the fluid-limit techniques for the large-system regime in [20] to the case of multiple dispatchers and heterogeneous servers. Another alternative regime is the many-server heavy-traffic regime (e.g., the Halfin-Whitt regime), which strikes a balance between the heavy-traffic regime and the large-system regime. Studying LED in such a regime is another interesting direction for future work.
5 Proofs

In this paper, we extend the Lyapunov drift-based approach developed in [4] to allow for unbounded supports of the arrival and service processes. In particular, we replace the finiteness condition on the drift in [4] by a stochastic dominance condition, as shown in (C2) of Lemma 2. As proved in [7], this weaker condition, combined with a negative drift condition, can still guarantee finite moment bounds. Besides a weaker condition, we also replace the one-step drift with a $T$-step drift. Formally, we use the following lemma to derive bounded moments in steady state.
Lemma 2. Consider an irreducible, aperiodic, and positive recurrent Markov chain $\{X(t), t \ge 0\}$ over a countable state space $\mathcal{X}$ which converges in distribution to $\overline{X}$, and suppose $V : \mathcal{X} \to \mathbb{R}_+$ is a Lyapunov function. We define the $T$-time-slot drift of $V$ at $X$ as
$$\Delta V(X) \triangleq \left[V(X(t_0 + T)) - V(X(t_0))\right] \mathcal{I}(X(t_0) = X),$$
where $\mathcal{I}(\cdot)$ is the indicator function. Suppose that for some positive finite integer $T$, the $T$-time-slot drift of $V$ satisfies the following conditions:

• (C1) There exist an $\eta > 0$ and a $\kappa < \infty$ such that for any $t_0 = 1, 2, \ldots$ and for all $X \in \mathcal{X}$ with $V(X) \ge \kappa$,
$$\mathbb{E}\left[\Delta V(X) \mid X(t_0) = X\right] \le -\eta.$$

• (C2) $|\Delta V(X)| \prec W$ for all $t_0$ and all $X \in \mathcal{X}$, and $\mathbb{E}\left[e^{\theta W}\right] = D$ is finite for some $\theta > 0$.

Then $\{V(X(t)), t \ge 0\}$ converges in distribution to a random variable $\overline{V}$ for which there exist a $\theta^* > 0$ and a $C^* < \infty$ such that
$$\mathbb{E}\left[e^{\theta^* \overline{V}}\right] \le C^*,$$
which directly implies that all the moments of $\overline{V}$ exist and are finite.
5.1 Proof of Theorem 1

To start with, let us first show that the Markov chain $\{Z(t) = (Q(t), m(t)), t \ge 0\}$ with $m(t) \triangleq (\tilde{Q}^1(t), \tilde{Q}^2(t), \ldots, \tilde{Q}^M(t))$ is irreducible and aperiodic. Let the initial state be $Z(0) = (Q(0), m(0)) = (\mathbf{0}_{1 \times N}, \mathbf{0}_{1 \times MN})$, and let the state space $\mathcal{Z}$ consist of all the states that can be reached from the initial state. Consider any state $Z$; the queue length vector $Q$ can reach the initial state with a positive probability, since the event that there are no exogenous arrivals and all the offered service is at least one during each time-slot happens with positive probability under our assumptions. Moreover, under the condition for the update strategy given by Eq. (5), the event that $Q$ remains at the initial state while all $\tilde{Q}^m$ reach the initial state happens with a positive probability. Therefore, any state in the state space can reach the initial state, and hence the Markov chain is irreducible. The aperiodicity of the Markov chain comes from the fact that the transition probability from the initial state to itself is positive.
In order to show positive recurrence, we adopt the Foster-Lyapunov theorem. In particular, we consider the following Lyapunov function
$$W(Z(t)) = \|Q(t)\|^2 + \sum_{m=1}^{M} \left\|Q(t) - \tilde{Q}^m(t)\right\|_1,$$
and in the rest of the proof we use $W(t)$ as an abbreviation of $W(Z(t))$. Let $X^m_n(t) \triangleq |Q_n(t) - \tilde{Q}^m_n(t)|$. The conditional mean drift of $W(t)$, defined as $D(Z(t_0)) \triangleq \mathbb{E}\left[W(t_0 + T) - W(t_0) \mid Z(t_0)\right]$, can be decomposed as follows:
$$D(Z(t_0)) = D_Q(t_0) + \sum_{m=1}^{M} \sum_{n=1}^{N} D_{X^m_n}(t_0), \tag{7}$$
where
$$D_Q(t_0) \triangleq \mathbb{E}\left[\|Q(t_0 + T)\|^2 - \|Q(t_0)\|^2 \mid Z(t_0)\right],$$
$$D_{X^m_n}(t_0) \triangleq \mathbb{E}\left[X^m_n(t_0 + T) - X^m_n(t_0) \mid Z(t_0)\right].$$
Let us first consider the term $D_{X^m_n}(t_0)$. Note that for all $t_0$, $m$ and $n$,
$$\mathbb{E}\left[X^m_n(t_0 + 1) \mid Z(t_0) = Z\right] \le \mathbb{E}\left[(1 - \mathcal{I}^m_n(t_0))\left(X^m_n(t_0) + A_n(t_0) + S_n(t_0)\right) \mid Z(t_0) = Z\right] \overset{(a)}{\le} (1 - p)X^m_n(t_0) + \lambda_\Sigma + \mu_{\max}, \tag{8}$$
where (a) follows from the condition in Eq. (5) and $\mu_{\max} = \max_n \mu_n$. Then, we have (the time reference $t_0$ is dropped for simplicity)
$$\begin{aligned}
D_{X^m_n}(t_0) &= \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} X^m_n(t+1) - X^m_n(t) \mid Z(t_0) = Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[X^m_n(t+1) - X^m_n(t) \mid Z(t)\right] \mid Z\right] \\
&\overset{(a)}{\le} \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[-pX^m_n(t) + \lambda_\Sigma + \mu_{\max} \mid Z\right] \\
&\le -pX^m_n(t_0) + \lambda_\Sigma + \mu_{\max},
\end{aligned} \tag{9}$$
where (a) follows from Eq. (8). Let us turn to the term $D_Q(t_0)$. By the queue dynamics in Eq. (2),
$$\begin{aligned}
D_Q(t_0) &= \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \|Q(t+1)\|^2 - \|Q(t)\|^2 \mid Z(t_0) = Z\right] \\
&= \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \|Q(t) + A(t) - S(t) + U(t)\|^2 - \|Q(t)\|^2 \mid Z\right] \\
&\overset{(a)}{\le} \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \|Q(t) + A(t) - S(t)\|^2 - \|Q(t)\|^2 \mid Z\right] \\
&= \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} 2\left\langle Q(t), A(t) - S(t)\right\rangle + \|A(t) - S(t)\|^2 \mid Z\right] \\
&\overset{(b)}{\le} \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} 2\left\langle Q(t), A(t) - S(t)\right\rangle + K \mid Z\right],
\end{aligned} \tag{10}$$
where (a) follows from the facts that $Q_n(t) + A_n(t) - S_n(t) + U_n(t) = \max(Q_n(t) + A_n(t) - S_n(t), 0)$ for any $t \ge 0$, and $(\max(a, 0))^2 \le a^2$ for any $a \in \mathbb{R}$; (b) holds by our assumption of light-tailed distributions for the total arrival process and each service process in Eq. (3). In particular, we have that the second moments of the total arrival process and the service process of each server are finite (independent of $\epsilon$), and hence there exists a finite upper bound $K$ which is independent of the load parameter $\epsilon$.
Now, let us continue to work on Eq. (10). In particular, we have
$$\begin{aligned}
\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \left\langle Q(t), A(t) - S(t)\right\rangle \mid Z(t_0) = Z\right] &= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[\left\langle Q(t), A(t) - S(t)\right\rangle \mid Z(t)\right] \mid Z(t_0) = Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[\left\langle Q(t), A(t)\right\rangle \mid Z(t)\right] \mid Z(t_0) = Z\right] \tag{11} \\
&\quad - \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\mu_n \mid Z(t_0) = Z\right]. \tag{12}
\end{aligned}$$
For Eq. (11), we have
$$\begin{aligned}
\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[\left\langle Q(t), A(t)\right\rangle \mid Z(t)\right] \mid Z(t_0) = Z\right] &= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \mathbb{E}\left[A^m_n(t) \mid Z(t)\right] \mid Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} P^m_n(t)\lambda_m \mid Z(t_0) = Z\right] \\
&\overset{(a)}{=} \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \left(\beta^m_n(t) + \frac{\mu_n}{\mu_\Sigma}\right)\lambda_m \mid Z(t_0) = Z\right],
\end{aligned}$$
where (a) follows from the definition of $\beta^m_n(t)$. Then, it can be further simplified as follows.
$$\begin{aligned}
\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[\left\langle Q(t), A(t)\right\rangle \mid Z(t)\right] \mid Z(t_0) = Z\right] &\overset{(a)}{=} \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \beta^m_n(t)\lambda_m \mid Z\right] + \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \frac{\mu_n}{\mu_\Sigma}(\mu_\Sigma - \epsilon)p_m \mid Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \beta^m_n(t)\lambda_m \mid Z\right] + \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\mu_n \mid Z\right] - \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\frac{\epsilon\mu_n}{\mu_\Sigma} \mid Z\right],
\end{aligned} \tag{13}$$
where in (a), $p_m$ is the probability that arrivals are allocated to dispatcher $m$ (equivalently, it can be viewed as the fraction of the total arrivals that are allocated to dispatcher $m$).
Combining Eqs. (11), (12) and (13) yields
$$\begin{aligned}
\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \left\langle Q(t), A(t) - S(t)\right\rangle \mid Z(t_0) = Z\right] &= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \beta^m_n(t)\lambda_m \mid Z(t_0) = Z\right] - \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\frac{\epsilon\mu_n}{\mu_\Sigma} \mid Z(t_0) = Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M}\left(Q_n(t) - \tilde{Q}^m_n(t) + \tilde{Q}^m_n(t)\right)\beta^m_n(t)\lambda_m \mid Z\right] - \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\frac{\epsilon\mu_n}{\mu_\Sigma} \mid Z\right] \\
&= \underbrace{\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M}\left(Q_n(t) - \tilde{Q}^m_n(t)\right)\beta^m_n(t)\lambda_m \mid Z\right]}_{T_1} + \underbrace{\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M} \tilde{Q}^m_n(t)\beta^m_n(t)\lambda_m \mid Z\right]}_{T_2} - \underbrace{\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\frac{\epsilon\mu_n}{\mu_\Sigma} \mid Z\right]}_{T_3}.
\end{aligned}$$
We now handle each term one by one. To upper bound $T_1$, we use the following result on $X^m_n(t) = |Q_n(t) - \tilde{Q}^m_n(t)|$.
Lemma 3. Under the condition given by Eq. (5), for any $t_0$ and $Z(t_0)$, there exist a finite $T_1$ independent of $\epsilon$ and a finite constant $L$ that is only a function of $p$ and $\mu_\Sigma$, such that for all $T \ge T_1$,
$$\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} X^m_n(t) \mid Z(t_0) = Z\right] \le LT$$
holds for all $m$ and $n$.
Proof. See Appendix A.
By using Lemma 3 with $T \ge T_1$, we have
$$T_1 \le \lambda_\Sigma \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M}\left|Q_n(t) - \tilde{Q}^m_n(t)\right| \mid Z\right] \le \lambda_\Sigma MNLT. \tag{14}$$
For $T_2$, we have
$$T_2 \overset{(a)}{=} \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M} \tilde{Q}^m_{\sigma_t(n)}(t)\,\Delta^m_n(t)\,\lambda_m \mid Z\right] \overset{(b)}{\le} 0, \tag{15}$$
where (a) comes from the definition of the dispatching preference vector $\Delta^m(t)$; (b) holds since the dispatching preference is tilted and $\tilde{Q}^m_{\sigma_t(1)}(t) \le \tilde{Q}^m_{\sigma_t(2)}(t) \le \ldots \le \tilde{Q}^m_{\sigma_t(N)}(t)$.
For $T_3$, we have
$$T_3 \ge \frac{\epsilon\mu_{\min}}{\mu_\Sigma}\|Q(t_0)\|_1, \tag{16}$$
where $\mu_{\min} = \min_n \mu_n$.
Now, combining Eqs. (14), (15) and (16) yields
$$\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \left\langle Q(t), A(t) - S(t)\right\rangle \mid Z(t_0) = Z\right] \le -\frac{\epsilon\mu_{\min}}{\mu_\Sigma}\|Q(t_0)\|_1 + \lambda_\Sigma MNLT.$$
Substituting the result above back into Eq. (10) yields
$$D_Q(t_0) \le -\frac{2\epsilon\mu_{\min}}{\mu_\Sigma}\|Q(t_0)\|_1 + 2\lambda_\Sigma MNLT + KT. \tag{17}$$
Now, we are ready to substitute Eq. (9) and Eq. (17) back into Eq. (7). As a result, we have
$$\begin{aligned}
D(Z(t_0)) &\le -\frac{2\epsilon\mu_{\min}}{\mu_\Sigma}\|Q(t_0)\|_1 - p\sum_{m=1}^{M}\sum_{n=1}^{N} X^m_n(t_0) + 2\lambda_\Sigma MNLT + KT + \lambda_\Sigma + \mu_{\max} \\
&\overset{(a)}{\le} -\xi\left(\|Q(t_0)\|_1 + \sum_{m=1}^{M}\sum_{n=1}^{N}\left|Q_n(t_0) - \tilde{Q}^m_n(t_0)\right|\right) + K_1,
\end{aligned}$$
where in (a), $\xi = \min\left(\frac{2\epsilon\mu_{\min}}{\mu_\Sigma}, p\right)$ and $K_1 \triangleq 2\lambda_\Sigma MNLT + KT + \lambda_\Sigma + \mu_{\max}$. Pick any $\alpha > 0$ and let
$$\mathcal{B} \triangleq \left\{Z \in \mathcal{Z} : \|Q(t_0)\|_1 + \sum_{m=1}^{M}\sum_{n=1}^{N}\left|Q_n(t_0) - \tilde{Q}^m_n(t_0)\right| \le \frac{K_1 + \alpha}{\xi}\right\}.$$
Then, $\mathcal{B}$ is a finite subset of $\mathcal{Z}$. For any $Z \in \mathcal{B}^c$, $D(Z) \le -\alpha$, and for any $Z \in \mathcal{B}$, $D(Z) \le K_1$. By the Foster-Lyapunov theorem, we have established positive recurrence.
Having shown that the Markov chain $\{Z(t), t \ge 0\}$ is ergodic, we are left with the task of showing that all the moments are finite in steady state. In order to do so, we use Lemma 2. In particular, we choose the Lyapunov function as $V(Z^{(\epsilon)}) = \|Q^{(\epsilon)}\|$ and then verify the two conditions. In the following, the superscript $(\epsilon)$ will be omitted for ease of notation. To verify condition (C2), we have
$$\begin{aligned}
|\Delta V(Z)| &= \left|\|Q(t_0 + T)\| - \|Q(t_0)\|\right|\mathcal{I}(Z(t_0) = Z) \\
&\overset{(a)}{\le} \|Q(t_0 + T) - Q(t_0)\|\,\mathcal{I}(Z(t_0) = Z) \\
&\le \sum_{t=t_0}^{t_0+T-1} \|Q(t+1) - Q(t)\|\,\mathcal{I}(Z(t_0) = Z) \\
&\le \sum_{t=t_0}^{t_0+T-1} \|A(t) - S(t) + U(t)\|\,\mathcal{I}(Z(t_0) = Z) \\
&\overset{(b)}{\le} \sum_{t=t_0}^{t_0+T-1} \left(\|A(t)\| + 2\|S(t)\|\right)\mathcal{I}(Z(t_0) = Z),
\end{aligned} \tag{18}$$
where (a) holds since $\left|\|x\| - \|y\|\right| \le \|x - y\|$ for each $x$, $y$ in $\mathbb{R}^N$; (b) follows from the triangle inequality and the fact that $U_n(t) \le S_n(t)$ for all $n$ and $t$. Then, by our assumptions of light-tailed distributions for both the total arrival and service processes, there exists a random variable $W$ such that $|\Delta V(X)| \prec W$ for all $t_0$ and all $X \in \mathcal{X}$, and $\mathbb{E}\left[e^{\theta W}\right] = D$ is finite for some $\theta > 0$, which verifies (C2).
For (C1), we have
$$\begin{aligned}
\mathbb{E}\left[\Delta V(Z) \mid Z(t_0) = Z\right] &= \mathbb{E}\left[\|Q(t_0 + T)\| - \|Q(t_0)\| \mid Z(t_0) = Z\right] \\
&= \mathbb{E}\left[\sqrt{\|Q(t_0 + T)\|^2} - \sqrt{\|Q(t_0)\|^2} \mid Z(t_0) = Z\right] \\
&\overset{(a)}{\le} \frac{1}{2\|Q(t_0)\|}\mathbb{E}\left[\|Q(t_0 + T)\|^2 - \|Q(t_0)\|^2 \mid Z(t_0) = Z\right] \\
&\overset{(b)}{\le} -\epsilon\frac{\mu_{\min}}{\mu_\Sigma} + \frac{2\lambda_\Sigma MNLT + KT}{2\|Q(t_0)\|},
\end{aligned}$$
where (a) follows from the fact that $f(x) = \sqrt{x}$ is concave; (b) comes from Eq. (17). Thus, condition (C1) is valid, and hence the proof of Theorem 1 is complete.
5.2 Proof of Theorem 2

In order to prove the result, we need two intermediate results. One is called state-space collapse, as stated in Proposition 1, which is the key ingredient for establishing heavy-traffic delay optimality. Roughly speaking, it means that the multi-dimensional space for the queue length vector reduces to one dimension, in the sense that the deviation from the line (on which all the queue lengths are equal) is bounded by a constant independent of $\epsilon$. The other intermediate result is concerned with unused service. Based on these two intermediate results, we can prove heavy-traffic delay optimality. We omit the time reference $t_0$ for simplicity when necessary.
Proposition 1. Under the conditions in Theorem 2, $Q_\perp$ is bounded in the sense that in steady state there exist finite constants $\{L_r, r \in \mathbb{N}\}$ independent of $\epsilon$ such that
$$\mathbb{E}\left[\left\|Q^{(\epsilon)}_\perp\right\|^r\right] \le L_r$$
for all $\epsilon \in (0, \epsilon_0)$ and $r \in \mathbb{N}$, where $Q_\perp = Q - \langle Q, c\rangle c$ is the perpendicular component of $Q$ with respect to the line $c = \frac{1}{\sqrt{N}}(1, 1, \ldots, 1)$.
Proof. It suffices to show that $V_\perp(Z^{(\epsilon)}) \triangleq \left\|Q^{(\epsilon)}_\perp\right\|$ satisfies conditions (C1) and (C2) in Lemma 2. Let us first consider condition (C2). In particular, we have
first consider conditions (C2). In particular, we have
|
∆
V
⊥
(
Z
)
|
=
|k
Q
⊥
(
t
0
+
T
)
k−k
Q
⊥
(
t
0
)
k|I
(
Z
(
t
0
) =
Z
)
(
a
)
≤ k
Q
⊥
(
t
0
+
T
)
−
Q
⊥
(
t
0
)
kI
(
Z
(
t
0
) =
Z
)
=
∥
∥
Q
(
t
0
+
T
)
−
Q
k
(
t
0
+
T
)
−
Q
(
t
0
) +
Q
k
(
t
0
)
∥
∥
I
(
Z
(
t
0
) =
Z
)
(
b
)
≤k
Q
(
t
0
+
T
)
−
Q
(
t
0
)
k
+
∥
∥
Q
k
(
t
0
+
T
)
−
Q
k
(
t
0
)
∥
∥
I
(
Z
(
t
0
) =
Z
)
(
c
)
≤
2
k
Q
(
t
0
+
T
)
−
Q
(
t
0
)
kI
(
Z
(
t
0
) =
Z
)
(
d
)
≤
2
t
0
+
T
−
1
∑
t
=
t
0
(
k
A
(
t
)
k
+ 2
k
S
(
t
)
k
)
I
(
Z
(
t
0
) =
Z
)
(19)
where inequality (a) follows from the fact that $\left|\|x\| - \|y\|\right| \le \|x - y\|$ holds for any $x, y \in \mathbb{R}^N$; inequality (b) follows from the triangle inequality; (c) holds due to the non-expansive property of the projection onto a convex set; (d) follows from Eq. (18). Then, by our assumptions of light-tailed distributions for both the total arrival and service processes, there exists a random variable $W$ such that $|\Delta V_\perp(X)| \prec W$ for all $t_0$ and all $X \in \mathcal{X}$, and $\mathbb{E}\left[e^{\theta W}\right] = D$ is finite for some $\theta > 0$, which verifies (C2).
Let us turn to condition (C1). By the proof of Lemma 3.6 in [29], it suffices to establish the following result in order to verify (C1): there exist $T > 0$, $K_2 \ge 0$ and $\eta > 0$, all independent of $\epsilon$, such that for all $t_0$ and $Z \in \mathcal{Z}$,
$$\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1}\left\langle Q_\perp(t), A(t) - S(t)\right\rangle \mid Z(t_0) = Z\right] \le -\eta\|Q_\perp\| + K_2. \tag{20}$$