arXiv:2002.08908v1 [cs.PF] 20 Feb 2020
Asymptotically Optimal Load Balancing in Large-scale Heterogeneous Systems with Multiple Dispatchers
Xingyu Zhou
Department of ECE
The Ohio State University
zhou.2055@osu.edu
Ness Shroff
Department of ECE and CSE
The Ohio State University
shroff.11@osu.edu
Adam Wierman
Department of Computing and Mathematical Sciences
California Institute of Technology
adamw@caltech.edu
Abstract
We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy traffic, respectively. These conditions directly imply delay optimality for many previous local-memory based policies in heavy traffic. Moreover, the results enable us to design new delay optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework directly resolves a recent open problem on how to design optimal load balancing schemes using delayed information.
1 Introduction
Load balancing, which is responsible for dispatching jobs to parallel servers, has attracted significant interest in recent years. This interest is motivated by the challenges of efficiently dispatching jobs in large-scale data centers and cloud applications, which are rapidly increasing in size. A good load balancing policy not only ensures high throughput by maximizing server utilization, but also improves the user experience by minimizing delay.
There have been numerous load balancing policies proposed in the literature. The most straightforward one is Join-Shortest-Queue (JSQ), which has been shown to enjoy optimal delay in both non-asymptotic (for homogeneous servers) and asymptotic regimes [22, 5, 4]. However, it is difficult to implement in today's large-scale data centers due to the large message overhead between the dispatcher and servers. As a result, alternative load balancing policies with low message overhead have been proposed. For example, the Power-of-$d$ policy [12] has been shown to achieve optimal average delay in heavy traffic with only $2d$ messages per arrival [10]. Another common load balancing policy is the pull-based Join-Idle-Queue (JIQ) [9, 16], which has been shown to outperform the Power-of-$d$ policy using less overhead. However, both Power-of-$d$ and JIQ mainly achieve good performance for systems with homogeneous servers. Recently, some works have considered heterogeneous servers and proposed flexible, low-message-overhead policies that achieve optimal delay in heavy traffic [29, 27]. However, only a single dispatcher is considered in these works. Theoretical analysis of load balancing with multiple dispatchers has so far mainly focused on the JIQ policy [13, 17], which has poor performance in heavy traffic and is even generally unstable for heterogeneous systems [29].

Note that heterogeneous systems with multiple dispatchers are now almost the default scenario in today's cloud infrastructures. On one hand, the heterogeneity comes from the use of multiple generations of CPUs and various types of devices [6]. On the other hand, given the massive amount of data, a scalable cloud infrastructure needs multiple dispatchers to increase both throughput and robustness [15].
Motivated by this, a recent work [1] proposes a new framework named Loosely-Shortest-Queue (LSQ) for designing load balancing policies for heterogeneous systems with multiple dispatchers. In particular, under this framework, each dispatcher keeps its own local, and possibly outdated, view of each server's queue length. Upon arrival, each dispatcher routes to the server with the shortest local view. A small amount of message overhead is used to update the local views. The authors establish sufficient conditions on the update scheme for the system to be stable. Moreover, extensive simulations show that LSQ policies significantly outperform well-known low-communication policies while using similar communication overhead, in both heterogeneous and homogeneous cases. However, no theoretical guarantees on the delay performance are provided. It is worth noting that the key challenge in establishing a delay performance guarantee for this framework is that it uses only possibly outdated local information to dispatch jobs. In fact, the problem of designing delay optimal load balancing schemes that only have access to delayed information was recently listed as an open problem in [8].
Inspired by this, in this paper, we are particularly interested in the following questions: Is it possible to establish delay performance guarantees for load balancing in heterogeneous systems with multiple dispatchers? If so, can these guarantees be achieved using only delayed information?
Contributions. To answer the questions above, we propose a general load balancing framework for heterogeneous systems with multiple dispatchers that uses only delayed (out-of-date) information about the system state. We call this framework Local-Estimation-Driven (LED); it generalizes the LSQ framework. Our main results provide sufficient conditions for LED policies to be both throughput optimal and delay optimal in heavy traffic. Our key contributions can be summarized as follows.
First, we introduce the LED framework for designing load balancing policies for heterogeneous systems with multiple dispatchers. In this framework, each dispatcher keeps its own local estimates of the queue lengths of all the servers, and makes its dispatching decisions purely based on these local estimates according to a certain dispatching strategy. The local estimates are updated infrequently via an update strategy that is based on communications between dispatchers and servers.
Second, we derive sufficient conditions for LED policies to be throughput optimal and delay optimal in heavy traffic. The importance of these sufficient conditions is three-fold: (i) It can be shown that previous local-memory based policies (e.g., LSQ) satisfy our sufficient conditions. As a result, we are able to show that they are not only throughput optimal (in a stronger sense) but also delay optimal in heavy traffic. (ii) The conditions allow us to design new delay optimal load balancing policies with zero dispatching delay and low message overhead that work for heterogeneous servers and multiple dispatchers. (iii) The conditions also provide a systematic approach for generalizing previous optimal policies to the case of multiple dispatchers and for exploring the trade-off between memory (i.e., local estimates) and message overhead.
Third, the LED framework resolves the open problem posed in [8], which asks how to design heavy-traffic delay optimal policies that only use delayed information. Our main results for LED policies not only demonstrate that it is possible to achieve optimal delay in heavy traffic via only delayed information, but also highlight conditions on the extent to which old information is useful. Moreover, they provide methods for using the delayed information to achieve optimality in heavy traffic. Interestingly, the LED framework also shows that, in the case of multiple dispatchers, inaccurate information can actually lead to improved performance.
To establish the main results, we need to address the following two key challenges. First, each dispatcher in our model has access only to delayed, outdated system information. Moreover, each dispatcher does not know the arrivals to servers from other dispatchers, since there is no communication between them. As a result, for throughput optimality, we have to carefully design our Lyapunov function, since the local estimates can also be unbounded. For delay optimality, we consider two queueing systems: a local-estimation system and the actual system. We then have to transfer the negative drift on the local-estimation system to the actual system, which requires establishing new bounds and analyzing sample paths.
Related work. The study of efficient load balancing algorithms has been a hot topic for a long time and spans different asymptotic regimes. The most extensively investigated policy might be Join-Shortest-Queue (JSQ), under which incoming jobs are always sent to the server with the shortest queue length. JSQ has been shown to be optimal in a stochastic order sense in the non-asymptotic regime for arbitrary arrival and non-decreasing failure rate identical service processes [22, 23]. In the heavy-traffic asymptotic regime, in which the normalized load approaches one and the number of servers is fixed, JSQ has been proved to achieve optimal delay even for heterogeneous servers, using both diffusion approximations in [5] and the recently proposed drift-based method in [4]. However, the optimality of JSQ comes at the cost of a large amount of communication between dispatchers and servers, which is particularly undesirable for large-scale data centers. Thus, some popular low-message-overhead alternative policies have been proposed, e.g., Power-of-$d$ and Join-Idle-Queue (JIQ). Under
Power-of-$d$, the dispatcher only needs to sample $d \ge 2$ servers and sends arrivals to the server with the shortest queue length among the $d$ samples. This simple policy has been shown to enjoy a doubly exponential decay rate in response time in the large-system asymptotic regime [12] and to achieve optimal delay in heavy traffic for homogeneous servers [3, 10]. Another low-message-overhead policy is JIQ (or the pull-based policy) [9, 16], under which arrivals are sent to one of the idle servers, if there are any, and to a randomly selected server otherwise. Compared to JSQ and Power-of-$d$, JIQ has the nice property of zero dispatching delay, since each arrival can be instantaneously routed rather than waiting for feedback from servers. Moreover, JIQ has been shown to outperform Power-of-$d$ with even smaller message overhead (at most one message per job). In particular, under JIQ, arriving jobs achieve asymptotically zero waiting time in the large-system regime, while under Power-of-$d$ they do not. An even stronger result suggests that, in the Halfin-Whitt asymptotic regime, JIQ achieves the same delay performance as JSQ [14]. Nevertheless, the performance of JIQ drops substantially in heavy traffic with a finite number of servers, even for homogeneous servers. In fact, it is not heavy-traffic delay optimal in this case [29]. Motivated by this, recent works have proposed alternative pull-based policies that not only enjoy all the nice features of JIQ but also achieve optimal delay in heavy traffic [29, 27]. However, these studies only consider the case of a single dispatcher.
Compared to the large literature on the single-dispatcher case, there are only a few works on the scenario of multiple dispatchers, and they mainly focus on the JIQ policy. In particular, [13] presents a new large-system asymptotic analysis of JIQ without the simplifying assumptions in [9]. The property of asymptotically zero waiting time under JIQ was generalized to the case of multiple dispatchers in [17]. However, the results for JIQ in [9, 13, 17] all assume that the loads at the various dispatchers are strictly equal. Without this assumption, [19] shows that the waiting time under JIQ no longer vanishes in the large-system regime, and two enhanced JIQ schemes are proposed. As mentioned earlier, although JIQ is a scalable choice for the multiple-dispatcher case, it is not delay optimal in heavy traffic for homogeneous servers and is not even generally stable for heterogeneous systems.
The case of heterogeneous systems with multiple dispatchers has received very little attention from the theoretical community so far. To the best of our knowledge, the framework proposed in [1] is the first attempt to study efficient load balancing schemes with a theoretical guarantee for the scenario of heterogeneous systems with multiple dispatchers. In particular, under the proposed Loosely-Shortest-Queue (LSQ) framework, each dispatcher independently keeps its own local view of server queue lengths and routes jobs to the shortest among them. Communication is used only to update the local views and make sure that they are not too far from the real queue lengths. The main contributions of [1] are sufficient conditions for any LSQ policy to achieve strong stability with low message overhead. Additionally, extensive simulations have been used to demonstrate its appeal. Nevertheless, a theoretical guarantee on the delay performance of LSQ policies remains an important unsolved question.
It is worth pointing out that the idea of using local memory to hold possibly old information for load balancing was also explored in two recent works [2, 20]. As we discuss later, these two proposed policies fall within our LED framework. Both works only consider a single dispatcher and homogeneous servers, which is also a special case of our model. Further, their analysis focuses on the large-system asymptotic regime where the number of servers goes to infinity, while our analysis deals with a finite number of servers.
2 System Model and Preliminaries
This section describes the system model and assumptions considered in this paper. Then, several necessary preliminaries are presented.
2.1 System model
We consider a discrete-time (i.e., time-slotted) load balancing system consisting of $M$ dispatchers and $N$ possibly-heterogeneous servers. Each server maintains an infinite-capacity FIFO queue. At each dispatcher, there is a local memory, through which the dispatcher can have some (possibly delayed) information about the system state. In each time-slot, each dispatcher routes its new incoming tasks to one of the servers, immediately upon arrival. Once a task joins a queue, it remains in that queue until its service is completed. Each server is assumed to be work conserving, i.e., a server is idle if and only if its corresponding queue is empty.
2.1.1 Arrivals
Let $A_m(t)$ denote the number of exogenous tasks that arrive at dispatcher $m$ at the beginning of time-slot $t$. We assume that $A_\Sigma(t) = \sum_{m=1}^{M} A_m(t)$ is an integer-valued random variable which is i.i.d. across time-slots. The mean and variance of $A_\Sigma(t)$ are denoted by $\lambda_\Sigma$ and $\sigma_\Sigma^2$, respectively. We further assume that there is a positive probability that $A_\Sigma(t)$ is zero. The allocation of the total arriving tasks among the $M$ dispatchers is allowed to follow any arbitrary policy that is independent of the system state. Note that, in contrast to previous works on multiple dispatchers [9, 13, 17], we do not require that the loads at all dispatchers be equal. We assume that there is a strictly positive probability for tasks to arrive at each dispatcher in any time-slot $t$. That is, there exists a strictly positive constant $p_0$ such that

$$\mathbb{P}(A_m(t) > 0) \ge p_0, \quad \forall (m, t) \in \mathcal{M} \times \mathbb{N}, \qquad (1)$$

where $\mathcal{M} = \{1, 2, \ldots, M\}$. Moreover, we assume that $A_m(t)$ is i.i.d. across time-slots with mean arrival rate denoted by $\lambda_m$. We further let $A_n^m(t)$ denote the number of new arrivals at server $n$ from dispatcher $m$ at the beginning of time-slot $t$. Let $A_n(t) = \sum_{m=1}^{M} A_n^m(t)$ be the total number of arriving tasks at server $n$ at the beginning of time-slot $t$.
2.1.2 Service
Let $S_n(t)$ denote the amount of service that server $n$ offers for queue $n$ in time-slot $t$. That is, $S_n(t)$ is the maximum number of tasks that can be completed by server $n$ in time-slot $t$. We assume that $S_n(t)$ is an integer-valued random variable which is i.i.d. across time-slots. We also assume that $S_n(t)$ is independent across servers as well as of the arrival process. The mean and variance of $S_n(t)$ are denoted by $\mu_n$ and $\nu_n^2$, respectively. Let $\mu_\Sigma \triangleq \sum_{n=1}^{N} \mu_n$ and $\nu_\Sigma^2 \triangleq \sum_{n=1}^{N} \nu_n^2$ denote the mean and variance of the hypothetical total service process $S_\Sigma(t) \triangleq \sum_{n=1}^{N} S_n(t)$. Let $\epsilon = \mu_\Sigma - \lambda_\Sigma$ characterize the distance between the arrival rate and the boundary of the capacity region.
2.1.3 Queue Dynamics
Let $Q_n(t)$ be the queue length of server $n$ at the beginning of time-slot $t$. Let $A_n(t)$ denote the number of tasks routed to queue $n$ at the beginning of time-slot $t$ according to the dispatching decision. Then the evolution of the length of queue $n$ is given by

$$Q_n(t+1) = Q_n(t) + A_n(t) - S_n(t) + U_n(t), \quad n = 1, 2, \ldots, N, \qquad (2)$$

where $U_n(t) = \max\{S_n(t) - Q_n(t) - A_n(t), 0\}$ is the unused service due to an empty queue.
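The dynamics in Eq. (2) are equivalent to the familiar truncated update $Q_n(t+1) = \max\{Q_n(t) + A_n(t) - S_n(t), 0\}$; the short sketch below (our own, with an illustrative function name) makes this concrete:

```python
def step(q, a, s):
    """One slot of Eq. (2): Q_n(t+1) = Q_n(t) + A_n(t) - S_n(t) + U_n(t),
    where U_n(t) = max{S_n(t) - Q_n(t) - A_n(t), 0} is the unused service."""
    u = max(s - q - a, 0)        # service wasted on an empty queue
    return q + a - s + u         # equivalently: max(q + a - s, 0)

# The unused-service term keeps queues non-negative:
assert step(q=0, a=1, s=3) == 0  # U_n(t) = 2, the queue stays empty
assert step(q=5, a=2, s=3) == 4  # U_n(t) = 0, the usual lindley-type update
```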
We do not assume any specific distribution for the arrival and service processes. Moreover, in contrast to previous works [29, 4], we do not require that the arrival and service processes have finite support. Instead, we only need their distributions to be light-tailed. More specifically, we assume that

$$\mathbb{E}\left[e^{\theta_1 A_\Sigma(t)}\right] \le D_1 \quad \text{and} \quad \mathbb{E}\left[e^{\theta_2 S_n(t)}\right] \le D_2, \qquad (3)$$

for each $n$, where the constants $\theta_1 > 0$, $\theta_2 > 0$, $D_1 < \infty$ and $D_2 < \infty$ are all independent of $\epsilon$.
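Condition (3) holds for common light-tailed distributions; for instance, a Binomial arrival process satisfies it with an explicit constant, since its moment generating function is finite for every $\theta$. The snippet below is our own sanity check with illustrative parameters, comparing the closed-form MGF of a Binomial(n, p) variable against a Monte Carlo estimate:

```python
import math
import random

random.seed(0)
n, p, theta = 8, 0.25, 0.5          # illustrative Binomial(n, p) arrival process
# Closed-form MGF of Binomial(n, p): E[exp(theta * A)] = (1 - p + p*e^theta)^n
mgf = (1 - p + p * math.exp(theta)) ** n
# Monte Carlo estimate of the same expectation:
samples = 200_000
est = sum(math.exp(theta * sum(random.random() < p for _ in range(n)))
          for _ in range(samples)) / samples
# Both are finite, so condition (3) holds here with D1 = mgf:
assert math.isfinite(mgf) and abs(est - mgf) / mgf < 0.02
```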
2.2 Local-Estimation-Driven (LED) framework
We are interested in the case where the local memory at each dispatcher $m$ stores an estimate of the queue length of each server $n$. In particular, we let $\tilde{Q}_n^m(t)$ be the local estimate of the queue length of server $n$ at dispatcher $m$ at the beginning of time-slot $t$ (before any arrivals and departures). More specifically, we introduce the following framework for load balancing.
Definition 1. A Local-Estimation-Driven (LED) policy is composed of the following components:

(a) Dispatching strategy: At the beginning of each time-slot, each dispatcher $m$ chooses one of the servers for its new arrivals purely based on its local estimates (i.e., the local queue length estimates $\tilde{Q}^m$).

(b) Update strategy: At the end of each time-slot, each dispatcher may update its local estimates, e.g., synchronize a local queue length estimate with the true queue length.
The definition of LED is broad, and it includes a variety of classical load balancing policies. For example, it includes the LSQ policy studied in [1], obtained by choosing the dispatching strategy under which new arrivals at each dispatcher are dispatched to the queue with the shortest local estimate. Moreover, it also includes two recent local-memory based policies in [2, 20] that were developed for the case of a single dispatcher and homogeneous servers.
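To make Definition 1 concrete, here is a minimal sketch of one LED dispatcher (entirely our own illustration; the class and method names are not from the paper). The dispatching strategy shown is the shortest-local-estimate rule used by LSQ, and the update strategy synchronizes each estimate with the true queue length independently with a fixed probability:

```python
import random

class LEDDispatcher:
    """Minimal sketch of Definition 1: dispatch purely on local estimates,
    then (possibly) synchronize them with the true queue lengths."""

    def __init__(self, n_servers, update_prob):
        self.est = [0] * n_servers      # local (possibly outdated) estimates
        self.update_prob = update_prob

    def dispatch(self, batch_size):
        """Dispatching strategy: send the batch to the server with the
        shortest local estimate (the LSQ rule, one delta-tilted choice)."""
        n = min(range(len(self.est)), key=lambda i: self.est[i])
        self.est[n] += batch_size       # track own arrivals (a common choice,
                                        # not required by Definition 1)
        return n

    def update(self, true_queues):
        """Update strategy: synchronize each estimate with the true queue
        length independently with probability update_prob."""
        for n, q in enumerate(true_queues):
            if random.random() < self.update_prob:
                self.est[n] = q
```

Several such dispatchers can serve the same queues without ever communicating with one another; each acts only on its own `est` vector.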
To study LED, we model the system as a discrete-time Markov chain $\{Z(t) = (Q(t), m(t)), t \ge 0\}$ with state space $\mathcal{Z}$, using the queue length vector $Q(t)$ together with the memory state $m(t) \triangleq (\tilde{Q}^1(t), \tilde{Q}^2(t), \ldots, \tilde{Q}^M(t))$.
We consider a set of load balancing systems $\{Z^{(\epsilon)}(t), t \ge 0\}$ parameterized by $\epsilon$, such that the mean arrival rate of the total exogenous arrival process $\{A_\Sigma^{(\epsilon)}(t), t \ge 0\}$ is $\lambda_\Sigma^{(\epsilon)} = \mu_\Sigma - \epsilon$. Note that the parameter $\epsilon$ characterizes the distance between the arrival rate and the boundary of the capacity region. We are interested in the throughput performance and the steady-state delay performance in the heavy-traffic regime under any LED policy.
A load balancing system is stable if the Markov chain $\{Z(t), t \ge 0\}$ is positive recurrent, and $Z = \{Q, m\}$ denotes the random vector whose distribution is the same as the steady-state distribution of $\{Z(t), t \ge 0\}$. We have the following definition.
Definition 2 (Throughput Optimality). A load balancing policy is said to be throughput optimal if, for any arrival rate within the capacity region, i.e., for any $\epsilon > 0$, the system is positive recurrent and all the moments of $\|Q^{(\epsilon)}\|$ are finite.

Note that this is a stronger definition of throughput optimality than that in [1, 21, 25] because, besides positive recurrence, it also requires all moments to be finite in steady state for any arrival rate within the capacity region.
To characterize the steady-state average delay performance in the heavy-traffic regime as $\epsilon$ approaches zero, by Little's law, it suffices to focus on the sum of all the queue lengths. First, recall the following fundamental lower bound on the expected sum of queue lengths in a load balancing system under any throughput optimal policy [4]. Note that this result was originally proved under the assumption of finite support on the service process (Lemma 5 in [4]); it can be generalized to service processes with light-tailed distributions via a careful analysis of the unused service, see our proof of Lemma 6.
Lemma 1. Given any throughput optimal policy and assuming that $(\sigma_\Sigma^{(\epsilon)})^2$ converges to a constant $\sigma_\Sigma^2$ as $\epsilon$ decreases to zero, then

$$\liminf_{\epsilon \downarrow 0} \; \epsilon \, \mathbb{E}\left[\sum_{n=1}^{N} Q_n^{(\epsilon)}\right] \ge \frac{\zeta}{2}, \qquad (4)$$

where $\zeta \triangleq \sigma_\Sigma^2 + \nu_\Sigma^2$.
The right-hand side of Eq. (4) is the heavy-traffic limit of a hypothetical single-server system with arrival process $A_\Sigma^{(\epsilon)}(t)$ and service process $\sum_{n=1}^{N} S_n(t)$ for all $t \ge 0$. This hypothetical single-server queueing system is often called the resource-pooled system. Since a task cannot be moved from one queue to another in the load balancing system, it is easy to see that the expected sum of queue lengths in the load balancing system is larger than the expected queue length in the resource-pooled system. However, if a policy achieves the lower bound in Eq. (4) in the heavy-traffic limit, then by Little's law this policy achieves the minimum average delay of the system in steady state, and is thus said to be heavy-traffic delay optimal, see [4, 10, 21, 24, 25, 29].
Definition 3 (Heavy-traffic Delay Optimality in Steady-state). A load balancing scheme is said to be heavy-traffic delay optimal in steady state if the steady-state queue length vector $Q^{(\epsilon)}$ satisfies

$$\limsup_{\epsilon \downarrow 0} \; \epsilon \, \mathbb{E}\left[\sum_{n=1}^{N} Q_n^{(\epsilon)}\right] \le \frac{\zeta}{2},$$

where $\zeta$ is defined in Lemma 1.
2.3 Dispatching Preference
In order to provide a unified way to specify the dispatching strategy in LED, we first introduce a concept called dispatching preference. In particular, let $P_n^m(t)$ be the probability that new arrivals at dispatcher $m$ are dispatched to server $n$ in time-slot $t$. We define $\beta_n^m(t) \triangleq P_n^m(t) - \mu_n/\mu_\Sigma$, which is the difference between the probabilities that server $n$ is chosen under the particular dispatching strategy and under random routing (weighted by service rate). Then, we have the following definition.
Definition 4 (Dispatching preference). Fix a dispatcher $m$, and let $\sigma_t(\cdot)$ be a permutation of $(1, 2, \ldots, N)$ that satisfies

$$\tilde{Q}_{\sigma_t(1)}^m(t) \le \tilde{Q}_{\sigma_t(2)}^m(t) \le \ldots \le \tilde{Q}_{\sigma_t(N)}^m(t).$$

The dispatching preference at dispatcher $m$ is an $N$-dimensional vector denoted by $\Delta^m(t)$, the $n$th component of which is given by $\Delta_n^m(t) \triangleq \beta_{\sigma_t(n)}^m(t)$.
In words, the dispatching preference at a dispatcher $m$ specifies, in a unified way, how servers with different local estimates are preferred, such that it is independent of the actual values of the local estimates; it depends only on their relative order. More specifically, fix a dispatcher $m$. By definition, the weighted random routing strategy has no preference for any server, and $\Delta_n^m(t) = 0$ for every $n$. On the other hand, if new arrivals are always dispatched to the server with the shortest local estimate (e.g., the LSQ policy), we have $\Delta_1^m(t) > 0$ and $\Delta_n^m(t) < 0$ for all $2 \le n \le N$. Thus, a positive value of $\Delta_n^m(t)$ means that the dispatching strategy has a preference for the server with the $n$th shortest local estimate. This observation directly motivates the following two definitions.
Definition 5 (Tilted dispatching strategy). A dispatching strategy adopted at dispatcher $m$ is said to be tilted if there exists a $k \in \{2, 3, \ldots, N\}$ such that for all $t$, $\Delta_n^m(t) \ge 0$ for all $n \le k$ and $\Delta_n^m(t) \le 0$ for all $n \ge k$.
Definition 6 ($\delta$-tilted dispatching strategy). A dispatching strategy adopted at dispatcher $m$ is said to be $\delta$-tilted if for all $t$ (i) it is a tilted dispatching strategy, and (ii) there exists a positive constant $\delta$ such that $\Delta_1^m(t) \ge \delta$ and $\Delta_N^m(t) \le -\delta$.
Remark 1. Note that similar definitions were first provided in [29] for the case of a single dispatcher with up-to-date information. Based on these definitions, sufficient conditions were presented for throughput and heavy-traffic optimality. However, these conditions cannot be directly applied to our model due to the following two major challenges. One is that, in our model, each dispatcher only has access to outdated information. The other is that each dispatcher has no knowledge of the arrivals at the servers coming from other dispatchers, since there is no communication between them. To handle these challenges, we have to develop new techniques.
We end this section by providing intuition behind the two definitions. To start, it can easily be seen that $\sum_{n=1}^{N} \Delta_n^m(t) = 0$ for all $m$ and $t$, via the definition of dispatching preference. Roughly speaking, a tilted dispatching strategy means that, compared to (weighted) random routing (which has no preference), the probabilities of choosing servers with shorter local estimates (the first $k$ shortest ones) are increased and, as a result, the probabilities of choosing servers with longer local estimates are reduced. This is why we call it tilted: more preference is given to queues with shorter local estimates. Therefore, a tilted dispatching strategy can be viewed as a strategy that is at least as 'good' as (weighted) random routing. On the other hand, a $\delta$-tilted dispatching strategy can be viewed as a strategy that is strictly better than (weighted) random routing. The reason is that, besides being tilted, it also requires a strictly positive preference for the server with the shortest local estimate.
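The tilted and $\delta$-tilted conditions are easy to check numerically for a given dispatching strategy. The helper below is our own illustrative sketch (the names are not from the paper): it builds $\Delta^m(t)$ from the routing probabilities, the service rates, and the local estimates, and verifies the $\delta$-tilted shape for shortest-local-estimate routing:

```python
def dispatching_preference(probs, mus, est):
    """Delta^m(t): beta_n = P_n - mu_n/mu_Sigma, re-indexed so that
    component n corresponds to the n-th shortest local estimate."""
    mu_sigma = sum(mus)
    order = sorted(range(len(est)), key=lambda n: est[n])
    return [probs[n] - mus[n] / mu_sigma for n in order]

# Shortest-local-estimate routing on a heterogeneous example:
mus, est = [1.0, 2.0, 1.0], [4, 1, 6]   # server 1 holds the shortest estimate
probs = [0.0, 1.0, 0.0]                 # all mass on the argmin estimate
delta = dispatching_preference(probs, mus, est)

assert abs(sum(delta)) < 1e-12          # the components always sum to zero
assert delta[0] > 0 and delta[-1] < 0   # delta-tilted: Delta_1 > 0, Delta_N < 0

# Weighted random routing has no preference at all:
rnd = dispatching_preference([m / sum(mus) for m in mus], mus, est)
assert all(abs(d) < 1e-12 for d in rnd)
```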
3 Main Results
In this section, we first present the sufficient conditions for LED policies to be throughput optimal and heavy-traffic delay optimal. Then, we explore several example policies within the LED framework to demonstrate its flexibility in designing new load balancing schemes.
3.1 Sufficient Conditions
Let us begin with the sufficient conditions for LED policies to be throughput optimal. In particular, we specify conditions on the dispatching strategy and update strategy that guarantee throughput optimality.

To state the theorem, we need the following notation. Let $I_n^m(t)$ be an indicator function which equals 1 if and only if the local estimate of server $n$'s queue length at dispatcher $m$ gets updated, i.e., the estimated queue length $\tilde{Q}_n^m(t)$ is set to the actual queue length $Q_n(t)$ at the end of time-slot $t$.
Theorem 1. Consider an LED policy. Suppose the dispatching strategy at each dispatcher is tilted and the update strategy guarantees that there exists a positive constant $p$ such that

$$\mathbb{E}\left[I_n^m(t) \mid Z(t) = Z\right] \ge p \qquad (5)$$

holds for all $Z$ and $(m, n, t) \in \mathcal{M} \times \mathcal{N} \times \mathbb{N}$. Then, this policy is throughput optimal, i.e., the system under this policy is positive recurrent with all moments bounded for any $\epsilon > 0$.
Proof. See Section 5.1.
Note that this theorem directly implies that LSQ is not only strongly stable but also ensures that the system has all moments bounded in steady state. Moreover, it suggests that any dispatching strategy that is at least as good as (weighted) random routing suffices to guarantee throughput optimality. Further, the update probability can be a function of the traffic load.

Now, we turn to the sufficient conditions for LED policies to be delay optimal in heavy traffic. In order to achieve delay optimality, we need stronger conditions on both the dispatching strategy and the update strategy.
Theorem 2. Consider an LED policy. Suppose the dispatching strategy at each dispatcher is $\delta$-tilted with a uniform lower bound $\delta > 0$ independent of $\epsilon$. Suppose the update strategy guarantees that there exists a positive constant $p$ (independent of $\epsilon$) such that

$$\mathbb{E}\left[I_n^m(t) \mid Z(t) = Z\right] \ge p \qquad (6)$$

holds for all $Z$ and $(m, n, t) \in \mathcal{M} \times \mathcal{N} \times \mathbb{N}$. Then, this policy is heavy-traffic delay optimal.
Proof. See Section 5.2.
This theorem not only establishes a delay performance guarantee for many previous local-memory based policies (e.g., LSQ in [1] and the low-message policies in [2, 20]), but also provides the flexibility to design new delay optimal load balancing policies for different scenarios with heterogeneous servers and multiple dispatchers, as discussed in the next section. More importantly, our results directly show that it is possible to use only delayed information to achieve delay optimality, which resolves one of the open problems listed in [8].
High-level proof idea. We end this section by providing drift-based intuition behind the technical proofs. In particular, consider two queueing systems: a local-estimation system and the actual system (i.e., the queue lengths at the servers). Throughput optimality requires the actual system to have a drift towards the origin. First, by the definition of a tilted dispatching strategy, there is an equivalent drift towards the origin on the local-estimation system. Then, the condition on the update strategy guarantees that the local-estimation system is never too far from the actual system. Hence, the actual system also has a drift towards the origin. Heavy-traffic delay optimality requires not only a drift towards the origin, but also a drift towards the line along which all queue lengths are equal. First, by the definition of a $\delta$-tilted dispatching strategy, there is a drift towards the line along which all the local estimates within a given dispatcher are equal. Then, by the condition on the update strategy, the drift on the local-estimation system can be transferred to a drift on the actual system, which yields delay optimality. Note that, in the current proof, in order to make this 'drift-transfer' process valid, we impose the condition that both $\delta$ and $p$ are independent of $\epsilon$; this is not strictly necessary, and both of them could possibly be particular functions of $\epsilon$ as in [26]. This relaxation could be an interesting future research direction.
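To illustrate the drift intuition, the toy simulation below (entirely our own construction, with illustrative parameters) runs two non-communicating dispatchers over four heterogeneous servers; each dispatches to its shortest local estimate, and each estimate is synchronized with probability $p = 0.2$, as in the conditions of Theorems 1 and 2. The time-averaged total queue length remains bounded even though every dispatching decision uses delayed information:

```python
import random

def binom(k, pr):
    """Sample a Binomial(k, pr) variate (a simple light-tailed process)."""
    return sum(random.random() < pr for _ in range(k))

random.seed(0)
N, M, T, p = 4, 2, 20000, 0.2
mus = [0.5, 0.5, 1.0, 2.0]              # heterogeneous mean service rates
lam = 0.9 * sum(mus) / M                # per-dispatcher arrival rate, eps = 0.4

q = [0] * N                             # actual queue lengths
est = [[0] * N for _ in range(M)]       # one local-estimate vector per dispatcher
total = 0
for t in range(T):
    arrivals = [0] * N
    for m in range(M):                  # dispatch on local estimates only
        a = binom(8, lam / 8)
        n = min(range(N), key=lambda i: est[m][i])
        est[m][n] += a                  # dispatcher tracks its own arrivals
        arrivals[n] += a
    for n in range(N):                  # Eq. (2) with unused service
        q[n] = max(q[n] + arrivals[n] - binom(4, mus[n] / 4), 0)
    for m in range(M):                  # infrequent synchronization, Eq. (5)/(6)
        for n in range(N):
            if random.random() < p:
                est[m][n] = q[n]
    total += sum(q)

print("time-averaged total queue length:", round(total / T, 2))
```

With the total load at 90% of capacity, the time average settles to a modest constant rather than growing with the horizon $T$, consistent with the drift argument above.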
3.2 Examples
To illustrate the applications of Theorems 1 and 2, in this section we introduce examples of LED policies that are both throughput optimal and heavy-traffic delay optimal. The flexibility provided by our sufficient conditions not only allows us to include previous policies as special cases, but also enables us to design new flexible policies.

Let us first introduce some typical $\delta$-tilted dispatching strategies.
Example 1 (Local–Join-Shortest-Queue (L-JSQ)). At the beginning of each time-slot $t$, the dispatcher forwards its arrivals to the server with the shortest local estimate, with ties broken arbitrarily. That is, consider dispatcher $m$; the chosen server is $i^* \in \arg\min_n \{\tilde{Q}_n^m\}$.
This dispatching strategy is the same as that in the LSQ policy in [1]. By the definition of dispatching preference, we can see that under L-JSQ, $\Delta_1^m(t) = 1 - \mu_{\sigma_t(1)}/\mu_\Sigma > 0$ and $\Delta_n^m(t) = -\mu_{\sigma_t(n)}/\mu_\Sigma < 0$ for $n \ge 2$. Hence, it is $\delta$-tilted even for heterogeneous servers, with $\delta = \mu_{\min}/\mu_\Sigma$ where $\mu_{\min} = \min_n \mu_n$.
Instead of always joining the server with the shortest local estimate, it is also possible to join a server whose queue length is below a threshold while still satisfying the condition of a $\delta$-tilted dispatching preference.
Example 2 (Local–Join-Below-Average (L-JBA)). At the beginning of each time-slot $t$, the dispatcher forwards its arrivals to a randomly chosen server whose local estimate is at or below the average local queue length estimate. That is, consider dispatcher $m$ with the average local estimate $\bar{Q}^m(t) = \frac{1}{N}\sum_n \tilde{Q}_n^m(t)$, and let $\mathcal{A} \triangleq \{n : \tilde{Q}_n^m(t) \le \bar{Q}^m(t)\}$. Then, for each $i \in \mathcal{A}$, $P_i^m(t) = \mu_i / \sum_{n \in \mathcal{A}} \mu_n$, and for $i \notin \mathcal{A}$, $P_i^m(t) = 0$.
It can be easily shown from the definition that L-JBA is also $\delta$-tilted. Note that, compared to L-JSQ, in the heterogeneous case it requires the dispatcher to know the service rate of each server, which can be easily obtained via the update strategies introduced next. This strategy is more flexible than L-JSQ since it does not require new arrivals to be sent only to the server with the shortest local estimate, which could be useful in scenarios with data locality. Moreover, some randomness in the dispatching strategy is also useful, as discussed in the next section.
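A minimal sketch of the L-JBA rule, again ours rather than the paper's. Python's `random.choices` with `weights` implements the rate-proportional selection $P^m_i = \mu_i / \sum_{n \in \mathcal{A}} \mu_n$:

```python
import random

def ljba_choose(local_estimates, service_rates):
    """L-JBA: pick a server whose local estimate is at or below the
    average estimate, with probability proportional to its service rate."""
    avg = sum(local_estimates) / len(local_estimates)
    below = [n for n, q in enumerate(local_estimates) if q <= avg]
    weights = [service_rates[n] for n in below]
    return random.choices(below, weights=weights)[0]

# Average estimate is 3.0, so only servers 1 and 3 are eligible here.
server = ljba_choose([4, 2, 6, 0], [1.0, 2.0, 1.0, 2.0])
```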
Further, it is possible to generalize many previous heavy-traffic delay optimal policies into the LED framework. For example, we can directly apply the Power-of-$d$ policy as our dispatching strategy.
Example 3 (Local–Power-of-$d$ (L-Pod)). At the beginning of each time-slot $t$, the dispatcher randomly chooses $d \ge 2$ servers and sends arrivals to the server that has the shortest local estimate among the $d$ servers.
It can be easily shown that L-Pod is tilted for homogeneous servers. Moreover, for a given $m$, we have $\Delta^m_1(t) = \frac{d-1}{N}$ and $\Delta^m_N(t) = -\frac{1}{N}$, and hence it is $\delta$-tilted with $\delta = \frac{1}{N}$.
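L-Pod differs from the classical Power-of-$d$ only in that the comparison is over local estimates, not freshly sampled queue lengths. A minimal sketch under that reading (function name ours):

```python
import random

def lpod_choose(local_estimates, d=2):
    """L-Pod: sample d >= 2 distinct servers uniformly at random and
    send arrivals to the sampled server with the smallest local estimate.
    No messages are exchanged at dispatch time."""
    sampled = random.sample(range(len(local_estimates)), d)
    return min(sampled, key=lambda n: local_estimates[n])

server = lpod_choose([5, 1, 3, 8], d=2)
```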
Now, let us turn to update strategies that satisfy the condition in Theorem 2. In particular, the update strategy can either be push-based (the dispatcher samples servers) or pull-based (servers report to dispatchers).
Definition 7 (Push-Update). If there are new arrivals, then at the end of the time-slot dispatcher $m$ samples $d$ distinct servers with a positive probability $\hat{p}$. Then, it updates the corresponding $d$ local estimates with the true values.
It has been shown in [1] that even for $d = 1$, the push-update strategy is guaranteed to satisfy the condition in Theorem 2.
Definition 8 (Pull-Update). At the end of each time-slot, for each server $n$, if there are completed tasks, then the server uniformly at random picks a dispatcher $m$ and then abides by one of the following two rules:

• If the server becomes idle (i.e., has no tasks), it sends $(n, 0)$ to dispatcher $m$.

• If not, it sends $(n, Q_n)$ to dispatcher $m$ with probability $\hat{p}$.
It has been shown in [1] that for any $\hat{p} > 0$, the pull-update strategy is guaranteed to satisfy the condition in Theorem 2.
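The server-side pull-update logic of Definition 8 can be sketched as follows. The function name and the tuple message format are our illustrative choices, not the paper's:

```python
import random

def pull_update_messages(server_id, queue_len, completed, num_dispatchers, p_hat):
    """One server's end-of-slot pull update (Definition 8).

    Returns a list of (dispatcher, (server_id, reported_len)) messages:
    - no message unless some tasks completed this slot;
    - a server that just became idle always reports (server_id, 0);
    - otherwise it reports its true queue length with probability p_hat.
    """
    if completed == 0:
        return []
    m = random.randrange(num_dispatchers)  # uniformly chosen dispatcher
    if queue_len == 0:
        return [(m, (server_id, 0))]
    if random.random() < p_hat:
        return [(m, (server_id, queue_len))]
    return []
```

Since at most one message is generated per slot per server with completed work, the worst-case overhead of 1 message per arrival follows directly.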
Now, having introduced both the dispatching strategy and the update strategy, we can combine them to obtain different LED policies that are delay optimal in heavy traffic. For example, we have L-JSQ-Push, L-JSQ-Pull, L-JBA-Push, and L-JBA-Pull for heterogeneous servers, as well as L-Pod-Push and L-Pod-Pull for homogeneous servers.
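To make the combination concrete, here is one possible shape of a full LED dispatcher pairing L-JSQ dispatching with a push update. The class name, and the choice to also count the dispatcher's own arrivals into its estimate, are our assumptions for illustration only:

```python
import random

class LedDispatcher:
    """Minimal sketch of one LED dispatcher: dispatching uses only
    local estimates; a push update refreshes d sampled entries with
    true queue lengths at the end of the slot."""

    def __init__(self, num_servers, d=1, p_hat=1.0):
        self.est = [0] * num_servers  # local estimates Q~^m
        self.d = d
        self.p_hat = p_hat

    def dispatch(self, num_jobs):
        """L-JSQ on local estimates; zero dispatching delay."""
        n = min(range(len(self.est)), key=lambda i: self.est[i])
        self.est[n] += num_jobs  # bookkeeping of our own arrivals (our choice)
        return n

    def push_update(self, true_queue_lengths):
        """End-of-slot push update (Definition 7), w.p. p_hat."""
        if random.random() < self.p_hat:
            for n in random.sample(range(len(self.est)), self.d):
                self.est[n] = true_queue_lengths[n]
```

Swapping `dispatch` for an L-JBA or L-Pod rule, or `push_update` for a pull-based handler, yields the other members of the LED family.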
We end this section by summarizing the contributions of the LED framework. (i) It covers previous policies. L-JSQ-Push (with $\hat{p} = 1$) and L-JSQ-Pull are the same as the LSQ policies considered in [1], which include the policies developed in both [2] and [20] as special cases. Thus, by Theorems 1 and 2, all these policies are throughput and heavy-traffic delay optimal. (ii) It allows randomness in dispatching. The randomness introduced in L-JBA and L-Pod is helpful when dealing with scenarios with an extremely low budget on message overhead, as discussed next. (iii) It enables trade-offs between memory and message overhead. For example, L-Pod-Push and L-Pod-Pull are good examples that trade memory for low message overhead. That is, if each dispatcher directly uses the traditional Power-of-$d$ without any memory, then at least 4 messages per arrival are needed to guarantee delay optimality in heavy traffic. In contrast, in both L-Pod-Push and L-Pod-Pull, the worst-case message overhead is just 1 per arrival. In addition, the message overhead can be further reduced by choosing a smaller value of $\hat{p}$ in the update strategy.
4 Discussion

Before moving to the proofs, we would like to discuss key features of and insights about LED, and point out possible refinements of LED.
4.1 Key features of LED

In this section, we highlight the key features of the LED framework, including low message overhead, zero dispatching delay, low computational complexity, and appealing performance across various loads.
Low message overhead. It should be noted that communication overhead occurs only during the update phase in LED policies. For the push-update strategy, the number of messages per arrival is at most $2d$ ($d$ can even be one). For the pull-update strategy, the number of messages per arrival is at most 1. In contrast, JSQ needs $2N$ messages per arrival and Power-of-$d$ needs at least 4 messages per arrival. Although JIQ has a worst-case message overhead comparable to that of LED policies, it is not stable for heterogeneous servers.
Zero dispatching delay. Another key feature of all LED policies is that there is zero dispatching delay. That is, the dispatcher can immediately route its new arrivals to the chosen server since the decision is made purely based on its local estimates. Moreover, the communication between dispatchers and servers happens only after the decision is made. This is in contrast to typical push-based policies like JSQ and Power-of-$d$, under which the dispatcher has to wait for the responses of sampled servers to make its dispatching decision, resulting in a non-zero dispatching delay.
Low computational complexity. In order to implement LED policies, each dispatcher has to keep an array of size $N$ for its local estimates. Such a space requirement is negligible in a modern cluster. Further, the operations required by the dispatching strategies of LED policies are very efficient. For example, in order to find the server with the minimal local estimate in L-JSQ, we can keep the array in a min-heap data structure. For L-JBA, we can maintain the average by using an efficient running-average algorithm. The simple L-Pod only needs random number generators.
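The running-average maintenance mentioned for L-JBA amounts to an $O(1)$ correction of the mean whenever a single local estimate changes, instead of re-summing all $N$ entries. A sketch (function name ours):

```python
def update_running_average(avg, old_value, new_value, num_servers):
    """O(1) maintenance of the average local estimate used by L-JBA:
    when one entry changes from old_value to new_value, shift the mean
    by the per-server difference instead of re-summing all N entries."""
    return avg + (new_value - old_value) / num_servers

avg = sum([4, 2, 6, 0]) / 4  # 3.0
avg = update_running_average(avg, old_value=6, new_value=2, num_servers=4)
# avg now equals sum([4, 2, 2, 0]) / 4 = 2.0
```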
Appealing performance across loads. Although the theoretical delay optimality of the LED framework holds in the heavy-traffic asymptotic regime, the family of LED policies includes efficient policies that significantly outperform alternative low-message-overhead policies with the same (or even smaller) amount of communication. For example, if the dispatching strategy adopts L-JSQ in LED, then it reduces to the LSQ policy proposed in [1], which has been shown via extensive simulations to enjoy good performance over a wide range of traffic loads in different scenarios.
As mentioned earlier, the class of heavy-traffic delay optimal LED policies is broad and includes flexible choices of different dispatching and update strategies for different application scenarios. The actual delay performance (outside the heavy-load scenario) varies with the particular choice of dispatching or update strategy under different scenarios. Thus, it is not possible to pick one particular LED policy that fits every circumstance, and that is also not the focus of this paper. Instead, we present some useful insights about the LED framework in the following. These insights could serve as guidance for the choice or design of new LED policies.
4.2 Useful insights from LED

The main trait of the LED framework is that only local, possibly delayed and inaccurate, information is used for making the dispatching decision. In the following, we present two useful insights about the use of inaccurate delayed information for load balancing.
Inaccurate information can improve performance. A big problem for load balancing with multiple dispatchers is herd behavior, which means that arrivals at different dispatchers join the same server. This often leads to poor delay performance in practice [18]. For example, JSQ used in the case of multiple dispatchers leads to serious herd behavior since all the dispatchers will route arrivals to the single shortest queue. In contrast, under the LED framework, each dispatcher may believe that a different queue is the shortest according to its own local estimates because these estimates are inaccurate and delayed. Thus, jobs at different dispatchers are sent to different queues that may not have the actual shortest length but still have relatively small queue lengths. This intuition is illustrated by Fig. 1. In particular, we consider a setup with 10 dispatchers and 100 heterogeneous servers. All the LED policies are configured to have the same average message overhead as Power-of-2. It can be seen that the LED policies are not only stable but also achieve much better performance compared to JSQ, which suffers from herd behavior in the multiple-dispatcher case.
Randomness is useful for heavily-delayed information. As mentioned earlier, the LED framework provides us with the possibility of exploring load balancing with extremely low message overhead by choosing a small value of $\hat{p}$ in the update strategy. As a result, the local information at each dispatcher will only be updated after a long time interval. In this case, if a deterministic dispatching strategy (e.g., L-JSQ) is adopted, it would again incur herd behavior (even in the single-dispatcher case) since all the arrivals during the long update interval will join the same queue. This is another motivation for considering L-JBA and L-Pod, which
Figure 1: Inaccurate information could improve performance in the multiple-dispatcher case. (The plot shows mean response time in time slots versus load $\rho$ from 0.3 to 1 for JSQ, L-JSQ-Pull, L-JSQ-Push, L-JBA-Pull, L-JBA-Push, JIQ, and Power-of-2.)

Figure 2: Randomness is useful for heavily-delayed information. (The plot shows mean response time in time slots versus load $\rho$ from 0.3 to 1 for L-JSQ-Push, L-JBA-Push, and L-Pod-Push.)
naturally introduce a certain level of randomness and hence help avoid herd behavior, as suggested by [11]. To illustrate this insight, we consider a setup with 10 dispatchers and 100 homogeneous servers. We compare the delay performance of L-JSQ-Push, L-Pod-Push, and L-JBA-Push with the update probability set to $\hat{p} = 0.01$ and $d = 2$. As shown in Fig. 2, both L-JBA-Push and L-Pod-Push outperform L-JSQ-Push, which suffers from herd behavior because of heavily-delayed information.
4.3 Refinements of LED

Our main results suggest that there is a large class of heavy-traffic delay optimal LED policies. On the one hand, this provides us with the flexibility to tailor our policy design for different application scenarios with different choices of dispatching and update strategies. On the other hand, it also suggests the need for refinements of LED beyond delay optimality in heavy traffic. To this end, we introduce two possible directions for refinement.
Degree of queue imbalance. As introduced in [28], the degree of queue imbalance is a refined metric to further distinguish heavy-traffic delay optimal policies. The idea is that, instead of looking at the average queue length (and hence average delay), the degree of queue imbalance measures the expected difference in queue lengths among the servers. By following the proof of Proposition 5.6 in [28], we can establish that the degree of queue imbalance of all heavy-traffic delay optimal LED policies is $O\left(\frac{1}{\delta^2 p^4}\right)$. Thus, even though by Theorem 2 any positive $\delta$ and $p$ are sufficient for delay optimality in heavy traffic, a dispatching strategy with a smaller $\delta$ or an update strategy with a smaller $p$ could affect the performance in practice.
Other asymptotic regimes. In this paper, we focus on the heavy-traffic asymptotic regime, where the number of servers is fixed and the load approaches one. As mentioned before, there are also other asymptotic regimes in the analysis of load balancing schemes. One possible direction is to extend the fluid-limit techniques for the large-system regime in [20] to the case of multiple dispatchers and heterogeneous servers. Another alternative regime is the many-server heavy-traffic regime (e.g., the Halfin-Whitt regime), which strikes a balance between the heavy-traffic regime and the large-system regime. Studying LED in such a regime is another interesting direction for future work.
5 Proofs

In this paper, we extend the Lyapunov drift-based approach developed in [4] to allow for unbounded supports of the arrival and service processes. In particular, we replace the finiteness condition on the drift in [4] by a stochastic dominance condition, as shown in (C2) of Lemma 2. As proved in [7], this weaker condition, combined with a negative drift condition, can still guarantee finite moment bounds. Besides a weaker condition, we also replace the one-step drift with a $T$-step drift. Formally, we use the following lemma to derive bounded moments in steady state.
Lemma 2. Consider an irreducible, aperiodic, and positive recurrent Markov chain $\{X(t), t \ge 0\}$ over a countable state space $\mathcal{X}$ which converges in distribution to $\overline{X}$, and suppose $V : \mathcal{X} \to \mathbb{R}_+$ is a Lyapunov function. We define the $T$-time-slot drift of $V$ at $X$ as
$$\Delta V(X) \triangleq \left[V(X(t_0 + T)) - V(X(t_0))\right] \mathcal{I}(X(t_0) = X),$$
where $\mathcal{I}(\cdot)$ is the indicator function. Suppose that for some positive finite integer $T$, the $T$-time-slot drift of $V$ satisfies the following conditions:

• (C1) There exist an $\eta > 0$ and a $\kappa < \infty$ such that for any $t_0 = 1, 2, \ldots$ and for all $X \in \mathcal{X}$ with $V(X) \ge \kappa$,
$$\mathbb{E}\left[\Delta V(X) \mid X(t_0) = X\right] \le -\eta.$$

• (C2) $|\Delta V(X)| \prec W$ for all $t_0$ and all $X \in \mathcal{X}$, and $\mathbb{E}\left[e^{\theta W}\right] = D$ is finite for some $\theta > 0$.

Then $\{V(X(t)), t \ge 0\}$ converges in distribution to a random variable $\overline{V}$ for which there exist a $\theta^* > 0$ and a $C^* < \infty$ such that
$$\mathbb{E}\left[e^{\theta^* \overline{V}}\right] \le C^*,$$
which directly implies that all the moments of $\overline{V}$ exist and are finite.
5.1 Proof of Theorem 1

To start with, let us first show that the Markov chain $\{Z(t) = (Q(t), m(t)), t \ge 0\}$ with $m(t) \triangleq (\tilde{Q}^1(t), \tilde{Q}^2(t), \ldots, \tilde{Q}^M(t))$ is irreducible and aperiodic. Let the initial state be $Z(0) = (Q(0), m(0)) = (\mathbf{0}_{1 \times N}, \mathbf{0}_{1 \times MN})$, and let the state space $\mathcal{Z}$ consist of all the states that can be reached from the initial state. Consider any state $Z$; the queue length vector $Q$ can reach the initial state with a positive probability, since the event that there are no exogenous arrivals and all the offered service is at least one during each time-slot happens with positive probability under our assumptions. Moreover, under the condition for the update strategy given by Eq. (5), the event that $Q$ remains at the initial state while all $\tilde{Q}^m$ reach the initial state happens with a positive probability. Therefore, any state in the state space can reach the initial state, and hence the Markov chain is irreducible. The aperiodicity of the Markov chain comes from the fact that the transition probability from the initial state to itself is positive.
In order to show positive recurrence, we adopt the Foster-Lyapunov theorem. In particular, we consider the following Lyapunov function
$$W(Z(t)) = \|Q(t)\|^2 + \sum_{m=1}^{M} \left\|Q(t) - \tilde{Q}^m(t)\right\|_1,$$
and in the rest of the proof we use $W(t)$ as an abbreviation of $W(Z(t))$. Let $X^m_n(t) \triangleq |Q_n(t) - \tilde{Q}^m_n(t)|$. The conditional mean drift of $W(t)$, defined as $D(Z(t_0)) \triangleq \mathbb{E}\left[W(t_0 + T) - W(t_0) \mid Z(t_0)\right]$, can be decomposed as follows:
$$D(Z(t_0)) = D_Q(t_0) + \sum_{m=1}^{M} \sum_{n=1}^{N} D_{X^m_n}(t_0), \tag{7}$$
where
$$D_Q(t_0) \triangleq \mathbb{E}\left[\|Q(t_0 + T)\|^2 - \|Q(t_0)\|^2 \mid Z(t_0)\right],$$
$$D_{X^m_n}(t_0) \triangleq \mathbb{E}\left[X^m_n(t_0 + T) - X^m_n(t_0) \mid Z(t_0)\right].$$
Let us first consider the term $D_{X^m_n}(t_0)$. Note that for all $t_0$, $m$ and $n$,
$$\mathbb{E}\left[X^m_n(t_0 + 1) \mid Z(t_0) = Z\right] \le \mathbb{E}\left[(1 - \mathcal{I}^m_n(t_0))\left(X^m_n(t_0) + A_n(t_0) + S_n(t_0)\right) \mid Z(t_0) = Z\right] \overset{(a)}{\le} (1 - p)X^m_n(t_0) + \lambda_\Sigma + \mu_{\max}, \tag{8}$$
where (a) follows from the condition in Eq. (5) and $\mu_{\max} = \max_n \mu_n$. Then, we have (the time reference $t_0$ is dropped for simplicity)
$$\begin{aligned}
D_{X^m_n}(t_0) &= \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} X^m_n(t+1) - X^m_n(t) \mid Z(t_0) = Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[X^m_n(t+1) - X^m_n(t) \mid Z(t)\right] \mid Z\right] \\
&\overset{(a)}{\le} \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[-pX^m_n(t) + \lambda_\Sigma + \mu_{\max} \mid Z\right] \\
&\le -pX^m_n(t_0) + \lambda_\Sigma + \mu_{\max},
\end{aligned} \tag{9}$$
where (a) follows from Eq. (8). Let us turn to the term $D_Q(t_0)$. By the queue dynamics in Eq. (2),
$$\begin{aligned}
D_Q(t_0) &= \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \|Q(t+1)\|^2 - \|Q(t)\|^2 \mid Z(t_0) = Z\right] \\
&= \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \|Q(t) + A(t) - S(t) + U(t)\|^2 - \|Q(t)\|^2 \mid Z\right] \\
&\overset{(a)}{\le} \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \|Q(t) + A(t) - S(t)\|^2 - \|Q(t)\|^2 \mid Z\right] \\
&= \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} 2\left\langle Q(t), A(t) - S(t)\right\rangle + \|A(t) - S(t)\|^2 \mid Z\right] \\
&\overset{(b)}{\le} \mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} 2\left\langle Q(t), A(t) - S(t)\right\rangle + K \mid Z\right],
\end{aligned} \tag{10}$$
where (a) follows from the facts that $Q_n(t) + A_n(t) - S_n(t) + U_n(t) = \max(Q_n(t) + A_n(t) - S_n(t), 0)$ for any $t \ge 0$, and $(\max(a, 0))^2 \le a^2$ for any $a \in \mathbb{R}$; (b) holds by our assumption of light-tailed distributions for the total arrival process and each service process in Eq. (3). In particular, we have that the second moments of the total arrival process and the service process of each server are finite (independent of $\epsilon$), and hence there exists a finite upper bound $K$ which is independent of the load parameter $\epsilon$.
Now, let us continue to work on Eq. (10). In particular, we have
$$\begin{aligned}
\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \left\langle Q(t), A(t) - S(t)\right\rangle \mid Z(t_0) = Z\right] &= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[\left\langle Q(t), A(t) - S(t)\right\rangle \mid Z(t)\right] \mid Z(t_0) = Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[\left\langle Q(t), A(t)\right\rangle \mid Z(t)\right] \mid Z(t_0) = Z\right] \tag{11} \\
&\quad - \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\mu_n \mid Z(t_0) = Z\right]. \tag{12}
\end{aligned}$$
For Eq. (11), we have
$$\begin{aligned}
\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[\left\langle Q(t), A(t)\right\rangle \mid Z(t)\right] \mid Z(t_0) = Z\right] &= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \mathbb{E}\left[A^m_n(t) \mid Z(t)\right] \mid Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} P^m_n(t)\lambda_m \mid Z(t_0) = Z\right] \\
&\overset{(a)}{=} \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \left(\beta^m_n(t) + \frac{\mu_n}{\mu_\Sigma}\right)\lambda_m \mid Z(t_0) = Z\right],
\end{aligned}$$
where (a) follows from the definition of $\beta^m_n(t)$. Then, it can be further simplified as follows.
$$\begin{aligned}
\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\mathbb{E}\left[\left\langle Q(t), A(t)\right\rangle \mid Z(t)\right] \mid Z(t_0) = Z\right] &\overset{(a)}{=} \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \beta^m_n(t)\lambda_m \mid Z\right] + \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \frac{\mu_n}{\mu_\Sigma}(\mu_\Sigma - \epsilon)p_m \mid Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \beta^m_n(t)\lambda_m \mid Z\right] + \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\mu_n \mid Z\right] - \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\frac{\epsilon\mu_n}{\mu_\Sigma} \mid Z\right],
\end{aligned} \tag{13}$$
where in (a), $p_m$ is the probability that arrivals are allocated to dispatcher $m$ (equivalently, it can be viewed as the fraction of the total arrivals that are allocated to dispatcher $m$).
Combining Eqs. (11), (12) and (13) yields
$$\begin{aligned}
\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \left\langle Q(t), A(t) - S(t)\right\rangle \mid Z(t_0) = Z\right] &= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t) \sum_{m=1}^{M} \beta^m_n(t)\lambda_m \mid Z(t_0) = Z\right] - \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\frac{\epsilon\mu_n}{\mu_\Sigma} \mid Z(t_0) = Z\right] \\
&= \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M}\left(Q_n(t) - \tilde{Q}^m_n(t) + \tilde{Q}^m_n(t)\right)\beta^m_n(t)\lambda_m \mid Z\right] - \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\frac{\epsilon\mu_n}{\mu_\Sigma} \mid Z\right] \\
&= \underbrace{\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M}\left(Q_n(t) - \tilde{Q}^m_n(t)\right)\beta^m_n(t)\lambda_m \mid Z\right]}_{T_1} + \underbrace{\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M} \tilde{Q}^m_n(t)\beta^m_n(t)\lambda_m \mid Z\right]}_{T_2} - \underbrace{\sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N} Q_n(t)\frac{\epsilon\mu_n}{\mu_\Sigma} \mid Z\right]}_{T_3}.
\end{aligned}$$
We now handle each term one by one. To upper bound $T_1$, we use the following result on $X^m_n(t) = |Q_n(t) - \tilde{Q}^m_n(t)|$.
Lemma 3. Under the condition given by Eq. (5), for any $t_0$ and $Z(t_0)$, there exist a finite $T_1$ independent of $\epsilon$ and a finite constant $L$ that is only a function of $p$ and $\mu_\Sigma$, such that for all $T \ge T_1$,
$$\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} X^m_n(t) \mid Z(t_0) = Z\right] \le LT$$
holds for all $m$ and $n$.
Proof. See Appendix A.
By using Lemma 3 with $T \ge T_1$, we have
$$T_1 \le \lambda_\Sigma \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M}\left|Q_n(t) - \tilde{Q}^m_n(t)\right| \mid Z\right] \le \lambda_\Sigma MNLT. \tag{14}$$
For $T_2$, we have
$$T_2 \overset{(a)}{=} \sum_{t=t_0}^{t_0+T-1} \mathbb{E}\left[\sum_{n=1}^{N}\sum_{m=1}^{M} \tilde{Q}^m_{\sigma_t(n)}(t)\,\Delta^m_n(t)\,\lambda_m \mid Z\right] \overset{(b)}{\le} 0, \tag{15}$$
where (a) comes from the definition of the dispatching preference vector $\Delta^m(t)$; (b) holds since the dispatching preference is tilted and $\tilde{Q}^m_{\sigma_t(1)}(t) \le \tilde{Q}^m_{\sigma_t(2)}(t) \le \ldots \le \tilde{Q}^m_{\sigma_t(N)}(t)$.
For $T_3$, we have
$$T_3 \ge \frac{\epsilon\mu_{\min}}{\mu_\Sigma}\|Q(t_0)\|_1, \tag{16}$$
where $\mu_{\min} = \min_n \mu_n$.
Now, combining Eqs. (14), (15) and (16) yields
$$\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1} \left\langle Q(t), A(t) - S(t)\right\rangle \mid Z(t_0) = Z\right] \le -\frac{\epsilon\mu_{\min}}{\mu_\Sigma}\|Q(t_0)\|_1 + \lambda_\Sigma MNLT.$$
Substituting the result above back into Eq. (10) yields
$$D_Q(t_0) \le -\frac{2\epsilon\mu_{\min}}{\mu_\Sigma}\|Q(t_0)\|_1 + 2\lambda_\Sigma MNLT + KT. \tag{17}$$
Now, we are ready to substitute Eq. (9) and Eq. (17) back into Eq. (7). As a result, we have
$$\begin{aligned}
D(Z(t_0)) &\le -\frac{2\epsilon\mu_{\min}}{\mu_\Sigma}\|Q(t_0)\|_1 - p\sum_{m=1}^{M}\sum_{n=1}^{N} X^m_n(t_0) + 2\lambda_\Sigma MNLT + KT + \lambda_\Sigma + \mu_{\max} \\
&\overset{(a)}{\le} -\xi\left(\|Q(t_0)\|_1 + \sum_{m=1}^{M}\sum_{n=1}^{N}\left|Q_n(t_0) - \tilde{Q}^m_n(t_0)\right|\right) + K_1,
\end{aligned}$$
where in (a), $\xi = \min\left(\frac{2\epsilon\mu_{\min}}{\mu_\Sigma}, p\right)$ and $K_1 \triangleq 2\lambda_\Sigma MNLT + KT + \lambda_\Sigma + \mu_{\max}$. Pick any $\alpha > 0$ and let
$$\mathcal{B} \triangleq \left\{Z \in \mathcal{Z} : \|Q(t_0)\|_1 + \sum_{m=1}^{M}\sum_{n=1}^{N}\left|Q_n(t_0) - \tilde{Q}^m_n(t_0)\right| \le \frac{K_1 + \alpha}{\xi}\right\}.$$
Then, $\mathcal{B}$ is a finite subset of $\mathcal{Z}$. For any $Z \in \mathcal{B}^c$, $D(Z) \le -\alpha$, and for any $Z \in \mathcal{B}$, $D(Z) \le K_1$. By the Foster-Lyapunov theorem, we have established positive recurrence.
Having shown that the Markov chain $\{Z(t), t \ge 0\}$ is ergodic, we are left with the task of showing that all the moments are finite in steady state. In order to do so, we use Lemma 2. In particular, we choose the Lyapunov function as $V(Z^{(\epsilon)}) = \|Q^{(\epsilon)}\|$ and then verify the two conditions. In the following, the superscript $(\epsilon)$ will be omitted for ease of notation. To verify condition (C2), we have
$$\begin{aligned}
|\Delta V(Z)| &= \left|\|Q(t_0 + T)\| - \|Q(t_0)\|\right|\mathcal{I}(Z(t_0) = Z) \\
&\overset{(a)}{\le} \|Q(t_0 + T) - Q(t_0)\|\,\mathcal{I}(Z(t_0) = Z) \\
&\le \sum_{t=t_0}^{t_0+T-1} \|Q(t+1) - Q(t)\|\,\mathcal{I}(Z(t_0) = Z) \\
&\le \sum_{t=t_0}^{t_0+T-1} \|A(t) - S(t) + U(t)\|\,\mathcal{I}(Z(t_0) = Z) \\
&\overset{(b)}{\le} \sum_{t=t_0}^{t_0+T-1} \left(\|A(t)\| + 2\|S(t)\|\right)\mathcal{I}(Z(t_0) = Z),
\end{aligned} \tag{18}$$
where (a) holds since $\left|\|x\| - \|y\|\right| \le \|x - y\|$ for each $x$, $y$ in $\mathbb{R}^N$; (b) follows from the triangle inequality and the fact that $U_n(t) \le S_n(t)$ for all $n$ and $t$. Then, by our assumptions of light-tailed distributions for both the total arrival and service processes, there exists a random variable $W$ such that $|\Delta V(X)| \prec W$ for all $t_0$ and all $X \in \mathcal{X}$, and $\mathbb{E}\left[e^{\theta W}\right] = D$ is finite for some $\theta > 0$, which verifies (C2).
For (C1), we have
$$\begin{aligned}
\mathbb{E}\left[\Delta V(Z) \mid Z(t_0) = Z\right] &= \mathbb{E}\left[\|Q(t_0 + T)\| - \|Q(t_0)\| \mid Z(t_0) = Z\right] \\
&= \mathbb{E}\left[\sqrt{\|Q(t_0 + T)\|^2} - \sqrt{\|Q(t_0)\|^2} \mid Z(t_0) = Z\right] \\
&\overset{(a)}{\le} \frac{1}{2\|Q(t_0)\|}\mathbb{E}\left[\|Q(t_0 + T)\|^2 - \|Q(t_0)\|^2 \mid Z(t_0) = Z\right] \\
&\overset{(b)}{\le} -\epsilon\frac{\mu_{\min}}{\mu_\Sigma} + \frac{2\lambda_\Sigma MNLT + KT}{2\|Q(t_0)\|},
\end{aligned}$$
where (a) follows from the fact that $f(x) = \sqrt{x}$ is concave; (b) comes from Eq. (17). Thus, condition (C1) is valid, and hence the proof of Theorem 1 is complete.
5.2 Proof of Theorem 2

In order to prove the result, we need two intermediate results. One is called state-space collapse, as stated in Proposition 1, which is the key ingredient for establishing heavy-traffic delay optimality. Roughly speaking, it means that the multi-dimensional space for the queue length vector reduces to one dimension, in the sense that the deviation from the line (on which all the queue lengths are equal) is bounded by a constant independent of $\epsilon$. The other intermediate result is concerned with unused service. Based on these two intermediate results, we can prove heavy-traffic delay optimality. We omit the time reference $t_0$ for simplicity when necessary.
Proposition 1. Under the conditions in Theorem 2, $Q_\perp$ is bounded in the sense that in steady state there exist finite constants $\{L_r, r \in \mathbb{N}\}$ independent of $\epsilon$ such that
$$\mathbb{E}\left[\left\|Q^{(\epsilon)}_\perp\right\|^r\right] \le L_r$$
for all $\epsilon \in (0, \epsilon_0)$ and $r \in \mathbb{N}$, where $Q_\perp = Q - \langle Q, c\rangle c$ is the perpendicular component of $Q$ with respect to the line $c = \frac{1}{\sqrt{N}}(1, 1, \ldots, 1)$.
Proof. It suffices to show that $V_\perp(Z^{(\epsilon)}) \triangleq \left\|Q^{(\epsilon)}_\perp\right\|$ satisfies conditions (C1) and (C2) in Lemma 2. Let us first consider condition (C2). In particular, we have
first consider conditions (C2). In particular, we have
|
∆
V
⊥
(
Z
)
|
=
|k
Q
⊥
(
t
0
+
T
)
k−k
Q
⊥
(
t
0
)
k|I
(
Z
(
t
0
) =
Z
)
(
a
)
≤ k
Q
⊥
(
t
0
+
T
)
−
Q
⊥
(
t
0
)
kI
(
Z
(
t
0
) =
Z
)
=
∥
∥
Q
(
t
0
+
T
)
−
Q
k
(
t
0
+
T
)
−
Q
(
t
0
) +
Q
k
(
t
0
)
∥
∥
I
(
Z
(
t
0
) =
Z
)
(
b
)
≤k
Q
(
t
0
+
T
)
−
Q
(
t
0
)
k
+
∥
∥
Q
k
(
t
0
+
T
)
−
Q
k
(
t
0
)
∥
∥
I
(
Z
(
t
0
) =
Z
)
(
c
)
≤
2
k
Q
(
t
0
+
T
)
−
Q
(
t
0
)
kI
(
Z
(
t
0
) =
Z
)
(
d
)
≤
2
t
0
+
T
−
1
∑
t
=
t
0
(
k
A
(
t
)
k
+ 2
k
S
(
t
)
k
)
I
(
Z
(
t
0
) =
Z
)
(19)
where inequality (a) follows from the fact that $\left|\|x\| - \|y\|\right| \le \|x - y\|$ holds for any $x, y \in \mathbb{R}^N$; inequality (b) follows from the triangle inequality; (c) holds due to the non-expansive property of the projection onto a convex set; (d) follows from Eq. (18). Then, by our assumptions of light-tailed distributions for both the total arrival and service processes, there exists a random variable $W$ such that $|\Delta V_\perp(X)| \prec W$ for all $t_0$ and all $X \in \mathcal{X}$, and $\mathbb{E}\left[e^{\theta W}\right] = D$ is finite for some $\theta > 0$, which verifies (C2).
Let us turn to condition (C1). By the proof of Lemma 3.6 in [29], it suffices to establish the following result in order to verify (C1): there exist $T > 0$, $K_2 \ge 0$ and $\eta > 0$, all independent of $\epsilon$, such that for all $t_0$ and $Z \in \mathcal{Z}$,
$$\mathbb{E}\left[\sum_{t=t_0}^{t_0+T-1}\left\langle Q_\perp(t), A(t) - S(t)\right\rangle \mid Z(t_0) = Z\right] \le -\eta\|Q_\perp\| + K_2. \tag{20}$$