Optimal dynamic incentive scheduling for Hawk-Dove evolutionary games

K. Stuckey∗
Department of Aerospace & Mechanical Engineering,
University of Southern California, Los Angeles CA 90089-1191

R. Dua†
Department of Mathematics, University of Southern California, Los Angeles CA 90089-1191

Y. Ma‡
Department of Physics & Astronomy,
University of Southern California, Los Angeles CA 90089-1191

J. Parker§
Division of Biology and Biological Engineering,
California Institute of Technology, Pasadena CA 91125

P.K. Newton¶
Department of Aerospace & Mechanical Engineering,
Mathematics, and The Ellison Institute,
University of Southern California, Los Angeles CA 90089-1191

(Dated: August 15, 2021)

∗ kstuckey@usc.edu
† rajvirdu@usc.edu
‡ yongqiam@usc.edu
§ joep@caltech.edu
¶ newton@usc.edu (Corresponding author)
The Hawk-Dove mathematical game offers a paradigm of the trade-offs associated with aggressive and passive behaviors. When two (or more) populations of players (animals, insect populations, countries in military conflict, economic competitors, microbial communities, populations of co-evolving tumor cells, or reinforcement learners adopting different strategies) compete, their success or failure can be measured by their frequency in the population (successful behavior is reinforced, unsuccessful behavior is not), and the system is governed by the replicator dynamical system. We develop a time-dependent optimal-adaptive control theory for this nonlinear dynamical system in which the payoffs of the Hawk-Dove payoff matrix are dynamically altered (dynamic incentives) to produce (bang-bang) control schedules that (i) maximize the aggressive population at the end of time $T$, and (ii) minimize the aggressive population at the end of time $T$. These two distinct time-dependent strategies produce upper and lower bounds on the outcomes from all strategies since they represent two extremizers of the cost function using the Pontryagin maximum (minimum) principle. We extend the results forward to times $nT$ ($n = 1, \dots, 5$) in an adaptive way that uses the optimal value at the end of time $nT$ to produce the new schedule for time $(n+1)T$. Two special schedules and initial conditions are identified that produce absolute maximizers and minimizers over an arbitrary number of cycles for $0 < T \le 3$. For $T > 3$, our optimum schedules can drive either population to extinction or fixation. The method described can be used to produce optimal dynamic incentive schedules for many different applications in which the $2 \times 2$ replicator dynamics is used as a governing model.
I. INTRODUCTION

The Hawk-Dove game (aka Chicken or Snowdrift game) is a game-theoretic paradigm for studying the conflict between players (or populations of players) who use two opposing strategies: aggressive (Hawks) and passive (Doves). One way of framing the conflict is to consider competition in the animal world where two different species compete for a limited resource [1–4]. With no Hawks in the population, Doves will share the resources and avoid conflict. With no Doves, the Hawks will fight with each other for resources, taking the risk of injury or death. If Hawks are present in large enough numbers, the Doves will flee without fighting. A sufficient fraction of Doves, on the other hand, can cooperate and expel the Hawks from the population, thereby protecting the resource [5]. The challenge is to find conditions for stable co-existence of the two opposing populations. In the
context of military conflicts, the game is framed as the game of chicken, thought of as a situation in which two drivers head towards each other in a single lane, each trying not to be the first to swerve away (Doves), and each mindful of the fact that if neither swerves (Hawks), both will die. Key to this game is that the cost of losing is greater than the value of winning. Versions of this (static) game have been analyzed and used extensively in political science communities to study strategies associated with the problem of nuclear brinkmanship [6]. In this set-up, the payoffs are fixed, and the interactions unfold based on the cost-benefit balance determined by these payoffs.
In a more complicated setting, one might want to measure repeated interactions in populations of competitors, $\vec{x} = (x_1, x_2)^T \in \mathbb{R}^2$, where winning and losing is reinforced by the relative frequencies of the two competing populations (frequency-dependent selection, as in Darwinian evolution). For this, the replicator dynamical system is commonly used [7–9]:

$$\dot{x}_i = x_i \left( (A\vec{x})_i - \vec{x}^T (A\vec{x}) \right) \quad (i = 1, 2) \qquad (1)$$

with $x_1 + x_2 = 1$, $0 \le x_1 \le 1$, $0 \le x_2 \le 1$, where each variable has the interpretation of frequency in the population or the alternative interpretation as a probability of picking a member of one of the two subgroups randomly.
It is useful to also think of the variables $\vec{x} = (x_1, x_2)^T$ as strategies (heritable traits) that evolve, with the most successful strategy dominating, as in the context of Darwinian evolution by natural selection [4]. Here, $A$ is the $2 \times 2$ payoff matrix, $(A\vec{x})_i$ is the fitness of population $i$, and $\vec{x}^T (A\vec{x})$ is the average fitness of both populations, so (1) drives growth of $x_i$ if the fitness of population $i$ is above the average and decay if it is below the average. The fitness functions in (1) are said to be population dependent (selection pressure is imposed by the mix of population frequencies) and determine growth or decay of each subpopulation. Because of this, these equations are also used extensively in the reinforcement learning community, where success begets success and failure leads to a downward spiral of frequency in the population [10].
Using the standard Hawk-Dove payoff matrix [5]:

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 5 & 0 \end{bmatrix}, \qquad (2)$$
where the population $x_1$ are the Hawks (aggressive) and $x_2$ are the Doves (passive), the strict Nash equilibrium $\vec{x}^* = (x_1^*, x_2^*)$ is the mixed state $x_1^* = \frac{1}{3}$, $x_2^* = \frac{2}{3}$, since $\vec{x}^{*T} A \vec{x}^* > \vec{x}^T A \vec{x}^*$ for all $\vec{x} \ne \vec{x}^*$. This implies that the mixed state is also an evolutionarily stable state (ESS) of the replicator system (1), as discussed in [11]. It is also useful to uncouple the two variables in (1) and write a single equation for the aggressor population frequency $x_1$:

$$\dot{x}_1 = x_1 (1 - x_1) \left[ (A\vec{x})_1 - (A\vec{x})_2 \right] \qquad (3)$$
$$\;\;\;\; = x_1 (1 - x_1) \left[ (a_{12} - a_{22}) + \left( (a_{11} - a_{21}) - (a_{12} - a_{22}) \right) x_1 \right]$$

Note also that a single equation for the passive population $x_2$ is easily obtained using the change of variable $x_1 = 1 - x_2$ in eqn (3).
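To make eqn (3) concrete, the following minimal sketch (ours, not from the paper; it assumes SciPy is available) integrates the scalar replicator equation with the baseline payoffs (2) and reproduces the qualitative behavior of figure 2(b): every interior initial condition is driven to the mixed ESS $x_1 = 1/3$.

```python
from scipy.integrate import solve_ivp

# Baseline Hawk-Dove payoffs, eq. (2)
a11, a12, a21, a22 = 3.0, 1.0, 5.0, 0.0

def replicator(t, y):
    """Scalar replicator equation (3) for the aggressor (Hawk) frequency x1."""
    x1 = y[0]
    return [x1 * (1.0 - x1) * ((a12 - a22)
            + ((a11 - a21) - (a12 - a22)) * x1)]

# All interior trajectories should approach the mixed ESS x1* = 1/3.
for x0 in (0.01, 0.1, 0.5, 0.9, 0.99):
    sol = solve_ivp(replicator, (0.0, 20.0), [x0], rtol=1e-9, atol=1e-12)
    print(f"x1(0) = {x0:.2f}  ->  x1(20) = {sol.y[0, -1]:.4f}")  # ~ 0.3333
```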
The question we address in this paper is whether it is possible to alter the entries in the payoff matrix $A$ in a time-dependent fashion (dynamic incentives) in order to optimally achieve some pre-determined goal (such as minimizing aggression) at the end of a fixed time $T$. Dynamically altering the entries of a payoff matrix in an evolutionary game setting has only recently been studied, by coupling the entries, for example, to a system that represents an external environment [12, 13]. In the context of nuclear brinkmanship, is it possible to alter the payoff incentives dynamically in order to achieve a goal [6] that would not be achievable with fixed payoffs? Is it possible to offer dynamic economic incentives that optimize some desired outcome across a population of participants [14, 15]? Can one optimally design time-dependent incentive schedules of rewards/punishments to compel groups of people to get vaccinated [16]? For co-evolving microbial populations, is it possible to dynamically schedule selective antibiotic agents in order to steer the evolutionary trajectory in an advantageous direction [17, 18], or even reverse antibiotic resistance? In the context of scheduling chemotherapy treatments, is it possible to design schedules optimally that make best use of the chemotherapy agents administered in order to delay chemotherapeutic resistance [8, 9, 19–21]? Control theory is increasingly being used in a wide range of biological applications [21–27] but, to date, has not been systematically implemented in the context of evolutionary games as far as we know, aside from [8, 9, 21, 28].
One evolutionary context where an apparent Hawk-Dove scenario may require attainment of a quasi-stable equilibrium condition is during the evolution of symbiotic relationships in which one partner is aggressive or predatory. For example, hostile colonies of eusocial insects, such as ants and termites, are plagued by a diversity of solitary arthropods that have evolved to infiltrate the social system and parasitize the nest [30, 31]. The majority of such parasitic species evolved from free-living ancestors without any behavioral specialization [32, 33]. It follows that the initial steps in establishing the symbiosis were contingent on these free-living species (the Doves) entering into equilibrium with their aggressive eusocial hosts (the Hawks). This equilibrium, once attained, may have provided an essential, permissive stepping stone to evolving the adaptive traits—such as social behaviors and pheromonal mimicry—that facilitate social parasitism [32].
To address these and related types of settings, we develop a mathematical framework to determine time-dependent incentive schedules for altering the payoff entries of a Hawk-Dove evolutionary game in such a way as to (i) maximize aggression at the end of time $T$, and (ii) minimize aggression at the end of time $T$. By considering the bang-bang schedules that produce these upper and lower bounds on the competing frequencies, we can conclude that any alternative payoff schedule will produce a result that lies somewhere between the two bounds, as each is an extremizer of a cost function associated with the Pontryagin maximum (minimum) principle. We then extend the time-period to time $nT$ ($n = 1, \dots, 5$) by using an adaptive control method that adjusts the schedule in the $(n+1)$st window based on the ending frequency values from the $n$th window. The schedules produced drive aggression down to an absolute minimum ($x_1^{\min}$), or drive it up to an absolute maximum ($x_1^{\max}$), which are functions of the cycle-time $T$. These values provide absolute lower and upper bounds on opposing behavior strategies in an evolutionary setting.

FIG. 1. Twelve regions in the $(a_{12}, a_{21})$ plane [29] define which game is being played. We choose $a_{22}$ at the origin (without loss of generality). Starting at $t = 0$ in the Hawk-Dove square, what are the paths to travel that minimize and maximize aggression at time $t = T$?
II. OPTIMAL CONTROL THEORY FOR THE REPLICATOR DYNAMICAL SYSTEM

To implement an optimal dynamic incentive strategy, we consider the system:

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A_0 + A_1(t) \qquad (4)$$
$$\;\;\;\; = \begin{bmatrix} 3 & 1 \\ 5 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 4u_2(t) \\ 6u_1(t) & 0 \end{bmatrix} \qquad (5)$$
$$\;\;\;\; = \begin{bmatrix} 3 & 1 + 4u_2(t) \\ 5 + 6u_1(t) & 0 \end{bmatrix}, \qquad (6)$$
FIG. 2. Dynamics of the uncontrolled ($u_1 = 0$, $u_2 = 0$) Hawk-Dove evolutionary game. (a) Phase portrait associated with the aggressor population $x_1$. Both Hawk and Dove dominance ($x_1 = 1, 0$) are unstable fixed points, while the mixed state $x_1 = 1/3$ is the evolutionarily stable strategy (ESS); (b) Hawk dynamics for various initial conditions. $T = 1$ is the end of one control cycle and also the linear growth rate of the Hawk-Dove system.
where $A_1(t)$ represents our control, with entries in the off-diagonal terms, and $A_0$ is the baseline Hawk-Dove payoff matrix. The time-dependent functions $\vec{u}(t) = (u_1(t), u_2(t)) \in \mathbb{R}^2$ are bounded above and below, $-1 \le u_1(t) \le 1$, $-1 \le u_2(t) \le 1$, and have a range ($-3 \le a_{12} \le 5$; $-1 \le a_{21} \le 11$) that allows us to traverse the plane along any path depicted in red in figure 1, starting in the Hawk-Dove zone in the uncontrolled ($u_1 = 0$; $u_2 = 0$) case, which is shown in figure 2 in the phase plane (a) and the frequency plane (b). The ESS for the uncontrolled case is $x_1 = 1/3$. The control path chosen, and the time parametrization $0 \le t \le T$, determine both the sequence of games being played as well as the switching times (the times at which the path crosses over from one region to the next) between games. We denote the total control
output $\vec{C} \in \mathbb{R}^2$:

$$\vec{C}(t) = (C_1(t), C_2(t)) = \int_0^t \vec{u}(t')\, dt' \qquad (7)$$

with total output delivered in time $t$; then:

$$\dot{\vec{C}}(t) = \vec{u}(t) \qquad (8)$$

and:

$$\vec{C}(0) = 0, \qquad (9)$$
$$\vec{C}(T) = \int_0^T \vec{u}(t)\, dt = \vec{C}_T \qquad (10)$$
where $T$ denotes a final time at which we implement the control over one cycle. We consider $\vec{C}_T$ as a constraint on the optimization problem, with $\vec{C}_T = (0, 0)$, and our goal is to first find schedules that minimize and maximize aggression ($x_1$) at the end of one cycle $t = T$ subject to this constraint. For the uncontrolled case, we know $x_1 \to 1/3$ as $t \to \infty$, and we compare the controlled cases with the uncontrolled case, both satisfying the constraint. Also notice that the linear growth rate in (3) is $(a_{12} - a_{22}) = 1 - 0 = 1$, so we scale $T$ the same way in our computations, as $T = 1$. We then perform the optimization adaptively over multiple cycles $nT$, using the end value of cycle $nT$ as the initial condition to compute the optimal schedule for the $(n+1)$st cycle. Using this method, we are able to identify absolute maximizers and minimizers as a function of the cycle time $T$.
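Before setting up the optimization, it is convenient to record the controlled scalar dynamics explicitly. Substituting the time-dependent payoffs (6) into eqn (3) gives (a step we spell out here for reference; it can be checked against the four bang-bang vector fields listed in Sec. III):

$$\dot{x}_1 = x_1 (1 - x_1) \left[ 1 + 4u_2(t) - \left( 3 + 6u_1(t) + 4u_2(t) \right) x_1 \right],$$

so the controls enter the right-hand side linearly, which is ultimately why the optimal schedules turn out to be bang-bang.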
A. Optimal control formulation
A standard form for implementing the Pontryagin maximum (minimum) principle with boundary value constraints is:

$$\vec{X} = [\vec{x}(t), \vec{C}(t)]^T, \quad \vec{X} \in \mathbb{R}^4 \qquad (11)$$
$$\dot{\vec{X}} = \vec{F}(\vec{X}) = [\dot{\vec{x}}, \dot{\vec{C}}(t)]^T, \quad \vec{F} : \mathbb{R}^4 \to \mathbb{R}^4 \qquad (12)$$

where we would like to minimize or maximize a general cost function:

$$\int_0^T L(\vec{x}(t), \vec{u}(t), t)\, dt + \phi(\vec{x}(T)). \qquad (13)$$
Since the method is standard, we will just briefly describe the basic framework and refer readers to [34–38] for more details on how to implement the approach. Following [37] in particular (see page 62, Theorem 4.2.1), we construct the control theory Hamiltonian:

$$H(\vec{x}(t), \vec{C}(t), \vec{\lambda}, \vec{u}(t)) = \vec{\lambda}^T \vec{F}(\vec{X}) + L(\vec{x}, \vec{u}(t), t) \qquad (14)$$

where $\vec{\lambda} = [\lambda_1, \lambda_2, \mu_1, \mu_2]^T$ are the co-state functions (i.e. momenta) associated with $\vec{x}$ and $\vec{C}$ respectively. Assuming that $\vec{u}^*(t)$ is the optimal control for this problem, with corresponding trajectory $\vec{x}^*(t), \vec{C}^*(t)$, the canonical equations satisfy:

$$\dot{x}_i(t) = \frac{\partial H}{\partial \lambda_i} \qquad (15)$$
$$\dot{C}_i(t) = \frac{\partial H}{\partial \mu_i} \qquad (16)$$
$$\dot{\lambda}_i(t) = -\frac{\partial H}{\partial x_i} \qquad (17)$$
$$\dot{\mu}_i(t) = -\frac{\partial H}{\partial C_i} \qquad (18)$$
where $i = (1, 2)$. The corresponding boundary conditions are:

$$\vec{x}(0) = \vec{x}_0 \qquad (19)$$
$$\vec{C}(0) = 0, \quad \vec{C}(T) = \vec{C}_T \qquad (20)$$
$$\lambda_i(T) = \frac{\partial \phi(\vec{x}(T))}{\partial x_i(T)} \qquad (21)$$

Then, at any point in time, the optimal control $\vec{u}^*(t)$ will minimize the control theory Hamiltonian:

$$\vec{u}^*(t) = \arg\min_{\vec{u}(t)} H(\vec{x}^*(t), \vec{C}^*(t), \vec{\lambda}^*(t), \vec{u}(t)) \qquad (22)$$
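Since $L = 0$ in the cost function chosen below and the dynamics are linear in the controls, $H$ is linear in $\vec{u}$, so the minimization in (22) is achieved on the boundary of the control set. As a sketch of why (our notation; the switching functions $\sigma_i$ do not appear in the original text), for the reduced scalar dynamics one can write

$$H = H_0(x_1, \lambda_1) + \sigma_1 u_1 + \sigma_2 u_2, \qquad \sigma_1 = -6\lambda_1 x_1^2 (1 - x_1) + \mu_1, \qquad \sigma_2 = 4\lambda_1 x_1 (1 - x_1)^2 + \mu_2,$$

so that $u_i^*(t) = -\operatorname{sgn}(\sigma_i(t))$, with switches occurring wherever a switching function changes sign. This is the origin of the bang-bang structure noted in Sec. III.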
The optimization problem becomes a two-point boundary value problem (using (19)–(21)) with unknowns $(\lambda_2(0), x_2(T))$ whose solution gives rise to the optimal trajectory $\vec{x}^*(t)$ (from (15)) and the corresponding control $\vec{u}^*(t)$ that produces it [34–37]. We choose our cost function (13):

$$L = 0; \quad \phi(\vec{x}(T)) = x_1(T), \qquad (23)$$

and we solve this problem by standard numerical shooting-type methods [37].
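The shooting method itself is standard; as an independent sanity check (our own sketch in Python, assuming SciPy, not the authors' solver), one can approximate the extremal schedules by brute force: enumerate piecewise-constant bang-bang controls on a uniform grid that satisfy the zero-net-output constraint $\vec{C}(T) = (0, 0)$, integrate the controlled scalar dynamics for each, and keep the schedules that extremize $x_1(T)$.

```python
from itertools import product
from scipy.integrate import solve_ivp

def xdot(x1, u1, u2):
    """Controlled scalar dynamics: eq. (3) with the payoff entries of eq. (6)."""
    return x1 * (1.0 - x1) * (1.0 + 4.0*u2 - (3.0 + 6.0*u1 + 4.0*u2) * x1)

def run(x0, u1_seq, u2_seq, T=1.0):
    """Integrate one cycle under piecewise-constant (bang-bang) controls."""
    dt = T / len(u1_seq)
    x = x0
    for u1, u2 in zip(u1_seq, u2_seq):
        sol = solve_ivp(lambda t, y: [xdot(y[0], u1, u2)], (0.0, dt), [x],
                        rtol=1e-9, atol=1e-12)
        x = sol.y[0, -1]
    return x

def extremize(x0, n=4):
    """Search all +/-1 schedules on n sub-intervals (n even) whose time average
    vanishes, so that the constraint C(T) = int_0^T u dt = 0 of eq. (10) holds."""
    seqs = [s for s in product((-1.0, 1.0), repeat=n) if sum(s) == 0]
    results = {(s1, s2): run(x0, s1, s2) for s1 in seqs for s2 in seqs}
    best_max = max(results, key=results.get)  # schedule maximizing aggression
    best_min = min(results, key=results.get)  # schedule minimizing aggression
    return results[best_min], results[best_max]

if __name__ == "__main__":
    for x0 in (0.08, 1.0/3.0, 0.79):
        lo, hi = extremize(x0)
        print(f"x1(0) = {x0:.2f}:  min x1(T) = {lo:.3f},  max x1(T) = {hi:.3f}")
```

Refining the grid (larger $n$) tightens the approximation to the true bang-bang optimizers; the uncontrolled endpoint is expected to lie between the reported min and max, consistent with figure 3.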
III. RESULTS
In this section we show the results of the adaptive optimal control method to minimize and maximize aggression at time $T = 1$, and then further at the end of multiple cycles $t = nT$. Figure 3(a)-(i) shows the maximizing (blue) and minimizing (red) trajectories for nine initial conditions. The corresponding bang-bang schedules that produce these trajectories are also shown in each case. It is straightforward to prove that the optimal schedules must be bang-bang since the controllers are linear in the governing equations. In each case, we show the uncontrolled (dashed curve) Hawk-Dove trajectory, which ends in between the maximizer and minimizer, as expected.
FIG. 3. Maximizing (blue) and minimizing (red) trajectories for nine initial conditions. Dashed curve shows the uncontrolled Hawk-Dove trajectory, which lands in between the max and min at $T = 1$. Dark (light) blue bar shows $u_1 = 1$ ($u_2 = 1$), white bar shows $u_1 = -1$ ($u_2 = -1$), associated with the maximizing control schedule; dark (light) red bar shows $u_1 = 1$ ($u_2 = 1$), white bar shows $u_1 = -1$ ($u_2 = -1$), associated with the minimizing control schedule. All schedules are bang-bang. (a) $x_1(0) = 0.01$; (b) $x_1(0) = 0.05$; (c) $x_1(0) = 0.1$; (d) $x_1(0) = 0.3$; (e) $x_1(0) = 0.5$; (f) $x_1(0) = 0.7$; (g) $x_1(0) = 0.9$; (h) $x_1(0) = 0.95$; (i) $x_1(0) = 0.99$.

Figure 4 shows the maximizing (blue) and minimizing trajectories over $n = 5$ cycles. We obtain these adaptively, using the endpoint from the $n$th cycle to compute the optimal schedule for the following $(n+1)$st cycle. Two special initial conditions are shown in figure 5. For $x_1(0) = 0.08$, the minimizing (red) trajectory shown in figure 5(a) ends at $x_1(1) = 0.08$, hence is periodic. This value (and corresponding schedule) corresponds to an absolute minimizer $x_1^{\min}$ for aggression $x_1$. By contrast, for $x_1(0) = 0.79$, shown in figure 5(b), the maximizing (blue) trajectory ends at $x_1(1) = 0.79$, hence is periodic. This value (and the corresponding schedule) corresponds to an absolute maximizer $x_1^{\max}$ for aggression. These two special initial conditions are shown in figure 6 over $n = 5$ cycles, confirming the periodicity of the minimizing trajectory (red) in figure 6(a) and the periodicity of the maximizing (blue) trajectory in figure 6(b). The sequence of games that the system cycles through to achieve the minimizing sequence is shown in figure 7, while the maximizing sequence is shown in figure 8. These are obtained from eqn (3) and the four equations:
1. $u_1 = 1$; $u_2 = 1$: $\dot{x}_1 = x_1(1 - x_1)(5 - 13x_1)$
2. $u_1 = 1$; $u_2 = -1$: $\dot{x}_1 = x_1(1 - x_1)(-3 - 5x_1)$
3. $u_1 = -1$; $u_2 = 1$: $\dot{x}_1 = x_1(1 - x_1)(5 - x_1)$
4. $u_1 = -1$; $u_2 = -1$: $\dot{x}_1 = x_1(1 - x_1)(-3 + 7x_1)$.
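A brief observation we add for orientation (elementary rest-point arithmetic, not spelled out in the text): the four regimes have distinct qualitative characters that the optimal schedules exploit. Setting the linear factor to zero gives the interior rest points:

$$\text{1: } x_1 = \tfrac{5}{13} \ (\text{stable}); \qquad \text{2: none, } \dot{x}_1 < 0 \text{ on } (0,1); \qquad \text{3: none, } \dot{x}_1 > 0 \text{ on } (0,1); \qquad \text{4: } x_1 = \tfrac{3}{7} \ (\text{unstable}).$$

Regimes 2 and 3 drive the aggressor frequency monotonically down toward extinction or up toward fixation, while regimes 1 and 4 provide a stable and an unstable interior anchor; the minimizing and maximizing schedules of figures 7 and 8 alternate among these four games.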
In figure 9 we show the minimizing values and maximizing values of $x_1(T)$ vs. $x_1(0)$ through the full range $0 \le x_1(0) \le 1$. Notice that at the endpoints, the two values converge (since for the linear system, the schedule does not matter, only the total $\vec{C}_T$). Figure 9(b) shows the ratio $x_1(T)/x_1(0) - 1$ (percentage increase or decrease) vs. initial condition $x_1(0)$ through the full range $0 \le x_1(0) \le 1$. When the maximizing (blue) curve