PHYSICAL REVIEW E 105, 014412 (2022)
Optimal dynamic incentive scheduling for Hawk-Dove evolutionary games
K. Stuckey (kstuckey@usc.edu)
Department of Aerospace & Mechanical Engineering, University of Southern California, Los Angeles, California 90089-1191, USA

R. Dua (rajvirdu@usc.edu)
Department of Mathematics, University of Southern California, Los Angeles, California 90089-1191, USA

Y. Ma (yongqiam@usc.edu)
Department of Physics & Astronomy, University of Southern California, Los Angeles, California 90089-1191, USA

J. Parker (joep@caltech.edu)
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, USA

P. K. Newton (corresponding author: newton@usc.edu)
Department of Aerospace & Mechanical Engineering, Mathematics, and The Ellison Institute, University of Southern California, Los Angeles, California 90089-1191, USA
(Received 21 September 2021; accepted 5 January 2022; published 21 January 2022)
The Hawk-Dove evolutionary game offers a paradigm of the trade-offs associated with aggressive and passive behaviors. When two (or more) populations of players compete, their success or failure is measured by their frequency in the population, and the system is governed by the replicator dynamics. We develop a time-dependent optimal-adaptive control theory for this dynamical system in which the entries of the payoff matrix are dynamically altered to produce control schedules that minimize and maximize the aggressive population through a finite-time cycle. These schedules provide upper and lower bounds on the outcomes for all possible strategies since they represent two extremizers of the cost function. We then adaptively extend the optimal control schedules over multiple cycles to produce absolute maximizers and minimizers for the system.
DOI: 10.1103/PhysRevE.105.014412
I. INTRODUCTION
The Hawk-Dove game (also known as the Chicken or Snowdrift game) is a game-theoretic paradigm for studying the conflict between players (or populations of players) who use two opposing strategies: aggressive (Hawks) and passive (Doves). One way of framing the conflict is to consider competition in the animal world, where two different species compete for a limited resource [1–4]. With no Hawks in the population, Doves will share the resources and avoid conflict. With no Doves, the Hawks will fight with each other for resources, taking the risk of injury or death. If Hawks are present in large enough numbers, the Doves will flee without fighting. A sufficient fraction of Doves, on the other hand, can cooperate and expel the Hawks from the population, thereby protecting the resource [5]. The challenge is to find conditions for the stable coexistence of the two opposing populations. In the context of military conflicts, the game is framed as the game of chicken, thought of as a situation in which two drivers
head towards each other in a single lane trying not to be the first to swerve away (Doves), each mindful of the fact that if neither swerves (Hawks), both will die. Key to this game is that the cost of losing is greater than the value of winning. Versions of this (static) game have been analyzed and used extensively in political science communities to study strategies associated with the problem of nuclear brinkmanship [6]. In this setup, the payoffs are fixed, and the interactions unfold based on the cost-benefit balance determined by these payoffs.
In a more complicated setting, one might want to measure repeated interactions in populations of competitors, $\vec{x} = (x_1, x_2)^T \in \mathbb{R}^2$, where winning and losing are reinforced by the relative frequencies of the two competing populations (frequency-dependent selection, as in Darwinian evolution). For this, the replicator dynamical system is commonly used [7–9]:
$$\dot{x}_i = x_i\big[(A\vec{x})_i - \vec{x}^T(A\vec{x})\big] \quad (i = 1, 2), \qquad (1)$$
with $x_1 + x_2 = 1$, $0 \le x_1 \le 1$, $0 \le x_2 \le 1$, where each variable has the interpretation of a frequency in the population or, alternatively, of the probability of randomly picking a member of one of the two subgroups. It is sometimes useful to also think of the variables $\vec{x} = (x_1, x_2)^T$ as strategies (heritable traits) that evolve, with the most successful strategy dominating, as in the context of Darwinian evolution [4] by natural selection. Here, $A$ is the $2 \times 2$ payoff
matrix, $(A\vec{x})_i$ is the fitness of population $i$, and $\vec{x}^T(A\vec{x})$ is the average fitness of both populations, so (1) drives growth of $x_i$ (instantaneously) at times when the fitness of population $i$ is above the population average and decay at times when it falls below the system average. Cumulative growth or decay over a time $T$ is governed by $\int_0^T [(A\vec{x})_i - \vec{x}^T(A\vec{x})]\, dt$, which can be either positive or negative depending on the proportion of time spent above or below the system average. The fitness functions in (1) are said to be population dependent (selection pressure is imposed by the mix of population frequencies) and determine the growth or decay of each subpopulation. Because of this, these equations are also used extensively in the reinforcement learning community, where success begets success and failure leads to a downward spiral of frequency in the population [10].
Using the standard Hawk-Dove payoff matrix [5],
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 5 & 0 \end{bmatrix}, \qquad (2)$$
where the population $x_1$ are the Hawks (aggressive) and $x_2$ are the Doves (passive), the strict Nash equilibrium $\vec{x}^* = (x_1^*, x_2^*)$ is the mixed state $x_1^* = 1/3$, $x_2^* = 2/3$, since $\vec{x}^{*T} A \vec{x} > \vec{x}^T A \vec{x}$ for all $\vec{x} \neq \vec{x}^*$. This implies that the mixed state is also an evolutionarily stable state (ESS) of the replicator system (1), as discussed in Ref. [11]. It is also useful to uncouple the two variables in (1) and write a single equation for the aggressor population frequency $x_1$:
$$\dot{x}_1 = x_1(1 - x_1)\big[(A\vec{x})_1 - (A\vec{x})_2\big] = x_1(1 - x_1)\big\{(a_{12} - a_{22}) + [(a_{11} - a_{21}) - (a_{12} - a_{22})]x_1\big\}. \qquad (3)$$
Note also that a single equation for the passive population $x_2$ is easily obtained using the change of variable $x_1 = 1 - x_2$ in Eq. (3).
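As a minimal numerical illustration (ours, not from the paper), Eq. (3) with the baseline payoff values of Eq. (2) reduces to $\dot{x}_1 = x_1(1 - x_1)(1 - 3x_1)$, and integrating it shows every interior initial condition approaching the ESS $x_1 = 1/3$; the helper name `replicator_rhs` is our own:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Baseline Hawk-Dove payoff entries from Eq. (2).
a11, a12, a21, a22 = 3.0, 1.0, 5.0, 0.0

def replicator_rhs(t, x):
    """Right-hand side of Eq. (3) for the aggressor frequency x1."""
    x1 = x[0]
    return [x1 * (1.0 - x1) * ((a12 - a22) + ((a11 - a21) - (a12 - a22)) * x1)]

# Every interior orbit should settle onto the ESS x1 = 1/3.
for x0 in (0.05, 0.5, 0.95):
    sol = solve_ivp(replicator_rhs, (0.0, 20.0), [x0], rtol=1e-8, atol=1e-10)
    print(f"x1(0) = {x0:4.2f}  ->  x1(20) = {sol.y[0, -1]:.4f}")  # ~0.3333
```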
The question we address in the paper is whether it is possible to alter the entries in the payoff matrix $A$ in a time-dependent fashion (dynamic incentives) in order to optimally achieve some predetermined goal (such as minimizing aggression) at the end of a fixed time $T$. Dynamically altering the entries of a payoff matrix in an evolutionary game setting has only recently been studied, by coupling the entries, for example, to a system that represents an external environment [12,13]. One can easily imagine different scenarios in which this might be useful. In the context of nuclear brinkmanship, for example, is it possible to alter the payoff incentives dynamically in order to achieve a goal [6] that would not be achievable with fixed payoffs? Is it possible to offer dynamic economic incentives that optimize some desired outcome across a population of participants [14,15]? In this context, one can fruitfully think of the problem as one in optimal control theory where, in real time, the government either turns on or turns off various economic incentives, such as adjusting interest rate policies or implementing targeted changes to control inflationary pressure. In the context of evolutionary game theory, this translates into introducing time-dependent control schedules that can alter the punishment-reward structure of the system by explicitly controlling the entries of the payoff matrix.
Can one optimally design time-dependent incentive schedules of rewards/punishments to compel groups of people to get vaccinated [16]? For coevolving microbial populations, is it possible to dynamically schedule selective antibiotic agents in order to steer the evolutionary trajectory in an advantageous direction [17,18], or even reverse antibiotic resistance? In the context of scheduling chemotherapy treatments, is it possible to design schedules optimally that make the best use of the chemotherapy agents administered in order to delay chemotherapeutic resistance [8,9,19–21]? Control theory is increasingly being used in a wide range of biological applications [21–27] but, as far as we know, has not yet been systematically implemented in the context of evolutionary games, aside from Refs. [8,9,21,28].
One interesting evolutionary context where an apparent Hawk-Dove scenario may require attainment of a quasistable equilibrium condition is during the evolution of symbiotic relationships in which one partner is aggressive or predatory. For example, hostile colonies of eusocial insects, such as ants and termites, are plagued by a diversity of solitary arthropods that have evolved to infiltrate the social system and parasitize the nest [30,31]. The majority of such parasitic species evolved from free-living ancestors without any behavioral specialization [32,33]. It follows that the initial steps in establishing the symbiosis were contingent on these free-living species (the Doves) entering into equilibrium with their aggressive eusocial hosts (the Hawks). This equilibrium, once attained, may have provided an essential, permissive stepping stone to evolving the key adaptive traits, such as social behaviors and pheromonal mimicry, that facilitate social parasitism [32].
To address these and related types of settings, we develop a mathematical framework to determine time-dependent incentive schedules for altering the payoff entries of a Hawk-Dove evolutionary game in such a way as to (i) maximize aggression at the end of time $T$, and (ii) minimize aggression at the end of time $T$. By considering the bang-bang schedules that produce these upper and lower bounds on the competing frequencies, we can conclude that any alternative payoff schedule will produce a result that lies somewhere between the two bounds, since each is an extremizer of a cost function associated with the Pontryagin maximum (minimum) principle. We then extend the time period to time $nT$ ($n = 1, \ldots, 5$) by using an adaptive control method that adjusts the schedule in the $(n+1)$st window based on the ending frequency values from the $n$th window. The schedules produced drive aggression down to an absolute minimum ($x_1^{\min}$) or up to an absolute maximum ($x_1^{\max}$), which are functions of the cycle time $T$. These values provide absolute lower and upper bounds on opposing behavior strategies in an evolutionary setting.
II. OPTIMAL CONTROL THEORY FOR THE REPLICATOR DYNAMICAL SYSTEM
To implement an optimal dynamic incentive strategy, we consider the time-dependent system
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A_0 + A_1(t) \qquad (4)$$
$$= \begin{bmatrix} 3 & 1 \\ 5 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 4u_2(t) \\ 6u_1(t) & 0 \end{bmatrix} \qquad (5)$$
$$= \begin{bmatrix} 3 & 1 + 4u_2(t) \\ 5 + 6u_1(t) & 0 \end{bmatrix}, \qquad (6)$$
where $A_1(t)$ represents our control, with entries in the off-diagonal terms, and $A_0$ is the baseline Hawk-Dove payoff matrix. The time-dependent functions $\vec{u}(t) = [u_1(t), u_2(t)] \in \mathbb{R}^2$ are bounded above and below, $-1 \le u_1(t) \le 1$, $-1 \le u_2(t) \le 1$, giving the ranges $-3 \le a_{12} \le 5$ and $-1 \le a_{21} \le 11$, which allow us to traverse the plane along any path depicted in red in Fig. 1, starting in the Hawk-Dove zone in the uncontrolled ($u_1 = 0$, $u_2 = 0$) case, which is shown in Fig. 2 in the phase plane [Fig. 2(a)] and the frequency plane [Fig. 2(b)]. Traversing this plane amounts to dynamically sampling the five reciprocity mechanisms allowable in $2 \times 2$ games, discussed in the context of social dilemmas in Refs. [34,35].
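To make this concrete, a short sketch (our illustration; the function name `payoff` is ours) evaluates the controlled matrix of Eq. (6) at the four control extremes, which reach the corners of the accessible rectangle $-3 \le a_{12} \le 5$, $-1 \le a_{21} \le 11$ in Fig. 1:

```python
import itertools
import numpy as np

def payoff(u1, u2):
    """Controlled payoff matrix A = A0 + A1(t) of Eq. (6), with -1 <= u_i <= 1."""
    return np.array([[3.0, 1.0 + 4.0 * u2],
                     [5.0 + 6.0 * u1, 0.0]])

# The four control extremes reach the corners of the accessible
# rectangle in the (a12, a21) plane of Fig. 1.
for u1, u2 in itertools.product((-1.0, 1.0), repeat=2):
    A = payoff(u1, u2)
    print(f"u = ({u1:+.0f}, {u2:+.0f})  ->  a12 = {A[0, 1]:+.0f}, a21 = {A[1, 0]:+.0f}")
```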
Dynamically altering the entries of the payoff matrix allows the freedom to choose different mechanisms to gain advantage over some prescribed finite time. Note that we have chosen the coefficients and bounds on the controllers to ensure that all regions of the plane in Fig. 1 are accessible. The ESS for the uncontrolled case is $x_1 = 1/3$. The control path chosen, together with the time parametrization $0 \le t \le T$, determines both the sequence of games being played and the switching times between games (the times at which the path crosses over from one region to the next).
FIG. 1. Twelve regions in the off-diagonal $(a_{12}, a_{21})$ plane [29] (delineated by dashed lines) define which game is being played. We choose $a_{22}$ at the origin (without loss of generality). Starting at $t = 0$ in the Hawk-Dove square, indicated by the initial large (red) dot, what are the paths that minimize and maximize aggression at time $t = T$? The solid (red) curve is one example of a path that crosses into four different regions of the plane.
FIG. 2. Dynamics of the uncontrolled ($u_1 = 0$, $u_2 = 0$) Hawk-Dove evolutionary game. (a) Phase portrait associated with the aggressor population $x_1$. Both Hawk and Dove dominance ($x_1 = 1, 0$) are unstable fixed points, while the mixed state $x_1 = 1/3$ is the evolutionarily stable strategy (ESS). (b) Hawk dynamics for various initial conditions. $T = 1$ is the end of one control cycle and also matches the linear growth rate of the Hawk-Dove system.

We denote by $\vec{C}(t) = [C_1(t), C_2(t)] \in \mathbb{R}^2$ the total control output delivered in time $t$,
$$\vec{C}(t) = \int_0^t \vec{u}(s)\, ds, \qquad (7)$$
so that
$$\dot{\vec{C}}(t) = \vec{u}(t), \qquad (8)$$
with
$$\vec{C}(0) = 0, \qquad (9)$$
$$\vec{C}(T) = \int_0^T \vec{u}(t)\, dt = \vec{C}_T, \qquad (10)$$
where $T$ denotes the final time at which we complete the control over one cycle. We consider $\vec{C}_T$ as a constraint on the optimization problem, with $\vec{C}_T = (0, 0)$, and our goal is first to find schedules that minimize and maximize aggression ($x_1$) at the end of one cycle, $t = T$, subject to this constraint. For the uncontrolled case, we know $x_1 \to 1/3$ as $t \to \infty$, and we compare the controlled cases with the uncontrolled case, both satisfying the constraint.
FIG. 3. Maximizing (solid blue) and minimizing (dashed red) trajectories for nine initial conditions. Also shown is the (dashed black) curve for the uncontrolled Hawk-Dove trajectory, which lands in between the maximum and minimum at $T = 1$. The top bar (dark blue) corresponds to $u_1 = 1$, with white indicating $u_1 = -1$. The second bar (light blue) corresponds to $u_2 = 1$, with white indicating $u_2 = -1$, associated with the maximizing control schedule. The third bar (dark red) corresponds to $u_1 = 1$, with the white bar indicating $u_1 = -1$. The fourth bar (light red) corresponds to $u_2 = 1$, with white indicating $u_2 = -1$, associated with the minimizing control schedule. All schedules are bang-bang. (a) $x_1(0) = 0.01$, (b) $x_1(0) = 0.05$, (c) $x_1(0) = 0.1$, (d) $x_1(0) = 0.3$, (e) $x_1(0) = 0.5$, (f) $x_1(0) = 0.7$, (g) $x_1(0) = 0.9$, (h) $x_1(0) = 0.95$, and (i) $x_1(0) = 0.99$.
Also notice that the linear growth rate in (3) is $(a_{12} - a_{22}) = 1 - 0 = 1$, so we scale $T$ the same way in our computations, taking $T = 1$. We then perform the optimization adaptively over multiple cycles $nT$, using the end value of cycle $n$ as the initial condition to compute the optimal schedule for the $(n+1)$st cycle. Using this method, we are able to identify absolute maximizers and minimizers as a function of the cycle time $T$.
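One simple way to see how a bang-bang schedule can meet the constraint $\vec{C}_T = (0, 0)$: a control that only takes the values $\pm 1$ has zero net output exactly when it spends equal time at each value. The check below (our sketch, using a hypothetical single switch at $T/2$) confirms Eq. (10) numerically:

```python
import numpy as np

T, n = 1.0, 10_000
t = np.linspace(0.0, T, n, endpoint=False) + T / (2 * n)  # midpoint rule

# Hypothetical single-switch bang-bang control: +1 on [0, T/2), -1 afterwards.
u = np.where(t < T / 2, 1.0, -1.0)

C_T = u.sum() * (T / n)  # C(T) = integral of u over one cycle, Eq. (10)
print(f"C_T = {C_T:.2e}")  # zero to rounding error, so the constraint holds
```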
A. Optimal control formulation
A standard form for implementing the Pontryagin maximum (minimum) principle with boundary value constraints is
$$\vec{X} = [\vec{x}(t), \vec{C}(t)]^T, \quad \vec{X} \in \mathbb{R}^4, \qquad (11)$$
$$\dot{\vec{X}} = \vec{F}(\vec{X}) = [\dot{\vec{x}}, \dot{\vec{C}}(t)]^T, \quad \vec{F}: \mathbb{R}^4 \to \mathbb{R}^4, \qquad (12)$$
where one would like to minimize or maximize a general cost function:
$$\int_0^T L(\vec{x}(t), \vec{u}(t), t)\, dt + \varphi(\vec{x}(T)). \qquad (13)$$
In our case, we are only interested in minimizing the terminal value $\varphi(\vec{x}(T)) \equiv x_1(T)$, which is called a classical Mayer problem (specifically, choosing $L = 0$ and optimizing the terminal value, an approach developed in the context of missile guidance problems where the final distance from the target is optimized), discussed in detail in Sec. 13.3, together with the Pontryagin principle in Sec. 14.6, of Ref. [36]. The controllers, of course, still play an important role, as they enter both the payoff matrix through Eq. (6) and the constraint Eq. (7). Since the method is standard, we will just briefly describe the basic framework and refer readers to Refs. [37–41] for more details on how to implement the approach.
FIG. 4. Maximizing (solid blue) and minimizing (dashed red) trajectories for nine initial conditions over $n = 5$ cycles, with solid (blue)/dashed (red) curves joining the end values after each cycle. The adaptive schedule for the $(n+1)$st cycle is calculated based on the endpoint of the $n$th cycle. The dashed black curve shows the uncontrolled Hawk-Dove trajectory. (a) $x_1(0) = 0.01$, (b) $x_1(0) = 0.05$, (c) $x_1(0) = 0.1$, (d) $x_1(0) = 0.3$, (e) $x_1(0) = 0.5$, (f) $x_1(0) = 0.7$, (g) $x_1(0) = 0.9$, (h) $x_1(0) = 0.95$, and (i) $x_1(0) = 0.99$.
Following Ref. [40] in particular (see p. 62, Theorem 4.2.1), we construct the control theory Hamiltonian,
$$H(\vec{x}(t), \vec{C}(t), \vec{\lambda}, \vec{u}(t)) = \vec{\lambda}^T \vec{F}(\vec{X}), \qquad (14)$$
where $\vec{\lambda} = [\lambda_1, \lambda_2, \mu_1, \mu_2]^T$ are the costate functions (i.e., momenta) associated with $\vec{x}$ and $\vec{C}$, respectively. Assuming that $\vec{u}^*(t)$ is the optimal control for this problem, with corresponding trajectory $\vec{x}^*(t), \vec{C}^*(t)$, the canonical equations satisfy
$$\dot{x}_i(t) = \frac{\partial H}{\partial \lambda_i}, \qquad (15)$$
$$\dot{C}_i(t) = \frac{\partial H}{\partial \mu_i}, \qquad (16)$$
$$\dot{\lambda}_i(t) = -\frac{\partial H}{\partial x_i}, \qquad (17)$$
$$\dot{\mu}_i(t) = -\frac{\partial H}{\partial C_i}, \qquad (18)$$
where $i = 1, 2$. The corresponding boundary conditions are
$$\vec{x}(0) = \vec{x}_0, \qquad (19)$$
$$\vec{C}(0) = 0, \quad \vec{C}(T) = \vec{C}_T, \qquad (20)$$
$$\lambda_i(T) = \frac{\partial \varphi(\vec{x}(T))}{\partial x_i(T)}. \qquad (21)$$
Then, at any point in time, the optimal control $\vec{u}^*(t)$ will minimize the control theory Hamiltonian:
$$\vec{u}^*(t) = \arg\min_{\vec{u}(t)} H\big(\vec{x}^*(t), \vec{C}^*(t), \vec{\lambda}(t), \vec{u}(t)\big). \qquad (22)$$
The optimization problem becomes a two-point boundary value problem [using (19)–(21)] with unknowns $[\lambda_2(0), x_2(T)]$, whose solution gives rise to the optimal trajectory $\vec{x}^*(t)$ [from (15)] and the corresponding control $\vec{u}^*(t)$ that produces it [37–40]. We solve this problem by standard numerical shooting-type methods [40]. The result is that the optimal controllers follow a bang-bang schedule, taking on only the extreme values $+1$ or $-1$, and not values throughout the interval $[-1, 1]$.
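The following sketch is our simplified stand-in for that computation, not the authors' code. Instead of solving the full two-point boundary value problem, it enumerates the four single-switch bang-bang schedules compatible with the constraint (a $\pm 1$ control with one switch has zero net output only if the switch occurs at $T/2$) and integrates Eq. (3) under each, reporting the extreme terminal values of $x_1$:

```python
import itertools
import numpy as np
from scipy.integrate import solve_ivp

T = 1.0
A11, A22 = 3.0, 0.0  # fixed diagonal entries of the payoff matrix

def rhs(t, x, s1, s2):
    """Eq. (3) under bang-bang controls u_i = s_i before T/2 and -s_i after."""
    u1 = s1 if t < T / 2 else -s1
    u2 = s2 if t < T / 2 else -s2
    a12 = 1.0 + 4.0 * u2   # controlled entries, Eq. (6)
    a21 = 5.0 + 6.0 * u1
    x1 = x[0]
    return [x1 * (1 - x1) * ((a12 - A22) + ((A11 - a21) - (a12 - A22)) * x1)]

def terminal_x1(x0, s1, s2):
    sol = solve_ivp(rhs, (0.0, T), [x0], args=(s1, s2),
                    rtol=1e-9, max_step=1e-3)  # small steps resolve the switch
    return sol.y[0, -1]

x0 = 0.3  # one of the initial conditions used in Fig. 3
results = {(s1, s2): terminal_x1(x0, s1, s2)
           for s1, s2 in itertools.product((1.0, -1.0), repeat=2)}
lo, hi = min(results, key=results.get), max(results, key=results.get)
print(f"min x1(T) = {results[lo]:.4f}  with initial signs {lo}")
print(f"max x1(T) = {results[hi]:.4f}  with initial signs {hi}")
```

Multiswitch schedules can be handled the same way by enlarging the search space; the indirect approach of Eqs. (14)–(22) instead determines the switch times from the zero crossings of the switching functions.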
FIG. 5. Maximizing (solid blue) and minimizing (dashed red) trajectories for the two special initial conditions $x_1(0) = 0.08, 0.79$. For the larger initial condition, the maximizing schedule (top two bars in blue) produces an absolute maximum $x_1^{\max}$ for $T = 1$. For the smaller initial condition, the minimizing schedule (bottom two bars in red) produces an absolute minimum $x_1^{\min}$ for $T = 1$. The dashed (black) curve shows the uncontrolled Hawk-Dove trajectory. The top bar (dark blue) shows $u_1 = 1$, and the white bar shows $u_1 = -1$. The second bar (light blue) shows $u_2 = 1$, and the white bar shows $u_2 = -1$, associated with the maximizing control schedule. The third (dark red) bar shows $u_1 = 1$, and the white bar shows $u_1 = -1$. The fourth (light red) bar shows $u_2 = 1$, and the white bar shows $u_2 = -1$, associated with the minimizing control schedule. (a) $x_1(0) = 0.08$, and (b) $x_1(0) = 0.79$.
III. RESULTS
In this section we show the results of applying the adaptive optimal control method to minimize and maximize aggression at time $T = 1$, and then at the end of multiple cycles $t = nT$. Figures 3(a)–3(i) show the maximizing (blue) and minimizing (red) trajectories for nine initial conditions. The corresponding bang-bang schedules that produce these trajectories are also shown in each case. It is straightforward to prove that the optimal schedules must be bang-bang, since the controllers are linear in the governing equations. In each case, we show the uncontrolled (dashed curve) Hawk-Dove trajectory, which ends in between the maximizer and minimizer, as expected.
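A sketch of that argument, in our paraphrase using the reduced form (3) and the notation of Sec. II A: since $\dot{x}_1$ depends linearly on $u_1, u_2$ through $a_{21} = 5 + 6u_1$ and $a_{12} = 1 + 4u_2$, and $\dot{C}_i = u_i$, the Hamiltonian (14) is linear in each control, $H = (\text{$u$-independent terms}) + \sigma_1(t)u_1 + \sigma_2(t)u_2$, with switching functions
$$\sigma_1(t) = -6\,\lambda_1\, x_1^2 (1 - x_1) + \mu_1, \qquad \sigma_2(t) = 4\,\lambda_1\, x_1 (1 - x_1)^2 + \mu_2,$$
so minimizing $H$ pointwise over $-1 \le u_i \le 1$ forces $u_i^*(t) = -\mathrm{sgn}\,\sigma_i(t)$, a bang-bang schedule whenever $\sigma_i(t) \neq 0$.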
FIG. 6. Maximizing (solid blue) and minimizing (dashed red) trajectories for the two special initial conditions $x_1(0) = 0.08, 0.79$ over $n = 5$ cycles. Notice that the minimizing trajectory (dashed red) shown in (a) exactly repeats for each cycle [$x_1(0) = x_1(T)$] since the schedule is an absolute minimizer, while the maximizing trajectory (solid blue) shown in (b) exactly repeats for each cycle [$x_1(0) = x_1(T)$] since the schedule is an absolute maximizer. The dashed (black) curve shows the uncontrolled Hawk-Dove trajectory. (a) $x_1(0) = 0.08$; the dashed (red) horizontal line indicates an absolute minimizer at $x_1^{\min}$. (b) $x_1(0) = 0.79$; the solid (blue) horizontal line indicates an absolute maximizer at $x_1^{\max}$.
Figure 4 shows the maximizing (blue) and minimizing (red) trajectories over $n = 5$ cycles. We obtain these adaptively, using the endpoint from the $n$th cycle to compute the optimal schedule for the following $(n+1)$st cycle. Two special initial conditions are shown in Fig. 5. For $x_1(0) = 0.08$, the minimizing (red) trajectory shown in Fig. 5(a) ends at $x_1(1) = 0.08$, hence is periodic. This value (and corresponding schedule) corresponds to an absolute minimizer $x_1^{\min}$ for aggression $x_1$. By contrast, for $x_1(0) = 0.79$ shown in Fig. 5(b), the maximizing (blue) trajectory ends at $x_1(1) = 0.79$, hence is periodic. This value (and the corresponding schedule) corresponds to an absolute maximizer $x_1^{\max}$ for aggression. These two
FIG. 7. Minimizing sequence of four games (Deadlock; Leader; Prisoner's Dilemma; Game 9) associated with initial condition $x_1(0) = 0.08$ that produces the absolute minimizer. Open dots show the starting point, and the solid dot shows the ending point.
FIG. 8. Maximizing sequence of four games (Prisoner's Dilemma; Leader; Deadlock; Game 9) associated with initial condition $x_1(0) = 0.79$ that produces the absolute maximizer. Open dots show the starting point, and the solid dot shows the ending point.