of 7
Risk-aware motion planning for automated vehicle among
human-driven cars
Jin I. Ge
1
, Bastian Sch
̈
urmann
2
, Richard M. Murray
1
, and Matthias Althoff
2
Abstract
— We consider the maneuver planning problem for
automated vehicles when they share the road with human-
driven cars and interact with each other using a finite set
of maneuvers. Each maneuver is calculated considering input
constraints, actuator disturbances and sensor noise, so that
we can use a maneuver automaton to perform high-level
planning that is robust against low-level effects. In order to
model the behavior of human-driven cars in response to the
intent of the automated vehicle, we use control improvisation
to build a probabilistic model. To accommodate for potential
mismatches between the learned human model and human
driving behaviors, we use a conditional value-at-risk objective
function to obtain the optimal policy for the automated vehicle.
We demonstrate through simulations that our motion planning
framework allows an automated vehicle to exploit human
behaviors with different levels of robustness.
I. I
NTRODUCTION
High-level motion planning for automated vehicles has
seen significant improvements since the 2004 DARPA Grand
Driving Challenge, and a multitude of methods have been
proposed for automated agents to reach a target while
avoiding the surrounding obstacles [1]. While reach-avoid
problems formulated in continuous time and space can
easily incorporate low-level effects like input constraints and
disturbances, the corresponding numerical solutions often
suffer from high computational loads. This prompted the use
of motion primitives to shift a large portion of computation
costs offline [2]. In particular, when motion primitives are
designed with consideration of low-level dynamics, they can
provide a formal guarantee ensuring robustness in high-level
motion planning [3], [4].
While discrete planners based on graph search (e.g., A*
algorithm) and sampling (e.g., RRT*) have been successfully
used in self-driving demonstrations, reach-avoid problems
remain challenging in dynamic environment, where a high
level of uncertainty can arise due to interaction with human-
driven cars on road. Previous studies often simplify such
behavior uncertainties by assuming human drivers maintain
constant speed or follow given trajectories. As a result,
the corresponding planning strategy often has difficulties to
adapt to aggressive or friendly human behaviors, and thus
tends to be overly defensive in order to ensure overall safety.
*This work is supported by NSF VeHiCal project (Grant Number
1545126) and by the European Commission UnCoVerCPS project (Grant
Number 643921).
1
Department of Computing and Mathematical Sciences, California Insti-
tute of Technology, Pasadena, CA 91107 USA.
jge@caltech.edu,
murray@cds.caltech.edu
2
Department of Informatics, Technische Universit
̈
at M
̈
unchen, Boltz-
mannstr. 3, 85748 Garching, Germany.
bastian.schuermann@tum.de, althoff@tum.de
However, when the interaction between the automated
agent and humans is explicitly modeled, the automated agent
is able to employ far less conservative strategies [5]. Such
explicit human models are crucial for the implementation of
automated vehicles in human-dominated traffic. While we
may describe human decision-making concisely using cost
functions obtained through inverse reinforcement learning,
studies have pointed out that human decisions are often not
optimal due to limited rationality [6]. Therefore, instead
of using deterministic human models that follow optimal
policies under given cost functions, we may consider prob-
abilistic models that are able to regenerate diverse human
behaviors and reflect human preferences through underlying
probability distributions. To achieve this, we choose the
control improvisation technique [7]–[9] that allows us to
explicitly formulate and validate such probabilistic specifi-
cations.
Since the sojourn time of a human-driven vehicle around
the automated car can be too short to collect sufficient
amount of data for behavior learning, the cost function and
probabilistic preferences of a human driver often rely heavily
on a-priori parameters trained with ensemble driver data.
Therefore, it is desirable to allow the automated vehicle
to adjust its optimal strategy based on its confidence level
regarding the accuracy of probabilistic human models. In
particular, we consider a conditional value-at-risk objective
function [10], which quantifies an upper bound on the
mismatch of the probabilistic human model. Further, optimal
policies under various modeling errors can be obtained
through risk-aware decision-making [11].
The remainder of this paper is organized as follows: We
first describe the construction of an action set for both
human-driven and automated cars using motion primitives.
Then, we describe the probabilistic human model using
control improvisation, where the cost function can be learned
from inverse reinforcement learning. Next, we formulate the
reach-avoid problem into a Markov decision tree problem
and obtain the robust optimal policy based on conditional
value-at-risk objective. Finally, we present several case stud-
ies and conclude with a summary and an outlook.
II. C
ONSTRUCTING MANEUVERS THROUGH MOTION
PRIMITIVES
In this section, we construct vehicle maneuvers using
motion primitives, in order to formulate the motion plan-
ning problem in terms of Markov decision processes. For
Submitted, 2019 American Control Conference (ACC)
https://www.cds.caltech.edu/~murray/preprints/gsma19-acc_s.pdf
simplicity, we model each vehicle by a kinematic model
̇
p
x
=
v
cos
(
θ
)
̇
p
y
=
v
sin
(
θ
)
̇
v
=
u
1
+
w
1
̇
θ
=
u
2
+
w
2
(1)
where
p
x
and
p
y
are the longitudinal and lateral positions
for the car’s center of mass,
v
is its velocity, and
θ
is its
orientation.
u
1
and
u
2
are the acceleration and steering input,
respectively, with corresponding additive disturbances
w
1
and
w
2
. In (1), we consider bounded uncertainty
w
= [
w
1
,
w
2
]
W
and limited input
u
= [
u
1
,
u
2
]
U
. We denote the state vector
by
x
= [
p
x
,
p
y
,
v
,
θ
]
, and we denote a possible solution to
(1) under an input trajectory
u
(
·
)
, an initial state
x
(
0
)
, and a
disturbance trajectory
w
by
x
(
t
|
u
(
·
)
,
x
(
0
)
,
w
)
.
Figure 4 shows an example of the motion planning
problem where the automated car (blue) moves from an
initial state A to the final state D. Such a task can be
planned by concatenating a sequence of lane-following and
lane-change maneuvers. Different maneuver sequences such
as
(
AC
2
,
C
2
D
)
,
(
AB
1
,
B
1
C
1
,
C
1
D
)
, and
(
AB
2
,
B
2
D
)
may be
chosen based on the motion of human-driven cars nearby
(red and grey cars). Therefore, we define a motion primitive
as a tuple
μ
=
(
v
0
,
x
f
,
S
0
,
u
(
·|
x
(
0
)
,
w
)
,
R
[
0
,
t
]
(
S
0
,
W
)
)
(2)
where
t
is the length of time for the motion primitive,
v
0
is the initial speed,
x
f
is the desired end state at
t
, and
u
(
·|
x
(
0
)
,
w
)
is an input generator given the initial state
x
(
0
)
and disturbance
w
. The initial set
S
0
= [
e
x
,
e
x
]
×
[
e
y
,
e
y
]
×
[
v
0
e
v
,
v
0
+
e
v
]
×
[
e
θ
,
e
θ
]
,
(3)
describes the state uncertainty around the nominal starting
point
(
0
,
0
,
v
0
,
0
)
in the local coordinate system, where
e
x
,
e
y
,
e
v
, and
e
θ
are the bounds for position, speed and orien-
tation measurement noise. Based on the end state
x
f
and
time horizon
t
, we may obtain the nominal trajectory and
nominal input for the motion primitive, as shown by the black
curves in Fig. 2. However, given the bounded disturbances
and uncertainties, in order to ensure robustness around the
nominal trajectory, we need to obtain an input generator
u
(
t
|
x
(
0
)
,
w
)
for
t
[
0
,
t
]
, with its corresponding reachable
set over-approximated by
R
[
0
,
t
]
(
S
0
,
W
)
{
x
(
t
|
u
(
·
)
,
x
(
0
)
,
w
)
t
[
0
,
t
]
,
x
(
0
)
S
0
,
w
W
}
,
(4)
where
u
(
·
)
is the input trajectory generated using (2). Based
on the reachable set (4), we define the occupancy set
O
[
0
,
t
]
(
S
0
,
W
)
{
Γ
(
x
)
|
x
R
[
0
,
t
]
(
S
0
,
W
)
}
,
(5)
where the function
Γ
(
x
)
:
R
4
P
(
R
2
)
maps the vehicle’s
state
x
to its occupancy area, where
P
(
R
2
)
denotes the
power set in the
(
p
x
,
p
y
)
-plane, see [3], [4] for details.
Note that compared with classic motion primitive settings
such as [2], the occupancy calculation (5) under disturbances
-40
-20
0
20
40
60
80
100
120
0
2
4
AB
1
B
2
C
2
C
1
D
p
x
[m]
p
y
[m]
Fig. 1.
Hierarchical motion planning for the automated car (blue)
by concatenating lane-following and lane-changing maneuvers. Different
sequences such as
(
AC
2
,
C
2
D
)
,
(
AB
1
,
B
1
C
1
,
C
1
D
)
, and
(
AB
2
,
B
2
D
)
may be
chosen based on the motion of human-driven cars nearby (red and grey).
0
20
40
-1
-0.5
0
0.5
1
1.5
2
15
20
-0.2
-0.1
0
0.1
0.2
p
x
[m]
p
y
[m]
v
[m/s]
θ
[rad]
Fig. 2.
A motion primitive used in left-turning maneuvers, see the blue
curves in Fig. 1. The nominal trajectories are shown in solid curves while
the shaded areas mark the reachable set
R
[
0
,
t
]
(
S
0
,
W
)
. The green and red
rectangles mark the initial and final sets, respectively.
and initial uncertainties ensures the robustness of high-
level motion planning against low-level effects. In particular,
concatenation of
μ
A
with
μ
B
is possible if the final set of
μ
A
is a subset of the initial set of
μ
B
under a translational
and rotational transformation
F
A
(
·
)
, that is,
S
A
f
(
S
A
0
,
W
)
F
A
(
S
B
0
)
,
(6)
where the over-approximation of the motion primitive’s end
set
S
f
(
S
0
,
W
)
{
x
(
t
|
u
(
·
)
,
x
(
0
)
,
w
)
x
(
0
)
S
0
,
w
W
}
.
(7)
As an example, we show in Fig. 2 a motion primitive with
time period
t
=
2 [s], initial speed
v
0
=
16 [m/s], and desired
end state
x
f
=
[
28
[
m
]
,
1
.
8
[
m
]
,
12
[
m
/
s
]
,
0
.
14
[
rad
]
]
with input
constraints and disturbance bounds
u
1
[
4
,
4
][
m
/
s
2
]
,
u
2
[
0
.
4
,
0
.
4
][
rad
/
s
]
,
w
1
[
0
.
5
,
0
.
5
][
m
/
s
2
]
,
w
2
[
0
.
02
,
0
.
02
][
rad
/
s
]
.
(8)
The corresponding reachable sets are shaded grey in the
(
p
x
,
p
y
)
-plane and
(
v
,
θ
)
-plane, while the corresponding
nominal trajectories are plotted in dotted curves. The initial
set
S
0
and the final set
S
f
(
S
0
,
W
)
are marked by green and red
rectangles, respectively. Note that this motion primitive and
its orientation-wise mirror primitive constitute a lane-change
maneuver in Fig. 1 (i.e., AC
2
, B
1
C
1
, B
2
D).
Therefore, a maneuvers
a
can be defined for an automated
or human-driven vehicle as a concatenation of motion prim-
itives, that is,
a
= (
μ
1
,...,
μ
N
μ
)
,
(9)
0
10
20
30
40
50
60
70
80
-5
0
5
p
x
[m]
p
y
[m]
Fig. 3.
One lane-keeping maneuver and four lane-changing maneuvers,
where each maneuver consists of two motion primitives.
p
x
[m]
p
y
[m]
x
1
e
x
2
e
x
r
x
h
S
g
Fig. 4.
Example of motion planning in a dynamic environment consisting
of one automated car (blue), one interactive car (red), and two non-
interactive cars (green and grey). While both interactive and non-interactive
cars are human-driven, the latter does not respond to the motion of the
automated car and is not included in the interactive framework. In order to
move to the left lane, the automated car may choose to merge in front or
behind the red vehicle, as marked by the blue and red arrows.
where
N
μ
is the maximum number of motion primitives a
maneuver contains, and
μ
j
1
and
μ
j
satisfy (6) for
j
=
2
,...,
N
μ
. While lane-keeping maneuvers on straight roads
may have
N
μ
=
1, lane-changing or lane-keeping maneuvers
on curvy roads often have
N
μ
>
1. For simplicity, in this
paper we set
N
μ
=
2 for all maneuvers and assume the end
orientation of each maneuver to be aligned with the direction
of the road, as shown in Fig. 3.
III. M
ANEUVER AUTOMATA MODELS
In this section, we describe the dynamics of both automat-
ed and human-driven vehicles, see Fig. 4. While the automat-
ed car is the only agent in the system that is controllable, we
categorize the human-driven vehicles into the non-interactive
cars and the interactive cars. The interactives (red) react
to the intention of the automated car, while the motion of
non-interactive vehicles (green and grey) is not influenced
by the motion of the automated car (blue). Therefore, we
construct Markov models for the motion of non-interactive
vehicles that represent traffic conditions, and build maneuver
automata to describe the interaction between the automated
and interactive vehicles.
We denote the state vector for the human-robot system
X
=
[
x
e
,
x
h
,
x
r
]
,
(10)
with non-interactive vehicles (the environment)
x
e
=
[
x
1
e
,...,
x
N
e
e
]
,
interactive vehicles
(the
humans)
x
h
=
[
x
1
h
,...,
x
N
h
h
]
, and the automated car (the robot)
x
r
. For the
case shown in Fig. 4,
N
e
=
2 and
N
h
=
1.
To reason about the discrete-time maneuver planning, we
consider the state of each vehicle at the beginning or the
end of a maneuver. Since each car’s orientation is assumed
to be aligned with the direction of the lane at the end of a
maneuver, for simplicity we may omit the orientation
θ
in
the discrete states and define
x
i
e
= [
n
e
,
i
x
,
n
e
,
i
y
,
n
e
,
i
v
]
,
x
i
h
= [
n
h
,
i
x
,
n
h
,
i
y
,
n
h
,
i
v
]
,
x
r
= [
n
r
x
,
n
r
y
,
n
r
v
]
,
(11)
where
n
x
is the discretized longitudinal position, see the grids
in Fig. 4,
n
y
is the lane number, and
n
v
is the discretized
speed.
Using the discrete state definition (11), we describe the
motion of non-interactive vehicles with discrete-time Markov
models, where the transition probability
[
T
e
]
i jk
=
P
e
(
x
e
[
k
+
1
] =
s
j
x
e
[
k
] =
s
i
)
(12)
can be trained from traffic data, see [12] for more details.
On the other hand, the motion of the automated vehicle
can be described by a maneuver automaton
M
r
=
{
X
r
,
x
r
[
0
]
,
A
r
,
C
r
,
T
r
,
S
g
}
(13)
– the sample space
X
r
R
3
,
– the initial state
x
r
[
0
]
R
3
,
– the input space
A
r
=
{
a
k
,
k
=
1
,...,
K
}
where the ma-
neuver
a
k
is defined in (9),
C
r
(
x
r
,
a
r
)
is the instantaneous cost for the automated car
at state
x
r
with action
a
r
,
– the conditional transition matrix
T
r
,
– the accepting states
S
g
P
(
R
3
)
,
where the size of the sample space
|
X
r
|
is finite, because the
vehicle speed is bounded, and the vehicle position is between
the position of the initial state
x
r
[
0
]
and those of the accepting
states
S
g
.
In (13) the cost
C
r
(
x
r
,
a
r
)
may contain the cost for potential
collisions, the cost-to-goal, and the action cost. Therefore,
its value depends on the states and actions of neighbor-
ing human-driven vehicles. For example, we may denote
C
r
(
x
r
,
a
r
|
x
e
,
x
h
,
a
h
)
at step
i
as
C
r
[
i
]
, and define it as
C
r
[
i
] =
f
r1
(
O
r
[
i
]
,
O
h
[
i
]
)
+
f
r2
(
O
e
[
i
]
,
O
r
[
i
]
)
+
d
(
O
r
[
i
]
,
S
g
)
+
γ
4
||
a
r
[
i
]
||
2
,
(14)
where
γ
4
>
0, and by abuse of notation we denote
O
r
[
i
]
,
O
h
[
i
]
, and
O
e
[
i
]
as the grids occupied during maneuver
i
by the automated, interactive, and non-interactive vehicles,
respectively. Details on the projection from the continuous
occupancy set (5) to the discrete grids can be found in [13].
The collision costs
f
r1
,
f
r2
are non-negative functions that
decrease monotonically with respect to the distance between
the automated vehicle and the interactive and non-interactive
vehicles, e.g.,
f
(
S
1
,
S
2
) =
max
(
0
,
k
c
(
̄
N
−||
S
1
,
S
2
||
2
l
)
)
,
(15)
where
||
S
1
,
S
2
||
l
denotes the 2-norm for the corresponding
portion of
S
1
and
S
2
that are in the same lane,
̄
N
is the
minimum distance beyond which the collision cost is zero,
and
k
c
>
0 is the weight on the collision cost. The distance-
to-goal cost is
d
(
O
r
[
i
]
,
S
g
)
=
l
(
O
r
[
i
]
,
S
g
)
+
min
s
S
g
||
O
r
[
i
]
s
||
2
2
,
(16)
where
l
(
O
r
[
i
]
,
S
g
)
=
0 when
O
r
[
i
]
is in the same lane as
S
g
and
l
(
O
r
[
i
]
,
S
g
)
>
0 otherwise. Therefore, the automated car’s
cost
C
(
x
r
,
a
r
)
is a random variable with respect to the motion
of human-driven vehicles.
We also note that the conditional transition probability for
s
i
s
j
given action
a
k
is
[
T
r
]
i jk
=
P
r
(
x
r
[
l
+
1
] =
s
j
x
r
[
l
] =
s
i
,
a
r
[
l
] =
a
k
)
,
(17)
for
i
,
j
=
1
,...,
|
X
r
|
,
k
=
1
,...,
K
, which can be obtained
using the maneuver definition (2,9). However, (13) does not
specify how
a
r
[
k
]
is chosen given the history
X
[
0
]
,...,
X
[
k
1
]
and the accepting states
S
g
, this problem will be formu-
lated and solved in the next sections.
Similarly, the dynamics of an interactive vehicle can be
defined as
M
h
=
{
X
h
,
x
h
[
0
]
,
A
h
,
C
h
,
T
h
}
,
(18)
where its sample space
X
h
, initial state
x
h
[
0
]
, and action set
A
h
are defined similarly as in (13). The transition probability
[
T
h
]
i jk
=
P
h
(
x
h
[
l
+
1
] =
s
i
x
h
[
l
] =
s
j
,
a
h
[
l
] =
a
k
)
(19)
can also be obtained using the maneuver definition (2,9).
Similarly, we may define the human’s cost function
C
h
(
x
h
[
k
]
,
a
h
[
k
]
|
x
e
[
k
]
,
x
r
[
k
]
,
a
r
[
k
])
in terms of
C
h
[
k
] =
f
h1
(
O
r
[
k
]
,
O
h
[
k
]
)
+
f
h2
(
O
e
[
k
]
,
O
h
[
k
]
)
+
γ
a
||
a
h
[
k
]
||
2
,
(20)
where
γ
a
is the weight for the input cost, while the collision
cost
f
h1
and
f
h2
against the motion of the automated and the
non-interactive vehicles are defined similarly as in (14).
While some previous research consider deterministic poli-
cies
a
h
[
k
]
in (18), a probabilistic distribution given the state
history might describe a human driver’s action better [14],
[15]. In the next section, we will provide details on a
probabilistic model for the interactive vehicles using control
improvisation.
IV. M
ODELING THE BEHAVIOR OF INTERACTIVE
VEHICLES
Here, we describe the discrete-time decision-making of
interactive vehicles based on the maneuver automate model
(18). To break the symmetry where the decision of automated
and interactive vehicles depend on each other, we assume
the automated car broadcasts its intentions through advanced
signaling and vehicle-to-vehicle communication, so that the
human drivers driving behind or next to it may react to
its intention [5]. However, due to limited rationale, human
drivers are not always able to obtain the optimal policy and
generate stochastic behaviors instead.
Therefore, we consider human actions
a
h
[
k
]
generated by
a distribution conditioned on the state of other vehicles and
the automated car’s action, that is,
a
h
[
k
]
P
(
a
h
[
k
]
x
h
[
k
]
,
x
e
[
k
]
,
x
r
[
k
]
,
a
r
[
k
]
)
,
(21)
where
denotes sampling from the probability on the right-
hand side.
To ensure this distribution represents human decisions well
in traffic, we use control improvisation to ensure that the
generated behaviors obey a set of hard constraints (conditions
that are satisfied deterministically), a set of soft constraints
(conditions that are satisfied with a probability), and a
randomness requirement on the behavior richness [7].
That is, given a finite alphabet
Σ
, we would like to generate
a randomized behavior in the language
I
Σ
. Given a hard-
constraint set
S
Σ
, finitely many soft-constraint sets
A
i
I
with probability bounds
ξ
i
[
0
,
1
]
where
i
∈{
1
,...,
n
}
, and
a probability bound
ρ
(
0
,
1
]
, the distribution
D
:
Σ
[
0
,
1
]
with support set
S
such that
1)
S
I
,
2)
i
=
1
,...,
n
,
P
[
ω
A
i
|
ω
D
]
1
ξ
i
,
3)
ω
S
,
D
(
ω
)
ρ
,
is called an
(
ξ
,
ρ
)
-improvising distribution. The hard con-
straint 1) is used to specify all behaviors with non-zero
probabilities (e.g., limited steering and acceleration capabili-
ties); the soft constraint 2) is used to regulate the probability
for certain sets of behaviors (e.g., the probability of the
human driver performing certain motion primitives); while
the randomness requirement 3) ensures a low probability for
a human driver to repeat any particular maneuver sequence.
Here, the human driver’s action in response to the au-
tomated car’s intention can be formulated as satisfying the
following specifications:
1)
Hard constraint
The human-driven cars must follow the acceleration and
steering limits while remaining on the road, that is,
a
h
[
k
]
U
,
x
h
[
k
]
X
,
k
0
.
(22)
This hard constraint does not require a human driver to guar-
antee his or her action will not lead to imminent collisions.
However, such faulty human behaviors has low probabilities
since most human drivers are able to select safe maneuvers
in normal driving conditions, as will be specified in the soft
constraints below.
2)
Soft constraints
Based on the limited rationale assumption [6], the cost op-
timization (20) is too resource-exhausting for human drivers.
Therefore, we assume human drivers associate maneuvers
generating high costs with low probability, i.e.,
P
(
C
h
[
k
]
<
C
th
)
p
th
,
k
0
,
(23)
where
C
h
[
k
]
is defined in (20),
C
th
is the cost threshold
below which human drivers are not sensitive to, and
p
th
is the corresponding probability threshold. Note that (23)
only considers the time horizon of one step ahead, which
corresponds to the findings in [16] that the look-ahead
horizon of human drivers are relatively short in uncertain
decision-making settings.
3)
Randomness
The randomness condition ensures richness in the gener-
ated actions of human drivers. It requires that a particular
sequence of motion primitives has low probability. This
condition is often satisfied by ensuring a sufficient number
of actions inside the hard and soft constraints, see [7] for
more details.
The principle of maximum entropy [15] can be used
to obtain an estimation of the functions
f
h1
(
·
)
,
f
h2
(
·
)
and
parameter
γ
a
in (18) based on driving data. However, it is
necessary to check that the trained distribution (21) satisfies
the soft constraint (23) under given traffic conditions, see
[12] for more details.
V. I
NTERACTIVE MOTION
-
PLANNING FRAMEWORK
In this section, we formulate the motion-planning frame-
work that exploits the interaction between the automated and
human-driven vehicles while providing robustness against
inaccuracies in the probabilistic human models. We also
sketch the value iteration method used to obtain an optimal
sequence of maneuvers in the interactive motion-planning
framework.
A. Probabilistic reach-avoid task
We model the motion of the human-robot system during
one reach-avoid task using a Markov decision tree
M
=
(
X
,
A
,
C
r
,
P
,
X
[
0
]
,
D
,
S
g
)
(24)
with the following components:
– the sample set
X
X
r
×
X
h
×
X
e
,
– the action set
A
A
r
×
A
h
,
C
r
(
X
,
a
)
is the instantaneous cost for the automated car
at state
X
with action
a
= (
a
r
,
a
h
)
,
– the transition probability
P
(
·|
X
,
a
)
for state
X
with
action
a
,
– the initial state
X
[
0
] =
[
x
e
[
0
]
,
x
h
[
0
]
,
x
r
[
0
]
]
,
– the maximum tree depth
D
,
– the accepting states
S
g
P
(
R
3
)
.
The termination conditions of the Markov decision tree are
1) if the maximum depth
D
is reached (
k
=
D
),
2) if the goal region is reached
x
r
[
k
x
r
[
0
]
,
a
r
[
i
]
]
S
g
,
(25)
where
i
=
0
,...,
k
1 for
k
<
D
, and
3) if a collision happens
O
(
x
r
[
k
]
,
a
r
[
k
])
(
O
(
x
h
[
k
]
,
a
h
[
k
])
O
(
x
e
[
k
]
,
a
e
[
k
])
)
6
=
,
(26)
where
O
(
x
[
k
]
,
a
[
k
])
with subscripts r, h, and e are the occu-
pancy set of the automated, interactive, and non-interactive
vehicles during the
k
th
maneuver, respectively. Therefore, the
size of the sample set
X
can be significantly smaller than
the product set
X
r
×
X
h
×
X
e
.
Based on the Markov model (24), we define the automated
car’s objective as to find an optimal maneuver sequence
a
r
that minimizes its conditional risk-at-value cost during the
reach-avoid task, that is,
a
r
=
arg min
a
r
(
A
r
)
k
CVaR
α
(
k
i
=
0
C
r
[
i
]
X
[
0
]
,
a
r
,
a
h
)
,
(27)
where CVaR
α
denotes the conditional value-at-risk at cau-
tion level
α
against the action distribution of interactive
vehicles (21). When the caution level is low (
α
0
+
),
the conditional value-at-risk CVaR
α
(
Z
)
approaches the mean
value
E
(
Z
)
. When the risk level is high (
α
1
), the
conditional value-at-risk CVaR
α
(
Z
)
approaches the worst-
case scenario max
(
Z
)
. In particular, CVaR
α
(
Z
)
is convex in
Z
and continuous with respect to the caution level
α
[10].
CVaR
α
(
Z
)
can be interpreted as the worst-case expectation
of
Z
when the distribution
P
(
·|
X
,
a
)
is perturbed
CVaR
α
(
Z
) =
max
ξ
E
(
α
,
P
)
E
ξ
[
Z
]
,
(28)
where
E
ξ
denotes the
ξ
weighed expectation of
Z
with the
risk envelop
E
(
α
,
P
) =
{
ξ
ξ
(
ω
)
[
0
,
1
1
α
]
,
ξ
(
ω
)
P
(
ω
)
d
ω
=
1
}
.
(29)
Therefore, (28) shows that the conditional value-at-risk can
be used to provide specific levels of robustness in the auto-
mated car’s optimal policy against the probabilistic model of
interactive vehicles.
Similar to most motion-planning methods in dynamic and
uncertain environments [17], the interactive motion-planning
framework (21,24,27) cannot guarantee absolute collision
avoidance. Instead, we may restrict the automated car’s
action set
{
a
r
[
i
]
}
at step
i
to the actions that do not lead to
imminent collisions against the most probable human action.
B. Value iteration
The optimal policy problem (21,27) can be solved using
the CVaR decomposition theorem in [11]. Denote the value
function at the state
s
X
and caution level
y
(
0
,
1
)
as
V
(
s
,
y
) =
min
a
A
CVaR
y
(
k
i
=
0
C
r
[
i
]
X
[
0
] =
s
,
a
)
(30)
and define the Bellman operator
T
[
V
](
s
,
y
) =
min
a
A
(
C
r
(
s
,
a
)+
max
ξ
E
(
y
,
P
(
·|
s
,
a
))
s
X
ξ
(
s
)
V
(
s
,
y
ξ
(
s
))
P
(
s
|
s
,
a
)
)
.
(31)
Then for all states
s
X
and caution level
y
(
0
,
1
)
, the
stationary point
V
(
s
,
y
)
of the Bellman operator
T
[
V
](
s
,
y
) =
V
(
s
,
y
)
(32)
is the optimal conditional value-at-risk
V
(
s
,
y
) =
min
a
r
(
A
r
)
k
CVaR
y
(
k
i
=
0
C
r
[
i
]
X
[
0
] =
s
,
a
r
,
a
h
)
,
(33)
with the optimal policy
a
r
[
k
] =
a
(
s
[
k
]
,
y
[
k
]
)
,
(34)
for the iterative caution level
y
[
k
] =
y
[
k
1
]
ξ
s
[
k
1
]
(
s
[
k
]
)
.
(35)
0
50
100
150
200
250
2
4
6
8
0
50
100
150
200
250
2
4
6
8
Fig. 5.
Nominal trajectories of the automated car (red circles) and the
human-driven car (blue cross) in high-speed driving under different caution
levels. Both longitudinal and lateral distances are in the unit of meters.
Upper panel: high caution level
α
=
0
.
9; Lower panel: low caution level
α
=
0
.
1;
The Bellman equation (31,32) can be solved using a value
iteration scheme as described in [11].
VI. C
ASE STUDIES
In this section, we consider a high-speed scenario and
a low-speed scenario for the interactive motion planning
of an automated vehicle changing into the left lane. For
simplicity, here we only consider one automated vehicle and
one interactive vehicle, while no non-interactive vehicles are
considered, cf.
x
r
and
x
h
in Fig. 4.
For the high-speed scenario, we consider the steady-state
speed for both the automated and interactive vehicle as
16 [m/s], while the time step for each maneuver (9) is 4
[s]. The automated car has 6 maneuvers that correspond to
{
accelerate in the lane, constant-speed in the lane, decelerate
in the lane, change the lane with acceleration, change the lane
with the same speed, change the lane with deceleration
}
.
The interactive vehicle has 3 maneuvers corresponding to
{
accelerate in the lane, constant-speed in the lane, decelerate
in the lane
}
. While the motion-planning framework allows
larger maneuver sets and more human-driven vehicles, to
demonstrate its benefits, we present this most simplified case.
In Fig. 5, we consider the case where the automated car’s
initial longitudinal position (red circle) is 8 [m] in front of
the human-driven vehicle (blue star). In the upper panel of
Fig. 5, the automated car has the high caution level
α
=
0
.
9. Therefore it plans against the most aggressive human
behavior, where the human driver will not slow down to let
him cut in front. Thus, the automated car uses two lane-
keeping maneuvers to increase its longitudinal distance from
the human-driven car, before eventually executing the lane-
change maneuver.
On the other hand, in the lower panel of Fig. 5, with the
low caution level
α
=
0
.
1, the automated car fully exploits
the understanding that the human driver is highly likely to
respond to the automated car’s behaviors so that no imminent
collision will happen (23). Therefore, the automated car
decides to make bolder actions, i.e., cutting in front of the
human-driven car at the first step.
0
50
100
150
200
250
300
2
4
6
8
0
50
100
150
200
250
300
2
4
6
8
Fig. 6.
Nominal trajectories of the automated car (red circles) and the
human-driven car (blue cross) in high-speed driving under different caution
levels. Both longitudinal and lateral distances are in the unit of meters.
Upper panel: high caution level
α
=
0
.
9; Lower panel: low caution level
α
=
0
.
1;
Fig. 6 shows a symmetric case, where the automated car
starts slightly behind the human-driven car. In the upper
panel, the higher caution level prompts the automated car
to employ several lane-keeping maneuvers over which its
distance to the human-driven car increases. However, for the
lower caution level, the automated car merges behind the
human-driven car immediately, fully exploit the belief that
the human driver is unlikely to brake heavily.
Indeed, planning with low caution levels requires high
confidence in the human interaction model (21), especially
when the automated car has higher priority (e.g., a police
car or ambulance), or when the gesture of the human driver
and the motion of the interactive vehicle indicate amicable
driving behavior. However, if the caution level is too low
compared with the accuracy of the interactive model, the
planned path may have to be terminated and re-planning
engaged.
We also note that this planning framework (21,27) can
be simplified to widely-used motion-planning methods such
as the A
algorithm, especially when the human-driven vehi-
cle’s model (21) is replaced with non-interactive assumptions
such as the constant-speed assumption. Without interactive
models, a planning algorithm can have difficulties generating
a path for merging or lane-changing in dense traffic.
Therefore, we consider a low-speed driving scenario where
the automated car needs to change to the right lane within
10 grids (with in a longitudinal distance of 80 [m]), see
Fig. 7. The left and middle panels show the cases when
the automated car has low caution level. In both cases, it
maintains the constant speed of 4 [m/s] for the first step while
indicating the lane-change intention. Then, as the human
driver reacts to such an action either by speeding up and
letting the automated car to merge behind it (left) or by
slowing down and letting the automated car to cut in front
of it (middle), the automated car decides to decelerate or
accelerate at the second step. However, when the caution
level is high (right panel), the automated car plans with
the maximum cost against human behaviors, and it decides
to accelerate considering the worst case where the human
driver would also accelerate. While the right panel shows one
1
2
0
2
4
6
8
10
1
2
0
2
4
6
8
10
1
2
0
2
4
6
8
10
Fig. 7.
State sequences of the automated car and the human-driven car in
low-speed driving under different caution levels. The longitudinal distance is
in the unit of 8 [m], while the lateral distance is in the unit of the lane width.
Left panel: Caution level
α
=
0
.
05, and the human driver does not wish to
let the automated car cut in front. Middle panel: Caution level
α
=
0
.
05 and
the human driver lets the robot to cut in front. Right panel: Caution level
α
=
0
.
95 where regardless of the human behavior, the automated car tries
to accelerate before lane-changing.
run where the lane-change task is successfully finished, not
utilizing the human response can lead to a larger probability
of not fulfilling the lane-change task.
Through these three simulations, we demonstrate the
flexibility the caution level
α
brings to interactive motion
planning. In particular, it explicitly balances between ex-
ploiting the friendly behavior of human drivers and retaining
robustness against inaccuracy in the interaction model, which
may not be achieved through many existing graph-search
methods.
VII. CONCLUSIONS
We proposed an interactive motion-planning framework
for automated vehicles when they share the road with human-
driven cars. We constructed maneuver automata to describe
the dynamics of both automated vehicles and human-driven
vehicles using motion primitives. Then we built a prob-
abilistic model to describe the action of human drivers
responding to the intention of an automated vehicle. We
used a conditional value-at-risk formulation to describe the
interactive motion-planning problem, exploiting the proba-
bilistic human response model while providing desired levels
of robustness against mismatches between the learned human
model and human driving behaviors. Finally, we demonstrat-
ed through simulations that the interactive motion-planning
strategy allows the behavior of an automated car to adapt to
different caution levels regarding the human model. In future
research, it is desirable to ensure probabilistic constraints on
guaranteed safety and analyze the influences on traffic flow.
R
EFERENCES
[1] D. Gonzlez, J. Prez, V. Milans, and F. Nashashibi, “A review of motion
planning techniques for automated vehicles,”
IEEE Transactions on
Intelligent Transportation Systems
, vol. 17, no. 4, pp. 1135–1145,
2016.
[2] E. Frazzoli, M. A. Dahleh, and E. Feron, “Maneuver-based motion
planning for nonlinear systems with symmetries,”
IEEE Transactions
on Robotics
, vol. 21, no. 6, pp. 1077–1091, 2005.
[3] B. Sch
̈
urmann and M. Althoff, “Convex interpolation control with
formal guarantees for disturbed and constrained nonlinear systems,” in
Proceedings of the 20th International Conference on Hybrid Systems:
Computation and Control
, 2017, pp. 121–130.
[4] B. Sch
̈
urmann, D. Heß, J. Eilbrecht, O. Stursberg, F. Kster, and
M. Althoff, “Ensuring drivability of planned motions using formal
methods,” in
Proceedings of the 20th International Conference on
Intelligent Transportation Systems (ITSC)
, 2017, pp. 1–8.
[5] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning
for autonomous cars that leverages effects on human actions,” in
Proceedings of the Robotics: Science and Systems Conference (RSS)
,
2016, pp. 66–73.
[6] T. L. Griffiths, F. Lieder, and N. D. Goodman, “Rational use of
cognitive resources: Levels of analysis between the computational and
the algorithmic,”
Topics in Cognitive Science
, vol. 7, no. 2, pp. 217–
229, 2015.
[7] D. J. Fremont, A. Donz
́
e, S. A. Seshia, and D. Wessel, “Control
improvisation,” in
35th IARCS Annual Conference on Foundation of
Software Technology and Theoretical Computer Science (FSTTCS)
,
vol. 45, 2015, pp. 463–474.
[8] A. Donz
́
e, R. Valle, I. Akkaya, S. Libkind, S. A. Seshia, and D. Wessel,
“Machine improvisation with formal specifications,” in
Proceedings of
the 40th International Computer Music Conference (ICMC)
, Septem-
ber 2014.
[9] I. Akkaya, D. Fremont, R. Valle, A. Donz
́
e, E. A. Lee, and S. A. Se-
shia, “Control improvisation for probabilistic temporal specifications,”
in
Proceedings of the 1st IEEE International Conference on Internet-
of-Things Design and Implementation (IoTDI)
, 2016, pp. 55–70.
[10] R. T. Rockafellar and S. Uryasev, “Optimization of conditional value-
at-risk,”
Journal of Risk
, vol. 2, pp. 21–41, 2000.
[11] Y. Chow, A. Tamar, S. Mannor, and M. Pavone, “Risk-sensitive and
robust decision-making: a cvar optimization approach,” in
Advances in
Neural Information Processing Systems 28
, C. Cortes, N. D. Lawrence,
D. D. Lee, M. Sugiyama, and R. Garnett, Eds.
Curran Associates,
Inc., 2015, pp. 1522–1530.
[12] J. I. Ge and R. M. Murray, “Voluntary lane-change policy synthesis
with control improvisation,” in
Proceedings of the IEEE 57th Interna-
tional Conference on Decision and Control
, 2018, pp. 1–8.
[13] M. Althoff, O. Stursberg, and M. Buss, “Model-based probabilistic
collision detection in autonomous driving,”
IEEE Transactions On
Intelligent Transportation Systems
, vol. 10, no. 2, pp. 299–310, 2009.
[14] D. Sadigh, K. Driggs Campbell, A. A. A. Puggelli, W. Li, V. Shia,
R. Bajcsy, A. L. Sangiovanni-Vincentelli, S. S. Sastry, and S. A. Seshi-
a, “Data-driven probabilistic modeling and verification of human driver
behavior,” EECS Department, University of California, Berkeley, Tech.
Rep. UCB/EECS-2013-197, Dec 2013.
[15] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey, “Maximum
entropy inverse reinforcement learning,” in
Proc. AAAI
, 2008, pp.
1433–1438.
[16] D. Carton, V. Nitsch, D. Meinzer, and D. Wollherr, “Towards assessing
the human trajectory planning horizon,”
PLOS ONE
, vol. 11, no. 12,
pp. 1–39, 2016.
[17] N. E. D. Toit and J. W. Burdick, “Robot motion planning in dynamic,
uncertain environments,”
IEEE Transactions on Robotics
, vol. 28,
no. 1, pp. 101–115, 2012.