Simultaneous Model Identification and Task Satisfaction
in the Presence of Temporal Logic Constraints
Sandeep P. Chinchali
Scott C. Livingston
Marco Pavone
Joel W. Burdick
Abstract— Recent proliferation of cyber-physical systems,
ranging from autonomous cars to nuclear hazard inspection
robots, has exposed several challenging research problems on
automated fault detection and recovery. This paper considers
how recently developed formal synthesis and model verification
techniques may be used to automatically generate information-
seeking trajectories for anomaly detection. In particular, we
consider the problem of how a robot could select its actions so as
to maximally disambiguate between different model hypotheses
that govern the environment it operates in or its interaction
with other agents whose prime motivation is a priori unknown.
The identification problem is posed as selection of the most
likely model from a set of candidates, where each candidate is
an adversarial Markov decision process (MDP) together with
a linear temporal logic (LTL) formula that constrains robot-
environment interaction. An adversarial MDP is an MDP in
which transitions depend on both a (controlled) robot action
and an (uncontrolled) adversary action. States are labeled,
thus allowing interpretation of satisfaction of LTL formulae,
which have a special form admitting satisfaction decisions in
bounded time. An example where a robotic car must discern
whether neighboring vehicles are following its trajectory for a
surveillance operation is used to demonstrate our approach.
I. INTRODUCTION
Autonomous robots programmed to carry out physical
tasks must often enter an information gathering mode in
order to acquire the physical data needed to complete a task,
such as manipulation of a novel object whose properties are a
priori unknown. Autonomous robots working in the presence
of humans or other autonomous agents must often determine
the intent of these agents: are they friendly, adversarial, or
neutral? Because of the importance of information seeking
behavior to many robotic tasks, there is a growing literature
on information theoretic methods to incorporate information
acquisition into the planning or action-selection process.
In the domain of mobile robot exploration and SLAM,
a number of works have explored how single robots or a
team of robots should select their exploring movements so
as to optimize a number of exploration criteria [5], [20],
[25]. Action selection for industrial assembly robots [6]
has also been considered. More recently there has been
work on “planning for sensing” in the context of dexterous
manipulation [11], [12], [22]. In general, these problems can
be seen as a type of system identification [19].
The novel contribution of this paper is the formulation
of a class of information-seeking problems in the language
of formal systems and model verification theory. In this
S.P. Chinchali and M. Pavone are with Stanford University, Stanford, CA, USA, {csandeep, pavone}@stanford.edu. S.C. Livingston and J.W. Burdick are with the California Institute of Technology, Pasadena, CA, USA, {slivingston@cds, jwb@robotics}.caltech.edu.
Fig. 1. Illustration of the car-following detection scenario. We control the green leading car and want to decide which of a set of models is consistent with motion of the following red car. Temporal logic formulae constrain the follower's response to actions of the leading car.
approach, high level task definitions (or “specifications”)
are used to automatically construct automata which form
a hybrid control system whose design guarantees that the
specification is faithfully executed. Formal methods offer
the promise of automated construction of high-level con-
trollers and strong guarantees on performance. Research
in computer-aided verification of software and embedded
systems has considered problems requiring synthesis of
automata from temporal logic formulae for several decades
[2], [10], [23], and the hybrid control community has adopted
these methods over the last decade [7], [9]. Only recently has
this approach gained attention in the robotics community [3],
where formal methods have been used to construct sensor-
based motion planning strategies [16], including finger-
gaiting algorithms [4].
A significant challenge of applying formal methods in
robotics is the prevalence of uncertainty. This includes sensor
noise, imperfect actuation, and changes in robot surround-
ings. There is a rapidly growing literature on treating formal
methods amid uncertainty. A well known stochastic model is
the Markov decision process, which can be naturally used if
probabilistic satisfaction of a task specification is allowed
[2]. A class of methods known to work well in practice
for constrained and uncertain problems is model predictive
control [21]; a similarly motivated method has recently been
explored in a temporal logic context by Ding et al. [8].
Sarid et al. propose a method for synthesizing a reactive
robot controller based on task specifications written in terms
of classes of objects that may be discovered online while
building a map [24]. Wongpiromsarn and Frazzoli model an
uncertain environment using switched Markov chains, where,
while a model of each chain is known, which one is active
is not known [26].
Recent work by Jones, Schwager, and Belta has addressed
synthesis of informative motion plans under temporal logic
constraints [14], [15] and anomaly detection from unsu-
pervised learning of formal specifications [13]. Rather than
focus on informative motion plans for monitoring tasks, we
instead consider control sequences designed to elicit informative responses from dynamic environments that may attempt
to obfuscate their identities. Such adversarial environments
may not immediately expose faults, and the problem of
anomaly detection depends on informative “probing” of
anomalous behavior to detect faults early and safely.
We study the problem of identifying which of several
candidate strategies an environment is following, where each
candidate is associated with a temporal logic specification.
The overall system specification also governs behaviors that
are permissible by the robot and its unknown and possibly
adversarial environment, and possibly a task to be completed
during the process. We provide an algorithm that formal-
izes how robots should visit states that distinguish between
candidate models, gather observations that disqualify wrong
candidates, and lead to information gain while adhering to
high-level temporal logic formulae.
This paper is organized as follows. Section II introduces
bounded time linear temporal logic (BLTL) as a high-
level specification language and adversarial Markov decision
processes to capture model uncertainty. Section III provides
the problem formulation of model identification coupled with
temporal logic constraints and introduces the motivating ex-
ample of detecting anomalous drivers in robotic navigation.
Section IV introduces our key contribution, an algorithm for
robotic inference with temporal logic constraints. Section V
provides a simulation of the algorithm for autonomous driv-
ing. Finally, Section VI summarizes our results and presents
directions for future research.
II. BACKGROUND
In this paper we consider discrete-time systems on finite
state spaces. Note that for more general control systems it
may be possible to construct a discrete abstraction [1], [18],
whence our analysis applies. The primary specification lan-
guage, which describes properties over finite time horizons,
is referred to as bounded linear temporal logic (BLTL) and
is defined below. The following running example, which is
illustrated in Figure 1, will be used throughout the paper.
Consider an autonomous vehicle being followed by other
cars on a several-lane highway. These cars can closely pursue
the robot, such as in a high-speed car chase, keep their
distance to “tail” the robot during surveillance, or simply be
benign (incidentally following the target). The robot must safely change lanes to avoid obstacles and determine whether a following car is tailing it, requiring strict adherence to temporal logic formulae throughout the identification process.
As the above example suggests, robots must often interact
with a dynamic environment so as to complete some task.
Our development models this interaction in two respects: as
a two-player game in which an adversarial strategy is to be
identified, and as a Markov decision process that captures
stochasticity of transitions among states.
A. Bounded linear-time temporal logic (BLTL)
In this paper we are concerned with modeling interaction
between a robot and an adversary over finite durations.
While transitions among states are modeled using so-called adversarial MDPs (AMDPs, defined below),
constraints on sequences of states are expressed using a
temporal logic that we now introduce.
Our goal is to describe properties about finite sequences
of states. To achieve this, we consider formulae that are built from an operator, $\mathcal{U}_I$, that describes constrained reachability over bounded time durations. The subscript $I = [a,b]$ is a bounded interval on the nonnegative integers $\mathbb{N}$. Note that because every interval $I$ is bounded, every such formula can be decided using a finite sequence of time steps. We refer to the formulae described here as bounded LTL, or BLTL.
More precisely, let $\Pi$ be a finite set of atomic propositions. Elements of $\Pi$ can be regarded as Boolean-valued variables that, at each discrete time, take either the value True or False. In the example of autonomous highway driving, atomic propositions associated with the robot include $Y = \{C_m\}_{m=1}^{M}$, which indicate the lane occupancy of the robot (car) in one of $M$ total lanes, where index $m$ refers to a specific lane. Correspondingly, there are atomic propositions associated with the (adversarial) environment, $X = \{F_m\}_{m=1}^{M}$, which indicate the lane occupancy of a following vehicle. The set $\{C_i, F_j\} \subset \Pi$ represents a state where $C_i$ and $F_j$ are both True while all other atomic propositions are False, which has the intuitive interpretation that the robot car occupies lane $i$ while the environment follower occupies lane $j$.
The syntax of BLTL formulae over interval $I$ is given by the context-free grammar
$$\varphi ::= \mathrm{True} \mid p \mid \neg\varphi \mid \varphi \wedge \varphi \mid \bigcirc\varphi \mid \varphi\, \mathcal{U}_I\, \varphi, \qquad (1)$$
where $p \in \Pi$ is an atomic proposition. Notice that (1) is essentially the basic syntax of LTL but with an interval associated with the $\mathcal{U}$ operator.
Before defining the semantics of BLTL, we fix notation about finite strings. Let $\Sigma$ be a set (or alphabet) over which string concatenation is defined, while $\Sigma^{+}$ denotes the set of all finite strings of length at least 1. For a string $\sigma = \sigma_0 \sigma_1 \cdots \sigma_n \in \Sigma^{+}$, a fragment from index $i$ through $j$, with $i < j$, is written $\sigma_{i:j} = \sigma_i \sigma_{i+1} \cdots \sigma_j$. If $j = n$, i.e., to express the suffix of $\sigma$ beginning at index $i$, we may abbreviate the subscript by writing $\sigma_{i:}$.
Take the alphabet to be subsets of atomic propositions, i.e., $\Sigma = 2^{\Pi}$. Let $t \in \mathbb{N}$. We denote the offset of $I = [a,b]$ from $t$ by $t + I = [a+t, b+t]$. Satisfaction of a BLTL formula $\varphi$ by a finite sequence $\sigma \in \Sigma^{+}$ beginning at time $t \in \mathbb{N}$, which is denoted by $\sigma_{t:} \models \varphi$, is defined inductively as follows.
1) $\sigma_{t:} \models \mathrm{True}$.
2) For a single atomic proposition $p \in \Pi$, $\sigma_{t:} \models p$ if and only if $p \in \sigma_t$.
3) $\sigma_{t:} \models \neg\varphi$ if and only if $\sigma_{t:} \not\models \varphi$.
4) $\sigma_{t:} \models \varphi_1 \wedge \varphi_2$ if and only if $\sigma_{t:} \models \varphi_1$ and $\sigma_{t:} \models \varphi_2$.
5) $\sigma_{t:} \models \bigcirc\varphi$ if and only if $\sigma_{(t+1):} \models \varphi$.
6) $\sigma_{t:} \models \varphi_1\, \mathcal{U}_I\, \varphi_2$ if and only if there exists some $j \in t + I$ so that $\sigma_{j:} \models \varphi_2$ and for all $i \in (t+I) \cap [0, j]$, $\sigma_{i:} \models \varphi_1$.
If $t = 0$, we also write $\sigma \models \varphi$. Other operators can be derived from those defined above. In particular, $\varphi \vee \psi \equiv \neg(\neg\varphi \wedge \neg\psi)$, $\varphi \Rightarrow \psi \equiv \neg\varphi \vee \psi$, $\Diamond_I\varphi \equiv \mathrm{True}\, \mathcal{U}_I\, \varphi$, and $\Box_I\varphi \equiv \neg\Diamond_I\neg\varphi$. Notice that $\sigma$ must be long enough to observe if a BLTL formula is satisfied. For any BLTL formula $\varphi$, denote by $T(\varphi)$ the minimum time at which the satisfaction of $\varphi$ by any $\sigma \in \Sigma^{+}$ can be decided. That is, $T(\varphi)$ is constant and associated with each $\varphi$. $T(\varphi)$ always exists and is finite; indeed, it is at most the sum of the upper bounds of all intervals appearing in $\varphi$.
BLTL can be used to specify robot tasks over bounded time durations. For example, we can achieve positional safety (remain within a set of certain "safe" states) with the operator $\Box_I$ and finite time responses with the operator $\Diamond_I$. In the autonomous driving example, safety formulae require cars to persist in the same lane or move only to an adjacent lane at the next time step. Such an example formula is $\bigwedge_m \Box_I \big( C_m \Rightarrow \bigcirc (F_{m-1} \vee F_m \vee F_{m+1}) \big)$.
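To make these semantics concrete, the following sketch evaluates a small BLTL fragment (atomic propositions, negation, conjunction, next, and bounded until) on a finite trace of label sets. It is an illustrative implementation of the inductive definition above, not code from the paper; the tuple encoding of formulae is our own convention, the trace is assumed long enough to decide the formula (at least $T(\varphi)$ steps), and the until clause uses the usual reading in which the left argument is required strictly before the witness index.

# Illustrative sketch of the BLTL semantics above (not from the paper).
# A trace is a list of sets of atomic propositions, sigma[t] = L(s_t).
# Formulae are nested tuples: ("ap", p), ("not", f), ("and", f, g),
# ("next", f), ("until", (a, b), f, g).  The constant True is Python True.

def holds(sigma, t, phi):
    """Return True iff the suffix of sigma starting at time t satisfies phi."""
    if phi is True:
        return True
    op = phi[0]
    if op == "ap":                      # atomic proposition p holds at time t
        return phi[1] in sigma[t]
    if op == "not":
        return not holds(sigma, t, phi[1])
    if op == "and":
        return holds(sigma, t, phi[1]) and holds(sigma, t, phi[2])
    if op == "next":
        return holds(sigma, t + 1, phi[1])   # requires the trace to be long enough
    if op == "until":                   # phi[2] U_[a,b] phi[3]
        a, b = phi[1]
        for j in range(t + a, t + b + 1):
            if holds(sigma, j, phi[3]):
                # left argument must hold at the interval offsets before j
                return all(holds(sigma, i, phi[2]) for i in range(t + a, j))
        return False
    raise ValueError("unknown operator: %r" % (op,))

# Derived operators: eventually and always over a bounded interval.
def eventually(interval, f):
    return ("until", interval, True, f)

def always(interval, f):
    return ("not", ("until", interval, True, ("not", f)))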
B. Labeled Markov decision processes
A labeled Markov chain $\mathcal{M}$ is a tuple $(S, \mathrm{Init}, P, \Pi, L)$ where $S$ is a finite set of states, $P : S \times S \to [0, 1]$ is a transition probability function, Init is a probability mass function over $S$, and $L : S \to 2^{\Pi}$ is a labeling that assigns subsets of atomic propositions in the set $\Pi$ to each state. The transition probabilities from a state $s$ to subsequent states $s'$ satisfy $\sum_{s' \in S} P(s, s') = 1$.
A labeled Markov decision process (MDP) is a tuple $(S, \mathrm{Init}, \mathrm{Act}, P, \Pi, L)$ where $S$, Init, $\Pi$, and $L$ are as for Markov chains, Act is a finite set of actions, and $P : S \times \mathrm{Act} \times S \to [0, 1]$ is a transition probability function.
As defined by Wongpiromsarn and Frazzoli [26], an adversarial MDP (AMDP) $\mathcal{M}$ is a tuple $(S, \mathrm{Init}, \mathrm{Act}_c, \mathrm{Act}_u, P, \Pi, L)$, where $S$ is a finite set of states, Init is a probability mass function over $S$, $\mathrm{Act}_c$ is a mapping from states into sets of controlled actions, $\mathrm{Act}_u$ is a mapping into sets of uncontrolled actions (or adversarial actions), $L : S \to 2^{\Pi}$ is a labeling function from states to atomic propositions, and $P : S \times \mathrm{Act}_c \times \mathrm{Act}_u \times S \to [0, 1]$ defines transition probabilities such that for each state $s \in S$, $a \in \mathrm{Act}_c(s)$, and $b \in \mathrm{Act}_u(s)$, $\sum_{s' \in S} P(s, a, b, s') = 1$. For each state $s \in S$, a controlled action $a$ (respectively, an adversarial action $b$) is said to be enabled at $s$ if $a \in \mathrm{Act}_c(s)$ (respectively, $b \in \mathrm{Act}_u(s)$).
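For concreteness, an AMDP of this kind can be stored with a handful of dictionaries; the sketch below is one such encoding, with field names and the transition-table layout being our own choices rather than anything prescribed by the paper.

# Minimal sketch of an adversarial MDP container (field names are ours).
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Tuple

State = Hashable
Action = Hashable

@dataclass
class AMDP:
    states: set                                  # finite state set S
    init: Dict[State, float]                     # probability mass function Init
    act_c: Dict[State, set]                      # controlled actions enabled at each state
    act_u: Dict[State, set]                      # uncontrolled (adversary) actions at each state
    # P[(s, a, b)] maps successor states s' to probabilities summing to 1
    P: Dict[Tuple[State, Action, Action], Dict[State, float]]
    label: Dict[State, FrozenSet[str]]           # labeling L : S -> 2^Pi

    def step_distribution(self, s, a, b):
        """Distribution over next states given state s, control a, adversary b."""
        return self.P[(s, a, b)]

    def check_stochastic(self, tol=1e-9):
        """Sanity check: every stored (s, a, b) row sums to 1."""
        for (s, a, b), row in self.P.items():
            assert abs(sum(row.values()) - 1.0) < tol, (s, a, b)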
C. BLTL satisfaction on AMDPs
We now define the reactive synthesis problem for BLTL
formulae on adversarial MDPs. Recall from the previous
section (Section II-B) that there is a joint sequence of states
that arise from policies (or strategies) for action selection
by the robot and the adversary. Each state in the sequence
is labeled with a set of atomic propositions from
Π
. Since
there is only one set of atomic propositions, our treatment
may superficially appear different from the original setting of
reactive synthesis for LTL proposed in [23], in which there
are distinct controlled and uncontrolled variables. However,
observe that we recover the usual setting of reactive synthesis
by partitioning the set of atomic propositions and construct-
ing an AMDP so that control actions $\mathrm{Act}_c$ can only affect values of one set of atomic propositions, and adversarial actions $\mathrm{Act}_u$ can only affect values of the other set.
Our development follows that of [26] except that we consider time-bounded properties, as defined in Section II-A. Let $\varphi$ be a BLTL formula over the set of atomic propositions $\Pi$, and let $\mathcal{M}$ be an adversarial MDP with states labeled by subsets of the same set of atomic propositions $\Pi$. A finite sequence of states $s_0 \cdots s_t$ of $\mathcal{M}$ is said to satisfy $\varphi$, denoted by $s_0 \cdots s_t \models \varphi$, if $L(s_0) \cdots L(s_t) \models \varphi$, where $L$ is the labeling of $\mathcal{M}$ that associates sets of atomic propositions with states. Thus, through the labeling we are able to decide satisfaction as defined in Section II-A.
We denote the set of finite state sequences of length at least one by $S^{+}$. Let $\mathcal{C} : S^{+} \to \mathrm{Act}_c$ be a control policy, and let $\mathcal{E} : S^{+} \to \mathrm{Act}_u$ be an adversary policy. From a state $s$ of the adversarial MDP, the probability of satisfying $\varphi$ is defined as
$$\Pr_{\mathcal{M}}^{\mathcal{C},\mathcal{E}}(s \models \varphi) = \sum_{s_0 \cdots s_{T(\varphi)} \in \mathrm{Sat}_s^{T(\varphi)}(\varphi)} \;\; \prod_{t=0}^{T(\varphi)-1} P_{\mathcal{C},\mathcal{E}}(s_0, \ldots, s_{t+1}), \qquad (2)$$
where we define the set of state sequences of length $T(\varphi)$ that begin at $s$ and that satisfy $\varphi$ as
$$\mathrm{Sat}_s^{T(\varphi)}(\varphi) = \left\{ s_0 \cdots s_{T(\varphi)} \in S^{T(\varphi)} \;\middle|\; s_0 = s \,\wedge\, s_0 \cdots s_{T(\varphi)} \models \varphi \right\}, \qquad (3)$$
and for any sequence of states $s_0 \cdots s_t$, we define
$$P_{\mathcal{C},\mathcal{E}}(s_0, \ldots, s_{t+1}) = P\big(s_t, \mathcal{C}(s_0 \cdots s_t), \mathcal{E}(s_0 \cdots s_t), s_{t+1}\big). \qquad (4)$$
Intuitively, $P_{\mathcal{C},\mathcal{E}}$ is just the transition probability function $P$ of $\mathcal{M}$ during application of the control policy $\mathcal{C}$ and the adversary policy $\mathcal{E}$. Recall from Section II-A that $T(\varphi)$ is a sufficient length to decide satisfaction of $\varphi$ by a sequence of sets of atomic propositions; hence it is also a sufficient length for sequences of states in the MDP.
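For small state spaces, (2)-(4) can be evaluated by brute-force enumeration of all length-$T(\varphi)$ state sequences under fixed policies. The sketch below does exactly that; it is illustrative only, and the `amdp` interface and the `satisfies` trace predicate are assumptions of ours, not the paper's code.

# Brute-force evaluation of Pr^{C,E}_M(s |= phi) from equations (2)-(4);
# practical only for small state spaces and horizons (interface names are ours).
from itertools import product

def satisfaction_probability(amdp, s, horizon, control, adversary, satisfies):
    """
    amdp:      object with P[(s, a, b)] -> {s': prob} and label[s] -> set of APs
    horizon:   T(phi), the decision horizon of the BLTL formula
    control:   C, maps a tuple of states (history) to a controlled action
    adversary: E, maps a tuple of states (history) to an adversarial action
    satisfies: predicate on a label sequence, implementing |= phi
    """
    total = 0.0
    for suffix in product(amdp.states, repeat=horizon):
        seq = (s,) + suffix                      # candidate sequence s_0 ... s_T(phi)
        prob = 1.0
        for t in range(horizon):
            hist = seq[: t + 1]
            a, b = control(hist), adversary(hist)
            prob *= amdp.P.get((seq[t], a, b), {}).get(seq[t + 1], 0.0)
            if prob == 0.0:
                break
        if prob > 0.0 and satisfies([amdp.label[x] for x in seq]):
            total += prob                        # sum over Sat_s^{T(phi)}(phi)
    return total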
III. PROBLEM FORMULATION
In this section we formulate the problem addressed in this paper. We first define the set of candidate environment models the robot must disambiguate, how the robot interacts with them, and the key assumptions placed on that interaction.
A. System-environment interaction model
The interaction between a controlled system and an adversarial environment is modeled as an adversarial MDP, as defined in Section II-B, with an associated temporal logic
formula the interaction must adhere to. We consider a set of
such adversarial MDPs corresponding to possible environ-
mental models. Henceforth, a candidate adversarial MDP, or
candidate MDP, refers to the interaction between the system
and a possible environment model as an adversarial MDP.
Consider a set
$$\mathcal{M} = \{(\mathcal{M}_i, \varphi_i)\}_{i=1}^{N} \qquad (5)$$
of $N$ candidate environment models. Each model $(\mathcal{M}_i, \varphi_i)$, for $1 \leq i \leq N$, is a pair consisting of an adversarial MDP $\mathcal{M}_i$ and a temporal logic specification $\varphi_i$. The formula is of a special form of LTL that describes interaction using BLTL subformulae. Precisely, $\varphi_i$ is
$$\bigwedge_r \Box \left( \psi^{\mathrm{req},r}_i \Rightarrow \psi^{\mathrm{res},r}_i \right), \qquad (6)$$
where $\psi^{\mathrm{req},r}_i$ and $\psi^{\mathrm{res},r}_i$ are both BLTL formulae, and $\Box$ is the "always" LTL operator. Often, $\psi^{\mathrm{req},r}_i$ is a system control sequence that guarantees a response from the environment in the form $\psi^{\mathrm{res},r}_i$. In the pursuant car example, we take $\psi^{\mathrm{req},1} = C_1$ and $\psi^{\mathrm{res},1} = \Diamond_{[0,2]} F_1$, which specifies that, if the robot car enters lane 1, it will observe the environment car follow it to lane 1 within 2 time steps. The precise syntax and semantics of LTL are not needed here; it is enough to know the BLTL preliminaries of Section II-A and to interpret (6) as requiring that the subformula $\psi^{\mathrm{req},r}_i \Rightarrow \psi^{\mathrm{res},r}_i$ is satisfied at every time step, for all $r$. The motivation for this assumed form of interaction formulae is that it allows us to solve the problem through repeated finite-duration interactions. The robot can probe for a particular behavior $r$ of model $i$ by satisfying $\psi^{\mathrm{req},r}_i$, after which we expect a trajectory satisfying $\psi^{\mathrm{res},r}_i$ in response, provided that model $i$ is the ground truth.
The adversarial MDP for model $(\mathcal{M}_i, \varphi_i)$ is defined as $\mathcal{M}_i = (S, \mathrm{Init}, \mathrm{Act}_c, \mathrm{Act}_u, P_i, \Pi, L)$. The adversarial MDPs for all models $i$ share the same state set $S$, initial state distribution Init, set of atomic propositions $\Pi$, labeling function $L : S \to 2^{\Pi}$, and system (controller) and environment (adversary) action sets $\mathrm{Act}_c$ and $\mathrm{Act}_u$. However, as indicated by the subscripts, the transition probabilities $P_i$ may be different. The assumption of common states, initial distributions, adversarial actions, atomic propositions, and labelings is without loss of generality because any differences only make the identification problem, stated below in Section III-E, easier.
A candidate model $(\mathcal{M}_i, \varphi_i)$ is said to be self-consistent if, for each $s \in S$ and for any robot controller $\mathcal{C}$, there is some adversarial policy $\mathcal{E}$ such that $\Pr_{\mathcal{M}_i}^{\mathcal{C},\mathcal{E}}(s \models \varphi_i) = 1$. The key intuition behind self-consistency is that, if model $\mathcal{M}_i$ is indeed the true model, the environment must always enact a strategy that is concordant with the true model's specification $\varphi_i$. Self-consistency is a natural requirement for systems that satisfy high-level constraints: the environment assumptions for the true model encoded in $\varphi_i$ must be adhered to in order to allow guarantees about interaction with the true model.
B. Distinguishability
Let $\mathcal{M} = \{(\mathcal{M}_i, \varphi_i)\}_{i=1}^{N}$ be a set of candidate models. For an LTL formula $\varphi$, $\mathcal{L}(\varphi)$ denotes the set of all sequences that satisfy $\varphi$. The models $(\mathcal{M}_i, \varphi_i)$ and $(\mathcal{M}_j, \varphi_j)$ are said to be interactively distinguishable if $\mathcal{L}(\varphi_i) \neq \mathcal{L}(\varphi_j)$. Let $s \in S$. The models are said to be distinguishable in probability from $s$ if
$$\min_{\mathcal{C}} \max_{\mathcal{E}} \Pr_{\mathcal{M}_i}^{\mathcal{C},\mathcal{E}}(s \models \varphi_i \wedge \varphi_j) < 1 \;\vee\; \min_{\mathcal{C}} \max_{\mathcal{E}} \Pr_{\mathcal{M}_j}^{\mathcal{C},\mathcal{E}}(s \models \varphi_i \wedge \varphi_j) < 1. \qquad (7)$$
If the property holds for all $s$, then the models are said to be distinguishable in probability.
Intuitively, being distinguishable in probability from some
state
s
ensures that a control policy exists such that the ad-
versary cannot simultaneously satisfy both candidate models
with probability 1. Accordingly, if we can repeatedly reach
s
and repeatedly play an appropriate policy, one of the models
must eventually be demonstrated as false.
C. Assumptions
We assume a finite set of self-consistent models $\mathcal{M} = \{(\mathcal{M}_i, \varphi_i)\}_{i=1}^{N}$ and fully observable system and robot states. The key assumption that facilitates model identification is that for each pair of models $i \neq j$, there is a state $s$ such that $(\mathcal{M}_i, \varphi_i)$ and $(\mathcal{M}_j, \varphi_j)$ are distinguishable in probability from $s$.
Such an assumption is justified because if models are
identical or are not perceivably different from any state, pur-
poseful robot motion towards distinguishing states
s
cannot
be made for model identification. Further, we assume each
adversarial MDP
M
i
has no transient states, namely that it
is possible with positive probability to reach every state from
every other state. Such an assumption is necessary to pre-
clude scenarios where the environment always prohibits the
controller from reaching a distinguishing state from which it
can differentiate models, in which case model identification
also cannot occur. Relaxation of such assumptions, including
addressing Hidden Markov Models (HMMs) and randomized
policies, is discussed in Section VI.
D. Belief distribution over models
The robot must identify the true model from other competing candidate models by judicious choice of information-seeking control actions in its system strategy or policy $\mathcal{C} : S^{+} \to \mathrm{Act}_c$. Such actions are control inputs that elicit responses from the environment model and exploit transition probabilities for the MDPs to reach states that disambiguate models.
The robot can differentiate environment models by estimating the likelihood that gathered observations came from the interaction between the robot and a possible environment model. Since observations may be consistent with many candidate models, we indicate the robot's current certainty at time $t$ about the true model identity in a belief distribution. The belief distribution is a probability distribution over candidate models that evolves with time $t$ as new observations are generated. Given a sequence of states, controlled actions, and adversarial actions, $s_0 a_0 b_0 s_1 a_1 b_1 s_2 \cdots s_T$, the belief for model $i$ is updated as
$$B_i \leftarrow P_i(s_0, a_0, b_0, s_1) \cdots P_i(s_{T-1}, a_{T-1}, b_{T-1}, s_T)\, B_i\, \eta,$$
where $\eta$ is a normalization factor to ensure that $\sum_j B_j = 1$.
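A minimal sketch of this update, assuming each candidate model exposes its transition function $P_i$ as a dictionary keyed by $(s, a, b)$, is given below; the handling of the degenerate case in which every likelihood vanishes is our own choice, not the paper's.

# Sketch of the belief update above (variable and field names are ours).
def update_belief(belief, models, trajectory):
    """
    belief:     list of current belief values B_i, one per candidate model
    models:     list of AMDPs; models[i].P[(s, a, b)] -> {s': prob}
    trajectory: list of observed transitions (s_t, a_t, b_t, s_{t+1})
    Returns the normalized posterior belief vector.
    """
    posterior = []
    for B_i, model in zip(belief, models):
        likelihood = 1.0
        for (s, a, b, s_next) in trajectory:
            likelihood *= model.P.get((s, a, b), {}).get(s_next, 0.0)
        posterior.append(likelihood * B_i)
    eta = sum(posterior)
    # If every model assigns zero probability, the observation is inconsistent
    # with all candidates; here we simply return the prior unchanged.
    return belief if eta == 0.0 else [p / eta for p in posterior]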
E. Problem statement
A control policy for the robot is a mapping from finite sequences of states to actions, $\mathcal{C} : S^{+} \to \mathrm{Act}_c$. Without loss of generality, we refer to the ground truth model as $\mathcal{M}_1$ and assume it is one of the candidate models.
Problem 1: Given a set of self-consistent candidate models $\mathcal{M} = \{(\mathcal{M}_i, \varphi_i)\}_{i=1}^{N}$, where each adversarial MDP $\mathcal{M}_i$ has no transient states and where for $i \neq j$ there is a state $s$ such that $(\mathcal{M}_i, \varphi_i)$ and $(\mathcal{M}_j, \varphi_j)$ are distinguishable in probability from $s$, and a tolerance $\varepsilon$ for wrong models, find a control policy such that the ground-truth model is identified with tolerance $\varepsilon$, i.e., the belief vector element $B_i \geq 1 - \varepsilon$, where $(\mathcal{M}_i, \varphi_i)$ is the ground truth.
F. Example: detecting pursuers in robotic navigation
We return to the running example of a robot vehicle being followed by other cars. These cars can closely pursue the robot, such as in a high-speed car chase, keep their distance to "tail" the robot during surveillance, or simply be a benign car that happens to follow the target by chance. In this paper, we perform numerical experiments for this example using the set of candidate models $\mathcal{M} = \{(\mathcal{M}_{\mathrm{pursue}}, \varphi_{\mathrm{pursue}}), (\mathcal{M}_{k\text{-bound}}, \varphi_{k\text{-bound}}), (\mathcal{M}_{\mathrm{benign}}, \varphi_{\mathrm{benign}})\}$ with $N = 3$ models.
Using the notation from the previous section, system variables $Y = \{C_1, \ldots, C_M\}$ indicate the lane occupancy of the controlled robot car in one of $M$ total lanes. Corresponding environment variables $X = \{F_1, \ldots, F_M\}$ indicate the lane occupancy of a following vehicle. A state of the system at discrete time $t$ is the lane occupancy of the robot and the follower vehicle, $s_t = (C_{m,t}, F_{m,t})$.
The robot must enact a strategy $\mathcal{C}$ (choose and follow a trajectory) to move between lanes in response to the follower's lane choices $F_m$. This strategy $\mathcal{C}$ must allow the robot to best determine the follower's identity as pursuant, $k$-bound, or benign.
The candidate models have interaction formulae as follows.
1) $\varphi_{\mathrm{benign}} = \mathrm{True}$, i.e., the benign car has no temporal logic constraint.
2) $\varphi_{k\text{-bound}} = \bigwedge_r \big( C_r \Rightarrow \Diamond_{I_k} (F_{r-k} \vee F_{r-k+1} \vee \cdots \vee F_{r+k}) \big)$.
3) $\varphi_{\mathrm{pursue}} = \bigwedge_r \big( C_r \Rightarrow \Diamond_{I_k} F_r \big)$.
The $k$-bound car must be within $k$ lanes of the robot car at all times and can only move one lane in each discrete timestep. BLTL is well suited to describe the $k$-bound behavior since the choice of a length-$k$ interval, denoted by $I_k$, indicates how quickly the follower must enter the same lane as the car. The requirement that the car must be within $k$ lanes, coupled with single-lane moves per timestep, imposes that the follower has $k$ timesteps to reach the robotic car's lane. Specifically, the request-response formula $\varphi = (C_m \Rightarrow \Diamond_{I_k} F_m)$ encodes that a system request (the car being in lane $m$, or $C_m$) necessitates a response in which the follower reaches lane $m$ within $k$ steps. We simulated with $k = 1$ lanes. Note that the "pursuer" is a $k$-bound car with $k = 0$.
IV. ALGORITHM FOR INTERACTION WITH BOUNDED LTL CONSTRAINTS
A. Description
We first establish notation used in the algorithms. Let $f$ and $g$ be two probability mass functions over the same sample space. Denote the Kullback-Leibler (KL) divergence [17] between them by $D(f, g)$. Given an index $i$, the set of indices excluding $i$ is denoted by $-i = \{1, 2, \ldots, n\} \setminus \{i\}$. For a state $s$, reference model $(\mathcal{M}_i, \varphi_i)$, and control policy $\mathcal{C}$, define the InfoGain for a robot action $a$ as
$$\min_{b \in \mathrm{Act}_u} \sum_{j \in -i} D\!\left(f_i^{s,a,b}, f_j^{s,a,b}\right), \qquad (8)$$
where, for control and adversarial actions $a$ and $b$, each $f_j^{s,a,b}(\cdot)$ is a probability mass function over the set of MDP states $S$ based on the transition probability of model $j$, i.e., $f_j^{s,a,b}(s') = P_j(s, a, b, s')$ for $s' \in S$.
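A direct implementation of the quantity in (8) might look like the following sketch. It is illustrative only; the dictionary-based transition representation and the handling of zero-probability entries are our own choices, and `amdps[j].P[(s, a, b)]` is assumed to map successor states to their probabilities.

# Sketch of the InfoGain criterion (8); conventions for zero entries are ours.
import math

def kl_divergence(f, g, states):
    """KL divergence D(f, g) between two pmfs given as dicts over `states`."""
    total = 0.0
    for s in states:
        p, q = f.get(s, 0.0), g.get(s, 0.0)
        if p > 0.0:
            total += math.inf if q == 0.0 else p * math.log(p / q)
    return total

def info_gain(amdps, i, s, a, act_u):
    """Value of (8): the adversary picks b to minimize the summed divergence
    between the next-state distribution of reference model i and all others."""
    states = amdps[i].states
    return min(
        sum(
            kl_divergence(amdps[i].P[(s, a, b)], amdps[j].P[(s, a, b)], states)
            for j in range(len(amdps)) if j != i
        )
        for b in act_u
    )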
We now present Algorithm 1, which chooses system control actions to maximally disambiguate models. Beginning with a set of candidate models and a uniform initial belief distribution, the robot assumes the most likely model $i$ (line 6) with the goal of gathering further evidence to substantiate model $i$. The robot considers possible models $j$ with nonzero belief elements in the set Candidates (line 7). From such candidates, the robot tries to find another model $j$ for which the assumed model $i$ and model $j$ are distinguishable in probability from state $s$, in the hope of observing a state sequence that disqualifies candidate model $j$ by violating its specification $\varphi_j$. The min-max formulation in line 9 indicates that the environment may choose an adversarial controller $\mathcal{E}$ to "obfuscate", or minimize, the observed deviation between models, but the system must choose the most informative controller $\mathcal{C}^*$ to embark on a finite-horizon trajectory from state $s$ that is least likely to be concordant with both the assumed model $i$ and the competing model $j$. If no such candidate model $j$ can be found, the robot instead chooses the optimal controller $\mathcal{C}^*$ that maximizes the resulting difference in state distributions, measured in terms of the Kullback-Leibler (KL) divergence, to enable maximum information gain (line 11).
Algorithm 1 Bounded LTL interaction
Require: Set of candidate models $\mathcal{M} = \{(\mathcal{M}_i, \varphi_i)\}_{i=1}^{N}$, wrong-model probability tolerance $\varepsilon$
1: $T := \max_j T(\varphi_j)$
2: Initialize at state $s$
3: $t := 0$ // Start time
4: $B := (1/N, \ldots, 1/N)$ // Uniform initial belief vector
5: repeat
6:   $i := \mathrm{rand\,arg\,max}_j\, B_j$ // Randomly break ties
7:   Candidates $:= \{ j \mid B_j > 0 \}$
8:   if for some $j \in$ Candidates, $(\mathcal{M}_j, \varphi_j)$ and $(\mathcal{M}_i, \varphi_i)$ are distinguishable in probability from $s$ then
9:     $\mathcal{C}^* := \arg\min_{\mathcal{C}} \max_{\mathcal{E}} \Pr_{\mathcal{M}_j}^{\mathcal{C},\mathcal{E}}(s \models \varphi_i \wedge \varphi_j)$
10:  else
11:    $\mathcal{C}^* := \mathrm{InfoGain}(s, B, \{\mathcal{M}_1, \ldots, \mathcal{M}_N\})$
12:  end if
13:  Apply controller $\mathcal{C}^*$.
14:  Receive resulting sequence of states $s_1 \cdots s_T$ and observe adversary actions $b_0, \ldots, b_{T-1}$
15:  For each $j$, if $s_0 \cdots s_T \not\models \varphi_j$, then $B_j := 0$
16:  Update belief vector using transition probabilities for $(s_0, \mathcal{C}^*(s_0), b_0, s_1), (s_1, \mathcal{C}^*(s_0 s_1), b_1, s_2), \ldots$
17:  Normalize belief vector so that $\sum_{j=1}^{N} B_j = 1$
18:  $t := t + 1$; $s := s_T$
19: until for some model $(\mathcal{M}_i, \varphi_i)$, $B_i \geq 1 - \varepsilon$
We note one example of the InfoGain function in terms of the KL divergence, but our algorithm is general and other measures of information gain are possible. The novelty of our approach lies in the exploitation of distinguishing states based on LTL formulae.
Once the policy given by the optimal controller $\mathcal{C}^*$ is enacted and the observations $b_0, \ldots, b_{T-1}$ are gathered, the robot can immediately discount models $j$ for which the observed transitions do not satisfy the model's BLTL specification $\varphi_j$ (line 15). Such transitions that violate BLTL specifications are inherently encoded as zero-probability transitions in the model's adversarial MDP. Then, Bayesian updates of the belief vector given the observations guide the choice of the most likely model at the next algorithm iteration (line 16).
Algorithm 2 is a greedy version of Algorithm 1 with single-timestep iterations in which temporal logic formulae are disregarded in the selection of control actions, potentially allowing for much faster computation. At each iteration, the robot greedily selects a single optimal control action $a^*$ that, regardless of the adversarial environment action $b$, maximizes the KL divergence between the distributions over the next state $s'$ under model $i$ and under competing models $j$. We note that in the single-timestep case, an example of the InfoGain routine is to maximize the cumulative KL divergence under the assumed model $i$ as opposed to the other models $j$.
Algorithm 2 Greedy one-step interaction
Require: Wrong-model probability tolerance $\varepsilon$
1: Initialize at state $s$
2: $t := 0$ // Start time
3: $B := (1/N, \ldots, 1/N)$ // Uniform initial belief vector
4: repeat
5:   $i := \mathrm{rand\,arg\,max}_j\, B_j$ // Randomly break ties
6:   $a^* := \arg\max_{a \in \mathrm{Act}_c} \min_{b \in \mathrm{Act}_u} \sum_{j \in -i} D\!\left(f_i^{s,a,b}, f_j^{s,a,b}\right)$
7:   Apply controller action $a^*$
8:   Receive resulting new state $s'$
9:   Update belief vector using transition probabilities for $(s, a, b, s')$
10:  Normalize belief vector so that $\sum_{j=1}^{N} B_j = 1$
11:  $t := t + 1$; $s := s'$
12: until for some model $\mathcal{M}_i$, $B_i \geq 1 - \varepsilon$
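A compact sketch of the main loop of Algorithm 2 is given below. It is not the authors' implementation: the environment interface `env.step` and the helper routines `info_gain` (as sketched after (8)) and `update_belief` (as sketched in Section III-D) are hypothetical names passed in as parameters.

# Sketch of Algorithm 2's main loop; the environment interface and the
# helper names (info_gain, update_belief) are hypothetical, not from the paper.
import random

def greedy_identify(amdps, env, s, act_c, act_u, info_gain, update_belief, eps=0.0):
    N = len(amdps)
    belief = [1.0 / N] * N                       # uniform initial belief vector
    while max(belief) < 1.0 - eps:
        # Assume the currently most likely model, breaking ties randomly (line 5)
        best = max(belief)
        i = random.choice([j for j, B in enumerate(belief) if B == best])
        # Greedy action: maximize the worst-case summed KL divergence (line 6)
        a_star = max(act_c, key=lambda a: info_gain(amdps, i, s, a, act_u))
        # Apply the action; observe the adversary's action and the next state
        b, s_next = env.step(s, a_star)
        belief = update_belief(belief, amdps, [(s, a_star, b, s_next)])
        s = s_next
    return belief.index(max(belief))             # index of the identified model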
B. Correctness analysis
In this section, we present a correctness result about
Algorithm 1 that provides conditions for convergence to the
ground-truth model when the candidate models are mutually
distinguishable in probability. In this case, we take $\varepsilon = 0$ because violating traces, i.e., finite sequences of states, will eventually occur, causing the corresponding elements of the belief vector to be assigned exactly 0 (at line 15).
Theorem 1: Let $\mathcal{M} = \{(\mathcal{M}_i, \varphi_i)\}_{i=1}^{N}$ be a set of self-consistent candidate models such that for each pair $(i, j)$, $(\mathcal{M}_i, \varphi_i)$ and $(\mathcal{M}_j, \varphi_j)$ are distinguishable in probability. Suppose that the ground-truth model is in $\mathcal{M}$ (but unknown). If $\varepsilon = 0$, then Algorithm 1 will terminate with probability 1 and return the ground-truth model.
Proof: In each iteration, line 17 ensures that the elements of $B$ sum to 1 before the exit condition $B_i \geq 1 - \varepsilon$ (line 19) is checked. Thus, if $\varepsilon = 0$, the exit condition can be rewritten as $B_i = 1$ for some $i$. Consider an iteration of the loop in Algorithm 1 (lines 5-19). Let $i$ be selected as in line 6 of Algorithm 1. From the exit condition on line 19, it follows that $|\mathrm{Candidates}| \geq 2$ is a loop invariant. By hypothesis each pair of candidate models is distinguishable in probability, and therefore the condition on line 8 must be satisfied. Let $\mathcal{C}^*$ be selected according to line 9. From the definition of distinguishable in probability (see (7)), we have for any adversarial policy $\mathcal{E}$,
$$\Pr_{\mathcal{M}_i}^{\mathcal{C}^*,\mathcal{E}}(s \models \varphi_i \wedge \varphi_j) < 1 \;\vee\; \Pr_{\mathcal{M}_j}^{\mathcal{C}^*,\mathcal{E}}(s \models \varphi_i \wedge \varphi_j) < 1.$$
Suppose the left disjunct, in which the probability is taken with respect to $\mathcal{M}_i$, holds, i.e., $\Pr_{\mathcal{M}_i}^{\mathcal{C}^*,\mathcal{E}}(s \models \varphi_i \wedge \varphi_j) < 1$. The case where the right disjunct holds is entirely similar. Hence,
$$\Pr_{\mathcal{M}_i}^{\mathcal{C}^*,\mathcal{E}}(s \models \varphi_i) < 1 \;\vee\; \Pr_{\mathcal{M}_i}^{\mathcal{C}^*,\mathcal{E}}(s \models \varphi_j) < 1. \qquad (9)$$
There are now two cases. First, if $(\mathcal{M}_i, \varphi_i)$ is the ground truth, then the adversary must play such that $\Pr_{\mathcal{M}_i}^{\mathcal{C}^*,\mathcal{E}}(s \models \varphi_i) = 1$, which is possible by the assumption of self-consistency. Thus the other disjunct in (9) must be true, i.e.,
$$\Pr_{\mathcal{M}_i}^{\mathcal{C}^*,\mathcal{E}}(s \models \varphi_j) < 1. \qquad (10)$$
Fig. 2.
Trial in which the
pursuant
car model is the ground truth.
Simulation is with 5 lanes. Notice that the mass in the belief vector quickly
accumulates on the correct model, and the benign model is dismissed due
to violation within a few trials.
Second, if $(\mathcal{M}_i, \varphi_i)$ is not the ground truth, then we may conservatively suppose that the adversary plays so as to ensure that every $\varphi_j$ is satisfied, and thus that no update occurs on line 15. From line 6, it follows that the first case, i.e., where $i$ is the ground truth, will occur infinitely often because the corresponding element of the belief vector remains strictly positive. Because there are finitely many candidate models, at least one of them, say $j$, will be chosen infinitely often. Let $p$ be the probability on the left side of the inequality in (10). After $n$ iterations of choosing model $(\mathcal{M}_j, \varphi_j)$, the probability that $\varphi_j$ is satisfied in all iterations is $p^n$. Since $p < 1$, $\lim_{n \to \infty} p^n = 0$, i.e., the event of all iterations having state sequences that satisfy $\varphi_j$ has probability 0. Therefore, with probability 1 there is an iteration in which the resulting state sequence does not satisfy $\varphi_j$. Upon that iteration, at line 15, $B_j = 0$, and therefore the candidate model $(\mathcal{M}_j, \varphi_j)$ cannot be in Candidates again. Because there are finitely many candidate models, the result follows.
We note that Algorithm 1 can also handle the case when
models are not distinguishable in probability. In this case,
the
InfoGain
subroutine on line 11 would select control
actions to maximally differentiate models. The analysis and
correctness proof for this case is the subject of future work.
V. NUMERICAL EXPERIMENTS
We return to the pursuant car example from Section III-F
to illustrate that Algorithm 1 can deduce how the robot car
must change lanes to distinguish pursuers. The simulations
provide intuition to understand the proof in Section IV-B.
In our implementation of Algorithm 1, we take as input the BLTL candidate models and an environment simulator for the ground-truth model, which selects among permissible moves (those that respect the BLTL specifications).
The adversarial MDP in this example is formed as a
product of two MDPs, each having a state space that is just a
chain graph where each state represents a lane of the road.
Fig. 3. Trial in which the $k$-bound car model is the ground truth, where $k = 1$. Simulation is with 5 lanes.
There are two agents: the robot and another car. Intuitively, we want to construct a useful description of whether the car is following the robot and, if so, to find a concise model of the behavior that is amenable to other control synthesis methods. Here, deciding qualitatively how the other car (the environment) is moving can be thought of as "fault detection." For each agent, from every position there are three possible actions: stay in the current lane, move left, and move right. Associated with each agent is a parameter $p \in [0, 1]$ that provides the probability that the requested action fails. If attempting to stay in the current lane, with probability $\frac{p}{2}$ the outcome is to move into the lane to the left of the current one, and with probability $\frac{p}{2}$ the outcome is to move into the right lane. If attempting to move to the left, with probability $p$ the outcome is to stay in the current lane. If attempting to go right, with probability $p$ the outcome is to stay in the current lane. If the agent is in the left-most (respectively, right-most) lane, then the action to move to the left (respectively, to the right) always (i.e., with probability 1) results in the agent staying in its current lane. In the left-most (respectively, right-most) lane, the action to stay in the current lane fails with probability $\frac{p}{2}$, with the outcome being a move to the right (respectively, left). This assumption causes the action sets $\mathrm{Act}_c$ and $\mathrm{Act}_u$ to be the same at every state.
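The per-agent lane dynamics just described can be written down directly. The sketch below builds the next-lane distribution for a single agent with slip probability $p$, handling the boundary lanes as in the text; the function name and lane numbering are our own.

# Sketch of the per-agent lane transition model described above (names ours).
def lane_transition(lane, action, p, num_lanes=5):
    """Return {next_lane: probability} for one agent with failure probability p.
    Actions: 'stay', 'left', 'right'. Lanes are numbered 0 .. num_lanes-1."""
    left = max(lane - 1, 0)
    right = min(lane + 1, num_lanes - 1)
    if action == "stay":
        # Stay succeeds with probability 1-p; failure slips left or right (p/2 each).
        # At a boundary, the impossible slip direction collapses back onto 'stay'.
        dist = {lane: 1.0 - p}
        dist[left] = dist.get(left, 0.0) + p / 2.0
        dist[right] = dist.get(right, 0.0) + p / 2.0
    elif action == "left":
        if lane == 0:
            dist = {lane: 1.0}                   # boundary: always stays put
        else:
            dist = {left: 1.0 - p, lane: p}      # failure means staying in lane
    elif action == "right":
        if lane == num_lanes - 1:
            dist = {lane: 1.0}
        else:
            dist = {right: 1.0 - p, lane: p}
    else:
        raise ValueError(action)
    return dist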
We ran the simulation on a road with 5 lanes. The value of the failure probability varied depending on the candidate MDP. We use $p = 0.5$ for the benign car, $p = 0.2$ for the $k$-bound car, and $p = 0.0$ for the pursuant car, to describe a scenario where the pursuer follows the robotic car without failure. It is important to observe that, while it is clear that $\varphi_{\mathrm{benign}} = \mathrm{True}$ permits emulation of any other candidate model, the failure probabilities ensure that the models can be distinguished.
A trial in which the ground truth is the pursuant car is shown in Figure 2. The pursuant car has very restrictive behavior since it must always follow the robot, allowing it to be quickly differentiated from a benign car; this explains the extremely quick, monotonic convergence in belief illustrated in Figure 2. A trial in which the ground truth is the $k$-bound car with $k = 1$ is shown in Figure 3. Since the $k$-bound car need not immediately pursue the robot, it can be harder to distinguish from other models, explaining the slower convergence in Figure 3.
VI. CONCLUSIONS AND FUTURE WORK
A general problem has been posed for identification
of a model described by both dynamics and a temporal
logic specification. A practical method for purposeful robot movement toward distinguishing states, all while satisfying feasible candidate task specifications, was presented, and its convergence was proven for a class of probabilistic environments.
In future work, we plan to address the optimality of our
algorithm and consider variations featuring robust MDPs
and chance-constrained MDPs. Further, we will extend the
problem to the case of hidden environment states, necessi-
tating the use of Hidden Markov Models (HMMs), and also
consider randomized strategies.
To address the less restrictive problem settings described
above, we will consider several heuristics and algorithm
optimality/time-complexity. Even in the current algorithm,
one can employ heuristics to hasten convergence. For exam-
ple, in line 8 of Algorithm 1, if several candidate models
j
exist, we can select the controller which minimizes the
probability in line 9 over all contending models. We can also
exploit the bounded nature of BLTL to select the candidate $j$ that minimizes the time $T(\varphi_i)$ or $T(\varphi_j)$ required to distinguish between models $i$ and $j$, and only apply the controller for this minimal duration in line 13.
Our work is a small but exciting step towards the am-
bitious goal of automated fault detection and recovery for
cyber-physical systems. Reaching this goal may one day
allow, for example, detection of hacked cars in autonomous
transportation networks or compromised drones in networks
of aerial vehicles. Our work offers the valuable insight that increasingly complex cyber-physical systems may not immediately exhibit key faults, requiring future robots to synthesize informative controllers that dynamically interact with adversarial environments to expose anomalies.
ACKNOWLEDGMENTS
This work was partially supported by United Technologies
Corporation and IBM, through the industrial cyberphys-
ical systems (iCyPhy) consortium, and by NASA under
the Space Technology Research Grants Program, Grant
NNX12AQ43G. S.P. Chinchali is supported by a Stanford
Graduate Fellowship and National Science Foundation Fel-
lowship. The authors thank Richard M. Murray for comments
on an early draft.
REFERENCES
[1] R. Alur, T. A. Henzinger, G. Lafferriere, and G. J. Pappas. Discrete
abstractions of hybrid systems.
Proc. of the IEEE
, 88(7):971–984,
July 2000.
[2] C. Baier and J.-P. Katoen.
Principles of Model Checking
. MIT Press,
2008.
[3] C. Belta, A. Bicchi, M. Egerstedt, E. Frazzoli, E. Klavins, and G. J.
Pappas. Symbolic planning and control of robot motion: Finding
the missing pieces of current methods and ideas.
IEEE Robotics &
Automation Magazine
, pages 61–70, March 2007.
[4] S. Chinchali, S. C. Livingston, U. Topcu, J. W. Burdick, and R. M.
Murray. Towards formal synthesis of reactive controllers for dexterous
robotic manipulation. In
Proc. of International Conference on Robotics
and Automation (ICRA)
, pages 5183–5189, Saint Paul, Minnesota,
USA, May 2012.
[5] T. H. Chung and J. W. Burdick. A decision-making framework for
control strategies in probabilistic search. In
Proc. of International
Conference on Robotics and Automation (ICRA)
, pages 4386–4393,
Roma, Italy, April 2007.
[6] R. Cole and C. K. Yap. Shape from probing.
Journal of Algorithms
,
8:19–38, March 1987.
[7] J. M. Davoren and A. Nerode. Logics for hybrid systems.
Proc. of
the IEEE
, 88(7):985–1010, July 2000.
[8] X. C. Ding, C. Belta, and C. G. Cassandras.
Receding horizon
surveillance with temporal logic specifications. In
49th IEEE Conf.
on Decision and Control (CDC)
, pages 256–261, December 2010.
[9] G. E. Fainekos, A. Girard, and G. J. Pappas. Hierarchical synthesis
of hybrid controllers from temporal logic specifications. In
Proc. of
Hybrid Systems: Computation and Control (HSCC)
, 2007.
[10] E. Grädel, W. Thomas, and T. Wilke (Eds.). Automata, Logics, and Infinite Games: A Guide to Current Research, volume 2500 of Lecture Notes in Computer Science. Springer, 2002.
[11] P. Hebert, J. W. Burdick, T. Howard, N. Hudson, and J. Ma. Action
inference: the next best touch. In
Proc. of ICRA, submitted
, 2013.
[12] K. Hsiao, L. P. Kaelbling, and T. Lozano-Pérez. Task-driven tactile
exploration. In
Proc. of Robotics: Science and Systems
, Zaragoza,
Spain, June 2010.
[13] A. Jones, Z. Kong, and C. Belta. Anomaly detection in cyber-physical
systems: A formal methods approach. In
Proc. of the 53rd IEEE
Conference on Decision and Control (CDC)
, pages 848–853, 2014.
[14] A. Jones, M. Schwager, and C. Belta. A receding horizon algorithm
for informative path planning with temporal logic constraints. In
Proc.
of 2013 IEEE International Conference on Robotics and Automation
(ICRA)
, pages 5019–5024. IEEE, 2013.
[15] A. Jones, M. Schwager, and C. Belta. Information-guided persistent
monitoring under temporal logic constraints. In
American Control
Conference (ACC), 2015
, pages 1911–1916. IEEE, 2015.
[16] H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas. Temporal logic-
based reactive mission and motion planning.
IEEE Trans. on Robotics
,
25(6):1370–1381, 2009.
[17] S. Kullback and R. A. Leibler. On information and sufficiency.
The
annals of mathematical statistics
, pages 79–86, 1951.
[18] J. Liu and N. Ozay. Abstraction, discretization, and robustness in
temporal logic control of dynamical systems. In
Proceedings of Hybrid
Systems: Computation and Control (HSCC)
, pages 293–302, April
2014.
[19] L. Ljung.
System Identification: Theory for the User
. Prentice-Hall,
1987.
[20] J. Ma.
Real-Time Applications of 3D Object Detection and Tracking
.
PhD thesis, Caltech, Pasadena, CA, USA, 2010.
[21] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Con-
strained model predictive control: stability and optimality.
Automatica
,
36:789–814, 2000.
[22] R. Platt, L. Kaelbling, T. Lozano-Perez, and R. Tedrake. Simultaneous
localization and grasping as a belief space control problem. In
15th
Int. Symposium on Robotics Research (ISRR)
, Flagstaff, Arizona, USA,
August 28–September 1 2011.
[23] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In
Proc. of the 16th ACM SIGPLAN-SIGACT Symposium on Principles
of Programming Languages
, POPL ’89, pages 179–190, New York,
NY, USA, 1989. ACM.
[24] S. Sarid, B. Xu, and H. Kress-Gazit. Guaranteeing high-level behaviors
while exploring partially known maps. In
Proc. of Robotics: Science
and Systems
, Sydney, Australia, July 2012.
[25] S. Thrun, W. Burgard, and D. Fox.
Probabilistic Robotics
. The MIT
Press, 2006.
[26] T. Wongpiromsarn and E. Frazzoli. Control of probabilistic systems
under dynamic, partially known environments with temporal logic
specifications. In
Proc. of CDC
, pages 7644–7651, December 2012.