Inverse Abstraction of Neural Networks Using Symbolic Interpolation

Sumanth Dathathri¹, Sicun Gao², Richard M. Murray¹
¹ Computing and Mathematical Sciences, California Institute of Technology
² Computer Science and Engineering, University of California, San Diego
Correspondence: sdathath@caltech.edu, sicung@ucsd.edu. The work is supported by DARPA Assured Autonomy, NSF CNS-1830399 and the VeHICaL project (NSF grant #1545126).
Abstract

Neural networks in real-world applications have to satisfy critical properties such as safety and reliability. The analysis of such properties typically involves extracting information through computing pre-images of neural networks, but it is well-known that explicit computation of pre-images is intractable. We introduce new methods for computing compact symbolic abstractions of pre-images. Our approach relies on computing approximations that provably overapproximate and underapproximate the pre-images at all layers. The abstraction of pre-images enables formal analysis and knowledge extraction without modifying standard learning algorithms. We show how to use inverse abstractions to automatically extract simple control laws and compact representations for pre-images corresponding to unsafe outputs. We illustrate that the extracted abstractions are often interpretable and can be used for analyzing complex properties.
1 Introduction

Neural networks have shown significant potential as a key building block of intelligent systems. However, a major challenge to their real-world application is that it is extremely difficult to analyze their behaviors and guarantee critical properties, such as the absence of catastrophic corner cases. Recent work on verifying neural networks, such as (Zakrzewski 2001; Bunel et al. 2018; Katz et al. 2017), has focused on developing faster algorithms for validating or falsifying formal properties of whole neural networks directly through their encoding as constraint satisfaction problems. These approaches are designed for generating counter-examples to (or for verifying) specific properties of piecewise-linear neural networks. An alternative approach for analysis is to decompose large networks and perform the analysis in a modular way, which is standard practice in software program analysis (Hoare 1969). Such decomposition often provides more information than simple monolithic analysis and enables us to verify complex properties in a tractable manner. A crucial requirement for enabling such modular analysis is that we must be able to represent and manipulate pre-images of programs, or computable functions in general. For neural networks, this translates to being able to propagate sets of the network outputs backwards through individual neural layers (as real-valued functions), eventually to the input domain. However, this is in principle harder than direct constraint solving, because of the requirement of representing and manipulating high-dimensional geometric shapes that often do not have polynomial-size representations. Thus, the important question is how to efficiently compute approximate representations (abstractions) of such pre-image sets, so that they are both compact and precise enough for enabling formal analysis, interpretation and knowledge/policy extraction.
In this paper, we develop algorithms for computing symbolic abstractions of pre-images of neural networks. We bypass the difficulty of representing the exact pre-images by maintaining both overapproximations and underapproximations that can be compactly represented as symbolic constraints. We leverage a recent algorithm for computing symbolic interpolants (Albarghouthi and McMillan 2013), where an extension of Farkas' lemma is used to learn interpolants that have simple structures. The techniques are applicable because the concepts learned by neural networks are often simple (Ba and Caruana 2014). We exploit the network structure and propagate pre-images of subsets of the output space through each layer to the input space. We enhance the scalability of the algorithms on piecewise-linear neural networks by designing heuristics for the specific symbolic forms of the abstractions.
In experiments, we focus on knowledge/policy analysis and extraction for two well-known control environments: cart-pole and swimmer. We show that for multilayer perceptron (MLP) network policies trained through standard reinforcement learning algorithms, we can extract knowledge in the form of compact abstractions. For cart-pole, the extracted policy achieves a perfect score. Using this extracted policy, we are able to formally verify certain complex safety properties. For swimmer, we show how high-torque outputs are mapped to a compact representation in the input space. We believe these techniques will be important for analyzing learning-enabled components in practical control applications.
Related Work

Model Extraction: There has been recent work in extracting verifiable and explainable models from trained networks. In this direction, (Bastani, Pu, and Solar-Lezama 2018) introduce an imitation-learning-based approach to generate data to train a decision tree that allows for easier verification. In (Verma et al. 2018), the authors use a trained neural network to guide a local search over programmatic policies that are human-readable and more verifiable than the complex neural networks themselves. In contrast to both of the above approaches, which focus on reinforcement learning, our work introduces a general tool for the analysis of neural networks by computing abstractions. Additionally, our approach does not rely on learning simpler models or modifying the training process, and extracts symbolic abstractions completely algorithmically for the original neural network. In (Mahendran and Vedaldi 2015), the authors propose a framework for inverting representations: they reconstruct the input image given an encoding of the image. This is different from our work, where given a set of outputs we compute (approximately) the entire pre-image. Interpolants have frequently been used to help improve the scalability of software verification, automated theorem proving, and constraint solving (Bonacina and Johansson 2015; Dathathri et al. 2017; Kroening and Weissenbacher 2011).
Input-Output Abstraction: There has been considerable progress in the direction of computing forward abstractions, i.e., ranges of outputs for a given set of inputs, and using them to verify simple reachability/robustness properties. In (Wang et al. 2018), symbolic-interval analysis coupled with heuristics for tightening the computed bounds is used to compute approximate output sets corresponding to a set of box-constrained inputs. In (Gehr et al. 2018), in a similar spirit, abstract interpretation is leveraged to compute approximate output sets for a given set of inputs. Both of these abstraction-based approaches are able to certify robustness/reachability properties significantly faster than solver-based approaches such as (Katz et al. 2017). However, reasoning about complex properties (e.g., general specifications in linear arithmetic over the inputs and outputs of the neural network) is still not possible with these output abstractions. Additionally, these abstractions are not well suited for policy/knowledge extraction. For instance, for a cart-pole controller, it is not directly possible to answer the question "What makes the cart go left?" with the above techniques. In contrast, our inverse abstractions can be used for extracting such knowledge from trained networks (see Section 5.2), and the computed compact abstractions can then be used with other solvers for further complex analysis. However, if only a few simple robustness/reachability properties are to be verified, direct verification is preferable to our approach, which has to represent and manipulate complex symbolic sets.
2 Preliminaries

2.1 Neural Networks and Constraints

Consider a neural network $f$ with $n$ layers. That is, $f(x) = h \circ g_n \circ g_{n-1} \circ \dots \circ g_1(x)$, where $g_r$ is the transfer function representing the map from the input to the output space for layer $r$, and $h : \mathbb{R}^k \to Y$ is a map from the logits to the $k$ class-labels (e.g., $\arg\max$). For example, if a network has $n-1$ layers with ReLU activation and a final linear layer:

$g_r(z) = \max(W_r z + b_r, 0) \quad \forall r \in \{1, 2, \dots, n-1\},$
$g_n(z) = W_n z + b_n.$
For simplicity, we only discuss neural networks that map to a discrete set of outputs, but the approach is valid even when the network produces a continuous set of outputs. By $y^f_i(x)$, we refer to the output from the $i$-th layer of the neural network, i.e., $y^f_i(x) = g_i \circ g_{i-1} \circ \dots \circ g_1(x)$. Often, in classification tasks, the outputs from the layer $g_n$ are fed through a softmax layer to normalize the scores. Here, in our analysis, we do not consider the softmax layer, as it preserves the ordering amongst the scores corresponding to the different classes: for a vector $z \in \mathbb{R}^m$, $\arg\max_i \{z_i\} = \arg\max_i \{\mathrm{softmax}(z)_i\}$.
Piecewise-linear neural networks (without the softmax layer) can be expressed as constraints in the theory of quantifier-free linear rational arithmetic (QFLRA). This includes neural networks with activation functions that are piecewise-linear (e.g., ReLU, Leaky ReLU, MaxOut, MaxPool). We follow the encoding described in (Ehlers 2017). For example, a ReLU node $y = \max(0, x)$ is written as: $(y = 0 \wedge x \le 0) \vee (y = x \wedge x \ge 0)$. The entire network is similarly encoded into QFLRA.
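To make the encoding concrete, the following is a minimal sketch of this disjunctive ReLU encoding in z3's Python API (assuming the z3-solver package; the variable names are purely illustrative):

# A single ReLU node y = max(0, x), encoded as the QFLRA disjunction above.
from z3 import Real, And, Or, Solver, sat

x, y = Real('x'), Real('y')
relu = Or(And(y == 0, x <= 0), And(y == x, x >= 0))

s = Solver()
s.add(relu, x == -3)
assert s.check() == sat
print(s.model())  # assigns y = 0, as expected for a negative input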
As a slight abuse of notation, we interchangeably use $g(z)$ to represent the map of $z$ through the function $g$, and the first-order logic constraint that enforces the same. For example, consider the function $g(x) = \max(0, w^T x)$. We interchangeably use $y = g(x)$ to also represent the constraint $(y = w^T x \wedge w^T x \ge 0) \vee (y = 0 \wedge w^T x < 0)$. For a satisfying assignment $(y, x)$ of this constraint, we have $y = g(x)$. For a formula $\varphi$ that has a vector of free variables $s$, we write $x \models \varphi$ if $\varphi$ interprets to True when we set $s = x$.
Definition 1 (Pre-images). Consider a neural network $f$. Let $X$ be the domain and $Y$ be the codomain. The pre-image of a set $S \subseteq Y$ for the neural network $f$ is the set $\{x \in X \mid f(x) \in S\}$.
For example, consider a neural-network-based cart-pole controller with the action space {left, right}. The pre-image corresponding to $S = \{\text{right}\}$ is the set of observations that cause the controller to output right as the action. In the remainder of this work, we refer to this pre-image of set $S$ for the neural network $f$ as $\mathrm{Pre}_f(S)$. The exact pre-image of the network for a given set $S$ can, in the worst case, have exponentially many linear regions. To overcome this, we consider abstractions that provably overapproximate or underapproximate the exact pre-image $\mathrm{Pre}_f(S)$. The only restriction on the structure of $S$ for our work is that it has to be expressible in QFLRA, which includes all constraints that have half-spaces as atoms combined with Boolean operators.
2.2 Symbolic Interpolation

Symbolic interpolation is a well-studied concept in propositional and first-order logic (Craig 1957). Given two quantifier-free first-order formulas $A$ and $B$ such that $A \wedge B$ is unsatisfiable, a Craig interpolant $I$ is a formula satisfying:

- $A \implies I$;
- $B \wedge I \implies \bot$;
- $I$ only contains variables that are shared by $A$ and $B$.

Intuitively, the interpolant $I$ provides an overapproximation of $A$ that is still precise enough to exhibit its conflict with $B$, and does not contain redundant information that involves any variable that is not shared by both $A$ and $B$.
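Both interpolation conditions can be discharged mechanically by an SMT solver. The following is a small sketch that validates a candidate interpolant for a toy pair $(A, B)$ with z3; note that this only checks a given $I$, and is not the interpolant-construction algorithm of (Albarghouthi and McMillan 2013) itself:

# Validating a candidate Craig interpolant with z3. A and B share only x;
# I must overapproximate A and be inconsistent with B.
from z3 import Real, And, Not, Implies, Solver, unsat

x, a, b = Real('x'), Real('a'), Real('b')
A = And(x == a + 1, a >= 0)   # entails x >= 1
B = And(x == b - 1, b <= 0)   # entails x <= -1
I = x >= 1                    # candidate interpolant over the shared variable x

s1 = Solver(); s1.add(Not(Implies(A, I)))   # validity of A ==> I
assert s1.check() == unsat
s2 = Solver(); s2.add(And(I, B))            # I /\ B must be unsatisfiable
assert s2.check() == unsat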
Definition 2 (Overapproximation). We call $\phi(x)$ an overapproximator of $A(x)$ if $\forall x.\, A(x) \implies \phi(x)$.

Definition 3 (Underapproximation). Conversely, we call $\phi(x)$ an underapproximator of $A(x)$ if $\forall x.\, \phi(x) \implies A(x)$.
When $A \wedge B$ is not satisfiable, Craig's interpolation theorem guarantees the existence of an interpolant $I$ such that $I$ overapproximates $A$ and $\neg I$ overapproximates $B$. These interpolants have found application in compositional approaches to program verification and SMT solving. In our work, we build on the algorithm from (Albarghouthi and McMillan 2013) for computing interpolants, as opposed to other approaches based on lazy SMT that produce complex interpolants. The intuition behind this choice is that simpler interpolants are more likely to provide general explanations corresponding to the neural network's parameters and the task for which the network was trained, rather than complex ones that may overfit the specific instantiation. Further, simpler interpolants provide the added advantage of being easier to reason over using automated reasoning engines (e.g., z3).
3 Inverse Abstraction of Neural Networks

We seek to compute approximations that closely track the pre-image, but are provable supersets or subsets of it. The fact that these are provable over- and underapproximations (unlike the approximated models in (Bastani, Pu, and Solar-Lezama 2018; Verma et al. 2018)) allows us to prove properties that hold for the neural network itself. For example, suppose we wish to prove the property that for every input from some set $W$, the corresponding output from the neural network does not belong to the set $S$. By computing an overapproximation $O$ of $\mathrm{Pre}_f(S)$ and showing that $O \wedge W$ is not satisfiable, we have verified the property. Similarly, if the property is that every output belongs to some set $S$, then by computing an underapproximation $U$ of $\mathrm{Pre}_f(S)$ and showing that $W \implies U$ is valid, we have verified the property.
Here, we give a brief overview of the algorithm for computing the overapproximation of the pre-image of a set $S$. Consider the neural network described earlier. Let $p^{(f,S)}_n$ be the set of inputs to layer $g_n$ of the neural network that lead to the output being in set $S$. Similarly, let $p^{(f,S)}_{n-1}$ be the set of inputs to layer $g_{n-1}$ of the neural network that result in outputs in $S$. Note that $p^{(f,S)}_n$ is the set of assignments to $s$ that satisfy: $h(g_n(s)) \models S$. For the other layers ($r < n$), we can iteratively define $p^{(f,S)}_r$ as the set of assignments to $s$ that satisfy:

$p^{(f,S)}_r = \{\, s \mid g_r(s) \models p^{(f,S)}_{r+1} \,\} \qquad (1)$
The core idea is to begin by computing an approximate representation of $p^{(f,S)}_n$, and to use this to then compute the approximation for $p^{(f,S)}_{n-1}$. Then, by iterating through the layers of the network, we can compute an approximation for $\mathrm{Pre}_f(S)$. Computing these approximations involves proving the interpolation condition (see Section 3.1). We could compute the approximation across the entire network, but proving the interpolation condition for the entire network is computationally expensive. This is because the worst-case complexity of proving properties for piecewise-linear networks scales exponentially in the number of nodes under consideration (Katz et al. 2017). The layer-wise approach breaks down the problem, so that we compute approximations for each layer. This requires proving properties across only one layer at a time, and can be further simplified to computing approximations over sets of nodes instead of entire layers.
3.1 Computing Overapproximations

Here we outline how to compute a useful overapproximation of $p^{(f,S)}_r$, assuming we have the overapproximation of $p^{(f,S)}_{r+1}$. Denote the overapproximation of $p^{(f,S)}_{r+1}$ as $O^{(f,S)}_{r+1}$, with $O^{(f,S)}_{n+1} = S$. First, consider a set of randomly chosen points $\bar X$, obtained either by sampling from the input domain or from the training data. Let $X_S \subseteq \bar X$ be the set of points such that for every $x \in X_S$, $f(x) \notin S$. Introduce the auxiliary free-variable vector $\bar p_r$ and construct the formula:

$\varphi_{r-1} := \bigvee_{\bar x \in X_S} \big( \bar p_r = y^f_{r-1}(\bar x) \big). \qquad (2)$

This formula allows $\bar p_r$ to assume the value of the output at layer $g_{r-1}$ corresponding to inputs from $X_S$. Recall that $y^f_{r-1}(x) = g_{r-1} \circ g_{r-2} \circ \dots \circ g_1(x)$, i.e., $y^f_{r-1}(x)$ is the vector of activation values corresponding to $x$ from layer $r-1$ of the network. Now, consider the formula:

$\psi_r := g_r(\bar p_r) \models O^{(f,S)}_{r+1}. \qquad (3)$

Note that the set of valid assignments for $\bar p_r$ under $\psi_r$ represents an overapproximation of the set $p^{(f,S)}_r$. This is because:

$g_r(\bar p_r) \models p^{(f,S)}_{r+1} \implies g_r(\bar p_r) \models O^{(f,S)}_{r+1},$

which follows from $O^{(f,S)}_{r+1}$ overapproximating $p^{(f,S)}_{r+1}$ and as a consequence of equation (1).
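As an illustration, the construction of $\varphi_{r-1}$ and $\psi_r$ can be carried out directly in z3 for a single small ReLU layer. The sketch below uses illustrative weights, two stand-in sampled activations, and a placeholder half-space for $O^{(f,S)}_{r+1}$; in the actual algorithm these come from the trained network, the sample set $X_S$, and the previous interpolation step:

# Building phi_{r-1} (eq. 2) and psi_r (eq. 3) for one 2-in/2-out ReLU layer.
import numpy as np
from z3 import Real, And, Or, Sum, Solver

W = np.array([[1.0, -1.0], [0.5, 2.0]])          # illustrative layer weights
b = np.array([0.0, -1.0])
p = [Real(f'p_{i}') for i in range(2)]           # auxiliary vector \bar{p}_r

# phi_{r-1}: \bar{p}_r may take the layer-(r-1) activation of any sample in X_S.
samples = [np.array([0.2, 0.1]), np.array([1.0, 0.3])]   # stand-ins for y^f_{r-1}(X_S)
phi = Or([And([p[i] == float(v[i]) for i in range(2)]) for v in samples])

# psi_r: g_r(\bar{p}_r) satisfies the next layer's overapproximation O_{r+1}.
def relu_layer(p, W, b):
    outs, cons = [], []
    for i in range(2):
        z = Sum([float(W[i, j]) * p[j] for j in range(2)]) + float(b[i])
        y = Real(f'y_{i}')
        cons.append(Or(And(y == z, z >= 0), And(y == 0, z <= 0)))
        outs.append(y)
    return outs, And(cons)

y, layer_cons = relu_layer(p, W, b)
O_next = y[0] + y[1] <= 1.0                      # placeholder overapproximation
psi = And(layer_cons, O_next)

s = Solver(); s.add(phi, psi)
print(s.check())   # psi_r /\ phi_{r-1} must be unsat for an interpolant to exist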
Lemma 4. If $O^{(f,S)}_{r+1} \wedge \varphi_r$ is unsatisfiable, then the formula $\psi_r \wedge \varphi_{r-1}$ is unsatisfiable, for each $r \in \{1, 2, \dots, n\}$.

Proof. Assume $O^{(f,S)}_{r+1} \wedge \varphi_r$ is unsatisfiable. Suppose there exists $\bar p_r$ satisfying $\varphi_{r-1}$ and $\psi_r$. By definition of $\varphi_{r-1}$, we have $g_r(\bar p_r) \models \varphi_r$, and by eq. (3), $g_r(\bar p_r) \models O^{(f,S)}_{r+1}$. This results in a contradiction, since $O^{(f,S)}_{r+1} \wedge \varphi_r$ is unsatisfiable.

Lemma 5. If $O^{(f,S)}_{r+1} \wedge \varphi_r$ is unsatisfiable, there exists an $I_r$ that satisfies the following:

$p^{(f,S)}_r \implies \psi_r \implies I_r, \qquad (4)$
$I_r \implies \neg \varphi_{r-1}. \qquad (5)$

Proof. Lemma 4 and Craig's interpolation theorem guarantee the existence of an $I_r$ satisfying the two conditions.

Lemma 5 guarantees the existence of an overapproximator of $p^{(f,S)}_r$ that is disjoint from the set of sampled points that map to the complement of $S$. This ensures that the overapproximations do not get arbitrarily slack as we propagate the interpolants through the layers, and remain tight. Further, since we use the algorithm from (Albarghouthi and McMillan 2013), which is designed to compute simple interpolants, the interpolants are likely to generalize to other data points. A formal description is provided in Algorithm 1. Note that $I_r$ is only a function of $\bar p_r$ (the only shared free variables between $\varphi_{r-1}$ and $\psi_r$). Figure 1 outlines the setup for a single layer.

Figure 1: Illustration of the approach. $\psi_r$ is the set of inputs to layer $g_r(\cdot)$ that map into the overapproximating abstraction for the subsequent layer $g_{r+1}(\cdot)$. $\varphi_{r-1}$ is a set of inputs to layer $g_r(\cdot)$, sampled so that the neural network maps them to outside the set $S$, where $S$ is the set whose pre-image is being abstracted.
3.2 Bounding the Problem

The computed interpolants that serve as overapproximations are in disjunctive normal form (DNF), with half-spaces as the atoms. The convergence of the algorithm for computing the interpolant can be made faster by restricting the domain in which $\varphi_{r-1}$ and $\psi_r$ need to be separated by the interpolant. Given a lower bound ($l_r$) and an upper bound ($u_r$) on the outputs $y^f_r$, we can construct an additional constraint $B_r$ defined as:

$B_r := (\bar p_r \le u_r) \wedge (\bar p_r \ge l_r).$

We then compute the interpolant such that $(B_r \wedge \psi_r) \implies I_r$ and $I_r \implies \neg \varphi_{r-1}$. To compute $u_r$ and $l_r$, we use the relaxation proposed in (Ehlers 2017). In most tasks, the inputs come from a bounded domain; for example, in image processing, pixels have values ranging between 0 and 255. These bounds can then be propagated through the network as in (Ehlers 2017) using a convex relaxation of the network. Every ReLU node $y = \max(0, x)$ that behaves non-linearly is approximated with its convex hull:

$y \ge 0, \quad y \ge x, \quad y \le \frac{u_x (x - l_x)}{u_x - l_x}, \qquad (6)$

where $l_x$ and $u_x$ are the bounds on the node's input $x$. Using this relaxation for the non-linear constraints results in a linear program (LP). By optimizing over the resulting LP, the bounds for the nodes can be computed in a sequential manner, starting with the input layer. These bounds can further be tightened by splitting the input domain as in (Bunel et al. 2018), but for our work we do not split the input domain. As an added benefit, bounding the problem makes checking the interpolation condition significantly faster.
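A much coarser but simple substitute for the LP relaxation is plain interval bound propagation; the sketch below produces valid (though generally looser) bounds than the (Ehlers 2017) relaxation used in our implementation:

# Naive interval bound propagation through ReLU layers: valid but loose bounds.
import numpy as np

def propagate_bounds(layers, l, u):
    """layers: list of (W, b); l, u: elementwise bounds on the input."""
    bounds = []
    for W, b in layers:
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        lo = W_pos @ l + W_neg @ u + b          # smallest pre-activation
        hi = W_pos @ u + W_neg @ l + b          # largest pre-activation
        bounds.append((lo, hi))
        l, u = np.maximum(lo, 0), np.maximum(hi, 0)   # push through the ReLU
    return bounds

# Example: inputs in [0, 1]^2 (e.g., normalized pixel intensities).
layers = [(np.array([[1., -1.], [2., 0.5]]), np.zeros(2))]
print(propagate_bounds(layers, np.zeros(2), np.ones(2)))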
Algorithm 1: Computing compact abstractions

1: procedure AbstractionRoutine
2:   ▷ Returns a simple overapproximator of $\mathrm{Pre}_f(S)$
3:   Compute $\forall \bar x \in X_S$: $y^f_1(\bar x) = g_1(\bar x)$
4:   Compute $l_1$, $u_1$ and $B_1$
5:   for r = 2, ..., n do
6:     Compute $\forall \bar x \in X_S$: $y^f_r(\bar x) = g_r(y^f_{r-1}(\bar x))$
7:     Compute $l_r$, $u_r$ and construct $B_r$
8:   for r = n, ..., 1 do
9:     Construct $\varphi_{r-1}$ (see equation (2))
10:    Construct $\psi_r \wedge B_r$ (see equation (3))
11:    Compute $I_r$ satisfying equations (4) and (5)
12:    Set $O^{(f,S)}_r = I_r$
   return $O^{(f,S)}_1$
Theorem 6 (Soundness and Completeness). Algorithm 1 always terminates and returns an overapproximator $O^{(f,S)}_1$ of $\mathrm{Pre}_f(S)$. Further, $O^{(f,S)}_1$ is disjoint from $X_S$.

Proof. At $r = n$, by construction, $O^{(f,S)}_{r+1} \wedge \varphi_r$ is not satisfiable. Lemmas 4 and 5 guarantee the existence of an overapproximator (interpolant) satisfying equations (4) and (5). The algorithm from (Albarghouthi and McMillan 2013) is both sound and complete, and hence is guaranteed to find an interpolant if one exists. For $r = n-1$, we again have that $O^{(f,S)}_{r+1} \wedge \varphi_r$ is not satisfiable, since $O^{(f,S)}_{r+1} \implies \neg \varphi_r$ (equation (5)). Repeating the arguments above, for every $r \le n$ we compute $O^{(f,S)}_r$ ($= I_r$) satisfying equations (4) and (5). Hence, Algorithm 1 terminates and returns an overapproximator of $\mathrm{Pre}_f(S)$ (equation (4)) that is disjoint from $X_S$ (equation (5)).
3.3 2D Example

To illustrate the ideas developed above, we introduce a small example. Consider a simple neural network $f : \mathbb{R}^2 \to \mathbb{R}^4$ with 2 hidden layers and 10 ReLU nodes in each layer. The network is trained to predict which quadrant a point $x$ belongs to, and achieves an accuracy of 99.65% on hold-out data. Here, $\arg\max(f(x))$ represents the quadrant $x$ belongs to. Along each dimension, the input is restricted to the domain $[-1, 1]$. Here, we are interested in computing a compact representation of the set the network classifies as the third quadrant. To compute an overapproximation, we sample a set of 150 points that are classified as being outside the third quadrant by the neural network. Here, $O^{(f,S)}_n = \bigwedge_{j=1,2,4} (y_3 > y_j)$, where $y = f(x)$ is the output of the neural network. To compute the underapproximation, we sample 50 points that are classified as being in the third quadrant. Here, $O^{(f,S)}_n = \bigvee_{j=1,2,4} (y_3 < y_j)$, and Figure 2 depicts the over- and underapproximations computed. The overapproximation computed is a union of 5 polytopes, while the underapproximation computed consists of 2 polytopes.

Figure 2: Left: An overapproximation of the region classified by the network as the third quadrant. Right: An underapproximation of the region classified by the network as the third quadrant. The overapproximation and the underapproximation are close to each other. It can be observed that in both the under- and the overapproximations, parts of the 3rd and the 4th quadrants are misclassified – this indicates faulty behavior of the neural network $f$.
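For concreteness, the output-side sets used in this example can be written directly as z3 constraints over the logits; a minimal sketch (the variable names are illustrative):

# Output sets for the 2D example, over the logits y = f(x).
from z3 import Reals, And, Or

y1, y2, y3, y4 = Reals('y1 y2 y3 y4')
S     = And([y3 > yj for yj in (y1, y2, y4)])  # "classified as the third quadrant"
not_S = Or([y3 < yj for yj in (y1, y2, y4)])   # complement, for the underapproximation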
4 Algorithms

The core of the computational effort in Algorithm 1 comes from Line 11, where the interpolant $I_r$ is computed. Computing the interpolant for two constraints $A$ and $B$ has the following key steps:

- Sampling polytopes satisfying $A$ and $B$ and separating them with Farkas' lemma,
- Checking the conditions $A \implies I$ and $I \implies \neg B$, and generating counter-examples (if any),
- Merging and splitting of the polytopes into sets, which are then separated by Farkas' lemma.

Counterexample-guided abstraction refinement (CEGAR) (Clarke et al. 2003) – checking the interpolation condition and generating counter-examples to refine the abstraction – is integral to computing the interpolants. The sampling at the first stage of CEGAR can be done with training samples; subsequently, an oracle can be used to find counter-examples to the conditions $A \implies I$ and $I \implies \neg B$. Checking the conditions $A \implies I$ and $I \implies \neg B$ using an external oracle turns out to be the most expensive part of the algorithm.
Note that in our work $A$ encodes the behavior of a layer of the neural network (with the nonlinearity), and conventional SMT solvers are inefficient at verifying properties for non-linear neural networks. For checking $A \implies I$, we use the framework (PLNN-v) from (Bunel et al. 2018), designed for verifying piecewise-linear neural networks. PLNN-v utilizes an encoding where the network and the property to be verified are encoded as a single network ($\bar f$), and an optimization problem is solved to determine if there exists an input that generates an output whose value is greater than 0. If there is none, the property is unsat over the input domain for the original neural network $f$. $A \implies I$ is verified by checking that $A \wedge \neg I$ is unsatisfiable. Recall that for each $r$, $O^{(f,S)}_r$ is in DNF with linear atoms. For a layer $g_r(\cdot)$, $A$ has the form:

$(s = g_r(\bar p_r)) \wedge \Big( \bigvee_i T_i s \le t_i \Big),$

and $I$ has the form $\bigvee_i \bigwedge_m Q_m \bar p_r \le q_m$. Then, $A \wedge \neg I$ can equivalently be written as:
equivalently be written as:
(
s
=
g
r
( ̄
p
r
)))
^
(
max(max
i
(
max
j
(
Q
i,j
̄
p
r
+
q
i,j
)
,
max
m
(
max
n
(
T
m,n
s
+
t
r
))
0)
(7)
This can then be encoded into a neural network with
g
r
(
.
)
as
the first layer followed by a sequence of MaxPool layers to
encode the above constraint. For checking
I
=
)
¬
B
we
use the SMT-solver z3 (De Moura and Bjørner 2008).
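The key identity behind the MaxPool encoding is that the maximum of two scores is itself expressible with ReLU and affine operations, so the outer max over the linear atoms of equation (7) becomes extra layers appended to $g_r$. A small numpy sketch of this idea (an illustration only; PLNN-v's actual construction may differ in detail):

# max(a, b) = relu(a - b) + b, so a max over atom scores reduces to ReLU layers.
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def pairwise_max(a, b):
    return relu(a - b) + b          # exactly max(a, b), from ReLU + affine ops

def max_layer(scores):
    """Reduce a vector of linear-atom scores to its maximum."""
    out = scores[0]
    for s in scores[1:]:
        out = pairwise_max(out, s)
    return out

p = np.array([0.3, -1.2])
Q = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, 1.0]])
q = np.array([0.1, -0.2, 0.4])
atom_scores = Q @ p + q             # Q_j p + q_j for each atom j
assert np.isclose(max_layer(atom_scores), atom_scores.max())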
Splitting Heuristic. The algorithm for computing interpolants from (Albarghouthi and McMillan 2013) relies on sampling and separating sets of polytopes $S_A$ and $S_B$ satisfying formulas $A$ and $B$, respectively. The objective is to separate the sets of polytopes by sequentially applying a set of merging and splitting heuristics, with hyperplanes generated using Farkas' lemma. If the two sets of polytopes cannot be separated by a single hyperplane, the sets are broken down into smaller subsets using the unsat-core returned by z3. However, generating the unsat-core is computationally expensive, and further, we do not use z3 for checking $A \implies I$. Alternatively, we develop an intuitive heuristic where we first consider one polytope each, $A_i \in S_A$ and $B_i \in S_B$. Then, we compute a separating hyperplane $\langle w, x \rangle + b = 0$ such that $\langle w, x \rangle + b < 0 \implies A_i$ and $\langle w, x \rangle + b > 0 \implies B_i$. Subsequently, we check if any of the other polytopes in our set are already separated by this hyperplane. If there exists a polytope $p$ satisfying $A$ that is not separated by the hyperplane, we combine $p$ with the rest of the separated polytopes from $A$ and compute a new hyperplane $\langle \hat w, x \rangle + \hat b = 0$. If we cannot compute such a hyperplane, we split $p$ from the rest. This is outlined in Algorithm 2, and the polytopes returned by Algorithm 2 are split from the initial set. In Algorithm 2, $x$ is the set of shared free variables for formulas $A$ and $B$.
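The separating hyperplanes themselves can be found as a feasibility LP over Farkas multipliers. The following sketch (assuming the cvxpy package; a simplified stand-in for the separation step used in our heuristic) finds $\langle w, x \rangle + c = 0$ with margin between two polytopes in H-representation:

# Farkas-style separation of P1 = {A1 x <= b1} and P2 = {A2 x <= b2}.
import numpy as np
import cvxpy as cp

def separate(A1, b1, A2, b2):
    d = A1.shape[1]
    w, c = cp.Variable(d), cp.Variable()
    l1 = cp.Variable(A1.shape[0], nonneg=True)   # Farkas multipliers for P1
    l2 = cp.Variable(A2.shape[0], nonneg=True)   # Farkas multipliers for P2
    cons = [A1.T @ l1 == w,  b1 @ l1 <= -c - 1,  # => <w,x> + c <= -1 on P1
            A2.T @ l2 == -w, b2 @ l2 <= c - 1]   # => <w,x> + c >= +1 on P2
    cp.Problem(cp.Minimize(0), cons).solve()
    return (w.value, c.value) if w.value is not None else None

# Two axis-aligned boxes: x1 in [-3,-1] vs. x1 in [1,3] (x2 in [-1,1] for both).
A = np.vstack([np.eye(2), -np.eye(2)])
print(separate(A, np.array([-1., 1., 3., 1.]), A, np.array([3., 1., -1., 1.])))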
Merging Heuristic. In (Albarghouthi and McMillan 2013), the heuristic used for merging is to group together polytopes based on syntactic similarity. However, for our problem, all the sampled polytopes corresponding to the counter-examples generated during CEGAR are syntactically similar. Instead, we merge each polytope into the set that has the closest matching pattern of activation states (e.g., constant or linear for ReLU nodes), in terms of Hamming distance. This results in data points/counter-examples that have similar non-linear activation patterns being grouped together.
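A minimal sketch of this merging criterion, recording one bit per ReLU node (linear vs. constant-zero) and grouping by Hamming distance (the helper names are illustrative):

# Group counter-examples by the activation pattern of the ReLU nodes.
import numpy as np

def activation_pattern(x, layers):
    """Bit vector: per ReLU node, 1 if linear (pre-activation > 0), else 0."""
    bits = []
    for W, b in layers:
        z = W @ x + b
        bits.extend((z > 0).astype(int))
        x = np.maximum(z, 0.0)
    return np.array(bits)

def closest_group(pattern, group_patterns):
    dists = [np.sum(pattern != g) for g in group_patterns]   # Hamming distance
    return int(np.argmin(dists))

layers = [(np.array([[1., -1.], [0.5, 2.]]), np.zeros(2))]
groups = [activation_pattern(np.array([1., 0.]), layers),
          activation_pattern(np.array([-1., 0.]), layers)]
print(closest_group(activation_pattern(np.array([0.9, 0.1]), layers), groups))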
Algorithm 2: Splitting Heuristic

1: procedure SplitRoutine
   Require: $S_A = \{A_1, \dots, A_c\}$ (polytopes satisfying $A$), $S_B = \{B_1, \dots, B_d\}$ (polytopes satisfying $B$)
2:   Set $A_{\text{count}} = 1$, $B_{\text{count}} = 1$
3:   Compute $(w, b)$ : $\langle w, x \rangle + b = 0$ separates $A_1$, $B_1$
4:   $A_{\text{sat-set}} = \emptyset$, $B_{\text{sat-set}} = \emptyset$, Unsat-Set $= \emptyset$
5:   while $A_{\text{count}} \le c \vee B_{\text{count}} \le d$ do
6:     $A_{\text{old-count}} = A_{\text{count}}$, $B_{\text{old-count}} = B_{\text{count}}$
7:     for $i = A_{\text{old-count}}, \dots, c$ do
8:       $A_{\text{count}} = A_{\text{count}} + 1$
9:       if $A_i \notin A_{\text{sat-set}}$ then
10:        if $\langle w, x \rangle + b < 0 \implies A_i$ then
11:          $A_{\text{sat-set}} = A_{\text{sat-set}} \cup \{A_i\}$
12:        else
13:          $\bar S_A = A_{\text{sat-set}} \cup \{A_i\}$
14:          try: Find $(w, b)$ to separate $\bar S_A$, $B_{\text{sat-set}}$
15:          catch: Unsat-Set $=$ Unsat-Set $\cup \{A_i\}$
16:          break
17:    for $i = B_{\text{old-count}}, \dots, d$ do
18:      $B_{\text{count}} = B_{\text{count}} + 1$
19:      if $B_i \notin B_{\text{sat-set}}$ then
20:        if $\langle w, x \rangle + b > 0 \implies B_i$ then
21:          $B_{\text{sat-set}} = B_{\text{sat-set}} \cup \{B_i\}$
22:        else
23:          $\bar S_B = B_{\text{sat-set}} \cup \{B_i\}$
24:          try: Find $(w, b)$ to separate $\bar S_B$, $A_{\text{sat-set}}$
25:          catch: Unsat-Set $=$ Unsat-Set $\cup \{B_i\}$
26:          break
   return Unsat-Set
5 Experiments

In this section, we implement and test our approach with neural networks trained on multiple tasks.

5.1 2D Toy-Example

We use the simple 2D example introduced in Section 3.3 to study the scalability of the approach. On the same task, we train networks of varying sizes and measure the run-times for computing underapproximations of the third quadrant, as classified by the neural network. Figure 3 depicts the run-times for different network sizes.

Figure 3: Run-times for computing abstractions. Left: varying the number of nodes in every hidden layer. Right: varying the depth of the trained neural network. The run-time scales exponentially with an increasing number of nodes; this is because the worst-case complexity of checking the interpolation condition scales exponentially in the size of the hidden layer. The run-time scales almost linearly with increasing depth.
Robustness. We use the computed underapproximations to verify the robustness of the classifier. Robustness has been extensively studied for classifiers, particularly in image processing (Szegedy et al. 2013; Carlini and Wagner 2017). For a given input $x$ and the corresponding output label $k$, we say the classifier $f$ is robust if

$\forall \bar x : \|\bar x - x\|_\infty \le \varepsilon, \quad f(\bar x) = k.$
To measure the robustness of the networks trained on this task, we compute both an underapproximation $U^{f,S}$ and an overapproximation $O^{f,S}$ of the pre-image corresponding to the third quadrant, for a network with 4 hidden layers and 16 nodes per hidden layer. For a set of 50 points from the third quadrant and $\varepsilon = 0.5$ (recall the problem domain is $[-1, 1]$ along each dimension), we check with z3, for each point, if there exists a counter-example satisfying:

$\|\bar x - x\|_\infty \le \varepsilon,\ \bar x \not\models U^{f,S}$, and
$\|\bar x - x\|_\infty \le \varepsilon,\ \bar x \not\models O^{f,S}$.

If a counter-example is found for both conditions, the point is not robust, and if a counter-example is found for the underapproximation but not the overapproximation, the point's robustness is unknown. If no counter-example is found for either condition, the point is robust. We are able to validate/invalidate the robustness of 49 points; for one point, the result is unknown. Verifying this set of properties using the computed abstractions takes 1.4 s. This shows that the approximations are quite accurate for this task. The computations were performed on a 2.40 GHz quad-core machine with 16 GB of RAM.
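For illustration, each per-point robustness query has the following shape in z3's Python API; here U is a placeholder single-polytope stand-in for the computed underapproximation $U^{f,S}$:

# Robustness query: is there a point in the eps-ball around (cx, cy) outside U?
from z3 import Reals, And, Not, Solver, sat

x1, x2 = Reals('x1 x2')
U = And(x1 <= -0.1, x2 <= -0.1)          # illustrative stand-in for U^{f,S}
cx, cy, eps = -0.5, -0.5, 0.5            # query point and radius

s = Solver()
s.add(And(x1 - cx <= eps, cx - x1 <= eps,     # ||xbar - x||_inf <= eps
          x2 - cy <= eps, cy - x2 <= eps),
      Not(U))                                 # a point in the ball outside U
if s.check() == sat:
    print('counter-example:', s.model())      # robustness not certified by U
else:
    print('robust w.r.t. the underapproximation')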
5.2 Cart-pole Control

We consider the classical control problem introduced in (Barto, Sutton, and Anderson 1983). The inputs to the network are observations from a four-dimensional state space comprising the position of the cart ($x$), the velocity of the cart ($\dot x$), the angle of the pole ($\theta$), and the angular velocity of the pole ($\dot\theta$). We train a neural network with 2 hidden layers for the problem with Deep-Q learning, using the environment in (OpenAI-CartPole-v0 2018). The neural network achieves a perfect score of 200.0, averaged over 100 episodes. The output from the network maps to the discrete actions {left, right}. For both output actions, we compute the overapproximations of the pre-image, and computing each abstraction takes under 5 minutes. Note that since there are just two output classes, the negation of the overapproximation of one output action results in an underapproximation of the other output action. Both overapproximations consist of a union of two half-spaces, which implies that the underapproximations are each a single polytope.
Figure 4: (a) Cart-pole: the neural-network controller is abstracted into simple control laws. (b) Computed abstractions for the neural-network cart-pole controller. Left: varying $x, \theta$ with $(\dot x, \dot\theta) = 0$. Right: varying $\theta, \dot\theta$ with $(x, \dot x) = 0$. (c) Swimmer robot: the set of inputs corresponding to high-torque outputs is abstracted into a simple representation.

On replacing the neural network controller with a controller based on the overapproximation corresponding to {left}, the new controller still achieves a perfect score of 200.0. This shows that the abstraction closely matches the exact pre-images of the neural network. Further, these simple abstractions give insight into the internal strategy learnt by the neural network. The computed overapproximation of the pre-image of the output set {left} ($\mathrm{Pre}_f(\text{left})$) is:
$(-0.335\,x - 0.06\,\dot x + 0.918\,\theta + 0.202\,\dot\theta \le -0.665) \vee (-0.110\,x + 0.156\,\dot x + 0.950\,\theta + 0.245\,\dot\theta \le -0.015).$
We see that a negative $x$ (the cart is to the left of the workspace) causes the cart to apply a force to the right. A negative $\theta$ causes the controller to make the cart move left (see Figure 5). This matches the expected intuitive behavior.
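As an illustration, the extracted abstraction can be dropped into the simulator as a standalone controller. The sketch below assumes the sign reconstruction of the formula above and the pre-0.26 gym API (reset returning only the observation, step returning a 4-tuple):

# The extracted control law as a drop-in CartPole-v0 policy.
import gym

def extracted_policy(obs):
    x, x_dot, theta, theta_dot = obs
    left = (-0.335 * x - 0.06 * x_dot + 0.918 * theta + 0.202 * theta_dot <= -0.665) \
        or (-0.110 * x + 0.156 * x_dot + 0.950 * theta + 0.245 * theta_dot <= -0.015)
    return 0 if left else 1                  # CartPole-v0: action 0 pushes left

env = gym.make('CartPole-v0')
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, reward, done, _ = env.step(extracted_policy(obs))
    total += reward
print(total)                                 # score of the abstraction-based controller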
Further, we can use the approximations to verify properties of the neural network. For example, consider the property that for all $x_t, \dot x_t, \theta_t, \dot\theta_t$ such that $|x_t| \le 0.1$, $|\dot x_t| \le 0.1$, $|\theta_t| \le 0.1$, $|\dot\theta_t| \le 0.1$, the condition $|\dot\theta_{t+1}| \le 0.5$ holds. First, we can show that for all points that satisfy the overapproximation of $\mathrm{Pre}_f(\text{left})$ and the action left, the property holds under the cart-pole dynamics. Next, we can repeat a similar procedure with $\mathrm{Pre}_f(\text{right})$ to fully verify the neural network controller. We verify this property with dReal (Gao, Kong, and Clarke 2013), as it can reason over the $\sin$ and $\cos$ functions that occur in the dynamics. This computation takes 0.124 s. However, on checking the condition $|\dot\theta_{t+1}| \le 0.3$ (with the underapproximations), the solver finds a counterexample with $(x_t, \dot x_t, \theta_t, \dot\theta_t) = (0.09, 0.09, 0.0, 0.097)$ as the initial condition, with $\dot\theta_{t+1} = 0.31$, in 0.037 s.
5.3 Swimmer

For this task, we construct a compact abstraction that allows for run-time monitoring. The setting for the problem is to determine if a noisy observation made by a monitor could possibly lead to unsafe behavior. The controller we consider is a neural network with 2 hidden layers, trained with proximal policy optimization (Schulman et al. 2017) on the Swimmer environment (OpenAI-Swimmer-v2 2018). The task is to control a 3-link robot in a viscous fluid to make it swim forward as fast as possible. The network ($f$) maps from an 8-dimensional state space ($x \in \mathbb{R}^8$) to a 2-dimensional space $(\tau_1, \tau_2)$ corresponding to the joint actuation torques.

For this task, suppose that in the domain $x \in [-2, 2]^8$, the observations made by a run-time monitor are noisy, such that $\|x - x_{\text{true}}\|_\infty \le 0.1$, where $x$ is the observed state and $x_{\text{true}}$ is the true state. The controller has access to $x_{\text{true}}$, while the monitor only has the noisy reading $x$. We want to construct a monitor that, during the operation of the robot, flags an input $x$ as unsafe if $x$ is a noisy observation and it is possible that the true state $x_{\text{true}}$ can cause a large torque. Formally, $x$ is unsafe if it satisfies:

$x \in [-2, 2]^8 \wedge \exists \bar x.\ \|\bar x - x\|_\infty \le 0.1 \wedge f(\bar x) = (\tau_1, \tau_2) \wedge |\tau_1| + |\tau_2| \ge 1.0.$

Since the monitoring is at run-time, the flagging has to be near-instantaneous, and we would like to avoid reasoning over the entire network. To allow for this, we compute an overapproximating abstraction $\varphi(\bar x)$ of the set of inputs to the network in the domain $[-2.1, 2.1]^8$ for which the network outputs satisfy $|\tau_1| + |\tau_2| \ge 1.0$. The monitor can be set up using $\varphi(\bar x)$ as follows:

$x \in [-2, 2]^8 \wedge \exists \bar x.\ \|\bar x - x\|_\infty \le 0.1 \wedge \varphi(\bar x).$

Algorithm 1 computes a $\varphi(\bar x)$ that has a simple structure: for 50 inputs (sampled from observations seen during training) with $x \in [-2, 2]^8$, the average time per input for checking the condition above with z3 is 0.14 seconds. This time can further be reduced by parallelizing the check across polytopes.
6 Conclusion

We have developed an approach to algorithmically abstract neural network pre-images into compact representations that allow for interpretation and verification. The approach introduced here opens several possible directions for future contributions. An interesting direction to explore is whether the current approach can be coupled with current verification algorithms for neural networks to improve verification itself. Another avenue to explore is whether the abstraction procedure introduced in our work can be coupled with training to learn neural networks that satisfy certain desired properties. Alternatively, given an abstraction for a neural network, an interesting open question is whether we can tune the abstraction to satisfy certain desired specifications without compromising significantly on performance.
References

[Albarghouthi and McMillan 2013] Albarghouthi, A., and McMillan, K. L. 2013. Beautiful interpolants. In Sharygina, N., and Veith, H., eds., Computer Aided Verification, 313–329. Berlin, Heidelberg: Springer Berlin Heidelberg.

[Ba and Caruana 2014] Ba, J., and Caruana, R. 2014. Do deep nets really need to be deep? In Ghahramani, Z.; Welling, M.; Cortes, C.; Lawrence, N. D.; and Weinberger, K. Q., eds., Advances in Neural Information Processing Systems 27. Curran Associates, Inc. 2654–2662.

[Barto, Sutton, and Anderson 1983] Barto, A. G.; Sutton, R. S.; and Anderson, C. W. 1983. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics SMC-13(5):834–846.

[Bastani, Pu, and Solar-Lezama 2018] Bastani, O.; Pu, Y.; and Solar-Lezama, A. 2018. Verifiable reinforcement learning via policy extraction. In Neural Information Processing Systems, NIPS 2018.

[Bonacina and Johansson 2015] Bonacina, M. P., and Johansson, M. 2015. On interpolation in automated theorem proving. Journal of Automated Reasoning 54(1):69–97.

[Bunel et al. 2018] Bunel, R.; Turkaslan, I.; Torr, P. H. S.; Kohli, P.; and Kumar, M. P. 2018. A unified view of piecewise linear neural network verification. In Neural Information Processing Systems, NIPS 2018.

[Carlini and Wagner 2017] Carlini, N., and Wagner, D. A. 2017. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, 39–57.

[Clarke et al. 2003] Clarke, E.; Grumberg, O.; Jha, S.; Lu, Y.; and Veith, H. 2003. Counterexample-guided abstraction refinement for symbolic model checking. J. ACM 50(5):752–794.

[Craig 1957] Craig, W. 1957. Three uses of the Herbrand-Gentzen theorem in relating model theory and proof theory. J. Sym. Logic 22(3):269–285.

[Dathathri et al. 2017] Dathathri, S.; Arechiga, N.; Gao, S.; and Murray, R. M. 2017. Learning-based abstractions for nonlinear constraint solving. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, 592–599.

[De Moura and Bjørner 2008] De Moura, L., and Bjørner, N. 2008. Z3: An efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'08/ETAPS'08, 337–340. Berlin, Heidelberg: Springer-Verlag.

[Ehlers 2017] Ehlers, R. 2017. Formal verification of piecewise linear feed-forward neural networks. In Automated Technology for Verification and Analysis - 15th International Symposium, ATVA 2017, Pune, India, October 3-6, 2017, Proceedings, 269–286.

[Gao, Kong, and Clarke 2013] Gao, S.; Kong, S.; and Clarke, E. M. 2013. dReal: An SMT solver for nonlinear theories over the reals. In Bonacina, M. P., ed., Automated Deduction – CADE-24, 208–214. Berlin, Heidelberg: Springer Berlin Heidelberg.

[Gehr et al. 2018] Gehr, T.; Mirman, M.; Drachsler-Cohen, D.; Tsankov, P.; Chaudhuri, S.; and Vechev, M. 2018. AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), 3–18.

[Hoare 1969] Hoare, C. A. R. 1969. An axiomatic basis for computer programming. In Communications of the ACM.

[Katz et al. 2017] Katz, G.; Barrett, C. W.; Dill, D. L.; Julian, K.; and Kochenderfer, M. J. 2017. Reluplex: An efficient SMT solver for verifying deep neural networks. In Computer Aided Verification. Berlin, Heidelberg: Springer Berlin Heidelberg.

[Kroening and Weissenbacher 2011] Kroening, D., and Weissenbacher, G. 2011. Interpolation-based software verification with Wolverine. In Gopalakrishnan, G., and Qadeer, S., eds., Computer Aided Verification, 573–578. Berlin, Heidelberg: Springer Berlin Heidelberg.

[Mahendran and Vedaldi 2015] Mahendran, A., and Vedaldi, A. 2015. Understanding deep image representations by inverting them. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, 5188–5196.

[OpenAI-CartPole-v0 2018] OpenAI-CartPole-v0. 2018. CartPole-v0. https://gym.openai.com/envs/CartPole-v0/. [Online; accessed 2-Sep-2018].

[OpenAI-Swimmer-v2 2018] OpenAI-Swimmer-v2. 2018. Swimmer-v2. https://gym.openai.com/envs/Swimmer-v2/. [Online; accessed 2-Sep-2018].

[Schulman et al. 2017] Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal policy optimization algorithms. CoRR abs/1707.06347.

[Szegedy et al. 2013] Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I. J.; and Fergus, R. 2013. Intriguing properties of neural networks. CoRR abs/1312.6199.

[Verma et al. 2018] Verma, A.; Murali, V.; Singh, R.; Kohli, P.; and Chaudhuri, S. 2018. Programmatically interpretable reinforcement learning. In International Conference on Machine Learning, ICML 2018.

[Wang et al. 2018] Wang, S.; Pei, K.; Whitehouse, J.; Yang, J.; and Jana, S. 2018. Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems 32.

[Zakrzewski 2001] Zakrzewski, R. R. 2001. Verification of a trained neural network accuracy. In IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222), volume 3, 1657–1662.