Inverse Abstraction of Neural Networks Using Symbolic Interpolation
Sumanth Dathathri¹, Sicun Gao², Richard M. Murray¹
¹Computing and Mathematical Sciences, California Institute of Technology
²Computer Science and Engineering, University of California, San Diego
Abstract
Neural networks in real-world applications have to satisfy critical properties such as safety and reliability. The analysis of such properties typically involves extracting information through computing pre-images of neural networks, but it is well-known that explicit computation of pre-images is intractable. We introduce new methods for computing compact symbolic abstractions of pre-images. Our approach relies on computing approximations that provably overapproximate and underapproximate the pre-images at all layers. The abstraction of pre-images enables formal analysis and knowledge extraction without modifying standard learning algorithms. We show how to use inverse abstractions to automatically extract simple control laws and compact representations for pre-images corresponding to unsafe outputs. We illustrate that the extracted abstractions are often interpretable and can be used for analyzing complex properties.
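To see why explicit pre-image computation is intractable, consider even a single ReLU layer: its pre-image of a half-space is an exact union of polytopes, one per activation pattern, so explicit computation enumerates exponentially many pieces. The sketch below (an illustrative construction in NumPy, not the abstraction method of this paper; the weights are hypothetical placeholders) makes this concrete.

    import itertools
    import numpy as np

    def relu_preimage_halfspace(W, b, c, d):
        """Pre-image of the half-space {y : c.y <= d} under x -> ReLU(W x + b),
        returned as a list of polytopes (A, rhs) meaning {x : A x <= rhs}."""
        n = W.shape[0]  # number of neurons in the layer
        pieces = []
        for pattern in itertools.product([0, 1], repeat=n):
            s = np.array(pattern, dtype=float)
            # Fix an activation pattern: neuron i is active iff s[i] == 1.
            # Active:   w_i . x + b_i >= 0  <=>  -w_i . x <= b_i
            # Inactive: w_i . x + b_i <= 0  <=>   w_i . x <= -b_i
            signs = np.where(s == 1, -1.0, 1.0)
            A_act = signs[:, None] * W
            rhs_act = -signs * b
            # On this piece, ReLU(W x + b) = diag(s) (W x + b), so the output
            # constraint c . y <= d becomes (c * s) . (W x + b) <= d.
            cs = c * s
            A_out = (cs @ W)[None, :]
            rhs_out = np.array([d - cs @ b])
            pieces.append((np.vstack([A_act, A_out]),
                           np.concatenate([rhs_act, rhs_out])))
        return pieces  # 2**n candidate polytopes; many may be empty

    # Hypothetical 2-neuron layer on 2 inputs:
    W = np.array([[1.0, -1.0], [0.5, 2.0]])
    b = np.array([0.0, -1.0])
    pieces = relu_preimage_halfspace(W, b, c=np.array([1.0, 1.0]), d=2.0)
    print(len(pieces))  # 4 = 2**2 activation patterns

For a layer with n neurons this produces 2**n candidate polytopes (here 4), and composing layers compounds the blow-up, which is why compact over- and underapproximations are needed instead.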
1 Introduction
Neural networks have shown significant potential as a key building block of intelligent systems. However, a major challenge to their real-world application is that it is extremely difficult to analyze their behaviors and guarantee critical properties, such as the absence of catastrophic corner cases. Recent work on verifying neural networks, such as (Zakrzewski 2001; Bunel et al. 2018; Katz et al. 2017), has focused on developing faster algorithms for validating or falsifying formal properties of whole neural networks directly through their encoding as constraint satisfaction problems. These approaches are designed for generating counter-examples to (or for verifying) specific properties of piecewise-linear neural networks.
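For concreteness, the following is a minimal sketch in the spirit of such encodings (our own illustration, not any of the cited tools) that checks a property of a tiny piecewise-linear network with the Z3 SMT solver; the weights, input domain, and output bound are hypothetical placeholders.

    from z3 import Real, If, Solver, sat

    x1, x2 = Real('x1'), Real('x2')

    def relu(e):
        # ReLU is piecewise linear, so it admits an exact logical encoding.
        return If(e >= 0, e, 0)

    # One hidden layer with two neurons, followed by a linear output.
    h1 = relu(1.0 * x1 - 1.0 * x2)
    h2 = relu(0.5 * x1 + 2.0 * x2 - 1.0)
    y = 1.0 * h1 + 1.0 * h2

    s = Solver()
    s.add(x1 >= -1, x1 <= 1, x2 >= -1, x2 <= 1)  # input domain
    s.add(y > 3)                                  # negation of the property y <= 3
    if s.check() == sat:
        print('counter-example:', s.model())      # property falsified
    else:
        print('property holds on the domain')     # no violating input exists

Because every ReLU is encoded exactly, a sat result yields a genuine counter-example and unsat certifies the property over the whole input domain; the cost is that solving such monolithic encodings scales poorly with network size, which motivates the modular alternative discussed next.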
An alternative approach for analysis is to decompose large networks and perform the analysis in a modular way, which is a standard practice in software program analysis (Hoare 1969). Such decomposition often provides more information than simple monolithic analysis and enables us to verify complex properties in a tractable manner. A crucial requirement for enabling such modular analysis is that we must be able to represent and
Correspondence: sdathath@caltech.edu, sicung@ucsd.edu. The work is supported by DARPA Assured Autonomy, NSF CNS-1830399 and the VeHICaL project (NSF grant #1545126).