Neural Hardware for Vision
by Carver A. Mead
BIOLOGY HAS ALWAYS BEEN the inspiration for computational metaphor. In the mid-1930s Alan Turing's original model for computation, which we call the sequential process, was based on the way mathematicians proved theorems. Because mathematicians are biological entities, we can say that even Turing's sequential process was inspired by the way biological systems work. But I will be discussing some biological systems that are simpler than mathematicians, since nobody, including mathematicians, can understand the way mathematicians work.
In the last decade or so the knowledge of what goes on in the brain has increased tremendously. When Max Delbrück first interested me in biology 20 years ago, the picture we had of the brain was much more simplistic and much less analog in nature. At the time, neurobiologists were completely preoccupied with nerve impulses and the way they were generated in neurons. Now they are looking more deeply at the principles on which neural computation is based. And there are some surprises here.
Nerve impulses, which are quasi-digital, play a surprisingly small role in the actual computation process. Most of the computation is analog, and it's done at the very tips of the dendritic tree of the neuron. Throughout the brain there is distributed feedback from these dendritic tips to the nerves that are driving them.
These new discoveries prompted us to take a fresh look at neural computation to see whether we might be able to synthesize systems that have some of the properties of real neural systems. It turns out that it's probably just the right time to be doing this.
What's different today from attempts in the last 30 years to build neurocircuits is that now we have a technology that makes it possible to put a billion transistors on a six-inch wafer and interconnect them all. Conventional digital technology has difficulty using a full wafer, since many transistors are inoperative. Re-creating the brain's distributed analog computation gives us inherent redundancy and robustness under failure. We can actually use a substantial fraction of these billion transistors.
So, the technology that was developed for microprocessors and memories has provided us a base on which we can build neural computing systems. These computing systems are based on very different principles from any of the conventional computing engines, analog or digital, that were built in the past.
The particular system we have been working on is a very simple model of the part of the brain wrapped up behind the eyeball. Although it's quite simple by brain standards, it does a level of computation that even our most powerful computers today can't handle.
The lens of the eye focuses an image on the surface of the retina, where the first levels of visual processing occur. When we want to see details of shapes, such as letters, the image gets focused on the fovea, a small area of the retina with tightly packed photoreceptors. But the fovea is responsible for only a fraction of the retina's activity.
Most of the action happens at the periphery, where movement of the image produces signals that are transformed into nerve pulses and transmitted over the optic nerve "cable" to the higher centers in the brain.
In a cross section through the retina one can see on the surface a layer of photoreceptors, below which lie layers of three different kinds of cells: bipolar, horizontal, and amacrine. Below these cells are the ganglion cells, whose axons form the fibers of the optic nerve.
The principal signal flow in the retina runs from the receptors down through the bipolar cells (the horizontal and amacrine cells spread across a large area of the retina in layers transverse to the signal flow) and into the ganglion cells, which turn the signal into nerve pulses.
In engineering terms we can say that the process starts by transducing the light energy into an electrical signal. We send that signal on to an amplifier and then off through a cable. The signals in the retina are all analog until they go out the cable as nerve pulses, which are quasi-digital (digital in amplitude but analog in time).
This basic structure (with some diversity in the details) is universal throughout the vertebrates. We can assume that the animals that evolved this eye structure ate any that did not.
It is characteristic of biological systems that they are here because they work. An animal didn't live long if it couldn't see the predators that were about to jump on it, and its genes did not have a chance to get represented in the next generation. Because evolution has such a ruthless way of dealing with bad designs, we can view surviving biological structures as highly engineered systems.
The visual system is there to see things about the world. The scene coming into the eye, however, is not the world. It's a bunch of photons that arrive because there is some light somewhere that shines on objects in the world and gets reflected off them into the eye.
The light that falls on the image surface is the product of an illumination function multiplied by the reflectance of the object. But we don't want to see the illumination function; we want to see the object. Nobody ever got jumped on by an illumination function. So we take the logarithm of the intensity, and that factors the problem into the log of the illumination function, which is often a smooth function (except for shadows), plus the log of the reflectance of the object.
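To spell that factoring out (the notation is mine, not the article's): if L(x, y) is the illumination falling on a surface point and R(x, y) is its reflectance, the intensity arriving at the image is

    I(x, y) = L(x, y) \cdot R(x, y)

and taking the logarithm turns the product into a sum:

    \log I(x, y) = \log L(x, y) + \log R(x, y)

Since the illumination term is smooth almost everywhere, any later stage that responds to local differences sees mostly the reflectance term, the part that describes objects.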
The computation of the logarithm is done in the receptors or in their interactions with each other.
The visual system also has to make sure that the signals are within range. If they're not, you get blanked out. You have probably noticed this phenomenon, say, watching a baseball game on television. When someone hits a ball up into the stands, the television camera pans from the brightly lit field over to the stands in the shade. The camera has an elaborate automatic gain control system, but in such a mixed scene you see a pure white field and pure black stands; one signal is above range, and the other is below range, so you don't see anything at all. If an animal did that, its visual system would not be around in the next generation, because the predators would simply jump from places that were half in the shade and half in sunlight.

[Figure: In this cross section of a vertebrate retina, the main signal flow travels downward from the photoreceptors through the bipolar cells to the ganglion cells, which connect to the optic nerve. The layers of horizontal cells and amacrine cells lie transverse to the signal path. (From "The Control of Sensitivity in the Retina" by Frank S. Werblin. © January 1973 Scientific American, Inc. All rights reserved.)]
But in the visual system, unlike the television camera, there is a measure of the local average intensity of the light; this value is used as the midpoint for the acceptable range of input levels.
Basically this is a mechanism for deciding whether the pixel we are looking at is sufficiently different from the pixels around it to be reported. This level-normalization computation is performed by the horizontal cells.
The horizontal cells look at the potentials on a bunch of photoreceptors and then take a spatial average. Then the difference between that spatial average and the local receptor signal is computed in the synaptic complex in the foot of the receptor. The resulting spatial derivative gets shipped on to the bipolar cells.
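As a rough illustration, here is a minimal numerical sketch of that level-normalization step in Python. The function name, the Gaussian weighting of the surround, and the parameter values are my assumptions for illustration; the retina's actual spatial weighting is set by the horizontal-cell network.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def level_normalize(log_intensity, surround_sigma=3.0):
        # Horizontal cells: a weighted spatial average over nearby receptors
        # (Gaussian weighting is an assumption, not the measured profile).
        local_average = gaussian_filter(log_intensity, sigma=surround_sigma)
        # Synaptic complex at the receptor foot: local signal minus average.
        return log_intensity - local_average

    # A bright patch on a uniform background survives normalization,
    # while the uniform background itself reports nearly zero.
    scene = np.full((64, 64), 2.0)
    scene[30:34, 30:34] = 20.0
    response = level_normalize(np.log(scene))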
The outputs of the bipolar cells feed the amacrine layer, which is responsible for computing the time derivative of the signal. Rising edges of the bipolar waveform are turned into peaks, which in turn cause ganglion cells to fire. In rough terms, the amacrine layer is extracting motion information from the incoming retinal image.
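A discrete-time sketch of that behavior (my stand-in mechanism; the real circuit is analog and continuous, and the time constant here is arbitrary):

    def temporal_peaks(samples, alpha=0.8):
        # A running average stands in for the slower adaptive state; the
        # output is the excess of the signal over that average, so a rising
        # edge produces a peak that decays as the average catches up.
        out, average = [], samples[0]
        for s in samples:
            average = alpha * average + (1 - alpha) * s
            out.append(s - average)
        return out

    # A step input yields a transient peak that settles back toward zero:
    # a crude stand-in for "rising edges are turned into peaks."
    step = [0.0] * 10 + [1.0] * 10
    peaks = temporal_peaks(step)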
In some animals, like the frog, very elaborate motion computations are performed. A visual scene of the frog's natural habitat moving as a whole elicits no response. When a small, dark spot is moved relative to the background, however, a large response results.
In higher vertebrates, much of this kind of complex motion calculation has migrated to the visual cortex, and the retina computes a simple time derivative.
How much does something have to be moving for us to see it? The answer depends on how much the rest of the image is moving.
Another level of gain control makes sure that, if we are going to report a derivative event, that event is significant relative to the rest of the scene.
If we are looking at a tree, and the leaves are all blowing in the wind, something has to move significantly before we will report it. Otherwise, our higher levels of information processing would get overloaded by reports about all those little fluttering leaves.
For a primate, it usually takes something bigger than a leaf to jump on you and hurt you very much.
A derivative signal with respect to time is taken by the interaction of the bipolar, amacrine, and ganglion cells. Exactly how biological systems do this is not known.
The local derivative with respect to time is compared to the derivatives that are being taken in the surrounding area. If the local signal is significantly larger, it gets reported.
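In the same illustrative vein as the earlier sketches (the surround size and significance threshold are again my assumptions, not measured values):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def report_significant(deriv, surround_sigma=5.0, factor=2.0):
        # Average derivative magnitude in the surrounding area.
        surround = gaussian_filter(np.abs(deriv), sigma=surround_sigma)
        # Report the local event only where it clearly exceeds the
        # surround; a field of fluttering leaves raises the bar everywhere.
        return np.where(np.abs(deriv) > factor * surround, deriv, 0.0)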
We might wonder why so much of the information in the optic nerve is derivative. After all, we could just ship all the intensity information about the scene up the optic nerve.
The optic nerve has a bandwidth approaching that of a television signal.
People who design machine vision systems usually start with a television signal; they take one frame and compare it with the succeeding frame, and so on. Motion is characterized as something in one position in the first frame that is in a different position in the second frame.
It would be easy for a living system to do gain control in the camera, like television does, and then send the intensity information up to the brain to extract the motion information where there is a lot more horsepower to do so.
So why go to all the trouble of building this elaborate derivative-processing machinery down at the camera level?
The answer is a straightforward one: A television camera samples every point on the image once every 1/30 of a second. But a predator in the visual field can move a distance of many pixels in 1/30 of a second.
So what we have done is take a simple problem (taking a directional derivative with respect to time) and transform it into a complicated one.
Now we have an image at time t and an image at t + 1/30, and we have to decide what point in the first image corresponds to what point in the succeeding image.
So sampling transforms the processing task into the extremely difficult correspondence problem. People use supercomputers to try to solve that problem.
Living systems didn't have supercomputers; they solved the problem the easy way and just took the derivative.
So when we built our rudimentary electronic retina, we built it to just take the derivative also.
We based our system on the following four insights from biology:

1. It's important to take a logarithm of the signal, because logarithms factor the scene into the illumination function and the properties of the objects.

2. It's important to keep the signals in range.

3. Normalization should be done on a local basis; there is information in the shade and in the sunlight.

4. It's important to take time derivatives before we have sampled the image with respect to time. Otherwise, we would be throwing away the single most important piece of information in the image.
We have designed a simple retina and have implemented it on silicon in a standard, off-the-shelf CMOS (complementary metal-oxide semiconductor) process.
The basic component is a photoreceptor, for which we use a bipolar transistor. In a CMOS process this is a parasitic device; that is, it's responsible for some problems in conventional digital circuits. But in our retina we take advantage of the gain of this excellent phototransistor.
There's nothing special about this fabrication process, and it's not exactly desirable from an analog point of view.
Neurons in the brain don't have anything special about them either; they have limited dynamic range, they're noisy, and they have all kinds of garbage.
But if we're going to build neural systems, we'd better not start off with a better process (with, say, a dynamic range of 10⁵), because we'd simply be kidding ourselves that we had the right organizing principles.
If we build a system that is organized on neural principles, we can stand a lot of garbage in the individual components and still get good information out.
The nervous system does that, and if we're going to learn how it works, we'd better subject ourselves to the same discipline.
As in a biological eye, the first step is to take the logarithm of the signal arriving at the photoreceptor. To do this, we use the standard trick of electrical engineers: put an exponential element in a feedback loop. The voltage that comes out is the logarithm of the current that goes in.
We think this operation is similar to the way living systems do it, although that is not proven. The element that we use to make this exponential consists of two MOS transistors stacked up.
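The principle can be written down compactly; the equations below are the standard subthreshold MOS relations, which the article itself does not spell out. Below threshold, the channel current of an MOS transistor is exponential in its gate voltage,

    I = I_0 \, e^{\kappa V / V_T},

where V_T = kT/q is the thermal voltage and \kappa is a process-dependent constant. An amplifier with this element in its feedback path drives the element until its current equals the photocurrent I_{ph}, so the output voltage settles at

    V = \frac{V_T}{\kappa} \ln \frac{I_{ph}}{I_0},

the logarithm of the input current.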
A nice property of this element is that the voltage range of the output is appropriate for subsequent processing by the kinds of amplifiers we can build in this technology.
When we use the element to build a photoreceptor, the voltage out of the photoreceptor is logarithmic over four or five orders of magnitude of incoming light intensity.
The lowest photocurrent is about 10⁻¹⁴ amps, which translates to a light level of 10⁵ photons per second.
This level corresponds approximately to moonlight, which is about the lowest level of light you can see with your cones.
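That current-to-photon conversion is easy to check (my arithmetic, assuming roughly one collected electron per photon): each electron carries about 1.6 \times 10^{-19} coulombs of charge, so

    \frac{10^{-14}\ \text{A}}{1.6 \times 10^{-19}\ \text{C}} \approx 6 \times 10^{4} \approx 10^{5}\ \text{electrons per second},

consistent with the photon rate quoted above.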
There are two kinds of receptors in the eye: cones and rods. We use the cones under all normal circumstances and the rods
[Figure: This computer drawing of a small group of pixels (one pixel appears on the cover) from the center of the retina shows how the individual cells are composed to form the processing array. The entire chip, shown on the following page, contains a 48 × 48 array of these pixels.]