Neural Hardware for Vision
by Carver A. Mead
BIOLOGY HAS ALWAYS BEEN the inspiration for computational metaphor. In the mid-1930s Alan Turing's original model for computation, which we call the sequential process, was based on the way mathematicians proved theorems. Because mathematicians are biological entities, we can say that even Turing's sequential process was inspired by the way biological systems work. But I will be discussing some biological systems that are simpler than mathematicians, since nobody, including mathematicians, can understand the way mathematicians work.
In the last decade or so the knowledge of what goes on in the brain has increased tremendously. When Max Delbrück first interested me in biology 20 years ago, the picture we had of the brain was much more simplistic and much less analog in nature. At the time, neurobiologists were completely preoccupied with nerve impulses and the way they were generated in neurons. Now they are looking more deeply at the principles on which neural computation is based. And there are some surprises here.
Nerve impulses, which are quasi-digital, play a surprisingly small role in the actual computation process. Most of the computation is analog, and it's done at the very tips of the dendritic tree of the neuron. Throughout the brain there is distributed feedback from these dendritic tips to the nerves that are driving them.
These new discoveries prompted us to take a fresh look at neural computation to see whether we might be able to synthesize systems that have some of the properties of real neural systems. It turns out that it's probably just the right time to be doing this.
What's different today from attempts in the last 30 years to build neurocircuits is that now we have a technology that makes it possible to put a billion transistors on a six-inch wafer and interconnect them all. Conventional digital technology has difficulty using a full wafer, since many transistors are inoperative. Re-creating the brain's distributed analog computation gives us inherent redundancy and robustness under failure. We can actually use a substantial fraction of these billion transistors.
So, the technology that was developed for microprocessors and memories has provided us a base on which we can build neural computing systems. These computing systems are based on very different principles from any of the conventional computing engines, analog or digital, that were built in the past.
The particular system we have been working on is a very simple model of the part of the brain wrapped up behind the eyeball. Although it's quite simple by brain standards, it does a level of computation that even our most powerful computers today can't handle.
The lens of the eye focuses an image on the surface of the retina, where the first levels of visual processing occur. When we want to see details of shapes, such as letters, the image gets focused on the fovea, a small area of the retina with tightly packed photoreceptors. But the fovea is responsible for only a fraction of the retina's activity.
Most of the action happens at the periphery, where movement of the image produces signals that are transformed into nerve pulses and transmitted over the optic nerve "cable" to the higher centers in the brain.
In a cross section through the retina one can see on the surface a layer of photoreceptors, below which lie layers of three different kinds of cells: bipolar, horizontal, and amacrine. Below these cells are the ganglion cells, whose axons form the fibers of the optic nerve.
The principal signal flow in the retina runs from the receptors down through the bipolar cells (the horizontal and amacrine cells spread across a large area of the retina in layers transverse to the signal flow) and into the ganglion cells, which turn the signal into nerve pulses.
In engineering terms we can say that the process starts by transducing the light energy into an electrical signal. We send that signal on to an amplifier and then off through a cable. The signals in the retina are all analog until they go out the cable as nerve pulses, which are quasi-digital (digital in amplitude but analog in time).
This basic structure (with some diversity in the details) is universal throughout the vertebrates. We can assume that the animals that evolved this eye structure ate any that did not.
It is characteristic of biological systems that they are here because they work. An animal didn't live long if it couldn't see the predators that were about to jump on it, and its genes did not have a chance to get represented in the next generation. Because evolution has such a ruthless way of dealing with bad designs, we can view surviving biological structures as highly engineered systems.
The visual system is there to see things about the world. The scene coming into the eye, however, is not the world. It's a bunch of photons that arrive because there is some light somewhere that shines on objects in the world and gets reflected off them into the eye.
The light that falls on the image surface is the product of an illumination function multiplied by the reflectance of the object. But we don't want to see the illumination function; we want to see the object. Nobody ever got jumped on by an illumination function. So we take the logarithm of the intensity, and that factors the problem into the log of the illumination function, which is often a smooth function (except for shadows), plus the log of the reflectance of the object.
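To spell that factoring out (the notation is mine, not the article's): if L(x, y) is the illumination falling on a surface point and R(x, y) is its reflectance, the intensity arriving at the image is

    I(x, y) = L(x, y) \cdot R(x, y)

and taking the logarithm turns the product into a sum:

    \log I(x, y) = \log L(x, y) + \log R(x, y)

Since the illumination term is smooth almost everywhere, any later stage that responds to local differences sees mostly the reflectance term, the part that describes objects.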
The computation of the logarithm is done in the receptors or in their interactions with each other.
The visual system also has to make sure that the signals are within range. If they're not, you get blanked out. You have probably noticed this phenomenon, say, watching a baseball game on television. When someone hits a ball up into the stands, the television camera pans from the brightly lit field over to the stands in the shade. The camera has an elaborate automatic gain control system, but in such a mixed scene you see a pure white field and pure black stands; one signal is above range, and the other is below range, so you don't see anything at all. If an animal did that, its visual system would not be around in the next generation, because the predators would simply jump from places that were half in the shade and half in sunlight.

[Figure: In this cross section of a vertebrate retina, the main signal flow travels downward from the photoreceptors through the bipolar cells to the ganglion cells, which connect to the optic nerve. The layers of horizontal cells and amacrine cells lie transverse to the signal path. (From "The Control of Sensitivity in the Retina" by Frank S. Werblin. © January 1973 Scientific American, Inc. All rights reserved.)]
But in the visual system, unlike the television camera, there is a measure of the local average intensity of the light; this value is used as the midpoint for the acceptable range of input levels.
Basically this is a mechanism for deciding whether the pixel we are looking at is sufficiently different from the pixels around it to be reported. This level-normalization computation is performed by the horizontal cells.
The horizontal cells look at the potentials on a bunch of photoreceptors and then take a spatial average. Then the difference between that spatial average and the local receptor signal is computed in the synaptic complex in the foot of the receptor. The resulting spatial derivative gets shipped on to the bipolar cells.
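As a rough illustration, here is a minimal numerical sketch of that level-normalization step in Python. The function name, the Gaussian weighting of the surround, and the parameter values are my assumptions for illustration; the retina's actual spatial weighting is set by the horizontal-cell network.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def level_normalize(log_intensity, surround_sigma=3.0):
        # Horizontal cells: a weighted spatial average over nearby receptors
        # (Gaussian weighting is an assumption, not the measured profile).
        local_average = gaussian_filter(log_intensity, sigma=surround_sigma)
        # Synaptic complex at the receptor foot: local signal minus average.
        return log_intensity - local_average

    # A bright patch on a uniform background survives normalization,
    # while the uniform background itself reports nearly zero.
    scene = np.full((64, 64), 2.0)
    scene[30:34, 30:34] = 20.0
    response = level_normalize(np.log(scene))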
The outputs of the bipolar cells feed the amacrine layer, which is responsible for computing the time derivative of the signal. Rising edges of the bipolar waveform are turned into peaks, which in turn cause ganglion cells to fire. In rough terms, the amacrine layer is extracting motion information from the incoming retinal image.
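A discrete-time sketch of that behavior (my stand-in mechanism; the real circuit is analog and continuous, and the time constant here is arbitrary):

    def temporal_peaks(samples, alpha=0.8):
        # A running average stands in for the slower adaptive state; the
        # output is the excess of the signal over that average, so a rising
        # edge produces a peak that decays as the average catches up.
        out, average = [], samples[0]
        for s in samples:
            average = alpha * average + (1 - alpha) * s
            out.append(s - average)
        return out

    # A step input yields a transient peak that settles back toward zero:
    # a crude stand-in for "rising edges are turned into peaks."
    step = [0.0] * 10 + [1.0] * 10
    peaks = temporal_peaks(step)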
In some animals, like the frog, very elaborate motion computations are performed. A visual scene of the frog's natural habitat moving as a whole elicits no response. When a small, dark spot is moved relative to the background, however, a large response results.
In higher vertebrates, much of this kind of complex motion calculation has migrated to the visual cortex, and the retina computes a simple time derivative.
How much does something have to be moving for us to see it? The answer depends on how much the rest of the image is moving.
Another level of gain control makes sure that, if we are going to report a derivative event, that event is significant relative to the rest of the scene.
If we are looking at a tree, and the leaves are all blowing in the wind, something has to move significantly before we will report it. Otherwise, our higher levels of information processing would get overloaded by reports about all those little fluttering leaves.
For a primate, it usually takes something bigger than a leaf to jump on you and hurt you very much.
A derivative signal with respect to time is taken by the interaction of the bipolar, amacrine, and ganglion cells. Exactly how biological systems do this is not known.
The local derivative with respect to time is compared to the derivatives that are being taken in the surrounding area. If the local signal is significantly larger, it gets reported.
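In the same illustrative vein as the earlier sketches (the surround size and significance threshold are again my assumptions, not measured values):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def report_significant(deriv, surround_sigma=5.0, factor=2.0):
        # Average derivative magnitude in the surrounding area.
        surround = gaussian_filter(np.abs(deriv), sigma=surround_sigma)
        # Report the local event only where it clearly exceeds the
        # surround; a field of fluttering leaves raises the bar everywhere.
        return np.where(np.abs(deriv) > factor * surround, deriv, 0.0)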
We might wonder why so much of the information in the optic nerve is derivative. After all, we could just ship all the intensity information about the scene up the optic nerve.
The optic nerve has a bandwidth approaching that of a television signal.
People who design machine vision systems usually start with a television signal; they take one frame and compare it with the succeeding frame, and so on. Motion is characterized as something in one position in the first frame that is in a different position in the second frame.
It would be easy for a living system to do gain control in the camera, like television does, and then send the intensity information up to the brain to extract the motion information where there is a lot more horsepower to do so.
So why go to all the trouble of building this elaborate derivative-processing machinery down at the camera level?
The answer is a straightforward one: A television camera samples every point on the image once every 1/30 of a second. But a predator in the visual field can move a distance of many pixels in 1/30 of a second.
So what we have done is take a simple problem (taking a directional derivative with respect to time) and transform it into a complicated one.
Now we have an image at time t and an image at t + 1/30, and we have to decide what point in the first image corresponds to what point in the succeeding image.
So sampling transforms the processing task into the extremely difficult correspondence problem. People use supercomputers to try to solve that problem.
Living systems didn't have supercomputers; they solved the problem the easy way and just took the derivative.
So when we built our rudimentary electronic retina, we built it to just take the derivative also.
We based our system on the following four insights from biology:

1. It's important to take a logarithm of the signal, because logarithms factor the scene into the illumination function and the properties of the objects.

2. It's important to keep the signals in range.

3. Normalization should be done on a local basis; there is information in the shade and in the sunlight.

4. It's important to take time derivatives before we have sampled the image with respect to time. Otherwise, we would be throwing away the single most important piece of information in the image.
We have designed a simple retina and have implemented it on silicon in a standard, off-the-shelf CMOS (complementary metal-oxide semiconductor) process.
The basic component is a photoreceptor, for which we use a bipolar transistor. In a CMOS process this is a parasitic device; that is, it's responsible for some problems in conventional digital circuits. But in our retina we take advantage of the gain of this excellent phototransistor.
There's nothing special about this fabrication process, and it's not exactly desirable from an analog point of view.
Neurons in the brain don't have anything special about them either; they have limited dynamic range, they're noisy, and they have all kinds of garbage.
But if we're going to build neural systems, we'd better not start off with a better process (with, say, a dynamic range of 10⁵), because we'd simply be kidding ourselves that we had the right organizing principles.
If we build a system that is organized on neural principles, we can stand a lot of garbage in the individual components and still get good information out.
The nervous system does that, and if we're going to learn how it works, we'd better subject ourselves to the same discipline.
As in a biological eye, the first step is to take the logarithm of the signal arriving at the photoreceptor. To do this, we use the standard trick of electrical engineers: put an exponential element in a feedback loop. The voltage that comes out is the logarithm of the current that goes in.
We think this operation is similar to the way living systems do it, although that is not proven. The element that we use to make this exponential consists of two MOS transistors stacked up.
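The principle can be written down compactly; the equations below are the standard subthreshold MOS relations, which the article itself does not spell out. Below threshold, the channel current of an MOS transistor is exponential in its gate voltage,

    I = I_0 \, e^{\kappa V / V_T},

where V_T = kT/q is the thermal voltage and \kappa is a process-dependent constant. An amplifier with this element in its feedback path drives the element until its current equals the photocurrent I_{ph}, so the output voltage settles at

    V = \frac{V_T}{\kappa} \ln \frac{I_{ph}}{I_0},

the logarithm of the input current.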
A nice property of this element is that the voltage range of the output is appropriate for subsequent processing by the kinds of amplifiers we can build in this technology.
When we use the element to build a photoreceptor, the voltage out of the photoreceptor is logarithmic over four or five orders of magnitude of incoming light intensity.
The lowest photocurrent is about 10⁻¹⁴ amps, which translates to a light level of 10⁵ photons per second.
This level corresponds approximately to moonlight, which is about the lowest level of light you can see with your cones.
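That current-to-photon conversion is easy to check (my arithmetic, assuming roughly one collected electron per photon): each electron carries about 1.6 \times 10^{-19} coulombs of charge, so

    \frac{10^{-14}\ \text{A}}{1.6 \times 10^{-19}\ \text{C}} \approx 6 \times 10^{4} \approx 10^{5}\ \text{electrons per second},

consistent with the photon rate quoted above.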
There are two kinds of receptors in the eye: cones and rods. We use the cones under all normal circumstances and the rods
[Figure: This computer drawing of a small group of pixels (one pixel appears on the cover) from the center of the retina shows how the individual cells are composed to form the processing array. The entire chip, shown on the following page, contains a 48 × 48 array of these pixels.]