
Diversity-enabled sweet spots in layered architectures and speed-accuracy trade-offs in sensorimotor control

Yorie Nakahira, Quanying Liu, Terrence J. Sejnowski, John C. Doyle

Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA 91125, USA

The Salk Institute for Biological Studies, La Jolla, CA, USA

Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA

These authors contributed equally

To whom correspondence should be addressed; E-mail: doyle@caltech.edu, terry@salk.edu.

Nervous systems sense, communicate, compute, and actuate movement using distributed components with trade-offs in speed, accuracy, sparsity, noise, and saturation. Nevertheless, the resulting control can achieve remarkably fast, accurate, and robust performance due to a highly effective layered control architecture. However, there is no theory explaining the effectiveness of layered control architectures that connects speed-accuracy trade-offs (SATs) in neurophysiology to the resulting SATs in sensorimotor control. In this paper, we introduce a theoretical framework that provides a synthetic perspective to explain why there exists extreme diversity across layers and within levels. This framework characterizes how the sensorimotor control SATs are constrained by the hardware SATs of neurons communicating with spikes and their sensory and muscle endpoints, in both stochastic and deterministic models. The theoretical predictions of the model are experimentally confirmed using driving experiments in which the time delays and accuracy of the control input from the wheel are varied. These results show that the appropriate diversity in the properties of neurons and muscles across layers and within levels helps create systems that are both fast and accurate despite being built from components that are individually slow or inaccurate. This novel concept, which we call “diversity-enabled sweet spots” (DESSs), explains the ubiquity of heterogeneity in the sizes of axons within a nerve as well as the resulting superior performance of sensorimotor control.

Human sensorimotor control can achieve extremely robust performance in complex, uncertain environments, despite being implemented in systems that are distributed, sparse, quantized, delayed, and saturated. More specifically, at the hardware level, there exist severe speed-accuracy trade-offs. For example, achieving fast or accurate nerve signaling requires additional space and metabolic costs to build and maintain nerves, and such resource limitations impose hard SATs in nerve signaling. In contrast, at the system level, the SATs in sensorimotor control are much less severe. For example, when riding a mountain bike down a twisting, bumpy trail, though a trade-off exists between traveling fast and accurately following the trail, most humans can safely stay on the trail without crashing. Such robust performance despite hardware limitations may be due to highly effective layered control architectures that de-constrain the hardware constraints.

Despite the profound influence of architectures on performance, we have paid little attention to what makes an architecture effective. To understand effective layered architectures, we need to study how component constraints and trade-offs impact those on sensorimotor performance and clarify the overall system performance and limitations when different control layers act jointly. However, the hardware SATs of neural signaling and the system SATs in sensorimotor control have been studied separately. This is in part because there are few theoretical tools that allow us to study the hardware SATs of neural signaling together with the system SATs in sensorimotor control, or to understand the collective performance when different layers work together. In our terminology, "layers" refers to different architectural components (e.g. planning layer, reflex layer), while "levels" refers to different levels of abstraction or composition (e.g. brain level vs nerve level vs molecular level, or whole muscle level vs fiber level).

We developed a mathematical theory that connects the component speed-accuracy constraints and trade-offs with those at the sensorimotor system level and provides an integrated view of layered control systems involving planning in a high layer and reflexive reaction in a low layer. Using this theory, we show here that diversity between layers and within layers can be exploited to achieve both fast and accurate performance despite being implemented using slow or inaccurate hardware. We call these synergies “diversity-enabled sweet spots” (DESSs). At the component level, this concept explains why there are extreme heterogeneities in the characteristics of neural components (Fig. 1). At the system level, DESSs explain the benefits of extreme heterogeneities in speed and accuracy in different sensorimotor loops (10, 11).

arXiv:1909.08601v2 [math.OC] 25 Sep 2019

Fig. 1. Component speed-accuracy trade-offs (SATs) in sensory nerves. Sizes and numbers of axons for selected nerves and the resulting SATs. The dashed line represents nerves with equal cross-sectional area, which is proportional to λ in Eq. 3. The nerves shown have similar cross-sectional areas but wildly different compositions of axon size and number, resulting in different speed and accuracy in nerve signaling (1). A myelin sheath around an axon can also increase its speed of propagation. Many nerves, such as the sciatic nerve, contain a mixture of axons with different sizes and degrees of myelination.

Basic model.

An example of an effective layered control architecture is the oculomotor system that stabilizes the eye on a moving target while you are bouncing down a trail (Fig. 2A). Neurons in the visual cortex responding to target motion on the retina drive the actuators to pursue the target after a delay of 100 milliseconds. In contrast, fast head motions are compensated by control systems in the brainstem in the millisecond range. Together, they allow you to maintain fixation on a distant moving target despite severe bumps.

In trail following (Fig. 2B), higher-level control systems in the cortex and basal ganglia provide advanced warning for planning actions to avoid trees and other obstacles. This is accompanied by a fast feedback system in the spinal cord that maintains stable tracking.

To study how these control systems are coordinated, we first introduce a driving task that simulates trail following on a display screen. In the task, the subjects have to track a reference trail or trajectory with small errors despite unseen bumps and disturbances. We define the dynamics of the error x(t) between the actual position (i.e. the player’s position) and the desired position (i.e. the trail’s position) as follows:

x(t + 1) = x(t) + w(t) + u(t)    [1]

which relates the future error x(t + 1) to the previous error x(t), the uncertainty w(t) (bumps or trail changes), and the control action u(t). The control action u(t) is generated using the observed errors and uncertainty as follows:

(

) =

(

(0 :

−

)

(0 :

−

(0 :

−

1))

[2]

Here, K is a function that defines the controller, which uses sensing components (i.e. eyes, muscle sensors and the inner ear), communication components (i.e. nerves), computing components (i.e. the cortex in the central nervous system), and actuation components (i.e. eye and arm muscles). The delay in control can be further decomposed into the nerve signaling delay T_s and other internal delays T_i in the feedback loop. The advanced warning T_a models the fact that the rider can view the future trail T_a time steps in advance. Its specific value is determined by the rider’s speed and the trail’s features, and its effect can be observed in the fact that muscle tone changes before an expected perturbation. The rate constraint R accounts for the limitations in nerve signaling.
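As a minimal sketch of the error dynamics in Eq. 1, the Python snippet below simulates x(t + 1) = x(t) + w(t) + u(t) with a hypothetical controller that cancels each disturbance a fixed number of steps late. The function name `simulate_error`, the parameter `net_delay`, and the cancellation rule u(t) = -w(t - d) are assumptions made for illustration only. They are not the controller K of Eq. 2, and no rate limit is modeled.

```python
def simulate_error(disturbance, net_delay):
    """Simulate x(t+1) = x(t) + w(t) + u(t) for a delayed-cancellation controller.

    Assumption (not from the paper): the controller applies u(t) = -w(t - d),
    where d = max(net_delay, 0). A non-positive net_delay stands for enough
    advanced warning to cancel w(t) in the same step.
    """
    d = max(net_delay, 0)
    x = 0.0
    trajectory = [x]
    for t, w in enumerate(disturbance):
        # The controller can only act on disturbances that are at least d steps old.
        u = -disturbance[t - d] if t - d >= 0 else 0.0
        x = x + w + u
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    bumps = [1.0] * 20  # a sustained unit disturbance
    for delay in (-2, 0, 3):
        worst = max(abs(x) for x in simulate_error(bumps, delay))
        print(f"net delay {delay:+d}: worst error {worst}")
```

In this toy setting the worst error grows with the delay and vanishes when the disturbance is known in time, which mirrors the qualitative role that delay and advanced warning play in the model.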

Fig. 2. Diagrams of sensorimotor control for eye tracking movements and mountain bike trail tracking. (A) Diagram of two major feedback loops involved in eye movement: visual cortex feedback and vestibulo-ocular reflex (VOR) feedback. Objects are tracked using the slow visual cortex feedback, while head motion is compensated for by the much faster VOR feedback. (B) Diagram of the basic sensorimotor control model for our experiment that simulates riding a mountain bike. Each box is designated by its function: sensing and communication (e.g. vision, muscle spindle sensor, vestibulo-ocular reflex), actuation (muscle), and computation (high-layer planning and tracking and low-layer reflexes and reactions). Depending on the hardware details, they may be quantized (discrete valued), have time delays, experience saturation, and be subject to noise. The trail ahead can be seen in advance, but the bumps and other disturbances are unanticipated. The line thickness indicates the relative speed of the pathway (thicker lines for faster pathways).

Hardware SATs.

There exist trade-offs between neural signaling speed and accuracy arising from the fixed spatial and metabolic cost to build and maintain axons. Specifically, nerves with the same cross-sectional area can either contain many small axons or a few large axons (Fig. 1), which inevitably leads to SATs in neural signaling. The specific forms of SATs depend on how the nerves encode information (e.g. spike-based, rate-based, and spike-interval-based encoding). Our theory does not require any specific form of encoding or of the resulting hardware SATs, so for simplicity, we assume the spike-based encoding scheme in our analysis in the main text. In spike-based encoding, information is encoded in the presence or absence of a spike within each time interval, analogous to digital packet-switching networks. This encoding requires spikes to be generated with sufficient timing accuracy, which has been experimentally verified in many types of neurons.

To model the complex size distributions in axon bundles

in a nerve, we classify axons into

distinct types, where

each type corresponds to axons of identical size. We index

each type by

∈{

···

}

and model type

axons

as a communication channel with signaling delay

and

signaling rate

(

i.e.

the total amount of information

in bits that can be transmitted per unit time). It can be

shown that

∑

[3]

where

≥

and

λ >

are constants associated with

the total resource (

i.e.

space available to build the axons)

used by type

axons and all axons, respectively. See the supplementary information for more detail. A special case of Eq. 3 is that all axons have the same size. In such a case, we can model the axon bundle as a single communication

channel with signaling delay

···

and signaling rate

∑

satisfying

λT

[4]

For other types of encoding, we refer interested readers

to the supplementary information.
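To make the many-small-axons versus few-large-axons trade-off concrete, the sketch below shows one set of scaling assumptions under which a linear relation between signaling rate and signaling delay, of the form in Eq. 4, arises for a nerve of fixed cross-sectional area. The function `nerve_delay_and_rate`, its parameters, and the three scaling rules (axon count proportional to area divided by diameter squared, conduction speed proportional to diameter, per-axon rate proportional to diameter) are illustrative assumptions, not the derivation given in the supplementary information.

```python
def nerve_delay_and_rate(area, diameter, length=1.0, speed_per_um=1.0, rate_per_um=1.0):
    """Return (signaling delay, signaling rate) for a nerve of identical axons.

    Illustrative assumptions (not taken from the paper):
      * each axon occupies roughly diameter**2 of the cross-section,
      * conduction speed grows linearly with diameter,
      * the information rate of a single axon grows linearly with diameter.
    """
    n_axons = area / diameter ** 2              # how many axons fit in the nerve
    delay = length / (speed_per_um * diameter)  # time to travel the nerve length
    rate = n_axons * rate_per_um * diameter     # total bits per unit time
    return delay, rate


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 4.0):
        delay, rate = nerve_delay_and_rate(area=100.0, diameter=d)
        print(f"diameter {d}: delay {delay:.3f}, rate {rate:.1f}, rate/delay {rate / delay:.1f}")
```

Under these assumptions the ratio of rate to delay depends only on the fixed area and the scaling constants, so shrinking the axons buys rate at the price of delay along a straight line.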

System SATs imposed by hardware SATs.

The hardware SATs impose SATs on sensorimotor control. To study their impact, we consider the motivating example of riding a mountain bike, which is simulated by our driving game experiments (see Materials and Methods). The error between the actual and desired positions evolves according to Eq. 1. The feedback loop in Eq. 2 can transmit R bits of information, incurring the total delay from sensing (of the disturbance) to actuation. We characterize the worst-case error and the average-case error in sensorimotor control. The worst-case error is more applicable to risk-averse sensorimotor behaviors, such as riding a mountain bike on a cliff/trail, in which staying on the cliff is necessary for survival even in the presence of the worst possible uncertainty. The average-case error is more applicable to risk-neutral sensorimotor behaviors, such as riding a mountain bike across a broad field, in which there is no fatal risk of leaving the field.

The worst-case error

max

‖

∞

≤

‖

∞

is lower-

bounded by

max(0

+ 1) +

(

−

)

−

[5]

In this case,

the mean squared error

lim

→∞

)

∑

[

(

)

]

is lower-bounded by

max(0

+ 1) +

(

−

)

−

[6]

The proofs of Eq. 5 and Eq. 6 and more general results are given in the supplementary information. The performance bounds in both settings (Eq. 5–6) are qualitatively similar: both bounds decompose into two terms. The shared first term (denoted the delay error) is only a function of the total delay and thus can be considered the cost due to delay. The second terms (denoted the rate error) are only functions of the signaling rate and can be considered the cost due to rate limits.
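A short worked example may help fix the two-term decomposition. It assumes, as a reading of the damaged expressions in this copy rather than a confirmed statement of Eq. 5, that the delay error has the form max(0, T + 1) for a total delay of T time steps and that the rate error has the form (2^R - 1)^{-1} for a signaling rate of R bits. The numbers are chosen purely for illustration.

```latex
\begin{align*}
\text{total error bound} &= \underbrace{\max(0,\; T + 1)}_{\text{delay error}}
  + \underbrace{(2^{R} - 1)^{-1}}_{\text{rate error}} \\
T = 2,\; R = 3 &:\quad 3 + \tfrac{1}{7} \approx 3.14 \\
T = 2,\; R = 6 &:\quad 3 + \tfrac{1}{63} \approx 3.02 \\
T = 0,\; R = 3 &:\quad 1 + \tfrac{1}{7} \approx 1.14
\end{align*}
```

Under these assumed forms, doubling the rate barely changes the total once the rate error is already small, whereas removing two steps of delay removes two full units of error.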

Since the validity of our framework does not require the hardware SATs to have any specific form, we next use the SAT in spike-based encoding to demonstrate how the SATs at the component level impact the SATs at the system level. By combining the hardware SATs in Eq. 4 and the system SATs in Eq. 5 and Eq. 6, we can predict the influence of the neural signaling constraints on sensorimotor control, shown in Fig. 3A. Increasing the delay in the feedback loop increased the delay errors, while increasing the rate led to a large decrease in the rate errors. The errors for the trials with both added delay and added quantization were approximately the sum of the errors for the trials with the delay and the quantization added separately, as predicted by the model. Thus, delays can cause small disturbances to escalate into larger errors, and increasing the data rate dramatically reduces errors in the context of control.

Furthermore, the minimum reaching time or the minimum error is achieved when the deleterious effects of the nerve signaling delay and inaccuracy are both kept within a moderate range. Conversely, a nerve composition that maximizes either the speed or the accuracy of nerve signaling results in suboptimal performance. This observation suggests that neural design principles and the capability of nerves for information transfer should be studied together with sensorimotor control.
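The sweet spot described above can be sketched numerically. The snippet below combines a linear hardware SAT (rate equal to λ times the signaling delay, as in Eq. 4) with assumed forms for the two error terms: max(0, total delay + 1) for the delay error and 1 / (2^R - 1) for the rate error. The function names, the integer grid of candidate delays, the value λ = 0.1, and the exact error expressions are assumptions for illustration, not values or formulas confirmed by this copy of the paper.

```python
def total_error(signaling_delay, lam, internal_delay=0, warning=0):
    """Assumed total error bound for a given signaling delay under R = lam * T_s."""
    rate = lam * signaling_delay                       # hardware SAT, as in Eq. 4
    total_delay = signaling_delay + internal_delay - warning
    delay_error = max(0, total_delay + 1)              # assumed delay-error form
    rate_error = 1.0 / (2.0 ** rate - 1.0)             # assumed rate-error form
    return delay_error + rate_error


def sweet_spot(lam, candidate_delays=range(1, 21)):
    """Return the candidate signaling delay with the smallest assumed total error."""
    return min(candidate_delays, key=lambda ts: total_error(ts, lam))


if __name__ == "__main__":
    lam = 0.1
    best = sweet_spot(lam)
    for ts in (1, best, 20):
        print(f"signaling delay {ts}: total error bound {total_error(ts, lam):.3f}")
```

With these choices the minimum falls at an intermediate delay: the fastest nerve is too inaccurate and the most accurate nerve is too slow.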

Experimental test of model predictions.

The predictions of the model were confirmed experimentally with driving game experiments (see Materials and Methods for more details). The subjects played the driving game under three different conditions: with added delay, with added quantization, or with added delay and quantization. Their trajectories were measured, and the errors were analyzed and are shown in Fig. 3B.

Parameter            Description
x(t)                 Error at time step t
K                    Controller
T_s ≥ 0              Signaling delay
T_a ≥ 0              Advanced warning
T_i ≥ 0              Internal delay
T_s + T_i − T_a      Total delay
R                    Signaling rate (bits per unit time)
λ                    Cost associated with the resource use

Table 1. Parameters in the basic model.

Similar to the theoretical prediction, constrained by the hardware SAT in Eq. 4, the optimal performance is achieved at a sweet spot of intermediate added delay and quantization rate. Conversely, minimizing the added delay alone or maximizing the rate alone leads to suboptimal performance.
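The two experimental manipulations, added delay and added quantization of the control input, can be sketched as simple signal-processing steps. The Materials and Methods are not part of this section, so the uniform quantizer, the first-in-first-out delay buffer, and the names `quantize` and `DelayLine` below are hypothetical and show only one plausible way to implement such manipulations.

```python
from collections import deque


def quantize(value, bits, limit=1.0):
    """Map a value in [-limit, limit] to the midpoint of one of 2**bits equal bins."""
    levels = 2 ** bits
    clipped = max(-limit, min(limit, value))
    step = 2 * limit / levels
    # Clamp the index so that value == limit falls into the top bin.
    index = min(int((clipped + limit) / step), levels - 1)
    return -limit + (index + 0.5) * step


class DelayLine:
    """Delay a stream of samples by a fixed number of time steps."""

    def __init__(self, delay, fill=0.0):
        self.buffer = deque([fill] * delay)

    def push(self, value):
        # Append the newest sample and release the oldest one.
        self.buffer.append(value)
        return self.buffer.popleft()


if __name__ == "__main__":
    line = DelayLine(delay=2)
    for wheel in (0.30, -0.72, 0.95, 0.10):
        applied = line.push(quantize(wheel, bits=2))
        print(f"wheel {wheel:+.2f} -> applied {applied:+.3f}")
```

Fewer bits coarsen the steering command and a longer buffer postpones it, so the two knobs can be varied separately or together, as in the three conditions described above.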

Layered control systems.

In this section we will examine

two biological control systems that combine slow advanced

planning with fast reflexive reaction.

Visual tracking of a moving object.

The above results can be used to study the effectiveness of the layered control architecture used in the oculomotor system. Visual tracking of a moving object is done through two major feedback loops: a VOR feedback loop that compensates for head motion and a visual feedback loop through the visual cortex that tracks a moving object (Fig. 2A). From a control perspective, an important difference between the two loops is their level of advanced warning. VOR feedback reacts after the head moves, while the visual environment is highly correlated over time and is thus predictable.

We refer to the regime of VOR feedback as delayed reaction, in which the net delay is positive and the uncertainty w(t) becomes accessible to the controller after w(t) affects the error dynamics. We refer to the regime of visual feedback as advanced planning, in which the net warning is nonnegative and the uncertainty w(t) becomes accessible to the controller before w(t) affects the error dynamics. These two regimes are qualitatively different in their optimal choice of signaling delay and rate for achieving optimal robust performance, as shown in Fig. 4A and summarized below.

(i) Delayed reaction: When the net delay is large, the total error can be much larger than the size of the uncertainty ‖w‖∞ and grows without bound as the net delay increases. This large error amplification is consistent with the all-too-familiar observation that even a small bump on a trail can cause a cyclist to lose control of the bike and crash. As the net delay increases, the delay error increasingly dominates the total error. Since the delay error contributes most of the total error, the total error is minimized when the signaling delay is set to be small in return for a small signaling rate. Therefore, a feedback loop in this regime performs better when it is built from a few large axons. Interestingly, the flat optimal delay/rate within the delayed reaction regime suggests that optimal performance can be achieved using one type of nerve composition for a broad range of advanced warnings. This property is beneficial because the net delay (defined from advanced warning) differs across different sensorimotor tasks.

Fig. 3. Theoretical and experimental system SATs in sensorimotor control. (A) Theoretical SATs in the tracking (driving) task. The delay error (blue), rate error (red), and the total error (black) in Eq. 5 are shown as the delay and rate vary along the hardware SAT. (B) Empirical SATs in the tracking (driving) task averaged over 4 subjects. The error under added delay (blue), the error under added quantization (red), and the error under added delay and quantization (black) are shown. In the last case, the added delay and quantization rate are constrained to satisfy the hardware SAT. The shaded area indicates the standard error across subjects.

(ii) Advanced planning: When the net warning is large, the total error becomes small and approaches zero as the net warning grows without bound. This large disturbance attenuation is consistent with the observation that a cyclist can avoid obstacles given enough time to plan a response, e.g. route a path around them or brace against their impact. Given sufficiently large advanced warning, the rate error increasingly dominates the total error because growth in the signaling delay incurs no additional delay error. Since the rate error contributes most of the total error, the total error is minimized when the signaling rate is set to be large at the expense of a large signaling delay. Therefore, a feedback loop in this regime performs better when it is built from many small axons.
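The contrast between the two regimes can be illustrated with a small parameter sweep. The sketch below assumes a linear hardware SAT (rate equal to λ times the signaling delay), a delay error of the form max(0, signaling delay + net delay + 1), and a rate error of the form 1 / (2^R - 1), where a negative net delay stands for a net warning. The function names, the value λ = 0.1, the integer grid of candidate delays, and the error expressions are all assumptions for illustration rather than the exact quantities plotted in Fig. 4A.

```python
def bound(signaling_delay, net_delay, lam):
    """Assumed total error bound; net_delay < 0 stands for a net warning."""
    rate = lam * signaling_delay                           # hardware SAT, as in Eq. 4
    delay_error = max(0, signaling_delay + net_delay + 1)  # assumed form
    rate_error = 1.0 / (2.0 ** rate - 1.0)                 # assumed form
    return delay_error + rate_error


def optimal_signaling_delay(net_delay, lam=0.1, candidates=range(1, 41)):
    """Return the candidate signaling delay that minimizes the assumed bound."""
    return min(candidates, key=lambda ts: bound(ts, net_delay, lam))


if __name__ == "__main__":
    for net_delay in (-20, -10, -5, 0, 5, 10):
        ts = optimal_signaling_delay(net_delay)
        print(f"net delay {net_delay:+d}: optimal signaling delay {ts}, rate {0.1 * ts:.1f}")
```

In this sketch the optimal signaling delay stays flat across the delayed reaction regime, and it grows with the available warning in the advanced planning regime, where extra signaling delay can be spent on rate at no cost in delay error.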

This prediction is qualitatively consistent with the

anatomy of the human oculomotor system (Fig. 1). The

vestibular nerve, which transmits motion information from

the inner ear to the vestibular nucleus in the brainstem,

has

000

axons with mean diameter

μm

and coeffi-

cient of variation

μm

. In contrast, the optic nerve

carrying visual signals from the retina has approximately

million axons with mean diameter

μm

and coefficient

of variation

μm

, significantly smaller but more numer-

ous and with greater variability (

). As a consequence,

feedback from visual processing is slower (approximately

100

ms delay) but more accurate than the VOR feedback

(approximately

ms delay) (23).

This diversity in control performance can also be observed in two simple tests: moving one’s hand left and right across the visual field with increasing frequency while holding the head still (Test 1); and shaking the head back and forth (in a ‘no’ pattern) at increasing frequency while holding the hand still (Test 2). In Test 1, the hand starts to blur at around 1–2 Hz due to delays in tracking. In Test 2, blurring due to the inability to compensate for fast head motion occurs at a much higher frequency. This difference illustrates that the visual cortex feedback responsible for Test 1 (object tracking) has lower levels of tolerable delays than the VOR feedback responsible for Test 2 (head motion compensation). However, though slower, the visual cortex feedback is more accurate than the VOR feedback. This is illustrated by the fact that standing on one leg with closed eyes is more difficult than with eyes open.

Riding a mountain bike.

The study of the oculomotor system reveals that nerves with appropriate diversity allow the visual system to react quickly to head motion and to collect accurate visual information. This kind of DESS is ubiquitous in sensorimotor control. For example, consider the DESSs in the control architectures used for riding a mountain bike. The task of riding a mountain bike was simulated using the driving game experiments. The control system associated with the task is shown in Fig. 2B.

Fig. 4. Delayed reaction vs. advanced planning. (A) Comparison between the regime of advanced warning and that of delayed reaction. The top figure shows the minimum total error in Eq. 5 (the delay error plus the rate error) given a fixed resource level. The bottom figure shows the optimal signaling delay, total delay, and rate for varying net delay. In both figures, the horizontal axes denote the net delay or the net warning. (B) The benefit of diversity between planning and reflex layers. The top figure shows the minimum error in Eq. 99 for the case in which the high-layer and low-layer controllers are allowed to have diverse signaling delays and rates, and for the case in which they are not. We term the former the diverse case and the latter the uniform case. The high-layer controller can better exploit the advanced warning to minimize errors in the diverse case than in the uniform case. The bottom figure shows the resulting optimal delays and rates for the diverse case. System parameters are set

to be

= 0

, and

= 10