Diversity-enabled sweet spots in layered
architectures and speed-accuracy trade-offs in
sensorimotor control
Yorie Nakahira (a,1), Quanying Liu (a,1), Terrence J. Sejnowski (b,c,2), John C. Doyle (a,2)
(a) Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA 91125, USA
(b) The Salk Institute for Biological Studies, La Jolla, CA, USA
(c) Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
(1) These authors contributed equally.
(2) To whom correspondence should be addressed. E-mail: doyle@caltech.edu, terry@salk.edu.
Nervous systems sense, communicate, compute, and actuate movement using distributed components with trade-offs in
speed, accuracy, sparsity, noise, and saturation. Nevertheless, the resulting control can achieve remarkably fast, accurate,
and robust performance due to a highly effective layered control architecture. However, there is no theory explaining the effec-
tiveness of layered control architectures that connects speed-accuracy trade-offs (SATs) in neurophysiology to the resulting
SATs in sensorimotor control. In this paper, we introduce a theoretical framework that provides a synthetic perspective to ex-
plain why there exists extreme diversity across layers and within levels. This framework characterizes how the sensorimotor
control SATs are constrained by the hardware SATs of neurons communicating with spikes and their sensory and muscle end-
points, in both stochastic and deterministic models. The theoretical predictions of the model are experimentally confirmed
using driving experiments in which the time delays and accuracy of the control input from the wheel are varied. These results
show that appropriate diversity in the properties of neurons and muscles across layers and within levels helps create
systems that are both fast and accurate despite being built from components that are individually slow or inaccurate.
This novel concept, which we call “diversity-enabled sweet spots” (DESSs), explains the ubiquity of heterogeneity in the
sizes of axons within a nerve as well as the resulting superior performance of sensorimotor control.
Human sensorimotor control can achieve extremely
robust performance in complex, uncertain environ-
ments, despite being implemented in systems that are
distributed, sparse, quantized, delayed, and saturated.
More specifically, at the hardware level, there exist
severe speed-accuracy trade-offs. For example, achieving
fast or accurate nerve signaling requires additional
space and metabolic costs to build and maintain nerves,
and such resource limitations impose hard SATs in nerve
signaling. In contrast, at the system level, the SATs in
sensorimotor control are much less severe. For example,
when riding a mountain bike down a twisting, bumpy
trail, although a trade-off exists between traveling fast and
accurately following the trail, most humans can safely
stay on the trail without crashing. Such robust performance
despite hardware limitations may be due to highly
effective layered control architectures that de-constrain
the hardware constraints.
Despite the profound influence of architectures on performance,
we have paid little attention to what makes an
architecture effective. To understand effective layered
architectures, we need to study how component constraints
and trade-offs impact sensorimotor performance
and clarify the overall system performance and limitations
when different control layers act jointly. However, the
hardware SATs of neural signaling (1–4) and the system
SATs in sensorimotor control (5–8) have been studied
separately. This is in part because there are few theoretical
tools that allow us to study the hardware SATs (1–4) and
the system SATs (5–8) together, or to understand the
collective performance when different layers work together.
In our terminology, "layers" refers to different architectural
components (e.g. planning layer, reflex layer), while "levels"
refers to different levels of abstraction or composition
(e.g. brain level vs nerve level vs molecular level, or whole
muscle level vs fiber level).
We developed a mathematical theory that connects
the component speed-accuracy constraints and trade-offs
with those at the sensorimotor system level and provides
an integrated view of a layered control system involving
planning in a high layer and reflexive reaction in a low
layer. Using this theory, we show here that diversity
between layers and within layers can be exploited to
achieve both fast and accurate performance despite being
implemented using slow or inaccurate hardware. We call
these synergies “diversity-enabled sweet spots” (DESSs).
At the component level, this concept explains why there
are extreme heterogeneities in the characteristics of neural
components (Fig. 1) (2, 3, 9). At the system level, DESSs
explain the benefits of extreme heterogeneities in speed
and accuracy in different sensorimotor loops (10, 11).
arXiv:1909.08601v2 [math.OC] 25 Sep 2019
Fig. 1. Component speed-accuracy trade-offs (SATs) in sensory nerves. Sizes
and numbers of axons for selected nerves and the resulting SATs. The dashed
line represents nerves with equal cross-sectional area, which is proportional to
$\lambda$ in Eq. 3. The nerves shown have similar cross-sectional areas but wildly different
compositions of axon size and number, resulting in different speed and accuracy in
nerve signaling (1). A myelin sheath around an axon can also increase its speed of
propagation. Many nerves, such as the sciatic nerve, contain a mixture of axons with
different sizes and degrees of myelination.
Basic model.
An example of an effective layered control
architecture is the oculomotor system, which stabilizes the
eye on a moving target while you are bouncing down a
trail (Fig. 2A) (10, 11). Neurons in the visual cortex
responding to target motion on the retina drive the actuators
to pursue the target after a delay of 100 milliseconds.
In contrast, fast head motions are compensated for by
control systems in the brainstem in the millisecond range.
Together, they allow you to maintain fixation on a distant
moving target despite severe bumps.
In trail following (Fig. 2B), higher-level control
systems in the cortex and basal ganglia provide advanced
warning for planning actions to avoid trees and other
obstacles. This is accompanied by a fast feedback system
in the spinal cord that maintains stable tracking.
To study how these control systems are coordinated,
we first introduce a driving task that simulates the trail
following on a display screen. In the task, the subjects
have to track a reference trail or trajectory with small
errors despite unseen bumps and disturbances. We define
the error dynamics $x(t)$ between the actual position (i.e.
the player’s position) and the desired position (i.e. the trail’s
position) as follows:
$$x(t+1) = x(t) + w(t) + u(t), \tag{1}$$
which relates the future error $x(t+1)$ to the previous
error $x(t)$, the uncertainty $w(t)$ (bumps or trail changes),
and the control action $u(t)$. The control action $u(t)$ is
generated using the observed errors and uncertainty as
follows:
$$u(t) = K\big(x(0 : t - T_u),\; w(0 : t - T_u + T_a - 1),\; u(0 : t - 1)\big). \tag{2}$$
Here, $K$ is a function that defines the controller, which
uses sensing components (i.e. eyes, muscle sensors, and
the inner ear), communication components (i.e. nerves),
computing components (i.e. the cortex in the central
nervous system), and actuation components (i.e. eye
and arm muscles). The term $T_u = T_s + T_i$ captures the
delay in control, which can be further decomposed into
the nerve signaling delay $T_s$ and other internal delays $T_i$
in the feedback loop. The advanced warning $T_a$ models
the fact that the rider can see the upcoming trail $T_a$ time
steps in advance. Its specific value is determined by the
rider’s speed and the trail’s features, and its effect can
be observed in the change in muscle tone that precedes an
expected perturbation (12, 13). The rate constraint, $R$,
accounts for the limitations in nerve signaling.
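As a minimal illustration of the error dynamics in Eq. 1 and the information pattern in Eq. 2, the sketch below simulates the tracking error under a constant unit disturbance. It is not the controller analyzed in this paper: the function name `simulate_worst_case`, the rule of cancelling the newest disturbance the controller knows about, and the unlimited signaling rate are all assumptions made for illustration. Writing the net delay as $T = T_u - T_a$, the newest disturbance available at time $t$ is $w(t - T - 1)$, so the error settles at $\max(0, T + 1)$ in this toy setting.

```python
def simulate_worst_case(total_delay, steps=50):
    """Simulate x(t+1) = x(t) + w(t) + u(t) under a constant unit disturbance.

    Assumption (illustrative, not the paper's controller): the controller
    cancels the newest disturbance it has access to, with no rate limit.
    total_delay is the net delay T = T_u - T_a (negative means net warning).
    """
    # Age (in steps) of the newest disturbance the controller can cancel.
    lag = max(0, total_delay + 1)
    w = [1.0] * steps  # bounded disturbance with |w(t)| <= 1
    x = 0.0
    peak = 0.0
    for t in range(steps):
        known = t - lag
        u = -w[known] if known >= 0 else 0.0  # nothing to cancel yet
        x = x + w[t] + u
        peak = max(peak, abs(x))
    return peak


# Example: a net delay of 2 steps versus a net warning of 5 steps.
print(simulate_worst_case(2), simulate_worst_case(-5))
```

With enough advanced warning the disturbance is cancelled as it arrives, whereas each extra step of net delay lets one more disturbance accumulate before it can be countered.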
Fig. 2. Diagrams of sensorimotor control for eye tracking movements and
mountain bike trail tracking. (A) Diagram of two major feedback loops involved
in the eye movement: visual cortex feedback and vestibulo-ocular reflex (VOR) feedback.
Objects are tracked using the slow visual cortex feedback, while head motion
is compensated for by the much faster VOR feedback. (B) Diagram of the basic
sensorimotor control model for our experiment that simulates riding a mountain bike.
Each box is designated by its function: sensing and communication (e.g. vision,
muscle spindle sensor, vestibulo-ocular reflex), actuation (muscle), and computation
(high-layer planning and tracking and low-layer reflexes and reactions). Depending
on the hardware details, they may be quantized (discrete valued), have time delays,
experience saturation, and be subject to noise. The trail ahead can be seen in advance,
but the bumps and other disturbances are unanticipated. The line thickness
indicates the relative speed of the pathway (thicker lines for faster pathways).
Hardware SATs.
There exist trade-offs between neural
signaling speed and accuracy arising from the fixed spatial
and metabolic cost to build and maintain axons. Specifically,
nerves with the same cross-sectional area can either
contain many small axons or a few large axons (Fig. 1),
which inevitably leads to SATs in neural signaling (1–3).
The specific forms of SATs depend on how the nerves
encode information (e.g. spike-based, rate-based, and
spike-interval-based encoding). Our theory does not require
any specific encoding method or the resulting form of
hardware SAT, so for simplicity, we assume
the spike-based encoding scheme in our analysis in the
main text. In spike-based encoding, information is
encoded in the presence or absence of a spike within
each time interval, analogous to digital packet-switching
networks (14, 15). This encoding requires spikes to be
generated with sufficient timing accuracy, which has been
experimentally verified in many types of neurons (16, 17).
To model the complex size distributions in axon bundles
in a nerve, we classify axons into $m$ distinct types, where
each type corresponds to axons of identical size. We index
each type by $k \in \{1, 2, \cdots, m\}$ and model type $k$ axons
as a communication channel with signaling delay $T_k$ and
signaling rate $R_k$ (i.e. the total amount of information
in bits that can be transmitted per unit time). It can be
shown that
$$R_k = \lambda_k T_k, \qquad \sum_{k=1}^{m} \lambda_k = \lambda, \tag{3}$$
where $\lambda_k \ge 0$ and $\lambda > 0$ are constants associated with
the total resource (i.e. space available to build the axons)
used by type $k$ axons and all axons, respectively. See the
supplementary information for more detail. A special case
of Eq. 3 is that all axons have the same size. In such a case,
we can model the axon bundles as a single communication
channel with signaling delay $T_s = T_1 = T_2 = \cdots = T_m$
and signaling rate $R = \sum_{k=1}^{m} R_k$ satisfying
$$R = \lambda T_s. \tag{4}$$
For other types of encoding, we refer interested readers
to the supplementary information.
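The hardware SAT in Eq. 3 and its uniform special case in Eq. 4 can be checked numerically with a short sketch. The function names `type_rates` and `total_rate` and the example resource allocation are hypothetical, chosen only to illustrate the formulas.

```python
def type_rates(lambdas, delays):
    """Return R_k = lambda_k * T_k for each axon type k (Eq. 3)."""
    if len(lambdas) != len(delays):
        raise ValueError("need one delay per axon type")
    if any(lam < 0 for lam in lambdas):
        raise ValueError("resource shares lambda_k must be non-negative")
    return [lam * t for lam, t in zip(lambdas, delays)]


def total_rate(lambdas, delays):
    """Total signaling rate of the nerve: the sum of the per-type rates."""
    return sum(type_rates(lambdas, delays))


# Hypothetical allocation: three axon types sharing a total resource of 0.1.
shares = [0.02, 0.03, 0.05]

# Uniform case (all delays equal): the total reduces to lambda * T_s (Eq. 4).
print(total_rate(shares, [10, 10, 10]))

# Diverse case: a fast low-rate type alongside slower high-rate types.
print(type_rates(shares, [2, 10, 40]))
```

In the diverse case the same resource budget yields one fast, low-rate channel and slower, higher-rate channels, which is the kind of heterogeneity shown in Fig. 1.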
System SATs imposed by hardware SATs.
The hardware
SATs impose the SATs in sensorimotor control. To study
their impact, we consider the motivating example of riding
a mountain bike, which is simulated by our driving game
experiments (see Materials and Methods). The error
between the actual and desired positions evolves according
to Eq. 1. The feedback loop in Eq. 2 can transmit $R$ bits
of information per unit time with delay
$T := T_u - T_a = T_s + T_i - T_a$
from sensing (of the disturbance) to actuation. We
characterize the worst-case error and the average-case
error in sensorimotor control. The worst-case error is more
applicable to risk-averse sensorimotor behaviors, such as
riding a mountain bike on a cliff/trail, in which staying
on the cliff is necessary for survival even in the presence
of the worst possible uncertainty (18–21). The average-case
error is more applicable to risk-neutral sensorimotor
behaviors, such as riding a mountain bike across a broad
field, in which there is no fatal risk of leaving the field (1).
The worst-case error $\max_{\|w\|_\infty \le 1} \|x\|_\infty$ is
lower-bounded by
$$\max(0, T + 1) + \left(2^{R} - 1\right)^{-1}. \tag{5}$$
In the average case, the mean squared error
$\lim_{n \to \infty} (1/n) \sum_{t=1}^{n} \mathbb{E}[x(t)^2]$
is lower-bounded by
$$\max(0, T + 1) + \left(2^{2R} - 1\right)^{-1}. \tag{6}$$
The proofs of Eq. 5 and Eq. 6 and more general results are
given in the supplementary information. The performance
bounds in both settings (Eq. 5–6) are qualitatively similar:
both bounds decompose into two terms. The shared first
term, $\max(0, T + 1)$ (denoted the delay error), is a
function only of the total delay and thus can be considered
the cost due to delay. The second terms, $(2^{R} - 1)^{-1}$ and
$(2^{2R} - 1)^{-1}$ (denoted the rate error), are functions only
of the signaling rate and can be considered the cost
due to rate limits.
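The two bounds can be evaluated directly. The following is a small sketch of Eq. 5 and Eq. 6 with the delay error and rate error kept as separate terms; the function names are hypothetical, and the code only evaluates the stated lower bounds rather than simulating any controller.

```python
def delay_error(total_delay):
    """Delay term shared by Eq. 5 and Eq. 6: max(0, T + 1)."""
    return max(0, total_delay + 1)


def rate_error_worst_case(rate):
    """Rate term of the worst-case bound (Eq. 5): (2^R - 1)^(-1)."""
    if rate <= 0:
        raise ValueError("the signaling rate R must be positive")
    return 1.0 / (2.0 ** rate - 1.0)


def rate_error_mean_square(rate):
    """Rate term of the mean-squared-error bound (Eq. 6): (2^(2R) - 1)^(-1)."""
    if rate <= 0:
        raise ValueError("the signaling rate R must be positive")
    return 1.0 / (2.0 ** (2.0 * rate) - 1.0)


def worst_case_bound(total_delay, rate):
    return delay_error(total_delay) + rate_error_worst_case(rate)


def mean_square_bound(total_delay, rate):
    return delay_error(total_delay) + rate_error_mean_square(rate)


# Example: total delay T = 2 time steps and rate R = 1 bit per time step.
print(worst_case_bound(2, 1), mean_square_bound(2, 1))
```

Because the delay term is linear in $T$ while the rate term decays exponentially in $R$, a few extra bits remove most of the rate error, whereas every extra step of delay costs a full unit of error.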
Since the validity of our framework does not require
the hardware SATs to have any specific form, we next
use the SAT in spike-based encoding to demonstrate how
the SATs at the component level impact the SATs at
the system level. By combining the hardware SATs in
Eq. 4 and the system SATs in Eq. 5 and Eq. 6, we can
predict the influence of the neural signaling constraints
on sensorimotor control, shown in Fig. 3A. Increasing
the delay in the feedback loop increased the delay errors,
while increasing rate led to a large decrease in the rate
errors. The errors for the trials with both added delay
and added quantization was approximately the sum of the
errors for the trials with the delay and the quantization
added separately, as predicted by the model. Thus, the
delays can cause small disturbances to escalate into larger
errors (
22
), and increasing the data rate dramatically
reduces errors in the context of control.
Furthermore, the minimum reaching time or the minimum
error is achieved when the deleterious effects of the
nerve signaling delay and inaccuracy are both kept
within a moderate range. Conversely, nerve compositions
that maximize either the speed or the accuracy of
nerve signaling result in suboptimal performance. This
observation suggests that neural design principles and the
capacity of nerves for information transfer should
be studied together with sensorimotor control.
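To make the sweet spot concrete, the sketch below combines the hardware SAT in Eq. 4 with the worst-case bound in Eq. 5 and searches over integer signaling delays. The resource level $\lambda = 0.1$, the zero net delay, the search range, and the function names are illustrative assumptions, not values taken from the experiments.

```python
def error_terms(signaling_delay, resource, net_delay):
    """Return (delay error, rate error, total error) for one nerve composition.

    resource is lambda in Eq. 4; net_delay is T_i - T_a.
    """
    rate = resource * signaling_delay               # Eq. 4: R = lambda * T_s
    total_delay = signaling_delay + net_delay       # T = T_s + T_i - T_a
    delay_err = max(0, total_delay + 1)             # first term of Eq. 5
    rate_err = 1.0 / (2.0 ** rate - 1.0)            # second term of Eq. 5
    return delay_err, rate_err, delay_err + rate_err


def sweet_spot(resource, net_delay, max_delay=200):
    """Integer signaling delay in [1, max_delay] that minimizes the total error."""
    return min(
        range(1, max_delay + 1),
        key=lambda ts: error_terms(ts, resource, net_delay)[2],
    )


# Illustrative values (assumptions): lambda = 0.1 and zero net delay.
best = sweet_spot(0.1, 0)
for ts in (1, best, 50):
    d, r, total = error_terms(ts, 0.1, 0)
    print(f"T_s={ts:3d}  delay error={d:6.2f}  rate error={r:6.2f}  total={total:6.2f}")
```

Under these assumptions the minimizer is an intermediate signaling delay: the fastest composition suffers a large rate error, and the most accurate one suffers a large delay error.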
Experimental test of model predictions.
The predictions
of the model were confirmed experimentally with driving
game experiments (see Materials and Methods for more
details). The subjects played the driving game under
three different conditions: with added delay, with added
quantization, or with added delay and quantization. Their
trajectories were measured, and the errors were analyzed
and are shown in Fig. 3B.
As in the theoretical prediction, when constrained by the
hardware SAT in Eq. 4, the optimal performance is achieved
at a sweet spot with intermediate levels of added delay
and quantization rate. Conversely, optimizing either the
added delay or the rate in isolation leads to suboptimal
performance.

Table 1. Parameters in the basic model.
Parameter                    Description
$x(t)$                       Error at time step $t$
$K$                          Controller
$T_s \ge 0$                  Signaling delay
$T_a \ge 0$                  Advanced warning
$T_i \ge 0$                  Internal delay
$T = T_s + T_i - T_a$        Total delay
$R$                          Signaling rate (bits per unit time)
$\lambda$                    Cost associated with the resource use
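The added delay and added quantization conditions can be pictured with the following sketch, which degrades a stream of control inputs before they reach the game. It is only an assumed implementation: the uniform mid-point quantizer, the normalized input range $[-1, 1]$, and the names `quantize` and `degrade` are hypothetical and are not taken from Materials and Methods.

```python
from collections import deque


def quantize(u, rate_bits, u_max=1.0):
    """Map u to the mid-point of one of 2**rate_bits equal cells on [-u_max, u_max]."""
    if rate_bits < 1:
        raise ValueError("rate_bits must be a positive integer")
    levels = 2 ** rate_bits
    step = 2.0 * u_max / levels
    clipped = min(max(u, -u_max), u_max)
    index = min(int((clipped + u_max) / step), levels - 1)
    return -u_max + (index + 0.5) * step


def degrade(inputs, delay_steps, rate_bits):
    """Quantize each input to rate_bits bits and delay it by delay_steps samples."""
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    buffer = deque([0.0] * delay_steps)  # the control is zero until the first sample arrives
    outputs = []
    for u in inputs:
        buffer.append(quantize(u, rate_bits))
        outputs.append(buffer.popleft())
    return outputs


# Example: a one-step delay combined with a 1-bit quantizer.
print(degrade([0.3, -0.3, 0.9], delay_steps=1, rate_bits=1))
```

With this quantizer the per-sample error is at most half a cell width, so each additional bit halves the worst-case quantization error, while the delay buffer shifts every command by a fixed number of samples.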
Layered control systems.
In this section we will examine
two biological control systems that combine slow advanced
planning with fast reflexive reaction.
Visual tracking of a moving object.
The above results can be
used to study the effectiveness of the layered control
architecture used in the oculomotor system. Visual tracking
of a moving object is done through two major feedback
loops: a VOR feedback loop that compensates for head
motion and a visual feedback loop through the visual
cortex that tracks a moving object (Fig. 2A). From a
control perspective, an important difference between the two
loops is their levels of advanced warning. VOR feedback
reacts after the head moves, whereas the visual environment is
highly correlated over time and is thus predictable.
We refer to the regime of VOR feedback as delayed reaction,
in which the net delay $T_i - T_a$ is positive and the
uncertainty $w(t)$ becomes accessible to the controller after
$w(t)$ affects the error dynamics. We refer to the regime of
visual feedback as advanced planning, in which $T_a - T_i \ge 0$
and the uncertainty $w(t)$ becomes accessible to the
controller before $w(t)$ affects the error dynamics. These two
regimes are qualitatively different in their optimal choice
of $T_s$ and $R$ for achieving optimal robust performance, as
shown in Fig. 4A and summarized below.
(i) Delayed reaction: When the net delay $T_i - T_a > 0$
is large, the total error can be much larger than the size
of the uncertainty $w$ and goes to infinity as $T_i \to \infty$.
This large error amplification is consistent with the
all-too-familiar observation that even a small bump on a trail
can cause a cyclist to lose control of the bike and crash.
As $T_i$ increases, the delay error increasingly dominates the
total error. Since the delay error contributes most of
the total error, the total error is minimized when $T_s$ is kept
small at the cost of a small $R$. Therefore, a feedback
loop in this regime performs better when it is built from a
few large axons. Interestingly, the flat optimal delay/rate
within the delayed reaction regime suggests that optimal
performance can be achieved using one type of nerve
composition for a broad range of advanced warnings. This
property is beneficial because the net delay (which depends on
the advanced warning) differs across sensorimotor
tasks.

Fig. 3. Theoretical and experimental system SATs in sensorimotor control. (A)
Theoretical SATs in the tracking (driving) task. The delay error (blue), rate error (red),
and the total error (black) in Eq. 5 are shown with varying hardware SAT $T = (R - 5)/20$.
(B) Empirical SATs in the tracking (driving) task averaged over 4 subjects.
The error under added delay (blue), the error under added quantization (red), and
the error under added delay and quantization (black) are shown. In the last case,
the added delay $T$ and quantization rate $R$ satisfy $T = (R - 5)/20$. The shaded
area indicates the standard error across subjects.
(ii) Advanced planning: When the net warning $T_a - T_i > 0$
is large, the total error approaches zero as $R \to \infty$.
This large disturbance attenuation is consistent with
the observation that a cyclist can avoid obstacles given
enough time to plan a response, e.g. route a path around
them or brace against their impact. Given sufficiently
large advanced warning $T_a$, the rate error increasingly
dominates the total error because the growth in $T_s$ incurs
no additional delay error. Since the rate error contributes
most of the total error, the total error is minimized
when the signaling rate $R$ is set to be large at the expense
of a large signaling delay $T_s$. Therefore, a feedback loop in
this regime performs better when it is built from many
small axons.
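The contrast between the two regimes can be traced to the worst-case bound in Eq. 5 combined with the hardware SAT in Eq. 4. The short derivation below is a sketch under one added assumption that is not part of the model above: the signaling delay $T_s$ is treated as a continuous variable so that the total error $E(T_s)$ can be differentiated.

```latex
% Assumption: T_s is continuous; R = \lambda T_s (Eq. 4); E is the bound in Eq. 5.
% (i) Delayed reaction, T_i - T_a \ge 0, so that T + 1 > 0 for every T_s \ge 0:
\begin{align*}
E(T_s) &= T_s + (T_i - T_a) + 1 + \left(2^{\lambda T_s} - 1\right)^{-1}, \\
\frac{dE}{dT_s} &= 1 - \frac{\lambda \ln 2 \cdot 2^{\lambda T_s}}{\left(2^{\lambda T_s} - 1\right)^{2}} = 0
\quad\Longleftrightarrow\quad
\left(2^{\lambda T_s} - 1\right)^{2} = \lambda \ln 2 \cdot 2^{\lambda T_s}.
\end{align*}
% The stationarity condition contains no T_i - T_a term, so the minimizing T_s
% (and hence R) is the same for every net delay in this regime.

% (ii) Advanced planning, T_a - T_i > 0: the delay error vanishes while T \le -1,
\begin{align*}
E(T_s) &= \left(2^{\lambda T_s} - 1\right)^{-1}
\qquad \text{for } T_s \le T_a - T_i - 1,
\end{align*}
% which is strictly decreasing in T_s, so the minimizing T_s is at least
% T_a - T_i - 1 and grows with the net warning.
```

The first case is consistent with the flat optimal delay and rate described in (i), and the second with the preference for many small, slow, high-rate axons described in (ii).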
This prediction is qualitatively consistent with the
anatomy of the human oculomotor system (Fig. 1). The
vestibular nerve, which transmits motion information from
the inner ear to the vestibular nucleus in the brainstem,
has 20,000 axons with mean diameter 3 μm and coefficient
of variation 0.4. In contrast, the optic nerve
carrying visual signals from the retina has approximately
1 million axons with mean diameter 0.6 μm and coefficient
of variation 0.5; these axons are significantly smaller but more
numerous and have greater variability (1). As a consequence,
feedback from visual processing is slower (approximately
100 ms delay) but more accurate than the VOR feedback
(approximately 10 ms delay) (23).
This diversity in control performance can also be observed
in two simple tests: moving one’s hand left and
right across the visual field with increasing frequency while
holding the head still (Test 1); and shaking the head back
and forth (in a ’no’ pattern) at increasing frequency while
holding the hand still (Test 2). In Test 1, the hand starts
to blur at around 1–2 Hz due to delays in tracking.
In Test 2, blurring due to the inability to compensate
for fast head motion occurs at a much higher frequency.
This difference illustrates that the visual cortex feedback
responsible for Test 1 (object tracking) is slower, and
therefore fails at lower frequencies, than the VOR feedback
responsible for Test 2 (head motion compensation). However,
though slower, the visual cortex feedback is more accurate than
the VOR feedback. This is illustrated by the fact that
standing on one leg with closed eyes is more difficult than
with eyes open.
Riding a mountain bike.
The study of the oculomotor system
reveals that nerves with appropriate diversity allow the
visual system to react to head motion quickly and to
collect accurate visual information. This kind of DESS is
ubiquitous in sensorimotor control. For example, consider
the DESSs in the control architectures used for riding a
mountain bike. The task of riding a mountain bike was
simulated using the driving game experiments. The control
system associated with the task is shown in Fig. 2B.
Fig. 4. Delayed reaction vs. advanced planning. (A) Comparison between the
regime of advanced warning and that of delayed reaction. The top figure shows the
minimum total error in Eq. 5 (the delay error plus the rate error) given a fixed resource
level $\lambda$. The bottom figure shows the optimal signaling delay $T_s$, total delay
$T = T_s + T_i - T_a$, and rate $R = \lambda T_s$ for varying net delay $T_i - T_a$. In both figures, the
horizontal axes denote the net delay $T_i - T_a \ge 0$ or the net warning $T_a - T_i \ge 0$.
(B) The benefit of diversity between planning and reflex layers. The top figure shows
the minimum error Eq. 99 for the case in which the high-layer and low-layer controllers
are allowed to have diverse signaling delays and rates, and for the case in which they are not (i.e.
$R_\ell = R_h$ and $T_\ell = T_h$). We term the former the diverse case and the latter the uniform
case. The high-layer controller can better exploit the advanced warning to minimize
errors in the diverse case than in the uniform case. The bottom figure shows the
resulting optimal delays and rates for the diverse case. System parameters are set
to be $R_\ell = 0.1\,T_s$, $R_h = 0.1\,T_h$, and $T_i = 10$.