Zhang et al. eLife 2023;12:RP84141. DOI: https://doi.org/10.7554/eLife.84141
Endotaxis: A neuromorphic algorithm for mapping, goal-learning, navigation, and patrolling
Tony Zhang¹, Matthew Rosenberg¹,², Zeyu Jing¹, Pietro Perona³, Markus Meister¹*

¹Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, United States; ²Center for the Physics of Biological Function, Princeton University, Princeton, United States; ³Division of Engineering and Applied Science, California Institute of Technology, Pasadena, United States
Abstract
An animal entering a new environment typically faces three challenges: explore the space for resources, memorize their locations, and navigate towards those targets as needed. Here we propose a neural algorithm that can solve all these problems and operates reliably in diverse and complex environments. At its core, the mechanism makes use of a behavioral module common to all motile animals, namely the ability to follow an odor to its source. We show how the brain can learn to generate internal “virtual odors” that guide the animal to any location of interest. This endotaxis algorithm can be implemented with a simple 3-layer neural circuit using only biologically realistic structures and learning rules. Several neural components of this scheme are found in brains from insects to humans. Nature may have evolved a general mechanism for search and navigation on the ancient backbone of chemotaxis.
eLife assessment
This valuable work proposes a framework inspired by chemotaxis for understanding how the brain might implement behaviors related to navigating toward a goal. The evidence supporting the conceptual claim is convincing. The article proposes a hypothesis that would be of interest to the broad systems neuroscience community, although it was noted the relationship to existing similar hypotheses could be clarified.
Introduction
Animals navigate their environment to look for resources – such as shelter, food, or a mate – and exploit such resources once they are found. Efficient navigation requires knowing the structure of the environment: which locations are connected to which others (Tolman, 1948). One would like to understand how the brain acquires that knowledge, what neural representation it adopts for the resulting map, how it tags significant locations in that map, and how that knowledge gets read out for decision-making during navigation.
Experimental work on these topics has mostly focused on simple environments – such as an open arena (Wilson and McNaughton, 1993), a pond (Morris et al., 1982), or a desert (Müller and Wehner, 1988) – and much has been learned about neural signals in diverse brain areas under these conditions (Sosa and Giocomo, 2021; Collett and Collett, 2002). However, many natural environments are highly structured, such as a system of burrows, or of intersecting paths through the underbrush. Similarly, for many cognitive tasks, a sequence of simple actions can give rise to complex solutions.
RESEARCH ARTICLE

*For correspondence: meister4@mac.com
Competing interest: See page 27
Funding: See page 27
Preprint posted: 10 October 2022
Sent for Review: 09 November 2022
Reviewed preprint posted: 24 March 2023
Reviewed preprint revised: 15 November 2023
Version of Record published: 29 February 2024
Reviewing Editor: Srdjan Ostojic, École Normale Supérieure - PSL, France

Copyright Zhang et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Research article: Computational and Systems Biology | Neuroscience
One algorithm for finding a valuable resource is common to all animals: chemotaxis. Every motile species has a way to track odors through the environment, either to find the source of the odor or to avoid it (Baker et al., 2018). This ability is central to finding food, connecting with a mate, and avoiding predators. It is believed that brains originally evolved to organize the motor response in pursuit of chemical stimuli. Indeed, some of the oldest regions of the mammalian brain, including the hippocampus, seem organized around an axis that processes smells (Jacobs, 2012; Aboitiz and Montiel, 2015).
The specifics of chemotaxis, namely the methods for finding an odor and tracking it, vary by species, but the toolkit always includes a search strategy based on trial-and-error: try various actions that you have available, then settle on the one that makes the odor stronger (Baker et al., 2018). For example, a rodent will weave its head side-to-side, sampling the local odor gradient, then move in the direction where the smell is stronger. Worms and maggots follow the same strategy. Dogs track a ground-borne odor trail by casting across it side-to-side. Flying insects perform similar casting flights. Bacteria randomly change direction every now and then, and continue straight as long as the odor improves (Berg, 1988). We propose that this universal behavioral module for chemotaxis can be harnessed to solve general problems of search and navigation in a complex environment, even when tell-tale odors are not available.
For concreteness, consider a mouse exploring a labyrinth of tunnels (Figure 1A). The maze may contain a source of food that emits an odor (Figure 1A1). That odor will be strongest at the source and decline with distance along the tunnels of the maze. The mouse can navigate to the food location by simply following the odor gradient uphill. Suppose that the mouse discovers some other interesting locations that do not emit a smell, like a source of water, or the exit from the labyrinth (Figures 1A2–3). It would be convenient if the mouse could tag such a location with an odorous material, so it may be found easily on future occasions. Ideally, the mouse would carry with it multiple such odor tags, so it can mark different targets each with its specific recognizable odor.

Here we show that such tagging does not need to be physical. Instead, we propose a mechanism by which the mouse’s brain may compute a ‘virtual odor’ signal that declines with distance from a chosen target. That neural signal can be made available to the chemotaxis module as though it were a real odor, enabling navigation up the gradient toward the target. Because this goal signal is computed in the brain rather than sensed externally, we call this hypothetical process endotaxis.
The developments reported here were inspired by a recent experimental study with mice navigating a complex labyrinth (Rosenberg et al., 2021) that includes 63 three-way junctions. Among other things, we observed that mice could learn the location of a resource in the labyrinth after encountering it just once, and perfect a direct route to that target location after ∼10 encounters. Furthermore, they could navigate back out of the labyrinth using a direct route they had not traveled before, even on the first attempt. Finally, the animals spent most of their waking time patrolling the labyrinth, even long after they had perfected the routes to rewarding locations. These patrols covered the environment efficiently, avoiding repeat visits to the same location. All this happened within a few hours of the animal’s first encounter with the labyrinth. Our modeling efforts here are aimed at explaining these remarkable phenomena of rapid spatial learning in a new environment: one-shot learning of a goal location, zero-shot learning of a return route, and efficient patrolling of a complex maze. In particular we want to do so with a biologically plausible mechanism that could be built out of neurons.
Results
A neural circuit to implement endotaxis
Figure 1B presents a neural circuit model that implements three goals: mapping the connectivity of the environment; tagging of goal locations with a virtual odor; and navigation toward those goals. The model includes four types of neurons: resource cells, point cells, map cells, and goal cells.
Resource cells
These are sensory neurons that fire when the animal encounters an interesting resource, for example,
water or food, that may form a target for future navigation. Each resource cell is selective for a specific
kind of stimulus. The circuitry that produces these responses is not part of the model.
Point cells
This layer of cells represents the animal’s location. (We avoid the term ‘place cell’ here because [1]
that term has a technical meaning in the rodent hippocampus, whereas the arguments here extend
to species that do not have a hippocampus; and [2] all the cells in this network have a place field, but
it is smallest for the point cells.) Each neuron in this population has a small response field within the
environment. The neuron fires when the animal enters that response field. We assume that these point
cells exist from the outset as soon as the animal enters the environment. Each cell’s response field is
defined by some conjunction of external and internal sensory signals at that location.
Figure 1. A mechanism for endotaxis. (A) A constrained environment of tunnels linked by intersections, with special locations offering food, water, and the exit. (1) A real odor emitted by the food source decreases with distance (shading). (2) A virtual odor tagged to the water source. (3) A virtual odor tagged to the exit. (4) Abstract representation of this environment by a graph of nodes (intersections) and edges (tunnels). (B) A neural circuit to implement endotaxis. Open circles: four populations of neurons that represent ‘resource,’ ‘point,’ ‘map,’ and ‘goal.’ Arrows: signal flow. Solid circles: synapses. Point cells have small receptive fields localized in the environment and excite map cells. Map cells excite each other (green synapses) and also excite goal cells (blue synapses). Resource cells signal the presence of a resource, for example, cheese, water, or the exit. Map synapses and goal synapses are modified by activity-dependent plasticity. A ‘mode’ switch selects among various goal signals depending on the animal’s need. They may be virtual odors (water, exit) or real odors (cheese). Another goal cell (clock) may report how recently the agent has visited a location. The output of the mode switch gets fed to the chemotaxis module for gradient ascent. Mathematical symbols used in the text: u_i is the output of a point cell at location i, x_i is the input to the corresponding map cell, v_i is the output of that map cell, M is the matrix of synaptic weights among map cells, G are the synaptic weights from the map cells onto goal cells, and r_k is the output of goal cell k.
Map cells
This layer of neurons learns the structure of the environment, namely how the various locations are connected in space. The map cells get excitatory input from point cells in a one-to-one fashion. These input synapses are static. The map cells also excite each other with all-to-all connections. These recurrent synapses are modifiable according to a local plasticity rule. After learning, they represent the topology of the environment.
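As a minimal sketch of this learning step (our own illustration, with a hypothetical binary update standing in for the graded plasticity rule): stepping between two adjacent locations makes the corresponding point cells fire in succession, which potentiates the recurrent synapse between their map cells, so a walk that covers the maze leaves the map matrix mirroring the adjacency structure.

```python
import numpy as np

# Toy maze: 5 intersections, tunnels as undirected edges (assumed example).
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
A = np.zeros((5, 5))                     # ground-truth adjacency matrix
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# An exploratory walk that happens to traverse every tunnel at least once.
walk = [0, 1, 2, 1, 3, 4]

M = np.zeros((5, 5))                     # recurrent map synapses, initially weak
for j, i in zip(walk, walk[1:]):
    # The two point cells fire in rapid succession, so the Hebbian rule
    # potentiates the recurrent synapse between their map cells (both ways).
    M[i, j] = M[j, i] = 1.0

assert np.array_equal(M, A)              # the map now mirrors the graph
```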
Goal cells
Each goal cell serves to mark the locations of a special resource in the map of the environment. The
goal cell receives excitatory input from a resource cell, which gets activated whenever that resource is
present. It also receives excitatory synapses from map cells. Such a synapse is strengthened when the
presynaptic map cell is active at the same time as the resource cell.
After the map and goal synapses have been learned, each goal cell carries a virtual odor signal
for its assigned resource. The signal increases systematically as the animal moves closer to a location
with that resource. A mode switch selects one among many possible virtual odors (or real odors) to
be routed to the chemotaxis module for odor tracking. (The mode switch effectively determines the
animal’s behavioral policy. In this report, we do not consider how or why the animal chooses one mode
or another.) The animal then pursues its chemotaxis search strategy to maximize that odor, which leads
it to the selected tagged location.
Why does the circuit work?
The key insight is that the output of the goal cell declines systematically with the distance of the animal from the target location. This relationship holds even if the environment is constrained with a complex connectivity graph (Figure 1A4). Here we explain how this comes about, with mathematical details to follow.

In a first phase, the animal explores the environment while the circuit builds a map. When the animal moves from one location to an adjacent one, those two point cells fire in rapid succession. That leads to a Hebbian strengthening of the excitatory synapses between the two corresponding map
Figure 2. The phases of endotaxis during exploration, goal-tagging, and navigation. A portion of the circuit in Figure 1 is shown, including a single goal cell that responds to the water resource. Bottom shows a graph of the environment, with nodes linked by edges, and the agent’s current location shaded in orange. Each node has a point cell that reports the presence of the agent to a corresponding map cell. Map cells are recurrently connected (green) and feed convergent signals onto the goal cell. (A) Initially the recurrent synapses are weak (empty circles). (B) During exploration, the agent moves between two adjacent nodes on the graph, and that strengthens (arrowhead) the connection between their corresponding map cells (filled circles). (C) After exploration, the map synapses reflect the connectivity of the graph. Now the map cells have an extended profile of activity (darker = more active), centered on the agent’s current location y and decreasing from there with distance on the graph. (D) When the agent reaches the water source z, the goal cell gets activated by the sensation of water, and this triggers plasticity (arrowhead) at its input synapses. Thus, the state of the map at the water location gets stored in the goal synapses. This event represents tagging of the water location. (E) During navigation, as the agent visits different nodes, the map state gets filtered through the goal synapses to excite the goal cell. This produces a signal in the goal cell that declines with the agent’s distance from the water location.
Figure 3. Theory of the goal signal. Dependence of the goal signal on graph distance, and the consequences for endotaxis navigation. (A) The graph representing a binary tree labyrinth (Rosenberg et al., 2021) serves for illustration. Suppose the endotaxis model has acquired the adjacency matrix perfectly: M = A. We compute the goal signal E_xy between any two nodes on the graph and compare the results at different values of the map gain γ. (B) Dependence of the goal signal E_xy on the graph distance D_xy between the two nodes. Mean ± SD, error bars often smaller than markers. The maximal distance on this graph is 12. Note logarithmic vertical axis. The signal decays exponentially over many log units. At high γ, the decay distance is greater. (C) A detailed look at the goal signal; each point is for a pair of nodes (x, y). For low γ, the decay with distance is strictly monotonic. At high γ, there is overlap between the values at different distances. As γ exceeds the critical value γ_c = 0.38, the distance dependence breaks down. (D) Using the goal signal for navigation. For every pair of start and end nodes, we navigate the route by following the goal signal and compare the distance traveled to the shortest graph distance. For all routes with the same graph distance, we plot the median navigated distance with 10 and 90% quantiles. Variable gain at a constant noise value of ε = 0.01. (E) As in panel (D) but varying the noise at a constant gain of γ = 0.34.
cells (Figure 2A and B). In this way, the recurrent network of map cells learns the connectivity of the graph that describes the environment. To a first approximation, the matrix of synaptic connections among the map cells will converge to the correlation matrix of their inputs (Dayan and Abbott, 2001; Galtier et al., 2012), which in turn reflects the adjacency matrix of the graph (Equation 1). Now the brain can use this adjacency information to find the shortest path to a target.

After this map learning, the output of the map network is a hump of activity, centered on the current location y of the animal and declining with distance along the various paths in the graph of the environment (Figure 2C). If the animal moves to a different location z, the map output will change to another hump of activity, now centered on z (Figure 2D). The overlap of the two hump-shaped profiles will be large if nodes y and z are close on the graph, and small if they are distant. Fundamentally the endotaxis network computes that overlap.
Suppose the animal visits z and finds water there. Then the water resource cell fires, triggering synaptic learning in the goal synapses. That stores the current profile of map activity v(z) in the synapses G_kj onto the goal cell k that responds to water (Figure 2D, Equation 9). When the animal subsequently moves to a different location y, the goal cell k receives the current map output v(y) filtered through the previously stored synaptic template v(z) (Figure 2E). This is the desired measure of overlap (Equation 10). Under suitable conditions, this goal signal declines monotonically with the shortest graph distance between y and z, as we will demonstrate both analytically and in simulations (sections ‘Theory of endotaxis’ and ‘Acquisition of map and targets during exploration’).
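The two storage events and the read-out can be illustrated numerically. The following Python sketch is our own illustration, not the paper’s simulation: it assumes a small path graph, a map matrix that has already converged to the adjacency matrix, and goal synapses that store the map activity at the water node, then checks that the resulting goal signal grows as the agent approaches the goal.

```python
import numpy as np

# Path graph with 6 nodes: 0 - 1 - 2 - 3 - 4 - 5 (assumed toy environment)
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

gamma = 0.2          # map gain, well below the critical value for this graph
M = A                # assume map learning has converged to the adjacency matrix

def map_output(node):
    """Steady-state map activity with the agent at `node`: a hump of activity."""
    u = np.eye(n)[node]                          # one-hot point-cell input
    return np.linalg.solve(np.eye(n) / gamma - M, u)

water = 5
g = map_output(water)                            # goal synapses store v(water)

# Goal signal = overlap of the current map activity with the stored template.
signal = [g @ map_output(node) for node in range(n)]
assert all(signal[i] < signal[i + 1] for i in range(n - 1))  # rises toward water
```

The monotonic rise is exactly the gradient that the chemotaxis module can ascend.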
Theory of endotaxis
Here we formalize the processes of Figure 2 in a concrete mathematical model. The model is simple enough to allow some exact predictions for its behavior. The present section develops an analytical understanding of endotaxis that will help guide the numerical simulations in subsequent parts.
The environment is modeled as a graph consisting of n nodes, with adjacency matrix

$$A_{ij} = \begin{cases} 1, & \text{if node } i \text{ can be reached from node } j \text{ in one step} \\ 0, & \text{otherwise, including the } i = j \text{ case} \end{cases} \quad (1)$$

We suppose the graph is undirected, meaning that every link can be traversed in both directions, $A_{ij} = A_{ji}$.
Movements of the agent are modeled as a sequence of steps along that graph. During exploration, the agent performs a walk that tries to cover the entire environment. In the process, it learns the adjacency matrix A. During navigation, the agent uses that knowledge to travel to a known target.

For an agent navigating a graph, it is very useful to know the shortest graph distance between any two nodes

$$D_{ij} = \text{minimum number of steps needed to reach node } i \text{ from node } j \quad (2)$$
Given this information, one can navigate the shortest route from x to y: for each of the neighbors of x, look up its distance to y and step to the neighbor with the shortest distance. Then repeat that process until y is reached. Thus, the shortest route can be navigated one step at a time without any high-level advanced planning. This is the core idea behind endotaxis.
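This one-step look-up policy is easy to state in code, independent of any neural implementation. Here is a plain algorithmic sketch on an assumed toy graph: it computes the distance matrix D by breadth-first search, then walks greedily from start to target, stepping at each junction to the neighbor closest to the target.

```python
from collections import deque

# Small undirected graph as an adjacency list (node -> neighbors), assumed example.
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}

def distances_from(source):
    """Shortest graph distances from `source`, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

D = {node: distances_from(node) for node in graph}   # D[i][j] as in Eq. (2)

def navigate(start, target):
    """Follow the shortest route one step at a time, with no global plan."""
    route = [start]
    while route[-1] != target:
        here = route[-1]
        route.append(min(graph[here], key=lambda nb: D[target][nb]))
    return route

# From node 0 to node 4 the greedy walk takes the unique shortest path.
assert navigate(0, 4) == [0, 1, 3, 4]
```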
The network of Figure 1B effectively computes the shortest graph distances. We implement the circuit as a textbook linear rate model (Dayan and Abbott, 2001). Each map unit i has a synaptic input x_i that it converts to an output v_i,

$$v_i = \gamma x_i \quad (3)$$
where γ is the gain of the units. The input consists of an external signal u_i summed with a recurrent feedback through a connection matrix M,

$$x_i = u_i + \sum_j M_{ij} v_j \quad (4)$$

where M_ij is the synaptic strength from unit j to i.
The point neurons are one-hot encoders of location. A point neuron fires if the agent is at that location; all the others are silent:

$$u_i(x) = \text{firing rate of point cell } i \text{ with the agent at node } x = \delta_{ix} \quad (5)$$

where δ_ix is the Kronecker delta.
So the vector of all map outputs is

$$\mathbf{v} = \gamma(\mathbf{u} + M\mathbf{v}) \;\Rightarrow\; \mathbf{v} = \left(\tfrac{1}{\gamma}\mathbb{1} - M\right)^{-1} \mathbf{u} \quad (6)$$

where u is the one-hot input from point cells.
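Equation 6 is just a linear solve, so the hump of map activity is easy to inspect numerically. This is an illustrative sketch with assumed values (a 5-node ring graph, γ = 0.2), not a result from the paper:

```python
import numpy as np

n = 5
A = np.zeros((n, n))             # ring graph: node i adjacent to i±1 (mod n)
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

gamma = 0.2
u = np.eye(n)[0]                 # agent at node 0: one-hot point-cell input
v = np.linalg.solve(np.eye(n) / gamma - A, u)   # map output, Eq. (6)

# v peaks at the agent's node and decays with graph distance along the ring.
assert v[0] == max(v)
assert v[0] > v[1] > v[2]        # nodes at distance 0, 1, 2 from node 0
```

By the ring’s symmetry, v[1] and v[4] (both one step away) carry equal activity.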
Now consider goal cell number k that is associated with a particular location y because its resource is present at that node. The goal cell sums input from all the map units v_i, weighted by its goal synapses G_ki. So with the agent at node x, the goal signal r_k is

$$r_k(x) = \sum_i G_{ki} v_i(x) = \mathbf{g}_k \cdot \mathbf{v}(x) = \mathbf{g}_k \cdot \left(\tfrac{1}{\gamma}\mathbb{1} - M\right)^{-1} \mathbf{u}(x) \quad (7)$$

where we write g_k for the kth row vector of the goal synapse matrix G. This is the set of synapses from all map cells onto the specific goal cell in question.
Suppose now that the agent has learned the structure of the environment perfectly, such that the map synapses are a copy of the graph’s adjacency matrix (1),

$$M = A \quad (8)$$

Similarly, suppose that the agent has acquired the goal synapses perfectly, namely proportional to the map output at the goal location y:

$$\mathbf{g}_k = \mathbf{v}(y) \quad (9)$$
Then as the agent moves to another location x, the goal cell reports a signal

$$r_k(x) = \mathbf{g}_k \cdot \mathbf{v}(x) = \mathbf{v}(y) \cdot \mathbf{v}(x) \equiv E_{xy} \quad (10)$$
where the matrix

$$E = \left[\left(\tfrac{1}{\gamma}\mathbb{1} - A\right)^{-1}\right]^{\top} \left(\tfrac{1}{\gamma}\mathbb{1} - A\right)^{-1} \quad (11)$$
It has been shown (Meister, 2023) that for small values of γ the elements of the resolvent matrix

$$R = \left(\tfrac{1}{\gamma}\mathbb{1} - A\right)^{-1} \quad (12)$$

are monotonically related to the shortest graph distances D. Specifically,

$$R_{xy} \xrightarrow{\;\gamma \to 0\;} \gamma^{1 + D_{xy}} \quad (13)$$
Building on that, the matrix E becomes

$$E_{xy} \xrightarrow{\;\gamma \to 0\;} \sum_z \gamma^{1 + D_{zx}} \, \gamma^{1 + D_{zy}} = \sum_z \gamma^{2 + D_{zx} + D_{zy}} \quad (14)$$

The limit is dominated by the term with the smallest exponent, which occurs when z lies on a shortest path from x to y:

$$\min_z \left( D_{zx} + D_{zy} \right) = D_{xy}$$

where we have used the undirected nature of the graph, namely $D_{xy} = D_{yx}$.

Therefore,