PHYSICAL REVIEW RESEARCH 5, 043205 (2023)

Efficient population coding of sensory stimuli

Shuai Shao,1,2 Markus Meister,3 and Julijana Gjorgjieva4,1,*

1 Computation in Neural Circuits Group, Max Planck Institute for Brain Research, 60438 Frankfurt, Germany
2 Donders Institute and Faculty of Science, Radboud University, 6525 GD, Nijmegen, Netherlands
3 Division of Biology and Biological Engineering, California Institute of Technology, 91125 Pasadena, California, USA
4 School of Life Sciences, Technical University of Munich, 85354 Freising, Germany

*gjorgjieva@tum.de

(Received 24 July 2022; accepted 19 October 2023; published 5 December 2023)

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI. Open access publication funded by the Max Planck Society.

This article is part of the Physical Review Research collection titled Physics of Neuroscience.
The efficient coding theory postulates that single cells in a neuronal population should be optimally configured
to efficiently encode information about a stimulus subject to biophysical constraints. This poses the question of
how multiple neurons that together represent a common stimulus should optimize their activation functions to
provide the optimal stimulus encoding. Previous theoretical approaches have solved this problem with binary
neurons that have a step activation function, and have assumed that spike generation is noisy and follows a
Poisson process. Here we derive a general theory of optimal population coding with neuronal activation functions
of any shape, different types of noise, and heterogeneous firing rates of the neurons by maximizing the Shannon
mutual information between a stimulus and the neuronal spiking output subject to a constraint on the maximal
firing rate. We find that the optimal activation functions are discrete in the biological case of non-negligible noise
and demonstrate that the information does not depend on how the population is divided into ON and OFF cells
described by monotonically increasing vs decreasing activation functions, respectively. However, the population
with an equal number of ON and OFF cells has the lowest mean firing rate, and hence encodes the highest
information per spike. These results are independent of the shape of the activation functions and the nature of
the spiking noise. Finally, we derive a relationship for how these activation functions should be distributed in
stimulus space as a function of the neurons’ firing rates.
DOI: 10.1103/PhysRevResearch.5.043205
I. INTRODUCTION
In many neuronal systems, sensory information is processed by multiple neurons in parallel, forming a population code. However, how a population of neurons works together to efficiently encode a sensory stimulus in the presence of different biological constraints is still an open question. Many experimental and theoretical studies have proposed that neuronal coding is optimal [1-5]. Optimality is typically assessed in the context of various constraints provided by the biological system in question. These include assumptions about the structure of the neuronal population, the relationship between stimulus and neuronal firing, the source and magnitude of sensory noise, and the measures used to quantify coding efficiency. For example, a common way to describe the firing rate of a neuron as a function of the stimulus is through an activation function, which usually describes a nonlinear dependence determined by the various ion channels embedded in the neuron's membrane or elaborate dendritic morphologies [6,7]. The activation functions of sensory neurons can be monotonically increasing or decreasing as a function of the stimulus, referred to as ON or OFF, respectively [Fig. 1(a)], although in some sensory systems ON-OFF cells with nonmonotonic activation functions also exist [8,9]. ON and OFF cells are found in many sensory systems, including the retina, where ON (OFF) ganglion cells code for increases (decreases) in visual stimulus intensity or contrast [10,11], and the insect mechanosensory system, where they code for increases and decreases in leg angle [12]. In line with most optimal coding theories of neuronal populations, here we assume that multiple cells together encode a sensory stimulus more efficiently than single cells in the presence of sensory noise and biophysical constraints.
Populations of sensory neurons are typically affected by noise, which can come from different sources, including the sensory environment and biophysical constraints. Assuming a description of neuronal firing by activation functions, noise can enter before or after the activation function, called input vs output noise, respectively, and can have a different influence on stimulus coding [13,14]. Since neurons communicate via action potentials, theoretical studies of optimal coding have commonly assumed that individual neurons generate spike counts in fixed coding windows following Poisson statistics [15-18].
[Figure 1 image: (a) ON and OFF neuron schematics with "Light on" stimulus, membrane potential, and firing rate traces; (b) population coding model with spiking noise; (c) staircase activation functions, firing rate (axis 0-25) versus stimulus (axis 0-1).]

FIG. 1. Efficient coding framework of a population of ON and OFF neurons. (a) A schematic of ON and OFF neurons. An ON neuron fires more frequently when the stimulus (which is light in this example) is high and fires at the spontaneous rate (here 0) when the stimulus is absent. The opposite is true for an OFF neuron. (b) The population coding model. Sensory stimuli s, which are constant in coding windows of size T, are drawn from a distribution p(s). The stimuli are encoded by a population of neurons with firing rates ν_i(s), which fire noisy spike trains, n_i. The distribution of n_i is given by the conditional probability p(n_i | ν_i(s)), which denotes the spiking noise. The efficiency of the neuronal coding is quantified by the Shannon mutual information between the stimuli s and the spike trains n_i, i.e., I(n_1, ..., n_N; s). (c) The optimal activation function, which maximizes the mutual information for a single ON neuron, is discrete. (Upper) The optimal thresholds of a single neuron that maximizes the Shannon mutual information. Π_i denotes the cumulative probability of s above a threshold. (Lower) Schematics depicting that the number of steps of the optimal activation function increases with the product of the maximal firing rate and the coding window, i.e., R = ν_max T. For low R, the optimal activation function is binary and has one threshold (i = 1). As R increases, the optimal activation function becomes ternary (i = 2), etc. The activation function becomes continuous in the limit R → ∞.
Under conditions of low spike count intensity of the Poisson process, the optimal activation functions of single neurons can be proven to be discrete with a single step, i.e., binary [15,16]. However, when the spike count intensity increases, binary neurons are no longer optimal; rather, the number of steps in the activation function increases as a function of spike count intensity [16]. Especially in biological systems, many of these assumptions need to be relaxed. First, activation functions in different sensory systems usually do not manifest as binary and may appear continuous due to the presence of noise [14,19,20]. Neuronal spike counts can also be non-Poisson, for instance, in the retina [21,22]. Therefore, it is an interesting question what optimal configuration of activation functions can be achieved in theoretical frameworks of efficient coding where spike counts follow statistics other than Poisson.
What quantity might neural populations optimize? Two measures have been commonly used [17,23-27]. The Shannon mutual information between the stimulus and neuronal responses does not assume how the information should be decoded downstream. Alternatively, the stimulus can be estimated using a decoder, and the difference between the stimulus and the estimate can be minimized. These two measures can generate very different predictions about the optimal population coding strategy [18,28].
Here, we develop a general efficient coding theory based on a population coding model with multiple ON and OFF neurons that code for a scalar stimulus from a given distribution, assuming any (monotonic) nonlinear activation function and any noise statistics. We use the Shannon mutual information between the stimulus and the neuronal spikes as a measure of coding efficiency, and discover that this measure is independent of how the population is divided into ON and OFF neurons. We also investigate how the optimal firing thresholds of ON and OFF neurons partition the stimulus space as a function of the maximal neuronal firing rates. When these firing rates are equal for all neurons, we find that the thresholds divide the stimulus distribution into surprisingly regular stimulus regions.
II. THEORETICAL FRAMEWORK
We propose a theoretical framework of population coding with the following assumptions [Fig. 1(a)]:

(i) A population of ON and OFF neurons codes for a one-dimensional stimulus, with monotonically increasing and decreasing firing rates as a function of the stimulus (respectively), called activation functions;

(ii) Each neuron i in the population has a minimum (spontaneous) firing rate ν_0, usually assumed to be 0, and a maximal firing rate constraint ν_max,i;

(iii) The dynamic range of each neuron i, defined as the range of stimuli that lead to a nonzero and nonmaximal firing rate ν_i (with ν_0 < ν_i < ν_max,i), does not overlap with the dynamic ranges of other neurons;

(iv) The dynamic ranges of OFF neurons are lower than those of ON neurons.
For the second assumption, we start with a simple case in which the maximal firing rates in a population are identical across the cells, i.e., ν_max,i = ν_max. Later in this paper (Secs. III D and III H) we also consider neuron populations with heterogeneous ν_max,i. The assumption of zero spontaneous firing rate ensures analytical tractability. Our conclusions hold, at least in the case of binary activation functions for all cells with Poisson noise, even if this assumption is relaxed [18].
We denote the sensory stimulus to be encoded by a population of N cells as the scalar s, which is drawn from a distribution p(s). We denote the activation function of each neuron as ν_i(s), where the subscript i indexes the neurons in the population. We define "the coding window" T as the time period during which the stimulus s is constant [Fig. 1(b)]. The coding window depends on the neuronal dynamics in the specific sensory population. For instance, in the mammalian retina, retinal ganglion cells have a coding window of 10 to 50 ms [17,29,30]. In the mouse auditory system, auditory nerve fibers have a coding window of 50 ms [14,19]. Defining a coding window allows us to define the spike count n_i for neuron i within a coding window T, which has an expected value of ν_i(s) T. Therefore, the stimulus s is encoded by a vector of noisy spike counts n = {n_1, ..., n_N}, which represents the population code.
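This generative structure is easy to simulate. The following minimal sketch (illustrative only, not the authors' code; the Gaussian stimulus distribution, sigmoidal activation functions, and all parameter values are assumptions) draws one coding window of the population code:

    import numpy as np

    rng = np.random.default_rng(0)

    T = 0.05         # coding window in seconds (10-50 ms in the retina)
    nu_max = 100.0   # maximal firing rate constraint (spikes/s); assumed value

    def nu_on(s):
        # monotonically increasing (ON) activation function; the sigmoid is an assumption
        return nu_max / (1.0 + np.exp(-4.0 * s))

    def nu_off(s):
        # monotonically decreasing (OFF) activation function
        return nu_max / (1.0 + np.exp(4.0 * s))

    # One coding window: draw s ~ p(s), form the deterministic rate vector nu(s),
    # and emit noisy spike counts n_i with expected value nu_i(s) * T (Poisson here).
    s = rng.normal()
    rates = np.array([nu_on(s), nu_off(s)])
    n = rng.poisson(rates * T)
    print(f"s = {s:+.2f}, rates = {rates.round(1)} Hz, spike counts n = {n}")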
We consider a general noise model where the spike counts follow a probability distribution p(n|s), which depends on the stimulus only through the expected values ν(s)T. Since the firing rate vector ν is a deterministic function of the stimulus s, and assuming the noise of different neurons is independent, the probability distribution p(n|s) can be written as a product of the spike count probability distributions of the individual neurons, i.e., p(n|ν(s)) = ∏_i p(n_i|ν_i(s)). Because ν_i is the firing rate and ν_i T is the expected value of the spike count n_i of neuron i, by definition we have

\sum_{n_i=0}^{+\infty} p(n_i \mid \nu_i) = 1,    (1)

\sum_{n_i=0}^{+\infty} p(n_i \mid \nu_i)\, n_i = \nu_i T.    (2)
While the noise can follow any distribution, a special case commonly used in previous studies is Poisson noise, where p(n|s) = p(n|ν(s)) = \prod_i \frac{(\nu_i(s) T)^{n_i}}{n_i!} e^{-\nu_i(s) T}. We quantify the coding efficiency of this population code using the Shannon mutual information between the population spike count n and stimulus s,

I(s, \mathbf{n}) = \sum_{\mathbf{n}} \int ds\, p(s)\, p(\mathbf{n} \mid s) \log \frac{p(\mathbf{n} \mid s)}{P(\mathbf{n})},    (3)

where

P(\mathbf{n}) = \int ds\, p(s) \prod_i p(n_i \mid \nu_i),    (4)

and \sum_{\mathbf{n}} = \sum_{n_1=0}^{+\infty} \cdots \sum_{n_N=0}^{+\infty} denotes the sum over all possible spike counts of all the neurons.

Because the firing rates ν depend deterministically on the stimulus s, the mutual information between s and n is the same as the mutual information between ν and n (see Supplemental Material, SM [31]),

I(s, \mathbf{n}) = I(\boldsymbol{\nu}, \mathbf{n}) = \sum_{\mathbf{n}} \int_{\boldsymbol{\nu}} d^N\!\nu\; p(\boldsymbol{\nu})\, p(\mathbf{n} \mid \boldsymbol{\nu}) \log \frac{p(\mathbf{n} \mid \boldsymbol{\nu})}{P(\mathbf{n})}.    (5)
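For a discrete set of firing levels, Eqs. (3)-(5) reduce to finite sums that can be evaluated directly. A minimal sketch (ours, not from the paper; it assumes Poisson noise and a single binary neuron placed at ν_max with probability u):

    import numpy as np
    from scipy.special import gammaln

    R = 2.0             # R = nu_max * T, the maximum expected spike count
    u = 0.5             # Prob(nu = nu_max): stimulus mass above the firing threshold
    n = np.arange(60)   # truncated spike-count support; the tail is negligible here

    def L(mu):
        # Poisson noise generation function p(n | nu), with the mu = 0 limit explicit
        if mu == 0.0:
            return (n == 0).astype(float)
        return np.exp(n * np.log(mu) - mu - gammaln(n + 1))

    prior = np.array([1.0 - u, u])     # p(nu) on the two firing levels {0, R}
    Lmat = np.stack([L(0.0), L(R)])    # p(n | nu), one row per level
    P = prior @ Lmat                   # P(n), Eq. (4)

    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(Lmat > 0, Lmat * np.log(Lmat / P), 0.0)
    I = prior @ terms.sum(axis=1)      # Eqs. (3) and (5), in nats
    print(f"I(s, n) = {I:.4f} nats")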
III. RESULTS
We seek to derive the optimal activation functions {ν_i(·)}_i of an entire population of ON and OFF neurons, which maximize the mutual information I(s, n) [Eq. (5)] when the conditional probability p(n_i|ν_i) is given. We also aim to determine how this maximal mutual information depends on the ON-OFF composition of the neuronal population.
A. The optimal activation function for a single neuron is discrete
We first investigate a population with only a single neuron subject to the constraints from Sec. II. Previous studies have found that under these conditions and with Poisson-distributed spike counts, the optimal activation function for a single neuron should be discrete, with an increasing number of steps as a function of the product R = ν_max T, i.e., the maximum expected spike count [15,16] [Fig. 1(c)]. In two steps, we generalize this result to any analytic conditional probability p(n|ν) (analytic in terms of ν), using the fact that mutual information is convex in the input space [32].
In step 1, we prove that the mutual information I(ν, n) is distributed proportionally to the probability density p(ν) in the optimal configuration. Defining the "density of mutual information" as

i(\nu) = \sum_{n} p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)},    (6)

we can write

I(\nu, n) = \int_{\nu} d\nu\, p(\nu)\, i(\nu).    (7)

We can then prove that in the optimal case,

i(\nu) = I_{\max} \quad \text{for all possible } \nu,    (8)

where I_max is the maximal mutual information (see the SM [31]).
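Equation (8) can be illustrated numerically in the simplest case of a single binary neuron with Poisson noise, where the only free parameter is the probability mass u on the high firing level. In the sketch below (our own check, not the authors' code), the density of mutual information comes out equal at both firing levels precisely at the maximizing u:

    import numpy as np
    from scipy.special import gammaln

    R = 1.0
    n = np.arange(60)
    L0 = (n == 0).astype(float)                       # L(n, 0): the silent state is noiseless
    LR = np.exp(n * np.log(R) - R - gammaln(n + 1))   # L(n, R): Poisson at the maximal rate

    def densities(u):
        # return I(nu, n) and the information densities i(0), i(nu_max) for prior (1-u, u)
        P = (1 - u) * L0 + u * LR
        with np.errstate(divide="ignore", invalid="ignore"):
            i0 = np.where(L0 > 0, L0 * np.log(L0 / P), 0.0).sum()
            iR = np.where(LR > 0, LR * np.log(LR / P), 0.0).sum()
        return (1 - u) * i0 + u * iR, i0, iR

    us = np.linspace(1e-3, 1 - 1e-3, 2001)
    u_star = us[np.argmax([densities(u)[0] for u in us])]
    I_star, i0, iR = densities(u_star)
    print(f"u* = {u_star:.3f}: i(0) = {i0:.4f}, i(nu_max) = {iR:.4f}, I = {I_star:.4f}")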
Then in step 2, we show that Eq. (8) cannot hold if the activation function ν(s) is continuous, therefore concluding that it must be discrete. To do this, we first redefine the activation function using a function F_ν. For an ON neuron (the case for an OFF neuron follows similarly), we can write for any arbitrary firing rate ν̃,

F_\nu(\tilde{\nu}) = \int_{-\infty}^{s_{\max}(\tilde{\nu})} ds\, p(s),    (9)

where s_max(ν̃) is defined as the highest s that makes ν(s) ≤ ν̃, i.e., s_max = max{s | ν(s) ≤ ν̃}. Because ν(s) is a monotonically increasing function of s, s_max(ν̃) is also monotonically increasing, making F_ν(ν̃) a monotonically increasing function of ν̃. Changing the variable of integration in Eq. (9) leads to

F_\nu(\tilde{\nu}) = \int_{\nu(s \to -\infty)}^{\nu(s = s_{\max}(\tilde{\nu}))} d\nu\, p(\nu).    (10)

Therefore, F_ν becomes the cumulative distribution function of the firing rate ν,

F_\nu(\tilde{\nu}) = \int_0^{\tilde{\nu}} d\nu\, p(\nu).    (11)
Let F*_ν denote the optimal activation function, which maximizes the mutual information I(ν, n) [Eqs. (5) and (7)]. We explicitly include the dependence of the density of mutual information i(ν) [Eq. (6)] on the activation function F_ν by writing i(ν, F_ν), because P(n) depends on F_ν. Then, Eq. (8) can be rewritten as

i(\nu, F^*_\nu) = I(F^*_\nu) \quad \text{for all } \nu \text{ in } E^*_\nu,    (12)

where E*_ν is the set of points at which F*_ν increases.
From now on, we denote the conditional probability p(n_i|ν_i) by L(n_i, ν_i T), and call it the "noise generation function". If we assume L(n, νT) is analytic with respect to νT, then we can show that the optimal activation function has a finite number of steps, i.e., E*_ν is a finite set of points. Note that because of Eq. (11), E_ν is also the set of all possible firing rates, i.e., E_ν = {ν | p(ν) > 0}. If E*_ν has a finite number of points, then the optimal ν(s) will have a finite number of steps.
Let us first consider the case that E*_ν is infinite. In the simplest case, if F*_ν is continuous over the interval [0, ν_max], then E*_ν = [0, ν_max]. As a result, i(ν, F*_ν) = const for any ν ∈ [0, ν_max].
If F*_ν is not continuous but E*_ν has an infinite number of points (e.g., F*_ν is only continuous on a subinterval of [0, ν_max]), then, similar to previous studies [15,32], one can use the Bolzano-Weierstrass theorem [33] to prove that E*_ν has a limit point in [0, ν_max]. Then, by the identity theorem for analytic functions [34], if two analytic functions, in our case i(ν, F*_ν) and I(F*_ν), have the same value on an infinite number of points and the limit of these points, then they are equal, i.e., i(ν, F*_ν) = const for any ν ∈ [0, ν_max]. In short, assuming E*_ν has an infinite number of points also implies that i(ν, F*_ν) is a constant over the interval [0, ν_max].
If E*_ν is infinite, then assuming optimal coding and based on Eq. (8), we have

i(\nu) = \sum_{n=0}^{+\infty} L(n, \nu T) \log \frac{L(n, \nu T)}{P(n)} = I_{\max} = \text{const}.    (13)
Then the derivative with respect to νT is

i'(\nu) = \sum_{n=0}^{+\infty} L'(n, \nu T) \log \frac{L(n, \nu T)}{P(n)} = 0,    (14)

where L'(n, νT) denotes ∂L(n, νT)/∂(νT). Similarly, the second derivative is

i''(\nu) = \sum_{n=0}^{+\infty} \left[ L''(n, \nu T) \log \frac{L(n, \nu T)}{P(n)} + \frac{L'(n, \nu T)^2}{L(n, \nu T)} \right] = 0.    (15)
Using mathematical induction, one can prove that for any m ∈ ℕ⁺, the mth derivative of i(ν) with respect to νT, i^(m)(ν), contains the term

\sum_{n=0}^{+\infty} L^{(m)}(n, \nu T) \log \frac{L(n, \nu T)}{P(n)}.    (16)
According to Eq. (2), \sum_n L(n, \nu T)\, n = \nu T, we have

L(0, 0) = 1, \qquad L(n \geqslant 1, 0) = 0.    (17)
Based on these two boundary conditions, L(n, νT) can be written as a Maclaurin series,

L(0, \nu T) = 1 + \sum_{k=1}^{+\infty} a_{0k} (\nu T)^k,    (18)

L(n, \nu T) = \sum_{k=1}^{+\infty} a_{nk} (\nu T)^k \quad (n \geqslant 1).    (19)
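As a concrete instance of Eqs. (18) and (19) (a worked example of ours, not part of the original derivation), the Poisson noise generation function L(n, νT) = \frac{(\nu T)^n}{n!} e^{-\nu T} has the Maclaurin coefficients

    % Poisson case of Eqs. (18)-(19): expand e^{-\nu T} around \nu T = 0
    L(0,\nu T) = e^{-\nu T} = 1 - \nu T + \tfrac{1}{2}(\nu T)^2 - \cdots,
    \qquad a_{0k} = \frac{(-1)^k}{k!},
    \\
    L(n,\nu T) = \frac{(\nu T)^n}{n!}\, e^{-\nu T}
               = \sum_{k=n}^{+\infty} \frac{(-1)^{k-n}}{n!\,(k-n)!}\,(\nu T)^k,
    \qquad a_{nk} = \frac{(-1)^{k-n}}{n!\,(k-n)!} \quad (k \geqslant n),

so that for Poisson noise the series for L(n, νT) genuinely starts at order (νT)^n, i.e., the lowest index with a nonvanishing coefficient is j(n) = n in the notation introduced below.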
Substituting these two series into the fractional or polynomial terms of the derivatives of the noise generation function L'(n, νT), L''(n, νT), ..., L^(m-1)(n, νT), and also into the terms \sum_n L^{(m)}(n, \nu T) \log P(n) in the derivatives i^(m)(ν) [Eq. (16)], we find that they all become fractional or polynomial terms of νT after the Maclaurin expansion with respect to νT around 0. For example, in i''(ν) [Eq. (15)],

\frac{L'(n, \nu T)^2}{L(n, \nu T)} = \frac{\left[\sum_{k=1}^{+\infty} a_{nk}\, k\, (\nu T)^{k-1}\right]^2}{\sum_{k=1}^{+\infty} a_{nk} (\nu T)^k} = \frac{\left[\sum_{k=1}^{+\infty} a_{nk}\, k\, (\nu T)^{k-1}\right]^2}{a_{n1} \nu T \left[1 + \sum_{k=2}^{+\infty} \frac{a_{nk}}{a_{n1}} (\nu T)^{k-1}\right]} = \frac{a_{n1}}{\nu T} + 3 a_{n2} + \left(5 a_{n3} + \frac{a_{n2}^2}{a_{n1}}\right) \nu T + \cdots \quad (n \geqslant 1).    (20)
The only exception is the term containing log(νT) apart from the polynomial terms,

\sum_{n=0}^{+\infty} L^{(m)}(n, \nu T) \log L(n, \nu T) = \sum_{k=m}^{+\infty} a_{0k} \frac{k!}{(k-m)!} (\nu T)^{k-m} \log\left(1 + \sum_{l=1}^{+\infty} a_{0l} (\nu T)^l\right) + \sum_{n=1}^{+\infty} \sum_{k=m}^{+\infty} a_{nk} \frac{k!}{(k-m)!} (\nu T)^{k-m} \times \left[\log a_{n, j(n)} + j(n) \log(\nu T) + \log\left(1 + \sum_{l > j(n)} \frac{a_{nl}}{a_{nj}} (\nu T)^{l - j(n)}\right)\right],    (21)

where j(n) is the minimal index k that makes a_{nk} > 0 when n is given. When νT → 0, we can see that the first term in Eq. (21) is finite. The second term can be expanded as the sum of polynomial terms and other terms proportional to (νT)^{k-m} log(νT), which converge to 0 if k > m. The only diverging term is (νT)^{k-m} log(νT) when k = m, which becomes log(νT). Hence, the second term diverges as

\sum_{n=1}^{+\infty} a_{nm}\, j(n) \log(\nu T),    (22)
while the other terms of i^(m)(ν) either converge to a finite value or diverge even faster than log(νT), because they are either polynomial or fractional terms of νT. For i^(m)(ν) to remain equal to the constant in Eq. (13), the sum of the coefficients of all the fractional terms with the same order must be 0; if no relationship among the a_{nm} could make these sums vanish, a paradox would arise, completing the proof. In addition, the sum of the coefficients a_{nm} of the log(νT) terms must also be 0, i.e.,

\sum_{n=1}^{+\infty} a_{nm}\, j(n) = 0 \quad \text{for all } m \geqslant 1.    (23)
According to Eq. (17), when νT = 0, L(n ≥ 1, νT) reaches its lower bound 0. Then the derivative L'(n, 0), which equals a_{n1} [see Eq. (19)], is positive or 0 for any n ≥ 1, i.e.,

a_{n1} \geqslant 0.    (24)

Combining with Eq. (23), and noting that j(n) > 0, we have

a_{n1} = 0 \quad \text{for all } n \geqslant 1.    (25)

Similarly, based on a_{n1} = 0, we can derive a_{n2} = 0. This is because the second derivative L''(n ≥ 1, 0) also needs to be positive or 0, given that L(n ≥ 1, νT) is at its lower bound and its first derivative is 0. Continuing this process, we get

a_{nm} = 0    (26)

for all n ≥ 1 and m ≥ 1. Substituting into Eq. (19), we have

L(n, \nu T) = 0 \quad \text{for any } n \geqslant 1 \text{ and any } \nu,    (27)

which leads to

L(0, \nu T) = 1 \quad \text{for any } \nu.    (28)

This is in contradiction to Eq. (2), \sum_n L(n, \nu T)\, n = \nu T, since ν > 0 means that the neuron fires and L(0, νT) cannot be 1. Therefore, Eq. (13) leads to a paradox, which indicates that the set of increasing points E*_ν cannot be infinite.

In summary, this proves that a continuous activation function is inconsistent with Eq. (8). This means that the optimal activation function for a single neuron must be discrete for any noise generation function.
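The step-counting behavior sketched in Fig. 1(c) can be probed by direct search. The following coarse grid search (ours; Poisson noise is assumed, the resolution is arbitrary, and this is not the optimization procedure of the paper) compares the best binary against the best ternary activation function of a single neuron at a small and a larger R:

    import numpy as np
    from scipy.special import gammaln

    n = np.arange(80)

    def pois(mu):
        # Poisson noise generation function, with the mu = 0 limit handled explicitly
        if mu == 0.0:
            return (n == 0).astype(float)
        return np.exp(n * np.log(mu) - mu - gammaln(n + 1))

    def mi(Lmat, w):
        # mutual information (nats) between firing levels with prior w and spike count n
        P = w @ Lmat
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.where(Lmat > 0, Lmat * np.log(Lmat / P), 0.0)
        return w @ t.sum(axis=1)

    u_grid = np.linspace(0.02, 0.98, 49)
    for R in (1.0, 5.0):
        L_bin = np.stack([pois(0.0), pois(R)])
        best_bin = max(mi(L_bin, np.array([1 - u, u])) for u in u_grid)
        best_tern = best_bin                    # a ternary code can always fall back to binary
        for frac in np.linspace(0.1, 0.9, 9):   # candidate middle firing level
            L_tern = np.stack([pois(0.0), pois(frac * R), pois(R)])
            for w1 in u_grid:
                for w2 in u_grid:
                    if w1 + w2 < 1.0:
                        w = np.array([1 - w1 - w2, w1, w2])
                        best_tern = max(best_tern, mi(L_tern, w))
        print(f"R = {R}: best binary {best_bin:.4f} nats, best ternary {best_tern:.4f} nats")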
B. The optimal activation functions for a population of neurons are discrete
Next, we investigate a population of N neurons, made up of ON and OFF neurons that have monotonically increasing and decreasing activation functions as a function of the stimulus s, respectively. We continue to consider the same constraints of a maximal firing rate and zero spontaneous firing rate [Fig. 1(a)]. Under these conditions, the optimal activation functions for all neurons in the population continue to be discrete for any analytic noise generation function L(n_i, ν_i T).
We define the "dynamic range" of a neuron to be the interval of s that leads to unsaturated firing rates, i.e., {s | 0 < ν_i(s) < ν_max} for neuron i (see Sec. II). For a discrete activation function, the dynamic range is the interval between the lowest and the highest threshold. We assume that the dynamic ranges of any two neurons do not overlap and that any OFF neuron encodes smaller stimuli than any ON neuron (see Sec. II), which is consistent with experimental measurements [12] and previous theoretical work [18].
We consider a mixed population of m ON neurons and N − m OFF neurons. To proceed, we label all ON neurons with decreasing indices (m to 1) from low to high dynamic ranges, so that the ON neuron with the highest dynamic range has index 1. Similarly, we label all OFF neurons with increasing indices (m + 1 to N) from low to high dynamic ranges (note that this ordering differs from previous work [18]; it is chosen to make our mathematical expressions symmetric).
If one of the ON neurons 1, 2, ..., m fires, assuming that spontaneous firing rates are 0, we know that the stimulus s is higher than, or at least within, the dynamic range of neuron m. We then also know the firing rates of neurons m + 1, m + 2, ..., N, which means the spike counts of these neurons cannot give any new information about the stimulus s. Based on this, we can write the mutual information encoded by the mixture of m ON neurons and N − m OFF neurons as

I_N(F_1, \ldots, F_N) = I_m(F_1, \ldots, F_m) + Q_m\, I_{N-m}\big(F^{(m)}_{m+1}, \ldots, F^{(m)}_N\big).    (29)
Here F_i = F_{ν_i} is defined in the same way as before [Eq. (11)], while Q_m denotes the probability that none of the ON neurons 1, 2, ..., m fires. We additionally define the terms F^(m) to denote the "revised" distribution functions under the condition that none of the neurons 1, 2, ..., m fires, i.e., given an arbitrary firing rate ν̃,

F^{(m)}_i(\tilde{\nu}) = \mathrm{Prob}(\nu_i \leqslant \tilde{\nu} \mid n_1 = \cdots = n_m = 0).    (30)
From Bayes' rule, we can write

F^{(m)}_i(\tilde{\nu}) = \frac{F_i(\tilde{\nu})\, \mathrm{Prob}(n_1 = \cdots = n_m = 0 \mid \nu_i \leqslant \tilde{\nu})}{Q_m}.    (31)
Here, Q_m does not depend on ν̃. Within the dynamic range of neuron i (where i > m), the firing rates of neurons 1, ..., m are all 0, which means Prob(n_1 = ··· = n_m = 0 | ν_i ≤ ν̃) also does not depend on ν̃ in the dynamic range of F_i. Therefore, if F_i is discrete, F^(m)_i will also be discrete, and vice versa. This relationship also holds between F_i and F^(j)_i, where j is an arbitrary positive integer smaller than i.
Following a similar logic, we can also decompose the mutual information encoded by a population of N neurons in Eq. (29) into N single terms, each containing the mutual information encoded by one neuron, i.e.,

I_N = I(F_1) + P_1(0)\, I\big(F^{(1)}_2\big) + P^{(1)}_2(0)\, I\big(F^{(2)}_3\big) + \cdots + P^{(N-2)}_{N-1}(0)\, I\big(F^{(N-1)}_N\big),    (32)

where P_i(0) = \int L(0, \nu_i T)\, dF_i denotes the probability that neuron i does not fire, i.e., n_i = 0. Furthermore, we have used I(F^{(i-1)}_i) to denote the mutual information of neuron i assuming that neurons 1, ..., i − 1 do not fire. Since m does not explicitly appear in this equation, Eq. (32) applies to any mixed ON-OFF population, including homogeneous ON populations (where m = N) and homogeneous OFF populations (where m = 0).
We use mathematical induction to demonstrate that the optimal activation functions in a population are all discrete. Having already shown this for a single neuron, we assume it is true for a population of N − 1 cells. Then we add an additional neuron and show that the optimal activation functions of all N neurons are discrete. Without loss of generality, we assume that the newly added neuron is an ON neuron with the highest dynamic range, labeled with 1, and the remaining N − 1 neurons are labeled 2, ..., N. The sum of all the terms multiplying P_1(0) in Eq. (32) has the same mathematical form as I_{N−1}. As a result, the sum equals I^max_{N−1} when optimizing F^(1)_2, ..., F^(N−1)_N, allowing us to write

I_N = I(F_1) + P_1(0)\, I^{\max}_{N-1}.    (33)
Meanwhile, because we assumed that the optimal activation functions are discrete in a population of N − 1 neurons, the optimal F^(1)_2, ..., F^(N−1)_N are all discrete. As we argued before, since F_i and F^(j)_i are either both discrete or both continuous, this means that F_2, ..., F_N are all discrete. As before [Eq. (6)], we can also define the density of mutual information here as

i(\nu_1) = \sum_{n_1} p(n_1 \mid \nu_1) \log \frac{p(n_1 \mid \nu_1)}{P(n)} + p(n_1 = 0 \mid \nu_1)\, I^{\max}_{N-1} = \sum_{n_1} L(n_1, \nu_1 T) \log \frac{L(n_1, \nu_1 T)}{P(n)} + L(0, \nu_1 T)\, I^{\max}_{N-1}.    (34)
Therefore, maximizing I_N is equivalent to optimizing F_1, assuming optimal F^(1)_2, ..., F^(N−1)_N as in Eq. (33). If the optimal F_1 were continuous, then when I_N is maximized we would have

i(\nu_1) = I^{\max}_N, \qquad \nu_1 \in [0, \nu_{\max}],    (35)

and this leads to (see the SM [31])

L(n_1 = 0, \nu_1 T) = 1 \quad \text{for any } \nu_1.    (36)

Similar to Eq. (28), here Eq. (36) is also in contradiction to Eq. (2), \sum_{n_1} L(n_1, \nu_1 T)\, n_1 = \nu_1 T. Therefore, the optimal F_1 must be discrete, and we have proved that all N optimal activation functions need to be discrete.

Hence, using mathematical induction, we have proved that the optimal activation functions of all the neurons in a population of any size are discrete.
C. The optimal thresholds and the maximal mutual information for a population of binary neurons
Having shown that the optimal activation functions in a population are discrete for any noise generation function, we first consider the simplest discrete activation function, the binary one, to derive the optimal thresholds and the maximal mutual information. As before, we study a combination of a total of N neurons: m ON and N − m OFF neurons. Assuming that the spontaneous firing rate (the firing rate when the stimulus s is subthreshold) is 0 [Fig. 2(a)], only two parameters characterize the activation function ν_i(s) of neuron i: the threshold (denoted θ_i) and the maximal firing rate, which as before we assume is the same for all neurons (ν_max,i = ν_max for all i).
Because there is only one threshold for every neuron, the dynamic range of every neuron is compressed to a single point, the neuron's threshold. Labeling all the neurons as before,

\theta_{m+1} < \cdots < \theta_N < \theta_m < \cdots < \theta_1,    (37)

we note that there is only one noisy firing level, at the maximum firing rate. The absence of noise in the zero firing state enables us to lump all firing states with nonzero spike count into one [18] (see also SM [31]).
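This lumping can be verified directly: because the silent state is noiseless, any nonzero count identifies the "firing" side of the threshold with certainty, so collapsing all n ≥ 1 into a single "spiked" symbol preserves the mutual information. A quick numerical check (ours; Poisson noise assumed):

    import numpy as np
    from scipy.special import gammaln

    R, u = 2.0, 0.3                 # max expected count and Prob(nu = nu_max); assumed
    n = np.arange(60)
    L0 = (n == 0).astype(float)     # noiseless silent state
    LR = np.exp(n * np.log(R) - R - gammaln(n + 1))

    def mi(L_rows, prior):
        P = prior @ L_rows
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.where(L_rows > 0, L_rows * np.log(L_rows / P), 0.0)
        return prior @ t.sum(axis=1)

    prior = np.array([1 - u, u])
    I_full = mi(np.stack([L0, LR]), prior)                    # outputs: n = 0, 1, 2, ...
    q = np.exp(-R)
    I_lumped = mi(np.array([[1.0, 0.0], [q, 1 - q]]), prior)  # outputs: {silent, spiked}
    print(f"full: {I_full:.6f} nats, lumped: {I_lumped:.6f} nats")  # identical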
Because the optimal activation functions are discrete, following [18], we can replace the firing thresholds with the intervals of stimulus space partitioned by those thresholds [Fig. 2(a)] and optimize these intervals instead of directly optimizing the thresholds, i.e., we define

u_i = \mathrm{Prob}(\nu_i = \nu_{\max}) = \begin{cases} \int_{\theta_i}^{+\infty} ds\, p(s) & \text{for ON} \\ \int_{-\infty}^{\theta_i} ds\, p(s) & \text{for OFF.} \end{cases}    (38)
Denoting R = ν_max T and

q = L(0, R) = 1 - \sum_{n=1}^{+\infty} L(n, R),    (39)

we extend the finding of [18] to any noise generation function: the maximal mutual information is

I^{\max}_N = \log\big(1 + N(1-q)\, q^{q/(1-q)}\big) = -\log P(\mathbf{0}),    (40)

where P(0) is the probability that the spike counts are all 0 (see the SM [31]). The maximal information I^max_N is independent of the composition of ON neurons and OFF neurons and only depends on the total number of neurons N.
FIG. 2. Efficient population coding of binary neurons. (a) Activation functions of ON and OFF binary neurons. Each neuron has the same maximal firing rate ν_max, with an activation function described by a single threshold θ_i. (b) Optimal configurations of homogeneous populations with only ON neurons and mixed ON and OFF neurons. Π_i denotes the cumulative probability of s above threshold θ_i. The optimal thresholds partition the cumulative stimulus space into regular intervals [Eqs. (46)-(48)]. The optimized mutual information is independent of the ON-OFF mixture for any noise generation function [Eq. (49)].
Hence, we have generalized the previously termed "equal coding theorem" to noise generation functions other than Poisson [18]. In addition, comparing the maximal mutual information of a single-neuron population, I^max_1, with that of an N-neuron population, I^max_N, reveals that the maximum mutual information encoded by a population of neurons increases logarithmically with the number of neurons,

I^{\max}_N = \log\big[N\big(\exp I^{\max}_1 - 1\big) + 1\big].    (41)
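Both expressions are easy to evaluate; the following snippet (a usage example of ours, with Poisson noise so that q = e^{-R}) confirms their consistency and the logarithmic growth with N:

    import numpy as np

    R = 2.0
    q = np.exp(-R)                 # q = L(0, R) for Poisson noise

    def I_max(N):
        # Eq. (40), in nats
        return np.log(1 + N * (1 - q) * q ** (q / (1 - q)))

    I1 = I_max(1)
    for N in (1, 2, 4, 8, 16):
        via_41 = np.log(N * (np.exp(I1) - 1) + 1)   # Eq. (41)
        print(f"N = {N:2d}: Eq. (40) gives {I_max(N):.4f}, Eq. (41) gives {via_41:.4f} nats")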
Given this maximum mutual information, we next calculate the optimal threshold distribution of the population's binary activation functions. We can show (see the SM [31]) that the optimal {u_i} for the ON neurons are

u_i = \frac{1 + (i-1)(1-q)}{N(1-q) + q^{-q/(1-q)}},    (42)
and for the OFF neurons

u_i = \frac{1 + (i - m - 1)(1-q)}{N(1-q) + q^{-q/(1-q)}}.    (43)

The terms {u_i} represent an arithmetic progression for any noise generation function L, whereby all the firing thresholds equally partition the probability space of stimuli, similar to the case with Poisson noise [18]. If we define
p_1 = u_1, \quad p_{m+1} = u_{m+1}, \quad p_i = u_i - u_{i-1}, \quad i = 2, \ldots, m, m+2, \ldots, N,    (44)

as the probabilities of the stimulus intervals, i.e., the intervals of stimuli s that lead to the same firing rates ν [Fig. 2(b)], we have

p_1 = p_{m+1} \overset{\mathrm{def}}{=} p_{\mathrm{edge}}, \quad p_2 = \cdots = p_m = p_{m+2} = \cdots = p_N \overset{\mathrm{def}}{=} p, \quad p = (1-q)\, p_{\mathrm{edge}}.    (45)
This gives us the optimal thresholds in cumulative stimulus space [Fig. 2(b)],

\Pi_i = \int_{-\infty}^{\theta_i} ds\, p(s),    (46)

for the ON cells as

\Pi_1 = 1 - p_{\mathrm{edge}}, \quad \Pi_2 = 1 - p_{\mathrm{edge}} - p, \quad \ldots, \quad \Pi_m = 1 - p_{\mathrm{edge}} - (m-1)\,p,    (47)
and for the OFF cells as

\Pi_{m+1} = p_{\mathrm{edge}}, \quad \Pi_{m+2} = p_{\mathrm{edge}} + p, \quad \ldots, \quad \Pi_N = p_{\mathrm{edge}} + (N - m - 1)\,p.    (48)
Given these optimal thresholds, we can combine them with Eq. (40) to find the expression for the optimal mutual information,

I^{\max}_N = -\log(1 - Np).    (49)

Hence, we conclude that, for a mixed ON-OFF population with binary activation functions, the optimal thresholds and the mutual information look exactly the same for any noise generation function as for Poisson noise [18]. Homogeneous populations with only ON or OFF neurons, and mixed ON-OFF populations with any ON-OFF mixture, encode the same amount of information.
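This equal coding result can be stress-tested by brute force. The sketch below (our verification; Poisson noise, N = 3, and R = 2 are arbitrary choices) builds the optimal partition of Eqs. (44)-(48) for every ON-OFF split m, evaluates I(s, n) over the lumped silent/spiking output patterns, and compares it with Eq. (40):

    import numpy as np
    from itertools import product

    def mi_population(N, m, R):
        # Brute-force I(s, n) for m ON and N - m OFF binary neurons placed at the
        # optimal thresholds [Eqs. (44)-(48)], with lumped spiking noise q = e^{-R}.
        q = np.exp(-R)
        p_edge = 1.0 / (N * (1 - q) + q ** (-q / (1 - q)))
        p = (1 - q) * p_edge
        n_off = N - m
        off_w = ([p_edge] + [p] * (n_off - 1)) if n_off else []
        on_w = ([p] * (m - 1) + [p_edge]) if m else []
        # stimulus intervals from low to high s, with the neurons at nu_max in each;
        # OFF neurons carry indices 0..n_off-1, ON neurons n_off..N-1
        intervals = [(w, frozenset(range(k, n_off))) for k, w in enumerate(off_w)]
        intervals.append((1.0 - sum(off_w) - sum(on_w), frozenset()))
        intervals += [(w, frozenset(range(n_off, n_off + k + 1)))
                      for k, w in enumerate(on_w)]

        def lik(pattern, active):
            # a neuron at nu_max spikes (n_i >= 1) with prob 1 - q; others stay silent
            pr = 1.0
            for j, bit in enumerate(pattern):
                if j in active:
                    pr *= (1 - q) if bit else q
                elif bit:
                    return 0.0
            return pr

        patterns = list(product((0, 1), repeat=N))
        P = {pat: sum(w * lik(pat, act) for w, act in intervals) for pat in patterns}
        I = 0.0
        for w, act in intervals:
            for pat in patterns:
                l = lik(pat, act)
                if l > 0.0:
                    I += w * l * np.log(l / P[pat])
        return I

    N, R = 3, 2.0
    q = np.exp(-R)
    print(f"Eq. (40): {np.log(1 + N * (1 - q) * q ** (q / (1 - q))):.4f} nats")
    for m in range(N + 1):
        print(f"m = {m} ON, {N - m} OFF: brute force I = {mi_population(N, m, R):.4f} nats")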
D. The optimal thresholds for a population of binary neurons with heterogeneous maximal firing rates
Different neurons might have different maximal firing rate constraints. For example, ON ganglion cells in the primate retina have higher firing rates than OFF ganglion cells [30]. To explore the effect of these maximal firing rate differences on efficient coding, we next assume that different neurons in the population may have different maximal firing rates, and consider a heterogeneous population of ON and OFF neurons. In this case, we define ν_max,i as the maximal firing rate of neuron i,

R_i = \nu_{\max,i}\, T, \qquad q_i = L(0, R_i).    (50)
Similar to Eq. (38), we define u_i as the probability that neuron i fires at its maximal firing rate, i.e., u_i = Prob(ν_i = ν_max,i). Then we can prove that the optimal thresholds are {see Eqs. (S6.9) and (S6.10) in the SM [31]}

u_i = \frac{q_i^{q_i/(1-q_i)} + \sum_{j=1}^{i-1} (1-q_j)\, q_j^{q_j/(1-q_j)}}{1 + \sum_{j=1}^{N} (1-q_j)\, q_j^{q_j/(1-q_j)}}    (51)
for ON neurons and

u_i = \frac{q_i^{q_i/(1-q_i)} + \sum_{j=m+1}^{i-1} (1-q_j)\, q_j^{q_j/(1-q_j)}}{1 + \sum_{j=1}^{N} (1-q_j)\, q_j^{q_j/(1-q_j)}}    (52)
for OFF neurons. The maximal mutual information now becomes

I_N = \log\left[1 + \sum_{j=1}^{N} (1-q_j)\, q_j^{q_j/(1-q_j)}\right].    (53)

This result tells us that as long as the distribution of the maximal firing rates is the same (i.e., the same set {q_i}), shuffling the thresholds within the ON and OFF subpopulations, replacing ON neurons with OFF, or replacing OFF neurons with ON does not change the maximal mutual information (Fig. 3).
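Because Eq. (53) depends on the rates only through the unordered set {q_i}, any reassignment of rates to neurons leaves I_N unchanged, which the following snippet (ours; Poisson noise and the rate values are assumptions) makes concrete:

    import numpy as np

    T = 0.05                                        # coding window (s); assumed
    rates = np.array([40.0, 80.0, 120.0, 160.0])    # heterogeneous nu_max,i (spikes/s); assumed
    q = np.exp(-rates * T)                          # q_i = L(0, R_i) for Poisson noise

    w = (1 - q) * q ** (q / (1 - q))                # per-neuron summand of Eq. (53)
    print(f"I_N = {np.log(1 + w.sum()):.4f} nats")  # invariant under any shuffle of the q_i

    # optimal u_i for an all-ON labeling (m = N), Eq. (51)
    u = (q ** (q / (1 - q)) + np.concatenate(([0.0], np.cumsum(w)[:-1]))) / (1 + w.sum())
    print("u_i =", u.round(4))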
Similar to Eq. (44), defining

p_1 = u_1, \quad p_{m+1} = u_{m+1}, \quad p_i = u_i - u_{i-1}, \quad i = 2, \ldots, m, m+2, \ldots, N,    (54)
we can derive the stimulus intervals partitioned by the thresholds as

p_1 = \frac{q_1^{q_1/(1-q_1)}}{1 + \sum_{j=1}^{N} (1-q_j)\, q_j^{q_j/(1-q_j)}} = e^{-I_N}\, q_1^{q_1/(1-q_1)},

p_{m+1} = \frac{q_{m+1}^{q_{m+1}/(1-q_{m+1})}}{1 + \sum_{j=1}^{N} (1-q_j)\, q_j^{q_j/(1-q_j)}} = e^{-I_N}\, q_{m+1}^{q_{m+1}/(1-q_{m+1})},

p_i = \frac{q_i^{q_i/(1-q_i)} - q_{i-1}^{1/(1-q_{i-1})}}{1 + \sum_{j=1}^{N} (1-q_j)\, q_j^{q_j/(1-q_j)}} = e^{-I_N}\left(q_i^{q_i/(1-q_i)} - q_{i-1}^{1/(1-q_{i-1})}\right), \quad i = 2, \ldots, m, m+2, \ldots, N.    (55)
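The closed forms in Eq. (55) can be cross-checked against the differences of the u_i from Eq. (51) (our own consistency check, for an all-ON population, m = N, with arbitrary q_i):

    import numpy as np

    q = np.array([0.2, 0.4, 0.6])                       # assumed q_i, all-ON population
    w = (1 - q) * q ** (q / (1 - q))
    denom = 1 + w.sum()                                 # equals exp(I_N), by Eq. (53)
    u = (q ** (q / (1 - q)) + np.concatenate(([0.0], np.cumsum(w)[:-1]))) / denom  # Eq. (51)
    p_from_u = np.concatenate(([u[0]], np.diff(u)))     # Eq. (54)
    p_closed = (q ** (q / (1 - q))
                - np.concatenate(([0.0], q[:-1] ** (1 / (1 - q[:-1]))))) / denom   # Eq. (55)
    print(np.allclose(p_from_u, p_closed))              # True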
[Figure 3 image: panels (a)-(d), cumulative stimulus probability axes from 1 to 0, annotated "Same amount of information".]

FIG. 3. Efficient population coding of binary neurons with heterogeneous maximal firing rates. (a) Optimal configurations of homogeneous populations with only ON neurons. Π_i denotes the cumulative probability of s above threshold θ_i. The optimal thresholds partition the cumulative stimulus space into intervals, which increase with the maximal firing rate of the neurons within a population. (b) Same as (a) but with maximal firing rates shuffled. (c) Same as (a) but for mixed populations of ON and OFF neurons. (d) Same as (c) but with maximal firing rates shuffled. All populations code the same amount of mutual information assuming the same distribution of the maximal firing rates [Eq. (53)].
One can show that (see the SM [31])

\frac{d}{dq}\big(q^{q/(1-q)}\big) < 0, \qquad \frac{d}{dq}\big(q^{1/(1-q)}\big) > 0.    (56)

We note that a similar solution for binary neurons was recently derived in Ref. [35]. In our solution [Eq. (55)], by the monotonicity in Eq. (56), as q_i → 0 and q_{i−1} → 0, p_i converges to its maximum e^{−I_N}. On the other hand, when q_i → 1