Supplemental Material

Efficient population coding of sensory stimuli

Shuai Shao^{1,2}, Markus Meister^{3} and Julijana Gjorgjieva^{1,4}

1. Computation in Neural Circuits Group, Max Planck Institute for Brain Research, Frankfurt, Germany
2. Donders Institute and Faculty of Science, Radboud University, Nijmegen, Netherlands
3. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
4. School of Life Sciences, Technical University of Munich, Freising, Germany

Contact: gjorgjieva@tum.de
1 The mutual information between stimulus and spikes equals the
mutual information between firing rates and spikes
In this section we prove the claim in the main text that the mutual information between the stimulus $s$ and the spike counts $\vec{n}$ equals the mutual information between the firing rates $\vec{\nu}$ and the spike counts $\vec{n}$, i.e., $I(s,\vec{n}) = I(\vec{\nu},\vec{n})$ (Eq. 5). This was also shown in previous literature [1], but only for a single neuron (i.e., $N = 1$, when $\vec{n}$ and $\vec{\nu}$ are scalars).
Since the spike counts of different neurons are conditionally independent given the stimulus, we can write

p(\vec{n} \mid s) = \prod_i p(n_i \mid s).    (S1.1)
Inserting this into the formula for the mutual information (Eq. 3), we have

I(s,\vec{n}) = \sum_{\vec{n}} \int ds\, p(s)\, p(\vec{n} \mid s) \log \frac{p(\vec{n} \mid s)}{P(\vec{n})}
             = \sum_{\vec{n}} \int ds\, p(s)\, p(\vec{n} \mid s) \log p(\vec{n} \mid s) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n})
             = \sum_{\vec{n}} \sum_i \int ds\, p(s) \log p(n_i \mid s) \prod_k p(n_k \mid s) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n})
             = \sum_i \sum_{\vec{n}} \int ds\, p(s) \log p(n_i \mid s) \prod_k p(n_k \mid s) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}).    (S1.2)
Note that $\prod_k p(n_k \mid s) = p(n_i \mid s) \prod_{k \neq i} p(n_k \mid s)$; summing over all $n_k$ with $k \neq i$, we have
I(s,\vec{n}) = \sum_i \sum_{n_i} \int ds\, p(s)\, p(n_i \mid s) \log p(n_i \mid s) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n})
             = \sum_i \sum_{n_i} \int d\nu_i\, p(\nu_i)\, p(n_i \mid \nu_i) \log p(n_i \mid \nu_i) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}).    (S1.3)

In the second step we used the fact that $n_i$ depends on the stimulus only through the firing rate $\nu_i$ (Eqs. 1 and 2), so that $p(n_i \mid s) = p(n_i \mid \nu_i)$ and the integral over $s$ can be rewritten as an integral over $\nu_i$.
Denoting by $\vec{\nu}^{(i)} = (\nu_1, \ldots, \nu_{i-1}, \nu_{i+1}, \ldots, \nu_N)$ the vector of all the firing rates except $\nu_i$, we have

\int d^{N-1}\vec{\nu}^{(i)}\, p(\vec{\nu}^{(i)} \mid \nu_i) = 1.    (S1.4)
Therefore, Eq. S1.3 becomes

I(s,\vec{n}) = \sum_i \sum_{n_i} \int d^{N-1}\vec{\nu}^{(i)}\, p(\vec{\nu}^{(i)} \mid \nu_i) \int d\nu_i\, p(\nu_i)\, p(n_i \mid \nu_i) \log p(n_i \mid \nu_i) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n})
             = \sum_i \sum_{n_i} \int d^N\vec{\nu}\, p(\vec{\nu})\, p(n_i \mid \nu_i) \log p(n_i \mid \nu_i) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n})
             = \sum_i \sum_{n_i} \int d^N\vec{\nu}\, p(\vec{\nu})\, p(n_i \mid \vec{\nu}) \log p(n_i \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n})
             = \sum_i \sum_{\vec{n}} \int d^N\vec{\nu}\, p(\vec{\nu}) \log p(n_i \mid \vec{\nu}) \prod_k p(n_k \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}).    (S1.5)
Similarly to Eq. S1.1, we also have

p(\vec{n} \mid \vec{\nu}) = \prod_i p(n_i \mid \vec{\nu}),    (S1.6)
which leads to

I(s,\vec{n}) = \sum_i \sum_{\vec{n}} \int d^N\vec{\nu}\, p(\vec{\nu})\, p(\vec{n} \mid \vec{\nu}) \log p(n_i \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n})
             = \sum_{\vec{n}} \int d^N\vec{\nu}\, p(\vec{\nu})\, p(\vec{n} \mid \vec{\nu}) \sum_i \log p(n_i \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n})
             = \sum_{\vec{n}} \int d^N\vec{\nu}\, p(\vec{\nu})\, p(\vec{n} \mid \vec{\nu}) \log p(\vec{n} \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n})
             = \sum_{\vec{n}} \int d^N\vec{\nu}\, p(\vec{\nu})\, p(\vec{n} \mid \vec{\nu}) \log \frac{p(\vec{n} \mid \vec{\nu})}{P(\vec{n})}
             = I(\vec{\nu},\vec{n}).    (S1.7)
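As an aside, the equality $I(s,\vec{n}) = I(\vec{\nu},\vec{n})$ is easy to check numerically. The following short Python sketch (not part of the original code release; the Gaussian stimulus, the binary activation function, and the values of $\nu_{\max}$, $T$ and the threshold are illustrative assumptions) computes both sides for a single neuron with Poisson spike generation and confirms that they agree up to discretization error.

import numpy as np
from scipy.stats import norm, poisson

nu_max, T, theta = 10.0, 0.1, 0.5      # maximal rate, counting window, threshold (assumed values)
n = np.arange(0, 40)                   # spike counts; the truncated tail is negligible here

# I(s, n): integrate over a standard normal stimulus on a fine grid.
s = np.linspace(-6.0, 6.0, 4001)
ds = s[1] - s[0]
ps = norm.pdf(s)
ps /= ps.sum() * ds                                    # renormalize on the finite grid
nu_of_s = np.where(s > theta, nu_max, 0.0)             # binary activation function, nu_0 = 0
p_n_s = poisson.pmf(n[:, None], nu_of_s[None, :] * T)  # p(n | s), shape (counts, stimuli)
P_n = (p_n_s * ps).sum(axis=1) * ds                    # marginal P(n)
with np.errstate(divide="ignore", invalid="ignore"):
    dens_s = np.nansum(p_n_s * np.log(p_n_s / P_n[:, None]), axis=0)  # i_s(s)
I_sn = (dens_s * ps).sum() * ds

# I(nu, n): the firing rate takes only the values 0 and nu_max,
# with probabilities 1 - u and u, where u = Prob(s > theta).
u = 1.0 - norm.cdf(theta)
p_nu = np.array([1.0 - u, u])
p_n_nu = poisson.pmf(n[:, None], np.array([0.0, nu_max * T])[None, :])
P_n2 = p_n_nu @ p_nu
with np.errstate(divide="ignore", invalid="ignore"):
    dens_nu = np.nansum(p_n_nu * np.log(p_n_nu / P_n2[:, None]), axis=0)  # i(nu)
I_nun = dens_nu @ p_nu

print(I_sn, I_nun)   # the two mutual informations agree up to discretization error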
2 Density of the mutual information for a single neuron is constant
when optimized
This section provides the first step in proving that the optimal activation function of a single neuron is discrete. We limit our discussion to a single neuron and, without loss of generality, consider an ON neuron, i.e., one whose firing rate increases with stimulus intensity; the case of an OFF neuron is entirely symmetric. As in the main text, the maximal firing rate of this neuron is constrained to $\nu_{\max}$ and the spontaneous firing rate is denoted by $\nu_0$.
2.1 A neuron with a discrete activation function
First, we consider a neuron with a discrete activation function. In this case, the firing rate can only take a set of discrete values between $\nu_0$ and $\nu_{\max}$. Therefore, we denote the probability that the firing rate equals $\nu$ by $p_\nu$, instead of the density $p(\nu)$ that we write elsewhere. The mutual information is then
I(s,n) = I(\nu,n) = \sum_n \sum_\nu p_\nu\, p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)}.    (S2.1)
We can define the entropy of the spike count at a given firing rate as

h(\nu) = -\sum_{n=0}^{+\infty} p(n \mid \nu) \log p(n \mid \nu)    (S2.2)
and note that

P(n) = \sum_\nu p_\nu\, p(n \mid \nu),    (S2.3)
so that we have

I(\nu,n) = -\sum_\nu p_\nu h(\nu) - \sum_\nu p_\nu \sum_n p(n \mid \nu) \log P(n).    (S2.4)
Since the $p_\nu$ are probabilities, we have the constraint $\sum_\nu p_\nu = 1$; hence, to optimize the objective function we include a Lagrange multiplier,

\tilde{I} = I(\nu,n) + \lambda \left( \sum_\nu p_\nu - 1 \right).    (S2.5)
Assuming optimality,

\frac{\partial \tilde{I}}{\partial p_\nu} = -h(\nu) - \sum_n p(n \mid \nu) \log P(n) - \sum_n p(n \mid \nu) + \lambda = 0.    (S2.6)
Absorbing $-\sum_n p(n \mid \nu) = -1$ into $\lambda$, i.e., $\lambda \to \lambda - 1$, we have

-h(\nu) - \sum_n p(n \mid \nu) \log P(n) + \lambda = 0.    (S2.7)
Multiplying both sides by $p_\nu$ and summing over $\nu$, we have

I(s,n) + \lambda = 0.    (S2.8)
We define

i(\nu) = \sum_n p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)}.    (S2.9)
Multiplying this equation by $p_\nu$ and summing over $\nu$, we have

I = \sum_\nu p_\nu\, i(\nu).    (S2.10)

Therefore, we call $i(\nu)$ "the density of mutual information", which is also defined by Eq. 6 in the main text.
According to Eq. S2.7, we can write

I(\nu,n) = -\lambda = -\sum_n p(n \mid \nu) \log P(n) - h(\nu) = \sum_n p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)} = i(\nu).    (S2.11)
This means that when the mutual information is optimized, $i(\nu)$ is a constant for all possible $\nu$. The convexity of the mutual information ensures that the optimal solution is unique. As a special case, if the spontaneous rate $\nu_0 = 0$, then according to Eq. 1 and Eq. 2 we have $p(n = 0 \mid \nu = 0) = 1$ and $p(n \neq 0 \mid \nu = 0) = 0$. As a result,

I_{\max} = i(\nu = 0) = -\log P(0).    (S2.12)
Also, Eq. S2.11 means that the mutual information $I(\nu,n)$ is distributed proportionally to the probabilities $p_\nu$ when it is maximized. In addition, one can also define

i_s(s) = \sum_n p(n \mid s) \log \frac{p(n \mid s)}{P(n)},    (S2.13)
then we have

I = \int ds\, p(s)\, i_s(s)    (S2.14)

and

i(\nu) = i_s(s).    (S2.15)
Therefore, the maximal mutual information $I(s,n)$ is distributed proportionally to the probability density of the stimulus $s$, denoted by $p(s)$ in the main text and Fig. 1B. The density function $i_s(s)$ is also a constant over the space of stimuli $s$. For example, if we have a ternary activation function with three possible firing rates 0, $\nu_{\max}/2$, and $\nu_{\max}$, and the stimulus $s$ follows a standard normal distribution, the input space in terms of $\nu$ is $\{0, \nu_{\max}/2, \nu_{\max}\}$, so we have $i(\nu = 0) = i(\nu = \nu_{\max}/2) = i(\nu = \nu_{\max})$. Similarly, the input space of $s$ is then the set of all real numbers $\mathbb{R}$, and we have $i_s(s) = \mathrm{const}$ for all $s \in \mathbb{R}$.
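The constancy of $i(\nu)$ at the optimum (Eqs. S2.11 and S2.12) can likewise be illustrated numerically. The sketch below (an illustration only; the Poisson noise model and the value of $R = \nu_{\max} T$ are assumptions made here for concreteness) maximizes the mutual information of a binary neuron with rates $\{0, \nu_{\max}\}$ over $p_\nu = (1-u, u)$ and checks that $i(0) \approx i(\nu_{\max}) \approx I_{\max} = -\log P(0)$.

import numpy as np
from scipy.stats import poisson

R = 1.0                                   # nu_max * T, an assumed value
n = np.arange(0, 40)
L = np.stack([poisson.pmf(n, 0.0), poisson.pmf(n, R)], axis=1)   # p(n | nu) for nu = 0, nu_max

def info_and_density(u):
    p_nu = np.array([1.0 - u, u])
    P_n = L @ p_nu                                                # marginal spike-count distribution
    with np.errstate(divide="ignore", invalid="ignore"):
        dens = np.nansum(L * np.log(L / P_n[:, None]), axis=0)    # i(nu) at nu = 0 and nu = nu_max
    return dens @ p_nu, dens, P_n[0]

# crude grid search for the optimal u (any 1-d optimizer, e.g. SLSQP, works equally well)
us = np.linspace(1e-4, 1.0 - 1e-4, 20001)
u_star = us[np.argmax([info_and_density(u)[0] for u in us])]
I_star, dens_star, P0 = info_and_density(u_star)
print(u_star, I_star, dens_star, -np.log(P0))   # i(0) ~ i(nu_max) ~ I_max = -log P(0)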
2.2 A neuron with a continuous activation function
We assume that the neuron has a continuous and smooth (analytic) activation function, with the lowest rate (i.e., the spontaneous firing rate) $\nu_0$ and the maximal firing rate $\nu_{\max}$. Then, the mutual information can be written as
I = \sum_{n=0}^{+\infty} \int ds\, p(s)\, p(n \mid s) \log \frac{p(n \mid s)}{P(n)} = \sum_{n=0}^{+\infty} \int_{\nu_0}^{\nu_{\max}} d\nu\, p(\nu)\, p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)}.    (S2.16)
Define $\tilde{I} = I + \lambda \left( \int_{\nu_0}^{\nu_{\max}} p(\nu)\, d\nu - 1 \right)$; then
\tilde{I} = -\sum_{n=0}^{+\infty} P(n) \log P(n) + \sum_{n=0}^{+\infty} \int_{\nu_0}^{\nu_{\max}} d\nu\, p(n \mid \nu)\, p(\nu) \log p(n \mid \nu) + \lambda \left( \int_{\nu_0}^{\nu_{\max}} d\nu\, p(\nu) - 1 \right).    (S2.17)
When optimized,

\delta\tilde{I} = -\sum_{n=0}^{+\infty} (\log P(n) + 1)\, \delta P(n) - \int_{\nu_0}^{\nu_{\max}} d\nu\, h(\nu)\, \delta p(\nu) + \lambda \int_{\nu_0}^{\nu_{\max}} d\nu\, \delta p(\nu) = 0.    (S2.18)
Because

\delta P(n) = \delta \left( \int_{\nu_0}^{\nu_{\max}} d\nu\, p(\nu)\, p(n \mid \nu) \right) = \int_{\nu_0}^{\nu_{\max}} d\nu\, p(n \mid \nu)\, \delta p(\nu),    (S2.19)
we have

\delta\tilde{I} = -\int_{\nu_0}^{\nu_{\max}} d\nu \sum_{n=0}^{+\infty} p(n \mid \nu)(\log P(n) + 1)\, \delta p(\nu) - \int_{\nu_0}^{\nu_{\max}} d\nu\, h(\nu)\, \delta p(\nu) + \lambda \int_{\nu_0}^{\nu_{\max}} d\nu\, \delta p(\nu) = 0,    (S2.20)
which leads to

-\sum_{n=0}^{+\infty} p(n \mid \nu)(\log P(n) + 1) - h(\nu) + \lambda = 0.    (S2.21)
Absorbing $-\sum_{n=0}^{+\infty} p(n \mid \nu) = -1$ into $\lambda$, multiplying by $p(\nu)$, and integrating over $\nu$, we have $I + \lambda = 0$. As a result,

I = -\lambda = -\sum_n p(n \mid \nu) \log P(n) - h(\nu) = \sum_n p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)} = i(\nu), \quad \text{for } \nu \in [\nu_0, \nu_{\max}],    (S2.22)
which means that the density of mutual information $i(\nu)$ is a constant for all firing rates $\nu$. One can still define the density function over stimuli $s$ as in Eq. S2.13, and $i_s(s)$ is still a constant when the mutual information is optimized.
In summary, we have shown that the density of mutual information $i(\nu)$ is a constant for all possible firing rates, independent of whether the activation function is discrete or continuous. We note that this result has also been proven in previous work using a different approach based on the convexity of mutual information [2, 3].
3 The optimal activation functions of a population of neurons are
discrete
To prove that the optimal activation functions are discrete, we first need to show that when the mutual information of a population of $N$ neurons is maximized, the density of mutual information $\tilde{i}(\nu_1)$ defined in the main text is a constant and equals the maximal mutual information $I_{\max}^N$ (Eq. 35). Consistent with the main text, from now on we denote $p(n_i \mid \nu_i)$ by $L(n_i, \nu_i T)$.
According to the definition in the main text (Eq. 34), we have

\tilde{i}(\nu_1) = \sum_{n_1} p(n_1 \mid \nu_1) \log \frac{p(n_1 \mid \nu_1)}{P(n_1)} + p(n_1 = 0 \mid \nu_1)\, I_{\max}^{N-1}
               = \sum_{n_1} L(n_1, \nu_1 T) \log \frac{L(n_1, \nu_1 T)}{P(n_1)} + L(0, \nu_1 T)\, I_{\max}^{N-1}    (S3.1)
and we can see that when neurons $2, \ldots, N$ are all optimized,

\int d\nu_1\, \tilde{i}(\nu_1)\, p(\nu_1) = I_N = I(F_1) + P_1(0)\, I_{\max}^{N-1}.    (S3.2)
Similarly to the previous section, we define $\tilde{I}_N = I_N + \lambda \left( \int_0^{\nu_{\max}} p(\nu_1)\, d\nu_1 - 1 \right)$ and write
\tilde{I}_N = -\sum_{n_1=0}^{+\infty} P(n_1) \log P(n_1) + \sum_{n_1=0}^{+\infty} \int_0^{\nu_{\max}} d\nu_1\, L(n_1, \nu_1 T)\, p(\nu_1) \log L(n_1, \nu_1 T) + I_{\max}^{N-1} \int_0^{\nu_{\max}} d\nu_1\, p(\nu_1)\, L(0, \nu_1 T) + \lambda \left( \int_0^{\nu_{\max}} d\nu_1\, p(\nu_1) - 1 \right).    (S3.3)
When optimized,

\delta\tilde{I}_N = -\sum_{n_1=0}^{+\infty} (\log P(n_1) + 1)\, \delta P(n_1) + \int_0^{\nu_{\max}} d\nu_1 \left[ I_{\max}^{N-1} L(0, \nu_1 T) - h(\nu_1) \right] \delta p(\nu_1) + \lambda \int_0^{\nu_{\max}} d\nu_1\, \delta p(\nu_1) = 0.    (S3.4)
Because

\delta P(n_1) = \delta \left( \int_0^{\nu_{\max}} d\nu_1\, p(\nu_1)\, L(n_1, \nu_1 T) \right) = \int_0^{\nu_{\max}} d\nu_1\, L(n_1, \nu_1 T)\, \delta p(\nu_1),    (S3.5)
we have

\delta\tilde{I}_N = -\int_0^{\nu_{\max}} d\nu_1 \sum_{n_1=0}^{+\infty} L(n_1, \nu_1 T)(\log P(n_1) + 1)\, \delta p(\nu_1) + \int_0^{\nu_{\max}} d\nu_1 \left[ I_{\max}^{N-1} L(0, \nu_1 T) - h(\nu_1) \right] \delta p(\nu_1) + \lambda \int_0^{\nu_{\max}} d\nu_1\, \delta p(\nu_1) = 0,    (S3.6)
which leads to

-\sum_{n_1=0}^{+\infty} L(n_1, \nu_1 T)(\log P(n_1) + 1) + I_{\max}^{N-1} L(0, \nu_1 T) - h(\nu_1) + \lambda = 0.    (S3.7)
Absorbing $-\sum_{n_1=0}^{+\infty} L(n_1, \nu_1 T) = -1$ into $\lambda$, multiplying by $p(\nu_1)$, and integrating over $\nu_1$, we have $I_N + \lambda = 0$. As a result,

I_{\max}^N = -\lambda = -\sum_{n_1} L(n_1, \nu_1 T) \log P(n_1) + I_{\max}^{N-1} L(0, \nu_1 T) - h(\nu_1)
           = \sum_{n_1} L(n_1, \nu_1 T) \log \frac{L(n_1, \nu_1 T)}{P(n_1)} + I_{\max}^{N-1} L(0, \nu_1 T) = \tilde{i}(\nu_1), \quad \text{for } \nu_1 \in [0, \nu_{\max}].    (S3.8)
Second, we need to show that the above result (Eq. S3.8) leads to a contradiction if the optimal activation function $F_1$ is continuous. From the discussion in the main text, this is equivalent to finding a paradox in

\tilde{i}(\nu_1) = \sum_{n_1=0}^{+\infty} L(n_1, \nu_1 T) \log \frac{L(n_1, \nu_1 T)}{P(n_1)} + I_{\max}^{N-1} L(0, \nu_1 T) = I_{\max}^N = \mathrm{const}.    (S3.9)
Following the same procedure as in the main text, we can show that if we write the Maclaurin series

L(n_1, \nu_1 T) = \sum_{k=1}^{+\infty} a_{n_1,k} (\nu_1 T)^k    (S3.10)

for any $n_1 \geq 1$, then the sum of the coefficients of the $\log(\nu T)$ terms in the $m$-th derivative of $\tilde{i}(\nu_1)$ is $\sum_{n_1=1}^{+\infty} a_{n_1,m}\, j(n_1)$, where $j(n_1)$ is the minimal index $k$ for which $a_{n_1,k} > 0$. This follows the same formalism as Eq. 22 in the main text, because the additional term here, $I_{\max}^{N-1} L(0, \nu_1 T)$, does not contribute $\log(\nu T)$ terms when it is written as a Maclaurin series.
If Eq. S3.9 were correct, all the derivatives of $\tilde{i}(\nu_1)$ would be 0, and we would have

\sum_{n_1=1}^{+\infty} a_{n_1,m}\, j(n_1) = 0 \quad \text{for any } m \geq 1,    (S3.11)

because $\log(\nu_1 T)$ diverges as $\nu_1 T \to 0$. Again, as in the main text, one could show that in this case

L(n_1 = 0, \nu_1 T) = 1 \quad \text{for any } \nu_1,    (S3.12)
which cannot be true. Hence the assumption that the optimal $F_1$ is continuous leads to a contradiction: in a population of $N$ neurons, given that neurons $2, \ldots, N$ are all optimized, the optimal activation function of neuron 1 must be discrete.
4 The number of steps in the optimal activation functions increases as a function of the maximal firing rate constraint
Here, we perform extensive numerical calculations on neuronal populations with up to four neurons and any ON-OFF mixture to demonstrate that the number of steps in the optimal activation functions increases as the maximal firing rate constraint $\nu_{\max}$ increases. We calculated the optimal thresholds numerically using three different noise generation functions $L(n, \nu T)$ (Fig. S1), listed below (a short code sketch of these distributions follows the list):
(1) Poisson distribution

L(n, \nu T) = \frac{(\nu T)^n}{n!} \exp(-\nu T),    (S4.1)

(2) Binomial distribution

L(n, \nu T) = \binom{N}{n} p^n (1 - p)^{N-n},    (S4.2)

where $N = 30$ and $p = \nu T / N$, and

(3) Geometric distribution

L(n, \nu T) = p^n (1 - p),    (S4.3)

where $p = \nu T / (1 + \nu T)$.
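Written out explicitly, the three noise generation functions above correspond to the following short Python sketch (assuming SciPy is available; the binomial trial number $N = 30$ is the one used in Eq. S4.2):

import numpy as np
from scipy.stats import poisson, binom

def L_poisson(n, rT):
    # Eq. S4.1
    return poisson.pmf(n, rT)

def L_binomial(n, rT, N=30):
    # Eq. S4.2 with p = nu*T / N
    return binom.pmf(n, N, rT / N)

def L_geometric(n, rT):
    # Eq. S4.3 with p = nu*T / (1 + nu*T); support n = 0, 1, 2, ...
    p = rT / (1.0 + rT)
    return p**n * (1.0 - p)

n = np.arange(0, 5)
print(L_poisson(n, 2.0), L_binomial(n, 2.0), L_geometric(n, 2.0))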
In detail, we calculated the mutual information as a function of the firing thresholds and intermediate firing rates, based on Eq. 5. These parameters were initialized randomly and then optimized using the SLSQP method [4]. All code was written in Python 2.7; a sample is publicly available at https://zenodo.org/record/8083056.
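A minimal sketch of this optimization is given below. It is not the released code: for brevity it treats a single ON neuron with Poisson noise and $\nu_0 = 0$, and, since by Eq. 5 only the distribution over firing rates matters, it parameterizes the activation function by the probability mass assigned to each of $K$ discrete rate levels rather than by stimulus thresholds; the values of $K$, $\nu_{\max}$ and $T$ are illustrative.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

T, nu_max, K = 1.0, 10.0, 3              # counting window, rate constraint, number of steps (assumed)
n = np.arange(0, 200)                    # spike counts (truncated)

def mutual_info(rates, probs):
    # I = sum_nu p_nu sum_n L(n | nu) log [ L(n | nu) / P(n) ], cf. Eq. S2.1
    L = poisson.pmf(n[:, None], np.asarray(rates)[None, :] * T)
    P_n = L @ probs
    with np.errstate(divide="ignore", invalid="ignore"):
        dens = np.nansum(L * np.log(L / P_n[:, None]), axis=0)
    return dens @ probs

def negative_info(x):
    # x = [K-2 intermediate rates] + [K occupancy probabilities];
    # the lowest level is fixed at nu_0 = 0 and the highest at nu_max.
    rates = np.concatenate(([0.0], np.sort(x[:K - 2]), [nu_max]))
    return -mutual_info(rates, x[K - 2:])

rng = np.random.default_rng(0)
x0 = np.concatenate([rng.uniform(0.0, nu_max, K - 2), np.full(K, 1.0 / K)])
res = minimize(
    negative_info, x0, method="SLSQP",
    bounds=[(0.0, nu_max)] * (K - 2) + [(1e-9, 1.0)] * K,
    constraints=[{"type": "eq", "fun": lambda x: np.sum(x[K - 2:]) - 1.0}],
)
print(-res.fun)                          # maximal mutual information
print(res.x[:K - 2], res.x[K - 2:])      # optimal intermediate rate(s) and level occupancies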
We find that the number of thresholds increases as the maximal firing rate constraint $\nu_{\max}$ increases. Moreover, for all neurons in the same population, the threshold splitting occurs at the same firing rate, meaning that every neuron in the population has an optimal discrete activation function with the same number of steps. Hence, the optimal neuronal population consists exclusively of binary neurons, or exclusively of ternary neurons, or exclusively of quaternary neurons, and so on; it is never a mixture of neurons with different numbers of steps, e.g., binary and ternary.
[Figure S1 about here.]

Figure S1. Optimal thresholds in different neuronal populations (1 ON + 1 OFF, 2 ON, 2 ON + 1 OFF, 3 ON, 2 ON + 2 OFF, and 3 ON + 1 OFF) with different noise generation functions (Poisson, Binomial, and Geometric distributions).
5 Population coding of binary neurons with any noise generation
function
From now on, we denote the probability function of spike generation as $L(n_i, r_i)$, where $n_i$ is the spike count of neuron $i$ and $r_i$ is the expected value of $n_i$. When the firing rate of neuron $i$ is $\nu_i$ and the time window is $T$, we have $r_i = \nu_i T$. Consistent with the main text, we assume the neurons do not have a spontaneous firing rate, i.e., $\nu_0 = 0$. $R = \nu_{\max} T$ is the maximal value of any $r_i$.
For a binary neuron, we define the probability mass of the stimulus interval carved out by its threshold as $u_i = \mathrm{Prob}(\nu_i = \nu_{\max})$, which is the same as Eq. 38 in the main text (Fig. 2B). The mutual information between stimuli and spikes can be formulated as
I_1 = g(u_1) = \sum_n \int ds\, p(s)\, p(n \mid s) \log \frac{p(n \mid s)}{P(n)}
    = -(1 - u_1) \log P(0) + u_1 L(0,R) \log \frac{L(0,R)}{P(0)} + u_1 \sum_{n=1}^{+\infty} L(n,R) \log \frac{L(n,R)}{u_1 L(n,R)}
    = -P(0) \log P(0) + u_1 L(0,R) \log L(0,R) - u_1 \log u_1 \sum_{n=1}^{+\infty} L(n,R).    (S5.1)
Defining $q = L(0,R) = 1 - \sum_{n=1}^{+\infty} L(n,R)$, we have

P(0) = 1 - u_1 + u_1 q    (S5.2)
I_1 = -P(0) \log P(0) + u_1 q \log q - u_1 (1 - q) \log u_1.    (S5.3)
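As a quick consistency check (a sketch under the assumption of Poisson noise, with illustrative values of $R$ and $u_1$), the closed form of Eq. S5.3 can be compared against the direct sum over all spike counts in Eq. S5.1 and against the information carried by the binarized response $\mathbb{1}[n > 0]$ alone; all three agree, which is the point made in the next paragraph.

import numpy as np
from scipy.stats import poisson

R, u1 = 2.0, 0.4                         # assumed values for illustration
q = poisson.pmf(0, R)                    # q = L(0, R) = exp(-R) for Poisson noise
n = np.arange(0, 60)

# direct sum over all spike counts (Eq. S5.1)
L = np.stack([poisson.pmf(n, 0.0), poisson.pmf(n, R)], axis=1)
p_nu = np.array([1.0 - u1, u1])
P_n = L @ p_nu
with np.errstate(divide="ignore", invalid="ignore"):
    I_direct = np.nansum(L * np.log(L / P_n[:, None]), axis=0) @ p_nu

# closed form (Eq. S5.3)
P0 = 1.0 - u1 + u1 * q
I_closed = -P0 * np.log(P0) + u1 * q * np.log(q) - u1 * (1.0 - q) * np.log(u1)

# information carried by the binarized response (silent vs. active)
Lb = np.array([[1.0, q], [0.0, 1.0 - q]])     # rows: n = 0 and n > 0; columns: nu = 0 and nu_max
Pb = Lb @ p_nu
with np.errstate(divide="ignore", invalid="ignore"):
    I_binary = np.nansum(Lb * np.log(Lb / Pb[:, None]), axis=0) @ p_nu

print(I_direct, I_closed, I_binary)           # all three agree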
Here, all the nonzero spike counts have been merged, as in previous work with Poisson spike statistics [5]. This is equivalent to having only a firing state $n \neq 0$ and a non-firing state $n = 0$. The only difference from Poisson spike statistics is the exact form of the function $L$ and the value of $q$. This similarity allows us to use some of the results derived in previous literature [5]. For example, given a population of $N$ neurons, its mutual information can be written as

I_N = g(u_1) + \big(1 - u_1(1 - q)\big) \Big[ g(u_2^{(1)}) + \ldots + \big(1 - u_{N-1}^{(N-2)}(1 - q)\big)\, g(u_N^{(N-1)}) \Big]    (S5.4)
where $u_i^{(j)}$ denotes the revised probability $u_i$ after knowing that none of the neurons $1, \ldots, j$ ($j < i$) spiked, and $g(u) = -(1 - u + uq)\log(1 - u + uq) + uq \log q - u(1 - q) \log u$. The indices of the neurons follow Eq. 37 in the main text (Fig. 2B).
One can show that $u_i^{(j)}$ obeys the following rule [5]:

u_i^{(j)} = \frac{u_i^{(j-1)} - \big(1 - P_j^{(j-1)}(0)\big)}{P_j^{(j-1)}(0)} = \frac{u_i^{(j-1)} - u_j^{(j-1)}(1 - q)}{1 - u_j^{(j-1)}(1 - q)}    (S5.5)
if neurons $i$ and $j$ are both ON, or both OFF, and

u_i^{(j)} = \frac{u_i^{(j-1)}}{P_j^{(j-1)}(0)} = \frac{u_i^{(j-1)}}{1 - u_j^{(j-1)}(1 - q)}    (S5.6)
if neuron $i$ is OFF but neuron $j$ is ON. Here $P_j^{(k)}(0)$ is the probability that neuron $j$ does not fire given that none of the neurons $1, \ldots, k$ ($k < j$) spiked; we also have

P_j^{(k)}(0) = 1 - u_j^{(k)}(1 - q).    (S5.7)
Taking derivatives of $I_N$ with respect to $u_N^{(N-1)}$, $u_{N-1}^{(N-2)}$, ..., and $u_1$ yields [5]

u_i^{(i-1)} = \frac{1}{(N - i + 1)(1 - q) + q^{-q/(1-q)}}.    (S5.8)
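To close this section, the following sketch evaluates Eqs. S5.4-S5.8 for a homogeneous population of $N$ binary ON neurons with Poisson noise (so that $q = e^{-R}$; the values of $R$ and $N$ are illustrative assumptions, and the dots in Eq. S5.4 are read as the fully nested conditional decomposition). For $N = 1$, the optimal $u_1$ from Eq. S5.8 is also checked against a direct grid search.

import numpy as np

def g(u, q):
    # single-neuron information of Eq. S5.3, with P(0) = 1 - u + u q
    P0 = 1.0 - u + u * q
    return -P0 * np.log(P0) + u * q * np.log(q) - u * (1.0 - q) * np.log(u)

def optimal_population_info(N, q):
    # conditional occupancies u_i^{(i-1)} from Eq. S5.8
    u_cond = [1.0 / ((N - i + 1) * (1.0 - q) + q ** (-q / (1.0 - q))) for i in range(1, N + 1)]
    # evaluate Eq. S5.4, reading the dots as the fully nested conditional decomposition
    I = 0.0
    for u in reversed(u_cond):
        I = g(u, q) + (1.0 - u * (1.0 - q)) * I
    return u_cond, I

R = 2.0
q = np.exp(-R)
for N in (1, 2, 4):
    u_cond, I_N = optimal_population_info(N, q)
    print(N, np.round(u_cond, 4), round(I_N, 4))

# consistency check for N = 1: a grid search over u reproduces u_1 from Eq. S5.8
us = np.linspace(1e-4, 1.0 - 1e-4, 200001)
print(us[np.argmax(g(us, q))], optimal_population_info(1, q)[0][0])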