Supplemental Material
Efficient population coding of sensory stimuli
Shuai Shao$^{1,2}$, Markus Meister$^{3}$ and Julijana Gjorgjieva$^{1,4}$
1. Computation in Neural Circuits Group, Max Planck Institute for Brain Research, Frankfurt, Germany
2. Donders Institute and Faculty of Science, Radboud University, Nijmegen, Netherlands
3. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
4. School of Life Sciences, Technical University of Munich, Freising, Germany
Contact: gjorgjieva@tum.de
1 The mutual information between stimulus and spikes equals the mutual information between firing rates and spikes
In this section we prove the argument in the main text that the mutual information between the stimuli $s$ and the spike counts $\vec{n}$ equals the mutual information between the firing rates $\vec{\nu}$ and the spike counts $\vec{n}$, i.e., $I(s,\vec{n}) = I(\vec{\nu},\vec{n})$ (Eq. 5). This was also shown in previous literature [1], but limited to a single neuron (i.e., $N = 1$, when $\vec{n}$ and $\vec{\nu}$ are scalars).
Since the spike counts of different neurons are independent of each other given the stimulus, we can write
$$
p(\vec{n} \mid s) = \prod_i p(n_i \mid s). \tag{S1.1}
$$
Inserting this into the formula for the mutual information (Eq. 3), we have
$$
\begin{aligned}
I(s,\vec{n}) &= \sum_{\vec{n}} \int \mathrm{d}s\, p(s)\, p(\vec{n} \mid s) \log \frac{p(\vec{n} \mid s)}{P(\vec{n})} \\
&= \sum_{\vec{n}} \int \mathrm{d}s\, p(s)\, p(\vec{n} \mid s) \log p(\vec{n} \mid s) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}) \\
&= \sum_{\vec{n}} \sum_i \int \mathrm{d}s\, p(s) \log p(n_i \mid s) \prod_k p(n_k \mid s) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}) \\
&= \sum_i \sum_{\vec{n}} \int \mathrm{d}s\, p(s) \log p(n_i \mid s) \prod_k p(n_k \mid s) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}).
\end{aligned}
\tag{S1.2}
$$
Note that $\prod_k p(n_k \mid s) = p(n_i \mid s) \prod_{k \neq i} p(n_k \mid s)$. Summing over all $n_k$ with $k \neq i$, we have
$$
\begin{aligned}
I(s,\vec{n}) &= \sum_i \sum_{n_i} \int \mathrm{d}s\, p(s)\, p(n_i \mid s) \log p(n_i \mid s) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}) \\
&= \sum_i \sum_{n_i} \int \mathrm{d}\nu_i\, p(\nu_i)\, p(n_i \mid \nu_i) \log p(n_i \mid \nu_i) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}).
\end{aligned}
\tag{S1.3}
$$
Denoting by $\vec{\nu}^{(i)} = (\nu_1, \ldots, \nu_{i-1}, \nu_{i+1}, \ldots, \nu_N)$ the vector of all the firing rates except for $\nu_i$, we have
$$
\int \mathrm{d}^{N-1}\vec{\nu}^{(i)}\; p(\vec{\nu}^{(i)} \mid \nu_i) = 1. \tag{S1.4}
$$
Therefore, Eq. S1.3 becomes
$$
\begin{aligned}
I(s,\vec{n}) &= \sum_i \sum_{n_i} \int \mathrm{d}^{N-1}\vec{\nu}^{(i)}\; p(\vec{\nu}^{(i)} \mid \nu_i) \int \mathrm{d}\nu_i\, p(\nu_i)\, p(n_i \mid \nu_i) \log p(n_i \mid \nu_i) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}) \\
&= \sum_i \sum_{n_i} \int \mathrm{d}^{N}\vec{\nu}\; p(\vec{\nu})\, p(n_i \mid \nu_i) \log p(n_i \mid \nu_i) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}) \\
&= \sum_i \sum_{n_i} \int \mathrm{d}^{N}\vec{\nu}\; p(\vec{\nu})\, p(n_i \mid \vec{\nu}) \log p(n_i \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}) \\
&= \sum_i \sum_{\vec{n}} \int \mathrm{d}^{N}\vec{\nu}\; p(\vec{\nu}) \log p(n_i \mid \vec{\nu}) \prod_k p(n_k \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}).
\end{aligned}
\tag{S1.5}
$$
Similar to Eq. S1.1, we also have
$$
p(\vec{n} \mid \vec{\nu}) = \prod_i p(n_i \mid \vec{\nu}), \tag{S1.6}
$$
which leads to
$$
\begin{aligned}
I(s,\vec{n}) &= \sum_i \sum_{\vec{n}} \int \mathrm{d}^{N}\vec{\nu}\; p(\vec{\nu})\, p(\vec{n} \mid \vec{\nu}) \log p(n_i \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}) \\
&= \sum_{\vec{n}} \int \mathrm{d}^{N}\vec{\nu}\; p(\vec{\nu})\, p(\vec{n} \mid \vec{\nu}) \sum_i \log p(n_i \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}) \\
&= \sum_{\vec{n}} \int \mathrm{d}^{N}\vec{\nu}\; p(\vec{\nu})\, p(\vec{n} \mid \vec{\nu}) \log p(\vec{n} \mid \vec{\nu}) - \sum_{\vec{n}} P(\vec{n}) \log P(\vec{n}) \\
&= \sum_{\vec{n}} \int \mathrm{d}^{N}\vec{\nu}\; p(\vec{\nu})\, p(\vec{n} \mid \vec{\nu}) \log \frac{p(\vec{n} \mid \vec{\nu})}{P(\vec{n})} = I(\vec{\nu},\vec{n}).
\end{aligned}
\tag{S1.7}
$$
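For readers who wish to check Eq. S1.7 numerically, the following Python sketch compares $I(s,\vec{n})$ and $I(\vec{\nu},\vec{n})$ for a single hypothetical neuron with Poisson spike counts and a saturating activation function; the stimulus values, firing rates, and coding window are illustrative assumptions, not values taken from the main text.

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical example: five stimulus values s with probabilities p_s, mapped through a
# saturating activation function to firing rates nu(s); spike counts are Poisson with mean nu*T.
# All numbers below are illustrative assumptions.
p_s = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
nu_of_s = np.array([0.0, 1.0, 3.0, 5.0, 5.0])   # the last two stimuli share a rate (saturation)
T = 1.0                                          # coding window, so the mean count is nu * T
counts = np.arange(61)                           # truncate the count distribution at n = 60

# p(n|s) depends on s only through nu(s)
p_n_given_s = poisson.pmf(counts[None, :], nu_of_s[:, None] * T)

def mutual_information(p_x, p_n_given_x):
    """I(X; n) in nats for a discrete variable X with conditional count distributions."""
    p_n = p_x @ p_n_given_x
    ratio = np.where(p_n_given_x > 0, p_n_given_x / p_n[None, :], 1.0)
    return np.sum(p_x[:, None] * p_n_given_x * np.log(ratio))

# I(s, n): use the stimulus distribution directly
I_sn = mutual_information(p_s, p_n_given_s)

# I(nu, n): group stimuli that share the same firing rate
nu_levels, inverse = np.unique(nu_of_s, return_inverse=True)
p_nu = np.zeros(len(nu_levels))
np.add.at(p_nu, inverse, p_s)
p_n_given_nu = poisson.pmf(counts[None, :], nu_levels[:, None] * T)
I_nun = mutual_information(p_nu, p_n_given_nu)

print(f"I(s, n)  = {I_sn:.6f} nats")
print(f"I(nu, n) = {I_nun:.6f} nats")   # equal, as in Eq. S1.7
```

Because $p(n \mid s)$ depends on $s$ only through $\nu(s)$, the two quantities agree even though the activation function is not invertible.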
2 Density of the mutual information for a single neuron is constant when optimized
This section serves as a first step in proving that the optimal activation function for a single neuron is discrete. We limit our discussion to a single neuron. Without loss of generality, we only consider an ON neuron, with an activation function where the firing rate increases with stimulus intensity; the case of an OFF neuron is entirely symmetric. As in the main text, the maximal firing rate of this neuron is constrained to $\nu_{\max}$ and the spontaneous firing rate is denoted by $\nu_0$.
2.1 A neuron with a discrete activation function
First, we consider a neuron with a discrete activation function. In this case, the firing rate can only take a discrete set of values between $\nu_0$ and $\nu_{\max}$. Therefore, we denote the probability that the firing rate is $\nu$ by $p_\nu$, instead of the $p(\nu)$ that we commonly write. The mutual information is then
$$
I(s,n) = I(\nu,n) = \sum_n \sum_\nu p_\nu\, p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)}. \tag{S2.1}
$$
We can define the entropy of the spike count at a given firing rate as
$$
h(\nu) = -\sum_{n=0}^{+\infty} p(n \mid \nu) \log p(n \mid \nu) \tag{S2.2}
$$
and note that
$$
P(n) = \sum_\nu p_\nu\, p(n \mid \nu), \tag{S2.3}
$$
so that we have
$$
I(\nu,n) = -\sum_\nu p_\nu\, h(\nu) - \sum_\nu p_\nu \sum_n p(n \mid \nu) \log P(n). \tag{S2.4}
$$
Since the $p_\nu$ are probabilities, we have the constraint $\sum_\nu p_\nu = 1$; hence, to optimize the objective function we include a Lagrange multiplier,
$$
\widetilde{I} = I(\nu,n) + \lambda \Big( \sum_\nu p_\nu - 1 \Big). \tag{S2.5}
$$
Assuming optimality,
$$
\partial_{p_\nu} \widetilde{I} = -h(\nu) - \sum_n p(n \mid \nu) \log P(n) - \sum_n p(n \mid \nu) + \lambda = 0. \tag{S2.6}
$$
Absorbing $-\sum_n p(n \mid \nu) = -1$ into $\lambda$, i.e., $\lambda \to \lambda - 1$, we have
$$
-h(\nu) - \sum_n p(n \mid \nu) \log P(n) + \lambda = 0. \tag{S2.7}
$$
Multiplying both sides by $p_\nu$ and summing over $\nu$, we have
$$
I(s,n) + \lambda = 0. \tag{S2.8}
$$
We define
$$
i(\nu) = \sum_n p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)}. \tag{S2.9}
$$
Multiplying this equation by $p_\nu$ and summing over $\nu$, we have
$$
I = \sum_\nu p_\nu\, i(\nu). \tag{S2.10}
$$
Therefore, we call $i(\nu)$ "the density of mutual information", which is also defined by Eq. 6 in the main text.
According to Eq. S2.7, we can write
$$
I(\nu,n) = -\lambda = -\sum_n p(n \mid \nu) \log P(n) - h(\nu) = \sum_n p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)} = i(\nu). \tag{S2.11}
$$
This means that when the mutual information is optimized, $i(\nu)$ is a constant for all possible $\nu$. The concavity of the mutual information in $p_\nu$ ensures that the optimal solution is unique. As a special case, if the spontaneous rate $\nu_0 = 0$, then according to Eq. 1 and Eq. 2 we have $p(n = 0 \mid \nu = 0) = 1$ and $p(n \neq 0 \mid \nu = 0) = 0$. As a result,
$$
I_{\max} = i(\nu = 0) = -\log P(0). \tag{S2.12}
$$
Also, Eq. S2.11 means that the mutual information $I(\nu,n)$ is distributed proportionally to the probabilities $p_\nu$ when it is maximized. In addition, one can also define
$$
i_s(s) = \sum_n p(n \mid s) \log \frac{p(n \mid s)}{P(n)}, \tag{S2.13}
$$
then we have
$$
I = \int \mathrm{d}s\, p(s)\, i_s(s) \tag{S2.14}
$$
and
$$
i(\nu) = i_s(s). \tag{S2.15}
$$
Therefore, the maximal mutual information $I(s,n)$ will be distributed proportionally to the probability density of the stimulus $s$, denoted by $p(s)$ in the main text and Fig. 1B. The density function $i_s(s)$ is also a constant over the space of stimuli $s$. For example, if we have a ternary activation function with three possible firing rates 0, $\nu_{\max}/2$, and $\nu_{\max}$, and the stimulus $s$ follows a standard normal distribution, the input space in terms of $\nu$ is $\{0, \nu_{\max}/2, \nu_{\max}\}$, so we have $i(\nu = 0) = i(\nu = \nu_{\max}/2) = i(\nu = \nu_{\max})$. Similarly, the input space of $s$ is the set of all real numbers $\mathbb{R}$, and we have $i_s(s) = \mathrm{const}$ for $s \in \mathbb{R}$.
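The constancy of $i(\nu)$ at the optimum (Eq. S2.11) and the special case of Eq. S2.12 can be checked numerically with a Blahut-Arimoto-style iteration over the probabilities $p_\nu$. The sketch below assumes Poisson spike-count noise and the ternary set of rates from the example above, with illustrative parameter values.

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical setup: ternary activation function with rates {0, nu_max/2, nu_max}, Poisson
# spike counts, coding window T = 1. All parameter values are illustrative assumptions.
nu_max, T = 10.0, 1.0
nu_levels = np.array([0.0, nu_max / 2, nu_max])
counts = np.arange(101)
p_n_given_nu = poisson.pmf(counts[None, :], nu_levels[:, None] * T)

# Blahut-Arimoto-style iteration: each rate level is reweighted by exp(i(nu)), so at
# convergence i(nu) is identical for every level carrying probability mass (Eq. S2.11).
p_nu = np.full(len(nu_levels), 1.0 / len(nu_levels))
for _ in range(2000):
    P_n = p_nu @ p_n_given_nu
    with np.errstate(divide="ignore", invalid="ignore"):
        i_nu = np.sum(np.where(p_n_given_nu > 0,
                               p_n_given_nu * np.log(p_n_given_nu / P_n[None, :]), 0.0),
                      axis=1)
    p_nu = p_nu * np.exp(i_nu)
    p_nu /= p_nu.sum()

I_max = float(p_nu @ i_nu)
print("optimal p_nu    :", np.round(p_nu, 4))
print("i(nu) per level :", np.round(i_nu, 4))       # constant across the levels used
print("I_max (nats)    :", round(I_max, 4))
print("-log P(n = 0)   :", round(-np.log(float(p_nu @ p_n_given_nu[:, 0])), 4))  # Eq. S2.12 (nu_0 = 0)
```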
2.2 A neuron with a continuous activation function
We assume that the neuron has a continuous and smooth (analytic) activation function, with the lowest rate (i.e., the spontaneous firing rate) $\nu_0$ and the maximal firing rate $\nu_{\max}$. Then, the mutual information can be written as:
$$
I = \sum_{n=0}^{+\infty} \int \mathrm{d}s\, p(s)\, p(n \mid s) \log \frac{p(n \mid s)}{P(n)}
  = \sum_{n=0}^{+\infty} \int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu\, p(\nu)\, p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)}. \tag{S2.16}
$$
Define $\widetilde{I} = I + \lambda \left( \int_{\nu_0}^{\nu_{\max}} p(\nu)\, \mathrm{d}\nu - 1 \right)$; then
$$
\widetilde{I} = -\sum_{n=0}^{+\infty} P(n) \log P(n) + \sum_{n=0}^{+\infty} \int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu\, p(n \mid \nu)\, p(\nu) \log p(n \mid \nu) + \lambda \left( \int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu\, p(\nu) - 1 \right). \tag{S2.17}
$$
When optimized,
$$
\delta\widetilde{I} = -\sum_{n=0}^{+\infty} \big( \log P(n) + 1 \big)\, \delta P(n) - \int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu\, h(\nu)\, \delta p(\nu) + \lambda \int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu\, \delta p(\nu) = 0. \tag{S2.18}
$$
Because
$$
\delta P(n) = \delta \int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu\, p(\nu)\, p(n \mid \nu) = \int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu\, p(n \mid \nu)\, \delta p(\nu), \tag{S2.19}
$$
we have
$$
\delta\widetilde{I} = -\int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu \sum_{n=0}^{+\infty} p(n \mid \nu) \big( \log P(n) + 1 \big)\, \delta p(\nu) - \int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu\, h(\nu)\, \delta p(\nu) + \lambda \int_{\nu_0}^{\nu_{\max}} \mathrm{d}\nu\, \delta p(\nu) = 0, \tag{S2.20}
$$
which leads to
$$
-\sum_{n=0}^{+\infty} p(n \mid \nu) \big( \log P(n) + 1 \big) - h(\nu) + \lambda = 0. \tag{S2.21}
$$
Absorbing $-\sum_{n=0}^{+\infty} p(n \mid \nu) = -1$ into $\lambda$, multiplying by $p(\nu)$, and integrating over $\nu$, we have $I + \lambda = 0$. As a result,
$$
I = -\lambda = -\sum_n p(n \mid \nu) \log P(n) - h(\nu) = \sum_n p(n \mid \nu) \log \frac{p(n \mid \nu)}{P(n)} = i(\nu), \qquad \text{for } \nu \in [\nu_0, \nu_{\max}], \tag{S2.22}
$$
which means that the density of mutual information $i(\nu)$ is a constant for all firing rates $\nu$. One can still define the density function with respect to the stimulus $s$ as in Eq. S2.13, and $i_s(s)$ is still a constant when the mutual information is optimized.
In summary, we have shown that the density of mutual information $i(\nu)$ is a constant for all possible firing rates, independent of whether the activation function is discrete or continuous. We note that this result has also been proven in previous work using a different approach based on the concavity of mutual information [2, 3].
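Relatedly, the same kind of iteration as above can be run over a fine grid of rates in $[\nu_0, \nu_{\max}]$ rather than a fixed small set; in such numerical experiments the optimal probability mass typically concentrates on a few isolated rates, anticipating the discreteness result of the next section. A minimal sketch under the same assumed Poisson noise model and illustrative parameters:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical illustration: optimize p(nu) over a fine grid of rates in [0, nu_max] instead of
# a fixed small set. All parameter values are illustrative assumptions.
nu_max, T = 10.0, 1.0
nu_grid = np.linspace(0.0, nu_max, 101)
counts = np.arange(121)
L = poisson.pmf(counts[None, :], nu_grid[:, None] * T)   # L(n, nu*T) = p(n|nu)

p = np.full(len(nu_grid), 1.0 / len(nu_grid))
for _ in range(5000):
    P_n = p @ L
    with np.errstate(divide="ignore", invalid="ignore"):
        i_nu = np.sum(np.where(L > 0, L * np.log(L / P_n[None, :]), 0.0), axis=1)
    p = p * np.exp(i_nu)
    p /= p.sum()

# The probability mass tends to concentrate on a few isolated rates (sometimes split between
# adjacent grid points), consistent with the discreteness discussed in the next section.
support = nu_grid[p > 1e-3]
print("rates carrying probability mass:", np.round(support, 2))
print("I_max (nats):", round(float(p @ i_nu), 4))
```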
3 The optimal activation functions of a population of neurons are discrete
To prove that the optimal activation functions are discrete, we first need to prove that when the mutual information of a population of $N$ neurons is maximized, the density of mutual information $\widetilde{i}(\nu_1)$ that we defined in the main text is a constant and equals the maximal mutual information $I_N^{\max}$ (Eq. 35). Consistent with the main text, we denote $p(n_i \mid \nu_i)$ by $L(n_i, \nu_i T)$ from now on.
According to the definition in the main text (Eq. 34), we have
$$
\widetilde{i}(\nu_1) = \sum_{n_1} p(n_1 \mid \nu_1) \log \frac{p(n_1 \mid \nu_1)}{P(n_1)} + p(n_1 = 0 \mid \nu_1)\, I_{N-1}^{\max}
 = \sum_{n_1} L(n_1, \nu_1 T) \log \frac{L(n_1, \nu_1 T)}{P(n_1)} + L(0, \nu_1 T)\, I_{N-1}^{\max} \tag{S3.1}
$$
and we can see that when neurons $2, \ldots, N$ are all optimized,
$$
\int \mathrm{d}\nu_1\, \widetilde{i}(\nu_1)\, p(\nu_1) = I_N = I(F_1) + P_1(0)\, I_{N-1}^{\max}. \tag{S3.2}
$$
Similarly to the previous section, we define $\widetilde{I}_N = I_N + \lambda$