Supplementary Information
Neural networks solve predictive coding by performing maximum likelihood estimation
We can express the model distribution $p(o_t \mid o_{<t})$ as
\[
p(o_t \mid o_{<t})
= \int p(o_t, x_t, x_{<t} \mid o_{<t}) \, dx_t \, dx_{<t}
= \int p(o_t \mid x_t) \, p(x_t \mid x_{<t}) \, p(x_{<t} \mid o_{<t}) \, dx_t \, dx_{<t}
= \mathbb{E}_{x_t \sim p(x_t \mid o_{<t})} \left[ p(o_t \mid x_t) \right]
\]
Performing maximum likelihood estimation,
\[
\arg\max \; \mathbb{E}_{o \sim p_{\text{data}}(o)} \, p(o_1, \ldots, o_T)
= \arg\max \prod_{t=1}^{T} \mathbb{E}_{o \sim p_{\text{data}}(o)} \, \mathbb{E}_{x_t \sim p(x_t \mid o_{<t})} \left[ p(o_t \mid x_t) \right]
\]
As $\log$ is a monotonic, increasing function, we can instead perform maximum log-likelihood estimation,
\[
\arg\max \; \mathbb{E}_{o \sim p_{\text{data}}(o)} \, p(o_1, \ldots, o_T)
= \arg\max \; \mathbb{E}_{o \sim p_{\text{data}}(o)} \log p(o_1, \ldots, o_T)
= \arg\max \sum_{t=1}^{T} \mathbb{E}_{o \sim p_{\text{data}}(o)} \, \mathbb{E}_{x_t \sim p(x_t \mid o_{<t})} \left[ \log p(o_t \mid x_t) \right]
\]
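This objective can be made concrete in a small numerical sketch. Assuming a deterministic linear encoder standing in for $p(x_t \mid o_{<t})$ and a unit-variance Gaussian decoder $p(o_t \mid x_t)$ (both hypothetical choices for illustration, not the architecture used in the paper), maximizing $\sum_t \mathbb{E}\left[\log p(o_t \mid x_t)\right]$ reduces to minimizing the squared next-observation prediction error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): D-dim observations, K-dim latents, T steps.
T, D, K = 50, 4, 3

# Synthetic observation sequence o_1, ..., o_T.
o = rng.normal(size=(T, D))

# Deterministic "encoder" standing in for p(x_t | o_{<t}): here simply a
# fixed linear map of the previous observation (an illustrative assumption).
enc = 0.1 * rng.normal(size=(D, K))

x = np.zeros((T, K))
for t in range(1, T):
    x[t] = o[t - 1] @ enc

# Gaussian decoder p(o_t | x_t) = N(x_t W, I): the log-likelihood of each
# observation is -0.5 * ||o_t - x_t W||^2 up to an additive constant, so
# maximum likelihood estimation of W is least-squares prediction of o_t.
W = np.zeros((K, D))

def neg_log_lik(W):
    resid = o[1:] - x[1:] @ W
    return 0.5 * np.sum(resid ** 2)   # up to an additive constant

losses = []
lr = 0.01
for _ in range(200):
    resid = o[1:] - x[1:] @ W
    W += lr * x[1:].T @ resid         # gradient ascent on the log-likelihood
    losses.append(neg_log_lik(W))

assert losses[-1] < losses[0]         # the likelihood of the data improved
```

The gradient step is exactly the derivative of the summed log-likelihood with respect to the decoder weights; any richer encoder or decoder would change the parameterization but not the objective.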
Predictive coding, which is solving for $p(o_t \mid x_t)$ and $p(x_t \mid o_{<t})$, is equivalent to estimating the data-generating distribution $p(o_1, \ldots, o_T)$.
Suppose that the agent's path and observations are deterministic. First, the agent's next position given its past positions $x_t =$