Supplemental Material: Tensor Network Computations That Capture Strict Variationality, Volume Law Behavior, and the Efficient Representation of Neural Network States

Wen-Yuan Liu,¹ Si-Jing Du,² Ruojing Peng,¹ Johnnie Gray,¹ and Garnet Kin-Lic Chan¹

¹ Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
² Division of Engineering and Applied Science, California Institute of Technology, Pasadena, California 91125, USA
S-1. MONTE CARLO SIMULATIONS WITH PEPS
A. Variational Monte Carlo
Here we review variational Monte Carlo with some specific comments on its implementation with PEPS. There are
standard works on this in the literature, but we follow here the presentation in Refs. [1, 2].
In VMC, expectation values are computed by importance sampling of configurations |n⟩ = |n_1 n_2 ··· n_L⟩. For example, the total energy reads:
E_tot = ⟨Ψ|H|Ψ⟩ / ⟨Ψ|Ψ⟩ = (1/⟨Ψ|Ψ⟩) Σ_{n,n′} ⟨Ψ|n⟩⟨n|H|n′⟩⟨n′|Ψ⟩
      = Σ_n [ |⟨Ψ|n⟩|² / ⟨Ψ|Ψ⟩ ] Σ_{n′} (⟨n′|Ψ⟩ / ⟨n|Ψ⟩) ⟨n|H|n′⟩ = Σ_n p(n) E_loc(n),   (1)
where p(n) = |⟨Ψ|n⟩|² / ⟨Ψ|Ψ⟩ is the probability of the configuration |n⟩, and E_loc(n) is the local energy defined as
E_loc(n) = Σ_{n′} (⟨n′|Ψ⟩ / ⟨n|Ψ⟩) ⟨n|H|n′⟩ = Σ_{ij} Σ_{n′} (⟨n′|Ψ⟩ / ⟨n|Ψ⟩) ⟨n|H_ij|n′⟩.   (2)
Here ⟨n|Ψ⟩ is the amplitude of the configuration |n⟩. For simplicity we assume the Hamiltonian H comprises nearest-neighbor two-site interaction terms H_ij, i.e., H = Σ_ij H_ij. Thus in the summation of Eq. (2) over |n′⟩, we only need to consider nonzero Hamiltonian elements. Under this assumption about the Hamiltonian, the states |n⟩ and |n′⟩ can differ only at two sites.
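To make the estimator concrete, the following minimal Python sketch evaluates E_loc(n) via Eq. (2). It is a sketch under assumptions, not the authors' code: `amplitude` is a hypothetical routine returning ⟨n|Ψ⟩ (e.g., from an approximate PEPS contraction), and `h_terms` is a hypothetical list of bonds (i, j) with their 4×4 numpy matrices H_ij.

    def local_energy(n, amplitude, h_terms):
        # E_loc(n) = sum_{n'} (<n'|Psi>/<n|Psi>) <n|H|n'>, Eq. (2).
        # Only configurations n' connected to n by a single two-site term
        # H_ij contribute, so n and n' differ at most at sites i and j.
        amp_n = amplitude(n)
        e_loc = 0.0
        for (i, j), h_ij in h_terms:  # h_ij[a, b] = <n_i n_j|H_ij|n_i' n_j'>
            row = 2 * n[i] + n[j]
            for col in range(4):
                elem = h_ij[row, col]
                if elem == 0.0:
                    continue          # keep only nonzero Hamiltonian elements
                n_prime = list(n)
                n_prime[i], n_prime[j] = divmod(col, 2)
                e_loc += elem * amplitude(tuple(n_prime)) / amp_n
        return e_loc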
The sampling in Eq. (1) is performed using the standard Markov chain Monte Carlo procedure. Starting from the current configuration |n_0⟩, a trial configuration |n_1⟩ is generated. Following the Metropolis algorithm, the trial configuration |n_1⟩ is accepted as the new configuration of the Markov chain if a number drawn uniformly from the interval [0, 1) is smaller than the ratio p(n_1)/p(n_0). Otherwise, the trial configuration |n_1⟩ is rejected, and another trial configuration is generated.
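A minimal sketch of this Metropolis step, under assumptions: the hypothetical `amplitude` routine returns ⟨n|Ψ⟩, and the move exchanges the two spin values on a bond (one way to realize the pair flips that preserve the sampled subspace).

    import numpy as np

    def metropolis_sweep(n, amplitude, bonds, rng):
        # One Monte Carlo sweep: sequentially propose flipping each allowed
        # nearest-neighbor pair; accept when a uniform number in [0, 1) is
        # below p(n1)/p(n0) = |<Psi|n1>|^2 / |<Psi|n0>|^2 (<Psi|Psi> cancels).
        amp = amplitude(n)
        for (i, j) in bonds:
            if n[i] == n[j]:
                continue                      # no allowed flip on this pair
            n_trial = list(n)
            n_trial[i], n_trial[j] = n[j], n[i]
            n_trial = tuple(n_trial)
            amp_trial = amplitude(n_trial)
            if rng.random() < abs(amp_trial) ** 2 / abs(amp) ** 2:
                n, amp = n_trial, amp_trial   # accept; else keep current n
        return n

    # usage: n = metropolis_sweep(n, amplitude, bonds, np.random.default_rng(0))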
The application of VMC to PEPS, either as a tensor network state or as a TNF, follows the above form. We discuss the definition of the amplitudes in the TNS and TNF in more detail in the following section. To generate configurations, we use a move which flips the spins/occupations of nearest-neighbor pairs, with a schedule in which we sequentially flip all allowed nearest-neighbor spin pairs in the sampled subspace [2]. This defines a Monte Carlo sweep. Physical quantities are measured after each Monte Carlo sweep to minimize autocorrelation, and both the energies and energy gradients are evaluated through this Monte Carlo sampling approach [1, 2].
We optimize the local PEPS tensors using the simple update imaginary time evolution method. This provides states
close to the ground state [3], but does not produce the variationally optimal PEPS. Subsequently, we employ gradient
methods, including the stochastic gradient descent method [1, 2] and stochastic reconfiguration optimization [4–6], to
further minimize the energy. The isometries are not directly optimized, but are obtained from the SVD of contractions
of the local tensors (see next section).
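For the gradient step, the sketch below implements the standard VMC stochastic-gradient estimator from sampled local energies and log-derivatives O_k(n) = ∂ ln Ψ(n)/∂θ_k; the array names and shapes are illustrative assumptions, and the stochastic reconfiguration variant would additionally precondition this gradient with the overlap matrix S_kk′ = ⟨O_k* O_k′⟩ − ⟨O_k*⟩⟨O_k′⟩.

    import numpy as np

    def sgd_step(theta, e_loc, o_log, lr):
        # Standard VMC gradient estimator for real parameters theta:
        #   g_k = 2 Re( <E_loc O_k*> - <E_loc><O_k*> ),
        # averages taken over configurations sampled from p(n).
        # e_loc: (N,) local energies; o_log: (N, n_params) log-derivatives.
        e_mean = np.mean(e_loc)
        grad = 2.0 * np.real(
            np.mean(e_loc[:, None] * np.conj(o_log), axis=0)
            - e_mean * np.mean(np.conj(o_log), axis=0)
        )
        return theta - lr * grad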
B. Definition of amplitudes in PEPS and PEPS-TNF
The key difference between the standard view of a PEPS and the TNF view is that, in the standard view, contracting the PEPS is necessarily approximate for a general graph, i.e., it must have some error for any computation of reasonable cost. This means that one can choose to approximately contract the PEPS in different ways (i.e., insert isometries in different ways), so long as the approximation error is tolerable. Consequently, in the standard use of PEPS with
Figure 1: In a TN-VMC calculation, to reduce computational cost, the amplitude Ψ(n_c) for a configuration |n_c⟩ can be computed using different intermediates, depending on the position in the Markov chain. (a) The tensor networks for Ψ(n_a) and Ψ(n_c) have two different tensors due to the differences between the configurations |n_a⟩ = |··· n_i n_{i+1} ··· n_p n_{p+1} ···⟩ and |n_c⟩ = |··· n′_i n′_{i+1} ··· n_p n_{p+1} ···⟩. The amplitude Ψ(n_c) is obtained via the intermediates ⟨U_2| and |D_4⟩ generated in computing Ψ(n_a) ≈ ⟨U_2|Ô_3|D_4⟩, and we compute Ψ(n_c) ≈ ⟨U_2|Ô′_3|D_4⟩. (b) The tensor networks for Ψ(n_b) and Ψ(n_c) have two different tensors. Ψ(n_c) is obtained via the intermediates ⟨U_3| and |D_5⟩ generated in computing Ψ(n_b) ≈ ⟨U_3|Ô_4|D_5⟩, giving Ψ(n_c) ≈ ⟨U_3|Ô′_4|D_5⟩. The intermediates ⟨U_2|, |D_4⟩, ⟨U_3| and |D_5⟩ are obtained by the standard boundary-MPS contraction method (see next section).
VMC, one is free to choose different contraction schemes depending on where one is in the Markov chain, in order to reuse as many intermediates as possible and reduce the computational cost, so long as the error is comparable for each contraction scheme. In a TNF, on the other hand, the position of the isometries is part of the definition of the exact TNF, so one is not free to change them; otherwise one is changing the type of TNF one is handling.
As an example of reusing intermediates in standard TN-VMC, we show in Fig. 1(a) the evaluation of the amplitude Ψ(n_c) where the configuration |n_c⟩ is obtained by a Monte Carlo move from |n_a⟩. |n_a⟩ = |··· n_i n_{i+1} ··· n_p n_{p+1} ···⟩ differs from |n_c⟩ = |··· n′_i n′_{i+1} ··· n_p n_{p+1} ···⟩ only at the sites i and i+1, so the computation of the amplitude Ψ(n_c) can reuse much of the computation involved in computing Ψ(n_a). We can arrange for this reuse by building the intermediates involved by approximately contracting tensors from the top and bottom respectively. Fig. 1(b) shows another example of this reuse in action. Here we consider the same |n_c⟩, but this time it is reached by a Monte Carlo move from a prior configuration |n_b⟩, which differs at sites p and p+1 from |n_c⟩. Consequently, we compute Ψ(n_c) using different intermediates, obtained by contracting the tensors in a different order. The use of different intermediates in Fig. 1(a) and Fig. 1(b) means that the locations of the isometries involved in defining the same amplitude Ψ(n_c) are different in the two cases.
The specific choice of isometries used in the calculations for Fig. 2 and Fig. 3 in the main text (associated with PEPS on a square lattice) is shown in Fig. 2. These isometries and their locations follow the SVD compression steps
of the standard boundary-MPS contraction method. On the left hand side, we show the isometries used in TN-VMC
where different isometries are used at different points in the Markov chain as discussed above (corresponding to the
blue site spins being flipped and the orange site spins being flipped). On the right hand side we show the isometries
used to define the TNF. Here the positions are kept fixed for every amplitude, and the values of the isometries are
obtained in the boundary-MPS contraction and compression steps from the top and the bottom up to the middle
layer of the tensor network, regardless of which Markov chain step/configuration we are considering.
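The SVD compression step that produces these isometries can be sketched as follows: two neighboring boundary-MPS tensors are merged, and the connecting bond is truncated to dimension χ. This is a minimal numpy sketch assuming (left, physical, right) index ordering, not the production boundary-MPS code.

    import numpy as np

    def compress_bond(A, B, chi):
        # A[l, p, m], B[m, q, r]: neighboring boundary-MPS tensors.
        l, p, m = A.shape
        _, q, r = B.shape
        theta = np.tensordot(A, B, axes=(2, 0)).reshape(l * p, q * r)
        u, s, vh = np.linalg.svd(theta, full_matrices=False)
        k = min(chi, len(s))
        A_new = u[:, :k].reshape(l, p, k)                # truncated left isometry
        B_new = (s[:k, None] * vh[:k]).reshape(k, q, r)  # weights absorbed to the right
        return A_new, B_new

The kept columns of u play the role of the inserted isometries in Fig. 2(a); in TN-VMC their values and positions depend on the contraction order, while in the TNF they are fixed once and for all.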
S-2. 1D KICKED ISING MODEL DYNAMICS
A. Definition of the model
Here we give the detailed definition of the 1D kicked Ising model and its corresponding unitary Floquet dynamics [7–9] discussed in the main article. Starting from a product state |0⟩^⊗L in an L-site spin chain, we consider the time
Figure 2: (a) The boundary-MPS method is used, and a two-row tensor network (contracting from the bottom up) with bond dimension D is compressed into an MPS with bond dimension χ, which can be realized by inserting isometries obtained from SVD. Similarly, a three-row tensor network (contracting from the bottom up) is first compressed into a two-row tensor network and then further compressed into an MPS, which can be represented as a tensor network contraction by inserting two layers of isometries obtained by SVDs. Note here "≈" indicates approximate tensor network contraction controlled by χ, and it would rigorously become "=" in the limit χ → ∞. (b) Given the PEPS, in a TN-VMC simulation, for a given configuration |n_c⟩, the amplitude is evaluated by different computational graphs, depending on its prior configurations such as |n_a⟩ and |n_b⟩. (c) Given the PEPS, in a TNF-VMC simulation, for a configuration |n_c⟩, its amplitude is always evaluated by a fixed computational graph.
evolution of the kicked Ising Hamiltonian H(t) = H_I + H_K Σ_{m=−∞}^{+∞} δ(t − m), where δ(t) is the Dirac δ function, with

H_I = J Σ_{j=1}^{L−1} σ^z_j σ^z_{j+1} + h Σ_{j=1}^{L} σ^z_j,   (3)

H_K = g Σ_{j=1}^{L} σ^x_j.   (4)
Setting the time interval between kicks to 1, the unitary Floquet operator generated by the above Hamiltonian reads (setting ℏ = 1):

F_KI = e^{iH_K} e^{iH_I}   (5)
     = [Π_j e^{igσ^x_j}] [Π_j e^{ihσ^z_j}] [Π_{j even} e^{iJσ^z_j σ^z_{j+1}}] [Π_{j odd} e^{iJσ^z_j σ^z_{j+1}}],   (6)
which admits a straightforward quantum circuit representation with only single-qubit and two-qubit gates, as shown
in Fig. 3(a). By combining the unitary gates within each Floquet time step into an MPO (where we use singular value
decomposition to split the two-qubit gates into local MPOs and contract tensors on each site), we can represent the
Floquet evolution as a (1 + 1)D tensor network as illustrated in Fig. 3(b).
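For the small chains studied here, F_KI can also be built densely as an exact benchmark. The following sketch (an illustration with assumed helper names, not the paper's tensor network implementation) constructs Eqs. (3)-(5) with numpy/scipy:

    import numpy as np
    from scipy.linalg import expm

    sx = np.array([[0., 1.], [1., 0.]])
    sz = np.array([[1., 0.], [0., -1.]])

    def embed(L, site_ops):
        # Kronecker-embed the given single-site operators into the L-site chain
        mats = [site_ops.get(j, np.eye(2)) for j in range(L)]
        out = mats[0]
        for m in mats[1:]:
            out = np.kron(out, m)
        return out

    def floquet_unitary(L, J, g, h):
        # F_KI = exp(i H_K) exp(i H_I), Eqs. (3)-(5)
        HI = sum(J * embed(L, {j: sz, j + 1: sz}) for j in range(L - 1))
        HI = HI + sum(h * embed(L, {j: sz}) for j in range(L))
        HK = sum(g * embed(L, {j: sx}) for j in range(L))
        return expm(1j * HK) @ expm(1j * HI)

    # one Floquet step on |0...0>:
    # psi = np.zeros(2 ** L); psi[0] = 1.0; psi = floquet_unitary(L, J, g, h) @ psi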
Figure 3: (a) The quantum circuit representation of the kicked Ising Floquet dynamics. (b) The (1 + 1)D tensor network for the Floquet dynamics, with contraction order either along the inverse-time direction for the tensor network function or along the time-evolution direction for standard MPS-MPO compression. (c) The 1D spin chain and the different subsystems used to compute the entanglement entropy: the half-chain subsystem A and the bulk subsystem B.
B. Numerical results on the 1D kicked Ising model dynamics
1. Choice of model parameters
To study the quantum (pure) state entanglement in the kicked Ising model dynamics, for each Floquet time step t we compute the half-chain bipartite von Neumann entropy S(t) = −Tr(ρ_A log ρ_A) of the state |Ψ(t)⟩ = F^t_KI |0⟩^⊗L (where A denotes the half-chain subsystem (the second row in Fig. 3(c))) and record its value along the Floquet evolution. The reduced density matrix is obtained as the partial trace of the full density matrix ρ = Σ_{ij} ⟨n_i|Ψ⟩⟨Ψ|n_j⟩ |n_i⟩⟨n_j|, which is constructed by enumerating all amplitudes ⟨n_i|Ψ⟩ (assuming normalization) in the σ^z basis for the system sizes considered in this work. We choose two representative sets of parameters, (J, g, h) = (π/4, π/4, 0.5) and (J, g, h) = (0.7, 0.5, 0.5). Both parameter sets correspond to non-integrable dynamics but have different entanglement growth speeds. For (J, g, h) = (π/4, π/4, 0.5), the kicked Ising Floquet dynamics is maximally chaotic, and the half-chain entanglement entropy grows linearly in time at the maximal speed [10]. For (J, g, h) = (0.7, 0.5, 0.5), the dynamics is less chaotic and has a slower half-chain entanglement growth speed. Results of the exact entanglement dynamics for both parameter sets are shown in Fig. 4 of the main article as well as in Fig. 7. We point out that the parameter values for the less chaotic case are not fine-tuned, but rather an arbitrary choice away from the maximally chaotic regime. We expect similar conclusions (about the tensor network function) to hold for other generic choices of parameters in the 1D kicked Ising dynamics away from the maximally chaotic regime.
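For reference, a minimal sketch of the half-chain entropy evaluation from an exact state vector: the eigenvalues of ρ_A are the squared singular values (Schmidt coefficients) of the state reshaped into a matrix across the half-chain cut. The function name is illustrative.

    import numpy as np

    def half_chain_entropy(psi, L):
        # S = -Tr(rho_A log rho_A) from the Schmidt values of the bipartition
        s = np.linalg.svd(psi.reshape(2 ** (L // 2), -1), compute_uv=False)
        p = s ** 2
        p = p[p > 1e-15]              # discard numerical zeros before the log
        return -np.sum(p * np.log(p))

    # record S(t) along the Floquet evolution:
    # psi = F @ psi; S_t.append(half_chain_entropy(psi, L))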
2. Spatial entanglement structure of the evolved states
To demonstrate the spatial entanglement structure of the evolved states along the kicked Ising dynamics, we consider a bulk subsystem B in the middle of a spin chain of size L = 14 (the third row in Fig. 3(c)), and compute the entanglement entropy S_B(L_B) of its reduced density matrix ρ_B as a function of the subsystem size L_B for Floquet times up to t = 20. For the maximally chaotic case (J, g, h) = (π/4, π/4, 0.5), after a very short evolution time the state starts to display volume-law physics, where the bulk subsystem entanglement entropy grows linearly with the subsystem size L_B, as shown in Fig. 4(a). For the less chaotic case (J, g, h) = (0.7, 0.5, 0.5), many of the early evolved states obey an area law, where the bulk subsystem entropy saturates as the subsystem size increases, as shown in Fig. 4(b).
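A sketch of the corresponding bulk-subsystem computation: the state vector is reshaped into (left environment, block B, right environment), and both environments are traced out to obtain ρ_B. The function and argument names are illustrative.

    import numpy as np

    def bulk_entropy(psi, L, start, L_B):
        # rho_B[b, b'] = sum_{l, r} psi[l, b, r] * conj(psi[l, b', r])
        psi3 = psi.reshape(2 ** start, 2 ** L_B, 2 ** (L - start - L_B))
        rho_B = np.einsum('lbr,lcr->bc', psi3, psi3.conj())
        w = np.linalg.eigvalsh(rho_B)
        w = w[w > 1e-15]              # discard numerical zeros before the log
        return -np.sum(w * np.log(w))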
In the maximally chaotic regime, an accurate conventional MPS simulation requires a bond dimension
χ
that grows
exponentially with the system size, even for a short-time simulation. In the less chaotic case, however, a conventional
MPS with a finite constant bond dimension
χ
can still describe the evolved state in the early-time evolution regime.
Figure 4: Bulk subsystem entanglement entropy scaling with respect to the subsystem size along the kicked Ising model dynamics, obtained from exact state vector simulation for 20 Floquet steps in an L = 14 spin chain. (a) Maximally chaotic regime (J = g = π/4, h = 0.5), where evolved states after t = 3 already display volume-law behavior. (b) Less chaotic regime (J = 0.7, g = h = 0.5), where many of the evolved states in the early steps display area-law entanglement entropy behavior.
3. Half-chain entanglement spectrum
In Fig. 5 we present the entanglement dynamics calculation for a less chaotic regime of the kicked Ising model, (J, g, h) = (0.7, 0.5, 0.5), where the entanglement entropy initially grows sublinearly up until t = 7. The TNF at any χ captures the initial growth and more entanglement than the conventional MPS of the same bond dimension. However, at small χ the TNF entropy saturates at too small a value relative to the exact result. We note that the exact entanglement entropy grows very slowly at long times, as shown in Fig. 5 for the L = 10 and L = 14 chains. This suggests that the distinct TNF behavior, compared to the maximally chaotic case, is reasonable.

To further analyze the entanglement structure captured by the tensor network function, we compute the entanglement spectrum of the half-chain reduced density matrix at Floquet step t = 15 for the L = 10 and L = 14 chains, comparing the results of the conventional MPS and the tensor network function. More specifically, we evolve a product state |0⟩^⊗L under the kicked Ising dynamics (Eq. 6) for t = 15 steps, then compute the eigenvalues λ_α² of the half-chain reduced density matrix ρ_A = Σ_α λ_α² |α⟩⟨α| of the final state using the conventional MPS and the tensor network function. Exact state vector simulation is also performed to provide a benchmark. Results are shown in Fig. 6. The tensor network function is contracted transversely along the spatial direction. Fig. 6(c) is a clearer version of the inset of Fig. 4(b) in the main article. As shown, the tensor network function with small isometry bond dimension χ can already capture the broad features of the spectrum, while the conventional MPS with bond dimension χ exhibits a sharp cutoff at the χ-th largest eigenvalue.
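Since the λ_α² are the squared Schmidt coefficients, the exact benchmark spectrum can be obtained with the same SVD as in the earlier entropy sketch; a minimal sketch follows (for an MPS of bond dimension χ, the identical procedure yields at most χ nonzero values, i.e., the sharp cutoff seen in Fig. 6).

    import numpy as np

    def entanglement_spectrum(psi, L, k=40):
        # the k largest eigenvalues lambda_alpha^2 of rho_A, in descending order
        s = np.linalg.svd(psi.reshape(2 ** (L // 2), -1), compute_uv=False)
        return np.sort(s ** 2)[::-1][:k]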
4. Entanglement dynamics from inverse-time contraction
The tensor network function formalism allows for very flexible ways to define the amplitude from the tensor network
graph, depending on how one inserts the isometries. Each choice of isometry insertion yields a different tensor
network function. In the numerical data shown in the main article, we employed a transverse contraction along
the spatial direction for the calculation of amplitudes. Here we perform the same half-chain entanglement dynamics
calculation using a tensor network function corresponding to contraction along the inverse-time direction, as illustrated
in Fig. 3(b).
In Fig. 7, we show the numerical results for the entanglement entropy dynamics of the 1D kicked Ising model for two different parameter sets, for L = 10 and L = 14 spin chains, where the tensor network functions are contracted along the inverse-time direction; the conventional MPS results are also shown for comparison. In addition, we
Figure 5: Entanglement dynamics of the kicked Ising chain in the less chaotic regime (J = 0.7, g = h = 0.5) for Floquet times up to t = 90, for (a) the L = 10 chain and (b) the L = 14 chain. Curves show MPS and transverse TNF results at χ = 2, 4, 8 (and χ = 32 for the MPS in (b)), together with the exact result. Transverse contraction is used for the TN function, and conventional MPS-MPO contraction along the time direction is used for the MPS results.
show results from an inverse-time MPO-MPO contraction (i.e., Heisenberg-style evolution) for comparison. In the latter case, the isometries are not amplitude dependent, as the amplitude is evaluated only after the entire tensor network is contracted (in the inverse-time direction) into an MPS.

One can see that with the same χ, the tensor network function contracted in the inverse-time direction captures more entanglement than both the conventional MPS and the Heisenberg-style MPS from MPO-MPO contraction. However, compared to the transverse contraction, inverse-time contraction has lower accuracy at the early steps of the Floquet
Figure 6: Half-chain entanglement spectrum for the L = 10 ((a)-(b)) and L = 14 ((c)-(d)) kicked Ising chains at Floquet step t = 15. λ_α² is the α-th largest eigenvalue of the half-chain reduced density matrix; the 40 largest eigenvalues are shown. (a) and (c): Maximally chaotic regime (J = g = π/4, h = 0.5); (b) and (d): Less chaotic regime (J = 0.7, g = 0.5, h = 0.5). Transverse contraction is used for the TN function, and conventional MPS-MPO contraction along the time direction is used for the MPS results.
dynamics, showing a convergence of the entanglement entropy ahead of the true saturation time for the maximally
chaotic kicked Ising dynamics.
S-3. ADDITIONAL DISCUSSION OF THE TENSOR NETWORK FUNCTION REPRESENTATION OF
NEURAL NETWORK COMPUTATIONAL GRAPHS
A. Classical binary arithmetic circuits
In our first construction of the tensor network function representation of a feed-forward neural network, we use classical binary arithmetic built from logic and copy gates. Here we describe this binary arithmetic and the building blocks (in terms of the tensors) of polynomial functions in more detail. (We can use polynomial functions to efficiently represent activation functions in a neural network under mild conditions.) Note that we have chosen the construction