Supplemental Material for “Information Scrambling in Quantum Neural Networks”

Huitao Shen,¹ Pengfei Zhang,² Yi-Zhuang You,³ and Hui Zhai²,*

¹Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
²Institute for Advanced Study, Tsinghua University, Beijing, 100084, China
³Department of Physics, University of California, San Diego, CA 92093, USA
*hzhai@tsinghua.edu.cn
In this supplemental material, we present more results of magnetization learning, staggered magnetization learning, and
winding number learning, along with details of gradient calculation and measurement.
I. MAGNETIZATION LEARNING
In this section, we provide more details of magnetization learning and present an argument on why, in magnetization learning, long string operators should exist in $\hat{U}^\dagger\sigma^x_{(n+1)/2}\hat{U}$ when it is expanded in the basis of products of local Pauli matrices.
A. Learning Task Details
Figure 1 shows the distribution of magnetization $M^\alpha_z$ in the training and validation datasets. The magnetization distributions within the training and validation sets are similar. There are roughly equal numbers of wavefunctions that are “ferromagnetic” ($|M^\alpha_z| \geq 0.5$) or “paramagnetic” ($|M^\alpha_z| < 0.5$).
For the AMSGrad algorithm [1], the momentum parameters are always $\beta_1 = 0.9$ and $\beta_2 = 0.999$ throughout this work. Because the training set is not very big, we use gradient descent instead of stochastic or mini-batch gradient descent. In other words, each epoch involves only one gradient descent step.
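For concreteness, here is a minimal sketch of a single AMSGrad update as used above (full-batch gradient, $\beta_1 = 0.9$, $\beta_2 = 0.999$); the function name, the state arrays, and the regulator `eps` are our bookkeeping conventions, not quantities from the text:

```python
import numpy as np

def amsgrad_step(theta, grad, m, v, v_max, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update [1]; m, v, v_max start as zero arrays shaped like theta."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment estimate
    v_max = np.maximum(v_max, v)              # AMSGrad: keep a non-decreasing v
    theta = theta - lr * m / (np.sqrt(v_max) + eps)
    return theta, m, v, v_max
```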
We confirm that validation losses also decrease monotonically as training proceeds (not shown here), indicating that the network learns to compute the magnetization reasonably well without overfitting.
In Fig. 2, we show the training loss and the tripartite information as functions of the training epoch. We plot both the averaged
values over 20 different random initializations and two typical initializations. Both the averaged value and the two training
instances show two-stage training dynamics. In particular, Fig. 3(a) and Fig. 4 in the main text use the same initialization as
Initialization 2 here.
FIG. 1. Distribution of magnetization $M^\alpha_z$ in the training and validation sets. The training and validation datasets contain $N = 2500$ and $500$ wavefunction–magnetization pairs respectively, sampled from the random Hamiltonian ensemble of system size $n = 9$, where the random parameters are distributed uniformly within $J_{ij}/J \in [-1, 0]$, $K_{ij}/J \in [-1, 1]$, $g_i/J \in [-6, 6]$, and $h/J \in [-0.04, 0.04]$. $J$ is the energy unit.
FIG. 2. Magnetization learning. (a) Training loss as a function of the training epoch. Different colors represent the average over 20 different random initializations or typical results from two training instances. The shaded area represents one standard deviation. The network has $n = 9$ qubits and depth $l = 6$. The learning rate is $\lambda = 10^{-2}$. (b) Tripartite information $I_3(A, C, D)$ as a function of the training epoch. Here the input subsystem size $|C| = 5$. The dotted vertical line indicates the boundary between the two training stages, which is determined as the local maximum of the averaged $I_3$.
B. Explicit Construction of Unitary that Learns Magnetization
Generally, it is impossible to find a unitary $\hat{U}$ such that
\[
\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U} = \hat{M}_z, \tag{1}
\]
because the L.H.S. and the R.H.S. of the above equality have different eigenvalues. As a result, we can only expect the above equality to hold at the level of expectation values,
\[
\langle\psi|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|\psi\rangle = \langle\psi|\hat{M}_z|\psi\rangle, \tag{2}
\]
within a subset of states $\{|\psi\rangle\}$ that are of interest.¹
In the following, we present an explicit construction of $\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}$ for the magnetization learning problem when the subset of states consists of eigenstates of $\hat{M}_z \equiv \sum_{i=1}^n \sigma^z_i/n$. The purpose of this construction is to use an explicit example to demonstrate why it is usually necessary to have string operators in $\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}$ for a quantum neural network (NN) that learns magnetization.
We first elaborate on the rationale behind choosing eigenstates of $\hat{M}_z$. For magnetization learning, the dataset consists of ground states of Hamiltonians (Eq. (2) in the main text) with a small pinning field $h$ to trigger spontaneous $Z_2$ symmetry breaking in the finite-size numerical simulation. The spin-spin interaction is also chosen to be nonlocal to ensure that we have a sufficient number of distinct states. To actually probe the physics of $Z_2$ symmetry breaking in one dimension, we should take the thermodynamic limit $n \to \infty$ while sending $h \to 0$ and fixing the spin-spin interaction range. It is well known that in such systems the ordered ferromagnetic ground state is gapped. Consequently, the quantum fluctuation of $\hat{M}_z$ is
\[
\left\langle\left(\hat{M}_z - \langle\hat{M}_z\rangle\right)^2\right\rangle = \sum_{i,j=1}^n \frac{1}{n^2}\left\langle\delta\sigma^z_i\,\delta\sigma^z_j\right\rangle = \sum_{i=1}^n \frac{1}{n}\left\langle\delta\sigma^z_i\,\delta\sigma^z_1\right\rangle \sim \frac{1}{n}, \tag{3}
\]
because $\langle\delta\sigma^z_i\,\delta\sigma^z_1\rangle$ decays exponentially with $i$. Therefore, the fluctuation of $\hat{M}_z$ is suppressed, and ground states of our random Hamiltonian can be well approximated by eigenstates of $\hat{M}_z$ in the thermodynamic limit.

¹ Note that the subset is in general not a subspace, as linear combinations in general break the equality Eq. (2).
We are now ready to present our construction. Denote the eigenstates of $\hat{M}_z$ as $|m, i\rangle$ such that $\hat{M}_z|m, i\rangle = m|m, i\rangle$. Here $m \in [-1, 1]$ is the eigenvalue, which is also the average magnetization; $i = 1, \ldots, d_m$ labels the states in the degenerate eigenspace, and $d_m$ is the degeneracy. The states are orthonormal, $\langle m, i|m', i'\rangle = \delta_{mm'}\delta_{ii'}$, and complete, $\sum_m d_m = 2^n$. Because of the spin-flip symmetry, $d_m = d_{-m}$. In general $d_m > 1$ unless $m = \pm 1$, where all spins are polarized in the same direction. For degenerate subspaces, note that the choice of $|m, i\rangle$ for fixed $m$ but different $i$ is not unique.
In the following, we construct the matrix elements of $\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}$ in the $|m, i\rangle$ basis such that
\[
\langle m, i|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|m, i\rangle = m, \tag{4}
\]
for all $m$ and $i$. Consider the two-dimensional subspace spanned by $|m, i\rangle$ and $|-m, i\rangle$ for all $m$ and $i$. Within this subspace, we set
\[
\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U} = \sin\theta\,\sigma^x + \cos\theta\,\sigma^z, \tag{5}
\]
where $\theta = \arccos m$. It is straightforward to verify that the constraint Eq. (4) is satisfied and the eigenvalues are $\pm 1$. Under this construction, half of $\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}$'s eigenvalues are $+1$ and half are $-1$. It is then not hard to see that there must exist some $\hat{U}$ such that $\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}$ has the matrix elements in the $|m, i\rangle$ basis as constructed.
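As a quick numerical sanity check of this construction (a sketch, assuming the basis ordering $(|m, i\rangle, |-m, i\rangle)$ for the two-dimensional block), one can verify that the block in Eq. (5) has diagonal entries $\pm m$ and eigenvalues $\pm 1$:

```python
import numpy as np

sx = np.array([[0., 1.], [1., 0.]])
sz = np.array([[1., 0.], [0., -1.]])

m = 0.6                                           # any eigenvalue of M_z in [-1, 1]
theta = np.arccos(m)
block = np.sin(theta) * sx + np.cos(theta) * sz   # Eq. (5) on span{|m,i>, |-m,i>}

print(np.diag(block))                # [ m, -m]: Eq. (4) holds for both basis states
print(np.linalg.eigvalsh(block))     # [-1, +1]: same spectrum as sigma^x
```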
Although the above matrix is constructed explicitly in a particular choice of basis, it is straightforward to verify that the following basis-independent constraint holds:
\[
\langle m|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|m\rangle = m, \tag{6}
\]
where $|m\rangle \equiv \sum_{i=1}^{d_m} c_i|m, i\rangle$ with $\sum_{i=1}^{d_m}|c_i|^2 = 1$ is any normalized linear combination of eigenstates within the same degenerate eigenspace.
Because the choice of basis within a degenerate subspace is not unique, our construction above is not unique either. Nevertheless, generally $|m, i\rangle$ and $|-m, i\rangle$ are related to each other by a string of local Pauli matrices whose length is of the order of the system size $n$. A particular choice is $|-m, i\rangle = \prod_{j=1}^n \sigma^x_j|m, i\rangle$, such that the two states are related by the global spin-flip operator, which is a string operator of length $n$. Because of Eq. (5), such a string operator must exist in $\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}$.
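To make the statement about string operators concrete, the sketch below expands an $n$-qubit operator in products of local Pauli matrices by brute force (cost $4^n$, so small $n$ only) and bins the weight by string length; applied to the global spin flip $\prod_j \sigma^x_j$, all weight sits at length $n$. The function names here are ours, not from the text:

```python
import numpy as np
from itertools import product
from functools import reduce

PAULI = {'I': np.eye(2), 'X': np.array([[0, 1], [1, 0]]),
         'Y': np.array([[0, -1j], [1j, 0]]), 'Z': np.array([[1, 0], [0, -1]])}

def string_weights(op, n):
    """Weight sum |c_P|^2 grouped by string length (# of non-identity factors),
    where c_P = Tr(P op) / 2^n are the Pauli-string expansion coefficients."""
    w = np.zeros(n + 1)
    for labels in product('IXYZ', repeat=n):
        P = reduce(np.kron, [PAULI[s] for s in labels])
        c = np.trace(P @ op) / 2**n          # Paulis are Hermitian, so P^dag = P
        w[sum(s != 'I' for s in labels)] += abs(c)**2
    return w

n = 3
spin_flip = reduce(np.kron, [PAULI['X']] * n)
print(string_weights(spin_flip, n))          # [0. 0. 0. 1.]: one string of length n
```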
II. STAGGERED MAGNETIZATION LEARNING
In this section, we present results of the staggered magnetization learning task, where an empirical correlation between the NN performance and the tripartite information is also found.
Dataset. Similar to magnetization learning, the dataset consists of $N$ input-target pairs $\{(|G^\alpha\rangle, M^\alpha_z),\ \alpha = 1, \ldots, N\}$, where the input wavefunction $|G^\alpha\rangle$ is the ground state wavefunction of the parent Hamiltonian with random long-ranged spin-spin interactions:
\[
\hat{H} = \sum_{i=1}^{n-1} J_{i,i+1}\,\sigma^z_i\sigma^z_{i+1} + \sum_{i,j=1}^n K_{ij}\,\sigma^x_i\sigma^x_j + \sum_{i=1}^n\left(g_i\,\sigma^x_i + h(-1)^i\,\sigma^z_i\right), \tag{7}
\]
where $J_{i,i+1}$, $K_{ij}$, $g_i$ and $h$ are all random numbers. Compared with Eq. (2) in the main text, here only nearest-neighbour $\sigma^z_i\sigma^z_{i+1}$ interactions are included to avoid frustration.
The target is the average staggered magnetization computed as $M^\alpha_z \equiv \langle G^\alpha|\hat{M}_z|G^\alpha\rangle$, where the staggered magnetization operator is $\hat{M}_z \equiv \sum_{i=1}^n (-1)^i\sigma^z_i/n$. In sampling the random Hamiltonian, we ensure $J_{i,i+1} \geq 0$ such that the ground state wavefunctions are either “antiferromagnetic” or “paramagnetic” as measured by $\hat{M}_z$. $h$ is a small pinning field randomly drawn from a distribution with zero mean, which is used to trigger the spontaneous $Z_2$ symmetry breaking in the antiferromagnetic phase.
Task. The quantum NN takes the input wavefunction $|G^\alpha\rangle$ and applies the unitary transformation $\hat{U}$ to it. The staggered magnetization is read out by measuring $\sigma^x$ of the central qubit. The loss function to be minimized is the absolute error of the staggered magnetization:
\[
\mathcal{L} = \frac{1}{N}\sum_{\alpha=1}^N \left|\langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|G^\alpha\rangle - M^\alpha_z\right|. \tag{8}
\]
FIG. 3. Distribution of staggered magnetization $M^\alpha_z$ in the training and validation sets. The training and validation datasets contain $N = 2500$ and $500$ wavefunction–staggered magnetization pairs respectively, sampled from the random Hamiltonian ensemble of system size $n = 9$, where the random parameters are distributed uniformly within $J_{i,i+1}/J \in [0, 8]$, $K_{ij}/J \in [-1, 1]$, $g_i/J \in [-2, 2]$ and $h/J \in [-0.04, 0.04]$. $J$ is the energy unit.
In Fig. 3, we show the distribution of the staggered magnetization $M^\alpha_z$ in the training and validation datasets. The magnetization distributions within the training and validation sets are similar. There are roughly equal numbers of wavefunctions that are “antiferromagnetic” ($|M^\alpha_z| \geq 0.5$) or “paramagnetic” ($|M^\alpha_z| < 0.5$).
Results. Figure 4 shows the training loss and tripartite information during quantum NN training for the staggered magnetization learning task. We confirm that the validation loss is similar to that on the training set.

The figure looks almost identical to Fig. 2, even though we now have a different dataset. The two-stage training dynamics, i.e., an early stage with a rapid decrease of the loss and increase of the tripartite information, followed by a later stage with a slow decrease of both the loss and the tripartite information, can be clearly observed. One can also see the initial rapid linear growth of the tripartite information in Fig. 4, with almost identical slopes for the two training instances and the averaged result.

Finally, we have also tried Hamiltonians similar to Eq. (7) but with a longer interaction range such that the ground state is frustrated. The quantum NN shows similar performance and training dynamics.
III. WINDING NUMBER LEARNING
In this section, we present the results of the winding number learning task, which again reinforces the generality of the two-stage training dynamics of quantum NNs.
Dataset. The input data consist of $N$ product states of $n$ qubits, where each qubit represents a vector in the $xz$ plane of the Bloch sphere. The target is the winding number of these vectors, obtained by treating the $n$ qubits as vectors on a one-dimensional Brillouin zone [2]. Formally, the dataset consists of $N$ input-target pairs $\{(|H^\alpha\rangle, w^\alpha),\ \alpha = 1, \ldots, N\}$, where the input wavefunction is $|H^\alpha\rangle = \bigotimes_{i=1}^n|\psi^\alpha(k_i)\rangle$, $k_i = 2\pi(i-1)/(n-1)$, and $|\psi^\alpha(k)\rangle$ is the ground state of the following random two-band Hamiltonian on the one-dimensional Brillouin zone $k \in [0, 2\pi)$ with chiral symmetry $\sigma^y H(k)\sigma^y = -H(k)$:
\[
H(k) = h_x(k)\,\sigma^x + h_z(k)\,\sigma^z. \tag{9}
\]
Here the coefficients $h_\mu(k)$, $\mu = x, z$, are represented in terms of Fourier components up to the $p$-th harmonic:
\[
h_\mu(k) = \sum_{n=0}^p \cos(nk)\,c^\mu_n + \sum_{n=1}^p \sin(nk)\,s^\mu_n, \tag{10}
\]
where $c^\mu_n$ and $s^\mu_n$ are random numbers.
The learning target is the discrete version of the winding number:
\[
w^\alpha = \frac{1}{2\pi}\sum_{i=1}^n \operatorname{Im}\ln\left[e^{i\left(\phi^\alpha(k_i) - \phi^\alpha(k_{i+1})\right)}\right], \tag{11}
\]
where $\phi^\alpha(k)$ is defined as the argument of the following complex number:
\[
e^{i\phi^\alpha(k)} = \frac{h^\alpha_z(k) + i\,h^\alpha_x(k)}{\sqrt{h^\alpha_z(k)^2 + h^\alpha_x(k)^2}}. \tag{12}
\]
FIG. 4. Staggered magnetization learning. (a) Training loss as a function of the training epoch. Different colors represent the average over 20 different random initializations or typical results from two training instances. The shaded area represents one standard deviation. The network has $n = 9$ qubits and depth $l = 6$. The learning rate is $\lambda = 10^{-2}$. (b) Tripartite information $I_3(A, C, D)$ as a function of the training epoch. Here the input subsystem size $|C| = 5$. The dotted vertical line indicates the boundary between the two training stages, which is determined as the local maximum of the averaged $I_3$.
FIG. 5. Distribution of winding number $w^\alpha$ in the training and validation sets.
The branch cut of the logarithm in Eq. (11) is along the negative direction of the $x$ axis, such that $\phi(k) - \phi(k') \in [-\pi, \pi)$.
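A short sketch of Eqs. (10)-(12), assuming the Fourier coefficients are packed as arrays `c[mu, 0..p]` and `s[mu, 0..p]` (row 0 for $\mu = x$, row 1 for $\mu = z$; `s[:, 0]` is unused since $\sin(0\cdot k) = 0$). Note that numpy's `np.angle` uses the branch $(-\pi, \pi]$ rather than the text's $[-\pi, \pi)$, which matters only on a measure-zero set of inputs:

```python
import numpy as np

def winding_number(c, s, n):
    """Discrete winding number of Eq. (11) from the harmonics of Eq. (10)."""
    k = 2 * np.pi * np.arange(n) / (n - 1)        # k_i = 2*pi*(i-1)/(n-1)
    harm = np.arange(c.shape[1])
    h = c @ np.cos(np.outer(harm, k)) + s @ np.sin(np.outer(harm, k))
    phi = np.angle(h[1] + 1j * h[0])              # arg(h_z + i*h_x), Eq. (12)
    dphi = phi - np.roll(phi, -1)                 # phi(k_i) - phi(k_{i+1})
    return np.sum(np.angle(np.exp(1j * dphi))) / (2 * np.pi)

# Sampling used for the task (p = 1): constant term from [-1/3, 1/3],
# first harmonic from [-1, 1]; post-select w = 0, 1 and discard w = -1.
rng = np.random.default_rng(0)
c = rng.uniform([-1/3, -1], [1/3, 1], size=(2, 2))
s = np.zeros((2, 2)); s[:, 1] = rng.uniform(-1, 1, size=2)
print(round(winding_number(c, s, n=9)))
```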
Task. In the following, we set the harmonic cutoff $p = 1$. $c^\mu_n$ and $s^\mu_n$ are sampled from a uniform distribution over $[-1/3, 1/3]$ for $n = 0$ and over $[-1, 1]$ for $n > 0$. We then post-select data with winding number $w = 0, 1$ and discard those with $w = -1$. In this way, the task becomes binary classification. The parameters are chosen such that there are roughly equal numbers of data with $w = 0$ and $1$, as shown in Fig. 5.
FIG. 6. Winding number learning. (a) Training loss (solid, left) and accuracy (dashed, right) as functions of the training epoch. Different colors represent the average over 20 different random initializations or typical results from two training instances. The shaded area represents one standard deviation. The network has $n = 9$ qubits and depth $l = 8$. The training and validation datasets contain $N = 3000$ and $500$ wavefunction–winding number pairs respectively, sampled from random wavefunctions defined in the main text. The learning rate is $\lambda = 10^{-2}$. (b) Tripartite information $I_3(A, C, D)$ as a function of the training epoch for different initializations. Here the input subsystem size $|C| = 5$. The dotted vertical line indicates the boundary between the two training stages, which is determined as the local maximum of the averaged $I_3$.

The quantum NN takes the input wavefunction $|H^\alpha\rangle$ and applies the unitary transformation $\hat{U}$ to it. The probability that $w^\alpha = 1$ is read out by measuring $\sigma^x$ of the central qubit:
\[
p^\alpha = \frac{1 + \langle H^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|H^\alpha\rangle}{2}. \tag{13}
\]
Therefore, the loss function to be minimized is the binary cross-entropy (i.e., the negative log-likelihood):
\[
\mathcal{L} = -\frac{1}{N}\sum_{\alpha=1}^N\left[w^\alpha\ln p^\alpha + (1 - w^\alpha)\ln(1 - p^\alpha)\right]. \tag{14}
\]
A more sensible metric is the prediction accuracy. Let the prediction of the winding number be $o^\alpha \equiv (1 + \operatorname{sgn}(p^\alpha - 1/2))/2$. The prediction accuracy is then
\[
\mathcal{A} \equiv 1 - \frac{1}{N}\sum_{\alpha=1}^N |o^\alpha - w^\alpha|. \tag{15}
\]
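In code, Eqs. (13)-(15) amount to a few lines (a sketch; `expect_x` stands for the measured expectation values $\langle H^\alpha|\hat{U}^\dagger\sigma^x_{(n+1)/2}\hat{U}|H^\alpha\rangle$ and `w` for the labels $w^\alpha$; the names are ours):

```python
import numpy as np

def loss_and_accuracy(expect_x, w):
    """Binary cross-entropy of Eq. (14) and prediction accuracy of Eq. (15)."""
    p = (1 + expect_x) / 2                        # Eq. (13)
    loss = -np.mean(w * np.log(p) + (1 - w) * np.log(1 - p))
    o = (1 + np.sign(p - 0.5)) / 2                # predicted label
    return loss, 1 - np.mean(np.abs(o - w))
```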
Results. In Fig. 6, we present the training loss and accuracy for the winding number learning task, along with the tripartite information. We confirm that the validation loss and accuracy are similar to those on the training set. The network depth $l$ is larger than that in the magnetization learning task, as we suspect the winding number learning task is more difficult. However, using a shallower network does not affect the performance significantly. Because of the difficulty of this task, not all initializations lead to high accuracies after 400 epochs. In computing the average, we post-select the 20 initializations with the smallest training losses out of 50 initializations.

First, the quantum NN manages to learn to distinguish wavefunctions with winding number $w = 0$ and $1$, as the final accuracy is more than 90%. Second, the trend of the loss function and the tripartite information is similar to that in (staggered) magnetization learning: At the early stage of the training, the loss decreases rapidly and the tripartite information increases. In the later stage, the tripartite information decreases again. The trend is robust when different initializations are averaged. However, we note that the tripartite information is slightly more volatile in the later stage than in the (staggered) magnetization learning, which is
reflected by a second local maximum of the averaged $I_3$ around 350 epochs in Fig. 6. Because this behavior does not appear in the other tasks, we believe it is not as universal, and we leave an in-depth understanding of it for future research.

FIG. 7. Schematic of a quantum circuit with brick-wall geometry. Here the network has $n = 5$ qubits and depth $l = 4$. All these two-qubit gates together form a giant unitary transformation $\hat{U}$. The $i$-th two-qubit gate in the $d$-th layer is denoted as $\hat{U}^d_i$.
Compared with the (staggered) magnetization task, the input wavefunction here is a product state and is essentially classical,
and the target is now a binary label instead of a real number. Despite the very different nature of this task, the empirical
correlation between the NN performance and the tripartite information still holds. This suggests the generality of the two-stage
training dynamics of quantum NNs.
IV. GRADIENTS IN QUANTUM NNS
In this section, we describe the method used to compute gradients of the quantum NNs in this work.
A. In Classical Simulations
A schematic of the quantum NN with $n = 5$ qubits and depth $l = 4$ is shown in Fig. 7. The $i$-th two-qubit gate in the $d$-th layer is denoted as $\hat{U}^d_i$. Assuming $n$ is odd, here $i = 1, 2, \ldots, (n-1)/2$. It follows that the giant unitary $\hat{U}$ is the composition of the $\hat{U}^d_i$:
\[
\hat{U} = \left(\prod_{i=1}^{(n-1)/2}\hat{U}^l_i\right)\cdots\left(\prod_{i=1}^{(n-1)/2}\hat{U}^2_i\right)\left(\prod_{i=1}^{(n-1)/2}\hat{U}^1_i\right) \equiv \prod_{d=1}^{l}\prod_{i=1}^{(n-1)/2}\hat{U}^d_i. \tag{16}
\]
The order of unitaries within a layer does not matter because these unitaries are applied on non-overlapping qubits.
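A dense-matrix sketch of the composition Eq. (16) follows (exponential memory, so small $n$ only). The brick-wall offsets, odd layers starting at the first qubit and even layers shifted by one, are our reading of Fig. 7:

```python
import numpy as np
from functools import reduce

def layer_unitary(gates, n, offset):
    """Embed the (n-1)//2 two-qubit gates of one layer, acting on qubit pairs
    (offset, offset+1), (offset+2, offset+3), ..., into a 2^n x 2^n matrix."""
    factors = [np.eye(2)] * offset + list(gates)
    dim = 2**offset * 4**len(gates)
    while dim < 2**n:                      # pad remaining qubits with identities
        factors.append(np.eye(2))
        dim *= 2
    return reduce(np.kron, factors)

def brickwall(gates_per_layer, n, l):
    """Compose l layers into the giant unitary U of Eq. (16); layer 1 acts first."""
    U = np.eye(2**n)
    for d in range(l):
        U = layer_unitary(gates_per_layer[d], n, offset=d % 2) @ U
    return U
```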
In general, each two-qubit gate $\hat{U}^d_i$ is a $4\times 4$ matrix in the $\mathrm{SU}(4)$ group and can be parametrized by 15 parameters. However, as explained in the main text, in this work we restrict $\hat{U}^d_i$ to $\mathrm{SO}(4)$ with 6 Euler angles: Generally, a matrix in $\mathrm{SO}(4)$ can be parametrized by a vector $\boldsymbol{\theta}$ with 6 components [3]:
\[
\hat{U}_{\mathrm{SO}(4)} = O_{34}(\theta_1)\,O_{23}(\theta_2)\,O_{12}(\theta_3)\,O_{34}(\theta_4)\,O_{23}(\theta_5)\,O_{34}(\theta_6). \tag{17}
\]
Here $O_{ij}(\theta) \equiv \exp(\theta J_{ij})$ is a rotation in the $ij$ plane: $J_{ij}$ is an antisymmetric matrix with its $ij$ ($ji$) element equal to $1$ ($-1$) and all other elements zero. As a result, there are $l(n-1)/2$ independent vectors $\boldsymbol{\theta}^d_i$ and thus $6l(n-1)/2 = 3l(n-1)$ independent parameters in total to fully describe the quantum NN.
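A sketch of this parametrization (zero-based plane indices, so the text's $O_{34}$ becomes the plane (2, 3) here; `scipy.linalg.expm` does the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

def J(i, j, dim=4):
    """Antisymmetric generator: (i, j) element +1, (j, i) element -1."""
    g = np.zeros((dim, dim))
    g[i, j], g[j, i] = 1.0, -1.0
    return g

# Rotation planes of Eq. (17), zero-based: 34, 23, 12, 34, 23, 34.
PLANES = [(2, 3), (1, 2), (0, 1), (2, 3), (1, 2), (2, 3)]

def so4_gate(theta):
    """Two-qubit SO(4) gate from 6 Euler angles, Eq. (17)."""
    gate = np.eye(4)
    for t, (i, j) in zip(theta, PLANES):
        gate = gate @ expm(t * J(i, j))    # O_ij(t) = exp(t J_ij)
    return gate

U = so4_gate(np.random.default_rng(0).uniform(-np.pi, np.pi, 6))
print(np.allclose(U @ U.T, np.eye(4)), np.isclose(np.linalg.det(U), 1.0))  # True True
```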
To be concrete, in the following we use magnetization learning as the example; staggered magnetization learning and winding number learning are similar. The loss function in magnetization learning is
\[
\mathcal{L} = \frac{1}{N}\sum_{\alpha=1}^N \left|\langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|G^\alpha\rangle - M^\alpha_z\right|. \tag{18}
\]
The gradient of $\mathcal{L}$ with respect to $\theta^d_{j,a}$, $a = 1, \ldots, 6$, is
\[
\frac{\partial\mathcal{L}}{\partial\theta^d_{j,a}} = \frac{1}{N}\sum_{\alpha=1}^N \operatorname{sgn}\left(\langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|G^\alpha\rangle - M^\alpha_z\right)\frac{\partial\langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|G^\alpha\rangle}{\partial\theta^d_{j,a}}. \tag{19}
\]
The gradient of the network output can be further simplified as
\[
\begin{aligned}
\frac{\partial}{\partial\theta^d_{j,a}}\langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|G^\alpha\rangle
&= \langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2}\,\frac{\partial\hat{U}}{\partial\theta^d_{j,a}}|G^\alpha\rangle + \mathrm{h.c.} \\
&= \langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2}\,\frac{\partial}{\partial\theta^d_{j,a}}\left[\prod_{d=1}^{l}\prod_{i=1}^{(n-1)/2}\hat{U}^d_i\right]|G^\alpha\rangle + \mathrm{h.c.} \\
&= \langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2}\left(\prod_{i=1}^{(n-1)/2}\hat{U}^l_i\right)\cdots\left(\hat{U}^d_1\hat{U}^d_2\cdots\frac{\partial\hat{U}^d_j}{\partial\theta^d_{j,a}}\cdots\hat{U}^d_{(n-1)/2}\right)\cdots\left(\prod_{i=1}^{(n-1)/2}\hat{U}^1_i\right)|G^\alpha\rangle + \mathrm{h.c.},
\end{aligned} \tag{20}
\]
where $\partial\hat{U}^d_j/\partial\theta^d_{j,a}$ can be further simplified using Eq. (17). For example,
\[
\frac{\partial\hat{U}^d_j}{\partial\theta^d_{j,4}} = O_{34}(\theta^d_{j,1})\,O_{23}(\theta^d_{j,2})\,O_{12}(\theta^d_{j,3})\,J_{34}\,O_{34}(\theta^d_{j,4})\,O_{23}(\theta^d_{j,5})\,O_{34}(\theta^d_{j,6}). \tag{21}
\]
Gradients with respect to the other components $a$ can be computed in a similar way by inserting the corresponding $J$ matrix.
In this work, we directly compute the gradient according to Eqs. (19), (20) and (21) in the classical simulation.
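Continuing the SO(4) sketch above (reusing `J`, `PLANES`, `so4_gate`, numpy and `expm` from it), the gate derivative of Eq. (21) is obtained by inserting the corresponding generator into the ordered product; a finite-difference comparison is an easy way to validate it:

```python
def so4_gate_grad(theta, a):
    """dU/d(theta_a) for the gate of Eq. (17): insert J of the a-th plane
    (zero-based a) right before that rotation, as in Eq. (21)."""
    gate = np.eye(4)
    for idx, (t, (i, j)) in enumerate(zip(theta, PLANES)):
        if idx == a:
            gate = gate @ J(i, j)          # d/dt exp(t J) = J exp(t J)
        gate = gate @ expm(t * J(i, j))
    return gate

theta = np.random.default_rng(1).uniform(-np.pi, np.pi, 6)
eps = 1e-6
for a in range(6):
    de = np.zeros(6); de[a] = eps
    fd = (so4_gate(theta + de) - so4_gate(theta - de)) / (2 * eps)
    assert np.allclose(so4_gate_grad(theta, a), fd, atol=1e-8)
```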
B. In Real Quantum NNs
In a real quantum NN, this gradient could instead be determined through the measurement of the following Hermitian operator:
\[
\hat{g}^d_{j,a} = \sigma^x_{(n+1)/2}\,\frac{\partial\hat{U}}{\partial\theta^d_{j,a}}\,\hat{U}^\dagger + \mathrm{h.c.} \tag{22}
\]
It is straightforward to see that $\langle G^\alpha|\hat{U}^\dagger\,\hat{g}^d_{j,a}\,\hat{U}|G^\alpha\rangle = \frac{\partial}{\partial\theta^d_{j,a}}\langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|G^\alpha\rangle$. However, this operator is generally nonlocal and hard to measure.
Alternatively, one could perform the following three measurements [4, 5]:

1. Measure the output of the quantum NN normally with the original parameters $\boldsymbol{\theta}^d_i$. The result is denoted as $o_1 \equiv \langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|G^\alpha\rangle$;

2. Measure the output of the quantum NN with $\theta^d_{j,a}$ replaced by $\theta^d_{j,a} + \pi/4$. The result is denoted as $o_2$;

3. Measure the output of the quantum NN with $\theta^d_{j,a}$ replaced by $\theta^d_{j,a} + \pi/2$. The result is denoted as $o_3$.
It follows that the desired gradient is
\[
\frac{\partial}{\partial\theta^d_{j,a}}\langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|G^\alpha\rangle = 2o_2 - o_1 - o_3. \tag{23}
\]
The reason is that, if we focus on a specific $\theta^d_{j,a}$, we have
\[
o_1 = \left\langle \cdots O^\dagger_{p,p+1}(\theta^d_{j,a}) \cdots\, \cdots O_{p,p+1}(\theta^d_{j,a}) \cdots \right\rangle, \tag{24}
\]
\[
o_2 = \left\langle \cdots O^\dagger_{p,p+1}(\theta^d_{j,a}+\pi/4) \cdots\, \cdots O_{p,p+1}(\theta^d_{j,a}+\pi/4) \cdots \right\rangle
= \left\langle \cdots \left[(1+J_{p,p+1})\,O_{p,p+1}(\theta^d_{j,a})\right]^\dagger \cdots\, \cdots (1+J_{p,p+1})\,O_{p,p+1}(\theta^d_{j,a}) \cdots \right\rangle / 2, \tag{25}
\]
\[
o_3 = \left\langle \cdots O^\dagger_{p,p+1}(\theta^d_{j,a}+\pi/2) \cdots\, \cdots O_{p,p+1}(\theta^d_{j,a}+\pi/2) \cdots \right\rangle
= \left\langle \cdots \left[J_{p,p+1}\,O_{p,p+1}(\theta^d_{j,a})\right]^\dagger \cdots\, \cdots J_{p,p+1}\,O_{p,p+1}(\theta^d_{j,a}) \cdots \right\rangle. \tag{26}
\]
Here $(p, p+1)$ labels the rotation plane associated with $a$. As a result,
\[
2o_2 - o_1 - o_3 = \left\langle \cdots O^\dagger_{p,p+1}(\theta^d_{j,a}) \cdots\, \cdots J_{p,p+1}\,O_{p,p+1}(\theta^d_{j,a}) \cdots \right\rangle + \mathrm{h.c.} = \frac{\partial}{\partial\theta^d_{j,a}}\langle G^\alpha|\hat{U}^\dagger \sigma^x_{(n+1)/2} \hat{U}|G^\alpha\rangle. \tag{27}
\]
The above method can be easily generalized to $\mathrm{SU}(4)$ as well.
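The shift rule Eqs. (23)-(27) is easy to verify numerically. Below is a self-contained toy check in the real, SO(4)-like setting: `A` and `B` stand in for the circuit after and before the rotation being differentiated, `S` for the (conjugated) measured operator, and `psi` for the input state; none of these names appear in the text:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
J12 = np.zeros((4, 4)); J12[0, 1], J12[1, 0] = 1.0, -1.0   # rotation generator
A, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # circuit after the rotation
B, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # circuit before the rotation
S = np.diag([1., -1., 1., -1.])                # stand-in observable
psi = rng.normal(size=4); psi /= np.linalg.norm(psi)

def output(t):
    """<psi| U(t)^T S U(t) |psi> with U(t) = A O_12(t) B, all real."""
    U = A @ expm(t * J12) @ B
    return psi @ U.T @ S @ U @ psi

t = 0.3
o1, o2, o3 = output(t), output(t + np.pi / 4), output(t + np.pi / 2)
fd = (output(t + 1e-6) - output(t - 1e-6)) / 2e-6
print(2 * o2 - o1 - o3, fd)                    # the two agree, Eq. (23)
```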
[1] Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar, “On the Convergence of Adam and Beyond,” in International Conference on Learning Representations (2018).
[2] Pengfei Zhang, Huitao Shen, and Hui Zhai, “Machine Learning Topological Invariants with Neural Networks,” Phys. Rev. Lett. 120, 066401 (2018).
[3] P. Dita, “Factorization of unitary matrices,” J. Phys. A 36, 2781–2789 (2003).
[4] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, “Quantum circuit learning,” Phys. Rev. A 98, 032309 (2018).
[5] Maria Schuld, Alex Bocharov, Krysta M. Svore, and Nathan Wiebe, “Circuit-centric quantum classifiers,” Phys. Rev. A 101, 032308 (2020).