Mixed-state entanglement from local randomized measurements
–
Supplemental Material
Andreas Elben,
1, 2,
∗
Richard Kueng,
3,
∗
Hsin-Yuan (Robert) Huang,
4, 5
Rick van
Bijnen,
1, 2
Christian Kokail,
1, 2
Marcello Dalmonte,
6, 7
Pasquale Calabrese,
6, 7, 8
Barbara Kraus,
9
John Preskill,
4, 5, 10, 11
Peter Zoller,
1, 2
and Benoˆıt Vermersch
1, 2, 12
1
Center for Quantum Physics, University of Innsbruck, Innsbruck A-6020, Austria
2
Institute for Quantum Optics and Quantum Information of the Austrian Academy of Sciences, Innsbruck A-6020, Austria
3
Institute for Integrated Circuits, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040 Linz, Austria
4
Institute for Quantum Information and Matter, Caltech, Pasadena, CA, USA
5
Department of Computing and Mathematical Sciences, Caltech, Pasadena, CA, USA
6
The Abdus Salam International Center for Theoretical Physics, Strada Costiera 11, 34151 Trieste, Italy
7
SISSA, via Bonomea 265, 34136 Trieste, Italy
8
INFN, via Bonomea 265, 34136 Trieste, Italy
9
Institute for Theoretical Physics, University of Innsbruck, A–6020 Innsbruck, Austria
10
Walter Burke Institute for Theoretical Physics, Caltech, Pasadena, CA, USA
11
AWS Center for Quantum Computing, Pasadena, CA, USA
12
Univ. Grenoble Alpes, CNRS, LPMMC, 38000 Grenoble, France
Appendix A: The
p
3
-PPT condition
In this section we present, prove and discuss the
p
3
-
PPT condition. The
p
3
-PPT condition is the contraposi-
tive of the following statement about moments of positive
semidefinite matrices with unit trace.
Proposition 1.
For every positive semidefinite matrix
X
with unit trace (
Tr(
X
) = 1
) it holds that
tr(
X
2
)
2
≤
tr(
X
3
)
.
(A1)
Note that Eq. (A1) resembles the following well-known
monotonicity relation among R ́enyi entropies (see e.g.,
Ref. [1]):
S
3
(
ρ
)
≤
S
2
(
ρ
) for
S
n
(
ρ
) =
1
1
−
n
log
2
(
tr(
ρ
n
)
)
.
(A2)
However, this relation only applies to density matrices,
i.e. positive semidefinite matrices with unit trace. The
p
3
-PPT condition, in contrast, is designed to test the ab-
sence of positive semidefiniteness. Hence, it is crucial to
have a condition that does not break down if the matrix
in question has negative eigenvalues. Rel. (A1) (and its
direct proof provided in the next subsection) do achieve
this goal, while an argument based on monotonicity rela-
tions between R ́enyi entropies can break down, because
the logarithm of non-positive numbers is not properly
defined.
1. Proof of the
p
3
-PPT condition
Let
X
be a Hermitian
d
×
d
matrix with eigenvalue
decomposition
X
=
∑
d
i
=1
λ
i
|
x
i
〉〈
x
i
|
. For
p
≥
1, we intro-
∗
These authors contributed equally.
duce the Schatten-
p
norms
‖
X
‖
p
=
(
d
∑
i
=1
|
λ
i
|
p
)
1
/p
= Tr (
|
X
|
p
)
1
/p
,
where
|
X
|
=
√
X
2
=
∑
d
i
=1
|
λ
i
||
x
i
〉〈
x
i
|
denotes the
(matrix-valued) absolute value. The Schatten-
p
norms
encompass most widely used matrix norms in quantum
information. Concrete examples are the trace norm (
p
=
1), the Hilbert-Schmidt/Frobenius norm (
p
= 2) and the
operator/spectral norm (
p
=
∞
). Each Schatten-
p
norm
corresponds to the usual vector
`
p
-norm of the vector of
eigenvalues
λ
= (
λ
1
,...,λ
d
)
T
∈
R
d
:
‖
λ
‖
`
p
=
(
d
∑
i
=1
|
λ
i
|
p
)
1
/p
for
p
≥
1.
(A3)
Hence, Schatten-
p
norms inherit many desirable proper-
ties from their vector-norm counterparts. Here, we shall
use vector norm relations to derive a relation among
Schatten-
p
norms. It is based on Hoelder’s inequality
that relates the inner product
〈
v,w
〉
=
d
∑
i
=1
v
i
w
i
for
v,w
∈
R
d
(A4)
to a combination of
`
p
norms.
Fact 1
(Hoelder’s inequality for vector norms)
.
Fix
p,q
≥
1
such that
1
/p
+ 1
/q
= 1
. Then,
|〈
v,w
〉|≤
d
∑
i
=1
|
v
i
w
i
|≤‖
v
‖
`
p
‖
w
‖
`
q
(A5)
for any
v,w
∈
R
d
.
2
The well-known Cauchy-Schwarz inequality is a special
case of this fact. Set
p
=
q
= 1
/
2 to conclude
|〈
v,w
〉|≤‖
v
‖
`
2
‖
w
‖
`
2
=
〈
v,v
〉
1
/
2
〈
w,w
〉
1
/
2
.
(A6)
At the heart of our proof for the
p
3
-PPT condition is a
simple relation between Schatten-
p
norms of orders
p
=
1
,
2
,
3.
Lemma 1.
The following norm relation holds for every
Hermitian matrix
X
:
‖
X
‖
4
2
≤‖
X
‖
1
‖
X
‖
3
3
Proof.
Let
λ
= (
λ
1
,...,λ
d
)
T
be the
d
-dimensional vec-
tor of eigenvalues of
X
. Apply Hoelder’s inequality with
p
= 3
,q
= 3
/
2 to the inner product of this vector of
eigenvalues with itself:
Tr(
X
2
) =
〈
λ,λ
〉≤‖
λ
‖
`
3
‖
λ
‖
`
2
/
3
=
‖
X
‖
3
‖
λ
‖
`
3
/
2
.
(A7)
Next, we apply Cauchy-Schwarz to the remaining
`
3
/
2
-
norm:
‖
λ
‖
`
3
/
2
=
(
d
∑
i
=1
|
λ
i
|
3
/
2
)
2
/
3
=
(
d
∑
i
=1
|
λ
i
||
λ
i
|
1
/
2
)
2
/
3
≤
(
(
d
∑
i
=1
|
λ
i
|
2
)
1
/
2
(
d
∑
i
=1
|
λ
i
|
2
/
2
)
1
/
2
)
2
/
3
=
‖
λ
‖
2
/
3
`
2
‖
λ
‖
1
/
3
`
1
=
‖
X
‖
2
/
3
2
‖
X
‖
1
/
3
1
.
Inserting this relation into Eq. (A7) reveals
‖
X
‖
2
2
≤‖
X
‖
2
/
3
2
‖
X
‖
1
/
3
1
‖
X
‖
3
which is equivalent to the claim (take the 3rd power and
divide by
‖
X
‖
2
2
).
Proposition 1 is an immediate consequence of Lemma 1
and elementary properties of positive semidefinite ma-
trices. Recall that a Hermitian
d
×
d
matrix is pos-
itive semidefinite (psd) if every eigenvalue is nonnega-
tive. This in turn ensures
|
X
|
=
X
and, by extension,
‖
X
‖
p
= Tr(
X
p
)
1
/p
for all
p
≥
1.
2. Discussion and potential generalizations
The
p
3
-PPT condition tests the absence of positive
semidefiniteness based on moments Tr(
X
p
) of order
p
=
1
,
2
,
3. It is natural to wonder whether higher order mo-
ments allow the construction of more refined tests. It is
possible to show that every positive semidefinite matrix
X
with unit trace must obey
tr(
X
p
−
1
)
p
−
1
≤
tr(
X
p
)
p
−
2
for all
p >
2.
(A8)
As this is a direct extension of the
p
3
-PPT condition
(
p
= 3), we omit the proof. Unfortunately, we found nu-
merically that these direct extensions actually produce
weaker
tests for the absence of positive semidefiniteness,
i.e. there exist matrices
X
that violate the
p
3
-PPT con-
dition but satisfy Rel. (A8) for higher moments
p
≥
4.
This is not completely surprising, since Rel. (A8) com-
pares (powers of) neighboring matrix moments with or-
der (
p
−
1) and
p
. As
p
increases, these matrix mo-
ments suppress contributions of small eigenvalues ever
more strongly. In the case of partially transposed quan-
tum states, the eigenvalues are required to sum up to
one and must be contained in the interval [
−
1
/
2
,
1] [2].
Thus, the negative eigenvalues can never dominate the
spectrum and high matrix moment tests for the existence
of negative eigenvalues suffer from suppression effects.
This observation suggests that powerful tests for neg-
ative eigenvalues should involve
all
matrix moments
tr(
X
p
) up to a certain order
p
max
. It is useful to change
perspective in order to reason about potential improv-
ments. The
p
3
-PPT condition checks whether the fol-
lowing inequality is true:
F
3
(
X
) =
−
tr(
X
3
) + tr(
X
2
)
2
>
0
.
(A9)
For matrices
X
with unit trace, we can reinterpret the
matrix-valued function
F
3
(
X
) as a sum of (identical)
degree-3 polynomials applied to all eigenvalues
λ
1
,...,λ
d
of
X
. Set
p
2
= tr(
X
2
) and use tr(
X
) =
∑
d
i
=1
λ
i
= 1 to
conclude
F
3
(
X
) =
−
tr(
X
3
) + 2
p
2
tr(
X
2
)
−
p
2
2
tr(
X
)
=
d
∑
i
=1
(
−
λ
3
i
+ 2
p
2
λ
2
i
−
p
2
2
λ
i
)
=
d
∑
i
=1
−
λ
i
(
λ
i
−
p
2
)
2
=:
d
∑
i
=1
f
3
(
λ
i
)
.
(A10)
Note that the polynomial
f
3
(
x
) =
−
x
(
x
−
p
2
)
2
for
x
∈
R
(A11)
depends on
p
2
and, by extension, also on the matrix
X
.
We will come back to this aspect later. For now, we
point out that – regardless of the actual value of
p
2
–
this polynomial has three interesting properties:
f
3
(
x
)
≤
0
if
x >
0
,
f
3
(0) = 0
,
f
3
(
x
)
>
0
if
x <
0
.
(A12)
These properties reflect the behavior of another well-
known function – the
(negated) rectifier function
(ReLU):
r
(
−
x
) = max
{
0
,
−
x
}
=
{
0
if
x
≥
0
,
|
x
|
if
x <
0
.
(A13)
See Figure A1 for a visual comparison. Applying the
(negated) rectifier function to the eigenvalues of
X
would
recover the negativity:
N
(
X
) =
∑
λ
i
<
0
|
λ
i
|
=
d
∑
i
=1
r
(
−
λ
i
)
.
(A14)
3
−
0
.
5
0
.
0
0
.
5
1
.
0
x
−
0
.
5
0
.
0
0
.
5
1
.
0
f
3
(
x
) =
−
x
(
x
−
p
2
)
2
p
2
= 0
.
25
p
2
= 0
.
5
p
2
= 1
max
{−
x,
0
}
FIG. A1. Comparison of
f
3
(
x
) =
−
x
(
x
−
p
2
)
2
with the negated
rectifier function
r
(
−
x
) = max
{−
x,
0
}
for different values of
p
2
in the relevant interval [
−
1
/
2
,
1] [2].
Hence, it is instructive to interpret
F
3
(
X
) as a polyno-
mial approximation to the (non-analytic) negativity func-
tion.
On the level of polynomials, the condition
f
3
(
x
)
≤
0
whenever
x >
0 is most important. It implies that pos-
itive eigenvalues of
X
can never increase the value of
F
3
(
X
) =
∑
d
i
=1
f
3
(
λ
i
). In particular,
F
3
(
X
)
≤
0 when-
ever
X
is positive semidefinite – as stated in Proposi-
tion 1. The
p
3
-PPT condition is sound, i.e. it has no
false positives.
Conversely,
f
3
(
x
)
>
0 for
x <
0 implies that
F
3
(
X
) can
become positive if
X
has negative eigenvalues. Hence,
the
p
3
-PPT condition is not vacuous. It is capable of
detecting negative eigenvalues in many, but not all, unit-
trace matrices
X
.
Let us now return to the (matrix-dependent) param-
eter choice in Eq. (A11). In principle, every polynomial
of the form
f
(
a
)
3
(
x
) =
−
x
(
x
−
a
)
2
with
a
∈
R
obeys the
important structure constraints (A12) and therefore pro-
duces a sound test for negative eigenvalues. For fixed
X
,
the associated matrix polynomial evaluates to
F
(
a
)
3
(
X
) =
−
tr(
X
3
) + 2
a
tr(
X
2
)
−
a
2
tr(
X
)
.
(A15)
We can optimize this expression over the parameter
a
∈
R
to make the test as strong as possible. The opti-
mal choice is
a
]
= tr(
X
2
)
/
tr(
X
) and produces a matrix
polynomial that obeys
F
(
a
]
)
3
(
X
)
≥
max
a
∈
R
F
(
a
)
3
(
X
) for
X
fixed. If
X
has also unit trace, the optimal parame-
ter becomes
a
]
=
p
2
and produces the
p
3
-PPT condition
(A9).
This construction of PPT conditions readily extends
to higher order polynomials
f
p
(
x
) =
a
p
x
p
+
···
a
1
x
+
a
0
.
Increasing the degree
p
produces more expressive ansatz
functions that can approximate the (negated) rectifier
function – and its core properties – ever more accu-
rately. Viewed from this angle, it becomes apparent that
measuring more matrix moments can produce stronger
tests for detecting negative eigenvalues. However, it is
not so obvious how to choose the parameters
a
p
,...,a
0
“optimally”, or what “optimally” actually means in this
context. Some well-known polynomial approximations
of the rectifier function
r
(
−
x
) – like Taylor expansions
of
s
(
−
x
) = ln(1 + e
−
x
) (the “softplus” function) – are
not well-suited for this task, because
s
(
−
x
)
>
0 even for
x >
0. This in turn would imply that the associated test
condition may not be sound. We believe that a thorough
analysis of these questions is timely and interesting, but
would go beyond the scope of this work. We intend to
address it in future research.
Appendix B:
p
3
-PPT condition for Werner States
Werner states are bipartite quantum states in a Hilbert
space
H
AB
=
H
A
⊗H
B
with dimensions
d
A
=
d
B
≡
d
,
defined as
ρ
W
=
α
(
d
+ 1
2
)
−
1
Π
+
+ (1
−
α
)
(
d
2
)
−
1
Π
−
(B1)
with parameter
α
∈
[0
,
1] and Π
±
=
1
2
(
I
±
Π
12
) pro-
jectors onto symmetric
H
+
and anti-symmetric
H
−
sub-
spaces of
H
=
H
+
⊕H
−
, respectively [3]. Here, Π
12
=
∑
d
i,j
=1
|
i
〉〈
j
|⊗|
j
〉〈
i
|
is the swap operator. We note that
the eigenvalues of
ρ
W
are thus given as
λ
+
=
α
(
d
+1
2
)
−
1
with multiplicity
(
d
+1
2
)
and
λ
−
= (1
−
α
)
(
d
2
)
−
1
with mul-
tiplicity
(
d
2
)
. The reduced state
ρ
A
of qudit
A
is given by
ρ
A
= Tr
B
[
ρ
W
] =
I
A
/d
.
Using furthermore that Π
T
A
±
= 1
/
2(∆
1
±
(
d
±
1)∆
0
)
with ∆
0
=
|
φ
+
〉〈
φ
+
|
being a projector onto the maxi-
mally entangled state and ∆
1
=
I
−
∆
0
[3], we find
ρ
T
A
W
=
2
α
−
1
d
∆
0
+
1 +
d
−
2
α
d
∆
1
d
2
−
1
(B2)
with eigenvalues
λ
0
= (2
α
−
1)
/d
with multiplicity 1 and
λ
1
= (1 +
d
−
2
α
)
/d
(
d
2
−
1) with multiplicity
d
2
−
1.
We note that, for any
d
,
λ
0
<
0 for 0
≤
α <
1
/
2. Thus,
using the PPT condition, we find that
ρ
W
is entangled
for 0
≤
α <
1
/
2. Using the explicit expression of the
eigenvalues, we can furthermore determine Tr
[
(
ρ
T
A
)
n
]
for any
n
. We find for all local dimensions
d
Tr
[
(
ρ
T
A
)
2
]
2
>
Tr
[
(
ρ
T
A
)
3
]
for 0
≤
α <
1
2
(B3)
Thus, for Werner states the
p
3
-PPT condition is equiv-
alent to the full PPT condition. It can be furthermore
shown that Werner states are separable for
α
≥
1
/
2 [3].
Thus, for Werner states, the
p
3
-PPT condition is a neces-
sary and sufficient condition for bipartite entanglement.
This also holds true for “isotropic” states of the form
ρ
=
α
1
/d
2
+ (1
−
α
)
|
φ
+
〉〈
φ
+
|
, which are closely related.
We note that Werner states can have non-positive PT-
moments. For local dimension
d >
3 there exists a pa-
rameter interval [0
,α
∗
) such that the associated Werner
4
state (B1) obeys
p
3
= Tr
[
(
ρ
T
A
W
)
3
]
<
0 for all
α
∈
[0
,α
∗
).
This highlights that the logarithm of PT-moments, ap-
pearing also in the ratio
R
3
=
−
log
2
(
p
3
/
Tr[
ρ
3
]), need
not be properly defined, justifying a claim from the pre-
vious subsection. It is difficult to use entropic arguments
for reasoning about relations between (logarithmic) PT-
moments.
Finally, as shown in Ref. [4], we remark that
R
3
is not
an entanglement monotone. For separable Werner states
with 1
/
2
≤
α <
1
/
2 + 1
/
(2
d
), it holds that 0
< p
3
<
Tr[
ρ
3
]. Thus,
R
3
=
−
log
2
(
p
3
/
Tr[
ρ
3
]) can be greater than
zero, even for separable states. Since
R
3
equals zero for
all product states, it is not an entanglement monotone
[5].
Appendix C: Comparison of entanglement
conditions for quench dynamics
In this section, we compare the diagnostic power of the
full PPT-condition, the
p
3
-PPT condition and a condi-
tion based on purities of nested subsystems to detect bi-
partite entanglement of mixed states. Specifically, given
a reduced density matrix
ρ
AB
in a bipartite system
AB
,
we consider:
1. the PPT-condition detecting bipartite entangle-
ment between
A
and
B
for a strictly positive neg-
ativity
N
(
ρ
AB
) =
∑
λ<
0
|
λ
|
>
0, with
λ
the spec-
trum of
ρ
T
A
AB
[5].
2. the
p
3
-PPT condition detecting bipartite entangle-
ment between
A
and
B
for 1
−
p
3
/p
2
2
>
0.
3. a condition based on the purity of nested subsys-
tems detecting bipartite entanglement between
A
and
B
for Tr[
ρ
2
A
]
<
Tr[
ρ
2
AB
] with
ρ
A
= Tr
B
[
ρ
AB
]
the reduced density matrix of subsystem
A
[5].
The latter ’purity’ condition was used in previous exper-
imental works measuring the second R ́enyi entropy [6–9]
to reveal bipartite entanglement of weakly mixed states.
To test these conditions, we consider here, as an exam-
ple, quantum states generated via quench dynamics in
interacting spin models. Specifically, we study quenches
in the
XY
-model with long-range interactions, as defined
in Eq. (6) of the main text, in a total system with
N
= 10
spins. The initial separable product state is a N ́eel state
|↑↓↑↓
...
〉
.
As shown in Fig. C1, the negativity (red lines) de-
tects bipartite entanglement for all partitions sizes and
all times after the quench. The
p
3
-PPT condition (blue
lines) performs similar for the partitions considered in
panel (b) and (c) and is thus able to detect bipartite
entanglement for highly mixed states
ρ
AB
whose purity
decreases to 0
.
3 for the panel (b) at late times. The
p
3
-PPT conditions fails however to detect the entangle-
ment for the close-to completely mixed states of small
partitions
|
AB
|
= 4 at late times, displayed in panel (a).
−
1
.
0
−
0
.
5
0
.
0
0
.
5
A
=[4
,
5]
,B
=[6
,
7]
−
1
0
1
A
=[3
,
4
,
5]
,B
=[6
,
7
,
8]
0 (0.0)
1.25 (0.5)
2.5 (1.0)
3.75 (1.6)
5 (2.1)
t
[
ms
](
J
0
t
)
0
1
2
A
=[2
,
3
,
4
,
5]
,B
=[6
,
7
,
8
,
9]
N
(
ρ
AB
)
1
−
p
3
/p
2
2
1
−
Tr
[
ρ
2
A
]
/
Tr
[
ρ
2
AB
]
a)
b)
c)
FIG. C1. Comparing conditions for bipartite entanglement
between two subsystems
A
and
B
for states generated with
quench dynamics governed by
H
XY
arising from an initial
N ́eel state in a total system with
N
= 10 spins. Modeling
the experiment of Ref. [9], we chose
J
0
= 420
s
−
1
,α
= 1
.
24,
while other parameter choices lead to similar results. In all
panels, and for all quantities, a strictly positive value signals
bipartite entanglement.
This can be attributed to the fact that the
p
3
-PPT con-
dition only relies on low order PT-moments. The purity
condition (green lines) is only useful for the detection
of entanglement for large partitions
AB
with
|
AB
|
= 8
(panel (c)). These remain weakly mixed during the entire
time evolution, since the total system of
N
= 10 spins is
described here by a pure state.
Appendix D: Error bars for PT moment predictions
Let us first review the data acquisition procedure. To
obtain meaningful information about an
N
-qubit state
ρ
, we first perform a collection of random single qubit
rotations:
ρ
7→
uρu
†
, where
u
=
u
1
⊗···⊗
u
N
and each
u
i
is chosen from a unitary 3-design. Subsequently, we
perform computational basis measurements and store the
outcome:
ρ
7→
uρu
†
7→|
k
1
,...,k
N
〉
.
(D1)
Here,
k
1
,...,k
N
∈ {
0
,
1
}
denote the measurement out-
comes on qubits 1
,...,N
. As shown in [10–12], the out-
come of this measurement provides a (single-shot) esti-
mate for the unknown state:
ˆ
ρ
=
N
⊗
i
=1
[
3(
u
i
)
†
|
k
i
〉〈
k
i
|
u
i
−
I
2
]
(D2)
This tensor product is a random matrix – the unitaries
u
(
i
)
, as well as the observed outcomes
k
i
are random –
that produces the true underlying state in expectation:
E
[ ˆ
ρ
] =
ρ.
(D3)
5
Thus, the result of a (randomly selected) single-shot mea-
surement provides a classical snapshot (D2) that repro-
duces the true underlying state in expectation. This de-
sirable feature extends to density matrices of subsystems.
Let
AB
⊂ {
1
,...,N
}
be a subset of
|
AB
| ≤
N
qubits
and let
ρ
AB
= Tr
¬
AB
(
ρ
) the associated reduced density
matrix. Then,
ˆ
ρ
AB
= Tr
¬
AB
( ˆ
ρ
) =
⊗
i
∈
AB
[
3(
u
i
)
†
|
k
i
〉〈
k
i
|
u
i
−
I
2
]
(D4)
obeys
E
[ ˆ
ρ
AB
] =
ρ
AB
.
This feature can be used to estimate linear properties
of the subsystem in question:
o
= Tr (
Oρ
AB
). Perform
M
independent repetitions of the data acquisition proce-
dure and use them to create a collection of (independent)
snapshots ˆ
ρ
(1)
AB
,...,
ˆ
ρ
(
M
)
AB
– a “classical shadow” [12] – and
form the empirical average of subsystem properties:
ˆ
o
=
1
M
M
∑
r
=1
Tr
(
O
ˆ
ρ
(
r
)
AB
)
.
(D5)
Convergence to the target value
o
=
E
[ˆ
o
] = Tr(
Oρ
AB
)
is controlled by the variance. Chebyshev’s inequality as-
serts
Pr [
|
ˆ
o
−
o
|≥
]
≤
Var [ˆ
o
]
2
=
Var [Tr(
O
AB
ˆ
ρ
AB
)]
M
2
.
(D6)
The remaining (single-shot) variance obeys the follow-
ing useful relation.
Fact 2
(Proposition 3 in [12])
.
Fix a subsystem
AB
and a
linear function
Tr(
Oρ
AB
)
. Then, the single-shot variance
associated with
ˆ
ρ
AB
defined in Eq.
(D4)
obeys
Var [Tr (
O
ˆ
ρ
AB
)]
≤
2
|
AB
|
Tr
(
O
2
)
.
(D7)
This inequality is true for any underlying state
ρ
and
bounds the variance in terms of the subsystem dimension
d
A
= 2
|
AB
|
and the Hilbert-Schmidt norm (squared) of
the observable
O
. Thus, roughly
M
≈
2
|
AB
|
Tr(
O
2
)
/
2
measurement repetitions are necessary to predict
o
up to
accuracy
.
1. Predicting quadratic properties (
p
2
)
The formalism introduced above readily extends to
predictions of higher order polynomials. The special
case of quadratic functions has already been addressed in
Refs. [9, 10, 13, 14], and Ref. [12] (for the present formal-
ism). The key idea is to represent a quadratic function
in
ρ
as a linear function on the tensor product
ρ
⊗
ρ
:
o
= Tr (
Oρ
AB
⊗
ρ
AB
)
.
(D8)
This function can be approximated by replacing
ρ
⊗
ρ
with a
symmetric
tensor product of two distinct snap-
shots ˆ
ρ
(
i
)
,
ˆ
ρ
(
j
)
(
i
6
=
j
):
1
2!
∑
π
∈S
2
ˆ
ρ
(
π
(
i
))
AB
⊗
ˆ
ρ
(
π
(
j
))
AB
=
1
2
(
ˆ
ρ
(
i
)
AB
⊗
ˆ
ρ
(
j
)
AB
+ ˆ
ρ
(
j
)
AB
⊗
ˆ
ρ
(
i
)
AB
)
.
(D9)
There are
(
M
2
)
different ways of combining a collection
of
M
snapshots ˆ
ρ
(1)
,...,
ˆ
ρ
(
M
)
in this fashion. We can
predict
o
= Tr(
Oρ
AB
⊗
ρ
AB
) by forming the empirical
average over all of them:
ˆ
o
=
(
M
2
)
−
1
∑
i<j
Tr
(
O
1
2!
∑
π
∈S
2
ˆ
ρ
(
π
(
i
))
AB
⊗
ˆ
ρ
(
π
(
j
))
AB
)
=
(
M
2
)
−
1
∑
i<j
Tr
(
O
(
s
)
ˆ
ρ
(
i
)
AB
⊗
ˆ
ρ
(
j
)
AB
)
.
(D10)
Here, we have implicitly defined the
symmetrization
O
(
s
)
of the original target function
O
. This ansatz is a special
case of Hoeffding’s U-statistics estimator [15]. Averaging
boosts convergence to the desired expectation
E
[ˆ
o
] =
o
and the speed of convergence is controlled by the variance
(D6).
Restriction to subsystems is also possible. Suppose
that
O
only acts nontrivially on a subsystem
AB
of both
state copies. Then,
ˆ
o
=
(
M
2
)
−
1
∑
i<j
Tr
(
O
(
s
)
ˆ
ρ
(
i
)
AB
⊗
ˆ
ρ
(
j
)
AB
)
(D11)
and the effective problem dimension becomes
d
2
AB
=
4
|
AB
|
. The tensor product structure (D2) of the indi-
vidual snapshots allows for generalizing linear variance
bounds to this setting. Simply view ˆ
ρ
(
i
)
AB
⊗
ˆ
ρ
(
j
)
AB
as a sin-
gle snapshot of the quantum state
ρ
AB
⊗
ρ
AB
. Fact 2
then ensures
Var
[
Tr
(
O
(
s
)
ˆ
ρ
(
i
)
AB
⊗
ˆ
ρ
(
j
)
AB
)]
≤
4
|
AB
|
Tr
(
O
2
(
s
)
)
.
(D12)
The full variance of ˆ
o
is controlled in part by this relation,
but also features linear variance terms [12, App. 6.A].
Rather than reviewing this argument in full generality,
let us focus on the task at hand: computing the variance
associated with predicting the PT-moment of order two.
Fix a bipartite subsystem
AB
of interest and rewrite
p
2
as
p
2
= Tr
(
(
ρ
(
T
A
)
AB
)
2
)
= Tr
(
ρ
2
AB
)
= Tr (Π
AB
ρ
AB
⊗
ρ
AB
)
.
(D13)
Here, Π
AB
denotes the swap operator that permutes the
entire subsystems
AB
within two copies of the global
system. We refer to Table I below for a visual deriva-
tion of this well-known relation. The swap operator is
6
symmetric under permuting tensor factors, Hermitian
(Π
†
AB
= Π
AB
) and orthogonal (Π
2
AB
=
I
AB
). These
properties ensure that the associated general estimator
(D11) can be simplified considerably:
ˆ
p
2
=
(
M
2
)
−
1
∑
i<j
Tr
(
Π
AB
ˆ
ρ
(
i
)
AB
⊗
ˆ
ρ
(
j
)
AB
)
=
(
M
2
)
−
1
∑
i<j
Tr
(
ˆ
ρ
(
i
)
AB
ˆ
ρ
(
j
)
AB
)
.
(D14)
By construction,
E
[ ˆ
p
2
] =
p
2
= Tr(
ρ
2
) and the speed of
convergence is controlled by the variance. This variance
decomposes into a linear and a quadratic part. We ex-
pand the definition of the variance:
Var [ ˆ
p
2
] =
E
[
( ˆ
p
2
−
E
[ ˆ
p
2
])
2
]
=
E
[
ˆ
p
2
2
]
−
E
[ ˆ
p
2
]
2
(D15)
=
(
M
2
)
−
2
∑
i<j
∑
k<l
(
Tr
(
ˆ
ρ
(
i
)
AB
ˆ
ρ
(
j
)
AB
)
Tr
(
ˆ
ρ
(
k
)
AB
ˆ
ρ
(
l
)
AB
)
−
Tr(
ρ
2
AB
)
2
)
.
The size and nature of each contribution depends on the
relation between the indices
i,j,k,l
[15]:
1.
all indices are distinct:
distinct indices label in-
dependent snapshots.
In this case the expec-
tation value factorizes completely and produces
E
[
Tr( ˆ
ρ
(
i
)
AB
ˆ
ρ
(
j
)
AB
)Tr( ˆ
ρ
(
k
)
AB
ˆ
ρ
(
l
)
AB
)
]
= Tr(
ρ
2
AB
)
2
. This is
completely offset by the subtraction of the expecta-
tion value squared. Hence, terms where all indices
are distinct do not contribute to the variance.
2.
exactly two indices coincide:
In this case,
the expectation value partly factorizes, e.g.
E
[
Tr( ˆ
ρ
(
i
)
AB
ˆ
ρ
(
j
)
AB
)Tr( ˆ
ρ
(
k
)
AB
ˆ
ρ
(
j
)
AB
)
]
=
E
[
Tr(
ρ
AB
ˆ
ρ
(
j
)
AB
)
2
]
for
i
6
=
k
and
j
=
l
. Such index combinations
produce a linear variance term Var [Tr(
O
ˆ
ρ
)] with
O
=
ρ
AB
. The entire sum contains
(
M
2
)(
2
1
)(
M
−
2
2
−
1
)
=
(
M
2
)
2(
M
−
2) terms of this form.
3.
two
pairs
of
indices
coincide:
there are
(
M
2
)(
2
2
)(
M
−
2
2
−
2
)
=
(
M
2
)
contributions of this form
and each of them produces a quadratic variance
Var [Tr (
O
ˆ
ρ
AB
⊗
ˆ
ρ
′
AB
)] with
O
= Π
AB
(swap).
We conclude that the variance of ˆ
p
2
decomposes into lin-
ear and quadratic terms. These can be controlled via
Rel. (D7) and Rel. (D12), respectively:
Var [ ˆ
p
2
] =
(
M
2
)
−
1
(
2(
M
−
2)Var [Tr(
ρ
AB
ˆ
ρ
AB
)]
+Var
[
Tr(Π
AB
ˆ
ρ
(1)
AB
⊗
ˆ
ρ
(2)
AB
)
])
≤
4(
M
−
2)2
|
AB
|
M
(
M
−
1)
Tr
(
ρ
2
AB
)
+
2
×
4
|
AB
|
M
(
M
−
1)
Tr
(
Π
2
AB
)
≤
4
(
2
|
AB
|
p
2
M
)
+ 4
(
2
1
.
5
|
AB
|
M
)
2
.
(D16)
Chebyshev’s inequality (D6) allows us to translate this
insight into an error bound.
Lemma 2
(Error bound for estimating
p
2
)
.
Fix a subsys-
tem
AB
of interest and suppose that we wish to estimate
p
2
= Tr
(
(
ρ
T
A
AB
)
2
)
. For
,δ >
0
, a total of
M
≥
8 max
{
2
|
AB
|
p
2
2
δ
,
2
1
.
5
|
AB
|
√
δ
}
(D17)
snapshots suffice to ensure that the estimator
(D14)
obeys
|
ˆ
p
2
−
p
2
|≤
with probability at least
1
−
δ
.
It is worthwhile to briefly discuss this two-pronged er-
ror bound. Asymptotically, i.e. for
M
→∞
, the approx-
imation error decays at a rate proportional to 1
/
√
M
.
This is the expected asymptotic decay rate for an estima-
tion procedure that relies on empirical averaging (Monte
Carlo). The actual rate is also multiplicative, i.e. the ap-
proximation error is proportional to the target
p
2
. In the
practically more relevant, non-asymptotic setting, things
can look strikingly different. For small and moderate
sample sizes
M
, the variance bound (D16) is dominated
by the next-to-leading order term (2
1
.
5
|
AB
|
>
2
|
AB
|
p
2
,
especially if
p
2
is small). Lemma 2 captures this discrep-
ancy and heralds an error decay rate proportional to 1
/M
in this regime.
Finally, we point out that the dependence on
δ
in
Eq. (D17) can be considerably improved by using me-
dian of means estimation [12]: split the total data into
equally sized chunks, construct independent estimators
and take their median. For this procedure, a sampling
rate proportional to log(1
/δ
) suffices. Moreover, median
of means is much more robust towards outlier corruption
and allows for using the same data to predict purities
of many different subsystems simultaneously. This, how-
ever, comes at the price of somewhat larger constants in
the error bound (D17) and heralds a tradeoff. In sta-
tistical terms, median of means estimation dramatically
increases confidence levels (1
−
δ
) at the cost of slightly
larger error bars (confidence intervals). This tradeoff be-
comes advantageous when one attempts to predict very
many properties from a single data set.
2. Predicting cubic properties (
p
3
and
Tr(
ρ
3
AB
)
)
Cubic properties can be predicted in much the same
fashion as quadratic properties [12].
Write
o
=
Tr (
Oρ
AB
⊗
ρ
AB
⊗
ρ
AB
) and approximate
ρ
⊗
ρ
⊗
ρ
by
a
symmetric
tensor product of three distinct snapshots
ˆ
ρ
(
i
)
AB
,
ˆ
ρ
(
j
)
AB
,
ˆ
ρ
(
k
)
AB
:
1
3!
∑
π
∈S
3
ˆ
ρ
(
π
(
i
))
AB
⊗
ˆ
ρ
(
π
(
j
))
AB
⊗
ˆ
ρ
(
π
(
k
))
AB
.
(D18)
There are
(
M
3
)
different ways of combining a collection of
M
(independent) snapshots ˆ
ρ
(1)
AB
,...,
ˆ
ρ
(
M
)
AB
in this fash-
ion. We estimate the cubic function
o
by averaging over
7
all of them (U-statistics [15]):
ˆ
o
=
(
M
3
)
−
1
∑
i<j<k
Tr
(
O
1
3!
∑
π
∈S
3
ˆ
ρ
(
π
(
i
))
⊗
ˆ
ρ
(
π
(
j
))
⊗
ˆ
ρ
(
π
(
k
))
)
.
(D19)
Once more, the variance controls the rate of convergence
to the desired target value
E
[ˆ
o
] = Tr (
Oρ
⊗
ρ
⊗
ρ
). This
variance decomposes into a linear, a quadratic and a cu-
bic part. The argument is a straightforward generaliza-
tion of the analysis from the previous subsection. Rather
than repeating the steps in full generality, we directly fo-
cus on the 3rd order PT-moment
p
3
of a subsystem
AB
:
p
3
= Tr
(
(
ρ
T
A
AB
)
3
)
.
(D20)
For notational simplicity, we suppress the subscript
AB
indicating the subsystem of interest and label the shad-
ows by lower-case indices: ˆ
ρ
(
i
)
AB
7→
ˆ
ρ
i
for
i
= 1
,...,M
.
Due to the cyclicity of the trace, the U-statistics estima-
tor simplifies to
(
M
3
)
ˆ
p
3
=
∑
i<j<k
Tr
(
1
3!
∑
π
∈S
3
ˆ
ρ
T
A
π
(
i
)
ˆ
ρ
T
A
π
(
j
)
ˆ
ρ
T
A
π
(
k
)
)
(D21)
=
∑
i<j<k
1
2
(
Tr
(
ˆ
ρ
T
A
i
ˆ
ρ
T
A
j
ˆ
ρ
T
A
k
)
+ Tr
(
ˆ
ρ
T
A
j
ˆ
ρ
T
A
i
ˆ
ρ
T
A
k
))
,
where we have moved the normalization factor
(
M
3
)
−
1
to the left hand side in order to to increase readabil-
ity. When computing the variance, we need to consider
two sums over triples of distinct indices in
{
1
,...,M
}
.
If all indices are distinct, the overall contribution van-
ishes. Otherwise the contribution depends on the number
c
∈ {
1
,
2
,
3
}
of indices the triples have in common. The
number of distinct choices for two triples with exactly
c
integers in common is
(
M
3
)(
3
c
)(
M
−
3
3
−
c
)
and we infer
(
M
3
)
Var [ ˆ
p
3
]
=
(
3
1
)(
M
−
3
2
)
Var
[
Tr
(
(
ρ
T
A
)
2
ˆ
ρ
T
A
1
)]
+
(
3
2
)(
M
−
3
1
)
Var
[
Tr
(
ρ
T
A
1
2
(
ˆ
ρ
T
A
1
ρ
T
A
2
+ ˆ
ρ
T
A
2
ˆ
ρ
T
A
1
))]
+Var
[
1
2
(
Tr
(
ˆ
ρ
T
A
1
ˆ
ρ
T
A
2
ˆ
ρ
T
A
3
)
+ Tr
(
ˆ
ρ
T
A
2
ˆ
ρ
T
A
1
ˆ
ρ
T
A
3
))]
≤
(
M
3
)(
9
M
L
+
18
M
2
Q
+
12
M
3
C
)
.
(D22)
Here, ˆ
ρ
1
,
ˆ
ρ
2
,
ˆ
ρ
3
denote independent, random realizations
of the snapshot ˆ
ρ
and we have introduced place-holders
for linear (
L
), quadratic (
Q
) and cubic (
C
) contributions,
respectively. For the task at hand, these contributions
can be bounded individually and depend on the subsys-
tem size
AB
:
1.
linear contribution:
set
O
= (
ρ
T
A
AB
)
2
for notational
brevity. We can use Tr(
O
ˆ
ρ
T
A
) = Tr(
O
T
A
ˆ
ρ
) to ab-
sorb the partial transpose in the linear function.
Rel. (D7) then ensures
L
≤
2
|
AB
|
Tr(
ρ
2
)
2
,
(D23)
where we have also used Tr((
O
T
A
)
2
) = Tr(
O
2
), as
well as Tr(
O
2
) =
‖
O
‖
2
2
≤ ‖
O
‖
2
1
= Tr(
O
)
2
, because
O
=
ρ
2
is psd.
2.
quadratic
contribution:
We
can
bring
1
2
(
Tr(
ρ
T
A
ˆ
ρ
T
A
1
ˆ
ρ
T
A
2
) + Tr(
ρ
T
A
ˆ
ρ
T
A
2
ˆ
ρ
T
A
1
)
into
the
canonical form Tr
(
O
ˆ
ρ
(1)
AB
⊗
ˆ
ρ
(2)
AB
)
by introducing
O
=
1
2
(Π
A
(
ρ
⊗
I
AB
)Π
B
+Π
B
(
ρ
⊗
I
AB
)Π
A
)
.
(D24)
We refer to Table I for a visual derivation. Here,
Π
A
and Π
B
are permutation operators that swap
the two
A
- and
B
-systems, respectively. Rel. (D12)
then ensures
Q
≤
2
2
|
AB
|
Tr
(
O
2
)
≤
2
3
|
AB
|
Tr(
ρ
2
)
.
(D25)
The final estimate follows from exploiting Π
2
A
=
Π
2
B
=
I
AB
, as well as Tr
(
ρ
2
⊗
I
2
AB
)
= 2
|
AB
|
Tr(
ρ
2
).
3.
cubic contribution:
We can bring the cubic func-
tion
1
2
(
Tr( ˆ
ρ
T
A
1
ˆ
ρ
T
A
2
ˆ
ρ
T
A
3
) + Tr( ˆ
ρ
T
A
2
ˆ
ρ
T
A
1
ˆ
ρ
T
A
3
)
)
into the
canonical form Tr
(
O
ˆ
ρ
1
⊗
ˆ
ρ
2
⊗
ˆ
ρ
3
)
by introducing
O
=
1
2
(
−→
Π
A
⊗
←−
Π
B
+
−→
Π
†
A
⊗
←−
Π
†
B
)
,
(D26)
see Table I below. Here,
−→
Π
A
is a cyclic permuta-
tion that exchanges all
A
-systems in a “forward”
fashion (
A
1
7→
A
2
,
A
2
7→
A
3
,
A
3
7→
A
1
), while
←−
Π
B
is another cyclic permutation that exchanges
all
B
-systems in a “backwards” fashion (
B
3
7→
B
2
,
B
2
7→
B
1
,
B
1
7→
B
3
). A staightforward extension
of Rel. (D12) to cubic functions implies
C
≤
2
3
|
AB
|
Tr(
O
2
)
≤
2
6
|
AB
|
,
(D27)
because permutations are orthogonal (ΠΠ
†
=
I
)
and Tr(
O
2
) is dominated by Tr(
I
AB
⊗
I
AB
⊗
I
AB
) =
2
3
|
AB
|
.
Inserting these bounds into the variance formula for
p
3
reveals
Var [ ˆ
p
3
]
≤
9
M
L
+
18
M
2
Q
+
12
M
3
C
(D28)
≤
9
2
|
AB
|
M
Tr(
ρ
2
)
2
+ 18
2
3
|
AB
|
M
2
Tr(
ρ
2
) + 12
2
6
|
AB
|
M
3
.
Combining this insight with Chebyshev’s inequality (D6)
produces a suitable error bound.
Recall that
p
2
=
Tr
(
(
ρ
T
A
)
2
)
= Tr(
ρ
2
)
∈
[2
−|
AB
|
,
1] denotes the purity of
the subsystem in question.
8
Lemma 3
(Error bound for estimating
p
3
)
.
Fix a subsys-
tem
AB
of interest and suppose that we wish to estimate
p
3
= Tr
(
(
ρ
T
A
)
3
)
. For
,δ >
0
, a total of
M
≥
39 max
{
2
|
AB
|
p
2
2
2
δ
,
2
1
.
5
|
AB
|
p
2
√
δ
,
2
2
|
AB
|
2
/
3
δ
1
/
3
}
(D29)
snapshots suffice to ensure that the estimator
(D21)
obeys
|
ˆ
p
3
−
p
3
|≤
with probability at least
1
−
δ
.
This bound on the sampling rate provides different er-
ror decay rates for different regimes. For
M
→ ∞
, the
first term in the maximum dominates and the error de-
cays at an asymptotically unavoidable rate proportional
to 1
/
√
M
. Conversely, for very small sample sizes
M
, the
third term dominates and conveys a much larger decay
rate proportional to 1
/M
3
/
2
. In the intermediate regime,
the second term may dominate and lead to a inverse lin-
ear decay rate 1
/M
, instead. The dependence on the er-
ror parameter
δ
can once more be considerably improved
(from 1
/δ
to log(1
/δ
)) by using median of means estima-
tion. This refinement also allows for using the same data
to predict the cubic PT-moment of very many subsys-
tems simultaneously [12].
Finally, we point out that the estimation error for
s
3
=
‖
ρ
‖
3
3
= Tr(
ρ
3
) can be bounded in exactly the same
fashion. For
,δ >
0, a sampling rate
M
that obeys
Rel. (D29) also ensures that the U-statistics estimator
U-statistics estimator
ˆ
s
3
=
(
M
3
)
−
1
∑
i<j<k
1
2
(
Tr
(
ˆ
ρ
i
ˆ
ρ
j
ˆ
ρ
k
)
+Tr
(
ˆ
ρ
j
ˆ
ρ
i
ˆ
ρ
k
))
(D30)
obeys
|
ˆ
s
3
−
s
3
|≤
with probability 1
−
δ
.
The proof is almost identical to the
p
3
-analysis and we
leave it as an exercise for the dedicated reader.
3. Additional numerical simulations
Here, we complement Fig. 2 of the MT by showing in
Fig. D1 statistical errors in the estimation of
p
2
and
p
3
for the ground state of the transverse Ising model
H
=
J
(
∑
i
σ
x
i
σ
x
i
+1
+
σ
z
i
) at criticality. We observe the same
scaling behavior as in the case of the GHZ state. For
p
2
[panel a)], there are indeed two regimes with different
decay rates (1
/M
and 1
/
√
M
). For
p
3
[panel b)], the
latter two decay rates 1
/M
and 1
/
√
M
are also clearly
visible. In contrast, the early regime decay rate is not as
pronounced. This is likely due to limited system sizes –
1
/M
3
/
2
does appropriately capture the decay of red dots
(largest system size considered) in the top left corner,
but seems to be absent in decay rates for smaller system
sizes.
10
0
10
1
10
2
10
3
10
4
M/
2
|
AB
|
10
−
3
10
−
2
10
−
1
10
0
10
1
Err.
p
2
|
AB
|
=2
|
AB
|
=4
|
AB
|
=6
|
AB
|
=8
10
0
10
1
10
2
10
3
10
4
M/
2
|
AB
|
10
−
3
10
−
2
10
−
1
10
0
10
1
Err.
p
3
|
AB
|
=2
|
AB
|
=4
|
AB
|
=6
|
AB
|
=8
a)
b)
FIG. D1. Statistical errors for the ground state of the trans-
verse field Ising model. Dashed lines represent scalings of
∝
1
/M
, and
∝
1
/
√
M
. In all cases, the number of measure-
ments to estimate
p
2
a) and
p
3
b) with accuracy 0
.
1 is of the
order of 100
×
2
|
AB
|
.
Appendix E: Auxiliary results and wiring diagrams
The arguments from the previous subsections make use
of identities satisfied by traces of partial transposes of
bipartite operators. Wiring diagrams – also known as
tensor network diagrams – provide a useful pictorial cal-
culus for deriving such identifies. We refer the interested
reader to Refs. [16–18] for a thorough introduction and
content ourselves here with a concise overview that will
suffice for the purposes at hand. The wiring formalism
represents operators as boxes with lines emanating from
them. These lines represent contra- (on the left) and
co-variant indices (on the right):
X
=
∑
i,j
[
X
ij
]
|
i
〉〈
j
|
=
X
i
j
.
(E1)
Two operators
X
and
Y
can be multiplied to produce an-
other operator. This corresponds to an index contraction
and is represented in the following fashion:
XY
=
∑
i,k
(
∑
j
[
X
]
ij
[
X
]
jk
)
|
i
〉〈
k
|
=
X
Y
j
i
k
.
(E2)
Transposition exchanges outgoing (contravariant) and in-
coming (covariant) indices
X
T
=
∑
i,j
[
X
]
ij
|
j
〉〈
i
|
=
X
j
i
,
(E3)
while the trace pairs up both indices and sums over them:
Tr(
X
) =
∑
i
[
X
]
ii
=
X
i
=
X
.
(E4)
We abbreviate this loop (contraction of leftmost and
rightmost indices) by putting two circles at the end points
of lines that should be contracted. This notation is not
standard, but will considerably increase the readability
of more complex contraction networks.
This basic formalism readily extends to tensor prod-
ucts if we arrange tensor product factors in parallel. For
9
expression
diagram representation
diagram reformulation
modified expression
Tr(
X
T
A
AB
Y
T
A
AB
)
X
AB
Y
AB
X
AB
Y
AB
Tr (
X
AB
Y
AB
)
Tr (
X
AB
Y
AB
)
X
AB
Y
AB
Π
B
Π
A
Y
AB
X
AB
Tr (Π
B
Π
A
X
AB
⊗
Y
AB
)
Π
B
,
Π
A
: swaps
Tr(
ρ
T
A
AB
X
T
A
AB
Y
T
A
AB
)
ρ
AB
X
AB
Y
AB
ρ
AB
I
AB
Π
B
Π
A
Y
AB
X
AB
Tr (Π
B
(
ρ
AB
⊗
I
AB
)Π
A
X
AB
⊗
Y
AB
)
Tr(
ρ
T
A
AB
Y
T
A
AB
X
T
A
AB
)
ρ
AB
Y
AB
X
AB
ρ
AB
I
AB
Π
A
Π
B
Y
AB
X
AB
Tr (Π
A
(
ρ
AB
⊗
I
AB
)Π
B
X
AB
⊗
Y
AB
)
Tr
(
X
T
A
AB
Y
T
A
AB
Z
T
A
AB
)
X
AB
Y
AB
Z
AB
−→
Π
B
←−
Π
A
Z
AB
Y
AB
X
AB
Tr
(
−→
Π
B
←−
Π
A
X
AB
⊗
Y
AB
⊗
Z
AB
)
−→
Π
B
,
←−
Π
A
: cycle permutations
Tr
(
Y
T
A
AB
X
T
A
AB
Z
T
A
AB
)
Y
AB
X
AB
Z
AB
−→
Π
A
←−
Π
B
Z
AB
Y
AB
X
AB
Tr
(
←−
Π
A
−→
Π
B
X
AB
⊗
Y
AB
⊗
Z
AB
)
←−
Π
A
−→
Π
B
=
(
−→
Π
B
←−
Π
A
)
†
TABLE I.
Reformulations of relevant tensor product expressions:
The variance bounds in Sub. D 1 and Sub. D 2 are contingent
on bringing certain expressions into canonical form, i.e. Tr (
OX
AB
⊗
Y
AB
) for bilinear functions and Tr (
O
′
X
AB
⊗
Y
AB
⊗
Z
AB
)
for trilinear ones. This table supports visual derivations for these reformulations. Expressions of interest (very left) are first
translated into wiring diagrams (center left). Subsequently, the rules of wiring calculus are used to re-arrange the diagrams
(center right). Translating them into formulas (very right) produces equivalent expressions that respect the desired structure.
instance, a bipartite operator features two parallel lines
on the left and on the right:
X
AB
=
X
AB
A
A
B
B
(E5)
The upper lines represent the system
A
, while the lower
lines represent system
B
. Two important bipartite opera-
tors are the identity
I
(do nothing) and the swap operator
Π that exchanges the systems:
I
=
and
Π
=
.
(E6)
Rules for multiplying and contracting operators readily
extend to the tensor setting. This allows us to reformu-
late well-known expressions pictorially. For instance,
Tr(
XY
) =
X
Y
=
X
Y
(E7)
=Tr (Π
X
⊗
Y
)
.
(E8)
The wiring formalism is also exceptionally well-suited to
capture partial operations, like the partial transpose:
X
T
A
AB
=
X
AB
.
(E9)
These elementary rules can be used to visually represent
more complicated expressions – like a trace of multiple
partial transposes. The wiring formalism provides a pic-
torial representation for such objects and a visual frame-
work for modifying them. In particular, it is possible to
bend, as well as unentangle, index lines and rearrange