submat.pdf

Mixed-state entanglement from local randomized measurements

–

Supplemental Material

Andreas Elben,

1, 2,

∗

Richard Kueng,

3,

∗

Hsin-Yuan (Robert) Huang,

4, 5

Rick van

Bijnen,

1, 2

Christian Kokail,

1, 2

Marcello Dalmonte,

6, 7

Pasquale Calabrese,

6, 7, 8

Barbara Kraus,

9

John Preskill,

4, 5, 10, 11

Peter Zoller,

1, 2

and Benoˆıt Vermersch

1, 2, 12

1

Center for Quantum Physics, University of Innsbruck, Innsbruck A-6020, Austria

2

Institute for Quantum Optics and Quantum Information of the Austrian Academy of Sciences, Innsbruck A-6020, Austria

3

Institute for Integrated Circuits, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040 Linz, Austria

4

Institute for Quantum Information and Matter, Caltech, Pasadena, CA, USA

5

Department of Computing and Mathematical Sciences, Caltech, Pasadena, CA, USA

6

The Abdus Salam International Center for Theoretical Physics, Strada Costiera 11, 34151 Trieste, Italy

7

SISSA, via Bonomea 265, 34136 Trieste, Italy

8

INFN, via Bonomea 265, 34136 Trieste, Italy

9

Institute for Theoretical Physics, University of Innsbruck, A–6020 Innsbruck, Austria

10

Walter Burke Institute for Theoretical Physics, Caltech, Pasadena, CA, USA

11

AWS Center for Quantum Computing, Pasadena, CA, USA

12

Univ. Grenoble Alpes, CNRS, LPMMC, 38000 Grenoble, France

Appendix A: The

p

3

-PPT condition

In this section we present, prove and discuss the

p

3

-

PPT condition. The

p

3

-PPT condition is the contraposi-

tive of the following statement about moments of positive

semidefinite matrices with unit trace.

Proposition 1.

For every positive semidefinite matrix

X

with unit trace (

Tr(

X

) = 1

) it holds that

tr(

X

2

)

2

≤

tr(

X

3

)

.

(A1)

Note that Eq. (A1) resembles the following well-known

monotonicity relation among R ́enyi entropies (see e.g.,

Ref. [1]):

S

3

(

ρ

)

≤

S

2

(

ρ

) for

S

n

(

ρ

) =

1

−

n

log

2

(

tr(

ρ

n

)

.

(A2)

However, this relation only applies to density matrices,

i.e. positive semidefinite matrices with unit trace. The

p

3

-PPT condition, in contrast, is designed to test the ab-

sence of positive semidefiniteness. Hence, it is crucial to

have a condition that does not break down if the matrix

in question has negative eigenvalues. Rel. (A1) (and its

direct proof provided in the next subsection) do achieve

this goal, while an argument based on monotonicity rela-

tions between R ́enyi entropies can break down, because

the logarithm of non-positive numbers is not properly

defined.

1. Proof of the

p

3

-PPT condition

Let

X

be a Hermitian

d

×

d

matrix with eigenvalue

decomposition

X

=

∑

d

i

=1

λ

i

|

x

i

〉〈

x

i

|

. For

p

≥

1, we intro-

∗

These authors contributed equally.

duce the Schatten-

p

norms

‖

X

‖

p

=

(

d

∑

i

=1

|

λ

i

|

p

)

1

/p

= Tr (

|

X

|

p

)

1

/p

,

where

|

X

|

=

√

X

2

=

∑

d

i

=1

|

λ

i

||

x

i

〉〈

x

i

|

denotes the

(matrix-valued) absolute value. The Schatten-

p

norms

encompass most widely used matrix norms in quantum

information. Concrete examples are the trace norm (

p

=

1), the Hilbert-Schmidt/Frobenius norm (

p

= 2) and the

operator/spectral norm (

p

=

∞

). Each Schatten-

p

norm

corresponds to the usual vector

`

p

-norm of the vector of

eigenvalues

λ

= (

λ

1

,...,λ

d

)

T

∈

R

d

:

‖

λ

‖

`

p

=

(

d

∑

i

=1

|

λ

i

|

p

)

1

/p

for

p

≥

1.

(A3)

Hence, Schatten-

p

norms inherit many desirable proper-

ties from their vector-norm counterparts. Here, we shall

use vector norm relations to derive a relation among

Schatten-

p

norms. It is based on Hoelder’s inequality

that relates the inner product

〈

v,w

〉

=

d

∑

i

=1

v

i

w

i

for

v,w

∈

R

d

(A4)

to a combination of

`

p

norms.

Fact 1

(Hoelder’s inequality for vector norms)

.

Fix

p,q

≥

1

such that

1

/p

+ 1

/q

= 1

. Then,

|〈

v,w

〉|≤

d

∑

i

=1

|

v

i

w

i

|≤‖

v

‖

`

p

‖

w

‖

`

q

(A5)

for any

v,w

∈

R

d

.

2

The well-known Cauchy-Schwarz inequality is a special

case of this fact. Set

p

=

q

= 1

/

2 to conclude

|〈

v,w

〉|≤‖

v

‖

`

2

‖

w

‖

`

2

=

〈

v,v

〉

1

/

2

〈

w,w

〉

1

/

2

.

(A6)

At the heart of our proof for the

p

3

-PPT condition is a

simple relation between Schatten-

p

norms of orders

p

=

1

,

2

,

3.

Lemma 1.

The following norm relation holds for every

Hermitian matrix

X

:

‖

X

‖

4

2

≤‖

X

‖

1

‖

X

‖

3

Proof.

Let

λ

= (

λ

1

,...,λ

d

)

T

be the

d

-dimensional vec-

tor of eigenvalues of

X

. Apply Hoelder’s inequality with

p

= 3

,q

= 3

/

2 to the inner product of this vector of

eigenvalues with itself:

Tr(

X

2

) =

〈

λ,λ

〉≤‖

λ

‖

`

3

‖

λ

‖

`

2

/

3

=

‖

X

‖

3

‖

λ

‖

`

3

/

2

.

(A7)

Next, we apply Cauchy-Schwarz to the remaining

`

3

/

2

-

norm:

‖

λ

‖

`

3

/

2

=

(

d

∑

i

=1

|

λ

i

|

3

/

2

)

2

/

3

=

(

d

∑

i

=1

|

λ

i

||

λ

i

|

1

/

2

)

2

/

3

≤

(

d

∑

i

=1

|

λ

i

|

2

)

1

/

2

(

d

∑

i

=1

|

λ

i

|

2

/

2

)

1

/

2

)

2

/

3

=

‖

λ

‖

2

/

3

`

2

‖

λ

‖

1

/

3

`

1

=

‖

X

‖

2

/

3

2

‖

X

‖

1

/

3

1

.

Inserting this relation into Eq. (A7) reveals

‖

X

‖

2

≤‖

X

‖

2

/

3

2

‖

X

‖

1

/

3

1

‖

X

‖

3

which is equivalent to the claim (take the 3rd power and

divide by

‖

X

‖

2

).

Proposition 1 is an immediate consequence of Lemma 1

and elementary properties of positive semidefinite ma-

trices. Recall that a Hermitian

d

×

d

matrix is pos-

itive semidefinite (psd) if every eigenvalue is nonnega-

tive. This in turn ensures

|

X

|

=

X

and, by extension,

‖

X

‖

p

= Tr(

X

p

)

1

/p

for all

p

≥

1.

2. Discussion and potential generalizations

The

p

3

-PPT condition tests the absence of positive

semidefiniteness based on moments Tr(

X

p

) of order

p

=

1

,

2

,

3. It is natural to wonder whether higher order mo-

ments allow the construction of more refined tests. It is

possible to show that every positive semidefinite matrix

X

with unit trace must obey

tr(

X

p

−

1

)

p

−

1

≤

tr(

X

p

)

p

−

2

for all

p >

2.

(A8)

As this is a direct extension of the

p

3

-PPT condition

(

p

= 3), we omit the proof. Unfortunately, we found nu-

merically that these direct extensions actually produce

weaker

tests for the absence of positive semidefiniteness,

i.e. there exist matrices

X

that violate the

p

3

-PPT con-

dition but satisfy Rel. (A8) for higher moments

p

≥

4.

This is not completely surprising, since Rel. (A8) com-

pares (powers of) neighboring matrix moments with or-

der (

p

−

1) and

p

. As

p

increases, these matrix mo-

ments suppress contributions of small eigenvalues ever

more strongly. In the case of partially transposed quan-

tum states, the eigenvalues are required to sum up to

one and must be contained in the interval [

−

1

/

2

,

1] [2].

Thus, the negative eigenvalues can never dominate the

spectrum and high matrix moment tests for the existence

of negative eigenvalues suffer from suppression effects.

This observation suggests that powerful tests for neg-

ative eigenvalues should involve

all

matrix moments

tr(

X

p

) up to a certain order

p

max

. It is useful to change

perspective in order to reason about potential improv-

ments. The

p

3

-PPT condition checks whether the fol-

lowing inequality is true:

F

3

(

X

) =

−

tr(

X

3

) + tr(

X

2

)

2

>

0

.

(A9)

For matrices

X

with unit trace, we can reinterpret the

matrix-valued function

F

3

(

X

) as a sum of (identical)

degree-3 polynomials applied to all eigenvalues

λ

1

,...,λ

d

of

X

. Set

p

2

= tr(

X

2

) and use tr(

X

) =

∑

d

i

=1

λ

i

= 1 to

conclude

F

3

(

X

) =

−

tr(

X

3

) + 2

p

2

tr(

X

2

)

−

p

2

tr(

X

)

=

d

∑

i

=1

(

−

λ

3

i

+ 2

p

2

λ

2

i

−

p

2

λ

i

)

=

d

∑

i

=1

−

λ

i

(

λ

i

−

p

2

)

2

=:

d

∑

i

=1

f

3

(

λ

i

)

.

(A10)

Note that the polynomial

f

3

(

x

) =

−

x

(

x

−

p

2

)

2

for

x

∈

R

(A11)

depends on

p

2

and, by extension, also on the matrix

X

.

We will come back to this aspect later. For now, we

point out that – regardless of the actual value of

p

2

–

this polynomial has three interesting properties:

f

3

(

x

)

≤

0

if

x >

0

,

f

3

(0) = 0

,

f

3

(

x

)

>

0

if

x <

0

.

(A12)

These properties reflect the behavior of another well-

known function – the

(negated) rectifier function

(ReLU):

r

(

−

x

) = max

{

0

,

−

x

}

=

{

0

if

x

≥

0

,

|

x

|

if

x <

0

.

(A13)

See Figure A1 for a visual comparison. Applying the

(negated) rectifier function to the eigenvalues of

X

would

recover the negativity:

N

(

X

) =

∑

λ

i

<

0

|

λ

i

|

=

d

∑

i

=1

r

(

−

λ

i

)

.

(A14)

3

−

0

.

5

0

.

0

.

5

1

.

0

x

−

0

.

5

0

.

0

.

5

1

.

0

f

3

(

x

) =

−

x

(

x

−

p

2

)

2

p

2

= 0

.

25

p

2

= 0

.

5

p

2

= 1

max

{−

x,

0

}

FIG. A1. Comparison of

f

3

(

x

) =

−

x

(

x

−

p

2

)

2

with the negated

rectifier function

r

(

−

x

) = max

{−

x,

0

}

for different values of

p

2

in the relevant interval [

−

1

/

2

,

1] [2].

Hence, it is instructive to interpret

F

3

(

X

) as a polyno-

mial approximation to the (non-analytic) negativity func-

tion.

On the level of polynomials, the condition

f

3

(

x

)

≤

0

whenever

x >

0 is most important. It implies that pos-

itive eigenvalues of

X

can never increase the value of

F

3

(

X

) =

∑

d

i

=1

f

3

(

λ

i

). In particular,

F

3

(

X

)

≤

0 when-

ever

X

is positive semidefinite – as stated in Proposi-

tion 1. The

p

3

-PPT condition is sound, i.e. it has no

false positives.

Conversely,

f

3

(

x

)

>

0 for

x <

0 implies that

F

3

(

X

) can

become positive if

X

has negative eigenvalues. Hence,

the

p

3

-PPT condition is not vacuous. It is capable of

detecting negative eigenvalues in many, but not all, unit-

trace matrices

X

.

Let us now return to the (matrix-dependent) param-

eter choice in Eq. (A11). In principle, every polynomial

of the form

f

(

a

)

3

(

x

) =

−

x

(

x

−

a

)

2

with

a

∈

R

obeys the

important structure constraints (A12) and therefore pro-

duces a sound test for negative eigenvalues. For fixed

X

,

the associated matrix polynomial evaluates to

F

(

a

)

3

(

X

) =

−

tr(

X

3

) + 2

a

tr(

X

2

)

−

a

2

tr(

X

)

.

(A15)

We can optimize this expression over the parameter

a

∈

R

to make the test as strong as possible. The opti-

mal choice is

a

]

= tr(

X

2

)

/

tr(

X

) and produces a matrix

polynomial that obeys

F

(

a

]

)

3

(

X

)

≥

max

a

∈

R

F

(

a

)

3

(

X

) for

X

fixed. If

X

has also unit trace, the optimal parame-

ter becomes

a

]

=

p

2

and produces the

p

3

-PPT condition

(A9).

This construction of PPT conditions readily extends

to higher order polynomials

f

p

(

x

) =

a

p

x

p

+

···

a

1

x

+

a

0

.

Increasing the degree

p

produces more expressive ansatz

functions that can approximate the (negated) rectifier

function – and its core properties – ever more accu-

rately. Viewed from this angle, it becomes apparent that

measuring more matrix moments can produce stronger

tests for detecting negative eigenvalues. However, it is

not so obvious how to choose the parameters

a

p

,...,a

0

“optimally”, or what “optimally” actually means in this

context. Some well-known polynomial approximations

of the rectifier function

r

(

−

x

) – like Taylor expansions

of

s

(

−

x

) = ln(1 + e

−

x

) (the “softplus” function) – are

not well-suited for this task, because

s

(

−

x

)

>

0 even for

x >

0. This in turn would imply that the associated test

condition may not be sound. We believe that a thorough

analysis of these questions is timely and interesting, but

would go beyond the scope of this work. We intend to

address it in future research.

Appendix B:

p

3

-PPT condition for Werner States

Werner states are bipartite quantum states in a Hilbert

space

H

AB

=

H

A

⊗H

B

with dimensions

d

A

=

d

B

≡

d

,

defined as

ρ

W

=

α

(

d

+ 1

2

)

−

1

Π

+

+ (1

−

α

)

(

d

2

)

−

1

Π

−

(B1)

with parameter

α

∈

[0

,

1] and Π

±

=

1

2

(

I

±

Π

12

) pro-

jectors onto symmetric

H

+

and anti-symmetric

H

−

sub-

spaces of

H

=

H

+

⊕H

−

, respectively [3]. Here, Π

12

=

∑

d

i,j

=1

|

i

〉〈

j

|⊗|

j

〉〈

i

|

is the swap operator. We note that

the eigenvalues of

ρ

W

are thus given as

λ

+

=

α

(

d

+1

2

)

−

1

with multiplicity

(

d

+1

2

)

and

λ

−

= (1

−

α

)

(

d

2

)

−

1

with mul-

tiplicity

(

d

2

)

. The reduced state

ρ

A

of qudit

A

is given by

ρ

A

= Tr

B

[

ρ

W

] =

I

A

/d

.

Using furthermore that Π

T

A

±

= 1

/

2(∆

1

±

(

d

±

1)∆

0

)

with ∆

0

=

|

φ

+

〉〈

φ

+

|

being a projector onto the maxi-

mally entangled state and ∆

1

=

I

−

∆

0

[3], we find

ρ

T

A

W

=

2

α

−

1

d

∆

0

+

1 +

d

−

2

α

d

∆

1

d

2

−

1

(B2)

with eigenvalues

λ

0

= (2

α

−

1)

/d

with multiplicity 1 and

λ

1

= (1 +

d

−

2

α

)

/d

(

d

2

−

1) with multiplicity

d

2

−

1.

We note that, for any

d

,

λ

0

<

0 for 0

≤

α <

1

/

2. Thus,

using the PPT condition, we find that

ρ

W

is entangled

for 0

≤

α <

1

/

2. Using the explicit expression of the

eigenvalues, we can furthermore determine Tr

[

(

ρ

T

A

)

n

]

for any

n

. We find for all local dimensions

d

Tr

[

(

ρ

T

A

)

2

]

2

>

Tr

[

(

ρ

T

A

)

3

]

for 0

≤

α <

1

2

(B3)

Thus, for Werner states the

p

3

-PPT condition is equiv-

alent to the full PPT condition. It can be furthermore

shown that Werner states are separable for

α

≥

1

/

2 [3].

Thus, for Werner states, the

p

3

-PPT condition is a neces-

sary and sufficient condition for bipartite entanglement.

This also holds true for “isotropic” states of the form

ρ

=

α

1

/d

2

+ (1

−

α

)

|

φ

+

〉〈

φ

+

|

, which are closely related.

We note that Werner states can have non-positive PT-

moments. For local dimension

d >

3 there exists a pa-

rameter interval [0

,α

∗

) such that the associated Werner

4

state (B1) obeys

p

3

= Tr

[

(

ρ

T

A

W

)

3

]

<

0 for all

α

∈

[0

,α

∗

).

This highlights that the logarithm of PT-moments, ap-

pearing also in the ratio

R

3

=

−

log

2

(

p

3

/

Tr[

ρ

3

]), need

not be properly defined, justifying a claim from the pre-

vious subsection. It is difficult to use entropic arguments

for reasoning about relations between (logarithmic) PT-

moments.

Finally, as shown in Ref. [4], we remark that

R

3

is not

an entanglement monotone. For separable Werner states

with 1

/

2

≤

α <

1

/

2 + 1

/

(2

d

), it holds that 0

< p

3

<

Tr[

ρ

3

]. Thus,

R

3

=

−

log

2

(

p

3

/

Tr[

ρ

3

]) can be greater than

zero, even for separable states. Since

R

3

equals zero for

all product states, it is not an entanglement monotone

[5].

Appendix C: Comparison of entanglement

conditions for quench dynamics

In this section, we compare the diagnostic power of the

full PPT-condition, the

p

3

-PPT condition and a condi-

tion based on purities of nested subsystems to detect bi-

partite entanglement of mixed states. Specifically, given

a reduced density matrix

ρ

AB

in a bipartite system

AB

,

we consider:

1. the PPT-condition detecting bipartite entangle-

ment between

A

and

B

for a strictly positive neg-

ativity

N

(

ρ

AB

) =

∑

λ<

0

|

λ

|

>

0, with

λ

the spec-

trum of

ρ

T

A

AB

[5].

2. the

p

3

-PPT condition detecting bipartite entangle-

ment between

A

and

B

for 1

−

p

3

/p

2

>

0.

3. a condition based on the purity of nested subsys-

tems detecting bipartite entanglement between

A

and

B

for Tr[

ρ

2

A

]

<

Tr[

ρ

2

AB

] with

ρ

A

= Tr

B

[

ρ

AB

]

the reduced density matrix of subsystem

A

[5].

The latter ’purity’ condition was used in previous exper-

imental works measuring the second R ́enyi entropy [6–9]

to reveal bipartite entanglement of weakly mixed states.

To test these conditions, we consider here, as an exam-

ple, quantum states generated via quench dynamics in

interacting spin models. Specifically, we study quenches

in the

XY

-model with long-range interactions, as defined

in Eq. (6) of the main text, in a total system with

N

= 10

spins. The initial separable product state is a N ́eel state

|↑↓↑↓

...

〉

.

As shown in Fig. C1, the negativity (red lines) de-

tects bipartite entanglement for all partitions sizes and

all times after the quench. The

p

3

-PPT condition (blue

lines) performs similar for the partitions considered in

panel (b) and (c) and is thus able to detect bipartite

entanglement for highly mixed states

ρ

AB

whose purity

decreases to 0

.

3 for the panel (b) at late times. The

p

3

-PPT conditions fails however to detect the entangle-

ment for the close-to completely mixed states of small

partitions

|

AB

|

= 4 at late times, displayed in panel (a).

−

1

.

0

−

0

.

5

0

.

0

.

5

A

=[4

,

5]

,B

=[6

,

7]

−

1

0

1

A

=[3

,

4

,

5]

,B

=[6

,

7

,

8]

0 (0.0)

1.25 (0.5)

2.5 (1.0)

3.75 (1.6)

5 (2.1)

t

[

ms

](

J

0

t

)

0

1

2

A

=[2

,

3

,

4

,

5]

,B

=[6

,

7

,

8

,

9]

N

(

ρ

AB

)

1

−

p

3

/p

2

2

1

−

Tr

[

ρ

2

A

]

/

Tr

[

ρ

2

AB

]

a)

b)

c)

FIG. C1. Comparing conditions for bipartite entanglement

between two subsystems

A

and

B

for states generated with

quench dynamics governed by

H

XY

arising from an initial

N ́eel state in a total system with

N

= 10 spins. Modeling

the experiment of Ref. [9], we chose

J

0

= 420

s

−

1

,α

= 1

.

24,

while other parameter choices lead to similar results. In all

panels, and for all quantities, a strictly positive value signals

bipartite entanglement.

This can be attributed to the fact that the

p

3

-PPT con-

dition only relies on low order PT-moments. The purity

condition (green lines) is only useful for the detection

of entanglement for large partitions

AB

with

|

AB

|

= 8

(panel (c)). These remain weakly mixed during the entire

time evolution, since the total system of

N

= 10 spins is

described here by a pure state.

Appendix D: Error bars for PT moment predictions

Let us first review the data acquisition procedure. To

obtain meaningful information about an

N

-qubit state

ρ

, we first perform a collection of random single qubit

rotations:

ρ

7→

uρu

†

, where

u

=

u

1

⊗···⊗

u

N

and each

u

i

is chosen from a unitary 3-design. Subsequently, we

perform computational basis measurements and store the

outcome:

ρ

7→

uρu

†

7→|

k

1

,...,k

N

〉

.

(D1)

Here,

k

1

,...,k

N

∈ {

0

,

1

}

denote the measurement out-

comes on qubits 1

,...,N

. As shown in [10–12], the out-

come of this measurement provides a (single-shot) esti-

mate for the unknown state:

ˆ

ρ

=

N

⊗

i

=1

[

3(

u

i

)

†

|

k

i

〉〈

k

i

|

u

i

−

I

2

]

(D2)

This tensor product is a random matrix – the unitaries

u

(

i

)

, as well as the observed outcomes

k

i

are random –

that produces the true underlying state in expectation:

E

[ ˆ

ρ

] =

ρ.

(D3)

5

Thus, the result of a (randomly selected) single-shot mea-

surement provides a classical snapshot (D2) that repro-

duces the true underlying state in expectation. This de-

sirable feature extends to density matrices of subsystems.

Let

AB

⊂ {

1

,...,N

}

be a subset of

|

AB

| ≤

N

qubits

and let

ρ

AB

= Tr

¬

AB

(

ρ

) the associated reduced density

matrix. Then,

ˆ

ρ

AB

= Tr

¬

AB

( ˆ

ρ

) =

⊗

i

∈

AB

[

3(

u

i

)

†

|

k

i

〉〈

k

i

|

u

i

−

I

2

]

(D4)

obeys

E

[ ˆ

ρ

AB

] =

ρ

AB

.

This feature can be used to estimate linear properties

of the subsystem in question:

o

= Tr (

Oρ

AB

). Perform

M

independent repetitions of the data acquisition proce-

dure and use them to create a collection of (independent)

snapshots ˆ

ρ

(1)

AB

,...,

ˆ

ρ

(

M

)

AB

– a “classical shadow” [12] – and

form the empirical average of subsystem properties:

ˆ

o

=

1

M

∑

r

=1

Tr

(

O

ˆ

ρ

(

r

)

AB

)

.

(D5)

Convergence to the target value

o

=

E

[ˆ

o

] = Tr(

Oρ

AB

)

is controlled by the variance. Chebyshev’s inequality as-

serts

Pr [

|

ˆ

o

−

o

|≥

]

≤

Var [ˆ

o

]

2

=

Var [Tr(

O

AB

ˆ

ρ

AB

)]

M

2

.

(D6)

The remaining (single-shot) variance obeys the follow-

ing useful relation.

Fact 2

(Proposition 3 in [12])

.

Fix a subsystem

AB

and a

linear function

Tr(

Oρ

AB

)

. Then, the single-shot variance

associated with

ˆ

ρ

AB

defined in Eq.

(D4)

obeys

Var [Tr (

O

ˆ

ρ

AB

)]

≤

2

|

AB

|

Tr

(

O

2

)

.

(D7)

This inequality is true for any underlying state

ρ

and

bounds the variance in terms of the subsystem dimension

d

A

= 2

|

AB

|

and the Hilbert-Schmidt norm (squared) of

the observable

O

. Thus, roughly

M

≈

2

|

AB

|

Tr(

O

2

)

/

2

measurement repetitions are necessary to predict

o

up to

accuracy

.

1. Predicting quadratic properties (

p

2

)

The formalism introduced above readily extends to

predictions of higher order polynomials. The special

case of quadratic functions has already been addressed in

Refs. [9, 10, 13, 14], and Ref. [12] (for the present formal-

ism). The key idea is to represent a quadratic function

in

ρ

as a linear function on the tensor product

ρ

⊗

ρ

:

o

= Tr (

Oρ

AB

⊗

ρ

AB

)

.

(D8)

This function can be approximated by replacing

ρ

⊗

ρ

with a

symmetric

tensor product of two distinct snap-

shots ˆ

ρ

(

i

)

,

ˆ

ρ

(

j

)

(

i

6

=

j

):

1

2!

∑

π

∈S

2

ˆ

ρ

(

π

(

i

))

AB

⊗

ˆ

ρ

(

π

(

j

))

AB

=

1

2

(

ˆ

ρ

(

i

)

AB

⊗

ˆ

ρ

(

j

)

AB

+ ˆ

ρ

(

j

)

AB

⊗

ˆ

ρ

(

i

)

AB

)

.

(D9)

There are

(

M

2

)

different ways of combining a collection

of

M

snapshots ˆ

ρ

(1)

,...,

ˆ

ρ

(

M

)

in this fashion. We can

predict

o

= Tr(

Oρ

AB

⊗

ρ

AB

) by forming the empirical

average over all of them:

ˆ

o

=

(

M

2

)

−

1

∑

i<j

Tr

(

O

1

2!

∑

π

∈S

2

ˆ

ρ

(

π

(

i

))

AB

⊗

ˆ

ρ

(

π

(

j

))

AB

)

=

(

M

2

)

−

1

∑

i<j

Tr

(

O

(

s

)

ˆ

ρ

(

i

)

AB

⊗

ˆ

ρ

(

j

)

AB

)

.

(D10)

Here, we have implicitly defined the

symmetrization

O

(

s

)

of the original target function

O

. This ansatz is a special

case of Hoeffding’s U-statistics estimator [15]. Averaging

boosts convergence to the desired expectation

E

[ˆ

o

] =

o

and the speed of convergence is controlled by the variance

(D6).

Restriction to subsystems is also possible. Suppose

that

O

only acts nontrivially on a subsystem

AB

of both

state copies. Then,

ˆ

o

=

(

M

2

)

−

1

∑

i<j

Tr

(

O

(

s

)

ˆ

ρ

(

i

)

AB

⊗

ˆ

ρ

(

j

)

AB

)

(D11)

and the effective problem dimension becomes

d

2

AB

=

4

|

AB

|

. The tensor product structure (D2) of the indi-

vidual snapshots allows for generalizing linear variance

bounds to this setting. Simply view ˆ

ρ

(

i

)

AB

⊗

ˆ

ρ

(

j

)

AB

as a sin-

gle snapshot of the quantum state

ρ

AB

⊗

ρ

AB

. Fact 2

then ensures

Var

[

Tr

(

O

(

s

)

ˆ

ρ

(

i

)

AB

⊗

ˆ

ρ

(

j

)

AB

)]

≤

4

|

AB

|

Tr

(

O

2

(

s

)

.

(D12)

The full variance of ˆ

o

is controlled in part by this relation,

but also features linear variance terms [12, App. 6.A].

Rather than reviewing this argument in full generality,

let us focus on the task at hand: computing the variance

associated with predicting the PT-moment of order two.

Fix a bipartite subsystem

AB

of interest and rewrite

p

2

as

p

2

= Tr

(

ρ

(

T

A

)

AB

)

2

)

= Tr

(

ρ

2

AB

)

= Tr (Π

AB

ρ

AB

⊗

ρ

AB

)

.

(D13)

Here, Π

AB

denotes the swap operator that permutes the

entire subsystems

AB

within two copies of the global

system. We refer to Table I below for a visual deriva-

tion of this well-known relation. The swap operator is

6

symmetric under permuting tensor factors, Hermitian

(Π

†

AB

= Π

AB

) and orthogonal (Π

2

AB

=

I

AB

). These

properties ensure that the associated general estimator

(D11) can be simplified considerably:

ˆ

p

2

=

(

M

2

)

−

1

∑

i<j

Tr

(

Π

AB

ˆ

ρ

(

i

)

AB

⊗

ˆ

ρ

(

j

)

AB

)

=

(

M

2

)

−

1

∑

i<j

Tr

(

ˆ

ρ

(

i

)

AB

ˆ

ρ

(

j

)

AB

)

.

(D14)

By construction,

E

[ ˆ

p

2

] =

p

2

= Tr(

ρ

2

) and the speed of

convergence is controlled by the variance. This variance

decomposes into a linear and a quadratic part. We ex-

pand the definition of the variance:

Var [ ˆ

p

2

] =

E

[

( ˆ

p

2

−

E

[ ˆ

p

2

])

2

]

=

E

[

ˆ

p

2

]

−

E

[ ˆ

p

2

]

2

(D15)

=

(

M

2

)

−

2

∑

i<j

∑

k<l

(

Tr

(

ˆ

ρ

(

i

)

AB

ˆ

ρ

(

j

)

AB

)

Tr

(

ˆ

ρ

(

k

)

AB

ˆ

ρ

(

l

)

AB

)

−

Tr(

ρ

2

AB

)

2

)

.

The size and nature of each contribution depends on the

relation between the indices

i,j,k,l

[15]:

1.

all indices are distinct:

distinct indices label in-

dependent snapshots.

In this case the expec-

tation value factorizes completely and produces

E

[

Tr( ˆ

ρ

(

i

)

AB

ˆ

ρ

(

j

)

AB

)Tr( ˆ

ρ

(

k

)

AB

ˆ

ρ

(

l

)

AB

)

]

= Tr(

ρ

2

AB

)

2

. This is

completely offset by the subtraction of the expecta-

tion value squared. Hence, terms where all indices

are distinct do not contribute to the variance.

2.

exactly two indices coincide:

In this case,

the expectation value partly factorizes, e.g.

E

[

Tr( ˆ

ρ

(

i

)

AB

ˆ

ρ

(

j

)

AB

)Tr( ˆ

ρ

(

k

)

AB

ˆ

ρ

(

j

)

AB

)

]

=

E

[

Tr(

ρ

AB

ˆ

ρ

(

j

)

AB

)

2

]

for

i

6

=

k

and

j

=

l

. Such index combinations

produce a linear variance term Var [Tr(

O

ˆ

ρ

)] with

O

=

ρ

AB

. The entire sum contains

(

M

2

)(

2

1

)(

M

−

2

−

1

)

=

(

M

2

)

2(

M

−

2) terms of this form.

3.

two

pairs

of

indices

coincide:

there are

(

M

2

)(

2

)(

M

−

2

−

2

)

=

(

M

2

)

contributions of this form

and each of them produces a quadratic variance

Var [Tr (

O

ˆ

ρ

AB

⊗

ˆ

ρ

′

AB

)] with

O

= Π

AB

(swap).

We conclude that the variance of ˆ

p

2

decomposes into lin-

ear and quadratic terms. These can be controlled via

Rel. (D7) and Rel. (D12), respectively:

Var [ ˆ

p

2

] =

(

M

2

)

−

1

(

2(

M

−

2)Var [Tr(

ρ

AB

ˆ

ρ

AB

)]

+Var

[

Tr(Π

AB

ˆ

ρ

(1)

AB

⊗

ˆ

ρ

(2)

AB

)

])

≤

4(

M

−

2)2

|

AB

|

M

(

M

−

1)

Tr

(

ρ

2

AB

)

+

2

×

4

|

AB

|

M

(

M

−

1)

Tr

(

Π

2

AB

)

≤

4

(

2

|

AB

|

p

2

M

)

+ 4

(

2

1

.

5

|

AB

|

M

)

2

.

(D16)

Chebyshev’s inequality (D6) allows us to translate this

insight into an error bound.

Lemma 2

(Error bound for estimating

p

2

)

.

Fix a subsys-

tem

AB

of interest and suppose that we wish to estimate

p

2

= Tr

(

ρ

T

A

AB

)

2

)

. For

,δ >

0

, a total of

M

≥

8 max

{

2

|

AB

|

p

2

δ

,

2

1

.

5

|

AB

|

√

δ

}

(D17)

snapshots suffice to ensure that the estimator

(D14)

obeys

|

ˆ

p

2

−

p

2

|≤

with probability at least

1

−

δ

.

It is worthwhile to briefly discuss this two-pronged er-

ror bound. Asymptotically, i.e. for

M

→∞

, the approx-

imation error decays at a rate proportional to 1

/

√

M

.

This is the expected asymptotic decay rate for an estima-

tion procedure that relies on empirical averaging (Monte

Carlo). The actual rate is also multiplicative, i.e. the ap-

proximation error is proportional to the target

p

2

. In the

practically more relevant, non-asymptotic setting, things

can look strikingly different. For small and moderate

sample sizes

M

, the variance bound (D16) is dominated

by the next-to-leading order term (2

1

.

5

|

AB

|

>

2

|

AB

|

p

2

,

especially if

p

2

is small). Lemma 2 captures this discrep-

ancy and heralds an error decay rate proportional to 1

/M

in this regime.

Finally, we point out that the dependence on

δ

in

Eq. (D17) can be considerably improved by using me-

dian of means estimation [12]: split the total data into

equally sized chunks, construct independent estimators

and take their median. For this procedure, a sampling

rate proportional to log(1

/δ

) suffices. Moreover, median

of means is much more robust towards outlier corruption

and allows for using the same data to predict purities

of many different subsystems simultaneously. This, how-

ever, comes at the price of somewhat larger constants in

the error bound (D17) and heralds a tradeoff. In sta-

tistical terms, median of means estimation dramatically

increases confidence levels (1

−

δ

) at the cost of slightly

larger error bars (confidence intervals). This tradeoff be-

comes advantageous when one attempts to predict very

many properties from a single data set.

2. Predicting cubic properties (

p

3

and

Tr(

ρ

3

AB

)

Cubic properties can be predicted in much the same

fashion as quadratic properties [12].

Write

o

=

Tr (

Oρ

AB

⊗

ρ

AB

⊗

ρ

AB

) and approximate

ρ

⊗

ρ

⊗

ρ

by

a

symmetric

tensor product of three distinct snapshots

ˆ

ρ

(

i

)

AB

,

ˆ

ρ

(

j

)

AB

,

ˆ

ρ

(

k

)

AB

:

1

3!

∑

π

∈S

3

ˆ

ρ

(

π

(

i

))

AB

⊗

ˆ

ρ

(

π

(

j

))

AB

⊗

ˆ

ρ

(

π

(

k

))

AB

.

(D18)

There are

(

M

3

)

different ways of combining a collection of

M

(independent) snapshots ˆ

ρ

(1)

AB

,...,

ˆ

ρ

(

M

)

AB

in this fash-

ion. We estimate the cubic function

o

by averaging over

7

all of them (U-statistics [15]):

ˆ

o

=

(

M

3

)

−

1

∑

i<j<k

Tr

(

O

1

3!

∑

π

∈S

3

ˆ

ρ

(

π

(

i

))

⊗

ˆ

ρ

(

π

(

j

))

⊗

ˆ

ρ

(

π

(

k

))

)

.

(D19)

Once more, the variance controls the rate of convergence

to the desired target value

E

[ˆ

o

] = Tr (

Oρ

⊗

ρ

⊗

ρ

). This

variance decomposes into a linear, a quadratic and a cu-

bic part. The argument is a straightforward generaliza-

tion of the analysis from the previous subsection. Rather

than repeating the steps in full generality, we directly fo-

cus on the 3rd order PT-moment

p

3

of a subsystem

AB

:

p

3

= Tr

(

ρ

T

A

AB

)

3

)

.

(D20)

For notational simplicity, we suppress the subscript

AB

indicating the subsystem of interest and label the shad-

ows by lower-case indices: ˆ

ρ

(

i

)

AB

7→

ˆ

ρ

i

for

i

= 1

,...,M

.

Due to the cyclicity of the trace, the U-statistics estima-

tor simplifies to

(

M

3

)

ˆ

p

3

=

∑

i<j<k

Tr

(

1

3!

∑

π

∈S

3

ˆ

ρ

T

A

π

(

i

)

ˆ

ρ

T

A

π

(

j

)

ˆ

ρ

T

A

π

(

k

)

(D21)

=

∑

i<j<k

1

2

(

Tr

(

ˆ

ρ

T

A

i

ˆ

ρ

T

A

j

ˆ

ρ

T

A

k

)

+ Tr

(

ˆ

ρ

T

A

j

ˆ

ρ

T

A

i

ˆ

ρ

T

A

k

))

,

where we have moved the normalization factor

(

M

3

)

−

1

to the left hand side in order to to increase readabil-

ity. When computing the variance, we need to consider

two sums over triples of distinct indices in

{

1

,...,M

}

.

If all indices are distinct, the overall contribution van-

ishes. Otherwise the contribution depends on the number

c

∈ {

1

,

2

,

3

}

of indices the triples have in common. The

number of distinct choices for two triples with exactly

c

integers in common is

(

M

3

)(

3

c

)(

M

−

3

−

c

)

and we infer

(

M

3

)

Var [ ˆ

p

3

]

=

(

3

1

)(

M

−

3

2

)

Var

[

Tr

(

ρ

T

A

)

2

ˆ

ρ

T

A

1

)]

+

(

3

2

)(

M

−

3

1

)

Var

[

Tr

(

ρ

T

A

1

2

(

ˆ

ρ

T

A

1

ρ

T

A

2

+ ˆ

ρ

T

A

2

ˆ

ρ

T

A

1

))]

+Var

[

1

2

(

Tr

(

ˆ

ρ

T

A

1

ˆ

ρ

T

A

2

ˆ

ρ

T

A

3

)

+ Tr

(

ˆ

ρ

T

A

2

ˆ

ρ

T

A

1

ˆ

ρ

T

A

3

))]

≤

(

M

3

)(

9

M

L

+

18

M

2

Q

+

12

M

3

C

)

.

(D22)

Here, ˆ

ρ

1

,

ˆ

ρ

2

,

ˆ

ρ

3

denote independent, random realizations

of the snapshot ˆ

ρ

and we have introduced place-holders

for linear (

L

), quadratic (

Q

) and cubic (

C

) contributions,

respectively. For the task at hand, these contributions

can be bounded individually and depend on the subsys-

tem size

AB

:

1.

linear contribution:

set

O

= (

ρ

T

A

AB

)

2

for notational

brevity. We can use Tr(

O

ˆ

ρ

T

A

) = Tr(

O

T

A

ˆ

ρ

) to ab-

sorb the partial transpose in the linear function.

Rel. (D7) then ensures

L

≤

2

|

AB

|

Tr(

ρ

2

)

2

,

(D23)

where we have also used Tr((

O

T

A

)

2

) = Tr(

O

2

), as

well as Tr(

O

2

) =

‖

O

‖

2

≤ ‖

O

‖

2

1

= Tr(

O

)

2

, because

O

=

ρ

2

is psd.

2.

quadratic

contribution:

We

can

bring

1

2

(

Tr(

ρ

T

A

ˆ

ρ

T

A

1

ˆ

ρ

T

A

2

) + Tr(

ρ

T

A

ˆ

ρ

T

A

2

ˆ

ρ

T

A

1

)

into

the

canonical form Tr

(

O

ˆ

ρ

(1)

AB

⊗

ˆ

ρ

(2)

AB

)

by introducing

O

=

1

2

(Π

A

(

ρ

⊗

I

AB

)Π

B

+Π

B

(

ρ

⊗

I

AB

)Π

A

)

.

(D24)

We refer to Table I for a visual derivation. Here,

Π

A

and Π

B

are permutation operators that swap

the two

A

- and

B

-systems, respectively. Rel. (D12)

then ensures

Q

≤

2

|

AB

|

Tr

(

O

2

)

≤

2

3

|

AB

|

Tr(

ρ

2

)

.

(D25)

The final estimate follows from exploiting Π

2

A

=

Π

2

B

=

I

AB

, as well as Tr

(

ρ

2

⊗

I

2

AB

)

= 2

|

AB

|

Tr(

ρ

2

).

3.

cubic contribution:

We can bring the cubic func-

tion

1

2

(

Tr( ˆ

ρ

T

A

1

ˆ

ρ

T

A

2

ˆ

ρ

T

A

3

) + Tr( ˆ

ρ

T

A

2

ˆ

ρ

T

A

1

ˆ

ρ

T

A

3

)

into the

canonical form Tr

(

O

ˆ

ρ

1

⊗

ˆ

ρ

2

⊗

ˆ

ρ

3

)

by introducing

O

=

1

2

(

−→

Π

A

⊗

←−

Π

B

+

−→

Π

†

A

⊗

←−

Π

†

B

)

,

(D26)

see Table I below. Here,

−→

Π

A

is a cyclic permuta-

tion that exchanges all

A

-systems in a “forward”

fashion (

A

1

7→

A

2

,

A

2

7→

A

3

,

A

3

7→

A

1

), while

←−

Π

B

is another cyclic permutation that exchanges

all

B

-systems in a “backwards” fashion (

B

3

7→

B

2

,

B

2

7→

B

1

,

B

1

7→

B

3

). A staightforward extension

of Rel. (D12) to cubic functions implies

C

≤

2

3

|

AB

|

Tr(

O

2

)

≤

2

6

|

AB

|

,

(D27)

because permutations are orthogonal (ΠΠ

†

=

I

)

and Tr(

O

2

) is dominated by Tr(

I

AB

⊗

I

AB

⊗

I

AB

) =

2

3

|

AB

|

.

Inserting these bounds into the variance formula for

p

3

reveals

Var [ ˆ

p

3

]

≤

9

M

L

+

18

M

2

Q

+

12

M

3

C

(D28)

≤

9

2

|

AB

|

M

Tr(

ρ

2

)

2

+ 18

2

3

|

AB

|

M

2

Tr(

ρ

2

) + 12

2

6

|

AB

|

M

3

.

Combining this insight with Chebyshev’s inequality (D6)

produces a suitable error bound.

Recall that

p

2

=

Tr

(

ρ

T

A

)

2

)

= Tr(

ρ

2

)

∈

[2

−|

AB

|

,

1] denotes the purity of

the subsystem in question.

8

Lemma 3

(Error bound for estimating

p

3

)

.

Fix a subsys-

tem

AB

of interest and suppose that we wish to estimate

p

3

= Tr

(

ρ

T

A

)

3

)

. For

,δ >

0

, a total of

M

≥

39 max

{

2

|

AB

|

p

2

δ

,

2

1

.

5

|

AB

|

p

2

√

δ

,

2

|

AB

|

2

/

3

δ

1

/

3

}

(D29)

snapshots suffice to ensure that the estimator

(D21)

obeys

|

ˆ

p

3

−

p

3

|≤

with probability at least

1

−

δ

.

This bound on the sampling rate provides different er-

ror decay rates for different regimes. For

M

→ ∞

, the

first term in the maximum dominates and the error de-

cays at an asymptotically unavoidable rate proportional

to 1

/

√

M

. Conversely, for very small sample sizes

M

, the

third term dominates and conveys a much larger decay

rate proportional to 1

/M

3

/

2

. In the intermediate regime,

the second term may dominate and lead to a inverse lin-

ear decay rate 1

/M

, instead. The dependence on the er-

ror parameter

δ

can once more be considerably improved

(from 1

/δ

to log(1

/δ

)) by using median of means estima-

tion. This refinement also allows for using the same data

to predict the cubic PT-moment of very many subsys-

tems simultaneously [12].

Finally, we point out that the estimation error for

s

3

=

‖

ρ

‖

3

= Tr(

ρ

3

) can be bounded in exactly the same

fashion. For

,δ >

0, a sampling rate

M

that obeys

Rel. (D29) also ensures that the U-statistics estimator

U-statistics estimator

ˆ

s

3

=

(

M

3

)

−

1

∑

i<j<k

1

2

(

Tr

(

ˆ

ρ

i

ˆ

ρ

j

ˆ

ρ

k

)

+Tr

(

ˆ

ρ

j

ˆ

ρ

i

ˆ

ρ

k

))

(D30)

obeys

|

ˆ

s

3

−

s

3

|≤

with probability 1

−

δ

.

The proof is almost identical to the

p

3

-analysis and we

leave it as an exercise for the dedicated reader.

3. Additional numerical simulations

Here, we complement Fig. 2 of the MT by showing in

Fig. D1 statistical errors in the estimation of

p

2

and

p

3

for the ground state of the transverse Ising model

H

=

J

(

∑

i

σ

x

i

σ

x

i

+1

+

σ

z

i

) at criticality. We observe the same

scaling behavior as in the case of the GHZ state. For

p

2

[panel a)], there are indeed two regimes with different

decay rates (1

/M

and 1

/

√

M

). For

p

3

[panel b)], the

latter two decay rates 1

/M

and 1

/

√

M

are also clearly

visible. In contrast, the early regime decay rate is not as

pronounced. This is likely due to limited system sizes –

1

/M

3

/

2

does appropriately capture the decay of red dots

(largest system size considered) in the top left corner,

but seems to be absent in decay rates for smaller system

sizes.

10

0

10

1

10

2

10

3

10

4

M/

2

|

AB

|

10

−

3

10

−

2

10

−

1

10

0

10

1

Err.

p

2

|

AB

|

=2

|

AB

|

=4

|

AB

|

=6

|

AB

|

=8

10

0

10

1

10

2

10

3

10

4

M/

2

|

AB

|

10

−

3

10

−

2

10

−

1

10

0

10

1

Err.

p

3

|

AB

|

=2

|

AB

|

=4

|

AB

|

=6

|

AB

|

=8

a)

b)

FIG. D1. Statistical errors for the ground state of the trans-

verse field Ising model. Dashed lines represent scalings of

∝

1

/M

, and

∝

1

/

√

M

. In all cases, the number of measure-

ments to estimate

p

2

a) and

p

3

b) with accuracy 0

.

1 is of the

order of 100

×

2

|

AB

|

.

Appendix E: Auxiliary results and wiring diagrams

The arguments from the previous subsections make use

of identities satisfied by traces of partial transposes of

bipartite operators. Wiring diagrams – also known as

tensor network diagrams – provide a useful pictorial cal-

culus for deriving such identifies. We refer the interested

reader to Refs. [16–18] for a thorough introduction and

content ourselves here with a concise overview that will

suffice for the purposes at hand. The wiring formalism

represents operators as boxes with lines emanating from

them. These lines represent contra- (on the left) and

co-variant indices (on the right):

X

=

∑

i,j

[

X

ij

]

|

i

〉〈

j

|

=

X

i

j

.

(E1)

Two operators

X

and

Y

can be multiplied to produce an-

other operator. This corresponds to an index contraction

and is represented in the following fashion:

XY

=

∑

i,k

(

∑

j

[

X

]

ij

[

X

]

jk

)

|

i

〉〈

k

|

=

X

Y

j

i

k

.

(E2)

Transposition exchanges outgoing (contravariant) and in-

coming (covariant) indices

X

T

=

∑

i,j

[

X

]

ij

|

j

〉〈

i

|

=

X

j

i

,

(E3)

while the trace pairs up both indices and sums over them:

Tr(

X

) =

∑

i

[

X

]

ii

=

X

i

=

X

.

(E4)

We abbreviate this loop (contraction of leftmost and

rightmost indices) by putting two circles at the end points

of lines that should be contracted. This notation is not

standard, but will considerably increase the readability

of more complex contraction networks.

This basic formalism readily extends to tensor prod-

ucts if we arrange tensor product factors in parallel. For

9

expression

diagram representation

diagram reformulation

modified expression

Tr(

X

T

A

AB

Y

T

A

AB

)

X

AB

Y

AB

X

AB

Y

AB

Tr (

X

AB

Y

AB

)

Tr (

X

AB

Y

AB

)

X

AB

Y

AB

Π

B

Π

A

Y

AB

X

AB

Tr (Π

B

Π

A

X

AB

⊗

Y

AB

)

Π

B

,

Π

A

: swaps

Tr(

ρ

T

A

AB

X

T

A

AB

Y

T

A

AB

)

ρ

AB

X

AB

Y

AB

ρ

AB

I

AB

Π

B

Π

A

Y

AB

X

AB

Tr (Π

B

(

ρ

AB

⊗

I

AB

)Π

A

X

AB

⊗

Y

AB

)

Tr(

ρ

T

A

AB

Y

T

A

AB

X

T

A

AB

)

ρ

AB

Y

AB

X

AB

ρ

AB

I

AB

Π

A

Π

B

Y

AB

X

AB

Tr (Π

A

(

ρ

AB

⊗

I

AB

)Π

B

X

AB

⊗

Y

AB

)

Tr

(

X

T

A

AB

Y

T

A

AB

Z

T

A

AB

)

X

AB

Y

AB

Z

AB

−→

Π

B

←−

Π

A

Z

AB

Y

AB

X

AB

Tr

(

−→

Π

B

←−

Π

A

X

AB

⊗

Y

AB

⊗

Z

AB

)

−→

Π

B

,

←−

Π

A

: cycle permutations

Tr

(

Y

T

A

AB

X

T

A

AB

Z

T

A

AB

)

Y

AB

X

AB

Z

AB

−→

Π

A

←−

Π

B

Z

AB

Y

AB

X

AB

Tr

(

←−

Π

A

−→

Π

B

X

AB

⊗

Y

AB

⊗

Z

AB

)

←−

Π

A

−→

Π

B

=

(

−→

Π

B

←−

Π

A

)

†

TABLE I.

Reformulations of relevant tensor product expressions:

The variance bounds in Sub. D 1 and Sub. D 2 are contingent

on bringing certain expressions into canonical form, i.e. Tr (

OX

AB

⊗

Y

AB

) for bilinear functions and Tr (

O

′

X

AB

⊗

Y

AB

⊗

Z

AB

)

for trilinear ones. This table supports visual derivations for these reformulations. Expressions of interest (very left) are first

translated into wiring diagrams (center left). Subsequently, the rules of wiring calculus are used to re-arrange the diagrams

(center right). Translating them into formulas (very right) produces equivalent expressions that respect the desired structure.

instance, a bipartite operator features two parallel lines

on the left and on the right:

X

AB

=

X

AB

A

B

(E5)

The upper lines represent the system

A

, while the lower

lines represent system

B

. Two important bipartite opera-

tors are the identity

I

(do nothing) and the swap operator

Π that exchanges the systems:

I

=

and

Π

=

.

(E6)

Rules for multiplying and contracting operators readily

extend to the tensor setting. This allows us to reformu-

late well-known expressions pictorially. For instance,

Tr(

XY

) =

X

Y

=

X

Y

(E7)

=Tr (Π

X

⊗

Y

)

.

(E8)

The wiring formalism is also exceptionally well-suited to

capture partial operations, like the partial transpose:

X

T

A

AB

=

X

AB

.

(E9)

These elementary rules can be used to visually represent

more complicated expressions – like a trace of multiple

partial transposes. The wiring formalism provides a pic-

torial representation for such objects and a visual frame-

work for modifying them. In particular, it is possible to

bend, as well as unentangle, index lines and rearrange