Hyperoptimized Approximate Contraction of Tensor Networks with Arbitrary Geometry
Johnnie Gray and Garnet Kin-Lic Chan
Division of Chemistry and Chemical Engineering, California Institute of Technology,
Pasadena, California 91125, USA
(Received 27 July 2022; accepted 7 December 2023; published 26 January 2024)
Tensor network contraction is central to problems ranging from many-body physics to computer science.
We describe how to approximate tensor network contraction through bond compression on arbitrary
graphs. In particular, we introduce a hyperoptimization over the compression and contraction strategy itself
to minimize error and cost. We demonstrate that our protocol outperforms both handcrafted contraction
strategies in the literature as well as recently proposed general contraction algorithms on a variety of
synthetic and physical problems on regular lattices and random regular graphs. We further showcase the
power of the approach by demonstrating approximate contraction of tensor networks for frustrated three-
dimensional lattice partition functions, for dimer counting on random regular graphs, and for accessing
the hardness transition of random tensor network models, in graphs with many thousands of tensors.
DOI: 10.1103/PhysRevX.14.011009
Subject Areas: Computational Physics, Quantum Physics, Statistical Physics

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0
International license. Further distribution of this work must maintain attribution to the author(s) and
the published article's title, journal citation, and DOI.
I. INTRODUCTION
Tensor network contraction, a summation over a product
of multidimensional quantities, is a common computational
structure. For example, this computation underlies quantum
circuit simulation [1–9], quantum many-body simulations
[10–17], evaluating classical partition functions [18–25],
decoding quantum error correcting codes [26–32], counting
solutions of satisfiability problems [25,33–39], statistical
encoding of natural language [40–43], and many other
applications. The cost of exact contraction scales, in general,
exponentially with the number of tensors. However, there is
evidence, for example, in some many-body physics
applications, that tensor networks of interest can often be
approximately contracted with satisfactory and controllable
accuracy, without necessarily incurring exponential cost
[44,45]. Many different approximation strategies for tensor
network contraction have been proposed
[12,13,17,20,23,46–50]. Especially in many-body
physics contexts, the approximate contraction algorithms are
usually tied to the geometry of a structured lattice. In this
work, we consider how to search for an optimal approximate
tensor network contraction strategy, within an approach that
can be used not only for structured lattices, but also for
arbitrary graphs. We view the essential prescription as the
order in which contractions and approximate compressions
are performed: this sequence can be summarized as a
computational tree with contraction and tensor bond com-
pression steps. Within this framework, sketched in Fig. 1,
the problem reduces to optimizing a cost function over such
computational trees: we term the macro-optimization over
trees "hyperoptimization."
As we will demonstrate in
several examples, optimizing a simple cost function related
to the memory or computational cost of the contraction
also leads to an approximate contraction tree with small
contraction error. Consequently, our hyperoptimized
approximate contraction enables the efficient and accurate
simulation of a wide range of graphs encountered in different
tasks, bringing the possibility of eliminating, or otherwise
improving on, formal exponential costs. In addition, in the
structured lattices arising in many-body physics simulations,
we observe that we can improve on the best physically
motivated approximate contraction schemes in the literature.
A tensor T is a multi-index quantity (i.e., a multidimen-
sional array). We use lower indices to index into the tensor,
e.g., T_{i_1 i_2 ... i_n} is an element of an n-index T, and upper
indices to label a specific tensor, e.g., T^{[1]}, T^{[2]}, ..., out of a
set of tensors. A tensor network contraction sums (con-
tracts) over the (possibly shared) indices of a product of
tensors,
\begin{equation}
T_{\{e_{\mathrm{out}}\}} = \sum_{\{e\}/\{e_{\mathrm{out}}\}} \prod_v T^{[v]}_{\{e_v\}}, \tag{1}
\end{equation}
where {e} is the total set of indices, {e_out} is the subset
that is left uncontracted, and {e_v} is the subset of {e} for
tensor T^{[v]}. We can place the tensors at the vertices v of a
network (graph), with the bonds (edges) corresponding to
the indices {e}. An example of contraction is shown in
Fig. 2(a).
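As a concrete illustration of Eq. (1), the following minimal numpy sketch (our example,
not taken from the paper) contracts a three-tensor network on a triangle graph down to a
scalar:

```python
import numpy as np

# Eq. (1) for a triangle of three tensors sharing bonds a, b, c;
# {e_out} is empty, so the contraction yields a scalar T = Z.
D = 4
rng = np.random.default_rng(0)
T1 = rng.normal(size=(D, D))  # indices (a, b)
T2 = rng.normal(size=(D, D))  # indices (b, c)
T3 = rng.normal(size=(D, D))  # indices (c, a)

Z = np.einsum("ab,bc,ca->", T1, T2, T3)  # sum over {e} = {a, b, c}
```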
In practice, the sum in Eq. (1) is performed as a sequence
of pairwise contractions, and the order of contraction
greatly affects both the memory and time costs. Much
recent work has been devoted to optimizing contraction
paths in the context of simulating quantum circuits [6–9].
Parametrized heuristics that efficiently sample the space of
contraction paths, for example, by graph partitioning, are
crucial, and optimizing the parameters of such heuristics
(hyperoptimization) to minimize the overall cost has
proven particularly powerful, leading to dramatic reduc-
tions in contraction cost (i.e., many orders of magnitude).
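The path dependence of exact contraction cost can be seen even with numpy's built-in
path search, a minimal stand-in (our example) for the far more powerful hyperoptimizers
of Refs. [6–9]:

```python
import numpy as np

# Different pairwise orders of the same contraction can have very
# different costs; numpy can search over orders itself.
D = 8
A, B, C = (np.random.rand(D, D, D) for _ in range(3))
path, info = np.einsum_path("abc,cbd,dea->e", A, B, C,
                            optimize="greedy")
print(info)  # reports the chosen pairwise order and estimated FLOPs
```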
Here we extend the ideas of hyperoptimized tensor
network contraction to the setting of approximate tensor
network contraction. As discussed above, approximate
contraction has a long history in many-body simulation,
but such work has focused on regular lattices. Although
several recent contributions have addressed arbitrary graphs
[30,51,52], with a fixed contraction strategy, they do not
focus on optimizing the strategy itself. In part, this is
because there is a great deal of flexibility (and thus many
components to optimize) when formulating an approximate
contraction algorithm, and because an easily computable
metric of quality is not clear a priori.
We proceed by first formulating the search space of
approximate tensor network contraction algorithms, which
we identify as a search over approximate contraction trees.
To reduce the search space, we define simple procedures
for gauging bonds and for deciding when to compress them
in the tree. We then discuss how to sample the large space
of trees, by optimizing
FIG. 1. Overview. The approximate contraction is specified by
a sequence of contractions and compressions, expressed as an
ordered tree. The strategy optimizes a cost function over such
trees. (a) The hyperoptimization loop. Approximate contraction
trees Υ on the graph G are suggested by the tree generator. The
tree characteristics are controlled by heuristic parameters θ and
maximum bond dimension χ. The hyperoptimizer minimizes a
cost function M or C (peak memory or computational cost).
(b) The approximate contraction tree. The tensor network T is
shown at the bottom. Moving upward, pairs of tensors are
contracted (blue lines), and singular value compressions are
performed between tensors (orange lines). By the top of the tree,
one obtains a scalar output Z, using resources ∼M or C.
FIG. 2. (a) Pairwise exact contraction of a tensor network, with the unordered contraction tree
Υ_unordered indicating the contractions. Each intermediate (green node) corresponds to a pair of
parentheses in the expression. (b) An approximate contraction tree Υ for the same network. Since
compression steps do not commute, this tree is ordered. Here, the compressions (orange lines) take
place at steps 4 and 6. (c) The sequence of contractions and compressions associated with the tree
in (b). Newly contracted tensors are in green, tensors with compressed bonds in orange.
the hyperparameters of a contraction tree generator, with
respect to the peak memory or computational cost. We
use numerical experiments to establish the success of the
strategy, comparing to existing algorithms designed for
structured lattices and for arbitrary graphs. Finally, using
the hyperoptimized approximate contraction algorithms, we
showcase the range of computational problems that can be
addressed, in many-body physics, computer science, and
complexity theory, illustrating the power of approximate
tensor network computation.
II. FRAMEWORK FOR APPROXIMATE
CONTRACTION ALGORITHMS
A. Components of approximate contraction
In an exact tensor network contraction, the computational
graph, specified by the sequence of pairs of tensors which are
contracted, can be illustrated as a computational contraction
tree. This is illustrated in Fig. 2(a), where the tensor network
is shown by the black lattice at the bottom, and the
contractions between pairs occur at the green dots in the
tree, Υ_unordered. Note that the value and cost of the exact
tensor network contraction do not depend on the order in
which the contractions are performed [53]; thus, the
contraction tree is
unordered. The problem of optimizing the cost of exact
contraction is thus a search over contraction trees to optimize
the floating point and/or memory costs.
In the process of contracting tensors, one generally creates
larger tensors, which share more bonds with their neighbors.
In approximate contraction, we aim to reduce the cost of
exact contraction by introducing an approximation error.
The most commonly employed approximation is to compress
the large tensors into smaller tensors (with fewer or smaller
indices); this is the type of numerical approximation that we
also consider here. The simplest notion of compression arises
in matrix contraction: e.g., given two D × D matrices A, B,
the contraction AB ≈ ĀB̄, where Ā is of dimension D × χ
and B̄ is of dimension χ × D, and the approximation is an
example of a low-rank matrix factorization. The singular
value decomposition (SVD) is an optimal (with respect to the
Frobenius norm) low-rank matrix factorization. Singular
value decomposition is also at the heart of compressing
tensor network bonds. For example, if we have two tensors
connected by bonds [Fig. 3(a)], we can view the bonds as
performing a matrix contraction [Fig. 3(b)], and use SVD
to replace the connecting bonds by one of dimension χ
[Fig. 3(d)]. In the general tensor network setting, however,
things are more complicated, because when compressing a
contraction between two tensors, one should consider the
other tensors in the network, which affect the approximation
error. The effect of the surrounding tensors on the compres-
sion of a given bond is commonly known as including the
"environment" or "gauge" into the compression. We consider
how to perform bond compression, including a simple way to
include environment effects into the bond compression for
general graphs, in Sec. II C.
Given a compression method, we view the approximate
tensor network contraction as composed of a sequence of
contraction and compression steps. Compressions do not
commute with contractions (or each other); thus, a con-
traction tree with compression (an approximate contraction
tree) is an ordered tree. An example tree Υ is shown in
Fig. 2(b), where in addition to the contraction operations
(the green dots), we see compressions of bonds between
tensors (the orange lines). The ordered sequence of con-
tractions and compressions is visualized in Fig. 2(c). If we
work in the setting where the compressed bond dimension
χ is specified at the start, then once the approximate
contraction tree is written down, the memory or computa-
tional cost of the contractions and compressions can be
computed. Optimizing the approximate contraction for
such costs thus corresponds to optimizing over the space
of approximate contraction trees.
The space of trees to optimize over is extremely large.
We tackle this in two ways: by defining the position of
compression steps in the tree entirely in terms of where
the contractions take place (discussed in Sec. II D),
which means we only need to optimize over the order
of the contractions; and by using the hyperoptimization
strategy, where families of trees are parametrized
by a small set of heuristic parameters, constituting a
reduced-dimensionality search space (described in
Secs. II G and II H).
FIG. 3. Primitive tensor operators for approximate contraction.
(a) Grouping of indices into left, shared, and right sets, giving a
matricization of the product AB, with rank D_AB. (b) Graphical
depiction of contracting two tensors AB → C. (c) Graphical
depiction of an isometric tensor Q such that when dimensions
with incoming arrows are grouped, Q†Q = 1. (d) Compression of
the shared bonds between two tensors A and B, via QR reduction
and truncated SVD, to new shared bond dimension χ. (e) Gauging
of the bond between A and B to generate an isometric tensor Q_A.
Ideally, we wish to minimize the error of the approximate
contraction as well as the cost, but the error is not known
a priori. It can only be examined by benchmarking the
errors of our hyperoptimized contraction trees. This is the
subject of Sec. III.
Note that other ingredients could also be included in an
approximate tensor network contraction algorithm, for
example, the use of factorization to rewire a tensor net-
work, generating a graph with more vertices [20,52], or
inserting unitary disentanglers to reduce the compression
error [54,55]. We do not currently consider these ingre-
dients in our algorithm search space, although the frame-
work is sufficiently flexible to include such ingredients in
the future. We also note that some of these additional
ingredients are targeted at the renormalization of loop
correlations in tensor networks to yield a proper renorm-
alization group (RG) flow [56]. We discuss the corner
double line model and show that it is accurately contract-
ible, at least at small bond dimension, with our approximate
strategy in the Supplemental Material (SM) [57].
To aid in our discussion of the ingredients of the
approximate contraction algorithm, and how to examine
our choices, we will use a set of standard benchmark
models, which we now discuss.
B. Models for testing
To assess our algorithmic choices, we will consider
two families of lattices and two tensor models. (Note
that these are only the tensor networks we use for testing
the algorithm; Sec. IV further considers other models
to demonstrate the power of the final protocol.) The two
types of lattices we consider are (i) the 2D square and
3D cubic lattices, which reflect the structured lattices
commonly found in many-body physics applications,
and (ii) 3-regular random graphs (graphs with random
connections between vertices, where each vertex has degree
3). On these lattices, the two types of tensors we consider
are (i) (uniaxial) Ising model tensors, at inverse temperature
β close to the critical point, and (ii) tensors with random
entries drawn uniformly from the distribution [λ, 1] (we
refer to this as the URand model). Changing λ allows us to
tune between positive tensor network contractions and
tensors with random signs, the latter case being reminiscent
of some random circuit tensor networks. In all models, the
dimension of the tensor indices of the initial tensor network
will be denoted D, while the dimension of compressed
bonds will be denoted χ; we refer to the value of the tensor
network contraction as Z, and the free energy per site is
f = -ln Z/(βN), where N is the number of spins. More
discussion of these models (as well as a treatment of corner
double line models [56]) is in the SM [57].
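For concreteness, a URand-model tensor and the free energy per site can be sketched as
follows (our reading of the definitions above, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)
lam, D = -0.5, 4
# One URand tensor: entries drawn uniformly from [lam, 1];
# here a degree-4 (2D bulk) tensor with all bond dimensions D.
T = rng.uniform(lam, 1.0, size=(D, D, D, D))

def free_energy_per_site(Z, beta, N):
    # f = -ln(Z) / (beta N), with N the number of spins
    return -np.log(Z) / (beta * N)
```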
C. Bond compression strategies
We first define how to compress the shared bonds {e_AB}
between tensors A, B. We can matricize these by grouping
the indices as {e_A}/{e_AB}, {e_AB}, and {e_B}/{e_AB}, with
effective dimensions D_A, D_AB, and D_B, respectively [see
Fig. 3(a)]. Generally, D_AB < D_A and D_AB < D_B, and so
AB is already low rank and we can avoid forming it fully.
Instead, we perform a QR decomposition (QR) of the
matricized A, B, giving
\begin{align}
AB &= Q_A (R_A R_B) Q_B \tag{2} \\
   &= Q_A (R_{AB}) Q_B, \tag{3}
\end{align}
where the Q matrices satisfy the canonical conditions
Q_A† Q_A = 1, Q_B Q_B† = 1, with the canonical direction
indicated by an arrow in the graphical notation shown in
Fig. 3(c) (detailed in the SM [57]). Then, we obtain the
compressed Ā, B̄ through the SVD of R_AB,
\begin{equation}
R_{AB} \approx U_A \sigma V_B^\dagger, \quad
\bar{A} = Q_A U_A \sigma^{1/2}, \quad
\bar{B} = \sigma^{1/2} V_B^\dagger Q_B, \tag{4}
\end{equation}
truncating to the χ maximal singular values in σ. Because of
the canonical nature of the Q matrices, truncating the SVD
of R_AB achieves an optimal compression of AB in the
matrix Frobenius norm, due to the orthogonality of Q_A, Q_B.
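A minimal numpy sketch of Eqs. (2)–(4), for matricized A (D_A × D_AB) and
B (D_AB × D_B), follows; the function name and shapes are our assumptions, not the
authors' implementation:

```python
import numpy as np

def compress_bond(A, B, chi):
    """Compress the shared bond of matricized A and B to dimension
    chi via QR reduction and truncated SVD [Eqs. (2)-(4)]."""
    QA, RA = np.linalg.qr(A)      # A = Q_A R_A, with Q_A† Q_A = 1
    QB, RB = np.linalg.qr(B.T)    # B = R_B Q_B  (LQ via QR of B.T)
    RAB = RA @ RB.T               # small D_AB x D_AB core, Eq. (3)
    U, s, Vh = np.linalg.svd(RAB)
    k = min(chi, len(s))          # keep the chi largest singular values
    sqrt_s = np.sqrt(s[:k])
    A_bar = QA @ (U[:, :k] * sqrt_s)            # Q_A U_A sigma^{1/2}
    B_bar = (sqrt_s[:, None] * Vh[:k]) @ QB.T   # sigma^{1/2} V_B† Q_B
    return A_bar, B_bar

# Sanity check: np.linalg.norm(A @ B - A_bar @ B_bar) is the optimal
# rank-chi truncation error in the Frobenius norm.
```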
Usually T will contain additional tensors connected to A,
B. We refer to the additional network of connected tensors
as the environment E, with
\begin{equation*}
T = \sum_{\{e\}/\{e_{\mathrm{out}}\}} A_{\{e_A\}} B_{\{e_B\}} E_{\{e_E\}}
\end{equation*}
[Fig. 4(a)]. To compress the bond e_AB optimally, we must
account for E. We first consider the case when E forms a tree
around the bond e_AB [Fig. 4(a)(iii)]. Then, we can perform
QR inward from the leaves of the tree, pushing the R factors
toward the bond [Figs. 4(b)(i)–4(b)(iii)]. This is a type of
gauging of the tensor network (i.e., it changes the tensors but
does not change the contraction T), and we refer to this
as setting the bond e_AB in the tree gauge; alternatively, we
can say the tensors in the tree are in the canonical form
centered around bond e_AB. This results in a similar matricized
T = Q_A (R_AB) Q_B, where Q_A, Q_B have accumulated the
products of R factors from all tensors to the left and right of
bond e_AB [Fig. 4(b)(iii)]. Then, the truncated SVD of R_AB in
Eq. (4) similarly achieves an optimal compression of e_AB
with respect to the error in T.
More generally, T may contain loops, which extend into
the environment [Fig. 4(a)], and a similarly optimal gauge
is hard to compute [58,59]. However, by cutting loops in
the environment E (i.e., not contracting some of the bonds
in the loops) we obtain a tree of tensors around bond e_AB,
e.g., a spanning tree out to a given distance r. (There are
multiple ways to cut bonds to obtain a spanning tree; the
specific spanning tree construction heuristic is given in
the SM [57].) Placing e_AB in the tree gauge (of distance r),
we can then perform the same compression by truncated
SVD, but without the guarantee of optimality since we are
neglecting loop correlations; see Fig. 4(c). However, this
type of tree gauge compression is easy to use in the general
graph setting, and thus will be the main bond compression
scheme explored in this work.
One can show [57,60–62] that performing the truncation
in Eq. (4) is equivalent to inserting the projectors
\begin{equation*}
P_A = R_B V_B \sigma^{-1/2}, \qquad
P_B = \sigma^{-1/2} U_A^\dagger R_A,
\end{equation*}
such that AB ≈ A P_A P_B B. As such, having computed
R_A and R_B from the local spanning trees, we can form and
contract P_A and P_B directly into our original tensor network
without affecting any tensors other than A and B, but still
include information from distance r away. In other words,
the steps depicted in Fig. 4 are performed virtually, which
avoids having to reset the gauge after compression.
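A sketch of this "virtual" compression, forming the insertable projectors from the
accumulated gauge factors R_A, R_B (function name ours; assumes the kept singular
values are nonzero):

```python
import numpy as np

def bond_projectors(RA, RB, chi):
    """Projectors P_A, P_B such that A B ≈ A P_A P_B B, which can be
    contracted into the network touching only A and B."""
    RAB = RA @ RB
    U, s, Vh = np.linalg.svd(RAB)
    k = min(chi, len(s))
    inv_sqrt_s = 1.0 / np.sqrt(s[:k])   # regularize here if s has zeros
    # P_A = R_B V_B sigma^{-1/2}
    PA = (RB @ Vh[:k].conj().T) * inv_sqrt_s
    # P_B = sigma^{-1/2} U_A† R_A
    PB = (inv_sqrt_s[:, None] * U[:, :k].conj().T) @ RA
    return PA, PB
```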
D. Early versus late compression
In practice, compression must be performed many times
during a tensor network contraction. It might seem natural
to perform compression immediately after two tensors are
contracted to form a tensor larger than some size threshold,
here given by a maximum bond dimension χ (early
compression). This is illustrated in Fig. 5(a). However,
as discussed above, including information from the envi-
ronment is important for the quality of compression. Early
compression means that tensors in the environment are
already compressed, decreasing their quality. An alternative
strategy is to compress a bond between tensors only when
one of them (exceeding the size threshold) is to be
contracted (late compression), as illustrated in Fig. 5(b).
By delaying the compression, more bonds or tensors in the
environment are left uncompressed, which can potentially
improve the quality of the contraction. However, late
compression will also increase the cost or memory of
contraction (as there are more large tensors to consider).
This means that it is most efficient to use late compression
when the associated gain in accuracy is large.
In Fig. 5(c), we assess the effect of early versus late
compression when contracting a 2D 16 × 16 lattice (D = 4,
URand model with tensor entries ∈ [-0.5, 1]). All com-
pressions are performed using the tree gauge (out to some
distance r; several tree distances r are shown), and we show
FIG. 4. Overview of the tree gauge for improving bond compression accuracy, suitable for arbitrary
local geometry. (a) Given the bond e_AB connecting tensors A and B, we want to take into account
information from the surrounding environment E, shown in (i). In (ii) we form a local spanning tree
(shaded bonds) up to distance r = 2 from A and B. If we consider "loop" bonds (colored orange) that
are not part of the spanning tree as cut, then the resulting local tree environment shown in (iii) can
be optimally compressed as a proxy target. (b) Depiction of the gauging process for a local tree.
In (i) tensors at distance r = 2 from A and B are QR decomposed, and the R factors (yellow circles)
are accumulated toward the bond e_AB; see Fig. 3(e). In (ii) the same happens for r = 1 tensors, and
finally in (iii) the R factors from A and B after accumulating all the outer gauges, R_A and R_B, are
contracted to form R_AB. (c) The Frobenius norm (squared) of an r = 1 local region in the tree gauge
is shown in (i). The norm of the local network with loop bonds (orange) cut is shown in (ii), which
is exactly encoded, due to the isometric tensors, as (Tr R†_AB R_AB)^{1/2}. Performing a truncated
SVD on R_AB is thus only r-locally optimal up to the presence of such loop bonds.
the relative error of the contraction ΔZ as a function of the
maximum allowed bond dimension χ. We see in this case
. We see in this case
that late compression is more accurate than early com-
pression, and that this improvement increases when using
larger tree gauge distances, reflecting the fact that the
gauging is incorporating more environment information. In
Fig. 5(d), we similarly compare early versus late compres-
sion using the tree gauge in a 3D lattice (using a hyper-
optimized Span tree as described later). In contrast to the
2D result, here we see a smaller improvement from
performing late versus early compression and from increas-
ing the tree gauge distance. This suggests that incorporating
the effect of the environment requires a more sophisticated
gauging strategy in 3D. In general, we summarize our
findings as follows: late compression is preferred when
trying to maximize accuracy for a given bond dimension χ
or size of the largest single tensor operation, while early
compression can be better when optimizing the total
computational cost or memory for a given accuracy. In our sub-
sequent calculations, we will indicate the choice of early or
late compression in the simulations.
E. Comparison of the tree gauge to other gauges
To evaluate the quality of the tree gauge compared to
other gauging or environment treatments in the literature,
we consider contractions on a 2D lattice. To isolate the
comparison to only the choice of gauge, we use the same
approximate contraction tree as used in boundary contrac-
tion; namely, contraction occurs row by row starting from
the bottom, and compression occurs left to right after the
entire row is contracted. We then use four different gauges
or environment treatments during the compression: none,
tree, boundary, and full. None corresponds to no gauging.
Tree is the tree gauge discussed above (up to distance r).
Boundary corresponds to the standard matrix product state
(MPS) boundary gauging [44,50], where, after the new row
of tensors has been contracted into the boundary, the
boundary MPS is canonicalized around the leftmost tensor
and then compressed left to right in an MPS compression
sweep (see Fig. 3 of the SM for an illustration [57]). Full
). Full
corresponds to explicitly computing the environment
E
by
approximate contraction (using the standard MPS boun-
dary contraction algorithm to contract rows from the top).
Then, for the tensors
A
,
B
sharing bond
e
AB
to be com-
pressed, the scalar value of the tensor network is
Z
¼
Tr
BEA
(where
A
,
B
,
E
have been matricized). Using the
eigenvector decomposition,
BEA
¼
L
σ
R
†
, where
L
,
R
are
left, right eigenvectors, respectively, then
e
AB
is optimally
compressed by defining
̃
B
¼
̃
L
†
B
,
̃
A
¼
A
̃
R
, where
̃
L;
̃
R
are
the eigenvectors corresponding to the eigenvalues of largest
absolute magnitude
[22,24]
. Note that the full environment
gauge is expensive, as it requires an estimate of
E
from all
the tensors in the network.
The numerical performance of the different strategies is
shown in Fig. 6 for two problems: a 32 × 32 lattice (2D
Ising model, near critical) and a 16 × 16 lattice (D = 4,
URand model with entries ∈ [-0.5, 1]). In all cases, we see
that including some environment information is better than
not including any environment ("none"). In the 2D Ising
model, as the tree distance r increases, tree gauge com-
pression converges in quality to the MPS boundary
environment scheme ("boundary"); the two are related, as
the MPS boundary corresponds to setting an infinite tree
distance r for a tree that grows only along the boundary. In
the 2D URand model, even for small r, the tree gauge
already improves on the boundary environment. The full
environment treatment yields the best compression quality
for larger χ, but this is achieved at larger cost.
FIG. 5. (a) Schematic of "early" compression, where after each pairwise contraction, any shared
bonds of total size > χ are truncated. (b) Schematic of "late" compression, where before each
pairwise contraction, any shared bonds of total size > χ are truncated. (c) Error ΔZ of contracting a
16 × 16, D = 4 TN with uniform random entries ∈ [-0.5, 1] as a function of χ, tree gauging distance
r, and either early or late compression. The TN is contracted using the standard MPS boundary
contraction algorithm. Line (band) shows median (interquartile range) over 50 instances. (d) The
same but for an approximate contraction of a 5 × 5 × 5, D = 2 tensor network with uniform random
entries ∈ [-0.4, 1]. The 3D TN is contracted using a hyperoptimized Span tree.
Our numerical results in 2D suggest that the tree gauge is a
reasonable compromise between accuracy and efficiency,
equaling or outperforming the common boundary environ-
ment strategy, while being well defined for more general
graphs.
F. Approximate contraction algorithm
Given a choice of late or early compression, and using
the tree gauge, we can explicitly write down a simple
pseudocode version of the core approximate contraction
function, Algorithm 1, which implements Fig. 1(b). The
exact form of the inner functions is detailed in the SM [57].
An alternative, that might be useful in some contexts, is to
use the compression locations to transform a tensor net-
work into an approximately equivalent but exactly con-
tractible form, by inserting a set of explicit projectors; this
is also detailed in the SM [57].
G. Generating contraction trees
After fixing the choice of early or late compression, the
subsequent location of compressions in the contraction tree
is purely determined by the contraction order. This is a
major simplification, because, when optimizing over the
approximate contraction trees, we need only optimize the
order of contractions. Nonetheless, the space of ordered
trees is still extremely large and hard to sample fully.
To simplify the search, we work within a lower-
dimensional parametrization of the search space by intro-
ducing tree generators. These heuristics generate trees
within three structural families we term Greedy, Span,
and Agglom. The specific instance of tree within each
family is defined by a set of hyperparameters that can then
be optimized. Here we describe the heuristic generators at a
high level (with a more detailed description in the SM [57]).
The input to the generators is only the tensor network
FIG. 6. (a) Error in the free energy Δf as a function of bond dimension χ for different gauging and
environments for the 2D Ising model at the critical point. (b) Contraction error ΔZ for the same
settings but on a D = 4 URand model with λ = -0.5. All contractions contract from the boundary row
by row; thus all bond compressions are for bonds on the boundary. However, different gauging is
performed before the compressions. None: no gauging. Tree: bonds are placed in the tree gauge up
to distance r, followed by "late" compression. Boundary: bonds are placed in the canonical form of
the MPS boundary before compression. Full: the environment around tensors A, B is explicitly
contracted using a counterpropagating MPS of the same bond dimension, and the bond between A
and B is then truncated to minimize the error in Tr(BEA). Note that since E is itself only approximate
and many truncations are compounded, the error overall is not guaranteed to be smaller than another
method, as seen for some small-χ points here. (c) Illustration of the different environments that a
single compression step is optimal with regard to.
ALGORITHM 1. Approximate contraction.

Input: tensor network T, ordered contraction tree Υ, maximum bond dimension χ,
tree gauge distance r, flag compress_late
// i, j, k, l label tensors T^{[i]}, ... in T.
for (i, j) ∈ Υ do
    if compress_late then
        for l ∈ NEIGHBORS(T, i) do
            if BONDSIZE(T, i, l) > χ and l ≠ j then
                COMPRESSTREEGAUGE(χ, r, T, i, l)
            end if
        end for
        for l ∈ NEIGHBORS(T, j) do
            if BONDSIZE(T, j, l) > χ and l ≠ i then
                COMPRESSTREEGAUGE(χ, r, T, j, l)
            end if
        end for
    end if
    k ← CONTRACT(T, i, j)
    if not compress_late then
        for l ∈ NEIGHBORS(T, k) do
            if BONDSIZE(T, k, l) > χ then
                COMPRESSTREEGAUGE(χ, r, T, k, l)
            end if
        end for
    end if
end for
Return: T
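A direct Python rendering of Algorithm 1 may be clearer; the tensor network container
and its helper methods below are assumptions mirroring the pseudocode, not a specific
library API:

```python
def approximate_contract(tn, tree, chi, r, compress_late):
    """Algorithm 1: walk the ordered tree, compressing oversized
    bonds in the tree gauge either before or after each pairwise
    contraction."""
    for i, j in tree:
        if compress_late:
            # compress large bonds around i and j (except e_ij) first
            for a, other in ((i, j), (j, i)):
                for l in tn.neighbors(a):
                    if tn.bondsize(a, l) > chi and l != other:
                        tn.compress_tree_gauge(chi, r, a, l)
        k = tn.contract(i, j)
        if not compress_late:
            # compress any oversized bonds of the new intermediate
            for l in tn.neighbors(k):
                if tn.bondsize(k, l) > chi:
                    tn.compress_tree_gauge(chi, r, k, l)
    return tn
```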
graph, bond sizes, and χ; the tensor entries are not
considered.
The Greedy tree generator assigns a score to each bond
in T. It then chooses the highest scoring bond, generates a
new T by simulating bond contraction and compression
(i.e., computing the new sizes and network structure), and
repeats the process, building an ordered
tree. The bond score is a combination of the tensor sizes
before and after (simulated) compression and contraction, a
measure of the centrality of the tensors (their average
distance to every other tensor), and the subgraph size of
each intermediate (i.e., how many tensors were contracted
to make the current tensor); the hyperparameters are the
linear weights of each component in the score.
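A compact sketch of this generator follows; the score below is our simplified stand-in
for the paper's, with the linear weights as the hyperparameters, and it assumes a
connected networkx graph whose edges carry a bond dimension "dim":

```python
import math
import networkx as nx

def greedy_tree(tn, chi, w_cent=0.1, w_sub=0.01):
    """Repeatedly contract the highest-scoring bond of the simulated
    network, recording the pairwise contraction order."""
    G = tn.copy()
    cent = nx.closeness_centrality(G)
    sub = {v: 1 for v in G}   # subgraph size of each intermediate
    order = []

    def new_size(i, j):
        # simulated size of the intermediate formed by (i, j), with
        # every remaining bond capped at chi (simulated compression)
        dims = [min(d["dim"], chi)
                for a in (i, j) for b, d in G[a].items() if b not in (i, j)]
        return math.prod(dims) if dims else 1

    while G.number_of_edges():
        i, j = max(G.edges, key=lambda e:
                   -math.log2(new_size(*e))
                   + w_cent * (cent[e[0]] + cent[e[1]])
                   - w_sub * (sub[e[0]] + sub[e[1]]))
        order.append((i, j))
        sub[i] += sub.pop(j)
        cent[i] = max(cent[i], cent.pop(j))
        # crude merge: parallel bonds collapse to a single edge here
        G = nx.contracted_nodes(G, i, j, self_loops=False)
    return order
```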
The Span tree generator is inspired by boundary con-
traction. It generates a directed spanning tree of the original
graph, and the contraction order is chosen to contract the
leaves inward. Contracting simultaneously along all the
branches of the tree defines an effective contraction boun-
dary that sweeps toward the root. The algorithm begins by
choosing either the most or least central node in the tensor
network (TN), i0, as an initial span, S = {i0}. It then greedily
expands to a connected node j, adding the contraction
(i, j) → i in reverse order to the path, where i ∈ S, j ∉ S.
The algorithm repeats by then considering all neighbors
of the newly expanded span S → S ∪ {j}. A few local
quantities (connectivity to S, dimensionality, centrality, and
distance to i0) are combined into a score used in the greedy
selection of the next node in the tree, and the combination
weights are the hyperparameters.
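A sketch of the Span generator, under the same assumptions (connected networkx graph;
the score weights stand in for the paper's hyperparameters):

```python
import networkx as nx

def span_tree(tn, i0, w_conn=1.0, w_cent=0.5, w_dist=0.1):
    """Grow a spanning tree outward from the root i0, recording the
    contractions in reverse so that the actual contraction sweeps
    from the leaves inward toward i0."""
    cent = nx.closeness_centrality(tn)
    dist = nx.single_source_shortest_path_length(tn, i0)
    S = {i0}
    order = []
    while len(S) < tn.number_of_nodes():
        frontier = {j for i in S for j in tn[i] if j not in S}
        # greedy pick: well connected to S, central, close to the root
        j = max(frontier, key=lambda j:
                w_conn * sum(1 for n in tn[j] if n in S)
                + w_cent * cent[j] - w_dist * dist[j])
        i = next(n for n in tn[j] if n in S)  # parent inside the span
        order.append((i, j))                  # contraction (i, j) -> i
        S.add(j)
    return order[::-1]
```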
The previous approaches grow ordered trees locally.
The Agglom tree generator explicitly considers the full
TN from the start and is inspired by renormalization group
contraction strategies in the literature [20]. Given a com-
munity size K, the generator performs a balanced parti-
tioning over the N tensors in T to find N/K roughly equal
subgraphs. These subgraphs then define intermediate ten-
sors, and the tensors within each subgraph are contracted
using the Greedy algorithm with default parameters. After
simulating the sequence of compressions and contractions,
the network of intermediate tensors defines a new "coarse-
grained" tensor network for which the agglomerative
process can be repeated. In this work, Agglom uses the
KaHyPar graph partitioner [63,64], treating the community
size K, imbalance, partitioning mode, and objective as the
tunable hyperparameters.
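One coarse-graining step can be sketched as below; we stand in for the KaHyPar
partitioner with recursive Kernighan-Lin bisection, which is enough to show the
structure (an illustrative assumption, not the paper's partitioner):

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

def agglom_step(tn, K):
    """Split the network into roughly N/K balanced subgraphs; each
    subgraph becomes one intermediate tensor of the coarse-grained
    network, on which the step can be repeated."""
    def split(nodes):
        if len(nodes) <= K:
            return [nodes]
        a, b = kernighan_lin_bisection(tn.subgraph(nodes))
        return split(set(a)) + split(set(b))

    groups = split(set(tn))
    # coarse-grained TN: one node per subgraph, with an edge wherever
    # any original bond crosses between two subgraphs
    coarse = nx.quotient_graph(tn, groups, relabel=True)
    return groups, coarse
```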
Some sample ordered contraction trees generated by the
above heuristics are shown in Fig. 7 for a 2D 8 × 8 lattice.
In particular, we observe the boundarylike contraction
order of the Span tree (contracting row by row from
the bottom) and the hierarchical RG-like structure of the
Agglom tree (forming increasing clusters); the Greedy
tree contracts simultaneously from all four corners inward,
rather than from one side like the Span tree. Note that the
Agglom tree tends to perform more contractions before
compressions are performed than the Span tree, because it
constructs many separate clusters simultaneously, and
the Greedy tree exhibits behavior intermediate between
the two.
H. Optimizing the contraction trees
We optimize the trees by tuning the hyperparameters
that generate them with respect to a cost function. Since
we also wish to sample many different trees, it is important
that the cost function is cheap to evaluate. We perform the
optimization over the hyperparameter space using Bayesian
optimization [65,66], which is designed for gradient-free,
high-dimensional optimization. The overall process is shown
in Fig. 1(a), with more detailed pseudocode in the SM [57].
Depending on the computational resources available, we
can choose the cost function to be the memory (peak memory
usage M) or the computational (floating point) cost C. For C,
we include the cost of contractions, QR, and SVD
decompositions.
We optimize the contraction trees over the hyperpara-
meters in each of the three families of ordered tree
generators. In all results with optimized trees, we used a
budget of 4096 trees, though in practice a few hundred
often achieves the same result. The practical effect of the
hyperoptimization time is considered in the SM [57].
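A sketch of the hyperoptimization loop of Fig. 1(a) is given below; we use Optuna's
sampler as a stand-in for the Bayesian optimizer, and tree_cost is an assumed helper that
builds a tree from (generator, parameters) and returns its simulated peak memory M (no
tensor data is needed):

```python
import optuna

def hyperoptimize(tn_graph, chi, tree_cost, n_trials=4096):
    """Tune the tree-generator hyperparameters to minimize the
    simulated peak memory M of the resulting contraction tree."""
    def objective(trial):
        gen = trial.suggest_categorical("generator",
                                        ["greedy", "span", "agglom"])
        params = {
            "w_cent": trial.suggest_float("w_cent", 0.0, 1.0),
            "w_sub": trial.suggest_float("w_sub", 0.0, 1.0),
        }
        return tree_cost(tn_graph, chi, gen, params)

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params
```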
I. Quality of hyperoptimization
To test the quality of the hyperoptimization and the tree
search space, we first consider a small (but nonetheless
FIG. 7. Example approximate contraction trees for a 2D 8 × 8 square lattice TN. Panels (a)–(c)
show the ordered contraction trees with compressions (orange, using the "late" strategy) explicitly
marked for the Span, Agglom, and Greedy methods, respectively. The computation proceeds from
bottom to top. Panels (d)–(f) show the same contraction trees as hierarchical communities, with the
contraction ordering proceeding from the smallest pink bands to the largest blue bands.
nontrivial to contract, due to large D and χ) square TN of
size 6 × 6. In this case, the number of tensors is small
enough that it is possible to perform an exhaustive search
over all ordered contraction trees using a branch and bound
algorithm (BB; see SM [57]); here we minimize the peak
memory M. In Fig. 8 we compare the performance of the
hyperoptimized Greedy algorithm against the exhaustive
branch and bound search (using tree gauging for compres-
sion in each case, for a fair comparison). The standard
boundary contraction algorithm is also shown as a com-
parison point.
As shown in Fig. 8(a), hyperoptimizing over the space of
Greedy trees produces performance quite similar to the
BB search and very different from the standard boundary
contraction. This indicates that the hyperoptimization is
doing a good job of searching the approximate contraction
tree space for graphs of this size. In the top panel, we
can see that the optimal contraction strategy found by BB
produces a very different contraction order to boundary
contraction, exploiting the finite size of the graph and the
targeted χ to significantly reduce M.
We can also verify that optimizing M leads to reduced
error. In Figs. 8(b) and 8(c), we show the contraction error
for the URand model with λ = -0.8, where it can be seen
that for equivalent error (∼10^{-4}, indicated by the yellow
circles) the peak memory or cost of using the hyper-
optimized Greedy or BB approximate contraction trees is
indeed much lower than that of boundary contraction.
Interestingly, the heavily optimized BB tree does not
improve on the error of the Greedy tree for a given peak
memory M.
III. BENCHMARKING HYPEROPTIMIZED
APPROXIMATE CONTRACTION TREES
A. Summary of hand-coded strategies
for regular lattices
In our benchmarking below, when considering regular
lattices, we will compare to a range of hand-coded con-
traction strategies used in the literature in many-body
physics applications, namely boundary contraction, the
corner transfer matrix renormalization group (CTMRG)
[67], and the higher-order tensor renormalization group
(HOTRG) [68]. We briefly summarize the hand-coded
strategies here. Boundary contraction (as already used
above) is a standard method in 2D, but has not been
widely applied in 3D.
FIG. 8. Performance of hyperoptimized approximate contraction for a 6 × 6 square TN with all
bonds of size D = 16. The TN is filled with uniformly random entries ∈ [λ = -0.8, 1.0] (2D URand
model). We test three different contraction trees: standard MPS boundary contraction, optimization
over Greedy trees, and brute force optimization over all approximate contraction trees (BB). The
upper insets show snapshots illustrating the boundary contraction, an example of Greedy, and the
optimal BB contraction for χ = 32. (a) The peak memory usage M of the standard MPS boundary
method, compared to the Greedy and BB algorithms, which have been optimized for each χ.
(b),(c) The error for each, plotted against peak memory M and computational cost C, respectively,
averaged over 20 instances of the URand model. The compressions are performed late using a tree
gauge distance r = 1. The yellow circles correspond to approximately equal error ΔZ ≈ 10^{-4},
using the boundary and Greedy trees, to enable the ratio of costs to be determined; e.g., the observed
speed-up of Greedy over boundary for this error is 120×.
We define a 3D version of projected entangled pair state
(PEPS) boundary contraction on a cube that first contracts
from one face of the cube toward the other side, leaving a
final 2D PEPS tensor network that is contracted by 2D
boundary contraction (with the same
χ
). CTMRG is usually
applied in 2D and to infinite systems. Here we apply
CTMRG to the finite lattice by using a finite number of
CTMRG moves
[57]
. Finally, HOTRG has been applied to
both 2D and 3D infinite simulations; here we perform
a limited number of RG steps appropriate for the finite
lattice. For both CTMRG and HOTRG, we also compute
and insert different projectors for each compression, since
we are dealing with generically inhomogeneous systems.
Illustrations of all the algorithms are given in the SM
[57]
.
B. Cost scaling with graph size
In Fig. 9 we show the computational cost (memory and
floating point cost) of hyperoptimized trees in the Greedy,
Span, and Agglom classes for a 2D square of size L × L,
a 3D cube of size L × L × L, and for 3-regular random
FIG. 9. Contraction peak memory and cost of approximate contraction trees in different families
versus the exact contraction cost and the CTMRG, HOTRG, and boundary contraction algorithms,
for different geometries (insets show sample geometry). The tree gauging distance is set to r = 0 for
this comparison, with "early" compression, and we also turn off gauging in the boundary algorithm.
The hyperoptimized trees are optimized for M and C separately. (a),(b) Peak memory M and cost C
of contracting a 2D square TN with D = 4 and χ = 32 as a function of side length L. (c),(d) Peak
memory M and cost C of contracting a 3D cube TN with D = 4 and χ = 32 as a function of side
length L. (e),(f) Peak memory M and cost C of contracting 3-regular random graphs with initial
bond dimension D = 2 and χ = 4 as a function of the number of vertices |V|. The line and shaded
band show the median and interquartile range across 20 instances, respectively.
FIG. 10. Depiction of the contraction tree found by the Span algorithm, optimized for minimum
floating point cost C, for a large 3D lattice. The colors of the highlighted edges and nodes indicate
the stage of contraction they are involved in, running from blue (earliest) to red (latest).
graphs with |V| vertices, using the early compression
strategy, and a given bond dimension χ. We compare against
the cost of contraction trees generated by boundary con-
traction, CTMRG, and HOTRG.
From this and other examples, we can make some
general observations. First, Span trees yield good costs
for simple lattices which have a regular local structure,
the Agglom tree is superior for random graphs, and the
Greedy tree works well for both sets. The markedly
different performance of the Agglom and Span trees on
random versus simple lattices suggests that these are two
good limiting cases for testing contraction heuristics.
Interestingly, in the 3D cubic case, the hyperoptimized
Span tree performs a boundarylike contraction, but rather
than contracting from one face across to the other side, it
can find a strategy that contracts all faces toward a point, as
visualized in Fig. 10. This substantially improves over the
hand-coded boundary PEPS strategy in terms of cost.
Similar observations apply to the Greedy tree, which is
similar to or outperforms both Span and hand-coded
algorithms for smaller structured lattices, although its
performance degrades for larger lattices. We also find that
Greedy trees optimize the cost function less well in other
instances of large lattices. This suggests that the search
space generated by the Greedy tree generator is limiting at
larger lattice sizes.
In the 2D square lattice, we find that compared to the
hand-coded algorithms, the Span and Greedy trees with
early compression are superior with respect to memory and
cost, even beating out the most widely used boundary MPS
strategy. At smaller system sizes, the superior performance
of Span and Greedy over boundary MPS reflects the
FIG. 11. Error versus compressed bond dimension χ of hyperoptimized approximate contraction
using optimized Span, Greedy, and Agglom trees, in comparison to boundary contraction, CTMRG,
and HOTRG, for medium-size TNs. (Insets show sample geometry.) In terms of gauging settings for
the hyperoptimized methods, in 2D we use r = 6 with late compressions, and for the others we use
r = 3 and early compression. (a) Relative error in the free energy of the 2D Ising model on a
32 × 32 square lattice close to the critical point. (b) Relative error in the contracted value of the 2D
URand model on a 16 × 16 square lattice with D = 4 in the intermediate hardness regime of λ.
(c) Relative error in the free energy of the 3D Ising model on a 6 × 6 × 6 cubic lattice close to the
critical point. (d) Relative error in the contracted value of the 3D URand model on a 5 × 5 × 5 cubic
lattice with D = 2 in the intermediate hardness regime of λ. (e) Relative error in the free energy of
the Ising model on 3-regular random graphs with |V| = 300 close to the critical point (lines and
bands show median and interquartile range across 20 instances). (f) Relative error in the contracted
value of the URand model on 3-regular random graphs with D = 2 in the intermediate hardness
regime of λ (lines and bands show median and interquartile range across 20 instances).