Nonlinear Quantum Optimization Algorithms via Efficient Ising Model Encodings

Despite extensive research efforts, few quantum algorithms for classical optimization demonstrate realizable advantage. The utility of many quantum algorithms is limited by high requisite circuit depth and nonconvex optimization landscapes. We tackle these challenges to quantum advantage with two new variational quantum algorithms, which utilize multi-basis graph encodings and nonlinear activation functions to outperform existing methods with remarkably shallow quantum circuits. Both algorithms provide a polynomial reduction in measurement complexity and either a factor of two speedup or a factor of two reduction in quantum resources. Typically, the classical simulation of such algorithms with many qubits is impossible due to the exponential scaling of traditional quantum formalism and the limitations of tensor networks. Nonetheless, the shallow circuits and moderate entanglement of our algorithms, combined with efficient tensor method-based simulation, enable us to successfully optimize the MaxCut of high-connectivity global graphs with up to 512 nodes (qubits) on a single GPU.


INTRODUCTION
NP-hard optimization problems, such as Traveling Salesman and MaxCut, are central to a wide array of fields, such as logistics, engineering, and network design [1]. Despite the classical nature of these problems, there is immense interest in identifying variational quantum algorithms (VQAs) which solve them faster or more precisely than any classical method, a concept known as quantum advantage [2-5]. One common approach is the variational quantum eigensolver (VQE), where parameterized quantum circuits are optimized through gradient descent in order to find the ground state of a problem-encoded Hamiltonian [6-8]. The quantum approximate optimization algorithm (QAOA) is a related protocol in which parameterized rotations about both an initial and a problem-encoded Hamiltonian are alternated in order to find a solution-encoded ground state [9-13]. Novel VQA encoding strategies have also been considered in [14-16]. While the approximation ratios of VQE and QAOA can surpass those of polynomial-complexity classical algorithms (e.g., Goemans-Williamson [17]) [18,19], they require between polynomially and exponentially many gates in the number of qubits n. Such circuit depths can limit the algorithms' potential to demonstrate quantum advantage, rendering them not only computationally inefficient, but also highly susceptible to quantum noise [10,11,20] and barren plateaus [21-26].
The traditional formalism of quantum mechanics scales exponentially in n, making the simulation of large quantum networks a central challenge to the development of optimization algorithms. This is due to the use of standard tensors for 1D vector states and 2D matrix operators, which scale as 2^n and 2^(2n), respectively. These intractable dimensions for quantum network simulation can be remedied by employing a decomposed tensor formalism [27]. While many varieties of decomposed tensors exist, the tensor train (TT) has proven particularly popular in the quantum sciences due to its modularity and rank structure, which have close parallels to quantum entanglement. In the TT formalism, states |ψ⟩ are represented by matrix product states (MPS) and quantum operators by matrix product operators (MPOs) [28-30]. However, the TT formalism is often unsuitable for the high-depth and high-connectivity regimes most commonly used in quantum optimization, as TT tensors quickly become prohibitively large (high rank/bond dimension) when simulating deep or complicated circuits [31]. Moreover, they are limited to only nearest-neighbor interactions. Due in part to these limitations, no simulation of more than ∼ 100 qubits [32] has demonstrated successful quantum optimization rivaling that of classical methods for nonlocal graph instances, with other large-scale implementations focusing on more restrictive problems. For instance, QAOA MaxCut optimization with up to 210 qubits has been achieved for highly local 3-regular graphs [33]. It has also been implemented with several thousand qubits when exploring only local features of global graphs, a method which did not yield high approximation ratios [34]. Moreover, large-scale MaxCut implementations using VQE have not been explored.
Our approach - This manuscript takes major steps towards near-term quantum advantage for classical optimization by introducing two novel algorithms which both outperform existing VQAs and do so with fewer quantum resources and lower computational complexity. In particular: • We devise two new methods of MaxCut graph encoding, Nonlinear Parallel VQA (NP-VQA) and Nonlinear Dense VQA (ND-VQA), which, compared to traditional VQAs, double the time-efficiency of quantum MaxCut optimizations or halve the required quantum resources, respectively. This is accomplished by encoding the classical Ising model into both the z- and x-bases of our quantum cost function. Since independent measurements of the axes are used, the Heisenberg uncertainty principle is not violated.

FIG. 1. (a) Schematic of Nonlinear Parallel VQA (NP-VQA). Two distinct graphs, G_0 and G_1, are mapped to the classical Ising model, save that G_1 utilizes the x, rather than the z, basis. This encoding is similar to that of the Heisenberg Hamiltonian (Eq. 6). In addition to containing two optimization problems, the resulting parallel loss function, L = L_p (Eq. 8), has local minima at bistable points in the (x, z) solution space and is generated by using a nonlinear activation function on the single-qubit components. (b) Schematic of Nonlinear Dense VQA (ND-VQA). n/2 nodes of an n-qubit Ising model are reassigned from σ^z to σ^x operators in a manner similar to the ZX Hamiltonian. When this mapping is combined with single-qubit measurements and nonlinear activation functions (similar to those of NP-VQA), only n/2 physical qubits are required to encode n logical qubits. The dense loss function, L = L_d (Eq. 15), has local minima at bistable points in the (x, z) solution space.
• We illustrate that NP-VQA and ND-VQA reduce runtime and hardware overhead. Moreover, they introduce additional constraints, or regularization, into the optimization problem that are beneficial to the algorithm's performance, reducing its susceptibility to local minima in the training landscape.
• As alternatives to the deep-circuit paradigms of QAOA and VQE, we highlight that sampling multiple initializations of NP-VQA and ND-VQA on shallow circuits (with depth L approximately logarithmic in n) can efficiently address MaxCut problems [35] on quantum hardware.
• We demonstrate that NP-VQA and ND-VQA, in conjunction with low-depth circuits represented by low-rank tensor networks, are capable of solving global optimization problems. Furthermore, we find that these algorithms offer considerable performance even in the absence of entanglement, opening the door for "quantum-inspired" algorithms as well.
• To combine the performance advantages of our novel algorithms with the efficiency of our TT network representations, we develop TensorLy-Quantum [36], a new software package for simulating efficient quantum circuits with decomposed tensors on CPU and GPU. TensorLy-Quantum is based on the TensorLy software family [37].
• Using TensorLy-Quantum, we simulate a MaxCut problem requiring 512 logical qubits on a single NVIDIA A100 GPU and exhibit performance superior to that of polynomial algorithms. This sets a new record for the large-scale simulation of a successful, global quantum optimization algorithm.
By introducing quantum algorithms which improve optimization performance, require fewer quantum resources, and operate on more error-resistant circuits, this manuscript offers multiple paths towards establishing quantum advantage.

MaxCut Optimization Problems
MaxCut is a partitioning problem on undirected graphs G = (V, E), where V is the set of vertices (blue orbs in Fig. 2, left) connected by the edges E (black lines connecting orbs) [35]. The objective is to assign each vertex v_i a value in {−1, 1} so as to maximize the total weight of the edges w_ij ∈ E connecting vertices of opposite sign, where any such assignment is referred to as a "cut." In this work, we consider a generalized form of the problem known as weighted MaxCut, in which w_ij can take arbitrary real values.
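As a concrete illustration, the cut value of an assignment can be scored directly from the weight matrix. The toy graph and helper below are our own hypothetical example, not one of the paper's benchmark instances:

```python
import numpy as np

def cut_value(w, assignment):
    """Total weight of edges crossing the partition.
    w: symmetric (n, n) weight matrix; assignment: entries in {-1, +1}.
    An edge (i, j) is cut exactly when assignment[i] != assignment[j]."""
    s = np.asarray(assignment, dtype=float)
    # (1 - s_i s_j) / 2 is 1 for cut edges, 0 otherwise; the factor of
    # 0.25 also corrects for each pair appearing twice in the full matrix.
    return 0.25 * np.sum(w * (1.0 - np.outer(s, s)))

# Toy weighted triangle: brute-force the optimum over all 2^n assignments.
w = np.array([[0., 2., 1.],
              [2., 0., 1.],
              [1., 1., 0.]])
best = max(cut_value(w, [1 if b & (1 << i) else -1 for i in range(3)])
           for b in range(8))
print(best)  # -> 3.0 (isolating vertex 0 cuts the weight-2 and a weight-1 edge)
```

For n = 8 and n = 20 this brute-force enumeration over 2^n assignments is exactly how the ground truths of the complete graphs below can be obtained; for n = 100 and beyond it is intractable, which is what motivates the variational approach.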
Two formulations of MaxCut exist: the NP-complete decision problem and the NP-hard optimization problem [38]. The former seeks to determine if a cut of size c or greater exists for a given graph G, whereas the latter attempts to identify the largest possible cut of G. We here focus on the more general optimization problem formulation, the ground truth of which we denote MaxCut(G). It is common practice to express this optimization in its binary quadratic form [35].

FIG. 2. (right) U encodes a circuit of depth L (here L = 4, red box) in this manuscript's layer (block) pattern: one layer (block) of single-qubit y-axis rotations R_y followed by two alternating layers of control-Z gates. The energy expectation value L = ⟨E⟩ is minimized via gradient descent. The global minimum of L corresponds to |ψ⟩ = |ψ_g⟩.

VQE Framework and Tensor Train Formalism
To complete MaxCut on a quantum computer, it is convenient to minimize the equivalent summation (Eq. 1). The problem is then reduced to finding the wavefunction |ψ⟩ which minimizes the energy expectation value ⟨E⟩ = ⟨ψ|H|ψ⟩ of the classical Ising model Hamiltonian H (Eq. 2), obtained by substituting vertices v_i with the Pauli-Z spin operators σ^z_i, as depicted in Fig. 2, where w^zz_ij = w_ij is a relabeling that specifies the zz-spin interactions. As H contains only terms in the z-basis, its eigenvectors are classical (zero-entanglement) product states |ψ_i⟩ = ⊗_n |n⟩, where |n⟩ ∈ {|0⟩, |1⟩}. We denote the lowest-eigenvalue, or "ground state," solution as |ψ_g⟩, the qubits of which form a bijection with the optimal v_i of MaxCut(G). Fig. 2 (right) depicts the VQE framework [6-8]. Eq. 1 is optimized by defining the loss function L = ⟨E⟩ and varying the parameters θ̄ of a quantum circuit with unitary U(θ̄), which acts on the input quantum state (Fig. 2, right). Without loss of generality, we define the input state as the n-qubit zero state |0⟩ = ⊗_n |0⟩. This unitary can be decomposed into Λ subunitaries U_k, generated by Hermitian operators W_j and unitary matrices M_k. Thus, the gradient g_l(Ô) = ∂⟨Ô⟩/∂θ_l of an operator Ô with respect to any parameter θ_l ∈ θ̄ can be expressed in terms of U_L and U_R, the compositions of unitaries U_k with k ≥ l and k < l, respectively. Rather than using deep circuits with extensive connectivity, we instead focus on 1D TT circuits of n qubits. In particular, we opt for tensor rings, which have periodic boundary conditions such that qubit n − 1 is connected to qubit 0. Such local connectivity makes the circuit amenable to both near-term quantum hardware [10,12] and simulation via decomposed tensors. We accomplish this simulation with TensorLy-Quantum [36], a nascent and expanding software package that strives to leverage the structure of decomposed tensors in order to simulate quantum machine learning in the most efficient, non-approximate manner possible.
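The gradient g_l can be illustrated with a minimal single-qubit sketch of the standard parameter-shift rule; this hypothetical example is ours, not the paper's circuit. For R_y(θ)|0⟩, the expectation ⟨σ^z⟩ = cos θ, and two shifted evaluations recover the exact derivative:

```python
import numpy as np

def expectation(theta):
    """<0| Ry(theta)^T Z Ry(theta) |0> = cos(theta) for one qubit."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    psi = np.array([c, s])  # Ry(theta)|0>, real-valued (sigma_y generator)
    Z = np.diag([1.0, -1.0])
    return psi @ Z @ psi

def parameter_shift_grad(theta):
    # Exact gradient from two circuit evaluations shifted by ±pi/2.
    return 0.5 * (expectation(theta + np.pi / 2) - expectation(theta - np.pi / 2))

theta = 0.3
print(parameter_shift_grad(theta), -np.sin(theta))  # both ~ -0.2955
```

On hardware, each shifted evaluation is itself a circuit run, which is why the number of required observables (discussed below for L_p) directly controls runtime.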
When judiciously constructed, TT simulations can yield a low-rank quantum formalism that permits enormous compression of state and operator spaces. Although in the quantum sciences TT methods are most frequently associated with state approximations and truncations, as in the density matrix renormalization group [39], we here advocate for their use in exact quantum simulation. Similarly, due to their local connectivity, TT decompositions in quantum computing have traditionally been employed for local optimization problems, such as 3-regular MaxCut [40]; however, we here emphasize their utility for global optimization tasks.
To analyze VQE with the TT formalism, the MPO H is generated from Eq. 2. The energy L = ⟨E⟩ is then calculated with a single large contraction (Fig. 2, right), where |ψ⟩ is an n-qubit MPS of m cores and U is the corresponding MPO unitary. As we work in the absence of quantum noise, states |ψ⟩ display time-reversal symmetry and can be fully expressed with real numbers [41]. We thus restrict our rotations to those of the Pauli-Y generator σ^y and implement a simple, repeating subunitary pattern of three layers, also known as blocks. The pattern is illustrated in Fig. 2 (right): a row of parameterized single-qubit rotations R_y(θ) (W = σ^y) and two rows of control-Z (CZ) gates, which alternate control between even and odd qubits. As each single-qubit rotation is a 2×2 dense matrix and each two-qubit control-Z gate is a rank-2 MPO of two eight-element cores, the memory requirements of the circuit representation scale only linearly in both n and L, an exponential reduction in resources compared to circuits described in traditional quantum formalism. Likewise, the TT decomposition of the input state |0⟩ requires exponentially fewer terms, as it is represented by a rank-1 MPS with just n two-element cores.
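The linear memory scaling of the zero-state MPS can be checked directly. The sketch below uses plain NumPy arrays; TensorLy-Quantum's internal core layout may differ:

```python
import numpy as np

n = 512  # qubits; a dense statevector would need 2**512 amplitudes

# |0...0> as a rank-1 matrix product state: n cores of shape
# (bond_left, physical, bond_right) = (1, 2, 1), each selecting |0>.
zero_core = np.zeros((1, 2, 1))
zero_core[0, 0, 0] = 1.0
mps = [zero_core.copy() for _ in range(n)]

elements = sum(core.size for core in mps)
print(elements)  # -> 1024, i.e. 2n stored elements instead of 2**n
```

This exactness is the key point: no truncation is involved, because the zero state genuinely has bond dimension 1.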

Nonlinear Parallel VQA (NP-VQA)
Our NP-VQA method uses a loss function which applies nonlinear activation functions to the long-range quantum Heisenberg model for x- and z-basis spin interactions (Eq. 6). For simplicity, we have chosen to neglect both external fields and y-basis interactions, although we note that the addition of these terms could both improve the algorithm's performance and permit the simultaneous optimization of three (rather than two) MaxCut instances. NP-VQA is depicted in Fig. 1a. Two distinct graphs, G_0 and G_1, are mapped to the Ising model Hamiltonian in Eq. 6, save that the weights of G_1 are denoted w^xx_ij and that its vertices are parametrized with σ^x, rather than σ^z, operators. The objective is to simultaneously solve MaxCut(G_0) and MaxCut(G_1). In order to optimize these independent graphs in parallel, we must make several alterations to standard VQE. To begin, the ground state |ψ^p_g⟩ of H_p is not only significantly more challenging to obtain variationally than that of H, it is an entangled state whose z- and x-basis spin components are mutually dependent. H_p is thus an unsuitable loss function, as it generally encodes neither MaxCut(G_0) nor MaxCut(G_1). Furthermore, the cuts of G_0 and G_1 cannot be simultaneously represented by a single quantum state due to the normalization condition of the Bloch sphere of each qubit i (Eq. 7), where equality holds for real-valued pure states. As such, linear loss functions of the spin components z and x cannot be simultaneously satisfied in this model, and a nonlinear activation function should be used.
For NP-VQA, we assign the loss function L as the parallel function L_p (Eq. 8), where tanh(x) is trivially implemented on the classical computer controlling gradient descent. We note that, as tanh(x) = 2 sig(2x) − 1, where sig(x) is the sigmoid function, this adaptation is a differentiable rescaling of the non-unity, single-qubit expectation values ⟨σ^z_i⟩ and ⟨σ^x_i⟩, such that they may independently approach, but not satisfy, the ±1 values that fully extremize single-axis Ising model Hamiltonians (Eq. 2). In particular, the tanh(x) activation function prevents the convergence of the loss function into any representation of a classical state as, on the [−1, 1] domain of the Pauli operators, the codomain of tanh(x) extends only to ≈ [−0.76, 0.76] (inset Fig. 3b). As such, L_p may only ever partially descend into local minima and is better equipped to escape their regions of attraction. Furthermore, as the gradient of tanh(x) diminishes near the ±1 poles, full optimization of one axis at the expense of the other is discouraged, and optimal cuts for both graphs can be deduced despite the normalization condition of Eq. 7. In this manner, NP-VQA (as well as ND-VQA, detailed below) is a dual-axis quantum analog to linear programming relaxations [42].
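A minimal sketch of such a tanh-activated parallel loss is below. It does not reproduce the precise sign and normalization conventions of Eq. 8, which are not restated here; we assume the common convention in which minimizing Σ w_ij s_i s_j maximizes the cut:

```python
import numpy as np

def parallel_loss(z_exp, x_exp, w_zz, w_xx):
    """Sketch of a tanh-activated two-graph loss: one graph per basis,
    built only from single-qubit expectation values.  Sign convention is
    an assumption: lower loss = better cuts on both graphs."""
    sz = np.tanh(z_exp)  # activated spins confined to ~[-0.76, 0.76]
    sx = np.tanh(x_exp)
    lz = np.sum(np.triu(w_zz, k=1) * np.outer(sz, sz))  # each edge once
    lx = np.sum(np.triu(w_xx, k=1) * np.outer(sx, sx))
    return lz + lx

w = np.array([[0., 1.], [1., 0.]])
# Anti-aligned spins in both bases lower the loss on positive-weight edges.
print(parallel_loss(np.array([1., -1.]), np.array([-1., 1.]), w, w))
```

Because tanh(±1) ≈ ±0.76, the activated spins can never fully extremize either basis's Ising terms, matching the codomain restriction discussed above.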
FIG. 3. (b) The probability that a single run will converge to an optimal cut for n = 100, using NP-VQA with L = 7 (light green), L = 1 (dark green), and VQE with L = 7 (black). While the L = 1 case is entanglement-free, it benefits from the multi-axis superposition constraint of NP-VQA.

As L_p is calculated from single-qubit measurements, it can be considered a form of measurement-based quantum computation (MBQC) [43-45]. Moreover, as the number of possible single-qubit measurements scales linearly with circuit width, L_p represents a polynomial reduction in the number of observables required to solve complete graphs, from ∼ n^2 (specifically, n(n − 1)/2 two-operator Pauli strings) to ∼ 2n (two single-qubit measurements per qubit), lowering the measurement complexity and runtime of the algorithm on real quantum hardware [46,47]. L_p is also more numerically compact for large or dense graphs, where the MPO H can quickly become numerically cumbersome, requiring 2n^2(n − 1) elements to construct and generating a semi-contracted representation that scales roughly as ∼ n^3(n − 1)^2. However, for the single-qubit measurements required for L_p, contraction with a simple single-qubit operator needs to occur only 2n times. In order to efficiently compute 2n single-qubit measurements on large, exact tensor networks without either reconstructing an exponentially large (2^n) space or contracting over the full network ∼ n times, we devise an efficient partial trace-based contraction scheme in which we construct k distinct reduced density matrix operators ρ_k, where K is the kth set of kept indices. K should be sufficiently small that the 2^|K| elements of ρ_k remain numerically tractable. For each ρ_k, |K| smaller partial traces can be taken to isolate single-qubit density matrices ρ_q, with which we take the single-qubit expectation values of Eq. 8. Not only does NP-VQA enable us to solve for the MaxCut of multiple graphs in parallel, the additional constraints increase the approximation ratio (quotient of average cut over ground-truth MaxCut) by only permitting convergence to local minima which are bistable points for both the z- and x-axes, rather than the monostable requirement of traditional VQE.
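For intuition, the single-qubit expectation values that the partial-trace scheme extracts can be sketched on a dense statevector; this is a deliberately naive stand-in for the tensor-network contraction described above, usable only at small n:

```python
import numpy as np

def single_qubit_expectations(psi, n):
    """<sigma_z> and <sigma_x> for each qubit, via partial traces of a
    dense statevector (illustration only; the paper's scheme avoids ever
    materializing the full 2^n space)."""
    Z = np.diag([1.0, -1.0])
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    psi = psi.reshape([2] * n)
    out = []
    for q in range(n):
        # Move qubit q to the front and flatten the rest, so that
        # rho_q = Tr_rest |psi><psi| is a single matrix product.
        m = np.moveaxis(psi, q, 0).reshape(2, -1)
        rho_q = m @ m.conj().T
        out.append((np.trace(rho_q @ Z).real, np.trace(rho_q @ X).real))
    return out

# |+>|0>: qubit 0 has <x> = 1, qubit 1 has <z> = 1.
psi = np.kron(np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, 0.0]))
print(single_qubit_expectations(psi, 2))
```

Note that only 2n numbers ever leave this routine, which is the measurement-complexity reduction exploited by L_p.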
While bistable points are by no means guaranteed to be global minima, the concurrence of a zero gradient for both independently parametrized axes at a single, non-optimal point in parameter space is considerably less likely. As L_p is best extremized by larger |⟨σ^z/x⟩|, the circuit will tend towards satisfying the equality in Eq. 7. As this corresponds to entanglement-free qubits, there is a systematic disentanglement of the circuit into product states (Fig. 3b). To understand this process, note that for the general wavefunction |φ⟩ = α|0_i 0_r⟩ + β|0_i 1_r⟩ + γ|1_i 0_r⟩ + δ|1_i 1_r⟩ describing any two qubits i and r, minimization of the loss function leads to a quantity which is maximized when the concurrence (entanglement [48,49]) is minimized, and vice versa. Once disentanglement nears completion, the equality in Eq. 7 begins to hold for any θ_t and qubit i. As ⟨σ^z/x_q⟩ = 0 is unfavorable for the optimization of L_p, both axes of each qubit i must be bistable with respect to each angle θ_t in order for updates of that parameter to halt. In this manner, NP-VQA is a quantum analog to alternating minimization in classical algorithms [50], but one which uses both quantum superposition and classical nonlinearity to minimize two cost functions simultaneously, rather than one sequentially. Alternating minimization has also proven useful in QAOA protocols [15,51-53].
As minimizing Eq. 8 cannot yield classical solutions to Eq. 1, we define a rounding scheme for the classification and scoring C of MaxCut(G) estimates, in which the classical function R rounds the measured expectation values to ±1. We note that this scoring is our true, or computational, MaxCut estimate, as it is the MaxCut assignment which results from projecting the qubit measurements of our quantum state from the [−0.76, 0.76] codomain of our linear programming relaxation and tanh(x) activation function back into the ±1 codomain of MaxCut nodes. The average performance of the NP-VQA method vs traditional VQE is displayed in Fig. 3a for registers of n = 8, 20, and 100 qubits. The simulations were completed using PyTorch [54] and the tensor contractions implemented with Opt-Einsum [55]. For n = 8 and n = 20, we generate exact solutions to complete (all-to-all) graphs through brute-force computation, whereas the n = 100 graphs are the first three 0.9-density weighted MaxCut graphs (cataloged as the w09-100 instances) from the extensively studied Biq Mac library [56]. Like other recent works [22,57], we implement simple entanglement-based pre-training prior to both the NP- and ND-VQA algorithms (details in the Supplementary Information [58]). Shallow circuits of depth L = 7 are selected in order to adopt a protocol suitable for near-term quantum devices. While for this fixed, rather shallow L, both VQE and NP-VQA suffer decreasing performance with increasing n, NP-VQA consistently demonstrates a 5%-7% average performance increase across varying n, as seen in Fig. 3a. Moreover, for graphs larger than n = 8, substantially improved accuracy could be obtained with even logarithmically deep circuits. Crucially, for the graphs examined, the approximation ratio of NP-VQA circuits of all sizes exceeds that of classical algorithms with polynomial complexity, both with (∼ 0.88) [19] and without (∼ 0.941) [18] the unique games conjecture.
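The rounding-and-scoring step can be sketched as follows; the helper name and two-node graph are hypothetical, chosen only for illustration:

```python
import numpy as np

def rounded_cut(exp_vals, w):
    """Round single-qubit expectation values back to classical spins
    (the function R) and score the resulting cut: the computational
    MaxCut estimate C."""
    s = np.where(np.asarray(exp_vals) >= 0, 1.0, -1.0)  # R(<sigma>) -> ±1
    # (1 - s_i s_j)/2 flags cut edges; 0.25 corrects double counting.
    return 0.25 * np.sum(w * (1.0 - np.outer(s, s)))

w = np.array([[0., 1.], [1., 0.]])
print(rounded_cut([0.4, -0.2], w))  # -> 1.0: opposite signs cut the edge
```

Only the signs of the expectation values matter here, which is why the tanh-compressed [−0.76, 0.76] codomain loses nothing at readout.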
Finally, we again emphasize that not only is the NP-VQA algorithm more accurate than traditional VQE, it simultaneously solves MaxCut(G) for two graphs, rather than a single graph.
While the quantum optimization literature typically targets deep circuits with deterministic convergence, we signal that probabilistic sampling of various shallow NP-VQA circuit initializations (that is, running faster circuits multiple times) can be a more efficient alternative. As larger values of C are a direct certificate of superior optimization, there should be no preference for less efficient single-shot techniques. Shallow implementations are particularly important for near-term quantum devices, which are prohibitively susceptible to noise at even moderate circuit depth. Fig. 3b displays the probability that an optimal cut, which we define as C > T = 0.97 × MaxCut(G), will be found for graphs G with n = 100. While studies have indicated that up to an exponential number of parameters may be required to obtain nearly perfect convergence [8], a quantity that would be inconceivable for n = 100 qubits, traditional VQE alone can produce highly optimal solutions approximately 12.5% of the time (Fig. 3b) with only seven total layers and 3n = 300 parameters, compared to ∼ 2^99 parameters for deterministic convergence. Moreover, probabilistic circuit sampling with NP-VQA is dramatically more successful than with VQE.
From Fig. 3b, we note that L = 1 circuits obtain highly optimal solutions with probability 0.36, tripling the convergence rate of standard VQE with 1/7th the resources. As circuits with L = 1 are comprised of only local rotations without control gates, the totality of the performance is due to mutual constraints on multi-basis quantum superpositions, and not due to quantum entanglement. Like other entanglement-free formulations [59-61], this renders the circuit efficient for classical simulation and indicates that algorithms for simulated superposition with multi-basis constraints may hold promise as "quantum-inspired" classical algorithms. However, we note that such shallow, entanglement-free implementations are known to suffer decreased performance with increasing circuit width [8]. Furthermore, even modest entanglement and circuit depth can greatly increase the probability of optimal convergence. For depth L = 7, NP-VQA produces an optimal cut with 50% probability. The cumulative effects of such probabilistic sampling can lead to high-confidence convergence with markedly few repetitions r. Fig. 4a shows the probability of obtaining at least one optimal cut for n = 100 and r = 5, which nears 97% in fewer than 100 training steps for shallow NP-VQA circuits. For r = 10, convergence is greater than 99.9%, and the 3nr = 3000 parameters utilized for ten repetitions still pale in comparison to the exponentially many required by deep-circuit techniques.
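The restart arithmetic above follows directly from the independence of the runs. Using the reported single-run success rate p = 0.5 for shallow (L = 7) NP-VQA at n = 100:

```python
# Probability of at least one optimal cut from r independent restarts,
# assuming each run converges with probability p.
p = 0.50  # single-run optimal-cut rate for shallow (L = 7) NP-VQA, n = 100
for r in (5, 10):
    print(r, 1 - (1 - p) ** r)  # ~0.97 for r = 5, >0.999 for r = 10
```

The same formula explains why even the entanglement-free L = 1 circuits (p = 0.36) become reliable after modest repetition.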
NP-VQA also offers superior performance over traditional VQE in terms of the diversity of tenable graphs (Fig. 4a). For r = 10, not only does NP-VQA find optimal solutions for all of the complete n = 20 graphs tested (compared to 90% for VQE), its parallel implementation doubles the number of MaxCut instances optimized.

Nonlinear Dense VQA (ND-VQA)
The ND-VQA paradigm also draws upon nonlinear activation functions and single-qubit measurements, but rather than independently encoding two graph instances into the z- and x-bases, it jointly encodes a single n-node (qubit) graph into the z- and x-bases of n/2 qubits (see Fig. 1b). We will refer to these n- and n/2-qubit registers as the logical and physical qubits, respectively. This encoding is carried out by treating both the z and x bases of each of the n/2 physical qubits as a separate logical qubit, such that G can be encoded by a loss function inspired by the ZX Hamiltonian and optimized by the corresponding nonlinear loss function (Eq. 15) and rounded MaxCut estimation (Eq. 16).

FIG. 4. (left) We note the dramatically increased performance for n = 8, 100 with ND-VQA over VQE and comment that, while VQE with n = 512 was prohibitively memory-inefficient, ND-VQA with n = 512 outperforms VQE with n = 8, a system 1/64th of its size, and still converges to optimal cuts ∼ 13% of the time. Furthermore, improved optimization of the n = 512 instance would be readily attained with deeper circuits. (right) ND-VQA for n = 100 with L = 13 (light green), L = 7 (dark green), and VQE with L = 7 (black). Increasing depth from 7 to 13, while still extremely shallow for n = 100, greatly improves performance.
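The logical-to-physical bookkeeping can be sketched as follows; the specific convention (first half of logical indices assigned to the z basis, second half reusing the same physical qubits in x) is our assumption for illustration, not the paper's stated mapping:

```python
def logical_to_physical(i, n):
    """Map logical qubit i of an n-node graph to a (physical qubit, basis)
    pair on an n/2-qubit register.  Assumed convention: logical qubits
    0..n/2-1 occupy the z basis; n/2..n-1 reuse the same qubits in x."""
    assert n % 2 == 0 and 0 <= i < n
    half = n // 2
    return (i, "z") if i < half else (i - half, "x")

# n = 8 logical nodes fit on 4 physical qubits, two bases each.
print([logical_to_physical(i, 8) for i in range(8)])
```

Any fixed bijection of this shape works, since each (qubit, basis) pair yields an independently measurable expectation value to round into a logical spin.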
We emphasize that, as Eqs. 15 and 16 are comprised of independently collected measurements, the uncertainty principle is not violated, even for w^zx_ij terms with j = i. ND-VQA doubles the number of quantum resources available, a valuable asset for a nascent field which has invested millions of dollars and spent multiple decades to achieve ∼ 50-qubit registers. Furthermore, like that of its parallel counterpart, the dual-basis constraint of ND-VQA significantly improves MaxCut optimization. Fig. 4b illustrates the average performance of both ND-VQA and VQE circuits for n = 8, 100 and the ND-VQA circuit alone for the pm3-8-50 (n = 512) instance of the DIMACS library [62]. The pm3-8-50 (n = 512) instance with traditional VQE was too memory-intensive for evaluation on a single A100 GPU. Like its parallel counterpart, ND-VQA demonstrates marked improvement over VQE in both average MaxCut convergence and probabilistic sampling.
Although our current contraction algorithm yields a maximum circuit depth of L = 13 for logical n = 512 on a single GPU, moderately deeper circuits would greatly improve performance while still maintaining L roughly logarithmic in n; their simulation could be enabled by tensor contraction backends with improved memory management, such as the cuTensor library. Even with this sublogarithmic depth, ND-VQA displays strong performance on the n = 512 graph, achieving an average cut of ∼ 96% of the ground truth (Fig. 4b, left) and a highest accuracy upwards of 98% from thirty total runs. This exceeds not only the average performance of polynomial algorithms, but also that of notable specialty algorithms on this specific instance under similar specifications [63]. Moreover, ND-VQA obtains optimal cuts upwards of 13% of the time (Fig. 4b, right). While computational benchmarking has been demonstrated for thousands of qubits, to our knowledge, ND-VQA with n = 512 is the largest simulation of a successful global quantum optimization algorithm ever conducted.

DISCUSSION
In this manuscript, we introduced the novel NP-VQA and ND-VQA algorithms. Even with sublogarithmic circuit depth, the approximation ratios of the graphs tested with these circuits exceed the averages of polynomial-time classical algorithms. Both of these algorithms provide meaningful efficiency improvements and thus substantially lower the overhead of demonstrating quantum advantage for classical optimization problems. These improvements include a polynomial reduction in circuit measurements, as well as a factor-of-two speedup for NP-VQA and a factor-of-two reduction in quantum resources for ND-VQA. In actuality, both encoding methods are part of a broader framework of multi-axis qubit encodings, which includes any nonlinear renormalization of quantum observables that permits the optimization of multiple, mutually regularizing observables on a single qubit. These findings are likely to spur additional research into efficient qubit encodings and the application of our techniques to related algorithms. Since deeper circuits would be attainable with more efficient tensor contraction methods or distributed computing efforts, this work encourages further development of large-scale quantum simulation with tensor methods. Most critically, as these simulations are ultimately memory-bound, the implementation of these algorithms at scale constitutes a strong and novel candidate for quantum advantage.
These algorithms are powerful enough to enable large-scale simulations of effective optimization algorithms on a single GPU. To our knowledge, this constitutes the largest simulation to date of a global quantum optimization algorithm that exceeds classical (polynomial) performance. Such a successful and large-scale implementation demonstrates that exceedingly simple and low-rank TT representations are sufficient to model diverse techniques in quantum machine learning, and to do so without truncation or approximation. Finally, through the use of large-scale global graphs, we demonstrate that the local connectivity and low entanglement capacity of both the MPS formalism and linearly connected near-term quantum devices do not preclude successful quantum optimization routines. When we expand our definition of accuracy to encompass probabilistic sampling of various circuit initializations, we find that remarkably few quantum resources are required for classical optimization problems.