Information Scrambling in Computationally Complex Quantum Circuits

Interaction in quantum systems can spread initially localized quantum information into the many degrees of freedom of the entire system. Understanding this process, known as quantum scrambling, is the key to resolving various conundrums in physics. Here, by measuring the time-dependent evolution and fluctuation of out-of-time-order correlators, we experimentally investigate the dynamics of quantum scrambling on a 53-qubit quantum processor. We engineer quantum circuits that distinguish the two mechanisms associated with quantum scrambling, operator spreading and operator entanglement, and experimentally observe their respective signatures. We show that while operator spreading is captured by an efficient classical model, operator entanglement requires exponentially scaled computational resources to simulate. These results open the path to studying complex and practically relevant physical observables with near-term quantum processors.

The inception of quantum computers was motivated by their ability to simulate dynamical processes that are challenging for classical computation [1].However, while the size of the Hilbert space scales exponentially with the number of qubits, quantum dynamics can be simulated in polynomial times when entanglement is insufficient [2][3][4] or when they belong to special classes such as the Clifford group [5][6][7].A physical process that fully leverages the computational power of quantum processors is quantum scrambling, which describes how interaction in a quantum system disperses local information into its many degrees of freedom [8][9][10][11][12].Quantum scrambling is the underlying mechanism for the thermalization of isolated quantum systems [13,14] and as such, accurately modeling its dynamics is the key to resolving a number of physical phenomena, such as the fast-scrambling conjecture for black holes [9,10], non-Fermi liquid behaviors [15,16] and many-body localization [17].Understanding scrambling also provides a basis for designing algorithms in quantum benchmarking or machine learning that would benefit from efficient exploration of the Hilbert space [18][19][20].
A precise formulation of quantum scrambling is found in the Heisenberg picture, where quantum operators evolve and quantum states are stationary.Analogous to classical chaos, scrambling manifests itself as a "butterfly effect", wherein a local perturbation is rapidly amplified over time [11,21].More specifically, the perturbation is realized as an initially local operator (the "butterfly operator") Ô, typically a Pauli operator acting on one of the qubits (the "butterfly qubit").When the quantum system undergoes a dynamical process Û , the butterfly operator Ô acquires a time-dependence and becomes Ô (t) = Û † Ô Û , with Û † being the inverse of Û .The resulting Ô (t) can be expanded as Ô (t) = np i=1 w i Bi , where Bi = ρ(i) 1 ⊗ ρ(i) 2 ⊗ ... are basis operators consisting of single-qubit operators ρ(i) j acting on different qubits and w i are their coefficients.
Quantum scrambling is enabled by two different mechanisms: operator spreading and operator entanglement [11,[22][23][24][25][26].Operator spreading refers to the transformation of basis operators such that on average, each Bi involves a higher number of non-identity single-qubit operators.Operator entanglement, on the other hand, refers to the increase in n p , i.e. the minimum number of terms required to expand Ô (t).Independent characterizations of these two mechanisms are essential for a complete understanding of the nature of quantum scrambling.Quantifying the degree of operator entanglement also holds the key to assessing the classical simulation complexity of quantum observables [27].However, operator spreading and operator entanglement are generally intertwined and indistinguishable in past experimental studies of quantum scrambling [28][29][30][31][32].
In this Article, we perform a comprehensive characterization of quantum scrambling in a two-dimensional (2D) quantum system of 53 superconducting qubits.Signatures of operator spreading and operator entanglement are clearly distinguished in our experiment.These results are enabled by quantum circuit designs that independently tune the degree of each scrambling mechanism, as well as extensive error-mitigation techniques that allow us to faithfully recover coherent experimental signals in the presence of substantial noise.Lastly, we find that while operator spreading can be efficiently predicted by a classical stochastic process, simulating the experimental signature of operator entanglement is significantly more costly with a computational resource that scales exponentially with the size of the quantum circuit.
Our experiment approach is based on evaluating the correlator between Ô (t) and a "measurement operator", M , which is another Pauli operator on a different qubit (the "measurement qubit"): Here ... denotes the expectation value over a particular quantum state.C(t) is commonly known as the out-oftime-order correlator (OTOC) and related to the commutator [ Ô(t), M ] by [11,21,[34][35][36][37].Quantum scrambling is characterized by measuring C over a collection of quantum circuits with microscopic differences, e.g.phases of individual gates.Operator spreading is then reflected in the average OTOC value, C, which decays from 1 when Ô(t) and M overlap and no longer commute [30,31].In the fully scrambled limit where the commutation between Ô(t) and M is completely randomized, C becomes ∼0.If operator entanglement is also present (i.e.n p 1), C approaches 0 for all circuits and their fluctuation δ C vanishes as well.This is because each C includes contributions from many basis operators Bi with different phases.It is therefore sufficient to identify operator spreading through the decay of C, whereas any insight into operator entanglement necessitates an estimate of δ C .
The measurement protocol for OTOC is described in Fig. 1A and consists of a quantum circuit Û and its inverse Û † , with a butterfly operator Ô (Pauli operator σ(b) x on Q b ) inserted in-between.An ancilla qubit Q a , connected to the measurement qubit Q 1 via a controlledphase (CZ) gate, measures C between Ô and M (Pauli operator σ (1) z on Q 1 ) through its σy [35,38].For this work, we employ quantum circuits composed of random single-qubit gates and fixed two-qubit gates (Fig. 1B) due to the wide range of quantum scrambling that may be achieved with limited circuit depths [33,39].The OTOC measurement protocol is first implemented on a one-dimensional (1D) chain of 21 qubits (Fig. 1C).We use the qubit at one end of the chain as Q a and successively choose qubits Q 2 through Q 20 as Q b .A two-qubit gate (iSWAP) is applied to each pair (Q j , Q j+1 ), where j = 0, 2, 4... in odd circuit cycles and j = 1, 3, 5... in even circuit cycles.
In the left panel of Fig. 1C, experimental values of σy are shown for different numbers of cycles in Û .Here we average σy over 60 circuit instances to first focus on operator spreading.It is seen that σy decays as early as cycle 1, irrespective of the location of Q b .This observation contradicts the 1D geometry which requires h + 1 circuit cycles before the time-evolved butterfly operator Ô (t) overlaps with M , with h being the number of qubits between Q b and Q 1 .The signals are therefore complicated by errors in the quantum circuits, such as mismatch between Û and Û † or qubit decoherence [29,30].These deleterious effects are mitigated by additionally measuring σy without applying the butterfly operator [41].These data, referred to as normalization values, are also shown in the left panel of Fig. 1C and approximately equal to the total fidelities of Û and Û † [29].We then divide σy for each Q b by the normalization values to recover the effects of scrambling [40].
The normalized data, equal to the average OTOCs C after error-mitigation, are shown in the right panel of Fig. 1C and exhibit features consistent with operator spreading: For each location of Q b , C retains values near 1 before sufficient circuit cycles have occurred to allow an overlap between Ô (t) and M .Beyond these cycles, C converges to 0, indicating that Ô (t) and M have overlapped and no longer commute.In addition, we observe that the time-evolution of C for each Q b resembles a ballistically propagating wave.The front of each wave coincides with the edge of the "light-cone" associated with Q 1 , i.e. the set of qubits that have been entangled with Q 1 .This profile is attributed to the iSWAP gates used in these circuits, which spread single-qubit operators at the same rate as their light-cones expand [42].For generic quantum circuits, the spreading velocity (a.k.a. the butterfly velocity) is typically slower.Using the full 2D system, we next demonstrate how the evolution of C may be used to diagnose the butterfly velocity of operator spreading.
In Fig. 2A, the spatial distribution of C is shown for five different numbers of cycles in Û , with iSWAP still being the two-qubit gate.It is seen that the number of qubits with C < 1 rapidly increases with the number of cycles, consistent with the spatial spread of the timeevolved butterfly operator.Moreover, for each circuit cycle, the values of C abruptly change across the edge of the light-cone associated with Q 1 (dashed lines in Fig. 2A).In contrast, the spatiotemporal evolution of C shown in  moves further away from Q 1 .
The different OTOC behaviors can alternatively be seen in the full temporal evolution of four specific qubits (Fig. 2C).For iSWAP, the shape of the OTOC wavefront remains sharp and relatively insensitive to the location of Q b , similar to the 1D example in Fig. 1C.On the other hand, the wavefront propagates more slowly for √ iSWAP and also broadens as the distance between Q b and Q 1 increases.As a result, more circuit cycles are required before C reaches 0 for √ iSWAP.The wavefront behavior seen with √ iSWAP is similar to generic quantum circuits analyzed in past works [23,24].
The observed features of average OTOCs are quantitatively understood by mapping operator spreading to a classical Markov process involving population dynamics [40].In this model, the 2D qubit lattice is populated by fictitious particles representing two copies of a singlequbit operator.The initial state of the entire system is a single particle at the site of Q a .Whenever a twoqubit gate is applied to two neighboring lattice sites, their particle occupation changes between four possible states: ♦♦ (both empty), ♦ (left empty, right filled), ♦ (right empty, left filled) and (both filled).The transition probabilities are described by the stochastic matrix: where a = 1 3 sin 4 θ, b = 1 3 1 2 sin 2 2θ + 2 sin 2 θ with θ being the swap angle of the two-qubit gate.The average probability of finding a particle at the site of Q 1 is then used to estimate C. In this classical picture, the OTOC wavefront corresponds to the boundary separating the empty region from the region populated by particles.
The difference in OTOC propagation between iSWAP and √ iSWAP is then captured by the dependence of Ω on θ: After each application of an iSWAP gate (θ = π/2), the particle occupation always changes (with the exception of ♦♦).In particular, any previously empty site will be filled and the region populated by particles always grows.This leads to the observed maximal butterfly velocity.In contrast, the application of any √ iSWAP gate (θ = π/4) can leave the particle occupation unchanged with a probability 5  12 .C therefore decays more slowly in this case and its broadening is explained by the fact that the wavefront spreads at a different velocity for each trial of the Markov process.The predicted values of C, plotted as dashed lines in Fig. 2C, agree well with the experimental data and indicate that the dynamics of operator spreading can be reliably predicted by classical models.We note that the effect of noise is included when C is calculated for √ iSWAP as it is found to introduce deformation to the observed signals [40].
Unlike operator spreading, an efficient classical description of operator spreading is not known to exist.In particular, population dynamics cannot be used to model the circuit-to-circuit fluctuation of OTOCs [40].Resolving the growth of operator spreading is also difficult with a quantum processor since it is often accompanied by increased operator spreading [11,[22][23][24].We overcome this challenge by gradually adjusting the composition of Û and Û † , realizing a group of circuits with predominantly Clifford gates (iSWAP, √ X ±1 or √ Y ±1 ) and a small number of non-Clifford gates ( √ W ±1 and √ V ±1 ).As illustrated by the evolution of a butterfly operator in Fig. 3A, Clifford gates generate operator spreading by extending the butterfly operator to other qubits while preserving the total number of basis operators (which are products of Pauli operators, or Pauli strings, in these cases).In contrast, non-Clifford gates generate operator entanglement by transforming one single Pauli string into a superposition of multiple Pauli strings, maintaining the spatial extent of operator spreading in the process.
These distinctive properties of Clifford and non-Clifford gates therefore provide us a way to independently tune one scrambling mechanism without affecting the other.We now focus on operator entanglement and measure the circuit-to-circuit fluctuation of OTOCs, as shown in Fig. 3B.Here the number of circuit cycles is fixed at 12 and the number of non-Clifford gates in Û , N D , is successively changed from 0 to 32.For each N D , the individual OTOCs C of 130 random circuit instances are measured using a modified normalized procedure [40].At N D = 0 where the circuits consist of only Clifford gates, we see that C takes discrete values of 1 or −1.This is expected as the time-evolved butterfly operator Ô(t) is a single Pauli string and therefore either commutes or anti-commutes with the measurement operator M .As more non-Clifford gates are introduced into the circuits, C starts to assume intermediate values between ±1 and converges toward 0.
The mean C and fluctuation (i.e.root-mean-square value) δ C of C are then computed from experimental data and plotted against N D in Fig. 3C.We observe different behaviors for C and δ C : C remains largely constant and close to 0, confirming that operator spreading remains unaffected by the increasing number of non-Clifford gates.On the other hand, δ C decays from an initial value of 1 and is almost suppressed by two orders of magnitude as N D increases from 0 to 32.Over the same range of N D , we have numerically calculated the average numbers of Pauli strings in Ô(t), n p , which are seen to increase exponentially (inset to Fig. 3C).These results demonstrate that the decay of OTOC fluctuation allows the growth of operator entanglement to be experimentally diagnosed.
To determine the accuracy of our measurements, the OTOCs of experimental circuits are simulated using a Clifford-expansion method and overlaid on the data in Fig. 3B and Fig. 3C [40].We find good agreement between experiment and simulation even when δ C is as small as 0.03, indicating the quantum processor's capability to resolve high degrees of operator entanglement.This may appear surprising given that other signatures of quantum entanglement, such as entropy, are highly susceptible to unwanted interaction with the environment [43].Instead, we find that environmental effects are nearly absent from these data.This robustness is a result of the effective normalization protocol and a range of other error-mitigation techniques used in our experiment [40].
Having identified means of characterizing both operator spreading and operator entanglement, it remains to be asked how the computational complexity of quantum scrambling, as well as our experimental error, scale with circuit size.This is addressed by systematically increasing the number of iSWAPs in the quantum circuits, N S (here N S counts only iSWAP gates that lie within the light-cones of Q a and Q 1 [40]).At the same time, N D is kept at a large value such that the Clifford-expansion simulation method used in Fig. 3 is challenging to perform and tensor-contraction is the most efficient classical simulation method [33,40,44].Figure 4A shows representative data for three circuit configurations with different values of N S , along with the corresponding numerical simulation results.As N S and the computational complexity for tensor-contraction increase, we observe that the OTOC fluctuation decreases and the agreement between experiment and simulation also degrades.
To quantify these observations, we define an ideal OTOC signal as the fluctuation δ C computed from the simulated values of C. We also define an experimental error as the RMS deviation between the simulated and the measured values of C. Both quantities are shown as functions of N S in the upper panel of Fig. 4B.It is seen that the OTOC signal generally decreases as N S increases (see Extended Data in SM for data at N D = 24 [40]).This change in signal size is difficult to predict theoretically [40] and may be related to the fact that the Pauli strings in Ô(t) become less correlated with each other as the light-cone associated with the butterfly qubit grows [42].On the other hand, the experimental error first decreases to a minimum of ∼0.01 for N S = 232 before starting to increase.The ratio of these quantities is the experimental signal-to-noise ratio for OTOC and plotted in the bottom panel of Fig. 4B.The SNR is seen to monotonically decay from a value of 3 at N S = 159 to 0.55 at N S = 271.
At last, we estimate the time needed to simulate the OTOC of one circuit on a single CPU core using tensorcontraction, t sim [40].The result is plotted against N S in Fig. 4C.We observe an exponential increase in t sim as N S increases, confirming that simulating the scrambling of complex quantum circuits indeed demands exponentially scaled classical computational resources.In particular, although simulating the results in Fig. 4A currently requires t sim ≈ 100 hours, this cost becomes t sim > 10 10 hours when N S reaches 400.Chartering a path to this experimental regime is the focus of our ongoing research.In the SM, we have provided numerical simulation results showing that a percentage decrease in the coherent or incoherent errors of iSWAP gates will lead to at least commensurate levels of reduction for the OTOC errors [40].Of less certainty is the size of δ C in this regime, which cannot be predicted by any known classical model [40].Nevertheless, this difficulty also implies that resolving δ C alone might reveal scrambling dynamics beyond the simulation capacity of classical computers.
In conclusion, we characterize quantum scrambling in a 53-qubit system and demonstrate that entanglement in the space of quantum operators is the key to computational complexity of quantum observables.This result highlights the importance of careful classical analysis in the ongoing quest to attain quantum computational advantage on various problems of interest.On the other hand, the challenge in predicting OTOC fluctuations even moderately beyond the experimental regime also indicates that quantum processors of today can already shed light on certain physical phenomena as well as classical computers.Another encouraging finding of our work is that the accuracy of quantum processors can be significantly improved through effective error-mitigation.For example, an SNR of ∼1 for OTOCs is achieved for N S = 251 where the circuit fidelity is merely 3% [40].In the immediate future, classical simulation of average OTOCs may be used to efficiently benchmark performance of quantum processors.The experimental framework established here can also be used to study other quantum dynamics of interest, such as the integrability of the XY model (see SM for preliminary data) and many-body localization [17,45].As the fidelity of quantum processors continues to increase, modeling scrambling in quantum gravity and unconventional quantum phases may become a reality as well [9,10,15,16].In this section, we provide data not covered in the main text.These include the spatial distribution of average OTOCs on the 53-qubit system, average OTOC behaviors for additional types of quantum circuits and more detailed investigation of errors in experimental OTOC measurements.

Supplemental Materials
A. Full 53-Qubit Average OTOC Data Figure S1 shows experimentally measured average OTOCs, C, for 51 different possible butterfly qubit locations.The data are also shown for every circuit cycle from 1 through 15.Here iSWAP is used as the two-qubit gate.The sharp nature of the wavefront propagation is readily visible, where a large reduction from C = 1 is observed as soon as the lightcone of the measurement qubit (black filled circle) reaches a given qubit.In comparison, similar data are shown up to a total of 24 circuit cycles for random circuits containing √ iSWAP gates (Fig. S2).The dynamics is seen to be slower than iSWAP, as mentioned in the main text.In particular, the wavefront is also broader, as seen by the more gradual spatial transition from C = 1 to C ≈ 0.

B. OTOCs for Non-Integrable and Integrable
Quantum Circuits In the main text of this work, we have primarily focused on quantum circuits that are non-integrable [46], i.e. capable of evolving a quantum system into states with maximal degrees of scrambling.In general, many quantum circuits or dynamical processes are integrable and lead to small degree of quantum scrambling even at long time scales.Examples include Clifford circuits [5], dynamics of free fermions [3] and many-body localization [47].For certain integrable, pseudo-random circuits such as Clifford circuits, OTOC fluctuation is needed to distinguish them from non-integrable circuits, as demonstrated in the main text.In many other cases, average OTOCs behave differently for integrable quantum circuits compared to non-integrable ones and are sufficient to differentiate between them, which we illustrate next.
Figure .S3 shows two different types of quantum circuits.The circuit in the left panel consists of √ iSWAP gates and single-qubit gates randomly chosen from V ± .This is the example of a nonintegrable circuit, which is expected to lead to maximal scrambling at long times.The circuit in the right panel has the same two-qubit gates, but the single-qubit gates are replaced with Z gates that have angles randomly cho-  sen from the interval [−π, π].This is the example of an integrable circuit, where the dynamics does not lead to maximal scrambling if implemented in 1D.The experimentally measured average OTOCs are shown in Fig. S3B for both types of quantum circuits.Here, the two-qubit gates are applied in a brick-work pattern similar to what is used in Fig. 1C of the main text.For the non-integrable circuits, we observe a clear propagation of the OTOCs along with a diffusive broadening of the wavefronts, similar to what was observed in Fig. 2C of the main text.In particular, C monotonically decays toward 0 at large circuit cycles.On the other hand, C does not show the wave-like propagation for the integrable circuits.For qubits closer to the measurement qubit Q 1 , C first decays but gradually increases for longer circuit cycles.Qubits further from Q 1 barely show any appreciate degree of OTOC decay up to 50 circuit cycles.Overall, C for different qubits converges toward values > 0.5 for the largest circuit cycles probed in this experiment.These results show how one might in some cases uses average OTOC behavior to distinguish non-integrable quantum dynamics from integrable ones.
The integrable circuits studied in Fig. S3 in fact are the digitized realizations of the so-called XY model [45] which is of wide interest in condensed-matter physics due to its ability to capture a variety of interesting physical phenomena such as quantum phase transitions.To demonstrate an immediate application of our work, we use OTOCs to study a particular feature of the XYmodel, namely the transition from integrability to nonintegrability due to geometry.
It is well-known that XY-model in 1D exhibits integrable dynamics, as demonstrated in Fig. S3.Dynamics for XY-model in 2D remains a highly active area of research and is generally non-integrable.In Fig. S4, we show that the transition into non-integrability for XYmodel occurs as soon as the geometry changes from 1D to a ladder-like geometry (Fig. S4A).Here we have arranged 16 qubits into two parallel chains of 8 qubits and connected them with 5 "cross-links".Next, we measure OTOCs of random circuits implemented with this new geometry where the single-qubit gates are again randomly chosen from Z gates with angles in the interval [−π, π] and the two-qubit gates are of non-integrable quantum circuits are both recovered in this modified geometry, indicating a transition from integrability to non-integrability for the XY-model.Detailed experimental studies of this transition with larger numbers of qubits is a subject of future work.

C. Characterization of Experimental Errors
In this section, we present additional characterization data to further corroborate and understand the experimental results in Fig. 4 of the main text.In particular, we focus on answering two questions: 1.What fraction of the observed experimental errors can be attributed to statistical errors due to limited sampling?2. How sensitive is the signal-to-noise ratio to the number of non-Cliffords in the circuits?
We first address question 1.The role of finite sampling in experimental OTOC measurement may be understood as follows: For n stats single-shot measurements, an average error of 1 √ nstats is expected to be present in the estimate for σy of the ancilla qubit due to statistical uncertainty.In the presence of circuit errors and a normalization value σy I < 1, this expected statistical error is amplified to a value of 1 σy I √ nstats , assuming the ideal OTOC value to be significantly smaller than 1 (i.e.σy I | σy B | where σy B is σy measured with the butterfly operator applied).
In Fig. S6A, this expected statistical error is plotted Right panel shows an integrable quantum circuit composed of √ iSWAP and singlequbit gates that are rotations around the z-axis of the Bloch sphere with random angles.(B) Average OTOCs C for the two types of random circuits, implemented on a 1D chain of 16 qubits.The qubit configuration is shown on top, where the unfilled (filled) black circle represents the ancilla (measurement) qubit Qa (Q1).Left panel shows C with the butterfly qubit corresponding to qubits Q2 through Q16, where the random circuits are of the type described in the left panel of (A).Right panel shows similar data for the random circuits of the type in the right panel of (A).The locations of the butterfly qubit are indicated by the colors of the data symbols and the legend on top.All data are averaged over 40 circuit instances.
as a function of n stats for N S = 251 and N D = 40 where σy I ≈ 0.04.On the same plot, we have included experimental OTOC errors (i.e. the RMS deviation between simulated and experimental OTOC values of 100 random circuits) obtained from reduced amounts of single-shot data.We observe that the experimental error initial decreases with increasing values of n stats , suggesting that statistical uncertainty being a significant source of error for small numbers of single-shot measurements.At n stats > 10 7 , the experimental error has a significantly weaker dependence on n stats and is markedly higher than the expected statistical error, indicating other error sources are dominant in this regime.In Fig. S6A, we have re-plotted the experimental errors in Fig. 4C of the main text along with the expected statistical errors calculated from n stats and σy I of each N S .The increase in statistical error at higher N S is a result of decreasing normalization values.It is evident that the observed experimental errors are consistently larger than the expected statistical errors, indicating other error mechanisms are dominant.In Section VI, we provide numerical simulation results to further analyze the sources of experimental  error.
Next, we focus on the scaling of experimental error vs number of iSWAPs for a different number of non-Cliffords in the quantum circuits.Fig. S6A shows C of 100 circuit instances for N S = 113, 251 and 300, all of which have N D = 24.On the same plots, exactly simulated OTOC values using the branching method (Section IV) are also plotted.Similar to Fig. 4B of the main text, we observe that the agreement between experimental and simulated OTOC values degrades as N S increases.In addition, the OTOC signal size (i.e. the fluctuation of the simulated OTOC values) also decreases as N S increases.
In Fig. S6B, the OTOC signal size, experimental error and expected statistical error are plotted as functions of N S (all with N D = 24).Here we see that the OTOC signal size indeed monotonically decreases as N S increases.The experimental error, on the other hand, shows less dramatic changes as a function of N S .After taking the ratio of the OTOC signal size and the experiment error, we plot the resulting SNR as a function of N S in the lower panel of Fig. S6B.The scaling of SNR vs N S is roughly consistent with Fig. 4 of the main text where higher values of N D (64 and 40) are used.In particularly, SNR falls below 1 after N S has increased beyond ∼250.The relative insensitivity of SNR to N D allows us to conveniently benchmark our system at lower values of N D where classical simulation is easy.This is especially important in the future when we conduct experiments with N S > 400 where tensor-contraction may no longer be feasible (Section III).

II. EXPERIMENTAL TECHNIQUES
In this section, we describe the calibration process and metrology of quantum gates used in the OTOC experiment.In addition, we also demonstrate a series of errormitigation strategies used in the compilation of experimental circuits which significantly reduced errors from various sources such as those related to state preparation and readout (SPAM), cross-talk, or coherent control errors on two-qubit gates.

A. Calibration of iSWAP and √ iSWAP Gates
The measurement of OTOCs requires faithful inversion of a given quantum circuit, Û .The "Sycamore" gate used in our previous work [33], equivalent to an iSWAP gate followed by a CPHASE gate with a conditional-phase of π/6 radians (rad), is an ill-suited building block for Û since its inversion cannot be easily created by combining Sycamore with single-qubit gates.On the other hand, iSWAP and √ iSWAP are commonly used two-qubit gates that can be readily inverted by adding local Z rotations: Here G is the two-qubit unitary corresponding to iSWAP or √ iSWAP and is the Pauli-Z matrix acting on qubit i (i = 1 or 2).Compared to Sycamore, realizing pure iSWAP and √ iSWAP gates requires the development of two additional capabilities on our quantum processor: 1.A significant reduction of the coherent error associated with the conditional-phase in Sycamore.2. The abilility to implement arbitrary single-qubit rotations around the Z axis.

Reducing Conditional-Phase Errors
We first describe the calibration technique for reducing conditional phase errors.As discussed in previous works [33], such a phase arises from the dispersive interaction between the |11 and |02 (or |20 ) states of two transmon qubits while their coupling strength g is raised to a finite value to enable a resonant interaction between the |10 and |01 states (Fig. S7A).Although eliminating this phase is possible by concatenating a Sycamore gate with a pure CPHASE gate having an opposite conditionalphase [48], this process would involve complicated waveforms and large qubit frequency excursions which are demanding to implement on a 53-qubit processor.Instead, we adopt an alternative approach based on the relatively simple control waveform used in Ref. [33] (Fig. S7A).This waveform sequence produces a Fermionic Simulation (FSIM) gate comprising a partial-iSWAP gate with swap angle θ, a CPHASE gate with conditional-phase φ and four local Z-rotations (Fig. S7B).
Next, we consider how the angles θ and φ depend on readily tunable waveform parameters such as the pulse duration t p and maximum qubit-qubit coupling g max .This is done by numerically solving the time-evolution of two coupled transmons with typical device parameters.The dependences of θ and φ on t p and g max are shown in Fig. S7C and Fig. S7D, respectively.We observe that although both θ and φ increase linearly with t p , their scaling with respect to g max is different: θ ∝ g max whereas φ ∝ g 2 max .For a given value of θ, it is therefore possible to reduce φ by increasing t p and decreasing g max while keeping t p g max constant.However, since longer gate operation is more susceptible to decoherence effects such as relaxation and dephasing, it is also desirable to minimize values of t p .As a result, we use t p = 12 ns, g max /2π ≈ 10 MHz for calibrating the √ iSWAP gate and t p = 36 ns, g max /2π ≈ 7 MHz for calibrating the iSWAP gate.Based on the simulation results, these choices would yield values of θ = π/4 rad, φ = 0.14 rad for the √ iSWAP gate and θ = π/2 rad, φ = 0.19 rad for the iSWAP gate.
The calibrated values of θ and φ associated with every qubit pair on the 53-qubit processor are shown in Fig. S7E and Fig. S7F, respectively.Each angle is measured via cross entropy benchmarking (XEB), similar to previous works [33,48].The median value of θ is 0.783 rad (standard deviation = 0.010 rad) for √ iSWAP and 1.563 rad (standard deviation = 0.027 rad) for iSWAP.These median values are very close to the target values of π/4 = 0.785 rad for √ iSWAP and π/2 = 1.571 rad for iSWAP.In the case of φ, the median value is 0.112 rad for √ iSWAP (standard deviation = 0.026 rad) and 0.136 rad for iSWAP (standard deviation = 0.055 rad).The median values of φ are close to predictions from numerical simulation and 4 to 5 times lower compared to the Sycamore gate.
The coherent error introduced by remnant values of φ is further reduced by adjusting other phases of a FSIM unitary U FSIM (θ, φ, ∆ + , ∆ − , ∆ −,off ), defined as: (S2) Here ∆ + , ∆ − and ∆ −,off are phases that can be freely adjusted by local Z-rotations.Imperfect gate calibration often results in an actual two-qubit unitary U a that differs slightly from the target unitary U t , leading to a Pauli error [33,49]: Here D = 4 is the dimension of a two-qubit Hilbert space.Given that our target unitaries are U t = U FSIM (π/4, 0, 0, 0, 0) for √ iSWAP and U t = U FSIM (π/2, 0, 0, 0, 0) for iSWAP, one may naively expect that ∆ + , ∆ − and ∆ −,off should all be set to 0 in U a in order to minimize r p .While this is indeed the case if φ = 0 in U a , it is not true when φ assumes a finite value φ a in U a .In fact, simple algebraic calculation shows that in such a case, the minimum value of r p occurs at (∆ + , ∆ − , ∆ −,off ) = (φ a /2, 0, 0), where it is a factor of 3 smaller compared to (∆ + , ∆ − , ∆ −,off ) = (0, 0, 0).By calibrating our system such that ∆ + = φ a /2 for every qubit pair, we estimate that the median Pauli error introduced by the conditional-phase to be only 0.07 % for √ iSWAP and 0.12 % for iSWAP.

Arbitrary Z-Rotations
We now describe the implementation of arbitrary single-qubit Z-rotations on the quantum processor.In addition to constructing the inverse gates √ iSWAP −1 and iSWAP −1 , Z-rotations are also used in removing native Z-rotations Z in the FSIM gate (Fig. S7B) as well as adjusting values of ∆ + to minimize conditional-phase errors.The procedure for incorporating Z-rotations Before Compilation into quantum circuits is two-fold: First, we recompile the circuits and combine all Z-rotations occurring after each two-qubit FSIM gate with the Z-rotations before the next FSIM gate, as shown in Fig. S8A.If any microwave-driven single-qubit gate such as R π/2 (χ) = e −i π 4 (cos χσx+sin χσy) occurs between the two FSIM gates, where σx and σy are Pauli-X and Pauli-Y matrices and χ is the phase of the microwave drive, we apply the equivalence R π/2 (χ)Z(ϕ) = Z(ϕ)R π/2 (χ − ϕ) to shift the rotation axis of the single-qubit gate and "push" the Zrotation through.This process effectively reduces the number of Z-rotations in the circuits by a factor of 2 and has been demonstrated to incur negligible degradation in the overall fidelity, since the only physical changes are modifications to the phases of the microwave drives of single-qubit gates [50].
The second step for implementing Z-rotations is different for iSWAP and √ iSWAP gates.In the case of iSWAP, the equivalence U FSIM (π/2, 0, 0, 0, 0)Z 1 (ϕ 1 )Z 2 (ϕ 2 ) = Z 1 (ϕ 2 )Z 2 (ϕ 1 )U FSIM (π/2, 0, 0, 0, 0) allows us to push Zrotations through each two-qubit gate by simply rearranging their phases (Fig. S8B).As a result, the Zrotations occurring in circuits with iSWAP gates are entirely virtual.In the case of √ iSWAP, such equivalence does not exist and we implement Z-rotations before the two-qubit gates using additional control flux pulses, as illustrated in Fig. S8C.Here, we detune the qubit frequencies from their idle positions by an variable amount ∆f and for a fixed duration t z = 20 ns before each √ iSWAP gate, leading to a Z-rotation ≈ Z (2π∆f t z ).

B. Gate Error Benchmarking and Cross-Talk Mitigation
The Pauli errors of the calibrated iSWAP and √ iSWAP gates are measured through XEB [33,39], which uses a collection of random circuits comprising in-terleaved two-qubit and random single-qubit gates.For each random circuit, the probability distribution of all possible output bit-strings is both measured experimentally and computed numerically.The statistical correlation between the two distributions ("cross-entropy") is then used to infer the total error of each quantum circuit.After measuring a sufficient number of circuit instances at different circuit depths, it has been shown that XEB reliably yields gate errors that are very consistent with values obtained from conventional characterization methods such as randomized benchmarking [19,33,48].
A key difference between the XEB process used in our current work compared to prior experiments [33] is the unitaries used in the numerical computation of the benchmarking circuits.Previously, such unitaries are freely adjusted for each individual qubit pair, whereby values of various FSIM angles θ, φ, ∆ + , ∆ − and ∆ −,off are optimized to obtain the lowest Pauli errors.In this work, we do not perform such an optimization step during gate error characterization and instead use a fixed unitary (U FSIM (π/2, 0, 0, 0, 0) for iSWAP and U FSIM (π/4, 0, 0, 0, 0) for √ iSWAP) for all qubit pairs.The gate errors characterized by the "fixed-unitary" XEB process include contributions from both incoherent sources such as relaxation and dephasing and coherent sources such as remnant conditional-phases.The adoption of a more stringent benchmarking criterion for gate errors is motivated by the fact that both coherent and incoherent errors can lead to imperfect reversal of quantum circuits and adversely impact the accuracy of OTOC measurements, in contrast to previous experiment in which coherent errors are compensated by modifying circuits in simulation [33].
The Pauli error rate r p per cycle (aggregate error of two single-qubit gates and one two-qubit gate) associated with each qubit pair is first measured in isolation.The results are plotted as integrated histograms in Fig. S9A and Fig. S9B, where we observe a mean (median) error of 0.0109 (0.0106) for iSWAP and 0.0106 (0.0089) for √ iSWAP.These higher errors rates compared to our previous work [33] are a result of enhanced incoherent errors due to longer pulses used in iSWAP and additional detuning pulses in √ iSWAP, as well as coherent errors arising from remnant conditional-phases.
We then repeat the same process but measure the error rates of different pairs simultaneously.We first observe, similar to previous work [33], a sizable increase in r p , with the mean (median) being 0.0193 (0.0163) for iSWAP and 0.0190 (0.0161) for √ iSWAP.To reduce these cross-talk effects, we first fit the XEB results to obtain the shifts in the two-qubit unitary associated with each individual qubit pair, which are often related to the single-qubit phases ∆ + , ∆ − and ∆ −,off .In a second step, instead of simply incorporating these shifts into classical simulation as was done previously [33], we add local Z-rotations into the quantum circuits to offset the unitary shifts.The parallel error rates are then re-measured with these Zrotations.The mean (median) value of r p for simultane-

C
Spin Echoes Q a : ous operation is reduced to 0.0140 (0.0131) for iSWAP and 0.0142 (0.0123) for √ iSWAP.The effects of the cross-talk correction can be readily seen in the "normalization" values used in the OTOC experiment.Figure S9C shows the configuration for a 53-qubit OTOC experiment.The corresponding OTOC normalization σy as a function of the number of cycles in a quantum circuit is shown in Fig. S9D.Without applying the additional Z-rotations for cross-talk correction, σy decays rapidly and falls below 0.1 after 8 cycles.After applying the additional Z-rotations, σy decays at a visibly slower rate and falls below 0.1 after 10 cycles.The slower decay of σy is indicative of more accurate inversion of the quantum circuit after cross-talk correction.

C. Dynamical Decoupling
We now describe a series of additional error-mitigation techniques used in the compilation of quantum circuits that further improve the accuracy of OTOC experiments.The first such technique is dynamical decoupling, motivated by long "idling" times at non-ground states for some of the qubits during the experiment.The most prominent of such qubits is the ancilla qubit, which remains idle throughout the time needed to implement the quantum circuit Û and its inverse Û † .Intrinsic decoherence of the ancilla qubit can in principle limit the circuit depth at which OTOC can be resolved, especially if the characteristic time T 2 is comparable to the total duration of Û and Û † .
To benchmark the intrinsic T 2 of the ancilla qubit during an OTOC experiment, we design the test circuits shown in Fig. S10A and Fig. S10B.In either case, we removed the two CZ gates otherwise present in an actual OTOC experiment such that the ancilla qubit does not interact with any other qubit apart from cross-talk effects.The difference between the two cases is that the ancilla qubit remains completely idle during Û and Û † in Fig. S10A, whereas a train of spin echo pulses X − X − Y − Y − X − X... is applied to the ancilla qubit during Û and Û † in Fig. S10B.
The y-axis projection of the ancilla qubit, σy , at the end of Û and Û † is measured both with and without the added spin echoes.The results are shown in Fig. S10C.We observe that without spin echo, σy decays rather quickly despite no entanglement between the ancilla and other qubits in the system, falling to ∼0 after 25 circuit cycles (corresponding to an evolution time t c ≈ 3.0 µs).The Gaussian shape of the decay at earlier times suggests that low-frequency noise is likely the dominant source of decoherence [51].On the hand, with the addition of spin echo, σy decays at a much slower rate maintaining a value of 0.78 even after 40 circuit cycles (t c ≈ 4.6 µs).By fitting σy to a functional form Ae − tc T 2 where A and T 2 are fitting parameters, we obtain a coherence time T 2 = 28.6 µs for the ancilla qubit, which is close to the 2T 1 limit of our quantum processor [33].Given this value of T 2 is significantly longer than all OTOC experimental circuits used in this work (the longest circuit lasts ∼5 µs), we conclude that changes in the ancilla projection σy in the OTOC experiments are indeed dominated by the many-body effects in Û and Û † rather than the intrinsic decoherence of the ancilla qubit itself.For all experimental results presented in the main text, spin echo is applied to the ancilla qubit.

D. Elimination of Bias in σy
The accuracy of OTOC measurements is particularly susceptible to any bias in σy of the ancilla qubit, i.e. a fixed offset d to the ideal value.Such a bias can, for example, be introduced by the different readout fidelities for the |0 and |1 states of the ancilla qubit.To see the impact of the bias on OTOC, we consider an ideal OTOC value of C 0 and an ideal normalization value of σy I .The ideal y-projection of the ancilla with the butterfly operator applied, σy B , in such a case is σy B ≈ C 0 σy I .However, in the presence of a bias, the measured projections become σy = σy I + d for the normalization value and σy = σy B + d with the butterfly operator applied.The experimental value for OTOC then Unbalanced State Preparation: Balanced State Preparation:  p = 0,  m = 0 and   p = 0 and ,  m = 0 and  We begin by measuring σy -bias due to asymmetry in readout errors.The test circuit is shown in the left panel of Fig. S11A under the label "unbalanced readout".We first project the qubit onto the equator of the Bloch sphere with a π/2 rotation, R π/2 (ϕ v ), where ϕ v is a variable phase.A second π/2 rotation around a fixed axis, √ X, is then applied and excited state population P ↑ is finally measured.The population is then converted to σy via σy = 2P ↑ − 1.The right panel of Fig. S11A shows σy as a function of ϕ v and a fit to a functional form σy = (1−2F r ) cos (ϕ v )+d r .Here F r is the average of the readout fidelities of the |0 and |1 states, and d r is their difference.For this unbalanced readout scheme, we obtain F r = 0.0962 and d r = 0.055.The observed difference in readout fidelities is consistent with past experiments where it was shown that energy relaxation of the qubit during dispersive readout generally leads to lower readout fidelity for the |1 state compared to the |0 state [52].
The bias d r = 0.055 is significantly reduced by adopting a "balanced readout" scheme, also shown in the left panel of Fig. S11A.Here, we measure the P ↑ of the ancilla qubit with the second gate in the test circuit being either √ X or √ −X.The results are then combined to obtain σy = (P ↑,+ − P ↑,− ), where P ↑,± is P ↑ measured with the second gate being √ ±X.The averaged σy is shown in the right panel of Fig. S11A as a function of ϕ v .A similar fit as before yields the same average readout fidelity F r = 0.0962 and a much reduced bias d r ≈ 0.0002.
The difference between the two state preparation schemes is illustrated in Fig. S11C, where we present the results of a 12-qubit OTOC experiment.The quantum circuit Û is non-integrable and composed of √ iSWAP and random single-qubit π/2 rotations around 8 different axes on the XY plane.A total of 40 circuit instances are used and the data shown are average values over all instances.The top panels show the measured values of σy for the normalization case and cases where a butterfly operator Z is successively applied to qubits 2 to 12 inbetween Û and Û † .The y-axis scale is intentionally limited to < 0.05.We observe in the case of unbalanced state preparation, the σy values exhibit sudden rise from 0 to >0.006 at cycles 29 to 38.This behavior is inconsistent with the effects of scrambling and decoherence, either of which is expected to lead to monotonic decay of σy toward 0 for a non-integrable process.In contrast, with balanced state preparation, σy indeed monotonically decays toward 0 at large cycles for all curves.
The bottom panels of Fig. S11C show the normalized average OTOCs, C, for each qubit.Here we observe that C obtained with the unbalanced state preparation scheme again manifests unphysical jumps from 0 at cycles 29 to 38, resulting from the finite bias σy and the mechanism outlined at the beginning of this section.In contrast, C obtained with the balanced state preparation scheme decays monotonically toward 0, in agreement with the scrambling behavior of a non-integrable process.These data suggest that a symmetrization step duration the state preparation phase of the ancilla qubit is needed to completely remove the bias in σy , in addition to the symmetrization step before readout.The physical origin of the remnant bias seen in the left panels of Fig. S11C is not completely understood at the time of writing, and could be related to control errors in the single-qubit gates on the ancilla qubit, incomplete depolarization of T 1 errors by the spin echoes, or other unknown mechanisms.

E. Light-cone Filter
Analogous to classical systems, quantum perturbations often travel at a limited speed (the "butterfly velocity").This typically results in a "light-cone" structure for many quantum circuits, which can be capitalized to reduce their classical simulation costs [53].Similarly, the light-cone structure of these quantum circuits may also be utilized to modify their implementations on a quantum processor and improve the fidelity of experimental results.In this section, we describe a light-cone-based circuit re-compilation technique that led to considerable improvements in the accuracy of experimental OTOC measurements.
Figure S12A displays the generic structure of an OTOC measurement circuit, where the component gates of the quantum circuit Û and its inverse Û † are explicitly shown.The butterfly operator possesses a pair of triangular light-cones extending from the middle of the circuit into both Û and Û † .Quantum gates outside these light-cones may be completely removed ("filtered") without altering the output of the circuit.Furthermore, since the measurement at the end of the circuit is also localized at a single qubit (Q 1 ), one may additionally discard quantum gates outside the light-cone of Q 1 originating from the right-end of Û † , without altering circuit output.The gates removed by the light-cone filter are shown with semi-transparent colors in Fig. S12A.In practice, some qubits have much longer idling times as a result of gate removal and become more susceptible to decoherence effects such as relaxation and dephasing.To mitigate such  effects, we also apply spin echo to qubits with long idling times, similar to the approach to the ancilla qubit in Fig. S10.
The effects of the light-cone filter on OTOC measure-ments are shown in Fig. S12B and Fig. S12C.The left panel of Fig. S12B shows the configuration for a 53-qubit OTOC experiment.Here we choose a quantum circuit Û with iSWAP and random single-qubit gates which are π/2 rotations around axes on the XY plane.The axes of rotation are chosen such that exactly 48 non-Clifford rotations (randomly selected from √ ±W and √ ±V ) occur in Û and Û † .All other single-qubit gates are Clifford rotations randomly selected from √ ±X and √ ±Y ).The number of circuit cycles is fixed at 11.The right panels of Fig. S12B show experimental results for 100 individual instances of Û , whereby σy for the normalization case is plotted at the top and σy with the butterfly operator (X) applied is plotted at the bottom.We observe significant enhancements in the amplitudes of σy after the application of the light-cone filter, with the normalization σy values averaging to 0.024 without the filter and 0.194 with filter.
The experimental improvement facilitated by the lightcone filter is more clearly seen by comparing the normalized OTOC values C with exact numerical simulation of the same circuits, as shown in Fig. S12C.Without the light-cone filter, the experimental C values are substantially different from simulation results.On the other hand, with the light-cone filter, the agreement between experimental and simulated values is much closer.We further quantify the effect of light-cone filter by plotting the differences between numerical and experimental values of C, , in Fig. S12D.Here we observe that the root-mean-square (RMS) value for is 0.156 without the light-cone filter, whereas it is reduced to 0.041 after filter is applied.This four-fold improvement in accuracy of OTOC measurements is a natural consequence of the reduction of number of gates in the overall quantum circuit (the number of iSWAP gates is reduced from 464 to 161 for the example in Fig. S12).Although the additional qubit idling introduced by the light-cone filter carries errors as well, they are expected to be much less than the errors of the removed two-qubit gates, particularly when spin echo is also applied during the idling.

F. Normalization via Reference Clifford Circuits
The last error-mitigation technique we use for the OTOC experiment is specific to quantum circuits Û composed of predominantly Clifford gates with a small number of non-Clifford gates.For such circuits, it is found through numerical studies that a modified normalization procedure yields more accurate values of OTOC (Fig. S13A): Consider a quantum circuit Û composed of mostly Clifford gates (iSWAP, √ ±X and √ ±Y ) and a small number of non-Clifford gates ( √ ±V and √ ±W ).We first measure the σy of the ancilla with a butterfly operator applied between Û and Û † (we denote this value as σy 2 ), same as before.In a second step, instead of measuring σy without applying the butterfly operator (denoted by σy 0 ), we measure σy with the same    S13B, where Û contains a total of 8 non-Clifford rotations and 14 cycles.Similar to the previous section, we present results from 100 circuit instances.Next, we process the data to obtain experimental val-ues of OTOC, C, in two different ways: First, we apply C = σy 2 / σy 0 , which corresponds to the normalization procedure used in the previous sections.Second, we apply C = σy 2 / σy 1 , corresponding to normalization using σy of a reference circuit.Here the absolute sign accounts for the fact that the theoretical OTOC values of Clifford circuits are ±1.The resulting C values are both plotted alongside exact simulation results in the left panel of Fig. S13C.It is easily seen that the second normalization procedure with reference Clifford circuits yields experimental values that are in much better agreement with simulation results.Indeed, the experimental errors (Fig. S13D) have an RMS value of 0.250 when C = σy 2 / σy 0 is used and 0.157 when C = σy 2 / σy 1 is used.Given these observations, we adopt normalization via reference Clifford circuits when measuring quantum circuits dominated by Clifford gates.
Lastly, we note that for the data in Fig. 3 and Fig. 4 of the main text, we apply a ensemble of reference circuits Ûref and use the average value of | σy | obtained from all Ûref to normalize σy of the actual quantum circuit Û .
The typical number of Ûref for each Û is 10 in Fig. 3 and varies between 15 and 70 for Fig. 4.

III. LARGE-SCALE SIMULATION OF OTOCS OF INDIVIDUAL CIRCUITS
In the last few years, there has been a constant development of new numerical techniques to simulate large scale quantum circuits.Among the many promising methods, two major numerical techniques are widely used on HPC clusters for large scale simulations: tensor contraction [44,[54][55][56][57][58] and Clifford gate expansion [6,7,59,60].All the aforementioned methods have advantages and disadvantages, which mainly depend on the underlying layout of the quantum circuits and the type of used gates.On the one hand, tensor contraction works best for shallow circuits with a small treewidth [58,61].On the other hand, Clifford gate expansion is mainly used to simulate arbitrary circuit layouts with few non-Clifford gates.Indeed, it is well known that circuits composed of Clifford gates only can be simulated in polynomial time [59], with a numerical cost which grows exponentially with the number of non-Clifford gates [6].Both methods can be used to sample exact and approximate amplitudes, with a computational cost which decreases with an increasing level of noise.For instance, approximate amplitudes can be sampled by slicing large tensor network and contracting only a fraction of the resulting sliced tensors [62,63].The final fidelity of the sample amplitudes is therefore proportional to the fraction of contracted slices [56,57], which can be tuned to match experimental fidelity.Similarly, it is possible to sample approximate amplitudes by only selecting the dominant stabilizer states in the Clifford expansion [7,60].
In our numerical simulations, we used tensor contraction to compute approximate OTOC values, which are then validated using results from the Clifford expansion for circuits with a small number of non-Clifford gates.Both methods are described in the following sections.

A. Numerical Calculation of the OTOC Value
As described in the main text and shown in Fig. 1(A), the experimental OTOC circuits have density-matrix-like structure of the form Ĉ = Û † σ(Q b ) Û , with σ(Q b ) being the butterfly operator.In all numerical simulations, we used iSWAPs as entangling two-qubit gates.Before and after Ĉ, a controlled-Z gate is applied between the qubit Q 1 (in Ĉ) and an ancilla qubit Q a (external to Ĉ): the OTOC value is therefore obtained by computing the expectation value of σy relative to the ancilla qubit Q a .To reduce the computational cost, it is always possible to project the ancilla to either 0 or 1 (in the computational basis).Let us call Ĉ0 = Ĉ ( Ĉ1 = σ(Q1) ) the circuit with the ancilla qubit projected on 0 (1).Therefore, the OTOC value σy can be obtained as: with |ψ 0 = Ĉ0 |+ and |ψ 1 = Ĉ1 |+ respectively.

B. Branching Method
To get exact OTOC value for circuit with a small number of non-Clifford rotations, we used a branching method based on the Clifford expansion.More precisely, recalling that OTOC circuits have a densitymatrix-like structure, that is it is possible to apply each pair of gates ĝt , ĝ † t to σ(Q b ) iteratively and "branch" only for non-Clifford gates.
At the beginning of the simulation, a Pauli "string" is initialized to all identities except a σ(Q b ) x operator (the butterfly operator) on the butterfly qubit Q b (in this examples, the Pauli X is chosen as butterfly operator), that is Whenever a pair of Clifford operators ĝt , ĝ † t is applied to P, the Pauli string is "evolved" to another Pauli string.For instance, an iSWAP operator evolves the Pauli string P = Î σx to P = iSWAP † Î σx iSWAP = − σy σz .On the contrary, if a non-Clifford operator is applied to P, the Pauli string P will evolve into a superposition of multiple Pauli strings, that can be eventually explored as independent branches.As an example, the non-Clifford rotation ĝt = √ W = √ X + Y will branch P = σx three times into P 1 = σx √ 2 , P 2 = σy √ 2 and P 3 = σz 2 respectively.Because ĝt , ĝ † t are always applied in pairs (one gate from U and one gate from U † ), the computational complexity depends on the number N D of non-Clifford gates After the applications of all {ĝ i } i=1, ...,t gates, the OTOC circuit Ĉ will be then represented as a superposition of distinct Pauli strings, each with a different amplitude.The full states |ψ 0 and |ψ 1 can be then obtained by applying the initial state |+ to the Clifford expansion of Ĉ and, therefore, the OTOC value from Eq. (S4).To reduce the memory footprint of our branching method, the initial state |+ is always applied to all Pauli strings once the last gate in U is applied.Therefore, our branching algorithm will output both |ψ 0 and |ψ 1 as a superposition of binary strings.Because some of binary string composing |ψ 0 and |ψ 1 may have a zero amplitude, (due to destructive interference), the number of binary strings n p composing |ψ 0 and |ψ 1 is typically smaller than the number of explored branches n b , that is OTOC experimental circuits are √ W ± and √ V ± , with W = X+Y and V = X−Y respectively, one can compute the expected scaling by assuming that, at each branching point, there is an homogeneous probability to find any of the four Pauli operators {I, X, Y, Z}.Because non-Clifford rotations branch only twice on Z, thrice on {X, Y } and never on I, the expected scaling is n b ∝ 3 to explore a given number of branches on single nodes of the NASA cluster Merope [64].While the interface of our branch simulator is completely written in Python, the core part is just-in-time (JIT) compiled using numba to achieve C−like performance.Our branch simulator also uses multiple threads (24 threads on the two 2 six-core Intel Xeon X5670@2.93GHznodes) to explore multiple branches at the same time and it can explore a single branch in ∼ 18 µs.Because multithreading starts for n b > 10 3 only, it is possible to see the small jump caused by the multithreading overhead.The inset of Fig. S14 shows the projected runtime on Summit by rescaling to Summit's R max [65] (R Summit max = 200,794.9TFlops, R Merope (single node) max = 140.64GFlops), assuming that |ψ 0 and |ψ 1 can be fully stored in Summit (seed Fig. S15).In Fig. S15, we report the number of elements different from zero in |ψ 0 and |ψ 1 .As one can see, the number of elements different from zeros scales as 2 0.79 N D and can be accommodated on single nodes (shaded area corresponds to the amount of virtual memory [RAM] used by the simulator).Because branches are explored using a depth-first search strategy, most of the virtual memory used by our branching algorithm is reserved to store |ψ 0 and |ψ 1 , it would be in principle possible to simulate between N D ≈ 50 and N D ≈ 60 non-Clifford rotations on Summit before running out of virtual memory (Summit has 10 PB of available DDR4 RAM among its 4,608 nodes [66]).

C. Tensor Contraction
Tensor contraction is a powerful tool to simulate large quantum circuits [44,[54][55][56][57][58].In our numerical simulations, we use tensor contraction to compute approximate OTOC values.It is well known that approximate amplitudes can be sampled by properly slicing the tensor network and only contracting a fraction of the sliced tensor networks [56,57,62,63].However, in our numerical simulations, we used a different approach to compute approximate OTOC values.More precisely, rather than computing one (approximate) amplitude at a time using tensor contraction, we compute (exact) "batches" of amplitudes by leaving some of the terminal qubits in the tensor network "open".Let us call κ a given projection of the non-open qubits and us define |ψ respectively.Therefore, we can re-define the OTOC value as a "weighted" average of partial OTOC values, that is: with and It is interesting to observe that Eq. (S6) corresponds to the correlation coefficient between two non-normalized states and that the exact OTOC value is equivalent to the weighted average of single projection OTOC values.Indeed, when all κ are included, Eq. (S5) reduces to Eq. (S4).However, because σ y needs to be O(1) to be experimentally measurable, few projection κ may actually be sufficient to get a good estimate of Eq. (S5).
In our numerical simulations, we left 24 qubits open and we used 20 random κ projections to compute an approximate OTOC value using Eq.(S5).If not indicated otherwise, medians and confidence intervals are computed by bootstrapping 1,000 times Eq.(S5) using 10 randomly chosen projections among the 20 available.Fig. S16 shows the comparison of approximate and exact OTOC values, by varying the circuit index (top) and by varying the number of non-Clifford rotations (bottom).As one can see, there is a great agreement between approximate and exact results across different circuits with different depth, layout and number of non-Clifford rotations (top).The bottom part of Fig. S16 shows the absolute error by varying the number of non-Clifford rotations.In particular, we are comparing results by averaging over single projections κ using Eq.(S6) (orange/light gray boxes) or by bootstrapping 1,000 times Eq.(S5) using 10 randomly chosen projections among the 20 available (blue/dark gray boxes).As expected, results become less noisy by increasing the number of used projections.
Due to the limited amount of memory in HPC nodes, slicing techniques are required to fit the tensor contraction on a single node and avoid node-to-node memory communication overhead [57,58].In our numerical simulations, we used cotengra [58] and quimb [67] to identify the best contraction (including slicing) and perform the actual tensor contraction respectively.Because each different projection κ may lead to a slightly different simplification of the tensor network (in our numerical simulations, we used the rank and column reduction included in the quimb library), we recompute the best contraction for each single projection.For each projection (regardless of the depth/number of iSWAPs) we fixed max repeats = 128 in cotengra and restarted the heuristics 10 times to identify the optimal contraction (with a hard limit of 2 minutes for each run), using 24 threads on a 2 six-core Intel Xeon X5670@2.93GHznode.We found that the runtime to the best contraction scales as 0.03N S − 3.68 minutes, with N S the number of iSWAPs in the circuit, that is less than 20 minutes for ∼ 600 iSWAPs.Fig. S17 shows the number of slices required to have the largest tensor in the tensor contraction no larger than 2 28 elements.Green stars in the figure correspond to the number of slices to com-RQC FIG.S17.Number of slices required to fit the largest tensor in memory during tensor contraction with a threshold of 2 28  elements by varying the number NS of iSWAPs (boxes extend from the lower to upper quartile values of the data, with a line at the median, while whiskers correspond to the 5%−95% confidence interval).Green stars correspond to expected number of slices to compute single amplitudes of random quantum circuits (RQC) with a similar depth [33,58].(Inset) Number of elements to store in memory for the largest tensor during tensor contraction.For sufficiently deep circuits, the number of elements for the largest tensor saturates to the size of the Hilbert space 2 n , with n = 53 the number of qubits in the Sycamore chip.
pute single amplitudes of random quantum circuits with a similar number of iSWAPs [33,58].Because OTOC circuits and the random quantum circuits presented in [33] share a similar randomized structure, the computational complexity mainly depends on the number of iSWAPs (OTOC slicing is slightly worse because of the open qubits).We may expect an improvement in the slicing by using novel techniques as the "subtree reconfiguration" proposed by Huang et al. [44].It is interesting to observe that, for deep enough circuits, the number of elements of the largest tensor in the tensor contraction saturates to the number of qubits (inset).
Fig. S18 summarizes the computational cost to compute an exact single projection κ.Top panel of Fig. S18 reports the actual runtime to contract a tensor network, by varying its expected total cost (including the slicing overhead) in FLOPs (expected FLOPs are using by cotengra to identify an optimal contraction).Dashed and dot-dashed lines correspond to the peak performance and expected performance of a 2 six-core Intel Xeon X5670@2.93GHznode.The sustained performance of 64% is consistent with similar analysis on the NASA cluster [57].Bottom panel reports the number of FLOPs (with and without the slicing overhead) by varying the number of iSWAPs.On the right y-axis, it is reported the extrapolated number of days to simulate OTOC circuits of a given number of iSWAPs, by assuming a sustained performance of 148,600 TFLOPs.Green stars corresponds to the total FLOPs (including the slicing overhead) to compute single amplitudes of random quantum circuits with a similar number of iSWAPs [58].(Top) Runtime (in seconds) to fully contract a single projection by varying the required number of FLOPs (results are for single nodes with 2 six-core Intel Xeon X5670@2.93GHz).Because long double are used throughout the simulation, we used the conversion factor 8 to convert FLOPs to actual time.(Bottom) Flops to contract a single projection by varying the number NS of iSWAPs, with and without the slicing overhead.Summit's days are obtained by using Summit's R peak [65] and the conversion factor 8 from FLOPs to actual time (because long double's have been used throughout the simulation).Green stars correspond to expected number of FLOPs to compute single amplitudes of random quantum circuits (RQC) with a similar number of iSWAPs [33,58].

IV. MARKOV POPULATION DYNAMICS
For a broad class of circuit ensembles the average OTOC can be computed efficiently (in polynomial time) on a classical computer.This appears surprising as computing the output of the random circuit is expected to require exponential resource.This contradiction is resolved by demonstrating an exact mapping of the average evolution of OTOC onto a Markov population dynamics process.Such connection was identified for Hamiltonian dynamics [11] and subsequently for simplified models of random circuits [23,24] where uniformly random two-qubit gate was assumed.In practice, we implement a specific gate set consisting of "cycles" of the form ij Ĝi Ĝj Ûij (θ, φ) applied to non-overlapping pairs of nearest neighbour qubits i, j .For each pair a cycle consists of two single qubit gates Ĝi , Ĝj and an entangling two-qubit gate, Ûi,j (θ, φ) = e − i 2 θ(XiXj +YiYj )− iφ 2 ZiZj , (S7) parameterized by fixed angles θ, φ.The random instances are generated by drawing single qubit gates from a specific finite set Ĝi .We consider two sets corresponding to generators of single qubit Clifford group { √ X ±1 , √ Y ±1 } and the set which in conjunction with any entangling two-qubit gate generates a universal set } introduced in Sec.I. Average OTOC can be calculated by first considering the dynamics of the pair of butterfly operators Ô(t) = U † ÔU evolved under circuit U .We introduce the average of the pair of the operators acting in the direct product of two Hilbert spaces of the two replicas of the same circuit: where averaging over ensemble of the circuits is denoted by (...), and the rightmost equation defines the shorthand notation.Analogously, one can introduce the higher order averages as a starting point to analyze the circuit to circuit fluctuations of C(t).
Then the value of OTOC is obtained by taking the matrix element of C(t) = ψ| M Ô(t) M Ô(t)|ψ with the initial state that in our experimental setup is chosen to be |ψ = n i=1 |+ i where |+ i is the symmetric superposition of the computational basis states.The average and the second moment of C(t) are obtained as a straightforward convolution of the indices in Eqs.(S8) and (S9) respectively.Operator Ô(4) can be also used to study the effect of the initial state to the OTOC (clearly its average is not sensitive to the initial conditions).

A. Symmetric single qubit gate set
We first consider average over uniformly random single qubits gate such that for any Pauli matrix, αi ≡ Ĝ † αi Ĝ = 0, (S10) We will analyze the specific discrete gate sets used in the experiment in Section IV D. In this section, we use the Latin indices to label a qubit, and the Greek ones to denote the corresponding Pauli matrices αi = { Xi , Ŷi , Ẑi } For a pair of operators as in Eq. (S8) there exist an analog of scalar product -a spherically symmetric combination that does not vanish after averaging, where we introduced "bond" notation, and the summation over the repeated Greek indices is always implied.
Analogously one can find non-vanishing averages of the Pauli matrices acting in four replicas space of the same qubit (we will omit the qubit label).There are in total six bilinear combinations α ⊗ β ⊗ 1 1 ⊗ 1 1 = δ αβ B (12) , B (12 13) , 34) , B (34 Four cubic invariants are possible because there is no inversion operation for the spin: (S13b) where αβγ is the three-dimensional Levi-Civita symbol.Because they would change sign under the inversion operation, the cubic invariants can appear only in pairs on neighboring qubits.
Finally, the three quartic invariants are This simple form implies the spherical symmetry of the single qubit averaging.For a lower symmetry (e.g.all the rotations of a cube) other quartic and cubic invariants are possible (like but we will ignore them for the sake of simplicity.

B. Efficient Population Dynamics for the Averaged OTOC
We expand average evolution of the pair of butterfly operators (S8) as, where normalization condition reads, the variable v i = {0, 1}, i = 1, 2, ..., n indicates whether or not Pauli matrices occupy qubit i, and P {vi} are formfactors.In other words, each ith is characterized either by ρ (0) i ≡ 1 1 ⊗2 i , "vacuum", or the "bond" B i .All other terms in the right hand side of Eq. (S14) vanish upon averaging over single qubit gates see Sec.IV A.
Some additional constraints can be extracted from the exact condition Convoluting operators in Eq. (S14), using 1 1 (S19) Preserving this condition in the update rule (S17) requires the elements in each row of the matrix Ω from Eq. (S16) to add up to one.Moreover, all the elements of Ω are non-negative and therefore not only is P {vi} (t) normalized but it is also non-negative.Therefore, rules (S17) and (S19) defines the Markov process and variables v i = 0, 1 correspond to the classical population of each qubit.As the non-populated states v i = v j = 0 do not evolve and the reproducing (01) → (11) and destruction (11) → (01) processes are allowed the problem (S17) is nothing but the classical population dynamics.The formfactors P v 1 ...v i v j ...v n , t are interpreted as a distribution function over an n bit register {v 1 ...v n }, subject to the Markov process defined by the update, Eq. (S16).
Direct solution of Eq. (S17) would require 2 n real numbers.However, unlike the original unitary evolution, the classical population dynamics involves only positive numbers constrained by normalization (S19).Such dynamics is very efficiently stimulated using a classical Monte Carlo type algorithm.
The butterfly operator is the starting point of the Markov process Eq. (S17).At long times the distribution P {vi} converges to the stationary state 4 1 1 ⊗2 i + 3B i , which corresponds to "vacuum" occurring on each site i with probability p 1 1 ⊗2 i = 1  4 and "bond" occurring with probability p (B i ) = 3/4.OTOC in this limit takes the value corresponding to the random matrix statistics [68].Intermediate dynamics of OTOC between these two limits is fully described by the Markov process Eq. (S17).It has a form of shock wave spreading from the initial butterfly operator to cover the whole system.Note that the choice of the two qubit gate parameter θ has a dramatic effect on the intermediate OTOC dynamics.As discussed in the main text θ = π/2 corresponds to the probability of butterfly operator to spread equal one, and therefore spreading with maximum velocity equal to the light cone velocity and saturating the Lieb-Robinson bound.At any other value of θ probability to spread is less then one which results in diffusive broadening of the front, and the center of the front propagates with a butterfly velocity that is smaller than the light cone velocity.

C. Sign Problem in the Population Dynamics for OTOC Fluctuations
As we already mentioned, the analytic calculation of the OTOC for an individual circuit is impossible.One can expect, however, that it is possible to express the variance of the OTOC in terms of some products of classical propagators similarly to the analysis of the mesoscopic fluctuations in the disordered metals.The purpose of this subsection is to show that even such a modest task can not be efficiently undertaken.
The starting point is the expansion in terms of the single qubit rotations invariants (S13).Similarly to Eq. (S14), we write for Ô(4) of Eq. (S9) where Qi is the vector with 14 = 1 + 6 + 4 + 3 operator components given by invariants of Eq. (S9): i , and V i are the 14 basis unit vectors so that 13 components equal to zero and the remaining component is 1.
In other words, each site can be in 14 possible states.
A straightforward generalization of the evolution equation (S17) reads where Ω is now 14 2 × 14 2 whose explicit form is known but not quite important for the further consideration.Equation (S22) involves 14 n real numbers.The only feasible path to the solution would be an efficient Monte Carlo sampling.Naively, one can hope to map the problem to the multicolored population dynamics.However, it is not possible as we explain below.
Configurations with V j , a, b = {3, 4} must be kept precisely the same.The requirement quickly becomes intractable with the increasing of the number of non-trivial matrices involved into cancellation.
To summarize, the general requirements on the evolution of the Ô(4) (t) lead to the sign and locality problems.Those two facts render a brute force classical Monte Carlo algorithm impossible.At present, we are not aware of any algorithm enabling to circumvent those obstacles.
where we introduce the basis Ξ Single qubit gate average can be described in terms of the invariants Ξ α → M (S) In this new basis {Ξ γ Ξ δ } instead of a 4 × 4 matrix Ω in Eq. (S16) effect of iSWAP gate is described by 16 × 16 matrix constructed straightforwardly from the rules Eqs.(S26) and the rules for iSWAP.

Universal Gate Set
The universal gates set used in the main text consists of eight choices of single qubit gates { For the butterfly operator we choose Ôi = Xi the dynamics is described in the reduced subspace spanned by the basis Λ = {1 1 ⊗2 , X⊗2 , Ŷ ⊗2 , Ẑ⊗2 }.Average of a pair of Pauli op- erators over this ensemble reads Λ Together with the rules for iSWAP gate it is straightforward to generate 16 × 16 matrix defining the Markov population dynamics process.Comparison of this noisefree population dynamics prediction for several different circuits with different locations of the butterfly operator X throughout the 53 qubit chip implemented experimentally is shown in Fig. S19.

V. EFFICIENT POPULATION DYNAMICS FOR NOISY CIRCUITS A. Inversion Error
The circuit to measure OTOC require inversion of gates Ûi,j (θ, φ) which is not perfect.An important source of error is non-invertible phase φ such that instead of Û † i,j (θ, φ) the gate Ûi,j (−θ, φ) is implemented.This inversion error can be included in the Markov population  dynamics process in the following way,

B. Generic Error Model
For an ensemble that satisfies Eq. (S10) the effect of relaxation and dephasing in the quantum processor can be captured by a one and two qubit depolarizing channel noise model, channel the Markov process Eq. (S16) is supplemented by the exponential decay rate as follows, for each bond per gate cycle.Two-qubit depolarizing channel noise is accounted by supplementing the Markov process by the following, Note that the effect of noise on the Markov population dynamics cannot be described by the global depolarizing channel model that is often conjectured for ergodic circuits.Instead the time dependence of OTOC will demonstrate characteristic time dependence that can be used to verify the experimental data.
The described procedure allows us to verify the experimental results for average fidelity of OTOC circuits, the circuit with no butterfly applied, by direct comparison to the noisy population dynamics, see Fig. S20 and Fig. S21.We use the two-qubit Pauli error as a fitting parameter.Best fit corresponds to errors within 10% of the average error of two qubit cycle measured independently.We use the extracted error to predict values of OTOC for every position of the butterfly with no additional fitting.Comparison of the values of normalized OTOC predicted in this way with experimental data is shown in Fig. S22.This procedure introduces substantial noise-dependent bias into the observed OTOC values, which is illustrated in Fig. S23.

C. Error Mitigation for Average OTOC
We estimate circuit fidelity from the circuit shown in, Fig. 1 A, B, without the butterfly operator.This fidelity estimate is used for the error mitigation procedure aplied to the data in Fig. 2. In the error-free circuit absence of the butterfly operator means that U and U † cancel each other exactly resulting in C z0 (t) = 1.In practice, inversion is imperfect as detailed in Sec.V A. Both errors in the unitary parameters and the effect of noise are reflected in the time dependence of C z0 (t) < 1 which serves as circuit fidelity, an analog of Loschmidt echo for local operator M .
Average over circuits reads, where the probabilities of vacuum F 1 1 ⊗2 1 and bond F B1 at the measurement site are described by the population dynamics Eqs.(S16) with respective decay rates Eqs.(S30-S34).Note that the decay rate grows with the extent of spreading of the butterfly operator, resulting in the decay of fidelity that is not a simple exponent.Moreover, in general F 1 1 ⊗2 1 , F B1 is not a simple probability no error occurred, as one could naively expect, see Fig. S20 for comparison.
The ratio of C zz /C 0z is compared to normalized data in Fig. S22.Note that this ratio does not correspond to the noise free OTOC, Fig. S23.This is because the probability of vacuum at measure site F 1 1 ⊗2 1 decays slower with respect to its noise free value p(1 1 ⊗2 1 ) than the probability of the bond F B1 .The latter corresponds to the weight of the operators which span the distance between measure and butterfly qubit which are more susceptible to noise.As a result normalized OTOC from noisy circuits overestimates the value of noise-free OTOC.For individual circuits such a simple description is no longer valid, as demonstrated by the dependence of fidelity estimate C z0 (t) on the circuit instance, see Sec.VI.In many cases circuit dependent corrections are relatively small and the normalization procedure still works for individual circuits as well.Nonetheless, for our OTOC fluctuations data we develop a more efficient error mitigation procedure described in the following section.

D. Theory of Error Mitigation for Individual Circuits
We use a more precise error mitigation procedure for individual OTOC measurements presented in Figs. 3, 4 of the main text.These circuits contain iSWAP entangling gate and OTOC value can be expanded in terms of contributions from Clifford circuits.In presence of non-Clifford gates the butterfly operator can be conveniently where κ α = {1, −1, −1, 1} for α = {0, x, y, z}.
For Clifford circuits ideal value of OTOC can be calculated efficiently.It can then be used to calculate circuit fidelity by comparing data with the expected value.The fidelity of the OTOC in presence of non-Clifford gates is then calculated by sampling a subset of OTOCs for Cliffords that appear in its expansion, see Eq. (S37).The fidelity is calculated by averaging over fidelities of individual Clifford contributions.Averaging reduces the circuit specific effect of noise and gives a more accurate estimate of fidelity.

VI. NUMERICAL SIMULATION OF ERROR-LIMITING MECHANISMS FOR OTOC
In this section, we provide numerical simulation results aimed at identifying the potential error sources that limit our experimental accuracy in resolving OTOCs.As demonstrated in Section I, shot noise from finite statistical sampling is unlikely the dominant mechanism.The remaining known error channels are: 1. Incoherent, depolarizing noise in the quantum circuits, which can arise from qubit dephasing or relaxation.2. Coherent errors in the quantum gates, e.g. the remnant conditional-phase φ demonstrated in Section II.We study each of these errors below.

A. Coherent and Incoherent Contributions
We first describe how OTOC error from depolarizing noise may be simulated.We consider a depolarizing channel model parameterized by an error probability p, where d = 2 n is the dimension, and I the identity operator.The Kraus operators of this map are all Pauli strings of length n, where each non-trivial string has weight p d 2 .We consider a model where after each two qubit gate (these typically dominate the loss in fidelity compared to single-qubit gates), a two-qubit depolarizing channel is applied.
In Fig. S24A, we show a number of instance-dependent OTOC values C simulated using full density matrix calculations and an experimentally measured Pauli error rate r p of 0.015 (r p = 15p/16 in Eq. (S39), due to the trivial Pauli string in the Kraus map from the identity matrix).Here we have used a smaller number of qubits due to the high cost of density-matrix simulation.To match with experiment, we have adjusted the number of circuit cycles to yield a total number of iSWAP gates close to those shown in Fig. 4 of the main text.The same normalization protocol as the experiment is also adopted, such that σy is simulated with and without the butterfly operator and their ratio is recorded as C. The results, compared to the ideal OTOC values also plotted in Fig. S24B, show little deviation.
Next, we simulate OTOCs of the same circuits with r p = 0 but an experimentally measured conditionalphase of φ = 0.136 rad on each iSWAP gate and its inverse.The results are also plotted in Fig. S24A and seen to deviate much more from the ideal values.To quantify these observations, we have plotted the OTOC error as a function of both r p and φ in Fig. S24B.Here we see that at the experimental limit (r p = 0.015 and φ = 0.136 rad), the OTOC error is dominated by the contribution from φ (where it is ∼0.03) rather than r p (where it is ∼0.006).Interestingly, the SNR is about 0.9 for φ = 0.136 rad, which is close to the value measured in Fig. 4 of the main text for N S ≈ 250.
Figure S24B also provides preliminary indication of how the OTOC accuracy for our experiments may be improved as we decrease both φ and r p .We see that the OTOC error is linearly proportional to r p whereas its scaling is steeper against φ.In particular, reducing φ by a factor of 2 leads to an OTOC error that is 3 times lower.Although these results may depend on the number of qubits and the structure of circuits, their similarity to experimental data nevertheless provides hints that reducing the conditional-phases in our iSWAP gates can potentially lead to large improvements in the OTOC accuracy.This can, for example, be achieved through concatenated pulses demonstrated in Ref. [48].

B. Perturbative Expansion of OTOC Error
The exact density-matrix calculation used in the preceding section is costly to implement and becomes quickly intractable as the number of qubits increase.One can more systematically estimate the instance specific noise contribution from a perturbation theory expansion of the quantum map Eq. (S39), with an entirely pure state calculation.Let us adopt the notation σ (i,j) m1,m2 = σ where C ideal is the OTOC value in the limit of no noise, C p the first order approximation of C ideal in parameter p, and n 2 the total number of two-qubit gates in the circuit, occurring over pairs defined by i, j (i.e.Eq. (S40) is a sum over all error terms in the circuit).For convenience, we have also redefined p → 15p/16 from Eq. (S39) due to the trivial contribution from the all zero Pauli string.
Eq. (S40) can be used to separate out the ideal OTOC value of an individual circuit, from the noise contribution.This noise contribution can be computed using 15 × n 2 circuit simulations (inserting each Pauli pair at all twoqubit gate locations in the circuit), though in some cases symmetries can be used to reduce this.For example, for the normalization curve C z0 , only around half of this is required, since for each error in the reverse circuit U † , there is an equivalent one in U .Of course one can continue this expansion to arbitrary order, however the number of terms quickly becomes infeasible to compute (the k'th order contains 15 × n2 k terms).In Fig. S25 we show the error scaling (comparing to an exact density matrix computation) for the zero'th (i.e.only the first term in Eq. (S40)) and first order approximation for C zx , giving scaling in the error as expected; the error is computed as |C p (exact) − C p (approx)|/|C p (exact)|, where C p (approx) is from Eq. (S40), and C p (exact) the value from the density matrix calculation with noise rate p.Moreover, the accuracy of the approximation remains fairly consistent, for a fixed noise level (p) for the three different circuit sizes (using a similar number of iSWAP gates as in the main text).
Here we have outlined a protocol of simulating the instance specific error contribution to the OTOC.This can be used to more accurately separate the instance specific OTOC value, systematic errors, and contributions from gate noise.Of course, the overhead in the simulation is a factor of 15n 2 , which can itself become challenging in deep enough circuits, although statistical sampling may be feasible in some cases.

20 FIG. 1 . 1 √ 2 (
FIG. 1. OTOC measurement protocol.(A) Measurement scheme for the OTOC of a quantum circuit Û .A quantum system (qubits Q1 through QN ) is first initialized in a superposition state j=N j=1 | + X j , where | + X j = 1 √ 2 (|0 + |1 ) j and |0 (|1 ) is the ground (excited) state of individual qubits.Û and its inverse Û † are then applied, with a butterfly operator (realized as an X gate on qubit Q b ) inserted in-between.An ancilla qubit Qa is initialized along the y-axis of the Bloch sphere and entangled with Q1 by a pair of CZ gates.The y-axis projection of Qa, σy , is measured at the end.(B) The structure of Û consists of K cycles: each cycle includes one layer of single-qubit gates (randomly chosen from √ X ±1 , √ Y ±1 , √ W ±1 and √ V ±1 ) and one layer of two-qubit entangling gates (EG).Here W = X+Y √ 2 and V = X−Y √ 2 .(C) Left panel: The filled circles represent σy measured with Q b successively chosen from Q2 through QN .The open grey circles are a set of normalization values, corresponding to σy without applying the butterfly operator.The data are plotted over different numbers of cycles in Û and averaged over 60 random circuit instances.Right panel: Experimental average OTOCs C for different Q b , obtained from dividing the corresponding σy by the normalization values.

2 FIG. 2 .
FIG. 2. OTOC propagation and speed of operator spreading.(A) Spatial profiles of average OTOCs, C, measured on the full 53-qubit processor.The ancilla qubit Qa and the the measurement qubit Q1 are indicated by the red arrows.The colors of other filled circles represent C with different choices of the butterfly qubit Q b .The two-qubit gates are iSWAP and applied between all nearest-neighbor qubits, with the same order as done in Ref. [33].The dashed lines delineate the light-cone of Q1.The data are averaged over 38 circuit instances.(B) Similar to (A) but with √ iSWAP as the two-qubit gates.Here C is averaged over 24 circuit instances.(C) Cycle-dependent C for 4 different choices of Q b .The top (bottom) panel shows data with iSWAP ( √ iSWAP) being the two-qubit gates.The colors of the data points indicate the locations of Q b (inset to bottom panel).Dashed lines show theoretical predictions based on a classical population dynamics model.
Fig. 2B is significantly different.Here the iSWAP gates are replaced with √ iSWAP gates and the decay of C is slower.Qubits far from Q 1 still retain average OTOC values closer to 1 even after 22 cycles.The sharp, steplike spatial transition seen with iSWAP is also absent for √ iSWAP.Instead, C changes in a gradual fashion as Q b

40 FIG. 4 .
FIG. 4. Classical simulation complexity of quantum scrambling.(A) Top panels show three different circuit configurations, each having the same Qa (black unfilled circle) and Q1 (black filled circle) but different Q b (colored circle) and number of cycles in Û .The number of iSWAPs in Û and Û † that affect classical simulation costs, NS, is indicated for each configuration.Bottom panels show the instancedependent OTOCs measured for each configuration.The dashed lines are simulation results using tensor-contraction methods.ND is also indicated for each configuration.(B) Top: OTOC signal size and experimental error as functions of NS.Bottom: Signal-to-noise ratio (SNR) as a function of NS.SNR equals the ratio of OTOC signal size to experimental error.ND = 64 for the first three values of NS and ND = 40 for the last two values.The reason for decreasing ND is to increase the signal size such that it can be resolved with less statistical averaging [40].(C) Estimated time tsim needed to simulate the OTOC of a single 53-qubit circuit with a variable number of iSWAPs, NS, on a single CPU core (6 Gflop/s).

6 FIG
FIG. S1.Full evolution of average OTOCs for iSWAP random circuits.The average OTOCs of the 53-qubit system shown for every cycle up to a total of 15.The black unfilled (filled) circle represents the location of the ancilla (measurement) qubit.The colors of the other filled circles represent the values of C for different locations of the butterfly qubit.The two-qubit gate used here is iSWAP and the data are averaged over 38 random circuit instances.
FIG. S2.Full evolution of average OTOCs for√ iSWAP random circuits.The average OTOCs of the 53-qubit system shown for every cycle up to a total of 24.The black unfilled (filled) circle represents the location of the ancilla (measurement) qubit.The colors of the other filled circles represent the values of C for different locations of the butterfly qubit.The two-qubit gate used here is √ iSWAP and the data are averaged over 24 random circuit instances.
FIG. S4.Transition into non-integrability in the XY Model.(A) A 2D arrangement of qubits in the form of two parallel 1D chains with 5 connections between each other.Black unfilled (filled) circle denotes the ancilla (measurement) qubit Qa (Q1).(B) Order for applying the two-qubit gates √ iSWAP.The qubit connections denote the √ iSWAP gates that are applied for a particular cycle.Additional cycles repeat the first three cycles, e.g.cycle 4 applies the same √ iSWAP gates as cycle 1 and so on.(C) Average OTOCs C with the butterfly qubit corresponding to qubits Q2 through Q16.The locations of the butterfly qubit are indicated by the colors of the data symbols and the colors of the qubits in (A) and (B).All data are averaged over 36 circuit instances.
FIG. S5.Contribution of finite sampling to experimental OTOC error.(A) Dependence of experimental OTOC error on the number of single-shots used to estimate σy , nstats.The dashed line shows expected statistical errors for different values of nstats.Here the number of iSWAPs is NS = 251 and the number of non-Cliffords is ND = 40.(B) Comparison of experimental OTOC errors and expected statistical errors for different values of NS.The experimental errors are reproduced from Fig. 4C of the main text.The statistical errors are calculated based on nstats and the average normalization value for each NS.

24 FIG
FIG. S6.OTOC errors for ND = 24.(A) OTOCs of 100 random circuit instances, C, for different values of NS.ND = 24 for all circuits.The dashed lines are exact numerical simulation results using the branching method.(B) Upper panel: OTOC signal size, experimental error and estimated statistical error as functions of NS.ND = 24 for all data included here.Lower panel: SNR (i.e.ratio of OTOC signal size to experimental error) as a function of NS.
FIG. S7.Calibrating "textbook" gates.(A) Schematic illustration of the waveforms used to realize pure iSWAP and √ iSWAP gates: The |0 → |1 transition frequencies of two transmon qubits, f1 and f2, are brought close to each other by adjusting the control fluxes to their superconducting quantum interference device (SQUID) loop.Concurrently, a pulse to the coupler's SQUID flux changes the qubit-qubit coupling g from 0 to a maximum value of gmax.The total length of the pulses is tp.(B) The generic FSIM gate realized by the waveforms in (A), composed of a partial-iSWAP gate with swap angle θ, a CPHASE gate with a conditionalphase φ and four local Z-rotations with angles α1, α2, β1, β2 each.(C, D) Simulated θ (C) and φ (D) as functions of gmax and tp.The simulation assumes an average qubit frequency 1 2 (f1 + f2) = 6.0 GHz and an anharmonicity of 200 MHz for each transmon qubit.The operating points for three different gates, Sycamore, iSWAP and √ iSWAP, are indicated by the the star symbols.(E, F) Integrated histogram (empirical cumulative distribution function, ECDF) of θ (E) and φ (F) for both the iSWAP and √ iSWAP gates.The data include all 86 qubit pairs on the processor.The x-axis location of each dashed line indicates the median value of the corresponding angle.
FIG. S8.Implementation of Z-rotations.(A) Circuit compilation procedure for reducing number of required Z-rotations.Z-rotations occurring after each FSIM gate are combined with the Z-rotations before the next FSIM gate.Phases on the microwave-driven single-qubit gates are shifted to maintain the same overall quantum evolution.ϕij denotes the angles of various Z-rotations and χi denotes the phases of microwave single-qubit gates.(B) Circuit compilation procedure for pushing Z-rotations through an iSWAP gate.Combined with (A), Z-rotations in circuits containing only iSWAP and single-qubit gates become entirely virtual.(C) Schematic illustration of the waveforms used to implement Z-rotations before each √ iSWAP gate.Compared to the waveforms in Fig.S7A, an additional control flux pulse is used to detune the qubits from their idle frequencies and thereby achieve a "physical" Z-gate.
FIG. S9.Fixed-Unitary XEB and cross-talk correction.(A, B) Integrated histograms (ECDF) of Pauli errors for both the iSWAP (A) and the √ iSWAP (B) gates.Three sets of values are shown for each case: the isolated (blue) curves represent the errors obtained by operating each qubit pair individually with all other qubits at ground state.The simultaneous, uncorrected (red) curves represent the errors obtained by operating all qubits at the same time, without any additional correction.The simultaneous, corrected (green) curves represent the simultaneous error rates after additional Z-rotations are included in the circuits to offset unitary shifts induced by cross-talk effects.Simultaneous error rates are obtained from four different configurations of parallel two-qubit operations, similar to Ref. [33].Dashed lines indicate the median locations of various errors.All error rates are obtained from XEB with pure iSWAP or √ iSWAP gate used in simulation, and include contributions from two single-qubit gates and one two-qubit gate.(C) An OTOC experimental configuration for evaluating the effects of cross-talk correction.The empty circle represents the ancilla qubit and the black filled circle represents the measurement qubit.All other qubits are represented by purple spheres.(D) The OTOC normalization value σy as a function of number of cycles in a quantum circuit Û for the configuration shown in (C), measured both with and without applying the cross-talk corrections.Û is composed of iSWAP and random single-qubit π/2 rotations around axes on the XY plane.
FIG. S10.Dynamical coupling in OTOC experiments.(A)Test experimental circuit for benchmarking the intrinsic coherence of the ancilla qubit.Compared to an actual OTOC circuit, the CZ gates between the ancilla qubit and measurement qubit is removed.(B) Same circuit as (A) with the additional of consecutive spin echo pulses to the ancilla qubit during the quantum circuits Û and Û † , in the form of random X or Y gates.(C) The projection of the ancilla qubit to the y-axis, σy , at the end of the circuit, measured with (red) and without (blue) spin echo.The bottom x-axis of the plot shows the number of cycles in Û ( Û † has the same number of cycles), and the top x-axis shows the corresponding total circuit duration tc.
FIG. S11.Unbiased measurements of σy .(A) Left panel shows the test circuits for characterizing σy arising from asymmetry in readout fidelities.Two different schemes are implemented.The conversion between the excited-state population(s) P ↑ and σy is described in the main text.Right panel shows the experimental values of σy as a function of the variable phase ϕv, obtained with both readout schemes.Dashed lines show fits to σy = (1 − 2Fr) cos (ϕv) + dr, where Fr and dr are fitting parameters.(B) Schematic of the full OTOC circuit showing two different state preparation schemes: in the unbalanced scheme, only one phase is used for the first R π/2 gate on the ancilla qubit.In the balanced scheme, two different phases are used.Balanced readout is used in both schemes.(C) Results of a 12-qubit OTOC experiment.Upper panels show σy at different number of circuit cycles.The black squares represent the normalization case and the colored spheres represent σy obtained with the butterfly operator (Z) successively applied to qubits 2 (Q2) through 12 (Q12).The location of the butterfly operator for each curve is indicated by the color legend on top.The normalized OTOCs, C, are shown in the lower panels.The data are the average values of 40 different random circuit instances.

FIG
FIG. S12.Re-compiling OTOC circuits with light-cone filter.(A) Schematic of an OTOC measurement circuit, including the component gates of the quantum circuits Û and Û † .The ancilla qubit and its related gates, as well as the √ Y gates used in the state preparation of all qubits, are omitted for simplicity.Gates shown with semi-transparent colors can be removed from the OTOC measurement circuit without altering its output.(B) Left panel shows a configuration for evaluating experimental effects of the light-cone filter.The unfilled circle represents the ancilla qubit and the black filled circle represents the measurement qubit.The purple filled circle indicates the butterfly qubit.The total number of circuit cycles is 11.Right panel shows the measured values of σy for 100 random circuit instances.Data obtained in the normalization case are shown on top and data obtained with the butterfly operator applied are shown at the bottom.The same number of repetitions (1 million) is used in all cases to estimate σy .(C) Normalized experimental OTOC values C for different circuit instances, plotted alongside exact numerical simulation results.(D) Experimental errors for different circuit instances, corresponding to the differences between experimental and simulated values.

FIG. S13 .
FIG. S13.Normalization via reference Clifford circuits.(A) Schematic of two quantum circuits: Û is the actual quantum circuit of interest, composed mostly of Clifford gates and a few non-Clifford gates ( √ ±V and √ ±W ).Ûref is a reference circuit with the same Clifford gates as Û .The non-Clifford gates in Û are replaced with random Clifford gates √ ±X and √ ±Y in Ûref .(B) Left panel shows an experimental configuration for comparing two normalization procedures.The black unfilled (filled) circle represents the ancilla (measurement) qubit.The purple filled circle represents the butterfly qubit.The total number of circuit cycles is 14.Right panel shows the measured values of σy for 100 random circuit instances.σy 0 denotes values obtained without applying butterfly operator to Û , σy 1 denotes values obtained with butterfly operator applied to Ûref , and σy 2 denotes values obtained with butterfly operator applied to Û .The same number of repetitions (4 millions) is used in all cases to estimate σy .(C) Normalized experimental OTOC values C for different circuit instances, plotted alongside exact numerical simulation results.(D) Experimental errors for different circuit instances, corresponding to the differences between experimental and simulated values.
butterfly operator but a different quantum circuit Ûref and its inverse Û † ref (denoted by σy 1 ).The reference circuit Ûref has the same Clifford gates as Û , whereas the non-Clifford gates in Û are replaced with Clifford gates chosen randomly from √ ±X and √ ±Y .Example data showing σy 1 , σy 2 and σy 3 are shown in Fig.
FIG. S14.(Top)Runtime (in seconds) to explore a given number of branches (results are for nodes with 2 six-core Intel Xeon X5670@2.93GHz).(Bottom) Number n b of explored branches by varying the number ND of non-Clifford rotations (boxes extend from the lower to upper quartile values of the data, with a line at the median, while whiskers correspond to the 5% − 95% confidence interval).The inset shows the projected runtime on Summit.
FIG. S15.Number np of bitstrings in |ψ0 and |ψ1 which are different from zero after applying the butterfly operator Ĉ = Û † σ(Q b ) Û to the initial state |+ , by varying the number ND of non-Clifford rotations.The shaded area correspond to the peak of the amount of virtual memory (RAM) used to simulate the OTOC circuits (95% among different simulations).

n 2 2 n 4 ,
which has been confirmed numerically in Fig. S14 (top).Fig S14 (bottom) shows the runtime (in seconds) FIG. S16.(Top) Comparison between exact and approximateOTOC values (circuits are ordered accordingly to the exact OTOC value over different circuits with different depths, layouts and numbers of non-Clifford rotations).The Pearson coefficient between exact and approximate OTOC values is R = 0.99987.(Bottom) Absolute error by varying the number ND of non-Clifford rotations (boxes extend from the lower to upper quartile values of the data, with a line at the median, while whiskers correspond to the 5% − 95% confidence interval).
FIG. S18.(Top)Runtime (in seconds) to fully contract a single projection by varying the required number of FLOPs (results are for single nodes with 2 six-core Intel Xeon X5670@2.93GHz).Because long double are used throughout the simulation, we used the conversion factor 8 to convert FLOPs to actual time.(Bottom) Flops to contract a single projection by varying the number NS of iSWAPs, with and without the slicing overhead.Summit's days are obtained by using Summit's R peak[65] and the conversion factor 8 from FLOPs to actual time (because long double's have been used throughout the simulation).Green stars correspond to expected number of FLOPs to compute single amplitudes of random quantum circuits (RQC) with a similar number of iSWAPs[33,58].
FIG.S19.Extended data for Fig.2in the main text: Average OTOC for four different locations of butterfly operator, X. Solid lines correspond to experimental data and dashed lines to the population dynamics simulation with Eq. (S27).
FIG. S20.Average fidelity of OTOC, i.e. the circuit with no butterfly applied shown as red line.OTOC fidelity from Monte Carlo simulation of noisy population dynamics Sec.V.The Pauli error rate rp = 0.013 is determined by fitting the numerics to the experimental data.This value is within 10% of the gate fidelity determined in a separate experiment benchmarking two-qubit gates, see Sec.II.This is the single parameter of the model which is used to reproduce the rest of the OTOC data for 51 different locations of butterfly operator on the chip.Black dashed line shows a naive estimate of circuit fidelity via product of fidelities of gates within the light cone.Note that naive fidelity decays much faster with time than OTOC fidelity.
FIG.S22.Extended data for Fig.2of the main text: Normalized OTOC for four different locations of butterfly operator Z. Solid lines show experimental data, and dashed lines correspond to the noisy population dynamics.
FIG. S23.Comparison of noisy OTOC obtained by normal-ization procedure, dashed lines, to noise-free population dynamics for the same circuit, solid lines.

| σ z 1 σ α 1 σ z 1 σ β 1 |+ 1 .
expanded into Pauli strings B i , O(t) = w α1...αn B α1...αn , B α1...αn = α1 ⊗ ... ⊗ αn For the initial state used in the experimental protocol |ψ = |+ i , the value of OTOC is expanded as, C = w αα2...αn w βα2...αn + 1 (S37) The real part of the OTOC corresponds to α = β, ReC = w 2 α1...αn κ α1 , (S38) FIG. S24.Simulation of incoherent and coherent OTOC errors.(A) OTOCs, C, of 47 quantum circuit instances simulated in three different ways: with φ error, where a conditional-phase of 0.136 rad is added to every two-qubit gate; with rp error, where a Pauli error of 0.015 is added to each two-qubit gate; ideal, where no error is introduced.The qubit configuration used here is a 1D chain of 10 qubits wherein Q1 and Q b reside at opposite ends of the chain.The two-qubit gate is iSWAP and ND = 12 non-Clifford gates are used for each instance.The number of circuit cycles is 34 (NS = 250).(B) The scaling of OTOC error (i.e.RMS deviation from the ideal OTOC values) against φ (red) and rp (blue).Dashed line shows the size of the OTOC signal (i.e.RMS value of the ideal OTOCs).

2 N 1 FIG
FIG. S25.Relative error of the 0'th and 1'st order error approximation (from Eq. (S40)) calculated by a pure state simulation, comparing to the exact value from a density matrix calculation, as a function of depolarizing probability p (Eq.(S39)).Here we compute Czx, varying the number of iSWAP (Ns) and non-Clifford gates (ND), with 11 qubits on a chain.The dots are the median over 48 circuits with lines of best fit given by the legend.The shaded region is the middle 50% of the data.The scaling of the error is close to ∝ p 2 as expected for the one-error approximation in all three cases.

15 ( 1 − p) n2− 1 (
m2 , where σ (i) m is the m'th Pauli operator (m ∈ {0, x, y, z}) applied on qubit i.We will also call C[σ (i,j) m1,m2 (d)] the OTOC value with the additional 'error gate' σ (i,j) m1,m2 inserted at layer d in the circuit.Then, to accuracy O(p 2 ) one hasC p = (1 − p) n2 C ideal + p m1,m2) =(0,0), i,j ,dC[σ (i,j) m1,m2 (d)], (S40) D. Population Dynamics for iSWAP Gate Sets Implemented in the Main TextCircuits with θ = π/2 were used to realize Clifford as well as a universal ensemble.It is therefore instructive to study dynamics of the specific gate sets used in the experiment in more detail.Consider conjugation of a pair of Pauli operators αi βj by the iSWAP gate Ŝ.It maps the pair onto another pair of Pauli operators γi δj according to the following rules, αi βj Xi 1 1 j Ŷi 1 1 j Ẑi 1 1 j Ẑi Xj Ẑi Ŷj αi αj Xi ŶjS † αi βj S − Ẑi Ŷj Ẑi Xj 1 1 i Ẑj − Ŷi 1 1 j Xi 1 1 j αi αj Ŷi Xj 1. Clifford Gate SetClifford gate set used to obtain the data in the main text is drawn form the single qubit gate set Average fidelity of OTOC, i.e. the circuit with no butterfly applied shown as red line.OTOC fidelity from Monte Carlo simulation of noisy population dynamics Sec.V.The Pauli error rate rp = 0.013 is determined by fitting the numerics to the experimental data.This value is within 10% of the gate fidelity determined in a separate experiment benchmarking two-qubit gates, see Sec.II.This is the single parameter of the model which is used to reproduce the rest of the OTOC data for 51 different locations of butterfly operator on the chip.Black dashed line shows a naive estimate of circuit fidelity via product of fidelities of gates within the light cone.Note that naive fidelity decays much faster with time than OTOC fidelity.