qTorch: The quantum tensor contraction handler

Classical simulation of quantum computation is necessary for studying the numerical behavior of quantum algorithms, as there does not yet exist a large, viable quantum computer on which to perform numerical tests. Tensor network (TN) contraction is an algorithmic method that can efficiently simulate some quantum circuits, often greatly reducing the computational cost compared with methods that simulate the full Hilbert space. In this study we implement a tensor network contraction program for simulating quantum circuits using multi-core compute nodes. We show simulation results for the Max-Cut problem on 3- through 7-regular graphs using the quantum approximate optimization algorithm (QAOA), successfully simulating up to 100 qubits. We test two different methods for generating the ordering of tensor index contractions: one is based on the tree decomposition of the line graph, while the other generates the ordering using a straightforward stochastic scheme. Through studying instances of QAOA circuits, we show the expected result that as the treewidth of the quantum circuit's line graph decreases, TN contraction becomes significantly more efficient than simulating the whole Hilbert space. The results in this work suggest that tensor contraction methods are superior only when simulating Max-Cut/QAOA with graphs of regularity approximately five and below. Locating this crossover point of equal computational cost helps one determine which simulation method will be more efficient for a given quantum circuit. The stochastic contraction method outperforms the line graph based method only when the time to calculate a reasonable tree decomposition is prohibitively expensive. Finally, we release our software package, qTorch (Quantum TensOR Contraction Handler), intended for general quantum circuit simulation. For a nontrivial subset of quantum circuits, 50 to 100 qubits can easily be simulated on a single compute node.


Introduction
Experimental hardware for quantum computing has been steadily improving in the past twenty years, indicating that a useful quantum computer that outperforms a classical computer may eventually be built. However, until a large-scale and viable quantum computer has been built, numerically simulating quantum circuits on a classical computer will be necessary for predicting the behavior of quantum computers.
Such simulations can play an important role in the development of quantum computing by (1) numerically verifying the correctness and characterizing the performance of quantum algorithms [WS14, SSAG16, SSMAG16, TJD09, TJD11, Mis10], (2) simulating errors and decoherence due to the interaction between the quantum computer and its environment [SSMAG16, GM13, VMH04, SMKE08, GZ13, TS14], and (3) deepening our understanding of the boundary between classical and quantum computing in terms of computational power; recent efforts to characterize the advantage of quantum computers over classical computers [FH16, BIS+16] are an example of this direction.
For our purposes, it suffices to consider the problem of quantum circuit simulation as one where we are given a quantum circuit and an initial state, with the goal of determining the probability of a given output state. Various approaches have been proposed for such simulation tasks. The most general method is to represent the state vector of an $N$-qubit state by a complex unit vector of dimension $2^N$ and to apply the quantum gates by performing matrix-vector multiplications. This is essentially the approach adopted in, for example, [WS14, TJD09, SSAG16, PESV14]. Such a method has the advantage that full information about the quantum computer's state is available at any point during the circuit propagation. However, the exponential cost of storing and updating the state vector makes it prohibitive for simulating circuits of even moderate size (e.g., 40 qubits for qHiPSTER [SSAG16] and 45 qubits by Häner and Steiger [HS17]). On the other hand, for a wide class of circuits with restricted gate sets and input states [Got98, AG04, Val01, TD01, BG16], efficient classical simulation algorithms are available. For example, the numerical package Quipu [GM13] was developed to take advantage of prior results on the stabilizer formalism [Got98, AG04, BG16] to speed up general quantum circuit simulation. Finally, path integral-based methods [RG06] have also been proposed; though they do not improve the simulation cost, they reduce memory storage requirements.
Other than considering the gate sets involved, an alternative perspective is to view a quantum circuit through its geometry or topology. This perspective was initiated by the work of Markov and Shi [MS05] on simulating a quantum circuit via tensor network contractions. An advantage of viewing quantum circuits as tensor networks is that one can ignore the particular kinds of quantum gates used in a circuit and instead focus only on its graph theoretic properties. While general quantum circuits built from universal sets of elementary gates are believed to be hard to simulate on a classical computer [NC00], this geometric perspective often allows for the efficient simulation of a quantum circuit with a universal gate set, provided that the circuit satisfies certain graph theoretic properties. At least one software implementation of this tensor network simulation method exists [McC16]; it was brought to our attention during the preparation of this manuscript.
Among such graph theoretic parameters, treewidth is an important one that determines the efficiency of contracting a tensor network of quantum gates. Treewidth, which is intensely studied in the graph theory literature [RS91, BK10, BK11, GD04], provides important structural information about a quantum circuit. Namely, if the circuit's underlying tensor network has treewidth $T$, it is shown in [MS05] that the cost of simulating the circuit grows as $\exp(O(T))$, times a polynomial in the number of gates. In [BIS+16], treewidth is also used for estimating the classical resources needed to simulate certain quantum circuits.
Motivated by the importance of tensor networks in quantum circuit simulation in general (and in quantum computational supremacy tests in particular), it is useful to have a circuit simulation platform dedicated to tensor network contractions. One immediate challenge in contracting tensor networks is to find an efficient contraction ordering, which relies on finding a reasonable tree decomposition of the underlying graph (definitions are given in Section 2). However, finding the optimal contraction ordering (equivalently, finding a minimum-width tree decomposition, i.e., determining the treewidth of a graph) is NP-complete [ACP87]; therefore one must typically resort to heuristic methods to find this decomposition.
For this study, we implemented a set of tensor network (TN) contraction schemes to characterize their performance and simulate quantum circuits. We report two of these schemes here; the others performed worse than these two. However, there are likely other heuristic schemes that outperform our stochastic algorithm, and this is an avenue worth pursuing. By comparing with simulations using the LIQUi|> software package [WS14], we show that for a large set of quantum circuits our tensor network based methods are less costly than simulation of the full Hilbert space. We emphasize that the tests in this report give timing data for finding the expectation value of a measurement performed after implementing a quantum circuit, not for completely characterizing the circuit's final state.
The remainder of the paper is organized as follows. Section 2 sets up the definitions and notation used in the paper. Section 3 describes the heuristic methods used for contracting the quantum circuit tensor networks. Section 4 presents the example quantum circuits used as benchmarks for demonstrating the performance of our contraction algorithms. Section 5 gives results of comparisons between the qTorch contraction methods, and between qTorch simulations and LIQUi|>'s Hilbert space simulations. Lastly, we provide a detailed software guide for installing and using qTorch in Appendix C.

Preliminaries
In this section, we give some definitions. We use standard graph theory terminology. All graphs considered in this paper are undirected. We denote by $G(V, E)$ a graph consisting of the set of nodes $V$ and the set of edges $E$. The notions of treewidth and tree decomposition were introduced by Robertson and Seymour [RS91] as follows.
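Definition 1 (Tree decomposition) A tree decomposition of a graph $G(V, E)$ is a pair $(S, T(I, F))$, where $S = \{X_i \mid i \in I\}$ is a collection of subsets $X_i \subseteq V$ and $T$ is a tree (with node set $I$ and edge set $F$), such that (i) $\bigcup_{i \in I} X_i = V$; (ii) for every edge $(u, v) \in E$, there exists an $i \in I$ with $u, v \in X_i$; and (iii) for every $v \in V$, the nodes $\{i \in I \mid v \in X_i\}$ induce a connected subtree of $T$. The width of a tree decomposition $(S, T)$ is $\max_{i \in I} |X_i| - 1$. The treewidth of a graph $G$ is the minimum width among all tree decompositions of $G$.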
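The three conditions of Definition 1 can be checked mechanically. The following Python sketch is a hypothetical helper (not part of qTorch) that validates a candidate decomposition and reports its width; it assumes the bags are given as a dict of sets and that the listed tree edges already form a tree.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the three conditions of Definition 1 (illustrative helper).

    vertices:   iterable of graph nodes V
    edges:      iterable of (u, v) pairs, the edge set E
    bags:       dict mapping each tree node i to its set X_i
    tree_edges: iterable of (i, j) pairs, the edge set F (assumed to form a tree)
    """
    vertices = set(vertices)
    # (i) every vertex appears in at least one bag
    if not vertices <= set().union(*bags.values()):
        return False
    # (ii) both endpoints of every edge share a bag
    for u, v in edges:
        if not any(u in X and v in X for X in bags.values()):
            return False
    # (iii) the bags containing each vertex form a connected subtree
    adj = {i: set() for i in bags}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in vertices:
        holding = {i for i, X in bags.items() if v in X}
        start = next(iter(holding))
        seen, stack = {start}, [start]
        while stack:
            for j in adj[stack.pop()]:
                if j in holding and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != holding:
            return False
    return True


def width(bags):
    """Width of a tree decomposition: largest bag size minus one."""
    return max(len(X) for X in bags.values()) - 1


# Usage: the 4-cycle a-b-c-d-a has treewidth 2.
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
print(is_tree_decomposition("abcd", cycle, bags, [(0, 1)]), width(bags))  # True 2
```

Splitting the same cycle into bags {a, b} and {c, d} instead would fail condition (ii), since no bag contains both endpoints of the edge b-c.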
Definition 2 (Tensor) In this context, a tensor is defined as a data structure with a rank $k$ and dimension $m$. More specifically, each tensor is a multidimensional array with $m^k$ complex numbers. A tensor $A_{i_1, i_2, i_3, \ldots, i_k}$ has $k$ indices, which take values from $0$ to $m - 1$.
For our purposes, we use tensors of dimension four because we store density matrices rather than pure states, and a single-qubit density matrix has four entries. Markov and Shi [MS05] provide additional details for preparing tensors from quantum gates, initial states, and measurement operators.
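As an illustration of why each index has dimension four, the sketch below (Python with NumPy) flattens a single-qubit density matrix into a length-4 vector, so that a gate $U$ acting as $\rho \mapsto U \rho U^\dagger$ becomes a $4 \times 4$ matrix, i.e., a rank-2 tensor of dimension 4. The row-major vectorization convention and the helper name `superoperator` are assumptions made for this example; qTorch's internal tensor layout may differ.

```python
import numpy as np


def superoperator(U):
    """4x4 matrix of rho -> U rho U^dagger acting on the row-major vec(rho).

    Uses vec(A X B) = (A kron B^T) vec(X) for row-major flattening, with B = U^dagger.
    """
    return np.kron(U, U.conj())


H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard gate
rho = np.array([[1, 0], [0, 0]], dtype=complex)              # |0><0|
S = superoperator(H)                  # rank-2 tensor, both indices of dimension 4
rho_out = (S @ rho.reshape(4)).reshape(2, 2)
print(np.allclose(rho_out, H @ rho @ H.conj().T))  # True
```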
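Definition 3 (Tensor Contraction) A tensor contraction is a generalized tensor-tensor multiplication. Here, $A$, a rank $x + y$ dimension $m$ tensor, and $B$, a rank $y + z$ dimension $m$ tensor, are contracted over their $y$ shared indices into $C$, a rank $x + z$ dimension $m$ tensor:
$$C_{i_1 \ldots i_x, k_1 \ldots k_z} = \sum_{j_1, \ldots, j_y = 0}^{m-1} A_{i_1 \ldots i_x, j_1 \ldots j_y} \, B_{j_1 \ldots j_y, k_1 \ldots k_z}.$$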
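Definition 3 maps directly onto `numpy.einsum`. In the sketch below (random tensors, dimension $m = 4$, with $x = 2$, $y = 1$, $z = 2$ chosen purely for illustration), contracting the single shared index turns two rank-3 tensors into a rank-4 tensor. The same result is obtained as an ordinary matrix product after reshaping, which is one common way to implement such contractions; the cost is $m^{x+y+z}$ multiply-adds, which is why large intermediate ranks are expensive.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4  # index dimension (four, as for density-matrix tensors)

# A has rank x + y = 2 + 1 = 3; B has rank y + z = 1 + 2 = 3.
A = rng.standard_normal((m, m, m)) + 1j * rng.standard_normal((m, m, m))
B = rng.standard_normal((m, m, m)) + 1j * rng.standard_normal((m, m, m))

# Contract the shared index j: C has rank x + z = 4.
C = np.einsum("abj,jcd->abcd", A, B)
print(C.shape)  # (4, 4, 4, 4)

# The same contraction as a matrix product: rows (a, b), shared j, columns (c, d).
C_mat = A.reshape(m * m, m) @ B.reshape(m, m * m)
print(np.allclose(C, C_mat.reshape(m, m, m, m)))  # True
```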
Definition 4 (Tensor Network) A tensor network is a graph $G = (V, E)$ with tensors as vertices and edges representing the indices of the tensors. The rank of each tensor is given by the number of edges connected to it. An edge from one tensor to another indicates a contraction between the two tensors, and multiple edges between the same pair of tensors indicate a contraction over multiple indices.
Here, we provide a visual example of a tensor network graph:
[Figure: example tensor network graph; the tensors are labeled with ranks 3, 3, 5, 3, and 2.]
Definition 5 (Contraction Scheme) A contraction scheme determines the order in which the tensor network is contracted. The ordering chosen for the contraction will greatly affect the computation and memory requirements. This is because some contraction orderings can result in much larger intermediate tensors than other orderings.
It is important to avoid large intermediate tensor ranks when contracting the network, as the number of floating point operations grows exponentially with tensor rank. However, increasing the tensor rank is often unavoidable. For example, a tetrahedron-shaped graph of rank-3 tensors cannot be contracted without the rank of an intermediate tensor rising above 3: any two vertices of a tetrahedron share exactly one edge, so the first contraction always produces a tensor of rank $3 + 3 - 2 = 4$.
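The tetrahedron claim can be verified by brute force for small networks. The sketch below is a hypothetical helper written for illustration: it represents each tensor by the set of edge labels attached to it, treats a pairwise contraction as the symmetric difference of those sets (shared indices are summed out), and searches every contraction sequence for the smallest achievable peak rank. The exhaustive search is only feasible for a handful of tensors.

```python
from functools import lru_cache
from itertools import combinations


def min_peak_rank(nodes):
    """Smallest achievable maximum rank of any intermediate tensor.

    nodes: list of sets of edge labels, one set per tensor (closed network).
    Contracting two tensors leaves the symmetric difference of their label
    sets, because indices shared by both are summed out.
    """
    def canon(tensors):
        # Canonical ordering so that equivalent states share a cache entry.
        return tuple(sorted(tensors, key=sorted))

    @lru_cache(maxsize=None)
    def search(state):
        if len(state) == 1:
            return 0
        best = float("inf")
        for a, b in combinations(range(len(state)), 2):
            merged = state[a] ^ state[b]
            rest = [s for k, s in enumerate(state) if k not in (a, b)]
            peak = max(len(merged), search(canon(rest + [merged])))
            best = min(best, peak)
        return best

    return search(canon(frozenset(n) for n in nodes))


tetra = [{"ab", "ac", "ad"}, {"ab", "bc", "bd"}, {"ac", "bc", "cd"}, {"ad", "bd", "cd"}]
ring = [{"ab", "da"}, {"ab", "bc"}, {"bc", "cd"}, {"cd", "da"}]
print(min_peak_rank(tetra), min_peak_rank(ring))  # 4 2
```

For comparison, a ring of rank-2 tensors can be contracted without any intermediate exceeding rank 2, whereas every ordering of the tetrahedron reaches rank 4.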
To analyze the complexity of contracting a tensor network, we first create the line graph $L(G)$ of the network and then use QuickBB [GD04] to analyze its properties.

Definition 6 (Line graph)
There is a unique line graph $L(G)$ for every graph $G$, and $L(G)$ is itself an undirected graph. Each edge in $G$ corresponds to a node in $L(G)$. Two nodes in $L(G)$ are connected if and only if their corresponding edges in $G$ share an endpoint.
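Definition 6 translates into a few lines of code. The sketch below is an illustrative helper (its name and input format are assumptions, not qTorch's API). It accepts a multigraph as a list of edges, which matters for tensor networks where two tensors may share several indices: parallel edges become adjacent nodes of $L(G)$ because they share both endpoints.

```python
from itertools import combinations


def line_graph(edges):
    """Line graph of a (multi)graph given as a list of (u, v) edges.

    Edge k of G becomes node k of L(G); nodes k and l are adjacent
    when edges k and l share an endpoint in G.
    """
    incident = {}  # vertex of G -> indices of the edges touching it
    for k, (u, v) in enumerate(edges):
        for w in {u, v}:
            incident.setdefault(w, []).append(k)
    adjacency = {k: set() for k in range(len(edges))}
    for ks in incident.values():
        for k, l in combinations(ks, 2):
            adjacency[k].add(l)
            adjacency[l].add(k)
    return adjacency


# The tetrahedron (complete graph K4) has the octahedron as its line graph.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(sorted(len(nbrs) for nbrs in line_graph(k4_edges).values()))  # [4, 4, 4, 4, 4, 4]
```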
There exists an optimal tree decomposition of L(G) that provides the optimal contraction ordering of G.

Contraction schemes and implementation details
For many problems in quantum physics to which matrix product states (MPS) or other tensor network methods have been applied, an efficient contraction scheme is often obvious from the underlying structure of the Hamiltonian [Orú14]. However, efficient contraction schemes are not available for arbitrary tensor networks. A general contraction scheme is important for the simulation of general quantum circuits, when one does not know a priori the topological properties of the underlying tensor network problem. Instead, for general circuits one must develop heuristics that define the contraction ordering.

Contraction schemes
qTorch implements two algorithms for determining the contraction ordering. For what we call the line graph (LG) method, we first create the line graph of the quantum circuit's graph. Then, the software package QuickBB [GD04] is used to determine an approximately optimal tree decomposition of this line graph. QuickBB is a so-called anytime algorithm: it can be run for an arbitrary amount of time, and when it is stopped it returns the best solution found so far. The resulting tree decomposition is used to define the order of contraction. This line graph based approach was previously described by Markov and Shi [MS05].
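QuickBB itself is a branch-and-bound solver and is not reproduced here. As a much simpler stand-in, the sketch below uses the greedy min-degree elimination heuristic (a different, well-known technique, swapped in only for illustration): it repeatedly eliminates the node of smallest degree from the line graph and connects its neighbors into a clique. The resulting elimination order is an ordering of the circuit's wires that can be used as a contraction order, and the largest neighborhood encountered gives an upper bound on the treewidth. The function name and input format are hypothetical.

```python
def min_degree_ordering(adjacency):
    """Greedy min-degree elimination ordering (stand-in for QuickBB).

    adjacency: dict mapping each node (e.g., a wire of the circuit, i.e., a node
    of the line graph) to the set of its neighbors.
    Returns (elimination order, width); the width upper-bounds the treewidth.
    """
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order, width = [], 0
    while graph:
        # Eliminate the node of smallest degree (ties broken by node label).
        v = min(graph, key=lambda u: (len(graph[u]), u))
        nbrs = graph.pop(v)
        width = max(width, len(nbrs))
        # Eliminating v turns its neighborhood into a clique (fill-in edges).
        for a in nbrs:
            graph[a].discard(v)
            graph[a] |= nbrs - {a}
        order.append(v)
    return order, width


cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(min_degree_ordering(cycle5))  # ([0, 1, 2, 3, 4], 2)
```

Unlike an anytime solver, this heuristic returns a single answer immediately and cannot improve with more run time, which is exactly the trade-off that QuickBB's time limit controls.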
The second contraction scheme is stochastic. First, a wire (an edge of the tensor network) is chosen at random; it connects two nodes. If the rank of the tensor that would result from contracting these two nodes is higher than the larger of the two nodes' ranks plus a given threshold, the contraction is rejected. After a fixed number of rejected contraction attempts, the threshold is relaxed. This process repeats until the network is fully contracted.
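A minimal sketch of this stochastic scheme in Python follows. It rests on several assumptions not fixed by the text: the network is closed (every index is a wire between two tensors), contracting a wire merges its two tensors and sums out every wire they share, rejections are counted consecutively and the counter resets after each relaxation or accepted contraction, and the relaxed threshold is kept for the rest of the run. The function name and default parameters are hypothetical.

```python
import random


def stochastic_contraction(wires, threshold=0, max_rejections=10, seed=None):
    """Stochastic contraction ordering (illustrative sketch).

    wires: list of (u, v) pairs, one per index shared by tensors u and v.
    Returns (list of contracted node pairs, peak tensor rank encountered).
    """
    rng = random.Random(seed)
    wires = [tuple(w) for w in wires]
    rank = {}
    for u, v in wires:
        rank[u] = rank.get(u, 0) + 1
        rank[v] = rank.get(v, 0) + 1
    sequence, peak, rejections = [], max(rank.values()), 0
    while wires:
        u, v = rng.choice(wires)  # pick a random wire
        shared = sum(1 for w in wires if set(w) == {u, v})
        new_rank = rank[u] + rank[v] - 2 * shared
        if new_rank > max(rank[u], rank[v]) + threshold:
            rejections += 1
            if rejections >= max_rejections:
                threshold += 1  # relax the threshold
                rejections = 0
            continue
        # Accept: drop the shared wires and relabel v's remaining wires to u.
        wires = [w for w in wires if set(w) != {u, v}]
        wires = [tuple(u if x == v else x for x in w) for w in wires]
        rank[u] = new_rank
        del rank[v]
        sequence.append((u, v))
        peak = max(peak, new_rank)
        rejections = 0
    return sequence, peak


tetra = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
sequence, peak = stochastic_contraction(tetra, seed=0)
print(len(sequence), peak)  # 3 4
```

On the tetrahedron, every first contraction is rejected until the threshold has been relaxed to 1, after which the run completes with a peak rank of 4, matching the bound discussed above.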
Other stochastic contraction schemes were attempted for this study, including schemes that would calculate a cost function based on how a given wire's contraction would affect the size of subsequent contractions. None of these outperformed the simple stochastic algorithm; because of this, none of them are included in qTorch. Nonetheless, it would be worth pursuing more sophisticated heuristics for contraction, as there likely exist other stochastic schemes that would outperform this first attempt at a simple stochastic algorithm.

Estimating the answer string
qTorch computes expectation values of the form $\langle x|C|y\rangle$ for input and output states $x$ and $y$ and a quantum circuit $C$. (Alternatively, a measurement string can be determined as well, e.g., in a variational algorithm.) Capturing all the information about the final state of $n$ qubits, however, generally requires $O(2^n)$ repetitions of the algorithm. Fortunately, many quantities of interest can be calculated efficiently. For instance, the expectation value of one operator with respect to one state can be obtained from a single contraction of the tensor network, a capability essential to simulating the variational quantum eigensolver (VQE) [PMS+14, WHT15, YCM+14, MRBAG16].
qTorch provides a heuristic scheme to estimate the answer string of a variational algorithm such as QAOA, which we summarize here. Though this scheme is not used for the results presented in Section 5, it may be useful in the future for simulating algorithms (like QAOA) where the goal is to estimate a most likely bit string.
The scheme is implemented as follows. To begin, we run one simulation in which the first qubit is measured in the logical (computational) basis, projecting it onto 0 or 1. Based on the resulting expectation value, we choose the value of the first qubit that has the greater probability; if the probability is exactly 0.5, we flip a fair coin to choose the value. Then we fix this value as the measurement outcome for the first qubit in the next simulation, and repeat with a projective measurement on the second qubit. We continue this process for the remaining qubits. As we show below, this method often gives a good approximation of the most likely final computational basis state. In earlier tests on 3-regular graphs with 30 vertices, the scheme (applied to quantum approximate optimization algorithm [QAOA] circuits) gave bit strings that provided good estimates of the solution of the Max-Cut problem (an average approximation ratio of 94% compared to the exact brute-force solution).
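The qubit-by-qubit procedure can be illustrated on an explicit state vector. The sketch below (Python with NumPy) computes each qubit's probability of being 0 conditioned on the values already fixed, which is the information the successive projected simulations provide; here it is read off a dense state vector rather than from a tensor network contraction, so it only serves small examples. The function name and the convention that qubit 0 is the most significant bit are assumptions.

```python
import numpy as np


def greedy_bitstring(psi, rng=None):
    """Fix qubits one at a time to their more probable value (sketch).

    psi: normalized state vector of n qubits, qubit 0 = most significant bit.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n = psi.size.bit_length() - 1
    probs = (np.abs(psi) ** 2).reshape([2] * n)
    bits = []
    for _ in range(n):
        # Probabilities of the next qubit, restricted to the values fixed so far.
        marginal = probs[tuple(bits)].reshape(2, -1).sum(axis=1)
        p0 = marginal[0] / marginal.sum()
        if np.isclose(p0, 0.5):
            bit = int(rng.integers(2))  # fair coin on a tie
        else:
            bit = 0 if p0 > 0.5 else 1
        bits.append(bit)
    return bits


psi = np.zeros(8)
psi[0b101] = 1.0
print(greedy_bitstring(psi))  # [1, 0, 1]
```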
As a way to test the general applicability of this scheme, we performed some tests on more general circuits than QAOA. These tests are meant to provide some insight into how useful this heuristic would be for estimating the most likely bit string of a quantum algorithm. We note that this scheme will clearly be very inaccurate in many cases; indeed, if it were generally accurate, we would have no need for a quantum computer.
In the remainder of this section, we consider the most likely bit string of the final state $|\psi\rangle =$ [...] computational basis state, where the number of simulations (i.e., the number of full tensor network contractions) required is linear in the number of qubits in the tensor network. We apply a unitary of the form [...] where the matrix $D_j$ is a diagonal matrix with entries chosen randomly from the integers $\{1,$ [...] where $|\alpha_k|^2$ is the probability of $|0\rangle$ that the algorithm obtains at the $k$-th step. Let $\tilde{p} = (\tilde{p}_0, \ldots, \tilde{p}_{2^n - 1})$ denote the probability distribution of the $2^n$ bit strings corresponding to the product state approximation $|\Psi\rangle$. Let $i_{\max}$ be the (index of the) bit string returned by the algorithm. Then clearly $i_{\max} = \arg\max_{i = 0, \ldots, 2^n - 1} \tilde{p}_i$. Let $r$ be the place of $i_{\max}$ in the actual distribution $\{p_i\}$; namely, $r = 1$ if $i_{\max}$ is also the most probable bit string in $\{p_i\}$, $r = 2$ if it is the second most probable, $r = 3$ if it is the third most probable, and so on. We numerically investigate the distribution of $r$ as well as the 1-norm distance $\|\tilde{p} - p\|_1$ between the approximate distribution $\tilde{p}$, which the algorithm effectively assumes, and the actual distribution $p$.
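The two figures of merit, the rank $r$ and the 1-norm distance, are straightforward to compute once the exact distribution $p$ is known. The following sketch (Python with NumPy; the function name is hypothetical) runs the greedy procedure on $p$, records each $|\alpha_k|^2$, builds the product distribution $\tilde{p}$ with `np.kron`, and returns $i_{\max}$, $r$, and $\|\tilde{p} - p\|_1$. For simplicity it breaks ties towards 0 rather than flipping a coin, and it does not reproduce the random unitaries used for Figures 1 and 2.

```python
import numpy as np


def product_approximation(p):
    """Greedy estimate, its rank r, and the 1-norm distance (sketch).

    p: exact probabilities of the 2**n bit strings, qubit 0 = most significant bit.
    Ties are broken towards 0 instead of by a fair coin.
    Returns (i_max, r, ||p_tilde - p||_1).
    """
    n = p.size.bit_length() - 1
    probs = p.reshape([2] * n)
    bits, alpha_sq = [], []
    for _ in range(n):
        marginal = probs[tuple(bits)].reshape(2, -1).sum(axis=1)
        a2 = marginal[0] / marginal.sum()  # |alpha_k|^2: probability of |0> at step k
        alpha_sq.append(a2)
        bits.append(0 if a2 >= 0.5 else 1)
    # Product-state distribution p_tilde over all 2**n strings (qubit 0 first).
    p_tilde = np.ones(1)
    for a2 in alpha_sq:
        p_tilde = np.kron(p_tilde, [a2, 1 - a2])
    i_max = int("".join(map(str, bits)), 2)
    r = 1 + int(np.sum(p > p[i_max]))  # r = 1 means i_max is the most probable string
    return i_max, r, float(np.abs(p_tilde - p).sum())


# The most probable string is 00, but the greedy scheme returns 10.
i_max, r, dist = product_approximation(np.array([0.4, 0.0, 0.3, 0.3]))
print(i_max, r, round(dist, 6))  # 2 2 0.4
```

This small case shows how the heuristic can miss the true mode: the first qubit's marginal favors 1 even though the single most likely string starts with 0.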
The results are shown in Figures 1 and 2. Here we use the number of qubits $n = 6$, the parameter $m = 10$, and $p = 2$. Figure 1 shows that most of the time our algorithm produces a high-ranking bit string: roughly 90% of the time, the output of the algorithm is among the top 10% most likely bit strings. Figure 2 shows that the 1-norm distance between the approximate and exact distributions is less than 0.1 for nearly all of the data points. These results suggest that our heuristic for estimating an output bit string will produce acceptable results under some circumstances.
Figure 1: [...] (10,000 trials) of how close the estimated most likely computational basis state is to the actual most likely computational basis state. In particular, the horizontal axis, Ranking, is the number of computational basis states in $|\Psi\rangle$ with higher probability than the estimated state. We use the number of qubits $n = 6$, the parameter $m = 10$, and $p = 2$.

Noise models
Inserting a noise model into a quantum circuit would be straightforward, as it ought to be possible to map any noise model onto a set of one- and multi-qubit gates. The most commonly used noise approximations assume uncorrelated noise, which allows single-qubit gates to be used for modeling noise. In this case, because rank-2 tensors can always be contracted without increasing the rank of the resulting tensors, the complexity of simulating the resulting "noisy" quantum circuit would not increase.
A more physically realistic noise model would assume correlated noise, which would necessitate the insertion of two-qubit gates. In this case, the treewidth of the circuit's underlying line graph, and hence the complexity of the problem, would increase in all but the most trivial cases.

Circuit simulations
In this section we describe the quantum circuits that were simulated for this paper.

Quantum approximate optimization algorithm / Max-Cut
Farhi, Goldstone, and Gutmann developed the quantum approximate optimization algorithm (QAOA) [FGG14a], meant to demonstrate a quantum speedup on low-depth quantum circuits. An overview of the algorithm is provided in Appendix A.1. Farhi and Harrow have shown that it ought to be classically hard to sample from the output of QAOA [FH16]. Moreover, even though QAOA was only recently invented, it has been applied to several quantum optimization problems. Wecker et al. developed an optimization algorithm based on QAOA that may have similar applications in quantum computers [WHT16]. Farhi, Goldstone, and Gutmann applied QAOA to Max E3LIN2 of bounded occurrence, showing that a quantum computer would achieve results better than the best known classical algorithm [FGG14b]. Lin and Zhu generalized these results to more constraint satisfaction problems of bounded degree [YZ16]. Most recently, Guerreschi and Smelyanskiy tested a series of classical optimization routines for use with QAOA [GS17].
To generate the graphs for Max-Cut, we wrote a random $k$-regular graph generator that places edges randomly throughout a given vertex set to satisfy a given regularity, checking for connectivity after all the edges have been placed. Disconnected graphs are rejected by the algorithm. Max-Cut/QAOA quantum circuits based on these graphs are then straightforward to construct.
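The generator is not spelled out in further detail, so the sketch below is only one plausible way to implement the described procedure, with all names hypothetical: edges are placed one at a time between randomly chosen pairs of distinct, non-adjacent vertices that still need edges, an attempt that gets stuck is restarted, and a finished graph is kept only if it is connected.

```python
import random


def random_regular_graph(n, k, seed=None, max_tries=1000):
    """Random connected simple k-regular graph on vertices 0..n-1 (sketch)."""
    if (n * k) % 2 or k >= n:
        raise ValueError("need n*k even and k < n")
    rng = random.Random(seed)
    for _ in range(max_tries):
        remaining = {v: k for v in range(n)}  # edges each vertex still needs
        edges = set()
        while True:
            open_vs = [v for v in range(n) if remaining[v] > 0]
            candidates = [(u, v) for i, u in enumerate(open_vs)
                          for v in open_vs[i + 1:] if (u, v) not in edges]
            if not candidates:
                break
            u, v = rng.choice(candidates)
            edges.add((u, v))
            remaining[u] -= 1
            remaining[v] -= 1
        # Keep the graph only if every degree is k and it is connected.
        if all(d == 0 for d in remaining.values()) and is_connected(n, edges):
            return sorted(edges)
    raise RuntimeError("no valid graph found; increase max_tries")


def is_connected(n, edges):
    """Depth-first search from vertex 0."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


edges = random_regular_graph(10, 3, seed=1)
print(len(edges))  # 15
```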
In the numerical results of this paper, we report only the timing for a single contraction of each quantum circuit. A full analysis of QAOA is beyond the scope of this work. However, we note that once the graphs have been created, it is possible to use qTorch to optimize the QAOA angles using an optimization library. Finally, if one chooses, one can use qTorch to estimate a Max-Cut for the randomly generated graph, using the most-likely Z-string estimation method of the previous section.

Hubbard Model
Quantum simulation of fermionic systems is arguably one of the most relevant applications of quantum computers, with direct impact on chemistry and materials science, especially for the design of new drugs and materials. Among all the algorithms proposed for quantum simulation of fermions, the variational quantum eigensolver (VQE) and related approaches [PMS+14, WHT15, YCM+14, MRBAG16] hold greater appeal for near-term quantum devices due to their ability to correct certain types of errors and their lower coherence time requirements [MSCdJ16, OBK+16].
As mentioned previously, in the VQE algorithm a quantum computer is employed to prepare and measure the energy of quantum states associated with a parameterized quantum circuit. The approximate ground state of a Hamiltonian is obtained by variationally minimizing the energy with respect to the circuit parameters using a classical optimization routine. This hybrid quantum-classical approach offers a good compromise between classical and quantum resources. Classical simulations of the VQE algorithm for tens of qubits could provide insights into the complexity of the circuits used for state preparation and help design better ansätze for the quantum simulation of fermions.
As an example of a VQE simulation, we employed our code to classically simulate variational circuits used for the quantum simulation of 1D Hubbard lattices. We consider half-filled Hubbard models on $N$ sites with periodic boundary conditions. The Hamiltonian for these systems is given by
$$H = -t \sum_{\langle i,j \rangle, \sigma} \left( a^\dagger_{i,\sigma} a_{j,\sigma} + a^\dagger_{j,\sigma} a_{i,\sigma} \right) + U \sum_{i} n_{i,\uparrow} n_{i,\downarrow}, \qquad (4)$$
where $a^\dagger_{i,\sigma}$ and $a_{i,\sigma}$ respectively create and annihilate an electron at site $i$ with spin $\sigma$, and $n_{i,\sigma} = a^\dagger_{i,\sigma} a_{i,\sigma}$. The summation in the first term runs over nearest neighbors, denoted $\langle i,j \rangle$. These fermionic Hamiltonians can be mapped to qubit Hamiltonians using an appropriate transformation, such as the Jordan-Wigner or Bravyi-Kitaev transformation [TSS+15]; this mapping requires two qubits per site.
To construct variational circuits for these systems, we used the variational ansatz introduced by Wecker et al. [WHT15]. In this case, the Hamiltonian in Eq. 4 is divided as $H = h_h + h_U$, where $h_h$ is the sum of hopping terms in the horizontal dimension and $h_U$ is the repulsion term (for 2D Hubbard lattices, the Hamiltonian also comprises vertical hopping terms). The variational circuit is constructed as a sequence of unitary rotations by terms in the Hamiltonian with different variational parameters. The sequence is repeated $S$ times. In each step, there are two variational parameters, $\theta^b_U$ and $\theta^b_h$, where $b = 1, \ldots, N$, such that: [...] where $U_X(\theta_X)$ denotes a Trotter approximation to $\exp(i\theta_X h_X)$, with $X \in \{U, h\}$. For our numerical simulations, we employed the variational circuit of Eq. 5 with $S = 1$, using a one-step Trotter formula for all the $U_X(\theta_X)$ terms. Note that this is only approximate for the $h_h$ term, which comprises a sum of non-commuting terms. We also assigned the value 1 to all variational amplitudes. The corresponding unitary was mapped to a quantum circuit using the Jordan-Wigner transformation, and the circuit was generated using a decomposition into CNOT gates and single-qubit rotations [NC00, WBAG11]. The length of the sequence was reduced by

Results
Simulations were performed on NERSC's Cori supercomputer, using one node per simulation; each node contains 68 cores and 96 GB of memory. Each LIQUi|> simulation was run on a full node as well, using Docker [Mer14]. The free version of LIQUi|> allows for the simulation of 24 qubits. Because full Hilbert space simulation scales exponentially regardless of the quantum algorithm's complexity, we would not have been able to simulate more than ∼31 qubits on one of these compute nodes. For each set of parameters (regularity and number of vertices/qubits), 50 instances of the Max-Cut/QAOA circuit were created. For higher qubit counts and higher regularities, only a subset of these circuits were completed, since many simulations ran out of memory. In this section, LG or qTorch-LG refers to qTorch with the line graph based contraction, and Stoch or qTorch-Stoch refers to qTorch with the stochastic contraction. To determine a qTorch-LG contraction ordering, QuickBB was run for an arbitrarily chosen time of 3000 seconds for each quantum circuit. The plotted qTorch results include only the contraction time, not the QuickBB run time.
We note that LIQUi|> implements many important optimizations, which makes it a fair benchmark against which to compare qTorch. For example, LIQUi|> fuses many gates together before acting on the state vector, and uses sparse operations. qTorch, on the other hand, does not yet exploit sparsity at all (even when the circuit consists primarily of sparse CNOT gates); exploiting sparsity is one of several optimizations that we expect will further improve performance.
LIQUi|> is the fastest simulation method for the Hubbard simulations, as shown in Figure 3. This is because the treewidth of the circuit's graph increases substantially with the number of qubits, even for these short-depth circuits. The result is not surprising: if the algorithm were easy to simulate with a tensor network on a classical computer, it would not have been worth proposing as a candidate for a quantum computer. Simulation timing results for 3-, 4-, and 5-regular Max-Cut/QAOA circuits are shown using Tukey boxplots in Figures 4 and 5. Stoch and LG simulation times are of a similar order of magnitude for these circuits, though LG is generally faster. The exception is the 3-regular graph problems, where Stoch apparently finds an efficient contraction faster than QuickBB does. We note that if QuickBB were run for unlimited time before beginning the contraction, qTorch-LG would always contract the circuit faster than qTorch-Stoch. Note also that LIQUi|> begins to outperform the tensor contraction methods once the algorithm is run on 5-regular graphs, because the increased circuit complexity leads to larger intermediate tensors in qTorch.
For 3-regular circuits (Figure 5), the LG method tends to be faster than the Stoch method. Using a single NERSC Cori node, we were able to contract quantum circuits of 90 qubits for a very small subset of the simulated graphs, though not for enough graphs to report statistics. Full Hilbert space methods would be limited to ∼30 qubits on these nodes, and previous simulation packages have not yet surpassed 45 qubits [SSAG16, HS17], even when using thousands of nodes.
Interesting trends appear when the simulation time is plotted against the regularity of the Max-Cut problem's graph (Figure 6). Notably, the LG method runs out of memory before the Stoch method does. As previously mentioned, the LG method contracts more efficiently the longer QuickBB has been run, and we chose 3000 seconds as an arbitrary QuickBB limit for all circuits. In other words, there is a trade-off between running a longer QuickBB simulation and instead immediately using the Stoch method. Even with few qubits, at higher regularities the full Hilbert space simulation (using LIQUi|>) performs better. This is expected, since as the complexity of the quantum circuit increases, higher-rank tensors must be dealt with.
Figure 7 shows simulation time as the treewidth upper bound increases, for Max-Cut/QAOA circuits of 18 qubits, including 3- through 7-regular graphs. This treewidth upper bound is simply the width of the tree decomposition that defines the contraction ordering. The plot clearly demonstrates the expected general trend of increasing simulation time with increasing treewidth, regardless of contraction scheme.
Finally, we note that we were easily able to perform simulations of 100 qubits for less complex graphs. To report one such example, we produced a random 3-regular graph with a slightly different procedure from that given in Section 4.1. Beginning with a 2-regular graph (i.e., a ring) of 100 vertices, we added edges between random pairs of vertices until all vertices had degree 3. Contracting this graph's Max-Cut/QAOA circuit took ∼150 seconds.
Figure 4: Simulation time plotted against number of qubits for Max-Cut/QAOA circuits. LG, Stoch, and LIQUi|> denote line graph based tensor contraction, stochastic tensor contraction, and the LIQUi|> software package, respectively. Tree decompositions for the LG method were determined by running QuickBB for 3000 seconds. For lower regularities, the tensor contraction methods outperform LIQUi|>, since LIQUi|> simulates the full Hilbert space. However, as the regularity of the Max-Cut graphs (and hence the treewidth of the quantum circuits' line graphs) increases, full Hilbert space simulation using LIQUi|> becomes more efficient.
LG and Stoch denote line graph based tensor contraction and stochastic tensor contraction, respectively. For 3-regular Max-Cut/QAOA circuits, we were able to simulate a small subset of the 100-qubit circuits we created (not shown here).

Conclusions
We have implemented a tensor contraction code for the efficient simulation of quantum circuits. We compared a stochastic contraction scheme to one based on the line graph of the quantum circuit's graph, showing that the latter is more efficient in most situations. However, for circuits in which calculating a good approximately optimal tree decomposition of the line graph takes longer than contracting the circuit stochastically, the stochastic scheme is superior. As expected, qTorch becomes substantially faster than LIQUi|> (i.e., a full Hilbert space simulation) as the treewidth of the tensor network's line graph decreases. This is because tensor network contraction can simulate lower-complexity circuits with fewer floating point operations than Hilbert space methods can.
Several immediate algorithmic improvements are possible for this software. The use of sparse tensors would reduce the number of floating point operations for a large subset of relevant circuits. Another possible strategy is to perform tensor contraction on some parts of the circuit but to use full Hilbert space or stabilizer formalism methods for other parts of the network, in essence applying each algorithm to the pieces of the quantum circuit on which it performs best. However, determining how to divide the circuit among different algorithms is unlikely to be a trivial task. Finally, more advanced parallelization methods would allow for faster calculation of a tree decomposition as well as faster contractions. This software may be integrated into larger packages such as qHiPSTER [SSAG16], LIQUi|> [WS14], ProjectQ [SHT16], or others, allowing for the simulation of a wider range of quantum circuits.

A Detailed Descriptions of Algorithms
A.1 QAOA
The QAOA attempts to approximate solutions for constraint satisfaction problems, in which one attempts to satisfy many clauses at once. The accuracy depends upon a parameter $p$; increasing it results in a better approximation. The relevant optimization problems, combinatorial in nature, are each defined by an objective function with a variable number of binary clauses and bits per clause. The goal when solving the problem is to maximize or minimize the number of clauses satisfied. Any objective function is defined by the sum of all of its $n$ clauses:
$$C(z) = \sum_{x=1}^{n} C_x(z),$$
where $z$, a binary string of fixed length, is the input to optimize. Each clause $C_x(z)$ outputs 1 if it is satisfied and 0 otherwise. Usually, each clause $C_x(z)$ depends on only a few of the bits in the string.
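We create an operator $U(C,\gamma)$, which is defined as

$$U(C,\gamma) = e^{-i\gamma C} = \prod_{x=1}^{n} e^{-i\gamma C_x}.$$

Note that each factor $e^{-i\gamma C_x}$ in the product is local to the qubits acted on by the clause $C_x$. Additionally, all of the clauses $C_x$ in the sum commute because they are all diagonal, so the exponential factorizes and the order of the product does not matter. Next, we create a second operator $U(B,\beta)$. This operator does not depend on the objective function (while $U(C,\gamma)$ does) and is defined on q qubits by

$$U(B,\beta) = e^{-i\beta B} = \prod_{j=1}^{q} e^{-i\beta X_j}, \qquad B = \sum_{j=1}^{q} X_j,$$

where $X_j$ is the Pauli X operator on qubit j. These are the only two operators used in QAOA, and they are applied to the starting state $|s\rangle = |+\rangle^{\otimes q}$, where $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. In the tensor network framework, one starts with the state $|0\rangle\langle 0|^{\otimes q}$ and prepares the state $|s\rangle\langle s|$ by applying a Hadamard gate to every qubit. The parameter p determines how many times $U(C,\gamma)$ and $U(B,\beta)$ are applied to the initial state. If p = 1, there is only one γ and one β, and each operator is applied once (the double hat denotes the corresponding Liouville superoperator, $\hat{\hat{U}}\rho = U\rho U^\dagger$):

$$|\gamma,\beta\rangle\langle\gamma,\beta| = \hat{\hat{U}}(B,\beta)\,\hat{\hat{U}}(C,\gamma)\,|s\rangle\langle s|.$$

If p is greater than 1, there are p angles $\gamma_1,\ldots,\gamma_p$ and p angles $\beta_1,\ldots,\beta_p$, and 2p operators are applied to the state in alternation: $U(C,\gamma_1)$, $U(B,\beta_1)$, ..., $U(C,\gamma_p)$, $U(B,\beta_p)$. The resulting state for parameter p is

$$|\gamma,\beta\rangle\langle\gamma,\beta| = \hat{\hat{U}}(B,\beta_p)\,\hat{\hat{U}}(C,\gamma_p)\cdots\hat{\hat{U}}(B,\beta_1)\,\hat{\hat{U}}(C,\gamma_1)\,|s\rangle\langle s|.$$

Once this state has been prepared, the expectation value of every clause $C_x$ is measured:

$$\langle C\rangle = \sum_{x=1}^{n} \mathrm{Tr}\left(C_x\,|\gamma,\beta\rangle\langle\gamma,\beta|\right).$$

This sum gives the cost of the objective function for the prepared state $|\gamma,\beta\rangle\langle\gamma,\beta|$; the goal is to choose the parameters γ('s) and β('s) such that the cost is maximized. The state $|\gamma,\beta\rangle\langle\gamma,\beta|$ must be re-prepared and measured once for every clause in the objective function to determine the cost, and this must be done at each step of the optimization procedure.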
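As a cross-check for small instances, the following C++ sketch builds the depth-p QAOA state in a dense state vector (a full Hilbert space simulation, which is exactly what tensor contraction tries to avoid) and evaluates $\langle C\rangle$ for any cost that is diagonal in the computational basis. The function name `qaoaExpectation` and the convention that bit j of a basis index is qubit j are assumptions for illustration; this is not qTorch code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Dense state-vector sketch of the depth-p QAOA state and its cost <C>.
// cost[z] is the diagonal cost C(z) of basis state z, where bit j of z is qubit j.
double qaoaExpectation(int q, const std::vector<double>& cost,
                       const std::vector<double>& gammas,
                       const std::vector<double>& betas) {
    const std::size_t dim = std::size_t{1} << q;
    assert(cost.size() == dim && gammas.size() == betas.size());
    // |s> = |+>^{(x)q}: the uniform superposition over all 2^q bit strings.
    std::vector<cd> psi(dim, cd(1.0 / std::sqrt(static_cast<double>(dim)), 0.0));
    for (std::size_t layer = 0; layer < gammas.size(); ++layer) {
        // U(C, gamma) = exp(-i gamma C) is diagonal: a phase on every basis state.
        for (std::size_t z = 0; z < dim; ++z)
            psi[z] *= std::exp(cd(0.0, -gammas[layer] * cost[z]));
        // U(B, beta) = prod_j exp(-i beta X_j), with exp(-i beta X) = cos(beta) I - i sin(beta) X.
        const cd c(std::cos(betas[layer]), 0.0);
        const cd s(0.0, -std::sin(betas[layer]));
        for (int j = 0; j < q; ++j) {
            const std::size_t bit = std::size_t{1} << j;
            for (std::size_t z = 0; z < dim; ++z) {
                if (z & bit) continue;  // handle each pair (z, z | bit) once
                const cd a = psi[z], b = psi[z | bit];
                psi[z] = c * a + s * b;
                psi[z | bit] = s * a + c * b;
            }
        }
    }
    double expectation = 0.0;
    for (std::size_t z = 0; z < dim; ++z) expectation += std::norm(psi[z]) * cost[z];
    return expectation;
}
```

With β = 0 every basis state keeps probability $2^{-q}$, so $\langle C\rangle$ reduces to the average of C(z); for the Max-Cut cost of a four-vertex ring that average is 2, which makes a convenient sanity check.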

A.2.1 Definition
Max-Cut is a common optimization problem on graphs. The input is a graph represented by a vector of edges, $G = \{E_{ij}\}$, where each edge connects two vertices in the graph. The solution is the binary string z in which each bit corresponds to a vertex and which maximizes the number of edges "cut". An edge is cut when the two vertices it connects are assigned opposite values, one 0 and the other 1. An ideal solution would cut every edge in the graph, but this is possible only when the graph is bipartite, so it is usually not possible.
To map the Max-Cut problem to a quantum computer, one represents each vertex in the graph by one qubit; a graph with 30 vertices therefore requires 30 qubits. Next, we must define the objective function to be used in the QAOA optimization. The objective function for Max-Cut is

$$C = \sum_{(i,j)\in E} C_{ij}, \qquad C_{ij} = \tfrac{1}{2}\left(1 - Z_i Z_j\right),$$

where each edge is represented by i and j, the qubits of the two vertices it connects, and $Z_i$ is the Pauli Z operator on qubit i. On a basis state $|z\rangle$, $C_{ij}$ equals 1 if $z_i \neq z_j$ and 0 otherwise.
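The classical content of this objective can be sketched in a few lines of C++: each clause $C_{ij}$ evaluates to 1 exactly when the bits of i and j differ, and for very small graphs the exact optimum can be found by enumerating all $2^n$ strings, which is useful for checking QAOA output. The helper names `cutValue` and `bruteForceMaxCut` are hypothetical and not part of qTorch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Max-Cut objective: clause C_ij(z) = 1 exactly when bits z_i and z_j differ, which is
// the value of (1 - Z_i Z_j)/2 on the basis state |z>. Bit k of z is vertex k.
int cutValue(const std::vector<Edge>& edges, std::uint64_t z) {
    int cut = 0;
    for (const Edge& e : edges)
        cut += static_cast<int>(((z >> e.first) ^ (z >> e.second)) & 1u);
    return cut;
}

// Exhaustive search over all 2^n assignments; only feasible for small n.
int bruteForceMaxCut(int n, const std::vector<Edge>& edges) {
    assert(n >= 0 && n < 64);
    int best = 0;
    for (std::uint64_t z = 0; z < (std::uint64_t{1} << n); ++z) {
        const int c = cutValue(edges, z);
        if (c > best) best = c;
    }
    return best;
}
```

For the four-vertex ring, the alternating assignments cut all four edges, while a triangle can have at most two of its three edges cut.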

A.2.2 Optimal Angles
Note that each clause $C_{ij}$ in the objective function acts on only two qubits, i and j. As a result, when the state $|\gamma,\beta\rangle\langle\gamma,\beta|$ is prepared to measure a given clause, all of the operators that do not (directly or indirectly) affect those two qubits commute through and cancel to the identity. For p = 1, the only terms left are the mixing terms on qubits i and j and the phase terms for edges that touch i or j. So, for each clause in the objective function, the resulting measurement looks like

$$\langle C_{ij}\rangle = \langle s|\,\Big(\prod_{(k,l)\in E_{ij}} e^{i\gamma C_{kl}}\Big)\, e^{i\beta(X_i+X_j)}\, C_{ij}\, e^{-i\beta(X_i+X_j)}\,\Big(\prod_{(k,l)\in E_{ij}} e^{-i\gamma C_{kl}}\Big)\,|s\rangle,$$

where $E_{ij}$ is the set of edges that share at least one vertex with the edge (i, j). However, as p increases, the number of terms that commute through decreases. If p = 2, $\hat{\hat{U}}(C,\gamma)$ and $\hat{\hat{U}}(B,\beta)$ are each applied twice, and the terms that do not commute include not only the operators acting on qubits i and j, but also the operators acting on vertices that neighbor i or j. This is because each $\hat{\hat{U}}(C_x,\gamma)$ is a two-qubit operator: in the application of $\hat{\hat{U}}(C,\gamma)$ closest to the measurement, each remaining $\hat{\hat{U}}(C_x,\gamma)$ acts on qubit i or j and on one other qubit, and all other $\hat{\hat{U}}(C_x,\gamma)$ operators commute through. In the earlier application of $\hat{\hat{U}}(C,\gamma)$, however, any operator that acts on i, j, or any of those other qubits no longer commutes through. As a result, as p increases, more and more qubits are needed to compute the cost of each clause in the objective function, which is necessary to find the optimal γ's and β's.
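The argument above says that, at depth p, only qubits within graph distance p of i or j can influence $\langle C_{ij}\rangle$. A multi-source breadth-first search finds that set directly; the sketch below is a hypothetical helper (not a qTorch function) for estimating how many qubits each per-clause circuit needs.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Qubits that can influence <C_ij> at depth p: every vertex within graph distance p
// of vertex i or vertex j (multi-source breadth-first search).
std::vector<int> lightConeQubits(int n, const std::vector<std::pair<int, int>>& edges,
                                 int i, int j, int p) {
    assert(0 <= i && i < n && 0 <= j && j < n && p >= 0);
    std::vector<std::vector<int>> adj(n);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    std::vector<int> dist(n, -1);
    std::queue<int> frontier;
    for (int src : {i, j}) {
        if (dist[src] < 0) { dist[src] = 0; frontier.push(src); }
    }
    std::vector<int> cone;
    while (!frontier.empty()) {
        const int v = frontier.front();
        frontier.pop();
        cone.push_back(v);
        if (dist[v] == p) continue;  // do not expand past distance p
        for (int w : adj[v])
            if (dist[w] < 0) { dist[w] = dist[v] + 1; frontier.push(w); }
    }
    std::sort(cone.begin(), cone.end());
    return cone;
}
```

On a six-vertex path 0-1-2-3-4-5, the clause on edge (0, 1) needs qubits {0, 1, 2} at p = 1 and {0, 1, 2, 3} at p = 2.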
To maximize the value of the objective function, we construct the circuit needed to compute the cost of each clause in the objective function, represent the circuit as a tensor network, and contract the network. This is repeated for every clause in the objective function to determine the complete cost. Finally, we feed this calculation into the open-source nlopt library, with the γ's and β's as the parameters to optimize. Once we have the optimal values, we can prepare the total state $|\gamma,\beta\rangle\langle\gamma,\beta|$ and measure it.
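The loop structure (one evaluation per clause, summed into the objective, then maximized over the angles) can be made explicit with a small C++ sketch. It deliberately swaps nlopt for a crude exhaustive grid over γ ∈ [0, 2π) and β ∈ [0, π) at p = 1, and treats each clause's cost as an arbitrary callable; in the text each call would be one tensor network contraction. `ClauseCost`, `AngleResult`, and `gridSearchP1` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <vector>

// One clause's cost <C_x> as a function of (gamma, beta) for p = 1.
using ClauseCost = std::function<double(double, double)>;

struct AngleResult { double gamma, beta, value; };

// Exhaustive grid over gamma in [0, 2*pi) and beta in [0, pi); a stand-in for nlopt,
// shown only to make the "sum clauses, then optimize the angles" loop explicit.
AngleResult gridSearchP1(const std::vector<ClauseCost>& clauses, int steps) {
    assert(steps > 0 && !clauses.empty());
    const double pi = std::acos(-1.0);
    AngleResult best{0.0, 0.0, -std::numeric_limits<double>::infinity()};
    for (int a = 0; a < steps; ++a) {
        for (int b = 0; b < steps; ++b) {
            const double gamma = 2.0 * pi * a / steps;
            const double beta = pi * b / steps;
            double total = 0.0;  // objective = sum of per-clause expectation values
            for (const ClauseCost& clause : clauses) total += clause(gamma, beta);
            if (total > best.value) best = {gamma, beta, total};
        }
    }
    return best;
}
```

A local optimizer such as those in nlopt needs far fewer objective evaluations than a grid, which matters when every evaluation is a full set of contractions.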

A.2.3 Max-Cut on 3-Regular Graphs
For 3-regular graphs, p can be interpreted as how much of the graph the algorithm sees when it maximizes the value of each of the objective function clauses [FGG14a]. For example, as mentioned above, when p = 1 the cost of each clause depends only on qubits i and j and their neighbors, so computing it may require far fewer qubits than the whole graph. In a 3-regular graph, vertices i and j can each be connected to at most two vertices other than each other. Therefore, at most six qubits are needed to compute the cost of each objective function clause:
[Figure: the p = 1 subgraph around the edge (i, j)]
For p = 2, to evaluate the cost of each clause in the objective function, the algorithm sees every vertex within distance two of i or j, while the other operators commute through. Therefore, at most 14 qubits are needed to compute the cost of each clause:
[Figure: the p = 2 subgraph around the edge (i, j)]
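The counts of six and fourteen follow from a simple bound: in a d-regular graph, each endpoint of the edge (i, j) contributes at most d − 1 new vertices per frontier vertex at every additional step of distance, so the number of qubits in the depth-p neighborhood of an edge is at most

```latex
$$
N_{\max}(p) \;=\; 2\sum_{k=0}^{p} (d-1)^{k}, \qquad
N_{\max}(1)\big|_{d=3} = 2(1+2) = 6, \qquad
N_{\max}(2)\big|_{d=3} = 2(1+2+4) = 14,
$$
```

with equality when the neighborhood contains no cycles (it is a tree); short cycles only reduce the count.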

A.3 Estimating Likely Z-String Example
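We present here an example set of measurements for a four-qubit ring graph with p = 1. Each measurement projects a prefix of the qubits onto a fixed bit string and traces out the remaining qubits. The first measurement is a projection of the first qubit, with qubits two through four traced out. Consider the case where the expected value is 0.5 (ideal case): both values of the first qubit are equally likely, so we flip a coin. Say we get 1 from the flip. Our next measurement is

$$\mathrm{Tr}\big[(|1\rangle\langle 1| \otimes |0\rangle\langle 0| \otimes I \otimes I)\,|\gamma,\beta\rangle\langle\gamma,\beta|\big].$$

Consider the case where the expected value is again 0.5 (ideal case). Then, by Bayes' law, we can calculate the value of the second qubit:

$$P(\text{qubit}_k \mid \text{qubits}_{0\ldots k-1}) = \frac{P(\text{qubit}_k \cap \text{qubits}_{0\ldots k-1})}{P(\text{qubits}_{0\ldots k-1})}.$$

Plugging in our values $P(\text{qubit}_k \cap \text{qubits}_{0\ldots k-1}) = 0.5$ and $P(\text{qubits}_{0\ldots k-1}) = 0.5$, we obtain a probability of 1 for the second qubit being in the state $|0\rangle\langle 0|$. Therefore, we assign the value 0 to the second qubit. Repeating for a third time, we measure $|1\rangle\langle 1| \otimes |0\rangle\langle 0| \otimes |0\rangle\langle 0| \otimes I$. Consider the case where we obtain an expected value of 0 (ideal case). Using Bayes' law again, we conclude that the value of qubit three should be 1. Repeating for the last time, we perform the final measurement, $|1\rangle\langle 1| \otimes |0\rangle\langle 0| \otimes |1\rangle\langle 1| \otimes |0\rangle\langle 0|$. An expected value of 0.5 tells us that the value of qubit four should be 0 and that this answer string appears in the state $|\gamma,\beta\rangle\langle\gamma,\beta|$ with probability 0.5. This high, nonzero probability tells us that 1010 is likely a correct solution for the Max-Cut of the four-vertex ring graph.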
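The procedure in this example generalizes to n qubits: for each qubit in turn, measure the joint probability of the current prefix extended by 0, divide by the prefix probability (Bayes' law), and choose the more likely bit, flipping a coin on a tie. In the C++ sketch below the measurement is an abstract callable; in qTorch each call would correspond to one contraction with a measurement file such as "1 0 T T". The names `estimateLikelyString` and `LikelyString`, and the 1e-12 tie tolerance, are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>
#include <string>

struct LikelyString { std::string bits; double probability; };

// prefixProbability(s) must return the joint probability that the first s.size() qubits
// are measured as the bit string s, with all remaining qubits traced out.
LikelyString estimateLikelyString(int n,
                                  const std::function<double(const std::string&)>& prefixProbability,
                                  std::mt19937& rng) {
    assert(n >= 0);
    std::bernoulli_distribution coin(0.5);
    std::string prefix;
    double pPrefix = 1.0;  // P(qubits 0..k-1 = prefix)
    for (int k = 0; k < n; ++k) {
        const double joint0 = prefixProbability(prefix + '0');  // P(prefix, qubit k = 0)
        // Bayes' law: P(qubit k = 0 | prefix) = P(prefix, qubit k = 0) / P(prefix).
        const double cond0 = pPrefix > 0.0 ? joint0 / pPrefix : 0.5;
        char bit;
        if (std::abs(cond0 - 0.5) < 1e-12) bit = coin(rng) ? '1' : '0';  // tie: flip a coin
        else bit = cond0 > 0.5 ? '0' : '1';
        pPrefix = (bit == '0') ? joint0 : pPrefix - joint0;
        prefix += bit;
    }
    return {prefix, pPrefix};
}
```

On an idealized distribution that puts probability 0.5 on each of 1010 and 0101, the coin decides the first qubit and every later qubit is then forced, giving one of the two strings with probability 0.5, as in the worked example.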

B LIQUi|> benchmark
The quantum circuit simulations performed with LIQUi|> are used as benchmarks for comparison with our qTorch implementation. First, a .qasm file describing the quantum circuit is converted to an F# script containing a function that runs the circuit in LIQUi|>. When the circuit has many gates (say, more than 500), multiple functions are constructed in the F# script, each containing a subset of the circuit (say, 500 gates). We then use LIQUi|> to first compile the function(s) corresponding to (subsets of) the circuit, then run the circuit, and finally compute the expectation value $\langle M\rangle$ of some user-specified operator M with respect to the final state of the circuit. We record the total wall-clock time of this three-step process and use it for comparison with our qTorch performance.
Here are a few additional notes concerning the LIQUi|> benchmark:
1. For computing the expectation value $\langle M\rangle$, we assume that M is a string of Pauli operators, i.e., an operator of the form $M = \sigma_1 \otimes \sigma_2 \otimes \cdots \otimes \sigma_n$ with each $\sigma_k \in \{I, X, Y, Z\}$. Our method for computing $\langle M\rangle$ takes advantage of this property and runs in time that scales linearly with the number of non-zero amplitudes in the final state. Timing results from the benchmark circuits indicate that the overhead for computing $\langle M\rangle$ is only a small fraction (at most 10%) of the time spent running the circuit.
2. Some of the quantum circuits that we use for benchmarking are circuits for quantum chemistry simulations. Although LIQUi|> has a more optimized implementation dedicated specifically to quantum chemistry simulation, we chose not to use it because our focus is on the average performance of simulating general quantum circuits.
3. Our LIQUi|> benchmark is performed in a Docker container so that the programs can be executed regardless of the operating system used. The environment inside the container is Unix-like, and mono is used to run Windows .exe executables. Because of the upper limit that mono imposes on the stack size, we restrict each function in the F# script generated from the .qasm file to at most 500 gates, and we use LIQUi|> to compile each function individually. This partitioning of the circuit into segments of at most 500 gates may make the simulation slower than when LIQUi|> compiles the entire circuit in one shot. However, the circuits that we simulate are partitioned into no more than 10 sub-functions, so we believe that our implementation captures at least the order of magnitude of the optimized LIQUi|> simulation speed.
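Note 1 relies on the fact that a Pauli string maps each basis state to a single, phase-shifted basis state, so $\langle M\rangle$ can be accumulated in one pass over the amplitudes with O(n) work per non-zero amplitude. The C++ sketch below shows that idea on a plain state vector; it is not the benchmark's actual F#/LIQUi|> code, and the name `pauliExpectation` and the convention that character k of the string acts on bit k of the basis index are assumptions.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <string>
#include <vector>

using cd = std::complex<double>;

// <psi| P |psi> for a Pauli string P ('I', 'X', 'Y' or 'Z' per qubit; character k acts on
// bit k of the basis index). P|i> = phase(i) |i ^ flipMask>, so one pass suffices.
cd pauliExpectation(const std::vector<cd>& psi, const std::string& pauli) {
    assert(psi.size() == (std::size_t{1} << pauli.size()));
    std::size_t flipMask = 0;  // qubits on which P applies X or Y (bit flips)
    for (std::size_t k = 0; k < pauli.size(); ++k)
        if (pauli[k] == 'X' || pauli[k] == 'Y') flipMask |= std::size_t{1} << k;
    cd total(0.0, 0.0);
    for (std::size_t i = 0; i < psi.size(); ++i) {
        if (psi[i] == cd(0.0, 0.0)) continue;  // zero amplitudes contribute nothing
        cd phase(1.0, 0.0);
        for (std::size_t k = 0; k < pauli.size(); ++k) {
            const bool bitSet = (i >> k) & 1u;
            if (pauli[k] == 'Z' && bitSet) phase = -phase;                      // Z|1> = -|1>
            else if (pauli[k] == 'Y') phase *= bitSet ? cd(0.0, -1.0) : cd(0.0, 1.0);  // Y|0> = i|1>, Y|1> = -i|0>
        }
        total += std::conj(psi[i ^ flipMask]) * phase * psi[i];
    }
    return total;
}
```

For the Bell state $(|00\rangle + |11\rangle)/\sqrt{2}$ this gives +1 for ZZ and XX and −1 for YY.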

C Software Guide for qTorch
The source code of qtorch can be obtained from the following URL: https://github.com/aspuru-guzik-group/qtorch

C.1 License
The qTorch software package is licensed under the Apache 2.0 license.

C.2.1 Dependencies
GCC/G++ version 4.9 or higher, Libtool or Glibtool (OSX), and GNU Make are the only external dependencies. To check your GCC version, type gcc --version. If your computer runs OSX, it is likely that gcc links directly to a clang compiler; in that case, please ensure that clang is version 3.4 or higher using clang --version. The other dependencies, GNU Make and Libtool or Glibtool (OSX), can be installed using a local package manager such as yum or brew, or by following instructions online. However, it is likely that these packages are already installed.

C.2.2 Installing qTorch Using GNU Make
For convenience, an installation option is provided with the library. Please use the following command inside the main directory qtorch to install qTorch: make install or, to install locally for just one user,

make installlocal
This step is necessary for running any simulations. If installation of nlopt-2.4.2 (the nonlinear optimization library used for QAOA angle optimization) fails, QAOA Max-Cut will not run. Next, we recommend that the user set their shell environment variables as detailed in the README.txt file, so that the compiled executables can be run from any directory and the library can be used from their own custom .cpp files. Additionally, a Makefile is provided. The available commands are: make all, make qtorch, make test, make cut, and make clean. The make all command compiles all three executables (tester, maxcutQAOA, and qtorch), and the make clean command removes any compiled executables. The rest of this guide gives further details on the other make commands.

C.2.3 Installing qTorch Using Docker
An alternative way of running qTorch is inside a Docker container. This allows the user to run qTorch regardless of the local environment (operating system, package dependencies, etc.) of her machine. We provide the Docker image therealcaoyudong/qtorch on DockerHub, which contains the qTorch source code as well as the libraries needed to build qTorch.
Here we provide step-by-step instructions on how to use the Docker image, assuming no prior experience with Docker.
1. Install Docker and VirtualBox on your computer.
2. In a command terminal, run docker-machine ls, which returns a list of virtual machines.
If the list is empty, create a virtual machine called default by running docker-machine create --driver virtualbox default.
3. Configure the shell to the virtual machine by running docker-machine env default.Run the command suggested by the returned message.
4. Now that everything is set up, run the image with docker run -it therealcaoyudong/qtorch. Downloading the image may take a while. When the download is complete, the shell prompt will look like, for example, root@c7d9e3be4c53#. The hash string after @ is the identifier of the Docker container started by the docker run command just executed. To see a list of running Docker containers, run docker ps. Note that in addition to a hash string, each container is also labeled with a nickname (such as happy_einstein).
5. The Docker container is effectively an Ubuntu environment with qTorch installed and executable from any directory.The source code for qTorch can be located at /root/qtorch.The user is free to re-build qTorch as described in Section C.2.2.To see if qTorch is correctly installed, run qtorch anywhere inside the Docker container.

C.3 Library Introduction
The tensor contraction library provides a parallelized framework for users to simulate quantum circuits using a tensor network. Quantum circuits are translated from the QASM file format to a tensor network object and then contracted at the user's command; the result is the probability of obtaining the specified measurement, not the amplitude. Additionally, the user must provide a list of measurements in a separate file: X, Y, Z, 0, 1, or T (trace) on each qubit. Sample measurement files are provided for the user to peruse. Each contraction is irreversible, and the user cannot check the amplitudes of the wave function at a specified time step in the circuit. This is a limitation, but it is also essential to the tensor network's ability to calculate expectation values for circuits with large numbers of qubits. Within the library, sample QASM files are provided for the user to look through and emulate. Furthermore, more information on how to write a QASM file compatible with the library is given in Section C.4. Many gates are built into the library, but the user is free to define an arbitrary gate on one or two qubits. In this way, computation is universal, and any measurement can be executed.

C.4 Basic Rules of Writing QASM and Measurement Files
1. Each QASM file's first line must be the number of qubits in the circuit.
2. Each successive line either acts on a qubit with an operator or defines a one- or two-qubit operator.
3. A corresponding measurement file must be provided with the QASM file. If it is blank, all qubits will be traced out.
4. If the QASM file is incorrect, the tensor library will not simulate it.

C.4.1 Acting On a Qubit with an Operator
The following operators are already supported by the library: CNOT, SWAP, Hadamard, Rx(θ), Ry(θ), Rz(θ), X, Y, Z, Depolarizing Noise Channel, CRk, CZ, and CPHASE(θ). Each operator is followed by a qubit index, or by two qubit indices for two-qubit gates. The only operators that have extra arguments are the Rx, Ry, Rz, and CPHASE gates. For example, the line H 0 applies a Hadamard gate to qubit 0. The software can also compile and execute arbitrary one- and two-qubit matrix operations. Note that when parsing these matrix files, the software does not check that the operation is unitary. To define an arbitrary one- or two-qubit operator, the command def1 or def2 comes first, followed by the gate name and the path of the file the gate is stored in. Within the gate file, the user must include 4 (one qubit) or 16 (two qubits) numbers separated by spaces. The numbers can either be complex, in the format (a, b) for the number a + bi, or real, formatted like a regular floating point number such as 3.14159. Some example gate files are included for reference. A gate should be defined only once per QASM file; after being defined, it can be used within the file like any other one- or two-qubit gate.
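Because qTorch does not check unitarity when it parses a gate file, a user-side check can catch typos before a simulation. The C++ sketch below reads the numbers of a gate file with `std::complex`'s stream operator (which accepts both a plain real such as 3.14159 and a pair such as (a,b)) and tests $U^\dagger U = I$. The row-major ordering of the entries, the tolerance, and the helper names are assumptions, not qTorch's parser.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using cd = std::complex<double>;

// Reads whitespace-separated entries; each is a real "x" or a complex "(a,b)" = a + bi.
std::vector<cd> readGateEntries(std::istream& in) {
    std::vector<cd> entries;
    cd value;
    while (in >> value) entries.push_back(value);
    return entries;
}

// Checks U^dagger U = I for a 2x2 (4 entries) or 4x4 (16 entries) row-major matrix.
bool isUnitary(const std::vector<cd>& u, double tol = 1e-9) {
    const std::size_t dim = (u.size() == 4) ? 2 : (u.size() == 16) ? 4 : 0;
    if (dim == 0) return false;  // gate files hold 4 (one qubit) or 16 (two qubit) numbers
    for (std::size_t a = 0; a < dim; ++a)
        for (std::size_t b = 0; b < dim; ++b) {
            cd sum(0.0, 0.0);
            for (std::size_t k = 0; k < dim; ++k)
                sum += std::conj(u[k * dim + a]) * u[k * dim + b];
            if (std::abs(sum - cd(a == b ? 1.0 : 0.0, 0.0)) > tol) return false;
        }
    return true;
}
```

A Hadamard file passes the check, while a file containing 1 1 0 1 (a non-unitary shear) is rejected.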

C.4.3 Measurement Files
A measurement file must be provided in addition to an input QASM file.A measurement file consists of X, Y, Z, 0, 1, or T characters separated by spaces; these characters represent projection measurements.The order of the characters corresponds to the order of the qubits, i.e. the first character is a projection measurement on the first qubit.If the measurement file includes 5 measurements and an 8 qubit circuit is simulated, the last three qubits will be automatically traced out.Additionally, if the file path is incorrect or the file fails to open, all qubits will be traced out.
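The prefix measurements used in the likely-string example of Section A.3 map directly onto this format: the known bits become 0 or 1 projections and every later qubit gets T. The C++ helper below is a hypothetical sketch that writes such a line, spelling out the trailing T characters explicitly even though the text says omitted trailing qubits are traced out automatically.

```cpp
#include <cassert>
#include <string>

// Measurement-file line projecting the first qubits onto the bits of `prefix`
// ('0' or '1') and tracing out ('T') every remaining qubit.
std::string measurementLine(const std::string& prefix, int numQubits) {
    assert(static_cast<int>(prefix.size()) <= numQubits);
    std::string line;
    for (int k = 0; k < numQubits; ++k) {
        if (k > 0) line += ' ';
        if (k < static_cast<int>(prefix.size())) {
            assert(prefix[k] == '0' || prefix[k] == '1');
            line += prefix[k];
        } else {
            line += 'T';
        }
    }
    return line;
}
```

For a four-qubit circuit, the prefix "10" yields the line 1 0 T T.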

C.5 Simple QASM simulation
The main executable (qtorch) allows the user to simulate an arbitrary QASM file without having to create their own tensor network object. The user writes an input script file that includes the path to their QASM input file, the path to their measurement file, and the contraction type they would like to use (stochastic, line-graph ordering, or user-defined). If the contraction type is line-graph ordering, the user may also provide the amount of time for which to run the tree decomposition algorithm (QuickBB) that determines the line graph ordering. If the type is user-defined, the user must provide the path to the contraction sequence. After writing the script file, the user builds the main executable with the provided Makefile by typing make qtorch, and then runs it with qtorch <global path to script file> if the shell PATH variable has been set, or ./bin/qtorch <global path to script file> if it has not. The output of the simulation is printed to the console and to the file "output/qtorch.out". The user can specify a different (valid) output file as an input parameter in their script: >string outputpath /my/path/to/output.txt

C.5.1 Writing an Input Script
Every line in the script file must begin with the character '>'; otherwise, it is ignored. To specify the QASM file path, the user types >string qasm followed by the QASM file path. To specify the measurement file, the user types >string measurement followed by the measurement file path. To specify the contraction method (stochastic, line graph ordering, or user-defined), the user types >string contractmethod followed by simple-stoch, linegraph-qbb, or user-defined. If the user chooses line graph ordering, the time cutoff for the ordering algorithm (Quick-BB) may be specified to override the default of 20 seconds by typing >int quickbbseconds followed by the maximum time in seconds. If the user chooses user-defined, they must also specify the file that contains the contraction sequence with the line >string user-contract-seq followed by the file path of the contraction sequence. Example input scripts are included with the library (see Section C.10). The user-defined contraction sequence is specified as a wire elimination ordering, and each wire is defined by a pair of nodes. Therefore, each line of the user-defined contraction sequence file contains two numbers separated by a space, which are the indices of the nodes connected by the wire. The node indices are assigned as follows: for an n-qubit simulation, the first n nodes are the initial state nodes and the last n nodes are the measurements. The other nodes, which are the gates in the circuit, are numbered in the order in which they appear in the QASM file. For example, an ordering for a 2-qubit network with one 2-qubit gate (nodes 0 and 1 are the initial states, node 2 is the gate, and nodes 3 and 4 are the measurements) would be:
0 2
1 2
3 2
4 2
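The node-numbering rule above is mechanical enough to automate. The C++ sketch below takes the list of qubits each gate acts on (in QASM order) and emits one node pair per wire, in circuit order, as a valid but unoptimized starting point for a user-defined sequence. The function name is hypothetical, and whether qTorch prefers a particular order within each pair is not stated in the text (its own example writes 3 2). Two consecutive two-qubit gates on the same pair of qubits create two wires between the same nodes, which a node-pair format cannot distinguish.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Nodes 0..n-1 are initial states, gates are numbered n, n+1, ... in QASM order, and the
// last n nodes are measurements. gates[k] lists the qubits acted on by gate k.
std::vector<std::pair<int, int>> naiveContractionSequence(
        int n, const std::vector<std::vector<int>>& gates) {
    std::vector<int> last(n);  // node currently at the end of each qubit's wire
    for (int q = 0; q < n; ++q) last[q] = q;
    std::vector<std::pair<int, int>> wires;
    const int g = static_cast<int>(gates.size());
    for (int k = 0; k < g; ++k) {
        const int node = n + k;
        for (int q : gates[k]) {
            assert(0 <= q && q < n);
            wires.emplace_back(last[q], node);
            last[q] = node;
        }
    }
    for (int q = 0; q < n; ++q) wires.emplace_back(last[q], n + g + q);  // measurement nodes
    return wires;
}
```

For the 2-qubit, one-gate example this produces the wires 0 2, 1 2, 2 3, and 2 4, i.e., the same four wires as the sequence shown above.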

C.5.3 Line-Graph Contraction and QuickBB Ordering
The line-graph contraction method runs a binary executable included in the library, quickbb_64 by default, to determine the optimal wire elimination ordering of the tensor network. This can be changed to quickbb_32 for a 32-bit system by inserting the line >bool 64bit false into the input script file. To run either executable, the user's system must be GNU/Linux and able to execute ELF executables. If this is not the case, we recommend using the library on a different operating system or sticking to the stochastic contraction method.

C.5.4 Modify the Threading for CPU Optimum Efficiency
The default number of threads used for large tensor contractions is eight, but the user can change this parameter for use on a supercomputer or a computer that supports more than eight threads. To change the number of threads, the user adds the line >int threads <x> to the input script, where <x> is the new number of threads to use. All threading is done via the C++ standard library's std::thread class, which uses pthreads.
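When many circuits are simulated, the input scripts can be generated programmatically. The C++ sketch below assembles a script from the directives described in this section and picks a thread count with `std::thread::hardware_concurrency()`, falling back to the documented default of eight when the hardware value is unknown. The helper names and the example paths in the checks are hypothetical, and the sketch covers only the simple-stoch and linegraph-qbb methods (user-defined would also need the >string user-contract-seq line).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <thread>

// Hardware concurrency if known (the call returns 0 when it is not), else the default of 8.
int suggestedThreads() {
    const unsigned hw = std::thread::hardware_concurrency();
    return hw == 0 ? 8 : static_cast<int>(hw);
}

// Text of a qtorch input script built from the '>' directives described above.
std::string makeInputScript(const std::string& qasmPath, const std::string& measurementPath,
                            const std::string& method, int quickbbSeconds, int threads,
                            const std::string& outputPath) {
    assert(method == "linegraph-qbb" || method == "simple-stoch");
    std::ostringstream s;
    s << ">string qasm " << qasmPath << '\n'
      << ">string measurement " << measurementPath << '\n'
      << ">string contractmethod " << method << '\n';
    if (method == "linegraph-qbb") s << ">int quickbbseconds " << quickbbSeconds << '\n';
    s << ">int threads " << threads << '\n'
      << ">string outputpath " << outputPath << '\n';
    return s.str();
}
```

Writing the returned string to a file and passing that file's global path to qtorch would then run the simulation.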

C.5.5 Understanding the Simulation Output
The output of the simulation, whether it is printed to the console, the default output file "output/qtorch.out", or a customized output file path, is straightforward. If the simulation fails, any errors are printed and the simulation is aborted. If the simulation succeeds, the output file contains three lines. The first line is the result of the contraction as a complex number (a, b), whose value is a + b * i. It is important to note that this number is the probability of reading the measurement string, not the amplitude of the string, because the simulation stores all values in density matrix format. The second line is the number of floating point operations it took to contract the entire tensor network, for performance documentation if necessary. The final line is the time the simulation took in seconds, enclosed in brackets.
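For batch runs it helps to read this file back programmatically. The C++ sketch below parses the three lines as described (a complex pair, a floating point operation count, and a bracketed time in seconds). The exact spacing and number formatting that qTorch prints are assumptions, so the parser is deliberately lenient, and the sample values in the checks are made up.

```cpp
#include <cassert>
#include <complex>
#include <istream>
#include <sstream>
#include <string>

struct QtorchOutput {
    std::complex<double> value;  // probability of the measured outcome, printed as (a, b)
    double flops;                // floating point operations used by the contraction
    double seconds;              // simulation time, printed in brackets
};

// Parses the three-line output format; returns false on malformed input.
bool parseQtorchOutput(std::istream& in, QtorchOutput& out) {
    std::string line;
    if (!std::getline(in, line)) return false;
    std::istringstream first(line);
    if (!(first >> out.value)) return false;  // std::complex reads "(a,b)" or "(a, b)"
    if (!std::getline(in, line)) return false;
    std::istringstream second(line);
    if (!(second >> out.flops)) return false;
    if (!std::getline(in, line)) return false;
    const auto open = line.find('['), close = line.find(']');
    if (open == std::string::npos || close == std::string::npos || close < open) return false;
    std::istringstream third(line.substr(open + 1, close - open - 1));
    return static_cast<bool>(third >> out.seconds);
}
```

Reading the operation count as a double keeps the parser working whether the count is printed as an integer or in scientific notation.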

C.6 More Advanced QASM Simulation
To perform more advanced QASM file simulation, i.e., to simulate a batch of files or modify simulation parameters, the user must write their own code that uses the objects provided with the library under the qtorch namespace. We provide examples of how this can be done as a supplement to this guide. After reading, we encourage the user to try their own implementations or mimic our examples. First, it is important for the user to become familiar with all the files in the library.

C.6.1 Network.h
The Network header file contains the Network class. Each Network object represents an entire tensor network, composed of Wires (connections between tensors) and Nodes (tensors), but the Network stores only pointers to the Nodes. The class also contains functions that parse the QASM and measurement files, contract two tensors, reduce the circuit to only nonadjacent two-qubit gates, print the circuit to graph files formatted for visualization or treewidth calculations, get the final value from the simulation, and reset the circuit. For exact semantics, we suggest looking at the source code.
If the user would like to cut off a simulation that takes too long, or otherwise restrict the simulation time, they can access the variables totTimer and maxTime of the Network class. Both are non-static members, so they must be set for each Network instance. To use the timer, the user sets maxTime to the maximum simulation time (in seconds), then starts totTimer and calls one of the contraction methods. The contraction automatically returns once the maximum simulation time is reached.

C.6.2 Node.h
The Node.h file contains the Node class and all of its subclasses. Each instance of the Node class is a tensor in the tensor network. A Node stores the data held in the tensor in a vector of complex doubles, and it stores connections to other Nodes in the form of Wires. When a Network parses a circuit, each gate becomes a Node, so the Node class has an inheritance hierarchy that, for example, creates a CNOTNode when there is a CNOT in the QASM file or an HNode when there is an H in the QASM file. Once two tensors are contracted, the result of their contraction becomes an IntermediateStateNode. The initial states of the qubits and their final measurements are also stored as Nodes.
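Since the simulation stores values in density matrix format, one natural way for a gate Node's vector of complex doubles to represent a gate U is its Liouville superoperator (the double-hat operator of Section A.1), which acts on the row-major vectorization of ρ as $\mathrm{vec}(U\rho U^\dagger) = (U \otimes \bar{U})\,\mathrm{vec}(\rho)$. The C++ sketch below builds that matrix; it is an assumption for illustration and does not describe qTorch's actual tensor layout or index ordering.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Superoperator S = U (x) conj(U) of a dim x dim gate U (row-major), so that
// S applied to the row-major vec(rho) gives vec(U rho U^dagger).
std::vector<cd> superoperator(const std::vector<cd>& u, std::size_t dim) {
    assert(u.size() == dim * dim);
    const std::size_t big = dim * dim;
    std::vector<cd> s(big * big);
    for (std::size_t i = 0; i < dim; ++i)
        for (std::size_t j = 0; j < dim; ++j)
            for (std::size_t k = 0; k < dim; ++k)
                for (std::size_t l = 0; l < dim; ++l)
                    // entry (row (i,k), column (j,l)) = U_ij * conj(U_kl)
                    s[(i * dim + k) * big + (j * dim + l)] = u[i * dim + j] * std::conj(u[k * dim + l]);
    return s;
}

// Applies S to a vectorized density matrix.
std::vector<cd> applySuper(const std::vector<cd>& s, const std::vector<cd>& rhoVec) {
    const std::size_t n = rhoVec.size();
    assert(s.size() == n * n);
    std::vector<cd> out(n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c) out[r] += s[r * n + c] * rhoVec[c];
    return out;
}
```

Applied to $|0\rangle\langle 0|$, the Hadamard superoperator yields $|+\rangle\langle +|$, whose four entries all equal 0.5; this also makes clear why a contracted result reads out as a probability rather than an amplitude.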

C.6.3 Wire.h
The Wire.h header file contains the Wire class, which is a simple class.Each Wire stores pointers to the two Nodes it connects and its unique ID.

C.6.4 LineGraph.h
This header file contains the LineGraph class.The user creates an instance of a LineGraph class with a pointer to a Network as an input parameter if the user plans to contract the Network using line graph ordering.The only functions in the class other than the constructor run Quick-BB, a branch and bound algorithm for treewidth and tree decomposition, to determine the optimal line graph ordering, or contract the network based on a Quick-BB ordering.
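The line graph that LineGraph hands to Quick-BB has one vertex per wire, with two wires adjacent when they share a node (tensor). The C++ sketch below builds that graph from a wire list; it illustrates the construction only and is not the LineGraph class's implementation. The function name and the use of a set to merge duplicate edges from parallel wires are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Line graph of a tensor network: one vertex per wire; two wires are adjacent when they
// share a node. wires[w] = {nodeA, nodeB}. Returns sorted, de-duplicated edges (w1 < w2).
std::vector<std::pair<int, int>> lineGraph(int numNodes, const std::vector<std::pair<int, int>>& wires) {
    std::vector<std::vector<int>> incident(numNodes);  // wires touching each node
    for (int w = 0; w < static_cast<int>(wires.size()); ++w) {
        const int a = wires[w].first, b = wires[w].second;
        assert(0 <= a && a < numNodes && 0 <= b && b < numNodes && a != b);
        incident[a].push_back(w);
        incident[b].push_back(w);
    }
    std::set<std::pair<int, int>> edges;  // parallel wires would otherwise add duplicates
    for (const auto& ws : incident)
        for (std::size_t i = 0; i < ws.size(); ++i)
            for (std::size_t j = i + 1; j < ws.size(); ++j)
                edges.insert({std::min(ws[i], ws[j]), std::max(ws[i], ws[j])});
    return {edges.begin(), edges.end()};
}
```

For the 2-qubit, one-gate network of Section C.5.1, all four wires meet at the gate node, so its line graph is a complete graph on four vertices (six edges).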

C.6.5 ContractionTools.h
This header file contains the ContractionTools class. If the user wishes to contract their tensor network with the provided stochastic method, they should create an instance of this class. The stochastic method is preferred for small circuits where an optimal ordering is unnecessary, or when quickbb is not available. To create a ContractionTools instance, the user can provide either an already created Network or their QASM file path and measurement file path. In this way, the user does not have to interact with the Network class at all if they choose.
The class contains a contract method, where the user supplies which stochastic method they would like to use to contract the network.Of the non-LineGraph methods, we recommend exclusively using Stochastic.Here's a quick example.
```cpp
#include <iostream>

#include "qtorch.hpp"

int main() {
    // Build the network directly from a QASM file and a measurement file.
    qtorch::ContractionTools c("input.qasm", "measure.txt");
    // Contract with the recommended stochastic method.
    c.Contract(Stochastic);
    // Print the final value (a probability, stored as a complex double).
    std::cout << c.GetFinalVal() << std::endl;
    return 0;
}
```

The class also has a separate contraction function for the user to contract the tensor network using a wire ordering of their choice. The other methods in the class help the user visualize the tensor network or calculate the treewidth: the user can print the tensor network to a graph file and then call another method within the class to calculate the treewidth or treewidth bounds of the tensor network. The final function in the class allows the user to retrieve the simulation output as a complex double when the contraction finishes.
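The guide does not spell out the stochastic scheme itself, so the following C++ sketch shows only one simple stochastic approach, not necessarily the one qTorch implements: sample random wire orderings, score each with a cost model, and keep the cheapest. The cost model is a hypothetical one in which every wire has bond dimension `dim` and merging two tensors costs dim raised to the number of distinct wires touching either of them; all names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <random>
#include <set>
#include <utility>
#include <vector>

struct OrderResult { std::vector<int> order; double cost; };

// Cost of contracting wire by wire in `order` under the hypothetical model above.
double orderingCost(int numNodes, const std::vector<std::pair<int, int>>& wires,
                    const std::vector<int>& order, double dim) {
    std::vector<int> owner(numNodes);  // merged tensor that each original node belongs to
    std::iota(owner.begin(), owner.end(), 0);
    std::vector<std::set<int>> open(numNodes);  // open wires of each merged tensor
    for (int w = 0; w < static_cast<int>(wires.size()); ++w) {
        open[wires[w].first].insert(w);
        open[wires[w].second].insert(w);
    }
    double cost = 0.0;
    for (int w : order) {
        const int a = owner[wires[w].first], b = owner[wires[w].second];
        if (a == b) continue;  // already absorbed into the same tensor
        std::set<int> merged = open[a];
        merged.insert(open[b].begin(), open[b].end());
        cost += std::pow(dim, static_cast<double>(merged.size()));
        for (int x : open[b])  // wires shared by a and b become internal
            if (open[a].count(x)) merged.erase(x);
        open[a] = std::move(merged);
        open[b].clear();
        for (int& o : owner) if (o == b) o = a;
    }
    return cost;
}

// Keeps the cheapest of the identity ordering and `samples` random shuffles.
OrderResult stochasticSearch(int numNodes, const std::vector<std::pair<int, int>>& wires,
                             int samples, double dim, unsigned seed) {
    assert(samples >= 0);
    std::mt19937 rng(seed);
    std::vector<int> order(wires.size());
    std::iota(order.begin(), order.end(), 0);
    OrderResult best{order, orderingCost(numNodes, wires, order, dim)};
    for (int s = 0; s < samples; ++s) {
        std::shuffle(order.begin(), order.end(), rng);
        const double c = orderingCost(numNodes, wires, order, dim);
        if (c < best.cost) best = {order, c};
    }
    return best;
}
```

On a four-node chain, contracting the middle wire first costs 14 under this model with dim = 2, while working in from either end costs 10, which is the kind of difference a random search is meant to find cheaply.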

C.7 Calculating Treewidth
Calculating the treewidth of a tensor network is important because the cost of contracting a tensor network grows exponentially with the treewidth of its line graph, i.e., it scales as O(exp(treewidth)). Therefore, we provide a Quick-BB binary executable that performs this calculation, and our qtorch executable provides a simple wrapper around it. Please see Section C.5 for information on how to run Quick-BB using the linegraph method. The output file "output/qbb-stats.out" produced by running Quick-BB provides treewidth information on the underlying tensor network graph. To calculate the treewidth directly using the quickbb_64 or quickbb_32 binary executables, the user must first create a CNF file that specifies their graph. The advanced format of a CNF graph is documented online, and a more basic explanation is available on the Quick-BB website, which also describes how to run the binary executable on the CNF file. If the PATH environment variable was set, the user can type quickbb_64 or quickbb_32 from any directory. Otherwise, run Quick-BB from the bin directory: ./bin/quickbb_64 or ./bin/quickbb_32. The output statistics file will provide a bound on the treewidth.
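Quick-BB is a branch-and-bound search. As a swapped-in and much cruder alternative, the greedy min-degree elimination heuristic sketched below in C++ gives a quick treewidth upper bound: repeatedly eliminate a vertex of minimum degree, connect its remaining neighbors into a clique, and record the largest neighborhood seen. Applied to a tensor network's line graph, the elimination order is also a wire contraction order. This is not Quick-BB's algorithm, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Treewidth upper bound from greedy min-degree elimination.
int minDegreeTreewidthBound(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::set<int>> adj(n);
    for (const auto& e : edges) {
        assert(e.first != e.second);
        adj[e.first].insert(e.second);
        adj[e.second].insert(e.first);
    }
    std::vector<bool> eliminated(n, false);
    int width = 0;
    for (int step = 0; step < n; ++step) {
        int v = -1;  // remaining vertex of minimum degree
        for (int u = 0; u < n; ++u)
            if (!eliminated[u] && (v < 0 || adj[u].size() < adj[v].size())) v = u;
        width = std::max(width, static_cast<int>(adj[v].size()));
        for (int a : adj[v])
            for (int b : adj[v])
                if (a != b) adj[a].insert(b);  // make v's neighborhood a clique
        for (int a : adj[v]) adj[a].erase(v);  // remove v from the graph
        adj[v].clear();
        eliminated[v] = true;
    }
    return width;
}
```

The heuristic is exact on simple cases such as paths (1), cycles (2), and complete graphs on four vertices (3), but in general it only bounds the treewidth from above, which is why an exact or anytime search like Quick-BB is worth its cost.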

C.8 Other Executables Included
We include two other programs: maxcut.cpp and tests.cpp. Typing the command make test compiles and runs the provided unit tests for the library; all test results are printed to the file "test.log" in the output directory. The other program, maxcut.cpp, is our implementation of QAOA (the quantum approximate optimization algorithm) for solving instances of Max-Cut, using our tensor network library to simulate QAOA. To compile the maxcut executable, the user must use the command make cut. Please refer to the Examples section for how to run maxcut.

C.9 Extra Files
• Timer.h is a basic timer class where the user can access both wall and CPU time to time how long a simulation takes.Please see the actual header file for usage.
• preprocess.h is a file that we used for maxcut. Its single method allows the user to determine a contraction sequence that finishes within a provided maximum simulation time.
It repeatedly runs stochastic contractions, each of which returns once the maximum time is reached, or earlier if a solution is found.
• GraphGenerator library: The graph generator is an extra library that allows the user to randomly generate x-regular graphs using the C++ standard library's Mersenne Twister engine. Generating one graph runs in O(n * x), where n is the number of vertices in the graph and x is the regularity of the graph. The library contains two files: GraphNode.h and main.cpp. GraphNode.h is a simple struct that represents a single vertex in the graph being generated; it holds pointers to the other vertices it is connected to and an integer that tracks the number of connections. The main.cpp file parses the command line arguments provided by the user and runs the graph generation algorithm until all the graphs have been generated. At the start of the algorithm, n GraphNode objects are created. Then, the algorithm attempts to place all (n * x/2) edges through random selection of two vertices. If a random selection yields two vertices that are already connected, or a vertex that already has x edges, the algorithm tries again, giving up on that edge after one hundred failed attempts. The edge placement loop runs to completion regardless of whether all the edges have been placed successfully. If an edge could not be placed, the graph is discarded and the algorithm is run again until it succeeds. A Makefile is included in the library to compile main.cpp; the command make all is sufficient. To run the executable, the user types ./main <reg> <numGraphs> <numNodes>, where the command line arguments reg, numGraphs, and numNodes are the regularity, the number of graphs to generate, and the number of vertices per graph, respectively. The generated graphs are written to the Output directory. Finally, if the user runs the executable twice without moving the generated graphs to a separate directory, the graphs may be overwritten.
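• maxcut.cpp: an example of how tensor networks can be used to simulate QAOA to solve the Max-Cut problem. To compile and run the maxcut executable, the user must first install the nlopt (nonlinear optimization) library for C++, following the instructions in the included folder. Then, the user can run the angle optimization (the first part of QAOA), the cut approximation calculator, or both. Type the command make cut to compile this executable. To run the angle optimization, the user types maxcutQAOA <GraphFile Path> <QAOA p value> <0> <file path to output angle file> if the shell PATH variable was set, or ./bin/maxcutQAOA <GraphFile Path> <QAOA p value> <0> <file path to output angle file> if it was not. To run the cut approximation calculator, the user types maxcutQAOA <GraphFile Path> <QAOA p value> <1> <file path to input angle file> <file path to output answer file> <seconds to preprocess for (optional)> if the shell PATH variable was set, or ./bin/maxcutQAOA <GraphFile Path> <QAOA p value> <1> <file path to input angle file> <file path to output answer file> <seconds to preprocess for (optional)> if it was not. Note that running this executable can take a long time for input graphs with many vertices or high regularity. We also recommend a QAOA p value of 1. The p value used for the angle optimization and for the cut calculator must match, and the number of angles provided to the cut calculator must be sufficient.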
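The GraphGenerator procedure described above (random edge placement with up to one hundred retries per edge, then discarding and regenerating any graph that is not fully regular) can be sketched compactly in C++ with `std::mt19937`. This is an illustrative re-implementation, not the library's main.cpp: it uses adjacency lists instead of GraphNode pointers, also rejects picking the same vertex twice (which the text does not mention explicitly), and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists

// One attempt: place n*x/2 edges between random vertex pairs, rejecting self-pairs,
// existing edges, and vertices that already have x edges; give up on an edge after
// 100 failed picks but keep going.
Graph tryRegularGraph(int n, int x, std::mt19937& rng) {
    Graph adj(n);
    std::uniform_int_distribution<int> pick(0, n - 1);
    const std::size_t full = static_cast<std::size_t>(x);
    for (int e = 0; e < n * x / 2; ++e) {
        for (int failures = 0; failures < 100; ++failures) {
            const int a = pick(rng), b = pick(rng);
            const bool bad = a == b || adj[a].size() >= full || adj[b].size() >= full ||
                             std::find(adj[a].begin(), adj[a].end(), b) != adj[a].end();
            if (!bad) {
                adj[a].push_back(b);
                adj[b].push_back(a);
                break;
            }
        }
    }
    return adj;
}

// Discards failed attempts and retries until every vertex has exactly x neighbors.
Graph randomRegularGraph(int n, int x, unsigned seed) {
    assert(n > x && x >= 0 && (n * x) % 2 == 0);
    std::mt19937 rng(seed);
    for (;;) {
        Graph g = tryRegularGraph(n, x, rng);
        const bool ok = std::all_of(g.begin(), g.end(), [x](const std::vector<int>& nb) {
            return nb.size() == static_cast<std::size_t>(x);
        });
        if (ok) return g;
    }
}
```

The precondition that n · x be even matters: otherwise no x-regular graph on n vertices exists and the retry loop would never terminate.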

C.10 Examples
There are multiple example scripts provided for the simple QASM circuit simulation, as well as sample measurement files, sample circuits, and the maxcut example.We invite you to explore these within the Samples and Examples folders of our library.Please type chmod +x Examples/* before running any of the example scripts.

C.11 Troubleshooting
If any bugs with the library are encountered, please contact the authors via email: schuylerfried at gmail dot com, sawayanicolas at gmail dot com.

C.12 Further Improvements
We want to make this library as easy to use as possible, so if you have any improvements to suggest, please send us an email. In the future, we hope to provide better parallelization using MPI, as well as sparse tensors.

Figure 1: Histogram (10,000 trials) of how close the estimated most likely computational basis state is to the actual most likely computational basis state. In particular, the horizontal axis, Ranking, is the number of computational basis states in $|\Psi\rangle$ with higher probability than the estimated state. We use n = 6 qubits, the parameter m = 10, and p = 2.

Figure 2: Distribution (from 10,000 trials) of the 1-norm distance between the approximate distribution $\tilde{p}$ arising from the product state approximation $|\tilde{\Psi}\rangle$ in Equation 3 and the distribution p arising from the exact state $|\Psi\rangle$. We use n = 6 qubits, the parameter m = 10, and p = 2.

Figure 3: Time results for simulating quantum circuits of the Hubbard model. LG, Stoch, and LIQUi|> denote line-graph-based tensor contraction, stochastic tensor contraction, and LIQUi|>, respectively. LIQUi|>'s full Hilbert space simulation method is substantially faster than either tensor contraction method. Missing data points resulted from running out of memory.

Figure 5: Simulation time plotted against number of qubits for 3-regular Max-Cut/QAOA circuits. LG and Stoch denote line-graph-based tensor contraction and stochastic tensor contraction, respectively. For 3-regular Max-Cut/QAOA circuits, we were also able to simulate a small subset of the 100-qubit circuits we created; these are not shown here.

Figure 6: Simulation time plotted against the regularity of the underlying Max-Cut graph for Max-Cut/QAOA circuits. LG, Stoch, and LIQUi|> denote line-graph-based tensor contraction, stochastic tensor contraction, and LIQUi|>, respectively. As regularity increases, full Hilbert space simulation (using LIQUi|>) becomes a more competitive simulation method. If QuickBB were run for an unlimited time, circuit contraction via LG would be faster than Stoch in all instances. Missing data points resulted from running out of memory.

Figure 7: Simulation time plotted against approximate treewidth for all simulated 18-qubit Max-Cut/QAOA quantum circuits. The plot shows the general trend of increasing simulation time with the treewidth of the quantum circuit's line graph, despite a constant number of qubits.
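Figures 1 and 2 rely on two simple metrics: the Ranking of the estimated string (the number of computational basis states with higher probability than it) and the 1-norm distance between the exact and approximate distributions. The Python sketch below computes both from an exact probability vector p_i = |ψ_i|^2. It is a sketch under stated assumptions only: the product-state approximation is taken, hypothetically, to be the product of the single-qubit marginals of p; the estimated string sets each bit to its more probable value; and qubit j is mapped to bit j of the basis-state index. The function names are illustrative and are not part of qTorch.

```python
import math


def marginals(probs, n):
    """Probability that each of the n qubits reads 1 (qubit j = bit j of the index)."""
    q = [0.0] * n
    for x, px in enumerate(probs):
        for j in range(n):
            if (x >> j) & 1:
                q[j] += px
    return q


def product_state_distribution(probs, n):
    """ASSUMED product-state approximation: the product of single-qubit marginals."""
    q = marginals(probs, n)
    return [
        math.prod(q[j] if (x >> j) & 1 else 1.0 - q[j] for j in range(n))
        for x in range(len(probs))
    ]


def estimated_most_likely(probs, n):
    """Most likely string of the assumed product state: each bit takes its likelier value."""
    q = marginals(probs, n)
    return sum(1 << j for j in range(n) if q[j] > 0.5)


def ranking(probs, estimate):
    """Figure 1 metric: number of basis states more probable than the estimated state."""
    return sum(1 for px in probs if px > probs[estimate])


def l1_distance(p, p_approx):
    """Figure 2 metric: 1-norm distance between two probability distributions."""
    return sum(abs(a - b) for a, b in zip(p, p_approx))
```

On a distribution that is already a product of its marginals the 1-norm distance is zero, while on the correlated two-qubit distribution (0.5, 0, 0, 0.5) the assumed product approximation is uniform and the distance is 1.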
⋯, nm}. Here m is a parameter that can be interpreted as the number of clauses, if this were a QAOA problem. The elements of the p-dimensional vectors β and γ are drawn uniformly from [0, π] and [0, 2π], respectively. We use the construction of $U_p(\beta, \gamma)$ to emulate the form of the parametrized unitary operations used in QAOA with the same p. Starting from the uniform superposition $|s\rangle$ over all $2^n$ bit strings, we apply $U_p$ to compute the final state

$$|\Psi\rangle = U_p\,|s\rangle = \sum_{i=0}^{2^n-1} \psi_i\,|i\rangle.$$

Let $p_i = |\psi_i|^2$ denote the probability distribution associated with the QAOA-like output state $|\Psi\rangle$. Our likely-string estimation algorithm can be thought of as treating the QAOA output state as a product state. Suppose we apply our algorithm to the state $|\Psi\rangle$. The product state then reads

These are the only two operators used in QAOA, applied to the starting state $|s\rangle = |+\rangle^{\otimes q}$, where $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. In the tensor network framework, one starts with the state $|0\rangle\langle 0|^{\otimes q}$ and prepares the state $|s\rangle\langle s|$ by applying a Hadamard gate to every qubit. The parameter p determines how many times the $U_C$ and $U_B$ operators are applied to the initial state. If p = 1, there is only one γ and one β, and $U_C$ and $U_B$ are each applied only once (the double-hat notation is used to represent a Liouville superoperator):

$$\hat{\hat{U}}_B(\beta)\,\hat{\hat{U}}_C(\gamma)\,|s\rangle\langle s| = U_B(\beta)\,U_C(\gamma)\,|s\rangle\langle s|\,U_C^{\dagger}(\gamma)\,U_B^{\dagger}(\beta).$$

<GraphFile Path> <QAOA p value> <0> <file path to output angle file> if the shell PATH variable was set, or ./bin/maxcutQAOA <GraphFile Path> <QAOA p value> <0> <file path to output angle file> if the PATH variable was not set.

To run the cut approximation calculator, the user must type the command maxcutQAOA <GraphFile Path> <QAOA p value> <1> <file path to input angle file> <file path to output answer file> <seconds to preprocess for (optional)> if the shell PATH variable was set, or ./bin/maxcutQAOA <GraphFile Path> <QAOA p value> <1> <file path to input angle file> <file path to output answer file> <seconds to preprocess for (optional)> if the PATH variable was not set.

Following the same pattern, to run both the angle optimization and the cut approximator, the user must type maxcutQAOA <GraphFile Path> <QAOA p value> <2> <file path to output answer file> <preprocessing seconds (optional)> if the shell PATH variable was set, or ./bin/maxcutQAOA <GraphFile Path> <QAOA p value> <2> <file path to output answer file> <preprocessing seconds (optional)> if the PATH variable was not set.
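To make the QAOA construction concrete, the following pure-Python statevector sketch prepares |s⟩ by applying a Hadamard gate to every qubit of |0⟩^{⊗q}, then applies p alternating layers of U_C(γ) and U_B(β). It assumes the standard Max-Cut choices U_C(γ) = exp(-iγC), with C counting the cut edges, and U_B(β) = exp(-iβ Σ_j X_j); these definitions and the helper names are illustrative assumptions, not qTorch's API. Note that this is a full Hilbert space simulation, not a tensor contraction.

```python
import cmath
import math


def cut_size(bits, edges):
    """Number of edges (j, k) whose endpoints get different values in the bit string `bits`."""
    return sum(((bits >> j) & 1) != ((bits >> k) & 1) for j, k in edges)


def qaoa_state(num_qubits, edges, gammas, betas):
    """Statevector after len(gammas) QAOA layers for Max-Cut (assumed standard form)."""
    dim = 1 << num_qubits
    # A Hadamard on every qubit of |0...0> gives the uniform superposition |s>.
    state = [complex(1 / math.sqrt(dim))] * dim
    cuts = [cut_size(x, edges) for x in range(dim)]
    for gamma, beta in zip(gammas, betas):
        # U_C(gamma) = exp(-i*gamma*C) is diagonal: phase each basis state by its cut size.
        state = [a * cmath.exp(-1j * gamma * c) for a, c in zip(state, cuts)]
        # U_B(beta) = prod_j exp(-i*beta*X_j); each factor mixes amplitude pairs differing in bit j.
        c, s = math.cos(beta), math.sin(beta)
        for j in range(num_qubits):
            mask = 1 << j
            for x in range(dim):
                if not x & mask:
                    a0, a1 = state[x], state[x | mask]
                    state[x] = c * a0 - 1j * s * a1
                    state[x | mask] = -1j * s * a0 + c * a1
    return state


def expected_cut(num_qubits, edges, gammas, betas):
    """Expectation value <C> of the cut size in the QAOA output state."""
    state = qaoa_state(num_qubits, edges, gammas, betas)
    return sum(abs(a) ** 2 * cut_size(x, edges) for x, a in enumerate(state))
```

For a single edge with p = 1, this construction gives an expected cut of (1 + sin 4β sin γ)/2, so γ = π/2 and β = π/8 yield a cut of 1 with certainty. Because the sketch stores all 2^q amplitudes, its memory grows exponentially with q, which is the cost that tensor network contraction aims to avoid for circuits whose line graphs have low treewidth.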
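Because the three maxcutQAOA modes differ only in their trailing arguments, a small wrapper can assemble the command line. The Python sketch below follows the argument order given in the usage text: graph file, QAOA p value, mode selector, then the mode-specific file paths and the optional preprocessing time. As we read that text, mode 0 runs angle optimization, mode 1 runs the cut approximation calculator, and mode 2 runs both. The helper name and its keyword arguments are hypothetical and are not part of qTorch.

```python
from typing import List, Optional


def maxcut_qaoa_argv(graph_file: str, p: int, mode: int,
                     angle_file: Optional[str] = None,
                     answer_file: Optional[str] = None,
                     preprocess_seconds: Optional[float] = None,
                     on_path: bool = True) -> List[str]:
    """Hypothetical helper: build a maxcutQAOA command line in the documented argument order."""
    # Use the bare name if the binary's directory is on PATH, else the relative build path.
    argv = ["maxcutQAOA" if on_path else "./bin/maxcutQAOA", graph_file, str(p), str(mode)]
    if mode == 0:
        # Angle optimization: only an output angle file follows the mode selector.
        if angle_file is None or preprocess_seconds is not None:
            raise ValueError("mode 0 takes exactly one output angle file")
        return argv + [angle_file]
    if mode == 1:
        # Cut approximation: input angle file, then output answer file.
        if angle_file is None or answer_file is None:
            raise ValueError("mode 1 needs an input angle file and an output answer file")
        argv += [angle_file, answer_file]
    elif mode == 2:
        # Angle optimization followed by cut approximation: only the output answer file.
        if answer_file is None:
            raise ValueError("mode 2 needs an output answer file")
        argv.append(answer_file)
    else:
        raise ValueError("mode must be 0, 1, or 2")
    if preprocess_seconds is not None:
        # Optional trailing argument: seconds to spend preprocessing.
        argv.append(str(preprocess_seconds))
    return argv
```

With a built qTorch, the resulting list could be passed to the standard library's subprocess.run(argv, check=True); the file names used with the helper are placeholders.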