Quantum++: A modern C++ quantum computing library

Quantum++ is a modern general-purpose multi-threaded quantum computing library written in C++11 and composed solely of header files. The library is not restricted to qubit systems or specific quantum information processing tasks, being capable of simulating arbitrary quantum processes. The main design factors taken in consideration were the ease of use, portability, and performance. The library’s simulation capabilities are only restricted by the amount of available physical memory. On a typical machine (Intel i5 8Gb RAM) Quantum++ can successfully simulate the evolution of 25 qubits in a pure state or of 12 qubits in a mixed state reasonably fast. The library also includes support for classical reversible logic, being able to simulate classical reversible operations on billions of bits. This latter feature may be useful in testing quantum circuits composed solely of Toffoli gates, such as certain arithmetic circuits.


Introduction
Quantum computing is a disruptive technology that promises great benefits for a plethora of applications, ranging from medicine and chemistry to machine learning and simulation of physical systems. However, today's most advanced quantum computers are not yet large enough for performing universal quantum computation, hence their applicability is still limited. Being able to simulate small sized quantum computers is therefore of paramount importance, as it allows the scientist or engineer to understand better the results she or he would expect from a quantum machine of similar size, as well as providing a better understanding of quantum computing itself. Below we describe a quantum computing library that can be used in research or exploratory work in quantum information and computation. Quantum++, available online at https://github.com/vsoftco/qpp, is a C++11 general purpose quantum computing library, composed solely of header files. It uses the Eigen 3 linear algebra library and, if available, the OpenMP multi-processing library. For additional Eigen 3 documentation see http://eigen.tuxfamily.org/dox/. For a simple Eigen 3 quick ASCII reference see http://eigen.tuxfamily.org/dox/AsciiQuickReference.txt.
The simulator defines a large collection of (template) quantum computing related functions and a few useful classes. The main data types are complex vectors and complex matrices, which we will describe below. Most functions operate on such vectors/matrices passed by value and always return the result by value, without ever mutating their arguments. The design PLOS ONE | https://doi.org/10.1371/journal.pone.0208073 December 10, 2018 1 / 27 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 is inspired from functional programming, where functions do not mutate their arguments and do not have side effects. Those design choices make the library ideal to use or integrate in multi-processing frameworks. Collection of objects are implemented via the standard library container std::vector<>, instantiated accordingly. We decided to avoid using a complicated class hierarchy and focus on a functional stylelike approach, as we believe the latter is more suitable for a relatively small API and allows the user to focus on the quantum algorithm design rather than on object-oriented design. In addition, there is absolutely no need for explicit memory allocations or usage of (raw) pointers. All allocations, initializations and release of resources are performed by the library, hence the user is not at risk of forgetting to de-allocate memory, use un-initialized objects, or overflowing buffers, which are the most common, dangerous and hard to diagnose mistakes in the world of C and C++ programming.
Although there are many available quantum computing libraries/simulators written in various programming languages, see [1] for a comprehensive list, what makes Quantum++ different is the ease of use, portability and high performance. The library is not restricted to specific quantum information tasks, but it is intended to be multi-purpose and capable of simulating arbitrary quantum processes. We have chosen the C++ programming language (standard C+ +11) in implementing the library as it is by now a mature standard, fully (or almost fully) implemented by the most common compilers, and highly portable.
Other unique features of Quantum++ include the ability of simulating classical reversible networks up to billions of bits (this feature may be useful in testing quantum circuits composed solely of Toffoli gates, such as certain arithmetic circuits in e.g. [2]), strong multi-threading abilities, as well as built-in support for higher dimensional systems (qudits) that allows treating qubits and qudits on the same footing.
In the reminder of this manuscript we describe the main features of the library, "in a nutshell" fashion, via a series of simple examples. We assume that the reader is familiar with the basic concepts of quantum mechanics/quantum information. For a comprehensive introduction to the latter see e.g. [3]. This document is not intended to be a comprehensive documentation, but only a brief introduction to the library and its main features. For a detailed reference see the manual available as a .pdf file in ./doc/refman.pdf. For detailed installation instructions as well as for additional information regarding the library see the Wiki page at https://github.com/vsoftco/qpp/wiki. If you are interesting in contributing, or for any comments or suggestions, please contact me.
Quantum++ is free software: you can redistribute it and/or modify it under the terms of the MIT License https://opensource.org/licenses/MIT.

Installation
To get started with Quantum++, first install the Eigen 3 library from http://eigen.tuxfamily. org into your home directory, as $HOME/eigen. Here we implicitly assume that you use a UNIX-like system, although everything should translate into Windows as well, with slight modifications. You can change the name of the directory, but in the current document we will use $HOME/eigen as the location of the Eigen 3 library. Next, download the Quantum++ library from https://github.com/vsoftco/qpp/ and unzip it into the home directory as $HOME/ qpp. Finally, make sure that your compiler supports C++11 and preferably OpenMP. For a compiler we recommend g++ version 5.0 or later or clang version 3.7 or later (previous versions of clang do not support OpenMP).
We next build a simple minimal example to test that the installation was successful. Create a directory called $HOME/testing, and inside it create the file minimal.cpp, with the content listed in Listing 1. A verbatim copy of the above program is also available at $HOME/ qpp/examples/minimal.cpp.
Next, compile the file using a C++11 compliant compiler. In the following, we assume that you use g++, but the building instructions are similar for other compilers. From the directory $HOME/testing type g++ -std = c++11 -O3 -Wall -Wextra -pedantic -isystem $HOME/ eigen -I $HOME/qpp/include minimal.cpp -o minimal Your compile command may differ from the above, depending on the C++ compiler and operating system. If everything went fine, the above command should build an executable minimal in $HOME/testing, which can be run by typing ./minimal. The output should be similar to the following: In line 5 of Listing 1 we include the main header file of the library qpp.h This header file includes all other necessary internal Quantum++ header files. In line 10 we display the state |0i represented by the singleton st.z0 in a nice format using the display manipulator disp().

Data types, constants and global objects
All header files of Quantum++ are located inside the ./include directory. All functions, classes and global objects defined by the library belong to the namespace qpp. To avoid additional typing, we will omit the prefix qpp:: in the rest of this document. We recommend the using of using namespace qpp; in your main .cpp file.

Data types
The most important data types are defined in the header file types.h. We list them in Table 1.

Constants
The important constants are defined in the header file constants.h and are listed in Table 2.

Singleton classes and their global instances
Some useful classes are defined as singletons and their instances are globally available, being initialized at runtime in the header file qpp.h, before main(). They are listed in Table 3.

Gates and states
Let us introduce the main objects used by Quantum++: gates, states and basic operations. Consider the code in Listing 2.  In line 5 of Listing 2 we bring the namespace qpp into the global namespace.
In line 9 we use the States singleton st to declare psi as the zero eigenvector |0i of the Z Pauli operator. In line 10 we use the Gates singleton gt and assign to U the bit flip gate gt.X. In line 11 we compute the result of the operation X|0i, and display the result |1i in lines 13 and 14. In line 14 we use the display manipulator disp(), which is especially useful when displaying complex matrices, as it displays the entries of the latter in the form a + bi, in contrast to the form (a, b) used by the C++ standard library. The manipulator also accepts additional parameters that allows e.g. setting to zero numbers smaller than some given value (useful to chop small values), and it is in addition overloaded for standard containers, iterators and C-style arrays.
In line 16 we reassign to psi the state |10i via the user-defined literal ket operator"" _ket(). We could have also used the Eigen 3 insertion operator ket psi(4); // specify the dimension before insertion of elements via << psi << 0, 0, 1, 0; or the Quantum++ library function mket(). In line 17 we declare a gate U as the Controlled-NOT with control as the first subsystem, and target as the last, using the global singleton gt. In line 18 we declare the ket result as the result of applying the Controlled-NOT gate to the state |10i, i.e. |11i. We then display the result of the computation in lines 20 and 21.
Next, in line 23 we generate a random unitary gate via the function randU(), then in line 27 apply the Controlled-U, with control as the first qubit and target as the second qubit, to the state psi. Finally, we display the result in lines 28 and 29.

Measurements
Let us now complicate things a bit and introduce measurements. Consider the example in Listing 3. In line 11 of Listing 3 we use the function kron() to create the tensor product (Kronecker product) of the Hadamard gate on the first qubit and identity on the second qubit, then we left-multiply it by the Controlled-NOT gate. In line 12 we compute the result of the operation CNOT ab (H � I)|00i, which is the Bell state ðj00i þ j11iÞ= ffi ffi ffi 2 p

Listing 3. Measurements
. We display it in lines 14 and 15.
In line 18 we use the function apply() to apply the gate X on the second qubit of the previously produced Bell state. Note that Quantum++ uses the C/C++ numbering convention, with indexes starting from zero. The function apply() takes as its third parameter a list of subsystems, and in our case {1} denotes the second subsystem, not the first. The function apply(), as well as many other functions that we will encounter, have a variety of useful overloads, see doc/refman.pdf for a detailed library reference. In lines 19 and 20 we display the newly created Bell state.
In line 23 we use the function measure() to perform a measurement of the first qubit (subsystem {0}) in the X basis. You may be confused by the apparition of gt.H, however this overload of the function measure() takes as its second parameter the measurement basis, specified as the columns of a complex matrix. In our case, the eigenvectors of the X operator are just the columns of the Hadamard matrix. As mentioned before, as all other library functions, measure() returns by value, hence it does not modify its argument. The return of measure is a tuple consisting of the measurement result, the outcome probabilities, and the possible output states. Technically measure() returns a tuple of 3 elements std::tuple<qpp::idx, std::vector<double>, std::vector<qpp:: cmat>> The first element represents the measurement result, the second the possible output probabilities and the third the output states. Instead of using this cumbersome type definition, we use the new C++11 auto keyword to infer the type of the result measured returned by measure(). In lines 24-29 we use the standard std::get<>() function to retrieve each element of the tuple, then display the measurement result, the probabilities and the resulting output states.

Quantum operations
In Listing 4 we introduce quantum operations: quantum channels, as well as the partial trace and partial transpose operations. , then display it in lines 11 and 12.
In lines 15-18 we partially transpose the first qubit, then display the eigenvalues of the resulting matrix rhoTA.
In lines 20-22 we define a quantum channel Ks consisting of two Kraus operators: |0ih0| and |1ih1|, then display the latter. Note that Quantum++ uses the std::vector<cmat> container to store the Kraus operators and define a quantum channel.
In lines 24-28 we display the superoperator matrix as well as the Choi matrix of the channel Ks.
Next, in lines 31-34 we apply the channel Ks to the first qubit of the input state rho, then display the output state rhoOut.
In lines 37-39 we take the partial trace of the output state rhoOut, then display the resulting state rhoA.
Finally, in lines 42 and 43 we compute the von-Neumann entropy of the resulting state and display it.

Timing
To facilitate simple timing tasks, Quantum++ provides a Timer<> class that uses internally a std::steady_clock. The program in Listing 5 demonstrate its usage. 28 std::cout << "seconds to un-permute the subsystems. \n"; A possible output of this program is: In line 11 of Listing 5 we change the default output precision from 4 to 8 decimals after the delimiter.
In line 14 we use the Codes singleton codes to retrieve in c0 the first codeword of the Shor's [[9, 1, 3]] quantum error correcting code.
In line 16 we declare an instance timer of the class Timer<>. In line 17 we declare a random permutation perm via the function randperm(). In line 18 we permute the codeword according to the permutation perm using the function syspermute() and store the result. In line 19 we stop the timer. In line 20 we display the permutation, using an overloaded form of the disp() manipulator for C++ standard library containers. The latter takes a std:: string as its second parameter to specify the delimiter between the elements of the container. In line 21 we display the elapsed time using the ostream operator<<() operator overload for Timer<> instances.
Next, in line 23 we reset the timer, then display the inverse permutation of perm in lines 24 and 25. In line 26 we permute the already permuted state c0perm according to the inverse permutation of perm, and store the result in c0invperm. In lines 27 and 28 we display the elapsed time. Note that in line 27 we used directly t.toc() in the stream insertion operator, since, for convenience, the member function Timer<>::toc() returns a const Timer<>&.
Finally, in line 30, we verify that by permuting and permuting again using the inverse permutation we recover the initial codeword, i.e. the norm difference has to be zero.

Input/Output
We now introduce the input/output functions of Quantum++, as well as the input/output interfacing with MATLAB. The program in Listing 6 saves a matrix in both Quantum++ internal format as well as in MATLAB format, then loads it back and tests that the norm difference between the saved/loaded matrix is zero. >> Norm difference: 0 The output of this program is: Note that in order to use the MATLAB input/output interface support, you need to explicitly include the header file MATLAB/matlab.h, and you also need to have MATLAB or

Qudit teleportation
As mentioned before, Quantum++ treats qubits and qudits on the same footing. Below is a relatively more advanced self-documented example that implements the teleportation protocol for qudits.
Listing 7. Qudit teleporation The output of this program is: In line 10 of Listing 8 we define a random density matrix on four qubits (dimension 16). In line 13, we compute the mutual information between the first and the 5-th subsystem (which does not exist). Line 13 throws an exception of type qpp::exception:: SubsysMismatchDim exception, as there are only four systems. We next catch the exception in line 16 via the std::exception standard exception base class. We could have also used the Quantum++ exception base class qpp::exception::Exception, however using the std::exception allows the catching of other exceptions, not just of the type Exception. Finally, in line 17 we display the corresponding exception message.

Classical reversible logic
Quantum++ provides support for classical reversible logic and circuits via two classes, Dynamic_bitset and Bit_circuit. The first is similar to the standard library std:: bitset, with the exception that the length of the bitset can be specified at runtime, whereas

Aliasing
Aliasing occurs whenever the same Eigen 3 matrix/vector appears on both sides of the assignment operator, and happens because of Eigen 3's lazy evaluation system. Examples that exhibit aliasing: Whereas the first line produces aliasing, it is not dangerous, since the assignment is done in a one-to-one manner, i.e. each element (i, j) on the left hand side of the assignment operator is solely a function of the the same (i, j) element on the right hand side, i.e. mat(i, j) = f(mat(i, j)), 8i, j. The problem appears whenever coefficients are being combined and overlap, such as in the second example, where mat(i, j) = mat(j, i), 8i, j. To avoid aliasing, use the member function eval() to transform the right hand side object into a temporary, such as mat = 2 � mat.eval(); In general, aliasing can not be detected at compile time, but can be detected at runtime whenever the compile flag EIGEN_NO_DEBUG is not set. Quantum++ does not set this flag in debug mode. We highly recommend to first compile your program in debug mode to detect aliasing run-time assertions, as well as other possible issues that may have escaped you, such as assigning a matrix to another matrix of mismatching dimensions etc.

Type deduction via auto
Avoid the usage of auto when working with Eigen 3 expressions, e.g. avoid writing code like auto mat = A � B + C; but write instead cmat mat = A � B + C; or auto mat = (A � B + C).eval(); to force evaluation, as otherwise you may get unexpected results. The "problem" lies in the Eigen 3 lazy evaluation system and reference binding, see e.g. http://stackoverflow.com/q/ 26705446/3093378 for more details. In short, the reference to the internal data represented by the expression A � B + C is dangling at the end of the auto mat = A � B + C; statement.

Optimizations
Whenever testing your application, we recommend compiling in debug mode, as Eigen 3 run-time assertions can provide extremely helpful feedback on potential issues. Whenever the code is production-ready, you should always compile with optimization flags turned on, such as -O3 (for g++) and -DEIGEN_NO_DEBUG. You should also turn on the OpenMP (if available) multi-processing flag (-fopenmp for g++), as it enables multi-core/multi-processing with shared memory. Eigen 3 uses multi-processing when available, e.g. in matrix multiplication. Quantum++ also uses multi-processing in computationally-intensive functions.
Since most Quantum++ functions return by value, in assignments of the form mat = f(another_mat); there is an additional copy assignment operator when assigning the temporary returned by f() back to mat. As far as we are aware, this extra copy operation is not elided. Unfortunately, Eigen 3 does not yet support move semantics, which would have got rid of this additional assignment via the corresponding move assignment operator. If in the future Eigen 3 will support move semantics, the additional assignment operator will be "free", and you won't have to modify any existing code to enable the optimization; the Eigen 3 move assignment operator should take care of it for you. Note that in a line of the form cmat mat = f(another_mat); most compilers perform return value optimization (RVO), i.e. the temporary on the right hand side is constructed directly inside the object mat, the copy constructor being elided.

Extending Quantum++
Most Quantum++ operate on Eigen 3 matrices/vectors, and return either a matrix or a scalar. In principle, you may be tempted to write a new function such as cmat f(const cmat& A){. . .} The problem with the approach above is that Eigen 3 uses expression templates as the type of each expression, i.e. different expressions have in general different types, see the official Eigen 3 documentation at http://eigen.tuxfamily.org/dox/TopicFunctionTakingEigenTypes. html for more details. The correct way to write a generic function that is guaranteed to work with any matrix expression is to make the function template and declare the input parameter as Eigen::MatrixBase<Derived>, where Derived is the template parameter. For example, the Quantum++ transpose() function is defined as It takes an Eigen 3 matrix expression, line 3, and returns a dynamic matrix over the scalar field of the expression, line 2. In line 4 we implicitly convert the input expression A to a dynamic matrix rA over the same scalar field as the expression, via binding to a const reference, therefore paying no copying cost. We then use rA instead of the original expression A in the rest of the function. Note that most of the time it is adequate to use the original expression, however there are some cases where you may get a compile time error if the expression is not explicitly cast to a matrix. For consistency, we use this reference binding trick in the code of all Quantum++ functions.

Benchmarks
In this section we compare the performance of Quantum++ with two other widely used quantum software platforms that allow quantum simulation, namely IBM's Qiskit and the open source QuTiP. More specifically, we benchmark the time required to perform two widely used quantum operations, namely the partial trace and the quantum Fourier transform, respectively, as a function of the number of input qubits and number of CPU cores. Note that both vanilla versions of Qiskit and QuTiP do not seem to be using parallelization, so for a fair comparison the reader should only compare with the single-threaded curve(s) generated for Quantum++. All benchmark plots use logarithmic (base 2) scales for the time scale (expressed in seconds). We used an 8 core x86-64 Linux machine running Debian 9.5, with an Intel(R) Core(TM) i7-7700K CPU running at 4.20GHz and 16Gb of RAM. Quantum++ was compiled with g++ 6.3, whereas the Qiskit and QuTiP simulators were run using Python 3.5. All benchmark code from this section is available online at https://github.com/vsoftco/qpp/tree/master/ stress_tests.

Partial trace
In this subsection we benchmark how long it takes to perform a partial trace over the first qubit of an n-qubit matrix. Note that Qiskit does not provide a native partial trace function, so we only benchmark against QuTiP. The results are displayed in Fig 1.

Quantum Fourier transform
In this subsection we benchmark how long it takes to perform a quantum Fourier transform over the first qubit of an n-qubit matrix. The results are displayed in Fig 2.

Discussion
The fundamental data types in Quantum++ are non-sparse vectors and matrices. Most computationally-intensive activity performed by the library involves operations on such vectors and matrices. Whenever possible, the task is delegated to the highly-optimized Eigen 3 linear algebra library, e.g. when multiplying 2 matrices together. In addition, most loops in the code are parallelized via the OpenMP multi-processing library, if the corresponding flag is present at compile time. Those optimizations make Quantum++ highly efficient on multiple cores, as the benchmarks in Figs 1 and 2 show. As one of our referees pointed out, QuTiP performs relatively poorly for small number of qubits, most likely because of overhead introduced by the Python interpreter. What is surprising is that both QuTiP and Qiskit seem to be single-cored, even though, at least for QuTiP, one would expect aggressive parallelization via the BLAS library, as mentioned by http://qutip. org/docs/3.0.1/installation.html#optimized-blas-librariesl. Most likely the vanilla version of QuTiP comes with a NumPy library which is not built against BLAS.

Long term maintenance
We plan to keep all future releases of Quantum++ open source. We will continue to host the project on GitHub or on an equivalent versioning control system. We will publish new stable releases of the software whenever enough improvements or features have been accumulated since the previous stable release. We plan to keep Quantum++ active and we welcome everyone interested to collaborate.

Conclusions and future directions
As you may have already seen, Quantum++ consists mainly of a collection of functions and few classes. There is no complicated class hierarchy, and you can regard the Quantum++ API as a low/medium-level API. You may extend it to incorporate graphical input, e.g. use a graphical library such as Qt, or build a more sophisticated library on top of it. We recommend to read the source code and make yourself familiar with the library before deciding to extend it. You should also check the complete reference manual ./doc/refman.pdf for an extensive documentation of all functions and classes.
An interesting future direction is to allow GPU parallelization, however at the time of the writing this was beyond the scope of this project.