Randomized methods to characterize large-scale vortical flow network

We demonstrate the effective use of randomized methods for linear algebra to perform network-based analysis of complex vortical flows. Network theoretic approaches can reveal the connectivity structures among a set of vortical elements and analyze their collective dynamics. These approaches have recently been generalized to analyze high-dimensional turbulent flows, for which network computations can become prohibitively expensive. In this work, we propose efficient methods to approximate network quantities, such as the leading eigendecomposition of the adjacency matrix, using randomized methods. Specifically, we use the Nystr\"om method to approximate the leading eigenvalues and eigenvectors, achieving significant computational savings and reduced memory requirements. The effectiveness of the proposed technique is demonstrated on two high-dimensional flow fields: two-dimensional flow past an airfoil and two-dimensional turbulence. We find that quasi-uniform column sampling outperforms uniform column sampling, while both feature the same computational complexity.


Introduction
Fluid dynamics is a rich and challenging field at the intersection of physics and engineering. There is still a tremendous amount that we do not understand about turbulence, yet working fluids are at the heart of nearly every major industry, including health, defense, transportation, and energy. One cannot overestimate the significance and impact that a better understanding of fluid flows would have in our ability to predict and manipulate their behavior in the real world. The challenge of modeling and controlling fluids stems from the fact that they are nonlinear with complex multi-scale interactions over a large range of spatial and temporal scales. Moreover, data from experiments and simulations are generally represented as exceedingly high-dimensional measurements that may obscure underlying patterns. With advances in simulation capabilities and measurement techniques, the volume and quality of such data are rapidly increasing.
Despite the complexity of the governing equations and the overwhelming volume of data, it is often observed that flows evolve on a low-dimensional attractor defined by a few dominant coherent structures [1]. There are a wealth of modal decomposition techniques to mine and characterize these structures from experimental data and numerical solvers [2,3]. The majority of techniques, such as proper orthogonal decomposition (POD) [1,4] and dynamic mode decomposition [5][6][7] are fundamentally linear, as are many of the standard techniques in control theory [8]. Linear control has been widely applied for flow control [9][10][11], for example to stabilize boundary layers. However,

Motivation
The application of network theory in fluid dynamics faces the challenge of an exceedingly large number of degrees of freedom (nodes) for a fully turbulent flow, leading to an even larger adjacency matrix of edges. Indeed, the adjacency matrix for a vortical flow network scales as the square of the number of fluid grid points, quickly becoming intractable to store and analyze using classical techniques from linear algebra. Recent approaches have been developed to manage this complexity, including building a graph on the modes of a flow [35], given by dominant fluid coherent structures from POD, rather than the nodes, given by fluid grid elements. However, there is still interest in developing scalable methods to directly analyze the vortical network in the original ambient measurement space, where nodes are grid elements. Randomized methods for linear algebra provide an emerging alternative to efficiently compute an approximate eigendecomposition of large-scale matrices. These methods work with a reduced representation, a so-called sketch, of the input matrix that captures the essential spectral information. This sketch is obtained either through random sampling or by random projections. The sketch can then be used to efficiently compute a desired low-rank approximation, such as the singular value decomposition. For a comprehensive survey and implementation details see [36][37][38][39][40].  Figure 1: Identification of dominant eigenvalues and eigenvectors of the adjacency matrix obtained from a fluid flow field using randomized methods. Path I randomly samples columns of the adjacency matrix to approximate the leading eigenvalues and eigenvectors; path II uses both column and row sampling to form a sketch. Note, it is not required to explicitly construct the full adjacency matrix.

Contributions
In this work, we use randomized methods to analyze large networks arising in fluid mechanics. Specifically we are interested in using randomized methods to obtain a qualitatively correct view of the dominant graph structures, which can then be used for the downstream tasks of sensor and actuator placement, optimization, and control. This application domain is the subject of the present work, where we explore the use of random sampling and randomized linear algebra to generate qualitatively accurate estimates of dominant flow structures for cases where the adjacency matrix is exceedingly large. We compare both uniform and quasi-uniform sampling, and demonstrate this approach on two high-dimensional vortical flow networks. A high-level principle sketch is provided in Figure 1.

Vortical interaction network
Recently, network analysis has been employed to represent and analyze vortical interactions in flows [15,17,19,35,41]. In Nair and Taira [15], spectral sparsification was used to identify a lowdimensional skeleton underlying the high-dimensional, nonlinear fluid dynamics. Taira et al. [17] later demonstrated scale-free behavior of the vortex interaction network for two-dimensional isotropic turbulence. Network analysis has also been used by Meena et al. [19] for community detection and force prediction in an unsteady wake. More recently, interaction networks based on the energy transfer between modes has been used for modeling and control [35,41]. These studies provide compelling evidence that the emerging network perspective may complement well-established flow modeling techniques. A graph (network) G consists of three components: a set of vertices (nodes) V, a set of edges E that connect these vertices, and the associated edge weights W. In the present work, the vortical elements in a flow field represent the nodes of the vortical flow network, following the formulation of Nair and Taira [15]. The interaction among vortical elements can be quantified by the induced velocity. Following the Biot-Savart law, the induced velocity from element i to j in a two-dimensional flow is given by where Γ i = ω i ∆x∆y is the strength of vortical element i with vorticity ω i and grid area ∆x∆y, and d ij is the Euclidean distance between the vortical elements. This induced velocity represents the edge weights, and may be used to construct an adjacency matrix which mathematically represents the interactions in the vortical network. Since a vortex cannot induce motion upon itself, there are no diagonal entries for the adjacency matrix. Here we have an undirected network with a symmetric adjacency matrix A to enable the use of the algorithm described later. However, it is possible to define an asymmetric adjacency matrix for a directed network. Given the vorticity field of a flow, the corresponding vortical interaction network can be formulated as above. This also suggests that the size of the adjacency matrix scales as A ∈ R n×n where n is the number of grid points or vortical elements in the flow field. For a high-fidelity simulation or experimental measurements of a flow field, n can exceed O(10 5 − 10 6 ), making the computation and analysis of the vortical network computationally expensive. Below, we discuss a few network measures that are particularly useful for understanding the interactionbased characteristics of the flow. The current work focuses on computing these measures for large, dense vortical networks in a computationally tractable manner.

Network measures and computations
The adjacency matrix is the basis for a number of useful network measures, such as eigenvector centrality, Katz centrality, and PageRank [22,42]. The centrality of a network describes the most important or central elements based on measures that quantify the connection strength of the nodes. One of the most basic centrality measures is the node strength, given by s i = j A ij , which measures the overall interaction strength of a node. The eigenvector centrality is another fundamental network property, which also considers the strength of the neighboring elements of a node and is used to identify the most influential nodes in the network [43]. It is computed as the eigenvector of A corresponding to the largest eigenvalue λ: Entries with large absolute value in the vector u correspond to influential nodes. The Katz centrality and PageRank are related and also based on computing the dominant eigenvector of the adjacency matrix. Network analysis is also useful for graph partitioning and clustering [22], where the nodes in the network are grouped into clusters based on their close connectivity. Spectral partitioning or clustering is a common approach based on the dominant eigenvector of the adjacency matrix. Nodes can be divided and sub-divided into groups based on the sign of the corresponding elements in the dominant eigenvectors of A. Spectral partitioning can also be used in community detection algorithms to identify groups of closely interacting nodes, called communities [44]. These algorithms perform a hierarchical grouping of nodes into communities based on maximizing the overall modular measure of the network. This hierarchical grouping can be performed using the dominant eigenvectors of A [45]. In vortical flows, these network-based measures are useful for identifying the most influential vortical elements and clustering them into coherent structures or communities.
Due to the importance of the spectral information of the adjacency matrix, considerable effort has gone into developing accurate and efficient algorithms to compute dominant eigenvalues for large adjacency matrices. Provided that the largest eigenvalue is real and distinct, the dominant eigenvector can be efficiently computed using the power method [46]. A symmetric adjacency matrix with nonnegative entries will have purely real eigenvalues. For instance, the power method is commonly used to compute the dominant eigenvector of an adjacency matrix [47], and this method is particularly efficient for large sparse adjacency matrices. The computational costs of the power method for computing the dominant eigenvector scales as O(n 2 ).

Random sampling-based matrix approximation
As stated above, we are interested in computing the eigendecomposition of a real symmetric matrix A ∈ R n×n that takes the form where the columns of the orthogonal matrix U ∈ R n×n are eigenvectors and the entries of the diagonal matrix D ∈ R n×n are the corresponding eigenvalues λ 1 ≥ · · · ≥ λ n . More concretely, we are interested in computing only the dominant k eigenvectors and eigenvalues (where k n) that yield the rank-k approximation where U k ∈ R n×k contains only the k columns of U that correspond to the k largest eigenvalues.
• Projection-based methods construct a 'sketch' by forming a set of k randomly weighted linear combinations of the columns of the input (adjacency) matrix [39,48,49,62]. This approach can substantially reduce the computational demands when computing a low-rank approximation. However, projection-based methods generally require at least a single pass over the entire data matrix.
• Sampling-based methods aim to approximate the low-rank structure of the input (adjacency) matrix from a small random subsample of actual columns or rows of the matrix [63][64][65]. The sampling process is described in more detail below. In practice, sampling-based methods may bypass the construction of the full adjacency matrix, while sacrificing some accuracy for improved scalability.
In what follows, we generate qualitatively accurate estimates of dominant flow structures for cases where the adjacency matrix is exceedingly large. In this problem context, we require a qualitatively correct view of the dominant graph structures, which can be used for the downstream tasks of sensor and actuator placement, optimization, and control. Therefore, we investigate the use of column sampling to compute a sketched singular value decomposition and the Nyström Method to approximate the leading k eigenvalues and eigenvectors. Both techniques have been successfully used to approximate large-scale matrices in other applications [66]. It is important to note that sampling-based methods provide tunable error bounds that are based on the number of sampled columns/rows, making them viable as an alternative to well-established deterministic algorithms.

Sketched singular value decomposition
Column sampling for low-rank matrix approximation dates back to the pioneering work of Frieze et al. [62]. Instead of computing the full eigendecomposition of A, it is possible to form a rank-k approximation by first sampling l ≥ k columns from the input matrix where the matrix C ∈ R n×l consists of a subset of l columns of A and S describes the corresponding random sampling process. In practice, it is only necessary to form a list J comprised of l column indices C := A(:, J).
There are several sampling strategies to choose good columns, and the approximation error depends on the both the sampling process and the number of columns sampled. In our problem setup, each column of A corresponds to the network interaction of a single grid element in the original fluid flow field with every other grid element. Here, we compare uniform random sampling against (quasi-random) Halton sampling [67,68]. Halton sampling provides better coverage of the domain, as illustrated in Figure 2, and our empirical results show that this translates into faster convergence of the approximation error with respect to the number of columns sampled.
If the columns of C are carefully chosen, then C provides a basis for the column space of the adjacency matrix A. Note that here we assume that A is a symmetric matrix. Thus, the dominant k left singular vectors of C = U C Σ C V * C , provide an approximation for the dominant k eigenvectors of A, i.e., we haveŨ k ≈ U C [:, 1 : k]. The corresponding eigenvalues are approximated by scaling The cost of constructing the sketch matrix using uniform or Halton sampling is O(nl). Then, the cost of computing the SVD on C is O(nl 2 ). For parallel computations, it is possible to compute U C and Σ C by taking the SVD of C T C, which requires O(nl 2 ) to be generated and O(l 3 ) for the corresponding SVD.

The Nyström method
The Nyström method provides an efficient approach for low-rank matrix approximations in largescale learning applications. It was initially introduced as a quadrature method for numerical integration to approximate eigenfunction solutions. Recent work has streamlined these computations and provided theoretical foundations for the use of various sampling schemes [69][70][71][72].
In addition to column sampling, the Nyström method further constructs a square matrix W ∈ R l×l by selecting the J rows and columns of A: When A is symmetric and positive-semidefinite (SPSD), W is also SPSD. The small matrix W can be used to efficiently compute the dominant eigenvectors and eigenvalues of A. Following [73], we first compute the eigendecomposition: Then, we reconstruct the dominant eigenvalues as and the corresponding eigenvectorsŨ The Nyström method has a computational complexity of O(l 3 ).

Results
We now demonstrate the use of sampling-based randomized decomposition techniques to characterize the vortical interaction networks for two example flows: the laminar wake flow past a NACA 0012 airfoil with a Gurney flap attached to the trailing edge [74], and a two-dimensional homogeneous decaying isotropic turbulence [17].

Numerical simulations
The first example is the flow past a NACA 0012 airfoil with a chord-based Reynolds number of Re = 1, 000, at an angle of attack of 20 • , and with a Gurney flap of 10% chord length (c). This setup generates an unsteady 2P wake [74,75], with periodic shedding of two pairs of positive and negative vortices. This flow is solved using direct numerical simulations (DNS) via the immersed boundary projection method [76,77], following the computational setup of Gopalakrishnan Meena et al.  The second example is two-dimensional decaying homogeneous isotropic turbulence, which is a canonical high-dimensional, mutiscale turbulent flow that exhibits complex nonlinear interaction of vortices over a wide range of length scales [78]. The vorticity field is obtained from a twodimensional incompressible DNS, solving the vorticity transport equation without forcing [17].
The simulation, based on the Fourier spectral method and the fourth-order Runge-Kutta time integration scheme, is performed on a square biperiodic computational domain with grid resolution of m x × m y = 1024 × 1024. The vorticity field is initialized with 100 vortices with random strengths, core sizes, and centers such that the kinetic energy spectra satisfies E(k) ∝ k exp(−k 2 /k 2 0 ) with k 0 = 26.5 [79]. For this study, the flow field is initialized by an integral length-scale based Reynolds number of Re(t 0 ) = 814, shown in Figure 3 (b).
The vorticity snapshots of both flows are used to construct the adjacency matrices of the corresponding vortical networks using Eq. 2. The size of the adjacency matrices are of the order O(10 8 ) and O(10 12 ) for the airfoil and turbulent flow, respectively, emphasizing the need for dimensionality reduction and computationally tractable analysis tools.

Computational performance
A naive application of network theory in fluid dynamics can be challenging, because the adjacency matrix is often too large to be stored or analyzed using classical techniques from linear algebra. There are two primary challenges associated with large vortical networks: • The size of the adjacency matrix, which describes the network edges, scales as the square of the number of fluid grid points; • Unlike social graphs, the vortical adjacency matrix is dense.
Here, we use randomized methods to accelerate computations based on the vortical adjacency matrix. We sample a small number of columns from the adjacency matrix to compute a sketched SVD or Nyström approximation. Thus, the approximation quality can be controlled via the number of samples used to compute the low-rank approximation. In the context of networks arising in fluid mechanics, it turns out that sampling a small number of columns is sufficient for computing an approximation that is qualitatively useful. This leads to tremendous memory and computational savings, since the costs are proportional to the number of columns that we sample. We run all of our experiments using Amazon AWS, working on a memory optimized instance 'x1.16xlarge' that is powered by two Intel Xeon E7 8880 v3 (Haswell) processors with 64 virtual cores and 976 GB fast memory. Our algorithms are implemented in Python powered by MKL accelerated linear algebra routines, and C extensions for constructing the elements of the adjacency matrix.
We show in Figure 4 the approximation error as a function of the percentage of columns sampled and the corresponding computational time required.
Here, we consider two cropped spatial grids of different dimensions from which we construct the adjacency matrix. The error is reported as the acute angle between the true leading eigenvector v and the approximate eigenvectorv, which provides a useful summary of how closely the two high-dimensional eigenvectors align. Since the algorithms are fundamentally based on random sampling, we average the results over 20 initializations and report the distribution of errors.
Halton sampling results in considerably lower error than uniform sampling at the same percentage of columns sampled, which is consistent with the observation that it is sampling the flow domain more efficiently. The results based on Halton sampling also have considerably lower variance, indicating improved numerical robustness. Figures 4a and 4b illustrate how substantially the computational time increases when the spatial resolution of the flow field is increased only slightly. In the 421 × 421 case, only about 5% of the columns need to be sampled for 5% error compared between the approximate and true eigenvectors. In other words, we use only about 5% of the information to compute the low-rank approximation, and this reduces the memory requirements for the example in Figure 4b from about 215GB to roughly 10.5GB. At the same time,  Table 1: Summary of computational results for constructing the full adjacency matrix and computing the dominant eigenvector using the deterministic power method. Here the computational bottleneck is the memory required to construct the adjacency matrix. Note that the power method does not generally require that the full adjacency matrix is constructed explicitly; however, the power method is not efficient because the adjacency matrices we consider are dense.
the computational time is reduced since it is not necessary to construct the full adjacency matrix. The computational time is reduced further by using the Nyström method, while only sacrificing a small amount of accuracy. For comparison, Table 1 summarizes the computational demands and performance for the deterministic power iteration method. Figure 5 shows similar results for the turbulent flow. Again, we see that Halton sampling is favorable and that only a small fraction of columns is required to yield a good approximation of the leading eigenvector.    Figure 6 shows the leading eigenvectors of the adjacency matrix for the two flow examples, computed using deterministic and randomized methods. By visual inspection, the Nyström method with quasi-random Halton sampling provides an excellent approximation of the leading eigenvector using only 10% of the columns of the adjacency matrix. The resulting eigenvector centrality of the matrix A highlights strong vortex cores in both flows, as shown in Figure 6 (b). Vortices with lower strength have less centrality, emphasizing their decreased role in influencing the flow field. The 2P wake structure is clearly visible in the flow past an airfoil, and some shear layer structures are present in the 2D turbulent flow. The irrotational regions in both flows have the lowest centrality measure. These observations complement the traditional fluid dynamics literature regarding highly influential structures in a vortical flow. The approximate eigenvector centrality measures capture both the highly influential vortex cores as well as the smaller, less influential vortices, as seen in both flows. Moreover, the network-based measures take into account the spatial arrangements of the vortices to assess their influence.

Approximate eigenvectors and spectral clustering
In the computational comparison above, we only considered relatively low resolution flow fields because it was computationally prohibitive to compute the ground truth eigenvectors using deterministic methods. However, with randomized techniques it is possible to compute the leading eigenvector of the adjacency matrix for a 512 × 512 resolution grid and a 1024 × 1024 resolution   grid, respectively, as shown in Figure 7. The adjacency matrix for the 1024 × 1024 grid would require about 8 TB of storage, if explicitly constructed. Instead, we sample only about 10% of the columns to compute the Nyström approximation. While we cannot quantify the approximation error, we can see by visual inspection that the eigenvector captures dominant flow structures.
Finally, we investigate spectral clustering, which is one of the common uses of the leading eigenvectors of the adjacency matrix, providing a principled approach to cluster nodes into common communities. Figure 8 shows the results of spectral clustering based on the three leading eigenvectors obtained from the adjacency matrix. In both flow fields, the approximate clusters obtained using randomized methods closely resemble the clusters computed via deterministic eigendecomposition. The right panels in Figure 8 further shows the data plotted in these first three eigenvector coordinates to visualize the clusters and subspaces. Typically, these leading eigenvectors are quite useful for determining relevant community structures within a network. These communities have an important physical interpretation in vortical flow networks, hierarchically identifying and organizing vortex cores within the flow. Because these communities are often determined from the few dominant eigenvectors, the randomized approach here can provide considerable computational advantages. Further, since high-vorticity nodes have a large influence on the network interactions, it may be possible to improve convergence by preferential sampling using these community structures.
Airfoil wake 2D Turbulence    In (a) we show the clusters that we found using the exact eigenvectors. It can be seen that the approximate eigenvectors allow us to reveal the same seven clusters, as shown in (b). In (c) and (d) we show the first three approximated eigenvector coordinates to visualize the clusters and subspaces.

Discussion
In this work, we have demonstrated the effective use of scalable algorithms from randomized linear algebra to accelerate network computations for large-scale fluid flows, as demonstrated on the wake behind an airfoil and two-dimensional isotropic turbulent flow. In particular, we have used sampling-based methods to approximate the leading eigenvectors of the adjacency matrix of the vortical interaction network for cases where the data is so large that even single-pass algorithms are prohibitively expensive. Combining importance sampling, based on the probability distribution of detected communities, and the Nyström method, the leading eigenvector can be accurately computed at a fraction of the computational cost and memory requirement, bypassing the need to construct or query the full A matrix. We also showed that quasi-random Halton sampling techniques outperform uniform sampling in both examples as they provide more comprehensive sampling coverage of the spatial domain.
There are a number of interesting future directions based on this work. Further study is required to apply these randomized network based analysis techniques to more complex threedimensional turbulent flows. In addition, there are similar scaling issues in the related fields of almost invariant sets and set-oriented methods [80][81][82][83][84], where the state-transition operator may be viewed as a graph on the high-dimensional state-space. In the transfer operator case, machine learning has been used to identify a data-driven discretization of the state-space into clusters, which serve as nodes that scale with the intrinsic rank of the attractor, rather than the highdimensional measurement dimension [85]; similar clustering has been used to identify subspaces for POD-Galerkin models [86]. These methods have been used to determine eigenvectors of the Perron-Frobenius operator, the dual of the Koopman operator, establishing a strong connection between sensitivity and coherence in fluid flows. It will be interesting to mathematically connect those approaches with the randomized methods considered in the present study.