Unraveling Flow Patterns through Nonlinear Manifold Learning

From climatology to biofluidics, the characterization of complex flows relies on computationally expensive kinematic and kinetic measurements. In addition, such big data are difficult to handle in real time, thereby hampering advancements in the area of flow control and distributed sensing. Here, we propose a novel framework for unsupervised characterization of flow patterns through nonlinear manifold learning. Specifically, we apply the isometric feature mapping (Isomap) to experimental video data of the wake past a circular cylinder from steady to turbulent flows. Without direct velocity measurements, we show that manifold topology is intrinsically related to flow regime and that Isomap global coordinates can unravel salient flow features.


Introduction
The characterization of complex flows is a major challenge in climatology, biology, and engineering [1,2,3,4]. The detection of salient flow features is traditionally addressed through the analysis of velocity fields, obtained from flow visualization, numerical, and analytical methodologies [5,6,7,8,9,10]. Specifically, flows are classified by estimating relevant physical parameters [11,12,13], through pattern tracking procedures [14,15] or flow topology analysis [16,17,18]. These approaches rely on the availability of computationally expensive measurements to accurately describe the flow field. Beyond flow characterization, an even more elusive problem in fluid mechanics is the real time control of flow structures in biology, biomedicine, aerodynamics, and environmental science [19,20]. Despite recent technological advances, such as the use of microelectromechanical systems and the introduction of feedback control [21,22], flow manipulation is still affected by limitations in measuring relevant flow parameters, data storage, and computational time [23]. These drawbacks hamper real time autonomous flow monitoring of complex systems.
Here, we propose the implementation of a machine learning framework for unsupervised characterization of fluid flows. Different from established flow visualization techniques that require a-posteriori intensive processing of high resolution images [24,25], our approach uses raw video data to rapidly disclose and examine relevant flow phenomena. Moving forward from pattern tracking, machine learning demonstrates remarkable potential in identifying features underlying complex phenomena [26,27]. Specifically, manifold learning aims at uncovering the low dimensional structures ''hidden'' in high dimensional data. For instance, the isometric feature mapping (Isomap) embeds large scale data sets on lower dimensional manifolds approximated by undirected graphs, whose topology is utilized to compute geodesics on the true nonlinear manifolds [28]. This machine learning algorithm focuses on the extraction of relevant features directly from images without requiring the intermediate phase of quantitative parameters estimation [29]. In particular, the Isomap algorithm is effectively applied to the problem of face and human motion recognition [30] and collective behavior in biological systems [31,32,33] supporting the feasibility of using Isomap in fluid dynamics.
To demonstrate our approach, we study the flow past a circular cylinder by processing flow visualization video data with Isomap for Reynolds numbers ranging from 50 to 1725. For such range, the fluid experiences steady separation, the formation of regular vortex patterns (that is, von Karman vortex streets), and the initiation of turbulence. We anticipate Isomap to detect flow regimes through varying dimensionality of the embedding manifolds, similarly to the problem of collective behavior of animal groups, where dimensionality is showed to relate with the degree of coordination between individuals [31,32,33]. The flow around a circular cylinder is widely studied in the literature [34,35,36,37,38] for its numerous instances in nature [39] and engineering [40]. In our study, this phenomenon is instrumental to experiment with an array of different flow regimes, spanning from steady to periodic and unsteady. We design an experimental setup including a hollow circular cylinder of outer diameter D positioned vertically at the cross-section of a water tunnel. A dye-injection system is developed for improved visualization of the flow streaklines around the cylinder through a digital camera (see the Methods for further details). We vary the flow regime by changing the free stream velocity, U.
In the framework of nonlinear machine learning, we regard experimental video frames as the Isomap ambient space and seek to characterize the flow by studying the embedding manifolds. We demonstrate that the topology of the embeddings can be associated with the flow regime, whereby lack of flow separation is manifested through one dimensional manifolds and the presence of coherent structures through higher dimensionality. Further, we show that manifold inspection can be used to estimate the frequency of vortex shedding and study flow pattern variations due to externally-induced perturbations.

Flow Separation Correlates with Embedding Dimensionality
We process experimental video data recorded with a commercial camcorder with the Isomap algorithm and study the relationship between the topological features of the embedding manifolds and the flow regime, controlled by the Reynolds number Re (see the Materials and Methods for the full set of Re adopted in the experiments). The Reynolds number is defined as Re~UD=n, where n is the kinematic viscosity of water (at the measured fluid temperature of 20 0 C). In line with our expectations, we find that data relative to steady flow separation, that is, Re~50, are embedded onto one dimensional manifolds, see Figure 1(a). Conversely, for 50 Re 550, that is, for flow regimes characterized by a transition from laminar to turbulent von Karman vortex streets [34], cylindrical manifolds are obtained, see Figure 1(b). From Re~642, when turbulent flow coexists with periodic fluctuations in the cylinder wake [41], larger amounts of data points are not embedded onto cylindrical surfaces and rather fall onto irregularly shaped manifolds that are well approximated by nearly one dimensional structures, see Figure 1(c).

Manifold Global Coordinates Unravel Flow Features of Von Karman Vortex Streets
Figures 2(a-c) display the cylindrical manifold, residual variance, and distance matrix obtained by setting Re~191. We find that data points are arranged onto a thick cylindrical structure; specifically, 90% of the data set is represented by a three dimensional manifold (see the residual variance for dimensionality equal to three). Further, the distance matrix highlights the periodicity of the flow through the presence of regular sets of points that are closer to their neighbors (see the diagonal stripes in Figure 2(c)).
We further find that the topology of the embedding is related to two major features underlying the experimental data set. Specifically, in the two dimensional projection in Figure 2(d), all data points are symmetrically distributed along an annulus, suggesting a periodic behavior. By counterclockwise inspection of the annulus, we observe that data are consecutively ordered along the flow direction. Moreover, data points located at similar angular positions tend to depict comparable shapes. Variations along the thickness of the cylinder, corresponding to its radial coordinate, are related to varying image contrast during the experiment. Diametrically opposed locations on the annulus show vortex shedding phases that differ by 180 0 . Thus, one of the Isomap global coordinates, corresponding to the angular coordinate along the cylinder mantle, identifies the periodicity of the observed flow. Projecting the three dimensional embedding on a plane parallel to its axis, we find that images are horizontally ordered in the direction of flow, Figure 2(e). Further, variations of the flow pattern in the data set are arranged along the vertical direction, corresponding to the axial coordinate of the cylinder, with images displaying differently shaped vortices arranged far apart on the manifold.

The Topology of the Embedding Manifolds can be Used to Estimate Salient Flow Parameters
We quantify the vortex shedding frequency by inspection of the annular projections recovered for Re from 148 to 388. Specifically, we manually compute from the video feed the number of vortices, n machlearn v , formed between images laying at comparable angular positions on the annulus, see Figure 3

Data Cluster Differently on Manifolds of Varying Dimensions as a Function of the Flow Parameters
Our analysis of the dimensionality of Isomap embeddings demonstrates a close correspondence between the algorithm outputs and the flow physics. We further elucidate such relations by studying the residual variances for the first three dimensionalities of the data sets, which capture the vast majority of the experiments (more than 75% of the data). In Figure 4(a), we present residual variances for all the experimental data sets fitted by functions of the form aRe exp ({bRe), with a and b being unknown parameters (RMSE f1~0 :12, RMSE f2~0 :13, and RMSE f3~0 :10), where shaded regions denote the 95% confidence intervals. As expected, we find that at low and high Re, the flow can be described through nearly one dimensional embeddings, which capture the translational motion in the video feed.
On the other hand, as coherent structures are shed by the cylinder, data points are fit on higher dimensionality manifolds, which also account for the shape of the vortices. We observe that increasing the degree of turbulence of the flow corresponds to ''hiding'' periodic fluctuations in the flow. Indeed, Isomap captures the prevalently translational nature of the data.

Discussion
In this study, we present an unsupervised approach for characterizing flow patterns based on isometric feature mapping. The methodology does not rely on computationally expensive pattern tracking procedures or on the analysis of flow velocity fields [14,15,18]. Rather, it requires minimal preprocessing of experimental video frames (see the Materials and Methods for details).
Our results show that the dimensionality of the embedding manifold and its topology are landmarks of the flow regime, whereby smooth one dimensional manifolds are constructed from steady flows, cylindrical embeddings from von Karman vortex streets, and irregular structures from turbulent flows. With respect to von Karman vortex streets, our results are in agreement with the analysis presented in [17], where proper orthogonal decomposition is conducted on particle image velocimetry (PIV) and analytical velocity fields for flow characterization. In fact, we obtain striped distance matrices and two dimensional annular embedding projections for vortex shedding similar to [17]. This is achieved by directly processing video images through Isomap rather than performing computationally expensive PIV. Notably, we recover such annular projection also when the Isomap input space is constituted of unordered sets of experimental video frames, suggesting that our procedure can be successfully used to independently sort the ambient space in time.
In line with our expectations, we also find that Isomap global coordinates of the embedding manifolds relate with relevant features of the flow. For example, the axial coordinate of the cylinder in Figure 2(e) captures variations in vortex shape and provides a measure of the wake regularity. These variations in the geometry of the shed vortices are well studied in fluid dynamics [42] and can be related to flow-induced vibrations of the cylinder, boundary-layer effects, and inhomogeneities in the free stream velocity field. Although speculative, our findings also suggest that the method can be used to estimate pertinent flow parameters by exploiting the nonlinear dependence of the residual variance on the flow parameters. Specifically, the analysis of the residual variances associated with the first few embedding dimensionalities can be leveraged to extract usable information for the identification of flow parameters.
Raw video feed is also considered in [43] to study flow kinematics. Therein, images are obtained from a PIV study and the optical flow technique is utilized to reconstruct the velocity field. Here, we rely on standard video feed for rapid unsupervised characterization of flow phenomena through global features. While the accuracy of optical flow techniques is highly dependent on image quality and tracer seeding uniformity in the field of view, Isomap emphasizes underlying flow characteristics through relative topological distance among video data points, thus reducing the effect of fixed pattern noise in the images.
In contrast to canonical vortex detection methodologies [6,8,12,14], no preprocessing in terms of scaling, compression, or filtering is performed on images before nonlinear embedding through Isomap. Nonetheless, the performance of the methodology relies on the visibility of the flow structures and, therefore, low contrast, poor resolution, and highly nonuniform background noise may require image enhancement before feature extraction. While not explored in this study, such image enhancement can be achieved through computationally inexpensive and automated procedures that are commonly executed in flow visualization applications [25]. Ultimately, we emphasize that increasing the size of the dataset is expected to improve on the estimations of Isomap geodesic distances (see the Methods for details), and, therefore, aid the identification of embedding manifolds.
Our results indicate that unsupervised nonlinear machine learning through the Isomap algorithm can be successfully used to rapidly unravel salient flow features. Real time flow monitoring is a major challenge when image-based methodologies are needed rather than invasive sensors and probes. For instance, we expect this methodology to find application in biofluidics, where flow characterization can aid in monitoring hemodynamics, oxygen transport, intravascular blood pressure, and blood vessel obstructions [44,45,46,47,48]. Further, unsupervised flow characterization is anticipated to provide insight in environmental sensing, where noninvasive methodologies are increasingly needed for monitoring the evolution of large scale natural systems [39,49,50]. In addition, the approach may find application in autonomous robotics for rapid environmental mapping of unknown areas [51].  Markers correspond to residual variances for the first three embedding dimensionalities (f 1 , f 2 , and f 3 for dimensionality 1, 2, and 3, respectively). Blue, black, and red solid lines are best-fit curves (aRe exp ({bRe)) for dimensionality equal to one, two, and three, respectively. Shaded areas correspond to 95% confidence intervals). doi:10.1371/journal.pone.0091131.g004

Experimental Setup
Experiments are conducted in an open-test section water tunnel (Engineering Laboratory Design 502S). The tunnel cross-section is 15 cm|15 cm. Along the water flume, a working cross-section is selected at approximately 50 cm in between two honeycomb grids for improved uniformity of the velocity profile. A hollow copper cylinder of outer diameter equal to 5 mm is positioned vertically in the center of the working cross-section. Two 0:4 mm injection ports located at the mid-span of the cylinder at an angle of 90 0 from the front stagnation point allow for homogeneous and continuous rhodamine WT injection in the flow through a syringe system. Dye streaklines are captured by a Canon Vixia HG20 digital video camera, located 22 cm underneath the water tunnel and 10:4 cm downstream the working cross-section, with its axis perpendicular to the plane of vortex shedding. The camcorder acquires a field of view equal to 31:5 cm|18 cm; its resolution is set to Full HD (1920|1080 pixels); and its acquisition frequency is kept at

Isomap Algorithm
The Isomap algorithm is a nonlinear manifold learning methodology for dimensionality reduction problems [27]. Differently from the classical multidimensional scaling method (MDS), Isomap uses geodesic rather than Euclidean manifold distances between data points. The algorithm objectives are: i) embedding a data set of n d-dimensional data points on a manifold, ii) defining the manifold dimensionality, and iii) finding such dimension to be much less than d. In particular, for the data set Z~fz i g n i~1 5R d , Isomap constructs a corresponding data set Y~fy i g n i~1 5R d d and assesses if d d%d. The d d-dimensional embedding is represented through the parametrization m : Y?Z, where each j-th coordinate of the i-th data point is parameterized as z ij~mj (y i1 ,:::,y i d d ), for j~1,:::,d, and for each data point i~1,:::,n. The second subscript is used to identify vector components. The algorithm follows these steps [28,31,32,33]: 1. Construction of the neighbor graph G~fV,Eg to approximate the manifold. The elements of the set of vertices V~fv i g n i~1 match the data points Z~fz i g n i~1 and the elements of the set of edges E are unordered pairs of vertices in G. Edges connect k-nearest neighbors vertices. Specifically, edges fv i ,v j g correspond to the k-closest data points z j to z i , for each i~1,:::,n, with respect to the Euclidean distance in the ambient space (the pixels space), denoted by d Z (z i ,z j ). The matrix M n [ R n|n , encoding the weighted graph of intrinsic manifold distances corresponding to G, is computed. For each fv i ,v j g [ E, the distance equals the ij-th entry of M n , that is, M n (i,j)d Z (z i ,z j ). For all fv i ,v j g 6 [ E, M n (i,j) is set equal to ? to prevent jumps between branches of the underlying embedding. 2. Computation of the graph geodesic matrix D M to approximate the geodesic of the manifold. Floyd's algorithm [52] is utilized to compute shortest paths. From M n , an approximate geodesic distance matrix D M [ R n|n is computed, whose ij-th entry is the shortest path length from v i to v j , being an approximation of manifold geodesic distances. 3. Approximation of the manifold distance by k-nearest neighbor distance. The matrix D M computed in the previous step is used to approximate the geodesic distances of the manifold between z i and z j by the graph distance between v i and v j . If the data density is too low, a poor representation of the manifold could be obtained with some neighbors lying on separate manifold branches. 4. Computation of the projective variables Y applying the classical MDS on the matrix D M . Classical MDS [53] is performed on a matrix of dissimilarities between pairs of input and candidate embeddings, which minimize the distance in the embedded manifold. For a survey on MDS, see [31].
The outputs of Isomap are the transformed data points on an embedding manifold for the input data set Z and the vector f of residual variances, which represents the fraction of data points not embedded on the manifold for different dimensions.

Residual Variances Fitting
The vectors of the residual variances for the first three embedding dimensionalities are plotted against the respective Re for each experimental video. Such data points are fitted through the nonlinear least squares method with functions of the type aRe exp ({bRe), where a and b are fitting parameters. The 95% confidence intervals are estimated based on the fitting model coefficient covariance matrix.

Vortex Shedding Frequency
Vortex shedding frequency is evaluated for experiments conducted at Re~148; 159; 191; 245; 330; and 388. For such data sets, the frequency obtained from images located at comparable angular positions on the annular embedding projection is compared to vortex shedding frequencies estimated through the analysis of randomly selected sets of 10 to 40 consecutive images of the same videos. Similar to [54], frequencies are found by counting the number of vortices convected past a selected reference point in consecutive pictures in known time intervals. The duration of the time intervals is computed from the camera acquisition frequency.