Predicting Flow Reversals in a Computational Fluid Dynamics Simulated Thermosyphon Using Data Assimilation

A thermal convection loop is a annular chamber filled with water, heated on the bottom half and cooled on the top half. With sufficiently large forcing of heat, the direction of fluid flow in the loop oscillates chaotically, dynamics analogous to the Earth’s weather. As is the case for state-of-the-art weather models, we only observe the statistics over a small region of state space, making prediction difficult. To overcome this challenge, data assimilation (DA) methods, and specifically ensemble methods, use the computational model itself to estimate the uncertainty of the model to optimally combine these observations into an initial condition for predicting the future state. Here, we build and verify four distinct DA methods, and then, we perform a twin model experiment with the computational fluid dynamics simulation of the loop using the Ensemble Transform Kalman Filter (ETKF) to assimilate observations and predict flow reversals. We show that using adaptively shaped localized covariance outperforms static localized covariance with the ETKF, and allows for the use of less observations in predicting flow reversals. We also show that a Dynamic Mode Decomposition (DMD) of the temperature and velocity fields recovers the low dimensional system underlying reversals, finding specific modes which together are predictive of reversal direction.


Introduction
Prediction of the future state of complex systems is a fundamental challenge of science and engineering, and ultimately integral to the functioning of society. Some of these systems include weather [1], health [2], the economy [3], marketing [4] and transportation [5]. For weather in particular, predictions are made using supercomputers integrating numerical weather models, projecting our current best guess of the atmospheric state into the future. The accuracy of these predictions depends on the accuracy of the models themselves, and the quality of our knowledge of the current state of the atmosphere.
Model accuracy has improved with better meteorological understanding of weather processes and advances in computing technology [6]. To solve the initial value problem, techniques developed over the past 50 years are now broadly known as data assimilation (DA). Formally, data assimilation is the process of using all available information, including short-range model forecasts and physical observations, to estimate the current state of a system as accurately as possible [7]. The best-guess of the current state is often referred to as the analysis state.
Here, we employ a fluid dynamics experiment as a test bed for improving numerical weather prediction algorithms, focusing specifically on data assimilation methods. Our approach is inspired by the historical development of current methodologies, and provides a tractable system for rigorous analysis. The experiment is a thermal convection loop, which by design simplifies our problem into the prediction of natural convection. The thermosyphon, a type of natural convection loop or nonmechanical heat pump, can be likened to a toy mod-el of climate [8]. The dynamics of thermal convection loops have been explored under both periodic [9] and chaotic [7,[10][11][12][13][14][15][16][17][18][19] regimes. A full characterization of the computational behavior of a loop under flux boundary conditions by Louisos et. al. describes four regimes: chaotic convection with reversals, high Rayleigh number (Ra) aperiodic stable convection, steady stable convection, and conduction/quasi-conduction [20]. For the remainder of this work, we focus on the chaotic flow regime.

Physical Experiment and Computational Model
The reduced order system describing a thermal convection loop was originally derived by Gorman [13] and Ehrhard and Müller [14]. Here we present this three dimensional system in non-dimensionalized form. In Appendix B, we present a more complete derivation of these equations, following the derivation of Harris [8]. For the mean fluid velocity dx1 dt , temperature difference between the 3 o'clock and 9 o'clock positions dx2 dt (also referred to presently as ∆T 3−9 ), and deviation from conductive temperature profile dx3 dt , these equations are: Typeset by REVT E X arXiv:1510.03389v2 [math.DS] 23 Jun 2016 The function h(x) is a defined piece-wise analytic polynomial, and is provided in the full derivation as Equation 24. The parameters α, β, and K, along with scaling factors for time and each model variable can be fit to data using standard parameter estimation techniques. Operated by Dave Hammond, UVM's Scientific Electronics Technician, the experimental thermosyphons access the chaotic regime of state space found in the principled governing equations. We quote the detailed setup from Darcy Glenn's undergraduate thesis [21] and provide Fig. 2 for details of the experiment: The [thermosyphon] is a bent semi-flexible plastic tube with a 10-foot heating rope wrapped around the bottom half of the upright circle. The tubing used is lighttransmitting clear THV from McMaster-Carr, with an inner diameter of 7/8 inch, a wall thickness of 1/16 inch, and a maximum operating temperature of 200F. The outer diameter of the circular thermosyphon is 32.25 inches. Together, the tubing inner diameter and outer diameter of the thermosyphon produce a ratio of approximately 1:36. There are 1 inch 'windows' when the heating cable is coiled in a helix pattern around the outside of the tube, so the heating is not exactly uniform. The bottom half is then insulated using aluminum foil, which allowed fluid in the bottom half to reach 176F. A forcing of 57 V, or 105 Watts, is required for the heating cable so that chaotic motion is observed. Temperature is measured at the 3 o'clock and 9 o'clock positions using unsheathed copper thermocouples from Omega.
We confirm that the experiment accesses the chaotic regime of state space using a time series of the temperature difference as measured at the 3 o'clock and 9 o'clock positions in Fig. 1. We first test our ability to predict this experimental thermosyphon using synthetic data.
We perform all computational simulations of the thermal convection loop with the open-source finite volume C++ library OpenFOAM [22]. The open-source nature of this software enables its integration with the data assimilation framework that our present work provides.
We consider the incompressible Navier-Stokes equations with the Boussinesq approximation to model the flow of water inside a thermal convection loop. For brevity, we omit the equations themselves, and include them in the Appendix. The solver in OpenFOAM that we use, with some modification, is buoyantBoussinesqPimpleFoam.
Solving is accomplished by the Pressure-Implicit Split Operator (PISO) algorithm [23]. We find that modification of the code is necessary for laminar operation.
We create both 2-dimensional and 3-dimensional meshes using OpenFOAM's native meshing utility blockMesh shown in Figures 3 and 4. After creating a mesh, we refine the mesh near the walls to capture boundary layer phenomena and renumber the mesh for solving speed. We use the refineWallMesh utility to refine the mesh near walls, and the renumberMesh utility to renumber the mesh. The resulting 2D mesh contains 80,000 points (80 across the diameter and 1000 around). Available boundary conditions (BCs) we find to be stable in OpenFOAM's solver are constant gradient, fixed value conditions, and turbulent heat flux. Constant gradient simulations are stable, but the behavior is empirically different from our physical system. While it is possible that a fixed value BC is acceptable due to the thermal diffusivity and thickness of the walls of the experimental setup, we find that this is also inadequate. Simulations with a turbulent heat flux BC implemented through the externalWallHeatFluxTemperature library are unstable with the laminar turbulence model we use and resulted in physically unrealistic results. We employ the thirdparty library groovyBC to use a gradient condition that computes the flux using a fixed external temperature T inf and fixed wall heat transfer coefficient h as where we choose h to be the reference value for aluminum (the material used in the experimental setup). With the mesh, BCs, and solver chosen, we now simulate the flow. From the data of T, φ, u and p that are saved at each timestep (temperature, cell face flux, velocity, and pressure, respectively), we extract the mass flow rate and average temperature at the 12, 3, 6 and 9 o'clock positions on the loop. Since φ is saved as a face-value

Thermosyphon simulation
The reference state of the thermosyphon is represented by a CFD-based numerical simulation in two spatial dimensions (2-D). The details of the computational model have been described in detail in a previous study by Ridouane et al. (2009); however, for completeness, we summarise here its essential elements. It is assumed that the temperature differential DT w is sufficiently small so that temperature-dependent variations of material properties can be regarded as negligible, save for the density. The standard Boussinesq approximation is invoked, and all fluid properties are assumed to be constant and evaluated at the reference temperature ðT h þ T c Þ=2. The flow is assumed to be laminar, 2-D, with negligible viscous dissipation due to low velocities. Under these circumstances, the governing dimensionless equations are the unsteady, 2-D laminar NavierÁStokes equations along with the energy equation and equation of state for the density. No slip velocity boundary conditions are imposed on the walls, and isothermal boundary conditions of T h and T c are imposed on the heated and cooled lower and upper walls, respectively. The thermosyphon has a simple circular geometry. The bottom wall is heated to a constant hot temperature T h while the top wall is maintained at the temperature T c , creating a temperature inversion of hot fluid below cold fluid. If conduction alone cannot stabilise this temperature inversion, then the fluid will begin to rotate and convection becomes the dominant process of heat transfer. The relevant model state variables are proportional to the bulk fluid velocity u and the temperature difference across the loop DT 3À9 . For CCW flow, as indicated by the arrow near 9 o'clock, fluid velocity u > 0 and DT 3À9 is typically !0. The radius ratio R=r ¼ 24 used in our experiments is shown to scale.
where f (i) corresponds the face perpendicular to the loop angle at cell i and ρ is reconstructed from the Boussinesq

Data Assimilation
We perform initial tests of the data assimilation algorithms described here with the Lorenz '63 system, which is analogous to the above equations with Lorenz's β = 1, and K = 0. The canonical choices of σ = 10, β = 8/3 and ρ = 28 produce the well known butterfly attractor, and we use these values for all examples here. From these tests, we will find the optimal data assimilation parameters (inflation factors) for predicting time series with this system. Having done so, we then focus our efforts on making prediction using computational fluid dynamics models.
We first implement the 3D-Var filter. Simply put, 3D-Var is the variational (cost-function) approach to finding the analysis. It has been shown that 3D-var solves the Here there are 900 cells in each slice (not shown), for a total mesh size of 81,000 cells. Simulations using this computational mesh are prohibitively expensive for use in a real time ensemble forecasting system, but are possible offline. same statistical problem as optimal interpolation (OI) [24]. The usefulness of the variational approach comes from the computational efficiency, when solved with an iterative method. Specifically, the multivariate 3D-Var amounts to finding the x a that minimizes the cost function (5) Next, we implement the "gold-standard" Extended Kalman Filter (EKF). The tangent linear model (TLM) is precisely the model (written as a matrix) that transforms a perturbation at time t to a perturbation at time t + ∆t, analytically equivalent to the Jacobian of the model. Using the notation of Kalnay [25], this amounts to making a forecast with the nonlinear model M , and updating the error covariance matrix P with the TLM L, and adjoint model L T : where Q is the noise covariance matrix (model error). In the experiments with Lorenz '63 presented in this section, Q = 0 since our model is perfect. In numerical weather prediction, Q must be approximated, e.g., using statistical moments on the analysis increments [26][27][28].
The analysis step is then written as (for H the observation operator): where is the innovation. We compute the Kalman gain matrix to minimize the analysis error covariance P a i as where R i is the observation error covariance. Since we are making observations of the truth with random normal errors of standard deviation , the observational error covariance matrix R is a diagonal matrix with along the diagonal. The most difficult (and most computationally expensive) part of the EKF is deriving and integrating the TLM. For this reason, the EKF is not used operationally, and later we will turn to statistical approximations of the EKF using ensembles of model forecasts. With our CFD model we have no such TLM, and we provide more detail on the TLM approaches applicable to the Lorenz '63 system in Appendix C.
The computational cost of the EKF is mitigated through the approximation of the error covariance matrix P f from the model itself, without the use of a TLM. One such approach is the use of a forecast ensemble, where a collection of models (ensemble members) are used to statistically sample model error propagation. With ensemble members spanning the model analysis error space, the forecasts of these ensemble members are then used to estimate the model forecast error covariance.
The only difference between this approach and the EKF, in general, is that the forecast error covariance P f is computed from the ensemble members, without the need for a tangent linear model: The ETKF introduced by Bishop is one type of square root filter, and we present it here to provide background for the formulation of the LETKF [29]. For a square root filter in general, we begin by writing the covariance matrices as the product of their matrix square roots. Because P a and P f are symmetric positive-definite (by definition), we can write where Z a and Z f are the matrix square roots of P a and P f . We are not concerned that this decomposition is not unique, and note that Z must have the same rank as P which will prove computationally advantageous. The power of the SRF is now seen as we represent the columns of the matrix Z f as the difference from the ensemble members from the ensemble mean, to avoid forming the full forecast covariance matrix P f . The ensemble members are updated by applying the model M to the states Z f such that an update is performed by To summarize, the steps for the ETKF are to (1) form Z T f H T R −1 HZ f , assuming that computing R −1 is easy, and (2) compute its eigenvalue decomposition, and apply it to Z f .
The LEKF implements a strategy that becomes important for large simulations: localization. Namely, the analysis is computed for each grid-point using only local observations, without the need to build matrices that represent the entire analysis space. Localization removes long-distance correlations from B and allows greater flexibility in the global analysis by allowing different linear combinations of ensemble members at different spatial locations [30]. The general formulation of the LEKF by Ott goes as follows, quoting directly from [31]: 1. Globally advance each ensemble member to the next analysis timestep. Steps 2-5 are performed for each grid point.
2. Create local vectors from each ensemble member.
3. Project that point's local vectors from each ensemble member into a low dimensional subspace as represented by perturbations from the mean.
4. Perform the data assimilation step to obtain a local analysis mean and covariance.
5. Generate local analysis ensemble of states.
6. Form a new global analysis ensemble from all of the local analyses.
Proposed by Hunt et al. (2007) with the stated objective of computational efficiency, the LETKF is named from its most similar algorithms from which it draws [32].
With the formulation of the LEKF and the ETKF given, the LETKF can be described as a synthesis of the advantages of both of these approaches. The LETKF is the method sufficiently efficient for implementation on the full OpenFOAM CFD model of 240,000 model variables, and so we present it in more detail and follow the notation of Hunt et al. (2007). As in the LEKF, we explicitly perform the analysis for each grid point of the model. The choice of observations to use for each grid point can be selected a priori, and tuned adaptively. Starting with a collection of background forecast vectors {x b(i) : i = 1, . . . , k}, we perform steps 1 and 2 in a global variable space, then steps 3-8 for each grid point: 2. Similarly form X b . Now for each grid point: 3. Form the local vectors.
8. Multiply X b by each w a(i) and addx b to get x a(i) : i = 1, . . . , k to complete each grid point. 9. Combine all of the local analysis into the global analysis.
We implement the LETKF on our mesh using the full 80 cells across with zone sizes of center 10, and sides 15, resulting in 3200 local variables for 100 zones. In parallel, these 100 local computations can all be carried out simultaneously over an arbitrary number of processors.

Adaptive covariance localization
Using the "square" sections of the loop to localize, we shift the zone to the left or right to follow the dominate flow direction at the center of that local window. In Fig.  5 a schematic of localization using square, circular, and adaptive location shows a situation in which adaptive localization will potentially capture more relevant information for finding the analysis state of any given cell. As we note in the caption of Fig. 5, while we are motivated by localization around flow structures like Panel C, we simply shift the covariance in Panel A so that our method is most general and computationally efficient.
Denote the velocity vector of cells on a perpendicular slice of the loop at U , the tangent vector to the slice U A: Square Covariance by T , the zone width as z max and then the localization shift α local for that slice of the loop is taken to be Dynamic mode decomposition We employ the "standard" algorithm of Tu to compute the Dynamic Mode Decomposition [33]. Tu's "standard" algorithm is as follows with X and Y taken as the first and last N − 1 columns of the snapshot matrix D: Aw = λw (Compute eigenvectors and values.) θw = U w (Compute corresponding modes.) Given a system state U * we project this state onto the DMD basis by taking the real part of Φ = re (U * · w) and use the psuedoinverse to compute the projection as This projection is a vector which contains the linear coefficients on the basis of DMD modes for the given state.

Data assimilation
We confirm the performance of the DA methods described above by testing each (on the Lorenz '63 system) for increasingly long times between observations, by increasing the DA window length in Fig. 6. As the time between observations increases, the nonlinearity of the Lorenz '63 system results in the failure of the EKF and difficultly for the EnKF with small ensemble size. The ETKF and EnSRF perform the best of the methods tested and we chose the ETKF for future use with the CFD model. The RMS error (not scaled by climatology) for our EKF and EnKF filters. Error is measured as the difference between forecast and truth at the end of an assimiliation window for the latter 2500 assimiliation windows in a 3000 assimilation window Lorenz '63 run. Error is measured in the only observed variable, x1. Increasing the assimilation window led to an decrease in predictive skill, as expected.
The results in Fig. 6 rely on tuned covariance inflation, both additive and multiplicative, pre-computed for each window and DA technique. We choose optimal additive inflation µ and multiplicative inflation ∆ by selecting for the lowest error in an exhaustive search through a maximum factors of 1.5 in each, an example is shown in Fig.  7. We use these optimal data assimilation parameters (inflation factors) for the remainder of this work. Each of the 100 model runs starts with a random IC, and the analysis forecast starts randomly. The window length here is 390 seconds. The filter performance RMS is computed as the RMS value of the difference between forecast and truth at the assimilation window for the latter 500 windows, allowing a spin-up of 500 windows.

Limited observations & adaptive covariance
An initial test of prediction skill with limited observations in a twin model experiment showed that we needed 1000 spatial measurements of the temperature to predict flow reversals within 1 assimilation window. In an attempt to decrease the required observations to a experimentally realizable number, we implement a simple, adaptively localized covariance for data assimilation. Since we first saw a modest improvement in the prediction skill with full temperature observations, we hope that this improvement increases and is sufficient to get down to needing as few as 32 observations to predict reversals 1 assimilation window (of length 10 seconds) into the future.
In Fig. 8 we see that over an assimilation of 200 seconds, the ensemble converges on the hidden, true state. To test the performance of flow reversal prediction, we take the average of the ensemble flow direction (the average of each value of φ) as the predicted flow direction, and count how often we predict reversals both when they do and do not occur. Varying both the number of model variables and the strength of covariance shifting in Fig.  9, we find that covariance shifting improves flow reversal prediction skill even when spatial observation density is decreased. With full observations (spacing of 1), we obtain a the best predictions with a covariance shift of 2. For 1/2 and 1/5 observations [a spacing of 2 (5) to observe every other (fifth) variable], we again have the best predictions with a shift of 2. And for a spacing of 10, observing every 10th variable, we achieve greater prediction skill with a covariance shift of 10. : Prediction skill as fraction of reversals that we correctly predicted across different numbers of observations and sliding windows of localized covariance. Decreasing observation density makes the prediction problem more difficult while at the same time make the data assimilation stable numerically, and we see a decrease in prediction skill with no covariance shifting. With covariance shifting, skill improves for each observational density and most dramatically with less observation density.
Computing the average flow direction inside a localized covariance zone is straightforward, and computationally easy since the velocity is immediately available, making incorporation of this scheme into any data assimilation method easy. Since observations are also sparse in large weather models, we expect that using an adaptive local covariance scheme could lead to improved prediction skill with sparse observations [34].

Dynamic mode decomposition
To incorporate limited observations into a highdimensional CFD simulation, we combine ideas from both CFD literature and data assimilation to make predictions. We proceed with Tu's algorithm using snapshots every 10 seconds for the first 900 seconds of model time. A full picture of this time series can be found in .
In this reduced space, we extract the modes that correspond to the instability leading to flow reversals. With a known low-dimensional model of the thermosyphon dynamics, we take this opportunity to test whether DMD can discover the underlying system. The time series of the model state projection onto a specific DMD mode will represent the time dynamics of a mode that is representative of a single low dimensional variable.
To look at all of the modes at once, we examine the average magnitude of the projection from all model states onto each mode in comparison to the projection of all states that 1,3,5, and 7 time steps before a reversal. The magnitude of the mode projection of a predictive mode before a reversal should stand out against the projection average across all states, and decay back towards the average further from the reversal in time. For modes 21 and 79, we directly observe in Fig. 10 that the average projection from states just 1 second before reversal is the most different from the average state projection, and the further away from the reversal the more similar the states become to the average.
We are particularly interested in whether the mode projection time series is predictive of flow reversals, as is true with the hidden system. The insets of Panel A and Panel B in Fig. 11 show two such timeseries, with stars indicating the time of flow reversals. Individually, these modes increase in amplitude when flow reversals happen. As a dominant mode, the time series of Mode 2 tracks closely to the timeseries from which the modes were generated, while the dynamics of the projection of Mode 79 are less obvious.
By combining the state projection onto specific mode time series into a phase plane, the combined signal from two modes is used for discovering states that separate reversals in direction and from other states in the phase plane. In Fig. 11 we see that the dominant dynamics from mode 2 plotted with those of mode 79 are able to strongly separate reversals into quadrants of the lowdimensional space. This result indicates that DMD could be used to improve predictability of reversals.

Concluding Remarks
The first output of our work is a general data assimilation framework for MATLAB and Julia. By utilizing an object-oriented (OO) design, the model and data assimilation algorithm code are separate and can be changed independently. The principal advantage of this approach is the ease of incorporation of new models and DA techniques (code available at https://github.com/ andyreagan/julia-openfoam).
We next present the results pertaining to the accuracy of forecasts for synthetic data (twin model experiments). There are many possible experiments given the choice of assimilation window, data assimilation algorithm, localization scheme, model resolution, observational density, observed variables, and observation quality. We focused on considering the effect of observations and observational locations on the resulting forecast skill, and we find that there is a threshold for the required number of observations to make useful predictions. In general, and unsurprisingly, we see that increasing observational density leads to improved forecast accuracy. With too few observations, the data assimilation is unable to recover the underlying dynamics. Using adaptively localized covariance holds promise for data assimilation with datascarce models, to overcome the lack of data.
The ability of DMD to recover the lower dimensional dynamics is expected but with 240,000 variables is nonetheless an accomplishment. When modeling systems for which there are unknown but useful dimension reduc-tions, as demonstrated here, DMD can be a useful tool to find such dimension reductions. When computation- al model runs are exceedingly costly or time consuming, the best-guess state projection onto DMD modes provides insights into the system dynamics that could not otherwise be obtained.
The numerical coupling of CFD to experiment by DA should be generally useful to improve the skill of CFD predictions of experiments. In addition, the CFD model can provide better knowledge of unobservable quantities of interest in fluid flow that use the experimental data to find the analysis state provided by DA. Adaptive covariance localization further enhances the benefit provided by DA in this context. In this section, we first present the governing equations for the flow in our thermal convection loop experiment. A spatial and temporal discretization of the governing equations is then necessary so that they may be solved numerically. After discretization, we must specify the boundary conditions. With the mesh and boundary conditions in place, we can then simulate the flow with a computational fluid dynamics solver.
We now discuss the equations, mesh, boundary conditions, and solver in more detail. With these considerations, we present our simulations of the thermosyphon. For a complete derivation of the equations used, see [35].
We consider the incompressible Navier-Stokes equations with the Boussinesq approximation to model the flow of water inside a thermal convection loop. Here we present the main equations that are solved numerically, noting the assumptions that are necessary in their derivation. In standard notation, for u, v, w the velocity in the x, y, z direction, respectively, the continuity equation for an incompressible fluid is The momentum equations, in tensor notation with bars representing averaged quantities (long timesteps are used, requiring integration), are for ρ ref the reference density with the Boussinesq approximation included, p the pressure, µ the viscosity, and g i gravity in the i-direction. Note that g i = 0 for i ∈ {x, y} since gravity is assumed to be the z direction. Since the model is incompressible, of course our energy equation includes only temperature, and is given by for T the temperature and q the flux (where q = q + q * is the averaging notation). The PISO (Pressure-Implicit with Splitting of Operators) algorithm derives from the work of [23], and is complementary to the SIMPLE (Semi-Implicit Method for Pressure-Linked Equations) [36] iterative method. The main difference of the PISO and SIMPLE algorithms is that in the PISO, no under-relaxation is applied and the momentum corrector step is performed more than once [37]. They sum up the algorithm in nine steps: • Set the boundary conditions.
• Solve the discretized momentum equation to compute an intermediate velocity field.
• Compute the mass fluxes at the cell faces.
• Solve the pressure equation.
• Correct the mass fluxes at the cell faces.
• Correct the velocity with respect to the new pressure field.
• Update the boundary conditions.
• Repeat from step #3 for the prescribed number of times.
• Repeat (with increased time step).
The solver itself has 647 dependencies, of which I present only a fraction. The main code is straight forward, relying on include statements to load the libraries and equations to be solved. We then enter the main loop. This is computed for each time step, prescribed before the solver is applied. Note that the capacity is available for adaptive time steps, choosing to keep the Courant number below some threshold, but I do not use this. For the distributed ensemble of model runs, it is important that each model complete in nearly the same time, so that the analysis is not waiting on one model and therefore under-utilizing the available resources. The final operation being the conversion of pressure to hydrostatic pressure, This "pEqn.H" is then re-run until convergence is achieved, and the PISO loop begins again.
We verify convergence of the solution in a steady flow state regime in Fig S1. Following the derivation by Harris [8], itself a representation of the derivation of Gorman [13] and namesakes Ehrhard and Müller [14], we derive the equations governing a closed loop thermosyphon.
Similar to the derivation of the governing equations of computational fluid dynamics, we start with a small but finite volume inside the loop. Here, however, the volume is described by πr 2 Rdφ for r the interior loop size (such that πr 2 is the area of a slice) and Rdφ the arc length (width) of the slice. Newton's second law states that momentum is conserved, such that the sum of the forces acting upon our finite volume is equal to the change in momentum of this volume. Therefore we have the basic starting point for forces F and velocity u as The sum of the forces is F = F {p,f,g} for net pressure, fluid shear, and gravity, respectively. We write these as where ∂p/∂φ is the pressure gradient, f w is the wall friction force, and g sin(φ) is the vertical component of gravity acting on the volume.
We now introduce the Boussinesq approximation which states that both variations in fluid density are linear in temperature T and density variation is insignificant except when multiplied by gravity. The consideration manifests as where ρ 0 is the reference density and T ref is the reference temperature, and β is the thermal expansion coefficient. The second consideration of the Boussinesq approximation allows us to replace ρ with this ρ ref in all terms except for F g . We now write momentum equation as Canceling the common πr 2 , dividing by R, and pulling out dφ on the LHS we have −dφ ∂p ∂φ We integrate this equation over φ to eliminate many of the terms, specifically we have Since u (and hence du dφ ) and f w do not depend on φ, we can pull these outside an integral over φ and therefore the momentum equation is now

S6
Diving out 2π and pull constants out of the integral we have our final form of the momentum equation Now considering the conservation of energy within the thermosyphon, the energy change within a finite volume must be balanced by transfer within the thermosyphon and to the walls. The internal energy change is given by which must equal the energy transfer through the wall, which is, for T w the wall temperature: Combining Equations 21 and 22 (and canceling terms) we have the energy equation: The f w which we have yet to define and h w are fluid-wall coefficients and can be described by [14]: We have introduced an additional function h to describe the behavior of the dimensionless velocity x 1 αu. This function is defined piece-wise as where p(x) can be defined as p(x) = 44x 2 − 55x 3 + 20x 4 /9 such that p is analytic at 0 [8].
Taking the lowest modes of a Fourier expansion for T for an approximate solution, we consider: By substituting this form into Equations 20 and 23 and integrating, we obtain a system of three equations for our solution. We then follow the particular nondimensionalization choice of Harris et al.such that we obtain the following ODE system, which we refer to as the Ehrhard-Müller equations: The nondimensionalization is given by the change of variables and Through careful consideration of these non-dimensional variable transformations we verify that x 1 is representative of the mean fluid velocity, x 2 of the temperature difference between the 3 and 9 o'clock positions on the thermosyphon, and x 3 the deviation from the vertical temperature profile in a conduction state [8].

S3 Data Assimiliation
The TLM is the model which advances an initial perturbation δx i at timestep i to a final perturbation δx i+1 at timestep i + 1. The dynamical system we are interested in, Lorenz '63, is given as a system of ODE's: We integrate this system using a numerical scheme of our choice (in the given examples we use a second-order Runge-Kutta method), to obtain a model M discretized in time.
Introducing a small perturbation y, we can approximate our model M applied to x(t 0 ) + y(t 0 ) with a Taylor series around x(t 0 ): We can then solve for the linear evolution of the small perturbation y(t 0 ) as where J = ∂F/∂x is the Jacobian of F . We can solve the above system of linear ordinary differential equations using the same numerical scheme as we did for the nonlinear model. One problem with solving the system of equations given by Equation 35 is that the Jacobian matrix of discretized code is not necessarily identical to the discretization of the Jacobian operator for the analytic system. This is a problem because we need to have the TLM of our model M , which is the time-space discretization of the solution to dx/dt = F (x). We can apply our numerical method to the dx/dt = F (x) to obtain M explicitly, and then take the Jacobian of the result. This method is, however, prohibitively costly, since Runge-Kutta methods are implicit. It is therefore desirable to take the derivative of the numerical scheme directly, and apply this differentiated numerical scheme to the system of equations F (x) to obtain the TLM. A schematic of this scenario is illustrated in Figure S2. To that the derivative of numerical code for implementing the EKF on models larger than 3 dimensions (i.e. global weather models written in Fortan), automatic code differentiation is used [38].
To verify our implementation of the TLM, we propagate a small error in the Lorenz '63 system and plot the difference between that error and the TLM predicted error, for each variable ( Figure S3).
With a finite ensemble size, the ensemble method is only an approximation and therefore in practice it often fails to capture the full spread of error. To better capture the model variance, additive and multiplicative inflation factors are used to obtain a good estimate of the error covariance matrix (The RMS error averaged over 100 model runs of length 1000 windows is reported for the ETKF for varying additive and multiplicative inflation factors Section). The spread of ensemble members in the x 1 variable of the Lorenz model, as distance from the analysis, can be seen in Figure S4.
In computing the error covariance P f from the ensemble, we wish to add up the error covariance of each forecast with respect to the mean forecast. But this would underestimate the error covariance, since the forecast we're comparing against is used in the ensemble average (to obtain the mean forecast). Therefore, to compute the error covariance matrix for each forecast, that forecast itself is excluded from the ensemble average forecast.
We can see the classic spaghetti of the ensemble with this filter implemented on Lorenz 63 in Figure S5. We denote the forecast within an ensemble filter as the average of the individual ensemble forecasts, and an explanation for this choice is substantiated by Burgers [39]. The general EnKF which we use is most similar to that of Burgers. Many algorithms based on the EnKF have been proposed and include the Ensemble Transform Kalman Filter (ETKF) [ [31], and the Local Ensemble Transform Kalman Filter (LETKF) [32]. A comprehensive overview through 2003 is provided by Evensen [42]. For further details on the most advanced methods, beyond what is provided in the body of the paper, we direct the reader the above references and the derivations provided in [35]. : An explanation of how and why the best way to obtain a TLM is with a differentiated numerical scheme. Both the Lorenz ODE and TLM ODE System can be solved by RK2/4, but the analytic Jacobian of TLM that would is not the same as the derivative of the numerical method. In particular, the derivative of the RK2/4 integrator is used to obtain a TLM that most accurately propogates error growth in the Lorenz '63 system. The future error predicted by the TLM is compared to the error growth in Lorenz '63 system for an initial perturbation with standard deviation of 0.1, averaged over 1000 TLM integrations. The is not the error predicted by the TLM, but rather the error of the TLM in predicting the error growth. We see an intially linear error growth for small time, which is overcome by the nonlinearity of the Lorenz system for longer time. We can see that the ensemble member state after assimilation better represents the uncertainty of the analysis state and enables some ensemble members to stay close to the true state.

S4 Additional DMD Details
The general algorithm for DMD is provided in the Section, and here we supply more results of the DMD procedure. The timeseries from which we computed the decomposition is shown in Figure S6. The real and imaginary components of the DMD eigenvalues are shown in both un-mapped and mapped form in Figure S7.