Functional validation and comparison framework for EIT lung imaging.

Introduction Electrical impedance tomography (EIT) is an emerging clinical tool for monitoring ventilation distribution in mechanically ventilated patients, for which many image reconstruction algorithms have been suggested. We propose an experimental framework to assess such algorithms with respect to their ability to correctly represent well-defined physiological changes. We defined a set of clinically relevant ventilation conditions and induced them experimentally in 8 pigs by controlling three ventilator settings (tidal volume, positive end-expiratory pressure and the fraction of inspired oxygen). In this way, large and discrete shifts in global and regional lung air content were elicited. Methods We use the framework to compare twelve 2D EIT reconstruction algorithms, including backprojection (the original and still most frequently used algorithm), GREIT (a more recent consensus algorithm for lung imaging), truncated singular value decomposition (TSVD), several variants of the one-step Gauss-Newton approach and two iterative algorithms. We consider the effects of using a 3D finite element model, assuming non-uniform background conductivity, noise modeling, reconstructing for electrode movement, total variation (TV) reconstruction, robust error norms, smoothing priors, and using difference vs. normalized difference data. Results and Conclusions Our results indicate that, while variation in appearance of images reconstructed from the same data is not negligible, clinically relevant parameters do not vary considerably among the advanced algorithms. Among the analysed algorithms, several advanced algorithms perform well, while some others are significantly worse. Given its vintage and ad-hoc formulation backprojection works surprisingly well, supporting the validity of previous studies in lung EIT.


Introduction
Electrical impedance tomography (EIT) has been proposed as a tool to monitor and guide ventilator therapy in mechanically ventilated patients [1][2][3][4]. Medical interest in EIT is driven by the discovery of the injurious effects of artificial ventilation [5] and, thus, the importance of lung protective ventilation strategies [6,7]. No consensus exists as to how optimal ventilator settings (i.e. least harmful and still securing proper gas exchange) should be chosen in an individual patient [8]. However, there is a clear need for continuous assessment of regional lung ventilation. No medical examination technique is available at the bedside allowing this type of monitoring. With its inherent advantages of the radiationfree measuring principle, good time resolution, portability and ability to assess rapid changes in regional lung air content, EIT promises to fulfil many of the requested criteria for such patient monitoring.
A growing body of research seeks to develop and evaluate EITbased parameters to guide ventilator therapy [1]. Most of these studies used versions of the original back-projection algorithm of [9] for image reconstruction; for recent examples see e.g. [10][11][12][13][14][15][16][17]. Newer types of image reconstruction procedures allow theoretical advantages [18][19][20][21]; however, these algorithms have not been systematically compared against each other, and often have not been tested in vivo. This paucity of systematic validation and comparison of EIT algorithms hampers its entrance into clinical practice.
The ability of EIT to measure regional distribution of ventilation has been validated against several established highresolution imaging modalities like X-ray computed tomography [22,23], electron beam computed tomography [24], single photon emission computed tomography [25,26] and positron emission tomography [27].
Such validation of EIT to anatomical references is important, but exhibits certain drawbacks. Most such studies have not used anatomical models for EIT reconstruction (typically using circles on a 32|32 pixel grid) making comparison with morphologically accurate images problematic. Also, the inherent low resolution of EIT creates large partial volume effects [28], while the scan rate of EIT is much higher ( §50 frames/s for some devices) than those of the reference radiological techniques. The most important drawback of such studies, however, is that anatomical validation is primarily sensitive to morphological accuracy, and thus less sensitive to the functional performance of EIT imaging. We believe that a functional imaging technique should also have a functional validation methodology. Thus, we propose a more functional approach for validation and comparison of EIT images obtained with different reconstruction algorithms where the ability to provide clinically relevant information is tested directly. Therefore, the aim of our study was to develop a framework to validate and compare EIT images by testing their ability to correctly reflect clinically significant changes in regional lung ventilation (or lack thereof). The backbone of our framework is an experimental EIT data set acquired in mechanically ventilated pigs. By manipulating three clinically relevant ventilator settingstidal volume (V T ), peek end-expiratory pressure (PEEP) and the O 2 fraction in inspired gas (F I O 2 ) -we were able to achieve different regional ventilation distributions following well defined large and discrete shifts in global and regional lung air content.
We used the proposed framework to compare the performance of a number of popular reconstruction algorithms, including GREIT [29] and backprojection. We sampled the analysed algorithms mainly from the large class of sensitivity-based methods available in the EIDORS suite [30]. In doing so, we studied the effect of individual design choices in assembling such algorithms. Other approaches, such as those based on a Bayesian framework (e.g. [31]), monotonicity (e.g. [32]), level sets (e.g. [33,34]), factorisation (e.g. [35], or the D-Bar method (e.g. [36]) were not evaluated, as they have seen little evaluation for thoracic EIT data (but see [37,38]. However, we make the data and software publicly available to allow extending the comparison to a wider range of present and future algorithms.

Ethics statement
After approval by the Committee for Animal Care of the Christian-Albrechts University Kiel (Permit Number: V742-72241.121-39), the study was conducted in compliance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health.

Evaluation framework
Since the potential role of EIT in ventilation therapy is to detect changes in the regional distribution of ventilation, our proposed validation framework is explicitly designed to test its ability to do so. An overview of the framework is presented in Fig. 1.
First, we conducted an experiment in healthy pigs ventilated mechanically to produce shifts in ventilation distribution while keeping V T constant. Ventilation with pure O 2 is known to induce absorption atelectasis in the dependent, i.e. posterior in supine animals, lung regions [39,40], while PEEP leads to a redistribution of inspired V T with more gas entering the dependent lung regions [41]. These well-known effects of changing PEEP and F I O 2 settings allowed the definition of distinct changes in global and regional ventilation, which are summarized in Table 1.
Second, EIT-derived measures of V T and distribution of ventilation along the gravitational axis were calculated for each analysed algorithm for each animal and condition. These values were used to test each algorithm by comparing against the expected findings, or physiological references (Table 1). For example, V T is a known volume change in the lung. While the distribution of V T changes with PEEP, the volume is fixed. Thus, using this physiological constant, we can test that an EIT algorithm which is equally sensitive to volume across the image will, correctly, identify V T as constant, while an EIT algorithm which has spatially varying volume response, may see varying V T . Similarly, the known movement of V T toward dependent lung with increases in PEEP can be used to test the sensitivity of EIT algorithms to movement within the image. The complete list of physiological references is given in Table 4.
Experimental protocol. The study was performed at the University Medical Center of Schleswig-Holstein, Campus Kiel in Germany. The experiments were carried out on eight anesthetized pigs (body weight (bw): 39+4 kg, mean+standard deviation (SD)) in supine position. The animals were at first sedated with azaperon (8 mg/kg bw). Anaesthesia was achieved by a continuous intravenous infusion of propofol (6 to 12 mg/kg bw per hour) and sufentanile (10 mg/kg bw per hour). Subsequently, the animals were intubated and connected to a ventilator (Siemens Servo 900 C ventilator, Siemens-Elema, Solna, Sweden). Haemodynamic as well as ventilatory parameters including heart rate, partial pressure of CO 2 (PCO 2 ) in respired gas, arterial O 2 saturation (S a O 2 ), airway pressures and respiratory system compliance were continuously monitored using the S/5 anaesthesia monitoring system with a gas-density compensated module (M-CAIOV, Datex Ohmeda, Helsinki, Finland). All animals were ventilated in a volumecontrolled mode with a constant V T , respiratory rate (20 breaths/ min) and inspiration-to-expiration ratio of 1:2 in order to maintain normocapnia (end-tidal PCO 2 [35][36][37][38][39][40][41][42][43][44][45]. If required, vecuronium bromide (0.1 mg/kg bw) was administered for muscle paralysis to suppress spontaneous breathing activity and thus avoid disturbance of the volume-controlled ventilation measurements.
During the examination period of 60 min, the animals were ventilated with different combinations of end-expiratory pressures and F I O 2 ( Table 2). F I O 2 was consecutively changed from 21% to 100% and back to 21% and at each F I O 2 , the animals were at first ventilated with zero end-expiratory pressure (ZEEP) and then with PEEP of 5 cmH 2 O. In each condition, one EIT measurement was performed meaning that a total of six measurements was obtained in each animal.
EIT examinations were performed using the Goe-MF II EIT device (CareFusion, Höchberg, Germany). Sixteen self-adhesive electrodes (Blue Sensor BR-50-K, Ambu, Bad Nauheim, Germany) were placed on the chest circumference in one transverse plane lying approximately at the level of the 6th intercostal space. Electrical currents (50 kHz, 5 mArms) were applied through adjacent pairs of electrodes in a rotating mode. After each current injection, the resulting potential differences were measured by the remaining electrode pairs. EIT data were acquired at a frame rate of 13 images/s during 60 s time intervals.

EIT algorithms
We review EIT image reconstruction algorithms and develop a taxonomy of approaches. A representative sample of algorithm choices are selected for evaluation.
EIT image reconstruction. The recovery of information about internal conductivity distribution from surface voltage measurements is a severely ill-posed non-linear inverse problem. Time difference EIT (TD-EIT), which compares two sets of measurements, is much less sensitive to uncertainties in electrode placement, thorax geometry and some hardware errors (such as gain variability between channels), than algorithms that attempt to reconstruct the absolute conductivity distribution [42]. This paper considers only TD-EIT algorithms. Specifically, TD-EIT seeks to reconstruct an image of the change in conductivity (Ds) between EIT measurements v and v r , where v represents the current frame of EIT measurements, while v r is the reference measurement frame, typically calculated by averaging over periods when conductivity distribution in the region of interest is stable. We use the notation (d) for time difference data and (m) for the conductivity change image (or model) m~Ds. Depending on the image reconstruction approach, the image may be represented as a pixel grid, or on a finite element model (FEM) discretization. Of the many TD-EIT algorithms proposed (see references of [20,29]) we consider only those that are applicable to the most common case for which clinical and experimental thoracic EIT measurements have been made. Thus, we assume: 1) TD-EIT measurements; 2) placement of 16 EIT electrodes in a single plane around the thorax; 3) measurement performed using adjacent stimulation and measurement (the Sheffield protocol); and 4) reconstruction of images onto a circular domain, since this is a common capability of all algorithms. We organize the space of EIT algorithms as in Fig.  2; first, algorithms are classified on the features used in the forward problem (and sensitivity matrix) and then on the inverse solution. Two main classes of image reconstruction procedures were chosen for analysis, based on backprojection, and sensitivity matrix (FEM) based solutions; the latter category is subdivided into regularization and optimization based methods. We review the formulation of each approach in the following subsections. Clearly, the possible combinations of algorithm variants are very large, and thus difficult to test. Instead, we define a "baseline" algorithm, R 0 which represents the simplest approach, and consider modifications from it. For each variation, we define a default case (which is part of R 0 ) and a variant case, which is tested in an algorithm (Table 3).  Backprojection sensitivity model. The backprojection algorithm was developed [43] for the original Sheffield EIT system, and has seen numerous variants and improvements. Using an analogy to X-ray CT backprojection, normalized measurements are "backprojected" onto the equipotential region in the image, and the image is subsequently filtered. Improvements to account for the diffuse nature of electrical current propagation in the body were subsequently made; a good mathematical characterization is given by Santosa and Vogelius [44]. While many variants of Sheffield backprojection (R SBP ) exist, clinical and experimental EIT has largely used only the one which was distributed in the Sheffield Mark I EIT system [45]. In discussions with the algorithm authors, it appears that the exact formulation of this algorithm has been lost. In order to accurately represent this algorithm, we obtained permission to reverse engineer R SBP from a Sheffield Mk I device, as described in [29].  Numerical sensitivity model. The forward model, F ( : ), in EIT simulates the measured data vector d for a given vector of model parameters m, as d~F sr (m). The data vector, d, represents each measured voltage for each applied current pattern in a data frame; for the Sheffield stimulation protocol on 16 electrodes, this yields 16|13~208 measurements per frame, of which 104 are independent, due to reciprocity [46]. F( : ) is typically based on a finite element model (FEM) and each parameter of the conductivity distribution is usually modelled in m as the piecewise constant conductivity on a tetrahedral element. The forward model is used to calculate a Jacobian, J, or sensitivity matrix, which describes the sensitivity of each data element to each model parameter.
where the FEM estimate depends on the background conductivity, s r , around which the conductivity changes m occur. The following section discusses various approaches to numerical (forward) models (F ( : )), considering the baseline approach and its variants.
Model dimension: 2D FEM / 3D FEM. The choice of model dimension affects the relative magnitude of measured voltage in regions further from the current stimulations. Real voltage measurements occur in three dimensions, and are thus mismatched to 2D models. Early EIT algorithms used 2D FEM to reduce computational time; however, many recent algorithms continue to use 2D. One consequence is errors in the reconstructed position of objects because sensitivity falls off faster with distance in 3D than 2D.
The baseline model uses a circular 2D FEM (Fig. 3A), and the variant uses a cylindrical 3D FEM of height equal to the medium diameter, and the electrode plane in the centre (Fig. 3B). FEM models are constructed using Netgen [47] by specifying the region geometry and circular electrode contact regions.
Difference data: normalized diff. data / diff. data. TD-EIT defines difference data d in terms of the current, v, and reference, v r measurements. The majority of experimental and clinical algorithms (including R SBP ) have used normalized difference imaging, where data d are defined as ½d i~½ v{v r i =½v r i~½ v i =½v r i {1. If difference imaging (without normalization) is used, then data d are defined as d~v{v r . By normalizing, small measurement values are scaled up, and have a larger impact on the reconstructed images.
The baseline model uses normalized difference data, while the variant model uses difference data (without normalization).
Conductivity background: homogeneous s r / lung À s r . TD-EIT assumes conductivity changes occur with respect to a reference conductivity, s r , which defines the sensitivity matrix, J. The most common choice for EIT algorithms is a homogeneous value for s r , which is clearly a poor model for the thorax, in which the lungs are far less conductive than other tissues. The use of a homogeneous thorax conductivity distorts the position and amplitude of reconstructed contrasts [48]. The baseline model uses a homogeneous s r (Fig. 3B), while the variant model defines lung tissue with a relative conductivity of 0:25 of that of other tissues [42]. The lung region is defined from CT images of pigs of similar size to those used in our experiments (Fig. 3C). We evaluate the effect of lung s r in a 3D FEM, since a full dimensional model is required to get more accurate current flow in the thorax.
Model electrode movement: static model / elec. move. One key difficulty with EIT measurements is due to Table 3. Algorithm variations considered. For each tested algorithm (horizontal row), the indicated variations from baseline (R 0 ) are made.  Table 4. Performance of different image reconstruction algorithms during mechanical ventilation with constant tidal volume and variable end-expiratory pressure and O 2 fraction. the position uncertainty of the electrodes, especially for thoracic measurements, in which the body surface moves during breathing and posture change. Several approaches have been proposed to reduce the effect of electrode movements on the reconstructed EIT conductivity changes. We select the algorithm of [21,49] based on using an augmented J, sensitive to impedance changes, electrode movement, and shape deformation in the model. This approach allows estimation of the shape changes, which we do not use here; instead, by allowing calculation of movement, certain artefacts in the conductivity image may be reduced. This approach can be shown to be equivalent to formulating electrode movement as a structured noise term which is added to the measurement noise variance, S n . Following [49], where S n and S Ã n represent the original and updated estimates of data noise variance, respectively. J m is the movement sensitivity matrix (change in data for each electrode coordinate change) and S m is a prior estimate of the covariance of electrode movement.
The baseline model assumes electrode position to be fixed, while the variant model updates noise estimates using (2).
Regularized EIT reconstruction. The class of linear regularized EIT reconstruction algorithms seeks a solutionm m which minimizes the expression The first term, EW(d{Jm)E nd n d is the data error term, the mismatch between the model and the measured data weighted by a matrix W, which models the measurement error on each data channel. The data error term has a norm n d [ ½1, 2. If a Gaussian model of data errors is used (n d~2 ), the data weighting matrix, W, is related to the channel noise covariance, S n , by S {1 n~W t W. This formulation is also used to incorporate the structured noise due to electrode movement, S Ã n in (2). The second term, EL(m{m 0 )E nm nm is a regularization term designed to address the ill-conditioning intrinsic to EIT; it serves as a penalty function which "discourages" unlikely (but otherwise feasible) solutions. It calculates the mismatch between the m and prior constraints on "likely" models. m 0 is the prior mean and has always been set to zero for TD-EIT, since conductivity is as likely to increase as decrease. L implements a penalty for large changes or non-smooth conductivities. The model error term has a norm n m [ ½1,2. l controls the relative strength of data and model error terms, and is typically called the regularization hyperparameter. As l increases, the solution matches more closely to the model, and becomes increasingly smooth. In order to adequately compare different algorithms, functionally equivalent values of l are chosen. Many strategies to choose an appropriate value have been presented (see references of [50]). We choose the "Noise Figure" approach [51], in which l is chosen such that the ratio of values of signal to noise ratio (SNR) between d andm m is required to be constant across algorithms. Since we have no control of the parameters of the SBP algorithm, all other algorithms are normalized to its value.
The most commonly studied case is for quadratic norms n m~nd~2 , which has the advantage of modelling Gaussian error, and yielding a closed form, linear solution: EIT algorithms of this form are known as one-step Gauss Newton solvers (GN) [52]. One important advantage is that image reconstruction can be expressed as matrix multiplication by R; since R may be pre-calculated, reconstruction is rapid and may be implemented in real-time. If either n m or n d are not quadratic, image reconstruction must be formulated iteratively.
In general, regularized EIT image reconstruction techniques are popular due to their flexibility to represent arbitrary body geometry, measurement configurations, and mathematical models of the conductivity distribution. Many papers have used the basic regularized formulation, and have explored the choices of parameter values. In this paper, the baseline reconstruction algorithm, R 0 , follows an early widely used regularized approach, NOSER [52], which makes the following parameter choices: it is quadratic, n m~nd~2 , data noise is modelled to be identical on each channel W~I, and the regularization matrix, L, is diagonal such that, diag L T L~diag J T J, which implies an assumption that inter-element correlations are zero.
The following section discusses various approaches to regularized image reconstruction (I), considering the baseline approach and its variants.
Data noise model (W): uniform / weighted. The most common approach is to "trust" all EIT data equally, and thus to model each channel with equal, uniform noise (W~I). It is not necessary to set the actual noise intensity, since the data to model noise trade-off is set by the hyperparameter, l. In practice, however, EIT noise can vary considerably between data channels, most likely due to variability in electrode contact quality. Such variations in data quality can be addressed by decreasing W on channels with larger noise. One practical approach to setting an appropriate W is to calculate the reciprocity error [53] (the difference between data values that should be equal by reciprocity Validation and Comparison Framework for Lung EIT PLOS ONE | www.plosone.org [46]) and to use this value to scale W. Clearly, modelling data noise would be expected to be more useful for lower quality data; we consider the EIT data in this study to be representative of relatively good measurements.
The baseline model sets W~I, and the variant model sets W using the reciprocity error approach [53], setting the parameter t~10 {5 . This parameter choice implied data were on average weighted at 0:66 of their weighting in the baseline model.
Data norm (n d ): Gaussian, n d~2 /robust, n d~1 . As mentioned, a Gaussian (n d~nm~2 ) model and data norms allow calculation of a linear matrix inverse. One disadvantage of this formulation is that it emphasizes larger errors (since the effect of an error is squared). One approach to address such errors is the use of a n d~1 data norm. While such as approach has been relatively recently proposed in medical EIT [54], it has been widely used in the geophysical EIT literature, and is referred to as "robust error norms".
The baseline model uses a Gaussian data norm n d~2 , while the variant uses n d~1 using the formulation of [54].
Prior model (L): diagonal / smoothing. The regularization matrix, L, is chosen to penalize unlikely solutions. The NOSER [52] regularization matrix is diagonal, and thus penalizes large image amplitudes. Several studies have suggested that a better prior model is a smooth, rather than simply low amplitude distribution [51,55,56]. To achieve such smoothing, L is designed to penalize non-smooth m, imposing a filter across nearby model elements. One key difference between an amplitude and a smoothing filter is its behaviour as a function of model element density. As density increases, amplitude penalties tend to show a more "speckled" pattern of noise, since each image element is independent in this case.
The baseline model sets L to penalize image amplitude [52], and the variant model uses the spatial high pass filter [51], in which L is calculated to apply Gaussian filter where spatial wavelengths below 10% of the medium diameter are penalized.
Prior model (L): diagonal / truncated SVD. The function of regularization may be understood via the singular value decomposition (SVD) of the Jacobian, J~UDV t , as the pseudo-inverse of J, with the elimination of small singular values. Here, U and V are orthonormal matrices and D is diagonal. Thus where The threshold T functions like a regularization parameter. We choose its value to enforce the noise figure constraint similar to the other algorithms. R TSVD forms a linear inverse and may be represented in the form of (4) with the values W~I (uniform independent noise), a very large value of l, and L equal to V t with its first T rows set to zero [57]. The baseline model sets L to penalize image amplitude [52], and the variant model uses the truncated SVD to set L as indicated above.
Model norm (n m ): Gaussian / Total Variation. One disadvantage of a Gaussian (n m~2 ) model norm is that it blurs edges in images, which is physiologically unrealistic, since organ boundaries are anatomically well-defined. This blurring can be addressed using the Total Variation (TV) formulation [58], with n m~1 . While TV produces apparently sharper images, there is some debate as to whether such images can give the appearance of features where none exist, especially in the context of EIT data errors.
The baseline model uses a Gaussian model n m~2 , while the variant implements TV n m~1 using the formulation of [54,58].
Optimization based EIT reconstruction. One recent EIT reconstruction algorithm formulation is GREIT [29]. This algorithm proposes an approach to image reconstruction designed to implement a set of requirements (or figures of merit) on which a group of experts reached consensus. The parameters of a linear reconstruction matrix, R, are chosen to best satisfy the listed requirements. This approach is similar in many ways to regularized image reconstruction; however it is the reconstruction matrix, R, rather than the image,m m, which is the target in the objective function formulation. Thus, we classify GREIT as an "optimization-based EIT reconstruction". The GREIT approach provides a number of different parameters which can be selected for an image; however, in this paper we choose the circular model reconstruction matrix evaluated in [29].
EIT algorithms evaluated. In order to test the large variety of possible combinations of EIT reconstruction algorithms, we evaluate a baseline algorithm, R 0 , and modifications of it in a single variation. The list of considered algorithms is summarized in  Table 3. We thus consider R 0 and nine variations, as well as SBP and GREIT, for a total of twelve algorithms.

Data analysis
Image reconstruction. Each EIT algorithm identified in Table 3 was implemented using EIDORS version 3.6 [30] and used to reconstruct real conductivity change in each EIT frame in each series of EIT scans, representing 60 seconds of recording for each animal and condition. The reference data set, v r was chosen to be an average of data scans at end-expiration. Images reconstructed onto an FEM were interpolated onto a 64|64 pixel grid (R SBP and R GR create a 32|32 pixel image directly).
Modern algorithms can be tuned to control the trade-off between resolution and noise performance, generally via a "hyperparameter". To fairly compare algorithms, it is essential that the individual hyperparameters are set to equivalent values using an objective measure. Several such measures have been proposed [50]. In this study we chose to tune all algorithms such that the noise figure (NF) measure, first proposed for EIT by [51], for changes half the model radius away from the centre was 1:0. This particular value was chosen as it matched the performance of the backprojection algorithm, which does not have a tunable hyperparameter.
Ventilation pattern evaluation. From each sequence of EIT data, an average end-inspiratory and end-expiratory data scan was calculated and used for subsequent analysis. The average data scan was calculated by, first, identifying candidate endinspiratory (end-expiratory) time points from the maximum (minimum) values in the average time course of EIT data. These candidate points were reviewed by a human operator in order to identify and, if necessary, reject physiologically infeasible time points. An example of identified (and rejected) events is shown in Fig. 4. The remaining points were averaged to create a single data scan, one for end-inspiration and one for end-expiration. Using these data, functional EIT scans were generated showing the distribution of local V T by plotting the average tidal differences between the end-inspiratory and end-expiratory values in corresponding pixel locations.
Lung areas (A L ) in images were identified as regions with an EIT ventilation signal above 20% of the maximum found in the image. This approach was recommended e.g. in [59]. These were aggregated over all 6 recordings to provide a single lung region of interest (ROI) for each animal and algorithm. The size and shape of the lung ROI varies for different algorithms, since some tend to "squeeze" image contrasts toward the image centre. For each measurement, we calculate V T by summing all pixel values within the lung ROI; we characterized the distribution of ventilation along the dorsoventral axis within the lung ROI as centre of ventilation (CoV), calculated analogously to centre of gravity, such that CoV~0 in the image centre and positive in non-dependent (ventral) lung regions; {1ƒCoV ƒ1. The CoV values depend on the position of the lung ROI within the EIT image. A decrease in CoV reflects a shift of ventilation distribution towards the dependent lung.
Statistical analysis. Statistical tests are performed as follows: to test for inequality (avb), we use the one-tailed t-test to test rejection of the null hypothesis, H 0 :a §b; to test for equality (a~b), we use the two-tailed t-test to test rejection of the null hypothesis, H 0 :Da{bDw0:1| 1 2 DazbD for V T , and H 0 :Da{bDw0:03 for CoV. Thus we consider EIT-derived V T values equal when their means differ by less than 10%. We treat mean CoV values equal when they differ by less than 0.03, which corresponds to half a pixel height in a 32|32 image. For each test a p value is calculated, and pv:05 is considered significant.
The t-test requires an assumption of a normal distribution of data. We investigated this assumption using two approaches. First, the distribution of each set of calculated parameters was tested against the normal distribution using the Kolmogorov-Smirnov normality test, using a~0:05. Next, data were analysed using the same protocol, but with the Wilcoxon signed rank test instead of the t-test.

Results
Images were reconstructed for all animals for each case and algorithm. Fig. 5 shows representative images of tidal change V T at PEEP and ZEEP in three animals under 21% F I O 2 .

Qualitative assessment
All algorithms successfully reconstructed the lungs as a central object with conductivity changing with ventilation (shown in blue as decrease in conductivity between expiration and inspiration). Images for different algorithms vary qualitatively in the shape of the lung region and the type and amount of artefacts in the images (Fig. 5). We qualitatively identify five types of artefacts: 1) boundary artefacts, especially near the electrodes, represented as image contrasts at the image edge; 2) speckle, showing high spatial frequency contrasts especially visible at the lung boundary (especially R dif , R TSVD ); 3) ringing, showing inverse contrast near larger image contrasts (ie. red regions adjacent to blue lungs) (R n , R mv , R TSVD ); 4) spatial distortions, which disturb the shape, visible especially in the shape of the lung regions (R dif ); and 5) streak artefacts, showing contrasting lines radially projecting toward the electrodes (R mv , R n , R SBP ). The presence of such artefacts disturbs the analysis of the images in several ways, primarily by introducing noise in the images, and by disturbing the analysis of the regional distribution of ventilation.
Images reconstructed with the baseline algorithm R 0 show the lungs as a single object. Although the images are generally smooth, boundaries of individual triangular finite elements on which the images are reconstructed prior to rasterization are clearly visible giving the appearance of implausible structures. This is true of all algorithms beside R SBP and R GR which reconstruct directly onto a pixel grid. Artefacts are clearly visible on the periphery near the electrodes (mostly in red).
Modifications of the numerical sensitivity model (algorithms R 3D to R mv ) generally have an obvious and pronounced impact on the reconstructed EIT images. Only images obtained with R 3D , which differs from R 0 only in that a 3D forward model is used, do not show appreciable difference with respect to R 0 . Reconstructions of difference data with R dif , as opposed to normalized difference data as in R 0 , show much more artefacts than the other algorithms. However, although the lung shape appears distorted, it shows clear separation between the two lungs. The replacement of homogeneous s r with one that includes lung tissue contrast in R bkg produces images with clearly separated lungs but slightly more artefacts than R 0 . Reconstructions with electrode movement R mv show virtually no boundary artefacts. However, the lungs appear slightly smaller and more round in comparison to R 0 .
Modifications of the parameters of regularized reconstruction also dramatically affect the appearance of EIT images. Noise weighting based on reciprocity in R n produced images with less artefacts than R 0 , but more than R mv . However, tidal changes seem to be pushed together and toward the centre resulting in comparatively smaller lungs. R HPF also offers limited reduction in boundary artefacts. In contrast, images reconstructed with R TSVD show increased artefacts compared with R 0 ; the lungs have a speckled appearance. As expected, R TV produced images with sharp boundaries. However, the resulting lung shape does not look plausible. Perhaps surprisingly, the use of the robust data norm with n d~1 in R L1 seems to offer no advantage in terms of boundary artefacts over R 0 in our data set, but it reconstructs the lungs as bigger and, in some cases, better separated. Images produced with R GR and R SBP stand out for their smoothness and lack of boundary artefacts. R GR is most comparable with R mv in terms of the shape of the lung region, although the lungs appear slightly smaller and less separable. Images obtained with R SBP are similar to those obtained with R GR , but the lungs appear even smaller while their shape is less smooth. It also seems to be distorted -tidal changes are pushed toward the centre and streak artefacts pointing outwards seem to show implausible structures.

Functional assessment
The changes in ventilation pattern apparent in the reconstructed images were tested against physiological references, listed in Table 4, where p values of the corresponding statistical tests are also reported. The p-values are based on the t-test, which assumes parameter values are normally distributed. We justify this assumption as follows. First, all parameter values pass the Kolmogorov-Smirnov normality test (at a~0:05). Second, we recalculated all p values using the Wilcoxon signed rank test, which showed substantially similar results; compared to the values in Table 4, the ordering of algorithms by p-value was close to identical for all parameter values.
The independence of V T from airway pressure was confirmed for most algorithms at F I O 2 of 100% (pv0:05 in Table 4). Notable exceptions are: R TV , R dif and, surprisingly, R bkg . However, at F I O 2 of 21% V T was less stable and only five algorithms passed the independence test (R mv , R n , R L1 , R GR and R SBP ). Nonetheless, V T was independent of F I O 2 for all algorithms. Repeated measurements at the same settings showed very little variation in V T , which was reproducible for all algorithms.
At 21% F I O 2 , no significant decrease in CoV associated with the introduction of PEEP was detected, save by the R n and R GR algorithms. In contrast, at 100% F I O 2 , the distribution of ventilation was significantly more skewed towards ventral lung regions at ZEEP as compared to PEEP. This is reflected in the decrease in CoV associated with the introduction of PEEP, which was detected by all algorithms but R bkg and R L1 where it narrowly escaped significance. Accordingly, we found that the increase in F I O 2 from 21% to 100% produced for all algorithms a significant increase in CoV, i.e. a shift of ventilation distribution towards ventral regions, at ZEEP but not at PEEP. Repeated EIT recordings with the same ventilator settings produced equivalent values of CoV for all algorithms at PEEP. At ZEEP, CoV was reproducible at pƒ0:05 only for six algorithms (R bkg , R n , R L1 , R TV , R GR and R SBP ), although only for R dif and R TSVD did p exceed 0:1.

Discussion
EIT shows significant promise as a technique to monitor regional changes in lung ventilation, and to improve patient outcomes by informing ventilator therapy [4]. Since EIT is a challenging mathematical problem, many advanced reconstruction procedures have been developed, each offering potentially useful advances in terms of image quality or robustness against data noise. However, such advanced reconstruction schemes have seen little application to clinical or experimental EIT research, with the exception of GREIT [29] algorithm, which has seen some recent use. In this paper, we propose a framework for functional validation and comparison of lung EIT images. This framework is then used to evaluate whether some advanced algorithms for EIT image reconstruction improve its ability to resolve the distribution of lung ventilation. As proposed, our validation focuses on EIT's ability to represent two key parameters, V T and CoV, in accordance with known physiology. These are currently the most important for the proposed clinical applications of EIT; they have been applied in recent work by e.g. [60][61][62]. Additionally, our framework permits straightforward addition of new parameters by an extension of the evaluation software. Rather than using simulations or a saline phantom (e.g. [63]), our approach is more clinically orientated in that we use measures based on an in vivo experimental model.
The key innovation is that we directly test reconstructed images against physiological "knowns". In particular, we test for the presence of significant changes in the reconstructed images as a result of experimental interventions with known physiological consequences. In the current study we do not assess the magnitude of these changes, although the presented framework could be extended in the future to include such assessment, once a number of technical difficulties are overcome. These include the use of normalized difference data, which hinders quantitative comparison of image pixel values to expected conductivity changes in experimental data; and the use of incorrect model geometry (discussed further below) resulting in spatially distorted images.
We consider algorithms based on a sensitivity model of EIT, which can be expressed in terms of a direct minimization of an error expression (eqn. 3). This includes many possible "ingredients": the dimension and background conductivity of the FEM; electrode movement modelling and reconstruction; time-difference vs. normalized time-difference measurement; data noise model; data and model norms; assumptions about the conductivity distribution imposed through the prior model; and method of calculating the pseudo-inverse of the sensitivity matrix. We do not consider all available algorithms; direct inversion methods (such as D-bar), those based on a Bayesian framework or non-linear sensitivity models (e.g. level set, monotonicity, or factorization) were not evaluated. We have designed our evaluation software to make it straightforward to add such comparisons as implementations of these algorithms become available. Even with the nine factors considered, it would not be possible to evaluate the full set of possible algorithms; instead we define a baseline algorithm and variants of it, yielding twelve algorithms, which were evaluated based on two approaches. First, representative images of ventilation were studied visually to assess the relative amount and character of artefacts as well as the shape of the region showing tidal changes. Second, measures of tidal changes and their distribution in the reconstructed images were tested against known physiological behaviour in experimental data.
To fairly compare algorithms we chose their hyperparameters, which control the trade-off between resolution and noise performance, such that the value of the noise figure (NF) metric [51] for a specific change in of R SBP . Still, because the NF of algorithms varies across the image, sometimes greatly, this approach to comparison is not completely fair for conductivity changes elsewhere in the domain. Moreover, this high value of NF means that for some of the tested algorithms the amount of regularization was insufficient to fully appreciate the nature of the prior.
Based on the results in Table 4, algorithms which exhibit a better p-value are interpreted as offering an improved correspondence to known physiological "ground-truths", and are thus better. We do not select an overall "best", as that would amount to ranking the relative importance of the different tests, which is problematic. For example, the ability to accurately represent V T may be more important relative to CoV in some clinical applications, but not others. Overall, we note that there is a general trend in which a ranking of algorithms by p-value performance in various tests are substantially consistent. Based on the images in Fig. 5, we make the following observations: 1) Artefacts on the image boundary are reduced in R mv , R GR and R SBP , since these algorithms explicitly compensate for noise close to the electrodes; 2) The lung regions are better separated in R bkg which exhibits increased sensitivity in the lung regions "pulling" them apart; 3) Reconstructions using difference -rather than normalized difference -data exhibit strong artefacts, likely due to the variation in gain on individual channels in the Goe-MF II EIT system; 4) Numerous "streak" artefacts are generated by R SBP , which are a result of wrong assumptions in the backprojection sensitivity model; 5) R TSVD , the only algorithm whose regularization is represented in terms of sensitivity matrix singular valuesand thus not on the image space -produces strong "speckle" artefacts; 6) Algorithms using a spatial filter (R HPF and R GR ) produce spatially smoother images; 7) On the other hand, R TVdesigned to preserve boundaries -creates an image with implausible shapes; 8) The use of a 3D forward model, R 3D vs. R 0 , shows little appreciable difference despite the fact that EIT data from a 3D domain is incompatible with a 2D model [64], since the use of a circular shape is incorrect in both cases.
Overall, this means the variation in appearance of images reconstructed from the same data with different algorithms is not negligible. This poses a challenge to efforts to analyse the images at the level of individual pixels to derive diagnostic information. At the same time, the strength of our functional validation is evident in that we were able to detect clinically relevant physiological changes in spite of the varied appearance of the images. This is in contrast to methods based on comparison with images more faithful to anatomy that are primarily sensitive to morphological distortions.
Our results address two important questions: 1) Do advanced image reconstruction algorithms provide advantages over the original Sheffield backprojection when applied to in vivo data? Yes, some advanced algorithms offer a small, but significant advantage. Of the linear, one-step (i.e. fast) algorithms considered, R GR is the current best choice. Several other algorithms also offer advantages, such as those that consider electrode movement and the background lung conductivity; however, these "ingredients" can -and should -be included in an improved GREIT-like algorithm. Considering both the images and the significance values in our tests, the best approach (but at the cost of significantly higher computational complexity) is R L1 , which uses robust error norms. We also note that several algorithms do not perform well. 2) Do clinically relevant EIT measures characterizing lung ventilation depend on the algorithm used? For ventilation therapy, EITs clinical relevance is driven by its ability to determine information on the changes in volume and its distribution, which we seek to capture in our evaluation framework. A first reading of Table 4 would indicate that the choice of algorithm affects the significance. Many algorithms have significantly worse p-values than the original R SBP , with which most clinical and experimental results have been analysed (ie. R 0 , R 3D , R dif , R bkg , R TSVD , R HPF , R TV ). The remaining algorithms (R mv , R n , R L1 , R GR ) show similar, or in some cases improved, p-values to R SBP . This group of algorithms may be characterized as "advanced in that they model additional image and sensor characteristics above that available in a classic GN regularized formulation. We note that the RSBP algorithm works surprisingly well given its vintage and ad-hoc formulation. Overall, our analysis does not suggest reconsideration of the validity of previous analyses of EIT data.
Our analysis is limited in that we only considered 2D electrode placement and adjacent injection and measurement patterns. These have prevented us from evaluating 3D reconstructions and limited the sensitivity in the centre of the image [49]. This, however, remains the de-facto standard for in vivo studies. We also purposefully used a circular (cylindrical in 3D) forward model for all algorithms, as this is what R SBP assumes, although using accurate body shape (e.g. derived from CT data) is known to decrease artefacts and improve positional accuracy [28,65]. The effects of using a circular model do not affect our analysis of tidal volume changes, since V T is a scalar not affected by the distribution of pixel values within the ROI used for its calculation. Furthermore, because CoV is a gross measure also defined on a ROI, it is unlikely that using an anatomically accurate model would substantially improve the performance of the evaluated algorithms in detecting shifts in CoV. However, the use of circular models is likely to have adversely affected our analysis of background modelling in R bkg , since realistic organ shape could not be incorporated into the forward model. Last but not least, our analysis considered the effect of each "ingredient" individually. Clearly, a combination of the more promising ones is likely to yield a better reconstruction algorithm than any of those tested here. However, naïve combinations are unlikely to produce good results. Instead, careful calibration and testing of many combinations will be needed, for which, we hope, the proposed evaluation framework will prove a useful tool.

Conclusions
We present a methodology to experimentally evaluate lung EIT algorithms, and use it to analyse and compare recent work. The methodology is based on testing the algorithms ability to correctly detect physiological changes (or lack thereof) in a bespoke experimental data set. The basic idea behind this methodology allows it to be easily expanded to incorporate additional tests in the future. Several promising algorithm ingredients are identified; and we recommend research to evaluate approaches combining these factors, as well as other, non-linear algorithms. In order to support such investigation, we release the data and software for this study under a free license, so that it can be used and modified by others (http://sf.net/p/eidors3d/code/HEAD/tree/trunk/pubs/papers/ compare-algs-2014).