
VJE is an employee of Novartis Pharma AG, Basel, Switzerland, and owns Novartis stock. Co-author HF is a PLOS ONE Editorial Board member. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.

Conceived and designed the experiments: VAB AF NK JMH HF. Performed the experiments: VAB VJE AF TR SR JMH. Analyzed the data: VAB AF HF. Contributed reagents/materials/analysis tools: SR NK JMH. Wrote the paper: VAB VJE JMH HF.

Current address: Institute of Radiology, County Hospital Aarau, Aarau, Switzerland

Current address: Novartis Pharma AG, Basel, Switzerland

Current address: Research Center Borstel, Leibniz Center for Medicine and Biosciences, Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Borstel, Germany

Inflammatory cell numbers are important endpoints in clinical studies relying on endobronchial biopsies. Assumption-based bidimensional (2D) counting methods are widely used, although theoretically design-based stereologic three-dimensional (3D) methods alone offer an unbiased quantitative tool. We assessed the method agreement between 2D and 3D counting designs in practice when applied to identical samples in parallel.

Biopsies from segmental bronchi were collected from healthy non-smokers (n = 7) and smokers (n = 7), embedded and sectioned exhaustively. Systematic uniform random samples were immunohistochemically stained for macrophages (CD68) and T-lymphocytes (CD3), respectively. In identical fields of view, cell numbers per volume unit (N_{V}) were assessed using the physical disector (3D), and profiles per area unit (N_{A}) were counted (2D). For CD68^{+} cells, profiles with and without nucleus were recorded separately. To enable a direct comparison of the two methods, the zero-dimensional CD68^{+}/CD3^{+} ratio was calculated for each approach. Method agreement was tested by Bland-Altman analysis.

In both groups, mean CD68^{+}/CD3^{+} ratios for N_{V} and N_{A} were significantly different (non-smokers: 0.39 and 0.68, p<0.05; smokers: 0.49 and 1.68, p<0.05). When counting only nucleated CD68^{+} profiles, mean ratios obtained by 2D and 3D counting were similar, but the regression-based Bland-Altman analysis indicated a bias of the 2D ratios proportional to their magnitude. This magnitude-dependent deviation differed between the two groups.

2D counts of cell and nuclear profiles introduce a variable size-dependent bias throughout the measurement range. Because the deviation between the 3D and 2D data differed between the two groups, a ‘universal conversion formula’ cannot be established.

Airway inflammation is a characteristic feature of chronic airway diseases like asthma and chronic obstructive pulmonary disease (COPD). Studies aiming at unravelling the pathophysiological mechanisms of these entities or at the clinical evaluation of drugs with anti-inflammatory or disease-modifying activity require the implementation of techniques for the reliable quantification of the inflammatory and/or ‘inappropriate remodelling’ processes of the airways.

Many attempts have been made to standardise all steps of the procedure, including sampling of the airway tree, excision, processing and sampling of the specimen and analysing the histology.

Whereas the general advantages and disadvantages of 3D versus 2D approaches were discussed elsewhere

We further describe an experimental design for the analysis of endobronchial biopsies, which allows obtaining multiple section series from one biopsy, in accordance with the principles of systematic uniform random sampling. Thus, in a given study several section series, each of them representative of the whole biopsy, can be obtained and assigned to different histochemical or immunohistochemical stainings.

In this study we investigated endobronchial biopsies from 7 healthy non-smokers and 7 smokers. None of the included subjects suffered from acute bronchitis within 4 weeks before the investigations. All subjects were volunteers who gave their written consent after being fully informed about the purpose and nature of the investigations. This study was approved by the ethics committee of Hannover Medical School (Hannover, Germany).

The subjects received premedication according to the routine protocols: 0.2 mg aerosolized salbutamol, fractionated intravenous midazolam (0.05 mg/kg) and 3 ml nasal topical lidocaine 4%. The healthy non-smokers underwent inhalative bronchial anaesthesia with 2.5 ml lidocaine 4% by electronically controlled and regulated inhalation using the AKITA® inhalation system, while the smokers received local anaesthesia of the bronchial mucosa during the bronchoscopy using lidocaine 2% up to a maximal dose of 6 mg/kg, as previously described.

The collected biopsies were fixed in 4% phosphate-buffered formaldehyde overnight. After transfer into 2% aqueous agarose, the biopsies were embedded in paraffin wax. The paraffin blocks were exhaustively sectioned using a motorized rotary microtome (HM355S, Microm International GmbH, Walldorf, Germany) with a 2-μm average block advance (BA), calibrated by means of a digital calliper measuring the block height before and after cutting 500 sections at a given microtome setting. Groups of three consecutive sections were mounted on numbered glass slides. The contribution of the variation between biopsies of the same airway generation to the total variability is very low. Every 9^{th} or 20^{th} slide, depending on the size of the biopsy, was sampled in a slide series with a random outset between the 1^{st} and the 9^{th} or the 20^{th} slide of a biopsy, respectively.

After exhaustive sectioning, every three sections were mounted on numbered glass slides (1 to 28 in this example). With a random outset between the 1^{st} and the 9^{th} slide, nine slide samples, each consisting of every 9^{th} glass slide, were collected and stained.
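The slide sampling described above can be illustrated with a short Python sketch. It is only an illustration of systematic uniform random sampling under the numbers given in the example (28 slides, sampling interval 9); the function name `surs_slide_series` and its parameters are hypothetical and not part of the study protocol.

```python
import random


def surs_slide_series(n_slides, interval, start=None, rng=random):
    """Return the 1-based slide numbers of one systematic uniform random series.

    n_slides : total number of numbered glass slides of the biopsy
    interval : sampling interval (every `interval`-th slide is taken)
    start    : first sampled slide; drawn at random in [1, interval] if omitted
    """
    if start is None:
        start = rng.randint(1, interval)  # random outset, both ends inclusive
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and interval")
    return list(range(start, n_slides + 1, interval))


# Example from the text: slides 1 to 28, every 9th slide per series.
all_series = [surs_slide_series(28, 9, start=s) for s in range(1, 10)]
for series in all_series:
    print(series)

# One randomly started series, as would be drawn for a single staining.
print(surs_slide_series(28, 9))
```

Taken together, the nine possible series use every slide exactly once, which is why each series can be assigned to a different staining while remaining representative of the whole biopsy.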

The collected samples were used to identify T-lymphocytes and macrophages, respectively: one sample was stained for CD3^{+} (polyclonal rabbit anti-human 1∶100, DAKOCytomation, Glostrup, Denmark) and the other for CD68^{+} (monoclonal mouse anti-human PG-M1 1∶100, DAKOCytomation) cells, as previously described.

All cell counts were conducted on a computer-linked Olympus BX 51 light microscope equipped with a motorized stage and the CAST-Grid 2.01 system (Olympus, Ballerup, Denmark) using oil immersion lenses. The final magnifications were 1,400× (CD68^{+}) and 2,100× (CD3^{+}), with numerical aperture settings of 1.00 and 1.40, respectively, in order to minimize the depth of field. The reference compartment was confined to the lamina propria of the airway mucosa for both cell types. The stained T-lymphocytes and macrophages were quantified over the entire sample by performing the 2D and 3D counting simultaneously.

For 3D counting, the physical disector was used by analysing two consecutive sections: a reference and a look-up section. The number of cells per volume unit (N_{V}) was estimated for each biopsy and cell type according to:

Red triangles mark cell profiles seen in the reference section which are not present in the look-up section (bidirectional counting); green circles mark all cell profiles seen in the right section; yellow squares mark each assessed counting frame/field of view. The cell profile cutting the lower exclusion (red) line is not counted either in 3D or in 2D.

The 2D profile counting was performed on one of the two sections, on the same fields of view sampled for 3D counting. The number of profiles per area unit (N_{A}) was estimated for each biopsy and cell type according to:
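The two density estimators can be illustrated with a minimal Python sketch. It assumes the usual textbook forms, N_{V} = ΣQ⁻ / (total disector volume) and N_{A} = ΣQ / (total sampled area), and is not a reproduction of the study's own equations. The function names, the `bidirectional` flag and all input numbers are hypothetical.

```python
def numerical_density_3d(q_minus, n_frames, frame_area_mm2,
                         disector_height_mm, bidirectional=True):
    """Physical disector estimate of N_V in cells per mm^3.

    q_minus            : cells counted (seen in one section, absent in the other)
    n_frames           : number of counting frames assessed
    frame_area_mm2     : area of one counting frame
    disector_height_mm : distance between reference and look-up section
    bidirectional      : if True, both sections served as reference in turn,
                         which doubles the sampled disector volume
    """
    directions = 2 if bidirectional else 1
    volume_mm3 = directions * n_frames * frame_area_mm2 * disector_height_mm
    return q_minus / volume_mm3


def profile_density_2d(profiles, n_frames, frame_area_mm2):
    """2D estimate of N_A in profiles per mm^2 on a single section."""
    return profiles / (n_frames * frame_area_mm2)


# Hypothetical counts: 100 frames of 50 µm x 50 µm, 2 µm disector height.
n_v = numerical_density_3d(q_minus=40, n_frames=100,
                           frame_area_mm2=0.0025, disector_height_mm=0.002)
n_a = profile_density_2d(profiles=60, n_frames=100, frame_area_mm2=0.0025)
print(f"N_V = {n_v:.0f} per mm^3, N_A = {n_a:.0f} per mm^2")
```

The sketch makes the difference in dimensions explicit: the same counting frames enter both estimators, but only the 3D estimator divides by a height.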

For each subject and selected biopsy, N_{V} [mm^{−3}] and N_{A} [mm^{−2}] were calculated as discrete values accompanied by the coefficients of error (CE) calculated with the quadratic approximation formula (data not shown), which takes into account the nugget effect, i.e. the discontinuous distribution of cells, which tend to form clusters rather than being randomly distributed.

The observed variance (OV) of the estimates has two contributions: (i) the inherent variation between the individuals (biological variability) and (ii) the variation introduced by the employed sampling scheme, which is depicted by

The two cell counting methods deliver results with different physical dimensions (mm^{−3} and mm^{−2} respectively) and very different magnitudes. To allow for a direct comparison of the 3D and 2D approaches, zero-dimensional ratios between the densities of two cell populations were calculated using each method. To avoid the pitfall of a potential size-bias similarly affecting both terms of the ratio, two cell populations with clearly different mean sizes were investigated: macrophages and T-lymphocytes. The mean ratio values are reported for each group of subjects. The CEs of the ratios (CE_{r}) were calculated as the square root of the sum of squared CEs of the ratio terms. Mean ratios are accompanied by the mean CE_{r}.
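The rule for CE_{r} stated above can be written out as a small Python sketch. The function name `ce_ratio` is hypothetical. The input values are the group-mean CEs of N_{V} reported for the non-smokers (9.7% for CD68^{+}, 9.3% for CD3^{+}); in the study CE_{r} was computed per subject and then averaged, so using group means here is a simplification for illustration only.

```python
import math


def ce_ratio(ce_numerator, ce_denominator):
    """CE of a ratio: square root of the sum of the squared CEs of its terms."""
    return math.sqrt(ce_numerator ** 2 + ce_denominator ** 2)


# Group-mean CEs of N_V in non-smokers: CD68+ 9.7%, CD3+ 9.3%.
ce_r = ce_ratio(0.097, 0.093)
print(f"CE_r = {ce_r:.1%}")
```

Because the squared terms add, the ratio is always less precise than either of its terms, which is worth keeping in mind when the individual counts are low.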

All statistical analyses were performed using SigmaStat 3.1 (Jandel Scientific, Erkrath, Germany). The Kolmogorov-Smirnov test was used to verify the data for a normal distribution. The equality of variances was tested by the variance ratio test (F-test). Parametric testing was then applied to data drawn from normally distributed populations with equal variances. Otherwise, non-parametric tests were employed. Pearson's correlation coefficient (r) was used to test the relationship between 3D and 2D density estimates. For each group of subjects, each of the 2D approaches and the physical disector design were tested for differences of the mean CD68^{+}/CD3^{+} ratios using Wilcoxon's signed rank test. The mean CD68^{+}/CD3^{+} ratios obtained by 2D cell profile counting were tested for differences between the two groups by Mann-Whitney's non-parametric rank sum test, after standardisation by dividing them by the corresponding 3D mean ratios.

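The method agreement was tested for interchangeability of the results using the Bland-Altman analysis. Where the differences depended on the magnitude of the measurement, regression-based 95% limits of agreement were derived from s_{y|x} (standard error of the estimate), in a manner similar to the definition of the 95% limits of agreement.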
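A regression-based Bland-Altman analysis of this kind can be sketched in Python as follows. This is a simplified illustration, not the study's procedure: it assumes that the limits are taken as the fitted bias ± 1.96 × s_{y|x} with a constant residual spread, the function name `regression_based_limits` is hypothetical, and the ratio values are invented for the example.

```python
import math


def regression_based_limits(ratios_2d, ratios_3d, z=1.96):
    """Regress the differences (2D - 3D) on the means of the two methods.

    Returns slope, intercept, s_yx (standard error of the estimate) and a
    function giving (lower limit, bias, upper limit) at a given mean value.
    """
    n = len(ratios_2d)
    means = [(a + b) / 2 for a, b in zip(ratios_2d, ratios_3d)]
    diffs = [a - b for a, b in zip(ratios_2d, ratios_3d)]
    mean_x = sum(means) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(means, diffs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(means, diffs))
    s_yx = math.sqrt(residual_ss / (n - 2))  # two fitted parameters

    def limits(mean_value):
        bias = intercept + slope * mean_value
        return bias - z * s_yx, bias, bias + z * s_yx

    return slope, intercept, s_yx, limits


# Invented CD68+/CD3+ ratios for four subjects (illustration only).
ratios_2d = [0.30, 0.40, 0.55, 0.70]
ratios_3d = [0.35, 0.40, 0.50, 0.60]
slope, intercept, s_yx, limits = regression_based_limits(ratios_2d, ratios_3d)
print(f"bias = {intercept:.3f} + {slope:.3f} * mean, s_yx = {s_yx:.3f}")
print("limits at mean 0.5:", limits(0.5))
```

A slope different from zero means that the bias is not a single number but changes with the magnitude of the measurement, so the limits of agreement become a band around the regression line instead of two horizontal lines.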

The subjects' demographic and clinical data are summarised in the table below. Three of the smokers had FEV_{1}/FVC ratios of at least 70%; the other 4 subjects (2 males, 2 females) had FEV_{1}/FVC ratios <70% (58.1%–66.8%) and were diagnosed with COPD stage 1 according to the GOLD criteria.

| Group | Non-smokers | Smokers |
|---|---|---|
| Subjects (n) | 7 | 7 |
| Sex distribution | 4/3 | 4/3 |
| Age (years), mean ± SD | 30.9±6.96 | 46.7±7.91 |
| Age (years), range | 25–42 | 40–61 |
| FEV_{1} (L), mean ± SD | 4.6±0.59 | 3.4±0.96 |
| FEV_{1} (L), range | 3.80–5.43 | 2.35–4.69 |
| FEV_{1}/FVC (%), mean ± SD | 81.7±2.61 | 68.5±9.2 |
| FEV_{1}/FVC (%), range | 78.8–86.3 | 58.1–80.2 |
| COPD (subjects) | 0/7 | 4/7 |
| Pack-years, median | 0 | 33 |
| Pack-years, range | 0.0–0.9 | 23.4–54.4 |

| Group | Cell type | N_{V} (mm^{−3}), mean | CE | N_{A nucleus} (mm^{−2}), mean | CE | N_{A cell} (mm^{−2}), mean | CE |
|---|---|---|---|---|---|---|---|
| non-smokers | CD68^{+} | 85987 | 9.7% | 350 | 10.1% | 569 | 7.6% |
| non-smokers | CD3^{+} | 228612 | 9.3% | N. A. | N. A. | 931 | 9.3% |
| smokers | CD68^{+} | 46025 | 11.5% | 163 | 12.4% | 534 | 6.6% |
| smokers | CD3^{+} | 91870 | 10.4% | N. A. | N. A. | 322 | 11.2% |

In both study groups, N_{A} and N_{V} were very strongly and significantly correlated for both T-lymphocytes and macrophages. Because the 2D approach yields profiles per area unit (N_{A}), whereas the 3D approach yields cell numbers per volume unit (N_{V}), the different scale units precluded direct statistical testing of the differences or the agreement between these methods. To overcome this problem, the dimensionless ratio between CD68^{+} and CD3^{+} counts was calculated by each approach. The mean CD68^{+}/CD3^{+} ratios obtained from 3D and 2D cell profile counts showed statistically significant differences in both groups (non-smokers: 0.39 and 0.68, p<0.05; smokers: 0.49 and 1.68, p<0.05). When counting only CD68^{+} cell profiles containing a nucleus, the mean results of the 3D and the 2D nuclear profile approaches were very similar (non-smokers: 0.39 and 0.43; smokers: 0.49 and 0.50) and the level of significance was not reached in either group.

(a) T-lymphocytes, non-smokers, r = 0.84; r_{nucleus} = 0.95, r_{cell} = 0.76; r_{nucleus} = 0.98, r_{cell} = 0.89.

| Group | CD68^{+}/CD3^{+} 3D, mean | CE_{r} | CD68^{+}/CD3^{+} 2D nucleus, mean | CE_{r} | CD68^{+}/CD3^{+} 2D cell, mean | CE_{r} |
|---|---|---|---|---|---|---|
| non-smokers | 0.39 | 13.4% | 0.43 | 13.7% | 0.68 | 12.0% |
| smokers | 0.49 | 15.5% | 0.50 | 16.7% | 1.68 | 12.9% |

The agreement was assessed by plotting the differences between the ratios from the two approaches against their mean (i.e. magnitude) for each subject. The correlation coefficients were r_{s} = 0.89 for the non-smoker group and r_{s} = 0.79 for the smoker group, both statistically significant (non-smoker p_{r}<0.001, smoker p_{r} = 0.025). In the non-smoker group, the ratio means reflect 91% of the variability in the ratio differences, as measured by the coefficient of determination r^{2}. The differences between the two methods tended to be negative for low magnitudes and positive for high values. The linear regression of the differences (

Dashed line y = 0 represents the line of equality, which stands for perfect agreement. (a) Regression based mean difference (bias) and 95% limits of agreement for the differences of the CD68^{+}/CD3^{+} cell density ratios as determined by the 2D nucleus and 3D approaches in the non-smoker group. All values lie within the interval between the calculated 95% limits of agreement; (b) Regression based mean difference (bias) with 95% C.I. of the regression line (dotted) for the differences of the CD68^{+}/CD3^{+} cell density ratios in the smoker group. The 95% C.I. includes several horizontal lines (slope = 0) so that the fitted linear model does not achieve the desired statistical significance. Two large outliers encircled; (c) Regression based mean difference (bias) and 95% limits of agreement for the differences of the CD68^{+}/CD3^{+} cell density ratios as determined by the 2D nucleus and 3D approaches in the smoker group after removing the two large outliers. All values lie within the interval between the calculated 95% limits of agreement. Notice the similar slope to the fitted model in

With s_{y|x} = 0.053, the regression-based 95% limits of agreement were

This falls under the criteria of acceptance for the 95% limits of agreement set to

In the smoker group, fitting a linear regression model showed that the ratio means account for only 41.4% of the variability of the ratio differences, as measured by the coefficient of determination r^{2}. Regarding the regression equation, the chosen level of significance was reached neither for the slope (regression coefficient) nor for the analysis of variance (F-test). Two subjects appeared as large outliers; they had very low CD68^{+} and CD3^{+} N_{V} (the lowest in our sample) and therefore high CE and CE_{r}. Because this very high measurement error is likely to be a strong confounder in a sample of n = 7, we excluded these two subjects and then repeated the regression analysis of the differences on the means. This led to a remarkable improvement of the fitted model, with the mean ratios reflecting 98.3% of the variability in the ratio differences. The resulting regression equation was used together with s_{y|x} = 0.011 to calculate the regression-based 95% limits of agreement as

The equations were tested to see whether the regression follows the same model in both study groups. The difference between the regression coefficients of Eq. 1 and 2 was not statistically significant: 95% C.I. [−0.396; 0.504]. The common (or weighted) regression coefficient was computed: b_{c} = 0.736. The two intercepts of Eq. 1 and 2 showed a statistically significant difference.

Endobronchial biopsies have been widely used for quantitative assessments of inflammation and the related structural changes in chronic inflammatory airway diseases.

The present study addresses the issue of agreement between the data supplied by the widely used 2D cell or nuclear profile counting and those relying on 3D cell counts. Because size and its variation are thought to be a major source of bias

Prior to assessing the accuracy, quantified by the systematic error or bias, one should demonstrate adequate precision, quantified by the random measurement error. The estimated CEs (inherent counting noise) for the 2D and 3D densities were acceptable with regard to the biological variability of the samples: “CE^{2}(method) ≤ 0.5 CV^{2}(biological), and efficiency considerations means that it is wasteful of resources to make CE(method) << CV(biological)” (i.e., the “do more less well” paradigm).
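The precision criterion quoted above is easy to check numerically. The following Python sketch is an illustration only: the function name `precision_is_adequate` is hypothetical, and the biological CV of 30% is an assumed value, since the biological CVs of the study are not listed here.

```python
def precision_is_adequate(ce_method, cv_biological):
    """Check the criterion CE^2(method) <= 0.5 * CV^2(biological)."""
    return ce_method ** 2 <= 0.5 * cv_biological ** 2


# Assumed biological CV of 30% (hypothetical), tested against two CEs.
print(precision_is_adequate(0.10, 0.30))  # CE of 10%
print(precision_is_adequate(0.25, 0.30))  # CE of 25%
```

With a biological CV of 30%, the criterion is met for CEs up to about 21% (0.30 divided by the square root of 2); pushing the CE far below that level adds counting work without a matching gain in the precision of the group estimate.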

The counted entities were bidimensional cell transects in one case and three-dimensional cells in the other case. The two designs delivered results with very different orders of magnitude, mostly 10^{2} for 2D and 10^{4}–10^{5} for 3D counts, and expressed in different scale units: mm^{−2} and mm^{−3} respectively. This is an inherent problem in biopsy research, which has to rely mostly on cell densities, as the reference volume is not known and therefore no absolute cell numbers can be derived. Caution is necessary in the interpretation of density data in order to avoid the ‘reference trap’, when the unknown reference volume is prone to different changes during pathophysiological processes or tissue processing and thus alters the density values without any change in the absolute quantities.

It is obvious that the two data sets cannot substitute for each other, although they display very strong positive correlations. The relationship between N_{A} and N_{V} is described by the mean cell height perpendicular to the section plane.
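The link between profile density, numerical density and mean cell height can be made concrete with a short Python sketch. It assumes the idealised thin-section relation N_{A} = N_{V} × H̄ and therefore ignores section thickness and lost caps. It also uses the reported group means of N_{A cell} and N_{V}, not per-subject values, so the resulting heights are rough illustrations and not results of the study. The function name `implied_mean_height_um` is hypothetical.

```python
def implied_mean_height_um(n_a_per_mm2, n_v_per_mm3):
    """Mean height implied by N_A = N_V * H, converted from mm to µm."""
    return n_a_per_mm2 / n_v_per_mm3 * 1000.0


# Group means (N_A cell in mm^-2, N_V in mm^-3) from the density table.
group_means = {
    ("non-smokers", "CD68+"): (569, 85987),
    ("non-smokers", "CD3+"): (931, 228612),
    ("smokers", "CD68+"): (534, 46025),
    ("smokers", "CD3+"): (322, 91870),
}

heights = {key: implied_mean_height_um(n_a, n_v)
           for key, (n_a, n_v) in group_means.items()}
for (group, cell_type), height in heights.items():
    print(f"{group:12s} {cell_type:6s} implied mean height ~ {height:.1f} µm")
```

Under these assumptions the implied height of the CD68^{+} cells is larger than that of the CD3^{+} cells in both groups and differs between the groups, which illustrates why a single factor cannot convert N_{A} into N_{V} across cell types or populations.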

Although regression was proposed as a tool for the evaluation of agreement when the two methods of measurement have different units, it merely predicts the value of one method (N_{V}) from the value obtained by the alternative method (N_{A}). While regression analysis allows calculating a 95% prediction interval, something akin to the limits of agreement of the Bland-Altman analysis, it is still ‘blind’ to a systematic error, i.e. bias.

Thus, there is no way that would allow directly comparing the outcomes of the two designs for a single cell population.

Because the two approaches delivered data with different scale units we attempted to eliminate them by calculating a relative variable, which would be zero-dimensional and allow a direct comparison of both methods. This is represented by the ratio of CD68^{+} to CD3^{+} counts for each approach. At this point we would like to emphasize we do

As the 2D and 3D counting were performed simultaneously, i.e., on the same fields of view, one would expect the zero-dimensional ratios of macrophages to T-lymphocytes to be fairly close (accounting for the inherent random measurement error) if no bias were present. This is frequently regarded as the null hypothesis of a statistical analysis based on hypothesis testing. Besides correlation analysis, this is another inappropriate approach for method comparison studies. Compared with the 3D design, 2D cell profile counting overestimated the CD68^{+}/CD3^{+} ratio (i.e. the larger CD68^{+} macrophages) by a factor of 1.7 to 3.4 in the two study groups.
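The overestimation factors follow directly from the reported mean ratios, as this small Python sketch shows. The dictionary layout and variable names are illustrative choices; the numbers are the mean CD68^{+}/CD3^{+} ratios reported for each approach.

```python
# Mean CD68+/CD3+ ratios per group and counting approach.
mean_ratios = {
    "non-smokers": {"3D": 0.39, "2D nucleus": 0.43, "2D cell": 0.68},
    "smokers": {"3D": 0.49, "2D nucleus": 0.50, "2D cell": 1.68},
}

# Factor by which each 2D approach deviates from the 3D ratio.
factors = {
    group: {name: value / ratios["3D"]
            for name, value in ratios.items() if name != "3D"}
    for group, ratios in mean_ratios.items()
}

for group, by_method in factors.items():
    for method, factor in by_method.items():
        print(f"{group:12s} {method:10s} / 3D = {factor:.1f}")
```

The 2D cell profile ratios exceed the 3D ratios by factors of about 1.7 and 3.4, whereas the mean 2D nuclear profile ratios stay close to 1; the latter says nothing yet about agreement at the level of individual subjects.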

Assuming that the nucleus size varies less than the cell size, opting to count only cell transects whose nucleus appears in the section plane should theoretically reduce the size-bias.

A simple and robust solution for the comparison of different methods was suggested by D.G. Altman and J.M. Bland more than two decades ago.

Plotting the differences of the ratios by the two methods against their means as shown in

In the smoker group the fitted linear regression model did not reach the chosen level of significance of α = 0.05. Hence, we cannot conclude that the ratio differences in the smoker sample follow the linear distribution described by the regression equation. This can also be visualized by drawing the 95% confidence interval of the regression line: between the two curves one could also fit several horizontal lines, which would contradict a relation between the dependent variable (the ratio differences) and the independent variable (the ratio means). In the two subjects with very low CD68^{+} and CD3^{+} N_{V} (the lowest in our sample), the counting results were very low and therefore the CE quite high in both designs. This also led to high CE_{r} (up to 25%) of the calculated ratios. As this high measurement error is likely to be a strong confounder in a sample of n = 7, we decided to exclude these two subjects and then repeat the regression analysis of the differences on the means. This led to a remarkable improvement, confirming the contribution of the independent variable (the ratio means) to the ratio differences.

On visual inspection of the behaviour of the 2D bias in the two populations, the regression coefficients of Eq. 1 and 2 appeared to be similar. Subsequent formal testing, however, revealed a significant difference between their intercepts even in our small groups. Thus, the magnitude-dependent deviation of the 2D estimator from the 3D gold standard is described by a different equation in each group.

Summarizing, even though the differences between the mean ratios of N_{V} and those of N_{A nucleus} were not statistically significant and they showed a consistent correlation (