Reference Charts for Fetal Cerebellar Vermis Height: A Prospective Cross-Sectional Study of 10605 Fetuses

Objective To establish reference charts for fetal cerebellar vermis height in an unselected population. Methods A prospective cross-sectional study was carried out between September 2009 and December 2014 at the ALTAMEDICA Fetal–Maternal Medical Centre, Rome, Italy. Of 25203 fetal biometric measurements, 12167 (48%) included a measurement of the cerebellar vermis. After excluding 1562 (12.8%) measurements, a total of 10605 (87.2%) fetuses were considered, each analyzed once only. Parametric and nonparametric quantile regression models were used for the statistical analysis. To evaluate the robustness of the proposed reference charts to various distributional assumptions on the ultrasound measurements, we compared the gestational age-specific reference curves produced by the different statistical methods. Normal mean heights based on parametric and nonparametric methods were defined for each week of gestation, and the regression equation expressing the height of the cerebellar vermis as a function of gestational age was calculated. Finally, the correlation between vermis height and gestational age was measured. Results The mean height of the cerebellar vermis was 12.7 mm (SD, 1.6 mm; 95% confidence interval, 12.7–12.8 mm). The regression equation expressing the height of the CV as a function of gestational age was: height (mm) = -4.85 + 0.78 × gestational age. The correlation between vermis height and gestational age was expressed by the coefficient r = 0.87. Conclusion This is the first prospective cross-sectional study on fetal cerebellar vermis biometry with such a large sample size reported in the literature. It is a detailed statistical survey and contains new centile-based reference charts for fetal cerebellar vermis height.


Introduction
New concepts regarding the anatomic development of the cerebellar vermis (CV) were recently introduced [1], changing the diagnostic approach to both CV and posterior fossa anomalies. These have improved the differentiation between normal and pathological conditions during fetal life [2] [3]. Imaging of the fetal posterior fossa is an integral part of a routine anomaly scan [4], and its anomalies are associated in 75% of cases with other structural malformations and with chromosomal and genetic diseases responsible for high mortality and for cognitive, language and behavioral dysfunction among the children affected [5][6][7][8]. The posterior fossa is studied with axial planes, which highlight the CV as an echoic structure between the two cerebellar hemispheres [9] [10]. In these multiplanar images, the biometry of the cisterna magna, the closure of the fourth ventricle and the transcerebellar diameter must also be evaluated, although the midsagittal plane is the most important plane since it allows a better visualization of the CV [11]. The prevalence of posterior fossa malformations diagnosed during the neonatal period is estimated to be 1:5000 live births [12], and posterior fossa malformations on imaging are now the most commonly diagnosed brain malformations in utero, though the actual incidence is unknown. The most frequent of these malformations, Blake's pouch cyst (BPC), vermian hypoplasia and Dandy-Walker malformation (DWM), have a similar ultrasound appearance but a very different prognosis. DWM is the most common disease, with an incidence ranging from 1:25000 to 1:35000 [13]. It is characterized by partial or complete agenesis of the CV, cystic dilation of the fourth ventricle, and an enlarged posterior fossa combined with a superior displacement of the cerebellar hemisphere [14].
The BPC is characterized by 1) normal anatomy and size of the vermis in a mid-sagittal section of the fetal brain; 2) mild to moderate upward rotation of the vermis in a mid-sagittal section of the fetal brain; 3) normal size of the cisterna magna in both mid-sagittal and axial sections of the fetal brain. Generally, BPC has a normal neurodevelopmental outcome. Finally, CV anomalies may also be present as an isolated finding, with complete or partial vermian agenesis [15] [16], causing extremely variable clinical manifestations during the neonatal period which range from mild to severe mental retardation and psychomotor delay [17][18][19][20].
Despite the well-defined ultrasonographic findings, there are still many misdiagnoses of DWM, BPC and isolated vermian anomalies [11], and the most challenging differential diagnosis for BPC is the hypoplastic form of the CV, because it is based mainly on the size of the vermis. Hence, when a CV anomaly is suspected, the definition of CV size is crucial for the diagnosis of the partial/hypoplastic forms or BPC. Finally, the exclusion of associated structural malformations and Central Nervous System (CNS) defects remains the most important aspect of prenatal counseling and poses a challenge for all obstetricians.
With this in mind, we aim to provide reference charts of fetal CV height from a prospective cross-sectional study and to evaluate the robustness of the proposed reference charts when making various distributional assumptions.

Materials and Methods
The institutional review board (Comitato Etico per la Ricerca Scientifica) approved the study and written informed consent was obtained from all patients (PRSV-09-0013-13/07/2009).
We carried out a prospective cross-sectional study between September 2009 and December 2014 at the ALTAMEDICA Fetal-Maternal Medical Centre in Rome. We enrolled all the women who came to our institution requesting a fetal biometric evaluation, without applying any selection to this population. The women involved in the study were between 20+0 and 35+6 weeks of gestation.
We excluded patients with hypertensive disorders, diabetes mellitus or multiple pregnancy, those whose fetus had an abnormal karyotype and/or congenital malformations, and those without first-trimester dating based on crown-rump length (CRL) [21]. We then excluded women with a history of congenital CNS abnormalities or those with fetuses suspected to be at risk for any CNS anomaly (potential maternal alcohol abuse, chromosomal abnormalities, single mutant genes, maternal diabetes mellitus, radiation exposure, congenital infections, and a parent or previous sibling with a neural defect). Finally, in keeping with the cross-sectional design, each fetus was included only once. If the CV of a fetus was measured more than once during pregnancy, only one measurement was randomly chosen, while the remaining ones were not used to fit the statistical models. This was done to avoid any selection effect due to non-random choices, and to avoid using all measurements for a single fetus, which would over-weight fetuses with more than one measurement.
All measurements were performed by eleven different gynecologists with at least 9 years of experience in fetal ultrasound and about 1000 scans per year (MG, PC, FP, MM, LM, LD, VM, CB, RV, CG, CC). In all cases, a General Electric Medical Systems Voluson 730 Pro or a General Electric Medical Systems Voluson E8 with a 2D (4.5-16.5 MHz) trans-abdominal probe was used. When the visualization of the CV was difficult (e.g. high maternal body mass index or vertex presentation of the fetus), the exam was performed with a 2D (5-9 MHz) transvaginal probe. Each operator used both ultrasound systems during the study period.
The transducer was oriented based on the fetal head position. The head was usually viewed from a slightly posterior angle. This plane is obtained at a slightly lower level than that of the transventricular plane and with a slight posterior tilting. An axial transcerebellar view was obtained at the level of the fourth cerebral ventricle. The beam was directed through the posterolateral (mastoid) fontanel to minimize shadowing. The depiction of the cerebellar vermis was based on serial axial planes with slight angulations between them, to demonstrate the portions of the cerebellar vermis. The cerebellar vermis was visualized in a mid-sagittal plane as a hyperechoic structure delimited anteriorly by the fourth ventricle and posteriorly by the cisterna magna. In order to obtain a precise mid-sagittal plane, the corpus callosum should be clearly visualized anteriorly. The size of the cerebellar vermis was measured directly from a magnified view using 0.1 mm resolution. The vermis height was defined as the maximum distance between the most cranial portion of the culmen and the most caudal portion of the uvula [22][23][24] (Fig 1).
The primary aim was to produce reference charts using the following statistical methods: a) Normal-based reference curves; b) Generalized Additive Models for Location, Scale and Shape (GAMLSS); c) Box-Cox t distribution and d) Nonparametric quantile regression. A further aim was to evaluate the robustness of the proposed reference charts to various distributional assumptions on the ultrasound measurements, by comparing the gestational age-specific reference curves produced by the statistical methods described above. Statistical analysis was performed using R 2.15.0 [25]. In particular, the following packages were used: quantreg [26], scam [27], gamlss [28] and ggplot2 [29].
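The "one randomly chosen measurement per fetus" rule can be illustrated with a minimal Python sketch. The record layout (fetus identifier, gestational age, height), the function name and the toy values are hypothetical and are not taken from the study data; the original analysis was carried out in R.

```python
import random
from collections import defaultdict

def one_measurement_per_fetus(records, seed=0):
    """Keep one randomly chosen (fetus_id, ga_weeks, height_mm) record per fetus."""
    rng = random.Random(seed)
    by_fetus = defaultdict(list)
    for rec in records:
        by_fetus[rec[0]].append(rec)
    # Iterate in sorted id order so the result is reproducible for a given seed.
    return [rng.choice(by_fetus[fid]) for fid in sorted(by_fetus)]

# Hypothetical toy data: fetuses A and C were scanned more than once.
records = [
    ("A", 21.0, 11.4), ("A", 28.0, 17.1),
    ("B", 22.0, 12.3),
    ("C", 20.0, 10.8), ("C", 24.0, 13.9), ("C", 32.0, 20.2),
]
sample = one_measurement_per_fetus(records)
```

Each fetus contributes exactly one record to `sample`, so fetuses with repeated scans are not over-weighted when the models are fitted.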

Normal-based reference curves
In spite of the recent trend towards nonparametric reference distributions, the Normal distribution, or some simple transformation towards normality, is still the most commonly used model. A key feature of such a model is the availability of standard deviation estimates and Z-scores, which represent the centile position corresponding to individual measurements transformed to the normal deviate scale, and which cannot be calculated with quantile regression. If the model is correctly specified, the Z-scores follow, at least approximately, a standard Normal distribution independent of age. More specifically, Normal-based centile curves are given by $\mu_{GA} \pm \kappa\,\sigma_{GA}$, where $\mu_{GA}$ and $\sigma_{GA}$ are, respectively, the gestational age (GA)-specific mean and standard deviation (SD) of $(Y \mid GA)$, while $\kappa$ is an appropriate percentage point of a standard Normal. The Z-score (or standardized residual) for $(Y \mid GA)$ is defined as $(Y - \mu_{GA})/\sigma_{GA}$. Suitable regression functions may be used to model $\mu_{GA}$ and $\sigma_{GA}$, with parameters estimated by, for example, maximum likelihood. As suggested by Royston and Wright [30], in this paper we have considered cubic polynomials to model $\mu_{GA}$ and $\sigma_{GA}$, adopting the Akaike Information Criterion (AIC) to select the most appropriate model. In more detail, to estimate $\sigma_{GA}$, we have regressed the so-called "scaled absolute residuals" (that is, the absolute residuals multiplied by $\sqrt{\pi/2}$) on GA. Standard diagnostic plots and tools were also considered.
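The reason for the $\sqrt{\pi/2}$ scaling is that, for a Normal residual with standard deviation $\sigma$, the expected absolute value is $\sigma\sqrt{2/\pi}$, so the scaled absolute residuals have expectation $\sigma$. The following Python sketch checks this numerically on simulated residuals; the SD value, sample size and seed are arbitrary assumptions chosen for illustration, not values from the study.

```python
import math
import random
import statistics

rng = random.Random(42)
true_sd = 1.5  # hypothetical SD (mm) at a fixed gestational age
residuals = [rng.gauss(0.0, true_sd) for _ in range(20000)]

# For a Normal residual e, E|e| = sigma * sqrt(2/pi), so the scaled
# absolute residual |e| * sqrt(pi/2) has expectation sigma.
scaled_abs = [abs(e) * math.sqrt(math.pi / 2) for e in residuals]

# Averaging the scaled absolute residuals recovers the SD; regressing them
# on GA (instead of averaging) gives a GA-dependent SD curve.
sd_estimate = statistics.fmean(scaled_abs)
```

In this simulation `sd_estimate` should be close to `true_sd`; in the regression setting the same quantity is modeled as a function of GA rather than as a single constant.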

GAMLSS and Box-Cox t distribution
Usually, data may require a suitable transformation to achieve approximate normality. Since biometric measurements tend to follow a (positively) skewed normal distribution at a given GA, the data are usually log-transformed. More generally, the LMS method [31] uses the power transformation family of Box and Cox to allow the skewness of the measurement distribution, as well as the median and variability, to vary with age. Given the marked leptokurtosis shown by the observed data, we adopt an extension of the original method devised by Rigby and Stasinopoulos [32], which is based on the so-called Box-Cox t (BCT) distribution. This distribution is defined by a power transformation of the original variable leading to a shifted and scaled t distribution with degrees of freedom parameter $\tau$. This produces a model with four parameters, usually denoted by BCT($\mu, \sigma, \nu, \tau$). The parameters $\mu$, $\sigma$, $\nu$ and $\tau$ may be related to location (median), scale (centile-based coefficient of variation), skewness (power transformation to symmetry) and kurtosis (degrees of freedom), respectively. The generalized additive model for location, scale and shape (GAMLSS) allows each of the parameters to be modeled as linear or non-linear, parametric or smooth nonparametric functions of explanatory variables such as gestational age. As mentioned above, we consider cubic polynomials to model $\mu$, $\sigma$, $\nu$ and $\tau$ as functions of GA, adopting the (generalized) AIC criterion for model selection. Once a BCT-GAMLSS is fitted, we use the estimates $(\hat\mu, \hat\sigma, \hat\nu, \hat\tau)$ to obtain the centile curve $y_q$ at any level $q \in (0,1)$ for the variable measured on the original scale, by adopting an explicit back-transformation; that is, by simply substituting the estimated values into the BCT centile formula. Here $t_{\tau;q}$ is the 100q centile of a standard t distribution with $\tau$ degrees of freedom, and we have suppressed the dependence of model parameters on GA.
From a diagnostic point of view, we first considered the worm plot [33], a tool to visualize how well a statistical model fits the data. It consists of a number of detrended normal Q-Q plots of the residuals, split according to pre-specified classes of gestational age. A model that fits the data well is characterized by "flat worms". The worm plot makes it possible to detect inadequacies in model fit within each specific class of GA. The fit within GA classes can also be further investigated by calculating Q statistics to test for normality of the residuals within each group. More specifically, we assume K to be the number of GA classes and let $\{e_{k,i},\ i = 1, \ldots, n_k\}$ be the residuals in group k, for $k = 1, \ldots, K$. Statistics $(Z_{\mu k}, Z_{\sigma k}, Z_{\nu k}, Z_{\tau k})$ are calculated from the residuals in group k to test whether they have mean 0, variance 1, skewness 0 and kurtosis 3. Finally, the Q statistics $(Q_\mu, Q_\sigma, Q_\nu, Q_\tau)$ of Royston and Wright [34] are calculated as the sums over groups of the squares of the corresponding Z's. Significant Q statistics indicate potential lack of fit for the models describing parameters $\mu$, $\sigma$, $\nu$ and $\tau$, respectively.
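For reference, the back-transformation for a BCT model is usually written as below. This is the standard form from the GAMLSS literature, ignoring the truncation that keeps $y$ positive; it is given here as a presumed reconstruction and is not copied from the original article.

```latex
y_q =
\begin{cases}
\mu \left(1 + \sigma \nu \, t_{\tau;q}\right)^{1/\nu}, & \nu \neq 0,\\[4pt]
\mu \exp\left(\sigma \, t_{\tau;q}\right), & \nu = 0,
\end{cases}
```

With $q = 0.5$ the t quantile $t_{\tau;q}$ is zero, so $y_{0.5} = \mu$, which is consistent with $\mu$ being interpreted as the median.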
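The coordinates of one panel of a worm plot (a detrended normal Q-Q plot for a single GA class) can be computed as in the following Python sketch. The plotting position $(i - 0.5)/n$ and the function name are assumptions made for illustration; implementations such as the one in the gamlss R package may use a different convention.

```python
from statistics import NormalDist

def worm_points(residuals):
    """Return (theoretical quantile, empirical - theoretical) pairs for one GA class."""
    n = len(residuals)
    std_normal = NormalDist()
    points = []
    for i, e in enumerate(sorted(residuals), start=1):
        # Assumed plotting position (i - 0.5) / n for the i-th ordered residual.
        theo = std_normal.inv_cdf((i - 0.5) / n)
        points.append((theo, e - theo))
    return points

# Residuals that sit exactly on the standard Normal quantiles give a flat worm.
perfect = [NormalDist().inv_cdf((i - 0.5) / 5) for i in range(1, 6)]
flat_worm = worm_points(perfect)
```

A well-fitting model gives deviations scattered around zero in every GA class; systematic slopes or curvature in a panel point to misfit in location, scale or shape for that class.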

Nonparametric quantile regression
Quantile regression (QR) [35] is a general non-parametric approach which can be used to estimate age-specific reference intervals. The QR model is defined to minimize the objective function $L(\beta) = \sum_j \rho_q(y_j - \beta)$ over the scalar $\beta$, where $\rho_q(u) = u\,(q - I(u<0))$ is the so-called "check function" and $I(A)$ denotes the indicator function of the set A. The optimal solution $\hat\beta$ corresponds to the q-th sample quantile of the observed data $\{y_1, \ldots, y_n\}$; that is, $\hat\beta = F_y^{-1}(q)$, with $F_y(\cdot)$ being the cumulative distribution function of y. This basic framework has been extended in many directions. For the present study, the most relevant extension consists in replacing $\beta$ with a smooth, possibly shape-constrained, function $g(\cdot)$ of some explanatory variables, adding a suitable smoothing penalty term to the objective function $L(\cdot)$. In the present work, in order to reduce the variability of the estimated reference curves and to avoid any artifact due to the scarcity of data at earlier or later gestational ages, we first fitted a non-decreasing smooth quantile function of the gestational age, as implemented in the quantreg package. Due to the total variation smoothing penalty adopted, this procedure results in piecewise linear curves with knots at the observed data. To get a smoother fit (locally quadratic, for example), we also adapted the iterative procedure proposed by Hee-Seok et al [36] to our shape-constrained framework. Diagnostics were again based on a recent adaptation of the worm plot to the QR setup [37], whereas uniform and pointwise confidence bands have also been provided to better understand how variable (and consequently reliable) the obtained estimates are.
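The link between the check function and sample quantiles can be verified with a small Python sketch that minimizes $L(\beta)$ by brute force over the observed values. The data and function names are hypothetical; this covers only the scalar (no covariate) case and is not the penalized, shape-constrained fit used in the study.

```python
def check_loss(u, q):
    """Check function rho_q(u) = u * (q - I(u < 0))."""
    return u * (q - (1.0 if u < 0 else 0.0))

def objective(beta, ys, q):
    """L(beta) = sum_j rho_q(y_j - beta)."""
    return sum(check_loss(y - beta, q) for y in ys)

def sample_quantile_by_minimization(ys, q):
    # L is convex and piecewise linear with kinks at the data points, so a
    # minimizer can always be found among the observed values.
    return min(sorted(ys), key=lambda b: objective(b, ys, q))

# Hypothetical toy measurements (mm).
heights = [9.0, 1.0, 5.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0]
median = sample_quantile_by_minimization(heights, 0.5)
lower_quartile = sample_quantile_by_minimization(heights, 0.25)
```

For $q = 0.5$ the loss reduces to half the sum of absolute deviations, so the minimizer is the sample median; other values of $q$ tilt the loss asymmetrically and move the minimizer to the corresponding sample quantile.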

Results
Five thousand one hundred and ninety-six fetuses (49%) were in a vertex presentation, and 5409 (51%) were in a breech presentation. Among fetuses in the vertex presentation, 1641 (31.6%) were examined vaginally, and 3555 (68.4%) abdominally. In five cases (3.5%) a combined approach was used. In both the vertex and breech groups, the vermis was successfully visualized in 100% of cases. Baseline characteristics are shown in Table 1.
The mean CV height was 12.7 mm (SD, 1.6 mm; 95% CI, 12.7-12.8 mm). The distribution of mean heights and standard deviations by gestational age (GA) is shown in Table 2.
The intra-observer variability showed a low average coefficient of variation, ranging from 5.6% to 6%, and the inter-observer variability among the sonographers was very low. In particular, between 20 and 24 weeks of gestation, when most of the measurements were taken, the largest per-week observed mean difference ranged from 0.4 to 0.8 mm, corresponding to a modest relative variation among sonographers of 3.6% up to 5.9%, respectively.
We now provide the results obtained by the three approaches previously described.

Normal-Based Reference Curves
To start with, we looked at the usual normal-based reference curves. The linear model selected by the Akaike Information Criterion consists of a cubic polynomial with the following regression equations: $\sigma_{GA} = -1.219 + 0.088 \times GA$. All of the covariate effects were strongly significant, although this result has to be handled with care in light of the Anderson-Darling tests and the associated diagnostic plots shown in Fig 2. In fact, it is clear that, mainly due to the strong leptokurtosis shown by the data, and even after the usual log transformation, normality cannot be assumed over the whole gestational period. In order to get a better fit, in the next section we discuss the results of the (polynomial) GAMLSS model based on the BCT distribution.
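As a rough illustration of how Normal-based centiles and Z-scores follow from such equations, the Python sketch below combines the SD equation above with the linear height equation quoted in the abstract. Using the abstract's linear equation as a stand-in for the mean curve, and treating both equations as being in millimetres on the original scale, are assumptions made only for this example; the resulting numbers are not the published reference values.

```python
from statistics import NormalDist

def sd_ga(ga_weeks):
    """SD equation reported in the text: sigma_GA = -1.219 + 0.088 * GA."""
    return -1.219 + 0.088 * ga_weeks

def mean_ga(ga_weeks):
    """ASSUMPTION: linear equation from the abstract, used as a stand-in mean curve."""
    return -4.85 + 0.78 * ga_weeks

def centile(ga_weeks, q):
    """Normal-based centile: mu_GA + kappa * sigma_GA, kappa = standard Normal quantile."""
    kappa = NormalDist().inv_cdf(q)
    return mean_ga(ga_weeks) + kappa * sd_ga(ga_weeks)

def z_score(height_mm, ga_weeks):
    """Z-score (standardized residual): (Y - mu_GA) / sigma_GA."""
    return (height_mm - mean_ga(ga_weeks)) / sd_ga(ga_weeks)

# Illustrative 5th, 50th and 95th centiles at 22 weeks under the assumptions above.
p05, p50, p95 = (centile(22, q) for q in (0.05, 0.50, 0.95))
```

A measurement lying on the 95th centile curve has a Z-score of about 1.645 by construction, which is the practical link between the centile charts and the Z-scores.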

GAMLSS and BCT Distribution
To effectively capture the evident kurtosis shown by the data, we considered a GAMLSS-BCT model with cubic polynomials to model μ, σ, ν and τ as functions of GA. The model selected by the (generalized) Akaike Information Criterion was then specified by the following set of regression equations: [...] outliers. In conclusion, we can say the BCT model provides a very good fit for the data at hand, and we can confidently use the fitted values (μ,σ,ν,τ) to obtain the centile curves at different levels q ∈ (0,1) of interest.

Nonparametric Quantile Regression
We now consider a shape-constrained (nondecreasing) spline-based quantile regression model. The introduction of specific shape information (e.g. convexity and monotonicity) to drive the fit and reduce its overall variability was needed to avoid unrealistic behavior of the more extreme centile curves induced by the scarcity of data at later GAs. As mentioned above, we examined a piecewise linear fit provided by the quantreg package and a smoother (quadratic) alternative. Figs 7 and 8 show the resulting centile curves and compare them with the standard normal-based solution. (Generalized) worm plots were produced for the fitted linear and quadratic quantile regression models. As with the GAMLSS solution, in this case too there was no strong evidence of a lack of fit.

Comparative Tables
In Tables 3-6, we first summarize the estimated reference curves for the normal-based method, the piecewise linear quantile regression, the "smooth" quantile regression, and the BCT GAMLSS centile. Then, in Table 7, we provide the median absolute deviation for the predictions obtained at different GAs and quantiles by these four techniques in order to quantify their overall agreement.

Discussion
In this study, we established new reference charts for fetal CV height derived from the largest prenatal prospective cross-sectional study ever performed. We also provide a comparison of different statistical methods to evaluate the robustness of the proposed reference charts to various distributional assumptions on the ultrasound measurements. Although a variety of strategies for the construction of reference charts have been published over the years [38], unreliable methods have still been used for fetal measurements of all biometric parameters [30] and, to the best of our knowledge, few studies have reported reference ranges for the fetal CV [22][23][24] [39]. In fact, the choice of reference charts and equations from a sample which is as similar as possible to the screened population may have a significant impact on the quality of obstetrical practice and on patient counseling when assessing observed fetal biometric parameters [40]. Therefore, the analysis was based on different techniques, which produced substantially similar results (Table 7). From Figs 7 and 8, Tables 3 and 4 and Table 6, in fact, we can see the good agreement between the (strongly parametric) normal-based curves and the more data-driven quantile regression solution. This is particularly evident between 20 and 24 weeks of gestation, when most of the measurements were performed.
Considering the balance between easier techniques and more flexible statistical methods, this suggests that, in spite of its evident lack of fit, reference tables produced by the simple normal-based technique are reliable when adopted during the standard sonographic screening weeks. On the other hand, when the focus is on extreme centiles associated with later gestational weeks, when data are scarce, it is more advisable to rely on reference intervals produced by intrinsically robust methods such as the shape-constrained nonparametric quantile regression model adopted in this work.
Naturally, one weakness to consider is the relatively low number of observed fetuses, especially at some GAs (after 33 weeks). This could be due to a number of factors. The first and most important is that in our country termination of pregnancy can be performed up to 23 weeks of gestation. Therefore, it seems unwise to perform a measurement late in gestation. The second consideration is that the diagnosis of CV anomalies must be performed after 19 weeks of gestation. However, considering the absolute number of observations after 24 weeks of gestation, stratified according to gestational age, our study still has the largest sample size when compared with other studies. From a statistical point of view, this imbalance in observations between gestational weeks is clearly reflected by the behavior of the (approximated) confidence bands around the estimated centiles shown in Figs 7 and 8. Although rarely reported in data analyses for fetal biometry, from the perspective of good clinical practice, confidence bands are crucial in order to understand how reliable/variable the adopted reference intervals are. To take an extreme example, by looking at the 1% centile [...]
The importance of adequate reference ranges arises from the concept that evolutive lesions may affect an apparently normal CV, since vermian pathologies may develop late in pregnancy or even two years postnatally [1]. This condition leads to a difficult differential diagnosis of the various vermian abnormalities [41]. More recently, a new theory has been put forward suggesting that, because the CV develops in a ventrodorsal direction rather than in a craniocaudal direction, the definition of "inferior vermian hypoplasia" should be considered incorrect. In fact, both MRI and ultrasonographic studies demonstrate a linear growth of the superior and inferior lobes throughout fetal life [42] [43].
However, by 18 weeks the communication between the fourth ventricle and the cisterna magna is covered. Therefore, the diagnosis of vermian agenesis (especially the partial form) cannot be made prior to 18 weeks' gestation [44] [45]. Hence, establishing the proper growth pattern of the fetal CV is important, since it has been demonstrated that the spectrum of dysgenetic abnormalities of the cerebellum is broad, ranging from subtle to major malformations. A disturbance of the developing process of the CV may lead to complete agenesis or hypoplasia. The most common notable CV abnormality is DWM, which is characterized by complete agenesis or hypoplasia of the CV, dilatation of the fourth ventricle and a superior displacement of the tentorium [46]. DWM is a rare brain defect which can be diagnosed in both the prenatal and neonatal periods, and its exact prevalence is not easy to determine (approximately 1:30,000 births) [12], since its prenatal diagnosis is difficult and its outcome may be favorable even in cases with vermian hypoplasia [47]. Moreover, DWM is associated in 86% of cases with other fetal abnormalities [48] and with CNS defects in 13-67%, of which agenesis of the corpus callosum is the most recognized malformation [49] [50]. The neonatal neurological outcome of DWM is well described, and the prenatal diagnosis of the complete CV agenesis form is feasible. However, the diagnosis of DWM with a hypoplastic form, or of the isolated hypoplastic form, is very difficult since the posterior fossa may not show signs as evident as those of the complete form [51] [52]. Viñals et al. [23] and more recently Zhao et al [53] demonstrated a reliable method to measure the height of the CV with 3D imaging reconstruction, although comparison of 2D with 3D imaging suggests that the quality of 2D is superior and that 3D, while easy to perform, does not overcome all the limitations of 2D imaging [54].
Over the past 20 years, magnetic resonance imaging (MRI) has gained considerable importance in the evaluation of the fetal brain, usually as a complementary tool used after detection of abnormalities with ultrasound [55]. Some authors suggest that fetal MRI studies introduce some bias in the evaluation of its sensitivity and specificity in the detection of CNS defects, concluding that ultrasound could detect CNS defects with the same accuracy, especially before 25 weeks of gestation [56]. To date, no studies with a high level of clinical evidence have demonstrated a higher accuracy of MRI compared to ultrasound in the evaluation of CNS anomalies in early pregnancy.
In 2001, Malinger et al [39] provided measurements of the height of the CV in normal prenatal development, obtained transvaginally in 101 fetuses, but they provided no information regarding the statistical method used in the study. In 2004, Achiron et al [42] established [...] construction of the curves. In general, all of these studies have some common methodological weaknesses: a selected population, a relatively low number of observed fetuses, especially at some gestational ages, and no description of the statistical methods used. In contrast to the previously published studies, we focused our attention on the statistical methodology used for the analysis of data: an adequate sample size (a reduced sample size will produce imprecise estimates for the reference intervals, especially at the extreme centiles), an unselected population (because reference data should relate to "normal" fetuses; Altman and Chitty [38] suggested that it is reasonable to exclude fetuses subsequently found to have a congenital abnormality, though they recommend the inclusion of neonatal deaths and fetuses large or small for dates at birth), the kind of study (cross-sectional) and a detailed description of the statistical method (the method needs to produce reference centiles which change smoothly with gestational age and provide a good fit to the data, while maintaining, for the sake of general usability and accessibility, as simple a statistical model as possible).
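The agreement measure reported in Table 7 can be illustrated with a short Python sketch that computes a median absolute deviation across the four methods' predictions for one GA and one centile. The prediction values below are hypothetical, and the exact definition used for Table 7 is assumed here to be the median of absolute deviations from the median across methods.

```python
from statistics import median

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median of the values."""
    m = median(values)
    return median(abs(v - m) for v in values)

# Hypothetical 5th-centile predictions (mm) at one gestational age by the four methods.
predictions = {
    "normal_based": 11.1,
    "qr_piecewise_linear": 11.3,
    "qr_smooth": 11.2,
    "bct_gamlss": 11.0,
}
mad = median_absolute_deviation(list(predictions.values()))
```

A small value relative to the measurement resolution (0.1 mm in this study) would indicate that the choice of method matters little at that GA and centile.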

Conclusions
This is the first prospective cross-sectional study on CV height with such a large sample size that follows standard statistical methods to calculate fetal biometric reference charts. The reference interval charts we propose for fetal CV height, and the corresponding reference equations, have major clinical relevance since they provide sonographers with new reference equations for obstetrical practice.