No Reliable Association between Runs of Homozygosity and Schizophrenia in a Well-Powered Replication Study

It is well known that inbreeding increases the risk of recessive monogenic diseases, but it is less certain whether it contributes to the etiology of complex diseases such as schizophrenia. One way to estimate the effects of inbreeding is to examine the association between disease diagnosis and genome-wide autozygosity estimated using runs of homozygosity (ROH) in genome-wide single nucleotide polymorphism arrays. Using data for schizophrenia from the Psychiatric Genomics Consortium (n = 21,868), Keller et al. (2012) estimated that the odds of developing schizophrenia increased by approximately 17% for every additional percent of the genome that is autozygous (β = 16.1, CI(β) = [6.93, 25.7], Z = 3.44, p = 0.0006). Here we describe replication results from 22 independent schizophrenia case-control datasets from the Psychiatric Genomics Consortium (n = 39,830). Using the same ROH calling thresholds and procedures as Keller et al. (2012), we were unable to replicate the significant association between ROH burden and schizophrenia in the independent PGC phase II data, although the effect was in the predicted direction, and the combined (original + replication) dataset yielded an attenuated but significant relationship between Froh and schizophrenia (β = 4.86, CI(β) = [0.90, 8.83], Z = 2.40, p = 0.02). Since Keller et al. (2012), several studies have reported inconsistent associations of ROH burden with complex traits, particularly in case-control data. These conflicting results might suggest that the effects of autozygosity are confounded by various factors, such as socioeconomic status, education, urbanicity, and religiosity, which may be associated with both real inbreeding and the outcome measures of interest.


Introduction
Close inbreeding (e.g., cousin-cousin mating) is known to decrease fitness in animals [1] and to increase risk for recessive Mendelian diseases in humans [2], a phenomenon known as inbreeding depression. Inbreeding depression is thought to occur due to evolutionary selection against genetic variants that decrease fitness (e.g., variants that increase risk of disorders) [3]. Such fitness-reducing variants should be not only rarer, but also more recessive than expected under a neutral evolution model (i.e., show directional dominance). If so, individuals with a greater proportion of their genome in autozygous stretches (two homologous segments of a chromosome inherited from a common ancestor identical by descent [IBD]) should have higher rates of disorders. This is because autozygous regions reveal the full, harmful effects of any deleterious, recessive alleles that existed on the haplotype of the common ancestor.
Whether inbreeding increases risk for complex disorders like schizophrenia is less clear. Previous studies have found that inbreeding is associated with higher rates of complex disorders [4-9]. However, sample sizes have typically been small and the possibility that confounding factors might explain the results has left the links inconclusive. Moreover, close inbreeding accounts for fewer than 1% of marriages in industrialized countries [10], and information on pedigrees going back many generations is difficult to collect reliably. For these reasons, investigators have recently begun looking at signatures of very distant inbreeding (e.g., common ancestry up to ~100 generations ago) using genome-wide single nucleotide polymorphism (SNP) data in an attempt to understand whether autozygosity increases the risk of schizophrenia and other complex diseases [11]. Autozygosity in SNP data is typically inferred from runs of homozygosity (ROHs): long, contiguous stretches (e.g., > 40) of homozygous SNPs. The proportion of the genome contained in such ROHs, Froh, can then be used to predict complex traits [12-19]. Keller et al. [11] showed that Froh is the optimal method for detecting inbreeding signals that are due to rare, recessive to partially recessive mutations, such as those thought to occur when traits are under directional selection [3]. The low variation in Froh means that large sample sizes (e.g., > 12,000) are required to uncover realistic effects of distant inbreeding on complex diseases in samples unselected for inbreeding [11].
In 2012, Keller et al. [20] used the original Psychiatric Genomics Consortium schizophrenia data (17 case-control datasets, total n = 21,831) to investigate whether Froh is associated with increased risk of schizophrenia. The authors estimated that the odds of developing schizophrenia increased by approximately 17% for every additional percent of the genome that is contained in autozygous regions (β = 16.1, CI(β) = [6.93, 25.7], p = 6 × 10⁻⁴). This was by far the largest study to that date examining the association between Froh and any psychiatric disorder, and the significant relationship between Froh and case-control status remained robust through secondary analyses of various covariate combinations, common vs. rare IBD haplotypes, and SNP thresholds used to define ROHs. These results are consistent with the hypothesis that autozygosity causally increases the risk of schizophrenia. Nevertheless, because various confounding factors may increase the likelihood of distant inbreeding as well as the probability of having offspring with schizophrenia, these results do not imply a causal relationship. For example, parents higher on schizophrenia liability may pass their higher liability to offspring and mate with more genetically similar partners (e.g., due to decreased mobility, educational opportunities, etc.).
The current study seeks to provide a well-powered, independent replication of Keller et al. (2012) [20]. In light of the growing concern about publication bias [21,22] and the dearth of well-powered replications [23,24], this follow-up analysis is a necessary step in validating the Froh-schizophrenia relationship. The present study used genome-wide SNP data from 22 independent schizophrenia case-control datasets (n = 39,830) from the PGC [25] to further examine the relationship between Froh and schizophrenia. Our replication attempt is an important contribution to the growing body of literature examining autozygosity and psychiatric disorders, and should help verify whether autozygosity estimated from ROHs is robustly related to schizophrenia risk and, by extension, can help elucidate whether schizophrenia risk alleles are biased, on average, toward recessive effects.

Results
SNP data from 28,985 schizophrenia cases and 35,017 controls were collected as detailed in Ripke et al. [25]. Quality control (QC) and analyses were conducted separately for the original and replication datasets. The "original" dataset included subjects from the PGC's SCZ1 [26] samples used by Keller et al. [20] (n = 21,868 after QC), and the "replication" dataset contained all subjects (n = 39,830 after QC) in the PGC SCZ2 [25] samples not included in the original Keller et al. study, making the replication dataset independent of the original dataset analyzed in Keller et al. Despite the number of imputed SNPs ranging from ~1.8 million to ~4.2 million in the datasets, there were not enough well-imputed SNPs in common across all 22 datasets to conduct a viable ROH analysis in the same way as in the original study (see Methods). Nevertheless, Keller et al. also reported results from ROHs estimated from unimputed SNP data, and these results were highly consistent with imputed SNPs. Therefore, our primary analyses were conducted using post-QC, unimputed genotype data. We also report results on imputed SNPs (see S5-S12 Figs and S1 Table) using slightly different QC procedures than used in the original report (see Methods), which do not change the conclusions below. While ROHs from the imputed data were called from a common SNP set, ROHs from the unimputed data were called on unique sets of SNPs for each dataset.
Keller et al. [20] found that all ROH length thresholds were significantly associated with schizophrenia, but because ROH thresholds are ultimately arbitrary, they focused their discussion on the thresholds (e.g., 110 consecutive homozygous SNPs in the unimputed data) that maximized the schizophrenia-ROH relationship. In an attempt to follow as closely as possible the method used by Keller et al., we report two sets of ROH results. The first approach, a direct replication attempt of Keller et al., defined ROHs as ≥ 110 consecutive homozygous SNPs (with median length ranging from ~1 to ~3.4 Mb, depending on sample) in the unimputed data. Because using unimputed SNP data introduces large differences in mean ROH length across datasets (when defined by number of consecutive homozygous SNPs) due to varying SNP densities, we also employed a secondary replication approach using a 2.3 Mb minimum length threshold that corresponds to the average length of 110 SNPs-in-a-row in the original report. As in the original report, we also show results across all thresholds to ensure that no results were missed. Table 1 gives the descriptive statistics for average ROH lengths and Froh across datasets, where ROHs were defined as ≥ 110 consecutive homozygous SNPs. There was wide variation in average Froh and ROH lengths between datasets, a consequence of using unimputed SNP data, which introduces more between-dataset variability in Froh and mean ROH length [20]. Across datasets, mean Froh was also higher (0.30% vs. 0.14%) and average ROH lengths shorter (1.1-3.4 Mb vs. 2.0-4.7 Mb) in the replication versus original datasets. Part of the reason for the mean Froh discrepancy seemed to be that replication datasets were genotyped on denser SNP chips, because this discrepancy was reduced when we defined ROHs as runs of homozygous SNPs ≥ 2.3 Mb (0.22% vs. 0.13%; Table 1).
The remaining higher average Froh in the replication datasets appears to be due to more samples being from countries with higher overall Froh (e.g., Sweden, Estonia, Israel) in the replication datasets; the average Froh levels were very similar across replication vs. original datasets within the same countries.

ROH burden results
We regressed case-control status on Froh using logistic regression, both within each dataset and across datasets, and controlled for 20 principal components (PCs) from the genomic relationship matrix [27] and two SNP quality measures (excess heterozygosity and SNP missingness; see Methods). In Keller et al. (2012), the authors used mixed effects models to test the ROH burden association with schizophrenia. However, in the current analysis we used fixed effect logistic regression models, treating dataset as a fixed factor, because a minority of the mixed effects models failed to converge. When the mixed effects models did converge, the results were highly similar to the respective fixed effect models. Figs 1 and S1 show the predicted change in odds of schizophrenia risk (and 95% confidence intervals) for every 1% increase in average Froh for each logistic regression in the replication data using ROHs defined by either ≥ 110 consecutive homozygous SNPs (Fig 1) or ROH length ≥ 2.3 Mb (S1 Fig). The overall association between schizophrenia and Froh in the replication data was in the predicted direction but not significant for ROHs defined as at least 110 consecutive homozygous SNPs (β = 0. 19 The results from analyses on ROHs called from imputed rather than raw SNP data were also non-significant (S5 Fig). As in Keller et al., we also explored increasingly long SNP and Mb ROH thresholds to assess the stability of the Froh-schizophrenia relationship (Figs 2 and 3). Across all thresholds, the only thresholds that approached significant associations between Froh and schizophrenia in the replication data were at the upper limits of the Mb-length ROH thresholds; the strongest association was for ROHs defined as ≥ 19 Mb (β = 8.64, CI(β) = [−0.85, 18.13], Z = 1.78, p = 0.07).
We conducted a series of follow-up analyses to ensure that the failure to replicate our original report was not due to analytical error, inclusion of outlier individuals or datasets, or suppressing covariates in the replication data. We reran the same analyses described above on SNP data from the "original" report using the exact same quality control and analytic procedures performed on the replication data. Results were virtually identical to those obtained in  Table). We noticed that there was greater variability in Froh in the replication datasets and that this greater variability was mostly driven by replication datasets that had n < 300. Under the premise that smaller samples might differ in genotypic or phenotypic quality, we excluded seven samples that contained fewer than 300 cases ("egcu", "ersw", "lie2", "pews", "top8", "umes"), reran our baseline analysis (including all covariates mentioned above and using an ROH threshold of ≥ 110 consecutive homozygous SNPs), but still observed a non-significant Froh-schizophrenia relationship (β = 1.04, CI(β) = [−3.88, 5.96], Z = 0.42, p = 0.68) in the predicted direction. Therefore, this post-hoc analysis does not lend support to the possibility that small samples in the replication set added noise to our analysis, obscuring an Froh-schizophrenia relationship.
Although results from the replication analysis were not significant, they were in the same direction as the original analysis. It could therefore be argued that the best estimate of the association between ROHs and schizophrenia is obtained by combining the two datasets. When we reran our analyses on the combined original + replication data (n = 61,661), all Froh associations based on ROH thresholds greater than 60 consecutive homozygous SNPs or longer than In this combined dataset, we also used a replication status-by-Froh interaction to conclude that the Froh-schizophrenia association was only

The effects of close versus distant inbreeding
To assess the relative importance of distant versus close inbreeding, we compared the effects of short versus long ROHs. As in the original study, we chose our ROH length threshold based on the Mb length cutoff that resulted in equal Froh variances, calculating Froh_short as the proportion of the genome contained in ROHs caused by autozygosity arising from more recent common ancestors, which predicted increased risk for schizophrenia (Fig 6).

Discussion
Despite exploring various homozygous SNP length thresholds, Mb thresholds, and combinations of covariates, the findings from this study do not lend much support to the original observation of a highly significant Froh-schizophrenia association [20], and provide only equivocal support, based on combining the original and replication data, for the hypothesis that autozygosity is a risk factor for schizophrenia.
Perhaps the simplest explanation for this pattern of results is that the conclusions about distant inbreeding from the original data represent a type-I error or that the lack of replication in the current report was a type-II error. Despite the fact that the effect in the original study was highly significant (p = 6 × 10⁻⁴) and the statistical power in the replication study to detect the observed effect size in the original study was nearly 100%, it is possible that the estimated effects of the original analysis could have been over-estimated and/or those of the replication analysis under-estimated, due to sampling variability. There is some support for this interpretation, as there was not a significant difference in results between replication versus original datasets (interaction p = 0.07).
An alternative explanation for the overall pattern of results has to do with the potential influence of unmeasured confounding factors in both the original and replication analyses. Unlike allele frequencies, which change very slowly and are unaffected by inbreeding, ROH levels can change substantially after even a single generation of inbreeding, making ROH analyses highly susceptible to confounding factors associated with both disease risk and the degree of inbreeding/outbreeding. For example, contrary to initial predictions, Abdellaoui et al. [28] identified a significant and negative ("protective") relationship between Froh and risk for major depressive disorder (MDD) in the Dutch population. However, the authors found that religiosity was significantly associated with both higher autozygosity and lower MDD in this population. When religiosity was accounted for in their regression model, the original association between MDD and Froh disappeared. A similar effect was detected for educational attainment: highly educated individuals were more likely to migrate and mate with highly educated and more diverse partners, making highly educated spouse pairs share less ancestry and leading to their offspring having lower Froh [29]. Thus, assortative mating on variables such as education or religion could subtly influence observed Froh associations, potentially affecting results in ways that can be difficult to account for. For example, an observed Froh-schizophrenia relationship could be due to parents with a higher schizophrenia liability mating with less genetically diverse mates due to, e.g., fewer educational opportunities or lower migration rates. Thus, the causation may be reversed: schizophrenia liability in parents could cause not only higher schizophrenia risk, but also higher Froh, in offspring rather than Froh in offspring increasing their schizophrenia liability.
Such reverse and third variable causation possibilities can only be tested if relevant socio-demographic variables in subjects and (optimally) their parents are collected.
The possibility of unmeasured variables confounding Froh-disorder relationships seems particularly likely in analyses conducted on ascertained samples. Ascertainment of cases and controls not perfectly matched on socio-demographic factors that might affect degree of outbreeding (e.g., socioeconomic status, education level, age, religion, urbanicity) can mask any true Froh association and bias the observed association in either direction. Such a scenario might explain otherwise contradictory findings in previous ROH case-control analyses [18,28,30-36]. For example, following two studies showing that genome-wide autozygosity was significantly associated with schizophrenia risk, including the original Keller et al. study [13,20], two newer studies failed to replicate this association [34,35], although both replication sample sizes (n = 3,400 and 11,244, respectively) were substantially smaller than the current one (n = 39,830). (It should be noted that the sample used in the latter study [35] overlapped with the samples in both the original Keller et al. [20] study and the current replication study.) Even within the same study, Froh results in ascertained samples have been inconsistent. Using PGC MDD data, Power et al. [36] found a significant positive Froh-MDD relationship in data from three German sites but a significant negative Froh-MDD relationship in six non-German sites. A possible explanation for this and other such examples of heterogeneity across sites they observed is that cases and controls differed on socio-demographic factors that were associated with Froh, and the direction of this ascertainment bias was inconsistent across data collection sites.
We believe that similar ascertainment biases could have affected results in the present study as well as in the original Keller et al. [20] report. Many of the PGC schizophrenia datasets used cases ascertained from hospitals, clinics, health surveys, and advertisements but controls from previous biomedical research volunteers, university students, blood donors, and population registries. While such differences in ascertainment between cases and controls are highly unlikely to lead to allele frequency differences, and thus are of little concern to genome-wide association studies, they could very easily lead to Froh differences due to differences in degree of inbreeding/outbreeding in the populations from which cases and controls were drawn. Controlling for ancestry principal components in this case would only help to the degree that degree of inbreeding/outbreeding is associated with ancestry. Unfortunately, none of the other variables that might statistically control for such biases due to differences in case/control ascertainment are currently available in the PGC data collection. The PGC collection of studies was designed for association analyses; it was not optimally designed for ancillary purposes, such as ROH analyses.
It is important to recognize that even ascertainment biases that differ at random across sites would substantially inflate type-I error rates because the proper degrees of freedom for the test should be closer to the number of independent sites than to the number of independent cases and controls. To demonstrate this, we permuted data under the null hypothesis of no relationship between Froh and schizophrenia in the 17 datasets from the original 2012 study by randomly flipping case or control status within each dataset for each permutation (i.e., all case and control statuses in a dataset either remained the same or were flipped to the opposite status). We then calculated the overall Froh-schizophrenia relationship with the same logistic regression model and using the same covariates as in the original analysis. Across 1,000 permutations, 183 p-values were significant (p < 0.05), implying a type-I error rate of 0.18 and demonstrating how false conclusions about Froh relationships can be reached even when ascertainment biases are random across multiple sites.
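The dataset-level permutation described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the record layout, the function names, and the test statistic (a pooled difference in mean Froh between cases and controls, used here in place of the covariate-adjusted logistic regression) are all assumptions made for the example.

```python
import random
from statistics import mean


def flip_labels_by_dataset(records, rng):
    """Permute under the null by flipping labels one whole dataset at a time.

    records: list of (dataset, is_case, froh) tuples (hypothetical layout).
    For each dataset a coin is tossed once; if it comes up "flip", every case
    in that dataset becomes a control and every control becomes a case.
    """
    flip = {}
    permuted = []
    for dataset, is_case, froh in records:
        if dataset not in flip:
            flip[dataset] = rng.random() < 0.5
        new_status = (not is_case) if flip[dataset] else is_case
        permuted.append((dataset, new_status, froh))
    return permuted


def mean_froh_difference(records):
    """Stand-in statistic (assumption): mean Froh in cases minus controls."""
    cases = [froh for _, is_case, froh in records if is_case]
    controls = [froh for _, is_case, froh in records if not is_case]
    return mean(cases) - mean(controls)


def permutation_null(records, n_perm=1000, seed=0):
    """Return the statistic computed on n_perm dataset-level permutations."""
    rng = random.Random(seed)
    return [
        mean_froh_difference(flip_labels_by_dataset(records, rng))
        for _ in range(n_perm)
    ]
```

Because labels are flipped for a whole dataset at once, any site-specific difference in Froh between cases and controls is preserved in magnitude but randomized in sign, which is what makes the number of sites, rather than the number of individuals, the relevant unit. To mirror the analysis in the text, one would refit the regression on each permuted dataset and count the proportion of p-values below 0.05.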

Conclusion
Given concerns about the false discovery rate in science [22], there has been increasing emphasis on the need for well-powered, direct replications of novel findings in genetics [23,37,38] and other fields [39-41]. The current study was a well-powered, direct replication attempt that failed to replicate an earlier finding that autozygosity arising from distant common ancestors was significantly associated with schizophrenia. As is typical with null findings, it is difficult to identify the reason for this failure to replicate. However, we have argued that a likely cause is that ROH associations are highly susceptible to confounding, especially in case-control (ascertained) samples. Thus, we believe that the conclusions of the original study were premature and the true causal relationship between schizophrenia and autozygosity could be either stronger/more positive (if the populations from which controls were ascertained were, on average, slightly less outbred than populations from which cases were ascertained) or weaker/more negative (the reverse) than reported here. Unfortunately, we do not have the ability to test these hypotheses directly in the current datasets, and doing so awaits either new samples in which cases and controls are carefully matched or the collection of information that allows potential confounders to be statistically controlled. This creates a dilemma for ROH analyses using existing case-control genome-wide data: GWAS datasets usually do not match cases and controls to the degree necessary to rule out confounding effects on ROH analyses and typically do not collect the relevant socio-demographic information necessary to control for potential confounders. The current study therefore serves as a cautionary tale for analyzing ROHs in existing ascertained GWAS datasets. Such datasets may be perfectly adequate for their designed purpose, GWAS, but may be problematic and even misleading for ROH analyses.

Psychiatric Genomics Consortium GWAS Data
Our study used 37 datasets from the Psychiatric Genomics Consortium's SCZ2 data; these data included 28,985 schizophrenia cases and 35,017 controls, collected from 37 sites in 13 countries. Data collection and ascertainment details are described elsewhere [25]. Keller et al. [20] used 17 datasets from the PGC SCZ1 [26] data. Several of these original 17 studies recruited additional subjects by the time of our study, necessitating two well-defined, independent datasets: one including all of the individuals analyzed in the original 2012 study ("original" dataset), and one containing only subjects not included in Keller et al.'s 2012 report (the "replication" dataset, comprising 22 studies and a total sample size of 18,562 cases and 21,268 controls after QC; see Table 1). Three of the original case-control datasets from the PGC's SCZ1 added more subjects and/or controls in SCZ2, but only two of these datasets had enough subjects to pass QC and merit inclusion in the current study; thus there is a "top8" dataset (N = 180) in this replication study, comprising the samples that were added to the "top3" dataset (N = 598) from the original 2012 study, and a "boco" dataset (N = 1,870), which includes the new cases and controls that were added to the original "bon" dataset (N = 1,778). For consistency with the original Keller et al. (2012) study [20], we excluded the three family-based datasets of parent-proband trios and three East Asian datasets.

Quality Control (QC) Procedures-Raw SNP Data
We followed the same QC procedures as Keller et al. [20]. We removed a) one individual from any pair of individuals who were related with π̂ > 0.2; b) individuals with non-European ancestry as determined by principal components analysis; c) samples with SNP missingness > 0.02; and d) samples with genome-wide heterozygosity > 6 standard deviations above the mean. SNPs were excluded if they a) deviated from Hardy-Weinberg equilibrium at p < 1 × 10⁻⁶; b) had missingness > 0.02; or c) had a missingness difference between cases and controls > 0.02.
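The threshold-based parts of these filters can be expressed compactly. The sketch below is illustrative only and covers just the missingness, heterozygosity, and Hardy-Weinberg criteria; the relatedness and ancestry filters are omitted, and the input layout and function names are hypothetical rather than taken from the study's pipeline.

```python
from statistics import mean, stdev


def samples_passing_qc(samples, max_missing=0.02, het_sd_limit=6.0):
    """Return the ids of samples that pass the two numeric sample filters.

    samples: list of dicts with keys 'id', 'missingness', 'heterozygosity'
    (hypothetical layout). A sample is dropped if its SNP missingness is
    above max_missing or its genome-wide heterozygosity is more than
    het_sd_limit standard deviations above the sample mean.
    """
    het_values = [s["heterozygosity"] for s in samples]
    het_cutoff = mean(het_values) + het_sd_limit * stdev(het_values)
    return [
        s["id"]
        for s in samples
        if s["missingness"] <= max_missing and s["heterozygosity"] <= het_cutoff
    ]


def snp_passes_qc(hwe_p, missingness, case_missing, control_missing,
                  hwe_threshold=1e-6, max_missing=0.02, max_missing_diff=0.02):
    """Apply the three SNP-level exclusion criteria to one SNP."""
    return (
        hwe_p >= hwe_threshold
        and missingness <= max_missing
        and abs(case_missing - control_missing) <= max_missing_diff
    )
```

Whether the heterozygosity mean and standard deviation were computed within each dataset or across all samples is not stated in the text; the sketch simply uses whatever set of samples it is given.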

QC Procedures-Imputed SNP Data
Early in the analysis process, we found that only including SNPs with imputation dosage r² > 0.90 across all datasets, as was done in the original study [20], left us with too few SNPs with which to conduct viable ROH analyses in the replication data. Because having ROHs of similar length and SNP density is important for comparing present results to those from the 2012 study, we decided that having a similar number of SNPs to Keller et al. [20] was more important than following the exact same QC procedures. Thus, to arrive at a similar number of genome-wide SNPs in the new and old datasets, some of the QC measures described below were different than in the 2012 investigation.
SNPs were imputed using the 1000 Genomes reference panel [42]; imputation procedures are described elsewhere [25]. Imputation dosages were converted to best-guess (highest posterior probability) SNP calls because ROH detection algorithms require discrete SNP calls, and extremely stringent QC thresholds were employed to achieve accuracy rates similar to those in genotyped SNPs [43]. We excluded any imputed SNPs that were not included in the HapMap3 [44] reference panel, as done in the 2012 study. Unlike the original QC procedures, we did not require the dosage r² to be > 0.90 in each individual dataset. We excluded any imputed SNPs that had a dosage r² < 0.98 or > 1.02 in the overall sample (calculated using average dosage r² weighted by sample size) or that had MAF < 0.15 within each sample (vs. 0.05 in original), leaving 340,084 high-quality imputed SNPs (vs. 398,325 in original).

ROH Calling Procedures
Again, we followed the same ROH calling procedures as in Keller et al. [20]. As recommended in a separate investigation [45] by three of the authors of the present study, we chose PLINK software [46] for its computational efficiency and superior detection of autozygous stretches. As in the 2012 study, we pruned for LD using PLINK's --indep flag, which ensures more uniform SNP coverage across the genome and reduces false autozygosity calls by removing redundant markers. We pruned SNPs for LD using a VIF threshold of 10, which is equivalent to multiple R² > 0.90 between the focal SNP and the 50 surrounding SNPs.
We called ROHs using PLINK's --homozyg flags, defining initial ROHs as being ≥ 40 homozygous SNPs in a row with no heterozygote calls allowed. We required that ROHs have a density greater than 1 SNP per 200 kb, and split an ROH into two if a gap > 500 kb existed between consecutive homozygous SNPs. We then post-processed the initial ROH calls by altering the SNPs-in-a-row threshold and the Mb length threshold; specifically, we looked at ROH calls with a minimum of 40 to 200 consecutive homozygous SNPs in increments of 10, and ROH calls with minimum lengths ranging from 1 to 20 Mb by increments of 1 Mb. We varied ROH thresholds this widely to ensure that no potential effects of autozygosity were missed, but the primary results presented here are based on two replication attempts in the unimputed data: (a) using the same SNP thresholds that gave the most straightforward comparison with the original report (this was 110 SNPs-in-a-row for the unimputed data, spanning ~1 to ~2.1 Mb in the replication datasets, and 65 SNPs-in-a-row for the imputed data), and (b) using the physical length threshold (2.3 Mb) that corresponded to the average Mb length for 110 SNPs-in-a-row in the original report.
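The correspondence between the VIF threshold and R² follows from the standard definition VIF = 1 / (1 − R²). A short check of that arithmetic, with function names chosen only for this illustration:

```python
def vif_to_r2(vif):
    """Multiple R^2 implied by a variance inflation factor: R^2 = 1 - 1/VIF."""
    return 1.0 - 1.0 / vif


def r2_to_vif(r2):
    """Variance inflation factor implied by a multiple R^2: VIF = 1/(1 - R^2)."""
    return 1.0 / (1.0 - r2)
```

A VIF of 10 therefore corresponds to R² = 0.90, matching the equivalence stated above.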
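The post-processing step, in which the initial calls are re-filtered under progressively stricter thresholds, might look like the sketch below. The tuple layout and function name are hypothetical, and ROH length is taken as end position minus start position, which is an assumption about the length convention.

```python
# Threshold grids described in the text: 40 to 200 SNPs in steps of 10,
# and 1 to 20 Mb in steps of 1 Mb.
SNP_THRESHOLDS = range(40, 201, 10)
MB_THRESHOLDS = range(1, 21)


def filter_rohs(rohs, min_snps=None, min_mb=None):
    """Keep only the ROH calls that satisfy the given minimum thresholds.

    rohs: list of (n_snps, start_bp, end_bp) tuples for initial ROH calls
    (hypothetical layout). Either threshold may be None to skip it.
    """
    kept = []
    for n_snps, start_bp, end_bp in rohs:
        length_mb = (end_bp - start_bp) / 1e6
        if min_snps is not None and n_snps < min_snps:
            continue
        if min_mb is not None and length_mb < min_mb:
            continue
        kept.append((n_snps, start_bp, end_bp))
    return kept
```

For example, `filter_rohs(calls, min_snps=110)` would correspond to the SNPs-in-a-row replication threshold and `filter_rohs(calls, min_mb=2.3)` to the physical-length threshold, where `calls` is a placeholder for one individual's initial ROH calls.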

ROH Burden Analysis
After calling ROHs, we summed the total length of all autosomal ROHs for each individual and divided that by the total SNP-mappable distance (2.77 × 10⁹ bases) to calculate Froh. Froh, the proportion of the genome contained in long homozygous regions, was used as the predictor of schizophrenia case-control status in analyses described below. As confounding factors such as population stratification, SNP missingness, call quality, and plate effects can influence Froh, we included the first 20 principal components (based on a genome relationship matrix calculated from ~30K LD-pruned SNPs), percentage of missing SNP calls in the raw data, and excess heterozygosity in all regression models [20]. We then regressed case-control status on Froh using a mixed effects logistic regression model (available in the lme4 package in R version 3.1.0), treating dataset as a random factor, to assess the overall effect of Froh on schizophrenia across all sites. Some of the models with random effects did not converge; thus, for consistency, we modeled dataset as a fixed factor for all analyses. The results from mixed effects models that converged were very similar to fixed effects models, giving us confidence that the fixed effects results of this analysis and the random effect results from the original Keller et al. (2012) study are commensurate. We also ran logistic regressions in each of the 22 datasets separately.
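As a worked illustration of the Froh calculation and of how a logistic regression coefficient translates into a change in odds per additional percent of the genome in ROHs, consider the sketch below. The function names are hypothetical, and the conversion assumes Froh enters the model as a proportion, so that a one-percent increase corresponds to 0.01 on the predictor scale.

```python
import math

# Total SNP-mappable autosomal distance used as the denominator in the text.
SNP_MAPPABLE_BP = 2.77e9


def froh(roh_lengths_bp, genome_bp=SNP_MAPPABLE_BP):
    """Proportion of the genome in ROHs: summed ROH length / mappable length."""
    return sum(roh_lengths_bp) / genome_bp


def odds_multiplier_per_percent(beta):
    """Multiplicative change in odds for a 0.01 increase in Froh.

    Assumes beta is the log-odds change per unit of Froh on the proportion
    scale, so a one-percent increase contributes beta * 0.01 to the log-odds.
    """
    return math.exp(beta * 0.01)
```

Under that assumption, the original estimate of β = 16.1 gives an odds multiplier of about 1.17, consistent with the reported increase of approximately 17% in the odds of schizophrenia per additional percent of the genome that is autozygous.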

Ethics Statement
This research was approved by CU Boulder's Institutional Review Board with regard to protocol number 13-0266 on 3/29/2016 in accordance with Federal Regulations at 45 CFR 46. Written patient consent was obtained for each individual study by the study PI, with the exception of the "clm3" and "clo3" datasets, which obtained anonymous samples via a drug monitoring service under ethical approval and in accordance with the UK Human Tissue Act.
Supporting Information
S1 Table. Results from follow-up analyses to ensure that failure to replicate was not due to inclusion of outlier individuals or datasets, or suppressing covariates in the replication data.
data (colored) and for the total sample (black) from the unimputed SNP data. Boxes are proportional to the square root of sample sizes (also shown at the bottom). Dataset names are on the x-axis. (While the y-axis is cut off at 3 for clarity, it should be noted that the upper limit of the 95% confidence interval is 4.1 for the "muc" dataset and 5.4 for the "top3" dataset.) Only one of the individual estimated odds ratios significantly differs from one (the "muc" dataset), but the overall effect (black) is significant (Beta = 16.83, p = 0.000357). (TIFF)