Multivariate phenotype analysis enables genome-wide inference of mammalian gene function

The function of the majority of genes in the human and mouse genomes is unknown. Investigating and illuminating this dark genome is a major challenge for the biomedical sciences. The International Mouse Phenotyping Consortium (IMPC) is addressing this through the generation and broad-based phenotyping of a knockout (KO) mouse line for every protein-coding gene, producing a multidimensional data set that underlies a genome-wide annotation map from genes to phenotypes. Here, we develop a multivariate (MV) statistical approach and apply it to IMPC data comprising 148 phenotypes measured across 4,548 KO lines. There are 4,256 (1.4% of 302,997 observed data measurements) hits called by the univariate (UV) model analysing each phenotype separately, compared to 31,843 (10.5%) hits in the observed data results of the MV model, corresponding to an estimated 7.5-fold increase in power of the MV model relative to the UV model. One key property of the data set is its 55.0% rate of missingness, resulting from quality control filters and incomplete measurement of some KO lines. This raises the question of whether it is possible to infer perturbations at phenotype–gene pairs at which data are not available, i.e., to infer some in vivo effects using statistical analysis rather than experimentation. We demonstrate that, even at missing phenotypes, the MV model can detect perturbations with power comparable to the single-phenotype analysis, thereby filling in the complete gene–phenotype map with good sensitivity. A factor analysis of the MV model’s fitted covariance structure identifies 20 clusters of phenotypes, with each cluster tending to be perturbed collectively. These factors cumulatively explain 75% of the KO-induced variation in the data and facilitate biological interpretation of perturbations. We also demonstrate that the MV approach strengthens the correspondence between IMPC phenotypes and existing gene annotation databases. Analysis of a subset of KO lines measured in replicate across multiple laboratories confirms that the MV model increases power with high replicability.

to focus this paper on the sex-genotype interaction effects themselves (for reasons we described in response to Reviewer Point P1.12 in your first set of comments) but adjusting for them is the approach we have now taken, and of course we should make that clear in the main Results section and in Supplementary Note 1.
With very best wishes, 1145 George Nicholson (on behalf of all co-authors)

Comments on revised version
Reviewer Point P 1.1 -The Methods contains some important results on robustness of the results, in particular the cross-validations to show that missing data are imputed satisfactorily. I suggest you move at least a summary of these analyses into the main Results, since you raise the 1150 problem of missing data in the introduction but don't directly address it in the main Results in the current revision. Line 230 near Fig 2 might be a good place to introduce this.
Reply: Thanks for this nice idea, which we have implemented exactly as you suggest. We have now moved the bulk of the missing-data discussion from Methods to join the missing data results in a new main Results section called Inference when data are missing, leaving only the technical details of the 1155 missing-data MV model in a Methods section called MV model when data are missing.
Reviewer Point P 1.2 -Explain clearly in the Results what the categorisation of calls as -1, 0, 1 represents (eg legend to Table 1) -this seems to be buried in the Methods Reply: Great spot, thanks, we have now added the following clarification to the legends of Tables 1 and 8: 1160 We represent calls by a number in {−1, 0, 1}, with 1 and −1 denoting significant positive and negative phenotypic perturbations respectively, and 0 denoting a lack of statistical significance.
Reviewer Point P 1.3 -Explain more clearly what the factor analysis means from a biological perspective. I took it to mean there are clusters of phenotypes that tend to get called in the same 1165 way (is that correct?). In the abstract they are described as "sparse factors" which has an obscure technical meaning, losing clarity.
Reply: Yes, your interpretation's correct. We have improved the clarity in the abstract, and removed the word "sparse", writing: A factor analysis of the MV model's fitted covariance structure identifies 20 clusters of phenotypes, with each cluster tending to be perturbed collectively. These factors cumulatively explain 75% of the KO-induced variation in the data, and facilitate biological interpretation of perturbations. We have also improved the presentation in Section 3.10: An eigendecomposition of the MV model's fitted covariance structure (Σ pooled of (26) in Methods-Cross-validation and model averaging) shows that 75% of the correlation structure is explained by the first 20 eigenvectors; Supplementary Figure 8 plots the cumulative variance explained. We rotate these eigenvectors to a sparse, interpretable set of loadings, or factors, 1180 which are visualised in Figure 8(a). The important notion of sparsity in this context, illustrated in Figure 8(a), is that the vast majority of phenotypes at any particular factor have loadings close to zero (i.e. they are coloured green). Each factor therefore defines a small cluster of phenotypes that have large positive or small negative loadings. From a biological perspective, each cluster of phenotypes tends to be perturbed collectively by gene KOs.
By examining each cluster of phenotypes, and taking into account the signs of its loadings, we manually curate labels describing the biological interpretation of each factor. For example, the first factor defines a cluster according to negative loadings on Bone Mineral Content, Bone Area, Lean mass, Body length, and Heart weight; this factor is therefore labelled 'Body size Reply: Thanks for the helpful suggestion; we have now added UV and MV estimators and their confidence bands and updated the legend to describe the new features and also to answer your question on relative precision: yes, the MV are almost always more precise, while also being usually shrunken towards zero: 1200 Experimental design of the IMPC. Each point corresponds to one animal, with data from two KO lines-labelled g andg-displayed alongside contemporaneous data from a large number of control (wild-type, or WT) animals in grey (see legend). Panels (a) and (b) show data from phenotypes p (Triglycerides) andp (Body fat percentage) respectively. Our goal is to 1205 quantify the underlying expected perturbations of the red/blue coloured points from the rolling WT baseline (illustrated with a smooth black curve), in the presence of structured experimental noise. Annotated on the plot, to the right of the red/blue measurement data of each genephenotype pair, are the posterior mean estimates from the UV and MV models,θ UV pg with empty squares andθ MV pg with filled squares (see legend), along with error bars denoting ±2 posterior 1210 SDs. In the current paper we combine the information inθ UV pg across multiple related phenotypes, such as p andp, thereby generating improved estimatorsθ MV pg . The relative means and SDs of the UV and MV estimators shown in the plot are illustrative of their general properties-MV posterior means are shrunken towards zero (towards the black curve here) relative to the UV posterior means in 90.2% of cases (phenotype-gene pairs); while MV posterior SDs are smaller 1215 than UV posterior SDs in > 99.9% of cases.
Reviewer Point P 1.5 -In the abstract there is a mention of "7.5-fold increase in statistical power". It's not clear that this really means "There are 4,256 (1.4% of 302,997 observed-data measurements) hits called by the UV model, compared to 31,843 (10.5%) hits in the observed-data results of the MV model". Suggest you expand the text in the abstract.