Region-Based Association Test for Familial Data under Functional Linear Models

Region-based association analysis is a more powerful tool for gene mapping than testing individual genetic variants, particularly for rare genetic variants. The most powerful methods for regional mapping are based on the functional data analysis approach, which assumes that the regional genome of an individual may be considered as a continuous stochastic function that contains information about both linkage and linkage disequilibrium. Here, we extend this powerful approach, previously applied only to independent samples, to samples of related individuals. To this end, we additionally include a random polygenic effect in the functional linear model used for testing association between quantitative traits and multiple genetic variants in the region. We compare the statistical power of different methods using the Genetic Analysis Workshop 17 mini-exome family data and a wide range of simulation scenarios. Our method increases the power of regional association analysis of quantitative traits compared with burden-based and kernel-based methods in most of the scenarios. In addition, we estimate the statistical power of our method using regions with a small number of genetic variants and show that our method retains its advantage over burden-based and kernel-based methods in this case as well. The new method is implemented as the R function ‘famFLM’ using two types of basis functions: the B-spline and Fourier bases. We compare the properties of the new method using models that differ from each other in the type of their function basis. The models based on the Fourier basis functions have an advantage in terms of speed and power over the models that use the B-spline basis functions and those that combine B-spline and Fourier basis functions. The ‘famFLM’ function is distributed under the GPLv3 license and is freely available at http://mga.bionet.nsc.ru/soft/famFLM/.


S2 Note. FLM-based association analysis under different ratios between $m$, $K_G$ and $K_\beta$

Necessary conditions for performance of FLM-based method
In the framework of the functional linear model (FLM), we fix the number of basis functions, $K_G$ for estimating the genetic variant function (GVF) and $K_\beta$ for estimating the beta-smooth function (BSF), according to the type of function basis, which can differ between the GVF and the BSF.
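To perform association analysis using Model (5), we must estimate the unknown coefficients $\beta_F$ by solving the equation

$$\left(W^{T}G^{T}\Omega^{-1}GW\right)\hat{\beta}_F = W^{T}G^{T}\Omega^{-1}y, \qquad \text{(S1)}$$

where $y$ is the trait vector and the matrices $\Omega$, $G$, and $W$ have dimensions $n \times n$, $n \times m$, and $m \times K_\beta$, respectively. Inverting the $K_\beta \times K_\beta$ matrix $W^{T}G^{T}\Omega^{-1}GW$ requires it to have full rank $K_\beta$, whereas its rank cannot exceed the number of columns of $G$, which is $m$. Therefore, to evaluate $\beta_F$ unambiguously, we must restrict the number of basis functions as

$$K_\beta \le m.$$

To introduce additional, stricter restrictions that prevent over-parameterization in Model (5), we represent the matrix $W$ as a product of two matrices $W_1$ and $W_2$:

$$W = W_1 W_2, \qquad W_1 = \Phi\left(\Phi^{T}\Phi\right)^{-1}, \qquad W_2 = \int \phi(t)\,\psi^{T}(t)\,dt, \qquad \text{(S2)}$$

where $\Phi$ is the $m \times K_G$ matrix of the GVF basis functions $\phi(t)$ evaluated at the positions of the $m$ variants and $\psi(t)$ is the vector of the $K_\beta$ BSF basis functions, so that $W_1$ and $W_2$ have dimensions $m \times K_G$ and $K_G \times K_\beta$, respectively. Then the matrix $W^{T}G^{T}\Omega^{-1}GW$ in (S1) is transformed into $W_2^{T}W_1^{T}G^{T}\Omega^{-1}GW_1W_2$, and additional restrictions must be imposed on the dimensions of $W_1$ and $W_2$. Inverting this matrix requires, first, that $W_1$ exist, i.e., that the $K_G \times K_G$ matrix $\Phi^{T}\Phi$, whose rank cannot exceed $m$, be invertible, and second, that $W_1W_2$ have full column rank $K_\beta$, whereas the rank of $W_2$ cannot exceed $K_G$. As a result, we must first restrict the number of basis functions for smoothing the genotypes, $K_G$, and then the number for smoothing the betas, $K_\beta$:

$$K_G \le m, \qquad K_\beta \le K_G.$$

Therefore, if the declared number of basis functions used for the GVF, $K_G$, exceeds $m$, $K_G$ is reduced to $m$. Next, if the declared number of basis functions used for the BSF, $K_\beta$, exceeds $K_G$, $K_\beta$ is reduced to $K_G$.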
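To see why the order of the two restrictions matters, here is a minimal Python sketch, assuming toy integer matrices in place of real basis-function values: when more GVF basis functions are declared than there are variants ($K_G > m$), the $K_G \times K_G$ matrix $\Phi^{T}\Phi$ has rank at most $m$, so $W_1 = \Phi(\Phi^{T}\Phi)^{-1}$ cannot be formed. The helper names `rank`, `gram`, and `clamp_basis_counts` are hypothetical and are not part of the ‘famFLM’ R function.

```python
from fractions import Fraction
import random


def rank(a):
    """Exact rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in a]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def gram(phi):
    """Phi^T Phi for an m x K_G matrix Phi given as a list of m rows."""
    k = len(phi[0])
    return [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]


def clamp_basis_counts(m, k_g, k_beta):
    """Hypothetical helper: apply K_G <= m first, then K_beta <= K_G."""
    k_g = min(k_g, m)
    k_beta = min(k_beta, k_g)
    return k_g, k_beta


rng = random.Random(17)
m, k_g = 5, 8  # toy region: 8 declared GVF basis functions, only 5 variants
phi = [[rng.randint(-3, 3) for _ in range(k_g)] for _ in range(m)]  # stand-in for basis values
print(rank(gram(phi)), "<=", m, "<", k_g)  # the 8 x 8 matrix Phi^T Phi is singular
print(clamp_basis_counts(m, k_g, 6))       # (5, 5)
```

The last call shows the two-step reduction: $K_G$ is clamped to $m = 5$ first, and only then is $K_\beta$ clamped to the new $K_G$.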

Equivalence of several functional linear models
In general, all the functional linear models under consideration have the form of Model (5) (equation (S4)), and their F-test statistics depend on the matrix $W$ through the matrix $P$ (S5). The difference between the models with and without smoothing of the GVF lies in the way the matrix $W$ is constructed. For all models using both the GVF and the BSF (i.e., the F-F, B-B, F-B, and B-F models), $W$ is formed as the product of the two matrices $W_1$ and $W_2$ defined by expressions (S2), whereas for the models without the GVF (0-F and 0-B), $W$ is the $m \times K_\beta$ matrix of the BSF basis functions evaluated at the variant positions. For the models with both the GVF and the BSF, the matrix $P$ takes the form

$$P = GW_1W_2\left(W_2^{T}W_1^{T}G^{T}\Omega^{-1}GW_1W_2\right)^{-1}W_2^{T}W_1^{T}G^{T}\Omega^{-1}. \qquad \text{(S5)}$$

If $m > K_G$ and $K_G = K_\beta$, the matrix $W_2$ is square and invertible, and hence it can be canceled in the expression for $P$ (S5). Moreover, decomposing $W_1$ into $\Phi$ and $(\Phi^{T}\Phi)^{-1}$ (see formula (S2)) allows us to cancel the matrix $(\Phi^{T}\Phi)^{-1}$ in the expression for $P$ (S5) as well. As a result, the matrix $W$ is reduced to the matrix $\Phi$. Therefore, the models within each of the groups (0-F, F-F, and F-B) and (0-B, B-B, and B-F) do not differ from each other, because the matrix $\Phi$ for the models with the GVF and the matrix $W$ for the models without the GVF are identical.
If $m \ge K_G$ and $K_G > K_\beta$, the matrix $W_2$ is not square, hence not invertible, and it cannot be canceled in the expression for $P$ (S5). As a result, all the models differ from each other, because the matrices $W$ in the models with both the GVF and the BSF and in the models with only the BSF are constructed differently.
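The cancellation argument for $P$ (S5) can be checked numerically. The following Python sketch is illustrative only: it uses exact rational arithmetic and small random integer matrices standing in for the genotypes $G$, the covariance matrix $\Omega$, and the basis matrices, and it assumes that $P$ is the hat matrix $GW(W^{T}G^{T}\Omega^{-1}GW)^{-1}W^{T}G^{T}\Omega^{-1}$ implied by (S1). With a square invertible $W_2$ ($K_\beta = K_G$), $P$ coincides with the matrix obtained from $\Phi$ alone; with $K_\beta < K_G$ it does not, because the trace of such a projection equals its rank.

```python
from fractions import Fraction
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    k = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(k)]
           for i, row in enumerate(a)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def projection(G, W, omega_inv):
    """P = GW (W^T G^T Omega^-1 G W)^-1 W^T G^T Omega^-1 (assumed form, from (S1))."""
    X = matmul(G, W)
    xt_oi = matmul(transpose(X), omega_inv)
    return matmul(matmul(X, inverse(matmul(xt_oi, X))), xt_oi)


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


rng = random.Random(2016)


def full_column_rank(rows, cols):
    """Identity block stacked on random integers: rank == cols by construction."""
    eye = [[int(i == j) for j in range(cols)] for i in range(cols)]
    return eye + [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows - cols)]


n, m, k_g = 6, 4, 3
G = full_column_rank(n, m)  # toy stand-in for genotypes, not real data
L = [[rng.randint(-2, 2) if j <= i else 0 for j in range(n)] for i in range(n)]
LLt = matmul(L, transpose(L))
omega = [[LLt[i][j] + int(i == j) for j in range(n)] for i in range(n)]  # SPD toy Omega
omega_inv = inverse(omega)

Phi = full_column_rank(m, k_g)  # stand-in for GVF basis values at m positions
W1 = matmul(Phi, inverse(matmul(transpose(Phi), Phi)))

# K_beta == K_G: a unit upper-triangular W2 is square and invertible.
W2_square = [[1 if i == j else (rng.randint(-3, 3) if j > i else 0) for j in range(k_g)]
             for i in range(k_g)]
# K_beta < K_G: a 3 x 2 W2 cannot be inverted.
W2_rect = full_column_rank(k_g, 2)

P_phi = projection(G, Phi, omega_inv)
P_equal = projection(G, matmul(W1, W2_square), omega_inv)
P_fewer = projection(G, matmul(W1, W2_rect), omega_inv)

print(P_equal == P_phi)              # True: W2 and (Phi^T Phi)^-1 cancel
print(trace(P_phi), trace(P_fewer))  # 3 2: the K_G > K_beta model is different
```

Exact `Fraction` arithmetic keeps the equality test free of floating-point rounding; a floating-point version would compare the matrices with a tolerance instead.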
In our study, we fixed the number of basis functions at 25 for the Fourier series and at 15 for the B-splines. Therefore (assuming $m > 25$):
- The F-B model ($K_G = 25 > K_\beta = 15$) is equivalent to none of the other models used.
- In the B-F model, $K_\beta$ (originally declared as 25) is automatically reduced to 15 to meet the condition $K_\beta \le K_G$. This model is therefore equivalent to the 0-B and B-B models.
In summary, the six models used in our study reduce to three mutually distinct models: 0-B (identical to B-B and B-F), 0-F (identical to F-F), and F-B.
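The grouping above can be reproduced mechanically from the reduction rules $K_G \le m$ and $K_\beta \le K_G$. A minimal Python sketch, assuming, as stated in the text, that a square $W_2$ is invertible and that an $m \times m$ matrix $W$ is invertible; the model labels are read as "GVF basis-BSF basis" with "0" meaning no GVF, and the names `DECLARED`, `equivalence_key`, and `equivalence_classes` are hypothetical:

```python
DECLARED = {"F": 25, "B": 15}  # basis counts fixed in the study: Fourier 25, B-spline 15
MODELS = ["0-F", "0-B", "F-F", "B-B", "F-B", "B-F"]


def equivalence_key(model, m):
    """Key shared by models whose P matrices coincide (hypothetical helper)."""
    gvf, bsf = model.split("-")
    if gvf == "0":
        basis, k = bsf, min(DECLARED[bsf], m)  # only K_beta <= m applies
    else:
        k_g = min(DECLARED[gvf], m)            # K_G <= m first
        k_beta = min(DECLARED[bsf], k_g)       # then K_beta <= K_G
        if k_beta < k_g:                       # W2 is not square: no cancellation
            return (model, k_g, k_beta)
        basis, k = gvf, k_g                    # W2 cancels, leaving Phi
    return ("LMM",) if k == m else (basis, k)  # an m x m W cancels entirely


def equivalence_classes(m, models=MODELS):
    classes = {}
    for model in models:
        classes.setdefault(equivalence_key(model, m), []).append(model)
    return sorted(classes.values())


print(equivalence_classes(m=100))  # [['0-B', 'B-B', 'B-F'], ['0-F', 'F-F'], ['F-B']]
print(equivalence_classes(m=10))   # [['0-F', '0-B', 'F-F', 'B-B', 'F-B', 'B-F']]
```

For a 100-variant region the sketch returns the three distinct models listed in the summary; for a 10-variant region every model collapses to the linear mixed model, which is the small-region case treated in the next subsection.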

Analysis of regions with small number of genetic variants
When the number of genetic variants in a region is small (smaller than the number of basis functions), we avoid model over-parameterization by restricting the number of basis functions, reducing $K_G$ and $K_\beta$ to the number of variants, $m$, in the region of interest.
As a result, when $K_G$ and $K_\beta$ both become equal to $m$, the functional linear model (5), described by equation (S4), reduces to the simpler linear mixed model (1), because the matrix $W$ becomes square and invertible and can therefore be canceled in the expression for $P$ (S5).
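The cancellation can be written out explicitly. A minimal derivation, assuming that $P$ is the generalized-least-squares hat matrix implied by (S1) and that $G^{T}\Omega^{-1}G$ is invertible (i.e., $G$ has full column rank $m$):

```latex
\begin{aligned}
W &= W_1 W_2 = \Phi\left(\Phi^{T}\Phi\right)^{-1} W_2
   && (m \times m,\ \text{invertible when } K_G = K_\beta = m),\\
P &= GW\left(W^{T}G^{T}\Omega^{-1}GW\right)^{-1}W^{T}G^{T}\Omega^{-1}\\
  &= GW\,W^{-1}\left(G^{T}\Omega^{-1}G\right)^{-1}\left(W^{T}\right)^{-1}W^{T}G^{T}\Omega^{-1}\\
  &= G\left(G^{T}\Omega^{-1}G\right)^{-1}G^{T}\Omega^{-1}.
\end{aligned}
```

The last expression contains no basis functions at all: it is the projection of the model in which the $m$ variants enter directly through $G$. For the models without the GVF (0-F and 0-B) the same steps apply with $W$ itself being the $m \times m$ matrix of BSF basis values.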
Therefore, the proposed method offers a benefit over the simple approach based on the linear mixed model only for regions in which the number of variants exceeds the number of basis functions.