PLOS Genetics

Figure 1.

Distribution of rare variants per gene in Baylor and Broad data sets after filtering.
Minor allele counts (MAC) are restricted to variants with minor allele frequency . Panel (A): distribution of the mean MAC per sample, averaged over all genes. Panel (B): genes were binned by their rare-variant count in the Baylor samples (counts range from 1 to 30); for each bin, the boxplot on the vertical axis shows the distribution of rare-variant counts for the same genes in the Broad samples. The red line marks equal counts in Broad and Baylor.
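The per-gene quantity behind this figure can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes genotypes are coded as 0/1/2 alternate-allele counts in a samples × variants array with a gene label per variant, ignores missing genotypes, and uses a hypothetical `maf_threshold` in place of the cutoff stated in the caption. Each column is recoded to count the minor allele before summing within a gene.

```python
import numpy as np


def rare_mac_per_gene(genotypes, gene_ids, maf_threshold=0.01):
    """Per-sample minor allele count (MAC) of rare variants in each gene.

    genotypes: (n_samples, n_variants) array of 0/1/2 alternate-allele counts.
    gene_ids: length-n_variants labels giving the gene of each variant.
    maf_threshold: hypothetical rare-variant cutoff (an assumption, not the paper's value).
    Returns (genes, mac), where mac has shape (n_samples, n_genes).
    """
    g = np.asarray(genotypes, dtype=float)
    gene_ids = np.asarray(gene_ids)
    alt_freq = g.mean(axis=0) / 2.0
    # Recode each column so it counts copies of the minor allele.
    minor = np.where(alt_freq <= 0.5, g, 2.0 - g)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    rare = (maf > 0) & (maf < maf_threshold)  # polymorphic and below the cutoff
    genes = np.unique(gene_ids)
    mac = np.column_stack(
        [minor[:, rare & (gene_ids == gene)].sum(axis=1) for gene in genes]
    )
    return genes, mac


# Toy usage with simulated genotypes and two hypothetical gene labels.
rng = np.random.default_rng(0)
geno = rng.binomial(2, 0.003, size=(500, 40))
labels = np.repeat(["GENE_A", "GENE_B"], 20)
genes, mac = rare_mac_per_gene(geno, labels)
mean_mac_per_sample = mac.mean(axis=1)  # averaged over genes, one value per sample
```

Panel (B) would then compare, gene by gene, the column totals of `mac` computed separately for the two data sets.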

Figure 2.

Theoretical power comparison: meta- versus mega-analysis.
Theoretical power functions of meta- (red) and mega-analysis (blue) at significance level of . is the strength of signal per variant and is the number of rare variants. (A) ; (B) ; (C) ; and (D) .

Figure 3.

Simulation of power.
The empirical power of SKAT applied to Broad (blue), Baylor (green), and the two data sets combined via mega-analysis (red) and meta-analysis (orange). We use causal variants to generate the phenotype based on the model in Eqn. 1 with . The causal rate is the fraction of variants with , and it varies from 20% to 50%. We choose weights and use SKAT to calculate the p-values for the Baylor, Broad, and merged data sets, combining all singleton variants into one super-variant. For the meta-analysis, the weighted Z-score method combines the Baylor and Broad p-values for each gene. In panel (A), and the significance level is set at .001; in panel (B), and the significance level is set at .01.
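Two steps named in this caption, the weighted Z-score combination of per-study p-values and the collapsing of singletons into a super-variant, can be sketched as follows. Both are illustrations under stated assumptions: each p-value is mapped to Z = Φ⁻¹(1 − p), the weights are supplied by the caller (square roots of sample sizes are a common choice but are not confirmed here), and the super-variant is assumed to be the sum of the singleton columns. Minor-allele coding of the genotype matrix is also assumed.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def weighted_z_meta(p_values, weights):
    """Combine per-study p-values for one gene with the weighted Z-score method.

    Each p_i is mapped to Z_i = Phi^{-1}(1 - p_i); the combined statistic
    sum(w_i * Z_i) / sqrt(sum(w_i^2)) is referred to the standard normal upper tail.
    """
    z_sum, w_sq = 0.0, 0.0
    for p, w in zip(p_values, weights):
        p = min(max(p, 1e-300), 1.0 - 1e-16)  # keep inv_cdf inside (0, 1)
        z_sum += w * -_STD_NORMAL.inv_cdf(p)  # equals Phi^{-1}(1 - p)
        w_sq += w * w
    z = z_sum / math.sqrt(w_sq)
    return _STD_NORMAL.cdf(-z)  # 1 - Phi(z), computed without cancellation


def collapse_singletons(minor_counts):
    """Replace all singleton variants in a gene by one summed super-variant column."""
    g = np.asarray(minor_counts)
    singleton = g.sum(axis=0) == 1  # total minor allele count of exactly 1
    if not singleton.any():
        return g
    super_variant = g[:, singleton].sum(axis=1, keepdims=True)
    return np.hstack([g[:, ~singleton], super_variant])


# Hypothetical weights: square roots of two made-up study sizes.
p_meta = weighted_z_meta([0.03, 0.2], weights=[math.sqrt(900), math.sqrt(500)])
```

With equal weights, two p-values of 0.05 combine to roughly 0.01, since Z = 1.645·√2 ≈ 2.33.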

Figure 4.

Q–Q plot of simulated tests under the assumption that linkage disequilibrium among rare variants has little impact on the distribution of the test statistic.
We select 144 genes from the Broad data set, each with exactly rare variants, . For each gene, we first randomly assign phenotypes to the 913 samples by a coin toss, then calculate the test statistic and the corresponding p-value, computed under the assumption that . Repeating this 100 times per gene yields 14,400 p-values.
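The simulation loop can be sketched as below. The statistic is a hypothetical stand-in, not the one used in the paper: a burden score U = Σᵢ (yᵢ − ȳ) Σⱼ gᵢⱼ whose null variance drops the between-variant covariance terms, which is one concrete way to encode "LD among rare variants has little impact". Phenotypes are drawn by a fair coin per sample (an assumption), and the genotype matrices in the usage lines are simulated stand-ins for the 144 Broad genes.

```python
import math

import numpy as np


def burden_p_no_ld(genotypes, phenotype):
    """Two-sided p-value of a burden score statistic whose variance ignores LD.

    Hypothetical statistic: U = sum_i (y_i - ybar) * sum_j g_ij. Its null variance is
    approximated by ybar * (1 - ybar) * sum_j sum_i (g_ij - gbar_j)^2, i.e. all
    covariance terms between variants (their LD) are assumed to be zero.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    ybar = y.mean()
    u = float((y - ybar) @ g.sum(axis=1))
    var_u = ybar * (1.0 - ybar) * float(((g - g.mean(axis=0)) ** 2).sum())
    if var_u == 0.0:
        return 1.0  # no variation in phenotype or genotypes
    return math.erfc(abs(u) / math.sqrt(2.0 * var_u))  # P(|N(0,1)| >= |U|/sd)


def coin_toss_null_p_values(genes, n_reps=100, seed=0):
    """Null p-values from coin-toss case/control labels, n_reps per gene."""
    rng = np.random.default_rng(seed)
    p_values = []
    for g in genes:
        for _ in range(n_reps):
            y = rng.integers(0, 2, size=g.shape[0])  # fair coin per sample (assumed)
            p_values.append(burden_p_no_ld(g, y))
    return np.array(p_values)


# Simulated stand-ins: 3 genes, 913 samples, 5 rare variants each.
rng = np.random.default_rng(1)
toy_genes = [rng.binomial(2, 0.01, size=(913, 5)) for _ in range(3)]
null_p = coin_toss_null_p_values(toy_genes, n_reps=100)
```

If the independence assumption holds, `null_p` should look uniform on a Q–Q plot; with 144 genes and `n_reps=100` the loop returns 14,400 values.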

Figure 5.

PCA of common variants, low-frequency variants, and both types of variants.
First eigen-vector versus second eigen-vector for the Broad samples. Eigen-vectors are obtained by applying PCA to (A) all common variants with no missing genotypes (56,607 variants), (B) all low-frequency variants with no missing genotypes (29,509 variants), and (C) both types of variants. Colors are obtained by clustering individuals on their coordinates in panel (A) using model-based clustering [51].
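A minimal PCA sketch of the eigen-vector computation, assuming missing calls are stored as `np.nan` and that each variant is centred and scaled to unit variance (a common choice; the caption does not state the normalization). Only variants with no missing genotypes are kept, matching the caption, and the eigen-vectors are taken as the left singular vectors of the standardized matrix.

```python
import numpy as np


def top_eigenvectors(genotypes, k=2):
    """Top-k eigen-vectors of the standardized sample-by-variant genotype matrix.

    genotypes: (n_samples, n_variants) array with np.nan marking missing calls.
    Only variants with no missingness are used; monomorphic variants are dropped
    and the rest are centred and scaled to unit variance before a thin SVD.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g[:, ~np.isnan(g).any(axis=0)]  # variants with no missing genotypes
    sd = g.std(axis=0)
    g = g[:, sd > 0]
    z = (g - g.mean(axis=0)) / sd[sd > 0]
    # Left singular vectors of z are the eigen-vectors of z @ z.T.
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k]


# Toy usage: 60 simulated samples, 200 variants, 10 of them with a missing call.
rng = np.random.default_rng(2)
geno = rng.binomial(2, 0.3, size=(60, 200)).astype(float)
geno[5, :10] = np.nan  # these 10 variants now have missingness and are skipped
ev = top_eigenvectors(geno)  # ev[:, 0] versus ev[:, 1] gives the scatter plot
```

The model-based clustering used for the colors [51] is not reproduced here; fitting a Gaussian mixture to the rows of `ev` would be one stand-in.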

Figure 6.

PCA for case (orange) and control (blue) samples.
Panels (A) and (B) plot the top two eigen-vectors for Baylor and Broad, respectively. Eigen-vectors are obtained by applying PCA to all common variants (CVs) with no missing genotypes (14,702 CVs in Baylor and 56,607 CVs in Broad).

Figure 7.

Distribution of doubletons as a function of the eigen-map.
First eigen-vector versus second eigen-vector for (A) Baylor and (B) Broad samples, with eigen-vectors obtained by applying PCA to all common variants. For each individual, we count the number of doubletons. Within each data set (Baylor and Broad, respectively), points are color-coded by the individual's relative number of doubletons: black (bottom 25%: fewest doubletons), blue (next 25%), green (next 25%), and orange (top 25%: most doubletons).
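The per-individual doubleton count and the quartile color coding can be sketched as follows. Two assumptions are made that the caption does not spell out: a doubleton is a variant whose minor allele count over the whole data set is exactly 2, and an individual carrying such a variant (in either zygosity) is credited once. Quartile cut points are computed separately for each data set, as the caption describes.

```python
import numpy as np

QUARTILE_COLORS = ["black", "blue", "green", "orange"]  # fewest -> most doubletons


def doubletons_per_individual(minor_counts):
    """Count the doubleton variants carried by each individual.

    minor_counts: (n_samples, n_variants) minor-allele counts (0/1/2). A doubleton is
    taken to be a variant whose minor allele count over the whole data set is
    exactly 2 (an assumption); a homozygous carrier is counted once.
    """
    g = np.asarray(minor_counts)
    doubleton = g.sum(axis=0) == 2
    return (g[:, doubleton] > 0).sum(axis=1)


def quartile_labels(counts):
    """0 = bottom 25%, 1 and 2 = middle quartiles, 3 = top 25% within one data set."""
    cuts = np.quantile(counts, [0.25, 0.5, 0.75])
    return np.digitize(counts, cuts, right=True)


# Toy usage on a tiny minor-allele-count matrix.
counts = doubletons_per_individual(np.array([[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0]]))
point_colors = [QUARTILE_COLORS[q] for q in quartile_labels(counts)]
```

When many individuals share the same count, ties make the four groups unequal in size, so the 25% bands are approximate.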

Figure 8.

Doubleton counts versus minor allele counts (MAC_c) in common variants (CVs).
MAC_c is computed over all variants with minor allele frequency . Panel (A) shows doubleton counts of Baylor cases versus MAC_c of CVs in the exome; panel (B) is a zoomed-in view of panel (A); panel (C) shows doubleton counts of Broad cases versus MAC_c of CVs in the exome.
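A sketch of the per-sample MAC_c on the horizontal axis, assuming 0/1/2 alternate-allele coding and a hypothetical `maf_threshold` standing in for the frequency cutoff in the caption. Each column is recoded to count the minor allele before the common variants are summed for each sample.

```python
import numpy as np


def mac_common(genotypes, maf_threshold=0.05):
    """Per-sample minor allele count over common variants (MAC_c).

    genotypes: (n_samples, n_variants) alternate-allele counts (0/1/2).
    maf_threshold: hypothetical cutoff for 'common' (an assumption, not the paper's).
    """
    g = np.asarray(genotypes, dtype=float)
    alt_freq = g.mean(axis=0) / 2.0
    minor = np.where(alt_freq <= 0.5, g, 2.0 - g)  # count copies of the minor allele
    common = np.minimum(alt_freq, 1.0 - alt_freq) >= maf_threshold
    return minor[:, common].sum(axis=1)


# Toy usage: the second variant (MAF 0.125) falls below a 0.2 cutoff and is ignored.
toy = np.array([[0, 2], [1, 2], [2, 2], [1, 1]])
mac_c = mac_common(toy, maf_threshold=0.2)
```

Plotting each case's doubleton count against its `mac_c` value gives the axes of panels (A) and (C).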

Figure 9.

Distribution of the genomic control factor.
Case/control status is permuted 100 times, and for each permutation the genomic control factor is computed from the 1000 largest genes, giving its permutation distribution. The red line shows the mean of this permutation distribution, and the green line shows the genomic control factor obtained from the observed data using (A) Broad SKAT p-values without eigen-vectors; (B) Broad SKAT p-values with common variant (CV) eigen-vectors; (C) Broad SKAT p-values with low-frequency variant (LFV) eigen-vectors; and (D) Broad SKAT p-values with CV plus LFV eigen-vectors.

Figure 10.

-log10(observed p-values) versus -log10(expected p-values) for the SKAT and burden tests in the mega-analysis.
Panel (A) shows SKAT p-values, Panel (B) shows burden test p-values. and 1.047, for mega SKAT and burden test, respectively.
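The coordinates of such a Q–Q plot can be computed as below. The expected quantiles use i/(n + 1), one common plotting convention; the paper's exact convention is not stated. Observed p-values are sorted so that the smallest is paired with the smallest expected quantile.

```python
import numpy as np


def qq_coordinates(p_values):
    """(expected, observed) -log10 p-values for a Q-Q plot against Uniform(0, 1).

    Expected quantiles use i / (n + 1), i = 1..n, one common plotting convention.
    A p-value of exactly 0 would map to infinity and should be handled beforehand.
    """
    p = np.sort(np.asarray(p_values, dtype=float))
    n = p.size
    expected = -np.log10(np.arange(1, n + 1) / (n + 1))
    observed = -np.log10(p)
    return expected, observed


# Three p-values sitting exactly on their expected quantiles fall on the diagonal.
exp_q, obs_q = qq_coordinates([0.5, 0.25, 0.75])
```

Points above the diagonal indicate more small p-values than expected under the null.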

Table 1.

Genomic control and for all tests before and after PC adjustment.

Table 2.

Number of significant genes (and expected number) under different filters.

Table 3.

Counts of missingness per sample after filtering.

Table 4.

Number of nominally significant genes before and after filtering.

Table 5.

Counts of non-synonymous variants in Baylor and Broad before filtering.
