Retroviral Integration Process in the Human Genome: Is It Really Non-Random? A New Statistical Approach

doi:10.1371/journal.pcbi.1000144

Figure 1.

Distribution of Moloney Murine Leukemia Virus (MLV) and Simian Immunodeficiency Virus (SIV) integration sites centered on the transcription start sites (TSSs) of the nearest genes.

The empirical comparison between the simulated (dotted line) and observed distributions leads the authors to conclude that retroviral integration is non-random.

Figure 2.

Example of the integration distance calculation for one integration site mapped on Chromosome 4 (insertion site CB-RV51 in the dataset of [20]).

Notice that in this particular case the transcription start site (TSS) of the nearest gene coincides with the nearest downstream (3′) TSS.
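
The calculation in Figure 2 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: the `Gene` record, the gene coordinates and the function names are hypothetical. The nearest gene is assumed to be the one whose body (start to end) lies closest to the site, which is consistent with the midpoint discontinuities described for Figure 4. Distances are assumed to be signed negative upstream and positive downstream of the TSS, relative to the gene's own orientation.

```python
from typing import NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates in bp on one chromosome."""
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end: leftmost for "+" genes, rightmost for "-" genes.
        return self.start if self.strand == "+" else self.end


def distance_to_body(site: int, gene: Gene) -> int:
    """Distance from a site to the gene body (0 if the site lies inside it)."""
    return max(gene.start - site, 0, site - gene.end)


def integration_distance(site: int, genes: Sequence[Gene]) -> Tuple[str, int]:
    """Return (nearest gene name, signed distance from its TSS in bp).

    Assumed conventions: the nearest gene minimises the distance to its body,
    and the sign is negative upstream / positive downstream of the TSS with
    respect to the gene's own orientation.
    """
    nearest = min(genes, key=lambda g: distance_to_body(site, g))
    offset = site - nearest.tss
    return nearest.name, offset if nearest.strand == "+" else -offset


# Hypothetical layout: a "+" gene followed by a "-" gene.
genes = [Gene("A", 1_000, 5_000, "+"), Gene("B", 9_000, 12_000, "-")]
for site in (800, 3_000, 7_500, 13_000):
    print(site, integration_distance(site, genes))
```

Taking the distance to the gene body, rather than to the TSS, is what makes the nearest gene switch halfway between two genes; a site inside a gene is always assigned to that gene.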

Figure 3.

Distribution of 1,250,000 integration distances Y (kb) from the transcription start site (TSS) of the nearest gene, computed for integration sites randomly generated from a Uniform distribution.

The solid line is the kernel density estimate, plotted within a ±30 kb window to better visualize the “bell-shaped” curve.
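
A scaled-down version of this kind of simulation can be sketched as follows. It assumes a hypothetical chromosome of 100 non-overlapping genes whose lengths and intergenic gaps are drawn at random from arbitrary ranges (not values from the study). Integration sites are drawn uniformly along the chromosome; each site's distance is measured to the TSS of the gene whose body is closest, signed negative upstream and positive downstream (an assumed convention). A plain Gaussian kernel density estimate with a fixed 2 kb bandwidth (also an arbitrary choice) is then evaluated inside the ±30 kb window. The helper names are hypothetical.

```python
import math
import random


def make_genes(n_genes, rng):
    """Hypothetical chromosome: non-overlapping genes as (start, end, strand).

    Gap and length ranges are arbitrary illustration values, not study data.
    """
    genes, pos = [], 0
    for _ in range(n_genes):
        pos += rng.randint(5_000, 60_000)       # intergenic gap (bp)
        length = rng.randint(1_000, 20_000)     # gene length (bp)
        genes.append((pos, pos + length, rng.choice("+-")))
        pos += length
    return genes, pos + rng.randint(5_000, 60_000)  # genes, chromosome length


def integration_distance(site, genes):
    """Signed distance (bp) to the TSS of the gene whose body is closest.

    Assumed sign convention: negative upstream, positive downstream.
    """
    start, end, strand = min(genes, key=lambda g: max(g[0] - site, 0, site - g[1]))
    return site - start if strand == "+" else end - site


def gaussian_kde(points, data, bandwidth):
    """Plain Gaussian kernel density estimate evaluated at `points`."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
            for x in points]


rng = random.Random(2008)
genes, chrom_len = make_genes(100, rng)
# Uniform ("random") integration sites; 10,000 instead of 1,250,000 for speed.
ids_kb = [integration_distance(rng.uniform(0, chrom_len), genes) / 1_000
          for _ in range(10_000)]

window = [-30, -20, -10, 0, 10, 20, 30]          # kb, the plotting window
densities = dict(zip(window, gaussian_kde(window, ids_kb, bandwidth=2.0)))
for x in window:
    print(f"{x:>4} kb  density = {densities[x]:.4f} per kb")
```

Although no site is favoured, the estimated density peaks at zero and falls off towards the edges of the window.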

Figure 4.

Integration distance (ID) from the transcription start site (TSS) of the nearest gene.

In this picture, six hypothetical genes of different lengths and orientations (blue arrows) are scattered along a chromosome (x-axis). The purple piecewise linear function represents the distance from the TSS of the nearest gene; it has discontinuities exactly at the midpoints of the intervals between consecutive genes. Even if integrations occur completely at random in this setting, the resulting distribution of distances from TSSs (projected on the y-axis, gray plot) is a mixture of Uniform distributions. Because each linear piece passes through zero at its gene's TSS, all mixture components overlap around zero and fewer of them reach large absolute distances; as a consequence, the bell-shaped curve is observed. Notice that the ID distribution is asymmetric around zero, since gene orientations and gene lengths determine which TSS is used in computing the distances (a symmetric distribution would be observed by plotting the distance from the nearest TSS instead of from the TSS of the nearest gene; data not shown).
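
The mixture argument can be checked exactly with a small sketch. Each hypothetical piece `(a, b)` of the piecewise linear function covers distances from -a to +b kb around its TSS (the piece lengths are made up for illustration). A uniformly placed integration falls in a piece with probability (a + b) / total and is then Uniform(-a, b), with density 1 / (a + b). Each piece covering y therefore contributes exactly 1 / total, so the mixture density at y is the number of covering pieces divided by the total length. Since every piece contains zero, that count can only decrease as |y| grows.

```python
from fractions import Fraction

# Hypothetical pieces of the piecewise linear ID function, in kb: piece (a, b)
# covers integration distances from -a (upstream) to +b (downstream) of its TSS.
pieces = [(5, 20), (10, 10), (30, 5), (2, 40)]


def mixture_density(y, pieces):
    """Exact density of the ID at y under uniformly random integration.

    A piece is hit with probability (a + b) / total and then yields
    Uniform(-a, b), with density 1 / (a + b); each covering piece therefore
    contributes exactly 1 / total.
    """
    total = sum(a + b for a, b in pieces)
    covering = sum(1 for a, b in pieces if -a <= y <= b)
    return Fraction(covering, total)


for y in (-30, -8, 0, 8, 25, 50):
    print(f"{y:>4} kb  density = {mixture_density(y, pieces)} per kb")
```

Using `Fraction` keeps the densities exact, so the step-wise decrease away from zero is visible without any sampling noise. The unequal upstream and downstream extents also make the result asymmetric around zero, as the caption notes.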

Figure 5.

Beta probability density functions for different parameter combinations.

The solid black line represents the Uniform distribution (p = q = 1). All other curves are consistent with the alternative hypothesis H₁: p ≠ 1 or q ≠ 1.
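
For reference, the Beta(p, q) density on (0, 1) is f(y) = Γ(p + q) / (Γ(p) Γ(q)) · y^(p-1) (1 - y)^(q-1). The hypothetical helper `beta_pdf` below evaluates it on the log scale with `math.lgamma` for numerical stability. With p = q = 1 it returns 1 everywhere, which is the Uniform case of the null hypothesis. The parameter pairs printed are illustrative and not necessarily those drawn in the figure.

```python
import math


def beta_pdf(y, p, q):
    """Beta(p, q) density at y, computed on the log scale for stability."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly between 0 and 1")
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return math.exp(log_norm + (p - 1) * math.log(y) + (q - 1) * math.log1p(-y))


# A few illustrative parameter pairs (not necessarily those drawn in Figure 5).
for p, q in [(1, 1), (0.5, 0.5), (2, 5), (5, 2), (2, 2)]:
    values = ", ".join(f"{beta_pdf(y, p, q):.3f}" for y in (0.1, 0.5, 0.9))
    print(f"p={p}, q={q}: f(0.1), f(0.5), f(0.9) = {values}")
```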

Figure 6.

Log-likelihood function for the distribution of Y* observed in human hematopoietic stem/progenitor cells, showing the Maximum Likelihood Estimator (MLE) of the parameters p and q.
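
A minimal sketch of how such a log-likelihood surface can be maximized. The observed Y* values are not reproduced here, so the example uses synthetic Beta(2, 5) data and assumes Y* already lies strictly inside (0, 1). The maximizer is a plain grid search, a deliberately simple stand-in rather than the optimization routine used in the study, which is not specified here. `beta_loglik` and `grid_mle` are hypothetical names.

```python
import math
import random


def beta_loglik(p, q, n, sum_log_y, sum_log_1my):
    """Beta(p, q) log-likelihood written in terms of its sufficient statistics."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return n * log_norm + (p - 1) * sum_log_y + (q - 1) * sum_log_1my


def grid_mle(sample, grid):
    """Return the (p, q) pair on `grid` x `grid` with the largest log-likelihood."""
    n = len(sample)
    sum_log_y = sum(math.log(y) for y in sample)        # sum of log(y)
    sum_log_1my = sum(math.log1p(-y) for y in sample)   # sum of log(1 - y)
    return max(
        ((p, q) for p in grid for q in grid),
        key=lambda pq: beta_loglik(pq[0], pq[1], n, sum_log_y, sum_log_1my),
    )


rng = random.Random(7)
synthetic = [rng.betavariate(2.0, 5.0) for _ in range(2_000)]
grid = [k / 20 for k in range(1, 201)]   # 0.05, 0.10, ..., 10.0
p_hat, q_hat = grid_mle(synthetic, grid)
print(f"MLE on synthetic Beta(2, 5) data: p = {p_hat:.2f}, q = {q_hat:.2f}")
```

Because the log-likelihood depends on the data only through n, the sum of log(y) and the sum of log(1 - y), these are computed once and each grid point costs just three `lgamma` calls.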

Figure 7.

Comparison between the observed Y* distribution and the distributions fitted with the Method of Moments Estimators (MMEs, red dashed line) and the Maximum Likelihood Estimators (MLEs, blue dashed line).

Goodness of fit was assessed with the Kolmogorov–Smirnov test (MME p-value = 0.909; MLE p-value = 0.8012).
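
The Method of Moments estimates for a Beta model follow from matching the sample mean m and variance v to E[Y] = p / (p + q) and Var[Y] = pq / ((p + q)²(p + q + 1)). This gives p + q = m(1 - m)/v - 1, p = m(p + q) and q = (1 - m)(p + q). The sketch below implements these closed-form expressions. It assumes Y* lies in (0, 1), uses the second central moment (denominator n), and runs on synthetic data; the function name `beta_mme` is hypothetical.

```python
import random
import statistics


def beta_mme(sample):
    """Method of Moments estimates (p, q) for a Beta distribution on (0, 1).

    Matches the sample mean m and second central moment v to
    E[Y] = p / (p + q) and Var[Y] = p q / ((p + q)**2 (p + q + 1)).
    """
    m = statistics.fmean(sample)
    v = statistics.pvariance(sample, mu=m)
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("no Beta distribution matches these moments")
    total = m * (1.0 - m) / v - 1.0        # this is p + q
    return m * total, (1.0 - m) * total


rng = random.Random(11)
synthetic = [rng.betavariate(2.0, 5.0) for _ in range(5_000)]
p_mme, q_mme = beta_mme(synthetic)
print(f"MME on synthetic Beta(2, 5) data: p = {p_mme:.2f}, q = {q_mme:.2f}")
```

For the goodness-of-fit step, one option (not necessarily the one used in the study) is SciPy's `scipy.stats.kstest(sample, "beta", args=(p_hat, q_hat))`, which compares the sample with the fitted Beta CDF.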

Table 1.

Method of Moments (MME) and Maximum Likelihood (MLE) estimates of p and q.
