Evidence of Influence of Genomic DNA Sequence on Human X Chromosome Inactivation
Figure 3
Recursive Feature Reduction and Distributions of Consistent Features across the XAR Nonborder Genes
(A) The mean prediction accuracy and standard deviation (y-axis) over 100 recursive feature reduction iterations are shown for the indicated number of features (x-axis). Green represents the cross-validation (CV) rate obtained using a randomly selected two-thirds of the XAR nonborder genes for each set of features. The prediction rates for escaping genes (blue) and subject genes (red) in the remaining one-third are also shown. Both escape and subject prediction rates begin declining when the feature set is reduced to fewer than 53 features.
(B) The content of each feature (y-axis) in specific windows around the transcription start sites of all 82 XAR nonborder genes (x-axis) is represented as a histogram. The first 36 genes on the x-axis escape X inactivation (shaded area), and the remaining 46 are subject to X inactivation. The features consistently chosen during recursive feature reduction to build accurate classifiers are L1 100 kb downstream, MLT1K 100 kb upstream, and MER33 100 kb upstream. For comparison, THE1B 50 kb upstream, a randomly distributed feature, is also shown.
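
The evaluation scheme summarized in panel (A) can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the ranking of features by difference in class means, the per-class (stratified) two-thirds split, and the synthetic data are all assumptions made for illustration, and the cross-validation rate on the training portion (green curve) is omitted. Only the class sizes (36 escape, 46 subject) and the 100 iterations are taken from the caption. All function names are hypothetical.

```python
import random
from statistics import mean


def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def split_two_thirds(data, rng):
    """Random two-thirds / one-third split, done per class (an assumption)."""
    train, test = [], []
    for label in (0, 1):
        group = [d for d in data if d[1] == label]
        rng.shuffle(group)
        cut = (2 * len(group)) // 3
        train += group[:cut]
        test += group[cut:]
    return train, test


def class_rates(test, c_escape, c_subject, feats):
    """Held-out prediction rates (escape, subject) using only `feats`."""
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, y in test:
        d_e = sum((x[j] - c_escape[j]) ** 2 for j in feats)
        d_s = sum((x[j] - c_subject[j]) ** 2 for j in feats)
        predicted = 1 if d_e < d_s else 0
        totals[y] += 1
        hits[y] += predicted == y
    return hits[1] / totals[1], hits[0] / totals[0]


def recursive_reduction(data, n_features, rng):
    """Drop the least discriminating feature one at a time, scoring each set."""
    train, test = split_two_thirds(data, rng)
    c_e = centroid([x for x, y in train if y == 1])
    c_s = centroid([x for x, y in train if y == 0])
    feats = list(range(n_features))
    history = {}
    while feats:
        history[len(feats)] = class_rates(test, c_e, c_s, feats)
        weakest = min(feats, key=lambda j: abs(c_e[j] - c_s[j]))
        feats.remove(weakest)
    return history


def make_toy_data(rng, n_escape=36, n_subject=46, n_features=6):
    """Synthetic data (not from the paper): features 0 and 1 are informative."""
    data = []
    for label, n in ((1, n_escape), (0, n_subject)):
        for _ in range(n):
            x = [rng.gauss(0, 1) for _ in range(n_features)]
            if label == 1:
                x[0] += 3.0
                x[1] += 3.0
            data.append((x, label))
    return data


def mean_curves(n_iter=100, n_features=6, seed=0):
    """Mean escape and subject prediction rates for each feature-set size."""
    rng = random.Random(seed)
    data = make_toy_data(rng, n_features=n_features)
    runs = [recursive_reduction(data, n_features, rng) for _ in range(n_iter)]
    return {
        k: (mean(r[k][0] for r in runs), mean(r[k][1] for r in runs))
        for k in range(1, n_features + 1)
    }


if __name__ == "__main__":
    for k, (escape_rate, subject_rate) in sorted(mean_curves().items(), reverse=True):
        print(f"{k} features: escape={escape_rate:.2f} subject={subject_rate:.2f}")
```

Calling `mean_curves()` returns, for each feature-set size, the mean held-out prediction rate for escape and subject genes, which corresponds to the blue and red curves in panel (A). On real data, the point at which these curves start to fall indicates the smallest feature set that still supports accurate classification.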