Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

Figure 1

Principles of repeat identification using P-clouds.

A) True data distribution representing divergence within a transposable element (TE) family from a master element sequence (center). B) A consensus-sequence-based search discards information by collapsing the observed data to a single sequence. C) P-clouds clusters related high-abundance oligos, thus providing better coverage of sequence space.
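The idea in panel C can be illustrated with a minimal sketch. It is not the published P-clouds implementation: the function names, the two thresholds `core_min` and `member_min`, and the one-mismatch growth rule are all assumptions chosen for illustration. The sketch counts oligos, seeds each cloud with an abundant oligo, and then adds related oligos that are themselves reasonably abundant.

```python
from collections import Counter


def count_oligos(seq, k):
    """Count every overlapping oligo of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(oligo, alphabet="ACGT"):
    """Yield all oligos that differ from `oligo` at exactly one position."""
    for i, base in enumerate(oligo):
        for alt in alphabet:
            if alt != base:
                yield oligo[:i] + alt + oligo[i + 1:]


def build_clouds(counts, core_min, member_min):
    """Greedy clustering sketch (hypothetical thresholds, not the paper's).

    Each cloud is seeded with the most abundant unassigned oligo whose
    count is at least `core_min`, then grown by repeatedly adding
    one-mismatch neighbors whose count is at least `member_min`.
    """
    assigned = set()
    clouds = []
    for oligo, n in counts.most_common():
        if n < core_min:
            break  # remaining oligos are too rare to seed a cloud
        if oligo in assigned:
            continue
        cloud = [oligo]
        assigned.add(oligo)
        frontier = [oligo]
        while frontier:
            current = frontier.pop()
            for nb in neighbors(current):
                if nb not in assigned and counts.get(nb, 0) >= member_min:
                    assigned.add(nb)
                    cloud.append(nb)
                    frontier.append(nb)
        clouds.append(cloud)
    return clouds
```

Unlike a consensus search (panel B), which would keep only the seed oligo, each cloud here retains the abundant variants surrounding the seed, which is what gives the broader coverage of sequence space described in panel C.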
