Outline of a Genome Navigation System Based on the Properties of GA-Sequences and Their Flanks

Introducing a new method to visualize large stretches of genomic DNA (see Appendix S1) the article reports that most GA-sequences [1] shared chains of tetra-GA-motifs and contained upstream poly(A)-segments. Although not integral parts of them, Alu-elements were found immediately upstream of all human and chimpanzee GA-sequences with an upstream poly(A)-segment. The article hypothesizes that genome navigation uses these properties of GA-sequences in the following way. (1) Poly(A) binding proteins interact with the upstream poly(A)-segments and arrange adjacent GA-sequences side-by-side (‘GA-ribbon’), while folding the intervening DNA sequences between them into loops (‘associated DNA-loops’). (2) Genome navigation uses the GA-ribbon as a search path for specific target genes that is up to 730-fold shorter than the full-length chromosome. (3) As to the specificity of the search, each molecule of a target protein is assumed to catalyze the formation of specific oligomers from a set of transcription factors that recognize tetra-GA-motifs. Their specific combinations of tetra-GA motifs are assumed to be present in the particular GA-sequence whose associated loop contains the gene for the target protein. As long as the target protein is abundant in the cell it produces sufficient numbers of such oligomers which bind to their specific GA-sequences and, thereby, inhibit locally the transcription of the target protein in the associated loop. However, if the amount of target protein drops below a certain threshold, the resultant reduction of specific oligomers leaves the corresponding GA-sequence ‘denuded’. In response, the associated DNA-loop releases its nucleosomes and allows transcription of the target protein to proceed. (4) The Alu-transcripts may help control the general background of protein synthesis proportional to the number of transcriptionally active associated loops, especially in stressed cells. 
(5) The model offers a new mechanism of co-regulation of protein synthesis based on the shared segments of different GA-sequences.


Introduction
The importance of genome navigation in the case of the huge genomes of mammals and others can hardly be exaggerated. As pointed out in a previous article [1], even the most basic household functions of mammalian cells require finding specific genes reproducibly and rapidly in the multi-billion base pair vastness of their genomes, especially during immune or stress responses. The often cited random diffusion of transcription factors and polymerases throughout the dense chromatin matrix hardly represents a navigation system with the required high level of accuracy and speed.
Equally important seems to be the necessity to understand possible failures of genome navigation. Even a 'mild' slow-down of the search mechanisms may cause numerous diseases by delaying the synthesis and/or turnover of vital gene products and moving them out of a required synchrony. Worse, even a small mutation in the direction-giving elements may cause the misdirection of the search mechanism. By sending large numbers of polymerases to the wrong targets, such a mutation may produce diseases that have no single cause, but are the result of hundreds or thousands of improper gene expressions that may seem functionally unrelated and, thus, render the disease almost intractable. One wonders whether cancer or various dementias are diseases of this kind.
In a previous article I have suggested that pure GA-sequences may serve as sign posts of the genome navigation system [1]. These are sequences of 50-1300 bases consisting exclusively of G's and A's. Statistically speaking, their existence is extremely improbable. Yet, tens of thousands of such sequences are distributed throughout mammalian genomes. With the exception of 4 specific types, no two of them were identical. Although there is no doubt that pure GA sequences have all these properties, there is as yet no experimental evidence that they serve as sign posts of a genome navigation system, even though a number of observations in the field of heat shock seem to support the interpretation [1].
In view of the pivotal importance of our understanding of the way in which genomes navigate their own vastness, this article tries to expand further on the concept of a genome navigation system which is based on pure GA-sequences in order to advance it to a more testable state.

Results
(A list of definitions used in this article is attached at the end.)

The genome pixel image (GPxI)
The aim of the present article, to study the sequence architecture of GA-sequences and their genomic neighborhood, requires detailed comparisons of thousands of very large DNA sequences in order to detect common patterns among them. Traditionally, this kind of task is solved by aligning them and by computing their homology, using one of the established algorithms such as the Needleman-Wunsch algorithm [2].
While such methods are both mathematically elegant and quantitative, they require considerable computing time and, more importantly, they often require some prior knowledge as to which DNA sequences should be examined in order to obtain meaningful results. Therefore, I introduce and apply here a novel method to represent DNA sequences. It turns relationships between sequences into visible patterns by representing the DNA sequences as graytone images called 'genome pixel images' (GPxI). The method is both sensitive and intuitive as it takes advantage of the exceptional ability of the human visual sense to detect patterns in images.
Briefly, the method assigns to the bases the following gray-tone values: A: black, G: white, C: dark gray and T: light gray (Fig. S1). This assignment is, of course, arbitrary, but must remain the same throughout. It transforms the consecutive bases of a DNA sequence into a continuous line of pixels with these gray values. Whenever the line of pixels reaches the edge of the image area, it wraps around like any other text would, and continues at the beginning of the next line immediately underneath (Fig. S2). The method detects relationships between sequences as patterns very sensitively (Fig. S3). For more details see Appendix S1.
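The mapping described above can be illustrated with a short Python sketch. It is not the analysis program used for this article; the numeric gray levels, the mid-gray value for unknown symbols, the function names and the choice of the plain-text PGM image format are all assumptions made for illustration. Only the ordering of the gray tones (A black, C dark gray, T light gray, G white) and the line wrapping follow the text.

```python
# Gray values per base. Only the ordering (A darkest, G brightest) comes
# from the text; the numeric levels are assumptions for illustration.
GRAY = {"A": 0, "C": 85, "T": 170, "G": 255}
UNKNOWN = 128  # assumed mid-gray for any other symbol (e.g. N)


def sequence_to_gpxi(sequence, width):
    """Return rows of gray values; the pixel line wraps every `width` bases."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = [GRAY.get(base, UNKNOWN) for base in sequence.upper()]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def to_pgm(rows, width, fill=UNKNOWN):
    """Serialize the rows as a plain-text PGM (P2) image, padding the last row."""
    lines = ["P2", f"{width} {len(rows)}", "255"]
    for row in rows:
        padded = row + [fill] * (width - len(row))
        lines.append(" ".join(str(value) for value in padded))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = sequence_to_gpxi("AGCTAGGAAAGAAAG", width=4)
    print(to_pgm(rows, width=4))
```

Writing the returned string to a file with the extension .pgm should yield an image that most viewers can open; the image width plays the role of the 'edge of the image area' at which the pixel line wraps.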

The GPxI of the GA-complexes
In addition to the pure GA-sequences themselves I recorded also their 400 [b] large flanks in various chromosomes of humans, chimpanzees, rhesus monkey, mouse, and zebrafish. It should be noted that some of the GA-sequences and their flanks had to be omitted as they were duplications for the following reason. If 2 consecutive GA-sequences were closer together than the flank size of 400 [b], their flanks would overlap and, thus be recorded twice, at least in part. Therefore, the flanks of all GA-sequences closer than 1 [Kb] were eliminated throughout this article.
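The flank rule above can be sketched as follows. The sketch assumes that GA-sequences are given as half-open (start, end) intervals on one strand, and it adopts one possible reading of the elimination rule, namely that every GA-sequence with a neighbor closer than 1 Kb is dropped together with that neighbor. The function name and the output layout are hypothetical.

```python
def filter_and_flank(intervals, seq_len, flank=400, min_gap=1000):
    """Drop GA-sequences with a neighbor closer than `min_gap` bases and
    return the flank coordinates of the remaining ones.

    `intervals` are half-open (start, end) positions of GA-sequences.
    Assumption: both members of a too-close pair are removed.
    """
    ordered = sorted(intervals)
    complexes = []
    for i, (start, end) in enumerate(ordered):
        gaps = []
        if i > 0:
            gaps.append(start - ordered[i - 1][1])   # distance to previous
        if i + 1 < len(ordered):
            gaps.append(ordered[i + 1][0] - end)     # distance to next
        if any(gap < min_gap for gap in gaps):
            continue  # flanks would overlap or nearly overlap
        complexes.append({
            "upstream": (max(0, start - flank), start),
            "ga": (start, end),
            "downstream": (end, min(seq_len, end + flank)),
        })
    return complexes
```

For example, of three GA-sequences at (1000, 1060), (1500, 1560) and (5000, 5100), only the last one survives, with flanks at (4600, 5000) and (5100, 5500).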
The GPxI of the first 1,100 GA-complexes of human chr. 1 displayed in their natural order of occurrence are shown in Fig. 1a. The upstream ( = left hand) ends of all GA-complexes were aligned in the vertical direction, which automatically also aligned the upstream ends of the GA-sequences. In contrast, the downstream flanks were not aligned in this GPxI, because the lengths of the pure GA-sequence were variable [see 1], thus pushing the ends of the downstream flanks to variable positions.
There were 4 striking results of the depicted GPxI of the aligned GA-complexes.
1. The pure GA-sequences appeared to contain many nonrandom patterns.
2. Neighboring GA-sequences seemed to share many patterns, as evidenced by the enhanced visibility of the patterns after alignment as in Fig. 1b.
a. The tetra-GA motifs of pure GA-sequences.
Concatenating end-to-end all 1667 pure GA-sequences of human chr.1 yielded the GPxI shown in Figure 2b. The comparison with a computer-constructed random GA-sequence file (Figure 2a), confirmed that the pure GA-sequences contain many repetitive patterns. The period length of the common motifs can easily be determined by yet another application of the GPxI-method. Adopting the rationale of the so-called Markham rotation [3], one can superimpose pixel-by-pixel a particular GPxI with other GPxIs that were created by frame-shifts of 1,2,3, …[b] of the original sequence. Assume a motif has the size of N bases and forms strings of various lengths. Every time the original GPxI is superimposed with one that was frame shifted by N or an integral multiple of N, the images of the motif strings coincide and thus appear reinforced.
As illustrated in the GPxI of the pure GA-sequences of human chr.1 (Fig. 2b) frame shifts of 4, but not of 1, 2, and 3 reinforced the patterns, indicating that the prevalent repeated motifs of pure GA-sequences are tetra-GA motifs. These motifs were not only present, but constituted a significant part of the pure GA-sequences. Furthermore, the 4-fold patterns seem to repeat over several lines in the vertical direction of the GPxI, as if consecutive GA-sequences shared similar chains of tetra-GA motifs.
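The frame-shift superposition can be illustrated with a minimal Python sketch. It assumes that 'superimposing' two GPxIs means averaging the gray values of corresponding pixels, which is only one possible reading, and it uses a hypothetical measure called contrast here: the fraction of superimposed pixels that remain pure black or pure white. For a chain of a motif of size N, a shift of N leaves every pixel unchanged, whereas other shifts blend black and white into gray.

```python
GRAY = {"A": 0, "G": 255}  # pure GA-sequences contain only these two bases


def superimpose(sequence, shift):
    """Average each pixel with the pixel `shift` bases further downstream.

    Assumption: superposition is modeled as a pixel-by-pixel mean.
    """
    pixels = [GRAY[base] for base in sequence]
    return [(pixels[i] + pixels[i + shift]) / 2
            for i in range(len(pixels) - shift)]


def contrast(sequence, shift):
    """Fraction of superimposed pixels that stay pure black or pure white."""
    merged = superimpose(sequence, shift)
    return sum(1 for value in merged if value in (0, 255)) / len(merged)


if __name__ == "__main__":
    chain = "AAAG" * 10  # a chain of one tetra-GA motif
    for shift in (1, 2, 3, 4):
        print(shift, round(contrast(chain, shift), 2))
```

In this toy case only the shift of 4 keeps the full contrast, which mirrors the reinforcement described above; real GA-sequences would of course give less clear-cut values.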
Similar considerations suggest that in addition to AAAG among the remaining tetra-GA-motifs only AAGG, AGAG, and GGGA were able to generate essentially different chains (AAAA and GGGG are excluded by definition of the pure GA-sequences). These tetra-GA-motifs occurred with different frequencies in the pure GA-sequences. Evaluating the 206,450 occurrences of tetra-GA motifs in the 19,139 pure GA-sequences of the entire human genome yielded the following probabilities of their occurrence: AAAG (10.4%), AAGG (7.1%), AGAG (5.1%), and GGGA (3%). Together all of the tetra-GA motifs made up 46-47% of the entire length of the pure GA-sequences of the human genome. The rest were individual sequences that guarantee the individuality of the GA-sequences [1].
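One way to see why only four tetra-GA-motifs generate essentially different chains is to treat motifs that are cyclic rotations of each other as equivalent, since their chains differ only by a frame shift. The following sketch enumerates these rotation classes. The grouping by rotation is an interpretation of the text, and the function names are hypothetical.

```python
from itertools import product


def rotations(motif):
    """All cyclic rotations of a motif, e.g. AAAG -> AAAG, AAGA, AGAA, GAAA."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def rotation_classes(alphabet="AG", length=4):
    """Group all non-homopolymer motifs by their smallest rotation."""
    classes = {}
    for letters in product(alphabet, repeat=length):
        motif = "".join(letters)
        if len(set(motif)) == 1:
            continue  # AAAA and GGGG are excluded, as in the text
        representative = min(rotations(motif))
        classes.setdefault(representative, set()).add(motif)
    return classes


if __name__ == "__main__":
    for representative, members in sorted(rotation_classes().items()):
        print(representative, sorted(members))
```

The sketch names each class by its alphabetically smallest member, so the class called GGGA in the text appears under AGGG.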
b. The appearance of upstream stripes in the GPxIs of the GA-complexes of human and chimpanzee. A closer inspection of Fig. 1a suggests that the stripe patterns appeared upstream of a pure GA-sequence whenever its upstream end began with a certain stretch of poly(A) (i.e. with many black pixels). In order to test this conjecture, I extended the definition of GA-sequences to include more cases with poly(A) stretches.
At this point the reader is reminded that pure GA-sequences were defined as GA-sequences longer than 50 bases in order to exclude poly(A) and poly(G) sequences which, of course, fulfill trivially the definition of a GA-sequence, namely to contain no C's or T's [1]. Therefore, the inclusion of more poly(A) containing GA-sequences was achieved by simply easing the size restriction down to sizes of only 20 bases and longer. The resulting GA-sequences will be called 'common' GA-sequences in the following. By definition, the common GA-sequences included the pure ones.
Reducing the length restriction yielded a much increased number of GA-sequences. For example, human chromosome 1 contained 1667 pure GA-sequences and 19,513 common GA-sequences. As a result, the ribbon of GA-sequences became much darker in the GPxI and the upstream stripes became much more pronounced (Fig. 3).
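The two size thresholds can be illustrated by a simple scan for uninterrupted runs of G and A. The sketch below is an assumption-laden illustration and not the program used for this article: it scans a single strand, treats the thresholds as inclusive (50 bases and longer for pure, 20 bases and longer for common GA-sequences), and applies no special handling to runs that consist of A or G alone.

```python
import re


def find_ga_sequences(sequence, min_length):
    """Return half-open (start, end) positions of runs of G/A that are
    at least `min_length` bases long (single strand, assumed inclusive)."""
    pattern = re.compile(r"[AG]{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "C" * 5 + "GA" * 12 + "T" + "A" * 10 + "C" + "AG" * 30
    print("common:", find_ga_sequences(toy, 20))
    print("pure:  ", find_ga_sequences(toy, 50))
```

Because every run of 50 or more bases is also a run of 20 or more, the common GA-sequences found this way include the pure ones, as stated above.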
Upstream stripes appeared in identical form in the GPxIs of the (common) GA-complexes of human chromosomes 1 (Fig. 3), 7 and X and even in the GPxIs of chimpanzee chromosomes (Fig. 3). In contrast, chromosomes of rhesus monkey, dog, mouse and zebrafish showed no obvious patterns in the upstream flanks (Fig. 3).
Figure 2. a. The GPxI of a computer-constructed DNA file consisting of random sequences of G (white pixels) and A (black pixels). Therefore, no pixels with other gray-values are visible. The randomness of the sequences is expressed by the lack of any detectable patterns. b. GPxI of the end-to-end concatenated pure GA-sequences of human chr. 1 shows clearly a number of patterns. Although different, they seem to share a periodicity of 4. c., d. Use of a modified Markham rotation [3] to demonstrate the prevalence of the 4-periodicity. In panel c the GPxI of panel b is superimposed on itself although frame shifted by 2 bases. The result is a rather featureless gray image. In panel d the applied frame shift is 4. The result is the almost identical re-appearance of the original GPxI, indicating that a frame-shift of 4 reinforces the prevalent patterns. doi:10.1371/journal.pone.0004701.g002
The GPxIs generated from the common GA-sequences of human and chimpanzee chromosomes after re-ordering them by the size of their upstream poly(A)-segment confirmed that the upstream poly(A) stretches were required for the appearance of upstream stripes: Whenever the GA-sequences did not end in an upstream poly(A) motif, upstream stripes were not visible in the GA-complex, either (Fig. 4a). In contrast, when the GPxI of a GA-sequence displayed a predominantly black stretch, the upstream stripes were strongly expressed in its upstream flank (Fig. 4b).
They also demonstrated that the poly(A)-segments (depicted black in the GPxIs) were located almost exclusively at the upstream ends of the GA-sequences (see e.g. Fig. 3). In this way, the poly(A)-segments created a certain asymmetry and directionality of the GA-sequences, which may point to their role as markers for a reading direction of the GA-sequences.
Apparently, in exceptional cases GA-complexes can suffer inversions. After sorting the GA-complexes according to the poly(A) content of their downstream flanks, I found in human chr.7 a handful of GA-complexes whose upstream stripes were absent, but their exact mirror images appeared in the GA-sequences.
c. The identity between upstream stripes and Alu-sequences. In an unrelated study I searched the human chromosome 1 for the locations of Alu sequences. The search used the Alu-sequence of Def {4} as template and tolerated up to 10 point mutations at arbitrary locations for successful matches.
Once found, the matching sequences and their 400 [b] large up- and downstream flanks were recorded and used to generate the GPxI of the corresponding Alu-complexes (Fig 4c). Surprisingly, the upstream stripes of human and chimpanzee appeared identical to the stripe pattern of the Alu-sequences (Fig. 4b,c). A further surprise was the absence of any Alu-patterns in the upstream flanks of the GA-sequences of rhesus monkeys (Fig. 3), (or anywhere else in the rhesus genome), as Alu-sequences are generally believed to be shared by all primates.
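A template search that tolerates a fixed number of point mutations can be sketched as a sliding-window comparison that counts mismatches. The sketch below is a naive illustration under stated assumptions: only substitutions are tolerated (no insertions or deletions), only one strand is scanned, and the function name is hypothetical. The actual search procedure used for this article is not specified beyond the tolerance of 10 point mutations.

```python
def find_with_mismatches(genome, template, max_mismatches):
    """Return (start, mismatches) for every window of `genome` that differs
    from `template` in at most `max_mismatches` positions."""
    hits = []
    size = len(template)
    for start in range(len(genome) - size + 1):
        mismatches = 0
        for a, b in zip(genome[start:start + size], template):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, abandon this window
        else:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    genome = "TTT" + "ACGTACGT" + "TTT" + "ACCTACGT" + "TTT"
    print(find_with_mismatches(genome, "ACGTACGT", max_mismatches=1))
```

The running time grows with the product of genome and template length, so a scan of a whole chromosome would call for a faster method; the sketch is meant only to make the matching criterion explicit.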

A simple model of genome navigation
The following describes in broad strokes an outline of genome navigation that is consistent with the above findings. It offers details only when there were obvious objections to be met.  GA-sequences, it is rather obvious, what a genome navigation system should not do. Imagine that it would need to scan the entire genome in order to find a particular sign post, which subsequently would guide it to the desired target genes. This mechanism would offer very little advantage over no navigation mechanism at all. After all, instead of crawling along billions of bases to find a specific gene, the search mechanism would have to crawl along the same billions of bases in order to find first the appropriate sign post. Obviously, it would be much more efficient, if all sign posts were concentrated in a small space, so that the search mechanism could rapidly leap from one sign post to the next.
b. The topology of the side-by-side alignment of GA-sequences. Since all pure GA-sequences of a chromosome are lined up in tandem on the same DNA strand, there is essentially only one non-disruptive way of forcing all of them into a small space, namely by placing the pure GA-sequences side-by-side while folding the intervening stretches of DNA between them into loops (See Fig. 5, Fig. 6).
c. The role of the upstream poly(A) stretches as binding sites for linker molecules. A side-by-side arrangement of consecutive GA-sequences requires one or more species of linker molecules which are capable of binding to consecutive GA-sequences and to each other. Therefore, all GA-sequences should, their overall individuality [1] notwithstanding, contain or be flanked by a common binding segment for these universal linker molecules. Based on the above results, the poly(A) stretches at the upstream end of the common GA-sequences are the most obvious candidates for such common binding sites for linker molecules.
In this case, it would not be difficult to find the corresponding poly(A) binding proteins that could serve as the linker molecules. Although primarily known for their interaction with the 3' poly(A) tails of mRNAs, in many cases their binding specificity does not distinguish unambiguously between poly(A) and poly(dA). In particular, the protein known as PABPN1 was found to enter the nucleus, to be required for transcription [4,5], and to shuttle between nucleus and cytoplasm [6]. More importantly, it not only binds to poly(A) sequences, it also appears to bind to itself, as it forms nuclear aggregates even in the absence of mRNA [4]. In this way, it would be able to bind two poly(A) sequences together in a poly(A)-PABPN1-PABPN1-poly(A) complex. Hypothesizing, therefore, that PABPN1 or similar nuclear poly(A) binding proteins align GA-complexes with their upstream poly(A)-segments, one may arrive at a basic topology of chromatin that would support a fast genome search and navigation mechanism, as depicted in Fig. 5.
The poly(A) binding protein(s) may have an additional role. GA-sequences as sign posts are likely to contain information for the searching mechanism, which may require a particular reading direction. Considering that poly(A) tails generate a reading asymmetry in mRNAs, it is tempting to think of the poly(A)-segments as markers for the reading starts, as they occur almost exclusively at the upstream end of the GA-sequences.
d. The reduction of the search path. Since the distance between consecutive GA-complexes is not constant, the parallel arrangement of GA-sequences would create a kind of 'ribbon' with different size loops between them (Figure 5). In this way, each chromosome is divided into 2 domains, the ribbon of the GA-sequences (non-coding) and all the rest (including all genes). In other words, all genes are located on one or the other loop. The variable sizes of the loops accommodate variable numbers and sizes of genes.
This ribbon may be considered a 'macro-insulator' [see 7] for an entire chromosome, although the present article presented no evidence that the associated DNA-loops between GA-sequences are transcriptionally independent. Yet, in support of this notion, it appears that the known insulators in Drosophila melanogaster contain binding sites for the GAGA-factor [cited from 8], suggesting that they may be related to the GA-sequences.
Searching along this ribbon instead of the entire chromosome could shorten considerably the search path for genome navigation. Consider the following rough estimate. Human chr. 1 has a length of 238 Mb and contains 19513 common GA-sequences with an average distance between consecutive GA-sequences of 12.2 Kb. At a distance of 0.3 nm per base pair, the average loop between consecutive GA-sequences would therefore measure 3655 nm. If there was a base-by-base search mechanism it would have to crawl along this distance in order to move from one GA-sequence to the next. On the other hand, searching along the ribbon of parallel arranged GA-sequences would shorten the distance to the next GA-sequence to 1 diameter of the double helix (2 nm) and, maybe, the diameter of a linker protein (e.g. 3 nm). Thus, instead of crawling for 3655 nm along a loop, the search mechanism could leap to the next sign post by moving only 5 nm, corresponding to a 730-fold shortening of the search path.
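The rough estimate above can be retraced with a few lines of Python. All input values are taken from the text; the function name and the returned field names are hypothetical. Starting from the unrounded chromosome length, the sketch arrives at a loop length of about 3659 nm and a ratio of about 732, which agrees with the quoted 3655 nm and 730-fold within rounding.

```python
def search_path_reduction(chromosome_bp=238_000_000, ga_sequences=19_513,
                          nm_per_bp=0.3, helix_nm=2.0, linker_nm=3.0):
    """Retrace the estimate of the shortened search path for human chr. 1."""
    mean_spacing_bp = chromosome_bp / ga_sequences   # average loop size in bases
    loop_nm = mean_spacing_bp * nm_per_bp            # contour length of one loop
    leap_nm = helix_nm + linker_nm                   # step along the GA-ribbon
    return {
        "mean_spacing_kb": mean_spacing_bp / 1000,
        "loop_nm": loop_nm,
        "leap_nm": leap_nm,
        "fold_reduction": loop_nm / leap_nm,
    }


if __name__ == "__main__":
    for name, value in search_path_reduction().items():
        print(f"{name}: {value:.1f}")
```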
e. The GA-ribbon as an architectural feature of chromatin. Another kind of ribbon, namely the GA-ribbon had been introduced earlier (e.g. see Fig. 1a), in the GPxIs of GA-complexes. At that time, it was merely a visual consequence of sequence alignment in the GPxIs of chromosome segments. Now, based on the postulate of minimizing search paths, I suggest that the GA-ribbon may actually reflect a certain reality of chromatin architecture. In other words, if one could flatten out chromatin and stain the different bases with 4 different gray-tone probes, the resulting microscopic image may look similar to the GPxI of Fig. 1a.
It should be noted that the concept of the GA-ribbon is ultimately a topological one. If it were stained with appropriate probes, its actual appearance inside the nucleus does not have to resemble a straight ribbon. On the contrary, in actuality it may well be rolled up into a ball or a tube with the various associated DNA loops pointing to the outside. A number of other topologically equivalent shapes are equally well conceivable, and are not excluded here.
f. The chromatinization of GA-ribbon and associated loops. Of course, in reality the various loops of DNA will have to be associated with nucleosomes (Figure 6). The average size loop of 12.2 Kb is large enough to accommodate roughly 50-60 nucleosomes, or about one 30-nm fiber [9]. The variable lengths of the 30 nm fibers would be consistent with the variable length of the loops between adjacent GA-complexes.
The GA-sequences whose parallel arrangement gives rise to the GA-ribbon would hardly exist as naked DNA for long, either. More likely they are associated with GA-specific transcription factors and other GA-specific DNA-binding proteins. In view of the reported prevalence of tetra-GA motifs in GA-sequences one would expect that these DNA-binding proteins have preferences for tetra-GA motifs such as the GAGA-factor [10,11], HSF1 [12] and others (see below and Fig. 7d).

g. The postulate of a search mechanism ('clavisomes'). If one adopts the view that genomes contain sign posts arranged into much shortened search paths, it is consistent to postulate also that 'something' exists that searches this path. This hypothetical searching complex must (a) find the specific GA-sequence, GA-Sequ 0 (see Def {7}), that belongs to the associated loop containing the gene of the target protein P 0 (see Def {6}) and (b) interact with it in order to initiate transcription in the associated loop(s). Much may already be known about this entity, albeit possibly under different names. While there is no evidence that it exists in the form of a nuclear particle, for the sake of simplicity it will be treated as such and called a 'clavisome' in the following (from lat. clavis = key) as it 'unlocks' a segment of chromatin (see Def. {5}).
h. The hypothetical initiation of transcription by clavisomes. How can clavisomes initiate the transcription of a specific protein P 0 in response to its demand by the cell? Several reports in the literature suggested that the activation of genes is accompanied by loosening or even breaking the association between DNA and nucleosomes [for references see 9; specifically 13, and 14]. Thus it seems conceivable that the interaction between a clavisome and GA-Sequ 0 leads to the release of the nucleosomes in the associated loop, thus exposing the coding sequences for P 0 within the loop to the mechanisms of transcription. This step is depicted in the model of Figure 6.
Figure 6. Outline of a chromatin model that supports a fast genome navigation system. By leaping from one GA-sequence to the next along the GA-ribbon in the scanning direction while 'reading' the information encoded in the proteins bound to the GA sequence in the reading direction, the postulated 'clavisomes' (searching complexes) can efficiently find the appropriate GA-sequence on a more than 700-fold shorter search path than by crawling along the various size loops of genomic DNA. After a clavisome found its target GA-sequence and interacted with it, the nucleosomes in the associated loop are released, and the specific coding sequences in the loop are exposed to the transcription mechanisms. doi:10.1371/journal.pone.0004701.g006
i. The hypothetical recognition of target GA-sequences by clavisomes. Of course, the above hypothesis raises the question of how clavisomes, upon a cellular demand for P 0, can distinguish its particular GA-Sequ 0 from all other GA-sequences. Assume that each cellular protein P 0 is able to interact with a certain number of hypothetical transcription factors, tetra-GA-factors m (see Def {8}), each of which is specific for binding one of the 16 different tetra-GA motifs (Fig. 7a,b). The interaction may catalyze the formation of a P 0 -specific oligomer (see Def {9}) of these transcription factors, which is subsequently released from the P 0 molecule and enters the nucleus (Fig. 7c). There it binds to the characteristic chains of the tetra-GA-motifs of GA-Sequ 0 (Fig. 7d) and prevents clavisomes from interacting with GA-Sequ 0. As a result, no new transcripts of P 0 will be made as long as the cytoplasmic levels of P 0 remain sufficiently high to produce a steady stream of the P 0 -specific oligomers. However, if the cytoplasmic levels of P 0 drop below a certain threshold, GA-Sequ 0 would become 'denuded' and allow clavisomes to initiate the transcription of the genes for P 0 in the associated loop of GA-Sequ 0. The binding of the tetra-GA-factors m by each cellular protein does not have to be direct. The proposed regulation of the P 0 synthesis may happen indirectly via another protein that does bind these transcription factors. As to the case of secreted proteins, the model predicts that their transcription is only turned on if their steady state levels in the cytoplasm are depleted.
There are precedents for major aspects of the above scheme. For example, in Drosophila the GAGA-factor, which can be viewed as one of the 16 tetra-GA-factor m molecules, is required to bind to GAGA motifs in certain GA-rich sequences before the transcription of heat shock proteins is initiated [11]. Another example may be the case of human HSF1, a GA-specific transcription factor with a pentameric consensus sequence of nGAAn which has to form a trimer in the cytoplasm in response to stress situations before it can initiate the transcription of heat shock proteins [12].
More generally speaking, it is not hard to imagine how natural selection, starting with some crude linkages between GA-motifs and GA-rich sequences, over time could have selected for gene products P 0 with binding domains for the same combination of tetra-GA-factors m as were contained in their corresponding GA-Sequ 0.
j. Expected properties of the P 0 -specific oligomers. The formation of the P 0 -specific oligomers must not be a mere concatenation, nor must their inhibition of clavisomes be merely the result of their binding to the tetra-GA-motifs of the GA-sequences. After all, the individual tetra-GA-factors m were assumed to bind these motifs, too. Unless prevented from entering the nucleus, the tetra-GA-factors m should coat all the GA-sequences and, thus, stop transcription permanently. Therefore, the formation of P 0 -specific oligomers and their binding to the GA-sequences must involve additions and/or modifications of their component tetra-GA-factors m.
Figure 7. Assumed linkage between the cellular demand for protein P 0 and the accessibility of the particular GA-sequence GA-Sequ 0 which connects to the loop containing the P 0 gene (see text). a. As long as the cellular protein P 0 is available in sufficient quantities (i.e. there is no demand for P 0), one or more of the 16 conceivable tetra-GA specific transcription factors tetra-GA-factor m can bind to it at its specific binding sites. b. The bound tetra-GA-factor m molecules form a P 0 -specific oligomer, [tetra-GA-factor m1] [tetra-GA-factor m2]… [tetra-GA-factor mN]. c. The P 0 -specific oligomers are released from the P 0 molecule and enter the nucleus. d. They bind to the characteristic chains of the tetra-GA-motifs of GA-Sequ 0 and prevent clavisomes from interacting with it. Conversely, if the cytoplasmic levels of P 0 drop below a certain threshold (i.e. there is high demand for P 0), no more P 0 -specific oligomers are formed to block GA-Sequ 0. As a result, clavisomes are able to initiate the transcription of the genes for P 0 in the associated loop of GA-Sequ 0. doi:10.1371/journal.pone.0004701.g007
The postulated P 0 -specific oligomers need not be larger than pentamers in order to distinguish between more than 1 million different protein species P 0 because 16^5 = 1,048,576. Still, the regulation of the protein synthesis of P 0 would probably require the binding of several P 0 -specific oligomers to GA-Sequ 0. Otherwise, the expression of P 0 would occur in a rather abrupt all-or-none fashion. Multiple interactions between P 0 -specific oligomers and GA-Sequ 0 would also be required if the P 0 -specific oligomers were composed of fewer than 5 monomers. In this case single tetra-, tri-, or dimers could not contain enough information to specify the target protein unambiguously among hundreds of thousands of other candidates.
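The counting argument for the oligomer size can be made explicit in a few lines of Python. The sketch assumes, as the figure of 16 to the power of 5 implies, that an oligomer is an ordered chain in which each position can hold any of the 16 tetra-GA-factors, with repetitions allowed. The function names are hypothetical.

```python
def distinct_oligomers(length, n_factors=16):
    """Number of ordered oligomers of `length` monomers drawn from
    `n_factors` kinds of tetra-GA-factors (repetition allowed)."""
    return n_factors ** length


def min_oligomer_length(n_species, n_factors=16):
    """Smallest oligomer length that can label `n_species` proteins uniquely."""
    length = 0
    while distinct_oligomers(length, n_factors) < n_species:
        length += 1
    return length


if __name__ == "__main__":
    for length in range(1, 6):
        print(length, distinct_oligomers(length))
    print("length needed for 1,000,000 species:", min_oligomer_length(1_000_000))
```

Tetramers offer only 65,536 combinations, which illustrates why oligomers of fewer than 5 monomers could not single out one protein among hundreds of thousands of candidates without additional interactions.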
k. The potential for co-regulation of gene expression due to the similarity of neighboring GA-sequences. Frequently, the genes of co-regulated proteins are located in close proximity to each other on the genome [15]. The above outline of genome navigation offers several reasons to explain this finding.
In the first place, it assumes tacitly that all genes that are contained in an associated loop are expressed together. In this sense it provides a simplistic mechanism of co-regulation of gene expression.
Furthermore, many neighboring GA-sequences shared partly identical chains of tetra-GA motifs, as shown in Fig. 1b and 2b which depicted the GA-sequences of human chr.1 in their natural order of occurrence. Therefore, many P 0 -specific oligomers that bind to GA-Sequ 0 may also bind to neighboring GA-sequences and, thus, influence the regulation of other gene products encoded in the adjacent associated loops.
This mechanism may even apply to genes of co-regulated proteins on different chromosomes. If their GA-sequences share similar chains of tetra-GA motifs, their specific oligomers can cross-react with each other's GA-sequences regardless of the distance between their loci and, thus, rise and fall together in their expression.

The close association between Alu-elements, poly(A)-sequences and GA-sequences in some primates
As reported in this article, the upstream ends of many GA-sequences were poly(A)-sequences. In human and chimpanzee genomes this has a peculiar consequence, because these genomes contain millions of Alu-sequences, which seem to be located upstream of poly(A) sequences (see e.g. [16] and Fig. 4c). Indeed, in humans and chimpanzees I also found Alu-sequences upstream of all the GA-sequences that terminated upstream in a poly(A)-sequence (Fig. 4b).
Of course, this association between GA-sequences and Alu-sequences in human and chimpanzee genomes may have been merely a consequence of a poly(A)-dependent insertion mechanism of Alu-sequences. However, as argued below, genome navigation may also benefit from these insertions.
Being retro-transposons, even a few Alu-elements could spell disaster for any genome, as they could exponentially replicate and re-insert and, thus, fragment the genome in the process. Apparently, human and chimpanzee genomes have learned in time to inhibit the transcription of Alu-elements to a manageable level [16]. Assuming that this mechanism of suppression spills over to their flanking segments, the insertion of Alu-elements directly upstream of GA-sequences may have offered these primate genomes an added level of precise suppression of the transcription of GA-sequences, even while they are de-repressed, consistent with their function as sign posts.
Furthermore, one can imagine that a clavisome is able to initiate the transcription of the adjacent (previous or subsequent) Alu-sequence, once it has opened up a specific loop for transcription (Fig. 6). This action [16] may offer yet another advantage.
It is well known that Alu-transcripts have a substantial influence on the translational regulation of protein synthesis, especially in stressed cells [17,18,19]. Thus, if their transcription is contingent upon the opening of the target loops, they may afford the genome navigation mechanisms a swift and guaranteed handle on the control of ongoing gene expressions. This may be needed especially under stress situations. Indeed, it has been shown that genotoxic stress initiates a massive transcription of Alu-elements [20].
In view of the latter possibility the specific placement of Alu-sequences in the upstream flanks of GA-sequences may even offer a certain quantitative control of background protein synthesis: If each GA-sequence has its own upstream Alu-sequence, the genome navigation mechanism(s) may produce proportionally as many copies of Alu-transcripts as they opened loops for transcription.
Other organisms may not use the same mechanism(s) because their SINE or LINE sequences may not have the same effects on protein synthesis. Furthermore, humans and chimpanzees are among the most recent species. Therefore, one may consider the presence of Alu-sequences in the upstream flanks of their genomes as a very recent acquisition in the evolution of the genome navigation systems.

A note about the GA-sequences of Drosophila melanogaster
Finally, I should explain my frequent use of examples from Drosophila melanogaster, even though it is obviously not a vertebrate. In the first place, a number of reports in this field mention GA-related observations that seem to point to fundamental principles of genome navigation. Furthermore, I hope that a number of crucial predictions of the model presented here can be tested in Drosophila genomes. It may also be possible to find the corresponding results in vertebrate genomes. Nevertheless, there are major differences between Drosophila and vertebrate genomes that may warrant different tests and approaches.
For example, the GAGA-factor was discovered in the field of Drosophila and plays a major role there. Indeed, as shown in Appendix S2, the common GA-sequences of Drosophila contain predominantly chains of the GAGA motif (see Appendix S2: list of GA-sequences and Fig. S4b, area '3'). In contrast, the many other tetra-GA-motifs which occur frequently in vertebrate genomes exist in only very small numbers in Drosophila. Therefore, vertebrates may express many other tetra-GA-motif-factors that are still to be discovered and whose function needs to be explored. Furthermore, although the GA-ribbon of Drosophila chromosomes may appear similar to that of vertebrates (Appendix S2: Fig. S4a), the GA-sequences are much shorter (Appendix S2: list of GA-sequences) and may require very different kinds of clavisomes to interact with them.

Definitions
The following definitions were used in this article.

Materials and Methods
The genome sequences of human, chimpanzee, mouse, dog, zebrafish, and Drosophila melanogaster were obtained from the UCSC site. The Alu-sequence was derived from the NCBI site.
The analysis program, "GA_dnaorg.exe", was written by G.A.-B. using Visual C++ (Microsoft, Redmond, WA).

Appendix S1 The genome pixel image (GPxI). I introduce and apply here a novel method to represent DNA sequences. It turns relationships between sequences into visible patterns by representing the DNA sequences as gray-tone images called 'genome pixel images' (GPxI). The method is both sensitive and intuitive as it takes advantage of the exceptional ability of the human visual sense to detect patterns in images.