Conserved Alternative Splicing and Expression Patterns of Arthropod N-Cadherin

Metazoan development requires complex mechanisms to generate cells with diverse function. Alternative splicing of pre-mRNA not only expands proteomic diversity but also provides a means to regulate tissue-specific molecular expression. The N-Cadherin gene in Drosophila contains three pairs of mutually-exclusive alternatively-spliced exons (MEs). However, no significant differences among the resulting protein isoforms have been successfully demonstrated in vivo. Furthermore, while the N-Cadherin gene products exhibit a complex spatiotemporal expression pattern within embryos, its underlying mechanisms and significance remain unknown. Here, we present results that suggest a critical role for alternative splicing in producing a crucial and reproducible complexity in the expression pattern of arthropod N-Cadherin. We demonstrate that the arthropod N-Cadherin gene has maintained the three sets of MEs for over 400 million years using in silico and in vivo approaches. Expression of isoforms derived from these MEs receives precise spatiotemporal control critical during development. Both Drosophila and Tribolium use ME-13a and ME-13b in “neural” and “mesodermal” splice variants, respectively. As proteins, either ME-13a- or ME-13b-containing isoform can cell-autonomously rescue the embryonic lethality caused by genetic loss of N-Cadherin. Ectopic muscle expression of either isoform beyond the time it normally ceases leads to paralysis and lethality. Together, our results offer an example of well-conserved alternative splicing increasing cellular diversity in metazoans.


Introduction
During early metazoan development, cells undergo a complex process in which they are organized into germ layers that further differentiate into various cell types each serving distinct functions. This process requires cellular complexity resulting from molecular diversity in each cell. The post-genomic era has brought the view that the number of protein-coding genes is insufficient to account for the cellular complexity of multicellular organisms. For example, the Caenorhabditis elegans genome contains some 19,000 protein-coding genes [1] whereas the human genome contains no more than 25,000 [2], undermining the simplistic notion that the cellular complexity of an organism rises in proportion to the number of protein-coding genes.
Alternative splicing of pre-messenger RNA drastically increases the molecular complexity of the mRNAs expressed in cells [3]. While only 0.05% of protein-coding genes (3 out of 6000) in Saccharomyces cerevisiae are alternatively spliced, the majority of protein-coding genes in the human genome are known to undergo alternative splicing [4][5][6][7][8][9], supporting the significance of alternative splicing in generating molecular diversity in metazoan evolution. Alternative splicing is particularly abundant in the brain [9]. Deficiency in producing precise splicing variants has been implicated in several neurological diseases [10,11]. Estimated numbers of splice-variants from a single gene range from just two in C. elegans Cadherin [12] to approximately 70 with mammalian Protocadherin [13,14] and over 38,000 with Drosophila Dscam [15]. Qualitative differences between individual splice-variant isoforms could allow them to distinguish different protein binding partners [4,16,17] or nucleotide binding sequences [18]. Splice variants could also be targeted to separate subcellular domains [19,20] and receive differential degradation controls [21].
Since alternative splicing might allow faster evolution of protein sequences, one might expect that nucleotide sequences of constitutive exons would be more conserved than those of alternative exons. On the contrary, when analyzing human and mouse orthologous transcriptomes, a higher degree of nucleotide sequence conservation is frequently observed in the alternatively spliced exons and/or flanking introns than in the constitutive exons [22][23][24][25][26][27]. Such incidences are due to the conserved presence of cis-acting regulatory elements that interact with the core spliceosomal components and tissue-specific splicing factors [11,28,29]. This supports the idea that tissue-specific regulation of alternative splicing is through combinatorial expression of splicing factors. Therefore, in addition to generating proteins of diverse functions, regulated expression patterns of alternative splicing could provide an additional layer of control over the quantity of expression [30][31][32][33].
Previous work in Drosophila shows that Neural Cadherin (N-Cadherin) is expressed in both the nervous system and early mesoderm in embryos. N-Cadherin mediates homophilic cell adhesion, associates with beta-Catenin, and causes neurogenesis defects when genetically deleted [34][35][36][37][38][39]. In this study, we examined alternative splicing of the arthropod N-Cadherin gene. By combining (i) in silico pan-genomic data-mining, (ii) in vivo expression assessment in evolutionarily distant organisms, and (iii) isoform-specific genetic manipulations in model organisms, we found that N-Cadherin alternative splicing is conserved and the expression patterns of splice-variants are tightly regulated during arthropod embryonic development.

Conserved alternative splicing of arthropod N-Cadherin
Following the release of the Drosophila melanogaster genome sequences [40], it became possible to analyze the genomic organization of the Drosophila N-Cadherin gene. Based on this analysis, we predicted that the Drosophila N-Cadherin gene contains three sets of mutually exclusive exons (MEs) within its open reading frame and that it produces hitherto unknown splice-variants. Combinatorial use of these three sets of N-Cadherin MEs, each containing two alternative exons, is predicted to yield up to 8 splice-variant isoforms that form similar structures (Figure 1A and 1C) [41]. (Note: A 12-nucleotide-long micro exon-7a' is also predicted in all insect genomes examined in this study. It was always paired with exon 7a, but never with exon 7b, in the mature transcripts.)
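The upper bound of 8 isoforms follows from choosing one exon out of each of the three ME pairs (2 x 2 x 2). The short Python sketch below enumerates these combinations using the 7x-13x-18x labels that appear in this paper; the names ME_PAIRS and enumerate_isoforms are illustrative assumptions, and the micro exon-7a' is deliberately ignored.

```python
from itertools import product

# Each set of mutually exclusive exons (ME) offers two alternatives, "a" or "b".
# The dictionary keys are the exon numbers used in the text (7, 13 and 18).
ME_PAIRS = {"7": ("a", "b"), "13": ("a", "b"), "18": ("a", "b")}


def enumerate_isoforms(me_pairs=ME_PAIRS):
    """Return every label obtained by picking one exon from each ME set."""
    labels = []
    for choice in product(*me_pairs.values()):
        # Pair each exon number with the chosen variant, e.g. ("7", "b") -> "7b".
        parts = [f"{exon}{variant}" for exon, variant in zip(me_pairs, choice)]
        labels.append("-".join(parts))
    return labels


isoforms = enumerate_isoforms()
print(len(isoforms))
print(isoforms)
```

The list includes the two isoforms used later for the rescue experiments, 7b-13a-18a and 7b-13b-18a.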
We examined various genomes outside the Drosophila genus to assess the evolution of the genomic organization of the N-Cadherin gene. A single N-Cadherin ortholog is found in mosquito (Anopheles gambiae), silkworm (Bombyx mori), red flour beetle (Tribolium castaneum), honeybee (Apis mellifera), and water flea (Daphnia pulex) (Figure 1C) (see Materials and Methods "N-Cadherin orthologs"). These insect and crustacean species that contain the N-Cadherin gene are, respectively, 230, 270, 290, 300 and 430 million years apart from Drosophila melanogaster (Figure 1B) [42]. Furthermore, the exact genomic organization of the three sets of MEs is conserved in these arthropod genomes, with a notable exception in Apis mellifera. This exception suggests a recent loss of one exon from both ME-13 (exon 13b) and ME-18 (exon 18b) (Figure 1C) after Apis diverged from other insect species. The amino acid sequences encoded by arthropod N-Cadherin genes predict the same overall protein structure (Figure 1A). Considering that the number of common exons and the lengths of interspersing introns vary widely from genome to genome (Figure 1C), the particular genomic stability noted for MEs in N-Cadherin implicates their importance for the survival of arthropods.

Functional redundancy of Drosophila N-Cadherin splice-variants
Predicted amino acid sequences encoded by MEs in N-Cadherin of arthropod genomes are highly conserved (Figure 3A). The putative Ca2+-binding motifs are present in all ME-7s (DRE and DxNDNxPxF) and ME-13s (DxNDNxPxF), while every ME-18 contains part of a single putative transmembrane domain. Conservation between orthologous alternative exons (a or b) in different species is greater than between paralogous alternative exons (a and b) of the same species, as shown in the cluster, indicating exon duplications before the divergence of insects from Daphnia (Figure 3B). Within each pair, paralogous alternative exons exhibit great sequence diversity from each other. For example, the pair of ME-7s of Drosophila N-Cadherin exhibits 50% identity as amino acids, while those of ME-13s and ME-18s display only 47% and 36% identity, respectively. These results indeed suggest that splice-variant isoforms may convey distinct functions. We have previously tested this hypothesis through cell aggregation assays using the Drosophila S2 cell line. We revealed that all tested N-Cadherin isoforms are able to mediate heterophilic interactions with each other [41]. The two transmembrane domain isoforms, 7b-13a-18a and 7b-13a-18b, mediate graded homophilic interactions [43], suggesting their potential roles in regulating differential affinity during development, which still awaits in vivo testing. Genetic mosaic analyses have demonstrated that by supplying a single isoform of

Author Summary
Animal development requires complex mechanisms to generate many different types of cells. Alternative splicing is a process by which a single gene can produce several protein variants under particular circumstances. It is a useful means to generate a diversified set of proteins in different cell types. In this report, we showed that the alternative splicing of the arthropod N-Cadherin gene has been maintained for over 400 million years. The switch of expression patterns of two distinct variants is also well conserved in arthropods. As proteins, these two N-Cadherin splice variants have similar ability to rescue the embryonic lethality caused by genetic loss of N-Cadherin. However, when the expression of either isoform was prolonged in muscles, where the endogenous expression ceases beyond certain stages, it led to larval lethality, suggesting the importance of precise spatiotemporal regulation of N-Cadherin splice-variant expression. This finding is particularly important because it offers an example of well-conserved alternative splicing increasing cellular diversity in animals.
Figure 1 caption (partial): conserved ME (mutually-exclusive alternatively-spliced exons)-7s (ME-7a and ME-7b), ME-13s (ME-13a and ME-13b) and ME-18s (ME-18a and ME-18b). Exons are shown as rectangles while introns as horizontal lines, drawn in proportion to their actual length. Common exons are also numbered for Drosophila. doi:10.1371/journal.pgen.1000441.g001
We conducted two additional approaches using Drosophila neurons as the in vivo model system to evaluate whether splice-variant isoforms exhibit diverse functions. First, we considered the possibility that transmembrane domain splice-variants might be targeted to distinct subcellular compartments of neurons, as is the case with Dscam [19,20], where they would serve specific functions. Neurons are polarized cells. They contain distinct compartments including a single axon and several dendrites. Each compartment is molecularly, structurally and functionally different from the others. We generated GFP-tagged N-Cadherin transmembrane domain isoforms (7b-13a-18a::eGFP and 7b-13a-18b::eGFP) and then drove their expression in either adult mushroom body interneurons or embryonic motoneurons. In either type of neuron, both isoforms are localized to dendrites and axons, showing no obvious protein targeting bias (Figure 4A). Moreover, the localization of ectopically expressed GFP-tagged isoforms is similar to that of the endogenous protein visualized by immunostaining, although 7b-13a-18a::eGFP localized more specifically to the synapses in the motoneuron axons. Second, we wanted to test the differential ability of N-Cadherin splice variants to rescue the viability of N-Cadherin null animals. The loss of endogenous N-Cadherin leads to embryonic lethality [36]. We reasoned that if individual isoforms possess unique features that cannot be substituted by others, they would exhibit different abilities in rescuing the embryonic lethality in null mutant embryos. 
This simple test has the potential to reveal critical functional differences among the isoforms even when morphological criteria might fail. We used splice-variants derived from alternative use of ME-13s, i.e., 7b-13a-18a and 7b-13b-18a. When expressed transgenically in neurons of the N-Cadherin null mutants, either isoform is capable of partially rescuing the lethality (Figure 4B), similar to expression in both neurons and mesoderm (data not shown). However, when expressed transgenically in mesoderm of the N-Cadherin null mutants, neither isoform rescues the lethality. Expression levels of transgenic proteins are at least as high as those of endogenous N-Cadherin. This does not induce any visible gain-of-function phenotype (data not shown). In summary, our in vivo results showed no differential subcellular localization between transmembrane domain isoforms containing exons 18a or 18b in Drosophila. Moreover, there are no functional differences in rescuing embryonic viability between extracellular Cadherin domain isoforms containing exons 13a or 13b. Although we could not rule out the possibility that these tests are unable to distinguish functional differences between splice-variants, in vivo data from our lab or other labs [34,38,39,41,44] do not support the hypothesis that N-Cadherin splice-variants possess diverse functions as proteins.
Figure 3 caption (partial): Conserved amino acid sequences for ME-13 pairs are shown in bold. Some occur as mutually exclusive for ME-13a (in the orange background) or for ME-13b (in the red background). Amino acid identities (%) between alternative pairs of MEs in Drosophila are indicated. Lengths of ME-7s, ME-13s and ME-18s are, respectively, 63, 54-55 and 68-70 amino acids. DRE and DxNDNxPxF constitute putative Ca2+-binding motifs. EGF-like Ca2+-binding cysteine-rich (EGF-CA) and transmembrane domains are indicated. (B) Phylogenetic trees of the MEs based on predicted amino acid sequences. Apis lacks MEs for ME-13s and ME-18s, but has single exons that resemble, respectively, ME-13a and ME-18a. While each tree is drawn to scale, the branch lengths from different trees are not comparable. doi:10.1371/journal.pgen.1000441.g003
Figure 4 caption (partial): (B) Genetic rescue of N-Cadherin null mutant through expression of cDNAs of isoforms derived from either ME-13a or ME-13b in neural (elav'-Gal4) and mesodermal (Gal4 24B) tissues. Asterisk indicates a significant reduction in viability as compared to wildtype control at p<0.01 with two-tailed t-test. Star indicates a significant reversion of the lethality over the null. Note that null mutants with the neural expression of ME-13a and ME-13b derived isoforms survive through adult stage but fail to produce progeny. doi:10.1371/journal.pgen.1000441.g004

Nucleotide sequence conservation of arthropod N-Cadherin
In addition to serving diverse protein functions, regulated tissue-specific and developmental stage-specific expression of splice-variants has been shown to be vital to their functions. If the ME-containing transcripts of N-Cadherin were to receive distinct tissue-specific splicing regulation at the nucleotide level, then the MEs themselves as well as adjacent common exons would exhibit a higher degree of evolutionary conservation due to the presence of cis-acting regulatory elements [22,25,26,27]. To distinguish conservation caused by functional constraints on the nucleotide sequences from conservation caused by constraints on the protein-coding sequences, we first conducted an analysis based on the relative frequencies of nonsynonymous and synonymous (silent) mRNA mutations. We limited our pool of genomes to closely related species within the Drosophila genus that are less than 25 million years apart to avoid potential skewing of data by multiple independent silent mutations at a given locus (Figure 5B) [45]. This analysis shows that nonsynonymous mutations are close to or equal to zero among orthologs of Drosophila N-Cadherin. In addition, it reveals an extraordinarily low synonymous mutation rate in and near the MEs of N-Cadherin, suggesting the presence of conserved cis-regulatory elements. The drop in synonymous mutation rate is more apparent in ME-7s and ME-13s than in ME-18s. This finding further implies that their splicing might be regulated in a similar fashion in Drosophila and Tribolium (Figure 5A and Figure S1). Second, we also noted that in some cases the nucleotide sequences of orthologous exons cluster more tightly than their amino acid sequences, in particular ME-13b and, to a lesser extent, ME-18b (compare Figure 5C to Figure 3B). This implies relatively high selective pressures on the nucleotide sequences within these MEs in arthropod genomes. 
Thus, our in silico analyses suggested evolutionary conservation of diversified splicing regulatory regions for the N-Cadherin gene, especially for the ME-13s.

Conserved expression patterns of arthropod N-Cadherin splice variants
Highly conserved sequences in and around the MEs further suggest conserved spatiotemporal regulation of alternative splicing of arthropod N-Cadherin splice-variants during evolution, which we examined further by conducting the following in vivo tests. First, we used real-time PCR to quantify the dynamic temporal regulation of the MEs during embryogenesis of Drosophila and Tribolium (Figure 6A). We found that, for each of the three sets of MEs, a switch of predominant ME occurs similarly in the two insects that are separated by 290 million years. The timing of switches between different ME pairs is not precisely parallel (compare, for example, ME-7s and ME-18s), suggesting separate regulation. Second, we examined in situ spatiotemporal expression patterns of the three pairs of MEs in N-Cadherin of Drosophila and Tribolium. Our results revealed expression of ME-7a, ME-7b, ME-18a and ME-18b in both the embryonic CNS and the early mesoderm (Figure 6B). However, ME-13a is detected only in the CNS, while ME-13b is only expressed in the early mesoderm. Furthermore, the non-neuronal expression of ME-13b drops sharply before synapses begin to form in the embryos (Figure 6A and 6B, triangles). Third, we raised an isoform-specific antibody against Drosophila ME-13b to determine whether the same spatiotemporal regulations of ME-13s are reflected at the protein level (Figure 6C). The antibody detects isoforms containing ME-13b only in the early mesoderm, further confirming distinct expression patterns for the isoforms containing ME-13a and ME-13b. The labeling disappears soon after the mRNA becomes undetectable, indicating rapid degradation of the protein in the mesoderm. As a result, while the nervous system maintains ME-13a-containing "neural" N-Cadherin isoforms, there is little N-Cadherin protein in the muscles of late-stage embryos or early-stage larvae. 
Taken together, our in vivo data showed conserved spatiotemporal expression patterns between orthologous MEs, supporting the model of N-Cadherin isoforms receiving distinct and evolutionarily conserved expression regulation.
We reasoned that if maintaining the spatiotemporal regulation of N-Cadherin splicing were essential, then deliberately deviating from the endogenous pattern of N-Cadherin expression would be detrimental to the survival of an organism. We ectopically expressed the cDNA of an N-Cadherin isoform containing ME-13b (7b-13b-18a) in muscles of wildtype Drosophila (Figure 7A) at standard temperature (25°C). Devoid of introns, the transgene is free from endogenous splicing regulation and can be expressed in the mesoderm beyond the point at which endogenous N-Cadherin expression ceases, thus allowing the prolonged expression of these splice variants. This results in no change in the survival rate during embryogenesis but a robust (95.5%) lethality during larval stages. The larvae exhibit reduced locomotion and remain small in size prior to their death (Figure 7B) but display no apparent abnormality in the muscle morphology (data not shown). Interestingly, despite its considerable amino acid divergence, the cDNA of another N-Cadherin isoform containing ME-13a (7b-13a-18a) proves to be an equally potent agent of lethality (100% lethality). The lethal stage caused by temporal misexpression of splice variants is distinct from that caused by genetic deletion of the N-Cadherin gene, which occurs during late embryonic stages and with neuronal pathfinding defects [36]. Since deletion of the N-Cadherin gene also causes neural pathfinding defects, we ectopically expressed N-Cadherin cDNA containing either ME-13a (7b-13a-18a) or ME-13b (7b-13b-18a) continuously in the CNS, where endogenous ME-13a-containing isoforms are expressed. We found that this induces no abnormality throughout both the embryonic and larval stages (Figure 7A). 
To distinguish whether the elevated levels or the extended temporal expression of N-Cadherin in muscles is the main cause of the lethality phenotype in larvae, we reduced the level of exogenous N-Cadherin expression by raising the animals at a lower temperature (18°C). The total amount of N-Cadherin (both endogenous and exogenous) 7-9 hours after egg-laying was quantified by western blot (data not shown). At 25°C, the exogenous expression levels of 13a- and 13b-containing isoforms are 14-fold and 9-fold the endogenous N-Cadherin level, respectively. At 18°C, they dropped to approximately 3-fold the endogenous N-Cadherin level. Despite this 3- to 5-fold drop in exogenous expression of 13a- and 13b-containing isoforms at 18°C, the lethality of larvae remains the same as that at 25°C (Figure 7A). Because lowering the expression level did not change the outcome, the extended duration of expression, rather than the elevated level, appears to be the more likely cause of lethality. Thus, down-regulation of N-Cadherin expression in muscle cells beyond 12 hours after egg-laying is essential to the normal development of Drosophila embryos.
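The "3- to 5-fold drop" quoted above can be checked from the fold values reported in the text. The Python sketch below is only an arithmetic illustration: the variable names are hypothetical, and the 18°C value is taken as exactly 3, although the text describes it as approximate.

```python
# Exogenous expression relative to endogenous N-Cadherin, as reported in the text.
fold_at_25c = {"13a": 14.0, "13b": 9.0}
# Assumption: the text gives "approximately 3-fold" at 18 degrees C for both isoforms.
fold_at_18c = 3.0


def fold_drop(level_warm, level_cool):
    """Return how many times lower the cool-temperature level is than the warm one."""
    return level_warm / level_cool


drops = {isoform: fold_drop(level, fold_at_18c) for isoform, level in fold_at_25c.items()}
for isoform, drop in drops.items():
    print(f"{isoform}-containing isoform: {drop:.1f}-fold lower at 18 degrees C")
```

Under these assumptions, the reduction is about 4.7-fold for the 13a-containing isoform and 3-fold for the 13b-containing isoform, which matches the range stated in the text.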
In this report, we show that the expression of N-Cadherin splice variants is spatiotemporally regulated during development. At the embryonic stage in vivo, although N-Cadherin has "neural" and "mesodermal" splice-variant isoforms, the protein products of these splice variants are functionally interchangeable. However, prolonged expression of N-Cadherin splice isoforms after the embryonic stage in the muscle causes lethality, whereas prolonged expression in neurons causes no adverse effect. These genetic manipulations in Drosophila offer in vivo evidence to support the significance of the spatiotemporal regulation of expression of N-Cadherin splice-variant isoforms. In addition, the three sets of MEs of arthropod N-Cadherin and the spatiotemporal regulation of the spliced variants have been conserved for over 400 million years. Therefore, we hypothesize that alternative splicing of N-Cadherin is critical for arthropod embryonic development and that it provides the complexity required for developmental regulation.

Alternative splicing of N-Cadherin is conserved in arthropod genomes
Metazoan development requires the collaboration of many morphologically and functionally distinct cell types, resulting from differential gene expression patterns. A separate set of mRNA molecules in each individual cell is generated through regulation at several different levels, including transcription and splicing. Combinatorial expression of transcription factors has been shown to dictate cell fates, while various RNA-binding proteins could generate multiple splice-variant isoforms from a single species of pre-mRNA molecules [10].
Drosophila N-Cadherin belongs to the classic Cadherin family. Through inspection of several recently released arthropod genomes, we showed that the genomic structure of three pairs of mutually-exclusive alternatively-spliced exons, or MEs, of N-Cadherin has been conserved in the arthropod lineage for more than 400 million years. The only exception is the recent loss of ME-13b and ME-18b after the divergence of Apis mellifera from other insects. MEs of the arthropod N-Cadherin gene encode part of
Figure 5 caption (partial): Exon numbers are shown on the top of the graph. The heat bar graphs are presented in a log scale with low silent mutation rates in blue and high ones in red. All mRNA sequences containing the same alternatively spliced exons were aligned and sliding window analysis of synonymous mutations was performed with the Swaap 1.0.2 program based on Li's method [46], with a window size of 90 nucleotides and a step size of 18 nucleotides. In ME-7s, ME-7a and, to a lesser extent, ME-7b exhibit unusually low rates. In ME-13s, ME-13b, as well as exon-12 and exon-14 that surround it, show extremely low rates. In ME-18s, ME-18a, ME-18b and/or exon-19 are conserved at the nucleotide level. See Figure S1

Alternatively spliced N-Cadherin proteins are functionally redundant
Alternative splicing has been considered to be an important means to generate diverse protein products from a single gene, thus expanding the proteome from a limited amount of genetic material. Using RT-PCR, we confirmed the endogenous expression of all MEs in Drosophila melanogaster and Tribolium castaneum embryos. The small number of splice-variants in the arthropod N-Cadherin gene has offered an opportunity to experimentally evaluate whether resulting isoforms adopt distinct protein functions to expand protein diversity and/or receive separate spatiotemporal controls leading to increased complexity in expression patterns.
Drosophila N-Cadherin isoforms have been shown to exhibit differential homophilic binding affinity when expressed in Drosophila S2 cells [43]. However, a number of studies have demonstrated that re-supplying a single N-Cadherin isoform could cell-autonomously rescue morphological defects of N-Cadherin-deficient neurons in adult Drosophila brains [34,38,39,41,44]. Consistent with this, our own results showed that transmembrane-domain splice-variants (ME-18s) have no targeting preference to different subcellular compartments of neurons. We also showed that one pair of extracellular Cadherin domain isoforms (ME-13s), despite their distinct tissue-specific expression patterns (ME-13a being neuronal and ME-13b being mesodermal), show no differences in their ability to cell-autonomously partially rescue embryonic lethality caused by genetic loss of the N-Cadherin gene (Figure 4B). Thus, results from in vivo tests in Drosophila do not support the idea that N-Cadherin splice-variants provide significant functional diversity as proteins during development.

Alternatively spliced N-Cadherin variants are expressed in a complex and essential pattern
Figure 6 caption (partial): views) embryos. ME-13a and ME-13b are used in "neural" and "mesodermal" splice-variants, respectively. Tribolium embryos at 60:00 are fillet-dissected (dotted line). (C) Immunocytochemistry of ME-13b derived isoforms (green) in Drosophila embryos, counterstained with antibody that labels all N-Cadherin (magenta). The isoform-specific antibody shows non-specific binding of trachea. Scale bars 100 µm. doi:10.1371/journal.pgen.1000441.g006
Figure 7. Detrimental effects of overriding the endogenous spatiotemporal control. (A) Over-expression of ME-13a- and ME-13b-containing isoforms in neural (elav'-Gal4) or mesodermal (Gal4 24B) tissues at 25°C and 18°C. Ectopic expression of either isoform in the nervous system or mesoderm (box) has no significant effect during embryogenesis (one way ANOVA, p = 0.39 at 25°C and p = 0.03 at 18°C). However, expression of either isoform in mesoderm beyond the onset of synaptogenesis (triangle) caused lethality in larvae (two-tailed t-test, p<0.001 for either isoform at both temperatures). In contrast, expression of either isoform in the nervous system had no significant effects on the survival either in embryonic or larval stage (two-tailed t-test, p>0.001 for either isoform at both temperatures). Asterisk indicates a significant reduction in viability as compared to wildtype control at p<0.001 with two-tailed t-test. (B) Over-expression of either isoform in mesoderm causes muscle paralysis in larvae. Scale bar 500 µm. doi:10.1371/journal.pgen.1000441.g007
Our in silico analysis of mRNA sequences of N-Cadherin splice-variants revealed extremely low synonymous mutation rates at MEs and/or flanking constitutive exons (Figure 5A) and a tight clustering of nucleotide sequences (Figure 5C). These results imply a heightened conservative selective pressure on the local nucleotide sequences. Another independent line of evidence comes from our in vivo observation of highly conserved endogenous ME expression patterns. 
Drosophila and Tribolium diverged from each other 290 million years ago. They undergo different modes of embryogenesis (i.e., long germ-band vs. short germ-band, respectively), and for different durations (i.e., 24 hours vs. 96 hours, respectively). Nevertheless, the overall spatiotemporal patterns in usage of individual MEs in these two distant insect species are extremely similar (Figure 6A and 6B).
Using Drosophila, we designed in vivo experiments to examine the significance of the evolutionarily conserved spatiotemporal regulation of the alternative isoforms. We found that when N-Cadherin protein expression in muscles is abnormally prolonged by only several hours, the fitness of the organism is in serious jeopardy, regardless of the isoform expressed (Figure 7). Deliberate deviation from the normal spatiotemporal restriction on N-Cadherin expression leads to a robust lethality. This occurs when the molecule is delivered to the right tissue (i.e., the muscles), but at the wrong time (i.e., after synaptogenesis). On the other hand, as long as it is expressed within the precise spatiotemporal constraint, over 50% amino acid divergence within a functional domain of N-Cadherin proteins can be tolerated even at levels comparable to the endogenous expression. These isoforms may be functionally distinct; however, the significant differences are not yet obvious and are difficult to test. Our results support the idea that the regulated expression of arthropod N-Cadherin MEs generates a complex but essential pattern of spatiotemporal expression of alternative isoforms. In conclusion, our in silico and in vivo analyses of arthropod N-Cadherin present an example in which well-conserved alternative splicing increases the spatiotemporal expression complexity essential for metazoan development.

Analysis of genomic sequences
The intron sequences of the Drosophila N-Cadherin gene (CadN, GenBank AB002397) were subjected to a BLASTX (NCBI) search of the Drosophila melanogaster protein database to identify potential alternatively spliced exons. The possible open reading frames/exons within these introns were further analyzed for the presence of proper splicing donor and acceptor sites, and for the maintenance of correct phase and orientation. Alternatively spliced exons occur as three modules in the structure-coding sequence. Previously, Iwai et al. had isolated two cDNA isoforms that contain, respectively, the 7a-13a-18a (CG7100-PE) and 7b-13a-18a (CG7100-PD) combinations, and had further characterized the latter. A 12-nucleotide-long micro exon-7a' is also predicted in all insect genomes examined in this study (data not shown).

Silent mutation rate
N-Cadherin mRNA sequences from Drosophila sechellia, Drosophila yakuba and Drosophila erecta were selected to conduct silent (synonymous) mutation analysis because of their relatively short evolutionary distance from Drosophila melanogaster. Drosophila simulans was rejected from the analysis because of a sequencing gap within the coding region of N-Cadherin, while Drosophila ananassae was rejected due to saturation of the synonymous sites. All transcripts containing the same alternatively spliced exons were aligned and sliding window analysis of the synonymous mutation rate [40] was done with the Swaap 1.0.2 program [46], with a window size of 90 nucleotides and a step size of 18 nucleotides. Nonsynonymous mutations were extremely rare among the Drosophila species. Therefore, the nonsynonymous mutation rate (Ka) is equal to zero in most of the windows. For that reason, instead of Ka/Ks, we plotted Ks on a logarithmic scale.
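To illustrate how a 90-nucleotide window advancing in 18-nucleotide steps traverses an alignment, the Python sketch below reports, for each window, the fraction of codons that differ between two aligned coding sequences while still encoding the same amino acid. This is a deliberately simplified stand-in, not Li's method and not what Swaap computes; the function names are hypothetical, and the sequences are assumed to be in frame, upper case, gap-free and free of ambiguous bases.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def window_starts(length, window=90, step=18):
    """Start positions of every full window that fits in a sequence of this length."""
    return list(range(0, length - window + 1, step))


def silent_codon_fraction(seq_a, seq_b, window=90, step=18):
    """Per window, the fraction of codons that differ yet encode the same residue.

    Window and step are both multiples of 3, so every window stays in frame.
    """
    results = []
    codons_per_window = window // 3
    for start in window_starts(min(len(seq_a), len(seq_b)), window, step):
        silent = 0
        for i in range(start, start + window, 3):
            codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
            if codon_a != codon_b and CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                silent += 1
        results.append((start, silent / codons_per_window))
    return results


# Toy alignment: 40 alanine codons, the first 10 carrying a silent GCT -> GCC change.
reference = "GCT" * 40
variant = "GCC" * 10 + "GCT" * 30
for start, fraction in silent_codon_fraction(reference, variant):
    print(start, round(fraction, 3))
```

A faithful Ks estimate would additionally count synonymous sites per codon and correct for multiple substitutions, which this sketch omits.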

Phylogenetic trees
The phylogenetic trees of MEs in the arthropod genomes were constructed from their amino acid sequences (Figure 3B) and nucleotide sequences (Figure 5C) using Clustal W and DRAW-TREE (PHYLIP unrooted phylogenetic tree) with Biology workbench (http://seqtool.sdsc.edu).

Fly stock
Drosophila melanogaster cultures were kept in standard media at 25°C. Embryogenesis takes 24 hours at room temperature. For RT-PCR reactions, the w 1118 strain flies were placed in a cage and allowed to lay eggs on grape juice plates with supplemental yeast paste for certain egg-laying periods. The plates were placed at room temperature (25°C) until the embryos reached the specified hours after egg-laying. Embryos were rinsed with water and then frozen with dry ice. For larvae collection, wandering third instar larvae were collected, washed with PBS, and frozen with dry ice. The mutant analysis on N-Cadherin loss-of-function was based on Ncad 405 (source: L. Zipursky) and Ncad M19 (source: T. Uemura). The elav'-GAL4 and GAL4 24B drivers (source: C. Goodman) were used to drive UAS-Ncad 7b-13a-18a and UAS-Ncad 7b-13b-18b (source: C-H. Lee

Beetle stock
Tribolium castaneum cultures were kept in standard media at 29°C. Embryogenesis takes 96 hours. For quantitative real-time PCR, after an egg-laying period of 6 hours, the embryos were collected and then kept at 29°C until they reached the specified hours after egg-laying. The embryos were dechorionated with 50% bleach, which also removes the flour particles covering them, frozen with dry ice and kept at −20°C until later processing.

RNA isolation and reverse transcription (PCR and nested PCR)
Total RNA was isolated using the RNeasy column (Qiagen). Reverse transcription reactions were primed using random hexamers. All PCRs were performed using specific primers and standard protocol or the Advantage cDNA PCR kit (BD Biosciences). The sequences (from 5′ to 3′)

RNA in situ hybridization
The Drosophila protocol was based on the 96-well hybridization protocol of BDGP (http://www.fruitfly.org/about/methods/RNAinsitu.html) and the Tribolium protocol was modified by Yoshinori Tomoyasu from what has been described before [47]. In brief, individual exons were amplified by PCR and then cloned into pCRII-TOPO vectors or pCRIV-TOPO vectors (Invitrogen). RNA probes of specific exons were made following the digoxigenin RNA labelling kit protocol (Roche Molecular Biochemicals). Drosophila RNA probes for exon-7a and exon-7b are about 300 nts long and include the specific exons and the flanking exonic sequences. Drosophila RNA probes for exon-8, exon-13a, exon-13b, exon-18a and exon-18b each contain only the designated exon. Tribolium RNA probes are all about 500 nts long, including the designated exons and the flanking exonic sequences. Embryos were collected for 24 hours at 25°C for Drosophila and 96 hours at 29°C for Tribolium, dechorionated, and then fixed with freshly made 4% paraformaldehyde in phosphate-buffered saline. After incubating with hybridization buffer, RNA probes were added to the embryos and allowed to hybridize overnight at 55°C. The signal was detected using anti-digoxigenin alkaline phosphatase (Roche Molecular Biochemicals) and then NBT/BCIP substrate.

Immunocytochemistry
A GST protein fused with seven repeats of the oligopeptide LDEGMTNTPFT was designed as the antigen to generate the antibody specific to Drosophila N-Cadherin derived from ME-13b. The strategy of DNA construction was performed as previously described [48] with minor modification. Briefly, we designed two oligonucleotides. Oligo A (5′-cta gac gaa ggt atg act aac acg ccg ttc acg) encodes the target antigen, and oligo B (5′-gtc ata cct tcg tct agc gtg aac ggc gtg tta) is partly complementary to oligo A. The template-repeated polymerase chain reaction (TR-PCR) method was applied to construct the DNA fragment encoding multiple copies of the 13b antigen. To incorporate restriction sites for subcloning at both ends of the TR-PCR products (BamHI at the 5′ end and EcoRI at the 3′ end) as well as a stop codon at the 3′ end of the coding region, a second round of PCR (adapter PCR) with two adapter primers, primer A (5′-gga tcc cta gac gaa ggt atg ac) and primer B (5′-g aat tca aag ctt cgt gaa cgg cgt gtt), was performed. The DNA fragment encoding the seven repeats of antigen was subcloned into plasmid pGST-KG. The resulting plasmid was introduced into XL-10 Gold. The fusion protein was purified by glutathione-Sepharose 4B affinity chromatography. Rabbits were then immunized with the purified fusion protein and sera were collected.
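
The oligonucleotide design can be checked with a short, self-contained Python sketch. It is an illustration rather than part of the original workflow: the helper names (`clean`, `translate`, `revcomp`) are hypothetical, the standard genetic code is assumed, and only the four sequences quoted above are used. The sketch confirms that oligo A translates to the antigen peptide, that the reverse complement of oligo B equals oligo A shifted circularly by 17 nucleotides (the staggered overlap that allows tandem repeats to form during TR-PCR), and that the adapter primers carry the BamHI and EcoRI recognition sites.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq):
    """Remove the spacing used in the text and switch to upper case."""
    return seq.replace(" ", "").upper()


def translate(seq):
    """Translate complete codons from the first base onward."""
    s = clean(seq)
    usable = len(s) - len(s) % 3
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, usable, 3))


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


# Sequences as quoted in the text (5' to 3').
OLIGO_A = "cta gac gaa ggt atg act aac acg ccg ttc acg"
OLIGO_B = "gtc ata cct tcg tct agc gtg aac ggc gtg tta"
PRIMER_A = "gga tcc cta gac gaa ggt atg ac"
PRIMER_B = "g aat tca aag ctt cgt gaa cgg cgt gtt"

a = clean(OLIGO_A)
print("Oligo A encodes:", translate(a))

# Oligo B should pair with oligo A in a staggered (circularly shifted) way.
shifted = a[17:] + a[:17]
print("Oligo B is a 17-nt circular shift:", revcomp(OLIGO_B) == shifted)

# Adapter primers: restriction sites at the 5' ends.
print("Primer A starts with BamHI site:", clean(PRIMER_A).startswith("GGATCC"))
print("Primer B starts with EcoRI site:", clean(PRIMER_B).startswith("GAATTC"))

# Coding-strand view of primer B: the end of the repeat, then a stop codon.
print("Primer B, coding strand:", translate(revcomp(PRIMER_B)))
```

Read on the coding strand, primer B ends the last antigen repeat and places an in-frame TGA stop codon immediately upstream of the EcoRI site, consistent with the description of the adapter PCR.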

Imaging
Fluorescent images were taken with a Zeiss LSM 510 confocal microscope. The three-dimensional projections of Z-stack images were then constructed with Volocity (Improvision).

Figure S1 Low synonymous mutation rates at MEs. Plots of synonymous mutation rates of N-Cadherin isoforms between Drosophila melanogaster and other Drosophila species. The Y-axis is the silent mutation rate [40] plotted on the logarithmic scale, while the X-axis is the full length mRNA of 7b-13a-18a (upper panel) or 7a-13b-18b (lower panel