Comparative Genomic Evidence for a Complete Nuclear Pore Complex in the Last Eukaryotic Common Ancestor

Background: The Nuclear Pore Complex (NPC) facilitates molecular trafficking between nucleus and cytoplasm and is an integral feature of the eukaryote cell. It exhibits eight-fold rotational symmetry and is comprised of approximately 30 nucleoporins (Nups) in different stoichiometries. Nups are broadly conserved between yeast, vertebrates and plants, but few have been identified among other major eukaryotic groups.
Methodology/Principal Findings: We screened for Nups across 60 eukaryote genomes and report that 19 Nups (spanning all major protein subcomplexes) are found in all eukaryote supergroups represented in our study (Opisthokonts, Amoebozoa, Viridiplantae, Chromalveolates and Excavates). Based on parsimony, between 23 and 26 of 31 Nups can be placed in LECA. Notably, they include central components of the anchoring system (Ndc1 and Gp210) indicating that the anchoring system did not evolve by convergence, as has previously been suggested. These results significantly extend earlier results and, importantly, unambiguously place a fully-fledged NPC in LECA. We also test the proposal that transmembrane Pom proteins in vertebrates and yeasts may account for their variant forms of mitosis (open mitoses in vertebrates, closed among yeasts). The distribution of homologues of vertebrate Pom121 and yeast Pom152 is not consistent with this suggestion, but the distribution of fungal Pom34 fits a scenario wherein it was integral to the evolution of closed mitosis in ascomycetes. We also report an updated screen for vesicle coating complexes, which share a common evolutionary origin with Nups, and can be traced back to LECA. Surprisingly, we find only three supergroup-level differences (one gain and two losses) between the constituents of COPI, COPII and Clathrin complexes.
Conclusions/Significance: Our results indicate that all major protein subcomplexes in the Nuclear Pore Complex are traceable to the Last Eukaryotic Common Ancestor (LECA). In contrast to previous screens, we demonstrate that our conclusions hold regardless of the position of the root of the eukaryote tree.


Introduction
Nuclear pore complexes (NPCs) mediate molecular trafficking between nucleus and cytoplasm [1,2]. They are composed of ~30 different proteins, called nucleoporins (Nups), that are present in multiple copies in each pore [3,4,5]. Most Nups are constituents of specific sub-complexes, which form the major structural units of the pore: cytoplasmic fibrils, central core and the nuclear basket (Figure 1a) [6,7].
The majority of Nups are conserved between mammals and yeasts [3,4,7,8] and previous genomic studies demonstrate extensive conservation of the NPC also in plants and eukaryotic algae [9,10,11].
The extent of conservation of NPC components outside these groups appears patchy however [10,11]. As Mans et al. [10] acknowledged, this makes it difficult to unambiguously establish the complexity of the NPC in the Last Eukaryotic Common Ancestor (LECA), since inferences are dependent upon the position of the root of the eukaryote tree. Bapteste et al. [11], reporting a comparable distribution of Nups to Mans et al., noted furthermore that proteins involved in anchoring the NPC to the nuclear envelope were limited in their distribution. On the basis of this observation, Bapteste et al. concluded that the NPC anchoring system appears to have evolved multiple times independently.
This conclusion is moreover interesting in light of the recent suggestion that the yeast-specific transmembrane Nups Pom152 and Pom34 may be intimately linked to the evolution of closed mitosis in yeast [12]. Closed mitosis is not restricted to yeasts, as it is also observed in a range of protists [13,14]. This raises the question as to whether the evolutionary lability of the anchoring system broadly correlates with the evolution of closed mitosis.
In the wider context of eukaryote origins, there is great value in the identification of Nup homologues in either archaea or bacteria, since this may shed light on the evolutionary origins of the nucleus. If Nups display similarity to proteins from either or both of these domains, the role of these proteins may provide new insights into the evolutionary emergence of key protein families or folds [10]. In this respect there has also been considerable interest in the nuclear envelope-like internal membranes observed in planctomycete bacteria [15,16], and whether the putative pores identified from morphological data are constructed from protein components with similarity to eukaryote Nups. To date, no homologs to Nups have been identified in the genome of any planctomycete.
An alternative hypothesis, in principle compatible with several theories on eukaryote origins, is that the nucleus evolved autogenously in the eukaryote stem lineage [17,18]. The protocoatomer hypothesis [18] in particular addresses the evolution of the NPC in detail. In brief, this model posits that the NPC and vesicle-coating complexes evolved from a rudimentary membrane-bending apparatus that generated internal structure through invagination. Devos et al. [18] reported that an NPC subcomplex (yeast Nup84/vertebrate Nup107-160) bears a striking resemblance to vesicle-coating complexes, both containing proteins with a unique β-propeller/α-solenoid architecture. Moreover, Sec13 is a component of both the NPC and the COPII vesicle-coating complex [19,20]. Mans et al. [10] also noted similarities between NPC and vesicle-coating complex components, coming to a similar conclusion on the basis of sequence analyses.
Rapid progress in eukaryote genome sequence projects provides an ideal opportunity to revisit these questions with the benefit of a more comprehensive dataset. We report here the results of a screen covering 60 eukaryote genomes (representing five supergroups) with the aim of examining the extent to which protein subcomplexes that comprise the NPC are conserved across eukaryotes. We have also examined whether coatomer proteins from the COPI, COPII and clathrin complexes are as broadly conserved as NPC complex proteins, since an early common origin for both the NPC and vesicle coating complexes predicts this. Our results provide further support for a complete NPC in LECA and, in contrast to earlier studies, we show that this conclusion holds regardless of the position of the eukaryote root. We conclude that at least 23 and possibly as many as 26 nucleoporins, including key components of the anchoring system, were already present in LECA. We also report that the distribution of Pom34, but not Pom152, correlates with the occurrence of closed mitosis among fungi. Despite extensive searches, our screen did not recover clear Nup homologs in either bacterial or archaeal genomes, consistent with the view that the nuclear pore complex evolved within the eukaryote stem, after the divergence of archaea and eukaryotes.

Results and Discussion
Establishing the accuracy of HMMer-based identification of Nucleoporins
In silico gene annotation by sequence similarity is expected to be subject to a significant degree of error (and perhaps subjectivity), and in the current case is also complicated by the great evolutionary distances spanning the eukaryote tree. The recent publication of nucleoporins identified in Trypanosoma brucei using experimental proteomics and structure prediction approaches [32] provided us with a fortuitous internal control by which to test the accuracy of our in silico screens for eukaryote Nup homologs. As we had already completed our screen of T. brucei when the DeGrasse et al. study [32] was published, we were able to use the Nups reported therein as a blind control, in the spirit of the CASP and CAPRI community experiments that test ab initio 3D protein structure and protein-protein docking prediction methods (reviewed in [33,34]). Comparison of the candidates identified using our HMMer-based approach with the results reported by DeGrasse et al. is particularly useful in that T. brucei is an outgroup to all sequences included in our training dataset (see Table 1). While the absence of Seh1 sequence identities outside of the WD-repeat regions warrants caution, we were unable to identify any other candidate sequences with this same repeat architecture, suggesting this sequence may well be a Seh1 candidate, albeit a weak one. It is also worth noting that Seh1 is known not to be strongly associated with the Nup107-160 complex, which may explain its absence from proteomics data. DeGrasse et al. also identified an additional 13 proteins, seven of which carry FG repeats. It is to be expected that comparative approaches will tend to underestimate the components of any given complex, since the approach is dependent upon the starting dataset. Moreover, as FG-repeat proteins often carry no other distinguishing features, we deemed the presence of FG-repeats alone insufficient for assigning membership to the NPC, and such candidates were excluded from our study (Table 1). From the perspective of the current study, the results indicate that the HMMer-based approach used here is conservative but accurate, as no incorrect assignments were made in our control screen of T. brucei.

Components from all NPC subcomplexes are present in LECA
The results of our full screen for Nups are summarised in Figure 1, with species-level detail given in Table 2. We report that homologs for 19 of 31 Nups are found in all five supergroups (Fig. 1), significantly extending the findings of previous studies, which were based on the analysis of fewer genomes [10,11]. The broadest conservation is found in plants, where we detect 26 candidates, suggesting that the core composition of nuclear pore complexes in green plants is highly similar to that seen among opisthokonts. The only genome within the Plantae for which no Nups were recovered is the nucleomorph genome of Hemiselmis andersenii, which derives from a red algal endosymbiont [35,36]. This result mirrors previous results indicating that the nucleomorph genomes of Guillardia theta and Bigelowiella natans are devoid of nucleoporin genes, suggesting that all nucleoporin genes are coded in the main nucleus instead [9]. That all available nucleomorph genomes lack obvious nucleoporin homologs suggests little hindrance to relocation or replacement of nucleoporin genes in these lineages.
In previous studies, the conclusion that the LECA possessed an NPC was complicated by the patchy distribution of some nucleoporins, with only 9 nucleoporins found in any of the supergroups other than Plantae and Opisthokonts [10,11]. Consequently, the ability to assign a complex NPC to LECA differed depending upon the topology of the eukaryote tree; where Excavates were basal (see [37]), only 7 nucleoporins could be placed in LECA [10]. If the root was placed between unikonts and bikonts [38,39], 23 Nups could be traced back to LECA, largely on account of candidates identified in plants [10,11]. As shown in Table 3, our broad screen significantly expands the extent to which Nup homologues can be identified across the eukaryote tree. Our results increase the number of Nup candidates across all eukaryote supergroups where genome data are available (except Opisthokonts, where a full complement had already been characterised in advance of all three studies). Of particular note, we significantly expand the number of candidates in three eukaryote supergroups where genome sequence data are still limited (Amoebozoa, Chromalveolates and Excavates). For these supergroups our screen expands the total number of candidate Nups from fewer than ten in each supergroup to 22 in Amoebozoa, 25 in Chromalveolates and 23 in Excavates (Table 3).
The identification of so many new Nup candidates across Amoebozoa, Chromalveolates and Excavates is significant because it enables us to trace a complex NPC back to LECA regardless of ongoing uncertainty about the position of the root of the eukaryote tree (Figure 2), thereby providing robust evidence for the early evolutionary origin of the NPC in the eukaryote lineage independently of tree topology. By contrast, previous studies could only unambiguously place a complex NPC in the common ancestor of Opisthokonts and the Plantae. Under the unikont/bikont rooting (Figure 2, right tree), we can trace 26 nucleoporins back to LECA, with four gains in the Opisthokonts. Of these, three are clearly lineage-specific gains: Pom121 is restricted to vertebrates, while Pom34 and Pom152 are found only in fungi. Nup37 is found in metazoa and some ascomycetes, suggesting either that we have failed to find all orthologs, or that this Nup has been subject to a series of losses in the Opisthokonts; the recent identification of a Nup37 homolog in Aspergillus nidulans [40] confirms that these ascomycete candidates are not spurious predictions.
It is likewise interesting that Amoebozoa appear from Figure 2 to have lost a number of Nups. However, losses (as indicated on both trees in Figure 2) should be treated with caution, in that it is difficult to distinguish between genuine loss and missing data. In this context, it will be interesting to analyse genome data from the anaerobic amoebozoan Breviata anathema, which is proposed to represent a deep-branching member of this supergroup [41,42].
A cursory examination of Table 2 indicates that we have had only limited success in finding Nup candidates among some parasitic lineages; observations supporting morphologically complex nuclear pores among excavates [32,43] underscore the necessarily conservative nature of comparative genomic analyses. That aside, the data nevertheless provide a clear indication that LECA possessed between 23 and 26 Nups. Given ongoing uncertainty concerning the structure of the eukaryote tree [42,44], we note that, assuming the genomes screened in the present study are correctly placed in the proposed five supergroups, a star tree would still suggest between 19 and 22 Nups in LECA (where 19 are found in at least one representative genome from each supergroup and 22 is the minimum number of Nups in any one supergroup; Figure 1). That all major subcomplexes are represented even in the most conservative estimate (19 Nups) suggests LECA possessed an NPC comparable in complexity to NPCs in modern eukaryotes.
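The parsimony reasoning used above can be illustrated with a minimal Python sketch. It is not the procedure used to produce Figure 2: each rooting is reduced, as a simplifying assumption, to the two sets of supergroups on either side of the root, and a Nup is placed in LECA when it is detected on both sides. The presence data cover only six Nups whose supergroup-level distribution is stated in the text, so the counts do not reproduce the 23 to 26 range; the names `PRESENCE`, `ROOTINGS`, `leca_nups` and `star_tree_bounds` are hypothetical.

```python
SUPERGROUPS = frozenset(
    {"Opisthokonts", "Amoebozoa", "Plantae", "Chromalveolates", "Excavates"}
)

# Supergroup-level presence for a few Nups, following statements in the text.
PRESENCE = {
    "Gp210": set(SUPERGROUPS),
    "Nup35": set(SUPERGROUPS),
    "Ndc1": set(SUPERGROUPS - {"Amoebozoa"}),
    "Pom121": {"Opisthokonts"},
    "Pom34": {"Opisthokonts"},
    "Pom152": {"Opisthokonts"},
}

# Assumption: a rooting is summarised by the supergroups on each side of the root.
ROOTINGS = {
    "unikont_bikont": (
        {"Opisthokonts", "Amoebozoa"},
        {"Plantae", "Chromalveolates", "Excavates"},
    ),
    "excavates_basal": ({"Excavates"}, set(SUPERGROUPS - {"Excavates"})),
}


def placed_in_leca(present_in, rooting):
    """A Nup is placed in LECA if it is detected on both sides of the root."""
    side_a, side_b = rooting
    return bool(present_in & side_a) and bool(present_in & side_b)


def leca_nups(presence, rooting):
    """Return the sorted names of Nups placed in LECA under one rooting."""
    return sorted(n for n, groups in presence.items() if placed_in_leca(groups, rooting))


def star_tree_bounds(presence):
    """Star-tree estimate: (Nups found in every supergroup,
    smallest number of Nups found in any one supergroup)."""
    in_all = sum(1 for groups in presence.values() if groups >= SUPERGROUPS)
    per_group = [sum(1 for g in presence.values() if sg in g) for sg in SUPERGROUPS]
    return in_all, min(per_group)


if __name__ == "__main__":
    for name, rooting in ROOTINGS.items():
        print(name, leca_nups(PRESENCE, rooting))
    print("star tree bounds:", star_tree_bounds(PRESENCE))
```

In this toy dataset Gp210, Nup35 and Ndc1 are placed in LECA under both rootings, whereas the three lineage-restricted Pom proteins are not, which mirrors the argument that the inference no longer depends on the position of the root.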

Evidence for a rudimentary NPC anchoring system in LECA
While the NPC does not traverse the lipid bilayer of either the inner or outer nuclear membrane, several nucleoporins are involved in anchoring the NPC to the nuclear envelope (reviewed in [2,6]). Among characterised Nups involved in anchoring, Pom34 and Pom152 are thought to be restricted to fungi, whereas Pom121 and Gp210 are vertebrate-specific (reviewed in [40]). The apparent lack of overlap led to the suggestion that the anchoring system may either be restricted to opisthokonts, or that it evolved by convergence [11]. Ndc1, a known transmembrane Nup from yeast, has recently been demonstrated to be a constituent of a range of fungal and vertebrate NPCs [45,46,47], indicating that parts of the anchoring system evolved before the split of vertebrates and fungi.
Our results significantly extend this view (Figure 1 & Table 2). We identify homologs for Gp210 across all five supergroups, with multiple candidates across Amoebozoa, Plants, Chromalveolates and Excavates. It therefore seems probable that the absence of Gp210 from Fungi, where constituent Nups have been extensively characterised [48], is the result of secondary loss. Identification of Ndc1 homologs is somewhat more restricted; it is readily detected in green algae and plants (Table 2), but only a single candidate is detected among the Chromalveolates (Phytophthora infestans), likewise among Excavates (Trichomonas vaginalis), and we found no candidates among the Amoebozoa. As shown in Figure 2, the distribution of Ndc1 nevertheless suggests this Nup can be placed in LECA, under either rooting. SplitsTree analyses showed the Ndc1 dataset was noisy; a simple distance-based tree (BioNJ, JTT, γ, 100 bootstrap replicates) does not indicate recent horizontal gene transfer from either Plantae or Opisthokonts to either of these lineages (Supplementary Figure S1).
Bolstering the suggestion that LECA possessed an anchoring system is the broad distribution of Nup35 (known as Nup53 in yeast and some vertebrates [6]), which is also conserved across all five supergroups. Nup35 is integral to NPC assembly [49,50], interacts directly with Ndc1 [47,49] and may also contribute to anchoring of the NPC to the nuclear envelope via an amphipathic α-helix [51]. We therefore suggest that Gp210 and Ndc1, possibly with the inclusion of Nup35, constitute the ancestral anchoring system in LECA.

Does the distribution of integral membrane Nups shed any light on the evolution of variant mitoses?
In stark contrast to the results for Gp210, Ndc1 and Nup35, the other integral membrane Nups (Pom34, Pom121 & Pom152) display a more limited distribution (Table 2). It has been noted that the non-overlapping distribution of these three transmembrane Nups correlates with open mitoses in vertebrates (Pom121) and closed mitoses in yeasts (Pom34 and Pom152) [12]. In closed mitosis, the nuclear envelope remains intact during cell division, whereas in open mitosis, the nuclear envelope disintegrates, and envelope and NPC must be reassembled following division [52], though there appear to be many variations therein [13,53]. Notably, experimental studies have demonstrated partial disassembly of the NPC during so-called 'closed' mitosis in Aspergillus nidulans [54]. However, Pom152 remains associated with the nuclear envelope. In a Saccharomyces cerevisiae pom34ΔN nup188Δ double mutant, Miao et al. [12] observed disassembly of some of the same FG repeat-containing Nups as were disassociated during closed mitosis in A. nidulans, raising the possibility that both may be central to (partial) pore maintenance during closed mitoses.
While a degree of caution is warranted concerning the open/closed mitosis dichotomy [53], particularly among the Fungi (but also in early development in Drosophila and Caenorhabditis species, where early embryonic nuclei divide in a syncytium), our data do shed some light on the correlations noted by Miao et al. [12].
Table 3 (note). N.B. This table aims simply to show that our updated screen now enables a more confident assignment of a complete NPC to the LECA than was possible based on two earlier studies. We have not performed a systematic comparison of the different methods applied across the three studies. This table therefore does not directly compare an HMM-based approach with PSI-BLAST or blast with ancestral sequence reconstruction (such comparisons exist, e.g. [100]). Moreover, the present study screened additional genome sequences unavailable at the time the other studies were performed.
In the case of animals, it seems that Pom121 is restricted to vertebrates (Table 2): we find no homologs of Pom121 in the dipteran, tunicate or nematode genomes analysed, nor do we find a candidate in Monosiga brevicollis, a choanoflagellate (the sister group to metazoa [55]). All of these groups undergo open mitoses [13,56]. On these data, it seems difficult to assign a general role for Pom121 in open mitosis, though a specific role in this process in vertebrates is of course plausible [57].
A more informative picture emerges across the fungal genomes, however. We note that Ascomycetes as a group are characterised by closed mitoses [13], whereas among Basidiomycetes no cases of closed mitosis have been reported, and open mitoses are well characterised in a number of species (reviewed in [58]).
Our initial analyses ( [58,59]. To further examine this pattern, we screened four additional Basidiomycete genomes (Phanerochaete chrysosporium, Laccaria bicolor, Coprinopsis cinerea & Malassezia globosa) as well as that of the zygomycete Rhizopus oryzae, which is thought to likewise undergo open mitosis [58]. As is clear from Table 4, all fungal genomes screened carry both Ndc1 and Pom152 homologs, but Pom34 is restricted to Ascomycetes. Given the broad phylogenetic distribution of ascomycete species included in our analysis [60], it seems reasonable to conclude that Pom34 was present in the ancestor of this group, but not in that of Basidiomycetes, as suggested by the complete absence of Pom34 homologs among those fungi.
This result suggests that Pom34, but not Pom152, is central to this distinction, at least within dikaryote fungi. We failed to find evidence of either Pom34 or Pom152 in the microsporidian Encephalitozoon cuniculi, which undergoes closed mitosis [61], indicating that if Pom34 is integral to the evolution of closed mitosis in Fungi, this may be limited to Ascomycetes. Having said that, only seven Nups were detected in E. cuniculi, and the combination of reductive adaptation to a parasitic lifestyle and rapid sequence-level evolution for some genes [62] may complicate homolog detection in this lineage. In this respect, it does seem that at least part of the anchoring system may well have evolved multiple times [11]. In that there appears to be a spectrum between open and closed forms of mitosis [53], and given that open and closed mitoses likely have a complex evolutionary history [13,63], experimental screens may well yield a broader diversity of pore membrane (POM) proteins than hitherto recognised.

Complete coatomer complex components are traceable to LECA
The observation that Nups and coatomer proteins share a common architecture [18,64] has led to the proposal that these also share a common evolutionary origin. This protocoatomer hypothesis [18] is supported by the observation that vesicle coat proteins are well conserved across eukaryotes [65,66,67,68] and have expanded via duplication and divergence [67,69,70]. Vesicle coat complexes are involved in movement of cargo between the various organelles that constitute the endomembrane system, and are one part of this evolutionarily conserved system that also includes the evolutionarily ancient but distinct ESCRT system [67,71].
While previous analyses leave little doubt that the COPI, COPII and clathrin/adaptin complexes are a feature of LECA, less focus has been placed on patterns of conservation at the level of individual components. We therefore screened for individual protein subunits from each complex across a representative dataset spanning five supergroups. In contrast to the overall pattern of conservation of the NPC, the COPI, COPII and clathrin/AP complexes were extremely well conserved and orthology predictions were assessable using phylogenies (see Supplementary file SI4). At the level of supergroups there are only four discernible differences (Table 5; accession numbers are in supplementary Table S2). Apm2, a clathrin adaptor protein medium (μ)-chain protein homolog, appears restricted to Saccharomycetes, and can be readily attributed to gene duplication (supplementary Figure S2). However, it remains unclear whether Apm2 is a bona fide component of Clathrin complexes. Data to date indicate no discernible phenotype in yeast knockouts [72], it has not been ascribed to any AP complexes in yeast [73,74], and interaction with Apl2p (a constituent of the AP-1 clathrin adaptor complex) is only clearly observed when Apm2p is overexpressed [75].
Vertebrate Apl1 has likewise clearly evolved via duplication from the more broadly distributed Apl2 (supplementary Figure S3). Fungi also contain both Apl1 and Apl2, but these form distinct phylogenetic clans (Figure S3), suggesting fungal Apl1 and Apl2 are paralogues that did not evolve via duplication in an early fungal lineage. Non-Opisthokont Apl2 sequences appear to form two separate clans in the unrooted tree, inconsistent with eukaryote supergroups, suggesting that Apl2 and fungal Apl1 have evolved via a complex pattern of ancient duplications and losses. The trees are not sufficiently robust to establish all events with confidence, but a robust minimal conclusion is that vertebrate and fungal Apl1 have separate evolutionary origins.
We find only two other instances where an entire supergroup lacks a component; both affect COPII. The two Amoebozoa represented here (Entamoeba histolytica and Dictyostelium discoideum) lack Sec16, a COPII constituent, but in contrast to previous analyses [65] we do find candidates for all other COPII components in this group. The other supergroup-level absence is Sfb3, for which no homologs were recovered from either Excavates or Chromalveolates. In S. cerevisiae, Sfb3 is involved in vesicle budding and transport of cargo from the ER but not vesicle fusion with the Golgi body. Its function can be compensated for at lower temperatures by Sec24, with which it is homologous [76]. We identified Sec24 homologs in all Excavate and Chromalveolate genomes we screened, so in a scenario where Excavates and Chromalveolates represent the deepest branches of the eukaryote tree (as per Figure 2, left-hand tree), the only innovation since LECA would be a single gain of a duplicate gene in the lineage leading to Plantae, Amoebozoa and Opisthokonts. Under the unikont/bikont rooting (cf. Figure 2, right-hand tree), this 'innovation' vanishes and is instead explained by two losses. That such extreme conservation of components exists at the supergroup level is striking.

WD-repeats are present in Bacteria and Archaea
Previous analyses report the presence of weak homologs to NPC components in bacteria and archaea, though no published data point to nuclear pore complex constituents in the genomes of either domain [10,11]. Supplementary Table S3 summarizes the results of our HMMer-based screen as applied to 49 bacterial and archaeal genomes. We found numerous hits in both archaea and bacteria to WD-repeat containing proteins. WD-repeat proteins possess a characteristic β-propeller fold [77] and are important for protein binding as they can form reversible complexes with several proteins, allowing coordination of sequential and/or simultaneous interactions that involve several sets of proteins at the same time. They comprise a large family involved in a variety of essential biological functions such as signal transduction, transcription regulation and apoptosis [78]. While to our knowledge no WD-repeat proteins have been characterized in Archaea, a small number have been characterized in bacteria, including AglU, which is required for gliding motility and development of spores in Myxococcus xanthus [79], and the Hat protein from Synechocystis sp. PCC6803, required for control of high affinity transport of inorganic carbon [80].
While we detect proteins with similarity to WD-repeat containing Nups (including in planctomycete genomes), sequence similarity is restricted to the WD-repeat regions alone; characteristic motifs that enable Nup identification (such as the SIEGR motif in Rae1) are absent. That WD-domains are consistently identified in genomic screens of all three domains supports the view that these are extremely ancient [10,11,77], but WD-repeat containing nucleoporins, like other Nups, appear to be a eukaryote-specific innovation.
However, the immense gap between eukaryote Nucleoporins and the limited detection of related components in either bacterial or archaeal genomes leaves us no closer to establishing how these structures evolved. Mans et al. aptly referred to this as an 'event horizon' [10,97], and we note that while the availability of additional eukaryote genomes is leading to a successively clearer picture of the nature of the LECA, screens of bacteria and archaea are not narrowing this gap. Structural screens and experimental characterisation are generating important new functional data, such as with the recent characterisation of structural proteins resembling eukaryote membrane-coat proteins in Gemmata obscuriglobus [98,99]. However, it is difficult to place such data within the context of eukaryote stem evolution as multiple interpretations are possible.
Table 5. Distribution of candidate coatomer complex components across eukaryotes.
In the current case, the emerging picture is of an extremely well conserved set of vesicle-coating complexes across eukaryotes, with a similar conclusion possible for the NPC. As all these complexes are traceable to the eukaryote root, it is not formally possible to fully evaluate the protocoatomer hypothesis [18] using comparative genomic data. While some have advocated gene phylogenies [83], Nups show low levels of sequence conservation, complicating attempts to examine the deep phylogeny of the related components of vesicle coats and the NPC. Having said that, the predictive power of the protocoatomer hypothesis is clear: a prediction of this hypothesis is that, if the NPC dates back to LECA, then so should at least one set of vesicle-coating complex components. We can uncontroversially assign the entire set of coatomer complex components from COPI, COPII and clathrin-containing complexes to LECA.
Comparative genomic studies have the power to generate a broad overview of evolutionary conservation, and are in this respect helpful tools in understanding the evolution of cellular structures. Such studies can therefore provide a valuable starting point for focused investigation of the cell biology of a specific species. At the same time, they are dependent upon experimental observation, but can also suggest fruitful avenues for subsequent experimental study. Further investigation of the evolution of variant mitoses (broadly classified as open and closed) may well be worthwhile within the context of the evolution of the nuclear pore complex.

Materials and Methods
Nup sequences were collected, aligned and alignments vetted as previously described [9]. As conservation between fungi and metazoan sequences was in some cases poor, separate fungal and metazoan alignments were created where necessary. Alignments were used to build local and global hmm profiles using HMMER 2.3.2 (http://hmmer.wustl.edu/) [21]. Species from which training data were derived are given in Table 2. Using hmmsearch from the HMMER package, annotated protein sequences derived from eukaryote genomes (given in Table 2) were screened for nucleoporin homologs.
Candidate Nup homologs were assessed using domain information in UniProt (http://www.uniprot.org/) and PFAM (http://pfam.sanger.ac.uk) [22], as well as our examination of all alignments. Sequences lacking typical motifs/domains associated with a given Nup were removed from the analysis. All remaining candidate Nup sequences were back-blasted (blastp) against the non-redundant database (NCBI). Candidates that returned best hits against other proteins were removed.
For any given eukaryote genome, where no homologs were detected for a particular Nup, the genome was screened using Nups from closely related species using blastp and tblastn.
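The vetting steps described above can be sketched in Python as a simple two-stage filter. This is an illustration under stated assumptions rather than the scripts used in the study: the `Candidate` record, its field names and the example entries are hypothetical, the domain check is represented by a boolean recorded after manual inspection, and the back-blast result is assumed to have been reduced beforehand to the protein family of the best blastp hit.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    seq_id: str                 # identifier of the sequence hit by the HMM profile
    nup: str                    # Nup family of the profile that produced the hit
    has_expected_domains: bool  # outcome of the UniProt/PFAM/alignment inspection
    best_backblast_family: str  # family of the best blastp hit against nr (pre-parsed)


def vet_candidates(candidates):
    """Keep candidates that carry the expected motifs/domains and whose best
    back-blast hit belongs to the same Nup family as the profile."""
    kept = []
    for cand in candidates:
        if not cand.has_expected_domains:
            continue  # lacks typical motifs/domains for this Nup
        if cand.best_backblast_family != cand.nup:
            continue  # best reciprocal hit is some other protein
        kept.append(cand)
    return kept


# Hypothetical example records, for illustration only.
EXAMPLES = [
    Candidate("seqA", "Ndc1", True, "Ndc1"),
    Candidate("seqB", "Rae1", False, "Rae1"),      # fails the domain check
    Candidate("seqC", "Seh1", True, "WD40_other"),  # fails the back-blast check
]

if __name__ == "__main__":
    print([c.seq_id for c in vet_candidates(EXAMPLES)])
```

Applying both filters in sequence makes the screen conservative: a candidate is discarded if either check fails, which is consistent with the expectation that comparative approaches underestimate the composition of a complex.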
Sequences for the individual components of the COPI, COPII and Clathrin coatomer complexes in S. cerevisiae were retrieved from the SGD database (http://www.yeastgenome.org) using the respective vesicle coat names as query. Sequences were used to seed initial PSI-BLAST searches [24] against the nr protein database at NCBI. Sequences were evaluated by means of reciprocal blastp searches, as above. Alignments from the obtained sequences were generated using ProbCons [25] and profile HMMs were created from alignments for local and global HMM profile searches. All profiles were calibrated to increase search sensitivity. Sequences obtained were evaluated as described above for Nups.
As an aid in assigning orthology, phylogenetic networks (NeighborNet [26]) were built for NPC and coatomer components using SplitsTree [27,28]. Phylogenetic trees were constructed using RAxML 7.2.2 [29] and BioNJ [30,31]. Phylogenies were reliable for coatomer components but not across Nups. Full Nup alignments (in clustal format) and coatomer trees (in SplitsTree format) are provided as supplementary material (supplementary File S1).
Figure S3. Unrooted PhyML tree of Apl1 and Apl2. Vertebrate Apl1 (blue) and Apl2 evolved via gene duplication. Apl1 from fungi (dark blue) appear paralogous to vertebrate Apl1, and the results do not support evolution by duplication and divergence from fungal Apl2. The tree was generated from protein sequence alignments using the phylogeny.fr server (Dereeper A, et al. 2008 Nucleic Acids Res. 36:W465-9). Branch support: approximate likelihood ratio test (SH-like). Similar topologies were obtained with both ML and neighbor-joining methods, and with a range of parameters and models.