Evolution of the karyopherin-β family of nucleocytoplasmic transport factors; ancient origins and continued specialization.

BACKGROUND
Macromolecular transport across the nuclear envelope (NE) is achieved through nuclear pore complexes (NPCs) and requires karyopherin-βs (KAP-βs), a family of soluble receptors, for recognition of embedded transport signals within cargo. We recently demonstrated, through proteomic analysis of trypanosomes, that NPC architecture is likely highly conserved across the Eukaryota, which in turn suggests conservation of the transport mechanisms. To determine if KAP-β diversity was similarly established early in eukaryotic evolution or if it was subsequently layered onto a conserved NPC, we chose to identify KAP-β sequences in a diverse range of eukaryotes and to investigate their evolutionary history.


RESULTS
Thirty six predicted proteomes were scanned for candidate KAP-β family members. These resulting sequences were resolved into fifteen KAP-β subfamilies which, due to broad supergroup representation, were most likely represented in the last eukaryotic common ancestor (LECA). Candidate members of each KAP-β subfamily were found in all eukaryotic supergroups, except XPO6, which is absent from Archaeplastida. Phylogenetic reconstruction revealed the likely evolutionary relationships between these different subfamilies. Many species contain more than one representative of each KAP-β subfamily; many duplications are apparently taxon-specific but others result from duplications occurring earlier in eukaryotic history.


CONCLUSIONS
At least fifteen KAP-β subfamilies were established early in eukaryote evolution and likely before the LECA. In addition we identified expansions at multiple stages within eukaryote evolution, including a multicellular plant-specific KAP-β, together with frequent secondary losses. Taken with evidence for early establishment of NPC architecture, these data demonstrate that multiple pathways for nucleocytoplasmic transport were established prior to the radiation of modern eukaryotes but that selective pressure continues to sculpt the KAP-β family.


Introduction
The major defining feature of eukaryotic cells is the presence of a nucleus, the organelle that sequesters the genetic material away from the cytoplasm. This fundamental cellular architectural modification serves to compartmentalise transcription and translation and likely permitted the evolution of more complex mechanisms for regulating gene expression [1]. Most eukaryotic cells possess additional membrane-bound organelles responsible for secretory and endocytic pathways that almost certainly have endogenous origins; collectively these are referred to as the endomembrane system. Compelling evidence suggests that these structures populated the last eukaryotic common ancestor (LECA) prior to the radiation of modern lineages [2]. It has recently become recognised that there are deep evolutionary relationships between the proteins that deform endomembrane compartments and those serving the nucleus [3] [4].
Trafficking of macromolecules across the nuclear envelope (NE) occurs exclusively through the nuclear pore complex (NPC), a ~100 MDa cylindrical structure with octagonal symmetry, comprising coaxial rings and a central aqueous channel. Small, soluble molecules freely diffuse through the NPC but molecules over ~40 kDa are selectively transported via active mechanisms. Active transport of protein and RNA is mediated by the karyopherin (KAP) family of nuclear transport receptors and the Ras-like GTPase Ran. There is a small family of KAP-αs, six in Homo sapiens and one in Saccharomyces cerevisiae, which recognise nuclear localisation signals (NLS) on cargo and bind to a member of the larger KAP-β family [5]. However, most transport is independent of KAP-α and mediated by direct recognition of the NLS or nuclear export signal (NES) by a KAP-β.
All functionally defined KAP-βs share a similar architecture and are extremely flexible [6], superhelical proteins composed of ~20 consecutive HEAT (for Huntingtin, elongation factor 3, protein phosphatase 2A, and yeast PI3-kinase TOR1) repeats, each of which is composed of a pair (A and B) of antiparallel α-helices [7]. The HEAT repeats stack with a minor clockwise twist, forming an inner cargo-binding concave surface of B helices and an outer convex surface formed from the A helices (reviewed in [8]). Overall sequence similarity across the KAP-β family is low, at about 15-20%, with the N-terminal portion of the KAP-β protein, which binds the small GTPase Ran, being the most conserved region [9].
Most yeast and mammalian KAP-βs are functionally classified as importins [10] or exportins [11], depending on the direction of transport they have been shown to mediate (Figure 1). Importin KAP-βs bind the cargo NLS directly or via an adaptor, e.g. KAP-α [12]. At the NPC, the KAP-β•cargo complex interacts with phenylalanine-glycine repeat-containing nucleoporins (FG-NUPs) located at the NPC central channel [13]. Upon arrival in the nucleoplasm and association with RanGTP, the KAP-β•cargo complex dissociates and KAP-β•RanGTP returns to the cytoplasm, where GTP hydrolysis dissociates the KAP-β•Ran complex. By contrast, exportin KAP-βs bind RanGTP and NES-containing cargo and the complex translocates through the NPC to the cytosol. Ran levels in the nucleus are replenished by re-import of RanGDP in complex with the nuclear import factor Ntf2 [14]. Directionality is facilitated by the Ran GTP/GDP gradient across the NE (reviewed in [15] [16]). RanGEF is restricted to the nucleus and maintains a high nuclear RanGTP concentration, while RanGAP, localised to the cytoplasmic face of the NPC or in the cytosol, depending on the organism [17], maintains a low cytoplasmic RanGTP concentration.
Several models have been proposed to explain selective translocation through the NPC, including a high density of low affinity binding sites, partitioning based on hydrophobicity or gel-like states within the channel, reduction of dimensionality by KAP binding to the FG-NUPs and more formal gating systems [18] [19] [20][21] [22]. Recent work suggests that selectivity can arise from a balance between efficiency and speed of transport for each KAP-β•cargo complex [23]. While no consensus mechanism has emerged, FG-NUPs clearly have a major role as these disordered proteins selectively bind KAP-β complexes [24], concentrate them at the NPC and restrict passive diffusion [18]. KAP-βs themselves may also directly maintain selectivity by impeding passage of proteins that do not specifically bind FG-NUPs [25].
The KAP-β family transports an extremely broad range of molecules; tRNAs and rRNAs are carried from the nucleus to the cytoplasm while transcription factors, DNA-interacting and RNA-processing proteins are imported to the nucleus. Several pathways, such as biosynthesis of ribosomes, require components to engage in multiple crossings of the nuclear envelope [26]. While many KAP-β cargoes are known (see [27] for recent review), the full range of molecules transported by individual KAP-βs is undefined; hence KAP-βs currently classed as importins may, with additional analysis, be found to function in export. The absence of a rigorous discrimination between export or import pathways and substrate specificity may arise from a rather complex hierarchy of binding affinities. For example in S. cerevisiae only four KAP-βs are essential [16] and many can be deleted in combination, indicating redundancy [28]. Also, some proteins including histones [29] are imported by several different KAP-β family members, again arguing for redundancy. By contrast, Kap123p in S. cerevisiae is the sole KAP-β involved in import of ribosomal proteins. Confusingly, Kap123p knockouts are viable [30]; in Kap123p knockout cells ribosomal proteins are instead transported by Pse1p, indicating that cargo can switch from one KAP-β to another. Further, KAP-α is highly specific, associating exclusively with KAP-β1. Thus a complex relationship between specificity and flexibility of cargo recognition governs KAP-β/cargo interactions, confounding attempts to uncover evolutionary relationships based on simple genetic, functional or specificity criteria. Interestingly a similar situation of apparent redundancy, using viability in S. cerevisiae in rich media as the assay, is found for FG-NUPs. A considerable level of knockout is possible before loss of viability [31]. However, retention of a similar number of FG-NUPs and conserved features across eukaryotes argues that selective pressure has maintained the overall heterogeneity of FG-NUPs [32].
Structural analysis of the KAP-β member importin-β in complex with various cargo reveals that distinct molecules interact with different C-terminal sites [33][34] [35]. Thus KAP-βs likely possess multiple binding sites for recognition and transport of the wide range of cargo. Cargo-bound states also exhibit distinct conformations, illustrating the flexibility of the KAP-β structure, which may contribute to selection and binding of the repertoire of cargo molecules. This absence of a simple relationship between sequence, structure and binding specificity, coupled to the low level of sequence conservation between KAP-βs, makes determining the evolutionary origins and history of KAP-βs challenging.
An accurate KAP-β phylogeny will reveal evolutionary relationships between functionally similar members and uncover the events leading to functional diversification. Recent data suggest deep evolutionary connections between NPC and endomembrane transport components, while broad conservation of many protein families required by the endomembrane system and the NPC suggests that much eukaryotic compartmentalisation predates the LECA [3][36] [13][32] [37]. In terms of nucleocytoplasmic transport, a simple KAP-β repertoire in the LECA would imply that much complexity in extant eukaryotes is lineage-specific while a conserved KAP-β repertoire across eukaryotes would suggest that nucleocytoplasmic system complexity was established in the LECA.
Figure 1. Schematic illustrating the basic functions of karyopherin-βs (KAP-βs) in context. The nuclear envelope is punctuated by nuclear pores, within which sit the proteinaceous nuclear pore complexes. Transport is bidirectional via a central channel and is gated by an incompletely defined mechanism. KAP-βs participate in both import (blue panel) and export (pink panel), and are also known as importins and exportins respectively. However, many KAP-βs function in both modes and hence a clear designation between import and export is not apparent. Distinct cargo are imported and exported by formation of a complex in the origin compartment; this complex dissociates on reaching the destination compartment. The RanGTP/GDP gradient, which governs directionality of transport, is maintained by the localization of RanGEF to the nucleus and RanGAP to the cytosol. RanGDP is transported to the nucleus by its own import factor, Ntf2. doi:10.1371/journal.pone.0019308.g001
All eukaryotes are thought to descend from one ancestor which gave rise to the six supergroups [38], known as Opisthokonta, Amoebozoa, Archaeplastida, Excavata, Chromalveolata and Rhizaria. In a more recent classification, Chromalveolata and Rhizaria were proposed to be members of one supergroup 'SAR' (Stramenopiles+Alveolates+Rhizaria) [39]. Previous investigations of KAP evolution [40][41] [42][32] [43][5] were restricted to a limited range of taxa that was biased towards animals and yeasts, members of the Opisthokonta. Specifically, Mason and coworkers reconstructed evolution of the KAP-α family, determining the presence of an ancient KAP-α1/KAP-α1-like subfamily with evidence for lineage-specific expansion into KAP-α2 and KAP-α3 forms in the Opisthokonta and further expansions and secondary losses in Metazoa [5]. These authors suggested that a system utilizing KAP-α was likely the ancestral configuration, with KAP-α-independent pathways arising later. However, the analysis could not predict events prior to establishment of the Opisthokonta. In a broader study, Mans et al. [41] suggested that while there were ~13 KAP-β subfamilies, only six or seven of these were identified within the alveolates and trypanosomatids, suggesting that much KAP-β evolution was lineage specific. We considered that re-evaluation of the KAP-β repertoire using a broader range of genomes together with iterative searches would result in more extensive KAP-β sampling with improved understanding of their origins and subsequent evolutionary history. Our findings are consistent with much KAP-β complexity being established by the time of the LECA. Significant expansion, lineage-specific innovation and secondary losses are also in evidence.

Karyopherin-β is represented by at least fifteen subfamilies
To examine sequence relationships within the KAP-β family across the eukaryotes, we performed a semi-automated search of a panel of predicted proteomes. Species selection was designed to cover the full range of eukaryotic diversity possible with current genomic sampling, thus revealing lineage-specific patterns of gene conservation and identifying lineage-specific expansions and losses.
Three rounds of reciprocal BLAST [44] scans were performed using known KAP-β query sequences [9][15] [16]. All hits from the first BLAST scan with an e-value less than 10⁻¹⁰ were also collected. PSI-BLAST [45] scans were performed using Pfam [46] domains IBN_N and Xpo1, which are specific for several KAP-β family members. All returned sequences were then pooled and sequences showing no evidence of KAP-β family membership were removed from the dataset. Following these searches, 630 sequences meeting criteria for KAP-β membership (see methods) were retrieved and 622 of these sequences were subjected to the analysis presented in this section. For reasons of computational tractability, bootstrapped neighbour-joining (NJ) analysis was used to produce an initial subfamily classification system in which any sequences with similar BLAST results and located on adjacent branches of the NJ tree were counted as a cluster. All the subfamily assignments made in this analysis were, where possible, confirmed by formal phylogenetic methodology (see following sections). This preliminary clustering is shown in Figure 2 and, together with statistical support, in File S1. Fifteen subfamilies, each containing representatives from three or more eukaryotic supergroups, were identified. Each cluster was named using the UniProt ID of either an S. cerevisiae or H. sapiens KAP-β as follows: IMB1, IMB2, IMB3, IMB4, IMB5, XPO1, XPO2, XPO4, XPO5, XPO6, XPO7, XPOT, IPO8, KA120 and TNPO3. All subfamilies were represented by a single cluster except XPO5, which was represented by two clusters, possibly as a result of high sequence diversity. Additional NJ clustering with a sequence subset (composed of the four reference sequence sets, see methods) of each KAP-β subfamily plus all XPO5 candidates produced a single XPO5 cluster, indicating that XPO5 candidate sequences likely comprise a single subfamily (data not shown). Significantly, a well supported cluster of Embryophyte-specific (land plant) sequences was identified (File S1) and designated PLANTKAP.
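The two BLAST-based collection rules described above (reciprocal scans, plus first-scan hits below the 10⁻¹⁰ e-value threshold) can be illustrated with a minimal Python sketch. It assumes hits are already parsed into (query, subject, e-value) tuples; the function names and the data layout are hypothetical and are not the pipeline used in this study, whose exact criteria are given in the methods.

```python
from typing import Dict, List, Set, Tuple

# (query_id, subject_id, e_value); this tuple layout is an assumption.
Hit = Tuple[str, str, float]

E_VALUE_CUTOFF = 1e-10  # threshold quoted in the text


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to the subject with the lowest e-value."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, e_value in hits:
        if query not in best or e_value < best[query][1]:
            best[query] = (subject, e_value)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: List[Hit], reverse: List[Hit]) -> List[Tuple[str, str]]:
    """Pairs (query, candidate) that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


def significant_subjects(hits: List[Hit], cutoff: float = E_VALUE_CUTOFF) -> Set[str]:
    """Subjects from a single scan whose e-value is below the cutoff."""
    return {subject for _, subject, e_value in hits if e_value < cutoff}
```

Pooling the output of both functions gives a candidate set that would still need the manual removal of sequences lacking evidence of family membership, as described in the text.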
Following the identification of 15 subfamilies, each sequence in the dataset was assigned candidate membership to either a KAP-β subfamily or PLANTKAP, or as an orphan, i.e. unassignable to a subfamily. S. cerevisiae Pdr6 (Kap122) was the sole functionally validated KAP-β failing to map to a KAP-β subfamily. Further BLAST analysis revealed H. sapiens IPO13 and additional divergent TNPO3 subfamily members as Pdr6 closest relatives. NJ analysis with selected KAP-β subfamily representatives (see methods), all TNPO3 candidates plus Pdr6 and its orthologues, resulted in Pdr6 clustering with the majority of TNPO3 candidates (data not shown), and therefore Pdr6 was classified as a candidate for belonging to the TNPO3 subfamily.
For 30 of the 36 genomes searched, all KAP-β candidates were assignable. In the remaining genomes, the orphan KAP-β sequences were all detected using PSI-BLAST-based domain-specific scans. These sequences may correspond to recent taxon-specific innovations or represent highly diverged representatives of established KAP-β subfamilies. They were not studied further. It is possible that not all KAP-βs were captured by our search; Pdr6, for example, would not have been recovered had it not been an initial query. Therefore, although the search was exhaustive, we cannot exclude the possibility that additional KAP-β sequences remain unidentified, and KAP-β complexity may exceed that sampled here. However, the search did correctly identify all KAP-βs detected as NE-associated by proteomics in Trypanosoma brucei [32], suggesting that the dataset is comprehensive and likely to have missed only extremely divergent candidates.
In summary, we identified at least 15 KAP-β subfamilies containing one or more sequences from at least three eukaryotic supergroups. In the absence of a convincing root of the eukaryotic tree [47], this distribution is best interpreted as representing an ancient presence in eukaryotes. The most likely interpretation therefore is that these KAP-β subfamilies were established before the eukaryote radiation.
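The distribution criterion used above (a subfamily is treated as ancient when it has members in at least three supergroups) can be sketched in Python as follows. The function name and the (subfamily, supergroup) input layout are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

MIN_SUPERGROUPS = 3  # criterion stated in the text


def widely_distributed(
    assignments: Iterable[Tuple[str, str]],
    min_supergroups: int = MIN_SUPERGROUPS,
) -> Dict[str, Set[str]]:
    """Return subfamilies seen in at least `min_supergroups` supergroups.

    `assignments` holds one (subfamily, supergroup) pair per sequence.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for subfamily, supergroup in assignments:
        seen[subfamily].add(supergroup)
    return {sf: groups for sf, groups in seen.items() if len(groups) >= min_supergroups}
```

Under this rule a land-plant-restricted cluster such as PLANTKAP, which occurs in Archaeplastida only, is not counted as an ancient subfamily.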

Karyopherin-β family evolution prior to eukaryotic expansion
For phylogenetic analysis, a reduced set of KAP-β subfamily representatives was selected (Figure 3). We retained sequences from four supergroups where possible to ensure broad representation and hence validate the subfamily presence within the LECA, and included the following taxa: Opisthokonta: Homo sapiens, Nematostella vectensis; Archaeplastida: Arabidopsis thaliana, Physcomitrella patens; Chromalveolata: Phytophthora ramorum, Phytophthora sojae; Excavata: Leishmania major, Trypanosoma brucei. Representatives of the Amoebozoa were excluded as several sequences from this supergroup were found to be more diverged in an initial analysis (data not shown). Where statistical support was poor, the more divergent Excavata sequences were removed.
The presence of the N-terminal IBN_N Pfam domain (e-value threshold <0.1) in members of every subfamily argues for KAP-β being monophyletic (see File S1). We sought further support by testing whether each subfamily can detect all other subfamilies based on sequence homology, and scanned the human proteome with PSI-BLAST alignments for each subfamily, constructed from the taxa selected above. While not every subfamily detected all other subfamilies, scans with both the XPO1 and XPO5 alignments detected members of all 15 subfamilies as top hits (data not shown), supporting the hypothesis of a monophyletic origin for KAP-β.
An initial analysis containing representatives from all 15 KAP-β subfamilies (Figure 3, Figure S1(a)) identified two robust clades, supported by maximum likelihood (bs >70%) and Bayesian (pp >0.95) algorithms. XPO4 and XPO7 share a common ancestor (Figure 4 blue), as do the two clades of IMB1, IMB2, IMB3 and IMB4 and IMB5, KA120, IPO8 and XPO2. Further analysis (Figure S1, Figure S1(e)) established the phylogenetic relationships of four ancestral subfamilies comprising the three groups identified above (Figure 4) and XPO6. This phylogeny demonstrates that (i) XPO6 and the IMB1, IMB2, IMB3, IMB4, IMB5, KA120, IPO8, XPO2 group are descended from a common ancestor and (ii) the XPO4, XPO7 group and the TNPO3, XPO5, XPOT, XPO1 group are also descended from a common ancestor. These two groups in turn are predicted to be descended from an ancestral KAP-β. While this analysis has established a phylogeny of KAP-β subfamilies, it is not possible to determine the order in which the subfamilies diverged from their common ancestor due to the absence of a prokaryotic homologue with which to root the tree. Significantly, there is some correspondence between the phylogenetic groupings described above and published functional characteristics of KAP-β subfamily members. However, given that a complete characterisation of KAP-β function in any organism has yet to be reported, this remains tentative.
With the exception of exportin XPO2, the IMB1 clade contains KAP-β subfamilies characterised as exclusively involved in protein, and not RNA, nuclear import. Cargoes for this subfamily include mRNA binding proteins, ribosomal proteins, histones and signal recognition particle proteins. IMB1 imports cargo associated with importin-α and XPO2 exports importin-α after cargo has been released in the nucleus.
Figure 2. Six hundred and twenty two KAP-β candidate sequences, retrieved from 36 completed predicted proteomes, and representing five of six established eukaryotic supergroups, were clustered into a NJ tree with ClustalW. Taxa are coloured by species, listed on right, and by eukaryotic supergroup. All sequences highlighted by a black arc at the rim of the tree exhibit evidence for specific KAP-β subfamily membership and are located on a branch immediately adjacent to at least one other similar taxon on the tree. Unhighlighted sequences either have some evidence for sub-family membership but are not clustered, or are orphans. The subfamily name of each cluster is followed by additional names, based upon S. cerevisiae and H. sapiens gene names [16]. Tree drawn using PhyloWidget [89]. doi:10.1371/journal.pone.0019308.g002
The TNPO3, XPO5, XPOT, XPO1 clade members are functionally more diverse, based on present data. Cargoes include tRNAs, small noncoding RNAs, ribosomal subunits and proteins. XPOT and XPO1 function in export of tRNAs and proteins containing leucine-rich nuclear export signals respectively. TNPO3-like and XPO5 proteins participate in both import and export [48] [49]. As these KAP-β subfamilies are located on adjacent leaves in our phylogeny, we speculate that the ancestor of TNPO3 and XPO5 possessed a dual import/export role. The S. cerevisiae XPO5, Msn5p, mediates import and export of distinct cargoes [48], importing replication protein A and exporting a variety of phosphoproteins. Metazoan XPO5 representatives are responsible for export of eukaryotic elongation factor 1A (eEF1A), tRNAs [50], 60S ribosomal subunits [51] and short miRNA precursors [52] [53]. While the S. cerevisiae and H. sapiens orthologues bind dsRNA templates, functional divergence has been demonstrated by measurement of cargo binding affinities [54]. The human TNPO3-related protein IPO13 similarly imports and exports different cargoes; RBM8, Ubc9 and Pax6 are imported, and translation initiation factor eIF1A is exported [49] [55]. However, TNPO3 orthologues from S. cerevisiae and H. sapiens are only documented so far as being involved with import, carrying mRNA-binding splicing factor SR (serine/arginine-rich) proteins into the nucleus [56] [57].
The two members of the remaining clade, XPO4 and XPO7, are functionally distinct. XPO4 exports eIF-5A (eukaryotic translation initiation factor 5A) [58] and transcriptional modulator Smad3 [59] and also imports a different cargo, Sox transcription factors [60]. XPO7 exports proteins with broad substrate specificity using nuclear export signals that, unlike leucine-rich XPO1 signals, include folded motifs [61]. The remaining KAP-β subfamily, XPO6, exports profilin-actin complexes [62]. Given that our understanding of KAP-β function is incomplete, any conclusions based on correspondences between functional and phylogenetic groupings remain speculative.
To gauge levels of sequence divergence within each subfamily, percent identity values were calculated for subfamily-specific alignments of the sequences used in the phylogenetic analysis above (Figure 4). XPO5 appears to be the least constrained, which correlates with the observed functional divergence. XPO1, with a percent identity value of 50, appears to be the most evolutionarily constrained.
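The exact percent identity formula is not stated here, so the following Python sketch shows one common convention as an assumption: pairwise identity is the fraction of identical residues over columns where neither sequence has a gap, and the subfamily value is the mean over all sequence pairs. Function names are hypothetical.

```python
from itertools import combinations
from typing import List

GAP = "-"


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def mean_percent_identity(alignment: List[str]) -> float:
    """Mean pairwise percent identity over all sequence pairs in an alignment."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("alignment needs at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

Other conventions (for example, dividing by the full alignment length) give lower values, so numbers are only comparable between subfamilies when the same definition is used throughout.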
As we consider convergent evolution unlikely, we propose that the entire KAP-β family descended from an ancestral form. As the phylogeny is unrooted, the position of this ancestral KAP-β remains unknown, and therefore the order of events involved in elaboration of this gene family is unclear. The difference in PID values for each subfamily indicates that selective pressures are variable across the family and that the position of a common ancestor cannot be inferred from branch length. The common ancestor most likely functioned in both import and export, as well as transporting a broad range of cargo. As the XPO1-containing clade (Figure 4 green) both imports and exports a broad range of cargo, we suggest that the root of the tree may lie within this clade. XPO1 and XPO5 robustly detect all other subfamilies by PSI-BLAST, which suggests that these two subfamilies are the most canonical, and other subfamilies may have diverged more from the ancestral KAP-β. In an alternative model [41] [5], it was argued that KAP-α-mediated transport was the ancestral mode, and that KAP-α-independent pathways are a later simplification. This model implies that the root lies between IMB1 and remaining members of the KAP-β family. While we cannot exclude it, we do not favor this model as it suggests that IMB1 has undergone no expansion whatsoever, while the remaining KAP-β family exhibits huge diversification. However, resolution between the two models is not possible from the data presently available. Regardless of which model is correct, clearly diverse KAP-β pathways were present in the LECA.
With a larger selection of genomes, this analysis confirms, clarifies and expands upon the evolutionary relationships for KAP-βs described previously [41] [32]. The new phylogeny (Figure 4) suggests a likely evolutionary path for the development of the KAP-β transport receptor in the transitional period between the first and last eukaryotic common ancestors.

Karyopherin-β representation across the eukaryotic supergroups
Our initial search produced over 600 KAP-β family members of which, for computational reasons, only a subset were included in the pan-eukaryotic phylogenetic reconstruction (Figure 4). We confirmed subfamily membership for the remaining KAP-βs by additional analysis using Bayesian methods (see methods, Table S1, Figure 5). Sequences shorter than 50% of the length of validated KAP-β proteins were excluded. The analysis confirmed representation of all KAP-β subfamilies in all supergroups, with the established exception of XPO6 in Archaeplastida. While some species possess one or more phylogenetically verified members of each KAP-β subfamily, others have divergent representatives, and for some species entire subfamilies are absent (Figure 6). A striking feature is the frequency of secondary loss in individual species, suggesting that many organisms sculpt nuclear transport by elimination of KAP-β subfamilies. While we cannot exclude failure to detect highly divergent KAP-β sequences as an explanation, we consider our searches sufficiently exhaustive to preclude this as a general explanation and that most losses are genuine. With the continuing and increasing availability of completed genomes, this analysis may be improved by including more species. In particular, we note the completion of the genomes of the Chromalveolata Stramenopiles Ectocarpus siliculosus [63] and Aureococcus anophagefferens [64], which were not available when this study began.
Figure 4. Schematic illustrating inferred ancestral relationships between the KAP-β subfamilies, percent identity (%id) values, known roles as import or export factor (I/E) within each subfamily and description of cargo types. This unrooted topology was inferred from a series of phylogenetic reconstructions available in Figure S1. Colored panels highlight three clades of related subfamilies whose phylogenies were initially determined; a subfamily representative of each of these clades, and of XPO6, were then used to infer a family-wide phylogeny. doi:10.1371/journal.pone.0019308.g004
Amongst supergroup-restricted losses, the most prominent is absence of XPO6 from Archaeplastida; as seven Archaeplastida species were included, this is unlikely to be a sampling or data issue. XPO6 exports actin [62], but XPO1 can also perform this function [65], and assumes this role in plants. Significantly, XPO6 is also lost from many other lineages, suggesting that its function is dispensable in certain contexts.
Within supergroups, some taxon groupings exhibit notable KAP-β divergence. In Opisthokonta, several fungi have lost XPO4 and XPO7. Multicellular organisms have, in general, maintained the full complement of KAP-β subfamilies, while unicellular organisms are more likely to have undergone loss or great sequence divergence. A clear exception is the minimized KAP-β system found in nematodes, as Caenorhabditis elegans appears to have ~50% of the KAP-βs from the IMB1 clade. This result was confirmed for C. briggsae (data not shown), and indicates that a full KAP-β complement is not necessary for multicellularity.
In Archaeplastida, higher plants have undergone several subfamily expansions, with no evidence of secondary loss apart from that of XPO6. By contrast, amongst unicellular Archaeplastida, secondary losses are common, with nine of fifteen subfamilies lost from the hot-spring red alga Cyanidioschyzon merolae. This organism has a very small gene complement [66] and is an extremophile, so the result is not unexpected.
In Chromalveolata, the Apicomplexa (Cryptosporidium parvum, Toxoplasma gondii, Theileria parva, Plasmodium falciparum) have undergone similar patterns of secondary loss (XPO6, IMB5, XPO5, XPO4), suggesting that these were lost in their common ancestor. Multiple losses in endomembrane transport are reported for Apicomplexans, suggesting a significant degree of divergence in transport pathways in general in these taxa [67][68] [69].
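The inference above, that a subfamily missing from every sampled member of a clade was most simply lost in their common ancestor, amounts to a set operation on a presence table. The sketch below assumes presence is stored as a mapping from species label to the set of subfamilies detected; the function name and data layout are hypothetical, and the species labels in any usage are placeholders rather than data from this study.

```python
from typing import Dict, Iterable, Set

# The fifteen subfamily names listed in the text.
SUBFAMILIES = frozenset({
    "IMB1", "IMB2", "IMB3", "IMB4", "IMB5", "XPO1", "XPO2", "XPO4",
    "XPO5", "XPO6", "XPO7", "XPOT", "IPO8", "KA120", "TNPO3",
})


def shared_losses(presence: Dict[str, Set[str]], clade: Iterable[str]) -> Set[str]:
    """Subfamilies absent from every species in `clade`.

    A subfamily missing from all sampled members is the simplest candidate
    for a single loss in their common ancestor.
    """
    species = list(clade)
    if not species:
        raise ValueError("clade must contain at least one species")
    found_somewhere: Set[str] = set()
    for name in species:
        found_somewhere |= presence[name]
    return set(SUBFAMILIES) - found_somewhere
```

Subfamilies missing from only some members of the clade are excluded by construction, since those are better explained by independent, species-level losses or by failure to detect divergent sequences.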
Within Excavata, only XPO6 and TNPO3 are lost from the kinetoplastids, consistent with retention of other trafficking systems by this supergroup [67] [68]. Significantly, Trichomonas vaginalis has expanded multiple KAP-β families, a feature of interest as specific expansions of multiple gene families involved in intracellular trafficking, including Rabs [70] [71] and adaptins [72][71], have also been described. This suggests that KAP-β may be a component of the expanded gene cohort in this organism. It is unclear why such expansions occurred [70].
Figure 5. Subfamily distribution of karyopherin-βs across the Eukaryota. Black circles indicate presence of a phylogenetically supported (see methods) KAP-β subfamily member. Grey circles indicate candidate subfamily members that could not be verified phylogenetically. Empty circles indicate no candidate found. Numbered circles indicate cases where more than one candidate is found. A small circle indicates candidate(s) in addition to phylogenetically supported candidate(s) indicated by big circles. The left panel illustrates the phylogenetic relationships between subfamilies. See Table S1 for additional information including protein identifiers. doi:10.1371/journal.pone.0019308.g005

Lineage-specific expansions
Many examples of species-specific duplications or expansions were found, the most dramatic being fifteen XPO1 subfamily members in T. vaginalis. In addition, several duplications (Figure S2 and Figure 6) are predicted within individual supergroups as follows:
(i) A common ancestor to H. sapiens and D. rerio duplicated IPO8 (Figure S2(a)).
(ii) A common ancestor to land plants duplicated IMB1 before full diversification (Figure S2(b)). While both IMB1 subclades contain higher plants, one contains a duplicated P. patens (moss) sequence while the second branch lacks a moss representative, presumably from secondary loss.
(iii) There are two versions of IMB3 in P. sojae and P. ramorum (Figure S2(c)), suggesting a duplication in a Chromalveolata common ancestor. However it was not possible to produce a robust topology phylogenetically and so any conclusions are tentative.
(iv) One or more common ancestors to the Kinetoplastida duplicated each of XPO7, IPO8 and also may have duplicated IMB2 (Figure S2(d, e, f)). For XPO7, both paralogues are divergent while for IMB2, just one paralogue is diverged. The T. brucei paralogue of the more diverged IMB2 clade (Tb10.6k15.3020) was identified as a component of the NPC proteome [32], providing direct evidence that this KAP-β is functional. While this paralogue may have arisen by duplication in a common ancestor, this is not confirmed by phylogenetic analysis and so any conclusions are tentative.
(v) The novel clade, PLANTKAP, is restricted to land plants. Both bootstrapped NJ and Bayesian algorithms placed PLANTKAP close to IPO8 (data not shown and Figure S2(g)), suggesting these have diverged from the IPO8 subfamily. Populus trichocarpa contains only a truncated PLANTKAP (accession Poptr1_1_724002), likely a sequencing artifact. Several land plants have two IPO8 KAP-β paralogues, but these are derived from recent species-specific duplications (data not shown).
(vi) Except for Excavata, all supergroups possess a duplicated TNPO3. The simplest explanation is an ancestral duplication and Excavata secondary loss. When all TNPO3 candidate sequences were clustered by NJ, a cluster of robust candidates identified by phylogenetic analysis (data not shown) was formed, with more diverged candidates being excluded (Figure S2(h)). This divergent TNPO3 group includes S. cerevisiae Pdr6 and H. sapiens IPO13. While IPO13 is involved in both import and export [49], Pdr6 is only noted as involved with import [73]. The more diverged Archaeplastida TNPO3 group only contains land plant representatives and remains sufficiently closely related to TNPO3 that these sequences validate as TNPO3 subfamily members by phylogeny. Therefore, if TNPO3 comprises two subfamilies, this lesser degree of divergence in plants and higher degree of divergence in Opisthokonta potentially reflects differing selective forces between the eukaryotic lineages.
(vii) A single example of KAP-β innovation by gene fusion was identified. P. sojae and P. ramorum contain an XPOT::ABC-type transporter chimera, which likely arose in their common ancestor. It remains unknown if these proteins actively function as KAP-βs or as proteins with novel function.
Overall, there is clear evidence for specific, but limited, secondary losses and lineage-specific expansions within the evolutionary history of the KAP-β family. There is, however, little evidence for major evolutionary innovation within the KAP-β family post-dating the LECA.

Conclusions
Employing a combination of domain searching and iterative BLAST analysis, we identified over six hundred KAP-β genes from a broad range of eukaryotes. Due to a shared IBN_N pfam domain and the fact that all subfamilies are returned as top hits in XPO1 and XPO5 PSI-BLAST searches, we conclude that the KAP-β family most likely arose by divergent evolution, i.e. from a single ancestral KAP-β. Cluster analysis identified fifteen KAP-β subfamilies that, except XPO6 in Archaeplastida, are represented in all eukaryotic supergroups, and hence were likely present in the LECA. This also suggests that KAP-β transport mechanisms have been conserved since the LECA, consistent with conserved NPC composition and additional aspects of the nuclear envelope [32] [4]. Further, a derived evolutionary history successfully places the vast majority of KAP-β subfamilies into three major clades, for which there is some functional support.
The IMB1 clade is responsible for protein, and not RNA, transport. Seven of the eight subfamilies in the clade are importins, together with KAP-βs responsible for KAP-α import and export (IMB1 and XPO2). The XPO1 clade is involved in both import and export of both RNA and proteins. While these phylogenetic groupings may reflect deep functional relationships, this remains to be confirmed by further experimental work. Several lineage-specific events were identified, most notably PLANTKAP, a cluster of plant-specific KAP-βs likely derived from IPO8, and a TNPO3 subfamily expansion which may indicate an additional KAP-β subfamily. We also found several examples of secondary loss, many of which clearly occurred early in evolution of specific supergroups while some are more recent. However, most significantly, there is little evidence for large paralogous expansions within either individual taxa or supergroups, suggesting that the overall configuration of the KAP-β family has been retained during post-LECA evolution.
For the BLAST scans, functionally validated KAP-β sequences [9] [15] [16] and their S. cerevisiae or H. sapiens orthologues (identified as reciprocal best BLAST hits) were used as queries in BLASTp scans of the predicted proteome panel. All reciprocal best BLAST hits (i.e. the hit, when used as query, returned the original query as top hit or with identical e-value as the top hit) were collected and used as query sequences in a second BLASTp scan of the panel. All reciprocal best BLAST hits from the second scan were collected and used as query sequences in a third BLASTp scan. All returned reciprocal best BLAST hits, together with all hits in the initial BLAST scan with e-value <10⁻¹⁰, were collected. An additional eight hits from the initial BLAST scan with an e-value of less than 0.01 were selected on the basis of (i) being greater than 500 amino acids in length and (ii) fold recognition by FUGUE, where sequences were counted as candidates if a KAP-β family member was returned as top hit with ZSCORE greater than 6.
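The reciprocal best hit criterion described above can be sketched in Python as follows. This is an illustration only and not the authors' Perl scripts: the function names and the input layout (a dictionary mapping each query to a list of (subject, e-value) pairs for one target proteome) are assumptions made for the example.

```python
def top_hits(hits):
    """Return the set of subjects that share the lowest e-value in a hit list."""
    if not hits:
        return set()
    best = min(evalue for _, evalue in hits)
    return {subject for subject, evalue in hits if evalue == best}


def reciprocal_best_hits(forward, reverse):
    """Collect (query, hit) pairs that are reciprocal best BLAST hits.

    forward: query -> list of (subject, evalue) from the scan of a proteome
    reverse: subject -> list of (subject, evalue) from the back-search
    A hit qualifies if the back-search returns the original query as the top
    hit, or with an e-value identical to that of the top hit.
    """
    pairs = []
    for query, hits in forward.items():
        for subject in sorted(top_hits(hits)):
            if query in top_hits(reverse.get(subject, [])):
                pairs.append((query, subject))
    return pairs
```

In an iterative scheme such as the one described, the hits returned by one round would become the query set for the next round.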
For the PSI-BLAST scans, Pfam [46] domains IBN_N (PF03810, Importin-β N-terminal) and Xpo1 (PF08389, Exportin 1-like protein), which are specific to several KAP-β family members, were used as query sequences in blastpgp (PSI-BLAST [45]) searches. This was carried out as follows: multiple sequence alignments (MSAs) for each domain were retrieved from the pfam website [74] and realigned with Muscle [75]. Sequences with greater than 90% identity were removed using Jalview 2.06 [76]. For each member sequence of a pfam-domain query (reduced-redundancy alignment), PSI-BLAST was used to scan the predicted proteome panel. The input of each PSI-BLAST scan included the pfam domain alignment (i.e. 'jump-start' from MSA mode) and the maximum number of rounds was set to three. For sequences retrieved with e-value of 0.0001 or less in any of the PSI-BLAST scans, the sequence segment giving the lowest e-value was identified. These segments were scored as valid matches if they, or the whole predicted coding sequence from which they were derived, satisfied any of the following criteria: (i) whole sequence annotated in UniProt [77] as containing the query domain according to pfam or InterPro [78], (ii) sequence segment detected a protein containing the domain, according to pfam or InterPro, as the highest scoring retrieved sequence when used as BLAST query against S. cerevisiae, or (iii) whole sequence gives an e-value <1.0 for the domain when used as hmmpfam [79] query against the pfam version 18.0 HMM database. The majority (>90%) of retrieved sequences for both IBN_N and Xpo1 domains passed the validation test. Having verified that the PSI-BLAST-based domain detection strategy did not detect a significant number of off-target sequences, the full sequences from all hits, regardless of validation status, were collected. Redundant sequences were identified by all-vs-all BLAST and discarded.
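The segment selection step (keeping, for each retrieved sequence, the segment with the lowest e-value among hits at 0.0001 or below) could be expressed along the following lines. The tuple layout of a hit and the function name are hypothetical; parsing of PSI-BLAST output is not shown.

```python
def best_segments(hits, max_evalue=1e-4):
    """For each sequence, keep the segment with the lowest e-value.

    hits: iterable of (sequence_id, start, end, evalue) pooled over all scans.
    Only hits with e-value <= max_evalue are considered.
    Returns a dict: sequence_id -> (start, end, evalue).
    """
    best = {}
    for seq_id, start, end, evalue in hits:
        if evalue > max_evalue:
            continue  # too weak to count as retrieved
        if seq_id not in best or evalue < best[seq_id][2]:
            best[seq_id] = (start, end, evalue)
    return best
```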
Criteria for removal of off-target sequences from the pooled dataset were set after a preliminary analysis by visual inspection of a ClustalW [80] neighbour-joining (NJ) tree of all sequences in which the treefile labels had been annotated. Annotations included BLAST and PSI-BLAST results, predicted protein size, predicted charge (using pepstats of the EMBOSS package [81]), predicted domain content and, in cases with no evidence of IBN_N or Xpo1 domains by hmmpfam prediction with no threshold, detection of KAP-β homology by fold recognition. The eight low-scoring BLAST hits were removed from the NJ analysis as they introduced distortions in the NJ tree.
Sequences meeting either of the following criteria were removed: (i) unexpected hmmpfam-predicted domain content and no evidence of hmmpfam-predicted IBN_N or Xpo1 domains (e-value <0.1), or (ii) no evidence of IBN_N or Xpo1 domains by hmmpfam prediction with no threshold and sequence matched no KAP-β family members by fold recognition with FUGUE's fugueseq [82] (ZSCORE cutoff of 0).
For reciprocal BLAST round three matches, sequences matching only the first condition of (ii) were also removed. In one exception to (ii), a sequence without FUGUE or hmmpfam evidence for family membership (XPO5 candidate Poptr1_1_241008) was retained as it clustered with other XPO5 candidates in the ClustalW NJ tree that was part of the preliminary analysis. Three HEAT domain-containing sequences that did not match either of the criteria were also discarded: Nemve1_128552, as it was detected only by a query sequence that was itself rejected, and sp_Q7Z460-1_CLAP1_HUMAN and tr_A8WHM7_A8WHM7_DANRE, which are annotated as belonging to a different gene family. An additional two sequences (tr_Q54TU2_Q54TU2_DICDI and ent_h_54.m00221) were also removed; while both matched KAP-β family members by fold recognition with FUGUE's fugueseq, the KAP-βs were not the top match and the sequences are probably members of a different gene family. Also, both were much longer than expected for KAP-βs.
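The two general removal criteria can be written as a single predicate, sketched below. The record fields and names are hypothetical, the hand-curated exceptions described above are deliberately left out, and treating "ZSCORE cutoff of 0" as "ZSCORE greater than 0" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    unexpected_domains: bool            # hmmpfam predicts domains not expected in a KAP-beta
    kap_domain_evalue: Optional[float]  # best hmmpfam e-value for IBN_N or Xpo1; None if undetected
    fugue_zscore: Optional[float]       # ZSCORE of best FUGUE match to a KAP-beta; None if no match


def should_remove(c):
    """Apply removal criteria (i) and (ii); manual exceptions are not modelled."""
    domain_below_threshold = c.kap_domain_evalue is not None and c.kap_domain_evalue < 0.1
    domain_any = c.kap_domain_evalue is not None       # hmmpfam with no threshold
    fugue_match = c.fugue_zscore is not None and c.fugue_zscore > 0  # assumed cutoff reading
    rule_i = c.unexpected_domains and not domain_below_threshold
    rule_ii = not domain_any and not fugue_match
    return rule_i or rule_ii
```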

NJ cluster analysis of karyopherin-β candidate sequences
The remaining 622 KAP-β candidate sequences were clustered in a ClustalW [80] NJ tree. The eight low-scoring BLAST hits, which were included in further analyses, were excluded from this NJ analysis as they were only weakly detected by BLAST and introduced distortions in the NJ clustering tree. A small number of sequences were trimmed before this step as they were substantially larger than the other candidates, containing additional sequence with no KAP-β homology; this is assumed to be the result of sequence assembly/annotation issues in the original databases. The treefile labels were then annotated with the same data used for the preliminary analysis. Where possible, each taxon was assigned, by hand, to a subfamily on the basis of subfamily membership of the BLAST query which first detected the sequence during the initial BLAST scan and three rounds of reciprocal BLAST. Any taxon located on a branch immediately adjacent to at least one other taxon with the same subfamily assignment was classed as being a member of that cluster. Taxa with no indication of KAP-β subfamily membership were assigned as 'ORPHAN'.
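The cluster membership rule was applied by hand in this study; the Python sketch below shows one possible automated reading of it. It assumes the tree is given as nested tuples of leaf names and interprets "immediately adjacent" as "any sibling subtree at the parent node contains a taxon with the same assignment". Both the representation and that interpretation are assumptions, as are the function names.

```python
def leaves(node):
    """Yield leaf names of a tree given as nested tuples of strings."""
    if isinstance(node, str):
        yield node
    else:
        for child in node:
            yield from leaves(child)


def cluster_labels(tree, assignment):
    """Label each leaf NAME_1 (clustered), NAME_0 (not clustered) or ORPHAN.

    assignment: leaf name -> subfamily name; leaves without an entry are orphans.
    """
    labels = {}

    def visit(node):
        if isinstance(node, str):
            return
        for i, child in enumerate(node):
            if not isinstance(child, str):
                visit(child)
                continue
            subfamily = assignment.get(child)
            if subfamily is None:
                labels[child] = "ORPHAN"
                continue
            # Subfamilies present in the sibling subtrees at this node.
            siblings = [s for j, s in enumerate(node) if j != i]
            neighbours = {assignment.get(leaf) for s in siblings for leaf in leaves(s)}
            labels[child] = f"{subfamily}_{1 if subfamily in neighbours else 0}"

    visit(tree)
    return labels
```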

Phylogenetic analysis of the karyopherin-β family
Multiple sequence alignments were generated using Muscle and edited in Jalview 2.4 as follows: alignments were coloured by conservation with no threshold (this is the default 'colour by annotation' setting) and then uncoloured columns and less-conserved columns (conservation value of 1) that were at the junctions of well-conserved blocks of sequence were removed. Phylogenetic analysis was performed using MrBayes [83] and PhyML [84]. All calculations were performed on CamGrid [85]. MrBayes run parameters were prset aamodelpr = mixed; lset rates = gamma Ngammacat = 4; mcmc ngen = 1000000; samplefreq = 1000; nchains = 4; startingtree = random; sumt burnin = 100. PhyML parameters were nb bootstrapped data sets = 100; substitution model = WAG; proportion invariable sites = 0.0; nb categories = 4. For each PhyML analysis, ProtTest [86] was used to determine the appropriate substitution model and gamma parameter.

Phylogenetic verification of subfamily membership
Datasets for verification of subfamily membership using phylogenetic analysis were generated by adding the sequences of interest to the appropriate one of four reference sequence sets, each composed of sequences from selected species and from one of four KAP-β subfamily groupings only. The reference sequences were previously used for establishment of KAP-β subfamily phylogeny and were from the following supergroups: Opisthokonta (H. sapiens, N. vectensis), Excavata (L. major, T. brucei), Chromalveolata (P. ramorum, P. sojae) and Archaeplastida (A. thaliana, P. patens). The four reference sequence files contained the following KAP-β subfamilies: IMB1/IMB2/IMB3/IMB4, IMB5/KA120/IPO8/XPO2, TNPO3/XPO5/XPOT/XPO1 and XPO4/XPO7/XPO6. Trees were generated using MrBayes with the same parameters as described above. Sequences were counted as phylogenetically verified subfamily members if they were located in the expected branch of the tree with statistical support (posterior probability >95%), regardless of branch length or position on the branch.

Calculation of percent identity for each KAP-β subfamily
A multiple sequence alignment for each KAP-β subfamily was generated in Muscle using only selected sequences from the species used for the reference dataset. Each MSA was used to generate two subalignments; the first contained sequence from H. sapiens, P. sojae and A. thaliana and the second contained sequence from N. vectensis, P. ramorum and P. patens. In some cases the alignment contained only two sequences as the third sequence was absent or was considerably diverged from canonical KAP-β sequence. Percent identity for each subalignment was calculated by alistat from the HMMer package [79]. The results for the two subalignments for each KAP-β subfamily were averaged after confirming that the results for each subalignment were similar (within 2% of each other).
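The averaging procedure can be illustrated with a short Python sketch. It does not reproduce alistat: the identity definition used here (identical residues divided by the length of the shorter ungapped sequence), the reading of "within 2%" as two percentage points, and all function names are assumptions for the purpose of the example.

```python
from itertools import combinations


def pairwise_identity(a, b, gaps="-."):
    """Percent identity of two aligned sequences (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x not in gaps and x == y)
    len_a = sum(1 for x in a if x not in gaps)
    len_b = sum(1 for y in b if y not in gaps)
    shorter = min(len_a, len_b)
    return 100.0 * matches / shorter if shorter else 0.0


def mean_identity(alignment):
    """Average pairwise percent identity over all pairs (needs >= 2 sequences)."""
    pairs = list(combinations(alignment, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def subfamily_identity(sub1, sub2, tolerance=2.0):
    """Average two subalignment values after checking that they agree."""
    id1, id2 = mean_identity(sub1), mean_identity(sub2)
    if abs(id1 - id2) > tolerance:
        raise ValueError(f"subalignments disagree: {id1:.1f} vs {id2:.1f}")
    return (id1 + id2) / 2
```

Subalignments with only two sequences, as mentioned above, reduce to a single pairwise comparison.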

Automation
Sequence retrieval, BLAST scans, identification, validation and annotation of hits and all associated parsing of text files were carried out in batch mode with scripts written in Perl5 [87], using modules from the BioPerl library [88]; the scripts are available from the authors on request.

Table S1 Details of data sources and data from analysis of karyopherin-β families. (a) Sources for predicted proteome data. (b) Table listing IDs for candidates for each species and KAP-β subfamily. This dataset was used to generate Figure 4. Includes annotation indicating if the candidate clustered in the NJ tree of Figure 2, if the candidate was a reciprocal BLAST best hit for a known KAP-β and if the candidate was assignable to a subfamily phylogenetically. (XLS)
File S1 Treefile used to generate Figure 2: ClustalW neighbour-joining tree of 622 known & candidate karyopherin-betas. Suggested tree viewing software: http://www.phylosoft.org/archaeopteryx/. Each taxon is annotated as follows: gene_name or UniProt ID; *length; *pfam domain predictions (e-value <0.1); *BLAST results (_q indicates that the sequence was used as a query in the first BLAST round, _b1, _b2, _b3 indicate that the sequence was picked up in best reciprocal BLAST hit round 1, 2 or 3, _b0 indicates that the sequence was picked up in the first round of BLASTs with an e-value <10e-10); *hmmpf (if IBN_N or Xpo1 was detected in the sequence by hmmpfam with no e-value threshold); *PB (if the sequence was detected in either of the PSI-BLAST IBN_N or Xpo1 domain scans); ch 00.00 charge calculated by pepstats; FUGUE 00.00 (if the sequence matched a karyopherin-beta family member according to fugueseq, and ZSCORE of top match). Note: FUGUE results for selected sequences only; *subfamily assignment. Subfamily assignments: __NAME_1 for candidate NAME subfamily candidates clustered with other candidate NAME subfamily candidates; __NAME_0 for candidate NAME subfamily candidates not clustered with other candidate NAME subfamily candidates; __ORPHAN for karyopherin-beta candidates with no subfamily assignment. (TXT)