Structure and Molecular Evolution of CDGSH Iron-Sulfur Domains

The recently discovered CDGSH iron-sulfur domains (CISDs) are classified into seven major types with a wide distribution throughout the three domains of life. The type 1 protein mitoNEET has been shown to fold into a dimer with the signature CDGSH motif binding to a [2Fe-2S] cluster. However, the structures of all other types of CISDs were unknown. Here we report the crystal structures of type 3, 4, and 6 CISDs determined at 1.5 Å, 1.8 Å and 1.15 Å resolution, respectively. The type 3 and 4 CISD each contain one CDGSH motif and adopt a dimeric structure. Although similar to each other, the two structures have permutated topologies, and both are distinct from the type 1 structure. The type 6 CISD contains tandem CDGSH motifs and adopts a monomeric structure with an internal pseudo dyad symmetry. All currently known CISD structures share dual iron-sulfur binding modules and a β-sandwich for either intermolecular or intramolecular dimerization. The iron-sulfur binding module, the β-strand N-terminal to the module and a proline motif are conserved among different type structures, but the dimerization module and the interface and orientation between the two iron-sulfur binding modules are divergent. Sequence analysis further shows resemblance between CISD types 4 and 7 and between 1 and 2. Our findings suggest that all CISDs share common ancestry and diverged into three primary folds with a characteristic phylogenetic distribution: a eukaryote-specific fold adopted by types 1 and 2 proteins, a prokaryote-specific fold adopted by types 3, 4 and 7 proteins, and a tandem-motif fold adopted by types 5 and 6 proteins. Our comprehensive structural, sequential and phylogenetic analysis provides significant insight into the assembly principles and evolutionary relationship of CISDs.


Introduction
Proteins with bound iron-sulfur clusters form an ancient and essential part of the proteome in every organism. Iron-sulfur proteins primarily transfer electrons in various biochemical processes, and they also take roles in substrate binding, iron-sulfur storage, gene regulation and enzyme activity [1][2][3]. The recently discovered CDGSH iron-sulfur domains (CISDs) are characterized by one or two highly conserved 17-residue CDGSH motifs with a consensus sequence of [WCXCXX(S/T)XXXPWCDG(S/T/A)H], where W and X are a hydrophobic or any residue, respectively [4][5][6][7][8].
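The consensus above can be written as a pattern for scanning protein sequences. The following is a minimal sketch rather than the search procedure used in this work: the set of residues treated as hydrophobic (`HYDROPHOBIC`) is an assumption, since the consensus does not enumerate them, and the test sequence is synthetic.

```python
import re

# Assumption: residues counted as "hydrophobic"; the consensus itself
# does not list them, so this set is illustrative only.
HYDROPHOBIC = "AVILMFWY"

# 17-residue consensus, position by position:
# hydrophobic-C-X-C-X-X-(S/T)-X-X-X-P-hydrophobic-C-D-G-(S/T/A)-H
# A lookahead is used so that overlapping hits would also be reported.
CDGSH_MOTIF = re.compile(
    rf"(?=([{HYDROPHOBIC}]C.C..[ST]...P[{HYDROPHOBIC}]CDG[STA]H))"
)


def find_cdgsh_motifs(sequence):
    """Return (1-based start position, matched 17-mer) for each motif hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CDGSH_MOTIF.finditer(seq)]


# Synthetic sequence built to satisfy the consensus; not a real protein.
toy = "MKT" + "WCACAASAAAPFCDGSH" + "GG"
print(find_cdgsh_motifs(toy))
```

With this numbering, the three cysteine ligands and the histidine ligand fall at motif positions 2, 4, 13 and 17.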
Humans have three CISD proteins, mitoNEET (aka CISD1), Miner1 (CISD2) and Miner2 (CISD3), whose mitochondria-related functions remain ill-defined. The first characterized CISD protein, mitoNEET, was initially identified in the mitochondria as a cross-linking target for the anti-diabetic drug pioglitazone [9]. The mitochondria isolated from mitoNEET knockout mice showed a reduced oxidative capacity [4]. MitoNEET consists of an N-terminal transmembrane helix and a C-terminal CISD and is located in the outer membrane of mitochondria with its CISD oriented toward the cytoplasm [4]. The crystal structure of mitoNEET CISD shows an intertwined homodimer with each subunit binding a [2Fe-2S] cluster [6][7][8]. The [2Fe-2S] cluster is coordinated by three cysteine residues and one histidine residue in the CDGSH motif. Such a 3-Cys and 1-His ligated [2Fe-2S] cluster has not been observed for other [2Fe-2S] proteins and has raised much interest in its unique physicochemical properties [10][11][12][13][14].
Miner1 is closely related to mitoNEET, sharing 44% overall sequence identity and a highly similar structure [15]. A mutation in the Miner1 gene is the causative agent for Wolfram Syndrome 2, a disease characterized by juvenile onset diabetes mellitus, optic atrophy, deafness and increased bleeding tendency [16]. Miner1 knockout mice showed growth retardation, a reduced life span and premature aging, and these phenotypes were associated with mitochondrial degeneration and respiration dysfunction [17]. Miner1 was initially reported to be localized in the endoplasmic reticulum [4,16], but a recent study demonstrated that it primarily localizes in the outer mitochondrial membrane [17]. The third human CISD protein, Miner2, contains tandem CDGSH motifs, unlike mitoNEET and Miner1, which contain only a single CDGSH motif. The function of Miner2, which is also a mitochondrial protein [4], has not yet been described.
The iron-sulfur clusters in mitoNEET and Miner1 are redox active and may be involved in electron transfer [5,6,15]. Because of the histidine necessary for ligand binding, lowering the pH destabilizes the iron-sulfur cluster and reduces the redox potential [5,15]. The stability of the iron-sulfur cluster of mitoNEET was shown to be increased by pioglitazone [7] and reduced by NADPH [18].
CISD proteins are present in the majority of eukaryotes except fungi and have a scattered distribution in prokaryotes. We have classified CISDs into seven major types based on overall sequence homology, domain composition and phylogenetic profile (Fig. 1, Table 1 and Files S1, S2, S3, S4, S5, and S6) [6]. Types 1, 2, 3 and 4 CISDs contain a single CDGSH motif and are present in eukaryotes, apicomplexa, archaea and bacteria, respectively. Type 6 CISDs possess tandem motifs separated by about 20 amino acid residues, and type 5 CISDs are a fusion of type 6 CISDs with a C-terminal glutamate synthase FMN-binding domain. One or two copies of type 7 CISDs are conjugated to a DUF1271 domain in a variable arrangement.
Although the highly conserved CDGSH motif will most likely bind a [2Fe-2S] cluster in all CISDs, how the iron-sulfur binding module is incorporated into structures folded by seemingly diverse sequences has not previously been understood. To date, structures are available only for the type 1 CISD proteins mitoNEET and Miner1. Preliminary crystallization results, but no structure, have been reported for a type 4 CISD [19]. To investigate the structure of different types of CISDs, we have determined high-resolution crystal structures for types 3, 4 and 6 CISDs and predicted the structures for types 2 and 7 CISDs. Our comprehensive analysis of CISD structures and sequences provides important insights into the assembly principles and evolutionary pathway of CISD-containing proteins.

Structure determination
We synthesized genes for a type 3 CISD (PcCISD) from Pyrobaculum calidifontis, which is a hyperthermophilic archaeon, a type 4 CISD (RsCISD) from Ralstonia solanacearum, which is a soil-borne plant pathogenic bacterium, and a type 6 CISD (MmCISD) from Magnetospirillum magneticum, which is an aquatic alphaproteobacterium that utilizes iron reduction to derive energy. As there is only a single CISD protein present in each of the three species' genomes, each is named CISD prefixed with the initials of the binomial name of the source species. The three recombinant proteins display a reddish color during purification and crystallization, suggesting that they bind iron-sulfur clusters like mitoNEET and Miner1. The MmCISD structure was determined by single-wavelength anomalous dispersion (SAD) using the anomalous signal from the bound iron. The structures of RsCISD and PcCISD were solved by a combination of SAD and molecular replacement (MR) using the iron-sulfur binding module of mitoNEET as a search model. The structures of MmCISD, RsCISD and PcCISD have been refined to resolutions of 1.15 Å, 1.8 Å and 1.5 Å with an Rwork/Rfree of 0.111/0.125, 0.208/0.254 and 0.196/0.222, respectively (Fig. 2 and Table 2). We describe each of the three structures below and then derive their common structural themes and variations.

Type 6 MmCISD is a monomer with an internal pseudo dyad symmetry
The MmCISD structure is monomeric, with each of its two CDGSH motifs binding a [2Fe-2S] cluster (Fig. 2A). The pear-shaped structure consists of two iron-sulfur binding modules at the base and a compact β-sandwich on the top. The structure is composed mainly of loop regions (L1 through L5) interspersed by four short β-strands (β1 through β4) with a topology of L1-β1-L2-β2-L3-β3-L4-β4-L5. Strand β1 pairs with strand β4 to form a two-stranded β-sheet, whereas strand β2 pairs with strand β3 to form another two-stranded β-sheet; both sheets are antiparallel. The two β-sheets pack against each other into a β-sandwich. One [2Fe-2S] cluster is coordinated by Cys25, Cys27, Cys36 and His40 in the N-terminal CDGSH motif in loop L3, whereas the other [2Fe-2S] cluster is coordinated by Cys61, Cys63, Cys72 and His76 in the C-terminal CDGSH motif in loop L5, as predicted.
The monomeric structure of MmCISD displays an internal pseudo 2-fold symmetry and bears remarkable similarity to other homodimeric CISD structures. The N-terminal half (L1-β1-L2-β2-L3) is related to the C-terminal half (β3-L4-β4-L5) by a pseudo dyad axis passing through the center of the structure. The two structural halves are associated through β-strand pairing and hydrophobic and polar interactions (Fig. 3A). The major hydrophobic core is composed of Val14, Val16, Tyr22, Val48, Tyr50, Pro52 and Ala58 from the β-sandwich, as well as residues Trp24, Pro34, Phe35, Phe60, Pro70 and Leu71 from the two CDGSH motifs.
The two iron-sulfur binding modules pack back-to-back with a polar interface (Fig. 3A). The interactions at the interface are asymmetric, unlike those found in symmetric CISD dimer structures. The completely buried Lys64 plays a key role in bridging the two iron-sulfur binding modules. The side chain amino group of Lys64 forms three hydrogen bonds simultaneously with the backbone carbonyl oxygens of Cys61 and Cys25 and the side chain carbonyl oxygen of Gln33. The side chain amide group of Gln33 is additionally stabilized by an interaction with the carbonyl oxygen of Ser30. Residues Gln33 and Lys64 are highly conserved in the CDGSH motifs of types 5 and 6 CISDs, but not in other types of CISDs (Fig. 1), underscoring the importance of this type-specific interaction. In addition, the nitrogen atom in the indole ring of Trp24 makes a hydrogen bond with the carbonyl of Cys61, although Trp24 is often replaced by phenylalanine. Despite many polar interactions, no water molecule is found at the interface of the two tightly packed iron-sulfur binding modules.
In a search of the NCBI nonredundant database, we found type 6 CISDs in 44 eukaryotic species and 72 bacterial species, and type 5 CISDs in 32 bacterial species (Table 1). These types 5 and 6 CISDs should adopt a structure similar to that of MmCISD owing to significant sequence homology (Fig. 1A and File S5). For instance, MmCISD and human Miner2 share 31% sequence identity and 39% sequence similarity.
Interestingly, one of the two CDGSH motifs is degenerated in 17% of types 5 and 6 CISDs (File S5). In about half of these cases, the characteristic histidine ligand is replaced by a cysteine. The resulting motif may still be able to bind a [2Fe-2S] cluster, as shown experimentally for a mitoNEET mutant [12,20]. In the remaining cases, iron-sulfur binding is most likely disrupted by substitutions (Cys to Ser, His to Gly, Ser, Gln or Asn), and whether the resultant structure with a single [2Fe-2S] cluster still folds normally is unclear.
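Percent identity figures such as the one quoted above depend on how the alignment is scored. As an illustration only, the sketch below counts identical residues over aligned, gap-free columns of a pairwise alignment. The choice of denominator is an assumption (the convention used for the 31% value is not stated here), and the aligned strings are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: the denominator is the number of gap-free aligned columns;
    other conventions (shorter sequence length, full alignment length)
    give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Made-up alignment: 5 gap-free columns, 4 of them identical.
print(percent_identity("ACD-GH", "ACEFGH"))
```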

Type 4 RsCISD forms a homodimeric structure
Type 4 CISDs constitute a major group of CISDs present in 103 bacterial species (Table 1). RsCISD adopts an intertwined dimeric structure with the dimer interface burying 1250 Å² of solvent-accessible area per monomer. Residues 6-66 of RsCISD are resolved in the structure and form five consecutive β-strands (β1 through β5) followed by the iron-sulfur binding module and a short α-helix, α1 (Fig. 2D). Each subunit binds a [2Fe-2S] cluster via residues Cys41, Cys43, Cys52 and His56 in the CDGSH motif.
The structure contains a β-sandwich packed by two intermolecular five-stranded β-sheets. Each β-sheet comprises strands β1, β2 and β5 from one subunit and strands β3′ and β4′ from the other subunit with the order β5-β2-β1-β3′-β4′ (prime denotes the other subunit). The intermolecular pairing between strands β1 and β3′ is parallel, whereas all other adjacent strand pairings are antiparallel. The β-sandwich is stabilized by a large number of hydrophobic residues (Ile7, Ala9, Tyr15, Ile17, Phe21, Ile23, Leu31, Ala38, Leu40, Pro50 and Phe51), which are highly conserved in type 4 CISDs (Fig. 1B, 3B and File S4).
The two iron-sulfur binding modules are packed closer to each other in the RsCISD structure than in the mitoNEET and MmCISD structures. At the closest distance, the peptide backbone atoms of Arg42, Cys43 and Gly44 are within 3.3-3.8 Å of their counterparts. The interface between the two iron-sulfur binding modules is mainly polar, burying two arginine residues and four water molecules (Fig. 3B). The solvent-exposed residues in the RsCISD structure are mostly not conserved in type 4 CISDs, with one exception, Lys57. Lys57 is located at the opening of the [2Fe-2S] cluster (Fig. 3B) and is conserved as lysine or arginine in more than 90% of type 4 CISDs (Fig. 1B and File S4) and thus may be functionally important. The equivalent residue, the residue C-terminal to the histidine ligand, is not conserved in other types of CISDs (Fig. 1).

Type 3 PcCISD structure is a permutated form of type 4 RsCISD structure
Type 3 CISDs form a small group and were previously thought to exist only in archaea [6]. However, seven bacterial sequences with clear homology to the archaeal type 3 sequences should be grouped into this family (Fig. 1D, Table 1 and File S4). Like other single-motif type 1 and 4 CISDs, the 53-residue type 3 PcCISD also adopts a dimeric structure with each subunit binding a [2Fe-2S] cluster (Fig. 2C). The dimerization buries a surface area of 705 Å² for each monomer. Each subunit comprises four β-strands (β1 to β4) and one α-helix (α1) arranged in the order β1-β2-β3-α1-β4. The iron-sulfur binding module is located between β3 and α1 and coordinates a [2Fe-2S] cluster with residues Cys25, Cys27, Cys36 and His40 (Fig. 3C). The β-strands from one subunit form an antiparallel four-stranded β-sheet that further associates with its counterpart from the other subunit into an intermolecular β-sandwich. The PcCISD structure closely resembles the RsCISD structure, but displays a permutated topology. The two structures are superimposable by both subunits with a root mean square deviation (rmsd) of 0.562 Å over 83 Cα pairs (Fig. 4). They also share similar dimer interfaces, including buried arginine residues and water molecules between the two iron-sulfur binding modules (Fig. 3B and 3C). Strands β1, β2 and β3 of PcCISD correspond to strands β1, β2 and β5 of RsCISD, respectively (Fig. 4). The β3 and β4 strands of RsCISD are replaced by a short disordered loop in the PcCISD structure; nevertheless, the position of strand β3 of RsCISD is occupied by the C-terminal β4 strand of PcCISD coming from the opposite direction. Another difference between the two structures is that the β-sandwich of RsCISD contains the subunit-swapped β-strands β3 and β4, but there is no strand swapping in the PcCISD structure.
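For readers unfamiliar with the rmsd statistic quoted above, the sketch below computes it for paired coordinates that have already been superimposed. It is an illustration under stated assumptions: it does not perform the superposition itself, the function name is hypothetical, and the coordinates are toy values rather than atoms from any CISD structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D points.

    Assumes the two coordinate sets are already superimposed and that
    point i in coords_a is matched to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every pair is displaced by exactly 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(set_a, set_b))
```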

Conserved structural features around iron-sulfur binding module
The four structures of mitoNEET, MmCISD, RsCISD and PcCISD provide five nonequivalent views (two motifs from MmCISD) of how a CDGSH motif binds a [2Fe-2S] cluster. The five iron-sulfur binding modules adopt nearly identical conformations with an rmsd in the range of 0.3-0.5 Å for the 17 Cα atoms of the CDGSH motif (Fig. 5A). This is probably not surprising, given that the residues that mediate iron coordination and structure maintenance [C2, C4, (S/T)7, P11, C13, D14, G15, (S/T/A)16 and H17] are universally conserved. In addition, residues W1, P11 and W12 contribute to the hydrophobic core of the β-sandwich in all four CISD structures (Fig. 3 and 5A). This indicates that the CDGSH motif itself possesses a conserved interface for dimerization.
The structural alignment also reveals two additional conserved structural elements beyond the iron-sulfur binding module. The first element is a β-strand (called the N-strand) N-terminal to the CDGSH motif that is located at the outermost part of the β-sandwich. The hydrophobic residue W1 of the CDGSH motif is part of the N-strand. The second element is a β-strand adjacent to the N-strand that has a consensus sequence of [GPWXW]. The proline residue in the motif is located near the opening of the [2Fe-2S] cluster and lies over a bridging sulfur of the [2Fe-2S] cluster like a lid (Fig. 5A). We therefore refer to the proline as the lid-proline (lid-P) and to the motif as the P motif. The first residue in the P motif is variable in types 5 and 6 CISDs, and the proline residue is sometimes replaced by serine in type 4 CISDs (File S4 and S5). Despite adopting a conserved structure, the P motif may occupy different positions in the primary sequence of the different types of CISDs, being N-terminal to the CDGSH motif in types 3 and 4 CISDs and C-terminal to the CDGSH motif in type 1 CISDs (Fig. 1).

Type-specific features of CISD structure
The four CISD structures differ from each other in the structure and topology of the β-sandwich dimerization module (Fig. 2). The type 6 MmCISD structure with tandem CDGSH motifs has a minimal β-sandwich packed by two two-stranded sheets, and the constituent β-strands are situated N-terminal to the CDGSH motif in each half structure. The β-sandwich of type 1 mitoNEET consists of two three-stranded sheets that are assembled from β-strands on both sides of the CDGSH motif. The type 4 RsCISD structure has five-stranded sheets that are built up exclusively from the region N-terminal to the CDGSH motif. The β-sandwich of type 3 PcCISD is four-stranded, contains no swapped strand and comprises β-strands on both sides of the CDGSH motif.
In the four CISD structures, the two iron-sulfur binding modules all pack against each other in a back-to-back fashion, but their relative orientation is variable (Fig. 5B). The center-to-center distance between the two [2Fe-2S] clusters is 15.9 Å in mitoNEET, 15.8 Å in MmCISD and 14.2 Å in RsCISD and PcCISD. Compared to mitoNEET, when one of the iron-sulfur binding modules is aligned, the [2Fe-2S] cluster in the other module is shifted by 26 degrees in MmCISD and by 20 degrees in RsCISD and PcCISD. The different packing modes lead to certain interfacial residues in the CDGSH motif being conserved in only one type of CISD, such as positions 3, 5, 6 and 10 in the type 1 motif, positions 5 and 10 of the N-terminal motif in types 5 and 6, position 5 in the C-terminal motif in types 5 and 6, and positions 3 and 5 in the type 4 motif (Fig. 1). The variable orientations and interactions between the two iron-sulfur binding modules suggest that the communication between dual [2Fe-2S] clusters is not essential for all CISDs.

Structural prediction for type 7 CISD
Type 7 CISDs are characterized by one or two CDGSH motifs fused to a DUF1271 domain. We found that type 7 CISDs are scattered among 10 archaeal species and 43 bacterial species (Table 1 and File S1 and S4). On the basis of domain organization, type 7 CISDs are further divided into three subtypes, in which the DUF1271 domain is followed by one CISD (subtype DC, 14 species), followed by two CISDs (subtype DCC, 8 species) or flanked by one CISD on either side (subtype CDC, 31 species). Irrespective of subtype and relative position to DUF1271, individual type 7 CISDs show clear sequence homology with type 4 CISDs over the full extent of the domain (Fig. 1C and File S4). This strongly suggests that type 7 CISDs fold into a type 4-like dimer. However, it remains unknown whether the two CISDs in subtypes DCC and CDC form an intramolecular dimer or form an intermolecular dimer with another molecule.
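The subtype naming for type 7 proteins follows directly from the N- to C-terminal order of domains. The sketch below illustrates that rule; the function name and the domain labels passed to it are hypothetical and do not correspond to the output of any particular domain-annotation tool.

```python
# Subtypes described in the text: D = DUF1271 domain, C = CISD.
KNOWN_SUBTYPES = {"DC", "DCC", "CDC"}


def type7_subtype(domains):
    """Map an N-to-C ordered list of domain labels to a type 7 subtype.

    Returns "DC", "DCC" or "CDC", or None if the architecture is not one
    of the three described subtypes. The labels "DUF1271" and "CISD" are
    illustrative names chosen for this sketch.
    """
    letters = {"DUF1271": "D", "CISD": "C"}
    code = "".join(letters.get(d, "?") for d in domains)
    return code if code in KNOWN_SUBTYPES else None


print(type7_subtype(["DUF1271", "CISD"]))          # DC
print(type7_subtype(["CISD", "DUF1271", "CISD"]))  # CDC
```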

Structural prediction for type 2a and 2b CISD
The CISD proteins from apicomplexa were previously all classified as type 2. Further analysis shows that many apicomplexan species contain up to three distinct CISD proteins. One protein contains tandem CDGSH motifs and should be classified as type 6 (such as the T. gondii sequence in Fig. 1A). The other two are partially similar to each other and are reclassified as types 2a and 2b (Fig. 1E).
Types 2a and 2b CISDs bear significant sequence homology to type 1 CISDs within the CDGSH motif and C-terminal region (Fig. 1E, File S3). The two iron-sulfur modules are likely packed similarly in types 2a and 2b CISDs and type 1 mitoNEET, as key CDGSH motif residues mediating iron-sulfur module interaction (arginine at position 3, tryptophan at position 5 and an aromatic residue at position 10) are conserved. Moreover, the C-terminal region harbors a P motif, suggesting that types 2a and 2b CISDs, like mitoNEET, employ a C-terminal β-strand to provide the lid-proline (Fig. 2B). However, the region N-terminal to the CDGSH motif does not align well to type 1 CISDs and may adopt an alternative structure. Additionally, type 2a CISDs contain a long C-terminal extension (not shown in Fig. 1E) that might fold into an additional structure. In summary, the sequence analysis suggests that types 2a and 2b CISDs are a divergent form of type 1 CISDs.

Phylogenetic profile of CISDs
To provide insight into the evolutionary pathway of CISDs, we analyzed the phylogenetic profile of each CISD type. Except for fungi, which notably lack any CISD, eukaryotes contain type 1, 2 and 6 CISDs. Type 1 CISDs appear to be universally present in metazoa and land plants (Table 1 and File S1). Some metazoa are missing a CISD gene, most likely due to incomplete sequencing of the genome. Type 1 CISDs are also found in green algae (Chlamydomonas reinhardtii), choanoflagellates (Monosiga brevicollis), diatoms (Phaeodactylum tricornutum, Thalassiosira pseudonana), brown algae (Ectocarpus siliculosus) and oomycetes (Phytophthora infestans).
In contrast to their extensive eukaryotic occurrence, CISDs are sporadically distributed in prokaryotes. Our search found 255 bacterial species and 21 archaeal species that contain at least one CISD gene. These prokaryotes most frequently (95.2%) have a single CISD type and 13 species have two CISD types. From a set of prokaryotes whose genomes have been completely sequenced or are in draft assembly (CA-set), 17% of bacteria and 23% of archaea are CISD positive.
Interestingly, type 4 CISDs and type 5 or 6 CISDs display mutually exclusive distributions at the phylum level. Type 4 CISDs are predominantly present in Gram-positive Actinobacteria and Firmicutes, whereas type 5 and 6 CISDs are mainly found in Gram-negative Proteobacteria. Distinct patterns are also evident for smaller bacterial groups. Type 4 CISDs are preferred in Bacteroidetes (together with type 7), Acidobacteria, Deinococcus-Thermus and Planctomycetes, whereas types 5 and 6 CISDs dominate in Cyanobacteria and Chlorobi. The exclusive phylogenetic profiles of type 4 and type 5 or 6 CISDs are consistent with a predominant role of vertical gene transfer in spreading these genes. Nevertheless, horizontal gene transfer (HGT), which is widespread in prokaryotes [21], may contribute to the wide and sometimes spontaneous occurrences of the CISD gene in extant prokaryotes. For types 3 and 7 CISDs, which are present sporadically in diverse archaeal and bacterial species with no clear phylogenetic pattern, HGT is likely the major gene spreading mechanism.
Gene loss appears to be an important determinant of the limited occurrence of the CISD gene in prokaryotes. For genera in the CA-set with at least five sequenced genomes and at least one CISD-positive species, only 35 ± 21% (mean ± SD) of species in the same genus are CISD-positive. As species belonging to the same genus most likely inherited the CISD gene from a common ancestor, the absence of the CISD gene in sister species is highly suggestive of gene loss.
Eukaryotes may have acquired CISDs from their prokaryotic ancestor or from the antecedent of mitochondria, an ancient alphaproteobacterium. The latter route is plausible given that all three human CISD proteins are associated with mitochondrial function.
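The per-genus statistic above can be reproduced in outline as follows. This is a sketch under stated assumptions: the genus names and counts are invented, and the use of the sample standard deviation is an assumption, because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def positive_fraction_stats(genus_counts, min_genomes=5):
    """Mean and SD of the CISD-positive fraction across qualifying genera.

    genus_counts maps genus name -> (CISD-positive species, sequenced species).
    A genus qualifies if it has at least `min_genomes` sequenced genomes and
    at least one CISD-positive species, mirroring the filter in the text.
    Assumption: sample standard deviation (statistics.stdev).
    """
    fractions = [
        positive / total
        for positive, total in genus_counts.values()
        if total >= min_genomes and positive >= 1
    ]
    return mean(fractions), stdev(fractions)


# Invented counts for illustration only.
example = {
    "GenusA": (1, 5),
    "GenusB": (4, 8),
    "GenusC": (9, 10),
    "GenusD": (1, 3),   # excluded: fewer than five genomes
    "GenusE": (0, 12),  # excluded: no CISD-positive species
}
avg, sd = positive_fraction_stats(example)
print(f"{100 * avg:.0f} ± {100 * sd:.0f}% (mean ± SD)")
```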

Discussion
We have determined three structures that are representative of types 3, 4, 5 and 6 CISDs. Together with the previously determined structure of type 1 mitoNEET, the current four CISD structures provide a template to understand the assembly of ~90% of known CISD proteins that are homologous to one of these proteins. Sequence analysis of the remaining types 2a/2b and 7 CISDs further reveals their resemblance to types 1 and 4 CISDs, respectively.
Our major conclusion is that different types of CISDs are more related to each other at the structure level than their sequences initially suggest. The four CISD structures display striking similarity in their overall appearance, with dual iron-sulfur binding modules and a β-sandwich dimerization module. It appears that the CDGSH iron-sulfur binding module must exist in a pair that could result from either dimerization of a protein containing a single CDGSH motif or folding of a single polypeptide containing tandem CDGSH motifs. The obligatory dimerization of CISDs is further supported by the fact that the CDGSH motif already contains a universally conserved interface (W1, P11 and W12) for dimerization. Nevertheless, the iron-sulfur binding module requires a β-sandwich dimerization module, including a conserved N-strand and P motif, to form an integral CISD structure.
Our analysis also shows that the seven CISD types can be architecturally reduced to three primary structural folds, each of which displays a distinct phylogenetic pattern (Fig. 6). Types 3, 4 and 7 CISDs adopt a prokaryote-specific fold. Among them, type 7 CISDs apparently resulted from fusion of type 4 CISDs with a DUF1271 domain, whereas type 3 CISDs are a derivative of the more predominant type 4 CISDs, as revealed by their closely related structures (Fig. 4). Type 1 and 2 CISDs adopt a eukaryote-specific fold. The prokaryote- and eukaryote-specific folds are distinguished from each other by the different order of the P and CDGSH motifs in the primary sequence. The third, tandem-motif fold adopted by types 5 and 6 CISDs is present across both bacteria and eukaryotes. Type 6 CISDs apparently evolved from an ancestral single-motif CISD through gene duplication and gene fusion, while type 5 CISDs resulted from fusion of type 6 CISDs with a glutamate synthase domain.
In conclusion, all CISDs appear to descend from an ancestral domain that contained a single CDGSH motif and formed a homodimer. During the course of evolution, diversification of the β-sandwich structure and gene fusion events have given rise to the three primary folds and seven major types of CISDs, while preserving the core structure of the iron-sulfur binding module. The structural and evolutionary knowledge about CISDs will aid the investigation of their yet-to-be-understood function.

Sequence analysis
CISD proteins were searched in the NCBI nonredundant database by Pattern Hit Initiated BLAST (PHI-BLAST) using the consensus sequence of CDGSH motif as query [22]. Over 600 sequences were retrieved as of December 30, 2010, and were classified into seven major groups and an orphan group (Fig. 1, Table 1 and File S1, S2, S3, S4, S5, and S6). Some erroneous or low-quality sequences were removed. The retrieved sequences were multiply aligned with ClustalW [23]. Sequence manipulation and alignment were conducted in Jalview [24].
In the analysis of phylogenetic distribution, multiple sequences of the same CISD type in a species and sequences present in multiple strains were counted once. The fraction of CISD-positive genomes was calculated against a set of genomes that were completely sequenced or in draft assembly as listed in the NCBI genome project on December 30, 2010. The prokaryotic genomes in draft assembly are mostly complete, as the CISD-positive fractions calculated from the genomes in draft assembly alone are similar to those calculated from the completed genomes. However, many eukaryotic genomes in draft assembly are indeed incomplete, leading to underestimation of the fraction of CISD-positive genomes. A eukaryote that is not a metazoan, plant or fungus is regarded as a protist.
Type 7 CISD proteins were assigned to one of the subtypes DC, CDC or DCC. The boundary of the DUF1271 domain was defined according to the structure of its homolog [25], and the individual CISD was then extracted and aligned.
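The counting rule described above (one count per species and CISD type, regardless of paralogs or strains) and the CISD-positive fraction can be sketched as below. The record layout, function names and species labels are hypothetical and serve only to illustrate the bookkeeping.

```python
def species_counts_by_type(hits):
    """Count species per CISD type, counting each (species, type) pair once.

    hits is an iterable of (species, strain, cisd_type) records, so
    paralogs within a species and repeats across strains collapse.
    """
    unique_pairs = {(species, cisd_type) for species, _strain, cisd_type in hits}
    counts = {}
    for _species, cisd_type in unique_pairs:
        counts[cisd_type] = counts.get(cisd_type, 0) + 1
    return counts


def cisd_positive_fraction(hits, genome_species):
    """Fraction of species in the genome set with at least one CISD hit."""
    genome_set = set(genome_species)
    positive = {species for species, _strain, _type in hits} & genome_set
    return len(positive) / len(genome_set)


# Hypothetical records: species A has two strains carrying the same type.
hits = [
    ("Species A", "strain 1", 4),
    ("Species A", "strain 2", 4),
    ("Species A", "strain 1", 6),
    ("Species B", "strain 1", 4),
]
genomes = ["Species A", "Species B", "Species C", "Species D"]
print(species_counts_by_type(hits))
print(cisd_positive_fraction(hits, genomes))
```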

Protein expression and purification
The gene sequences encoding a type 6 CISD protein MmCISD from Magnetospirillum magneticum AMB-1 (accession number: YP_423370), a type 4 CISD protein RsCISD (NP_521033) from Ralstonia solanacearum GMI1000 and a type 3 CISD protein PcCISD from Pyrobaculum calidifontis JCM 11548 (YP_001056297) were chemically synthesized by Generay and cloned into plasmid pET28a(+) (Novagen). Residues 2-79 of MmCISD (79 residues in total) were cloned as a fusion construct to an N-terminal 6-His tag (MGSSHHHHHH). Residues 1-66 of RsCISD (75 residues in total) were cloned as a fusion construct to an N-terminal 6-His tag and PreScission cleavage site (MGSSHHHHHHSQDLEVLFQGPH). The full-length PcCISD protein (53 residues in total) was cloned with a C-terminal 7-His tag (HHHHHHH). All proteins were expressed in Escherichia coli BL21(DE3) strain. The cells were cultured at 37 °C to an OD600 of 0.8 and grown at 20 °C for 16 h after protein expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside.
For purification of His-tagged MmCISD, cells from 6 liter culture were lysed by sonication in buffer A [20 mM Tris-HCl (pH 7.9), 500 mM NaCl and 30 mM imidazole]. The clarified supernatant was loaded onto a 5-ml HisTrap column (GE Healthcare) pre-equilibrated with buffer A. The column was washed with 20 column volumes of buffer A, and the protein was eluted with a linear 30-500 mM imidazole gradient in 20 mM Tris-HCl (pH 7.9) and 500 mM NaCl. The fractions containing the target protein were concentrated and loaded onto a HiLoad 16/60 Superdex 75 column (GE Healthcare) equilibrated in 10 mM Tris-HCl (pH 7.9), 100 mM NaCl and 33 mM trisodium citrate. The protein was eluted in a broad and tailing peak. The early fractions were colorless, suggesting iron loss and protein aggregation. The late fractions (elution volume 83-90 ml) with reddish color were pooled, concentrated to 15 mg/ml, and screened immediately for crystallization.
His-tagged PcCISD was purified essentially following the same steps as MmCISD. The gel filtration with Superdex 75 was performed in buffer B [10 mM Tris-HCl (pH 7.9) and 50 mM NaCl]. Because the PcCISD protein coeluted (elution volume = 106-118 ml) with a salt peak, it was further exchanged to buffer B using a HiTrap desalting column (GE Healthcare). The protein was concentrated to 20-30 mg/ml in buffer B, flash-frozen in liquid nitrogen and stored at −80 °C.
His-tagged RsCISD was purified similarly. To cleave the His-tag, the protein eluted from the HisTrap column was diluted 2.5-fold with 20 mM Tris-HCl (pH 7.9) and incubated with PreScission protease overnight at 4 °C. After concentration and exchange to buffer B, the sample was passed through a HisTrap column to remove the cleaved His-tag and any uncleaved protein. The protein in the flowthrough was further purified with a Superdex 75 column in buffer B. The protein was concentrated to 20-30 mg/ml, flash-frozen in liquid nitrogen and stored at −80 °C.

Crystallization
Crystallization was carried out by the hanging-drop vapor diffusion method by mixing 1 μl of concentrated protein solution with 1 μl of reservoir solution. Crystals of MmCISD were obtained in 4.0 M sodium formate at 4 °C and cryoprotected in 25% glycerol made in the reservoir solution. Crystals of RsCISD were grown in 0.5% PEG 5000, 0.9 M K/Na tartrate and 100 mM Tris-HCl (pH 7.4) at 4 °C and cryoprotected in 20% ethylene glycol made in the reservoir solution. Crystals of PcCISD were grown in 1.6 M MgSO4 and 100 mM MES (pH 6.5) at 20 °C and cryoprotected in 25% glycerol made in the reservoir solution.

Structure determination
The initial datasets for structure determination were all recorded using a Rigaku MicroMax-007 X-ray generator equipped with an R-AXIS IV++ image plate detector at a wavelength of 1.5418 Å. These datasets have resolutions of 1.8 Å for MmCISD, 2.3 Å for RsCISD and 3.0 Å for PcCISD. Final structures were refined against higher-resolution data collected at beamline BL17U of the Shanghai Synchrotron Radiation Facility (SSRF) at a wavelength of 1.0 Å. Data were processed with Denzo and Scalepack in house or with HKL2000 at the synchrotron [26].
The MmCISD crystal belongs to space group P6₅ and contains one molecule in the asymmetric unit (ASU). The structure was determined by single-wavelength anomalous dispersion (SAD), making use of the anomalous signal of iron. Heavy atom search, phase calculation and density modification were performed in SHARP [27]. The model was first built automatically by ARP/wARP [28], followed by iterations of model adjustment in Coot [29] and refinement in Refmac [30]. The final model was refined to 1.15 Å resolution in Phenix with riding hydrogens and individual anisotropic temperature factors [31]. The current model contains MmCISD residues 2-79, 9 residues from the N-terminal His-tag, two [2Fe-2S] clusters and 152 water molecules.
RsCISD crystallized in space group C2 with two molecules in the ASU. The structure was solved by a combination of SAD and molecular replacement (MR). A substructure of mitoNEET (PDB code: 2QD0) containing residues 67-88 and 99-102 and the [2Fe-2S] cluster was used as the search model, and two copies were located with Phaser [32]. The solution was then used to locate four iron atoms. SAD phasing, MR phase combination and density modification were conducted in SHARP. The model was built in Coot and refined in Refmac. The final RsCISD structure was refined to 1.8 Å resolution with Phenix and contains residues 5-65 of each subunit, two [2Fe-2S] clusters, one polyethylene glycol molecule, one tartaric acid molecule and 105 water molecules.
The PcCISD crystal belongs to space group P3₁21 and has one molecule per ASU. The structure was also solved by a combination of SAD and MR phasing. The MR search model contained residues 71-87 of the mitoNEET structure with the bound [2Fe-2S] cluster, where the side chains of residues that differ between mitoNEET and PcCISD were trimmed to the Cβ atom. A marginal solution with TFZ = 5.7 was found in Phaser. The anomalous map based on this solution showed two scatterers, whose positions are consistent with the two iron atoms in the [2Fe-2S] cluster. The iron positions were used to calculate SAD phases in SHARP. The MR and SAD phases were combined and further improved by solvent modification, yielding an interpretable map at 3.0 Å. The model was manually built in Coot and refined in Refmac. The final model of PcCISD was refined to 1.5 Å resolution and contains residues 3-53, 6 histidine residues from the C-terminal His-tag, one [2Fe-2S] cluster and 40 water molecules. Structural figures were generated in PyMOL [33].
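As a rough check on why iron gives a usable anomalous signal at the wavelengths quoted above, the wavelengths can be converted to photon energies and compared with the iron K absorption edge. The sketch below is illustrative: the constant and function names are hypothetical, the edge energy of about 7.112 keV is a tabulated literature value rather than a number from this work, and no anomalous scattering factors are computed.

```python
# h*c expressed in keV·Å, so that E [keV] = HC_KEV_ANGSTROM / wavelength [Å].
HC_KEV_ANGSTROM = 12.398419843
# Approximate Fe K absorption edge (tabulated value, not from this paper).
FE_K_EDGE_KEV = 7.112


def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength given in Å."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


def above_fe_k_edge(wavelength_angstrom):
    """True if the photon energy lies above the Fe K edge."""
    return photon_energy_kev(wavelength_angstrom) > FE_K_EDGE_KEV


# Wavelengths stated in the text.
sources = {
    "home source (1.5418 Å)": 1.5418,
    "SSRF BL17U (1.0 Å)": 1.0,
}

if __name__ == "__main__":
    for label, wavelength in sources.items():
        energy = photon_energy_kev(wavelength)
        print(f"{label}: {energy:.3f} keV, above Fe K edge: {above_fe_k_edge(wavelength)}")
```

Both energies lie above the Fe K edge, which is consistent with the use of the in-house datasets for locating the iron atoms of the [2Fe-2S] clusters.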

Protein Data Bank accession number
The atomic coordinates and structure factors have been deposited in the Protein Data Bank under accession number 3TBO for the PcCISD structure, 3TBM for the RsCISD structure and 3TBN for the MmCISD structure.
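For readers who wish to retrieve the deposited models, the accession numbers can be mapped to download locations. The sketch below is a convenience example, not part of the original methods: it assumes the RCSB file server URL pattern `https://files.rcsb.org/download/<ID>.pdb`, the names `ACCESSIONS` and `coordinate_url` are hypothetical, and the code only builds the URLs without performing any network access.

```python
# Accession numbers as stated in the text.
ACCESSIONS = {
    "PcCISD": "3TBO",
    "RsCISD": "3TBM",
    "MmCISD": "3TBN",
}


def coordinate_url(pdb_id):
    """Build the assumed RCSB download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"


urls = {protein: coordinate_url(pdb_id) for protein, pdb_id in ACCESSIONS.items()}

if __name__ == "__main__":
    for protein, url in urls.items():
        print(f"{protein}: {url}")
```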

Supporting Information
File S1 Collection of CISD proteins with taxonomy information (Excel table).

(TXT)
File S3 Aligned sequences of type 2 CISDs in FASTA format. Each protein is named as described for File S2.

(TXT)
File S4 Aligned sequences of type 3, type 4 and type 7 CISDs in FASTA format. Each protein is named as described for File S2. Type 7 is further divided into subtypes CDC, DCC and DC. The first and second CISD in subtypes CDC and DCC are marked by 1 and 2, respectively.

(TXT)
File S5 Aligned sequences of type 5 and type 6 CISDs in FASTA format. Each protein is named as described for File S2. The type 5 and 6 CISDs with a degenerate CDGSH motif are labeled with a prime. Eukaryotic proteins are prefixed with "e".

(TXT)
File S6 Aligned sequences of orphan CISDs in FASTA format. Each protein is named as described for File S2.

(TXT)