Figure 1.
Syntenic mapping of CEACAM/PSG family genes in vertebrates.
a) Syntenic mapping of the CEACAM/PSG locus in human, chimpanzee (P. troglodytes), and Rhesus monkey (M. mulatta). CEACAM/PSG genes of these primates can be subdivided into two clusters: cluster I genes are flanked by orthologs of TGFB1 and XRCC1 whereas cluster II genes are close to orthologs of TOMM40, APOE, and SIGLEC8. All CEACAM subfamily genes are indicated by blue oval symbols. CEACAM/PSG locus marker genes, including TGFB1, ATP1A3, ZNF574, PAFAH1B3, TMEM145, CNFN, LIPE, ETHE1, XRCC1, TOMM40, APOE, and SIGLEC8, are indicated by triangle symbols. The PSG subfamily genes are indicated by shaded square boxes. The relative position of genes is shown under gene symbols in Kbp. b) Syntenic mapping of CEACAM family genes in dog, rat, the gray short-tailed opossum (M. domestica), platypus (O. anatinus), and the clawed frog (X. tropicalis). The opossum contains more than three dozen paralogs on multiple chromosomes (Table S2 in File S1). For the opossum, only paralogs mapped on chromosomes 2 and 4 are shown. Those found on unknown chromosomes are described in Table S2 in File S1. Among these homologs, eleven (MdoCEACAMI-XI) were found to cluster in a 2-Mbp span on chromosome 4, which also contained the marker genes TOMM40 and APOE. On the other hand, the platypus genome encoded four CEACAM homologs (OanCEACAM16, 16LI, 20LI, and 20LII) (Table S2 in File S1). In X. tropicalis, three homologs (XtrCEACAMI-III) were located near marker genes, including LIPE, CNFN, TMEM145, PAFAH1B3, and ZNF574. The chromosomal number and the genomic contig number are indicated at the top of the schematic representation of each genomic fragment. CEACAM family genes are indicated by red diamond-shaped symbols. Marker genes are identified by colored diamond-shaped symbols. The relative position of genes on chromosomes and contigs is shown next to the gene symbols. c) Syntenic mapping of CEACAM loci in teleosts. The genomes of the medaka fish (O. latipes), stickleback (G. aculeatus), zebrafish (D. rerio), and two pufferfishes (T. rubripes and T. nigroviridis) encode 1–12 CEACAM family genes. Syntenic mapping indicated that zebrafish and T. nigroviridis CEACAM genes are located on whole genome duplication (WGD)-derived chromosome fragments, and that zebrafish CEACAMs on chromosome 16 are located on three separate loci (I, II, and III). The WGD-derived syntenic chromosomal regions in teleosts are indicated by a yellow background. The chromosomal number and the genomic contig number are indicated at the top of the schematic representation of each genomic fragment. CEACAM family genes are indicated by red diamond-shaped symbols. Marker genes are identified by colored diamond-shaped symbols. The relative position of genes on chromosomes and contigs is shown next to the gene symbols.
Figure 2.
Analysis of CEACAM/PSG family gene evolution based on the Neighbor-Joining method.
a) Phylogenetic tree of PSG subfamily genes from human, chimpanzee, and Rhesus monkey. A Rhesus monkey-specific cluster and an ape-specific cluster are indicated by vertical bars on the right. Potential pseudogenes, including human PSG10, chimpanzee LOC468901, and Rhesus monkey LOC709992, were excluded from the analysis. Human, Hsa; chimpanzee, Ptr; Rhesus monkey, Mmu. b) Phylogenetic tree of 48 CEACAM family proteins from human, dog (C. familiaris), opossum (M. domestica), and platypus (O. anatinus). There were a total of 2119 positions in the final dataset. The human CEACAM1-like cluster is indicated by a vertical bar on the right. The CEACAM16 and 20 homologs appear to have diverged from other family members before the separation of eutherian, metatherian, and prototherian mammals. Human, Hsa; dog, Cfa; opossum, Mdo; platypus, Oan. Note that the bootstrap values for basal lineages in this tree are extremely low, so this Neighbor-Joining tree should be interpreted with caution. c) Phylogenetic tree of teleost CEACAM homologs. Twenty-three CEACAM proteins from D. rerio, G. aculeatus, T. rubripes, and T. nigroviridis were analyzed. A D. rerio-specific cluster is indicated by a vertical bar on the right. The robustness of the tree was assessed by 1,000 bootstrap replicates, and the percentage of replicates is shown next to the branches.
Figure 3.
Analysis of CEACAM/PSG family gene evolution based on the Maximum Likelihood method.
a) Phylogenetic tree of PSG subfamily genes from human, chimpanzee, and Rhesus monkey. A Rhesus monkey-specific cluster and an ape-specific cluster are indicated by vertical bars on the right. Potential pseudogenes, including human PSG10, chimpanzee LOC468901, and Rhesus monkey LOC709992, were excluded from the analysis. Human, Hsa; chimpanzee, Ptr; Rhesus monkey, Mmu. b) Phylogenetic tree of 48 CEACAM family proteins from human, dog (C. familiaris), opossum (M. domestica), and platypus (O. anatinus). The human CEACAM1-like cluster is indicated by a vertical bar on the right. Human, Hsa; dog, Cfa; opossum, Mdo; platypus, Oan. c) Phylogenetic tree of teleost CEACAM homologs. Twenty-three CEACAM proteins from D. rerio, G. aculeatus, T. rubripes, and T. nigroviridis were analyzed. A D. rerio-specific cluster is indicated by a vertical bar on the right.
Figure 4.
Analysis of CEACAM transcript expression in tissues of platypus, pufferfish T. nigroviridis, and zebrafish D. rerio.
a) RT-PCR analysis of OanCEACAM16, 16LI, 20LI, and 20LII in the intestine of a platypus. Size markers are shown on the left. Specific PCR products for OanCEACAM16 and 20LI are indicated by arrows. b) RT-PCR detection of transcripts of DreCEACAMI, VII, and X in kidney, testis, ovary, gill, gut, head, heart, liver, and fin of D. rerio (right panel) as well as TniCEACAMI-III in brain, muscle, gut, kidney, heart, liver, gill, and skin of T. nigroviridis (left panel) using gene-specific primers (Table S4 in File S1). Expected size of PCR products for each gene is indicated by an arrow.
Figure 5.
The human PSG locus exhibits frequent copy number variation (CNV).
a) Schematic representation of CNVs found at the human PSG locus on chromosome 19 (47,600–48,500 kb) based on studies using high-density probes and DNA sequencing [51], [52], [53]. CNVs that were identified in CEU (U.S. residents with northern and western European ancestry, N = 20) and YRI (Yoruba from Ibadan in Africa, N = 20) populations are indicated by blue brackets under the chromosome. CNVs that were identified in Asian populations (Chinese, Japanese, and Koreans; N = 30) are indicated by black brackets. CNVs that have a frequency higher than 50% are indicated by bold brackets. b) The size distribution of 387 unique chromosome-19 CNVs that have a length greater than 500 bp. The inset shows the distribution of long-fragment CNVs (46 in total with a length >20 kb) found on chromosome 19 and those at the PSG locus.
Table 1.
CEACAM/PSG family genes have a high density of nonsynonymous SNPs.
Table 2.
A large fraction of human CEACAM/PSG genes contain nonsynonymous SNPs.
Figure 6.
The CEACAM/PSG and three other progressive gene families (sialic acid binding Ig-like lectin, leukocyte immunoglobulin-like receptor, and olfactory receptor) have a higher percentage of genes containing a nonsynonymous SNP with FST scores in the top 15% bracket than the remaining genes (conserved genes) on chromosome 19.
Progressive gene families are those that encode secreted ligands or cell surface receptors and that expanded multiple times during primate evolution.
Table 3.
Percentage of genes with nonsynonymous SNP(s) that exhibit a high FST score (in the top 15% or 10% bracket).
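The quantity summarized in Table 3 can be illustrated with a minimal sketch. The legends do not state which FST estimator was used, so the block below assumes Wright's heterozygosity-based FST for a biallelic SNP in two populations. The function names, the rule that a gene counts if any of its nonsynonymous SNPs reaches the cutoff, and all input values are hypothetical and are not taken from the study.

```python
def wright_fst(p1, p2, n1=1, n2=1):
    """Wright's FST for one biallelic SNP from the allele frequencies
    p1 and p2 in two populations, weighted by sample sizes n1 and n2.
    Assumption: this estimator is illustrative, not the study's."""
    w1 = n1 / (n1 + n2)
    w2 = n2 / (n1 + n2)
    p_bar = w1 * p1 + w2 * p2
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                             # monomorphic site: no differentiation
    h_within = w1 * 2 * p1 * (1 - p1) + w2 * 2 * p2 * (1 - p2)
    return (h_total - h_within) / h_total


def top_bracket_cutoff(scores, top_percent=15):
    """Smallest score that still falls in the top `top_percent` percent
    of `scores` (integer arithmetic avoids floating-point rounding)."""
    ranked = sorted(scores, reverse=True)
    k = max(1, len(ranked) * top_percent // 100)
    return ranked[k - 1]


def percent_genes_with_high_fst(gene_to_scores, cutoff):
    """Percentage of genes with at least one SNP whose FST >= cutoff."""
    hits = sum(
        1 for scores in gene_to_scores.values()
        if any(s >= cutoff for s in scores)
    )
    return 100.0 * hits / len(gene_to_scores)


if __name__ == "__main__":
    # Hypothetical per-SNP FST scores for a background set of SNPs.
    background = [i / 100 for i in range(1, 21)]
    cutoff = top_bracket_cutoff(background, top_percent=15)
    # Hypothetical genes, each with FST scores of its nonsynonymous SNPs.
    genes = {"geneA": [0.19, 0.01], "geneB": [0.02], "geneC": [0.18], "geneD": [0.05]}
    print(cutoff, percent_genes_with_high_fst(genes, cutoff))
```

Running the same calculation with `top_percent=10` would correspond to the 10% bracket named in the table title.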