Conceived and designed the experiments: NF KA AF. Performed the experiments: MI JZ YS TA NF. Analyzed the data: MI JZ ON TA AT NF. Contributed reagents/materials/analysis tools: JZ ON AT AT KA NF. Wrote the paper: NF MI JZ.
The authors have declared that no competing interests exist.
Organisms are remarkably adapted to diverse environments by specialized metabolisms, morphology, or behaviors. To address the molecular mechanisms underlying environmental adaptation, we have utilized a
Organisms display traits that are well adapted to their environments. How organisms come to possess adaptive traits is a fundamental question in evolutionary biology. It is accepted that genomic alterations lead to diverse traits, and adaptive traits are then selected during evolutionary history. To understand the mechanisms of environmental adaptation, it is necessary to link genome to trait. Previous studies have identified genomic alterations causing evolved traits
Experimental evolution studies utilize model organisms evolved in defined environments in the laboratory, and therefore they address environmental adaptation more directly. Indeed, previous experimental evolution studies observed genomic alterations under environmental selection and evaluated the effectiveness of multiple genes on fitness
We utilized NGS technology to study an unusual line of
In 1954, a fly population derived from one pair of Oregon-R-S flies was divided into 6 populations. Three of them (aL, bL and cL populations) were reared in normal light-dark cycling conditions and the remaining three populations (dD, eD, and fD populations) were reared in constant dark conditions. Unfortunately, all of the L lines were lost by 2002. The dD and eD lines were lost in 1965 and 1967, respectively, and only the fD line has been maintained until now. In 2008, we started to rear the fD line and designated it “Dark-fly”. We have maintained Dark-fly in a minimal medium, as before (black lines), and in parallel in a standard cornmeal medium (white lines). The population size of Dark-fly has not been controlled but has usually been about 100 flies each in several culture vials.
Here, we found that Dark-fly produced more offspring in dark than in light conditions, suggesting that Dark-fly possesses some traits advantageous in darkness. To examine genomic alterations involved in environmental adaptation, we performed whole genome sequencing for Dark-fly using NGS technology and found unique features of its genome.
We first asked whether Dark-fly exhibits successful reproduction in dark conditions, as a feature of environmental adaptation. Adult flies were placed in a light-dark cycling (12-hour ∶ 12-hour; LD), constant light (LL) or constant dark (DD) condition for 3 days and the offspring were counted. We used the Oregon-R-S strain, which was obtained from a stock center, as a control line, because Dark-fly originated from that strain
(A) Three-day fecundity (offspring/female) of Dark-fly and Oregon-R-S in LL, LD and DD conditions is shown as box plots. Boxes and median lines represent the inter-quartile range and median values of the data, and vertical lines represent the minimum and maximum values within 1.5-fold of the inter-quartile range. Circles indicate outliers. * indicates FDR-adjusted p-value < 0.05, Welch t-test. n = 10 (total 100 females). (B) Lifetime fecundity (offspring/female) of Dark-fly and Oregon-R-S in LD and DD conditions is shown as box plots in a similar manner to (A). ** indicates p-value < 0.01, Welch t-test. n = 10 (total 100 females).
We next examined the fecundity over a fly's lifetime. Dark-fly produced a similar number of offspring over its lifetime in LD and DD conditions (
The decreased fecundity of Oregon-R-S in the dark appears to be partly due to decreased adult viability. When males and females were reared together, Oregon-R-S and Dark-fly males showed similar viability (
The viability of male flies (A) and female flies (B) reared together is plotted versus time (days). Dark-fly (red lines) and Oregon-R-S (blue lines) were reared under LD (dotted lines) and DD (solid lines) conditions. The viability of virgin females (C) was also measured in a similar manner. n = 92–100 flies. Oregon-R-S virgin females lived longer than mated ones, whereas Dark-fly virgin females lived shorter than mated ones.
To understand the molecular nature of Dark-fly's traits, we extracted genomic DNA from 20 adult males each of Dark-fly and Oregon-R-S, and performed whole genome sequencing using an Illumina Genome Analyzer II. Approximately 67 million and 87 million reads were obtained for Dark-fly and Oregon-R-S, respectively, and 96 and 90% of reads were successfully aligned to the
fly line | read length | read number | mapped read number | mapped read % | total read bases | mean depth |
Dark-fly | 36 | 66,855,594 | 64,422,374 | 96.4 | 2,319,205,464 | 13.7 |
Oregon-R-S | 36, 39, 48 | 87,101,330 | 78,109,114 | 89.7 | 3,307,906,716 | 19.6 |
The results of genome sequencing using an Illumina Genome Analyzer II are summarized. Flybase Dmel 5.22 genome (168,736,537 bases) was used as a reference genome.
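The derived columns of this table can be cross-checked from the raw counts. The sketch below assumes that the mapped read percentage is mapped reads divided by total reads, and that mean depth is total read bases divided by the reference genome length. Neither formula is stated explicitly in the text, so both are assumptions, although they reproduce the tabulated values.

```python
# Cross-check of the sequencing summary table.
# Assumed formulas (not stated in the text):
#   mapped read %  = 100 * mapped reads / total reads
#   mean depth     = total read bases / reference genome length

REFERENCE_BASES = 168_736_537  # Flybase Dmel 5.22, from the table caption

runs = {
    "Dark-fly": {"reads": 66_855_594, "mapped": 64_422_374, "bases": 2_319_205_464},
    "Oregon-R-S": {"reads": 87_101_330, "mapped": 78_109_114, "bases": 3_307_906_716},
}


def mapped_percent(run):
    """Percentage of reads that were aligned to the reference."""
    return 100.0 * run["mapped"] / run["reads"]


def mean_depth(run, reference_bases=REFERENCE_BASES):
    """Average number of sequenced bases per reference position."""
    return run["bases"] / reference_bases


summary = {
    name: (round(mapped_percent(run), 1), round(mean_depth(run), 1))
    for name, run in runs.items()
}

for name, (percent, depth) in summary.items():
    print(f"{name}: mapped {percent}%, mean depth {depth}")
```

Note that this depth uses all read bases, mapped or not, which is the only definition that matches both rows of the table.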
After filtering the quality of each sequence, single nucleotide polymorphisms (SNPs) were identified at 415,626 sites for Dark-fly and 415,668 sites for Oregon-R-S, compared with the reference genome sequence (
fly line | Dark-fly | Oregon-R-S |
total fixed SNPs | 415,626 | 415,668 |
SNP frequency (bases/SNP) | 406 | 406 |
line-specific SNPs | 217,340 | 217,382 |
total SNPs | ||
total SNP-effects | 1,435,028 | 1,424,012 |
intergenic | 826,111 | 824,781 |
UTR and intron | 486,090 | 499,604 |
synonymous coding (sSNP) | 96,674 | 78,152 |
non-synonymous coding (nsSNP) | 25,514 | 20,840 |
others | 639 | 635 |
line-specific SNPs | ||
nsSNPs without redundancy | 9,695 | 6,521 |
genes carrying nsSNPs | 4,323 | 3,039 |
genes carrying nonsense mutations | 28 | 23 |
total fixed InDels | 5,322 | 5,461 |
InDel frequency (bases/InDel) | 31,705 | 30,898 |
line-specific InDels | 4,660 | 4,799 |
total InDels | ||
total InDel-effects | 16,726 | 17,507 |
intergenic | 8,790 | 9,767 |
UTR and intron | 7,790 | 7,674 |
coding region (cInDel) | 144 | 66 |
others | 2 | 0 |
line-specific InDels | ||
cInDels without redundancy | 52 | 27 |
genes carrying cInDels | 50 | 27 |
genes showing increased CNVs | 122 | ND |
genes showing decreased CNVs | 133 | ND |
These data represent a summary of our analyses of SNPs, InDels and CNVs for the Dark-fly and Oregon-R-S genomes. ND means not determined.
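The table is internally consistent, which can be verified with a few lines of arithmetic. The sketch below assumes that the frequency rows are the reference genome length (168,736,537 bases, as given for the sequencing summary) divided by the number of fixed variants, and that variants shared between the two lines are the total minus the line-specific count. Both are assumptions about how the values were derived.

```python
# Consistency checks for the SNP/InDel summary table.
# Assumptions: bases/variant = reference length / fixed variant count,
# and shared variants = total fixed - line-specific.

REFERENCE_BASES = 168_736_537

table = {
    "Dark-fly": {
        "snps": 415_626, "line_specific_snps": 217_340,
        "snp_effects": 1_435_028,
        "snp_effect_parts": [826_111, 486_090, 96_674, 25_514, 639],
        "indels": 5_322, "line_specific_indels": 4_660,
        "indel_effects": 16_726,
        "indel_effect_parts": [8_790, 7_790, 144, 2],
    },
    "Oregon-R-S": {
        "snps": 415_668, "line_specific_snps": 217_382,
        "snp_effects": 1_424_012,
        "snp_effect_parts": [824_781, 499_604, 78_152, 20_840, 635],
        "indels": 5_461, "line_specific_indels": 4_799,
        "indel_effects": 17_507,
        "indel_effect_parts": [9_767, 7_674, 66, 0],
    },
}


def bases_per_variant(count, reference_bases=REFERENCE_BASES):
    """Average spacing between variants, rounded to whole bases."""
    return round(reference_bases / count)


def shared(total, line_specific):
    """Variants present in both lines."""
    return total - line_specific


for name, row in table.items():
    print(
        name,
        bases_per_variant(row["snps"]),
        bases_per_variant(row["indels"]),
        shared(row["snps"], row["line_specific_snps"]),
        shared(row["indels"], row["line_specific_indels"]),
        sum(row["snp_effect_parts"]) == row["snp_effects"],
        sum(row["indel_effect_parts"]) == row["indel_effects"],
    )
```

Under these assumptions both lines yield the same shared counts, and the effect categories sum to the reported totals.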
Since Dark-fly displays some traits advantageous for living in the dark, it should carry genomic alterations related to these traits. Even so, most of the SNPs we found are expected to be functionally neutral, and only a small fraction should contribute to the traits. To evaluate the Dark-fly SNPs, we categorized each SNP by its position relative to gene structures, such as intergenic regions and gene coding regions. Since one SNP often affects several isoforms of a gene or several overlapping genes simultaneously, the 415,626 SNPs of Dark-fly were classified into 1,435,028 SNP-effects (
An InDel is an insertion or deletion of a few nucleotides and can be detected by analyzing the NGS data. We identified 5,322 and 5,461 InDels for Dark-fly and Oregon-R-S, respectively, and 662 of these InDels (12.4% for Dark-fly) were shared between them (
We then asked whether the nsSNP or cInDel-carrying genes are concentrated in any gene families in the Dark-fly genome. Using the web-based tool DAVID
Among nsSNPs, a nonsense mutation produces a stop codon in the amino acid sequence of a gene product, and may severely affect the protein's function. We identified 28 nonsense mutations in the Dark-fly genome (
Runs of homozygosity (ROH) regions are homozygosity-extended genomic regions (more than a few hundred kb) containing consecutive homozygous SNPs and are thought to be regions currently selected in a population's genome
Mean homozygosity of SNPs in a sliding window (200-kb window at 100-kb steps) was plotted versus the location on 2L (A), 2R (B), 3L (C), 3R (D) and X (E) chromosomes. The Oregon-R-S genome (blue lines) displayed higher homozygosity than the Dark-fly genome (red lines) in most of the regions. Thick horizontal bars represent ROH regions identified by PLINK software for Oregon-R-S (blue bars) and Dark-fly (red bars) and are plotted above the graph without homozygosity values.
fly line | Dark-fly | Oregon-R-S |
homo and hetero SNPs (0.4 ≤ freq) | 477,816 | 486,013 |
homo SNPs (0.9 ≤ freq) | 449,684 | 453,646 |
hetero SNPs (0.4 ≤ freq < 0.9) | 28,132 | 32,367 |
homo SNP fraction in total (%) | 94.1 | 93.3 |
number of ROH regions | 24 | 128 |
total length of ROHs (kb) | 5,934 | 43,868 |
fraction of ROHs in genome (%) | 4.99 | 36.85 |
average length of ROH (mean ± SD, kb) | 230±70 | 342±155 |
average SNP number in ROH (mean ± SD) | 981±449 | 1621±989 |
average homo SNP fraction in ROH (mean ± SD, %) | 97.5±0.9 | 98.2±0.5 |
number of ROH regions with significantly high homozygosity | 21 | ND |
genes carrying nsSNPs and cInDels in ROH regions | 241 | ND |
These data represent a summary of our analyses of ROH regions. Homo and hetero SNPs were identified using Samtools and Vcftools functions. The number of homo SNPs was slightly different from that of the fixed SNPs identified using VarScan functions (
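The frequency bins in this table can be expressed as a small classifier. The sketch below is an illustration of the thresholds as tabulated (homo at frequency 0.9 or higher, hetero from 0.4 up to but not including 0.9); the function names are hypothetical and are not taken from the Samtools or Vcftools pipeline used in the study.

```python
# Frequency bins as tabulated: homo if freq >= 0.9, hetero if 0.4 <= freq < 0.9.
# Function names are illustrative only, not part of the original pipeline.

CALL_MIN = 0.4   # lower bound for a SNP to be counted at all
HOMO_MIN = 0.9   # lower bound for a SNP to be counted as homozygous


def classify_snp(freq):
    """Return 'homo', 'hetero', or None for a variant allele frequency."""
    if freq >= HOMO_MIN:
        return "homo"
    if freq >= CALL_MIN:
        return "hetero"
    return None  # below the calling threshold, not counted


def homo_fraction_percent(homo, total):
    """Homozygous SNPs as a percentage of all counted SNPs."""
    return round(100.0 * homo / total, 1)


# Counts from the table: hetero = total - homo, fraction = homo / total.
print(477_816 - 449_684, homo_fraction_percent(449_684, 477_816))  # Dark-fly
print(486_013 - 453_646, homo_fraction_percent(453_646, 486_013))  # Oregon-R-S
```

The two print statements recompute the hetero counts and homo fractions from the first two rows of the table.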
We also measured mean homozygosity (mean frequency of each SNP) in the Dark-fly and Oregon-R-S genomes (
ROH ID# | Chr | position start | position end | length (bases) | genes carrying nsSNPs or cInDels |
ROH1 | 2L | 3353705 | 3669168 | 315463 | CG8838, CG34394, Ptpa, CG34175, CG31952, CG3238, CG31776, Sr-CIV, Spindly |
ROH2 | 2L | 6535198 | 6782752 | 247554 | CG9596, CG11319, CG11320, CG34345, Oatp26F, Tango1, CG31633, CG11070, CG13771, Nhe3, CG11327, GRHR, CG11188, homer, TTLL3A, CG31910, CG11221, CG11322, CG11321, CG17378 |
ROH3 | 2L | 8847085 | 9109796 | 262711 | CG32986, CG34398, CG9510, CG31886, CG32985, CG32984, CG18088, CG9541, CG9555, CG17906, CG18661, CG9568, CG9582, Toll-4 |
ROH4 | 2L | 10278630 | 10524864 | 246234 | CG34043, CG5604, CG13138, CG5384, CG4972, GATAd, CG34367, CG5367, Cand1, pim, CG5056, rho-5, CG33303, gny, CG5168, CG5188, CG6232, CG5322, CG6206, RluA-1, RluA-2, CG7456, CG13144, Myo31DF, CG7384, Fatp |
ROH5 | 2L | 13521459 | 13806482 | 285023 | CG33641, CG33644, CG33645, CG16853, CG18507, CG7311, CG31814, CG9014, CR31845, CG31731, sec71 |
ROH6 | 2L | 13806743 | 14034237 | 227494 | CG16865, Sos, b, tam, Orc5, mRpS23, CG33307, CG33306, CG8997, cenG1A, Ance-2, CG16886, CG16884, nimB1, nimB3, nimB5, He, nimC1, rk, bgm, CG18095 |
ROH7 | 2L | 15628469 | 15854613 | 226144 | CG7631, CG18480, CG4587, CycE, Ku80, CG18109, CG18518 |
ROH9 | 2R | 2722221 | 2975600 | 253379 | CG15236, Spn42Db, Spn42De, CG3358, CheB42b, CheB42c, ppk25, mim, Cyp6u1, CG30157, vimar, Tsp42Ee, Tsp42Eh, Tsp42Ei, CG12831 |
ROH10 | 2R | 12738094 | 13006423 | 268329 | Fen1, CG8910, Pkc53E, CG15614, mute, CG6665, ste24b, CG6796, CG8963, Ark, RhoGEF2, CG9640, CG9642, CG9646, CG8950, CG6967, CG30460, CG30456, CG15611 |
ROH11 | 3L | 3118085 | 3327625 | 209540 | CG14963, CG32284, CG32277, CG12034, CG11505, CG12009 |
ROH13 | 3L | 14059399 | 14275678 | 216279 | pex1, CG8100, Fbp1, Sox21b, nuf, CG34244 |
ROH14 | 3L | 15737620 | 15945049 | 207429 | CG13445, CG12713, CG32150, CG12486, pHCl, sff, Pka-C3 |
ROH15 | 3L | 18793182 | 19024297 | 231115 | CG14073, CR32027, CG14074, dysb, CG11637, Ir75d, CG14077, CG3819, CG14075, CG11619, CG18135, CG3808, CG18136, nkd |
ROH16 | 3L | 20560665 | 20819130 | 258465 | CG13251, CG34260, CG13252, CG4074, Pitslre, Spc105R |
ROH17 | 3L | 22471441 | 22725139 | 253698 | CG14459, CG14453, CG11370, CG6838, CG32454 |
ROH19 | 3R | 2862778 | 3085343 | 222565 | CG1988, CG1105, CG1965, CG1943, CG1091, CG31248, MAGE, lap, CG14605, CG1227 |
ROH20 | 3R | 3257401 | 3475620 | 218219 | CG14598, alpha-Est10, alpha-Est9, alpha-Est8, alpha-Est7, alpha-Est6, alpha-Est5, alpha-Est3, alpha-Est2, CG34127 |
ROH21 | 3R | 8358059 | 8659641 | 301582 | Octbeta2R, CG11608, Cyp313a4, CG14391, mus308, Men, CG5724, CG5999 |
ROH22 | 3R | 9912039 | 10141059 | 229020 | CCHa1, Or88a, Kif19A, 140up, CG14356, CG42500, CG31533, CG31327, DopR, CG9649, CG9631, Aats-met, trx, CG3259, su(Hw), CG31321 |
ROH23 | 3R | 12540659 | 12771162 | 230503 | Ubx, Glut3, Abd-B |
ROH24 | 3R | 22056540 | 22307403 | 250863 | CG14239, Hex-t1, CG5455, CG6490 |
The chromosomal positions and lengths of the Dark-fly ROH regions showing significantly high homozygosity are listed. Genes carrying nsSNPs or cInDels in each ROH region are shown. Details regarding nsSNPs and cInDels are presented in
We further characterized the Dark-fly ROH regions and identified 241 genes containing nsSNPs and/or cInDels (
Structural variations are generated by recombination and transposition of genome fragments, and together with SNPs and InDels, are important types of genomic alterations. Since short-read sequencing by NGS technology is not suitable for analyzing large-scale structural variations, we instead performed microarray analysis of genomic DNA. We used a
An Integrative Genomics Viewer (IGV) view of the region around the CG4594 gene. The numerous small gray bars represent genome sequencing reads. A region of about 500 bases in the CG4594 gene (thick red bar) was not covered by any reads from the Dark-fly genome (upper), but was fully covered by reads from the Oregon-R-S genome (lower). Numbers on the horizontal line indicate nucleotide positions on chromosome 2L. Numbers on the vertical axis indicate read depth.
Reproductive success is one of the adaptive traits under natural and laboratory selection. Dark-fly produced more offspring in the dark than in the light for the first 3 days. This early reproduction of Dark-fly would be advantageous in the laboratory routine of fly maintenance. We observed that Dark-fly females do not show the gradual death that occurs in Oregon-R-S females in the dark, and as a result, Dark-fly females retain fecundity for a longer time in the dark. This trait would also contribute to reproductive success.
The early reproduction could be achieved via various traits of the fly, for example, egg-laying ability and mating behavior. Indeed, we observed abnormal mating behaviors of Dark-fly. Dark-fly males and females copulated more quickly than the Oregon-R-S pairs (K. Okamoto and N.F., unpublished data), suggesting that mating behaviors might be stimulated in the Dark-fly pairs: males might easily become active for courtship and females might easily accept males. Mating behavior is controlled by multiple sensory inputs, such as smell and taste
Oregon-R-S females gradually died in dark conditions, while Dark-fly females did not show such gradual death. This phenomenon is probably a complex consequence not easily explained, but it might be related to the fact that Dark-fly females retain longevity after mating. Reproduction is generally a cost for longevity
We determined the whole genome sequence for Dark-fly and identified approximately 220,000 SNPs and 4,700 InDels compared with the genome of Oregon-R-S strain. Although Dark-fly was derived from the Oregon-R-S strain 57 years ago, the genome sequences of the present Dark-fly and the present Oregon-R-S were somewhat divergent. Previous studies evaluated the spontaneous nucleotide mutation rate in
Analyses of ROH regions unexpectedly revealed that although the Dark-fly and Oregon-R-S genomes contain similar numbers of homozygous (fixed) and heterozygous (floating) SNPs, they contain different numbers of homozygosity-extended regions. That is, whereas fixed SNPs and floating SNPs each form clusters in the Oregon-R-S genome, they are distributed more evenly in the Dark-fly genome. These genome features might reflect differences in population history. For example, inbreeding (isogenization) might have occurred frequently for Oregon-R-S during its history, and consequently many SNPs might have become fixed as clusters in the population genome. In contrast, Dark-fly has been maintained at a roughly constant population size (about 100 flies), and many genomic regions might still be under genetic drift. If this is true, it would support the notion that the Dark-fly ROH regions are rare genome regions selected during the line's recent history (57 years).
Dark-fly possesses some traits advantageous in darkness and should carry some genomic alterations responsible for these traits. To search for such mutations, we characterized SNPs, InDels, and CNVs in the Dark-fly genome. We identified 21 ROH regions selected during the Dark-fly history. These regions contain 241 genes carrying nsSNPs and cInDels. These genes include 9 alpha-esterase genes, which are located as a cluster on chromosome 3R
The Dark-fly ROH regions also contain 5 guanyl-nucleotide exchange factor (GEF) genes carrying nsSNPs and cInDels. GEFs are regulators of small GTPase involved in various biological processes, such as neural development and activity. For example, Son of sevenless (Sos) is required for development of R7 photoreceptor neurons
We identified 28 nonsense mutations in the Dark-fly genome (
Rhodopsin is a light-sensing receptor that belongs to the G protein-coupled receptor family, and the
The independent lines of evidence of our CNV data and our NGS data strongly suggest that the coding region of CG4594 is deleted in the Dark-fly genome. The CG4594 gene encodes a putative dodecenoyl-CoA delta-isomerase. In
We identified ROH regions selected in the Dark-fly genome, and found that nsSNPs and cInDels were preferentially accumulated in some gene families in these regions. These are potential candidate genes related to Dark-fly's traits. Some of the genes might contribute to gain of useful traits or loss of useless traits in the dark environment. Alternatively, some genes might contribute to trade-off between useful traits and useless traits, as demonstrated in cavefish: the cavefish Shh gene has pleiotropic roles for gain of a wide jaw and loss of eyes
Dark-fly Oregon-R-S (referred to simply as “Dark-fly”) was kindly provided by Dr. Michio Imafuku (Dept. of Zoology, Kyoto University). Since 1954, Dark-fly has been maintained in a constant dark condition with a minimal nutrient medium, Pearl's medium (
We used several wild-type strains as controls. The Oregon-R-S strain provided by Dr. Michio Imafuku was derived from the Kyoto Stock Center and was used for analyses of the whole genome sequence. Another Oregon-R-S strain and the Oregon-R strain (the mother strain of Oregon-R-S) obtained from the Bloomington Stock Center (BL#4269 and 25211 stocks, respectively) were used for the fecundity and viability assays and for the CNV analysis, respectively.
Healthy virgin males and females were collected by brief ice-anesthesia 2 days before the experiment. Ten male and 10 female flies were mixed in a culture vial and were reared in constant light (LL), LD or DD conditions for 3 days (72 hours). Offspring were continuously reared in the indicated conditions and were counted after adult emergence.
To measure the lifetime fecundity, flies were reared in LD or DD conditions and were transferred to new vials every one or two days until all of the adults died. The offspring were reared in the LD condition, and the number of pupae was counted as offspring.
To measure the adult viability, 10 flies each in 10 vials were transferred to new vials every one or two days until all of the adults died. Dead adult flies were counted at the time of every transfer. When the total number of dead adults was smaller than the number of flies at the start, flies that had escaped during experiments (less than 8/100) were ignored for the calculation of viability.
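The viability calculation can be sketched as follows. The text says only that escaped flies were ignored, so the sketch assumes this means the denominator is the total number of dead flies recovered rather than the starting count. The function name and the example death counts are hypothetical.

```python
# Viability curve from per-transfer death counts.
# Assumption: flies that escaped (start count minus total dead recovered)
# are dropped from the denominator, so viability runs from 1.0 down to 0.0.

def viability_curve(start_count, deaths_per_transfer):
    """Return the surviving fraction after each transfer."""
    total_dead = sum(deaths_per_transfer)
    if total_dead > start_count:
        raise ValueError("more dead flies recorded than flies at the start")
    if total_dead == 0:
        return [1.0] * len(deaths_per_transfer)
    alive = total_dead  # escaped flies are ignored
    curve = []
    for dead in deaths_per_transfer:
        alive -= dead
        curve.append(alive / total_dead)
    return curve


# Hypothetical vial: 10 flies at the start, 8 recovered dead, 2 escaped.
print(viability_curve(10, [1, 2, 3, 2]))
```

With no escapes the denominator equals the starting count, so the same function covers both cases.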
Statistical analyses were performed using t.test (with var.equal = F option), pairwise.t.test (with p.adj = “fdr”, var.equal = F options) and boxplot functions of R software (ver. 2.12.1:
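For readers unfamiliar with these R options, var.equal = F selects the Welch (unequal variance) t-test and "fdr" in R's p-value adjustment is the Benjamini-Hochberg method. The sketch below is a plain Python illustration of both calculations, not the R code used in the study. It computes the Welch statistic and degrees of freedom but not the p-value, which would need a t-distribution function. The sample values are hypothetical.

```python
import math


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    # Unbiased sample variances.
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    se_a, se_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, index in enumerate(order):
        rank = n - steps_from_top
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical offspring counts per vial for two conditions.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

The running minimum in bh_adjust enforces that adjusted p-values never decrease as the raw p-values increase.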
Genomic DNA was extracted from 20 adult males by a standard method. Briefly, flies were homogenized in lysis buffer (50 mM Tris-HCl pH 7.5, 350 mM NaCl, 10 mM EDTA, 2% SDS, 7 M urea), the lysate was extracted with phenol and chloroform, and after RNase treatment, genomic DNA was precipitated with ethanol. Sequencing libraries (paired-end library for Dark-fly and single-end library for Oregon-R-S) were constructed according to the manufacturer's protocols. Sequencing was performed on an Illumina Genome Analyzer II, running 8 lanes for each library. Raw sequence data (36, 39 or 48 bases/read) were obtained as FASTQ files. The data were deposited in DDBJ under accession number DRA000451 (DRR001444–DRR001447).
Raw data of read sequences were aligned on the reference genome (Flybase FB2009_09 October, Dmel Release 5.22) using aln, sampe and samse functions (without any options) of BWA software (ver. 0.5.9:
The nucleotide sequences of DGRP lines were obtained from the Drosophila Genetic Reference Panel database (
We used snpEff software to classify SNPs and InDels by their locations relative to gene structures according to the gene annotation data (UCSC dmel 5.22). Our classified groups were intergenic (snpEff terms: intergenic, upstream and downstream), UTR and intron (intron, splice site, UTR 3′, UTR 5′ and start gain), synonymous in coding region (synonymous coding, synonymous start and synonymous end), non-synonymous in coding region (non-synonymous coding, start loss, stop gain and stop loss), InDels in coding region (codon insertion, codon deletion and frameshift) and others (noncoding and unknown). We focused on the non-synonymous SNPs (nsSNPs) and coding InDels (cInDels). Genes carrying nsSNPs and cInDels were classified into Gene Ontology (GO) families (MF4) using the DAVID web-based tool (ver. 6.7:
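The grouping of snpEff terms into the categories used in this study amounts to a lookup table. The sketch below writes the terms as they are worded in the text (with the prime in the UTR terms as an ASCII apostrophe); the exact tokens in snpEff output may be spelled differently, so the keys are an assumption, and the variant identifiers in the example are hypothetical.

```python
from collections import Counter

# Category -> snpEff terms, as listed in the text.
# Key spellings follow the paper's wording, not necessarily snpEff's output tokens.
CATEGORY_TERMS = {
    "intergenic": ["intergenic", "upstream", "downstream"],
    "UTR and intron": ["intron", "splice site", "UTR 3'", "UTR 5'", "start gain"],
    "synonymous coding": ["synonymous coding", "synonymous start", "synonymous end"],
    "non-synonymous coding": ["non-synonymous coding", "start loss", "stop gain", "stop loss"],
    "coding InDel": ["codon insertion", "codon deletion", "frameshift"],
    "others": ["noncoding", "unknown"],
}

TERM_TO_CATEGORY = {
    term.lower(): category
    for category, terms in CATEGORY_TERMS.items()
    for term in terms
}


def categorize(term):
    """Map one snpEff term to its category; raise KeyError if it is not listed."""
    return TERM_TO_CATEGORY[term.strip().lower()]


def count_effects(effects):
    """Count effects per category from (variant_id, term) pairs.

    One variant may appear several times, once per affected isoform or gene,
    which is why effect counts exceed variant counts.
    """
    return Counter(categorize(term) for _variant_id, term in effects)


# Hypothetical example: one SNP with two effects, one InDel with one effect.
effects = [
    ("snp_1", "non-synonymous coding"),
    ("snp_1", "intron"),
    ("indel_1", "Frameshift"),
]
print(count_effects(effects))
```

Counting per effect rather than per variant mirrors how a single SNP can contribute to more than one category.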
To obtain heterozygous (hetero) and homozygous (homo) SNP data, the BWA-aligned read data (bam files) were converted to variant call format files (vcf files) using the SAMtools mpileup function (with -B -g -f options) and the Bcftools view function (with -c -g -v -N -t 0.1 options). InDel data and low-coverage data (fewer than 5 reads) were removed using custom bash scripts. Vcf files were converted to ped files using Vcftools (ver. 0.1.7:
To evaluate ROH regions statistically, SNPs floating in the population genome (frequency ≥ 20%) were called from the BWA-aligned read data (bam files) using the SAMtools pileup function (without any options) and the VarScan pileup2snp function (with --min-coverage 5 --min-reads2 2 --min-var-freq 0.2 --p-value 0.05 options). SNP frequency (homozygosity) data in the ROH regions were collected using custom bash scripts and were tested against the average homozygosity of the whole genome using the t.test function (with var.equal = F, alternative = "greater" options) of R. The homo SNP fraction of each ROH was also tested against the average fraction of the whole genome using the fisher.test function (with alternative = "greater" option) of R. For graphical analysis, mean homozygosity in the sliding window was calculated from the SNP frequency data using the “sliding.window” function of R. Mean homozygosity of the sliding windows was plotted against chromosomal location using the R plot function.
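The sliding-window averaging used for the homozygosity plots (200-kb windows at 100-kb steps) can be sketched as below. This is a plain Python stand-in, not the R function used in the study. It assumes windows start at position 0, are half-open, and are skipped when they contain no SNP; how the original analysis aligned and handled windows is not stated. The example positions and frequencies are hypothetical.

```python
# Sliding-window mean of SNP frequency (homozygosity) along one chromosome.
# Assumptions: windows start at position 0, cover [start, start + window),
# and empty windows are omitted from the output.

def sliding_window_mean(positions, freqs, window=200_000, step=100_000):
    """Return a list of (window_start, mean_freq, n_snps) tuples."""
    pairs = sorted(zip(positions, freqs))
    if not pairs:
        return []
    last_position = pairs[-1][0]
    results = []
    start = 0
    while start <= last_position:
        end = start + window
        # Simple scan; fine for a sketch, slow for genome-scale data.
        values = [freq for pos, freq in pairs if start <= pos < end]
        if values:
            results.append((start, sum(values) / len(values), len(values)))
        start += step
    return results


# Hypothetical SNPs: (position, allele frequency).
positions = [50_000, 150_000, 250_000]
freqs = [1.0, 0.5, 1.0]
for row in sliding_window_mean(positions, freqs):
    print(row)
```

Because the step is half the window, each SNP contributes to two adjacent windows, which smooths the plotted curve.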
DNA isolation and purification were done as described in Zhou et al. (2011)
Microarrays were ∼18,000-feature cDNA arrays spotted with
We thank Dr. M. Imafuku for providing us a great resource, “Dark-fly”. We also thank Mr. K. Okamoto, Mr. R. Miyamoto, Ms. M. Maeda and Mr. K. Tsujimoto for daily discussions, Ms. M. Futamata (Dr. T. Uemura's laboratory) for routinely providing us fly food, Dr. E. Nakajima for critical reading of the manuscript, Dr. T. Takano and members of the Global COE Program for helpful comments and discussions, and T. Shin-i and Y. Minakuchi for support of computational data management. We also thank Kyoto Drosophila Genetics Resource Center and Bloomington Drosophila Stock Center for providing us fly strains, and many laboratories for allowing us to use valuable open-source programs.