Male recombination produced multiple geographically restricted neo-Y chromosome haplotypes of varying ages that correlate with onset of neo-Y decay in Drosophila albomicans

Male Drosophila typically have achiasmatic meiosis, and fusions between autosomes and the Y have repeatedly created non-recombining neo-Y chromosomes that degenerate. Intriguingly, Drosophila nasuta males recombine, but their close relative D. albomicans reverted back to achiasmy after evolving neo-sex chromosomes. Here we use genome-wide polymorphism data to reconstruct the complex evolutionary history of neo-sex chromosomes in D. albomicans and examine the effect of recombination and its cessation on the initiation of neo-Y decay. Population and phylogenomic analyses reveal three distinct neo-Y types that are geographically restricted. Due to meiotic exchange with the neo-X, overall nucleotide diversity on the neo-Y is similar to the neo-X but severely reduced within neo-Y types. Consistently, outside of the region proximal to the fusion, the neo-Ys fail to form a monophyletic clade in sliding window trees. Based on tree topology changes, we inferred the recombinant breakpoints that produced haplotypes specific to each neo-Y type and estimated their ages revealing that recombination became suppressed at different time points for the different neo-Y haplotypes. Although there are no evidence of chromosome-wide differentiation between the neo-sex chromosomes, haplotype age correlates with onset of neo-Y decay. Older neo-Y haplotypes show more fixed gene disruption via frameshift indels and down-regulation of neo-Y alleles. Genes are downregulated independently on the different neo-Ys, but are depleted of testes-biased genes across all haplotypes, indicating that genes important for male function are shielded from degeneration. Our results offer a time course of the early progression of Y chromosome evolution, showing how the suppression of recombination, through the reversal to achiasmy in D. albomicans males, initiates the process of degeneration.


INTRODUCTION
Sex chromosomes originate from a pair of homologous autosomes, yet they are often highly differentiated in morphology and function [1]. This transformation starts when a sex--determining factor arises on an autosome causing it to have a sex--specific transmission. Sexually antagonistic mutations, which are variants that are beneficial to one sex but detrimental to the other, are expected to accumulate in close proximity to the sex--determining factor [2,3]. To ensure linkage of the sex-determining factor with sexually antagonistic alleles, recombination between the nascent X and Y chromosomes will become suppressed first around the sex--determining locus. Subsequent gains of sex-beneficial alleles across the chromosome can lead to the expansion of the non--recombining region (for review see [4]).
The cessation of recombination between the sex chromosomes marks a pivotal step that leads the X and Y down a predictable trajectory of differentiation. The absence of recombination on the Y implies that mutations on the same chromosome cannot become unlinked by recombination [2,5,6].
This can lead to the irreversible accumulation of deleterious mutations on the Y, and in the long term, the loss of all of its ancestral genes [1,3,6,7]. In particular, positive selection for an advantageous variant on the Y will cause a sweep of the entire chromosome, dragging along all linked deleterious alleles to fixation [7,8]. Similarly, negative selection on the Y chromosome can greatly reduce its effective population size below its neutral expectation of 1/4 of the effective population size of autosomes, thereby increasing the rate of fixation of weakly deleterious alleles [6,9]. Finally, Y chromosomes may irreversibly accumulate deleterious mutations by a process known as Muller's ratchet, where Y's with the fewest number of deleterious mutations are lost by genetic drift [2,10]. All these processes lead to an accumulation of deleterious mutations on the Y, and greatly reduce levels of nucleotide diversity [6,9]. The end result is typically a degenerate chromosome replete with pseudogenes and repetitive elements, and this chromosome is often transcriptionally suppressed by heterochromatin formation [11][12][13][14][15][16]. During the gradual loss of expression on the Y, the X acquires mechanisms of dosage compensation to balance reduced expression [17]. In Drosophila, this is accomplished by chromosome-wide up--regulation of the X by epigenetic mechanisms [18]. However, it can also be achieved through gene--by--gene up--regulation as exemplified by genes on the Z chromosome of birds [19,20].
Unique to Drosophila and some other insects is the fact that males are achiasmatic, i.e. recombination does not occur during male meiosis. Therefore, autosomes that become fused to the ancestral Y chromosome (so--called neo--Y chromosomes) will be transmitted through males only and will immediately stop any genetic exchange with their former homologs (termed neo--X chromosomes), bypassing the stepwise formation of non--recombining regions. Neo--sex chromosomes have formed independently multiple times in the Drosophila genus by fusions of autosomes (referred to as Muller elements [21,22]) to the ancestral sex chromosomes (Muller A) [23]. Neo--Ys at various state of degeneration within the Drosophila genus reflect their different time of origination. Among the oldest is the neo--Y in D. pseudoobscura which is likely the remnant of Muller D after 15 million years of degeneration; the autosomal origin is nearly unrecognizable, with very few genes remaining on its neo--Y [24,25]. Intermediate levels of degeneration can be found on the neo--Y of D. miranda which arose 1.5 million years ago through the fusion of Muller C to the Y [26]. Over 90% of the ancestral protein coding genes can still be identified on the neo--Y, but 40% have become non--functional through frameshifts, nonsense mutations, deletions and transposable element insertions [27][28][29][30]. On the younger neo--Y of D.
busckii which formed through the fusion of Muller F with the sex chromosome around 0.8 million years ago, over 60% of the genes have become pseudogenized [31,32]. Despite the different ages, heavy enrichment of repressive chromatin marks can be found on all these neo--Ys.
One of the youngest known neo--sex chromosome pairs is found in D. albomicans, a species primarily distributed across the Asia Pacific island chain ( Figure 1) [33,34]. D. albomicans is a young species that recently diverged from the west african species D. nasuta between 150 and 500 thousand years ago [35,36]. These otherwise indistinguishable sister species produce fully fertile hybrids [37,38], but differ karyotypically by two Robertsonian fusions of the ancestral X and Y with chromosome 3 (henceforth Chr. 3), which is itself a fusion between Muller C and D [39,40]. These two fusions produced two massive neo--sex chromosomes in D. albomicans that are each over 55 Mb ( Figure 1B). Previous studies have found very few coding differences between the neo--sex chromosomes, but the neo--Y nonetheless shows signs of reduced expression suggesting that degeneration might have begun with disruption of regulatory elements instead of coding sequences [41].
Non--recombining neo--Y chromosomes are expected to harbor low levels of sequence polymorphism [26,42]. Intriguingly, a recent study investigating levels of sequence variability at 27 neo-sex linked loci found unexpectedly high levels of polymorphism on the neo--Y chromosome of D.
albomicans, that also has high levels of shared variation with the neo--X chromosome [43]. These findings are inconsistent with either a single origin of the neo--Y, or with recombination being absent between the neo--sex chromosomes. Surprisingly, Satomura et al showed that recombination occurs in D. nasuta males, yet is currently absent in D. albomicans males [43], suggesting that ancestral recombination between the neo--sex chromosomes in D. albomicans males has resulted in the pattern of shared polymorphisms between them.
These surprising observations imply that the young neo--sex chromosomes of D. albomicans not only present a window to examine the initiating stages of differentiation, but also provide a unique opportunity to understand how the presence and absence of recombination affects and interacts with the process of Y degeneration. Here, we investigate patterns of neo--Y variation from six population of the species range with whole genome sequencing ( Figure 1B). Using population and phylogenetic analyses, we confirm that the neo--Y fusion occurred once, and subsequent meiotic exchange with the neo--X generated multiple neo--Y haplotypes of different ages that are now geographically restricted.
Curiously, the Chr. 3 that fused with the ancestral Y carried a D. nasuta introgression, which was subsequently broken down by male recombination. We show that different Y--haplotypes have lost recombination at different time points in the past, and their age is associated with different extents of degeneration. These results untangle the complex series of events during the formation and subsequent evolution of the neo--Y in D. albomicans and provide timed snapshots of the effect cessation of recombination has on the beginning of sex chromosome differentiation.

Complex patterns of evolutionary history on the neo--Y is inconsistent with single origin
In order to characterize the geographic distribution of neo--Y chromosomes, we selected eleven inbred lab strains of D. albomicans that were collected from Shilong India, Kunming China, Nankung Taiwan, Okinawa, Kyoto, and Fukui Japan (Supplementary Table 1). The males and females of each strain were genotyped using Illumina high throughput sequencing. Since the current reference is a female assembly lacking the neo--Y chromosome, neo--Y genotypes were determined from neo--X sites that are heterozygous in males but homozygous in females; the neo--Y alleles are then the alleles absent in females. This generated 255,010 variant neo--Y sites across the eleven neo--Ys and 584,333 across the neo--sex chromosomes after filtering. We validated 14 out of 14 neo--Y alleles that coincided with RFLP sites, confirming the efficacy of our approach (Supplementary Table 2). Application of the same bioinformatics pipeline on an autosome identified very few male--specific SNPs confirming that our approach is highly sensitive to detect neo--Y specific SNPs (Supplementary Table 3). Using all sites across the neo--Y chromosome, we constructed a maximum likelihood tree that revealed three distinct groups of neo--Ys ( Figure 2A). Group 1 and group 2, henceforth Y1 and Y2, include flies from Taiwan and Japan, with Y2 composed exclusively of flies from the Okinawa Island. The Shilong and Kunming lines form a distinct and distant group, henceforth Y3. The three groups are separated by long branches, but individuals are highly similar within groups. This starkly contrasts to the phylogeny of the neo--X chromosome, where the branches are substantially longer between individuals which lack strong clustering, indicative of higher rates of polymorphisms and less differentiation between populations ( Figure 2B).
The formation of a Drosophila neo--Y chromosome is typically accompanied with a reduction of effective population size and nucleotide diversity (π). Furthermore, because the fusion that formed the neo--Y likely happened only once and recently [40], π is expected to be minimal due to severe bottlenecking associated with the fusion. While π in the region proximal to the centromere is low, π across the rest of the neo--Y is only marginally lower than π of the neo--X ( Figure 2C). Interestingly, within each Y group, we find large swaths of the chromosomes with minimal π, indicating that the Ys within a group are nearly identical ( Figure 2C). Therefore, the elevated π is the result of highly differentiated neo--Y chromosomes groups, as further evidenced by high levels of F ST between the groups (Supplementary Figure 1). For all the Y groups, differentiation from the neo--X is extremely high with F ST > 0.6 in regions proximal to the centromere, followed by precipitous drops to 0.4 ( Figure 2C, . For Y1 and Y2, the drops occur more distally from the centromere at 50.5 Mb than for Y3 at 53Mb. Y1 and Y2 both see additional valleys of F ST spanning large parts of the chromosome arm. The drops in F ST are primarily due to increases in shared polymorphism between the neo--X and neo--Y along the chromosome ( Figure 2D). The centromere proximal region and the telomeric end are the only region with a high proportion of fixed differences ( Figure 2D). To better understand the causes of these variable patterns of differentiation, we generated maximum likelihood trees for the neo--Xs and neo--Ys in 200kb sliding windows across the chromosome.
We find that the centromere proximal region shows the expected topology, where all the neo--Ys form a monophyletic clade with a deep split from all the neo--Xs, further affirming that the neo--Y fusion occurred once shortly after the species split from D. nasuta ( Figures 2E,H). However, monophyly is interrupted at 53.2Mb which coincides with the decline in differentiation between Y3 and the neo--X

Male recombination produced large neo--Y haplotype blocks of different ages
The valleys of Fst and plateaus of shared polymorphism between the neo--X and neoY suggest that the different neo--Ys have megabase--sized haplotype blocks. To determine the recombination breakpoints that generated the different haplotypes, we reconstructed the individuals' haplotypes based on the tree topologies across the chromosome, reasoning that disruption of monophyly of the neo--Ys are due to recombination events with neo--Xs ( Figure 3A). For example, monophyly of the neo--Y at the centromere proximal region indicates that there is only one haplotype; this haplotype is disrupted when Y3 is no longer monophyletic with the rest of the neo--Ys, indicating a second haplotype. We find that each Y group has an unique haplotype resulting from recombination breakpoints distal to the centromere ( Figure 3B).  Given that D. albomicans males no longer recombine [43], we reasoned that since the neo--Y haplotypes would be locked after the cessation of recombination, the number of nucleotide changes before and after can be used to roughly date the age of the haplotypes. We first took the branch length from the node of the species split (s) to the nodes that separate the neo--Y groups from the neo--X sisters (h) ( Table 1, Figure 3C left). Shorter branch length indicates that the neo--Y haplotype ceased to recombine with the neo--X sooner after the formation of the species, and, extensively, an older neo--Y haplotype. The Y3 and Y2 haplotypes have the shortest and longest branch lengths, respectively.
Second, we calculated the branch lengths from the nodes separating the neo--Y and neo--X to the tip of the trees, which indicate the amount of changes after the cessation of recombination. However, since the neo--Y haplotypes likely began to experience very different rates of change after recombination stopped, we took the branch lengths of the neo--X sisters instead (Table 1, Figure 3C right). Consistent with the first method, Y3 has the longest branch length and Y2 the shortest. These results indicate that Y3 and Y2 are the oldest and youngest haplotypes, respectively, with Y1 and the taiwanese haplotypes being the intermediates. Based on the the ratio of the two branch length measures and the estimated species age of 250 Kya, we estimate that the Y1 haplotype stopped recombining roughly 110 Kya, and Y3 and Y2 are roughly 20 Kya older and younger respectively ( Table 1). The different ages of the haplotypes and time of cessation of male recombination may indicate the sequential introduction of the causal allele to suppress male recombination in the different populations.

Fixed D. nasuta introgression on the neo--Y is broken down by male recombination
To better understand the evolutionary history of the neo--sex chromosomes after their formation from an autosome, we compared them to D. nasuta's Chr. 3. With the addition of nine D.
nasuta strains, we inferred 742,084 variant sites between the neo--sex chromosomes and Chr.3. Based on a model whereby the neo--X and neo--Y fusions occurred after the species split, the two chromosomes are expected to be more similar to each other than to Chr. 3. However, lower levels of differentiation between the neo--X and Chr. 3 can result from gene flow between the two species since they recombine in female hybrids. Due to this and slightly lowered π, the neo--Y shows higher levels of F ST from Chr. 3 than the neo--X does for nearly the entire length of the chromosome. However, at the centromere proximal region, we see a surprising reversal where the neo--X is more differentiated from Chr. 3 ( Figure   4A); this region, as mentioned earlier, also shows sharply elevated Fst between the neo--X and all neo--Y groups ( Figure 2C and 4A). The absolute divergence (D XY ) between neo--X and Chr. 3 is also significantly higher at this region than the D XY between neo--Y and Chr.

No evidence of chromosome--wide down--regulation
In the absence of male recombination, the neo--Y is expected to start accumulating deleterious mutations immediately after its formation. Indeed, genes on older neo--Y chromosomes have been found to be down--regulated due to pseudogenization and spreading of heterochromatin from adjacent repeats [6,30,44]. The neo--X, in turn, will become dosage compensated through transcriptional up-regulation, in order to balance the down--regulation of the neo--Y homologs [45,46]. Previously, Zhou and Bachtrog reported that the neo--sex chromosomes in D. albomicans show widespread down--regulation of neo--Y linked genes, despite few pseudogenes; 30.1% of the genes were found to show significant neo--X bias and the neo--X alleles are on average 1.3--fold more highly expressed than the neo--Y counterparts [41]. However, our reanalysis of the RNA--seq data of the KM55 strain, belonging to Y3, reveals little difference between the expression of the neo--X and neo--Y alleles with a median fold-difference of 1.04, a negligible bias that is also observed in the DNA sequence ( Figure 5A, Table 2). Out of 3181 genes with distinguishable neo--sex chromosome alleles, only 115 (3.6%) show significant neo--X bias. Similarly we see no evidence for chromosome wide reduction in expression for Y1 (d.alb.03) and Y2 (d.alb.OKNY--2) ( Figure 5A, Table 2), with few significantly neo--X biased genes and negligible median expression difference between the alleles. The reason for this difference are not entirely clear, and may reflect differences in mapping strategies, reference allele mapping bias, and assembly quality.
In addition to gene expression differences, we compared heterochromatin and dosage compensation profiles of the sex chromosomes. We assayed two histone marks in male D. albomicans larvae using allele--specific ChIP--seq: H3K9me3, which is associated with silencing heterochromatin [44,47,48], and H4K16ac, which is found at the dosage--compensated, hypertranscribed X of male Drosophila [18]. As expected, the repressive histone modification H3K9me3 is enriched at the dot chromosome and the ends of chromosomes corresponding to pericentromeric regions ( Figure 5B). The

Beginning neo--Y degeneration correlates with neo--Y haplotype age
Theory predicts that non--recombining neo--Y chromosomes should accumulate deleterious mutations, and the amount of degeneration should correspond to its age [29]. As mentioned above, we find that the different neo--Y haplotypes were formed at different time points, and stopped recombining between roughly 90,000 to 130,000 years ago. Despite the absence of chromosome--wide epigenetic differentiation between the neo--sex chromosomes, the number of genes with significant neo--X biased expression is more numerous than neo--Y biased genes for all three neo--Y types (Table 2, Figure 6A; note that this difference is only significant for Y3 and Y1 as they have more genes, p < 0.01, Binomial Exact Test). Most interestingly, the number of neo--X biased genes is correlated with the age of the neo--Y (Table2, Figure 5A, Figure 6A), suggesting that older neo--Y chromosomes have more genes with reduced expression relative to the neo--X. Thus, neo--X biased gene expression may reflect the beginning stages of differentiation when degeneration of the neo--Y happens on a gene--by--gene basis.
To determine whether older neo--Y haplotypes are associated with more deleterious DNA mutations, we focused on the region (7.6--48.6Mb) where all haplotypes differ, excluding the small haplotype found in the taiwanese line d.alb.03 ( Figure 3A). We find that the number of fixed SNPs increases with the age of the haplotypes (Table3, Figure 6B). While there are few nonsense fixations which are diagnostic of functional degeneration, the number of non--synonymous fixations correlates with age, with Y3 having the most (n = 969) and Y2 having the least (n = 478) amino--acid changes. Given few nonsense mutations, we additionally used the number of neo--Y specific indels as a proxy for functional decay of the neo--Y. Indeed, the oldest Y3 haplotype has the most fixed indels (n = 11,108), the intermediate Y1 has 5,615, and the youngest Y2 has the fewest indels (n = 3,385), but only a small fraction of indels are found within coding sequences (Table 3). Similarly, the oldest haplotype Y3 has the most number of indels that are within coding sequences (n = 163) and creates frameshift (n = 49) while the youngest haplotype Y2 has the fewest (Table 3, Figure 6C). We see the same trend with frameshift within 500bp of genes, some of which are likely to cause regulatory changes. Despite the small number of indels that are likely to be disruptive, these results demonstrate that the extent of degeneration in these young neo--Ys is correlated with their age. Curiously, haplotype--specific indels and SNPs appear to be unevenly accumulated across the chromosome (Supplementary Figure 7). Despite having fewer indels for most of the chromosome, Y1 appears to have similar numbers to that of Y3 up to 15Mb.
Additionally, Y2 has a spike of indels between 27--29 Mb which may represent a larger structural variant.

Independent and gene--by--gene downregulation of neo--Y alleles
Interestingly, zero neo--X biased gene is shared across all three neo--Ys and no more than four are shared between any two Y--types ( Figure 7A), arguing that they are independently degenerating. In addition, the lack of shared loci of neo--sex chromosome differentiation suggests that degeneration began only after the establishment of the locked neo--Y haplotypes when male recombination ceased.
Furthermore, we find that across all neo--Ys, the neo--X biased genes are significantly depleted of testes-biased genes and enriched for somatic genes ( Figure 7B). To ensure that the paucity of testes--biased genes is not a technical artifact of using RNA--seq from head samples, we identified neo--X--biased genes in the testes of KM55 and similarly found significant depletion. Despite independent degeneration of the neo--Y types, genes with male--specific function are thus likely to be shielded from the initial accumulation of deleterious alleles.
To determine whether the neo--X bias is due to up--regulation of the neo--X alleles or down-regulation of the neo--Y alleles, we conducted reciprocal crosses producing F1 hybrids between D.
albomicans and D. nasuta; sons sired by D. albomicans have neo--Y and Chr. 3 while sons sired by D.
nasuta have neo--X and Chr. 3, thus allowing us to determine expression differences that are due to cis-regulatory divergence between the neo--X, neo--Y, and the ancestral Chr. 3 [49]. Down--regulated neo--Y alleles will have lower expression than the neo--X and, importantly, Chr. 3 alleles. For the genes with neo--X bias, the neo--X and Chr. 3 alleles have similar expression levels, but the expression of neo--Y alleles are on average half (mean = 0.48) of their Chr.3 counterparts ( Figure 7C). These results reveal that neo--Y down--regulation, not neo--X up--regulation, primarily accounts for the neo--X bias, consistent with gene-by--gene degeneration of the neo--Y alleles.

Multi--step model for generating multiple neo--Y haplotypes with single fusion
Neo--Y chromosomes of Drosophila have served as prominent models to study the evolutionary and molecular processes resulting in the degeneration of a non--recombining chromosome [4]. Unique to D. nasuta have suggested that the fusion between Chr. 3 and the X is more likely to be established first, as opposed to a Chr. 3 -Y fusion [40]. After the first Robertsonian fusion creating the neo--X [40], the neo--X is free to recombine with Chr. 3 in a meiotic trivalent both in males and females ( Figure 8A). The second fusion, forming the neo--Y, occurred between the Y and a Chr. 3 that harbored a large D. nasuta haplotype near the centromere. This fusion is likely under positive selection to resolve mis--segregation due to the trivalent pairing during meiosis [40]. While this produced a severe bottleneck, the presence of male recombination allowed the neo--Y to regain diversity through exchange with the neo--X, which also broke down the D. nasuta introgression. However, once male recombination stopped in a particular population, the diversity on the neo--Y rapidly decreases, eventually leading to the fixation of one Y haplotype in the population, due to drift and possibly positive selection for male beneficial alleles ( Figure 8B). Unlike on recombining chromosomes, deleterious mutations cannot be unlinked on the neo--Y and will increase in number and frequency in the population (see introduction). Because different isolated populations would likely fix different neo--Y haplotypes, overall diversity can be maintained on the neo--Y, despite high differentiation among haplotypes ( Figure 2C). Our dating of neo--Y haplotypes suggests that male recombination stopped at different times for the different Y haplotypes. This raises the possibility that the mutation causing the cessation of male recombination in D. albomicans spread sequentially from mainland Asia through the islands.
All sampled neo--Ys of D. albomicans show the remnant of a D. nasuta introgression near the centromere, raising the question of why the D. nasuta segment was not completely lost while the neo--Y could still recombine. The introgressed segment could have been maintained randomly, or it could contain male--beneficial alleles and represent an adaptive introgression [50,51]. Another possibility is that the introgressed region resides close to the pericentromeric heterochromatin where recombination is suppressed; however, the heterochromatic mark H3K9me3 does not appear to be enriched in the segment. Preservation of the introgressed region could also be due to structural differences between the neo--sex chromosomes which prevented ancestral recombination between the neo--X and neo--Y.
Indeed, in an experimental cross between D. nasuta and D. albomicans, no recombinants were recovered within this region [52]. Thus, we speculate that the neo--X harbors an inversion around this region that is absent on the neo--Y and Chr. 3, thereby preventing ancestral recombination between the neo--X and neo--Y and preserving the D. nasuta introgression.

Cessation of male recombination initiates degeneration
The neo--X and neo--Y of D. albomicans are among the youngest sex chromosomes investigated to date.
Overall, we find no global differences in levels of expression between neo--sex linked genes, and nearly indistinguishable heterochromatin and dosage compensation profiles between the neo--X and neo--Y.
However, expression differences at individual genes and biased accumulation of indels suggests that the neo--Y chromosomes are showing early signs of degeneration, and the amount of differentiation is correlated with the Y ages. In particular, we identified four neo--Y haplotypes of different age that are geographically distributed. The oldest Y3 haplotype has the highest number of fixed nonsynonymous changes, indels, neo--X biased genes, indicating the most extensive degeneration; the younger Y1 and Y2 groups show less extensive degeneration. The paucity of overlapping genes that are neo--X biased between the haplotypes suggest that the degeneration began only after the haplotype became locked when males stopped recombining. We show that neo--X biased expression is primarily due to down-regulation of the neo--Y alleles, as revealed by comparison with D. nasuta's Chr. 3 in hybrids. At this early stage of sex chromosome differentiation, down regulation of neo--Y genes is likely to be on a gene--by-gene basis, and we indeed find several genes with significantly higher levels of neo--X expression in all three Y groups (Table 2), which have also fixed indels that cause frameshifts. Unlike the typical trajectory of neo--Y in Drosophila where the male--exclusive transmission results in the immediate cessation of recombination, meiotic exchange prior to the halt of male recombination maintained the diversity of the chromosome and likely prevented the accumulation of deleterious alleles. While the neo--X and neo--Y formed shortly after the species split from D. nasuta [35] (Figure 4G), depending on when male recombination was abolished the age of a neo--Y haplotype will appear younger and less degenerate than expected based on when the neo--Y fusion occurred.
While the differentiation of the neo--Y is atypical compared to other Drosophila, it is similar to Y formation in organisms with male recombination. The neo--Y chromosome of the Japan Sea Stickleback (Gasterosteus nipponicus) which formed via a Y--autosome fusion shows little signs of degeneration outside of the non--recombining region [53]. Similarly, the pseudoautosomal regions on the mammalian Y (PAR--Y) chromosome retains the ability to pair and recombine with the X counterpart, despite having diverged over 180 million years ago [54][55][56]. Although the bulk of the chromosome is highly degenerate, recombination allows the PAR--Y to maintain genes important for development and cellular function [57]. Outside of the PAR--Y recombination with the X became suppressed due to multiple large scale inversions [58,59]. Our study of the neo--Y formation in D. albomicans offers a unique look at the transition of recombination suppression and provides insight on the rate of degeneration immediately after this transition.

Geographically restricted distribution of neo--Y haplotypes that are independently degenerating
After the cessation of recombination, diversity is expected to rapidly diminish due to a combination of mainland Japan population appears to be a recent invasion from the Taiwanese population [34], supporting the similarity between the Y1 and Taiwanese haplotypes. Under this model, the sequential spread of the allele(s) to suppress male recombination is likely aided by positive selection, as the fixation has to take place in each population to result in the species--wide fixation as currently observed [43].
Our results also show that the neo--Y groups share very few indels and down--regulated neo--Y alleles, indicative of independent degeneration. However, the small sets of genes undergoing degeneration, remarkably, show a similar attribute -they are all depleted for genes with testes--biased expression. This is consistent with findings that male--biased genes are maintained for longer periods of time on the degenerating neo--Y--chromosome of D. miranda, likely due to selection to preserve those important for male--exclusive function [60]. Thus, while the independent degeneration of multiple neo--Y haplotypes may be unique given the unusual evolutionary history of D. albomicans, the pattern of degeneration follows a typical and generalizable trajectory.  Depending on the samples required for the analyses, we first filtered out the samples that are unnecessarily, followed by removal of the indel sites and sites with missing genotypes. We then filtered for a minimum coverage of 5x and genotype quality of 20. For example, for the phylogenies with only the neo--X and neo--Y, we kept only D. albomicans samples and the one outgroup before subsequent filtering. This is to ensure that we are maximizing the number of sites we keep for each analysis, since the probability of one sample failing to pass the filters, which will cause the site to be removed, increases with more samples. All filtering was accomplished using bcftools (v.2.26.0) [64].

Inferring and validating the neo--Y genotypes
For each strain, we generated a vcf file containing both the female and male genotypes from which we identified sites that are homozygous in the female but heterozygous in the male. If one of the heterozygous alleles in the male is the same as the female allele, the neo--Y genotype is then the other allele. Sites that are heterozygous in the females are left as missing in the neo--Y.
To determine the efficacy of this computational pipeline, we identified HindIII cut sites on the neo--X of the reference strain by grepping for the restriction sequence AAGCTT then selecting those that are disrupted by neo--Y SNPs. Primers flanking the cut sites were designed using NCBI's Primer--BLAST (Supplementary table 2) [65] and ordered from IDT. PCR amplicons from female and male gDNA were digested with HindIII overnight at 37 ℃. Females, homozygous for the cut sites, are expected to have fewer bands than the males which are heterozygous for the cut and uncut alleles (Supplementary table   2).
For Y--linked indels, we used the same strategy, but also included male--specific homozygous indels as Y-linked. When multiple indels are within 50bp of each other, only the first one is kept to avoid mapping errors near indels. Frameshifts are categorized as indels within CDSs that are not multiples of 3.
Number and proportion of shared polymorphisms between the neo--X and neo--Y were calculated using a custom perl script.

Generating strain--specific references
We isolated each sample (or neo--X/neo--Y) into its own vcf and kept only non--reference sites. Using GATK's FastaAlternateReferenceMaker we generated a "pseudo--reference" fasta allowing for IUPAC ambiguities. The names of the chromosomes are then modified to include the strain information.

Phylogenetic Analyses
The individual pseudo--references are divided into 200kb windows using samtools faidx and concatenated into one fasta for each window. Maximum likelihood trees were generated using RAxML The resulting trees were visualized using the program Densitree [67] or plotted in R with the package Phytools [68]. To evaluate tree topologies, we used the R packages ape and phytools [68,69], in which the is.monophyletic function was used to determine whether the neo--Ys are monophyletic. For ChIP--seq, single male larvae were selected and processed following the low--input native chromatin immunoprecipitation protocol [72] with H3K9me3 and H4K16ac antibodies from Diagenode. The libraries were prepared using the SMARTer Universal Low Input DNA--seq Kit (Takara, formerly Rubicon).

Assigning haplotypes to individual neo--Ys
We mapped the reads similarly to the allele--specific RNA--seq, but used the genomeCoverageBed from bedtools (v.2.26.0) [73] to convert the uniquely mapped neo--X and neo--Y reads into read counts across the chromosome. We then extracted the sites that differ between the neo--X and neo--Ys. For both the input and the immunoprecipitation samples, the read counts at each site were then normalized by the median autosomal coverage. The enrichment at each window is the average across the sites.

Tissue enrichment and depletion
Available tissue specific RNA--seq [74] were mapped to the reference genome and the read counts at genes are converted into RPKM. The tissue--specificity, tau, is then determined [75], and genes with tau >