Fig 1.

Summary statistics of hybrid assemblies and dot plot of one individual to the reference genome.

a) A dot plot of the reference (GRCh38) vs. one individual (844–01) covering the X-degenerate and ampliconic regions of the Y chromosome. The protein-coding genes and azoospermia factor (AZF) regions are shown on the left, and the different region types and the positions of palindromes are shown on the right. Note that the Y axis has been truncated in the Yq heterochromatic region to show the entire Y chromosome. At the bottom, we show the positions of mapping scaffolds, which are colored according to their length. To the right of the dot plot is a table that shows the average father-son concordance for all sites, the number of sites covered in the region for the 844–01 individual and the total length of the region. b) Panels with summary statistics for all 62 individuals with good assemblies. From the top, there is a histogram of the N50 value, a histogram of the total amount of sequence obtained and the gap content of all scaffolds for a given individual.
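The N50 value shown in the histogram of panel b can be illustrated with a short sketch. It uses the common definition (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembled length); the function name and the example lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Assumes the usual definition: sort scaffolds from longest to shortest
    and report the length at which the cumulative sum first reaches half
    of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (bp), for illustration only.
scaffold_lengths = [100, 80, 50, 40, 30]
print(n50(scaffold_lengths))
```

With these made-up lengths the total is 300 bp, and the two longest scaffolds together reach 180 bp, so the second one (80 bp) is the N50.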

Table 1.

Summary of the variants found in this study and their validation rate.

We divided the variants by region (the heterochromatic region versus the combined ampliconic and X-degenerate regions) and by whether they occur in more than one haplogroup. The table gives the number of variants, with the percentage of these that are already known in parentheses. The in silico concordance rate is how often an alternative variant found in one member of a family is also found in the other member.
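One possible reading of the in silico concordance rate can be sketched as follows. The exact formula used in the study is not given in this caption, so the symmetric definition below (each alternative call in either family member counts as concordant if the other member carries it too) is an assumption, as are the function name and the example variants.

```python
def concordance_rate(member_a, member_b):
    """Fraction of alternative calls that are also seen in the other member.

    Assumption: every alternative variant called in either family member is
    one observation, and it is concordant if the other member has the same
    call. This is one plausible reading, not the study's stated formula.
    """
    a, b = set(member_a), set(member_b)
    total_calls = len(a) + len(b)
    if total_calls == 0:
        raise ValueError("no alternative variants in either member")
    shared = len(a & b)
    # Each shared variant is confirmed once from each member's side.
    return 2 * shared / total_calls


# Hypothetical (position, alternative allele) calls for a father and a son.
father = {(100, "A"), (200, "T"), (300, "G")}
son = {(100, "A"), (200, "T"), (400, "C")}
print(concordance_rate(father, son))
```

In this made-up example two of the three calls in each member are shared, so four of the six calls are concordant.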

Fig 2.

Size distribution, method of discovery and location of large indels.

a) The size distributions for non-recurrent complex variants, deletions, STRs and insertions. Known indels are colored gray and novel variants are colored orange. b) The four bar plots show the known and novel variants, each split into variants larger than and smaller than 15 bp. Bars are colored according to the method that identified the variant: GATK-HC (HC), AsmVar (AV) or both (AV:HC). c) The average heterozygosity in windows of 10 kb across the Y chromosome for complex variants, deletions, insertions, SNPs and STRs. We only report regions where more than half the individuals aligned with more than 5000 bp on average; the genes of the Y chromosome are shown below.

Table 2.

The rates of SNPs, deletions, insertions, complex variants and STRs are shown for three classes of region.

Note that the rate for the X-degenerate region is the same as in [26], because we used that rate to calibrate the number of generations spanned by the tree.

Fig 3.

Analysing the dynamics of palindrome arms.

a) An ideogram of the Y chromosome with the different regions colored according to their type. The positions of the palindromes are marked and the protein-coding genes are shown below. b) Because the palindrome arms are highly similar, many reads map to both arms; reads map uniquely only where the arms differ. However, if only one arm of the palindrome is used, the palindrome can be treated as a pseudo-diploid chromosome, with differences between the arms corresponding to SNVs. Comparing individuals that are related by a single phylogenetic tree makes it possible to identify mutations (position 2 in individual 1) and gene conversions (positions 2 and 3 in individual 2 and position 1 in individuals 1, 2 and 3). c) The gene conversions are grouped by whether the ancestral or the derived base is used as the donor sequence. Ancestral->derived means that an ancestral base is converted to the derived base, and Derived->ancestral means that the derived base is converted to the ancestral base. The number of gene conversion events for each base change is shown, and the bars are colored according to whether the change is a transition or a transversion. d) Number of base changes grouped by event type (gene conversion or mutation). The bars are colored by whether the changes are transitions or transversions and by whether they are found in one individual or father/son pair (private) or in more individuals (common). e) Part of the phylogeny and the genotypes for palindrome 5 and for the y1/y2 segments of palindrome 1 (see part a). Each individual's ID consists of the family ID followed by 01 for fathers or by 03 and above for sons. In palindrome 5 the coverage of the segment has increased from around 60X to 90X, suggesting that three copies of this segment now exist. In palindrome 1 the coverage has decreased from 60X to around 30X, suggesting that only one copy of the segment exists.
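The copy number reasoning in panel e can be written out as simple arithmetic. The sketch below assumes that the usual coverage of about 60X reflects reads from both palindrome arms piling up on one arm (two copies, so roughly 30X per copy); the function name and the default values are illustrative and are not the study's method.

```python
def estimate_copies(observed_coverage, baseline_coverage=60.0, baseline_copies=2):
    """Estimate segment copy number from read depth.

    Assumption: baseline_coverage is the depth seen when baseline_copies
    copies of the segment are present (here 60X for the two palindrome
    arms), so one copy contributes baseline_coverage / baseline_copies.
    """
    per_copy = baseline_coverage / baseline_copies
    return round(observed_coverage / per_copy)


# Approximate coverage values quoted in the caption.
print(estimate_copies(90))  # palindrome 5 segment
print(estimate_copies(30))  # y1/y2 segment of palindrome 1
```

Under this assumption 90X corresponds to three copies and 30X to one copy, matching the interpretation given in the caption.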

Fig 4.

Gene duplications and deletions on the Y chromosome.

a) A schematic of the Y chromosome is shown at the top, with the palindromes marked and labeled P1-P8. The positions of the TSPY array and inverted repeat 2 (IR2) are also shown. Note that PRY is present in palindrome 3 with all exons and in palindrome 1 with exons 3, 4 and 5. b) On the left is a neighbor-joining (NJ) phylogenetic tree showing the relationships between the different males; the numbers are bootstrap values. Bootstrap values above 90 are not shown. On the right is a table showing the haplogroup, name and copy number estimate for each individual. Genes whose copy number estimate differs from that of the reference sequence are colored green. For RBMY1A1 the reference contains an unknown number of pseudogenes, but most individuals had 9 copies, so this was chosen as the baseline. For clarity, only genes with copy number variants are shown. c) Coverage profiles for two gene duplications: one for individual 1113–05 in XKRY and one for individual 995–01 in BPY2.
