New Tools for Hop Cytogenomics: Identification of Tandem Repeat Families from Long-Read Sequences of Humulus lupulus

ABSTRACT
Hop (Humulus lupulus L.) is known for its use as a bittering agent in beer and has a rich history of cultivation, beginning in Europe and now spanning the globe. There are five wild varieties worldwide, which may have been introgressed with cultivated varieties. As a dioecious species, its obligate outcrossing, non-Mendelian inheritance, and genomic structural variability have confounded directed breeding efforts. Consequently, understanding genome evolution in Humulus represents a considerable challenge, requiring additional resources, including integrated genome maps. In order to facilitate cytogenetic investigations into the transmission genetics of hop, we report here the identification and characterization of 17 new and distinct tandem repeat sequence families. A tandem repeat discovery pipeline was developed using k-mer filtering and dot plot analysis of PacBio long-read sequences from the hop cultivar Apollo. We produced oligonucleotide FISH probes from conserved regions of HuluTR120 and HuluTR225 and demonstrated their utility to stain meiotic chromosomes from wild hop, var. neomexicanus. The HuluTR225 FISH probe hybridized to several loci per nucleus and exhibited irregular, non-Mendelian transmission in male meiocytes of wild hop. Collectively, these tandem repeat sequence families not only represent unique and valuable new cytogenetic reagents but also have the capacity to inform genome assembly efforts and support comparative genomic analyses.

INTRODUCTION
Humulus lupulus (hop) is a dioecious twining bine in the Cannabaceae family of flowering plants with a long history of cultivation (Neve, 1991; Moir, 2000) for various uses including medicine (as reviewed by Ososki and Kennelly, 2003; Bolton et al., 2019) and animal fodder (Siragusa et al., 2008), but is most commonly known as a flavoring agent in the brewing industry. The quest for complex taste and aromas in the rapidly expanding craft brewing industry has placed increasing demands on breeders to produce new varieties of plants with specific desirable traits as well as disease resistance (Kavalier et al., 2011; Easterling et al., 2018; Yan et al., 2019). However, hop presents multiple challenges to the production of new varieties due to its extended juvenile phase of two years to first flowers and its non-Mendelian inheritance patterns (Zhang et al., 2017).
Cytogenetic analysis of male meiosis in hop has revealed a tendency for unusual meiotic configurations such as multivalent chromosomal complexes (Sinotô, 1929; Winge, 1929; Shephard et al., 2000; Zhang et al., 2017). Recent work with 3D molecular cytology has shown that pervasive whole chromosome or segmental aneuploidy exists in hop and is exacerbated by passage through meiosis (Easterling et al., 2018). To date, there are limited cytological tools for assessing segregation patterns and establishing hop karyotypes (9 autosomes, XY). These tools have included telomere, 5S rDNA, HSR1 (Humulus subtelomeric repeat 1) (Karlov et al., 2003; Divashuk et al., 2011), and more recently HSR0 (Humulus subtelomeric repeat 0) (Easterling et al., 2018). Despite these advances, most genomes of model hop varieties remain to be sequenced, assembled, and fully annotated, except for unannotated, partial assemblies of Shinshu Wase, H. lupulus var. cordifolius (Natsume et al., 2015) and Teamaker (Hill et al., 2017). Given the importance of cytogenetics in guiding studies of chromosomal structural genomics and the challenge presented by hop transmission genetics, more cytogenetic tools are needed.
Tandem repeat sequence clusters occur as tandemly repeated segments of DNA with characteristic unit repeat lengths. These clusters are among the fastest evolving components in genomes (Raskina et al., 2008; Weiss-Schneeweiss et al., 2015; Su et al., 2019) and are typically found in heterochromatic, noncoding DNA at centromeric, pericentromeric, or subtelomeric regions. Tandem repeats also have various and interesting chromosomal functions including transcriptional modulation of chromatin, centromere function, and meiotic drive (Garrido-Ramos, 2015; Dawe et al., 2018; Su et al., 2019). As such, they provide unique opportunities to study genome evolution and phylogenetic relationships (Dodsworth et al., 2015). Plants, particularly angiosperms, are characteristically rich in repetitive DNA, which can account for the vast majority of plant nuclear genomes (Mlinarec et al., 2019). Hop has been previously reported to contain around 34% repetitive elements in the assembled portions of the genome (Natsume et al., 2015), but that value will likely increase as more complete genome assemblies are produced.
Here we use long-read genomic sequences to find and characterize new families of hop tandem repeats as new tools for cytogenomic analysis of hop. We describe our discovery pipeline using k-mer filtering and dot plot analysis of single-molecule long-read sequence data from cultivar Apollo, resulting in identification of 17 new tandem repeat families. FISH probes from two of these, HuluTR120 and HuluTR225, are shown to detect loci in non-cultivated wild hops.

MATERIALS AND METHODS

Plant Materials, Collection, and Fixation
Male panicles were collected before pollen shedding and fixed in Farmer's fluid as previously described (Zhang et al., 2017; Easterling et al., 2018). Wild hops, H. lupulus var. neomexicanus, were collected from the Coronado National Forest in Arizona (U.S.A.). Plant SH2 was collected on Mt. Lemmon and plant TM2-82C was collected on Mt. Bigelow.

Identification of Tandem Repeats in Long-Read PacBio sequences
Tandemly repeated sequences were discovered essentially using the approach previously described for the tandem repeat HSR0 (Easterling et al., 2018). Previously unreported details, parameters, and procedures are described here. The DNA sequence input was hop (Apollo) genomic DNA from long-read PacBio Single Molecule, Real-Time (SMRT) cells (libraries submitted Dec 2014, University of Washington PacBio Sequencing Services, Center https://pacbio.gs.washington.edu/) using single-molecule sequencing without circular consensus error correction. The sequences from 32 SMRT cells had a library size range of 3-20 kb, an average RQ (read quality) range of 81.5-82.55, and an Average Polymerase Mean Read Length range of 4,093-5,048 bp. For repeat detection, PacBio single-molecule FASTA sequences greater than 5 kb (n=1,037,871) were subjected to k-mer analysis in which all 12-mers were counted and sorted by abundance for each read. A read was retained if its fifth most abundant 12-mer occurred at least eight times within that read. This filter reduced the total list to 1,121 sequences, reflecting ~1000-fold enrichment. These k-mer filtered sequences were then used to produce a PDF document, referred to as the "HuluTR PDF book", in which each page shows the dot plot of a single PacBio SMRT read for visual inspection along with a table of its top 12-mers, using "ksift" (https://github.com/dvera/ksift) as previously reported (Easterling et al., 2018).
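The retention criterion described above can be illustrated with a minimal Python sketch. This is not the ksift implementation; the function names (`kmer_counts`, `passes_filter`, `filter_reads`) and the in-memory dictionary of reads are assumptions made for illustration only.

```python
from collections import Counter


def kmer_counts(seq, k=12):
    """Count every overlapping k-mer in a single read."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def passes_filter(seq, k=12, rank=5, min_count=8):
    """True if the rank-th most abundant k-mer occurs at least min_count times."""
    counts = sorted(kmer_counts(seq, k).values(), reverse=True)
    return len(counts) >= rank and counts[rank - 1] >= min_count


def filter_reads(reads, min_length=5000, **kwargs):
    """Keep reads longer than min_length that pass the k-mer criterion.

    `reads` is a hypothetical {read_name: sequence} dictionary.
    """
    return {
        name: seq
        for name, seq in reads.items()
        if len(seq) > min_length and passes_filter(seq, **kwargs)
    }
```

Requiring the fifth-ranked 12-mer, rather than the top-ranked one, to reach the threshold means that a read with only one or two abundant 12-mers does not pass on that basis alone.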

Characterization of Tandem Repeat Families
Individual long reads that displayed prominent stripes in the dot plots were classified and placed into sequence similarity groups, also referred to as Tandem Repeat Families, using the online YASS dot-plot genome server (https://bioinfo.lifl.fr/yass/yass.php). Repeats were grouped into families if the pairwise dot plot between two different reads returned "stripes" indicating repeating units of similar sequence shared by the two. For this, we used the default parameters (Noe and Kucherov, 2005). To facilitate this process, we strung together sequences representing each TR family into a single "polySeq" FASTA file (Supplementary Material 3) and used it in each pairwise alignment with unclassified reads. New families (those not matching any of the repeats in the polySeq file) were added to the end of the polySeq file as they were discovered and included as blocks of sequence at 1-kb intervals for ease of positional recognition in the YASS output dot plots. The Supplementary Material 3 file contains the full "polySeq34_v7" FASTA sequence with embedded locators, a table of synonyms to guide location to 1-kb blocks, and individual YASS plots of the polySeq vs. each HuluTR consensus sequence. This family grouping process is illustrated in Figure 2.
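As a rough illustration of how such a polySeq reference could be assembled, the following Python sketch concatenates one fixed-length block per family and records the block coordinates. The function names and the toy repeat units are hypothetical, and the embedded locators of the actual polySeq34_v7 file are not reproduced.

```python
def make_block(unit, block_len=1000):
    """Repeat a unit head-to-tail and trim to exactly block_len bases."""
    copies = block_len // len(unit) + 1
    return (unit * copies)[:block_len]


def build_polyseq(families, block_len=1000):
    """Concatenate one block per family.

    `families` is a hypothetical {family_name: repeat_unit} dictionary.
    Returns the polySeq string and a table of (name, start, end)
    using 1-based inclusive coordinates.
    """
    blocks, table = [], []
    for index, (name, unit) in enumerate(families.items()):
        blocks.append(make_block(unit, block_len))
        table.append((name, index * block_len + 1, (index + 1) * block_len))
    return "".join(blocks), table


# Toy units for illustration only; these are not the published sequences.
toy_families = {"famA": "CCCTAAA", "famB": "ACGTTGCAAGCTTAGGCTAACCGGATATCG"}
polyseq, coordinates = build_polyseq(toy_families)
```

Because every family occupies a block of known position, a stripe in a read-versus-polySeq dot plot can be assigned to a family from its coordinate on the polySeq axis.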
For each TR family grouped by sequence similarity, we established an average consensus unit length based on the repeat spacing and results from the Tandem Repeats Finder server at https://tandem.bu.edu/trf/trf.html (Benson, 1999). Because of minor variation in the exact repeat lengths as determined by various methods, we rounded to the nearest 5 bp and designated each HuluTR family accordingly, as summarized in Table 1. For each newly discovered HuluTR family, we obtained a single consensus sequence (Supplementary Material 4) from a representative read using the Tandem Repeats Finder online utility. Parameter settings used for TRF were default and as follows: alignment parameters (match = 2, mismatch = 2, indels = 7), minimum alignment score to report repeat = 50, Maximum period size = 1000, Maximum TR array size (bp, millions) = 2. The nomenclature used here denotes the TR family assignment with suffixes to specify single source reads as follows: "HuluTR120-r479" refers to the Humulus lupulus Tandem Repeat family of ~120 bp from the PacBio sequence read on page 479 of the HuluTR PDF book.
For analysis of monomer divergence within and between reads of HuluTR120, we extracted 11 monomers from an internal contiguous cluster for each of 10 reads. These were analyzed using a multiple sequence alignment tool, Clustal Omega (Clustal 2.1, https://www.ebi.ac.uk/Tools/msa/clustalo/). The resulting Percent Identity Matrix was imported into MS Excel and the sequence identity values were visualized for the individual monomers or their read-to-read averages (Fig. 3) using the Conditional Formatting tool with 2-Color Scale set from 40 (black) to 70 (yellow).
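The naming convention can be expressed as a short Python sketch. The helper names `family_name` and `read_name` are hypothetical, and the three-digit zero padding is inferred from family names such as HuluTR060 that appear in this report.

```python
def family_name(unit_length_bp, prefix="HuluTR"):
    """Round a unit length to the nearest 5 bp and format the family name."""
    rounded = 5 * int(unit_length_bp / 5 + 0.5)
    return f"{prefix}{rounded:03d}"


def read_name(family, page_number):
    """Append the PDF-book page number of the source read to the family name."""
    return f"{family}-r{page_number}"
```

For example, a measured unit length of 118.7 bp would be assigned to HuluTR120, and a read from that family shown on page 9 of the PDF book would be named HuluTR120-r9.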

FISH and 3D Cytology
Male meiocytes from hop plants were prepared, analyzed, and imaged using 3D deconvolution microscopy as previously described (Easterling et al., 2018). Nucleoli were measured using the Measure Distances program in the DeltaVision Software. Their diameter measurements were taken from central optical sections of each nucleolus, which are primarily spherical. Seventeen nucleoli were measured for cells with only one nucleolus (n=17 cells) and twelve were measured for cells with two nucleoli (n=8 cells). Average diameters were converted to volume in cubic microns.
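The diameter-to-volume conversion is sketched below in Python, assuming a perfect sphere, which is consistent with the primarily spherical shape noted above. The function name is hypothetical.

```python
import math


def sphere_volume_um3(diameter_um):
    """Volume of a sphere in cubic microns from its diameter in microns.

    Uses V = (4/3) * pi * r**3 with r = d/2, which simplifies to pi * d**3 / 6.
    """
    return math.pi * diameter_um ** 3 / 6.0
```

Because volume scales with the cube of the diameter, small differences in measured diameter translate into proportionally larger differences in estimated volume.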
Additional sequences for oligo FISH probes and synthetic consensus sequences were designed and are listed here (Table 2) in order to provide additional tools for hop cytogenetics. The synthetic consensus sequences were made (GenScript Biotech Corp.) and inserted into plasmids to enable their use as templates to make FISH probes via conventional labeling techniques. These plasmids (pHTR120syn, pHTR225syn, pHTR600syn, pHTR390syn, and pHTR060syn) and their descriptions are available from AddGene (addgene.org).

RESULTS
In this study, we set out to develop new FISH probes that can be used for cytogenetic tracking of individual chromosomes in the Humulus lupulus species. To date, there exist only a few such probes including those for rDNA repeats and other tandemly repeated clusters. These have served to establish basic hop karyotypes, but more cytogenomic information is necessary in order to further delineate individual chromosomes and integrate physical and linkage maps for this group of plants.

Sequence Data
We and others have successfully mined sequence data to identify tandem repeats that have been developed into FISH probes (Novak et al., 2013; Sevim et al., 2016; Novák et al., 2017; Easterling et al., 2018; Mlinarec et al., 2019). Here we carried out a thorough analysis of PacBio Single Molecule, Real-Time (SMRT) reads (n=1,037,871 reads), each consisting of sequences greater than 5,000 bp long. These reads, from 2014, are single-molecule DNA sequences, not circular consensus corrected, with an estimated error rate of ~10% based on alignments with a telomeric test case (Supplementary Material 5). We used k-mer analysis to screen for repetitive sequences. The criterion used was that the fifth most abundant 12-base k-mer for any single read be present eight or more times. This computational filter resulted in a list of 1,121 reads, which were visualized as self-aligned dot plots using the YASS program (Noe and Kucherov, 2005) as summarized in Figure 1. Self-aligned dot plots display the same sequence on the X and Y axes and produce a diagonal line indicative of their sequence similarity, or complete identity in this case. Repetitive sequences within a read show up as off-diagonal stripes at a frequency and spacing that reflect their abundance and repeat unit size.
Several types of repeat sequence patterns were observed among the 1,121 reads that passed the k-mer screen. The dot-plot pattern types can be grouped as those with low complexity and no obvious tandem repeats (Fig. 1A, no off-diagonal lines or stripes) or those with clear tandem repeats, which present as stripes and fall into several subgroups (Fig. 1B-E). The spacing between the stripes resulting from tandem repeats is proportional to the repeat unit length, and these plots provide easy-to-interpret summary diagrams. Low complexity reads (e.g. Fig. 1A) comprised ~20% of the k-mer filtered reads, but were not further analyzed. They included homopolymeric runs of single or simple sequence repeats or microsatellites. In contrast, desirable reads of larger tandem repeats showed conspicuous dot-plot striping. These could be further subdivided into groups where the tandem repeats fill an entire read (Full Read TRs, Fig. 1B), a single portion of a read (Partial Read TRs, Fig. 1C), multiple but separate patches of the same repeat in a read (Interspersed TRs, Fig. 1D), or separate patches of dissimilar repeats in a read (Combo TRs, Fig. 1E). The reads with the Combo TRs account for ~2% of the full k-mer set (Fig. 1F) and often include repetitive sequence clusters with relatively short repeat lengths of ~30-50 bp. These Combo TR reads, like the low complexity reads, were not prioritized for further analysis.
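The relationship between stripe spacing and repeat unit length can be illustrated with a simple Python sketch that tallies the distances between consecutive occurrences of each 12-mer in a read, a numerical stand-in for reading the spacing off a self-aligned dot plot. This is not how YASS computes alignments; the function names and the `max_offset` cutoff are assumptions for illustration.

```python
from collections import Counter, defaultdict


def kmer_offset_histogram(seq, k=12, max_offset=2000):
    """Tally distances between consecutive occurrences of each k-mer."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    offsets = Counter()
    for hits in positions.values():
        for first, second in zip(hits, hits[1:]):
            if second - first <= max_offset:
                offsets[second - first] += 1
    return offsets


def estimate_unit_length(seq, k=12):
    """Return the most frequent offset, or None if no k-mer recurs."""
    offsets = kmer_offset_histogram(seq, k)
    if not offsets:
        return None
    return offsets.most_common(1)[0][0]
```

In an error-free tandem array, the most frequent offset equals the repeat unit length. With error-prone reads the histogram would be noisier, which is one reason visual inspection of the dot plots remains informative.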
By mining long-read sequence data, our pipeline identified nearly 900 PacBio SMRT reads containing tandem repeats with full, partial, or interspersed TR patterns. Among these were reads housing known tandem repeat families (5S rDNA, HSR1, HSR0) and those housing new uncharacterized tandem repeat families. To consolidate and sort out the newly discovered TR families, we propagated the page number from the 1,121-page PDF file, allowing for downstream grouping on the basis of sequence similarity.

Defining HuluTRs: the Tandem Repeat Families of Hop
We systematically defined TR sequence families by grouping the tandem repeats according to sequence similarity criteria using dot plot analysis as summarized in Figure 2. The process is illustrated for four previously known tandem repeats (Fig. 2A): telomere, HSR0, HSR1, and 5S rDNA. For each TR, a 1-kb block representing a TR family was made by concatenating single repeating units or consensus sequence repeats. These 1-kb TR family-specific sequence blocks provide convenient visual delineations on the dot plot axes. These blocks help with assigning new repeat hits to matching families and were further concatenated to produce a file called "polySeq". The resulting 4-TR polySeq (shown as self-aligned in Fig. 2A) was used as one of the two inputs to screen new reads by dot plotting, one at a time. Each new, uncharacterized read (one not matching sequences in the existing polySeq) was given a name (based on unit repeat length or discovery number) and included in the polySeq file as a 1-kb block of repeats, or a 2-kb block for large repeats. This process was repeated for each read, eventually producing a polySeq set of 34 distinct TR families (Supplementary Material 2), shown as a self-aligned dot plot (Fig. 2B). The TR-family assignment procedure is illustrated for four different reads in panels C-F (Fig. 2). The resulting families are listed in Table 1, sorted sequentially by relative abundance then by repeat length. For read-specific nomenclature, "HuluTR120-r9" refers to "Humulus lupulus Tandem Repeat family ~120 bp" from the PacBio sequence read on page 9 of the HuluTR PDF book (Supplementary Material 1). The six most abundant TR families found in the library range from 34 to 232 parts per million, and included previously known sequences HSR1, HSR0, and 5S rDNA, and newly discovered sequences, HuluTR120, HuluTR225, and HuluTR060. Their relative abundance makes them good candidates for FISH probes. The other families were found to occur at a lesser frequency, including six that were found in only one read.
Several of these TR clusters feature a high %A+T (AT content), as is often observed for tandemly repeated macrosatellite sequences (Garrido-Ramos, 2017). The average AT content for the 18 sequence families listed ranged from an unusually low value of 42% for HuluTR135 to a high value of 79% for HuluTR390. The mean and median AT content for these sequence families are ~64%, higher than the 59.2% AT content calculated for all of the sequences in the PacBio Apollo DNA sequence dataset. This TR discovery approach expands the number of characterized hop TR sequence families by 10-fold, and this methodology could be readily applied to other plant species for which long-read sequence datasets are available.
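For reference, AT content as used above can be computed with a few lines of Python. This is a minimal sketch; the function name is hypothetical, and the choice to ignore ambiguous bases such as N is an assumption rather than a documented detail of the analysis.

```python
def at_content(seq):
    """Percent A+T among unambiguous bases (A, C, G, T) in a sequence."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * at / (at + gc)
```

Applied to each family consensus and to the pooled reads, a calculation of this kind allows per-family values to be compared against the dataset-wide AT content.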

Sequences for TR FISH probe production
Once the tandemly repeated DNA sequences were categorized by family, we aimed to produce representative oligonucleotide FISH probes for cytogenetic detection of the corresponding chromosomal loci. Oligo FISH probes are advantageous because of their small size, uniformity of labeling, and consistency across experiments. Identifying the best region of a tandem repeat family to use as a FISH probe is complicated by the considerable sequence variation that is commonly observed in tandem repeat sequence families (Dennis and Peacock, 1984). For instance, as summarized in Figure 3 for sequences of the HuluTR120 family, we observed variation from one read to another in the dot plot striping patterns. We consider continuous, parallel stripes to reflect tandem repeats with a high degree of similarity (Fig. 3A, first two plots). Such sequences were given high priority for probe development. However, some reads exhibited a more degraded appearance of stripes (Fig. 3A, third plot), which we interpret as having undergone sequence divergence, and were excluded from use in probe development.
To illustrate the range of sequence similarity variation both between and within reads, we selected 10 reads assigned to the HuluTR120 family (Fig. 3B). For each read, we extracted an internal, contiguous 11-repeat block of HuluTR120 monomers and separated them to quantify all possible monomer-to-monomer pairwise sequence similarities. This resulted in 122 pairwise similarity values for each read-to-read comparison. The average of these 122 values is shown in each cell of the grid (Fig. 3B). The highest within-read average was surprisingly low at 67% (for 782 x 782). In contrast, the between-read averages were 46% (for 782 x 801, 677, or 440).
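The read-to-read averaging can be sketched in Python as a block average over a monomer-by-monomer percent identity matrix, such as one exported from a multiple sequence alignment tool. The function name is hypothetical, the matrix is assumed to be ordered so that monomers from the same read are adjacent, and every cell in a block is averaged, including self-comparisons on the diagonal; whether the published averages excluded self-comparisons is not stated.

```python
def block_means(identity_matrix, monomers_per_read):
    """Average a monomer-level identity matrix into a read-level grid."""
    size = len(identity_matrix)
    m = monomers_per_read
    if size % m != 0:
        raise ValueError("matrix size must be a multiple of monomers_per_read")
    n_reads = size // m
    grid = [[0.0] * n_reads for _ in range(n_reads)]
    for a in range(n_reads):
        for b in range(n_reads):
            values = [
                identity_matrix[i][j]
                for i in range(a * m, (a + 1) * m)
                for j in range(b * m, (b + 1) * m)
            ]
            grid[a][b] = sum(values) / len(values)
    return grid
```

Diagonal cells of the resulting grid summarize within-read similarity, and off-diagonal cells summarize between-read similarity.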
Given that the sequences of the monomeric repeating units tended to vary within and between individual reads, we decided to use consensus sequence data to guide oligo FISH design (Fig. 3C). For high priority reads, we used the Tandem Repeats Finder program (Benson, 1999) to define read-specific consensus sequences. We next carried out multiple sequence alignments of these consensus sequences to identify the most highly conserved sequence regions, which were considered ideal for design and production of fluorescent oligonucleotide probes (Fig. 3C). New and previously published tandem repeats and FISH probes for hop are summarized in Table 2. Collectively, these represent the beginning of a new toolkit for hop cytogenomics, suitable for future investigations of structural genomics, segregation patterns, and chromosome evolution in hop. Their utility is demonstrated below using two of these new reagents, the oligo FISH probes for HuluTR120 and HuluTR225, in wild collected var. neomexicanus hop.
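One way to locate a highly conserved region in a set of aligned consensus sequences is sketched below in Python. The scoring scheme (fraction of sequences sharing the most common base in each column, with gaps never counting as agreement) and the function names are assumptions for illustration; the probe regions reported here were chosen from the alignments shown in Fig. 3C, not by this code.

```python
from collections import Counter


def column_conservation(aligned):
    """Per-column fraction of sequences sharing the most common base."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


def best_window(aligned, width):
    """Return (start, mean score) of the most conserved window of given width."""
    scores = column_conservation(aligned)
    if width > len(scores):
        raise ValueError("window is wider than the alignment")
    start = max(
        range(len(scores) - width + 1),
        key=lambda s: sum(scores[s:s + width]),
    )
    return start, sum(scores[start:start + width]) / width
```

The window width would correspond to the intended oligo length, and the returned start is a 0-based alignment column.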

Aberrant Meiosis And HuluTR FISH in Wild Hop
An important question in hop genome evolution is whether or not aberrant meiosis is a natural, intrinsic feature of hop or whether it can be explained entirely as a result of breeding and cultivation with structurally diverse genomes. To begin to address this issue and to demonstrate a possible application of these new FISH probes, wild hop was collected from what are thought to be isolated populations (Reeves and Richards, 2011).
In order to test our new FISH probes on wild and non-Apollo hops, we applied two of them, HuluTR120 and HuluTR225, to meiocytes of two var. neomexicanus plants, SH2 and TM2-82C, as shown in Figure 6. We show that both of these HuluTR probes, designed from Apollo sequence data, successfully hybridized as discrete foci on the chromosomes of wild hop.
In one case, the HuluTR120 probe gave two bright signals in plant SH2 as seen at mid-prophase (Fig. 6B) and metaphase I (Fig. 6C), a pattern indicative of paired homologous loci. At the tetrad stage, the HuluTR120 signals were distributed equally (Fig. 6D, 1:1:1:1). In another case, the HuluTR225 probe gave more complex patterns in plant TM2-82C, with variable brightness and size. The 10-12 FISH signals are seen at mid-prophase (Fig. 6F) and at metaphase I (Fig. 6G). The FISH signals appear to be distributed in an irregular pattern at both metaphase I and the post-meiotic tetrad-like stage (Fig. 6H). The examples represent multiple occurrences of meiotic abnormalities from a single plant (Fig. 4B and Fig. 6E-H). Therefore, TR probes designed from one genotype can be used in others, and wild hops show both balanced

DISCUSSION
Interest in tandem repeats has prompted investigators to develop new software programs to find or characterize tandem repeats using DNA sequencing data (Glunčić and Paar, 2013; Weiss-Schneeweiss et al., 2015; Mlinarec et al., 2019). Among the programs used are Tandem Repeats Finder (Benson, 1999), which uses string matching algorithms, and those utilizing graph-based clustering, such as RepeatExplorer (Novák et al., 2010) and TAREAN (Novák et al., 2017). These programs allow for the mining of existing and public repositories of genomic data to identify tandem repeats for various studies related to phylogenetics, genome evolution, and cytogenetics (Dodsworth et al., 2015; Belyayev et al., 2018; Mlinarec et al., 2019). More recently, long-read sequence data has been used to support FISH probe development in plants.
Here, we describe an approach using long-read sequences that allows for TR discovery aided by direct visual inspection of single self-aligned read dot plots. Even with these error-prone early generation single-molecule reads, we were able to uniquely and unambiguously find and group tandemly repeated sequence families and build consensus sequences. The DNA sequences from these reads were screened by k-mer analysis using criteria that yielded ~1000X enrichment for reads with the desired sequence features. The k-mer filtered dot plots provide a highly informative way to visualize the data, making it easy to quickly interpret tandem repeat patterns within their genomic context one read at a time without any requirement for assembly. The data processing pipeline produced a PDF booklet of one read per page (Supplementary Material 1), which proved useful for downstream analysis.
Compared to other methods, the approach described here has several notable advantages including (1) intuitive visualization of the genomic structure of the repeats, (2) highly sensitive detection of tandem repeats, as illustrated by the discovery of reads with HuluTR families present once per million reads (e.g. HuluTR050, HuluTR055, HuluTR070, HuluTR150, HuluTR280, and HuluTR350), (3) the retention of adjacent flanking genomic sequence, ideal for guiding genome assembly efforts, and (4) the retention of the individuality of TR clusters, which may come from multiple different loci. This last advantage may be helpful for future consideration of homologous alleles, homeologous alleles from hybrids, or multi-chromosomal loci on different paths of divergence. The approach also has disadvantages, such as the requirement for long-read sequences as the input data and the fact that the larger the repeat unit, the less likely a read is to meet our k-mer threshold for 5-10 kbp reads. Given that we readily found known TRs (5S rDNA, HSR1, and telomere repeats) while also discovering 18 TRs (Table 1), on the whole, we consider this a robust and widely applicable approach that is relatively simple to interpret.
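The limitation on large repeat units follows from simple arithmetic, sketched below in Python under idealized assumptions that are not part of the reported analysis: the tandem array fills the entire read, and sequencing errors occur independently at each base. The function names are hypothetical.

```python
def max_detectable_unit(read_length_bp, min_count=8):
    """Largest unit length (bp) that can fit min_count copies in one read."""
    return read_length_bp // min_count


def intact_kmer_probability(error_rate, k=12):
    """Chance that a single k-mer copy is read with no errors,
    assuming independent per-base errors."""
    return (1.0 - error_rate) ** k
```

Under these assumptions, a 5,000-bp read can hold at most eight copies of a 625-bp unit, so larger units could only pass the filter in longer reads. With a ~10% error rate, only about 28% of individual 12-mer copies would be expected to be read without error, which would lower the effective size limit further.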
Repetitive sequences pose the greatest challenge for assembling complete genomes. The 1C genome size estimates for hop range from 2.5-3.0 Gb according to flow cytometric methods (Zonneveld et al., 2005; Grabowska-Joachimiak et al., 2006; Natsume et al., 2015) but only 2.1 Gb according to a recent genome assembly (Natsume et al., 2015). Therefore, sequence assemblies currently account for only about 80% of the known genome size, indicating that a large fraction of the genome is not represented in contemporary assemblies. Tandem repeat sequences are often mis-assembled and under-represented, being particularly prone to the repeat collapse problem in genome assembly. These discrepancies contribute to the genome size under-estimations while exacerbating problems associated with accurate contig assembly. For instance, markers flanking a TR cluster may be separated by only a few kbp of TR, but reside on different contigs if only short read sequences guide the assemblies. Accurate incorporation of TR clusters is especially important in hop given its high degree of structural variability and segregation distortion (Zhang et al., 2017; Easterling et al., 2018).
Another useful aspect of defining TR families within long-read sequences is that they provide opportunities to explore the flanking genomic sequences for markers or genes. Sequences with "Partial read TR" or "Interspersed TR" patterns (Fig. 1) can be used to obtain non-TR regions for use in BLAST queries of genome databases. For instance, regions of non-TR sequences within the read housing HuluTR135 align with an EST of unknown function (GenBank Acc. FG346016.1) and map to a region (004949F: 218,515..248,514) in HopBase (Hill et al., 2017). Such approaches represent examples of new avenues to explore regions of biological importance for hop, such as sex chromosomes, segregation distortion hot spots, and key genes for flavor and aroma biosynthetic pathways or disease resistance.
A primary goal of this study was to produce new molecular cytology tools for hop chromosome research. To that end, we have described 18 new tandem repeat families (Table 1) and shown FISH results with probes for HuluTR120 and HuluTR225. To date, most of the hop chromosomes are numbered and distinguished by their relative size and in some cases their centromere locations as inferred from the primary constriction on mitotic chromosomes (Shephard et al., 2000). The most current hop karyotype includes HSR1, 5S rDNA, NOR, and telomere signals, which together uniquely tag 4 of the 10 chromosomes (Karlov et al., 2003; Divashuk et al., 2011, 2014). Notably, centromere-specific sequences have yet to be identified in hop, including in this study. It is possible that among our HuluTR families are one or more that reside at centromeres. Alternatively, hop centromere repeats may not be organized as tandem repeats or their size and copy number may have resulted in their exclusion from our k-mer filtered subset of 1,121 reads. Indeed, a recent study in wheat found that centromeric tandem repeats enriched at CENH3 ChIP-seq peaks can exceed 500 bp in repeat unit length (Su et al., 2019).
Wild hop populations occur naturally across the US in three varieties and are morphologically distinct but are not necessarily reproductively isolated (Reeves and Richards, 2011). They have been described as monophyletic (Tembrock et al., 2016) and are known to exhibit high levels of genetic diversity, particularly var. neomexicanus (Murakami et al., 2006). It is worth noting that cultivated, escaped hop plants, also referred to as ferals, can be mistaken for wild varieties, especially near areas where hop is cultivated or bred. In this study, we specifically sought wild neomexicanus hops and therefore collected from remote regions of the southwestern US in the Arizona Sky Islands, where the hop plants are morphologically distinct var. neomexicanus. Our cytological data in these wild plants (Figs. 4-6), together with previously reported meiotic segregation irregularities (Zhang et al., 2017; Easterling et al., 2018), establish that such meiotic abnormalities are clearly not limited to cultivated hop and can also occur in the wild. These findings, while limited in scope, highlight the recurrent observations of genomic instability in some members of the species. Similar phenomena have been observed in Oenothera sp. and Clarkia sp., members of the Onagraceae family (Bloom, 1974; Hollister et al., 2019). Interestingly, some of these have stabilized structural variation through specialized meiotic behaviour, possibly contributing directly to speciation events (Holsinger and Ellstrand, 1984). It remains to be determined whether the evolutionary dynamics of hop have contributed to speciation or divergence in the wild, questions that can be addressed using chromosome-marking FISH probes.
Here we increased the number of known tandem repeat sequence families in hop by nearly 10-fold using an innovative bioinformatic pipeline for de novo identification, visualization, and classification of TRs from long-read sequence data. This approach and the resulting cytogenetic resources should prove useful for further investigations into evolutionary, cytogenetic, or structural genomic research in hop.

ACKNOWLEDGEMENTS:
We thank Daniel Vera for help with the tandem repeat analysis. This work was supported by a Hopsteiner Doctoral Research Fellowship to KAE (FSU OMNI Award ID: 0000030675) and an FSU Planning Grant to HWB (FSU OMNI Award ID: 0000032134).

imaged and analyzed over multiple slides (n=5) stained with HuluTR120 during various meiotic stages. More than 80 nuclei from plant TM2-82C were imaged and analyzed over multiple slides (n=4) stained with HuluTR225 during various meiotic stages.

Vondrak, T., Robledillo, L. A., Novak, P., and Koblizkova, A. (2019). Genome-wide characterization of satellite DNA arrays in a complex plant genome using nanopore reads. bioRxiv.

Footnotes: A. Probes 5SBob1-3, HSR1, HSR0, and TELO were previously published (Easterling et al., 2018)