Genome sequences and SNP analyses of Corynespora cassiicola from cotton and soybean in the southeastern United States reveal limited diversity

Corynespora cassiicola attacks diverse agriculturally important plants, including soybean and cotton, in the US. It is a re-emerging pathogen on cotton in the southeastern US. Whole genome sequences of four cotton isolates and one soybean isolate from Tennessee were used to develop single nucleotide polymorphism (SNP) markers for cotton isolates. Cotton isolates had little diversity at the genome level and very little differentiation from the soybean isolate. Analysis of 75 isolates from cotton and soybean, using targeted sequencing of 22 polymorphic SNP sites, revealed eight multi-locus genotypes, and a single clonal lineage appears to predominate across the southeastern region. The cotton and soybean genome sequences were significantly different from the public reference genome derived from a rubber isolate, and the utility of these novel resources will be discussed.


Introduction
Corynespora cassiicola (Berk. & M. A. Curtis) C. T. Wei, first described in 1868 as Helminthosporium cassiicola, is a pathogen of many crops [1,2]. It is an anamorphic fungus in the class Dothideomycetes in the phylum Ascomycota [3]. C. cassiicola is found on or within 530 plant species from 380 genera, including dicot, monocot, fern and cycad hosts, and acts as a pathogen, saprophyte or endophyte [2]. As a pathogen, C. cassiicola infects plant leaves, stems, and roots, and it has also been isolated from nematodes and a human corneal infection [4,5]. Pathogenicity varies depending on the host: some isolates can infect multiple hosts while others appear to be host specific. Isolates recovered from cucumber, green pepper and hydrangea can infect scarlet sage leaves, but not vice versa [6]. Isolates recovered from papaya leaf debris caused leaf lesions on tomato, cucumber, and watermelon but were not pathogenic to papaya [7].
In the southern US, Corynespora cassiicola attacks soybean and cotton, causing the foliar disease known as target spot. In soybean, it can also attack roots and the hypocotyls of seedlings [8,21]. Target spot is present in multiple soybean growing areas in the US. The disease is more common in humid conditions. The initial visible symptom is a small reddish spot which expands into a circular or irregular reddish-brown lesion, 4-5 mm in diameter, with a target-like or zonate pattern [22,23]. In South Carolina, yields were reduced 20% to 40% in a soybean variety field trial [22]. In 2006, among the top eight soybean producing countries, Bolivia and Argentina had the highest estimated yield losses at 500 and 45.3 thousand metric tons, respectively [24]. In 2000, Louisiana had an estimated yield loss of around 11,430 metric tons [25]. Similarly, target spot can cause significant damage to cotton leaves, resulting in premature defoliation [26]. Target spot is an emerging disease of cotton in the southeastern US and has been reported in Georgia, Alabama, Louisiana, Mississippi, Arkansas, North Carolina and recently in Tennessee [9,21,27-29]. In highly susceptible cultivars, premature defoliation, starting from the lower canopy, can reach up to 75% and reduce the yield of seed cotton by 336 kg/ha [9]. C. cassiicola also causes Corynespora Leaf Fall disease of rubber, and the levels of a putative effector protein, cassiicolin, differ between aggressive and moderately aggressive isolates [30]. Investigation of the cassiicolin gene in diverse isolates revealed significant variation, which may be related to host range [31].
Random amplified polymorphic DNA (RAPD) markers differentiated isolates from diverse locations and hosts, although a clonal lineage from rubber was not correlated with host or location [32-36]. Investigations using the ITS region and other genes showed no correlation with geographical location, although in some cases there was a correlation with the host [4,37].
Our goal was to develop genetic resources for isolates of C. cassiicola from Tennessee, particularly for cotton, and to investigate genotypic diversity for isolates recovered from cotton and soybean in Tennessee and surrounding states.

Sample recovery and DNA extraction
Permission to collect samples was received from all land owners. Leaves with typical symptoms of target spot were surface sterilized with 10% chlorine for 1 min and a section of tissue at the edge of a lesion was excised and placed onto RA-amended water agar media (rifampicin 25 ppm, ampicillin 100 ppm, 20 g agar and 1 L water). Hyphal tips were transferred to RA-amended V8 agar media (15 g agar, 3 g calcium carbonate, 160 mL V8 juice and 840 mL water) and maintained at -4˚C.
For genomic DNA extraction for Whole Genome Sequencing (WGS), mycelium was grown for 2 weeks at room temperature in 250 ml flasks containing 10 ml RA-V8 liquid broth (as above, minus agar). The resulting mycelium was transferred to 2 ml tubes containing 2-3 glass beads, freeze dried, and powdered with a Mixer-Mill (Qiagen). Genomic DNA was extracted using a standard phenol-chloroform protocol. DNA extraction for targeted sequencing was accomplished in a 96-well plate as described by Lamour and Finley [38].

Whole genome sequencing
Isolates selected for WGS were confirmed by sequencing the internal transcribed spacer (ITS) using the ITS5 (5' GGAAGTAAAAGTCGTAACAAGG 3') and ITS4 (5' TCCTCCGCTTATTGATATGC 3') primers as previously described [39]. High-quality genomic DNA was sheared with a Bioruptor Plus device (Diagenode, Inc.). Briefly, genomic DNA was diluted to 10 ng/μl with TE buffer (10 mM Tris, 1 mM EDTA, pH 7.5-8.0) and 100 μl was transferred to 0.5 ml Bioruptor microtubes (Diagenode, Inc.). The samples were incubated on ice for 15 minutes and sheared for 30 cycles of 30 sec on and 90 sec off. The fragmented DNA was visualized on a 2% gel and 200-300 bp fragments were excised and cleaned using a PureLink Quick Gel Extraction Kit (Thermo Fisher Scientific Inc.). Illumina libraries were prepared using a PCR-free KAPA Hyper Prep Kit, followed by qPCR library quantitation using the KAPA Library Quantification Kit (Kapa Biosystems), and sequenced on an Illumina device. Raw sequences were deposited in the National Center for Biotechnology Information (NCBI) database as BioProject PRJNA382361.

Genome comparison of isolates from cotton and soybean to an isolate from rubber
Raw FASTQ files were quality checked with FastQC and trimmed with Trimmomatic version 0.33 [40,41]. Reads were mapped using CLC Genomics Workbench (Qiagen) to the public C. cassiicola genome sequence, which is derived from an isolate recovered from rubber (http://genome.jgi.doe.gov/Corca1/Corca1.home.html). Mapping required that at least 90% of each read match the reference genome with at least 90% similarity. Resulting BAM files were processed using GATK to identify putative SNP positions [42]. Variant calling was done with HaplotypeCaller at default settings for a haploid genome. After recommended hard filtering, SNP genotypes were assigned using custom Perl scripts (https://github.com/sandeshsth) to require a minimum of 10X and a maximum of 1000X coverage and an alternate allele frequency of 100%. The impact of putative SNPs was assessed using SnpEff [43].
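The genotype-assignment rule described above (10X to 1000X coverage and an alternate allele frequency of 100%) can be illustrated with a minimal Python sketch. This is not the authors' Perl script. The function name `call_haploid_genotype`, the use of per-allele read depths as input (for example, values taken from a VCF AD field), and the decision to report a reference call when every read supports the reference are assumptions made for illustration.

```python
from typing import Optional

MIN_DEPTH = 10     # minimum coverage stated in the text
MAX_DEPTH = 1000   # maximum coverage stated in the text


def call_haploid_genotype(ref_depth: int, alt_depth: int) -> Optional[str]:
    """Return 'ALT', 'REF', or None (no call) for one site in one isolate.

    ref_depth and alt_depth are read counts supporting each allele
    (hypothetical inputs, e.g. taken from a VCF AD field).
    """
    total = ref_depth + alt_depth
    if total < MIN_DEPTH or total > MAX_DEPTH:
        return None      # coverage outside the accepted window
    if alt_depth == total:
        return "ALT"     # alternate allele frequency of 100%
    if ref_depth == total:
        return "REF"     # assumption: all reads support the reference
    return None          # mixed support is left uncalled for a haploid


# Example with made-up read depths
print(call_haploid_genotype(0, 25))   # site accepted as an alternate call
print(call_haploid_genotype(3, 22))   # mixed support, left uncalled
```

Requiring unanimous read support is a strict choice that suits a haploid organism, where mixed support at a site more likely reflects mapping or sequencing error than true heterozygosity.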

Marker development for differentiating cotton isolates
To identify SNPs useful on cotton (and possibly soybean) in the southeastern region, TS_cotton1 was de novo assembled using CLC Genomics Workbench and the resulting contigs were used as a reference for mapping the cotton and soybean isolates. Candidate variants were identified with an alternate allele frequency of 100%. Custom Perl scripts (https://github.com/sandeshsth/) were used to annotate the reference contigs, and target regions (100 bp on each side of the target SNP) were extracted and used to design general PCR primers using BatchPrimer3 [44]. Multiplex amplification of the targets was done by Floodlight Genomics, LLC (Knoxville, TN) to produce sample-specific amplicons using an optimized Hi-Plex approach as part of a no-cost Educational and Research Outreach Program [45]. Pooled barcoded amplicons were sequenced on a HiSeq3000 device, the sample-specific sequences were aligned to the sequences used for primer design with CLC Genomics Workbench, and genotypes were assigned using GATK (>10X coverage and 100% alternate allele).
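The extraction of target regions (100 bp on each side of a candidate SNP) can be sketched in Python as follows. This is an illustration rather than the authors' Perl code. The function name `extract_target_region`, the 1-based position convention, and the clipping of regions at contig ends are assumptions.

```python
def extract_target_region(contig_seq: str, snp_pos: int, flank: int = 100) -> str:
    """Return the SNP base plus up to `flank` bases on each side.

    snp_pos is 1-based (as in VCF files). Regions that would run past
    either end of the contig are clipped rather than padded.
    """
    if not 1 <= snp_pos <= len(contig_seq):
        raise ValueError("SNP position lies outside the contig")
    start = max(0, snp_pos - 1 - flank)           # 0-based, inclusive
    end = min(len(contig_seq), snp_pos + flank)   # 0-based, exclusive
    return contig_seq[start:end]


# Toy contig (invented sequence) with the SNP base 'G' at position 151
contig = "A" * 150 + "G" + "T" * 150
region = extract_target_region(contig, 151)
print(len(region))   # 201 bases: 100 upstream + SNP + 100 downstream
```

A SNP that lies closer than 100 bp to a contig end yields a shorter region, which may leave too little sequence on one side for primer design; such sites could simply be skipped.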

Isolates
Sixty-five isolates were recovered from 15 cotton cultivars planted at the West Tennessee Research and Education Center in Jackson, TN in 2015. An additional ten isolates from cotton and soybean from Florida, Louisiana, Georgia and Virginia were also included in the study (Table 1). The year of isolation for these isolates was unknown but was prior to 2015.

Whole genome sequences
Five isolates of C. cassiicola from Tennessee were selected for WGS, including four from cotton and one from soybean. At the time of sequencing we did not have access to isolates from surrounding states. Isolates from cotton included an isolate from Jackson, TN recovered in 2013 (used to report the first occurrence of target spot on cotton in Tennessee) and three isolates recovered from cotton in Jackson, TN in 2015 [29]. The isolate from soybean was recovered from Jackson, TN in 2015. Isolates are named TS_cotton1 (2013), TS_cotton2, TS_cotton3, TS_cotton4 and TS_soybean.
An initial comparison of the genome sequences for the cotton isolates indicated they are essentially identical, and the TS_cotton1 (2013), TS_cotton2 and TS_soybean isolates were analyzed further to identify SNP sites and determine overall metrics. After quality trimming, TS_cotton1 (2013), TS_cotton2, and TS_soybean had approximately 43, 8 and 6 million paired-end reads, respectively. In total, 80.4% (TS_cotton1), 70.8% (TS_cotton2), and 78.28% (TS_soybean) of the reads mapped to the rubber isolate reference genome. Greater than 95% of the annotated genes in the rubber isolate reference genome were covered. Analysis using GATK identified 807,433 variable sites, of which >99% were fixed differences between the cotton and soybean isolates on one side and the isolate from rubber on the other. Comparison of the two cotton isolates revealed 16 putative SNP sites, and comparison between cotton and soybean revealed 1627 candidate SNP sites. For the 807K variable sites (between the cotton + soybean isolates and the rubber isolate), 30% are predicted to be missense and 25% silent mutations.

SNP marker development and application
De novo assembly of TS_cotton1 (2013) produced 1846 contigs with a total size of about 42 Mbp, similar to the 44.5 Mbp genome available for the rubber isolate. The other three cotton isolates were mapped to the 1846 contigs, and 82.7%, 96.5% and 95.3% of the reads from TS_cotton2, TS_cotton3 and TS_cotton4 mapped, respectively. A total of 408 single nucleotide variants (SNVs) were discovered and, from these, a subset of 40 SNVs from different contigs was selected for targeted sequencing and assessment in field populations. A total of 22 SNP markers in 75 isolates of C. cassiicola were retained for analysis after removing all monomorphic markers and missing data, revealing eight unique multi-locus genotypes (Table 2, Table 3). Genotypes are designated G1 to G8. The G1 genotype was the most frequent; it dominated the populations recovered from cotton in TN and included all ten isolates from the other states.
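The marker filtering and genotype collapsing described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: the function name `multilocus_genotypes` is hypothetical, missing data are handled by dropping any marker with a missing call (the text does not say whether markers or isolates were removed), and the toy isolates and alleles are invented.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# isolate name -> allele call at each marker (None = missing data)
Genotypes = Dict[str, List[Optional[str]]]


def multilocus_genotypes(calls: Genotypes) -> Tuple[List[int], Counter]:
    """Drop markers with missing data or no variation, then count MLGs.

    Returns the indices of retained markers and a Counter mapping each
    multi-locus genotype (a tuple of alleles) to its number of isolates.
    `calls` must be non-empty, with equal-length lists per isolate.
    """
    isolates = list(calls)
    n_markers = len(calls[isolates[0]])
    kept = []
    for i in range(n_markers):
        column = [calls[iso][i] for iso in isolates]
        if None in column:
            continue          # marker has missing data
        if len(set(column)) < 2:
            continue          # marker is monomorphic
        kept.append(i)
    mlg_counts = Counter(
        tuple(calls[iso][i] for i in kept) for iso in isolates
    )
    return kept, mlg_counts


# Invented toy data: three isolates scored at four markers
toy = {
    "iso1": ["A", "C", "G", None],
    "iso2": ["A", "T", "G", "T"],
    "iso3": ["A", "C", "G", "T"],
}
kept, counts = multilocus_genotypes(toy)
print(kept)                   # indices of informative markers
print(counts.most_common())   # genotypes ordered by frequency
```

Ordering genotypes by frequency is one simple way to assign labels such as G1, G2 and so on, with G1 being the most common.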

Discussion
Our goal was to investigate the genetic diversity of C. cassiicola recovered from cotton and soybean in Tennessee and to investigate diversity in the southeastern region. Overall, whole genome sequencing revealed almost no differences between four cotton isolates and a limited amount of variation between the cotton isolates and an isolate from soybean. There is some evidence that isolates recovered from cotton and soybean can both cause disease on cotton but not on soybean [46]. Pathogenicity tests showed that only soybean isolates can cause disease on soybean, and that isolates from cotton are more aggressive on cotton than isolates from soybean. Further analysis of field isolates using a relatively small set of SNP markers indicates a very low level of genotypic variation, typical for foliar fungal pathogens spread widely as clonal lineages. This could be due to the recent introduction of a highly successful clonal lineage of C. cassiicola to TN and surrounding states [9,27,29]. Development of additional SNP markers using WGS from a wider array of isolates would be valuable, and the sequences presented here will be useful in this endeavor. Although isolates from cotton and soybean were highly similar, with an estimated SNP only every 25,000 bp, they were both highly dissimilar to an isolate recovered from rubber, with a SNP site every 40 bp. We also did a whole genome comparison to an isolate recovered from a contact lens in Malaya (NCBI BioProject PRJNA236064) and found a similarly high level of dissimilarity, with over 1M putative SNPs across 40 Mbp of genome sequence (data not shown).
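The approximate SNP spacing quoted for the cotton and soybean isolates can be checked with simple arithmetic. The short Python sketch below divides a genome length by a SNP count. The function name `mean_snp_spacing` is hypothetical, and using the roughly 42 Mbp TS_cotton1 assembly as the denominator is an assumption, since the text does not state which genome length was used. The result changes with the denominator chosen (for example, only the portion of the genome with adequate read coverage).

```python
def mean_snp_spacing(genome_size_bp: float, n_snps: int) -> float:
    """Average number of base pairs per SNP site."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return genome_size_bp / n_snps


# Values reported in the text: ~42 Mbp assembly and 1627 candidate
# SNP sites between the cotton and soybean isolates.
cotton_vs_soybean = mean_snp_spacing(42_000_000, 1627)
print(round(cotton_vs_soybean))   # on the order of 25,000 bp per SNP
```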
When compared to the isolate pathogenic to rubber, there were more missense mutations predicted than silent mutations, which supports the notion that these isolates belong to distinct evolutionary lineages that have diverged over an extended period. A previous investigation of C. cassiicola isolates using four genetic loci resolved six lineages and placed isolates from rubber and soybean both within the same lineage and in different lineages [4]. Although our work has a limited scope (considering the wide host range of this organism), it suggests that a revision of the genus using whole genome data may be helpful to assign isolates to anamorphic lineages or possibly distinct species.
The limited number of candidate SNP loci identified by WGS suggests a single clone may predominate in the southeastern region. This is not surprising as C. cassiicola is apparently new to the region and can produce copious airborne spores on foliar lesions. Further work characterizing the pathogen over time will be useful to track the epidemiology and monitor for cryptic sexual recombination and/or the introduction of novel clonal lineages [9,27,29,47].

Author Contributions
Conceptualization: Sandesh K. Shrestha. Data curation: Sandesh K. Shrestha, Kurt Lamour.

Table 3. Summary data for the eight unique genotypes of C. cassiicola. Genotypes are in order, S1 to S22 as presented in Table 2.