Dominance of recombinant cotton leaf curl Multan-Rajasthan virus associated with cotton leaf curl disease outbreak in northwest India.

Cotton leaf curl disease (CLCuD), caused by whitefly (Bemisiatabaci) transmitted single-stranded DNA viruses belonging to the Genus, Begomovirus (family, Geminiviridae) in association with satellite molecules; is responsible for major economic losses in cotton in three northwest (NW) Indian states Haryana, Punjab, and Rajasthan. Annual CLCuD incidences during 2012 to 2014 were estimated to be 37.5%, 63.6%, and 38.8% respectively. Cotton leaves were collected from symptomatic plants annually for three years and subjected to DNA isolation, followed by rolling circle amplification (RCA), cloning, and DNA sequencing of apparently full-length begomoviral genomes and associated betasatellites and alphasatellites. Among the thirteen CLCuD-begomoviral genomes recovered, eight were identified as Cotton leaf curl Multan virus-Rajasthan (CLCuMuV-Ra), one as -Pakistan (PK) and another as -Faisalabad (Fai), whereas, three were as Cotton leaf curl Kokhran virus-Burewala (CLCuKoV-Bu), indicating that CLCuMuV-Ra was the most prevalent begomovirus species. Five of the eight CLCuMuV-Ra sequences were found to be recombinants. The CLCuMuV-Ra- associated satellites consisted of Cotton leaf curl Multan betasatellite (CLCuMB), and Gossypium darwinii symptomless alphasatellite (GDarSLA), and Croton yellow vein mosaic alphasatellite (CrYVMoA). The second most abundant helper virus species, CLCuKoV-Bu, was associated with CLCuMB and GDarSLA.


Introduction
Cotton (Gossypium hirsutum), the most important commercial crop, produces 27% of world's cotton occupying the largest production area of 11.9 million hectares (Mha) in India comprising 38% of total land devoted to cotton cultivation worldwide [1]. The cotton leaf curl disease (CLCuD) has become a major economic constraint to cotton production in the Haryana, a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 the disease scenario. Most of the irrigated cotton in NW India was infected by CLCuD-begomovirus showing about 97% incidence with 53.6% yield loss in some farmer's field [3,4]. Afterward, the CLCuD is increased year by year and recently emerged as a devastating disease in entire 1.1 Mha cotton growing areas of NW India.
CLCuD incidence was found to be quite high as 51. .8% in all the surveys made from the years of 2012 to 2014 in Punjab state (India), but in Haryana and Rajasthan states significant variations of 32.7 to 77.5% and 8.9 to 59.2% respectively were observed. These observations suggest the changes in CLCuD-begomovirus genetic composition in NW India. To explain the basis for the suspected changes in the etiology of CLCuD-begomoviruses and satellite molecules associated with the CLCuD outbreak, the infected cotton leaf samples collected from different areas were analyzed based on molecular approaches for determination of the specific begomoviruses, and their satellite molecules, those are most prevalent in NW India.

Estimation of disease incidence and collection of cotton plant and whitefly samples
A survey was made in the years of 2012, 2013, and 2014 to estimate CLCuD incidence in cotton growing areas of NW India. Fatehabad, Hisar, Rohtak and Sirsa districts of Haryana state; Bathinda, Faridkot, Fazilka and Mansa districts of Punjab state; and Hanumangarh and Sri Ganganagar districts of Rajasthan state were considered for the survey. Cotton plants exhibiting vein thickening, leaf curling, leaf cupping, and leaf enations were considered to be the symptoms of the CLCuD. The percent disease incidence (PDI) was estimated using a standard method based on the number of plants infected compared to the number of plants counted ranging from 100-200 randomly per field site multiplied by 100 with three replications. Leaves of symptomatic cotton plants were randomly collected, placed in labelled polythene bags, transported to the laboratory, and stored at 4˚C prior to the isolation of DNA. The adult whiteflies were collected from the CLCuD affected cotton plants with the help of an aspirator from three areas, (i) Agricultural Research Station (ARS), Swami Keshwanand Rajasthan Agricultural University (SKRAU), Sri Ganganagar, Rajasthan; (ii) Regional Station (RS), Central Institute for Cotton Research (CICR), Sirsa, Haryana; and (iii) Regional Research Station (RRS), Punjab Agricultural University (PAU), Bhathinda, Punjab (India). The whiteflies were placed in the collection tubes containing 95% ethanol and brought to the laboratory for molecular assay.

Source of whitefly for pathogenicity test
The whitefly used for virus transmission in this study was initially obtained from the eggs laid on bottle gourd (Lagenariasicerariacv. Pusa Naveen) leaves of the Experimental Farm, CICR, Sirsa, Haryana. The adult whiteflies emerged from the pupae were transferred on the healthy cotton (cv. RST-9) and bottle gourd (cv. Pusa Naveen), reared and multiplied in the insectproof chamber for 4-5 generations. A homogeneous population of whiteflies was used for the virus transmission studies. Health of whitefly populations was confirmed by PCR test using the primer pair, C3F (5'AATTATGTCGAAGCGAGCTG3') and G1R (5'TAATATCAATTC GTTACAGAG3') [36] targeting the complete CP gene (771 bp) of CLCuD-begomovirus genome in randomly sampled whiteflies reared on cotton.

Ethics statement
No permits are required for the collection of cotton plant and whitefly samples from the farmer's field in cotton growing areas of NW India surveyed.

Infectivity test
Infectivity of the CLCuD was tested through adult whitefly inoculation onto susceptible cotton plant in the greenhouse. The methods of the maintenance of healthy whitefly, and inoculation of the virus through whitefly, developed by Godara et al. [44] were used in the study. Infectivity of CLCuD affected cotton samples randomly collected from different areas of NW India was tested using susceptible cotton cv. RST-9. Tender symptomatic twigs of cotton samples were used as a source of virus. The healthy whiteflies were collected from the culture stock in a small collection cage and transferred to the symptomatic cotton twig for the acquisition access period (AAP) of 24h. After completion of AAP, the viruliferous whiteflies were transferred to the healthy test cotton plants for the inoculation acquisition period (IAP) of 24h. After AAP, the viruliferous whiteflies were collected using an aspirator (glass tube fitted with nylon pipe and muslin cloth). Cotton seedlings at two-leaf stage were used for inoculation. Eight to ten viruliferous whiteflies per healthy cotton plant were released and covered with small plastic cages which covered each plant individually. After IAP, the whiteflies were killed by spraying insecticide, Pyriproxifeen 10% EC (Lano) @ 0.2%. The whitefly inoculated plants were kept in an insect-proof greenhouse for six weeks and observed regularly for symptom development. The leaves of inoculated symptomatic and non-symptomatic plants were processed for PCR analysis for confirmation of the CLCuD-begomovirus infection.

Total DNA isolation from plant and whitefly samples
Total plant DNA was extracted from 100 mg of infected cotton leaf tissue using the Cetyltrimethylammonium bromide (CTAB) method [45]. The DNA was visualized by electrophoresis in a 1% agarose gel in 1X TAE buffer (pH 8.0) and quality was evaluated by the spectrometry using a Nanodrop (Thermo Fisher Scientific Inc, Waltham, USA). Leaves of two healthy cotton plants maintained in the insect-proof greenhouse were taken for isolation of total plant DNA and used as controls throughout the experiments.
For isolation of the total DNA from whitefly population, a pool of 10 numbers of adult whiteflies was taken from the collection tube and the ethanol was air-dried for 2-5 min on a piece of filter paper. Whiteflies were placed in 1.5 ml micro-centrifuge tube and the total DNA was extracted using Nucleo-pore DNA Sure Tissue Mini Kit (Genetix Biotech Asia Pvt. Ltd. India, Cat No. NP-61305) according to the manufacturer's instructions. The total DNA was eluted in 100 μL pre-warmed buffer BET (70˚C) in a 1.5 ml microcentrifuge tube.

Cloning of full-length genome of CLCuD-begomovirus, satellite molecules and mtCOI gene of whitefly
The full-length circular genomes of begomovirus and satellite molecules were amplified from the infected cotton plant samples through rolling circle amplification (RCA) using phi29 DNA polymerase [46]. The concatamers were incubated with 1U of the restriction enzymes Bam HI, Eco RI, Hind III, Sal I and Xba I separately to release the full-length sequence of begomovirus (~2.7 kb) and satellite molecule (1.3-1.4 kb). The digested RCA products were eluted from 1% agarose gel, column purified (Qiagen, Maryland, USA: Cat No. 28115), ligated to pUC18 (Thermo Fisher Scientific, MA, USA), and transformed into competent E. coli DH5α cell using the standard protocol [47]. The positive clones were selected from the transformation Petriplates, cultured and the plasmid DNA was purified (Real Biotech Corporation, Taipei, Taiwan, Cat No. QPD100), and sequenced in an ABI 3130 automatic sequencer (Chromous Biotech Pvt. Ltd., Bangalore, India) using vector-derived M13 forward and reverse primers. Additional primers were designed based on the Sanger sequencing results and used for primer walking to obtain the complete genome sequence of the begomovius and satellite molecules, respectively.
The total DNA extracted from whitefly was used as the template for amplification of the partial mitochondrial cytochrome oxidase I (mtCOI) gene of whitefly using allele specific primers; CI-J Forward: 5'TTGATTTTTTGGTCATCCAGAAGT3' and TL2 Reverse: 5'TCCA ATGCACTAATCTGCCATATTA3' [48,49]. The PCR amplicons were cloned in T&A cloning vector system (RBC, Taipei, Taiwan) and sequenced in an automatic sequencer (ABI 3130, Chromous Biotech Pvt. Ltd., Bangalore, India).

Sequence analysis and phylogenetic relationship
All the DNA sequences of begomoviuses, satellite molecules and mtCOI gene of whitefly were quality-checked against the electropherogram and manually edited. The vector sequences were removed using the Bioedit version 7.1.3 [50]. The coding regions of the virus and satellite DNAs were identified using the NCBI ORF finder (www.ncbi.nlm.nih.gov/orffinder/) and annotated in the context of the expected coding and non-coding regions. All the sequences were searched for similarities using BLASTn tools, and the respective top 10-15 sequence matches available in the GenBank (http://www.ncbi.nlm.nih.gov) were downloaded.
For sequence analysis of the present CLCuD-begomoviruses, full-length genome sequences of other CLCuD-begomoviruses available in NCBI-GenBank were retrieved [4,7,22,29,36] including the recognized CLCuD-begomoviruses, CLCuAlV, CLCuMuV and CLCuKoV [51]. One sequence from each of the Cotton leaf curl Gezira virus (CLCuGeV: AF260241) and Cotton leaf curl Bangalore virus (CLCuBaV: AY705380) were taken as outgroups. For sequence analysis of the present satellite molecules, other satellite molecules associated with CLCuD-begomovirus and other begomoviruses were retrieved from NCBI-GenBank. To ascertain the genetic group status of whitefly populations, the nucleotide sequences of the partial mtCOIgene of whitefly were analyzed and compared with reference sequences retrieved from the NCBI-GenBank.
For phylogenetic analysis, the sequences were aligned using MEGA, version 6.06 [52] implemented in Clustal W v1.6 [53]. The phylogenetic tree was re-constructed using Neighbor-Joining (NJ) [54] with 1000 bootstrap iterations. To determine the pairwise nucleotide identity, a pairwise matrix was computed using the Sequence Demarcation Tool (SDT) version 1.2 [55].

Recombination analysis
Recombination analysis was carried out using Recombination Detection Program (RDP) version 4.66 implementing seven algorithms BootScan, Chimera, Geneconv, Maxchi, RDP, SiScan and 3Seq. The default settings were used to establish the Bonferroni-corrected highest acceptable P-value cut-off of 0.05 to identify predicted recombinants [56]. Each event was verified based on a breakpoint distribution plot, and results were compared against the UPGMA phylogenetic trees produced with genetic regions from major and minor parents. The recombination events detected in begomovirus genome by three or more, and in betasatellite and alphasatellite genomes by two or more algorithms were considered to be true recombination events.

Changing Cotton Leaf Curl Disease (CLCuD) scenario in NW India
The surveys revealed a variation of CLCuD incidence in cotton growing areas of NW India from year to year and area to area (Fig 1). CLCuD incidence was estimated to 32.7, 51.3 and 28.6% in 2012; 77.5, 54.1 and 59.2% in 2013; and 49.6, 57.8 and 8.9% in 2014 in Haryana, Punjab, and Rajasthan states, respectively (S1 Table). Overall disease incidence was recorded to 37.5 in 2012, 63.6 in 2013 and 38.8% in 2014 in NW India. Interestingly, the disease was very high as 77.5, 54.1 and 59.2% in Haryana, Punjab, and Rajasthan, respectively in 2013. Interestingly, CLCuD incidence was observed constantly as high as 51.3-57.8% in all the three years in Punjab. In 2010, CLCuD was also reported very high ranging from 50 to 100% incidence in some districts of Punjab and Rajasthan, but sporadic (0-30%) in Fazilka districts of Punjab and Hanumangarh districts of Rajasthan [4]. The previous and the present data indicated that the etiology of CLCuD-begomovirus complex in NW India is changing from year to year and area to area.

Whitely cryptic species Asia II 1 is prevalent in NW India and it transmits CLCuD-begomovirus with high efficiency
In the present study, three sequences of the partial mtCOI gene (867 bp) of three whitefly samples, accession no.'s MN329161, MN329162 and MN329163 were analyzed. All the three sequences had maximum of 94-95% nt identity with the sequences of whitefly cryptic specie-sAsia II 1-[China: Zhejiang] (AJ867557) (S2 Table) and they clustered in one phylogenetic group along with Asia II 1 (S1 Fig). These data concluded that the present whitefly populations are whitefly cryptic species Asia II 1. Based on the mtCOI sequence analysis, Ellango et al. [57] reported earlier that whitefly cryptic species Asia II 1 is mostly occurred in the cotton growing areas of NW India. This cryptic species Asia II 1 is also reported to be the most abundant and associated with a high CLCuD outbreak in the cotton growing areas of Pakistan [58].
The present study demonstrated the occurrence of CLCuMuV-Ra, CLCuMuV-PK, CLCu-MuV-Fai, and CLCuKoV-Bu stains in NW India, where CLCuMuV-Ra is the most prevalent strain. Earlier, it has been reported that CLCuKoV-Bu strain was dominant population of CLCuD-begomovirus in Punjab and Rajasthan states of India during the years of 2009 to 2010 [4]. After that CLCuMuV has been reported to be associated with CLCuD outbreak in both the states during 2013-2015 [29,30]. Thus, the previous [29,30] and the present studies showed shifting of CLCuD-begomovirus from time to time in NW India, indicating return of CLCu-MuV-Ra in NW India, corroborating with the recent report of rebound of CLCuMuV for CLCuD outbreak in Pakistan [22].

Single betasatellite species CLCuMB is associated with CLCuD outbreak in NW India
Three betasatellite sequences with varying length (1282-1373 nt) were obtained from the CLCuD-begomovirus infected cotton samples. All the sequences had a~357nt βC1 gene, typical to βC1 gene of other betasatellite, in the complementary sense strand ( Table 3). The present betasatellites had 87-92% nt identity among themselves, and they were related to Cotton leaf curl Multan betasatellite (CLCuMB) by 83-95% nt identity. Based on species demarcation cutoff value at �78% nt identity proposed by Briddon et al. [60], the present betasatellites are the members of CLCuMB; reveals that occurrence of a single betasatellite species CLCuMB in association with CLCuD outbreak in NW India. Recently Zubair et al. [22] also identified three types of betasatellites, CLCuMB Bur , CLCuMB Veh and CLCuMB Mul in association with CLCuD-begomovirus in Pakistan. In the phylogenetic analysis all the three present betasatellites made one group along with all the other CLCuMB; but upon closer inspection this group was also divided into two subgroups; subgroup-I (SG-I) and -II (SG-II). The SG-I consisted of two betasatellites KT228325 and KT228326, and they showed 93-94% nt identity with both the betasatellites CLCuMB Bur (CLCuMB-PK:Veh:MZ-35:16:KX697600) and CLCuMB Veh (CLCuMB-PK:Veh:MZ-37:16: KX697602). The SG-II consisted of one betasatellite KM103522 and it showed 92% nt identity with CLCuMB Mul (CLCuMB-PK:Veh:MZ-33:16: KX697598) (Fig 3a).

Alphasatellite species GDarSLA is predominantly associated with CLCuDbegomovirus complex in NW India
Six alphasatellite sequences with varying length of 1353-1386 nt were obtained from the CLCuD-begomovirus infected cotton samples. All the sequences had a typical Rep gene of 948 nt in the virion-sense strand (Table 3). Sequence analysis showed that the present alphasatellite sequences had 47-95% nt identity among themselves. Five alphasatellite sequences, KM103523, KM103525, KM103526, KT228319 and KT228320 were related to alphasatellite GDarSLA-[IN-Pun-C35-15](MF929023) by 79-98% nt identity. The alphasatellite sequence, KM103524 was related to Croton yellow vein mosaic alphasatellite (CrYVMoA:KC577541) by 95% nt identity. Considering the classification of the family Alphasatellitidae using species demarcation cut-off value at �89% nt identity proposed by Briddon et al. [31], the present alphasatellite was placed under two alphasatellite species, GDarSLA and CrYVMoA.  Corroborating with the pairwise nt identity result, five present alphasatellite sequences were phylogenetically affiliated to GDarSLA and one was to CrYVMoA (Fig 3b). Thus, the present study reveals occurrence of two alphasatellite species GDarSLA and CrYVMA in CLCuD complex in NW India, where GDarSLA is predominant. Earlier, occurrence of seven alphasatellites, GDarSLA, Cotton leaf curl burewala alphasatellite (CLCuBuA), CLCuMuA, Okra leaf curl alphasatellite (OLCuA), Tomato leaf curl alphasatellite (ToLCA), Ageratum yellow vein  India alphasatellite (AYVIA) and Gaur leaf curl alphasatellite (GLCuA) have been reported in cotton growing areas of NW India [7,29,30]. However, of them, except GDarSLA and CLCu-MuA, others were not considered as species in the recent classification given by Briddon et al. [31]. Therefore, it needs proper classification system for CLCuD associated alphasatellites in order to eliminate the taxonomic ambiguity.

Recombination is the common phenomenon for evolution of CLCuDbegomovirus variants and CLCuMuV-Ra strain is highly recombinant
The present CLCuD-begomovirus DNA-A sequences showed clear recombination events. Of 13 present sequences nine were detected as recombinants involving 12 recombination events. Two patterns of recombination were observed, recombination involving (i) coding and IR regions for CLCuMuV-Ra and -Fai strains, and (ii) IR region for CLCuKoV-Bu strain ( Table 4; Fig 4). But recombination was not detected for CLCuMuV-PK strain. Inter-species recombination events were common in all the recombinant sequences, as involvement of inter-species donor sequences, CLCuMuV x CLCuBaV/CLCuKoV for CLCuMuV-Ra, CLCuMuV x CLCuAlV/ CLCuKoV for CLCuMuV-Fai, and ToLCuNDV/Croton yellow vein mosaic virus (CrYVMV)/ CLCuKoV x CLCuBaV/CLCuMuV for CLCuKoV-Bu strains were detected. The CLCuMuV-Ra strain was detected as highly recombinant and it has evolved due to inter-species recombination involving multiple begomoviruses, CLCuMuV, CLCuKoV, and CLCuBaV as donor sequences.
On the other hand, although CLCuMuV-Ra strain reported from Pakistan is a recombinant but it has evolved due to inter-species recombination involving donor sequences, CLCuMuV and CLCuKoV [22]. Therefore, CLCuMuV-Ra strain of India is distinct from CLCuMuV-Ra of Pakistan. In the present study, they were also separately placed in the phylogenetic tree (Fig 2). The betasatellite and most of the alphasatellites associated with CLCuD-begomovirus are recombinants All the present betasatellites were found to be recombinants showing five clear recombination events (Table 5). Betasatellite sequences KT228325 and KM103522 showed recombination in βC1 gene and KT228326 in SCR. The SCR region has been considered as a hotspot for recombination in CLCuMB [22]. However, in this study most of the recombination in the present betasatellites was detected in the βC1 gene indicating the coding region is also a hotspot for the recombination of CLCuMB. Of six present alphasatellites tested, four were recombinants involving the Rep gene and the A-rich regions for recombination (Table 5). Interestingly, all the present recombinant alphasatellites are GDarSLA. Alphasatellite sequences KM103526 and KT228319 showed recombination in both the Rep gene and A-rich region, and KM103523 in Rep, and KM103525 in A-rich region.
The evolution of recombinant betasatellite has been reported to be associated with resistance breaking in cotton against CLCuD [37,38]. Recently, recombinant CLCuMB Bur in association with CLCuKoV-Bu strain has been reported to cause resistance breaking in cotton varieties in Pakistan [61]. Thus, the present study indicated that recombinant CLCuMB may be the cause of the increased CLCuD outbreak in NW India.

CLCuKoV-Bu strain of NW India contains truncated and extended ORFs
In closer inspection, a significant sequence variation was observed in C2, C4, and C5 ORFs among the present CLCuMuV strains (-Ra, -Fai, -Pak) and CLCuKoV-Bu strain. The ORF C2 of all the present CLCuMuV strains encodes full length TrAP of 150 amino acid (aa) (wild type), whereas all the CLCuKoV-Bu strains encode a truncated TrAP of 69 aa (mutant) ( Table 1). The ORF C4 of the present CLCuMuV strains encodes full length protein of 100 aa (wild type), whereas all the present CLCuKoV-Bu strain encode an extended protein of 146 aa (mutant) ( Table 1). The C5 ORF of the present CLCuMuV strains encode full length protein of 243 aa (wild type), whereas all the CLCuKoV-Bu strains encode a truncated protein of 174 aa (mutant) ( Table 1).  TrAP encoded by C2 ORF is a multi-functional protein and it functions as a pathogenicity determinant [62] and acts as suppressor of PTGS [63]. CLCuKoV-Bu strain which was dominant in Pakistan during the year 2000 and onward encoded truncated TrAP of 35 aa [38,43]. The TrAP of the present CLCuKoV-Bu strain is truncated (69 aa) but larger in length than that of CLCuKoV-Bu of Pakistan, indicating differences in suppressor activities between CLCuKoV-Bu of Pakistan and India. The truncated TrAP of CLCuKoV-Bu has been reported to overcome resistance in cotton [38,43,64]. Earlier, CLCuMuV has been shown to encode 100 aa C4 proteins, whereas CLCuKoV-Bu to encode an extended protein of 181 aa [4]. Thus, the previous [4] and the present study show that the C4 ORF of CLCuKoV-Bu has periodically undergone significant changes with respect to amino acid encoded by ORF C4. Earlier, majorities of CLCuKoV-Bu and some of CLCuMuV-Ra strains of NW India showed the presence of C5 ORF in their genome, but this ORF is not reported in the genome of the cotton-infecting begomoviruses in Pakistan [4,7].

Conclusions
The present study reveals that CLCuD is a major constraint in the cultivation of cotton and the disease outbreak has been changing constantly in year to year and area to area in NW India. The CLCuMuV-Ra strain is identified as the most predominant begomovirus, which is evolved from inter-species recombination. To a much lesser extent, the CLCuKoV-Bu, CLCuMuV-PK, CLCuMuV-Fai were identified from symptomatic cotton plants of NW India. Single betasatellite species CLCuMB, and more numbers of alphasatellite species, GDarSLA and CrYVMoA are associated with the CLCuD complex. Therefore, it is concluded that the recombinant CLCuD-begomovirus, particularly recombinant CLCuMuV-Ra, in association with recombinant CLCuMB betasatellite are the important causes for outbreak of CLCuD in NW India in present condition.