Genome skimming approach reveals the gene arrangements in the chloroplast genomes of the highly endangered Crocus L. species: Crocus istanbulensis (B.Mathew) Rukšāns

Crocus istanbulensis (B.Mathew) Rukšāns is one of the most endangered Crocus species in the world and has an extremely limited distribution range in Istanbul. Our recent field work indicates that no more than one hundred individuals remain in the wild. In the present study, we used genome skimming to determine the complete chloroplast (cp) genome sequences of six C. istanbulensis individuals collected from the locus classicus. The cp genome of C. istanbulensis is 151,199 base pairs (bp) long, with a large single-copy (LSC) region (81,197 bp), a small single-copy (SSC) region (17,524 bp) and two inverted repeat (IR) regions of 26,239 bp each. The cp genome contains 132 genes, of which 86 are protein-coding genes (PCGs), 8 are rRNA genes and 38 are tRNA genes. Most of the repeats are found in intergenic spacers of Crocus species. Mononucleotide repeats were most abundant, accounting for over 80% of total repeats. The cp genome contained four palindrome repeats and one forward repeat. Comparative analyses with other Iridaceae species identified one inversion in the terminal positions of the LSC region and three different gene (psbA, rps3 and rpl22) arrangements in C. istanbulensis that were not reported previously. To measure selective pressure in the exons of chloroplast coding sequences, we performed a sequence analysis of plastome-encoded genes. A total of seven genes (accD, rpoC2, psbK, rps12, ccsA, clpP and ycf2) were detected under positive selection in the cp genome. Alignment-free sequence comparison showed an extremely low sequence diversity across naturally occurring C. istanbulensis specimens. All six sequenced individuals shared the same cp haplotype. In summary, this study will aid further research on the molecular evolution and development of ex situ conservation strategies of C. istanbulensis.

Introduction
Crocus is one of the largest genera of the family Iridaceae and consists of more than 200 species occurring from Western Europe and Northwestern Africa to Western China with the largest diversity in the Balkan Peninsula and Turkey [1,2]. At present, the genus is represented in Turkey by 134 species, of which 117 are endemic, making it a biodiversity hotspot important for the conservation of Crocus species [1][2][3][4]. Some species of Crocus are economically important and have been used in the production of dye and perfume as well as in medicine. Despite their ecological and economic significance, most Crocus taxa are highly endangered because of anthropogenic activities such as mining, road construction, overgrazing, hydroelectric power stations, wind power stations and city expansion. The genus is characterized by slender grasslike leaves; white, yellow, blue, lilac or purple flowers; and corms with tunics. Since many of the diagnostic characters of this genus are relatively difficult to detect (such as characteristics of the underground corm and tunic, and the color and surface features of rarely collected seeds), integrative approaches including morphological and genetic analysis are now the preferred method for elucidating taxonomic ambiguities and phylogenetic questions [5].
Crocus istanbulensis (B.Mathew) Rukšāns was described by Mathew [1] as a subspecies of its relative C. olivieri J.Gay. Rukšāns [2] raised this taxon to the species level based on results by Erol & Küçüker [6]. C. istanbulensis is one of the most endangered Crocus species in the world, having not been observed anywhere except in Istanbul. Its habitat is surrounded by highways, new human settlements and other anthropogenic activities resulting in soil alteration and destabilization. In particular, controversial forestation activities are a major factor in preventing the continued reproduction of C. istanbulensis because they destroy the soil and maquis vegetation of its habitat. During our last field trip to the locus classicus in the winter of 2019, we found a total of only 25 individuals, and it is estimated that no more than 100 individuals remain in the wild. The need to protect this plant is urgent, and in situ and ex situ studies should start simultaneously to this end. To our knowledge, no genetic characterization studies have previously been carried out on C. istanbulensis, and filling this knowledge gap was the primary motivation for this study. Analysing chloroplast genomes serves as a good starting point for the genetic characterization of this highly endangered species, as chloroplast genome sequences have been used extensively in plant molecular phylogenetics, population genetics and conservation genetics studies due to their slower rate of evolution compared with nuclear genomes, maternal inheritance and lower rate of recombination [7,8]. Therefore, whole chloroplast genome sequences can provide a wealth of genetic information and are useful molecular markers for efficient conservation and management strategies [9][10][11].
Typically, the chloroplast genome maintains a conserved circular and quadripartite structure, with a pair of inverted repeat regions that are located between large single copy (LSC) and small single copy (SSC) regions, harbouring about 110-130 genes, with about 80 protein-coding genes, 4 rRNAs and 30 tRNAs.
Genome skimming is a rapid and cost-effective strategy for recovering plastid and mitochondrial genomes using next-generation sequencing technology [12,13]. In this study, we sequenced the chloroplast genomes of six specimens of C. istanbulensis using DNA nanoball and combinatorial probe anchor synthesis on the BGI-Seq 500 platform. Our main objectives were to: (i) obtain information regarding the sequence and structural characterization of C. istanbulensis cpDNA, (ii) test whether complete chloroplast genomes in C. istanbulensis demonstrate structural rearrangements compared with other Iridaceae taxa and (iii) detect whether the genes underwent positive selection.

Plant sampling and total DNA extraction
Specimens were collected in January 2019 from Taşdelen state forest in the Çekmeköy district in Istanbul, Turkey. Permission for collecting specimens was granted by the Republic of Turkey Ministry of Agriculture and Forestry (No:53231444-100.05-4722). Due to the extremely low number of individuals and the limited distribution area of about 4000 m², only leaves of eight plant specimens were collected for total DNA isolation; the corms were not dug up or disturbed. Since the meristematic elongation zone of Crocus leaves is located at the leaf base, the leaves continued to grow and develop afterwards. Sampling was done in a way that would cause the least possible damage to the plant. The leaves were immediately frozen in liquid nitrogen and stored at −80 °C until DNA extraction. Approximately 750 mg of freshly frozen leaves were used for DNA extraction according to Healey [14]. The DNA concentration of each sample was measured using the Qubit dsDNA HS Assay Kit (Life Technologies). DNA purity was assessed by measuring the A260/280 absorbance ratio using a Nanodrop ND-2000c spectrophotometer (Nanodrop Technologies), and agarose gel electrophoresis was used to confirm high-molecular-weight DNA integrity. Only the six DNA samples that had an A260/280 value between 1.7 and 1.9 and a concentration of >200 ng/μl (in a total volume of ~40 μl) were selected for library preparation and sequencing.

DNA sequencing
Prior to library construction, the six qualified DNA samples were fragmented into 150-250 bp fragments using Covaris technology; fragment size distributions were then checked using the QIAxcel Advanced System (Qiagen) and quantified using the Qubit dsDNA HS Assay Kit (Life Technologies). End-repair of DNA fragments, addition of an adenine residue to the 3′ fragment ends, adaptor ligation, and rolling circle amplification (RCA) were performed according to the MGIEasy FS DNA Library Prep Set. The DNA nanoballs (DNBs) were loaded onto a sequencing flow cell and then processed for 101 bp paired-end sequencing on the BGI-SEQ-500 platform. The raw image files obtained from the sequencing were processed using BGISEQ-500 basecalling software and the raw sequence data were saved in ".fastq" format. The raw fastq files were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under Bioproject number PRJNA599306.

Comparative chloroplast genome analysis in Iridaceae
To infer evolutionary events such as sequence divergence, gene order rearrangements, and the expansion and contraction of the inverted repeats in Iridaceae, we used the online webtool IRscope [22] to compare the complete cpDNA of C. istanbulensis with C. sativus L., C. cartwrightianus Herb., Iris missouriensis Nutt., Iris sanguinea Donn ex Hornem., Iris gatesii Foster and Geosiris australiensis B.Gray & Y.W.Low. Using the IRscope tool, we found and visualized the structural organization of junction sites connecting the two inverted repeats (IRs) to the large single-copy (LSC) and small single-copy (SSC) regions within Iridaceae [22]. We used the geneCo [23] software for the construction of a genome map and genome map comparison between Crocus species. To measure genetic distance and divergence between six C. istanbulensis individuals and other Iridaceae species, we applied an alignment-free, k-mer-based approach using the genomic distance estimation feature of Skmer v3.2.1 [24].

Positive selection analysis of PCGs in Iridaceae
For the accurate detection of site-specific positive selection in the protein-coding sequences of Iridaceae, we employed "PoSeiDon" [25], a scalable and reproducible Nextflow workflow designed for positive selection analysis, with default parameters. Briefly, the orthologous protein-coding sequences of seven Iridaceae species were manually extracted from GenBank files (".gbk") and validated using SwiftOrtho [26]. Following in-frame alignment, indel correction and the calculation of a phylogenetic tree, the best-fitting nucleotide substitution model was selected using MODELTEST. Then, positively selected sites (ω>1) were tested under the model comparisons M1a vs. M2a and M7 vs. M8 within the PAML suite (v4.9) and M8a vs. M8 by Swanson et al. (2003) [27], using three independent codon models (F1X4, F3X4 and F61). After this calculation, we used a Bayes empirical Bayes (BEB) approach [28] to calculate the posterior probability (PP) of a codon coming from a site class of ω>1. Genes were considered to be positively selected if positively selected sites (ω>1) were assigned a PP > 0.95.
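The nested-model comparisons described above rest on a likelihood ratio test. The following is a minimal sketch of that test, not the PoSeiDon implementation: the function names are hypothetical, the log-likelihood values in the usage note are invented for illustration, and the degrees of freedom (2 for M1a vs. M2a and M7 vs. M8, 1 for M8a vs. M8) follow the convention commonly used with PAML site models. The 0.01 significance threshold is the one stated for this study.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variate (closed forms for df 1 and 2)."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a null model nested in an alternative."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)

def supports_selection(lnl_null, lnl_alt, df, alpha=0.01):
    """True if the alternative (selection) model fits significantly better."""
    _, p_value = likelihood_ratio_test(lnl_null, lnl_alt, df)
    return p_value < alpha
```

For example, with invented log-likelihoods of -1000.0 (M7) and -990.0 (M8), `likelihood_ratio_test(-1000.0, -990.0, df=2)` gives a statistic of 20.0 and a p-value well below 0.01, so the selection model would be preferred; sites would then still need a BEB posterior probability above 0.95 to be reported.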

Chloroplast genome assembly and annotation
After trimming of adaptor sequences and low-quality sequences, a total of 114.1 million clean reads comprising 11.41 gigabases (Gb) were generated from C. istanbulensis specimens. On average, 1.90 Gb were generated per individual, with a mean sequencing depth of 532X (S1 Table). The sequence of the chloroplast genome was deposited in GenBank under accession number MN254968. The percentage of reads covering the chloroplast genome was between 8.44% (~94 million bases) and 8.56% (~73 million bases), the average being 8.47% (~81 million bases) (S1 Table). The entire chloroplast genome of C. istanbulensis consisted of 151,199 bp, divided into four regions: an LSC region of 81,197 bp and an SSC region of 17,524 bp, separated by two inverted repeat (IR) regions of 26,239 bp each. These lengths were found to be consistent with previous studies [29]. Previous cp genome studies suggest that angiosperm cp genomes are highly conserved, typically about 115-165 kb in size, with a quadripartite structure comprising two IR regions (IRa and IRb), an LSC region and an SSC region [30]. The overall GC content of the C. istanbulensis cp genome was 37.6%. Among the LSC, SSC and inverted repeat regions, the highest GC content was found in the IR regions (42.75%), and the GC contents of the LSC and SSC regions were 35.69% and 30.97%, respectively. The IR region had an overall higher GC content due to the presence of more rRNA and tRNA genes, which have high GC content (Table 1). This result was compatible with previous findings on the complete cpDNA of Crocus and Iris species [31][32][33]. Through gene annotation, we found that the cp genomes encode 132 genes, including 86 protein-coding genes (PCGs), 8 rRNA genes and 38 tRNA genes (Fig 1, Table 1).
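As a consistency check on the figures reported above, the region lengths should sum to the total genome length, and the length-weighted mean of the regional GC contents should reproduce the overall GC content. The short sketch below does this arithmetic using only values given in the text; the variable names are illustrative.

```python
# Region lengths (bp) and GC contents (%) as reported for C. istanbulensis.
REGION_BP = {"LSC": 81_197, "SSC": 17_524, "IRa": 26_239, "IRb": 26_239}
REGION_GC = {"LSC": 35.69, "SSC": 30.97, "IRa": 42.75, "IRb": 42.75}

# Total genome length is the sum of the four regions.
total_bp = sum(REGION_BP.values())

# Overall GC content is the length-weighted mean of the regional GC contents.
overall_gc = sum(REGION_BP[r] * REGION_GC[r] for r in REGION_BP) / total_bp

print(f"total length: {total_bp} bp")
print(f"overall GC:   {overall_gc:.2f} %")
```

The sum equals the reported 151,199 bp, and the weighted GC content rounds to the reported 37.6%.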
The LSC region includes 62 protein-coding and 21 tRNA genes, while the SSC region includes 12 protein-coding genes and 1 tRNA gene. The IRa and IRb regions each include 6 protein-coding genes, 8 tRNA genes, and 4 rRNA genes (S2 Table). In other words, 6 protein-coding genes, 8 tRNA genes, and 4 rRNAs were duplicated in the IR regions. As expected, cp genes are functionally classified into four categories (Table 2), of which the photosynthetic pathway contains the most PCGs. All but 9 of the PCGs lacked introns; of these 9, 5 (atpF, ndhA, rps16, rpoC1 and clpP) contain 1 intron, while 4 (rps12, ndhB, ycf3 and rpl2) contain 2 introns (Table 2). As in a previous study, 3 genes (rps12, clpP, and ycf3) were found to possess 2 introns [29]. Moreover, rps12 was found to be a trans-spliced gene [34]. The longest intron, with a length of 2,639 bp, was that of trnK-UUU, which contains the matK gene (Fig 1). The matK coding sequence (CDS) and many other regions have been tested for species identification and phylogeny reconstruction [35,36]. The non-coding sequence trnH (GUG)-psbA was found to be variable and thus useful for phylogeny, and it has better resolution potential than matK and rbcL [36]. Such variable regions have the potential for Crocus species delimitation or phylogeny studies in future work.
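The per-region gene counts given above can be reconciled with the genome-wide totals by counting each IR gene twice, once per repeat copy. The tally below is a small sketch of that bookkeeping using only the counts reported in the text; the dictionary layout and function name are illustrative.

```python
# Gene counts per region as reported; the "IR" entry is for a single repeat copy.
GENES_PER_REGION = {
    "LSC": {"PCG": 62, "tRNA": 21, "rRNA": 0},
    "SSC": {"PCG": 12, "tRNA": 1, "rRNA": 0},
    "IR": {"PCG": 6, "tRNA": 8, "rRNA": 4},
}
# The IR occurs twice (IRa and IRb), so its genes are counted twice.
COPIES = {"LSC": 1, "SSC": 1, "IR": 2}

def tally_genes(per_region, copies):
    """Sum gene counts by type across regions, weighting by region copy number."""
    totals = {}
    for region, counts in per_region.items():
        for gene_type, n in counts.items():
            totals[gene_type] = totals.get(gene_type, 0) + n * copies[region]
    return totals

totals = tally_genes(GENES_PER_REGION, COPIES)
print(totals, sum(totals.values()))
```

The totals come to 86 PCGs, 38 tRNA genes and 8 rRNA genes, or 132 genes overall, matching the annotation summary.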

Junction characteristics, IR expansion, and contraction
Although the chloroplast sequences of flowering plants generally conserve a typical quadripartite structure, rearrangements or contractions/expansions of inverted repeats and single copy regions can lead to changes in genome size and allow certain genes to enter the inverted region (IR) or single copy region (SCR). Accordingly, the contraction and expansion of the two IR regions can be thought of as an indicator of chloroplast genome evolution, especially between closely related genera [37,38]. We compared the inverted repeats and single copy regions boundaries of the seven Iridaceae chloroplast genomes (C. istanbulensis, C. cartwrightianus, C. sativus, I. missouriensis, I. sanguinea, I. gatesii and G. australiensis) (Fig 2). Although the IR boundary regions varied slightly, they all generally fit the quadripartite structure pattern. Moreover, we observed no significant change in contraction and expansion of inverted repeats (IRs), except for in G. australiensis, whose LSC and SSC regions were contracted and IRb/a regions were expanded nearly 1.5 fold. In general, most size changes in the cp genomes of angiosperms can be explained by rare deletions and duplications that result in massive changes in the size of the IR region [39]. A notable difference was found in psbA, rps3 and rpl22 gene arrangements among Crocus species, indicating an inversion or reversal of gene order in LSC region terminal positions (Fig 2). To obtain more precise information about cp genome arrangements, a genome map comparison analysis was carried out with a GenBank annotation file (".gbk") of Crocus species. Comparison analysis clearly indicates an inversion at the junction site of the LSC region (Fig 3).

PLOS ONE
Moreover, as can be seen in S1 Fig, the rps19 and psbA genes are located in the flanking region of the LSC/IRb junction and the rpl22 gene is located in the LSC terminal region close to IRa in C. istanbulensis. In C. cartwrightianus and C. sativus, rps19 and psbA are located in the flanking region of the LSC/IRa boundary and rpl22 is located in the LSC terminal region close to IRb (S1 Fig). One other intriguing observation is that the ycf1 gene (5420 bp) in C. cartwrightianus and C. sativus spans the SSC/IRa boundary and extends upstream and downstream by 4166 bp and 1255 bp, respectively. However, the ycf1 gene of C. istanbulensis is located within the SSC region and separated from the SSC border by 74 bp (Fig 2). Expansion and contraction of IRs in the chloroplast genome (cpDNA) of most angiosperms have been proposed as evolutionary markers for illuminating relationships between some plant taxa [40,41]. IRs are also potential evidence of a duplication event prior to the separation of monocot lineages from basal angiosperms [42]. The absence of IRs in some plant groups, particularly legumes [43], and their reduction to 495 bp in Pinus thunbergii Parl. [44] suggest that these IRs are not required for chloroplast function. However, it is also thought that IRs are essential for the constant and stable nature of chloroplast genomes.

In particular, structural rearrangements such as inversions, IR expansions and gene duplication directly govern the structural organization and size of the chloroplast genome. Although the mechanisms leading to rearrangements in chloroplast genomes are poorly known, intramolecular homologous recombination governed by the presence of repeat structures at the boundaries of the rearranged region reportedly plays a role in such structural changes [45,46]. As indicated in Figs 2 and 3, the C. istanbulensis cp genome contains an inversion in the terminal position of the LSC region and a rearrangement of the psbA, rpl22 and rps3 gene order. It is noteworthy that this kind of arrangement has not previously been reported in Iridaceae cpDNAs. These results bring new insights into the evolution of the cp genome in the genus Crocus, suggesting a need for further studies to understand how the ecological drivers, morphological traits and physiological functions of C. istanbulensis may relate to such rearrangements.
Recent studies also showed that two chloroplast structural haplotypes (inverted and canonical haplotypes) can occur in most land plants. Long-read sequencing approaches such as PacBio or Oxford Nanopore may be helpful in determining the haplotype structure [47]. Although this study found only inverted haplotypes, third-generation sequencing may reveal the presence of a canonical haplotype in C. istanbulensis.


Repetitive sequences analysis
SSRs, which result from slipped-strand mispairing during DNA replication, are commonly detected in organelle genomes and have been shown to have significant potential in plant population genetics and crop breeding studies [48]. In the current study, the online version of REPuter software was used to analyze forward, palindrome, reverse and complement repeat sequences of the Iridaceae cp genomes, with a minimum repeat size of 30 bp and a sequence identity greater than 90%. An average of eight repeats with lengths of nearly 41 bp were observed in Iridaceae species. C. istanbulensis contained four palindrome repeats and one forward repeat (S3A Table). Other Iridaceae species (I. missouriensis, I. gatesii and G. australiensis) seem to have more repeat sequences in terms of both number and size, except for I. sanguinea (S3D-S3G Table). Many repeats shared the same locus in Iridaceae: the ycf1, ycf2 and accD genes and the petN-psbM, psaC-ndhE, ndhD-psaC, psbA-rps19, psbM-petN and rps16-trnQ-UUG intergenic spacers (S3 Table). According to previous studies, cp-SSR regions show variable profiles generally without recombination, are uniparentally inherited and effectively haploid, and are used for genetic studies of plant populations [50,51]. In the current study, most of the repeats were found in the intergenic spacers of Crocus species, which corroborates previous plant genome studies [52,53]. As for SSR number and motif distribution, SSRs occupied between 0.26% and 0.49% of the total cp genome, with an average of 0.39% (Table 3). Regardless of species, mononucleotide repeats were most abundant and accounted for over 80% of total repeats, consisting mostly of A/T mononucleotide motifs (Table 3, Fig 4). Only a minor fraction consisted of dinucleotide, trinucleotide, and hexanucleotide repeat motifs. Among dinucleotides, the number of repeats ranged from two (I. sanguinea, C. cartwrightianus) to eight (G. australiensis). One trinucleotide repeat (CTT, GAA) was detected in I. sanguinea, I. gatesii and G. australiensis. Tetra- and pentanucleotide repeats were not found in any Iridaceae species, and hexanucleotide repeats were present only in the C. cartwrightianus and C. sativus cp genomes (Table 3, Fig 4).
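To illustrate how SSR motifs of the kind summarized above can be located in a sequence, the following is a minimal regular-expression sketch. It is not the tool used in this study: the minimum repeat counts in `MIN_REPEATS` are assumed, commonly used defaults, because the thresholds applied here are not stated in the text, and the function names are hypothetical. Motifs that are themselves repetitions of a shorter unit (for example "AA" or "ATAT") are skipped so that each SSR is reported under its shortest motif.

```python
import re

# Assumed minimum repeat counts per motif length (mono- to hexanucleotide).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples with 0-based starts."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A motif of length k followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)

def mono_fraction(hits):
    """Fraction of detected SSRs that are mononucleotide repeats."""
    return sum(1 for _, motif, _ in hits if len(motif) == 1) / len(hits) if hits else 0.0
```

On a toy sequence such as `"GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CCG"`, `find_ssrs` reports one mononucleotide (A) and one dinucleotide (AT) repeat. Because matches are scanned left to right, this simple version can miss an SSR that overlaps a skipped non-primitive match; dedicated tools handle such cases more carefully.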

Identification of positive selection genetic signatures in cp coding genes of C. istanbulensis
To gain additional insight into potential changes in selection pressure in the exons of chloroplast coding sequences over the course of evolution of C. istanbulensis, we compared these genes across the six publicly available Iridaceae species. Here, we applied site-specific models in the PoSeiDon pipeline, using likelihood ratio tests (LRTs; threshold value p < 0.01) for three model comparisons (M1a vs. M2a, M7 vs. M8 and M8a vs. M8) and a posterior probability cutoff of PP > 0.95. Currently, the signature of selection pressure (or evolutionary rate ω) can be detected by comparing the rates of non-synonymous (dN) and synonymous (dS) substitutions in an alignment of orthologous sequences. The ratio is often used to assess the strength and direction of natural selection acting on protein-coding genes in nuclear and organelle genomes [54][55][56]. This approach is generally used to demonstrate whether there are any positive selection pressures in organelle-coding genes. However, this approach does not take possible recombination events into account [25]. Although it is commonly stated that recombination events do not occur in chloroplast genomes, accumulating evidence of recombination events shows that chloroplast genomes do have the potential to alter their genome structure via recombination [50][57][58][59]. Therefore, we used the PoSeiDon pipeline, a new approach that takes recombination events into account (Table 4). Caseinolytic protease (CLP) and acetyl-CoA carboxylase (ACCase) are two enzymes required for proper plastid function and fatty acid biosynthesis. The plastid-encoded clpP and accD genes encode subunits of the CLP complex and ACCase, respectively [60][61][62]. Although clpP and accD are generally well conserved, recent findings indicate that the plastid-encoded versions of these genes have elevated rates of sequence evolution in multiple independent lineages [54,63,64]. In this study, we found signatures of intense positive selection acting on the plastid-encoded accD and clpP genes, which have effects on leaf longevity and seed yield, and are essential for plant cell viability, respectively [54,65]. Zeng et al. [66] attributed the positive selection in clpP genes to plant acclimation to different physiological conditions and reported that the high degree of positive selection observed in clpP may be important in adapting Rehmannia species to habitats with different light intensities. We also found positive selection on the photosystem II (PSII) reaction center protein K (psbK) gene, which encodes one of the components of the core complex of PSII, which functions in both light-harvesting and inducing the oxidation of water to dioxygen [67,68]. Because psbK is directly involved in PSII, the positive selection observed in the psbK gene of various plants such as Echinacanthus Nees. [69], Robinia L. [70], Debregeasia Gaudich [71], Monsteroideae (Araceae) [72] and Garcinia paucinervis Chun & F.C.How [73] is important for plant adaptation to harsh environmental conditions. A significant positive selection signature was also detected in the ccsA gene, which encodes a component of the cytochrome c synthase complex for cytochrome c biogenesis [74] and has been reported to play a role in the adaptation of species to environmental conditions [75][76][77].
Interestingly, we also identified 3 genes with positive selection sites (rpoC2, ycf2 and rps12). The rpoC2 gene encodes a subunit of the plastid-encoded plastid RNA polymerase, which is responsible for photosynthetic gene expression. In other words, it allows for transcription of photosynthesis-related genes in the chloroplast. These plastid-encoded genes are also considered relatively rapidly evolving regions [78]. The ycf2 gene is one of the largest chloroplast genes and encodes a putative membrane protein. There is accumulating evidence suggesting that these two genes may have rapidly evolved in various plant cp genomes and enhance adaptation to diverse environments, possibly as a result of altered transcription [55,76][79][80][81][82][83]. Apparent positive selection signatures were found in seven genes (accD, rpoC2, psbK, rps12, ccsA, clpP and ycf2) in the C. istanbulensis chloroplast genome. Previous studies indicated that many of these putatively positively selected genes were associated with plastid function, fatty acid biosynthesis, leaf longevity, seed yield, cell viability, adaptation to challenging environmental conditions and photosynthesis. Although the function of the seven positively selected genes in C. istanbulensis remains unknown and requires further experimental validation, we speculate that they might be involved in biological processes including photosynthesis, environmental stress response, and plant development and growth.

Estimating sequence distances between C. istanbulensis specimens
We used Skmer [24] software to infer evolutionary distances between DNA sequences by calculating the dissimilarity between high-throughput sequencing reads of C. istanbulensis. Skmer, a relatively new approach, uses the minhash Jaccard similarity between sets of k-mers in sequences to estimate average nucleotide divergence among samples. Skmer-like approaches are preferred in genome skimming studies [84][85][86] because they can be applied to unassembled or assembled reads and can deal with low sequencing coverage. We processed unassembled fastq files of C. istanbulensis as input for assembly-free sequence distance estimation from low-coverage genome skims using Skmer. After generating a reference library and computing all pairwise distances, we queried the unassembled reads of C. istanbulensis against the reference library, producing a list of samples sorted by their distance to the query. The DNA sequence similarity among individuals of C. istanbulensis was found to be high based on k-mer analysis of genome skims (Fig 5). Fig 5 shows the homogeneous distribution of sequence similarities among C. istanbulensis individuals, indicating that the average nucleotide diversity is low, as expected (Fig 5A). We compared the unassembled reads of all C. istanbulensis individuals with the whole chloroplast genomes of other Iridaceae species (C. istanbulensis, C. cartwrightianus, C. sativus, I. missouriensis, I. sanguinea, I. gatesii and G. australiensis) using the same approach. As expected, there is a relatively high sequence diversity among Iridaceae species, while a low sequence diversity was noted among the genome skim data of C. istanbulensis individuals (Fig 5B). Crocus species can reproduce by seed as well as vegetatively, spreading rapidly by forming small cormlets, or stolons as in C. thirkeanus K.Koch and C. kotschyanus K.Koch. Vegetative reproduction usually takes place when the plant is under physiological stress.
Stressors such as unfavorable corm depth, injury, and insufficient drainage may trigger cormlet reproduction. There have been few studies on the vegetative propagation of wild Crocus species [87][88][89]. This type of reproduction, which allows the plant to multiply rapidly and so ensures its survival under stress, has a negative effect on genetic diversity. The low nucleotide diversity in the examined individuals may suggest vegetative reproduction in C. istanbulensis.
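The k-mer comparison underlying the distance estimates in this section can be illustrated with a simplified sketch. It computes the exact Jaccard index between the k-mer sets of two sequences and converts it to a per-site distance with the Mash-style transformation D = -(1/k) ln(2J / (1 + J)). This is an assumption-laden simplification rather than Skmer itself: Skmer works on minhash sketches of read sets and additionally corrects for sequencing coverage and error, none of which is modelled here, and the function names and the small k in the example are illustrative.

```python
import math

def kmer_set(seq, k):
    """Set of all k-mers in seq that contain only A, C, G or T."""
    seq = seq.upper()
    valid = set("ACGT")
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    }

def jaccard(a, b):
    """Jaccard index of two sets (defined as 0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def jaccard_to_distance(j, k):
    """Mash-style transformation of a Jaccard index into a per-site distance."""
    if j <= 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)

def kmer_distance(seq_a, seq_b, k):
    """Distance between two sequences from their shared k-mer content."""
    return jaccard_to_distance(jaccard(kmer_set(seq_a, k), kmer_set(seq_b, k)), k)
```

Identical sequences give a Jaccard index of 1 and a distance of 0, while sequences sharing no k-mers give the maximum distance of 1. A set of individuals sharing one haplotype, as observed here, would be expected to yield pairwise distances close to 0.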

Conclusions
We characterized the complete chloroplast genome sequences of six individuals of C. istanbulensis, which is considered among the most endangered Crocus species in the world. We de novo assembled chloroplast genomes using genome skimming sequencing and focused on comparative analyses with other Iridaceae taxa. In general, the C. istanbulensis cp genome exhibited a pattern similar to other Iridaceae in terms of genome length, gene content and typical quadripartite structure. However, one inversion in the terminal positions of the LSC region and three different gene (psbA, rps3 and rpl22) arrangements that have not been reported previously in Iridaceae were found in C. istanbulensis. To the best of our knowledge, this is the first work to detect a total of seven genes (accD, rpoC2, psbK, rps12, ccsA, clpP and ycf2) under positive selection in Crocus cp genomes. C. istanbulensis is currently known from only one population; however, should new populations be discovered, these findings will serve as comparison material and inform conservation studies. In summary, our results might contribute to further research on population genetics, help in conservation efforts for this threatened species and shed light on the evolutionary history of C. istanbulensis.

Table. BGI-Seq 500 DNA nanoball sequencing and chloroplast genome mapping statistics. All six sequences were produced in this study (SRA accession numbers SRX7512825-