Gene abnormalities, including mutations and fusions, are important determinants in the molecular diagnosis of myeloid neoplasms. The use of bone marrow (BM) smears as a source of DNA and RNA for next-generation sequencing (NGS) enables molecular diagnosis to be performed with small amounts of bone marrow and is especially useful for patients without stocked cells, DNA or RNA. The present study aimed to analyze the quality of DNA and RNA derived from smear samples and the utility of NGS for diagnosing myeloid neoplasms. Targeted DNA sequencing using paired BM cells and smears yielded sequencing data of adequate quality for variant calling. The detected variants were analyzed using a bioinformatics approach to detect mutations reliably and increase sensitivity. Noise deriving from variants with extremely low variant allele frequency (VAF) was detected in smear sample data and removed by filtering. Consequently, various driver gene mutations were detected across a wide range of allele frequencies in patients with myeloid neoplasms. Moreover, targeted RNA sequencing successfully detected fusion genes using smear-derived, very low-quality RNA, even in a patient with a normal karyotype. These findings demonstrated that smear samples can be used for clinical molecular diagnosis with adequate noise-reduction methods even if the DNA and RNA quality is inferior.
Citation: Sadato D, Hirama C, Kaiho-Soma A, Yamaguchi A, Kogure H, Takakuwa S, et al. (2021) Archival bone marrow smears are useful in targeted next-generation sequencing for diagnosing myeloid neoplasms. PLoS ONE 16(7): e0255257. https://doi.org/10.1371/journal.pone.0255257
Editor: Francesco Bertolini, European Institute of Oncology, ITALY
Received: March 17, 2021; Accepted: July 5, 2021; Published: July 23, 2021
Copyright: © 2021 Sadato et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This research was supported in part by Clinical Research Fund (R010302001) of Tokyo Metropolitan Government (https://www.metro.tokyo.lg.jp/), and JSPS KAKENHI (Grant Number JP20K07840) of Japan Society for the Promotion of Science (https://www.jsps.go.jp/). All grants were awarded to Y.H. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Gene mutations are essential prognostic factors in diagnosing and predicting the effect of therapy on myeloid neoplasms [1, 2]. Next-generation sequencing (NGS) is normally performed using genomic DNA from fresh or stocked frozen bone marrow (BM) cells (BMCs) [3, 4].
However, adequate quantities of BMCs cannot be obtained in some patients. In such cases, laboratory tests, including karyotyping and flow cytometry in particular, are prioritized; therefore, gene abnormalities cannot be analyzed by NGS. In contrast, BM smears have high priority for use in cytomorphological diagnosis, and because BM smear slides are stored after use, they are easily available, obviating the need for additional BMCs and DNA and RNA samples. While previous reports demonstrated that BM smear samples can be used as a DNA source for PCR or Sanger sequencing, the quality of the results was not closely examined, especially with respect to their potential application to NGS. Using BM smears as a source of DNA and RNA for NGS would enable molecular diagnosis with small amounts of BM, even in patients without stocked cells, DNA or RNA. Previous studies examined the utility of slides containing biopsy samples as a source of DNA and RNA for target sequencing of lung adenocarcinoma [5] and thyroid cancer [6] and were able to provide profiles of gene mutations, including driver and drug-resistance mutations, suggesting that preserved or pretest samples can be used for NGS. However, in these cases, the samples were prepared using formalin-fixed, paraffin-embedded (FFPE) slides, which allow preservation for extended periods of time, unlike BM aspirate smears made by drying and alcohol-based fixation. Recently, target sequencing of genes associated with myeloid malignancies was tested using archived BM smears derived from a patient with acute myeloid leukemia (AML) [7]. While the analysis showed that smear slides can be used in NGS to create gene mutation profiles, it is still unclear whether they can provide insight into other myeloid malignancies, how far the data deteriorate (including noise in smear samples), and whether RNA derived from this source is useful.
The present study analyzed the quality of DNA and RNA in BM smear samples and assessed their utility in NGS analysis by analyzing the character of the variants detected.
Materials and methods
All the procedures performed in the present study involving human participants were approved by the ethics committee of Tokyo Metropolitan Komagome Hospital, and all the patients provided written informed consent for participation.
Patients and BM samples
Smear slides were prepared from diagnostic BM aspirates from which mononuclear cells were isolated and were stored at room temperature in a dark place. Genomic DNA from the mononuclear cells was extracted using the Gentra Puregene Blood Kit (Qiagen, Hilden, Germany) in accordance with the manufacturer’s instructions. Cells on the smears were harvested by scraping and using ATL buffer (Qiagen), and the DNA was purified using the QIAamp DNA Mini Kit (Qiagen) in accordance with the manufacturer’s instructions. RNA was extracted using TRIzol RNA Isolation Reagents (Thermo Fisher Scientific, Waltham, MA, USA). The integrity of the extracted RNA was determined using the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), and the RNA integrity number (RIN), an algorithm for assigning an integrity value to RNA [8], was calculated using the 4150 TapeStation (Agilent Technologies). The RNase P gene copy number in the genomic DNA was measured using the TaqMan RNase P Detection Reagents Kit (Thermo Fisher Scientific) in accordance with the manufacturer’s instructions.
Targeted sequencing was performed using AmpliSeq for Illumina Myeloid Panel (Illumina, San Diego, CA, USA) and a custom-designed panel to detect mutations in 68 genes and fusions of 29 driver genes (S1 Table). As a template, 10 ng DNA (for mutations) or cDNA synthesized from 10 ng RNA (for fusions) was used to amplify the target genes. AmpliSeq Library Plus for Illumina (Illumina) was used to generate libraries. The size of the fragment libraries was determined using the 2100 Bioanalyzer. The libraries were analyzed using the MiniSeq High Output Reagent Kit (300-cycles) with the MiniSeq (Illumina) platform in accordance with the manufacturer’s instructions.
Detection of variants and fusion genes
FASTQ files were generated, then cleaned with Trimmomatic [9], and the results were aligned to the human reference genome, hg19, using Burrows-Wheeler Alignment (BWA) [10]. Mapped reads and their coverages were analyzed using Qualimap [11]. Gene variants were detected using HaplotypeCaller (for high-frequency variants) and Mutect2 (for low-frequency variants), both included in GATK [12]. Gene variants obtained from HaplotypeCaller were filtered with the parameters of quality/depth, mapping quality, and strand bias to exclude false-positive variants as previously described [13]. Variants were detected using the tumor-only mode or the panel-of-normals mode of Mutect2. The variants detected by Mutect2 were filtered with GATK FilterMutectCalls. ITDseek [14] and Pindel [15] were used to detect FLT3-ITD mutations. Variants were annotated with information from the RefSeq, 1000G, and ExAC databases in Illumina VariantStudio 3.0 software (Illumina). Variants with a prevalence greater than 1% in a given regional population were excluded. Finally, mutations in hematological malignancies were manually analyzed. The FASTQ files cleaned with Trimmomatic were analyzed with JAFFA [16] and STAR-Fusion with FusionInspector [17] to detect fusion genes.
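The hard filtering of HaplotypeCaller calls and the 1% population-prevalence exclusion described above can be illustrated with a minimal Python sketch. The numeric cut-offs for quality/depth (QD), mapping quality (MQ), and strand bias (FS) are assumptions taken from generic GATK hard-filter starting points; the study does not report its own values here. The `Call` class and the example variants are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumed cut-offs: generic GATK hard-filter starting points for SNPs,
# NOT values reported by this study.
MIN_QD = 2.0       # quality/depth (QD)
MIN_MQ = 40.0      # mapping quality (MQ)
MAX_FS = 60.0      # strand bias (FS, Fisher strand)
MAX_POP_AF = 0.01  # 1% population prevalence, as stated in the text


@dataclass
class Call:
    name: str            # hypothetical label for the variant
    qd: float
    mq: float
    fs: float
    pop_af: float = 0.0  # 0.0 means "not seen in the population database"


def keep(call: Call) -> bool:
    """Return True if the call passes the hard filters and the prevalence filter."""
    if call.qd < MIN_QD or call.mq < MIN_MQ or call.fs > MAX_FS:
        return False
    # Variants with a prevalence greater than 1% are excluded.
    return call.pop_af <= MAX_POP_AF


def filter_calls(calls: List[Call]) -> List[str]:
    """Return the names of the calls that survive all filters."""
    return [c.name for c in calls if keep(c)]


calls = [
    Call("good", qd=15.0, mq=60.0, fs=1.2),
    Call("low_qd", qd=1.0, mq=60.0, fs=1.2),
    Call("strand_biased", qd=15.0, mq=60.0, fs=80.0),
    Call("common_snp", qd=20.0, mq=60.0, fs=0.5, pop_af=0.12),
]
print(filter_calls(calls))
```

In practice these annotations would be read from the VCF INFO fields and the annotation output rather than typed in by hand.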
Results and discussion
Smears served as DNA sources for targeted DNA sequencing
Five paired samples of BMCs and BM smears were compared in terms of the quality of extracted DNA (Table 1).
The dsDNA/total DNA ratio in each sample indicating the degree of DNA decay was significantly lower (P = 0.0079) in the smear samples than in the BMCs (Fig 1A). At the same time, the copy number of the RNase P gene in 1 ng DNA was also significantly lower (P = 0.0079) in the smear samples (Fig 1B).
(A) dsDNA/total DNA ratio of the smears and BMC samples. (B) Copy number of the RNase P gene detected in smear and BMC samples.
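The statistical test behind these P values is not named in this section. As a worked illustration only, P = 0.0079 is the value that an exact two-sided rank-sum (Mann-Whitney U) test gives when all five values in one group are lower than all five in the other, because only 2 of the 252 possible rank assignments are that extreme. The sketch below enumerates those assignments; the input ratios are hypothetical, and ties are assumed to be absent.

```python
from itertools import combinations
from math import comb


def exact_two_sided_p(group_a, group_b):
    """Exact two-sided rank-sum P value by full enumeration (assumes no ties)."""
    pooled = sorted(group_a + group_b)
    ranks = {value: i + 1 for i, value in enumerate(pooled)}
    n_a, n_total = len(group_a), len(pooled)
    observed = sum(ranks[v] for v in group_a)
    expected = n_a * (n_total + 1) / 2  # mean rank sum under the null
    # Count rank assignments at least as far from the mean as the observed one.
    extreme = sum(
        1
        for subset in combinations(range(1, n_total + 1), n_a)
        if abs(sum(subset) - expected) >= abs(observed - expected)
    )
    return extreme / comb(n_total, n_a)


# Hypothetical dsDNA/total DNA ratios with complete separation between groups.
smear = [0.10, 0.12, 0.15, 0.18, 0.20]
bmc = [0.60, 0.65, 0.70, 0.72, 0.80]
print(round(exact_two_sided_p(smear, bmc), 4))  # 2/252
```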
Although the DNA quality was lower in the smears than in the BMCs, it was sufficient to generate NGS libraries (S1A, S1B Fig). The libraries were analyzed, and the reads were mapped to a human reference genome to evaluate the quality of the smear-derived sequence data. There was no difference between the smears and BMCs in terms of the total reads of the BAM file (Fig 2A, P = 0.2220), coverage (Fig 2B, P = 1.0000), and uniformities (Fig 2C, P = 0.8571). Furthermore, each amplicon was equally covered with synthesized reads (Fig 2D). These results suggested that the libraries of targeted sequences synthesized from smear-derived DNA are comparable with those synthesized from BMC-derived DNA.
The following values were compared between the smear and BMC samples: (A) total mapped reads, (B) median coverage depth, (C) uniformity (more than 20% median coverage), and (D) normalized coverage for each amplicon region.
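As a rough illustration of the uniformity metric in (C), the sketch below assumes that uniformity means the fraction of amplicons whose depth exceeds 20% of the median depth. That reading of "more than 20% median coverage" is an assumption, and the depth values are hypothetical.

```python
from statistics import median


def uniformity(amplicon_depths, fraction=0.2):
    """Fraction of amplicons with depth above `fraction` x median depth."""
    threshold = fraction * median(amplicon_depths)
    covered = sum(1 for depth in amplicon_depths if depth > threshold)
    return covered / len(amplicon_depths)


# Hypothetical per-amplicon read depths for one sample.
depths = [1000, 1200, 900, 150, 1100]
print(uniformity(depths))  # median is 1000, so only the 150x amplicon fails
```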
Next, using these mapped sequences, variants were detected in paired samples using HaplotypeCaller (for germline or large clone variants) and Mutect2 (for somatic variants). Over 95% of variants detected via HaplotypeCaller were shared variants, while smear- and BMC-unique variants (3.08%) were suspected of being sequencing errors (Fig 3A). To investigate the characteristics of the variants, they were plotted according to their variant allele frequency (VAF) and read depth. Smear- and BMC-unique variants exhibited low read depth (Fig 3B). After filtering, the proportion of smear- and BMC-unique variants decreased to 1.78%, and high-VAF mutations were successfully detected in both the smear and BMC samples (Fig 3C and 3D).
The combined results from five paired samples of smears and bone marrow cells (BMCs) are shown. (A) Pie chart of the smear-unique variants, BMC-unique variants, and shared variants. (B) Distribution of the detected variants. VAF, variant allele frequency. The smear-unique, BMC-unique, and shared variants are color-coded. (C) Pie chart of the variants after filtering. (D) Distribution of the filtered variants.
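The split into shared, smear-unique, and BMC-unique variants amounts to set operations on variant identifiers. A minimal sketch follows, assuming each variant is keyed by a hypothetical `(chromosome, position, ref, alt)` tuple; the coordinates are made up for illustration.

```python
def classify(smear_calls, bmc_calls):
    """Split variant keys from a paired sample into three groups."""
    smear, bmc = set(smear_calls), set(bmc_calls)
    return {
        "shared": smear & bmc,
        "smear_unique": smear - bmc,
        "bmc_unique": bmc - smear,
    }


def percentages(groups):
    """Percentage of all distinct variants that falls in each group."""
    total = sum(len(keys) for keys in groups.values())
    return {name: 100 * len(keys) / total for name, keys in groups.items()}


# Hypothetical variant keys: (chromosome, position, ref, alt).
smear_calls = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")]
bmc_calls = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")]

groups = classify(smear_calls, bmc_calls)
print(percentages(groups))
```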
However, the smear-unique variants detected by Mutect2 comprised two-thirds of the whole and needed to be filtered out (Fig 4A and 4B). The distributions of the large VAF variants showed two peaks at 100% and 50% VAF comprising chiefly SNPs while variants with a VAF of 25% or lower mostly consisted of small clusters of chiefly somatic variants. Smear-unique variants appeared to accumulate in very low-VAF regions, suggesting that they were noise (Fig 4B). FilterMutectCalls filtering was able to reduce this noise mainly by excluding low read depth noise; however, much smear-unique noise with a low VAF remained (Fig 4C and 4D).
Combined results from five paired samples of smears and bone marrow cells (BMCs) are shown. (A) Pie chart of the smear-unique, BMC-unique, and shared variants. (B) Distribution of the detected variants. VAF, variant allele frequency. The smear-unique, BMC-unique, and shared variants are color-coded. (C) Pie chart of the variants after filtering with FilterMutectCalls. (D) Distribution of the filtered variants.
To remove artifactual noise and detect variants sensitively using Mutect2, a panel of normals (PON) is recommended [18]. To apply this method in the present study, a PON was constructed by merging 13 BMC samples from patients without myeloid malignancies, and the detected variants were plotted based on their VAF and read depth (Fig 5A). Variants were color-coded to indicate whether or not they were a SNP. Most variants suspected of being SNPs accumulated at the 50% and 100% VAF peaks whereas the others were distributed mainly in the low-VAF regions. Most, though not all, of the noise was removed by FilterMutectCalls, suggesting that the remaining noise may have been an artifact of the assay (Fig 5B). Using the PON, the remaining noise was removed by subtraction, which effectively reduced the noise where the VAF was around 10%. However, smear-specific noise remained in areas with VAF <5% (Fig 5C). Since the smear-unique variants accumulated in the low-VAF regions, VAF filtering was considered effective. To set the VAF threshold for eliminating noise, the VAF distributions of the variants left after subtraction were plotted (Fig 5D). Large numbers of the smear-unique variants accumulated in the low-VAF regions, especially where the VAF was <2.5%, suggesting that this value can be used as the threshold (Fig 5E).
Distribution of all the variants (A) and the filtered variants (B) detected by PON. Variants with SNP and the other variants were color-coded. (C) Distribution of the subtracted variants with a low VAF. (D) VAF plot of shared, smear-unique, and BMC-unique variants after subtraction. (E) VAF plot of subtracted variants with VAF <5%. A boxplot of smear-unique variants is also shown.
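The two noise-reduction steps described here, PON subtraction followed by a VAF cut-off, can be sketched as a simple filter. This is an illustration under stated assumptions, not the pipeline used in the study: the variant keys, the dictionary layout, and the choice to drop calls strictly below 2.5% are hypothetical.

```python
VAF_THRESHOLD = 0.025  # 2.5%, the threshold suggested by Fig 5E


def subtract_pon_and_filter(calls, pon_sites, min_vaf=VAF_THRESHOLD):
    """Drop calls present in the panel of normals, then drop low-VAF calls."""
    kept = []
    for call in calls:
        if call["key"] in pon_sites:
            continue  # recurrent artifact also seen in the normal samples
        if call["vaf"] < min_vaf:
            continue  # very low VAF, treated as smear-derived noise
        kept.append(call)
    return kept


# Hypothetical calls from one smear sample; keys are (chrom, pos, ref, alt).
calls = [
    {"key": ("chr1", 100, "A", "G"), "vaf": 0.42},   # plausible somatic variant
    {"key": ("chr2", 200, "C", "T"), "vaf": 0.10},   # site present in the PON
    {"key": ("chr3", 300, "G", "A"), "vaf": 0.012},  # below the VAF threshold
    {"key": ("chr4", 400, "T", "C"), "vaf": 0.05},   # low but above threshold
]
pon_sites = {("chr2", 200, "C", "T")}

for call in subtract_pon_and_filter(calls, pon_sites):
    print(call["key"], call["vaf"])
```

A fixed VAF cut-off trades sensitivity for specificity: true subclonal mutations below 2.5% are discarded together with the noise.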
PON subtraction and VAF filtering, in addition to FilterMutectCalls filtering, effectively reduced the rate of smear- and BMC-unique variants (Fig 6A) and improved the distribution of the remaining variants (Fig 6B). Furthermore, the shared variants showed almost the same VAF values for the smear and BMC samples (Fig 6C).
(A) Pie chart of the smear-unique, BMC-unique, and shared variants after filtering. (B) Distribution of the filtered variants. VAF, variant allele frequency. The smear-unique, BMC-unique, and shared variants were color-coded. (C) VAF plot of filtered variants in the BMC (X axis) and smear (Y axis) samples.
These results suggested that BM smears can be used for targeted DNA sequencing even if they are stored at room temperature under normal laboratory conditions. In variant detection, very little noise was found when using HaplotypeCaller, which is able to detect germline mutations and large-clone mutations accurately without extra filtering (Fig 3). On the other hand, Mutect2, which has high sensitivity for low-VAF variants (e.g., somatic mutations), required modified filtering because the smear samples contained a large amount of low-VAF noise that could not be removed completely by default filtering (Fig 4).
Based on our results, we performed additional targeted DNA sequencing using smear samples from patients with myeloid neoplasms, mainly acute myeloid leukemia (AML) and myelodysplastic syndromes (MDS). Twenty-one samples preserved for 0.1–11 years were analyzed using the established method described above, then filtered (Fig 7A and 7B). Of the filtered variants, 8.53% were in exons or splice sites and had various VAFs (Fig 7C and 7D). The effect of the duration between the sample preparation stage and the assessment of DNA quality and variants was further analyzed to determine the utility of the archival smears. The quality of the extracted DNA was clearly unaffected by either the duration (Fig 8A) or staining (Fig 8B, P = 0.2773), suggesting that DNA can be extracted from various types of smear. However, in old smear samples, the number of variants removed by FilterMutectCalls or by the VAF threshold (2.5% or lower) tended to increase. On the other hand, no significant difference was found in the quantity of variants remaining after filtering (Fig 8C). To identify the effect of staining smear samples on variant calling results, detected and filtered variants were compared after excluding samples from patients #106, #113, #189, and #205, which had an abundance of noise. There was no significant difference in the number of variants filtered out with FilterMutectCalls (Fig 8D, P = 0.4623) or of variants with VAF <2.5% (Fig 8E, P = 0.9044). The detected variants were curated. Table 2 shows the pathogenic gene mutations, which were detected in 18 patients, with the initial genomic information obtained from nine of 11 patients without any karyotype abnormalities (eight normal, and three not available).
(A) Bar plot of the filtering effect. Variants with a VAF >25% are shown separately from those with VAF <25%. The subtracted variants are indicated in gray, and the remaining variants are indicated in purple. (B) Distribution of the subtracted and remaining variants. (C) Pie chart of the filtered variants. Known inherited germline variants (SNP), variants detected in exons and splice sites (exon+splice), and variants detected in introns (intron) are shown. (D) Distribution of the filtered variants.
(A) The dsDNA/total DNA ratio was plotted chronologically starting from smear preparation. (B) The dsDNA/total DNA ratio was compared between MGG-stained and unstained smear samples. (C) Bar plot of the number of filtered and remaining variants. Filtered variants are shown separately by the filtering methods used (FilterMutectCalls: orange; VAF 2.5% or less: blue) in the upper panel. Variants after filtering are shown in the lower panel. The quantity of variants removed by FilterMutectCalls (D) and the quantity removed by the VAF threshold of 2.5% or less (E) were compared between the MGG-stained and unstained smear samples.
Mutations determining the disease subtype (NPM1 and CEBPA) and germline mutations (DDX41 and RUNX1) were particularly useful for a definitive diagnosis. Although target sequencing of the CEBPA gene is reportedly difficult [19], our assay was able to detect CEBPA mutations successfully in the smear samples. Moreover, prognostic factors, such as TP53, FLT3, and ASXL1, were also useful for determining indications for stem-cell transplantation. These findings demonstrated that archived smear samples can be used as templates for targeted DNA sequencing for molecular diagnosis.
Quality of RNA in smears and detection of fusion genes
RNA sequencing generally requires intact, high-quality RNA. However, targeted RNA sequencing can be performed if the desired fragments are amplified. In the present study, RNA was extracted from 15 smear samples and their fragmentation patterns were analyzed. Each RNA sample was sufficient for NGS analysis but displayed a very low fragment size (S2A Fig). The RIN value was also low regardless of the time elapsed between smear preparation and assessment, indicating that the RNA degraded rapidly once smear preparation began (Fig 9A). Nevertheless, reverse transcription could be performed even with the fragmented RNA, and libraries for targeted sequencing were fully synthesized (S2B Fig). Adequately sized FASTQ files were generated through targeted RNA sequencing, and the obtained reads could be mapped to hg19. Among the detected fusions, highly expressed fusion genes identified by both detectors, JAFFA and STAR-Fusion, were considered positive (Fig 9B). Fusion genes detected in five patients (#097 and #240 with RUNX1-RUNX1T1, #112 and #220 with CBFB-MYH11, and #238 with ETV6-CHIC2) were consistent with their karyotypes, indicating that RNA from smears can be used to detect fusion genes via NGS (Table 3). Interestingly, unexpected fusion genes were detected through targeted RNA sequencing in two patients without translocation or inversion. The KMT2A-MLLT10 fusion gene was identified in Patient #152 without the t(10;11) karyotype and confirmed by PCR (S3 Fig). Moreover, the NUP214-ABL1 fusion gene, derived from t(9;9)(q34;q34) and difficult to detect by karyotypic analysis, was identified in Patient #231 and also confirmed by PCR (S4 Fig). These results underscored the utility of smear samples for diagnostic targeted RNA sequencing.
(A) RIN value of each sample and elapsed years. (B) Reads per million mapped reads (RPM) of each sample were plotted. Highly expressed fusion genes are shown.
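The rule of accepting only highly expressed fusions reported by both JAFFA and STAR-Fusion can be sketched as follows. The dictionary inputs, the use of the lower of the two supporting-read counts, and the `min_rpm` cut-off are all assumptions made for illustration; the study does not state a numeric RPM threshold in this section, and fusion names are assumed to have been normalized to one format beforehand.

```python
def reads_per_million(supporting_reads, total_mapped_reads):
    """Normalize a supporting-read count to reads per million mapped reads (RPM)."""
    return supporting_reads * 1_000_000 / total_mapped_reads


def consensus_fusions(jaffa, star_fusion, total_mapped_reads, min_rpm):
    """Keep fusions reported by both callers whose RPM reaches min_rpm."""
    positive = {}
    for name in sorted(set(jaffa) & set(star_fusion)):
        # Conservative choice (an assumption): use the lower read count.
        reads = min(jaffa[name], star_fusion[name])
        rpm = reads_per_million(reads, total_mapped_reads)
        if rpm >= min_rpm:
            positive[name] = rpm
    return positive


# Hypothetical caller outputs: fusion name -> supporting reads.
jaffa_calls = {"RUNX1-RUNX1T1": 500, "GENE_A-GENE_B": 3}
star_calls = {"RUNX1-RUNX1T1": 450, "GENE_C-GENE_D": 2}

print(consensus_fusions(jaffa_calls, star_calls, total_mapped_reads=2_000_000, min_rpm=100))
```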
The present results indicated that both DNA and RNA from smear samples can be used as templates for targeted NGS independently of the duration of preservation and staining. The variants detected in smear-derived samples were the same as those in BMC samples. Thus, pathogenic gene mutations and fusion genes can be detected from smear samples and can be especially useful for patients without karyotype abnormalities. Despite the generally inferior quality of their DNA and RNA, smear samples are useful for clinical molecular diagnosis as long as adequate noise-reduction methods are applied.
S1 Table. Target genes for targeted sequencing.
S1 Fig. Fragment analysis of synthesized libraries.
The fragment size (X axis) and fluorescent unit (Y axis) of synthesized libraries using smear-derived DNA (A) and bone marrow cell (BMC)-derived DNA (B) are shown. Yellow-highlighted regions indicate the predicted library size.
S2 Fig. Fragment analysis of RNA extracted from smear samples and the synthesized libraries.
The fragment size (X axis) and fluorescent units (Y axis) of RNA (A) and the synthesized libraries (B) are shown. Yellow-highlighted regions indicate the predicted library size.
S3 Fig. Detection of KMT2A-MLLT10 fusion.
(A) The fusion sequence detected by targeted RNA sequencing is shown. Arrows indicate the primers used to amplify the target region. (B) A fusion gene confirmed by RT-PCR is shown. The following parameters were used with the PrimeSTAR GXL DNA Polymerase (TAKARA): 98°C for 3 min, followed by 35 cycles at 98°C for 10 s, 70°C for 15 s, and 68°C for 30 s. The sample from Patient #176 was used as a negative control.
S4 Fig. Detection of NUP214-ABL1 fusion.
(A) The fusion sequence detected by targeted RNA sequencing is shown. Arrows indicate the primers used to amplify the target region. (B) A fusion gene confirmed by RT-PCR is shown. The following parameters were used with the PrimeSTAR GXL DNA Polymerase (TAKARA): 98°C for 3 min, followed by 35 cycles at 98°C for 10 s, 75°C for 15 s, and 68°C for 30 s. The sample from Patient #176 was used as a negative control.
- 1. Ogawa S. Genetics of MDS. Blood. 2019;133(10):1049–59. pmid:30670442
- 2. Bullinger L, Dohner K, Dohner H. Genomics of Acute Myeloid Leukemia Diagnosis and Pathways. J Clin Oncol. 2017;35(9):934–46. pmid:28297624
- 3. Ley TJ, Miller C, Ding L, Raphael BJ, Mungall AJ, Robertson A, et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med. 2013;368(22):2059–74. pmid:23634996
- 4. Yoshizato T, Nannya Y, Atsuta Y, Shiozawa Y, Iijima-Yamashita Y, Yoshida K, et al. Genetic abnormalities in myelodysplasia and secondary acute myeloid leukemia: impact on outcome of stem cell transplantation. Blood. 2017;129(17):2347–58. pmid:28223278
- 5. Treece AL, Montgomery ND, Patel NM, Civalier CJ, Dodd LG, Gulley ML, et al. FNA smears as a potential source of DNA for targeted next-generation sequencing of lung adenocarcinomas. Cancer cytopathology. 2016;124(6):406–14. pmid:26882436
- 6. Ablordeppey KK, Timmaraju VA, Song-Yang JW, Yaqoob S, Narick C, Mireskandari A, et al. Development and Analytical Validation of an Expanded Mutation Detection Panel for Next-Generation Sequencing of Thyroid Nodule Aspirates. The Journal of molecular diagnostics: JMD. 2020;22(3):355–67. pmid:31866571
- 7. Al Hinai ASA, Grob T, Kavelaars FG, Rijken M, Zeilemaker A, Erpelinck-Verschueren CAJ, et al. Archived bone marrow smears are an excellent source for NGS-based mutation detection in acute myeloid leukemia. Leukemia. 2020;34(8):2220–4. pmid:32060404
- 8. Schroeder A, Mueller O, Stocker S, Salowsky R, Leiber M, Gassmann M, et al. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol Biol. 2006;7:3. pmid:16448564
- 9. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20. pmid:24695404
- 10. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60. pmid:19451168
- 11. Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2015;32(2):292–4. pmid:26428292
- 12. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303. pmid:20644199
- 13. Najima Y, Sadato D, Harada Y, Oboki K, Hirama C, Toya T, et al. Prognostic impact of TP53 mutation, monosomal karyotype, and prior myeloid disorder in nonremission acute myeloid leukemia at allo-HSCT. Bone Marrow Transplant. 2020.
- 14. Au CH, Wa A, Ho DN, Chan TL, Ma ES. Clinical evaluation of panel testing by next-generation sequencing (NGS) for gene mutations in myeloid neoplasms. Diagn Pathol. 2016;11:11. pmid:26796102
- 15. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25(21):2865–71. pmid:19561018
- 16. Davidson NM, Majewski IJ, Oshlack A. JAFFA: High sensitivity transcriptome-focused fusion gene detection. Genome Med. 2015;7(1):43. pmid:26019724
- 17. Haas BJ, Dobin A, Li B, Stransky N, Pochet N, Regev A. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 2019;20(1):213. pmid:31639029
- 18. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013;31(3):213–9. pmid:23396013
- 19. Ng CWS, Kosmo B, Lee PL, Lee CK, Guo J, Chen Z, et al. CEBPA mutational analysis in acute myeloid leukaemia by a laboratory-developed next-generation sequencing assay. J Clin Pathol. 2018;71(6):522–31. pmid:29180507