The quantitative impact of read mapping to non-native reference genomes in comparative RNA-Seq studies

doi:10.1371/journal.pone.0180904

Table 1.

Summary of read data, including average coverage when reads are mapped to native or heterologous genomes.

Fig 1.

Data processing pipeline.

Orthology is identified between heterologous strains and reads are aligned to both reference genomes. Using the orthology mapping information, extrapolated read alignment counts are compiled such that counts can be compared for each read set as aligned to each reference genome.
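
The count-extrapolation step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the dictionary-based inputs, and the gene identifiers in the usage note are all hypothetical, and it assumes orthology has already been reduced to a mapping from heterologous gene IDs to native gene IDs.

```python
from typing import Dict, Tuple


def project_counts(counts: Dict[str, int], orthologs: Dict[str, str]) -> Dict[str, int]:
    """Re-key read counts from heterologous gene IDs to native gene IDs.

    Genes without an ortholog are dropped; counts of several heterologous
    genes that map to the same native gene are summed (an assumption made
    for this sketch).
    """
    projected: Dict[str, int] = {}
    for het_gene, n in counts.items():
        native_gene = orthologs.get(het_gene)
        if native_gene is None:
            continue  # no ortholog, so the gene cannot be compared
        projected[native_gene] = projected.get(native_gene, 0) + n
    return projected


def paired_counts(native: Dict[str, int], projected: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Pair native and projected counts for genes present in both tables."""
    shared = sorted(set(native) & set(projected))
    return {gene: (native[gene], projected[gene]) for gene in shared}
```

With hypothetical inputs such as `project_counts({"h1": 10, "h2": 5}, {"h1": "n1", "h2": "n1"})`, the two heterologous genes collapse onto the single native gene `n1`, after which `paired_counts` yields one row per gene that can be compared across references.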

Fig 2.

Structural differences between reference strains for E. coli (left) and V. vulnificus (right).

A 2.64% difference is observed between references in E. coli and a 3.61% difference is observed in V. vulnificus, accounting for indels and polymorphic sites.

Fig 3.

Log-fold changes of read counts for all E. coli strain K12 genes as aligned to both native and heterologous references.

Aligning the same read sets to both reference genomes reveals the extent of the bias introduced by the choice of reference genome.
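
A per-gene log-fold change between the two alignments could be computed along these lines. This is only a sketch: the caption does not state the logarithm base, the normalization, or how zero counts are handled, so the base-2 logarithm and the pseudocount of 1 used here are assumptions.

```python
import math


def log2_fold_change(native_count: float, heterologous_count: float,
                     pseudocount: float = 1.0) -> float:
    """Log2 ratio of heterologous to native read counts for one gene.

    The pseudocount (an assumption of this sketch) keeps the ratio
    defined when either count is zero.
    """
    return math.log2((heterologous_count + pseudocount) /
                     (native_count + pseudocount))
```

Under this convention a value of 0 means both references yield the same count for the gene, and a negative value means reads were lost when aligning to the heterologous reference.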

Table 2.

Summary of differential expression false positives detected in non-native mapping cases.

Fig 4.

Operon structure containing the gene cusC, which is responsible for copper resistance in E. coli.

21 SNPs are present across 1373 bases between the K12 and IAI1 reference strains.

Fig 5.

Read alignment for the cusC operon under the IAI1 batch condition, replicates 1 and 2.

The native IAI1 genome (top) shows a continuous operon with sparsely aligned reads, while the heterologous K12 genome (bottom) shows an insertion after the cusC gene that is highly expressed (outlined in red).

Fig 6.

Reads in the indel region slightly overlapping cusC, with particularly poor alignment in the overlapping area.

The overlapping region substantially biases expression counts for this gene in the non-native reference genome.

Fig 7.

hisD reads aligned to native genome (top) and heterologous genome (bottom).

Substantial read loss due to high-density SNP clusters can be seen in the non-native reference.

Fig 8.

Simulation of the relationship between SNPs and read length with each gene containing 35 randomly distributed SNPs and simulated reads of length 50, 100, and 150.

Log-fold changes in read alignment for native and heterologous genomes show that shorter reads perform poorly when aligned to heterologous genomes, while longer reads are more resilient.
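
The input side of such a simulation can be sketched as below: a gene sequence, a heterologous copy carrying 35 randomly placed SNPs, and reads of length 50, 100, and 150 drawn from the native copy. This is an illustration under stated assumptions, not the authors' simulation: the 1000-base gene length, the random seed, the single-base tiling of reads, and all function names are hypothetical. It only counts the mismatches each read would carry against the heterologous copy; whether a read actually aligns is left to the aligner.

```python
import random

BASES = "ACGT"


def random_gene(length, rng):
    """Random nucleotide sequence standing in for a gene."""
    return "".join(rng.choice(BASES) for _ in range(length))


def plant_snps(seq, n_snps, rng):
    """Return a copy of seq with n_snps substitutions at distinct random positions."""
    positions = rng.sample(range(len(seq)), n_snps)
    mutated = list(seq)
    for p in positions:
        # Always substitute a different base so every position is a true SNP.
        mutated[p] = rng.choice([b for b in BASES if b != seq[p]])
    return "".join(mutated), sorted(positions)


def tile_reads(seq, read_len, step=1):
    """Error-free reads as (start, sequence) pairs, tiled along seq."""
    return [(i, seq[i:i + read_len])
            for i in range(0, len(seq) - read_len + 1, step)]


def mismatches_per_read(reads, reference):
    """Mismatch count of each read against the same coordinates in reference."""
    return [sum(a != b for a, b in zip(read, reference[start:start + len(read)]))
            for start, read in reads]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
native = random_gene(1000, rng)  # hypothetical gene length
heterologous, snp_positions = plant_snps(native, 35, rng)

for read_len in (50, 100, 150):
    reads = tile_reads(native, read_len)
    mm = mismatches_per_read(reads, heterologous)
    print(read_len, "mean mismatches per read:", sum(mm) / len(mm))
```

Writing the reads and both sequences to FASTA/FASTQ files and aligning them with the aligner of choice would then give the native and heterologous counts compared in the figure.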

Fig 9.

Log-fold differences in native vs heterologous alignment for different read lengths.

Shorter reads map more poorly and are subject to significant bias in non-native alignments, while bias is minimized as read length increases.

Fig 10.

Simulation of the relationship between SNPs and read length shown using Bowtie2's --very-sensitive alignment settings.

A modest reduction in bias was observed in these simulations relative to standard alignment parameters.

Fig 11.

Simulation of the relationship between SNPs and read length shown at 600x coverage.

Increased depth has little effect on bias, as relative read loss remains consistent in non-native alignments.

Fig 12.

Simulation of reads with ambiguously mapping inserts.

Log-fold change in read alignment for native and heterologous genomes shows that less bias is present when Bowtie2 determines alignment for reads with ambiguous mapping positions.
