gUMI-BEAR, a modular, unsupervised population barcoding method to track variants and evolution at high resolution

doi:10.1371/journal.pone.0286696

Fig 1.

Schematic illustration of gUMI-BEAR.

An illustration demonstrating the method in its entirety, from library construction through experimental design, ending in the preparation of libraries for deep-sequencing. (a, b) An initial cellular population is barcoded by inserting unique 24 bp sequences into the genomes of its cells using CRISPR/Cas9 (centre top). (c) Following recovery from transformation, the desired number of live cells is sorted from the population, (d) the sorted cells are allowed to grow, and (e) the resulting population is divided into aliquots containing equal numbers of each barcoded lineage. The aliquots are subjected to the chosen experiment (centre middle), after which (f) the cells undergo DNA extraction. (g) A two-step polymerase chain reaction (PCR) targeting the barcode region (centre bottom) enables amplification of each lineage-associated variant (top) and subsequent sequencing of the barcode library (bottom).

Fig 2.

Construction of the barcoded library.

(a) Schematic illustration of the donor DNA construction process. (b) The number of colonies formed following transformation with donor DNA with (HDR+CRISPR) or without (HDR) a pCAS vector carrying the CRISPR/Cas9 machinery, for five biological replicates of each treatment. Utilisation of CRISPR/Cas9 increased the number of colonies 20-fold (t-test between distributions, p = 3.89e-8). (c) Percentage of dead cells in a culture (left y-axis) as a function of time following transformation with (circles) and without (squares) donor DNA. Samples were taken at several time points during the 40 h following transformation, and membrane integrity was checked using propidium iodide staining to quantify the percentage of dead cells. The right y-axis presents optical density (OD; orange line) measurements for the culture, showing that cells start to grow after 24 h. (d) Four replicates of a library, A–D, were deep-sequenced to reveal their populations’ barcode composition, and the log-transformed frequencies of the lineages are plotted against each other. Pearson correlations were computed between all replicates, and the Pearson coefficient, r, is presented at the bottom of each comparison. Plots on the diagonal present the log-transformed frequency distribution for each replicate (***p < 0.0001). (e) Bar plot of expected and observed lineage frequencies for two control populations, Mock1 (upper panel) and Mock2 (lower panel). Eight lineages were isolated from single colonies, and their barcodes were identified using Sanger sequencing. Their DNA was extracted and mixed in known ratios to create the Mock1 and Mock2 control populations. Each population was sequenced on two separate occasions, and the results were run through the same analysis pipeline to reveal the population composition. Each colour represents a single lineage according to the legend to the right of the graph.
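
The replicate comparison in panel (d) can be illustrated with a minimal sketch. It assumes that each replicate is available as a mapping from barcode to read count, that frequencies are log10-transformed, and that only barcodes detected in both replicates are compared; the function names and the toy counts are hypothetical and are not taken from the authors' pipeline.

```python
import math

def log_frequencies(counts):
    """Convert raw barcode read counts to log10 relative frequencies."""
    total = sum(counts.values())
    return {bc: math.log10(n / total) for bc, n in counts.items() if n > 0}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def replicate_correlation(counts_a, counts_b):
    """Pearson r between log frequencies of barcodes shared by two replicates."""
    log_a = log_frequencies(counts_a)
    log_b = log_frequencies(counts_b)
    shared = sorted(set(log_a) & set(log_b))  # assumption: drop unshared barcodes
    return pearson_r([log_a[bc] for bc in shared], [log_b[bc] for bc in shared])

# Toy data (hypothetical): replicate B is sequenced twice as deeply as A,
# so the relative frequencies are identical and r is essentially 1.
rep_a = {"BC1": 1000, "BC2": 100, "BC3": 10, "BC4": 500}
rep_b = {"BC1": 2000, "BC2": 200, "BC3": 20, "BC4": 1000}
r = replicate_correlation(rep_a, rep_b)
print(f"Pearson r = {r:.4f}")
```

Working with relative rather than absolute counts makes the comparison insensitive to differences in sequencing depth between replicates.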

Fig 3.

Applying the gUMI-BEAR method to track evolutionary dynamics.

(a, b) Lineage trajectories throughout a 44-day experiment in which temperatures were altered (vertical dotted lines) to induce fitness fluctuations in the population. (a) Each lineage is represented by a line. Colours were assigned to each lineage based on K-means clustering performed on their trajectories throughout the experiment (as seen in the key). (b) Muller plot where each lineage is represented by a different colour, and the width of the coloured region at each timepoint is proportional to the lineage’s relative frequency. (c, d) Rarefaction curves for the number of lineages (c) and lineage distributions (d) obtained by sampling an increasing number of reads at five time points (depicted by the different colours as defined in the key). (d) The accuracy of the distribution is quantified by the Wasserstein distance between the distribution obtained with a reduced number of reads and the original distribution (based on 100% of the reads). The inset shows a ridge plot of the frequency distribution of barcodes in the population quantified by sampling an increasing number of reads (y-axis, percentage of population sampled) from Day 1 (blue) and Day 44 (red).
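
The rarefaction analysis in panels (c, d) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: reads are represented as a list of barcode labels, subsampling is without replacement, and the Wasserstein distance is taken between the two vectors of lineage frequencies treated as equal-sized empirical samples (for which the 1-D distance reduces to the mean absolute difference of the sorted values). All names and the toy data are hypothetical.

```python
import random
from collections import Counter

def frequencies(reads, lineages):
    """Relative frequency of every lineage in `lineages` (0 if unobserved)."""
    counts = Counter(reads)
    total = len(reads)
    return [counts[lineage] / total for lineage in lineages]

def wasserstein_1d(u, v):
    """1-D Wasserstein-1 distance between two equal-sized empirical samples."""
    if len(u) != len(v):
        raise ValueError("this shortcut requires samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

def rarefaction(reads, fractions, seed=0):
    """For each sampling fraction, return (fraction, lineages seen, distance)."""
    rng = random.Random(seed)
    lineages = sorted(set(reads))
    full = frequencies(reads, lineages)  # reference: 100% of the reads
    results = []
    for fraction in fractions:
        k = max(1, int(len(reads) * fraction))
        subsample = rng.sample(reads, k)  # sampling without replacement
        dist = wasserstein_1d(frequencies(subsample, lineages), full)
        results.append((fraction, len(set(subsample)), dist))
    return results

# Toy population (hypothetical): four lineages with skewed abundances.
reads = ["BC1"] * 600 + ["BC2"] * 300 + ["BC3"] * 90 + ["BC4"] * 10
curve = rarefaction(reads, [0.01, 0.1, 0.5, 1.0])
for fraction, n_lineages, dist in curve:
    print(f"{fraction:>5.0%}  lineages={n_lineages}  distance={dist:.5f}")
```

At a sampling fraction of 100% the subsample contains every read, so all lineages are recovered and the distance to the reference distribution is zero; rare lineages such as BC4 are the first to be lost at low sampling depth.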

Fig 4.

Applying gUMI-BEAR to track gene variants.

(a) Heatmap showing entropy values (four upper bands, as per the colour bar) for mutations at each nucleotide position in the Hsp82 variants at various steps of the library construction process. For each position, mutational proportions were normalised to sequencing depth, and Shannon’s entropy was calculated (dark to bright represents low to high entropy). From top to bottom: The Genemorph panel refers to Hsp82 variants created by insertion of random mutations using PCR with the Genemorph II kit. The Donor panel refers to the final donor molecule before transformation (mean entropy of three replicates). The panels labelled Start and End show the population at the start of the experiment, where cells have been transformed with the donor molecules and selected for successful transformants (mean of three replicates), and at the end of the experiment, respectively. The Protein panel shows mutational occurrence in the translated protein obtained by long-read sequencing of full-length variants (mutations are identified in white). The Conservation panel shows the conservation score for each position calculated using the ConSurf¹⁸ server (dark for conserved and bright for variable regions). The insets show two zoomed-in regions exhibiting different patterns, where both conserved and variable regions can have non-synonymous mutations. (b) Donor construction process as described in the Materials and Methods section. GenemorphF and GenemorphR are primers used to amplify the native Hsp82 from the genome; LHA and RHA stand for Left/Right Homologous Arm and are added to enable correct transformation to the chosen locus; Gumy2Hyg is a forward primer containing all elements that are added to the donor molecule, including the Illumina read1 sequencing binding site (Rd1), a sequence of five non-guanine nucleotides (5xH), the genomic unique molecular identifier (gUMI) and the linker sequence, which is used for later preparation of the barcode molecule for sequencing. Briefly, Hsp82 variants were produced by PCR amplification of the gene from the S. cerevisiae BY4741 strain using the Genemorph II kit. These were then conjugated to the gUMI-box, with LHA and RHA complementary to the Hsp82 gene, which allowed them to replace the wild-type in the CRISPR/Cas9 reaction. (c) The Hsp82 protein dimer (shown in green and purple). All mutations found by long-read sequencing are shown as red and cyan balls in the Hsp82 dimer. The image was produced with the Protein Homology/Analogy Recognition Engine (Phyre2). (d) Changes in relative lineage abundances within the yeast population over time. Each line represents the mean relative abundance of a lineage in the four replicates as a function of time, with error bars showing the standard deviation between replicates. Red lines mark the seven lineages whose Hsp82 variant was amplified using the gUMI sequence as a reverse primer and then Sanger-sequenced. (e) A scheme presenting Sanger sequencing using five generic forward primers (black arrows) distributed along the Hsp82 ORF, together with a specific gUMI reverse-complementary primer, which was used to obtain a constructed full-length sequence of the variant (red). The constructed variants are compared to the full-length sequence of the variant obtained by long-read sequencing (Loop Genomics; green). Identified SNPs are indicated by white lines.
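
The per-position entropy shown in panel (a) can be illustrated with a short sketch. It assumes that each position is summarised as a count of reads supporting each nucleotide, that dividing by the total count at the position is what "normalised to sequencing depth" amounts to, and that entropy is measured in bits (base-2 logarithm); these choices and all names below are assumptions for illustration, not the authors' implementation.

```python
import math

def shannon_entropy(base_counts):
    """Shannon entropy (bits) of the nucleotide distribution at one position."""
    depth = sum(base_counts.values())
    if depth == 0:
        return 0.0  # assumption: uncovered positions are reported as zero
    entropy = 0.0
    for count in base_counts.values():
        if count > 0:
            p = count / depth  # proportion normalised to sequencing depth
            entropy -= p * math.log2(p)
    return entropy

# Toy pileup (hypothetical): one dict of nucleotide counts per position.
pileup = [
    {"A": 1000, "C": 0, "G": 0, "T": 0},        # invariant position
    {"A": 500, "C": 500, "G": 0, "T": 0},       # two equally common bases
    {"A": 250, "C": 250, "G": 250, "T": 250},   # maximally variable position
]
profile = [shannon_entropy(counts) for counts in pileup]
print(profile)  # → [0.0, 1.0, 2.0]
```

With four possible nucleotides the entropy ranges from 0 bits (no variation) to 2 bits (all bases equally frequent), which corresponds to the dark-to-bright scale of the heatmap.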
