tomoseqr: A Bioconductor package for spatial reconstruction and visualization of 3D gene expression patterns based on RNA tomography

  • Ryosuke Matsuzawa,

    Roles Methodology, Software, Writing – original draft

    Affiliations Master’s Program in Medical Sciences, Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan, Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan

  • Daichi Kawahara,

    Roles Data curation

    Affiliation Department of Chemistry and Biological Science, College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Kanagawa, Japan

  • Makoto Kashima,

    Roles Conceptualization, Data curation, Funding acquisition, Resources, Writing – original draft

    Affiliations Department of Chemistry and Biological Science, College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Kanagawa, Japan, Department of Biomolecular Science, Faculty of Science, Toho University, Funabashi, Chiba, Japan

  • Hiromi Hirata,

    Roles Data curation, Resources

    Affiliation Department of Chemistry and Biological Science, College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Kanagawa, Japan

  • Haruka Ozaki

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – original draft, Writing – review & editing

    haruka.ozaki@md.tsukuba.ac.jp

    Affiliation Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan

Abstract

RNA tomography computationally reconstructs genome-wide 3D spatial gene expression patterns from 1D tomo-seq data, which are generated by RNA sequencing of cryosection samples along three orthogonal axes. We developed tomoseqr, an R package designed for RNA tomography analysis of tomo-seq data, to reconstruct and visualize 3D gene expression patterns through user-friendly graphical interfaces. We show the effectiveness of tomoseqr using simulated and real tomo-seq data, validating its utility for researchers. The R package tomoseqr is available on Bioconductor (https://doi.org/doi:10.18129/B9.bioc.tomoseqr) and GitHub (https://github.com/bioinfo-tsukuba/tomoseqr).

Introduction

Understanding the spatial expression patterns of genes is pivotal for unraveling developmental processes and deciphering gene functions. RNA tomography is a computational method to reconstruct three-dimensional (3D) spatial patterns of gene expression in a genome-wide manner from one-dimensional (1D) tomo-seq data [1]. Tomo-seq data are generated by RNA sequencing of cryosections of embryos or tissue samples along three orthogonal axes (e.g., anteroposterior, dorsoventral, and left-right axes). Tomo-seq has been applied to the whole body [2], embryo [1, 3, 4], and organs [5, 6] of various animals, including humans. By utilizing tomo-seq data, RNA tomography provides 3D spatial expression patterns of myriad genes simultaneously across diverse biological and clinical samples.

Several tools were developed to analyze tomo-seq data (Table 1). Most of these tools are limited to reconstructing 1D or 2D spatial gene expression patterns and do not provide support for the 3D reconstruction of gene expression patterns using RNA tomography (TomoQC [7]; tomographer [8]; tomoda [9]). Furthermore, despite Junker et al.’s pioneering work on RNA tomography and their 3D expression pattern reconstructions, their visualization was constrained to continuous 2D cross-sections, and no publicly available tools enabled comprehensive 3D visualization for RNA tomography.

Here, we developed the R package tomoseqr, which is designed to reconstruct 3D spatial gene expression patterns from 1D tomo-seq data along three orthogonal axes using RNA tomography (Fig 1). It also includes an interactive graphical user interface (GUI) for easy visualization of the 3D patterns. We tested the capabilities of tomoseqr with simulated and real tomo-seq data from zebrafish and planarian samples, showing that tomoseqr can efficiently reconstruct and visualize the 3D spatial distribution of gene expression.

Fig 1. Overview of tomoseqr.

Based on 1D tomo-seq data along three mutually orthogonal axes and mask data, tomoseqr performs RNA tomography using iterative proportional fitting and reconstructs a 3D expression pattern for each gene. A GUI called masker helps users design and edit mask data. Another GUI, Image viewer, visualizes the reconstructed gene expression patterns in a 2D or 3D view.

https://doi.org/10.1371/journal.pone.0311296.g001

Materials and methods

Overview of tomoseqr

tomoseqr provides three functionalities for RNA tomography (Fig 1). First, it provides masker, a graphical user interface (GUI) for creating mask data (see below). Second, it applies iterative proportional fitting to 1D tomo-seq data along three mutually orthogonal axes and the mask data to reconstruct the 3D expression pattern of a gene. Third, it visualizes the expression patterns in a 2D or 3D view using another GUI, Image viewer.

Reconstruction algorithm for RNA tomography

Iterative proportional fitting (IPF) is a statistical method that adjusts the elements of a multidimensional array to match specified marginal totals along each axis while preserving the array's overall structure [10]. tomoseqr employs IPF, which was also used in the original study of RNA tomography [1], to reconstruct the 3D pattern of gene expression from 1D tomo-seq data along three mutually orthogonal axes (usually corresponding to the anterior-posterior, ventral-dorsal, and left-right axes). Briefly, IPF iteratively (1) fits the marginal distribution of the tentative reconstructed 3D expression distribution of a gene to its 1D distribution in the 1D tomo-seq data along each axis and (2) updates the reconstructed 3D gene expression distribution.

Problem definition.

We assume preprocessed tomo-seq data derived from $l_x$, $l_y$, and $l_z$ cryosections along the x-, y-, and z-axes, respectively. Let $X_g$, $Y_g$, and $Z_g$ be vectors of length $l_x$, $l_y$, and $l_z$ representing the expression level of a gene $g$ in the preprocessed tomo-seq data along the x-, y-, and z-axes, respectively.

Let $A_g^{(t)}$ be an $l_x \times l_y \times l_z$ array that holds the reconstruction result for a gene $g$, where $t$ is the number of updates. The element $A_g^{(t)}(i, j, k)$ ($1 \le i \le l_x$, $1 \le j \le l_y$, $1 \le k \le l_z$) represents the reconstructed gene expression level of the voxel at position $(i, j, k)$.

Our goal is to reconstruct 3D expression patterns based on Xg, Yg, and Zg for each gene g. The loss function of reconstruction error is described below.

Mask data.

Mask data represents the region where the sample material is located across the $l_x \times l_y \times l_z$ voxels of the reconstruction array $A_g^{(t)}$. Let the mask data $M$ be an $l_x \times l_y \times l_z$ array whose elements are 0 or 1. The element $M_{i,j,k}$ is 1 if it corresponds to a coordinate at which the sample material is located, and 0 otherwise.

We note that the use of mask data is essential for accurately reconstructing the 3D distribution of gene expression using the IPF algorithm. Specifically, the mask defines the voxels where gene expression is reasonably expected (or not expected) in the original sample, based on prior knowledge or assumptions about the sample’s morphology. Without a mask, the IPF algorithm would distribute the 1D gene expression data (i.e., the marginal distributions along each axis) across the entire 3D space in a manner that may not be reasonable. The mask ensures that the reconstructed 3D distribution aligns with the expected regions of gene expression, thereby improving reconstruction accuracy. In other words, by predefining the allocation of expression to specific areas, the mask helps reduce artifacts that could arise during the reconstruction process.

IPF algorithm.

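  1. Use the mask data as the initial value of the reconstruction array at t = 0, i.e., $A_g^{(0)} = M$. This prevents the assignment of expression levels to voxels without samples, thereby reducing artifacts.
  2. Calculate $A_g^{(t+1)}$ using $A_g^{(t)}$ and tomo-seq data $X_g$ along the x axis. Each element of $A_g^{(t+1)}$ is calculated as follows: $$A_g^{(t+1)}(i,j,k) = A_g^{(t)}(i,j,k) \, \frac{X_g(i)}{\sum_{j',k'} A_g^{(t)}(i,j',k')} \tag{1}$$
    This operation makes $\sum_{j,k} A_g^{(t+1)}(i,j,k)$ equal to $X_g(i)$.
  3. Calculate $A_g^{(t+2)}$ using $A_g^{(t+1)}$ and tomo-seq data $Y_g$ along the y axis. Each element of $A_g^{(t+2)}$ is calculated as follows: $$A_g^{(t+2)}(i,j,k) = A_g^{(t+1)}(i,j,k) \, \frac{Y_g(j)}{\sum_{i',k'} A_g^{(t+1)}(i',j,k')} \tag{2}$$
    This operation makes $\sum_{i,k} A_g^{(t+2)}(i,j,k)$ equal to $Y_g(j)$.
  4. Calculate $A_g^{(t+3)}$ using $A_g^{(t+2)}$ and tomo-seq data $Z_g$ along the z axis. Each element of $A_g^{(t+3)}$ is calculated as follows: $$A_g^{(t+3)}(i,j,k) = A_g^{(t+2)}(i,j,k) \, \frac{Z_g(k)}{\sum_{i',j'} A_g^{(t+2)}(i',j',k)} \tag{3}$$
    This operation makes $\sum_{i,j} A_g^{(t+3)}(i,j,k)$ equal to $Z_g(k)$.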
  5. Repeat steps 2 through 4 for a predetermined number of times m (m = 100 by default).
  6. The loss function after m sets of iterations (with t = 3m updates completed) is as follows: (4) (5) (6)
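The update cycle in steps 1 through 5 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the tomoseqr implementation: the function names (`axis_sums`, `rescale`, `ipf_reconstruct`), the nested-list data layout, and the toy values are all hypothetical, and slices whose current sum is zero are simply left at zero.

```python
def axis_sums(a, axis):
    """Sum a nested-list 3D array over the two axes other than `axis`."""
    sums = [0.0] * (len(a), len(a[0]), len(a[0][0]))[axis]
    for i, plane in enumerate(a):
        for j, row in enumerate(plane):
            for k, value in enumerate(row):
                sums[(i, j, k)[axis]] += value
    return sums


def rescale(a, axis, target):
    """Scale each slice orthogonal to `axis` so that its sum matches `target`."""
    # A slice whose current sum is zero holds only zeros, so a factor of 0 is harmless.
    factors = [t / s if s > 0 else 0.0 for t, s in zip(target, axis_sums(a, axis))]
    for i, plane in enumerate(a):
        for j, row in enumerate(plane):
            for k in range(len(row)):
                row[k] *= factors[(i, j, k)[axis]]


def ipf_reconstruct(x, y, z, mask, n_iter=100):
    """Run n_iter rounds of x-, y-, and z-fitting, starting from the mask."""
    a = [[[float(v) for v in row] for row in plane] for plane in mask]
    for _ in range(n_iter):
        rescale(a, 0, x)
        rescale(a, 1, y)
        rescale(a, 2, z)
    return a


# Toy example (hypothetical values): a 2 x 2 x 2 grid with one empty voxel.
mask = [[[0, 1], [1, 1]], [[1, 1], [1, 1]]]
truth = [[[mask[i][j][k] * (1 + i + 2 * j + 3 * k) for k in range(2)]
          for j in range(2)] for i in range(2)]
x, y, z = (axis_sums(truth, axis) for axis in range(3))
reconstruction = ipf_reconstruct(x, y, z, mask)
print(axis_sums(reconstruction, 0), x)
```

In this toy case the reconstructed marginal sums approach the input vectors and the voxel outside the mask stays at zero. The reconstructed array itself need not equal the array that generated the marginals, because many 3D arrays share the same three 1D projections.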

Preprocessing of tomo-seq data

Before applying IPF, tomoseqr preprocesses the 1D tomo-seq data.

Let $n$ be the number of genes and $l_x$, $l_y$, and $l_z$ be the numbers of cryosections along the x-, y-, and z-axes, respectively. Let $X$ be an $n \times l_x$ matrix that represents the 1D tomo-seq data for the x-axis. Let $Y$ be an $n \times l_y$ matrix that represents the 1D tomo-seq data for the y-axis. Let $Z$ be an $n \times l_z$ matrix that represents the 1D tomo-seq data for the z-axis.

Let $M$ be the mask array, which is an $l_x \times l_y \times l_z$ array whose elements are 0 or 1. The element $M_{i,j,k}$ is 1 if it corresponds to a coordinate at which the sample material is located, and 0 otherwise.

The tomo-seq data is preprocessed as follows:

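  1. The sum of the elements of $M$ is calculated for each section as follows: $$m^{(x)}_i = \sum_{j,k} M_{i,j,k} \tag{7}$$ $$m^{(y)}_j = \sum_{i,k} M_{i,j,k} \tag{8}$$ $$m^{(z)}_k = \sum_{i,j} M_{i,j,k} \tag{9}$$ where $m^{(x)}$, $m^{(y)}$, and $m^{(z)}$ are vectors of length $l_x$, $l_y$, and $l_z$, respectively, and represent the relative sample material volume per section.
  2. Let $g$ be the index of a gene of interest. To normalize the total gene expression per sample material volume across sections, the inter-section normalization option is applied to $X_{g,\cdot}$, $Y_{g,\cdot}$, and $Z_{g,\cdot}$ (i.e., the g-th row vectors of $X$, $Y$, and $Z$, respectively). Specifically, for each section, each element is divided by the total expression of that section and multiplied by the relative sample material volume as follows: $$X'_{g,i} = \frac{X_{g,i}}{\sum_{g'=1}^{n} X_{g',i}} \, m^{(x)}_i \tag{10}$$ $$Y'_{g,j} = \frac{Y_{g,j}}{\sum_{g'=1}^{n} Y_{g',j}} \, m^{(y)}_j \tag{11}$$ $$Z'_{g,k} = \frac{Z_{g,k}}{\sum_{g'=1}^{n} Z_{g',k}} \, m^{(z)}_k \tag{12}$$ where $X'_{g,\cdot}$, $Y'_{g,\cdot}$, and $Z'_{g,\cdot}$ are the normalized g-th row vectors.
  3. The sums of the expression levels along the three axes are averaged to give $T$: $$T = \frac{1}{3} \left( \sum_{i} X'_{g,i} + \sum_{j} Y'_{g,j} + \sum_{k} Z'_{g,k} \right) \tag{13}$$
  4. Each of $X'_{g,\cdot}$, $Y'_{g,\cdot}$, and $Z'_{g,\cdot}$ is divided by its own row sum and then multiplied by $T$. Let $X_g$, $Y_g$, and $Z_g$ be the resultant preprocessed vectors: $$X_g(i) = \frac{X'_{g,i}}{\sum_{i'} X'_{g,i'}} \, T \tag{14}$$ $$Y_g(j) = \frac{Y'_{g,j}}{\sum_{j'} Y'_{g,j'}} \, T \tag{15}$$ $$Z_g(k) = \frac{Z'_{g,k}}{\sum_{k'} Z'_{g,k'}} \, T \tag{16}$$

The IPF algorithm is performed on $X_g$, $Y_g$, and $Z_g$.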
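The four preprocessing steps can be illustrated with a small plain-Python sketch. It follows the verbal description above and is not taken from the package source: the function names, the list-of-rows matrix layout, and the toy counts are hypothetical, and sections with no reads are assumed to contribute zero expression.

```python
def section_volumes(mask, axis):
    """Count mask voxels in each section along `axis` (nested-list 3D mask)."""
    volumes = [0] * (len(mask), len(mask[0]), len(mask[0][0]))[axis]
    for i, plane in enumerate(mask):
        for j, row in enumerate(plane):
            for k, value in enumerate(row):
                volumes[(i, j, k)[axis]] += value
    return volumes


def normalize_sections(matrix, g, volumes):
    """Divide gene g by the total expression of each section, then weight by volume."""
    normalized = []
    for s, volume in enumerate(volumes):
        total = sum(row[s] for row in matrix)  # total expression of section s
        # Assumption: sections with no reads contribute zero expression.
        normalized.append(matrix[g][s] / total * volume if total > 0 else 0.0)
    return normalized


def preprocess_gene(X, Y, Z, mask, g):
    """Return the preprocessed x-, y-, and z-vectors of gene g."""
    normalized = [normalize_sections(m, g, section_volumes(mask, axis))
                  for axis, m in enumerate((X, Y, Z))]
    T = sum(sum(v) for v in normalized) / 3  # average total expression
    # Assumption: gene g has a non-zero total along every axis.
    return [[value / sum(v) * T for value in v] for v in normalized]


# Toy data (hypothetical): two genes and a fully filled 2 x 2 x 2 mask.
mask = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
X = [[1, 3], [3, 1]]
Y = [[2, 2], [1, 1]]
Z = [[4, 1], [0, 3]]
xg, yg, zg = preprocess_gene(X, Y, Z, mask, g=0)
print(sum(xg), sum(yg), sum(zg))
```

After the last step the three vectors share the same total, which is what IPF needs: the three marginals can only be matched simultaneously if they sum to the same value.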

Masker

tomoseqr requires “mask” data in addition to tomo-seq data to reconstruct expression patterns. Mask data is a three-dimensional binary array that defines the shape of the sample material from which tomo-seq data was derived (as described in the subsection “Mask data” above). Masker is the Shiny-based GUI that enables users to create mask data by drawing a mask as a 2D image of a sample material for each section along the user-defined axis (Fig 2 and S1 Video). Mask data created using masker can be exported as an R Data File.

Image viewer

Since tomoseqr’s output is a 3D array whose elements are the reconstructed expression levels for a gene, visualization is essential to interpret the reconstruction result. Image viewer is the Shiny-based GUI that allows users to interactively visualize 2D cross-section tomographic images and 3D plots of the reconstruction results. In the 2D view, the gene expression pattern is visualized as a heat map in each cross section orthogonal to each axis (Fig 3 and S2 Video). In the 3D view, the gene expression pattern is visualized as an interactive 3D plot (Fig 4 and S3 Video). Image viewer can output 2D tomographic images in PNG and GIF format and 3D images in PNG format.

Implementation and code availability of tomoseqr

tomoseqr was implemented as an R package and is available on Bioconductor (https://doi.org/doi:10.18129/B9.bioc.tomoseqr) and GitHub (https://github.com/bioinfo-tsukuba/tomoseqr).

Evaluation using simulated tomo-seq data

Generation of simulated tomo-seq data.

To evaluate the reconstruction algorithm in tomoseqr, we generated simulated tomo-seq data. In the simulation, we assumed that the sample material was an ellipsoid and that 50 cryosections were prepared for each of the three orthogonal axes. Accordingly, tomoseqr was applied to the simulated tomo-seq data to reconstruct gene expression patterns across the voxels within the ellipsoid (i.e., where the mask is 1) out of 50 × 50 × 50 voxels. We also assumed that the simulated tomo-seq data consisted of genes that exhibit spatially specific patterns (spatial genes) and those that do not (background genes). We set the number of spatial genes to 3 and that of background genes to 1,997 or 4,997 and generated the simulated tomo-seq data as follows (Fig 5):

  1. For each of the spatial and background genes, we first generated a simulated 3D expression array $\{E_{i,j,k}\}$ (where i, j, and k are the indexes of cryosections for the three orthogonal axes). For spatial genes, we generated 3D expression arrays for three spatial genes (Gene1, Gene2, Gene3) with the following 3D gene expression patterns (Fig 5):
    1. Gene1: strongly expressed in the center of the ellipsoid, with no expression in other regions.
    2. Gene2: strongly expressed in the center of the ellipsoid and weakly expressed in other regions.
    3. Gene3: strongly expressed in a narrow region in front of the ellipsoid, with no expression in other regions.
    4. Gene4: strongly expressed in a symmetrical narrow region in front of the ellipsoid, with no expression in other regions.
    5. Gene5: strongly expressed in a narrow region located away from the left-right axis and the dorsal-ventral axis, with no expression in other regions.
    For background genes, 3D gene expression arrays were generated for 1,997 or 4,997 genes using the SPsimSeq package [11].
  2. For each gene, the 3D expression pattern array was converted to 1D tomo-seq data for each axis. Specifically, we calculated the marginal sum of $\{E_{i,j,k}\}$ over two of the three axes while preserving the remaining one: $$x_i = \sum_{j,k} E_{i,j,k}, \quad y_j = \sum_{i,k} E_{i,j,k}, \quad z_k = \sum_{i,j} E_{i,j,k}$$
    This results in the simulated 1D tomo-seq data for each gene along the three axes: (x1, x2, …, x50), (y1, y2, …, y50), (z1, z2, …, z50).
  3. For each of the three axes, simulated 1D tomo-seq data were combined across the genes. This results in simulated tomo-seq data of 2,000 or 5,000 genes and 50 cryosections along the axis.
  4. We repeated the above procedures 10 times, resulting in 10 simulated tomo-seq data with 2,000 and 5,000 genes.
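As an illustration of step 2, the sketch below builds a small ellipsoid mask, places a Gene1-like pattern in its center, and projects the 3D array onto the three axes. It is a simplified stand-in for the simulation described above: the grid size, radii, expression level, and function names are hypothetical, and background genes (generated with SPsimSeq in the paper) are not simulated here.

```python
def ellipsoid_mask(n, radii):
    """Binary n x n x n mask of an axis-aligned ellipsoid centered in the grid."""
    c = (n - 1) / 2
    rx, ry, rz = radii
    return [[[1 if ((i - c) / rx) ** 2 + ((j - c) / ry) ** 2
              + ((k - c) / rz) ** 2 <= 1 else 0
              for k in range(n)] for j in range(n)] for i in range(n)]


def project(E):
    """Marginal sums of a 3D array over two axes, giving one vector per axis."""
    n_x, n_y, n_z = len(E), len(E[0]), len(E[0][0])
    x = [sum(E[i][j][k] for j in range(n_y) for k in range(n_z)) for i in range(n_x)]
    y = [sum(E[i][j][k] for i in range(n_x) for k in range(n_z)) for j in range(n_y)]
    z = [sum(E[i][j][k] for i in range(n_x) for j in range(n_y)) for k in range(n_z)]
    return x, y, z


n = 10  # kept small here; the simulation in the text uses 50 sections per axis
mask = ellipsoid_mask(n, radii=(4.5, 3.5, 2.5))
core = ellipsoid_mask(n, radii=(2.0, 1.5, 1.0))
# Gene1-like pattern: expression only in the center (the level 10 is arbitrary).
E = [[[10 * core[i][j][k] for k in range(n)] for j in range(n)] for i in range(n)]
x, y, z = project(E)
print(x)
```

Each of the three vectors sums to the total expression in the 3D array, so they are mutually consistent marginals that can be passed to a reconstruction step together with the mask.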
Fig 5. Spatial expression patterns and reconstruction results for Gene1, Gene2, Gene3, Gene4, and Gene5.

Ground truth and reconstruction results are shown for (A) Gene1, (B) Gene2, (C) Gene3, (D) Gene4, and (E) Gene5. The dots are colored by gene expression levels, with white dots representing the sample shape (mask).

https://doi.org/10.1371/journal.pone.0311296.g005

Accuracy evaluation of tomoseqr using simulation data.

tomoseqr was run on the simulated tomo-seq data with the inter-section normalization option. The accuracy of gene expression pattern reconstructions by tomoseqr was evaluated using the simulated tomo-seq data. Specifically, we calculated the Pearson’s correlation coefficient (PCC) between the reconstructed result and the ground truth. For each gene, the ground truth was defined as the gene expression level normalized to reads per million (RPM) in each voxel.

To verify the significance of the PCCs between the reconstruction results and ground truths, a randomization test was conducted. In the test, a randomized array was generated by randomly shuffling the position of each element of the reconstruction result. Then, the PCC between the randomized array and the ground truth was calculated. Note that the elements with a mask value of 0, i.e., voxels not included in the sample, were excluded from both the shuffle of elements and the calculation of PCCs. The above operations were performed 1,000 times for each of the 10 simulated tomo-seq data for each spatial gene to obtain the empirical null distribution of the PCCs. Finally, PCCs of the 10 simulated tomo-seq data and the empirical null distribution aggregated across the 10 simulated tomo-seq data were compared using the Kolmogorov-Smirnov test.
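The shuffling procedure can be sketched as follows. This is an illustrative outline rather than the analysis code used in the study: the function names, the flattened voxel lists, the fixed seed, and the toy values are hypothetical. The final comparison of observed and null PCCs is omitted; a two-sample Kolmogorov-Smirnov test such as `scipy.stats.ks_2samp` could be applied to the returned values.

```python
import random


def pearson(u, v):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def randomization_test(reconstruction, truth, mask, n_shuffles=1000, seed=0):
    """Observed PCC and null PCCs from shuffling reconstructed voxels inside the mask.

    All three inputs are flattened voxel lists of equal length.
    """
    inside = [idx for idx, m in enumerate(mask) if m == 1]
    rec = [reconstruction[idx] for idx in inside]
    ref = [truth[idx] for idx in inside]
    observed = pearson(rec, ref)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    null = []
    for _ in range(n_shuffles):
        shuffled = rec[:]
        rng.shuffle(shuffled)  # voxels outside the mask never enter the shuffle
        null.append(pearson(shuffled, ref))
    return observed, null


# Toy data (hypothetical): 24 voxels, the first 4 of which lie outside the sample.
mask = [0] * 4 + [1] * 20
truth = [float(v) for v in range(24)]
reconstruction = [t + (0.1 if idx % 2 else -0.1) for idx, t in enumerate(truth)]
observed, null = randomization_test(reconstruction, truth, mask)
print(round(observed, 3), len(null))
```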

Evaluation of computation time and memory usage

We conducted a computer experiment using the zebrafish (Danio rerio) tomo-seq data [1] to evaluate the amount of computing resources used by tomoseqr during its execution. We prepared the tomo-seq data from the zebrafish shield stage by converting the tables entitled traces_shield_AV.txt (for animal-vegetal (AV) axis), traces_shield_VD.txt (for ventral-dorsal (VD) axis), and traces_shield_LR.txt (for left-right (LR) axis) in the Table S4 spreadsheet in Junker et al. [1] into comma-separated value (CSV) files. Using tomoseqr, we reconstructed 3D expression patterns for 100, 1,000, and 2,352 genes (genes expressed with at least one section with read count > 50 in tomo-seq data for the AV axis) and measured computation time and memory usage. Computation time was measured as the time of the actual reconstruction process. The amount of memory usage was measured when the entire R script was executed. We used a computer with the following specs:

Application of tomoseqr to zebrafish tomo-seq data

To evaluate the reconstruction performance of tomoseqr with real tomo-seq data, we applied tomoseqr to publicly available zebrafish tomo-seq data [1] and compared the reconstruction results of tomoseqr with previously published in situ hybridization results. We used the tomo-seq data from the zebrafish shield stage from Junker et al. [1] (as described above). A hollow hemisphere was used as mask data, and the size of the hemisphere was matched to the expression distribution of the mitochondria-related gene (mt-co2). tomoseqr was run with inter-section normalization to normalize the total gene expression per sample volume across sections.

Application of tomoseqr on planarian tomo-seq data

Maintenance of planarian.

A lab stock of asexual planarians D. japonica purchased from Aqua Field (Tokyo, Japan) was maintained in deionized water containing 0.05 g/L Instant Ocean Sea Salt (Instant Ocean, Blacksburg, VA, USA). The planarians were fed with chicken liver once or twice a week. The intact (completely-regenerated) planarians were starved for at least one week prior to the following experiments.

Preparation of planarian sections followed by total RNA purification.

Three planarians were anesthetized and straightened with 0.2% Chloretone Hemihydrate (Tokyo Chemical Industry Co., Ltd., Tokyo, Japan) in deionized water containing 0.05 g/L Instant Ocean Sea Salt for a minute. The straightened planarians were moved into Tissue-Tek Cryomold [10 × 10 × 5 mm] (Sakura Finetek Japan Co., Ltd., Osaka, Japan). After removing the liquid, O.C.T compound (Sakura Finetek Japan Co., Ltd., Osaka, Japan) was added, and the samples were frozen using liquid nitrogen. The series of sections (78 sections along the anterior-posterior axis, 28 sections along the dorsal-ventral axis, and 51 sections along the left-right axis) was prepared with a CM3050 S (Leica, Wetzlar, Germany). The thicknesses of sections along the anterior-posterior, dorsal-ventral, and left-right axes were 80 μm, 20 μm, and 20 μm, respectively. Each section was moved into a separate well of twin.tec PCR plates LoBind, semi-skirted (Eppendorf, Hamburg, Germany) using a chilled handled needle in the chilled CM3050 S. Total RNA was extracted from each individual of the planarians using the “Direct-TRI” method [12]. Briefly, 100 μL of TRI Reagent LS (Molecular Research Center, Cincinnati, OH, USA) was added to each well. After lysing the sections by vortexing, 100 μL of 99.5% EtOH was added, followed by thorough mixing. Then, the lysate was purified using an AcroPrep Advance 96-well Long Tip Filter Plate for Nucleic Acid Binding (Pall, Port Washington, NY, USA). RNA was eluted with 10 μL nuclease-free water.

RNA-Seq library preparation and sequencing.

3’ mRNA-Seq was conducted according to the Lasy-Seq ver. 1.1 protocol (https://sites.google.com/view/lasy-seq/) [13, 14]. Briefly, 9 μL of the purified total RNA was reverse transcribed using an RT primer with index and SuperScript IV reverse transcriptase (Thermo Fisher Scientific, Waltham, MA, USA). Then, all RT mixtures of the samples were pooled and purified using an equal volume of AMpure XP beads (Beckman Coulter, Brea, CA, USA) according to the manufacturer’s instructions. Second strand synthesis was conducted on the pooled samples using RNaseH (5 U/μL, Enzymatics, Beverly, MA, USA) and DNA polymerase I (10 U/μL, Enzymatics, Beverly, MA, USA). To avoid the carryover of large amounts of rRNAs, the mixture was subjected to RNase treatment using RNase T1 (Thermo Fisher Scientific, Waltham, MA, USA). Then, purification was conducted with a 0.8 × volume of AMpure XP beads. Fragmentation, end-repair, and A-tailing were conducted using 5 × WGS Fragmentation Mix (Enzymatics, Beverly, MA, USA). The adapter for Lasy-Seq was ligated using 5 × Ligation Mix (Enzymatics, Beverly, MA, USA), and the adapter-ligated DNA was purified twice with a 0.8 × volume of AMpure XP beads. After optimization of the number of PCR cycles for library amplification by qPCR using Evagreen, 20 × in water (Biotium, Fremont, CA, USA) and the QuantStudio5 Real-Time PCR System (Applied Biosystems, Waltham, MA, USA), the library was amplified using KAPA HiFi HotStart ReadyMix (KAPA BIOSYSTEMS, Wilmington, MA, USA) on the ProFlex PCR System (Applied Biosystems, Waltham, MA, USA). The amplified library was purified with an equal volume of AMpure XP beads. One microliter of the library was then used for electrophoresis on a Bioanalyzer 2100 with the Agilent High Sensitivity DNA kit (Agilent Technologies, Santa Clara, CA, USA) to check its quality. Then, sequencing of 150-bp paired-end reads was performed using HiSeq X Ten (Illumina, San Diego, CA, USA).

Read mapping and gene expression quantification.

Read 1 reads were processed with fastp (version 0.21.0) [15] using the following parameters:

--trim_poly_x

-w 20

--adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

--adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

-l 31

The trimmed reads were then mapped to the D. japonica reference transcriptome sequences deposited at http://www.planarian.jp/download.html [16] using BWA mem (version 0.7.17-r1188) [17] with the default parameters. The read count for each gene was calculated with Salmon (version v0.12.0) [18] using -l IU, which specifies the library type.

Reconstruction of 3D gene expression pattern from planarian tomo-seq data.

We applied tomoseqr on the planarian tomo-seq data. The mask data was created with masker according to the morphology of an adult planarian. tomoseqr was run with inter-section normalization to normalize the total gene expression per sample volume across sections.

Systematic exploration of correlated and regionalized expression patterns among planarian genes

As part of our analysis of large-scale gene expression distributions, we investigated genes with expression patterns similar to those of specific target genes. Additionally, we explored genes with spatially specific expression patterns using autocorrelation methods.

Identification of genes with expression patterns similar to specific targets.

  1. We reconstructed the spatial expression distributions of genes from the tomo-seq data.
  2. For each spatial expression distribution, we calculated the correlation coefficient with the spatial expression distribution of opsin using the correlationWithSpecificGene function. This function is available in the GitHub version of tomoseqr and will be included in Bioconductor version 3.20.
  3. We obtained the reconstructed spatial expression distributions for the four genes that showed the highest correlation coefficients with opsin.
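The ranking idea behind these steps can be sketched in plain Python. The code below is not the `correlationWithSpecificGene` function and does not reproduce its interface; the function names, the flattened pattern vectors, and the gene labels are hypothetical stand-ins for reconstructed 3D arrays.

```python
def pearson(u, v):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def rank_by_correlation(patterns, target):
    """Sort genes by decreasing PCC with the pattern of the target gene."""
    reference = patterns[target]
    scores = {gene: pearson(values, reference)
              for gene, values in patterns.items() if gene != target}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy patterns (hypothetical): each list is a flattened reconstructed array.
patterns = {
    "target": [0, 0, 5, 9, 0, 0],
    "geneA": [0, 1, 10, 18, 1, 0],  # same shape as the target, scaled
    "geneB": [9, 8, 0, 0, 8, 9],    # complementary to the target
    "geneC": [3, 3, 3, 4, 3, 3],    # nearly uniform
}
ranking = rank_by_correlation(patterns, "target")
print([gene for gene, _ in ranking])
```

Because the PCC is invariant to scaling, a gene expressed in the same voxels as the target ranks highly even when its overall expression level differs.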

Identification of genes with spatially specific expression patterns using autocorrelation.

Following Mayeur et al. 2021 [3], we performed an autocorrelation analysis using Moran’s index (Moran’s I). The analysis was conducted as follows:

  1. We listed planarian genes that have homologs in mice. Out of the 255,666 genes analyzed, 14,576 were identified as such homologous genes.
  2. We filtered the homologous genes based on the list of mouse proteins associated with GO:0001228, which means “DNA-binding transcription activator activity, RNA polymerase II-specific” [19]. As a result, 214 genes were identified.
  3. We reconstructed the spatial expression distributions of planarian genes that are homologs of mouse transcription factors using tomo-seq data.
  4. For each spatial expression distribution, we calculated Moran’s I.
  5. We obtained the reconstructed spatial expression distributions and the names of the homologous mouse transcription factors for the five genes that exhibited the highest Moran’s I values.
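For step 4, a minimal sketch of Moran's I on a voxel grid is given below. The spatial weights are an assumption made for illustration, since the text does not specify them: here each voxel inside the mask is given weight 1 with each of its face-sharing neighbors inside the mask. The function name and the toy patterns are hypothetical.

```python
def morans_i(values, mask):
    """Moran's I over mask voxels with binary face-adjacency weights.

    Assumption: weight 1 for the (up to) six face-sharing neighbors inside the
    mask, 0 otherwise. The pattern must not be constant within the mask.
    """
    dev = {}
    for i, plane in enumerate(mask):
        for j, row in enumerate(plane):
            for k, inside in enumerate(row):
                if inside:
                    dev[(i, j, k)] = values[i][j][k]
    mean = sum(dev.values()) / len(dev)
    dev = {voxel: v - mean for voxel, v in dev.items()}
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    cross, weight_sum = 0.0, 0
    for (i, j, k), d in dev.items():
        for di, dj, dk in steps:
            neighbor = dev.get((i + di, j + dj, k + dk))
            if neighbor is not None:
                cross += d * neighbor
                weight_sum += 1
    return (len(dev) / weight_sum) * (cross / sum(d * d for d in dev.values()))


# Toy patterns (hypothetical) on a fully filled 4 x 4 x 4 grid.
n = 4
mask = [[[1] * n for _ in range(n)] for _ in range(n)]
half = [[[1.0 if i < 2 else 0.0 for k in range(n)] for j in range(n)] for i in range(n)]
checker = [[[float((i + j + k) % 2) for k in range(n)] for j in range(n)] for i in range(n)]
print(morans_i(half, mask), morans_i(checker, mask))
```

A pattern confined to one half of the grid gives a clearly positive value, whereas an alternating checkerboard gives a value of -1, which is why high Moran's I is used here as an indicator of regionalized expression.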

Results

Accuracy evaluation of tomoseqr using simulated data

The accuracy of gene expression pattern reconstructions by tomoseqr was evaluated using the simulation data (Materials and methods). We employed two simulation settings, in which the expressions of 2,000 (3 spatial and 1,997 background genes) or 5,000 (3 spatial and 4,997 background genes) genes were considered, and generated 10 simulated datasets for each simulation setting. Subsequently, tomoseqr was applied to each of the simulated datasets with the inter-section normalization option, and the reconstruction results of 3 spatial genes (Gene1, Gene2, and Gene3) were compared with the ground truth patterns by the Pearson’s correlation coefficients (PCCs).

Fig 5 shows representative results of the reconstructed gene expression patterns of 5 spatial genes (Gene1, Gene2, Gene3, Gene4 and Gene5). The reconstruction results for each gene were consistent with the ground truth. Table 2 shows the PCCs of reconstruction results and ground truths for all simulated datasets. All of the spatial genes showed high PCC values in both simulation settings while the reconstruction results for Gene2 showed relatively moderate PCCs compared to other genes.

Table 2. Evaluation of the reconstruction accuracy of tomoseqr using simulated tomo-seq data.

For each simulation setting and spatial gene, 10 simulated tomo-seq datasets were generated. tomoseqr was applied to these data with the inter-section normalization option. Means and standard deviations of the PCCs between the reconstruction results and the ground truth across the 10 simulated tomo-seq datasets are shown. P-values were calculated by randomly shuffling the reconstructed results and calculating PCCs of the randomized results with the ground truth 1,000 times, followed by Kolmogorov-Smirnov tests (Materials and methods).

https://doi.org/10.1371/journal.pone.0311296.t002

To verify the significance of the PCCs between the reconstruction results and ground truths, a randomization test was conducted (Materials and methods). The observed PCCs were significantly larger than the respective null distributions in all cases (p < 5.03 × 10^−9, Kolmogorov-Smirnov test) (Table 2). These results indicate that tomoseqr can reconstruct the spatial pattern of gene expression with good accuracy.

Evaluation of computation time and memory usage

We performed a computer experiment to evaluate the computing resources used by tomoseqr during the reconstruction of 3D gene expression patterns. Specifically, we ran tomoseqr for 100, 1,000, and 2,352 genes on a typical laptop PC (Materials and methods).

The measurement results are shown in Table 3. The computation time was 132.7, 1,330, and 2,626 s for 100, 1,000, and 2,352 genes, respectively, suggesting that the computation time is approximately proportional to the number of genes to be reconstructed. The memory usage was 380 MB, 1.96 GB, and 3.94 GB for 100, 1,000, and 2,352 genes, respectively. This result implies that maximum memory usage increases more gradually than in proportion to the number of genes. These results indicate that tomoseqr can reconstruct thousands of genes with realistic computation times and maximum memory usage on a typical laptop PC.

Table 3. Computation time and amount of memory usage for reconstruction using tomoseqr.

https://doi.org/10.1371/journal.pone.0311296.t003

Application of tomoseqr to zebrafish tomo-seq data

To evaluate the reconstruction performance of tomoseqr with real tomo-seq data, we applied tomoseqr to publicly available zebrafish (Danio rerio) tomo-seq data [1] and compared the reconstruction results of tomoseqr with previously published in situ hybridization results.

The results are shown in Fig 6. eve1 was strongly expressed on the ventral side and near the center of the animal-vegetal (AV) axis (Fig 6A). bmp7a showed expression throughout the ventral side (Fig 6B). ntla showed expression near the center of the AV axis (Fig 6C). chd showed strong expression in a narrow dorsal region (Fig 6D). All of these results were consistent with previous studies using in situ hybridization [20–23], showing that tomoseqr can reconstruct 3D expression patterns of developmentally important genes.

Fig 6. Results of reconstruction with zebrafish (Danio rerio, shield stage).

Reconstruction results for the expression of (A) eve1 (ENSDARG00000056012), (B) bmp7a (ENSDARG00000018260), (C) ntla (ENSDARG00000009905), and (D) chd (ENSDARG00000006110). For each of the genes, the 3D view, a cross section perpendicular to the animal-vegetal axis (18 μm × 50 sections), a cross section perpendicular to the dorsal-ventral axis (18 μm × 49 sections), and a cross section perpendicular to the left-right axis (18 μm × 56 sections) are shown from left to right. Parentheses after gene names indicate gene IDs. Colors represent the reconstructed expression levels.

https://doi.org/10.1371/journal.pone.0311296.g006

Application of tomoseqr on planarian tomo-seq data

We further applied tomoseqr to newly generated tomo-seq data from the planarian Dugesia japonica. The tomo-seq data were generated by performing Lasy-Seq on 78 sections along the anterior-posterior axis, 28 sections along the dorsal-ventral axis, and 51 sections along the left-right axis (Materials and methods).

Fig 7 shows the results of reconstruction with inter-section normalization. For piwiA, which is known to be expressed in the whole body [24], expression was confirmed throughout the body (Fig 7A). opsin, which is known to be expressed in the eye [25], showed two high-expression spots on the head (Fig 7B). Djf-1 was found to be expressed in the epidermis (Fig 7C and S4 Video), which was consistent with a previous study [26]. For DjNp19, which is known to be expressed in the brain and nervous system [27], expression was observed throughout the body but was particularly strong in the head (Fig 7D and S5 Video). These results support the ability of tomoseqr to reconstruct 3D spatial gene expression patterns.

Fig 7. Reconstruction results using planarian tomo-seq data.

Reconstruction results for the expression of (A) piwiA (DjGI005146_001), (B) opsin (DjGI008464_001), (C) Djf-1 (DjGI006376_001), and (D) DjNp19 (DjGI017609_001). Parentheses after gene names indicate gene IDs. Colors represent the reconstructed expression levels. AP means anterior-posterior axis (80 μm × 78 sections), VD means ventral-dorsal axis (20 μm × 28 sections), and LR means left-right axis (20 μm × 51 sections).

https://doi.org/10.1371/journal.pone.0311296.g007

Systematic exploration of correlated and regionalized expression patterns among planarian genes

Finally, we reconstructed the 3D spatial expression patterns of 18,768 expressed planarian genes using tomoseqr. This necessitates prioritizing genes that exhibit biologically interesting expression patterns from the large number of genes. To achieve this, we adopted two strategies: correlation and autocorrelation.

First, we explored genes that showed a high correlation with specific spatial patterns. For instance, we identified genes specifically expressed in the eye by calculating the Pearson correlation coefficients between the reconstructed gene expression patterns and the eye-specific opsin gene (Fig 8A). The top four genes with the highest correlation coefficients with opsin all exhibited eye-specific expression (Fig 8B–8E). Arrb1 is involved in melanopsin signaling in the mammalian retina [28], and Rnf13 is an E3 ubiquitin-protein ligase [29] and one of the retinal pigment epithelium signature genes in rodents and humans [30]. On the other hand, although Cpne9, a calcium-dependent phospholipid-binding protein [31], and Tnnc1, a calcium-binding protein involved in muscle contraction [32], are not directly linked to visual function, they may play a role in calcium-dependent signaling relevant to vision.
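The correlation-based prioritization described above can be sketched in a few lines. The following Python example is illustrative only and is not part of tomoseqr: it assumes that each reconstructed pattern is available as a nested 3D list of voxel values on a common grid, and the function names (`flatten`, `pearson`, `rank_by_correlation`, `single_spot`) and the toy gene labels are hypothetical.

```python
import math

def flatten(volume):
    """Flatten a nested 3D list into a 1D list of voxel values."""
    return [v for plane in volume for row in plane for v in row]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined for a constant pattern
    return cov / math.sqrt(var_a * var_b)

def rank_by_correlation(reference, patterns, top_n=4):
    """Return the top_n (gene, coefficient) pairs, highest correlation first."""
    ref = flatten(reference)
    scored = []
    for gene, volume in patterns.items():
        r = pearson(ref, flatten(volume))
        if not math.isnan(r):  # skip genes with an undefined coefficient
            scored.append((gene, r))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def single_spot(position, value, shape=(2, 2, 2)):
    """Toy volume (assumption): zero everywhere except one voxel."""
    nx, ny, nz = shape
    vol = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    x, y, z = position
    vol[x][y][z] = value
    return vol

# Toy data: the reference plays the role of the opsin pattern.
reference = single_spot((0, 0, 0), 5.0)
patterns = {
    "gene_a": single_spot((0, 0, 0), 10.0),                      # same spot, scaled
    "gene_b": [[[1.0] * 2 for _ in range(2)] for _ in range(2)],  # constant pattern
    "gene_c": single_spot((1, 1, 1), 5.0),                        # opposite corner
}

ranked = rank_by_correlation(reference, patterns)
for gene, r in ranked:
    print(gene, round(r, 3))
```

In this toy case the gene sharing the reference spot ranks first with a coefficient of 1, the constant pattern is dropped because its coefficient is undefined, and the gene at the opposite corner receives a negative coefficient. A real analysis would presumably restrict the calculation to voxels inside the mask, which this sketch does not model.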

Fig 8. Systematic exploration of genes correlated with the reconstructed opsin expression pattern in planarian tomo-seq data.

(A) A histogram showing the Pearson correlation coefficients (PCCs) with the reconstructed opsin expression pattern for 18,768 expressed genes. The vertical lines indicate the top 4 genes with the highest PCCs. (B-E) Reconstruction results for the expression of (B) Arrb1 (DjGI008623_001), (C) Rnf13 (DjGI009330_001), (D) Cpne9 (DjGI006974_001), and (E) Tnnc1 (DjGI015938_001). The PCCs of these gene expression patterns with the opsin gene (Fig 7B) are shown. (F) A heatmap showing the expression of human homologs in human eye single-cell types using single-cell RNA sequencing data from the Human Protein Atlas. The gene expression levels, in normalized transcript per million (nTPM), were transformed into Z-scores.

https://doi.org/10.1371/journal.pone.0311296.g008

Second, we searched for genes displaying regionalized expression patterns using Moran’s index (Moran’s I), a spatial autocorrelation measure [3]. Specifically, we focused on 214 expressed genes that encode the planarian homologs of mouse transcription factors and calculated Moran’s I for these genes (Fig 9A). The top five genes with the highest Moran’s I each exhibited distinct expression patterns (Fig 9B–9F), suggesting potentially diverse functions in planarians. Consistently, the homologs of these five regionalized genes are associated with various developmental, homeostatic, and disease processes in mammals (e.g., Creb3l1 with osteogenesis imperfecta [33], Kmt2d with Kabuki syndrome [34], Smad1 with the control of cell fate [35], Mzf1 with keratinocyte differentiation [36], and Foxa2 with cholestatic syndromes [37]).
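As a rough illustration of how Moran's I separates regionalized from scattered patterns, the sketch below computes it on a small 3D grid. It is not the implementation used in the study: it assumes binary weights between face-adjacent voxels (rook adjacency), ignores any mask, and the function name `morans_i` and both toy volumes are hypothetical.

```python
def morans_i(volume):
    """Moran's I on a 3D grid with binary weights for face-adjacent voxels."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    values = [volume[x][y][z]
              for x in range(nx) for y in range(ny) for z in range(nz)]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return float("nan")  # undefined for a constant pattern

    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    num = 0.0
    weight_sum = 0
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                for dx, dy, dz in offsets:
                    i, j, k = x + dx, y + dy, z + dz
                    if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
                        # Each ordered neighbor pair contributes weight 1.
                        num += (volume[x][y][z] - mean) * (volume[i][j][k] - mean)
                        weight_sum += 1
    return (n / weight_sum) * (num / denom)

size = 4
# Regionalized toy pattern: expressed only in one half of the volume.
clustered = [[[1.0 if x < size // 2 else 0.0 for z in range(size)]
              for y in range(size)] for x in range(size)]
# Scattered toy pattern: 3D checkerboard, every neighbor has the opposite value.
checkerboard = [[[float((x + y + z) % 2) for z in range(size)]
                 for y in range(size)] for x in range(size)]

print("clustered:", morans_i(clustered))
print("checkerboard:", morans_i(checkerboard))
```

The half-volume pattern yields a clearly positive value, while the checkerboard yields a value of -1, so ranking genes by Moran's I favors patterns concentrated in contiguous regions.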

Fig 9. Systematic exploration of genes with spatial autocorrelation in the reconstructed expression patterns in planarian tomo-seq data.

(A) A histogram showing the Moran’s I values of the reconstructed expression patterns for 214 genes encoding planarian homologs of mouse transcription factors. (B-F) Reconstruction results for the expression of (B) Creb3l1 (DjGI019398_001), (C) Kmt2d (DjGI001455_001), (D) Smad1 (DjGI002591_001), (E) Mzf1 (DjGI000932_002), and (F) Foxa2 (DjGI007454_001). The corresponding Moran’s I values are shown.

https://doi.org/10.1371/journal.pone.0311296.g009

Together, these results demonstrate that tomoseqr, by reconstructing 3D spatial gene expression patterns, not only facilitates the investigation of known gene expression patterns but also aids in the discovery of genes with biologically significant spatial distributions, through both correlation and autocorrelation analyses.

Discussion

Our study presents tomoseqr, a Bioconductor R package that accurately and efficiently reconstructs 3D gene expression patterns from 1D tomo-seq data along three orthogonal axes. Through validation with both simulated and real datasets, including zebrafish and planarian data, we demonstrated that tomoseqr can reliably reconstruct spatial gene expression for thousands of genes.

The inclusion of two Shiny-based graphical user interfaces, masker and Image viewer, further underscores tomoseqr’s utility by providing intuitive tools for mask data creation and visualization of gene expression patterns. This makes tomoseqr accessible to a broad range of researchers, including those without extensive computational expertise.

Our work further underlines the critical role of software packages in interpreting and visualizing complex gene expression data in the life sciences. As high-throughput multi-sample RNA-seq technologies become increasingly accessible and lower the barrier to RNA tomography [38], the need for such software is expected to grow.

Once a large number of reconstructed 3D spatial expression patterns of genes are obtained using tomoseqr, it becomes crucial to explore and prioritize those with biological significance for discovery and hypothesis generation. In this study, we demonstrated that by leveraging correlation and spatial autocorrelation metrics, we can systematically identify genes with intriguing expression patterns in planarians, a species for which prior knowledge is limited. This data-driven approach to gene exploration highlights the potential of tomoseqr to significantly contribute to a wide range of research areas in the life sciences.

There are several possible directions for improving tomoseqr. The first is improvement of the reconstruction algorithm. Although the IPF algorithm used in tomoseqr is based on purely numerical reconstruction, it may be possible to devise other reconstruction algorithms that incorporate biological properties, such as spatial autocorrelation, gene-gene correlation, and noise structures. The second is the enhancement of robustness through the preprocessing of tomo-seq data. In RNA-seq data, gene expression values can drop to zero by chance, especially for lowly expressed genes, which could affect the performance of spatial reconstruction. This issue could be addressed by employing preprocessing techniques for RNA-seq data, such as imputation or smoothing, which are available as functions in existing R packages for transcriptome data analysis. The third is an accuracy measure for the reconstructed 3D gene expression patterns. RNA tomography based on 1D tomo-seq data along three orthogonal axes cannot always accurately reconstruct a given gene expression distribution. For instance, it is impossible to uniquely reconstruct a distribution in which a gene is expressed at two spots along a diagonal of the cube, because infinitely many feasible solutions fit the marginal distributions defined by the corresponding tomo-seq data. We are currently developing a method to evaluate the identifiability of reconstructed patterns. The fourth is support for tomo-seq data prepared through sampling methods other than sampling along three orthogonal axes. A previous study provides a promising lead: Schede et al. successfully reconstructed 2D spatial gene expression from 1D tomo-seq data obtained from three consecutive sections that were secondarily sliced into stripes at different angles [8]. We currently speculate that 1D tomo-seq data sampled along at least six axes would enable accurate reconstruction of any spatial distribution.
The fifth is the integration of tomo-seq data with single-cell and spatial transcriptome data. For example, inference of cellular localization has been achieved by integrating single-cell RNA-seq data with landmark gene expression patterns determined by in situ hybridization experiments [39]. Such a strategy could also employ 3D spatial expression patterns reconstructed by tomoseqr as landmarks. In addition, tomo-seq data might be integrated with 2D expression patterns derived from spatial transcriptomics techniques [40], enabling a more accurate reconstruction of 3D spatial expression patterns. The sixth is the enhancement of the functionality of Image viewer to allow simultaneous display of the spatial expression distributions of multiple genes. This feature would be particularly useful for morphological analyses that compare genes of interest with marker genes (e.g., tubulin beta III-positive neuronal tissue, alpha-actin-positive skeletal muscle, or cardiac myosin-positive heart).
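The identifiability limitation mentioned among the directions above (two spots along a diagonal of the cube) can be made concrete with a toy example. The Python sketch below is a generic, unmasked iterative proportional fitting (IPF) loop written for illustration; it is not tomoseqr's implementation, and the names `marginals`, `ipf`, and `spots` as well as the 2 × 2 × 2 grid are assumptions.

```python
def marginals(vol):
    """Sum a 3D volume onto each of its three axes (the 1D tomo-seq-like profiles)."""
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    mx = [sum(vol[x][y][z] for y in range(ny) for z in range(nz)) for x in range(nx)]
    my = [sum(vol[x][y][z] for x in range(nx) for z in range(nz)) for y in range(ny)]
    mz = [sum(vol[x][y][z] for x in range(nx) for y in range(ny)) for z in range(nz)]
    return mx, my, mz

def ipf(target_x, target_y, target_z, n_iter=20):
    """Generic IPF: start from a uniform volume and rescale to match each axis in turn."""
    nx, ny, nz = len(target_x), len(target_y), len(target_z)
    vol = [[[1.0] * nz for _ in range(ny)] for _ in range(nx)]
    for _ in range(n_iter):
        for axis, target in enumerate((target_x, target_y, target_z)):
            current = marginals(vol)[axis]
            for x in range(nx):
                for y in range(ny):
                    for z in range(nz):
                        idx = (x, y, z)[axis]
                        if current[idx] > 0:  # an all-zero slice stays zero
                            vol[x][y][z] *= target[idx] / current[idx]
    return vol

def spots(coords, size=2):
    """Toy volume (assumption): expression of 1.0 at each listed voxel, 0 elsewhere."""
    vol = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for x, y, z in coords:
        vol[x][y][z] = 1.0
    return vol

# Two different "true" patterns, each with two spots on a diagonal of the cube.
truth_a = spots([(0, 0, 0), (1, 1, 1)])
truth_b = spots([(0, 0, 1), (1, 1, 0)])

print(marginals(truth_a) == marginals(truth_b))  # identical 1D profiles
reconstruction = ipf(*marginals(truth_a))
print(reconstruction)
```

Both toy patterns produce the same three 1D profiles, so no method that sees only those profiles can tell them apart. Starting from a uniform volume, the IPF loop here returns a uniform solution that also fits the profiles but matches neither pattern, which is why a measure of identifiability would be useful.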

Our evaluation showcases tomoseqr’s ability to process large datasets with computational efficiency, making it a valuable tool for exploring the complex spatial organization of gene expression in biological tissues. By offering an accurate and user-friendly approach to analyzing tomo-seq data, tomoseqr represents a significant advancement in the field of spatial transcriptomics, paving the way for new insights into developmental biology, disease mechanisms, and beyond.

Supporting information

S4 Video. Reconstruction result of Djf-1 (3D).

https://doi.org/10.1371/journal.pone.0311296.s004

(MP4)

S5 Video. Reconstruction result of DjNp19 (3D).

https://doi.org/10.1371/journal.pone.0311296.s005

(MP4)

Acknowledgments

We thank Koki Tsuyuzaki and other members of Bio“Pack”athon for their helpful advice. We also thank Tsukasa Fukunaga, Kenichi Suzuki, Akinori Okumura, and our lab members for fruitful discussion.

References

  1. Junker J, Noël E, Guryev V, Peterson K, Shah G, Huisken J, et al. Genome-wide RNA Tomography in the Zebrafish Embryo. Cell. 2014;159(3):662–675. pmid:25417113
  2. Ebbing A, Vértesy Á, Betist MC, Spanjaard B, Junker JP, Berezikov E, et al. Spatial Transcriptomics of C. elegans Males and Hermaphrodites Identifies Sex-Specific Differences in Gene Expression Patterns. Developmental cell. 2018;47(6):801–813.e6. pmid:30416013
  3. Mayeur H, Lanoizelet M, Quillien A, Menuet A, Michel L, Martin KJ, et al. When Bigger Is Better: 3D RNA Profiling of the Developing Head in the Catshark Scyliorhinus canicula. Frontiers in cell and developmental biology. 2021;9:744982. pmid:34746140
  4. Yvernogeau L, Klaus A, Maas J, Morin-Poulard I, Weijts B, Schulte-Merker S, et al. Multispecies RNA tomography reveals regulators of hematopoietic stem cell birth in the embryonic aorta. Blood. 2020;136(7):831–844. pmid:32457985
  5. Burkhard SB, Bakkers J. Spatially resolved RNA-sequencing of the embryonic heart identifies a role for Wnt/β-catenin signaling in autonomic control of heart rate. eLife. 2018;7. pmid:29400650
  6. Ruiz Tejada Segura ML, Abou Moussa E, Garabello E, Nakahara TS, Makhlouf M, Mathew LS, et al. A 3D transcriptomics atlas of the mouse nose sheds light on the anatomical logic of smell. Cell Reports. 2022;38(12):110547. pmid:35320714
  7. Schild E. TomoQC: A quick quality check for tomosequencing data. Available from: https://github.com/erikschild/TomoQC.
  8. Schede HH, Schneider CG, Stergiadou J, Borm LE, Ranjak A, Yamawaki TM, et al. Spatial tissue profiling by imaging-free molecular tomography. Nature biotechnology. 2021;39(8):968–977. pmid:33875865
  9. Liu W. tomoda: Tomo-seq data analysis; 2022. Available from: https://github.com/liuwd15/tomoda.
  10. Fienberg SE. An Iterative Procedure for Estimation in Contingency Tables. The Annals of mathematical statistics. 1970;41(3):907–917.
  11. Assefa AT, Hawinkel S, Vandesompele J, Thas O, R Core Team. SPsimSeq: Semi-parametric simulation tool for bulk and single-cell RNA sequencing data; 2022.
  12. Ujibe K, Nishimura K, Kashima M, Hirata H. Direct-TRI: High-throughput RNA-extracting Method for All Stages of Zebrafish Development. Bio-protocol. 2021;11(17):e4136. pmid:34604443
  13. Kamitani M, Kashima M, Tezuka A, Nagano AJ. Lasy-Seq: a high-throughput library preparation method for RNA-Seq and its application in the analysis of plant responses to fluctuating temperatures. Scientific Reports. 2019;9(1):7091. pmid:31068632
  14. Kashima M, Shida Y, Yamashiro T, Hirata H, Kurosaka H. Intracellular and Intercellular Gene Regulatory Network Inference From Time-Course Individual RNA-Seq. Frontiers in Bioinformatics. 2021;1. pmid:36303726
  15. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. pmid:30423086
  16. An Y, Kawaguchi A, Zhao C, Toyoda A, Sharifi-Zarchi A, Mousavi SA, et al. Draft genome of Dugesia japonica provides insights into conserved regulatory elements of the brain restriction gene nou-darake in planarians. Zoological Letters. 2018;4(1):24. pmid:30181897
  17. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. pmid:19451168
  18. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nature methods. 2017;14(4):417–419. pmid:28263959
  19. QuickGO: Gene Ontology browser. Available from: https://www.ebi.ac.uk/QuickGO/term/GO:0001228.
  20. Collins MM, Maischein HM, Dufourcq P, Charpentier M, Blader P, Stainier DY. Pitx2c orchestrates embryonic axis extension via mesendodermal cell migration. eLife. 2018;7. pmid:29952749
  21. Okuda Y, Ogura E, Kondoh H, Kamachi Y. B1 SOX Coordinate Cell Specification with Patterning and Morphogenesis in the Early Zebrafish Embryo. PLoS Genetics. 2010;6(5):e1000936. pmid:20463883
  22. Ramel MC, Buckles GR, Baker KD, Lekven AC. WNT8 and BMP2B co-regulate non-axial mesoderm patterning during zebrafish gastrulation. Developmental Biology. 2005;287(2):237–248. pmid:16216234
  23. Schulte-Merker S, van Eeden FJM, Halpern ME, Kimmel CB, Nüsslein-Volhard C. no tail (ntl) is the zebrafish homologue of the mouse T (Brachyury) gene. Development (Cambridge). 1994;120(4):1009–1015.
  24. Teramoto M, Kudome-Takamatsu T, Nishimura O, An Y, Kashima M, Shibata N, et al. Molecular markers for X-ray-insensitive differentiated cells in the Inner and outer regions of the mesenchymal space in planarian Dugesia japonica. Development, Growth & Differentiation. 2016;58(7):609–619. pmid:27530596
  25. Orii H, Katayama T, Sakurai T, Agata K, Watanabe K. Immunohistochemical detection of opsins in turbellarians. Hydrobiologia. 1998;383(1):183–187.
  26. Yamamoto A, Matsunaga Ki, Anai T, Kawano H, Ueda T, Matsumoto T, et al. Characterization of an intermediate filament protein from the platyhelminth, Dugesia japonica. Protein and Peptide Letters. 2020;27(5):432–446. pmid:31652112
  27. Shimoyama S, Inoue T, Kashima M, Agata K. Multiple Neuropeptide-Coding Genes Involved in Planarian Pharynx Extension. Zoological Science. 2016;33(3):311–319. pmid:27268986
  28. Cameron EG, Robinson PR. β-Arrestin-dependent deactivation of mouse melanopsin. PLoS One. 2014;9(11):e113138. pmid:25401926
  29. Zhang Q, Meng Y, Zhang L, Chen J, Zhu D. RNF13: a novel RING-type ubiquitin ligase over-expressed in pancreatic cancer. Cell Res. 2009;19(3):348–357. pmid:18794910
  30. Abcouwer SF, Miglioranza Scavuzzi B, Kish PE, Kong D, Shanmugam S, Le XA, et al. The mouse retinal pigment epithelium mounts an innate immune defense response following retinal detachment. J Neuroinflammation. 2024;21(1):74. pmid:38528525
  31. Tomsig JL, Creutz CE. Biochemical characterization of copine: a ubiquitous Ca2+-dependent, phospholipid-binding protein. Biochemistry. 2000;39(51):16163–16175. pmid:11123945
  32. Cordina NM, Liew CK, Gell DA, Fajer PG, Mackay JP, Brown LJ. Effects of calcium binding and the hypertrophic cardiomyopathy A8V mutation on the dynamic equilibrium between closed and open conformations of the regulatory N-domain of isolated cardiac troponin C. Biochemistry. 2013;52(11):1950–1962. pmid:23425245
  33. Keller RB, Tran TT, Pyott SM, Pepin MG, Savarirayan R, McGillivray G, et al. Monoallelic and biallelic CREB3L1 variant causes mild and severe osteogenesis imperfecta, respectively. Genet Med. 2018;20(4):411–419. pmid:28817112
  34. Li Y, Bögershausen N, Alanay Y, Simsek Kiper PO, Plume N, Keupp K, et al. A mutation screen in patients with Kabuki syndrome. Hum Genet. 2011;130(6):715–724. pmid:21607748
  35. Kretzschmar M, Doody J, Massagué J. Opposing BMP and EGF signalling pathways converge on the TGF-beta family mediator Smad1. Nature. 1997;389(6651):618–622. pmid:9335504
  36. Dong S, Ying S, Kojima T, Shiraiwa M, Kawada A, Méchin MC, et al. Crucial roles of MZF1 and Sp1 in the transcriptional regulation of the peptidylarginine deiminase type I gene (PADI1) in human keratinocytes. J Invest Dermatol. 2008;128(3):549–557. pmid:17851584
  37. Bochkis IM, Rubins NE, White P, Furth EE, Friedman JR, Kaestner KH. Hepatocyte-specific ablation of Foxa2 alters bile acid homeostasis and results in endoplasmic reticulum stress. Nat Med. 2008;14(8):828–836. pmid:18660816
  38. D’Agostino N, Li W, Wang D. High-throughput transcriptomics. Sci Rep. 2022;12(1):20313. pmid:36446824
  39. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33(5):495–502. pmid:25867923
  40. Tian L, Chen F, Macosko EZ. The expanding vistas of spatial transcriptomics. Nat Biotechnol. 2023;41(6):773–782. pmid:36192637