A Digital Framework to Build, Visualize and Analyze a Gene Expression Atlas with Cellular Resolution in Zebrafish Early Embryogenesis

A gene expression atlas is an essential resource to quantify and understand the multiscale processes of embryogenesis in time and space. The automated reconstruction of a prototypic 4D atlas for vertebrate early embryos, using multicolor fluorescence in situ hybridization with nuclear counterstain, requires dedicated computational strategies. To this end, we designed an original methodological framework implemented in a software tool called Match-IT. With only minimal human supervision, our system is able to gather gene expression patterns observed in different analyzed embryos with phenotypic variability and map them onto a series of common 3D templates over time, creating a 4D atlas. This framework was used to construct an atlas composed of 6 gene expression templates from a cohort of zebrafish early embryos spanning 6 developmental stages from 4 to 6.3 hpf (hours post fertilization). The cohort included 53 specimens, 181,415 detected cell nuclei and the segmentation of 98 gene expression patterns observed in 3D for 9 different genes. In addition, an interactive visualization software, Atlas-IT, was developed to inspect, supervise and analyze the atlas. Match-IT and Atlas-IT, including user manuals, representative datasets and video tutorials, are publicly and freely available online. We also propose computational methods and tools for the quantitative assessment of the gene expression templates at the cellular scale, with the identification, visualization and analysis of coexpression patterns, synexpression groups and their dynamics through developmental stages.

User Guide S1 - Match-IT step-by-step protocol

What is Match-IT?
Match-IT is a software package to build cellular-level atlases of gene expression in early embryogenesis. It gathers gene expression acquired by in-situ hybridization from multiple, partial analyzed embryos into one single, complete template. It also performs segmentation of the analyzed gene expression patterns and offers the possibility to supervise and correct the final results through a graphical user interface. Match-IT is delivered as part of the following publication: C. Castro-González, M.A. Luengo-Oroz, L. Duloquin, T. Savy, B. Rizzi, S. Desnoulez, R. Doursat, Y. Kergosien, M.J. Ledesma-Carbayo, P. Bourgine, N. Peyriéras and A. Santos. "A digital framework to build, visualize and analyze gene expression atlases with cellular resolution in zebrafish early embryogenesis".
A public release of Match-IT, together with the present tutorial, a readme.txt file and representative datasets, is available as Supplementary Material to this paper: http://bioemergences.iscpif.fr/documents/MatchIT.zip

System requirements
• Windows 64-bit (XP or higher)
• Either Matlab R2010b (64-bit) or Matlab Compiler Runtime v714 (64-bit) installed on your computer. If neither is installed, the Matlab Compiler Runtime installer ("MCRInstaller.exe") is provided within the Match-IT package
• 8 GB RAM (minimum)
• Approximately 2 GB of disk space to load the provided example data and run Match-IT on it

Installing Match-IT
To run the program, simply unzip the software package and double-click on "Match-IT.exe". Representative confocal datasets (found within the "data" folder of the Match-IT package) must be uncompressed before running the program.

Using Match-IT
In order to map one analyzed embryo onto the common template, follow the 6 processing steps described below.

Gene Segmentation Tab:
This tab segments the specified raw gene patterns and identifies the cells expressing those genes in the analyzed embryos. Press the "Select Gene Patterns to Segment" button. This opens a file selection dialog that lets you choose, within the local "data" folder, the files that hold the raw gene expression patterns:
• gsc G YY tTTT ch01.vtk = gsc channel
• gsc G YY tTTT ch02.vtk = gene channel
where:
• "gsc" stands for goosecoid, the reference gene pattern in our atlas, which was tagged in ch01 for all our acquired datasets
• "G" is the variable-length name of the specific gene expression, G, which accompanies gsc in ch02 (e.g. "ntla"). Files whose "G" name is "tem" denote the template specimen.
• "YY" is a 2-letter code which can take any free value (e.g. "WT" in our sample datasets)
• "TTT" is a 3-digit number specifying the time stage of the dataset (e.g. "008" designates the developmental stage called late shield)
NOTE: To segment gene patterns successfully, Match-IT requires the lower value that best defines each raw gene expression. This value must be specified in the "config.txt" file in the root folder.
After performing the gene pattern segmentation (5 min), Match-IT identifies the cells in the analyzed embryo that express that gene pattern. To this end, it requires the nuclei coordinates of the analyzed embryo, held in a text file in the "data" folder:
• gsc G YY tTTT ch00.emb = nuclei coordinates
The outputs of this first processing step are:
• gsc G YY tTTT ch01.csv = selection of cells expressing gsc
• gsc G YY tTTT ch02.csv = selection of cells expressing gene G
• gsc G YY tTTT ch03.vtk = nuclei shape segmentation
• gsc G YY tTTT ch04.vtk = gsc segmentation
• gsc G YY tTTT ch05.vtk = gene G segmentation
which can be found in the local "data" folder, and:
• gsc G YY tTTT distance.txt = mean internuclei distance
which can be found in the local "parameters" folder.
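The naming convention above lends itself to programmatic parsing. The following Python sketch is not part of Match-IT. It assumes the fields are joined by underscores on disk (this guide prints the names with spaces, so the pattern accepts either separator), and the names `ChannelFile` and `parse_channel_file` are hypothetical, introduced only for illustration.

```python
import re
from typing import NamedTuple, Optional


class ChannelFile(NamedTuple):
    gene: str     # "G" field, e.g. "ntla"; "tem" marks the template specimen
    code: str     # free 2-letter "YY" code, e.g. "WT"
    stage: int    # "TTT" time stage, stored as an integer
    channel: int  # two-digit channel number following "ch"
    ext: str      # file extension without the dot

    @property
    def is_template(self) -> bool:
        return self.gene == "tem"


# Assumption: fields are separated by "_" or " "; the real separator
# used on disk is not confirmed by the guide.
_PATTERN = re.compile(
    r"gsc[ _](?P<gene>.+?)[ _](?P<code>[A-Za-z]{2})[ _]"
    r"t(?P<stage>\d{3})[ _]ch(?P<channel>\d{2})\.(?P<ext>\w+)"
)


def parse_channel_file(name: str) -> Optional[ChannelFile]:
    """Split a channel file name into its fields, or return None."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        return None
    return ChannelFile(
        gene=match["gene"],
        code=match["code"],
        stage=int(match["stage"]),
        channel=int(match["channel"]),
        ext=match["ext"],
    )
```

For example, under these assumptions `parse_channel_file("gsc_ntla_WT_t008_ch02.vtk")` yields the gene "ntla", code "WT", stage 8 and channel 2, while unrelated names such as "readme.txt" yield `None`.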

Initialization Tab:
This tab performs the initial coarse alignment between the analyzed embryos and the template by extracting common orientation planes.
Initialization of one analyzed embryo at one given developmental stage requires the previous segmentation, in step 1, of the gsc pattern of both that particular analyzed embryo and its corresponding template.
Consequently, pressing the "Select Datasets to Initialize" button will only let you choose among those datasets that already have a gsc segmentation file in the local "data" folder:
• gsc G YY tTTT ch04.vtk = gsc segmentation
Apart from the gsc segmentation image, this initialization step also requires reading the corresponding analyzed embryo nuclei coordinates from the local "data" folder:
• gsc G YY tTTT ch00.emb = nuclei coordinates
The output of this second processing step is a text file in the local "parameters" folder which holds the extracted initialization information:
• gsc G YY tTTT init param.txt = initialization parameters
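The prerequisites of this step can also be checked outside the GUI. The sketch below is an illustration only, not Match-IT code: given the file names found in the "data" folder, it lists the analyzed embryos that have a gsc segmentation, a nuclei coordinates file and a segmented template at the same stage. The helper name `initializable`, the underscore-or-space separator and the choice to match the template by stage alone are assumptions.

```python
import re

# Assumption: fields are separated by "_" or " " on disk.
_NAME = re.compile(
    r"gsc[ _](?P<gene>.+?)[ _](?P<code>[A-Za-z]{2})[ _]"
    r"t(?P<stage>\d{3})[ _]ch(?P<ch>\d{2})\.(?P<ext>\w+)"
)


def initializable(filenames):
    """Return sorted (gene, code, stage) keys that are ready for step 2."""
    have = set()
    for name in filenames:
        m = _NAME.fullmatch(name)
        if m:
            have.add((m["gene"], m["code"], m["stage"], m["ch"], m["ext"]))

    # Stages for which the template gsc segmentation (ch04.vtk) exists.
    template_stages = {
        stage for gene, _, stage, ch, ext in have
        if gene == "tem" and (ch, ext) == ("04", "vtk")
    }

    ready = []
    for gene, code, stage, ch, ext in sorted(have):
        if gene == "tem" or (ch, ext) != ("04", "vtk"):
            continue
        has_nuclei = (gene, code, stage, "00", "emb") in have
        if has_nuclei and stage in template_stages:
            ready.append((gene, code, stage))
    return ready
```

A dataset missing either its own ch00.emb file or the template segmentation for its stage is left out of the returned list.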

Validate Initialization Tab:
This tab starts the Match-IT Graphical User Interface (GUI) to supervise the orientation planes extracted in step 2 and correct them if necessary.
The components of the GUI are:
(a) Dataset selector: It pops up a menu to choose which analyzed embryo to visualize. It requires reading the corresponding analyzed embryo nuclei coordinates from the local "data" folder:
• gsc G YY tTTT ch00.emb = nuclei coordinates
(b) Time selector: It allows sliding through different developmental stages of the same dataset (i.e. of the same gene expression G).
(c) Gene selectors: These two radio buttons highlight the selection of cells expressing either gsc or gene G for a particular analyzed embryo. They consequently require the analyzed embryo expressions to have undergone processing step 1, since they read the cell selections stored in the local "data" folder:
• gsc G YY tTTT ch01.csv = selection of cells expressing gsc
• gsc G YY tTTT ch02.csv = selection of cells expressing G
(d) Parameter selector: It pops up a dialog to choose the folder containing the initialization parameters text file (by default this folder is set to the local "parameters" directory):
• gsc G YY tTTT init param.txt = initialization parameters
(e) Visualization controls: These radio buttons display the spherical model, the referential axes (a, d, l), the 3 orientation planes and the coordinates origin extracted in processing step 2.
(f) Hand-correction controls: These editable text fields allow manual correction of the rotation and offset of the initialization parameters extracted in processing step 2. These corrections can be saved by pressing the "Save init parameters" button.
(g) Visualization of the chosen analyzed embryo together with the options specified in the menu on the left.
(h) Camera toolbar: It allows zooming in and out, orbiting around the object and/or translating the camera point of view.
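To make the effect of a rotation and offset correction concrete, the following Python sketch applies a rigid transform to a list of nuclei coordinates. It is a generic illustration, not Match-IT's implementation: the guide does not describe how the parameters are stored or which angle convention is used, so the Z-Y-X rotation order, the use of radians and the function names are all assumptions.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation R = Rz(rz) * Ry(ry) * Rx(rx), angles in radians (assumed)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_rigid(points, angles, offset):
    """Rotate each (x, y, z) point, then translate it by the offset."""
    r = rotation_matrix(*angles)
    return [
        tuple(
            sum(r[i][j] * p[j] for j in range(3)) + offset[i]
            for i in range(3)
        )
        for p in points
    ]
```

With these conventions, a quarter turn about the z axis followed by an offset of 10 along x moves the point (1, 0, 0) to approximately (10, 1, 0).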

Registration Tab:
This tab performs the image registration between analyzed embryos and templates to further refine the initial alignment extracted in processing step 2.
Registering an analyzed embryo onto the template requires the previous extraction of the common orientation planes in processing step 2. Consequently, pressing the "Select Datasets to Register" button will only let you choose among those datasets that already have an initialization parameters text file in the local "parameters" folder:
• gsc G YY tTTT init param.txt = initialization parameters
Note that this file may have been corrected by the user in step 3. Apart from the initialization parameters, this registration step also requires both the corresponding analyzed embryo nuclei coordinates and the nuclei shape segmentation image present in the local "data" folder:
• gsc G YY tTTT ch00.emb = nuclei coordinates
• gsc G YY tTTT ch03.vtk = nuclei shape segmentation
Then, this registration step maps all analyzed embryo and template images into a common space with the following default dimensions:
• 600x600x500 pixels
• 1.51x1.51x1.51 microns/pixel
Users can modify these default values by unchecking the box placed at the top left corner. This action also displays the recommended dimensions, which depend on 3 factors: 1) the pixel size specified by the user, 2) the initialization parameters of all the analyzed embryos processed so far and 3) the coordinates origin of the atlas (defined by default as the coordinates origin for the template of the latest temporal stage available). Finally, the output of this fourth processing step is a text file in the local "parameters" folder which holds the extracted registration information:
• gsc G YY tTTT reg param.txt = registration parameters
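As a quick sanity check on the default common space, the physical extent and the voxel count follow directly from the dimensions and pixel size given above. The short Python sketch below performs this arithmetic; the function names are hypothetical and nothing here reproduces how Match-IT computes its recommended dimensions.

```python
def physical_extent(shape, spacing):
    """Extent in microns along each axis: pixels * microns per pixel."""
    return tuple(n * s for n, s in zip(shape, spacing))


def voxel_count(shape):
    """Total number of voxels in the common space."""
    total = 1
    for n in shape:
        total *= n
    return total


default_shape = (600, 600, 500)        # pixels, from the guide
default_spacing = (1.51, 1.51, 1.51)   # microns/pixel, from the guide

extent = physical_extent(default_shape, default_spacing)
voxels = voxel_count(default_shape)
```

The default space therefore covers about 906 x 906 x 755 microns and contains 180 million voxels per image, which helps explain the 8 GB RAM requirement when several channels are held in memory.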

Validate Registration Tab:
This tab starts the Match-IT Graphical User Interface (GUI) to supervise the result of the mapping procedure and correct it if necessary.
The components of the GUI are:
(a) Dataset selector: It pops up a menu to choose which analyzed embryo to visualize. It requires reading the nuclei coordinates of both the analyzed embryo (displayed in white) and the template (displayed in blue) from the files in the local "data" folder:
• gsc G YY tTTT ch00.emb = nuclei coordinates
• gsc tem YY tTTT ch00.emb = nuclei coordinates (template)
(b) Time selector: It allows sliding through different developmental stages of the same dataset (i.e. of the same gene expression G).
(c) Parameter selector: It pops up a dialog to choose the folder containing the initialization and registration parameters text files (by default this folder is set to the local "parameters" directory):
• gsc G YY tTTT init param.txt = initialization parameters
• gsc G YY tTTT reg param.txt = registration parameters
(d) Mapping selector: This radio button shows the mapping of the analyzed embryo onto the template before and after the registration procedure in step 4.
(e) Cell selectors: These radio buttons display or hide several different selections of cells: 1) Template nuclei (in blue), 2) Template gsc expression (in red), 3) Analyzed embryo nuclei (in white), 4) Analyzed embryo gsc expression (in red), 5) Template nuclei determined to express the superimposed analyzed embryo gsc (in red), 6) Analyzed embryo G expression (in green) and 7) Template nuclei determined to express the superimposed G expression (in green). Consequently, these radio buttons make use of the following files stored in the local "data" folder (in addition to the aforementioned nuclei coordinates files):
• gsc G YY tTTT ch01.csv = selection of cells expressing gsc
• gsc G YY tTTT ch02.csv = selection of cells expressing G
• gsc tem YY tTTT ch01.csv = cells expressing gsc (template)
(f) Visualization controls: These radio buttons display the spherical model, the referential axes (a, d, l), the 3 orientation planes and the coordinates origin extracted in processing steps 2-4.
(g) Hand-correction controls: These editable text fields allow manual correction of the rotation and offset of the mapping parameters extracted in processing steps 2-4. These corrections can be saved by pressing the "Save parameters" button.
(h) Visualization of the chosen analyzed embryo and/or template together with the options specified in the menu on the right.
(i) Camera toolbar: It allows zooming in and out, orbiting around the object and/or translating the camera point of view.

Transfer to Atlas-IT Tab:
This tab creates the final 3D atlas model that can be visualized in our Atlas-IT software: It determines which template nuclei express each of the mapped gene expressions coming from the analyzed embryos and it also transforms the original raw files and their segmentations into their newly shared "atlas" space.
Transferring an analyzed embryo into the final atlas model requires at least having finished the registration process in step 4. Consequently, pressing the "Transfer Results to Atlas-IT" button will only let you choose among those datasets that already have a registration parameters text file in the local "parameters" folder:
• gsc G YY tTTT reg param.txt = registration parameters
Note that this file may have been corrected by the user in step 5. Apart from the registration parameters, this final step also requires both the corresponding template nuclei coordinates and the selection of analyzed embryo cells expressing gsc and G:
• gsc G YY tTTT ch01.csv = selection of cells expressing gsc
• gsc G YY tTTT ch02.csv = selection of cells expressing gene G
The result of this final step is a text file that indicates, for each of the template nuclei constituting the final model, whether or not it expresses each of the mapped gsc and G expressions coming from the cohort of analyzed embryos:
• 3DAtlas Localn0t1.emb = final atlas model
which is stored in the local "cache" folder. This file can be read by our Atlas-IT software in order to display the resulting 3D atlas together with all the mapped gene expressions. Apart from that, this final step also transforms all the analyzed embryos' raw files and their segmentations to the new common "atlas" space:
• gsc G YY tTTT ch00.vtk = raw nuclei channel
• gsc G YY tTTT ch01.vtk = raw gsc channel
• gsc G YY tTTT ch02.vtk = raw gene channel
• gsc G YY tTTT ch03.vtk = segmented nuclei shape
• gsc G YY tTTT ch04.vtk = gsc segmentation
• gsc G YY tTTT ch05.vtk = gene segmentation
which are all stored in the local "data" folder.
The results are the following .vtk files:
• gsc G YY tTTT ch00 trans.vtk = mapped raw nuclei channel
• gsc G YY tTTT ch01 trans.vtk = mapped raw gsc channel
• gsc G YY tTTT ch02 trans.vtk = mapped raw gene channel
• gsc G YY tTTT ch03 trans.vtk = mapped nuclei shape
• gsc G YY tTTT ch04 trans.vtk = mapped gsc segmentation
• gsc G YY tTTT ch05 trans.vtk = mapped gene segmentation
which are saved in the local "cache" folder and can be directly visualized in Atlas-IT together with the final atlas model.
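Because each processing step leaves characteristic output files, the progress of a dataset through the protocol can be inferred from the files present. The Python sketch below illustrates this bookkeeping; it is not part of Match-IT. It assumes underscore separators in the file names, takes a flat collection of names gathered from the "data", "parameters" and "cache" folders, and uses the hypothetical names `STEP_OUTPUTS` and `completed_steps`. Steps 3 and 5 are interactive validations with no dedicated output file, so they are not tracked.

```python
# Output file suffixes per processing step, as listed in this guide.
# Assumption: fields are joined by "_" on disk.
STEP_OUTPUTS = {
    "1 gene segmentation": [
        "ch01.csv", "ch02.csv", "ch03.vtk", "ch04.vtk", "ch05.vtk",
    ],
    "2 initialization": ["init_param.txt"],
    "4 registration": ["reg_param.txt"],
    "6 transfer to Atlas-IT": [f"ch0{i}_trans.vtk" for i in range(6)],
}


def completed_steps(prefix, filenames, sep="_"):
    """List the steps whose outputs all exist for one dataset prefix.

    prefix is the dataset stem, e.g. "gsc_ntla_WT_t008"; filenames is
    any iterable of bare file names (no directories).
    """
    present = set(filenames)
    done = []
    for step, suffixes in STEP_OUTPUTS.items():
        expected = [f"{prefix}{sep}{s.replace('_', sep)}" for s in suffixes]
        if all(name in present for name in expected):
            done.append(step)
    return done
```

A dataset for which only the segmentation outputs and the initialization parameters exist would be reported as having completed steps 1 and 2, and would still need registration before it can be transferred to Atlas-IT.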