Stochastic modeling of a gene regulatory network driving B cell development in germinal centers

Germinal centers (GCs) are the key histological structures of the adaptive immune system, responsible for the development and selection of B cells producing high-affinity antibodies against antigens. Due to their level of complexity, unexpected malfunctioning may lead to a range of pathologies, including various malignant formations. One promising way to improve the understanding of malignant transformation is to study the underlying gene regulatory networks (GRNs) associated with cell development and differentiation. Evaluation and inference of the GRN structure from gene expression data is a challenging task in systems biology: recent achievements in single-cell (SC) transcriptomics allow the generation of SC gene expression data, which can be used to sharpen the knowledge on GRN structure. In order to understand whether a particular network of three key gene regulators (BCL6, IRF4, BLIMP1), influenced by two external stimuli signals (surface receptors BCR and CD40), is able to describe GC B cell differentiation, we used a stochastic model to fit SC transcriptomic data from a human lymphoid organ dataset. The model is defined mathematically as a piecewise-deterministic Markov process. We showed that after parameter tuning, the model qualitatively recapitulates mRNA distributions corresponding to GC and plasmablast stages of B cell differentiation. Thus, the model can assist in validating the GRN structure and, in the future, could lead to better understanding of the different types of dysfunction of the regulatory mechanisms.


Introduction
Adaptive immune response is a complex mechanism, relying on B and T lymphocytes, which protects the organism against a range of pathogens.Crucial elements of adaptive immune response, the germinal centers (GCs) are the structures in lymphoid organs where activated naive B cells are expanded (in a dark zone, DZ) and selected (in a light zone, LZ) and can have multiple exit fates, such as antibody production (plasmablasts and plasma cells, PB_PC), long term storage of antigen information (memory B cells, MC), or death via apoptosis [1,2].
For the normal di erentiation of GC B cells towards PB_PC stage, the kinetic ODE model fits microarray data at two steady-states: the first one associated with the GC stage of B cell di erentiation (with high levels of BCL6 and low levels of IRF4 and BLIMP1), and the second one associated with PB_PC stage (with low levels of BCL6 and high levels of IRF4 and BLIMP1).
Recently, multiple protocols for SC RNA-seq data generation have been developed and used to answer various questions in biology [19,20].At the same time, di erent groups showed that gene transcription in eukaryotes is a discontinuous process and follows bursting kinetics [21,22,23,24].Such results suggest that the stochastic nature of gene expression at the single cell (SC) level can be partly responsible for the phenotype variation in living organisms [25].Thus, by gaining access to a stochastic behavior of gene expression, the SC viewpoint may lead to further improvement of the understanding of the biological systems and their variability.
Nevertheless, stochastic modeling of GRNs using SC gene expression data is still in its early stage [26,27] and has never been studied for GC B cells.Here, we apply a particular class of stochastic models combining deterministic dynamics and random jumps, called piecewise-deterministic Markov processes (PDMPs) [28], to the description of GC B cell di erentiation.It is a two-state model of gene expression introduced in [29] that allows a description of the system's dynamics at the promoter, transcription and translation levels for a given GRN.We apply this model to the GRN made of the three key genes, BCL6, IRF4 and BLIMP1, and simulate single B cell mRNA data [30].We show that the model can qualitatively simulate the SC mRNA patterns for normal B cell di erentiation at GC and PB_PC stages.

Single-cell data
We used the B cell dataset from human lymphoid organs published by Milpied et al. [30].
The authors studied normal B cell subsets from germinal centers of the human spleen and tonsil and performed integrative SC analysis of gene expression.They used an adapted version of the integrative single-cell analysis protocol [31].In short, the authors prepared cells for flow cytometry cell sorting.Then in every 96-well plate the authors sorted three to six ten-cell samples of the same phenotype as a single-cell.They performed multiplex qPCR analysis using the Biomark system (Fluidigm) with 96x96 microfluidic chips (fluidigm) and Taqmann assays (Thermofisher) [30].They obtained results in the form of fixed fluorescence threshold to derive Ct values.We used Ct values to derive Expression threshold (Et) values: Et = 30 ≠ Ct.When there was an unreliably low or undetected expression (Ct > 30), Et was set to zero [30].Using SC gene expression analysis of a panel of 91 preselected genes and pseudotime analysis (based on the cartesian coordinates of SC on the first and second principal components of the PCA), the authors separated GC DZ cells, GC LZ cells, memory cells and PB_PC cells.
Here we focused on three genes, BCL6, IRF4 and BLIMP1.We selected the SC gene expression values for BCL6, IRF4 and BLIMP1 for GC DZ cells (317 SC) and for PB_PC (104 SC) (see Figure 5).The experimental dataset includes at the GC B cell stage 30 cells with zero BCL6 mRNA amount, 292 cells with zero IRF4 mRNA amount and 292 cells with zero BLIMP1 mRNA amount.For the end of the B cell di erentiation (PB_PC), there were 25 cells with zero BCL6 mRNA amount, 79 cells with zero IRF4 and 5 cells with zero BLIMP1 mRNA amount.

Kinetic ODE model
( In this model, CD40 and BCR act as stimuli on genes: BCR temporary represses BCL6 and CD40 temporary activates IRF4.

Stochastic model
The stochastic model that describes the coupled dynamics of gene i and the other genes of the GRN is defined by the series of equations: where E i (t), M i (t) and P i (t) are, respectively, the activation status of the promoter, the quantity of mRNA and the quantity of proteins of gene i, for i oe {1, 2, 3}.For s oe {4, 5}, Q s accounts for external stimuli intensity.Each index i refers to one of the gene in the GRN, either BCL6, IRF4, or BLIMP1, and each index s to stimuli BCR and CD40 (see Table 1).
For each gene i, System (4) is defined by the promoter state switching rates k on,i (h ≠1 ) and ), by a degradation rate of mRNA (d 0,i , h ≠1 ), a protein degradation rate (d 1,i , h ≠1 ), a transcription rate (s 0,i , mRNA◊h ≠1 ), a translation rate (s 1,i , protein◊mRNA ≠1 ◊ h ≠1 ), and interaction parameters ◊ j,i with either gene or stimulus j. Interactions between genes are based on the assumption that k on,i is a function of the proteins P 1 , P 2 , P 3 and stimuli Q s and is given by: where 1 + e ◊ j,i (P j /H j,i ) " 1 + (P j /H j,i ) " .( In (6), H j,i represents an interaction threshold for the protein j on gene i , whilei is a scaling parameter.The structure of System (4)-( 6) for the particular network considered in this paper is illustrated in Figure 1.
A detailed derivation of the model is presented in the supplementary material of [29].
Starting from a simple biochemical model of gene expression, the authors described higher-order interactions and took into consideration possible auto-activations.After normalization and simplifcation steps, Herbach et al. [29] and Bonna oux et al. [32] described the promoter switching rates k on,i and k o ,i in the form of ( 5) and ( 6) by introducing the scaling parameteri .
It can be noted that the promoter state evolution of gene i between time t and t + "t in System (4)-( 6) is defined, for small "t, as a Bernoulli-distributed random variable [29,32]: where probability fi i (t), derived by solving the master equation [29,33], is given by It follows that the promoter state of gene i averages to k on,i /(k on,i + k o ,i ) in the fast promoter regime (k on,i + k o ,i ∫ 1/"t).This quantity will be used to reduce System (4)-( 6) into an ordinary di erential equation (ODE) system in Section 3.1.

Simulating the stochastic model
During B cell di erentiation in GC, B cells first receive BCR signal, through follicular dendritic cells interaction, that represses BCL6.Then, B cells integrate CD40 signals, through T follicular helper, that activate IRF4 [3,6,34].
In order to simulate these interactions, we assumed that BCR was acting on BCL6 from 0h until 25h, and CD40 was acting on IRF4 from 35h until 60h.Stimuli were implemented in three steps: first a linear increase (t BCR oe [0.5h; 1.5h]; t CD40 oe [35h; 36h]), then a stable stimulus (t BCR oe [1.5h; 24h]; t CD40 oe [36h; 60h]), finally a linear decrease In all simulations, the system evolves for 500h so it can reach a steady state before applying the stimuli (at time t = 0h).After the first stimulus (BCR) is applied, the system is simulated for an additional 500h.For each simulation, the amounts of mRNA counts have been collected every 0.5h.
The stochastic system (4)-( 6) is defined by 40 parameters, whose values are given in Tables 2 to 5. ) and BLIMP1 (gene 3), and with stimuli BCR and CD40 acting on the network.The interaction j ae i between a regulating protein j and a target gene i is represented by the interaction parameter ◊ j,i .B) Schematic representation of the associated stochastic model.A gene is represented by its promoter state (dashed rectangle), which can switch randomly from on to o (and vice versa), with rates k on,i (k o ,i ).When promoter state is on, mRNA molecules are continuously produced at s 0,i rate.Proteins are constantly translated from mRNA at s 1,i rate.Parameters d 0,i and d 1,i are degradation rates of mRNA and proteins.The interaction between a regulator gene j and a target gene i is defined by the dependence of k on,i and k o ,i on the protein level P j and the interaction parameter ◊ j,i .IRF4 gene exhibits an autoactivation loop (◊ 2,2 ).Additionally, two external stimuli, BCR and CD40, act on the GRN.6/24

Model execution in a computational center
All models were established as part of the WASABI pipeline [32] and were implemented in Python 3.All computations were performed using the computational center of IN2P3 (Villeurbanne/France).

Parameters estimation for the ODE-reduced model
In Section 3.1, we use a reduced, deterministic version of System ( 4)-( 6), namely System (11).Initial guess of each parameter has been chosen randomly in the same order of magnitude as in Bonna oux et al. [32].Specifically, the initial value of k on for IRF4 (k on, init, IRF4 ) has been estimated by comparison with values of the kinetic model from Martinez et al. [3].Initial values of k on for BCL6 and BLIMP1 were selected in the same order of magnitude as k on, init, IRF4 .

Estimation of the parameters for the stochastic model: Automatized approach
After we have established the parameters for the reduced model ( 11), and we have shown that (11) has two steady states, we used these values as initial guess for the stochastic model ( 4)-( 6).The goal was then to further tune parameter values so the stochastic model ( 4)-( 6) fits the experimental SC data.
During this automatized tuning procedure, we selected a set of parameter values that allows the system to provide the best fit of the experimental mRNA values for BCL6, IRF4 and BLIMP1 at the GC stage, based on a quality-of-fit criterion.This criterion was defined as a comparison between the average model-derived values ( ) and the average experimental values ( ), with an objective function (OF ) to minimize for the set of genes G = {BCL6, IRF 4, BLIMP 1} and stages ST = {GC, P B_P C} defined by 7/24 Parameter Version I, II, III Table 2. Parameter set of the stochastic model ( 4)-( 6) and reduced model (11).Version I -initial parameter set.Version II -parameter set obtained from the automatized approach.Version III -parameter set obtained after semi-manual tuning.Parameters are defined in the text.
The quality-of-fit criterion is then OF, (10) where P S is the set of parameter values from Tables 2 to 5.

Estimation of the parameters for the stochastic model: Semi-manual tuning
The automatized estimation procedure was followed by a semi-manual tuning of the parameters of the stochastic model ( 4)-( 6) to improve the quality of the fit.
Values of candidate parameters have been tested in an interval of interest and the rest of the parameter values have been fixed at this stage.After model execution, modelsimulated SC values of gene expression were collected.Then we selected the values of the parameters that provided the best qualitative fitting (see Equation ( 10)) of the experimental SC data.Ranges of tested values are summarised in Table 6.

Evaluation of model variability using Kantorovich distance
To compare distributions and to evaluate model variability, we used the Kantorovich distance (KD, particular case of Wasserstein distance, WD), as defined by Baba et al. [35] and implemented in Python 3 by Bonna oux et al. [32].Consider two discrete distributions p and q, defined on N bins of equal sizes, and denote by x k the center of the k-th bin.Then the Kantorovich distance between p and q is given by We chose WD because it suggested to be preferable over alternative methods such as Kullback-Leibler (KL) divergence or Jensen-Shannon (JS) divergence [36].More specifically, WD does not require that distributions belong to the same probability space.
At the same time, WD is more tractable and has higher performance compared to KL divergence [37].JS divergence, in turn, does not provide a gradient for the distributions of non-overlapping domains, compared to WD [36].Also, because WD is a metric and accounts both for the "cost" for the transfer (distance) and "the number of counts" to transfer, we selected its 1D case of WD (Kantorovich Distance, KD) for comparison of discrete experimental distributions versus model-derived distributions [38].

Reduced model
In [3], Martinez et al. applied the kinetic ODE model ( 1)-( 3) to the BCL6-IRF4-BLIMP1 GRN associated with B cell di erentiation and successfully simulated GC B cell dynamics based on microarray data.Before using the complex, stochastic model ( 4)-( 6) to fit SC data, we considered a reduced version of System (4)-( 6) that can be compared to model (1)-( 3), hence providing an initial guess for a key parameter of the model.
Since model ( 1)-( 3) is deterministic, it is necessary to simplify the stochastic model ( 4)- (6) to perform a comparison of both models dynamics.We assume, in this section, that the stochastic process E(t) (promoter status) in ( 4)-( 6) equals its mean value, ÈE(t)Í, given by k on /(k on + k o ).System (4)-( 6) then reduces to Comparing mathematical formulations of systems ( 1)-( 3) and ( 11  Before application of BCR and CD40 stimuli, the system is at a steady state (simulating GC B cell stage) that corresponds to a low amount of IRF4 and BLIMP1 and a high amount of BCL6 mRNA molecules.After application of both stimuli, the system has transitioned to a second steady state that corresponds to a high number of IRF4 and BLIMP1 mRNA molecules and a low number of BCL6 mRNA molecules.However, it can be noted that for the current parameter set (see Tables 2-5, version I), System (11) underestimates the amount of IRF4 mRNA at both steady states (see Figure 2).
Dynamics of System (11) shows the existence of two steady-states for the parameter set from Tables 2-5, version I. Notably, if we test a random value of k on, init, IRF4 in combination with the parameters from Tables 2-5, version I (see Supplementary Table S1), System (11) has only one steady-state (see Supplementary Figure S2).To our knowledge, there may be more than one set of parameter values associated with two steady states of System (11).
We showed that for the parameter set from Tables 2-5, version I, the reduced model ( 11) is capable to qualitatively recapitulating the expected behavior of GC B cell di erentiation GRN (see Figure 2).Next we wanted to understand if the stochastic system ( 4)-( 6) can fit the experimental SC data.

Assessing the variability of the stochastic model
Due to the stochastic nature of the stochastic system ( 4)-( 6), it is important to first evaluate the variability of the model-generated SC data, that is of model's outputs.Indeed, when one repeatedly simulates a finite number of cells from the stochastic system ( 4)-( 6) for the same parameter value set (Tables 2-5, version I), the resulting model-derived empirical distributions are slightly di erent between each run due to the stochasticity of the model.We investigated how strongly shapes of distributions of simulated SC mRNA molecules vary for di erent executions of model ( 4)- (6).
We evaluated the level of variability of model ( 4)-( 6) using the Kantorovich distance (KD, see Section 2.7).We simulated 200 datasets, each containing 200 single cells, of System ( 4)-( 6) with a fixed parameter set (see Tables 2-5, version I).We estimated the KD between pairs of simulated datasets (mRNA counts for three genes at GC and PB_PC stages for 200 simulated cells), and obtained a distribution of all KD that we call the model-to-model (m-t-m) distribution (Figure 3).Shapes of m-t-m distributions are di erent for each gene and stage of di erentiation.For instance, for BLIMP1, long tails are observed.We can also notice that the mean value of IRF4 at GC stage is low compared to other genes.
In order to get a more accurate evaluation of the variability in model's outputs, we plotted distributions of the number of mRNA molecules (model's outputs) for each node of the GRN with the highest m-t-m distribution at both GC and PB_PC stages (Figure 4).
Qualitatively, no di erence is detected in the shapes of model-generated distributions.
For all 6 nodes, the shapes of distributions are remarkably similar.
These results suggest that it may be su cient to perform parameter tuning of the stochastic model ( 4)-( 6) using only one simulation run for each parameter value set.

Initial estimation step based on an automatized approach
Variability of the stochastic model being assessed, and comparison of experimental data and a single model's output in order to assess their closeness being validated, we now focus on the estimation of parameter values.Model ( 4)-( 6) comprises 40 parameters, so we first apply a straightforward strategy, that we call automatized approach, which consists in solving the stochastic system (4)-( 6) for a number of fixed parameter values 14/24 and selecting the set of parameter values associated with the best fit (see Section 2.6.2) of experimental data [30].
Approximately 8 ◊ 10 6 combinations of parameter values have been tested (see Section 2.6.2),then the best set of parameter values has been selected based on the quality of BCL6, IRF4 and BLIMP1 fitting at the GC and PB_PC stages (see Equations ( 9)-( 10)).
Numbers of mRNA molecules estimated by the stochastic model ( 4)-( 6) are in a similar range of magnitude as the experimental SC data (see Supplementary Figure S3).However, the selected parameter values (Tables 2-5, version II) generate model-derived mRNA distributions that have su cient overlap with experimental data for GC stage but insucient overlap for PB_PC stage (see Supplementary Figure S3).Indeed, distributions of numbers of mRNA molecules at PB_PC stage mostly underestimate the experimental SC data (see Supplementary Figure S3B, D and F).
Implementing an automatized approach for estimating parameter values helped to establish a set of parameter values that allows System ( 4)-( 6) to correctly estimate the number of mRNA molecules for 3 out of 6 nodes of the GRN.In order to improve the quality of the fit, a more directed and sensitive tuning of the parameter set is then performed (see Section 2.6.3).

Generation of simulated distributions of mRNA counts describing B cell di erentiation
Due to the complexity of the stochastic model ( 4)-( 6) that is made of 40 parameters, it is important to identifiy which parameters should be targeted to improve the quality of fit.To do so, we rely on the properties of the GRN (see Figure 1A).Thanks to the topological structure of the BCL6-IRF4-BLIMP1 GRN, where IRF4 activates BLIMP1 and autoactivates itself, we hypothesize that System (4)-( 6) underestimates the experimental SC data at the PB_PC stage due to low values of the parameters responsible for IRF4 autoactivation (◊ 22 , and to a lesser extent s 0, IRF4 ) and BLIMP1 activation by IRF4 (◊ 23 ).
Further, we improved the quality of the fit, in particular of BLIMP1 distribution, by focusing on BLIMP1-related interaction parameters (◊ 13 , ◊ 31 ).
Indeed, if IRF4 autoactivation reaction is not e cient enough, there are not enough IRF4 molecules to a ect BCL6 and BLIMP1 activity at PB_PC stage.Because IRF4 activity is only impacted by its autoactivation loop, we first modulated values of the parameter related to this reaction (◊ 22 ).During preliminary tests, we noticed that this reaction is crucial for the transition from GC towards PB_PC stage and that when interaction parameter ◊ 22 and transcription rate s 0, IRF4 have low absolute values then the system cannot reach PB_PC stage, even after application of the stimuli.It can be explained by the insu cient amount of IRF4 molecules produced (see Supplementary Figure S3, C and     D).On the other hand, when parameters ◊ 22 and s 0, IRF4 have high values, model ( 4)- (6) transitions from GC towards PB_PC stage even before application of stimuli, exhibiting an abnormal behavior.
After comparison of the stochastic system (4)-( 6) outputs for a range of di erent ◊ 22 and s 0, IRF4 values (described in Table 6), we selected the parameter set for which model ( 4)-( 6) 15/24 correctly fits the IRF4 experimental data at both GC and PB_PC stages.Such modelderived SC pattern is obtained using the values (◊ 22 = 11 and s 0, IRF4 = 2.1 molecule.h≠1 ) We additionally performed simulations to improve the quality of the fitting of BLIMP1 and BCL6 distributions by testing parameters that are directly responsible for the balance between BLIMP1 and BCL6, such as interaction parameters ◊ 13 , ◊ 31 and ◊ 23 .We also tested parameters which can influence BCL6 and BLIMP1 indirectly, such as transcription rates (s 0, BCL6 and s 0, BLIMP1 ), and degradation rates of mRNA (d 0, BCL6 , d 0, IRF4 and d 0, BLIMP1 ).
After comparison of the stochastic system ( 4)-( 6) outputs, we selected the parameters which allow the model to have a qualitative fit of the experimental data for all nodes at GC and PB_PC stages (see Figure 5, and Tables 2-5, version III).For this tuned parameter set, we see that the model ( 4)-( 6) can have a good qualitative fitting of experimental data for all nodes.Results also show that for this parameter set (version III), the stochastic model ( 4)-( 6) fits SC data at the GC stage for BCL6 (see Figure 5A).The model-derived empirical distribution of BLIMP1 was capable of showing overlap with experimental data at the PB_PC stage (see Figure 5F), but it overestimated the number of BLIMP1 mRNA molecules at the GC stage (see Figure 5E).
The current parameter set (Tables 2-5, version III) has di culties to correctly evaluate the number of zero values.The model ( 4)-( 6) tends to overestimate the number of BCL6 mRNA molecules at PB_PC stage, as well as the number of IRF4 mRNA molecules at GC stage and number of BLIMP1 mRNA molecules at GC stage (see Figure 5).Nevertheless, this parameter set allowed the model to generate SC data with a similar level of magnitude of the amount of mRNA as experimentally observed.

Discussion
In this work, we applied a particular class of stochastic models combining deterministic dynamics and random jumps to the simulation of SC data from two stages of B cell di erentiation in germinal centers.
We first defined a reduced model (11) whose dynamics were compared to the ones of the kinetic model ( 1)-( 3) and we established an initial parameter value for the key parameter k on, init, IRF4 .We then showed that for a given parameter set (Table 2-5, Version I), the reduced model (11) admits two steady states.Secondly, we evaluated the e ect of stochasticity on multiple independent generations of the number of mRNA molecules by the stochastic model ( 4)-( 6) and we confirmed that for the same parameter set there is no noticeable di erence between each model-generated outputs for BCL6-IRF4-BLIMP1 GRNs (see Figure 4).These results allow performing a combined parameter screening with the confidence that for each candidate parameter set, the algorithm needs to perform only one run of the model ( 4)- (6).Lastly, we showed that the model ( 4)-( 6) can simulate distributions of the number of mRNA molecules for BCL6, IRF4, BLIMP1 at GC and PB_PC stages with the same order of magnitude as experimental data.However, as future scope of this work, a few strategies to improve the final parameter value set (Tables 2-5, version III) can be investigated.Since in BCL6-IRF4-BLIMP1 GRN, IRF4 activity depends only on its autoactivation reaction, we have only succeeded, by writing the reduced model (11) in terms of the kinetic model ( 1)-( 3), in estimating the value of k on, init, IRF4 .It would be advantageous to additionally estimate the values of k on, init, BCL6 and k on, init, BLIMP1 , using the same logic.
The e ect of mutual repression between BCL6 and BLIMP1 could be evaluated by performing a more extensive parameter value search.The current parameter value set (Tables 2-5, version III) makes model ( 4)-( 6 to positive regulation of BLIMP1 and consecutive repression of BCL6 [5,34]. Another important transcription factor in GC development is MYC, which regulates B cell proliferation [42] and the DZ B cell phenotype [43].MYC indirectly activates the histone methyltransferase enhancer of zeste homologue 2 (EZH2), which is responsible for the repression of IRF4 and BLIMP1 [44,45,46,47].
The transcription factors mentioned above are present in the SC RT-qPCR dataset from Milpied et al. [30] that we used and could be used to extend the current GRN.Inclusion of additional transcription factors may have both positive and negative e ects on the application of model ( 4)- (6).On one side, it can increase the computational time and the number of parameters required for simulating System (4)- (6).On the other side, because the inclusion of transcription factors can more precisely describe the biological system it could improve the quality of the fitting.However, any inclusion of new nodes to GRN 18/24 should be carefully evaluated and only essential transcription factors should be added.
For instance, there are no advantages in adding a transcrption factor that would only have one downstream output.As an example, MYC activates E2F1 and further activates EZH2.For this reason, incorporation of the chain MYC-E2F1-EZH2 should have a similar outcome, as the incorporation of the simplified MYC-EZH2 reaction.This is expected because in the modeling, intermediate elements of one-to-one redundant reactions can be omitted without significant changes in the quality of the simulations.
To further continue our study, we could also use SC RNA-seq dataset from Milpied et.
al [30].The authors have produced SC RNA-seq dataset from GC B cells and analysed the similarities between SC RNA-seq and SC RT-qPCR dataset.Even though the gene-gene correlation levels were lower in SC RNA-seq compared to SC RT-qPCR, SC RNA-seq analysis confirmed the observation obtained by SC RT-qPCR [30].From the stochastic modeling perspective, combining the data from SC RT-qPCR and SC RNA-seq should improve our understanding of the SC dataset variability and the quality of the fitting.
To summarise, the stochastic model ( 4)-( 6) is capable of qualitatively simulating and depicting the stochasticity of experimental SC gene expression data of human B cell di erentiation at the GC and PB_PC stages using a GRN made of three-key genes (BCL6, IRF4, BLIMP1).These results are encouraging, and suggest that our model may be used to test the di erent B cell exits from GC. Future steps may include testing of the model ( 4)-( 6) on alternative SC datasets [48,49,50] and investigating the malignant formations, by evaluating di erences of the associated GRN compared to the normal B cell di erentiation from GC towards PB_PC.

Index
Martinez et al.[3] derived an ODE model that simulates B cell di erentiation from mature GC cells towards PB_PC.Dynamics of each protein (BCL6, IRF4 and BLIMP1) are defined by a production rate (µ), a degradation rate (⁄), a dissociation constant (k) and a maximum transcription rate ( ‡).Dynamics are described by System (1)-(3), where p, b

Figure 1 .
Figure 1.A) Schematic representation of the three-gene GRN involved in B cell di erentiation.It consists of BCL6 (gene 1), IRF4 (gene 2) and BLIMP1 (gene 3), and with stimuli BCR and CD40 acting on the network.The interaction j ae i between a regulating protein j and a target gene i is represented by the interaction parameter ◊ j,i .B) Schematic representation of the associated stochastic model.A gene is represented by its promoter state (dashed rectangle), which can switch randomly from on to o (and vice versa), with rates k on,i (k o ,i ).When promoter state is on, mRNA molecules are continuously produced at s 0,i rate.Proteins are constantly translated from mRNA at s 1,i rate.Parameters d 0,i and d 1,i are degradation rates of mRNA and proteins.The interaction between a regulator gene j and a target gene i is defined by the dependence of k on,i and k o ,i on the protein level P j and the interaction parameter ◊ j,i .IRF4 gene exhibits an autoactivation loop (◊ 2,2 ).Additionally, two external stimuli, BCR and CD40, act on the GRN.

Figure 2 .
Figure 2. Temporal evolution of mRNA counts of IRF4 (A), BCL6 (B) and BLIMP1 (C) (see Figure1), generated by the reduced model(11).BCR stimulus was applied from 0h until 25h and CD40 stimulus from 35h until 60h.Microarray gene expression dataset from GEO accession no.GSE12195 was used to estimate model's parameters (see Tables2 to 5, version I) and are shown as dots with error bars.

Figure 3 .
Figure 3. Model-to-model distributions of KD for GC and PB_PC stages and the three genes, BCL6, IRF4, BLIMP1.Model (4)-(6) was simulated with parameter values from Tables 2-5, version I.The violin plots show the shapes of the distributions, median value, interquartile range and 1.5x interquartile range of the KD values.

Figure 4 .
Figure 4. Histograms of two model-generated mRNA counts of BCL6, IRF4 and BLIMP1 at GC and PB_PC stages with the highest KD.The subgraphs A, C, E represent log2 (molecule+1) transformed values for BCL6, IRF4 and BLIMP1 at GC stage.The subgraphs B, D, F represent log (molecule+1) transformed values for BCL6, IRF4 and BLIMP1 at PB_PC stage (Parameters from Tables 2-5, version I).

Figure 5 .
Figure 5. Histograms of model-generated and experimental mRNA counts of BCL6, IRF4, BLIMP1 at GC and PB_PC stages.The subgraphs A, C, E represent log (molecule+1) transformed SC values with BCL6, IRF4 and BLIMP1 compared between the model estimations at GC stage (grey) vs the experimental data from GC B cells (blue).The subgraphs B, D, F represent log (molecule+1) transformed SC values with BCL6, IRF4 and BLIMP1 compared between the model estimations at PB_PC stage (grey) vs the experimental data from PB_PC cells (green).Simulation of 200 single cells were used based on the parameter set, selected after semi-automatized parameter screening (see Tables2-5, version III).Performed based on the dataset from Milpied et al.[30] ) overestimate the number of mRNA molecules of BLIMP1 at GC stage.Increasing BCL6 repression of BLIMP1 could potentially decrease the quantity of BLIMP1 at the GC stage.The e ect of the duration of the BCR and CD40 stimuli on the di erentiation from GC B cells towards PB_PC could be investigated.Multiscale modeling of GCs performed by Tejero et al.[39] showed that CD40 signalling in combination with the asymmetric division of B cells results in a switch from memory B cells to plasmablasts.It would be relevant to evaluate a possible application of the stochastic model to study the e ect of combined CD40 and BCR signaling with di erent intensities and durations at the SC level.Additionally, one can evaluate the impact of including additional genes into the BCL6-IRF4-BLIMP1 GRN on the quality of data fitting by the stochastic model.One of the possible candidates to incorporation in the GRN is PAX5, which plays an important role in directing lymphoid progenitors towards B cell development [40].PAX5 positively regulates IRF8 and BACH2, which indirectly positively regulate IRF4 and negatively regulate BLIMP1 at an early stage of B cell di erentiation.During further development, BLIMP1 starts to repress PAX5, consequently decreasing the expression of IRF8 and BACH2.The correct orchestration of PAX5-IRF8-BACH2 during B cell di erentiation is important for the successful di erentiation towards antigen producing cells (PB_PC), while its malfunction can cause aberration in GC B cell development [41].CD40 stimulation of B cells initiates NF-ŸB signaling which is associated with cellular proliferation.In B cells, NF-ŸB activates IRF4, negatively regulates BACH2, what leads

Table 1 .
Correspondence between gene or stimulus names and model index.

Table 3 .
(11)meter set of the stochastic model (4)-(6) and reduced model(11), presented parameters are di erent between all versions.Version I -initial parameter set.Version II -parameter set obtained from the automatized approach.Version III -parameter set obtained after semi-manual tuning.Parameters are defined in the text.

Table 4 .
Parameter (11)of the stochastic model (4)-(6) and reduced model(11), presented parameters are equal between versions I and II.Version I -initial parameter set.Version II -parameter set obtained from the automatized approach.Version III -parameter set obtained after semi-manual tuning.Parameters are defined in the text.

Table 5 .
(11)meter set of the stochastic model (4)-(6) and reduced model(11), presented parameters are equal between versions II and III.Version I -initial parameter set.Version II -parameter set obtained from the automatized approach.Version III -parameter set obtained after semi-manual tuning.Parameters are defined in the text.

Table 6 .
Parameters tested during the semi-manual tuning of the stochastic model.