The mapping of molecular inputs to their molecular outputs (input/output, I/O mapping) is an important characteristic of gene circuits, both natural and synthetic. Experimental determination of such mappings for synthetic circuits is best performed using stably integrated genetic constructs. In mammalian cells, stable integration of complex circuits is a time-consuming process that hampers rapid characterization of multiple circuit variants. Transient transfection, on the other hand, is quick. However, it is an extremely noisy process, and it is unclear whether the obtained data have any relevance to the input/output mapping that a circuit would exhibit after stable integration. Here we describe a data processing workflow, Peakfinder algorithm for flow cytometry data (PFAFF), that allows precise input/output mapping to be extracted from single-cell protein expression data gathered by flow cytometry after a transient transfection. The workflow builds on the numerically-proven observation that the multivariate modes of input and output expression of multi-channel flow cytometry datasets, pre-binned by the expression level of an independent transfection reporter gene, harbor cells with circuit gene copy number distributions that depend deterministically on the properties of a bin. We validate our method by simulating flow cytometry data for seven multi-node circuit architectures, including a complex bi-modal circuit, under stable integration and transient transfection scenarios. The workflow applied to the simulated transient transfection data results in similar conclusions to those reached with simulated stable integration data. This indicates that the input/output mapping derived from transient transfection data using our method is an excellent approximation of the ground truth. Thus, the method makes it possible to determine the input/output mapping of complex gene networks using noisy transient transfection data.
One of the key features of a gene circuit is its input/output behavior. A few earlier publications attempted to develop methods to extract this behavior using transient transfection of circuit components in mammalian cells. However, the methods developed so far are only suitable for circuits with a monomodal output distribution. Moreover, the relationship between the extracted I/O mapping and the "ground truth" that would have been obtained with stably-integrated circuits has not been addressed. Here we explore cell populations easily identifiable in flow cytometry data, namely, the peaks of the fluorescent readout distribution in cells binned by the common expression value of the transfection reporter, or marker, gene. Using numerical simulations, we find that the distribution of circuit copy number in these cells depends deterministically on marker fluorescence in a noise-dependent manner. Moreover, we find that this is also true in the case of a bi-modal output distribution. Using the peaks of input and output distributions, we are able to reconstruct the I/O mapping of the circuit and relate it to the I/O mapping of the stably-integrated circuit. The reconstruction is enabled by a new computational method we call PFAFF. The method is extensively validated with forward-simulated flow cytometry data from stable and transient transfections, with up to seven different circuits. The results show excellent correlation between the I/O behavior extracted by PFAFF from simulated transient transfection data and the data simulated for stably integrated circuits.
Citation: Stelzer C, Benenson Y (2020) Precise determination of input-output mapping for multimodal gene circuits using data from transient transfection. PLoS Comput Biol 16(11): e1008389. https://doi.org/10.1371/journal.pcbi.1008389
Editor: Lingchong You, Duke University, UNITED STATES
Received: April 23, 2020; Accepted: September 23, 2020; Published: November 30, 2020
Copyright: © 2020 Stelzer, Benenson. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The research was funded by European Research Council Starting Grant StG 281490 and Swiss National Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: No competing interests to declare.
Many synthetic gene circuits fall into the category of information-processing systems that convert molecular inputs to molecular outputs according to a specific relationship, often called a “program”. A typical design-build-test cycle of a synthetic gene circuit requires that an input/output (I/O) relationship be characterized in order to confirm circuit function. Direct characterization is possible when both the input(s) and the output(s) can be measured simultaneously in single cells. Using fluorescent reporters, it is possible to obtain a collection of single-cell data points of the type [input; output], including for natural regulatory pathways, either by direct observation using staining, or by creating synthetic analogs of natural circuits furnished with fluorescent reporters [2–8]. It has emerged that the output forms a distribution at the single-cell level for each input [8–10], resulting in a two-dimensional probability distribution for the entire I/O relationship, rather than a curve, due to cell-to-cell variation in parameter values. Nevertheless, after averaging, these noisy data sets usually collapse to Hill functions or to multimodal, two-value functions.
Characterization of a circuit that is stably integrated in a cell genome or on replicating fixed-copy episomal vectors is usually straightforward, provided that the inputs and the outputs can be measured. Thus, until now, most characterized input/output behaviors have been obtained in bacteria or yeast, where genome manipulation is relatively facile. However, obtaining such “ground truth” information in mammalian cells has lagged behind, because it is still very labor-intensive to establish stably integrated multi-gene circuits. Further, properly executed characterization requires multiple accompanying control circuits to serve as baseline, thus requiring that not one but multiple stable cell lines be developed. Even though technologies such as transposon [12,13] and viral delivery [14,15], and targeted integration via zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) or clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 [18,19], are available today, they are still time consuming even in simple cases and become more challenging with the increase in circuit size. Integration locus-specific effects further complicate the characterization.
Transient transfection of gene circuits is a widespread alternative to stably-integrated circuit characterization in mammalian cells [20–24]. Multiple plasmids, each carrying a single gene, can be co-delivered, leading to correlated gene copy numbers in individual cells. The expression of gene products in dividing cell cultures typically reaches quasi-steady-state two to three days post transfection, and decreases on days four to six due to plasmid dilution [24,25]. An advantage of transient transfection is that genome integration-specific effects can be ignored; likewise, secondary effects that often result from having a few genes close to each other on the genome do not play a role because each gene is encoded on a separate plasmid. On the other hand, transient transfections are extremely noisy due to large copy number variation (1–150 transcriptionally-active gene copies per cell), which makes direct interpretation of the resulting datasets impossible. Accordingly, the standard analysis applied to transient transfection data is at the cell population level, with average values of inputs and outputs reported for entire cell populations (see Schreiber et al. as a representative case). This works sufficiently well for logic gene circuits that are often characterized at the extremes of their input values. Progress towards deriving continuous input/output relationships using transient transfection data has been made in the past [6,27–29]. However, these methods were designed to extract monomodal input/output curves and are thus unsuitable for bi- or multimodal circuits. Moreover, there has been very little computational or experimental validation of these results, in particular of how they compare to stably-integrated systems and of what the different input/output curves correspond to.
We sought to develop a data analysis strategy that would determine input/output relationships from transient transfection data and be applicable to all steady-state networks, including those with bi-modal or bi-stable behavior. We also sought to understand what exactly constitutes a comparable "stable integration" scenario for the information extracted from raw transient transfection data. Accordingly, we first investigate the gene copy number distributions in cell populations that are easily identifiable in flow cytometry-like datasets. We address the question numerically and find a number of important reproducible trends that make it possible to draw reliable and interpretable conclusions from data obtained in transient transfections, and map them back to their stable-integration counterparts. In order to validate the method, we perform in-silico experiments by simulating flow cytometry data expected in a transient transfection using dynamic circuit models. At the same time, we use the exact same models and parameter values to simulate the input/output relationship for the case of stable genomic integration. With this approach, we are able to evaluate whether our workflow, when applied to transient transfection data, results in an input/output behavior that is similar to the input/output behavior one would expect for a stable integration.
As benchmarks, we focus on three-node gene network motifs that have been extensively studied earlier [30,31]. We find excellent correspondence between the results of our processing pipeline and the ground truth of the stable integration. Importantly, we are able to capture multi-value, bi-modal responses. Therefore, the method described here can be used to analyze transient transfection data and draw conclusions about the underlying input/output mapping in complex gene circuits, without the need to construct stable cell lines.
Statistical framework for transient transfection
In what follows, we define a gene circuit as a set of N genes (1) and their corresponding gene products (2) in which a subset of components (3) is defined as input and a subset of components (4) is defined as output.
Further, consider a cell that harbors a gene circuit, either in a stably-integrated or transiently-delivered fashion, such that a gene gi is present in ki copies in that cell, and the entire set of copy numbers is a vector k = [k1,k2,…,kN]. (5)
Hereafter, we consider only interactions between circuit components that have been intentionally engineered (i.e., chromatin-related effects do not interfere with circuit function in the stable case), and assume that the biochemical parameters describing individual interactions do not change between stably integrated and transiently-delivered components. Even though individual cells in a population of stable clones may behave differently, e.g., through stochastic effects , we expect the aggregate statistics of different clones containing identical circuit copy number to be similar to the aggregate statistics of cells transiently transfected with the same circuit copy number. Therefore, when considering stable clones, we imply an idealized "averaged" clone in which the integrated circuit is governed by the same parameters as the transiently transfected circuit. It then follows that if we apply the same input I to a population of cells that all harbor the circuit with the copy number vector k, and allow the cells to arrive at a steady state in the stable case and to the quasi-steady state in the transient case (see S1 Text “In-silico time-courses”), then the outputs O will form the same statistical distribution, which can be mono- or multimodal , in both cases. Reporting the distribution of O for various inputs I would conclude the characterization of a stably-integrated circuit, because all cells harbor the exact same vector k, which can be engineered or experimentally determined post factum after clonal isolation.
In the transient transfection experiment, while the values of I and O could be collected for individual cells, the underlying values of k are unknown because the process of transient delivery is extremely noisy. The only way to derive useful data from transient transfections is to deduce, at least for a subset of cells, their k values, and group together data from cells with similar values of k. If this can be accomplished, the input and output values measured in these cells will be similar to the values one would have obtained with a circuit stably integrated at k copies. Below, we develop a statistical description of a transient co-transfection process, which leads us to identify cells residing in binned modes of input and output distributions as cells for which the copy number vector k can be estimated.
We start with the statistical description of a multi-plasmid co-transfection of N constitutively expressed and mutually independent genes g1,g2,…,gN, generating (fluorescent) protein products O1,O2,…,ON. Note that there is no input in this system, so every protein product can be called an “output”. Available data suggest that experimentally-observed distributions of a protein level expressed from a constitutive promoter are lognormal. The mean of the distribution is proportional to the gene copy number ki, with the promoter-dependent global proportionality coefficient βi being independent of ki; the standard deviation σi of the log-transformed protein level distribution may, in principle, depend on the copy number, but we assume it to be constant in the following equations. Let us define a random variable Yi as the log-transformed protein output of the gene gi. (6)
For a vector of gene copy numbers k = [k1,k2,…,kN], a conditional multivariate pdf of the log-transformed protein expression values Y = [Y1,Y2,…,YN], provided that each gene generates its own protein output independently of each other, is described by a multivariate normal distribution without covariances (for simplicity, we assume σi to be the same for all genes and use a symbol σ in what follows): (10)
To describe the distribution of gene copy numbers in a transient transfection, we introduce an independent parameter m that we call “multiplicity of transfection”. There are no experimental data on the probability distribution of gene copy numbers in a co-transfection, and this distribution likely depends on the exact transfection protocol. Therefore, we make the baseline assumption that the pdf of the gene copy number vector k is a multivariate normal distribution without covariance that depends on the multiplicity of transfection. The standard deviation of each gene copy number distribution scales linearly with multiplicity, with the scaling factor ε. To account for gene combinations that deviate from an equimolar ratio, a parameter ai describes the relative abundance of a gene. In this case, one gene is assigned as the "reference" with ai = 1. (11)
Lastly, m itself can be distributed non-uniformly according to its pdf p(m). Distributions such as Poisson, Gamma, lognormal or even a combination of them have been used to describe the process of DNA or viral vector delivery to cells. For transient lipofection of DNA, lognormal distributions approximate experimental data well, and therefore (12)
One of the genes and its protein product is assigned the role of, respectively, a reference gene and a reference protein (sometimes called “transfection marker”); let us assume it is k1, with gene product O1 and its log-transformed counterpart Y1. Thus, by definition a1 = 1. To derive the conditional marginal pdf p(Yi|Y1), which is the probability to find the value Yi of the log-transformed protein Oi expression in a cell in which the log-transformed reference protein expression equals Y1, we first drop irrelevant variables from Eq 10 to evaluate joint probability distribution of log-transformed protein levels [Yi,Y1] given the underlying gene copy numbers [ki,k1]: (13)
It is customary, as already done earlier [6,38], to bin cells that share the same Y1, the log-transformed value of O1, because this is the only readout independent of the other components, as it is a self-contained gene expressed from a constitutive promoter. We follow this approach here: cells binned according to their Y1 value will exhibit certain log-transformed distributions of the other proteins Y2,…,YN. Knowing the joint pdf (Eq 15), one can derive the conditional probability of Yi given Y1 (that is, the pdf of Yi among cells that express Y1 log-transformed copies of the reference protein), as follows: (16)
We refer to the most probable value of Yi given Y1, that is, the maximum of the conditional pdf in Eq 16, as the mode of Yi. Its value can be determined experimentally as the mode of the Yi distribution after binning the cells according to their Y1 value. The defining equation (Eq 17) may have more than one solution, corresponding to a multimodal probability density function in Eq 16.
This brings us to the most relevant question of this section: what is the distribution of the gene copy number ki for the cells that reside in the mode(s) of Yi, and what is the most probable value of ki? To answer this question, we evaluate the conditional probability p(ki|Yi) according to Bayes’ theorem: (18)
Knowing the most probable gene copy numbers in the cells residing in the modes of log-transformed protein distributions allows us to correlate the data to what might be obtained in cells with stably integrated constructs harboring similar gene copy numbers.
Numerical analysis of transient co-transfection of constitutively expressed genes
An analytical solution of Eq 19 does not exist, and we therefore solve it using numerical simulations. To this end, we performed in-silico simulations of a transient co-transfection containing multiple (N = 5) independent genetic constructs (Methods). The change in protein expression over time, dOi/dt, of each gene gi can be described by an ordinary differential equation (ODE) with a kinetic parameter, gene copy number ki and degradation rate δi (20)
In the steady state, i.e. dOi/dt = 0, the steady-state level of Oi is proportional to ki with the global coefficient of proportionality βi and identical to Eq 7: (21)
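As a worked illustration of the steady-state argument, the following is a minimal derivation assuming the ODE has the standard production-minus-degradation form. The symbol \alpha_i for the kinetic (production) parameter is a placeholder name chosen here for illustration; only k_i, \delta_i and \beta_i are taken from the text.

```latex
\begin{align}
\frac{dO_i}{dt} &= \alpha_i k_i - \delta_i O_i
  && \text{(assumed form of the ODE)} \\
0 &= \alpha_i k_i - \delta_i O_i^{\mathrm{ss}}
  && \text{(steady state, } dO_i/dt = 0\text{)} \\
O_i^{\mathrm{ss}} &= \frac{\alpha_i}{\delta_i}\,k_i = \beta_i k_i,
  \qquad \beta_i = \frac{\alpha_i}{\delta_i}
\end{align}
```

Under this assumed form, the global coefficient of proportionality is simply the ratio of the production and degradation parameters, which is why the steady-state protein level scales linearly with copy number.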
Iterating multiple times to simulate multiple single cells j (1≤j≤C, where C is the total number of simulated "cells"), we draw the multiplicity mj from a lognormal distribution (Eq 12) with parameters that roughly fit experimental data (see below) and initialize gene copy number vectors (22) according to Eq 11 with pre-set parameters (Methods). To create log-normal protein distributions given kj according to Eq 10, for each kji in kj, local proportionality factor bji is drawn from a log-normal distribution: (23) with fixed βi values (Methods, S1 Table); the values of σ are fixed for a given simulation run and systematically varied between 0.00 and 0.32 in different runs. A value (24) is the level of protein Oi in cell j (S1 Fig).
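The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code behind the figures: the function name `simulate_cotransfection`, the clipping of copy numbers at zero, the exact form of the copy number distribution (mean ai·m, standard deviation ε·m), the product form O = b·k for the protein level, the centering of the lognormal on ln βi, and the βi values passed in by the caller are all assumptions made here (the actual βi values are listed in S1 Table). The defaults for μm, σm, ε and σ are the values quoted for the simulations in this paper.

```python
import math
import random


def simulate_cotransfection(n_cells, ratios, betas, sigma=0.08, eps=0.04,
                            mu_m=1.4979, sigma_m=1.2686, seed=0):
    """Return one (copy_numbers, protein_levels) pair per simulated cell."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        # Multiplicity of transfection m_j, drawn from a lognormal distribution.
        m = rng.lognormvariate(mu_m, sigma_m)
        # Copy numbers: normal around a_i * m with s.d. eps * m (assumed form),
        # clipped at zero because negative copy numbers are meaningless.
        k = [max(0.0, rng.gauss(a * m, eps * m)) for a in ratios]
        # Local proportionality factors b_ji: lognormal centred on ln(beta_i)
        # (assumed parameterization) with log-scale s.d. sigma.
        b = [rng.lognormvariate(math.log(beta), sigma) for beta in betas]
        # Protein level of gene i in this cell (assumed O_ji = b_ji * k_ji).
        o = [bi * ki for bi, ki in zip(b, k)]
        cells.append((k, o))
    return cells
```

A call such as `simulate_cotransfection(10000, [1.0, 1.3, 0.8, 0.5, 0.4], [100.0] * 5)` yields a flow cytometry-like table in which the copy numbers are known for every cell, which is what makes the retrospective look-up of copy numbers possible; the β value of 100 is a hypothetical placeholder.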
The generated in-silico dataset (S2A Fig) is similar to a flow cytometry dataset that one would obtain in a transient co-transfection experiment of constitutively-driven genes. In order to confirm that the parameters, and in particular the values of σ, are realistic, we transiently co-transfected five plasmids, each constitutively expressing a different fluorescent protein (O1: SBFP2, O2: Cerulean, O3: Citrine, O4: mCherry and O5: iRFP; Methods) (S2B Fig). We find that the standard deviation of the log-transformed protein expression distribution in cells pre-binned on similar values of the reference protein, which we denote σ*, (25) depends on Y1, and indeed ranges between 0.1 and 0.3 (S2C Fig). Higher values of σ* are observed at very low Y1 values, and they plateau towards 0.1 for larger Y1. Accordingly, given that σ<σ*, the range of σ values used in the simulations constitutes a realistic range for the gene expression variability due to "intrinsic noise".
Next, we simulate transient co-transfections using two different gene ratios; (i) equimolar and (ii) a ratio of 1.0:1.3:0.8:0.5:0.4, the latter following some fine-tuning in a parallel experimental project (manuscript under preparation), to generate a joint pdf p(Y) (Figs 1A, S3A, S3B, S4A and S4B). We use these datasets to solve Eqs 17 and 19 numerically, that is, determine the and . To do so, we bin cells that share a log-transformed reference protein value Y1, evaluate the conditional pdf p(Yi|Y1) and, first, determine numerically the value of . Second, we retrospectively look up the values of ki in cells whose Yi and Y1 expression levels lie in the vicinity of a vector . The empirical distribution of ki (Figs 1B, S3C and S4C) is from Eq 19, and the mode of the gene copy number distribution, , or , is determined numerically (Figs 1C, S3D and S4D, Methods).
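The bin, find-the-mode, look-up sequence described above can be sketched as follows. This is a simplified illustration, assuming that each simulated cell is stored as a (copy_numbers, protein_levels) pair with the transfection reference gene at index 0. The function names, the tolerance parameters and the use of a plain histogram maximum as the mode estimator are choices made here for brevity, not the procedure described in Methods.

```python
import math
from collections import Counter


def histogram_mode(values, bin_width):
    """Centre of the most populated fixed-width bin (first one wins on ties)."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    best_bin = max(counts, key=counts.get)
    return (best_bin + 0.5) * bin_width


def copy_number_mode(cells, gene, y1_lo, y1_hi, y_tol=0.1, k_bin=1.0):
    """Return (mode of Y_gene | Y1, mode of k_gene among cells near that mode)."""
    # Step 1: keep cells whose log-transformed reference level lies in the bin.
    binned = [(k, o) for k, o in cells
              if o[0] > 0 and o[gene] > 0 and y1_lo <= math.log(o[0]) < y1_hi]
    if not binned:
        raise ValueError("no cells in the requested Y1 bin")
    # Step 2: mode of the log-transformed protein level within the bin.
    y_mode = histogram_mode([math.log(o[gene]) for _, o in binned], y_tol)
    # Step 3: retrospectively look up copy numbers of cells close to that mode.
    near = [k[gene] for k, o in binned
            if abs(math.log(o[gene]) - y_mode) <= y_tol]
    return y_mode, histogram_mode(near, k_bin)
```

Because the copy numbers are only available in simulated data, step 3 is a validation device: in an experiment one observes the protein mode from step 2 and relies on the simulated relationship to infer the copy number.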
According to Eq 7 there is a simple, linear relation between Oi and the gene copy number ki, linking them via the global coefficient of proportionality βi. The global coefficient of proportionality can be determined experimentally using e.g., a calibrated Western blot to measure the absolute amount of protein and calibrated qPCR to measure absolute mean internalized gene copy numbers. For the transfection reference protein, we introduce the variable that corresponds to the gene copy number that one would "naïvely" anticipate to be the most probable value leading to a particular Y1, given β1: (26)
Given that the values of Y1 and β1 are the only “knowable” parameters, it is of interest to ask how the actual copy numbers relate to these anticipated values. Using our simulated datasets, we compute the ratio between the numerically found mode of the copy number distribution and the anticipated copy number from Eq 27 (Fig 1D) as a function of Y1. We find that the deviation from the anticipated value is a monotonically decreasing function of Y1 with the following properties: (1) the deviation is always positive for values of Y1<ln E[O1]; (2) the deviation is essentially zero when Y1 = ln E[O1]; and (3) it is negative for Y1>ln E[O1]. Further, the absolute magnitude of the deviation increases with increasing σ (S3E and S4E Figs). However, for all noise levels, the deviation is zero at the global mean of the O1 distribution, E[O1], and (28)
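The comparison between the numerically found and the anticipated copy number can be illustrated with a short sketch. It assumes that the naïve estimate simply inverts the linear relation O1 = β1·k1, so that the anticipated copy number is exp(Y1)/β1; both function names are hypothetical, and a ratio of 1 corresponds to zero deviation.

```python
import math


def anticipated_copy_number(y1, beta1):
    """Naive copy number estimate, assuming O1 = beta1 * k1 and Y1 = ln(O1)."""
    return math.exp(y1) / beta1


def mode_to_anticipated_ratio(k_mode, y1, beta1):
    """Ratio of the numerically found copy number mode to the naive estimate."""
    return k_mode / anticipated_copy_number(y1, beta1)
```

For example, with a hypothetical β1 of 50 and a bin at Y1 = ln 500, the naïve estimate is 10 copies, so a numerically found mode of 12 copies gives a ratio of 1.2.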
In-silico simulations shown for two cases with parameter settings σ = 0.08, ε = 0.04 and m = 10: equimolar (a1:a2:a3:a4:a5 = 1.0: 1.0: 1.0: 1.0: 1.0; top row) and "nominal" (a1:a2:a3:a4:a5 = 1.0: 1.3: 0.8: 0.5: 0.4; bottom row) ratio of gene mix. (A) Density plots show the amount of expressed proteins O2 versus O1. Solid black lines indicate the edges of an example bin for the transfection reference protein O1. (B) The plots show the distribution of gene copy numbers in cells whose O1 values fall into the bin shown in panel A. Copy number distributions corresponding to different genes gi are shown using different colors (legend on the very right of the figure). In the equimolar case, gene copy number distributions overlap, while in the "nominal" case they are separated. (C) The modes of the copy number distributions are plotted versus the median signal of the transfection reference protein O1 in all bins (colored lines). The dash-dotted line marks the mean (E[O1]) of the global O1 distribution. (D) For each bin of the transfection reference protein O1, the modes of the gene copy number distributions from each gene gi are determined numerically. The ratio of the numerically-determined mode of the copy number distribution and the anticipated copy number are computed and plotted versus the corresponding O1 values in individual bins. The global mean of O1 is shown with a dash-dotted line. (E) Ratios of gene copy number modes relative to the gene copy number mode of the transfection reference protein , as a function of the O1 median value in the bins.
We further analyzed the ratio of the modes of the gene copy number distributions to the copy number mode of the transfection reference gene in cells that reside in the close vicinity of the modal log-transformed expression vector. The ratio stays constant for almost the entire range of Y1 values (Figs 1E, S3F and S4F). Since the naïve estimate and the numerical mode of the absolute gene copy number coincide at the global mean of O1, both the relative and the absolute abundance of the gene copy numbers can be deduced with high certainty in cells that express O1 around its global mean. This is true regardless of the chosen distribution of m and βi. Simulations that employ Poisson, Gamma or lognormal distributions show a strikingly similar effect (S5 Fig). Appropriate experimental techniques allow measuring both the protein copy number and the gene copy number in the cells residing at the global mean of O1, making it possible to determine the value of β1 experimentally and therefore extrapolate directly to the ground truth expected in a stable cell line with a similar gene copy number.
Numerical analysis of transient co-transfection of non-trivial gene circuits
Next, we consider the case when the same genes, apart from the transfection reference protein gene g1, form a set of genes interacting in a circuit. Depending on the circuit, the log-transformed distribution Yi of protein Oi in cells pre-binned on the value of Y1 may exhibit mono-, bi- or multimodality. We may consider the joint probability distribution of the vector of independent constitutive genes and their gene products, k∙Y, as a baseline state of any circuit. When the genes are interconnected (not including the reference gene g1 and its log-transformed product Y1), this baseline distribution is transformed because the values of Y are no longer independent. However, the values of k remain the same, because they represent the exact same underlying process of DNA delivery, and only the Y values change relative to the independent, constitutive values. We hypothesize that, despite the fact that the values of Yi are no longer independent of kj for i≠j, the k vectors corresponding to the (possibly multiple) multivariate modes of Y|Y1 would not deviate far from the k vectors obtained in the case of independent co-transfection. We further hypothesize that this deviation will decrease as the noise in the system increases to biologically-plausible levels.
To test these hypotheses, we simulated two three-node gene circuit architectures (currently being investigated experimentally in a related project, see S6 Fig for the experimental results of the fan-out circuit): a simple monomodal fan-out circuit (FO; Fig 2A) and a non-trivial pitchfork bifurcation circuit, also known as the reinforced incoherent feed forward motif (RIFFM) [4,30,40,41] (Fig 2B). The input to the circuit is a transcriptional activator PIT2, whose level is tuned by Doxycycline via a bi-directional TRE promoter that also drives a fluorescent protein mCherry as a proxy for input expression. The first PIT2 target promoter (P1) drives the D. melanogaster-derived transcriptional repressor Knirps (kni) and a translationally-linked fluorescent protein Cerulean, constituting the first circuit output. The second PIT2 target promoter (P2) drives the transcriptional repressor LacI fused to a KRAB domain and translationally linked to a fluorescent protein Citrine, representing the second circuit output. In the RIFFM circuit, kni is able to repress P2 while LacI is able to repress P1; in FO, the mutual repression is eliminated via mutations.
(A) Monomodal/fan-out (FO) and (B) bi-modal (RIFFM) gene circuits. The circuits are composed of five independent genes. Constitutively-expressed transcription factor rtTA co-induces PIT2 and the fluorescent protein mCherry in a Doxycycline-dependent fashion. PIT2 in turn activates two promoters P1 and P2, which express a transcriptional repressor lacI-KRAB and Kni, respectively, co-expressed via P2A linkers with Citrine and Cerulean fluorescent reporters. In the FO circuit the repressors do not interact due to mutated promoter binding sites, while in the RIFFM circuit they repress each other, establishing a mutual inhibition. (C) Graphical illustration of all steps to find the modes within our simulated data sets. The raw flow cytometry-like data (1) are binned by the transfection marker (i.e. Y1: SBFP2). The binned data are isolated and subsequent analysis is done only on this subset (2). Afterwards, the distribution of the log-transformed input signal (mCherry) within the binned subset is determined and at least one Gaussian is fitted to the distribution. Narrow bins (pink bars) around the mode(s) (black arrows) of the fitted distribution(s) are determined, thus obtaining subsets of the originally-binned dataset. The subsets around the input peaks are now analyzed individually. For illustrative purposes, we show the input (mCherry) and output (Cerulean) signals of the subsets around the two input peaks (3) and (4). The modes of the log-transformed output distributions are identified using a similar peak finding procedure as for the input signal (mCherry). The output histograms and their modes (black arrows) are shown on the sides of the plot. Within these modes, we look at the pdf of the gene copy number distributions for all circuit genes as well as the transfection reference gene and identify the corresponding vectors of copy number modes.
We built mechanistic kinetic models of the circuits FO and RIFFM (S2 Text "Simple Fan-Out Model" and S3 Text "Detailed Models") and simulated the flow cytometry dataset for multiple transiently transfected cells j (1≤j≤C, where C is the total number of simulated "cells"). As above, every gene is encoded on a separate plasmid. We also compared this to a single-plasmid setup in which all five genes are located on a single DNA backbone, but saw only a marginal difference in outcomes (S7 Fig). The multiplicity of transfection and the gene copy numbers are simulated as above (S1A Fig); the gene copy numbers become the initial conditions for running a simulation. Unlike the case of constitutive co-transfection, we directly simulate circuit dynamics governed by kinetic parameters p; to simulate the effects of intrinsic gene expression noise, the parameters that govern protein translation rates are sampled independently from lognormal distributions with nominal parameter values πi and preset "noise" levels ranging, as above, from 0.00 to 0.32: (29)
For every cell j, the drawn parameter values pij are used in a dynamic simulation run to a steady state, with the simulated steady-state input and output protein levels corresponding to the readouts from that cell.
First, we simulate mono- and bi-modal gene circuits for a single Doxycycline/input level. Similar to the data analysis above, we bin the cells according to Y1 value. In the bin, we first focus on the input protein Yinput and identify its mode. Importantly, in the general case the distribution of Yinput|Y1 can be bi-modal, leading to two numerically-found values and . In this case, we consider separately the cells residing close to the expression vectors and . Next, for every circuit output we consider the distributions and . These distributions can also be multimodal; in what follows we assume they are bi-modal. We denote the modes of the output distribution corresponding to the high mode of the input and , and use similar notation for the output modes corresponding to the low mode of the input, if the latter is present. Lastly, we consider all cells in the vicinity of the expression vector and and evaluate the copy number distribution of every circuit gene as well as the reference gene. These are monomodal distributions, with the modes denoted respectively as and (see Fig 2C for schematic description of the process). These numerically evaluated values are then compared to the naively anticipated values calculated according to the Eq 27. Note that in these simulations, the transfection reference gene expression is modelled explicitly as a transcription/translation/degradation cascade with corresponding kinetic parameters; the value of β1 for Eq 27 is calculated according to Eqs 20 and 21.
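The peak-finding step for a possibly bi-modal distribution can be sketched as below. Note that this sketch swaps in a simpler technique than Gaussian fitting: it locates local maxima of a moving-average-smoothed histogram. The function name `find_modes` and the parameters `bin_width`, `min_fraction` and `half_window` are hypothetical and would need tuning on real data.

```python
def find_modes(values, bin_width=0.1, min_fraction=0.2, half_window=1):
    """Return bin centres of local maxima in a smoothed histogram of `values`."""
    lo = min(values)
    n_bins = int((max(values) - lo) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / bin_width), n_bins - 1)] += 1
    # Moving-average smoothing; the window is truncated at the edges.
    smoothed = []
    for i in range(n_bins):
        window = counts[max(0, i - half_window): i + half_window + 1]
        smoothed.append(sum(window) / len(window))
    # Ignore minor bumps below a fraction of the tallest smoothed peak.
    threshold = min_fraction * max(smoothed)
    modes = []
    for i, s in enumerate(smoothed):
        left = smoothed[i - 1] if i > 0 else float("-inf")
        right = smoothed[i + 1] if i < n_bins - 1 else float("-inf")
        # Strict on the left, non-strict on the right: one hit per plateau.
        if s > left and s >= right and s >= threshold:
            modes.append(lo + (i + 0.5) * bin_width)
    return modes
```

Applied first to the log-transformed input signal within a Y1 bin and then to each output within a narrow window around every input mode, such a routine returns one or two values per distribution, which correspond to the high and low modes discussed above.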
The analysis of the simulated data reveals the following: for the monomodal FO circuit, the behavior of the copy number modes of the input and the output genes is quantitatively identical to what is observed in the simulation of multiple constitutive gene co-transfection (Fig 3A–3E). The bi-modal circuit (Fig 3F) shows its bi-modal behavior at the lower intensities of the transfection reference protein O1 (10³–10⁵). In this range, the distributions of gene copy numbers for the high and low modes of output expression diverge slightly (Figs 3G–3K). They are, however, almost fully overlapping, and their modes differ by only a few percent upwards or downwards, respectively, relative to the monomodal case, despite the large difference in the corresponding protein modes. We quantify the divergence in gene copy number modes between the high and low protein output modes with a metric (Eq 30) that is normalized to the mode of the transfection marker copy number distribution found in the constitutive co-transfection simulation (Fig 1). We observe a steady increase in this divergence with increasing noise level σ (Fig 3L); however, it is less than 10% for realistic levels of noise. Furthermore, we quantify bi-modality-dependent deviations of the gene copy number ratios between the high and low output modes with a second metric, Δϕi (Eq 31).
Unlike the divergence in copy number modes, this deviation of gene copy number ratios, Δϕi, decreases with increasing noise level σ (Fig 3M). These observations confirm our hypothesis that even in multimodal circuits, cells that share the same amount of the reference protein also share very similar gene copy numbers, both in absolute and, especially, in relative terms (S8–S10 Figs). Moreover, deviations from the nominal gene copy number ratio decrease as the noise level σ increases.
In-silico simulations (global parameters σ = 0.08, ε = 0.04, μm = 1.4979, σm = 1.2686 and a1:a2:a3:a4:a5 = 1.0:1.3:0.8:0.5:0.4) shown for two circuits: A–E, monomodal fan-out (FO), and F–M, bi-modal (RIFFM). (A) Raw data of the simulated transiently transfected FO circuit. The amount of expressed proteins O2 (left: Cerulean) or O3 (right: Citrine) versus O1 (transfection marker: SBFP2) is shown as a density plot. Solid lines indicate the edges of a transfection marker bin. (B) Gene copy number distributions of cells binned by a particular value of log-transformed O1 in the bin shown in panel A. (C) The modes of the copy number distributions are plotted versus the median signal of the transfection reference protein O1 for all bins (colored lines). The dash-dotted line marks the mean (E[O1]) of the global O1 distribution. (D) The ratio of the numerically determined mode of the copy number distribution to the anticipated copy number, plotted versus the O1 values in individual bins. The global mean of O1 is shown with a dash-dotted line. (E) Modes of the gene copy number distributions normalized by the mode of the transfection reference gene are shown as a function of O1 values in individual bins. (F) Raw data of the simulated RIFFM circuit with bi-modal output O2 (Cerulean; left) or O3 (Citrine; right). The black lines indicate a bin within the bimodal range. (G) The modes of the copy number distributions for the high and low output modes are plotted versus the median signal of the transfection reference protein O1 of all bins (colored lines). Dashed segments indicate the range of O1 in which the high and low modes do not coincide; the black dash-dotted line indicates the mean (E[O1]) of the global O1 distribution. (H) The ratios of the numerically determined modes of the copy number distributions (high and low output modes) to the anticipated copy number are plotted versus the corresponding O1 values in individual bins. Dashed segments indicate the range of O1 in which the ratios corresponding to high and low modes do not coincide. The global mean of O1 is shown with a straight dash-dotted line. (I) Modes of the gene copy number distributions for the high and low output modes, normalized by the mode of the transfection reference gene, are shown as a function of O1 values in individual bins. Dashed lines indicate the range of O1 where the values do not coincide. (J) The ratios depicted in panel I are shown in greater detail for the outputs O2 (Cerulean; top) and O3 (Citrine; bottom). (K) Fitted gene copy number distributions of the indicated bin in F are shown for all genes (left to right) and noise levels σ. Black curves indicate the distributions fitted to the low output mode, and colored curves the distributions fitted to the high output mode. (L) Divergence in gene copy number modes between high and low protein output modes, normalized to the mode of the copy number distribution of the transient co-transfection, for all noise levels σ as a function of O1 values in individual bins. (M) Difference of the copy number modes’ ratios, Δϕi, between the high and low protein output modes for all noise levels σ as a function of O1 values in individual bins.
The rationale for extracting input/output relationship from transient transfection data
The analyses above suggest a workflow for analyzing and deducing input/output relationships from transiently transfected circuits. To summarize the findings so far, we show that cells which express a certain level of reference protein O1 and reside at multivariate modes of log-transformed input and output expression harbor both input and output genes (or plasmids) with the following properties: (1) even for multimodal outputs with large differences in protein level modes, the distributions of the input and output gene copy numbers corresponding to the different output protein modes almost overlap, with the copy number modes varying by about 10% for biologically realistic noise values, and can thus be treated as the same copy number for all practical purposes; (2) the ratio of the copy number distribution modes corresponds almost exactly to the nominal ratio used in a transfection for all reference protein bin values, for both low and high output modes, and the deviation from the nominal ratio decreases with increased noise; (3) the absolute values of the distribution modes exactly match the naïve anticipation (Eq 27) when the reference protein is expressed at the level E[O1]; (4) the absolute values of the distribution modes (for both high and low output modes) deviate from the naïve expectation in a predictable linear fashion as a function of the log-transformed reference protein level Y1, with the magnitude of the deviation increasing with the overall noise level. However, because the actual noise level, and thus the degree of deviation, can be quantified experimentally, the copy numbers can be estimated not only in relative but also in absolute terms, even in cells that lie away from E[O1]. We show that this holds for the case of a complex circuit that generates a bi-modal output distribution, with only slight deviations of the copy number modes from the expectation.
Accordingly, by analyzing the input and output values in the cells that reside in the multivariate modes of circuit inputs’ and outputs’ distributions (after binning by the log-transformed reference protein value Y1), we should be able to extract the information about the input/output response of the circuit that is comparable to the stable cell line harboring the circuit at the copy number derived from Y1 according to Eq 26 and corrected by the measure of deviation that depends on the noise level.
Overview of the workflow validation procedure
To validate the workflow suggested above, we simulate transient transfection and stable integration for the FO and RIFFM circuits over a wide range of circuit input levels using the exact same ODE model, with the input modulated by varying the Dox level; otherwise the parameter randomization is performed as described above according to Eq 29. To simulate a stable integration dataset, we initialize a copy number vector k such that the ratio between individual genes corresponds to the nominal ratio of the transient transfection, and the absolute copy number of the reference gene is set to different fixed values corresponding to the bins used for transient transfection data analysis (S11 Fig; Methods). After the datasets are simulated, we extract input/output relationships corresponding to various bins of log-transformed values of the (transfection) reference protein Y1 from the simulated transient transfection data, as described in detail in the next section. This is compared to the results of the stable integration simulation performed for the copy numbers that correspond to those reference protein levels. The simulation of the stable integration scenario generates an input/output “cloud”, as has also been demonstrated experimentally [8,43,44]. The cloud can be used "as is" for the purpose of comparison, or it can be processed further by identifying the output modes at the different input levels and building averaged curves. The process is illustrated schematically in S12 Fig.
The input/output relationship is generated for each level of log-transformed reference protein Y1. We make use of the datasets simulated with different Doxycycline levels and thus different amounts of input expressed per gene copy. (Note that Doxycycline does not affect the gene copy number or the expression of the transfection reference protein; it is not a direct input and is not a part of the input/output relationships that we seek.) For a given Y1 bin, a single transfection experiment simulation with a fixed Doxycycline value generates (for a bimodal case) at most four points on the input/output curve, one for each combination of input mode (high or low) and output mode (high or low). In most cases, the input will only exhibit a single mode, and thus only two points will be generated for a bi-modal case and one for a monomodal case. For this same reference protein bin, we repeat the procedure defined earlier (Fig 2C) for every Doxycycline level and generate multiple [input; output] pairs that cover the entire input range. The procedure can be done for any desired reference protein bin, thus showing circuit behavior for different absolute gene copy numbers of its components (Fig 4).
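The assembly of [input; output] pairs across Doxycycline levels can be sketched as follows. The function `io_points`, the branch labels and the mode values are hypothetical and serve only to show how up to four points per Doxycycline level accumulate into a curve for one Y1 bin; the sketch assumes at most two output modes.

```python
def io_points(dox, input_modes, output_modes):
    """Pair every input mode with every output mode found in one Y1 bin.

    Assumes at most two output modes; a single output mode is labeled
    'single', two output modes are labeled 'low' and 'high'.
    """
    outs = sorted(output_modes)
    labels = ["single"] if len(outs) == 1 else ["low", "high"]
    points = []
    for x in sorted(input_modes):
        for label, y in zip(labels, outs):
            points.append({"dox": dox, "input": x, "output": y, "branch": label})
    return points


# Hypothetical modes (log units) for one bin at three Doxycycline levels:
# Dox level -> (input modes, output modes).
modes_by_dox = {
    10: ([1.0], [2.0]),                   # monomodal: one point
    1000: ([2.5], [2.1, 4.0]),            # bi-modal output: two points
    100000: ([3.0, 4.5], [2.2, 4.3]),     # bi-modal input and output: four points
}

curve = []
for dox in sorted(modes_by_dox):
    inputs, outputs = modes_by_dox[dox]
    curve.extend(io_points(dox, inputs, outputs))

# Pool the high output modes of all Doxycycline levels into one branch.
high_branch = sorted((p["input"], p["output"]) for p in curve
                     if p["branch"] == "high")
```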
The steps of finding peaks in the distributions of binned data, as shown in Fig 2C, are repeated for every input induction level (i.e. Doxycycline level; here three representative cases for low (1), medium (2) and high (3) input modulation levels are depicted) and for each output (here, Cerulean and Citrine). Initially, a bin of the transfection reference protein (here, SBFP2) is determined, and all downstream analyses deal only with cells residing in this bin. The plots in the dashed box show the example workflow of applying the peak finding to the individual input modulation levels. The density plots depict the binned data and the adjacent histograms show the distributions of their respective input and output proteins. Black wedges indicate the modes of the (convoluted) distributions and black markers indicate their location on the density plots. The [input; output] mode pairs identified in this workflow (markers), derived from the raw data corresponding to the different input modulation levels, are plotted on the input/output mapping charts for the outputs Cerulean (4) and Citrine (5).
The fact that every transient transfection performed with a certain Doxycycline level generates only up to four, and usually one or two, points on the input/output relationship curve is slightly counterintuitive, because a flow cytometry plot would reveal a wide distribution of the input values. However, this distribution results from the variability in the copy number of the input gene in the cells and is therefore irrelevant to the determination of the input/output relationship. In order to characterize an entire curve, there must be a practical way to modulate input expression per gene copy and repeat the experiment multiple times, every time with a different degree of modulation. This can be done with Doxycycline as in our case; when this is not feasible, one can mimic input modulation by systematically changing the relative dosage of a constitutive input-expressing gene, or use a series of constitutive promoters of varying strengths. In another observation, when extracting the modes, the high output modes corresponding to both the low and the high input modes fall on the same curve when plotted against the input values; the same is true for the low output modes (S13 Fig). This is not surprising, because the input value is the only determinant of the output. Therefore, we pool high and low output modes, respectively, and interpret them as the averaged input/output relationship of a circuit; when the behavior is bimodal, two curves are generated.
Validation using direct simulation data
We applied our data generation tool for transient transfections to the FO and RIFFM circuit architectures and simulated 500,000 cells at twelve different Doxycycline input concentrations and six noise levels σ. In Fig 5A we show representative examples of the raw data from transient circuit simulations at a single noise level (σ = 0.16) and various Doxycycline modulations. Note the shift in the scatter plots in response to the Doxycycline increase. For each noise level, we extract the corresponding input/output relations of the data set with our analysis strategy, which we call PeakFinder Analysis For Flow cytometry, or PFAFF. The algorithm bins simulated cells according to the expression level of the transfection reference protein SBFP2, with each bin containing an equal number of cells (9.5% of the total population, 10 bins in total). Next, we determine the modes of the log-transformed input and output protein distributions of cells residing in each bin for the different Doxycycline levels as described above, and build the input/output relationships corresponding to that bin (S14–S19 Figs). In the stable integration scenario, we build datasets that correspond to different fixed sets of gene copy numbers. Specifically, the copy numbers are set to correspond to the median copy numbers of the bins used to process the simulated transient transfection data (Methods). We simulate 5,000 cells per Doxycycline value and repeat this for twelve different Doxycycline values to cover the entire input range. This simulation is repeated for each σ. These data serve as the gold standard for evaluating the performance of our method for transient transfection data processing, namely by how well the extracted input/output relationships match the stable integration simulation for the matching bin/stable copy number.
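Equal-population binning on the reference protein can be sketched in a few lines of Python. Ten bins of 9.5% each cover 95% of the cells; the text does not state how the remaining 5% are handled, so the symmetric trimming of both tails used below is an assumption, as are the function name and the synthetic data.

```python
import random


def equal_population_bins(values, n_bins=10, fraction_per_bin=0.095):
    """Return lists of cell indices, one list per equally populated bin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    per_bin = round(len(values) * fraction_per_bin)
    # ASSUMPTION: cells not covered by the bins are trimmed symmetrically
    # from the low and the high tail of the reference protein distribution.
    start = (len(values) - per_bin * n_bins) // 2
    return [order[start + b * per_bin: start + (b + 1) * per_bin]
            for b in range(n_bins)]


# Synthetic log-transformed reference protein values for 1,000 cells.
rng = random.Random(1)
y1 = [rng.gauss(4.0, 1.0) for _ in range(1000)]
bins = equal_population_bins(y1)
```

Each entry of `bins` holds the indices of the cells in that bin, so the input and output values of the same cells can be looked up for mode finding.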
(A) Raw simulated transient transfection data for circuits FO and RIFFM at noise level σ = 0.16, modulated by different input levels of Doxycycline (columns). Each circuit (FO, RIFFM) is represented by two rows of charts depicting the input versus the respective output signals. (B) Simulated flow cytometry data set of the bin that lies at the transfection reference protein's global mean (transfection reference: SBFP2) at different gene expression noise levels σ (columns) ranging from 0.00–0.32. The plots show the input/output curves extracted by PFAFF (colored crosses; top row: Cerulean, bottom row: Citrine) atop the simulated stably integrated circuit at the indicated gene expression noise level σ (grey density plot). FO undergoes an activation in both output colors. The RIFFM circuit shows bi-modal behavior already at low noise levels for both output colors.
In Fig 5B we show the stable integration simulation and the analysis results from PFAFF, the latter extracted from a transfection reference bin that lies close to the global mean of the reference protein (i.e. SBFP2 bin 5; see S14–S19 Figs for all bins, input and noise levels); the former simulated for the gene copy number that corresponds to this reference protein level. Note that the number of transfection reference bins does not influence the outcome (S20 Fig). We plot the stable integration outputs at various input levels as density plots in the background (grayscale). For the lowest noise case (σ = 0.00), the density plots for the stable integration collapse to curves, as expected. Gradually increasing σ leads to increasingly diffuse input/output relationships. Atop these density plots we superimpose the mode values extracted by PFAFF from the corresponding simulated transient transfection data and binned for the cells that express the same level of the transfection marker as the stable integration. The analysis suggests that the input/output relationship extracted using PFAFF superimposes with the input/output “cloud” simulated for the stable integration, when both reflect similar underlying absolute gene copy number.
In order to expand the number of circuits analyzed, we simulated another commonly studied circuit family, the type-1 incoherent feed-forward motifs (I1-FFLs). Our simulations include two versions of the I1-FFL (one for each repressor; I1-FFL1 and I1-FFL2; Fig 6A and 6B). We applied the same analyses as before to both I1-FFLs and show the comparison of the input/output relationships of stably integrated and transiently transfected circuits (Fig 6C; see S21–S26 Figs for all bins, inputs and noise levels). As is the case for the other two circuits, there is excellent correspondence between the input/output curves extracted from simulated transient transfection data and the input/output behavior of the comparable simulated stable case. This motivated us to expand the number of circuits by a coherent feed-forward, a negative feedback and, lastly, a positive feedback motif, all of which show similarly excellent agreement (S27–S29 Figs).
(A) and (B) Circuit architecture of two additional I1-FFLs. (C) Simulated flow cytometry data set of the bin that lies at the transfection reference protein's global mean (transfection reference: SBFP2) at different gene expression noise levels σ (columns) ranging from 0.00–0.32, processed by PFAFF. The plots show the input/output curves extracted by PFAFF (colored crosses; top row: Cerulean, bottom row: Citrine) atop the simulated stably integrated circuit at the indicated gene expression noise level σ (grey density plot). Both I1-FFLs show adaptive behaviors in their respective outputs.
The initial qualitative analysis uncovers excellent overlap between the input/output relationships found by PFAFF and the input/output clouds from the corresponding stable integration simulations. Only at the highest simulated level of σ, i.e. 0.32, does the PFAFF algorithm have minor difficulties with extracting the expected input/output relationships. Note that a σ of 0.32 is much larger than the variations typically observed in nature [9,45]. To obtain a quantitative measure of the correspondence, we extracted modes from the log-transformed protein expression distributions of the input (mCherry) and the outputs (Cerulean and Citrine) from the stable integration data sets with the same peak finder algorithm that we employ in PFAFF (Methods). We correlated the obtained modes with the modes that were found by PFAFF in the transient transfection case (Fig 7). In the pooled modes from all data sets, meaning all external input levels and bins for each noise level, we find a high correlation between the modes from both simulation scenarios for all expression noise levels (mean of Pearson correlation coefficient ρ>0.91±0.01 for output modes; σ: 0.00–0.32).
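For reference, the Pearson correlation coefficient between two matched lists of modes can be computed as in the following sketch. The mode values are made up for illustration and are not results from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical output modes (log units): one entry per bin and input level.
stable_modes = [2.0, 2.4, 3.1, 3.9, 4.4, 4.6]
pfaff_modes = [2.1, 2.3, 3.2, 3.8, 4.5, 4.6]
rho = pearson(stable_modes, pfaff_modes)
```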
Correlation between extracted output modes of stable integration simulation and PFAFF results from transient transfection simulations. The mode values for both output colors (Cerulean and Citrine) for all input modulator levels and for all bins/stable copy number sets are plotted, and Pearson correlation coefficient (ρCer or ρCit) is shown for each plot.
In this study we show that transient transfection data can be used to extract input/output relationships of gene circuits that are comparable to the data that would have been obtained with stably-integrated circuits. Our findings reveal that it is sufficient to focus on small subsets of transiently transfected cells that lie at multivariate modes of input and output expression, post-binning on a transfection reference protein, otherwise known as a "transfection marker". We prove numerically that cells in these modes harbor distributions of circuit genes with the following properties: (1) the modes’ absolute values can be deterministically deduced from the observed level of the transfection reference protein expression, the noise level in the experimental data, and the distance from the global mean of the transfection reference protein; (2) the modes’ ratios are identical to the gene or plasmid ratios used in the transient transfection experiments, for all values of the reference protein expression. Moreover, in the case of multimodal data, large differences in protein expression do not translate into significant differences between the underlying gene copy number distributions, and for all practical purposes the absolute and relative copy numbers can be considered identical for the different protein expression modes. Interestingly, the cells that belong to the bin that lies close to the global mean of the reference protein expression, harbor gene copy numbers whose modes' absolute and relative values correspond exactly to what one would expect from the naive expectation, namely the ratio between the knowable expressed protein level and the knowable global coefficient of proportionality between the protein and gene copy number. 
The detailed understanding of the gene copy number behavior in the multivariate protein expression modes that can be identified in the experimental data provides a degree of confidence in the relevance of the extracted data to the ground truth behavior of the same circuit when stably integrated in a cell. This confidence is confirmed by the direct in-silico validation experiments, in which both types of data are directly simulated, the PFAFF workflow is applied to the data from simulated transient transfections, and the results are compared to, and shown to reproduce, the ground truth.
While transient transfections are often valued as a tool to rapidly analyze genetic circuit behavior, they are rarely used to draw fine-tuned conclusions about the input/output relationship of corresponding stably integrated circuits, more so for multimodal circuits. This is likely due to various pitfalls in existing analysis methods, the most prominent being the insufficient treatment of multimodal systems and the lack of conclusive analysis of the underlying gene copy number distributions in identifiable cell populations. Our analysis strategy allows a thorough comparison of input/output relationships from both scenarios and results in an excellent agreement between them. This will play an important role in gene circuit design and characterization, as it alleviates the need to generate multiple stable cell lines.
Materials and methods
Standard cloning techniques were used to clone all plasmids. We used E. coli DH5α and DH10B as the cloning strains, cultured in LB Broth Miller Difco (BD; Cat. no. 244610) with Ampicillin (100 µg/ml, Sigma-Aldrich; Cat. no. A0166-5G) as selection medium.
Cell culture and reagents
All experiments were done with HEK 293 cells (Life Technologies), which were grown at 37°C, 5% CO2 in complete medium (DMEM (Thermo Fisher; Cat. no. 11965092) supplemented with 10% fetal bovine serum (FBS; Sigma-Aldrich; Cat. no. F9665) and 1% Penicillin/Streptomycin (Sigma-Aldrich; Cat. no. P4333)). They were sub-cultured by seeding 10⁶ cells into T75 flasks every 3–4 days.
One day prior to transfection, cells were passed through a 40 µm cell strainer (Falcon; Cat. no. 352340) and counted with a Bio-Rad TC10. In each well (uncoated 6-well plates, Thermo Scientific Nunc; Cat. no. 2020–10) 300,000 cells were seeded and incubated for another 24 hours. On the day of transfection, DNA was diluted in 250 µl Opti-MEM I Reduced Serum (Gibco, Life Technologies; Cat. no. 31985–962) and mixed with a mixture of 244 µl Opti-MEM I and 6 µl Lipofectamine 2000 Transfection Reagent (Thermo Fisher; Cat. no. 11668019). After a 20-minute incubation step at room temperature, the transfection mix was added dropwise to the wells. The cells were incubated for another 72 hours before being measured by flow cytometry.
All samples were measured with a BD LSR Fortessa cell analyzer. The medium was removed and cells were incubated with 300 µl StemPro Accutase Cell Dissociation Reagent (Thermo Fisher; Cat. no. A1110501) at 37°C, 5% CO2 for 10 minutes. Reporter-specific laser and filter combinations were used to measure all four fluorescent proteins independently while keeping bleed-over low. In particular, we used: for SBFP2, a 405 nm laser with a 445/15 emission filter; for Cerulean, a 445 nm laser with 473/10; for Citrine, a 488 nm laser with 542/27; for mCherry, a 561 nm laser with 610/20; and for iRFP, a 640 nm laser with 710/50. We used the same PMT settings (FSC: 350, SSC: 350, SBFP2: 220, Cerulean: 242, Citrine: 220, mCherry: 245, iRFP: 460) throughout all measurements and controlled for consistency of the instrument by using SPHERO Rainbow Calibration Particles (Cat. no. 559123, BD).
We experimentally co-transfected five different fluorescent protein genes (SBFP2, Cerulean, Citrine, mCherry, iRFP; S30 Fig), each individually driven by an EF1α promoter, and analyzed them via flow cytometry. The amount of transfected DNA (ng) was adjusted according to each plasmid’s size (nominal ratio: SBFP2: Cerulean: Citrine: mCherry: iRFP = 504: 634: 400: 239: 249). We collected more than 1,000,000 events and stringently gated the live population (~750,000 cells). This experiment created a five-dimensional distribution of fluorescence values.
Fan-out gene circuit experiment
We transfected five gene cassettes on individual plasmids, as depicted in Fig 2A, into HEK293 cells and activated the circuit through the addition of Doxycycline at eight different input modulation levels (0 nM, 0.90 nM, 3.15 nM, 0.01 µM, 0.05 µM, 0.13 µM, 0.45 µM and 1.35 µM). At 72 h post-transfection we analyzed the induced cells using flow cytometry and collected more than 1,000,000 events per replicate (n = 3). The obtained data were subjected to our analysis pipeline as outlined in section Data Analysis.
We generated the model of multiple constitutively expressed genes using a steady state approximation (Eqs 7 and 21). Once the gene copy number ki and expression parameter βi are determined, the protein output Oi is computed as described in S1 Fig.
ODE circuit models were created with SimBiology, a MathWorks MATLAB 2018b package. Each molecular interaction was modeled according to the law of mass action (S3 Text "Detailed Models" and parameter values in S2 Table). This includes binding and unbinding of a transcription factor to inducible promoters, transcription of mRNAs and translation into proteins. All circuits have the same underlying interaction map. We created four different topologies by inactivating the translation reaction of the respective repressor mRNAs. In this way, we generated in silico "knock outs" with minimal changes to the model.
Expression rate parameters.
The vector of global proportionality coefficients β used in the simulation of constitutive gene co-transfection is a measure for the conversion of gene copy numbers to the number of proteins. In our circuit models, this coefficient is derived from expression and degradation rate constants. We adjusted the values of βi according to the maximum expression levels of our circuit models to obtain similar and biologically feasible amounts (S1 Table).
In dynamic circuit models, binding/unbinding and transcription rates were either fitted from experimental data (manuscript in preparation), taken from the literature, or set arbitrarily at biologically feasible values. The translation rates πi are based on previously reported values and were adjusted according to the length of each protein (S3 Text "Detailed Models").
Anticipated gene copy number.
In order to compute the values of the anticipated gene copy numbers from Eq 27, we first have to determine the Y1 values. This is done by removing the top and bottom 0.1 percentile from the in-silico co-transfection and circuit simulations and binning the obtained data set into equally spaced bins (50 bins for co-transfection simulations; 25 bins for circuit simulations) according to the signal intensity of the transfection marker (O1). Since the O1 distribution within a bin is just a subset of the global O1 distribution, we use the median of the log-transformed O1 signal of each bin as its Y1 value. Together with the global proportionality coefficient β1 (see above) and the abundance parameter ai, we can compute the anticipated gene copy number according to Eq 27 for each bin of the respective data set.
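The binning step above can be sketched in Python as follows. Several details are assumptions rather than statements from the text: the bins are taken to be equally spaced in log space, the natural logarithm is used, and Eq 27 is taken to have the form a_i·exp(Y1)/β1, i.e. the reference protein level divided by its proportionality coefficient and scaled by the abundance of gene i. The function names, the value of β1 and the synthetic O1 data are hypothetical.

```python
import math
import random
import statistics


def y1_per_bin(o1, n_bins=25, trim=0.001):
    """Median log(O1) of each equally spaced bin after trimming both tails."""
    logs = sorted(math.log(v) for v in o1)
    cut = int(len(logs) * trim)           # 0.1 percentile at each end
    logs = logs[cut: len(logs) - cut]
    lo, hi = logs[0], logs[-1]
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for y in logs:
        bins[min(int((y - lo) / width), n_bins - 1)].append(y)
    # Empty bins (possible in sparse tails) are skipped.
    return [statistics.median(b) for b in bins if b]


def anticipated_copy_number(y1, beta1, a_i):
    """ASSUMED form of Eq 27: a_i * O1 / beta1, with O1 = exp(Y1)."""
    return a_i * math.exp(y1) / beta1


rng = random.Random(2)
o1 = [rng.lognormvariate(9.0, 1.0) for _ in range(20000)]   # synthetic O1
beta1 = 100.0                                # hypothetical coefficient
abundance = [1.0, 1.3, 0.8, 0.5, 0.4]        # nominal ratio from the text

y1_values = y1_per_bin(o1)
k_hat = [[anticipated_copy_number(y1, beta1, a) for a in abundance]
         for y1 in y1_values]
```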
In-silico flow cytometry simulations
Gene expression noise from lognormal distributions.
Within our in-silico simulations, we introduce gene expression noise through the randomization of kinetic parameters. In the case of co-transfection simulations, we randomize the vector of proportionality coefficients b by drawing its individual values from a lognormal distribution whose log-space mean μβ is given by the coefficient of proportionality of the respective gene gi (S1 Table) and whose standard deviation σ is one of the six noise levels (0.00, 0.02, 0.04, 0.08, 0.16 or 0.32). In the case of circuit simulations, we introduce variability through the translation parameter vector p. Likewise, it is drawn from a lognormal distribution whose log-space mean is set according to S2 Table and whose standard deviation is again one of the six noise levels.
Gene expression noise from Γ distributions.
Gene expression noise is introduced by drawing individual values b of the proportionality coefficients from a Γ (gamma) distribution, Γ(k,θ). We chose the shape parameter k and the scaling parameter θ, so that the mean (βi) and the variance of the distribution are the same as in the lognormal case. We simulated transient co-transfections at σ = 0.08.
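Matching a Γ distribution to the mean and variance of a lognormal distribution is a standard moment-matching calculation, sketched below. The lognormal is parameterized here by its log-space mean and standard deviation, and the value of β is hypothetical; Python's `random.gammavariate` takes the shape and the scale, in that order. The matching is undefined for σ = 0, where the variance vanishes.

```python
import math
import random


def lognormal_moments(mu, sigma):
    """Mean and variance of a lognormal with log-space parameters mu, sigma."""
    mean = math.exp(mu + sigma ** 2 / 2)
    var = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    return mean, var


def gamma_from_moments(mean, var):
    """Shape k and scale theta of the gamma with the given mean and variance."""
    shape = mean ** 2 / var    # k
    scale = var / mean         # theta
    return shape, scale


beta = 100.0    # hypothetical proportionality coefficient
sigma = 0.08    # noise level used for the gamma comparison in the text
mean, var = lognormal_moments(math.log(beta), sigma)
k, theta = gamma_from_moments(mean, var)

# Draw gamma-distributed coefficients b with the matched moments.
rng = random.Random(3)
b = [rng.gammavariate(k, theta) for _ in range(20000)]
```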
Gene copy number/extrinsic noise.
We introduce extrinsic noise in our transient transfection simulations by drawing gene copy numbers k from a five-dimensional multivariate normal distribution with mean μ and covariance Σ. The mean of the distribution, μ = am, depends on the multiplicity parameter m, which is drawn from a lognormal distribution (μm = 1.4979, σm = 1.2686), and on the abundance vector a (equimolar: a1:a2:a3:a4:a5 = 1.0: 1.0: 1.0: 1.0: 1.0 or nominal: a1:a2:a3:a4:a5 = 1.0: 1.3: 0.8: 0.5: 0.4). For our systematic comparison of gene copy number distributions (S5 Fig), we instead draw the multiplicity parameter m from a Poisson distribution (Pois(λ), λ = 10) or a Γ distribution (Γ(k,θ), k = 0.7436, θ = 13.46). The covariance Σ = diag((amε)²) is a diagonal matrix and depends on the abundance a, the multiplicity m and the constant factor ε = 0.04. For every simulated cell, a new value of m is drawn, so the multivariate normal distribution changes from cell to cell, and in every iteration five gene copy numbers are drawn from it.
The single-plasmid circuit contains all gene cassettes on a single entity. We achieve this by setting the constant factor ε to zero. Consequently, the covariance matrix Σ of the five-dimensional multivariate distribution also becomes zero. The remaining simulation is performed as described below in section “Gene circuit simulations”.
We simulated our simple model independently 5×10⁶ times (C), randomizing both the parameters b and the gene copy numbers k according to the description above. Each simulated run corresponds to a single cell and contains a randomized set of b and k. We simulated the steady state, and the runs were stored in a single .csv file that contained all information used to generate the data. This includes the individual parameters b and k for every cell as well as the output values. Thus, we obtained a data set that resembles a transient co-transfection experiment, supplemented by the information on the individual parameters. The simulations were performed in MATLAB 2018b.
program: Simulation Transient Co-transfection
    initialize abundance a, constant factor ε = 0.04
    for Noise-Level σ in [0.00, 0.02, 0.04, 0.08, 0.16, 0.32]
        for Cell j in 1:C
            draw mj from the lognormal distribution, μm = 1.4979 and σm = 1.2686
            seed and draw kj from N(μ, Σ), μ = amj, Σ = diag((amjε)^2)
            draw bj according to noise level σ
            Model: Oj = kjbj
        end for
    end for
    save Simulation.csv
end program
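As a minimal sketch of the sampling scheme above (written in Python rather than the MATLAB used for the published simulations), the loop below draws m, k and b for each cell and computes O = k·b. Because Σ is diagonal, each copy number is drawn as an independent normal variate with standard deviation a·m·ε. The function name, the reading of μm and σm as the mean and standard deviation of ln m, and the lognormal form assumed for b are illustrative assumptions and are not taken from the original code.

```python
import random

def simulate_cotransfection(n_cells, abundance, eps=0.04, noise=0.04,
                            mu_m=1.4979, sigma_m=1.2686, seed=0):
    """Draw per-cell copy numbers k and outputs O = k * b (illustrative sketch)."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        # Multiplicity m: lognormal, assuming mu_m and sigma_m describe ln(m).
        m = rng.lognormvariate(mu_m, sigma_m)
        # Diagonal covariance: every k_i is an independent normal draw
        # with mean a_i * m and standard deviation a_i * m * eps.
        k = [rng.gauss(a * m, a * m * eps) for a in abundance]
        # ASSUMPTION: b_i is lognormal around 1 with log-sd equal to the
        # noise level; the source does not specify this distribution here.
        b = [rng.lognormvariate(0.0, noise) for _ in abundance]
        outputs = [ki * bi for ki, bi in zip(k, b)]
        cells.append({"m": m, "k": k, "b": b, "O": outputs})
    return cells

nominal = [1.0, 1.3, 0.8, 0.5, 0.4]
cells = simulate_cotransfection(1000, nominal)
```

Setting `eps=0.0` reproduces the single-plasmid case, in which every copy number equals its mean a·m exactly.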
Gene circuit simulations.
In-silico simulation of flow cytometry data for our circuits requires a mathematical model (ODE), which we generated with MATLAB’s SimBiology toolbox. The model was exported into the workspace and, to decrease computational effort, converted into a SimFunction object. This function has five outputs: the numbers of the fluorescent proteins (SBFP2, Cerulean, Citrine, mCherry) and of the transcription factor rtTA bound to Doxycycline at steady state (i.e. after 1,500,000 s). The SimFunction takes as input a matrix whose columns hold the gene copy numbers (k1: pCS187, k2: pCS171, k3: pCS166, k4: pCS200, k5: pZ91), the Doxycycline level DOX (Z = 12 logarithmically spaced values from 10 to 500,000 molecules) and eight translation parameters p, one for every protein O produced. Sets of gene copy numbers kj and gene expression noise variations pj were drawn as described above. The simulation input and output as well as all parameters used for each cell are stored in a table and saved as .csv files for documentation and further analysis. The simulations were performed in MATLAB 2018b.
program: Simulation Transient Circuits
    initialize abundance a, constant factor ε = 0.04
    for Noise-Level σ in [0.00, 0.02, 0.04, 0.08, 0.16, 0.32]
        for Input-Level l in 1:Z
            for Cell j in 1:n
                draw mlj from the lognormal distribution, μm = 1.4979 and σm = 1.2686
                seed and draw klj from N(μ, Σ), μ = amlj, Σ = diag((amljε)^2)
                draw plj according to noise level σ
                initialize and simulate SimFunction-Model
            end for
        end for
    end for
    save Simulation.csv
end program
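The input side of this loop can be illustrated with a short Python sketch: it generates the Z = 12 logarithmically spaced Doxycycline levels between 10 and 500,000 molecules and assembles one input record per cell from five copy numbers, the Doxycycline level and eight translation parameters. The helper names and the ordering of the values within a record are hypothetical; the ODE model itself (the SimFunction) is not reproduced here.

```python
import math

def log_spaced(lo, hi, n):
    """Return n values from lo to hi, evenly spaced on a log10 axis."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (n - 1)
    return [10 ** (log_lo + i * step) for i in range(n)]

def build_input_record(k, dox, p):
    """Concatenate copy numbers, Dox level and translation parameters.

    The ordering (k1..k5, DOX, p1..p8) is an assumption for illustration.
    """
    if len(k) != 5 or len(p) != 8:
        raise ValueError("expected 5 copy numbers and 8 translation parameters")
    return list(k) + [dox] + list(p)

dox_levels = log_spaced(10, 500_000, 12)
record = build_input_record([12.0, 15.6, 9.6, 6.0, 4.8], dox_levels[0], [1.0] * 8)
```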
Determine copy numbers for stably integrated gene circuits.
We first simulated the transient transfection data set with our initial parameters, which we chose based on previous experience. After binning the data set according to the protein output of our transfection marker O1, we determined the mode of the gene copy number k1 within each bin. The other copy numbers are derived from k1 according to their abundance coefficients. These values serve as the gene copy numbers for stably integrated gene circuits.
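A minimal Python sketch of this step is given below. It assumes equal-count bins along the marker output and estimates the mode of k1 as the center of the most populated histogram cell; both choices, as well as the function names, are illustrative assumptions, since the text does not specify the mode estimator.

```python
def histogram_mode(values, n_hist=20):
    """Estimate the mode as the center of the fullest histogram cell."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_hist
    counts = [0] * n_hist
    for v in values:
        idx = min(int((v - lo) / width), n_hist - 1)
        counts[idx] += 1
    peak = counts.index(max(counts))
    return lo + (peak + 0.5) * width

def stable_copy_numbers(marker, k1, abundance, n_bins=25):
    """Per marker bin: mode of k1, then k_i = k1_mode * a_i / a_1.

    Requires at least n_bins cells.
    """
    order = sorted(range(len(marker)), key=marker.__getitem__)
    size = len(order) // n_bins
    result = []
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(order)
        members = order[start:end]
        k1_mode = histogram_mode([k1[i] for i in members])
        result.append([k1_mode * a / abundance[0] for a in abundance])
    return result
```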
Pre-processing of experimental flow cytometry data.
Data acquired on a BD LSR Fortessa were recorded with BD FACSDiva software. The resulting files were exported in .fcs format and loaded into FlowJo software [52]. There, the individual fluorescent channels were compensated, the live population was gated, and the events were exported as scaled values into .csv files.
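Scaled FACS values were transformed into bi-exponential space when needed using the logicle function of Parks et al. [53] with parameters M = 4.5, p = 2, T = 262144 and W = 0.401:
$$S(X) = \operatorname{sgn}(X - W)\; T \cdot 10^{-(M-W)} \left(10^{\Delta} - p^{2}\,10^{-\Delta/p} + p^{2} - 1\right),$$
where S is the scaled value, X is its coordinate in bi-exponential space, and Δ = X−W for X≥W and Δ = W−X otherwise.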
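The transform can be sketched in Python as follows, assuming the standard logicle form of Parks et al., S(X) = ±T·10^(−(M−W))·(10^Δ − p²·10^(−Δ/p) + p² − 1), with the sign following X − W. In that formulation W = 2p·log10(p)/(p + 1), which gives 0.401 for p = 2, in agreement with the stated parameters. Since the function maps bi-exponential coordinates to scaled values, the sketch inverts it numerically by bisection; the function names and the bisection bounds are illustrative choices.

```python
import math

def biexp_to_scaled(x, M=4.5, p=2.0, T=262144.0, W=0.401):
    """Logicle function: bi-exponential coordinate x -> scaled value S."""
    delta = abs(x - W)
    magnitude = T * 10 ** (-(M - W)) * (
        10 ** delta - p * p * 10 ** (-delta / p) + p * p - 1.0
    )
    return magnitude if x >= W else -magnitude

def scaled_to_biexp(s, lo=-1.0, hi=6.0, tol=1e-10, **params):
    """Invert the logicle function by bisection (S is monotonic in x).

    Only valid for scaled values between S(lo) and S(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if biexp_to_scaled(mid, **params) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Width parameter implied by p = 2 in the Parks et al. formulation.
W_from_p = 2.0 * 2.0 * math.log10(2.0) / (2.0 + 1.0)
```

The same forward function can be used to convert extracted modes back from bi-exponential space into flow cytometry units.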
Gene copy number distributions ki in output peaks.
After binning the data set according to the transfection marker (50 bins for the co-transfection simulations, 25 bins for the circuit simulations), we fit Gaussians to the log-transformed values. We slice a window of ±0.15 log10 units around the mode(s) of the fitted distribution. Within this narrow window, we repeat the process for the remaining genes (co-transfection case: 1. Cerulean, 2. Citrine, 3. mCherry, 4. iRFP; circuit case: 1. mCherry, 2. Cerulean or Citrine). Since all parameters used in the simulations are stored in an array, we can select all cells within the final slice and look up the gene copy numbers that were used to generate this subset of output data. The distributions of these gene copy numbers are then analyzed to determine their modes.
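The nested slicing can be illustrated with the following Python sketch, which keeps only the cells whose log10 fluorescence lies within ±0.15 of a given mode, channel by channel, and then returns the stored copy numbers of the surviving cells. The modes are passed in as inputs (the Gaussian fitting is not shown), and the data layout, a list of dictionaries with a "k" entry, is a hypothetical choice for illustration.

```python
import math

def slice_around_mode(cells, channel, mode_log10, half_width=0.15):
    """Keep cells whose log10(channel value) is within mode +/- half_width."""
    kept = []
    for cell in cells:
        value = cell[channel]
        # Non-positive values have no log10 and are dropped.
        if value > 0 and abs(math.log10(value) - mode_log10) <= half_width:
            kept.append(cell)
    return kept

def copy_numbers_in_peak(cells, channel_modes, half_width=0.15):
    """Apply the window to each channel in turn, then look up k."""
    subset = cells
    for channel, mode_log10 in channel_modes:
        subset = slice_around_mode(subset, channel, mode_log10, half_width)
    return [cell["k"] for cell in subset]
```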
Peak Finder Algorithm for Flow cytometry (PFAFF).
The software is available on GitHub (https://github.com/benensonlab/PFAFF). The repository contains the code, the detailed S4 Text "PFAFF User Manual", S5 Text "Description of the example data set" and sample simulated data for running the analysis. User-provided data can also be analyzed by following the steps described in the User Manual.
The algorithm starts by discarding the tails of the transfection control’s distribution. Within the remaining window (i.e. 2.5–97.5% of transfection control fluorescence intensity), the distribution is segmented into bins containing an equal number of events (i.e. ten bins). Each bin is analyzed sequentially, and all values are transformed bi-exponentially. The input distribution (i.e. mCherry) is approximated by a histogram in bi-exponential space, and Gaussians are fitted to it. The following set of rules determines the number of fitted Gaussians:
program Fit Gaussians to mCherry Distribution
    if Goodness-of-Fit for one Gaussian > 0.975 then
        save mode value
    else
        fit two Gaussians
        if Goodness-of-Fit for two Gaussians > 0.99 then
            save mode values
        else
            fit three Gaussians
            if distance between two peaks < 0.42 then
                go back to use two Gaussian fit
            elseif distance (mean closest-modes) to (two-Gaussian-Fit modes) < 0.3 then
                save two-Gaussian-Fit mode and remaining three-Gaussian-Fit mode
            end if
            save mode values
        end if
        if distance between the two modes < 0.75 then
            go back to use one Gaussian fit
        end if
    end if
end program
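The trimming and binning step that precedes these fitting rules can be sketched in Python as follows. The sketch interprets the 2.5–97.5% window as percentiles of the event count, which is an assumption, and returns the event indices of each equal-count bin; the function name is illustrative.

```python
def equal_count_bins(marker, n_bins=10, lower=0.025, upper=0.975):
    """Drop the marker tails, then split the rest into equal-count bins.

    Returns a list of n_bins lists of event indices, ordered by marker level.
    """
    order = sorted(range(len(marker)), key=marker.__getitem__)
    n = len(order)
    # Discard the lowest and highest tails of the marker distribution.
    kept = order[round(n * lower):round(n * upper)]
    size, extra = divmod(len(kept), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        # Spread any remainder over the first bins so sizes differ by at most 1.
        end = start + size + (1 if b < extra else 0)
        bins.append(kept[start:end])
        start = end
    return bins
```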
A window of ±0.1 bi-exponential units is sliced around the center of each peak. Within that subset of cells, the distributions of the output colors (i.e. Cerulean and Citrine) are again approximated by histograms. As before, a set of rules determines the number of Gaussians that are fitted to these distributions:
program Fit Gaussians to Cerulean or Citrine Distribution
    if Goodness-of-Fit for one Gaussian > 0.975 then
        save mode value
    else
        fit two Gaussians
        if Goodness-of-Fit for two Gaussians > 0.995 then
            save mode values
        else
            fit three Gaussians
            if distance between peaks with highest intensities < 0.9 then
                remove remaining peak from the data set
                fit one Gaussian for the highest peak
                if Goodness-of-Fit >= Goodness-of-Fit for two Gaussians then
                    save mode values
                else
                    save mode values of two-Gaussian fit
                end if
            end if
        end if
        if distance between the two modes < 0.75 then
            go back to use one Gaussian fit
        end if
    end if
end program
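One possible reading of the one- and two-Gaussian part of these rules is sketched below in Python. The function takes precomputed goodness-of-fit values and modes as inputs (the fitting itself is not shown) and returns the accepted modes, or None when the three-Gaussian branch would be needed. The function name, its signature and the exact ordering of the checks are assumptions; the default thresholds are those of the output channels, and passing gof_two_min=0.99 gives the mCherry variant.

```python
def choose_modes(gof_one, mode_one, gof_two, modes_two,
                 gof_one_min=0.975, gof_two_min=0.995, min_separation=0.75):
    """Pick the modes to keep from one- and two-Gaussian fits.

    Returns a list of modes, or None if a three-Gaussian fit is required
    (that branch is deliberately not sketched here).
    """
    if gof_one > gof_one_min:
        return [mode_one]
    if gof_two > gof_two_min:
        low, high = sorted(modes_two)
        if high - low < min_separation:
            # Modes too close together: fall back to the one-Gaussian fit.
            return [mode_one]
        return [low, high]
    return None
```

For example, `choose_modes(0.90, 2.0, 0.999, (1.0, 3.0))` accepts both modes, whereas two modes at 1.0 and 1.5 would be merged back into the single-Gaussian result.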
For each bin, we repeat this fitting procedure. All extracted modes are re-transformed into flow cytometry units and stored in a table. The output of the algorithm is saved as MATLAB workspaces, which contain the variables for generating (weighted) input/output mappings. In addition, various plots are generated (density plots of (raw) data, individual fits to data distributions, weighted input/output mappings and weighted mean input/output mappings) and saved as individual files in the result folder (see the provided manual for details).
S5 Text. Description of the example data set.
S2 Fig. Co-transfection experiment and in silico simulations.
S3 Fig. In-silico simulations of a transient co-transfection with equimolar plasmid ratio.
S4 Fig. In-silico simulations of a transient co-transfection with nominal plasmid ratio.
S5 Fig. In-silico simulation of transient co-transfections at various initial gene copy number and parameter distributions.
S6 Fig. PFAFF applied to experimental data of a FO circuit.
S7 Fig. Comparison of multi-plasmid and single-plasmid gene circuits.
S8 Fig. In-silico simulations of a transiently transfected monomodal circuit.
S9 Fig. In-silico simulations of a transiently transfected bi-modal circuit.
S10 Fig. Gene copy number ratios from the simulations of transiently transfected bi-modal circuit.
S11 Fig. In silico simulations of stable integrations and transient transfections.
S12 Fig. Workflow for simulation and analysis of genetic circuits.
S13 Fig. Concatenation of high and low input and output modes.
S14 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (RIFFM and FO) at intrinsic noise level 0.00.
S15 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (RIFFM and FO) at intrinsic noise level 0.02.
S16 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (RIFFM and FO) at intrinsic noise level 0.04.
S17 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (RIFFM and FO) at intrinsic noise level 0.08.
S18 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (RIFFM and FO) at intrinsic noise level 0.16.
S19 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (RIFFM and FO) at intrinsic noise level 0.32.
S20 Fig. PFAFF output for various bin numbers.
S21 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (I1-FFL1 and I1-FFL2) at intrinsic noise level 0.00.
S22 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (I1-FFL1 and I1-FFL2) at intrinsic noise level 0.02.
S23 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (I1-FFL1 and I1-FFL2) at intrinsic noise level 0.04.
S24 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (I1-FFL1 and I1-FFL2) at intrinsic noise level 0.08.
S25 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (I1-FFL1 and I1-FFL2) at intrinsic noise level 0.16.
S26 Fig. In-silico simulation of stably integrated and transiently transfected (PFAFF input/output) circuits (I1-FFL1 and I1-FFL2) at intrinsic noise level 0.32.
S27 Fig. Results of PFAFF analysis on simulated cFFL flow cytometry data sets.
S28 Fig. Results of PFAFF analysis on simulated negFB flow cytometry data sets.
S29 Fig. Results of PFAFF analysis on simulated posFB flow cytometry data sets.
S30 Fig. Maps of plasmids used in co-transfection experiment.
S31 Fig. (Quasi) steady states of a fan-out circuit at different protein degradation rates.
S1 Table. Model parameter values for co-transfection simulations.
S2 Table. Model parameter values for tested circuit architectures (RIFFM, I1-FFL1/2, FO, cFFL, negFB, posFB).
Benenson group members for discussions. We thank Bart Deplancke, Alexander Stark and Gerald Stampfel for providing the D.melanogaster gene kni.
- 1. Benenson Y. Biomolecular computing systems: Principles, progress and potential. Nat Rev Genet. 2012;13: 455–468. pmid:22688678
- 2. Gardner TS, Cantor CR, Collins JJ. Construction of a genetic toggle switch in Escherichia coli. Nature. 2000;403: 339–42. pmid:10659857
- 3. Nielsen AAK, Der BS, Shin J, Vaidyanathan P, Paralanov V, Strychalski EA, et al. Genetic circuit design automation. Science. 2016;352: aac7341. pmid:27034378
- 4. Zhang Q, Bhattacharya S, Conolly RB, Clewell HJ, Kaminski NE, Andersen ME. Molecular signaling network motifs provide a mechanistic basis for cellular threshold responses. Environ Health Perspect. 2015;122: 1261–1270. pmid:25117432
- 5. Angelici B, Mailand E, Haefliger B, Benenson Y. Synthetic Biology Platform for Sensing and Integrating Endogenous Transcriptional Inputs in Mammalian Cells. Cell Rep. 2016; 1–13. pmid:27545896
- 6. Bleris L, Xie Z, Glass D, Adadey A, Sontag E, Benenson Y. Synthetic incoherent feedforward circuits show adaptation to the amount of their genetic template. Mol Syst Biol. 2011;7: 519. pmid:21811230
- 7. Nevozhay D, Adams RM, Murphy KF, Josić K, Balázsi G. Negative autoregulation linearizes the dose-response and suppresses the heterogeneity of gene expression. Proc Natl Acad Sci U S A. 2009;106: 5123–5128. pmid:19279212
- 8. Gregor T, Tank DW, Wieschaus EF, Bialek W. Probing the limits to positional information. Cell. 2007;130: 153–164. pmid:17632062
- 9. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002;297: 1183–1186. pmid:12183631
- 10. Ozbudak EM, Thattai M, Lim HH, Shraiman BI, Van Oudenaarden A. Multistability in the lactose utilization network of Escherichia coli. Nature. 2004;427: 737–740. pmid:14973486
- 11. Pedraza JM, Van Oudenaarden A. Noise propagation in gene networks. Science. 2005;307: 1965–1969. pmid:15790857
- 12. Vigdal TJ, Kaufman CD, Izsvák Z, Voytas DF, Ivics Z. Common physical properties of DNA affecting target site selection of Sleeping Beauty and other Tc1/mariner transposable elements. J Mol Biol. 2002;323: 441–452. pmid:12381300
- 13. Wilson MH, Coates CJ, George AL. PiggyBac transposon-mediated gene transfer in human cells. Mol Ther. 2007;15: 139–145. pmid:17164785
- 14. Bushman F, Lewinski M, Ciuffi A, Barr S, Leipzig J, Hannenhalli S, et al. Genome-wide analysis of retroviral DNA integration. Nat Rev Microbiol. 2005;3: 848–858. pmid:16175173
- 15. Tratschin JD, Miller IL, Smith MG, Carter BJ. Adeno-associated virus vector for high-frequency integration, expression, and rescue of genes in mammalian cells. Mol Cell Biol. 1985;5: 3251–3260. pmid:3018511
- 16. Gersbach CA, Gaj T, Gordley RM, Mercer AC, Barbas CF. Targeted plasmid integration into the human genome by an engineered zinc-finger recombinase. Nucleic Acids Res. 2011;39: 7868–7878. pmid:21653554
- 17. Hockemeyer D, Wang H, Kiani S, Lai CS, Gao Q, Cassady JP, et al. Genetic engineering of human pluripotent cells using TALE nucleases. Nat Biotechnol. 2011;29: 731–734. pmid:21738127
- 18. Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, et al. RNA-guided human genome engineering via Cas9. Science. 2013;339: 823–826. pmid:23287722
- 19. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013;339: 819–823. pmid:23287718
- 20. Haefliger B, Prochazka L, Angelici B, Benenson Y. Precision multidimensional assay for high-throughput microRNA drug discovery. Nat Commun. 2016;7. pmid:26880188
- 21. Prochazka L, Angelici B, Haefliger B, Benenson Y. Highly modular bow-tie gene circuits with programmable dynamic behaviour. Nat Commun. 2014;5. pmid:25311543
- 22. Shimoga V, White JT, Li Y, Sontag E, Bleris L. Synthetic mammalian transgene negative autoregulation. Mol Syst Biol. 2013;9. pmid:23736683
- 23. Xie Z, Wroblewska L, Prochazka L, Weiss R, Benenson Y. Multi-input RNAi-based logic circuit for identification of specific cancer cells. Science. 2011;333: 1307–1311. pmid:21885784
- 24. Lapique N, Benenson Y. Digital switching in a biosensor circuit via programmable timing of gene availability. Nat Chem Biol. 2014;10: 1020–1027. pmid:25306443
- 25. Recillas-Targa F. Multiple strategies for gene transfer, expression, knockdown, and chromatin influence in mammalian cell lines and transgenic animals. Molecular Biotechnology. 2006. pp. 337–354. pmid:17284781
- 26. Schreiber J, Arter M, Lapique N, Haefliger B, Benenson Y. Model-guided combinatorial optimization of complex synthetic gene networks. Mol Syst Biol. 2016;12: 899. pmid:28031353
- 27. Davidsohn N, Beal J, Kiani S, Adler A, Yaman F, Li Y, et al. Accurate Predictions of Genetic Circuit Behavior from Part Characterization and Modular Composition. ACS Synth Biol. 2015;4: 673–681. pmid:25369267
- 28. Stanton BC, Siciliano V, Ghodasara A, Wroblewska L, Clancy K, Trefzer AC, et al. Systematic transfer of prokaryotic sensors and circuits to mammalian cells. ACS Synth Biol. 2014;3: 880–891. pmid:25360681
- 29. Wang J, Isaacson SA, Belta C. Modeling Genetic Circuit Behavior in Transiently Transfected Mammalian Cells. ACS Synth Biol. 2019. pmid:30884948
- 30. Munteanu A, Cotterell J, Solé R V., Sharpe J. Design principles of stripe-forming motifs: The role of positive feedback. Sci Rep. 2014;4. pmid:24830352
- 31. Schaerli Y, Munteanu A, Gili M, Cotterell J, Sharpe J, Isalan M. A unified design space of synthetic stripe-forming networks. Nat Commun. 2014;5: 4905. pmid:25247316
- 32. Weinberger LS, Burnett JC, Toettcher JE, Arkin AP, Schaffer D V. Stochastic gene expression in a lentiviral positive-feedback loop: HIV-1 Tat fluctuations drive phenotypic diversity. Cell. 2005;122: 169–182. pmid:16051143
- 33. To TL, Maheshri N. Noise can induce bimodality in positive transcriptional feedback loops without bistability. Science. 2010;327: 1142–1145. pmid:20185727
- 34. Ellis EL, Delbrück M. The growth of bacteriophage. J Gen Physiol. 1939;22: 365–384. pmid:19873108
- 35. Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010;329: 533–538. pmid:20671182
- 36. Beal J. Biochemical complexity drives log-normal variation in genetic expression. Eng Biol. 2017;1: 55–60.
- 37. Mclean APF, Smolke CD, Salit M. Characterizing the Non-Normal Distribution of Flow Cytometry Measurements from Transiently Expressed Constructs in Mammalian Cells. 2016; 1–15.
- 38. Lillacci G, Benenson Y, Khammash M. Synthetic control systems for high performance gene expression in mammalian cells. Nucleic Acids Res. 2018;46: 9855–9863. pmid:30203050
- 39. Gao XJ, Chong LS, Kim MS, Elowitz MB. Programmable protein circuits in living cells. Science. 2018;361: 1252–1258. pmid:30237357
- 40. Widder S, Schicho J, Schuster P. Dynamic patterns of gene regulation I: Simple two-gene systems. J Theor Biol. 2007;246: 395–419. pmid:17337276
- 41. Huang S, Guo YP, May G, Enver T. Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Dev Biol. 2007;305: 695–713. pmid:17412320
- 42. Fussenegger M, Morris RP, Fux C, Rimann M, Von Stockar B, Thompson CJ, et al. Streptogramin-based gene regulation systems for mammalian cells. Nat Biotechnol. 2000;18: 1203–1208. pmid:11062442
- 43. Kim HD, O’Shea EK. A quantitative model of transcription factor-activated gene expression. Nat Struct Mol Biol. 2008;15: 1192–1198. pmid:18849996
- 44. Rosenfeld N, Young JW, Alon U, Swain PS, Elowitz MB. Gene regulation at the single-cell level. Science. 2005;307: 1962–1965. pmid:15790856
- 45. Bengtsson M, Ståhlberg A, Rorsman P, Kubista M. Gene expression profiling in single cells from the pancreatic islets of Langerhans reveals lognormal distribution of mRNA levels. Genome Res. 2005;15: 1388–1392. pmid:16204192
- 46. Kremers GJ, Goedhart J, Van Den Heuvel DJ, Gerritsen HC, Gadella TWJ. Improved green and blue fluorescent proteins for expression in bacteria and mammalian cells. Biochemistry. 2007;46: 3775–3783. pmid:17323929
- 47. Rizzo MA, Springer GH, Granada B, Piston DW. An improved cyan fluorescent protein variant useful for FRET. Nat Biotechnol. 2004;22: 445–449. pmid:14990965
- 48. Griesbeck O, Baird GS, Campbell RE, Zacharias DA, Tsien RY. Reducing the environmental sensitivity of yellow fluorescent protein. Mechanism and applications. J Biol Chem. 2001;276: 29188–29194. pmid:11387331
- 49. Shaner NC, Campbell RE, Steinbach PA, Giepmans BNG, Palmer AE, Tsien RY. Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma sp. red fluorescent protein. Nat Biotechnol. 2004;22: 1567–1572. pmid:15558047
- 50. Filonov GS, Piatkevich KD, Ting LM, Zhang J, Kim K, Verkhusha V V. Bright and stable near-infrared fluorescent protein for in vivo imaging. Nat Biotechnol. 2011;29: 757–761. pmid:21765402
- 51. Bostrom K, Wettesten M, Boren J, Bondjers G, Wiklund O, Olofsson SO. Pulse-chase studies of the synthesis and intracellular transport of apolipoprotein B-100 in Hep G2 cells. J Biol Chem. 1986;261: 13800–13806. pmid:3020051
- 52. Vallan C. Flow Cytometric Data Analysis with Flowjo. Cytom Part A. 2009;75a: 720.
- 53. Parks DR, Roederer M, Moore WA. A new “logicle” display method avoids deceptive effects of logarithmic scaling for low signals and compensated data. Cytom Part A. 2006;69: 541–551. pmid:16604519