State Estimation of the Time-Varying and Spatially Localized Concentration of Signal Molecules from the Stochastic Adsorption Dynamics on the Carbon Nanotube-Based Sensors and Its Application to Tumor Cell Detection

This paper addresses a problem of estimating time-varying, local concentrations of signal molecules with a carbon-nanotube (CNT)-based sensor array system, which sends signals triggered by monomolecular adsorption/desorption events of proximate molecules on the surfaces of the sensors. Such sensors work on nano-scale phenomena and show inherently stochastic non-Gaussian behavior, which is best represented by the chemical master equation (CME) describing the time evolution of the probabilities for all the possible number of adsorbed molecules. In the CME, the adsorption rate on each sensor is linearly proportional to the local concentration in the bulk phase. State estimators are proposed for these types of sensors that fully address their stochastic nature. For CNT-based sensors motivated by tumor cell detection, the particle filter, which is nonparametric and can handle non-Gaussian distributions, is compared to a Kalman filter that approximates the underlying distributions by Gaussians. In addition, the second-order generalized pseudo Bayesian estimation (GPB2) algorithm and the Markov chain Monte Carlo (MCMC) algorithm are incorporated into KF and PF respectively, for detecting latent drift in the concentration affected by different states of a cell.


Introduction
Recently, several near-infrared (nIR) fluorescent sensors based on single-walled carbon nanotubes (SWNTs) have been developed for detecting biomolecules in the human body [1][2][3][4][5][6][7][8][9]. In response to a continuous incident light source, the SWNT-based sensors detect stepwise changes in emitted light intensity triggered by monomolecular adsorption and desorption (i.e., adsorption and desorption at a single-molecular level) of a trace of proximate molecules on the surfaces of the sensors. The nIR fluorescence can penetrate more deeply into tissues than visible fluorescence without photobleaching or overlapping with autofluorescence from biological substrates [10,11]. Furthermore, compared with small fluorescent probes [12][13][14][15][16][17][18][19][20][21][22], non-diffusive SWNTs allow for a precise spatial resolution at the micrometer scale. As a result of these advantages, SWNTs can act as effective sensing platforms for real-time, direct and selective detection in vivo. In particular, for nitric oxide (NO) and hydrogen peroxide (H 2 O 2 ), μM level concentration could be detected successfully by using this sensing platform and resolve several questions about local generation upon growth factor stimulation and the signalling mechanism in a living cell [7,8].
This sensor technology presents some challenges as well as opportunities. A sensor array system where multiple CNT-based sensors distributed on a small area potentially can be used to follow time-varying, local concentrations of target molecules in vivo and in real time with precise spatial resolution. In turn, precise spatiotemporal control of these molecules may become feasible with the advent of appropriate actuators. Challenges in the experimental side include selective sensor design for target molecules in a desired detection range and actuator design for the spatiotemporal control at micro-scale. On the system's side, an immediate challenge is the development of an on-line state estimation method that can effectively extract concentration information from the stochastic adsorption data.
Some methods have been proposed for quantifying local concentrations of signal molecules near CNT-based sensors [23][24][25]. The estimation task is made challenging by the fact that the adsorption/desorption event is highly stochastic given a small number of molecules involved at the nanoscale sensor's surface. Conventional methods like least squares are limited in terms of performance for such problems. For a more accurate estimation, chemical master equation (CME) describing the evolution of the probability distribution among all possible adsorption states (i.e., the number of adsorbed molecules on the sensor) has been used in the estimation formulation. Based on the exact solution of the CME, maximum likelihood estimation (MLE) has been proposed [23][24][25]. However, the previous works assumed a constant concentration and performed the estimation with a batch set of data, which is not realistic for a sensor system working in a real-time environment in which concentrations show dynamic, time-varying behavior. What is needed is a full state estimation method that can fully and recursively utilize the information coming from the sensors to follow the local concentration in real time.
Bayesian methods have been a popular choice for state estimation of stochastic systems owing to its flexible, convenient formulation and theoretical rigor. For Gaussian systems, only the first two moments of the probability density function (PDF) have to be followed and the Kalman filter (KF) provides a simple solution to the problem. However, data from the CNTbased sensor system shows highly non-Gaussian characteristics that follow convolved binomial distributions [24]. For highly non-Gaussian systems, a class of sequential Monte Carlo methods known as particle filters (PFs) can be attractive as a nonparametric method that can handle any distribution shape [26]. The PF methods represent the required posterior PDF as a set of random samples and associated weights.
This article mainly proposes an effective recursive state estimator for estimating time-varying, local concentrations of signal molecules using the stochastic adsorption and desorption time-profiles onto the surface of the CNT-based sensors. By tracking the concentration of the signal molecules with the help of a rigorously formulated stochastic state estimator, we can gain further insights into their roles in biological systems or the effects of other species on them. The stochastic nature of the adsorption and desorption at the molecular level brings in the chemical master equation (CME) at the sensor level and makes the problem a challenging one that cannot be easily handled by the conventional state estimation techniques. Hence, the state estimation problem studied in this article has not been addressed before in the literature.
To test the feasibility and potentials of the proposed method, we test it in the context of a sensing problem, which is admittedly simplistic and artificial but still is inspired by the real biological problem. Given the known parameters in the model, performances of the KF and PF methods are examined in terms of both accuracy of estimated local concentration of the signal molecule and computational cost. The nano-sensors have previously been used for detecting and measuring signal molecules in human body, to follow the concentrations of signal molecules like nitric oxide (NO) and hydrogen peroxide (H 2 O 2 ), which are consistently generated from enzymes in vascular endothelial cells to regulate various physiological and pathological processes [6,[23][24][25]. Their concentration levels are known to be affected significantly by cell states, the switching behavior of which is simplistically represented by a hidden Markov model in our case study. To solve the simulated estimation problem, KF and PF are designed with the second-order generalized pseudo-Bayesian estimation (GPB2) algorithm and the Markov chain Monte Carlo (MCMC) algorithm respectively, for the Markov jump system with nanosensors. Their performances are compared for the case of a single sensor as well as of multiple sensors.

Single-molecule Sensor System
Carbon nanotube-based sensor. The basic mechanism of SWNT-based sensors is optical detection of discretized light intensity changes induced by adsorption and desorption of target molecules on the sensor's surface at nano-scale. To enhance the sensitivity and selectivity for target molecules, usually present at the micromolar (μM) concentration level, the SWNT surface is functionalized by wrapping the nanotube with various polymers such as collagen [7] or certain DNA sequences [8] (Fig 1). The variation in the SWNT wrapping controls the adsorption rates of different analytes present. For example, collagen-SWNTs have shown different, selective time-profiles of adsorption and desorption events for H 2 O 2 , H + , and Fe(CN) 6 3− in different concentration ranges [6]. Importantly, all time-profile data had reversible features, which indicate adsorption and desorption rates of similar magnitudes. The maximum number of adsorbed molecules is experimentally found to be around 10 [8], and this number is consistent with the maximum number of excitons (an excition is an electron and positive hole pair, which remain near each other due to electrostatic Coulomb force and is free to move through a semiconducting material) diffusion-limited segments on the SWNT [1] for which an average length is about 1~2 μm. So several SWNT-based sensors can be placed in a small area less than 10 μm 2 [8]. Fig 1 shows an example of a sensor array system depicted as sensors randomly distributed on a small area of neighborhood. With this array system, the objective is to estimate a time profile of the local concentration of target molecules with high accuracy.
Stochastic adsorption model. The number of adsorbed molecules is assumed to be read at every sampling time from the sensors, which are distributed in a sufficiently small area of a same concentration level. In developing a sensor model, free target molecules A in its surrounding liquid phase are assumed to adsorb onto unoccupied sites of the nanotube segment θ to form bound molecules Aθ through reversible adsorption: where k 0 A [s −1 ] and k D [s −1 ] are adsorption and desorption rate constants, respectively. The corresponding rates are expressed as where N θ is the number of empty sites and N Aθ is the number of occupied sites. The adsorption rate can be considered to be a first-order function of the local concentration of the surrounding target molecules C(t) [24], where k A is a constant factor in the adsorption coefficient. These equations connect the sensor information (i.e., the number of absorbed molecules) to the concentration in the surrounding media. If the adsorption/desorption events could be deterministic, a continuum (or average) model for the sensor can be formed by one differential equation for the number of adsorbed molecules N Aθ 2 [0, N T ] as a continuous variable with an initial value of N Aθ,0 , dN Ay ðtÞ dt ¼ k A CðtÞðN T À N Ay ðtÞÞ À k D N Ay ðtÞ; N Ay ð0Þ ¼ N Ay;0 A recursive form of the solution obtained by considering the previous measurement N Aθ (t k −1 ) as an initial condition and integrating the equation for one sample interval assuming C(t k ) remains constant over the interval is where k is the index for the time step and Δt is the size of the sample time step size, which is set sufficiently small for the approximation to be accurate. In actuality, the adsorption reaction on the sensor surface is highly stochastic because only a very small number of molecules (~10) are involved. Hence, significant fluctuations occur from the average behavior described in (6). In this case, use of the chemical master equation (CME) composed of differential equations describing the evolution of the probabilities for all possible discrete states of the system is more appropriate [27]. Then, the state of the system is defined as the discrete number of adsorbed moleculesÑ Ay 2 ½0; N T , resulting in N T +1 total possible states. The probability of being in each state is denoted by P i ¼ PrðÑ Ay ¼ iÞ 2 ½0; 1, where i is the number of adsorbed molecules. The CME, along with the appropriate boundary equation, can be expressed by N T +1 ordinary differential equations (ODEs): The monomolecular reaction systems, which were studied by [28], provide a path to an analytical solution of the CME. The adsorption/desorption process can be considered as a monomolecular reaction system with only two species (e.g. adsorbed molecules on sensor surface and desorbed molecules in bulk). For such a system, the probability distribution of the CME is described by a binomial distribution with time-varying parameters. More specifically, the number of adsorbed molecules N Aθ at a time t k is a random variable distributed as a binomial with the number of trials equal to N T and probability parameter equal to λ(t k ), which is related to N Aθ (t k ) calculated from the continuum Eq (6) divided by N T as The local concentration of target molecule C(t k ) enters the probability distribution of Eq (10) through N Aθ (t k ) of Eq (6) appearing in Eq (12) for λ(t k ).
For monomolecular adsorption, the overall population can be divided into two subsets, representing occupied sites and unoccupied sites on the sensor. With some previously measured valueN Ay ðt kÀ1 Þ, the distribution at the next time step can be derived as the convolution of two binomial distributions applicable to the "fully occupied" and "empty" subsets, which are of sizê N Ay ðt kÀ1 Þ and 1 ÀN Ay ðt kÀ1 Þ respectively: The first binomial distribution can be derived from (10)-(12) by assuming the sites are fully occupied initially, and the second binomial distribution can be derived from (10)-(12) by considering the initial state as being empty [28]. If the expression for N Aθ (t k ) obtained by setting N Aθ (t k−1 ) = N T ("fully occupied") in (6) is further substituted into (12), the probability parameter λ(t k ) becomes λ F (t k ) of (14) and N T cancels out. If the same substitution is carried out by setting N Aθ (t k−1 ) = 0 ("fully empty") in (6), λ(t k ) becomes λ E (t k ) of (15).

Recursive State Estimation Design
Based on the observation model proposed in Section 2, the overall system for state estimation can be generally described by the discrete-time state space model, where x k is a single state indicating the local concentration C(t k ) in the neighborhood; w k is zero-mean white noise; y k,j is the measurement of the number of adsorbed moleculesÑ Ay;j ðt k Þ onto the surface of the jth sensor; N s is the number of sensors in the neighbourhood, f(Á) represents the state transition function which can describe production, degradation, mass transport, biological reactions, etc. of the signal molecules; and p(Á) denotes the probability distribution represented by the convolution of the two binomial distributions, as in (13), which describes the stochastic adsorption reaction model. The expression involves both x k and y k−1,j (corresponding to C(t k ) andN Ay;j ðt kÀ1 Þ, respectively), which explains the use of the notation p(y k,j |x k , y k−1,j ). The available information at time step k is the set of measurements Note that other biological effects on the concentration are not considered in the model (16) and (17). The "cell state" as an example of such effects will be included as a hidden Markov state in the later part of this article. The above model can be extended to a multiple-state (vector x) system where concentrations at different spatial locations are measured by separate sets of CNT sensors, which can be useful in cases where one deals with a spatially distributed concentration profile and/or multiple signal molecules over a large sensing area. In this case, the concentrations and therefore the measured data at different locations can be correlated through the mass transfer phenomena, which can be represented by mass transport models such as diffusion equation [29]. To communicate the essence of the problem in a simple and transparent manner, this article focuses on estimation of concentration at a single location, using single or multiple sensors.
Kalman filter. The Bayesian approach offers a systematic way to combine prior knowledge, state and observation models, and measurement information into an informative estimate of the state (i.e., a posteriori probability density function (PDF) of the state p(x k |Y k )). For linear Gaussian systems, the Kalman filter (KF) enables a recursive construction of the exact PDF of the state estimate, which is parameterized by the mean and covariance. Kalman filtering can be applied to the exact probability distribution model (13) by approximation of the exact PDF by a Gaussian distribution function.
The binomial distribution B(n, p) has the mean of np and the variance of np(1 − p) and can be approximated by a normal distribution with the same mean and variance, N ðnp; npð1 À pÞÞ [30]. In this work, the two binomial distributions in the exact observation model can be approximated bỹ Convolution of the two Gaussian distributions N ðm 1 ; , so the observation model can be approximated bỹ Hence, the Gaussian-approximated observation model for the jth sensor is defined by where h(x k,j , y k−1,j ) is same as the mean in (23) and v k,j is zero-mean Gaussian noise with the variance of (23). The KF method can be summarized in a recursion of prediction and correction steps, starting from an initial guess defined by the meanx 1j1 and the covariance P 1|1 . Given the posterior mean and covariance of x k , the mean and covariance of the prior PDF of next state x k+1 iŝ where P is the covariance of the state and Q is the covariance of the process noise w k , and A kÀ1 follows from the linearization The mean and covariance of the posterior PDF iŝ x kjk ¼x kjkÀ1 þ L k y k À where L k 2 R N s is the Kalman gain matrix and C k 2 R N s follows from the linearization The covariance matrix R k 2 R N s ÂN s for the measurement noise can be defined by a diagonal matrix where r k,j is same with the variance in (23) for the jth sensor. Particle filter. To use the non-Gaussian observation model (17) directly, a samplingbased approach known as particle filtering (PF) can be used. PF is based on a discrete weighted approximation of the true posterior PDF with a set of random samples (particles). If the number of samples becomes extremely large, the approximation converges to the true posterior PDF.
The sequential importance sampling (SIS) algorithm is considered as the current standard of PF [26]. The first step of the algorithm is an initialization of N particles and their weights, denoted by fx i k ; w i k ; i ¼ 1; . . . ; Mg. In this step, each particle is sampled from the initial PDF pðx 1 Þ $ N ð x 1 ; Q 1 Þ and the associated weight is initialized to 1/ M. After the initialization, the importance sampling step and the weight update step are repeated. In the importance sampling step, x i k , i = 1,. . ., M are sampled from an importance density q(x k |Y k ) which is a user-defined choice. The importance density is commonly chosen as the prior PDF, In the weight update step, the weight w i k for each i = 1,. . ., M is updated with If (33) is substituted into (34), the weight update equation is described as Based on the samples and normalized weights, the posterior PDF can be approximated as where δ(Á) is the Dirac delta function. The estimated value is commonly calculated as a weighted mean,x In addition, a resampling step can be added to mitigate the degeneracy problem [26]. The degeneracy problem refers to the growing number of samples having negligible weights with iterations. The resampling step eliminates the samples with small weights and concentrates the calculation on those samples with large weights whenever a significant degeneracy problem is detected. After generating a new set of x i k for i = 1,. . ., M by resampling, the weights are reset to 1/ M as in the initialization step. After resampling, the estimated valuex kjk is calculated as a mean of x i k for i = 1,. . ., M.

Results and Discussion
This case study for testing the two approaches is motivated by the problem of detecting tumor cells through NO and H 2 O 2 signal molecules. NO generated from vascular endothelial NO synthase (eNOS) correlates with stimulation of angiogenesis. This activity is intimately linked with metastasis of tumor cells since their survival and proliferation are highly dependent on adequate supply of O 2 and nutrients from blood vessels by diffusion [32][33][34]. Membrane-associated NADPH oxidases are also found in vascular endothelial as well as smooth muscle cells, and generate H 2 O 2 as an important signal molecule in angiogenesis. Produced H 2 O 2 can activate signalling pathways to stimulate tumor cell proliferation and migration [35][36][37]. Knowledge of how concentrations of these signal molecules change as a cell changes its state can help understand their biological roles in tumor cell growth, which in turn can lead to advances in medical treatments. The estimation of the concentration of signal molecules from a normal cell is examined first, and then the more complex case of a cell transitioning from a normal state to a tumor state is considered.

Estimating the concentration of signal molecules from a normal vascular endothelial cell
This section develops a state estimation problem for the signal molecules (NO or H 2 O 2 ) from vascular endothelial cells. The width and length of the endothelial cells is more than 10 μm [38], which indicates that dozens of SWNT-based sensors can be placed on a single vascular endothelial cell and send multiple stochastic monomolecular adsorption data [7] (Fig 2). Among them, sensors near the enzymes generating signal molecules, where frequent adsorption/desorption events are detected, can be selected and used in the estimation of the local concentration of the signal molecules. The small area proximate to the generator of the signal molecules can be considered as a neighborhood sharing same local concentration that represents the cell state as a whole.
It is difficult to obtain from an experimental setup a large dataset that includes sufficient, representative stochastic variations to render a fair and thorough evaluation of estimation performance. Alternatively, representative stochastic adsorption datasets can be generated from kinetic Monte Carlo (KMC) simulations. Each KMC simulation run can be viewed as a realization of the stochastic system that is described by the CME [39]. The adsorption/desorption process involves fairly simple molecular level events and Zhang et al. 2010 [8] showed that experimental data for this system was well described by the KMC simulation.
In this particular simulation study, the number of adsorbed molecules on the sensor is allowed to range from 0 to 10, so the number of possible discrete states is 11. The length of each run is 2000 s and the sampling time interval is 1 s. The starting state is assumed to be 0 (empty of molecules). Adsorption/desorption parameters, k A and k D , are chosen as 100 M −1 s −1 and 0.001 s −1 respectively, which are taken from [6]. In the normal vascular endothelial cell, the signal molecules are released consistently from the enzyme at a low concentration level (~10 μM) [40]. These dynamics can be simply described as an integrated white noise process,  Performance of the two estimation methods can be compared by observing how well the estimates track the true concentration profile throughout the run time from a wrong initial guess ( x 1 ¼ 2 Â 10 À5 ; Q 1 ¼ 1 Â 10 À5 ). Plots of the estimates for the 1-sensor and 5-sensor cases are shown in Fig 5. For the 1-sensor case, the PF estimates follow the true concentration more closely than the KF (Fig 5A). For the 5-sensor case, the gap between the PF and KF estimates is reduced as long time (Fig 5B).
For quantitative comparison, the root-mean-square-errors (RMSEs) of the estimated concentrations per run are averaged over 100 runs that generated different adsorption/desorption data from different local concentration profiles (2000-sample dataset per one run). The averaged RMSE is defined by where N S is the number of samples in one run, N R is the number of runs, C true,i,k is the true local concentration value, andĈ i;k is the estimate for the kth sample time of the ith run. In Table 1, for all cases, the RMSEs of the estimates from PF are smaller than from KF. The difference in the RMSEs of the two methods slowly decreases with increasing number of sensors in the neighborhood, while the computation time of PF is higher and increases more rapidly than that of KF. If the objective is only a nominal state estimate, the benefit of rigorous stochastic modelling in the state estimation is reduced when more information is contained in the dataset (through the use of multiple sensors). Of course, a disadvantage of the KF for any number of sensors is that it is not capable of estimating the non-Gaussian character of the distribution of the state estimates.

Estimating the signal molecules from a cell having two states
In normal vascular endothelial cells, signal molecules generated from the enzymes are at a low concentration level (~10 μM). In tumor vascular endothelial cells, on the other hand, the expression levels and activities of eNOS are abnormally increased compared to the normal endothelial cells (Fig 6), and the elevated level of NO promotes tumor progression and metastasis by inducing angiogenesis as well as tumor cell invasion, proliferation, and migration [41,   ]. For H 2 O 2 , there is also a considerable variation among cells in the concentration level required to initiate a particular biological process. Moreover, it has been observed that different levels of H 2 O 2 can induce distinct responses within a cell. For example, overproduction of H 2 O 2 results in proliferation and migration of smooth muscle cells, contributing to atherogenesis and restenosis [43].
Hidden Markov model. The concentration of the signal molecules affects and is affected by the state of the cell. For real-time state estimation, consideration of all the complex biological processes associated with different cell states is very challenging and linking with concentration variations of the signal molecules can easily become intractable. In addition, the signal molecules are small gaseous molecules showing very fast diffusion (with diffusion coefficients of around 10 −5 [cm 2 /s]) compared to cell activities in tissues [40]. In this context, we simplify the system to having two states: a normal state and an abnormal state. In the normal state, the signal molecules are released consistently at a low concentration level. In the abnormal state, the concentration of the signal molecules increases (drifts) rapidly to a new elevated The values are averaged over 100 runs. 2 The computation time was recorded in seconds using a workstation with 3.40 GHz CPU and 8GB RAM. Such a pattern in Fig 7 can be characterized as a mixture of quiescent and drifting phases, which is called "intermittent drift." A hidden Markov model (HMM) can be used for modelling such shifts in the disturbance pattern [44,45]. HMM represents a useful class of statistical models where a hidden state, H 2 f1; 2; . . . ; Hg transitions probabilistically among possible states in a Markovian fashion. In this work, each member of the set H represents a particular system state, for example, "normal cell" or "tumor cell" states. Mathematically, a finite-state Markov chain is a sequence of random integers, r k , where the transition probability matrix P has elements defined by Based on the transition probability matrix, the intermittent drift in the concentration of the signal molecule, x k can be described by where "1" indicates the normal cell and "2" indicates the tumor cell. The w r k is a white Gaussian noise with covariance Q r k defined by Since there is only a low probability of switching once the system enters a particular regime, a diagonally dominant P is employed, as reflected in (43) and (44). Note that the actual regime is usually not known with complete certainty and must be inferred from measurements. Additional behavior could be incorporated into the model by introducing more hidden states (e.g. other transitional cell states or environmental effects on the local concentration) with appropriate accompanying stochastic models for them.
Second-order generalized pseudo-Bayesian algorithm. For using KF for a Markov jump system represented by (42), the generalized pseudo-Bayesian estimation algorithm of order 2 (GPB2) has been suggested as an effective sub-optimal filter [46]. Letx kjk ðr kÀ1 ; r k Þ denote the estimate conditioned on the two most recent hidden state realizations. Similarly, the corresponding estimation error covariance is represented as P k|k (r k−1 , r k ). The main idea is to generate multiple Gaussian distributions from KF for all possible trajectories of the last two hidden states, and combine them into a single Gaussian distribution, parameterized by fx kjk ; P kjk g. A recursive scheme is characterized by two steps: "branching" and "merging." Starting with fx kÀ1jkÀ1 ðr kÀ1 Þ; P kÀ1jkÀ1 ðr kÀ1 Þg, the branching step is to obtain the set fx kjk ðr kÀ1 ; r k Þ; P kjk ðr kÀ1 ; r k Þg through the prediction and correction steps of KF. The merging step involves the law of total probability and Bayes' rule to collapse the products from the branching step asx kjk ðr k Þ ¼ X 2 r kÀ1 ¼1x kjk ðr kÀ1 ; r k Þ pðr kÀ1 jr k ; Y k Þ ð 47Þ P kjk ðr k Þ ¼ X 2 r kÀ1 ¼1 ½fx kjk ðr kÀ1 ; r k Þ Àx kjk ðr k Þg 2 þ P kjk ðr kÀ1 ; r k Þpðr kÀ1 jr k ; Y k Þ ð 48Þ where c 1 is a constant ensuring that p(r k−1 |r k , Y k ) sums to unity, and p(y k |r k−1 , r k , Y k−1 ) is related to the correction step of KF in the branching step. A point estimate is obtained from where c 2 is a constant ensuring that p(r k |Y k ) sums to unity. Markov chain Monte Carlo algorithm. Adapting PF to the Markov jump system is relatively simpler than KF. Starting with fx i kÀ1 ðr kÀ2 ; r kÀ1 Þ; w i kÀ1 ðr kÀ2 ; r kÀ1 Þg, samples x i k ðr kÀ1 ; r k Þ for i = 1,. . ., M are generated from the same importance density of (33) for all possible trajectory for the recent hidden Markov states, r k−1 = 1, 2 and r k = 1, 2. This approach is called the Markov chain Monte Carlo (MCMC) algorithm [47]. The weight update Eq (35) is modified by including, p(r k , r k−1 |Y k ), w i k ðr kÀ1 ; r k Þ / w i kÀ1 ðr kÀ1 ; r k Þ pðr k ; r kÀ1 jY k Þ Y N s j¼1 pðy k;j jx i k;j ðr kÀ1 ; r k Þ; y kÀ1;j Þ ð 53Þ Finally, the point estimate can be obtained bŷ Detection of tumor cell activity. As stated before, two regimes are considered in the system: the normal cell state and the tumor cell state. The objective is to detect a regime change through the local concentration variations of the signal molecules seen from the nano sensors. The reference work [7] investigates the effect of a growth factor, which stimulates cell growth, proliferation, and differentiation on the H 2 O 2 generation in living cells. From the 3000 s observation after the stimulation with the growth factor at t = 0, it was observed that the H 2 O 2 concentration level increased immediately and reached a maximum in the time range between 600 s and 1800 s. This observation indicates that the tumor cell activity and its effect on the local concentration of signal molecules can be prolonged for a long time (~30 min).
Based on this data, stochastic adsorption/desorption profiles were generated from KMC simulation. The number of adsorption sites on the sensor is 10 and the length of each run is 4000 s with the sampling time interval of 1 s. The starting state is assumed to be a random integer less than 10 (partly occupied) and the k A and k D are assumed to be 100 M −1 s −1 and 0.001 s −1 . Eq (42) is used in the state estimation as the state model. At the 'normal cell' state, the local concentration is stable and affected only by low-level noise (w k $ N ð0; 10 À16 Þ). When the cell becomes a tumor cell, the local concentration becomes elevated by high-level noise (w k $ N ð0; 5 Â 10 À14 Þ). The plots in   Table 2 shows the averaged RMSEs and computational time for PF-MCMC and KF-GPB2 based on 100 runs that generated different adsorption/desorption data from different tumor cell activities. In both methods, the RMSEs of the estimates decrease with increasing number of sensors in the neighborhood with PF-MCMC having smaller RMSE values than KF-GPB2. Though the computation time of PF-MCMC is larger than KF-GPB2, it is far less than the sampling time of 1 s.
In a real application, the neighborhood region proximate to the enzyme should be very small considering the short-life time and high diffusivity of the signal molecules. Therefore, less than 5 sensors might be valid in the state estimation for a single cell [7]. In this context, PF-MCMC can be recommended if accurate estimates of the local concentration of the signal molecules are needed with such limited information.

Conclusions
Two stochastic state estimation methods-Kalman filtering (KF) and particle filtering (PF)were investigated for estimating the time-varying local concentration of signal molecules from stochastic monomolecular adsorption/desorption data on the surface of the carbon-nanotube (CNT)-based sensors. In addition, the second-order generalized pseudo Bayesian estimation (GPB2) algorithm and the Markov chain Monte Carlo (MCMC) algorithm were incorporated into KF and PF respectively, for detecting latent drift in the concentration affected by different states of a cell. The stochastic nature of the adsorption data from each CNT-based sensor was fully modelled by using the chemical master equation (CME). In addition, intermittent concentration variations of the signal molecules were modelled by a hidden Markov model. Performances of the state estimators with the sensor array system were compared through a case study employing KMC simulation. The PF-MCMC combination showed the highest accuracy while having reasonable computation time.
Use of drugs affecting the production of signal molecules by inhibiting the associated enzyme or directly scavenging the signal molecules appears to be a promising strategy to inhibit angiogenesis and therefore tumor growth [48,49]. In order to control the modification of signal molecules in a precise manner, further understanding of various factors involved such as the timing, concentration, and location is required. The proposed state estimators have promise in this endeavor.
Supporting Information S1 Dataset. Original data including true concentration profiles, stochastic adsorption profiles and concentration estimates for plotting Figs 3,5,8,9 and 10. (XLSX) The values are averaged over 100 runs. 2 The computation time was recorded in seconds using a workstation with 3.40 GHz CPU, 8GB RAM.