A predictor-informed multi-subject bayesian approach for dynamic functional connectivity

Dynamic functional connectivity investigates how the interactions among brain regions vary over the course of an fMRI experiment. Such transitions between different individual connectivity states can be modulated by changes in underlying physiological mechanisms that drive functional network dynamics, e.g., changes in attention or cognitive effort. In this paper, we develop a multi-subject Bayesian framework where the estimation of dynamic functional networks is informed by time-varying exogenous physiological covariates that are simultaneously recorded in each subject during the fMRI experiment. More specifically, we consider a dynamic Gaussian graphical model approach where a non-homogeneous hidden Markov model is employed to classify the fMRI time series into latent neurological states. We assume the state-transition probabilities to vary over time and across subjects as a function of the underlying covariates, allowing for the estimation of recurrent connectivity patterns and the sharing of networks among the subjects. We further assume sparsity in the network structures via shrinkage priors, and achieve edge selection in the estimated graph structures by introducing a multi-comparison procedure for shrinkage-based inferences with Bayesian false discovery rate control. We evaluate the performances of our method vs alternative approaches on synthetic data. We apply our modeling framework on a resting-state experiment where fMRI data have been collected concurrently with pupillometry measurements, as a proxy of cognitive processing, and assess the heterogeneity of the effects of changes in pupil dilation on the subjects’ propensity to change connectivity states. The heterogeneity of state occupancy across subjects provides an understanding of the relationship between increased pupil dilation and transitions toward different cognitive states.


Introduction
Functional connectivity (FC) has emerged as one of the most active research areas in functional magnetic resonance imaging (fMRI).The purpose of FC studies is to characterize the undirected statistical 1 arXiv:2210.01281v3[stat.ME] 9 Jan 2023 dependencies between brain regions and thus gain a greater understanding of brain functioning (Friston et al., 1994;Hutchison et al., 2013).Simple approaches to studying FC rely on readily available measures of temporal correlation, such as the partial correlations between two brain regions after conditioning upon all other regions (Fornito et al., 2013;Friston, 2011).Such metrics assume that interactions between brain regions are constant in space and time throughout the fMRI session (static connectivity, Li et al., 2008).Rather, neuroscientists have become increasingly aware that functional connectivity is dynamic and fluctuating, i.e. non-stationary, and that studying the dynamics of FC is crucial for improving our understanding of human brain function (Hutchison et al., 2013;Vidaurre et al., 2017;Lurie et al., 2020).The term "chronnectome" has been introduced to describe the growing focus on identifying time-varying, but reoccurring, patterns of coupling among brain regions (Calhoun et al., 2014).
Recent studies have highlighted how subjects are more likely to experience particular connectivity states when some underlying physiological conditions are present.For example, Chand et al. (2020) have investigated the association between heart rate variations and FC.Similarly, in a sleep fMRI study, El-Baba et al. (2019) have shown that transitions between connectivity states slow as subjects fall into deeper sleep stages.As a further example, Kucyi et al. (2017) have described how connectivity dynamics are associated with attentiveness in a pencil-tapping task.These studies, among others, have motivated the need for models that provide a better understanding of how the transitions between different functional connectivity states are modulated by internal or external conditions measured during the course of an experiment.In the experimental study we consider in this manuscript, we have available fMRI data collected together with pupillometry measurements.Pupil dilation has become increasingly popular in cognitive psychology to measure cognitive processing and resource allocation.It is believed that the changes in pupil diameter reflect brain state fluctuations driven by neuromodulatory systems (Sobczak et al., 2021).For example, the pupil dilates more under conditions of higher attention (Siegle et al., 2003).Thus, pupil dilation measurements can be seen as an index of effort exertion, task demand, or difficulty in an fMRI experiment (van der Wel and van Steenbergen, 2018).Thus, it is of interest to understand how pupil dilation is associated with an increased probability of particular connectivity states experienced by a subject during an experiment (Martin et al., 2021).
Many of the commonly used approaches for studying dynamic connectivity rely on multi-step inferences.For example, in Calhoun et al. (2014) the fMRI time courses are first segmented by a sequence of sliding windows, and then precision matrices are estimated in each window.Finally, k-means clustering methods are used to identify re-occurring patterns of FC states.Post-hoc analyses may be employed to assess the association of the estimated dynamic connectivity states with other available measurements, like pupil dilation measurements (Haimovici et al., 2017).However, the arbitrary choice of the window length and the offset may lead to spurious dynamic profiles and poor estimates of correlations for each brain state (Lindquist et al., 2014;Shakil et al., 2016).Improvements were proposed by Cribben et al. (2012Cribben et al. ( , 2013) ) and Xu and Lindquist (2015), who developed change point detection methods to recursively partition the fMRI time series into stable contiguous segments where networks of partial correlations are estimated by employing the graphical lasso of (Friedman et al., 2008).These methods do not require pre-specifying the segment lengths and can detect the temporal persistence of the functional networks.However, they do not account for the possibility of states being revisited and hence do not conform to the idea that the chronnectome exhibits recurrent patterns of dynamic coupling between brain regions of interest (ROIs).
Other model-based approaches to dynamic connectivity consider the set of ROIs as the nodes (or vertices) of an underlying graph and employ homogeneous hidden Markov models (HMMs) to detect state transitions and infer a discrete set of latent connectivity states over time.Warnick et al. (2018) develop a Bayesian HMM to model dynamic FC as the transition between state-specific graphs, each graph being related to others via an underlying super-graph.Sourty et al. (2016) use product HMMs to describe the evolution of the sliding-windows correlations and capture temporal dependencies across resting-state networks.Chiang et al. (2015) used a Bayesian HMM to estimate the dynamic structure of graph theoretical measures of whole-brain FC.Also, HMMs have been employed in time-varying vector autoregressive (VAR) modeling frameworks for whole-brain resting state connectivity, where the VAR coefficients and the innovation covariance matrix are allowed to change with the latent states (Vidaurre et al., 2017;Ting et al., 2018;Ombao et al., 2018).However, these implementations of hidden Markov models typically assume that the probabilistic model underlying the state transitions is constant throughout an experiment.Crucially, such a homogeneity assumption does not allow to assess the modulatory effect of time-varying physiological factors on the transitions, e.g.how changes in vigilance measured via pupil dilation can impact state transitions (Lurie et al., 2020).
In this paper, we develop a multi-subject Bayesian framework where the estimation of dynamic functional networks is informed by time-varying exogenous physiological covariates that are simultaneously recorded in each subject during the fMRI experiment.More specifically, we introduce a multi-subject non-homogeneous HMM modeling framework where the transition probabilities between states are shared between subjects and vary over time as a fucntion of the covariates.Our setting allows for the estimation of unique connectivity state transitions for each subject.It also permits group-based inferences, via recurring connectivity patterns and sharing of networks among the subjects.With respect to the multi-step inference strategies described above, in our approach both the dynamic connectivity states and their association with the physiological measurements are estimated in a single modeling framework, accounting for all uncertainties.Kundu et al. (2018) have recently proposed a two-step multi-subject fused-lasso approach for detecting change points in functional connectivity.Differently from their proposal, our method does not assume that the experimental design and the timing of the change points between connectivity states are shared among all subjects, and can therefore be applied to more general experimental designs.Indeed, our approach allows for differing state transition behavior across multiple subjects by modeling the state transition parameters hierarchically.Our modeling approach further assumes sparsity in the network structures, by assuming a shrinkage prior on the connectivity networks.Additionally, we propose a strategy for edge selection that combines the posterior shrinkage-informed thresholding approach of Carvalho et al. (2010) with the Bayesian False Discovery Rate controlling method of Müller et al. (2006).
We apply our modeling framework to a resting-state experiment where fMRI data have been collected concurrently with pupillometry measurements, leading us to assess the heterogeneity of the effects of changes in pupil dilation on the subjects' propensity to change connectivity states.Changes in pupil diameter have been linked to attention and cognitive efforts modulated by the activity of norepinephrine-containing neurons in the locus coeruleus (LC).For example, Joshi et al. (2016) have shown that LC activation predicts changes in pupil diameter that either occur naturally or are caused by external events during near fixation, as in many experimental tasks.Therefore, pupil dilation has been used as a proxy for a metric of a person's willingness to exert more effort and involve a greater mental effort to complete a task.Recent methods for studying such association use a multi-step approach, first identifying switches in subjects' state sequences and then calculating the difference between the normalized pupil size before and after the estimated switch (see, e.g.Hussain et al., 2022).In our application, we demonstrate how the model can recover expected change points in dynamic FC states, as those states align quite well with the experimental events regulated by the behavioral task.
The rest of the paper is organized as follows.In section 2 we describe our proposed method and edge selection procedure.We also give a brief synopsis of our Markov Chain Monte Carlo (MCMC) approach to posterior inference.In section 3 we showcase our model performance on simulated data and provide comparisons to the sliding window and homogenous HMM approaches.Lastly, in Section 4, we apply our model to the LC handgrip data with accompanying results and analysis.Section 5 concludes the paper with a discussion.

Methods
In this section, we describe our proposed predictor-informed multi-subject model for dynamic connectivity.This is a single-step fully Bayesian approach that explicitly models the heterogeneity in the dynamics of connectivity patterns across all subjects and -simultaneously -estimates the predictor effects on those dynamics.We achieve this by constructing a non-homogeneous Hidden Markov Model (HMM) where the transition probabilities are functions of time-varying covariates.

An HMM model for dynamic functional connectivity
T denote the vector of fMRI BOLD responses measured at time t in R regions of interest (ROIs), t = 1, . . ., T on subject i = 1, . . ., N .We adopt a Gaussian graphical model framework, and assume multivariate normality of the bold signals, that is , where µ i t is a mean regression term and Ω i t indicates a time-varying precision matrix, i.e. the inverse covariance matrix at each time point.In graphical models, the zeros of the precision matrix correspond to conditional independence; that is, if an off-diagonal element ω jkt = 0, j, k = 1, . . ., R, j = k, then the signals Y i tj and Y i tk (j = k) are conditionally independent.The mean term µ i t can be specified as a general linear model (Friston, 1994) to capture activation patterns over time, as done for example in Warnick et al. (2018).Here, however, since we are interested in capturing connectivity patterns through the modeling of the time-varying precision matrix, we assume without loss of generality that the BOLD signal has been mean-centered by removing any estimated trend and activation component.This "detrending" is not uncommon for studying FC, especially for task-based fMRI data, where the data are first meancentered, to remove any systematic task-induced variance, and the analysis is then conducted on the time series of the residuals (see, e.g.Fair et al., 2007).
We propose to model the dynamics of FC using an HMM framework with S latent states characterizing FC and the brain transitions during the fMRI experiment.Our formulation captures the heterogeneity of individual-specific patterns of connectivity over time, since each subject's fMRI data may be characterized by specific change points and evolution of the brain's functional organization.However, we assume that the connectivity patterns may also be re-occurring and they can possibly be shared among the subjects, thus allowing for group-based inferences.Let (s 1 , . . ., s T ) be a T -dimensional vector of categorical indicators s t , such that s t = s if state s is active at time t, s = 1, . . ., S.Then, we assume the data follow a Gaussian graphical model at time t of the type with subject-level precision matrices which, at each time, are characterized by the values of one among S precision matrices, identifying which state is active at that time.Model (1) therefore implies connectivity networks that vary by subjects and by state.

Modeling connectivity transitions as a function of observed physiological factors
Many neuroscience experiments involve the simultaneous collection of fMRI data together with physiological, kinematics and behavioral data (Wilson et al., 2020).For example, our motivating application considers a handgrip task where pupillometry dilation data (i.e., measurements of pupil dilation sizes) are concurrently recorded.Pupillary dilation is regarded as a surrogate measure for activity in the locus coeruleus circuit, which plays a central role in many cognitive processes involving attention and effort, and it is considered the main source of the neurotransmitter noradrenaline, a chemical released in response to pain or stress.Neuronal loss in the locus coeruleus is known to occur in neurodegenerative disorders such as Alzheimer's disease and related dementias as well as Parkinson's disease dementia.It is therefore important to understand how brain dynamics may be differentially modulated as a function of pupil dilation in different subjects.
Here, we propose to model the dynamics of FC by developing a non-homogeneous HMM framework where estimation is informed by subject-level time-varying exogenous physiological covariates, e.g.physiological factors like the pupillary data in our motivating application.More in detail, we assume that switches between states are regulated by transition probabilities that vary over time and across subjects as a function of B time-varying subject-level covariates as where x i t denotes a B × 1 vector of covariate values for subject i at time t, and ρ i s = (ρ i s1 , . . ., ρ i sB ) is the corresponding B × 1 vector encoding the effect of each covariate on the probability of transitioning to state s for subject i.The parameter ξ i rs defines a baseline transition probability from state r to state s for subject i, that is the transition probability without any covariate effect.To ensure identifiability, we define a state as reference.Without loss of generality, we set s = 1 as the reference state, and also set the coefficients ρ i 1b , b = 1, . . ., B, and ξ i 1 • , i = 1, . . .N equal to zero.Thus, the state transition coefficients are interpreted with respect to the reference state, and we can re-express (2) in terms of the log-relative odds of the transition from state r to state s compared to the transition from state r to the reference state 1, In this formulation, the transition coefficients exp(ρ i sb ), b = 1, . . .B, are more naturally interpreted as the relative change in odds of transitioning to state s compared to transitioning to state 1, after a one unit change in x i tb , holding all other covariates as constant.Similarly, the coefficient exp(ξ i rs ) is interpreted as the expected odds of transitioning from state r to s compared to transitioning from state r to 1, when the time-varying covariates, x i t , are 0 or at a baseline/average value.We assume independent Gaussian priors for the transition parameters ρ and ξ.We further allow for sharing of information in estimating the state transition structure across subjects, by employing a hierarchical modeling formulation for the state transition parameters.More specifically, we assume that the individual coefficients ξ i rs and ρ i sb , b = 1, . . ., B, vary around population-level means, Z rs and η sb , as follows: where ) T , and r, s = 1, . . ., S, b = 1, . . ., B. By allowing each subject to have their own transition parameters the model allows for unique subject-level transition behavior while also capturing population-level estimates through the group level parameters.The interpretation of the group level parameters, η and Z, is similar to their single subject counterparts.The prior means z 0 rs are usually set to 0 except for z 0 rr , r = 1, which is set to be positive to encourage state persistence over time (self-transitions) and thus more stable estimated state sequences.Keeping in mind that these state transition parameters operate on the log odds of transition relative to state 1, and that interpretation of the parameters require exponentiation, a small shift in value for the state transition parameters can result in rather large changes in state transition behavior.To this end, we recommend setting the variance parameters of the priors for ξ, ρ, Z rs and η sb to some small positive value on the order of 0.1.

Modeling sparsity through a graphical horseshoe prior
Functional networks are thought to exhibit the so-called small world behavior, indicating a high degree of clustering and high efficiency in the estimated networks (Wang et al., 2010;Essen and Tononi, 2016).This leads to an expectation of sparsity within functional networks and the associated precision matrices.In a Bayesian framework, sparsity can be enforced by postulating either selection-or shrinkage-inducing priors.Selection involves inferring which off-diagonal elements of the precision matrix should be set to exact zeros.Warnick et al. (2018) achieve such a selection by using a G-Wishart prior to sample positive definite matrices according to estimated adjacency matrices that correspond to the FC networks.This selection approach is intuitive and leads to straightforward inferences via the posterior probabilities of inclusion of the elements of the precision matrix.However, it is computationally challenging and does not scale well to relatively large dimensions of the networks.Here, instead, we take a shrinkage-based approach and model the off-diagonal entries of the state-specific precision matrices Ω s , s = 1, . . ., in (1) by employing a graphical horseshoe prior (Li et al., 2019).Thus, we set where I(Ω s ∈ S R ) is an indicator function to ensure that samples of Ω s belong to the space of positive definite R × R matrices and C + (•; µ, σ) denotes a half-Cauchy distribution with location parameter µ and scale σ.In (5), we further assume a non-informative flat prior for the diagonal elements, i.e. ω jjt ∝ 1.The shrinkage of the off-diagonal elements is obtained through the combined effect of the variance components λ 2 jk and τ 2 in the normal priors for ω jkt , j = 1, . . ., k − 1, k = 1, . . ., R. The parameter τ is a global shrinkage parameter, that controls how sparse the precision matrix is in its entirety.The parameter λ jk:j<k defines instead a local shrinkage parameter, since it allows to shrink Figure 1: Graphical representation of the proposed PIBDFC.The data Y i t are emissions from a distribution that is parameterized by a precision matrix Ω s i t , which encodes the FC and is determined by the state active at time t: s i t ∈ {1, . . ., S, t = 1, . . ., T , i = 1, . . ., N .The probabilities of transitions from s i t to s i t+1 are given by the This entry is modeled according to Equation 3. each individual off-diagonal entry ω jk towards zero, whereas it maintains the magnitude of non-zero entries and thus reduces biases.Following Li et al. (2019), we assume a half-Cauchy prior on τ , τ ∼ C + (•; 0, τ 0 ), with τ 0 indicating an a priori belief about the global sparsity of the estimated graph.In order to specify τ 0 , one can simulate graphs under the informal selection rule of Carvalho et al. (2010), where an edge j,k is selected if E( 1 1+λ jk τ ) < 0.5.We demonstrate such a process in Figure 9 in the Appendix.We find that a τ 0 = 1 gives an expected edge density of approximately 50% while having the largest spread.Figure 1 provides a graphical representation of the proposed predictor-informed Bayesian dynamic FC model (PIBDFC).

Posterior Inference
The posterior distribution for the parameters in the proposed model is not available in closed form.Hence, we revert to Markov Chain Monte Carlo (MCMC) techniques for posterior inferences.In particular, we follow Holsclaw et al. (2017) and employ Polya Gamma auxiliary variables (Polson et al., 2013) to sample the state transition parameters.Based on the sampled Q i •,•,t , we can construct a sequence of transition matrices based on equation (3).After normalizing each row Q i s,•,t so that it sums to 1, we use a stochastic forward-backward algorithm to sample the state sequence (Scott, 2002).Then, conditioned upon the estimated state sequence, it is possible to sample the precision matrix parameters using the blocked Gibbs algorithm presented in Li et al. (2019).Other parameters in the hierarchical model for the states' transitions (4) are sampled via simple Gibbs steps.By iterating through the steps above, we obtain samples from the posterior.We provide a brief summary below: We can rewrite the likelihood for ξ i rs according to Holmes and Held (2006) to be in the form of Equation 6. where Using the Polya-Gamma augmented logistic regression technique of Polson et al. (2013), we get the posterior of ξ i rs to be conditionally Gaussian.
where n rsi is the count of transitions from state r to state s during the timecourse of subject i and N ri is the number of times subject i visited state r.ω i rst is a Polya-Gamma random variable distributed P G(1, ξ i rs − c i rst ).We use a similar strategy to update ρ i rb , the logistic component for subject i for state r and covariate b, achieving the posterior:

Sample s i t :
We sample the sequence of states by adapting the stochastic forward-backward algorithm presented by (Scott, 2002).
3. Sample the matrices Ω i s , s = 1, . . ., S: The conditional posterior for Ω s is as follows: For MCMC inference purposes, Li et al. (2019) adopt auxiliary variables ν λ and ξ τ , in order to modify the Gibbs sampling procedure presented by Makalic and Schmidt (2016).This procedure is performed for a column-wise update in a fashion similar to Wang (2012).For each state, we update Ω s by following the Graphical Horseshoe algorithm letting S = n s * Σs where n s and Σs are the sizes and sample covariance matrices of observations assigned to state s.
4. Sample Z rs , η b : These conditional posteriors follow the typical normal-normal update:

Graph Selection
Our model achieves sparsity of the estimated functional network thanks to the shrinkage of the offdiagonal elements of Ω provided by the graphical horseshoe prior.However, shrinkage priors do not lead to exact zeros.Hence, non-relevant connectivities need to be identified through post-MCMC inference.For example, Li et al. (2019) suggest using 50% posterior credible intervals of the inverse-covariance elements, and then thresholding the off-diagonal element to zero if the interval contains 0, reporting the posterior mean otherwise.However, the resulting selection does not provide a multiplicity control for false discoveries.
We follow a decision-theoretic approach and formulate the graph selection problem as a testing problem based on the posterior evidence of shrinkage for each off-diagonal element of the precision matrix Ω s .Since we consider the posterior estimates of Ω s for each state s = 1, . . ., S, separately, in the following we drop the superscript s for notational simplicity, unless needed for clarity.For any given state s = 1, . . ., S, the j, k off-diagonal element ω jk (j < k; k = 2, . . ., R) provides a measure of the connectivity level, with ω jk = 0 indicating that the connectivity is truly zero, and |ω jk | = 0 otherwise.Let δ jk indicate the decision (action) in the testing problem, that is δ jk = 1 corresponds to rejecting the null hypothesis of no connectivity and δ jk = 0 failure to reject (acceptance).Let D = j<k δ jk indicate the total number of positive (significant) decisions taken.Following Müller et al. (see 2007), for real numbers c 1 , c 2 > 0, we can then determine the optimal set of decisions δ = {δ 12 , δ 13 , . . ., δ R−1 R } by minimizing the following loss function: The loss function compounds a reward for correct decisions (true positives), provided by the first addend, − j<k δ jk |ω jk |, where each correct decision is proportional to |ω jk |'s, and a penalty for false negative discoveries, represented by the second addend, (1 − δ jk ) |ω jk |.The last term encourages parsimony, by increasing the penalty as the number of significant elements increases.The optimal decision is obtained by minimizing the posterior expected loss, where E(ω jk |Y , τ ) is the posterior mean of the off-diagonal elements of the inverse matrix Ω.The minimizer corresponds to a threshold of the posterior means to identify the non-zero elements of the precision matrix, Li et al. (2019) show that the posterior mean is unbiased and it can be represented as a linear function of a shrinkage factor defined by the expected value of the random variable κ jk = 1 1+λ 2 jk τ 2 , which has a compound confluent hypergeometric distribution (Gordy, 1998).More in detail, E(ω jk |Y , τ ) = (1 − E (κ jk |Y , τ )) ω jk with ω jk representing the least square estimate of ω jk , j < k.See Theorem 4.1 in Li et al. (2019), and related discussions in Bhadra et al. (2019).The result extends trivially to the folded normal distribution characterizing |ω jk |.Note that κ jk ∈ (0, 1), and that larger values of E(κ jk ) indicate stronger shrinkage of the posterior estimates toward zero.
Graph selection can be conducted by thresholding an estimate κjk of the shrinkage factor κ jk , i.e. for some threshold η ∈ (0, 1).For example, in the simple regression case, Carvalho et al. (2010) have previously recommended an informal decision rule thresholding ω jk to 0 if 1 − κjk < 0.5 where κjk is the posterior median of κ jk .However, such a rule does not take into account the multiplicity problem induced by the selection of the off-diagonal elements of the precision matrix.The posterior medians κjk provide a measure of the evidence in favor of the null hypothesis, H 0 : ω jk = 0. Hence, a threshold η could be set by controlling a measure of the Bayesian False discovery rate (BFDR, Newton et al., 2004) at a certain level q * , that is to satisfy the equation For a related but different solution to the problem of graph selection, see also Chandra et al. ( 2021), who consider inference on the partial correlation matrix derived from Ω.

Simulation Study
In this Section, we present three sets of simulated datasets that aim at measuring the performance of our model with respect to the detection of non-zero connectivities and the estimation of the latent connectivity states over time.More specifically, in the first two simulation studies, we compare the proposed predictor-informed Bayesian dynamic functional connectivity (PIBDFC) model with two alternative models: a widely-used tapered sliding window (Tapered SW) approach, first outlined by Allen et al. (2014), and the Bayesian Dynamic Functional Connectivity (BDFC) model developed by Warnick et al. (2018).The Tapered SW represents a standard approach in the FC literature, whereas BDFC uses a homogeneous HMM to model latent connectivity state dynamics.The BDFC provides a model-based estimation of exact zeros in the functional networks at the cost of computational scalability and speed, as opposed to our computationally faster soft-shrinkage-based approach.Furthermore, the BDFC does not incorporate any predictor information in the latent state dynamics.Both competing approaches were developed for single-subject inference.We compare to our multi-subject model by concatenating the multi-subject data along the time axis for input into the respective algorithms.All models are run on a Linux computer with an Intel Xeon Gold processor (2x 3.10 GHz) and 4 GB of RAM.For both the PIBDFC and BDFC, we simulated 5,000 posterior samples after 5,000 burn-in draws.When fitting PIBDFC, we set the hyperparameters τ 0 = 1, σ ξ = σ ρ = σ z = σ η = 0.1, following the motivations of Section 2.2.We assess the performance of our model on states' reconstruction by computing a set of metrics for each latent state separately.Let r jk , j < k; k = 2, . . ., R, denote the binary indicator of a nonzero connection between regions j and k.Following the discussion in Section 2.5, let δ jk indicate the decision after the model fit.Then we define the edge true positive rate (TPR) as r jk δ jk / r jk .Similarly, the edge true negative rate (TNR) is defined as The Edge F1 score (F1) is the product of the TNR and TPR, and serves as a measure of the overall performance in graph estimation, balancing between the TPR and TNR.Analogously, we define a metric to assess the performance of the model in the estimation of the states' sequences.Let s i t indicate the true latent state active at time t for subject i and let ŝi t indicate its model estimate.Then, the state sequence accuracy for state s is defined as {I(s i t = s)I(ŝ i t = s)}/ I(s i t = s).
Simulation Study 1: In our first study, we investigate the performance of our model in an ideal setting where the data generation process matches the model closely.We set T = 300 time points, R = 16 ROIs, N = 30 subjects, and S = 3 connectivity states.In this setting, we simulate data with Ω s i t encoding the individual conditional independence structure at time t, identified by the value of the state indicator variables s i t ∈ {1, 2, 3} and the prespecified graphs in the first row of Figure 2. In order to study the effect of the predictor information on the estimation of the transition probabilities and the FC dynamics, we introduce a single binary time-varying predictor variable, x t , which transitions from 0 to 1 when t = T 2 .For each value of the exogenous variable, we set the transition probabilities for the latent state trajectories as follows Therefore, for each subject, the state sequence enforces transitions between states 1 and 2 for the first half of the time series, whereas it enforces transitions between states 2 and 3 for the second half.
We then simulate different state sequences for each subject using equation (3), and replicated the process over 30 independent simulated data sets.In order to assess the performance of the methods for different levels of signal strength, we repeated the simulation experiment using different precision matrices Ω s , s = 1, 2, 3 of the same structure of the top row of Figure 2 but allowing for different values of the non-zero entries.This is done by using the sprandsym function from the Mathematics toolbox of Matlab.This function takes in an adjacency matrix representation of a graph, A s ∈ R R×R where A ijs = I(ω ijs = 0), and outputs a positive definite matrix with the same placement of 0's but random non-zero entries.This output matrix is then normalized to a partial correlation matrix.Thus, we obtained a total of six sets of precision matrices to learn the structure of.We show the aggregated results in Figure 3.The horizontal axis reports the average strength of the non-zero partial correlations for each of the six sets of precision matrices, indicating a level of signal strength.The PIBDFC consistently performs better in connectivity estimation with regard to true positive rate and F1 score, across all levels of partial correlations.The BDFC appears as the most conservative, as highlighted by the large true negative rates, but low true positive rates.Based on the results above, the PIBDFC displays the best balance of finding true non-zero partial correlations while controlling for false positives.
In the following, we illustrate the inferential analyses enabled by the proposed PIBDFC approach by showcasing a single replicate.In Figure 2 (bottom row) we show how the PIBDFC is able to recover the true conditional independence structure underlying the data generation process by estimating the partial correlations between regions and enforcing the true 0's through the BDFR approach devised in Section 2.5.The model is also able to recover the most likely state transition sequence for each subject, as determined by the maximum a posteriori state estimate at each time point.See Figure 4.It is also important to assess the ability of the method to identify true change points in the connectivity structure.Figure 5 reports the estimated connectivity change points for a representative subject.PIBDFC is able to estimate the state sequence well while tying together the increased rate of appearance of state 3 when the stimulus changes from 0 to 1 halfway through the simulated experiment.All models were compared in terms of computation time as reported in Table 1.PIBDFC is also able to draw as many posterior draws in a third of the computation time.Simulation Study 2: In this second simulation study, we measure the performance of our approach with synthetic data that are similar to real fMRI data.More specifically, we use the Matlab simulation toolbox SimTB of Erhardt et al. (2012) and follow the simulation approach of Warnick et al. (2018).
The SimTB toolbox implements a canonical hemodynamic response function (Lindquist et al., 2009), defined as a linear combination of two gamma functions, to simulate fMRI time series.This function is then convolved with a box stimulus function where Gaussian noise with variance = 0.01 is added.FC is then obtained by predefining cliques, i.e. clusters of regions, that have signal (here, 0.5) added to or subtracted from all regions in the clique simultaneously at random time points within a connectivity state.This induces correlation while having non-Gaussian errors.We then simulate the state sequence over T = 150 time points with x t being 0 for the first 75 time points and 1 for the last 75 among all subjects.Similar to Simulation Study 1, we use the exact same Q t among all subjects.We repeat this process for N = 30 subjects over 30 simulation replicates.In Table 2 we show the results to the application on the SimTB data.PIBDFC does a good job at detecting the connectivities between the simulated regions, despite a misspecified likelihood.The performance in both graph and state estimation appears to decline slightly in comparison to the Simulation 1 setting, which is expected.The Tapered SW approach suffers from low specificity.Compared to the standard HMM of BDFC, the proposed PIBDFC performs slightly better at detecting changes in state transitions, thus improving graph estimation performance as a result.This is likely due to the distortion introduced in the partial correlation by the convolution with the hemodynamic response function.In this setting, the covariate information becomes more relevant in helping the model identify changes in the state transition behavior.The computational time is also quite favorable compared to the approach of Warnick et al. (2018), despite allowing for individual differences in state dynamics among the 30 subjects.
Simulation Study 3: In this simulation setting, we compare the performances of our model and the Connectivity Change Point Detection (CCPD) model of Kundu et al. (2018) on edge-and changepoint detection.Contrary to our model, the CCPD model employs a two-stage approach for estimating dynamic FC.In the first stage, the model learns the number and locations of the change points from all available subjects' data.In the second stage, a graphical lasso approach is applied independently to the time scans between two change points.Since the CCPD model assumes that every change point occurs at the same time for each subject, in order to fairly compare the two methods we simulate data under the CCPD assumption of common change points.More specifically, we set T = 300 and generate Y i t ∼ N (0, Ω st ) where s t varies across the following sequence of states:{1, 2, 3, 1} switching at t = 75, 150, 225, for a total of 3 change-points overall.We use the same true partial correlation matrices to generate the data as in Simulation study 1.For the PIBDFC, a time point t for subject i was judged to be a change point if P (s i t = s i t−1 |Y i 1:T ) > 0.95.PIBDFC does not assume common change points and, as a result, does not infer common change points across individuals; therefore, we report the average number of change points across all subjects.Table 3: Simulation Study 3: Results over 30 repetitions.We show the entry-wise true positive and true negative rates for the estimated graphs for the corresponding states.We also show the estimated number of change points.PIBDFC performs comparably to CCPD in the setting where change points are common among subjects despite no explicit assumption of this being the case.
In Table 3, we show the results of the comparison between PIBDFC and CCPD under a shared change point model.CCPD is indeed able to accurately detect the number of change points and the resulting graph structure in each partition well.By thresholding the posterior probability of a change point, our model tends to overestimate the number of change points on average, as it sometimes estimates very sudden changes of state for a brief collection of time points in some subjects.In contrast, in simulation studies 1 and 2, the change points are generated from a process that truly follows a hidden Markov model, leading to more accurate estimates.By leveraging on the assumption of common change points, the two-stage CCPD model can achieve increased accuracy, while our model allows for the incorporation of individual transitions and covariates in the transition probabilities.

Case Study
We apply the proposed PIBDFC model to the motivating dataset.In our application, we demonstrate how the model can recover expected change points in dynamic FC states, as those states align quite well with the experimental events regulated by the behavioral task.We are also able to estimate the effect of pupil dilation on the subjects' propensity to change states.

Experimental design and data collection
In this experiment, subjects performed a handgrip task adapted from Mather et al. (2020).Thirty-one participants (18 females, mean age 25 years ± 4 years) enrolled in this study at the University of California, Riverside Center for Advanced Neuroimaging, but one was excluded due to a history of attention deficit hyperactive disorder resulting in a total of N = 30 subjects.All subjects provided written informed consent to participate, and received monetary compensation for their participation.The study protocol was approved by the University of California, Riverside Institutional Review Board (IRB).Magnetic resonance imaging (MRI) data were collected on a Siemens 3T Prisma MRI scanner (Prisma, Siemens Healthineers, Malvern, PA) with a 64 channel receive-only head coil.fMRI data were collected using a 2D echo planar imaging sequence (echo time (TE) = 32 ms, repetition time (TR) = 2000 ms, flip angle = 77 • , and voxel size = 2 × 2 × 3 mm 3 , slices=52) while pupillometry data were collected concurrently with a TrackPixx system (VPixx, Montreal, Canada).
All subjects underwent a 12.8-minute experiment in which they alternated between six resting state blocks and five squeeze blocks.In the squeeze blocks, they brought their dominant hand to their chest while holding a squeeze-ball (Mather et al., 2020).The five squeeze blocks lasted 18 seconds while the interspersed six resting state blocks had durations of five-, two-, two-, five-, one-, and one-minute, respectively.
All subjects underwent two sessions: one where they executed the squeeze at maximum grip strength (active session), and one where they still brought their arm up to their chest but were instructed simply to touch the ball and not to squeeze it (sham session).The fMRI data underwent a standard preprocessing pipeline using the brain software library (FSL).The pipeline consisted of slice time correction, motion correction, susceptibility distortion correction, and spatial smoothing using a kernel Gaussian smoothing factor set at a full-width half maximum of 0.8475 (Smith et al., 2004;Woolrich et al., 2009).Finally, all data were transformed from the individual subject space to the Montreal Neurological Institute (MNI) standard space using standard procedure in FSL (Smith et al., 2004;Woolrich et al., 2009).
Pupillometry data were collected during the scans, using a sampling rate of 2kHz, preprocessed using the ET-remove artifacts toolbox (github.com/EmotionCognitionLab/ET-remove-artifacts),and downsampled to match the temporal resolution of the fMRI data (Mather et al., 2020).To measure pupil dilations relative to baseline, the dataset was normalized by dividing by subject-specific means of the first five-minute resting state block (prior to any squeeze or hand-raising), leading to percent signal changes.Three subjects' data were discarded due to technical difficulties during the acquisition of pupil dilation measurements, resulting in N = 27 for all subsequent analyses.
Since we used a pseudo-resting state paradigm, our interest was focused on five networks of interest that have all been associated with resting state and have been related to attention in some manner.Default mode network (DMN; a resting state network) and dorsal attention network (DAN; an attention network) were selected because squeezing ought to invoke a transition from the resting state into a task-positive state (Greicius and Menon, 2004).The fronto-parietal control network (FPCN) was chosen because it is linked to DAN and regulates perceptual attention (Dixon et al., 2018).Salience network (SN) was selected because it determines which stimuli in our environment are most deserving of attention (Mather et al., 2020;Menon and Uddin, 2010).Talariach coordinates for regions of interest (ROIs) within DMN, FPCN, and DAN were taken from Deshpande et al. (2011) and converted to MNI coordinates while SN MNI coordinates were taken directly from Raichle (2011) (Deshpande et al., 2011;Laird et al., 2005;Lancaster et al., 2007;Raichle, 2011).Two ROIs from FPCN (dorsal anterior cingulate cortex and left dorsolateral prefrontal cortex) were excluded due to their close location to other ROIs.The locus coeruleus (LC) was localized using the probabilistic atlas described in Langley et al. (2020).Blood oxygen level-dependent (BOLD) signal from each voxel within an ROI were extracted and averaged to represent the overall signal for an ROI.We eventually considered 31 total ROIs: 9 from DMN, 7 from FPCN, 6 from DAN, 7 from SN, and 2 from LC.The MNI anatomical coordinates for the four attention networks and LC were used to center a 5 mm 3 isotopic sphere (Deshpande et al., 2009;Stilla et al., 2007).See the Appendix for a list of the ROIs and corresponding MNI stereotaxic space coordinates and networks.

Model fitting
The 31 ROIs described above formed the vectors of BOLD responses Y i t = (Y i t1 , . . ., Y i t31 ) measured on subject i = 1, . . ., 27 at time t, for t = 1, . . ., 1050.We also included concurrently recorded pupillometry data as a proxy for quantifying the effect of LC engagement on the dynamics of FC (Joshi and Gold, 2022).
We fit our model with different number of total states, i.e., S = 3, 4, 5, 6.However, when assuming more than 3 states, the fit simply degenerated to 3 states in the posterior inference, with no observations assigned to additional states.This result indicates no posterior support for models with S > 3 Thus, here we present the model specification for 3 states with the following settings for the hyperparmeters in (2).We set the group level baseline relative transition prior means z 0 rr = 2 for r = 2, 3 while all other elements of z 0 •• are set to 0. We also set the prior spread of the baseline transitions and pupillary effects σ z , σ η = 0.05.This combination of settings is used to encourage self-transitions, as they correspond to preferring smoother state sequences a priori among all subjects.We set the prior variability of the subject-level transition parameters around the group-level transition parameters, by choosing σ ξ , σ ρ = 0.1, therefore capturing individual differences between subjects on the log-odds of transitioning between states.Lastly, τ 0 , the hyperparameter informing prior knowledge of connectivity network sparsity, is set to 1, as this value corresponds to a prior distribution with a high spread over edge densities (see Figure 9 in the Appendix).7 shows the maximum a posteriori (MAP) estimated state sequences from our model for all 27 subjects.The subjects' rows are ordered by the similarity of the estimated state trajectories as captured by a hierarchical clustering using Euclidean distance.

Results and Inference
By inspecting Figure 6, it is apparent that state 1 shows relatively sparser connectivity than the other two states.In state 1, we can see strong bilateral connectivity among homologous regions in the left and right hemispheres, as well as several nodes in FPCN (dark blue) showing strong connectivity with multiple nodes in SN (light red); likewise, several nodes in DMN (dark red) show connectivity with SN (light red) nodes.There is almost no presence of anti-correlation.The dominance of SN connectivities together with both DMN and FPCN suggests that arousal may be up-regulated in this state.Indeed, Figure 7 suggests that state 1 occurs predominantly during the 'squeeze' periods of the behavioral task, when subjects either squeezed the squeeze ball or held it to their chest.This observation suggests that our model was able to detect those objectively-definable events in the time series of this experimental dataset.
In state 2, we see a quite different pattern: weaker average connectivity when compared to state 1, but also many more of these weaker connections both within-network and between networks.In addition to relatively ubiquitous within-network connections within FPCN (dark blue) and DMN (dark red), state 2 is characterized by cross-network connectivity -and anti-connectivity -between DMN and FPCN.Interestingly, these parallel some of the strongest connectivities from state 1.The relative occupancy in state 2 appears higher in the active condition (Figure 7, right half) than the sham condition (Figure 7, left half), suggesting subjects tended to occupy this relatively strong, broadlyconnected state more often when periodically engaging in actively squeezing the ball.
The strongest connections in state 3 deviate from those identified in states 1 and 2. There is weaker overall connectivity than state 1, but the connections are stronger and sparser (fewer connections) than state 2. We do again see many within-network connections, as well as relatively strong connections between nodes in FPCN (dark blue) and SN (light red), and also again between DMN (dark red) and SN (light red).However, we also see many more connections with SN from DAN (light blue) than in either of the other two states.We can therefore characterize this state as more sparsely connected than state 2 but still with broad connectivity, which is also consistent with the differences visually apparent in this state between active and sham conditions (right and left halves of Figure 7): this state traded off with state 2 for relative percentage occupancy across the subjects.Finally, a unique feature of our model is that it allows the investigation of how pupillary dilation modulates state transitions.Figure 8 provides the posterior distribution of the group (e η , left) and individual (e ρ ) effects of pupil dilation on state dynamics.We start by assessing the relationship between pupil dilation and state transitions for the group.Based on our findings, a 1% increase in pupil dilation relative to baseline is associated with a 31.4% (95%CI : 29.7% − 32.9%) decrease in the odds of transitioning to state 2 and a 34.9% (95%CI : 33.3% − 36.4%)decrease in the odds of transitioning to state 3, in comparison to remaining in the baseline state (state 1).This result is coherent with the findings outlined above since increased pupil dilation (a proxy for increased arousal/effort) appears associated with transitioning toward the less densely connected connectivity structure of state 1, dominated by edges between SN and both DMN and FPCN.We should note that the causal direction of the inferred associations can not be investigated by this model.
Further inspection of the right column of Figure 8 shows that the posterior distributions of the individual effects of pupil dilation e ρ•• is decidedly below 1 for all subjects, i.e. the association between increased pupil dilation and state 1 holds for all subjects measured.Subjects are ordered along the horizontal axis according to their similarity in state trajectories obtained from a hierarchical clustering, based on the Euclidean distance (similarly as in Figure 7).The horizontal dashed line represents the posterior mean from the group estimate in the right panel.It is interesting to note the differing clusters when comparing the posterior distributions of e ρ 2• to e ρ 2• : trending downwards and upwards respectively.Quite importantly, the correspondence between the groupings observed in Figure 7 and Figure 8 is a result of the posterior inference, not necessarily implied by the structure of our model.The differences in state trajectories between subjects lie in the state occupancy when pupil dilation is not higher than the reference, despite all subjects tending to transition to state 1 when raising their arm.
More specifically, subjects clustered in the first half of Figure 8 (right) tend to occupy state 3 during non-squeeze sections and so are even more likely to transition away from state 2 during periods of high pupil dilation.Similarly, subjects in the second half of the Figure tend to occupy state 2 during non-squeeze sections, and are thus very likely to transition away from state 3.This heterogeneity is important as it provides a more thorough understanding of the relationship between increased pupil dilation and transitions toward different cognitive states.

Discussion
We have proposed a multi-subject Bayesian approach for estimating dynamic FC where the brain network state transitions are dynamically informed by concurrently-recorded subject-specific covariates.The proposed method allows for group-level and subject-level inferences on the effects of time-varying covariates on the connectivity dynamics.We have applied our model to multi-subject resting state fMRI data with pupillary physiological data and we have shown associations between pupil dilation and strengthened connectivity between the SN brain regions with both the FPCN and DMN.This association coinciding with subject arm-raising/squeezing suggests SN connections with both FPCN and DMN are associated with subject arousal.
While we focused here on covariates that were concurrently recorded on each subject, our model can also incorporate covariates that are subject-specific and not time-varying.For example, demographic information may be added to the regression terms in (2)-( 3) and inform subject-specific transition probabilities to describe individual variability over the entire fMRI experiment.
Our model assumes a maximum number of states S to be pre-specified a priori.In our application, only a subset of the S available states was visited.However, in general, the number of states could be learned by assuming a Bayesian-nonparametric specification where the number of FC states is learned directly from the data (see, for example, Beal et al., 2002;Fox et al., 2011).However, the computational complexity of the inferential algorithm would increase considerably.Variational Bayes approaches could be implemented to obtain approximate inferences on the network connections.
Finally, the individual connectivity patterns could be associated with clinical or behavioral outcomes, e.g., to examine the individual heterogeneity of responses to treatments.A two-stage scalaron-image approach can be devised where the posterior means of the precision matrices are obtained from our model in the first stage and then used as predictors to investigate the association with the outcome in the second stage.These directions of research will be the object of future investigations.Figure 9: For each value of τ 0 , we simulate 1000 undirected 100 × 100 graphs under the graphical horseshoe prior.Plotted above are the 2.5, 50, and 97.5 percentiles of the edge density as a function of τ 0 .A value of τ 0 = 1 leads to approximately a 50% expected edge density, with high spread, in the sampled graphs.

Figure 2 :
Figure 2: Simulation Study 1: Top: The true partial correlation matrices for each state responsible for generating the simulation data in the Simulation Study 1. Bottom: The estimated partial correlation matrix from the proposed PIBDFC from a single repetition of the simulation.Each estimated partial correlation is the posterior mean of their respective distributions.Cells are set to 0 in post-hoc MCMC by controlling the BFDR at the 0.2 level.See Sections 2.5 and 3 for details.

Figure 3 :
Figure 3: Simulation Study 1: True Positive Rate, True Negative Rate, F1 Score, and state accuracy metrics for the PIBDFC, BDFC, and Tapered SW approaches over different settings of the correlation structure.Along each horizontal axis is the average strength of the non-zero partial correlations for each state, corresponding to different levels of signal strength.

Figure 4 :
Figure 4: Simulation Study 1: Top: The true state transition path for each subject (vertical axis) across each time point (horizontal axis).The color in each cell identifies which precision matrix in Figure 2 generated the simulated the data for each subject-time point pairs.Bottom: The maximum a posteriori estimated state trajectories from PIBDFC.

Figure 5 :
Figure 5: Simulation Study 1: Estimation of the connectivity change points in a representative subject.The horizontal axis indicates the time points while the vertical axis reports the posterior probability P (s 1 t = s 1 t−1 |Y i 1:T ).The posterior probabilities of a change point are in red, whereas the black spikes represent the true change points for the subject.We also display a horizontal dotted line at 0.95 to reflect the informal rule of declaring a change-point if P (s 1 t = s 1 t−1 |Y i 1:T ) > 0.95.

Figure 6
Figure6plots the estimated connectivity networks for each of the three states.Nodes represent ROIs and edges identify the estimated non-zero partial correlations between pairs of nodes.The edge colors correspond to the directionality of the partial correlations and the width corresponds to the magnitude.The dotted colors in the nodes identify clusters of regions within a priori, knowledge-based,

Figure 6 :
Figure 6: Real Data Analysis: the estimated connectivity networks for the ROIs.Nodes represent ROIs and the edges denote the partial correlations between the connected nodes.The edge colors correspond to the directionality of the partial correlations and the width corresponds to the magnitude.Node colors identify clusters of regions into a priori defined networks.See Section 4.3 andTable 4 in the Appendix Figure

Figure 7 :
Figure 7: Real Data Analysis: Estimated states' transition path for each subject.The horizontal axis indicates the TR with vertical dotted lines indicating portions where the subject raises their arm.Subject sequences are aligned so that the first 525 time points show sequences from the sham condition and the time points 526-1050 show sequences from the active condition.The vertical axis displays the subject indices, ordered by similarity in state trajectory according to a hierarchical clustering (based on the Euclidean distance) of their MAP transition behavior.

Figure 8 :
Figure 8: Real Data Analysis: The posterior distribution of the group effect of pupillary dilation e η (left), and individual effects of pupillary dilation e ρ .Rows indicate the propensity for transitioning into states 2 and 3 respectively.For the individual effects, subjects are identically ordered as in Figure 7.The horizontal dotted line is the posterior mean for the group-level effects, η 2 = 0.687 and η 3 = 0.651 respectively.

Table 1 :
Simulation Study 1: results over 30 repetitions.We report sensitivity and specificity metrics for the estimated graphs of the corresponding states, together with the overall accuracy of the estimated state sequences.Standard deviations across the 30 simulations are showed in brackets.The proposed method maintains the best balance between sensitivity and specificity as well as latent state estimation accuracy.

Table 2 :
Simulation Study 2: results over 30 repetitions.We report sensitivity and specificity metrics for the estimated graphs of the corresponding states, together with the overall accuracy of the estimated state sequences.Standard deviations across the 30 simulations are shown in brackets.The proposed method maintains the best balance between sensitivity and specificity as well as latent state estimation accuracy.