Using Bayesian Nonparametric Hidden Semi-Markov Models to Disentangle Affect Processes during Marital Interaction

Sequential affect dynamics generated during the interaction of intimate dyads, such as married couples, are associated with a cascade of effects (some good and some bad) on each partner, close family members, and other social contacts. Although the effects are well documented, the probabilistic structures associated with the micro-social processes connected to these varied outcomes remain enigmatic. Using extant data, we developed a method of classifying and subsequently generating couple dynamics using a Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM). Our findings indicate that several key aspects of existing models of marital interaction are inadequate: affect state emissions and their durations, along with the expected variability differences between distressed and nondistressed couples, are present but highly nuanced; most surprisingly, heterogeneity among highly satisfied couples necessitates that they be divided into subgroups. We review how this unsupervised learning technique generates plausible dyadic sequences that are sensitive to relationship quality and provides a natural mechanism for computational models of behavioral and affective micro-social processes.

• Φ = {φ_i(o_k)} are the observation probabilities, where φ_i(o_k) is the probability of emitting symbol o_k when the system is in state s_i
• π = {π_i} is the initial state distribution
This 5-tuple (S, Ω, P, Φ, π) is often abbreviated to λ = (P, Φ, π) when the state and emission sets are understood.

Three fundamental problems for HMMs

To effectively construct an HMM, three problems need to be addressed: (1) Evaluation; (2) Decoding; and (3) Learning.
• Evaluation: What is the probability that a particular model produced a particular observation sequence? That is, given a model λ = (P, Φ, π) and an observation sequence O = υ_1, υ_2, . . . , υ_T of length T, where υ_i ∈ Ω, what is the probability that the model generated the observation sequence; that is, what is P[O|λ]?
• Decoding: What is the most likely state transition path associated with an observed sequence? That is, again, given a model λ = (P, Φ, π), what is the most likely sequence of hidden states that could have generated a given observation sequence O = υ_1, υ_2, . . . , υ_T of length T?
• Learning: Estimate the most likely HMM parameters for a given observation sequence; that is, given a set of observation sequences, find the values of λ that maximize P[O|λ].
Fortunately, methods are well established for solving each of the aforementioned problems. The evaluation problem is usually solved by the forward and backward iterative algorithms. The decoding problem is solved by the Viterbi algorithm, also iterative, which constructs the most likely state path by sequentially considering each observed symbol. Finally, the learning problem is solved by the Baum-Welch algorithm (an Expectation-Maximization algorithm), which uses the forward and backward probabilities to update the parameters iteratively. Numerous excellent references detail the basic components of the HMM (e.g., [1], [2], [3]), so they are not reviewed in detail here.
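To make the evaluation and decoding problems concrete, they can be sketched in a few lines of NumPy. This is a minimal, self-contained illustration: the matrices P, Phi, and pi below are made-up toy parameters, not values estimated from the study's data.

```python
import numpy as np

# Toy 2-state, 2-symbol HMM; all numbers are illustrative only.
P = np.array([[0.7, 0.3],      # transition matrix: P[i, j] = P(s_j | s_i)
              [0.4, 0.6]])
Phi = np.array([[0.9, 0.1],    # emission matrix: Phi[i, k] = phi_i(o_k)
                [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution

def forward(obs, P, Phi, pi):
    """Evaluation problem: return P[O | lambda] via the forward algorithm."""
    alpha = pi * Phi[:, obs[0]]            # alpha_1(i) = pi_i * phi_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ P) * Phi[:, o]    # alpha_{t+1}(j) = sum_i alpha_t(i) P_ij phi_j(o)
    return alpha.sum()                     # P[O | lambda] = sum_i alpha_T(i)

def viterbi(obs, P, Phi, pi):
    """Decoding problem: most likely hidden state path for obs."""
    T, N = len(obs), len(pi)
    delta = np.log(pi) + np.log(Phi[:, obs[0]])
    psi = np.zeros((T, N), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + np.log(P)            # scores[i, j]
        psi[t] = scores.argmax(axis=0)                 # best predecessor of j
        delta = scores.max(axis=0) + np.log(Phi[:, obs[t]])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                      # backtrack
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

obs = [0, 0, 1, 1]
print(forward(obs, P, Phi, pi))   # likelihood of the sequence
print(viterbi(obs, P, Phi, pi))   # most probable state path
```

The forward recursion sums over all state paths in O(TN^2) time rather than enumerating the N^T possibilities; Viterbi replaces the sum with a max (in log space for numerical stability) to recover the single best path.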
Despite their utility and popularity across numerous fields in engineering and science, HMMs have two significant disadvantages: (1) state duration distributions are restricted to a geometric form that is inappropriate for many real-world data, and (2) the number of hidden states must be set a priori, so model complexity is not inferred from the data in a Bayesian way. We address the geometric distribution problem first.

Integrating the HMM and State Duration: Hidden Semi-Markov Models
Most real-world sequential phenomena, especially those involving social and natural dynamics, are highly dependent on time-in-state as a critical feature of state change. HMMs implicitly assume that time spent in a given state follows a geometric distribution. This distribution is memoryless, meaning that, at a given time t, the waiting time for switching from one state to another is independent of state duration or sojourn time. Stated differently, hidden Markov models have a constant probability of changing state given survival in the state up to that time. In social interaction, by contrast, an interactant's behavior is not memoryless but tightly bound to immediacy as well as to historical and relationship precedent. Consequently, the HMM reproduces the temporal structure of such dynamic processes inadequately.
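The memoryless property can be checked directly: if a state's self-transition probability is a, its duration has the geometric pmf P(D = d) = a^(d-1)(1 - a), and the hazard of leaving the state is the constant 1 - a regardless of how long the state has been occupied. A short NumPy verification, using an arbitrary illustrative value of a:

```python
import numpy as np

# Illustrative self-transition probability; any 0 < a < 1 behaves the same.
a = 0.8
d = np.arange(1, 50)                 # candidate durations d = 1, 2, ...
pmf = a ** (d - 1) * (1 - a)         # geometric duration pmf implied by the HMM
survival = a ** (d - 1)              # P(D >= d): still in the state at step d

# Hazard = P(leave at d | survived to d). Memorylessness means it is flat in d.
hazard = pmf / survival
print(np.allclose(hazard, 1 - a))    # True: exit probability never depends on d
```

This constant hazard is exactly what makes the plain HMM a poor fit for sojourn times that lengthen or shorten with time already spent in a state.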
A generalization of the HMM, the hidden semi-Markov model (HSMM), allows the hidden process to be semi-Markovian; this means that the probability of transitions among the hidden states depends on the amount of time that has elapsed since entry into the current state i [4]. The standard HMM emits one observation per state, whereas an HSMM state can emit a sequence of symbols; the symbol sequence length (its duration d) for each state is determined by a state-specific distribution. This state change process, along with its variable-length symbol emissions, is illustrated in Figure 1.
Aside from some minor modifications to the aforementioned evaluation, decoding, and learning algorithms, the HSMM is structurally similar to the HMM with the additional incorporation of a state duration variable [5]. State duration is a random variable taking integer values in the set D = {1, 2, ..., D_max}. We use the random variable D_t to denote the duration of a state that is entered at time t, and we write its probability mass function as p(d_t | x_t = i), where x_t is the hidden state at time t, with t ∈ {1, . . . , T}. In short, the important difference between the HMM and the HSMM is that the HMM assumes one observation per state, while in the HSMM each state can emit a sequence of observations, where the number of observations produced while in state i is determined by the length of time spent in state i, i.e., the duration d [4].
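To make the generative difference concrete, the following sketch samples from a toy two-state HSMM in which each state emits a run of symbols whose length is drawn from a state-specific duration distribution. The shifted Poisson durations, the parameter values, and the two-symbol alphabet are all illustrative assumptions, not components of the study's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state HSMM: explicit durations replace the HMM's geometric sojourns.
P = np.array([[0.0, 1.0],        # no self-transitions: durations handle "staying"
              [1.0, 0.0]])
dur_rate = np.array([3.0, 8.0])  # per-state duration parameter (Poisson + 1)
Phi = np.array([[0.9, 0.1],      # per-state emission probabilities over 2 symbols
                [0.2, 0.8]])

def sample_hsmm(T):
    """Generate (states, observations) of length T from the toy HSMM."""
    states, obs = [], []
    s = int(rng.integers(2))                      # initial state
    while len(obs) < T:
        d = rng.poisson(dur_rate[s]) + 1          # state-specific duration d >= 1
        for _ in range(d):                        # emit a run of d symbols
            states.append(s)
            obs.append(int(rng.choice(2, p=Phi[s])))
        s = int(rng.choice(2, p=P[s]))            # then transition to the next state
    return states[:T], obs[:T]

states, obs = sample_hsmm(200)
```

Because self-transitions are removed and replaced by explicit duration draws, the time spent in each state can follow any distribution over positive integers, not just a geometric one.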

Integrating the HMM and Infinite States Spaces: HDP-HSMM
As noted above, a shortcoming of the HMM, at least from a Bayesian perspective, is its inability to infer the number of states necessary to model system complexity; more specifically, the state space is fixed a priori and is not modified as new data are acquired. In the present study, the problem is straightforward: find a method for determining the appropriate number of states in the model that is simultaneously sensitive to couple type and allows for within-type couple variability. Fortunately, within the last decade a clear methodology for this type of problem arose: the Hierarchical Dirichlet Process (HDP) [6], a Bayesian nonparametric approach that estimates how many groups (i.e., states) are needed to model the observed data. Nonparametric implies not that the model is without parameters, but rather that the model's effective parameters change as additional data are acquired. Within this framework, nonparametric models are termed infinite-dimensional; technically, however, rather than being infinite, they can be evaluated on a finite sample in a manner that uses only a finite subset of the available parameters to explain the sample [7]. In other words, within the context of an HDP-HMM or HDP-HSMM, the number of states is unbounded and allowed to grow with the sequence length; new states are added as model complexity increases.
To understand the Hierarchical Dirichlet Process (HDP), it is necessary to first grasp the concept and relevance of the Dirichlet Process (DP). The Dirichlet process is a stochastic process that defines a probability distribution over infinite-dimensional discrete distributions, meaning that a draw from a DP is itself a distribution; it is typically described as a distribution over probability distributions. A DP has two parameters, a concentration parameter α and a base measure β. The base measure β is the distribution on which the nonparametric distribution is centered and can be thought of as the prior; it is similar to the mean of a Gaussian distribution. The concentration parameter α, a positive scalar, expresses the strength of belief in β and is analogous to an inverse variance (a precision): for small values of α, samples from a DP concentrate their mass on a small number of atoms, while larger values generate more dispersion across the distribution. The DP can be represented and sampled in several essentially equivalent ways: directly as a stochastic process [8], as a Chinese restaurant process [9], as a Pólya urn scheme, or via the stick-breaking construction [10].
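The stick-breaking construction lends itself to a compact sketch of how α controls dispersion. The example below truncates the infinite stick-breaking process at a finite number of sticks and uses a standard normal base measure; both choices, and the α values, are illustrative assumptions rather than components of the source model.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_stick_breaking(alpha, n_atoms=1000):
    """Draw (weights, atoms) approximating G ~ DP(alpha, beta), with a
    standard normal base measure, truncated at n_atoms sticks."""
    breaks = rng.beta(1.0, alpha, size=n_atoms)          # stick proportions
    remaining = np.concatenate([[1.0], np.cumprod(1 - breaks)[:-1]])
    weights = breaks * remaining         # pi_k = b_k * prod_{j<k} (1 - b_j)
    atoms = rng.normal(size=n_atoms)     # atom locations drawn from the base
    return weights, atoms

# Small alpha concentrates mass on a few atoms; large alpha spreads it out.
w_small, _ = dp_stick_breaking(alpha=1.0)
w_large, _ = dp_stick_breaking(alpha=50.0)
print(np.sort(w_small)[::-1][:3].sum())   # most mass on a handful of atoms
print(np.sort(w_large)[::-1][:3].sum())   # mass far more dispersed
```

Each draw is a discrete distribution over the atoms, which is precisely what allows the HDP to share a countable set of states across sequences while letting the data determine how many states carry appreciable weight.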
The HDP's unique hierarchical arrangement provides mathematical convenience and conserves dimensional sparseness; in our associated article this refers to the states. Essentially, the HSMM is a mixture model in which each component corresponds to a state of the HSMM. Given a state, the model needs to emit a sequence of observations and advance to the next state. Within this framework, component parameters specify the distributions associated with these stochastic choices; that is, the transition parameters