Conceived and designed the experiments: JPC SW. Performed the experiments: JPC SW. Analyzed the data: JPC SW. Contributed reagents/materials/analysis tools: JPC SW. Wrote the paper: JPC SW.

The authors have declared that no competing interests exist.

We introduce a theory of sequential causal inference in which learners in a chain estimate a structural model from their upstream “teacher” and then pass samples from the model to their downstream “student”. It extends the population dynamics of genetic drift, recasting Kimura's selectively neutral theory as a special case of a generalized drift process using structured populations with memory. We examine the diffusion and fixation properties of several drift processes and propose applications to learning, inference, and evolution. We also demonstrate how the organization of drift process space controls fidelity, facilitates innovations, and leads to information loss in sequential learning with and without memory.

Human knowledge is often transmitted orally within a group via a sequence of communications between individuals. The children's game of

“Send Three- and Four-Pence, We're Going to a Dance”

This phrase was heard, it is claimed, over the radio during WWI instead of the transmitted tactical phrase “Send reinforcements we're going to advance”

To answer these questions we introduce a theory of sequential causal inference in which learners in a communication chain estimate a structural model from their upstream “teacher” and then, using that model, pass along samples to their downstream “student”. This reminds one of the familiar children's game of Telephone.

To begin, one player invents a phrase and whispers it to another player. This player, believing they have understood the phrase, then repeats it to a third and so on until the last player is reached. The last player announces the phrase, winning the game if it matches the original. Typically it does not, and that's the fun. Amusement and interest in the game derive directly from how the initial phrase evolves in odd and surprising ways. The further down the chain, the higher the chance that errors will make recovery impossible and the less likely the original phrase will survive.

The game is often used in education to teach the lesson that human communication is fraught with error. The final phrase, though, is not merely accreted error but the product of a series of attempts to parse, make sense of, and intelligibly communicate the phrase. The phrase's evolution is a trade-off between comprehensibility and accumulated distortion, and it is the source of the game's entertainment. We employ a much more tractable setting to make analytical progress on sequential learning, based on

Specifically, we develop our theory of sequential learning as an extension of the evolutionary population dynamics of genetic drift, recasting Kimura's selectively neutral theory

To get started, we briefly review genetic drift and fixation. This will seem like a distraction, but it is a necessary one since available mathematical results are key. Then we introduce in detail our structured variants of these concepts—defining the

Those familiar with neutral evolution theory are urged to skip to Section Sequential Learning, after skimming the next sections to pick up our notation and extensions.

Genetic drift refers to the change over time in genotype frequencies in a population due to random sampling. It is a central and well studied phenomenon in population dynamics, genetics, and evolution. A population of genotypes evolves randomly due to drift, but typically changes are neither manifested as new phenotypes nor detected by selection—they are

Selectively neutral drift is typically modeled as a stochastic process: A random walk that tracks finite populations of individuals in terms of their possessing (or not) a variant of a gene. In the simplest models, the random walk occurs in a space that is a function of genotypes in the population. For example, a drift process can be considered to be a random walk of the

The theory of genetic drift predicts a number of measurable properties. For example, one can calculate the expected time until all or no members of a population possess a particular gene variant. These final states are referred to as

The analytical predictions for the time to fixation and time to deletion were developed by Kimura and Ohta

The following explores what happens when we relax the memoryless restriction. The original random walk model of genetic drift forces the statistical structure at each sampling step to be an independent, identically distributed (IID) stochastic process. This precludes any memory in the sampling. Here, we extend the IID theory to use time-varying probabilistic state machines to describe memoryful population sampling.

In the larger setting of sequential learning, we will show that memoryful sequential sampling exhibits structurally complex, drift-like behavior. We call the resulting phenomenon

We begin with the definition of an

Under these assumptions the Fisher-Wright theory reduces drift to a binomial or multinomial sampling process—a more complicated version of familiar random walks such as Gambler's Ruin or Prisoner's Escape

This model of genetic drift is a discrete-time random walk, driven by samples of a biased coin, over the space of biases. The population is a set of coin flips, where the probability of HEADS or TAILS is determined by the coin's current bias. After each generation of flips, the coin's bias is updated to reflect the number of HEADS or TAILS realized in the new generation. The walk's absorbing states—all HEADS or all TAILS—capture the notion of fixation and deletion.
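The random walk over coin biases described above can be sketched in a few lines. This is a minimal illustration, not the authors' simulation code: the function names, the population size of 100 flips, the random seed, and the safety cap on the number of generations are all assumptions made for the example.

```python
import random

def drift_step(bias, n_flips, rng):
    """One generation: flip a coin with the given bias n_flips times
    and return the fraction of HEADS as the next generation's bias."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < bias)
    return heads / n_flips

def drift_to_absorption(bias, n_flips, rng, max_generations=100_000):
    """Iterate generations until the bias reaches an absorbing state
    (0.0 = all TAILS, 1.0 = all HEADS) or the safety cap is hit.
    Returns (final_bias, generations)."""
    generations = 0
    while 0.0 < bias < 1.0 and generations < max_generations:
        bias = drift_step(bias, n_flips, rng)
        generations += 1
    return bias, generations

rng = random.Random(0)  # fixed seed so the run is repeatable
final_bias, generations = drift_to_absorption(0.5, 100, rng)
print(final_bias, generations)
```

Because the new bias is computed only from the count of HEADS, the order of the flips is discarded at every generation; this is the memoryless assumption that the later sections relax.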

Let

On average there is no change in frequency. However, sampling variance causes the process to drift towards the absorbing states at

One important consequence of the theory is that when fixation (

Time to deletion is also shown (dashed line), Eq. (5).

Populations are produced by repeated binomial sampling of

Kimura's theory and simulations predict the time to fixation or deletion of a mutant allele in a finite population by the process of genetic drift. The Fisher-Wright model and Kimura's theory assume a memoryless population in which each offspring inherits allele

How can genetic drift be a memoryful stochastic process? Consider a population of

At first, this appears as a major difference from the usual setting employed in population biology, where populations are treated as unordered collections of individuals and sampling is modeled as an independent, identically distributed stochastic process. That said, the structure we have in mind has several biological interpretations, such as inbreeding and subdivision

The model class we select to describe memoryful sampling is the

The

Maintaining our connection to (haploid) population dynamics, we think of an

Consider a simple binary process that alternately generates

State

Enforcing the alternating period-2 pattern requires two states,

Each transition is labeled

In state

Beyond using

We are now ready to describe

An initial population generator

Thus, at each step a new representation or model is estimated from the previous step's sample. The inference step highlights that this is learning: a model of the generator is estimated from the given finite data. The repetition of this step creates a sequential communication chain. Sequential learning is thus closely related to genetic drift except that sample order is tracked, and this order is used in estimating the next generator.
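A single estimate-then-resample step in which sample order is tracked can be sketched as follows. As a loudly labeled simplification, the sketch estimates a first-order Markov chain from adjacent symbol pairs rather than performing the machine inference used in the paper; the function names are hypothetical, and treating the sample as circular (so that every symbol has a successor) is an assumption of the example.

```python
import random
from collections import Counter, defaultdict

def estimate_transitions(sample):
    """Estimate Pr[next symbol | current symbol] from adjacent pairs.

    Assumption: the sample is treated as circular, so the last symbol
    is followed by the first and every symbol has a successor.
    """
    counts = defaultdict(Counter)
    for current, following in zip(sample, sample[1:] + sample[:1]):
        counts[current][following] += 1
    return {
        current: {sym: n / sum(nxt.values()) for sym, n in nxt.items()}
        for current, nxt in counts.items()
    }

def resample(model, start, length, rng):
    """Generate a new ordered sample of the given length from the model."""
    sample = [start]
    while len(sample) < length:
        symbols, weights = zip(*model[sample[-1]].items())
        sample.append(rng.choices(symbols, weights=weights)[0])
    return sample

teacher_sample = list("HTHTHTHTHT")
# An order-blind learner sees only the symbol frequency ...
heads_frequency = teacher_sample.count("H") / len(teacher_sample)
# ... while an order-aware learner recovers the alternation.
student_model = estimate_transitions(teacher_sample)
student_sample = resample(student_model, teacher_sample[0],
                          len(teacher_sample), random.Random(0))
print(heads_frequency, student_model, "".join(student_sample))
```

The frequency alone would suggest a fair coin, whereas the transition estimate captures the period-2 pattern and passes it downstream unchanged.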

The procedure is analogous to flipping a biased coin a number of times, estimating the bias from the results, and re-flipping the newly biased coin. Eventually, the coin will be completely biased towards H

Before we can explore this dynamic, we first need to examine how an

Recall the Alternating Process from

However, if we consider allele

This leads us to introduce

A state machine representing a periodic sampling process enforces the constraint of periodicity via its internal memory. One measure of this memory is the

Instead, the condition for stasis can be given as the vanishing of the

While more can be said analytically about structural drift, our present purpose is to introduce the main concepts. We will show that structural drift leads to interesting and nontrivial behavior. First, we calibrate the new class of drift processes against the original genetic drift theory.

The Biased Coin Process is represented by a single-state

The drift of Pr[H

Note that the drift of allelic entropy

The time to stasis of the Biased Coin Process as a function of initial

Kimura's predicted times to fixation and deletion are shown for reference. Each estimated time is averaged over

Not surprisingly, we can interpret genetic drift as a special case of the structural drift process for the Biased Coin. Both simulations follow Kimura's theoretically predicted curves, combining the lower half of the deletion curve with the upper half of the fixation curve to reflect the initial probability's proximity to the absorbing states. A high or low initial bias leads to a shorter time to stasis as the absorbing states are closer to the initial state. Similarly, a Fair Coin is the furthest from absorption and thus takes the longest average time to reach stasis.
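The dependence of stasis time on the initial bias can be checked with a small simulation. The sketch below is illustrative only: the sample size of 50, the 200 runs per initial bias, and the seed are assumptions, and the reference value is the standard diffusion approximation to the mean absorption time for a population of N gene copies, -2N[p ln p + (1-p) ln(1-p)]. That expression is the probability-weighted average of the conditional fixation and deletion times, so it is not necessarily the same curve as the one plotted in the figure.

```python
import math
import random

def time_to_stasis(bias, n_flips, rng, max_generations=100_000):
    """Number of generations until the bias is absorbed at 0.0 or 1.0."""
    generations = 0
    while 0.0 < bias < 1.0 and generations < max_generations:
        heads = sum(1 for _ in range(n_flips) if rng.random() < bias)
        bias = heads / n_flips
        generations += 1
    return generations

def mean_time_to_stasis(bias, n_flips, n_runs, rng):
    """Average stasis time over independent runs from the same initial bias."""
    return sum(time_to_stasis(bias, n_flips, rng) for _ in range(n_runs)) / n_runs

def diffusion_estimate(bias, n_flips):
    """Diffusion approximation to the mean absorption time (0 < bias < 1)."""
    return -2.0 * n_flips * (bias * math.log(bias)
                             + (1.0 - bias) * math.log(1.0 - bias))

rng = random.Random(1)
n_flips, n_runs = 50, 200
means = {p0: mean_time_to_stasis(p0, n_flips, n_runs, rng)
         for p0 in (0.1, 0.5, 0.9)}
for p0, simulated in means.items():
    print(p0, round(simulated, 1), round(diffusion_estimate(p0, n_flips), 1))
```

Under these assumptions the Fair Coin start should give the largest average, with the strongly biased starts reaching stasis sooner.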

The Biased Coin Process represents an IID sampling process with no memory of previous flips, reaching stasis when Pr[H

Like the Alternating Process, the Golden Mean Process has two causal states. However, the transitions from state

To compare structural drift behaviors, consider also the Even Process. Similar in form to the Golden Mean Process, the Even Process produces populations in which blocks of consecutive

The Even and Biased Coin Processes become the Fixed Coin Process at stasis, while the Golden Mean Process becomes the Alternating Process. Note that the definition of structural stasis recognizes the lack of variance in the Alternating Process subspace even though the allele probability is neither 0 nor 1.
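The two stasis outcomes for the Golden Mean Process can be illustrated with a fixed-structure sketch. Several assumptions are made here and should be read as such: the sketch uses the common convention in which the process never emits two consecutive 0s, it holds the two-state structure fixed and re-estimates only the single transition probability p (so it cannot show jumps between subspaces), and all names, the sample length, and the seed are hypothetical.

```python
import random

def generate_golden_mean(p, length, rng):
    """Two-state generator. In state A: emit 1 with probability p and stay,
    or emit 0 and move to state B. In state B: emit 1 and return to A."""
    sample, in_a = [], True
    for _ in range(length):
        if in_a:
            if rng.random() < p:
                sample.append(1)
            else:
                sample.append(0)
                in_a = False
        else:
            sample.append(1)
            in_a = True
    return sample

def estimate_p(sample):
    """Re-estimate p by replaying the sample through the fixed structure:
    the fraction of visits to state A that emitted a 1."""
    visits = ones = 0
    in_a = True
    for symbol in sample:
        if in_a:
            visits += 1
            if symbol == 1:
                ones += 1
            else:
                in_a = False
        else:
            in_a = True
    return ones / visits

def drift_golden_mean(p, length, rng, max_generations=100_000):
    """Alternate generation and estimation until p reaches 0.0 or 1.0."""
    generations = 0
    while 0.0 < p < 1.0 and generations < max_generations:
        p = estimate_p(generate_golden_mean(p, length, rng))
        generations += 1
    return p, generations

rng = random.Random(2)
final_p, generations = drift_golden_mean(0.5, 200, rng)
final_sample = generate_golden_mean(final_p, 8, rng)
print(final_p, generations, final_sample)
```

In this sketch, p = 0 yields the strictly alternating sequence even though the symbol frequency is one half, while p = 1 yields a single repeated symbol, matching the two stasis outcomes described above.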

It should be noted that the memoryful Golden Mean and Even Processes reach stasis markedly faster than the memoryless Biased Coin. While

To illustrate the richness of structural drift and to understand how it affects average time to stasis, we examine the complexity-entropy (CE) diagram

Two such CE diagrams are shown in

What emerges from these diagrams is a broader view of how population structure drifts in process space. Roughly, the

We refer to these curves as

Before describing the diversity seen in the CE diagram of

Alternating Process and Fixed Coin pathways are clearly visible in the left panel where the Golden Mean subspace exists on the upper curve and the Biased Coin subspace exists on the line

A

More formally, the time to stasis

Since the AP pathway visits only one subspace, the bottom panel shows the stasis time of the FC pathway as the weighted sum of the Golden Mean (GM) and Biased Coin (BC) subspace times:

These expressions emphasize the dependence of stasis time on the transition parameters at jump points as well as on the architecture of isostructural subspaces in drift process space. For example, if the GM jumps to the BC subspace at

Inference of

Instead of inferring and re-inferring an

To capture structural loss, we monitor near-zero transition probabilities where an

Having explained how the pseudo-drift algorithm introduces structural innovation and loss we can now describe the drift runs of

By way of closing this first discussion of structural drift, it should be emphasized that none of the preceding phenomena occur in the limit of infinite populations or infinite sample size. The variance due to finite sampling drives sequential learning, the diffusion through process space, and the jumps between isostructural subspaces.

Much of the previous discussion focused on structural drift as a kind of stochastic process, with examples and behaviors selected to emphasize the role of structure. Although there was a certain terminological bias toward neutral evolution theory since the latter provides an entree to analyzing how structural drift works, our presentation was intentionally general. Motivated by a variety of potential applications and extensions, we describe these now and close with several summary remarks on structural drift itself.

Let's return to draw parallels with the opening example of the game of

By way of contrast, structural drift captures the language-centric notion of dynamically changing semantics and demonstrates how behavior is driven by finite-sample fluctuations within a semantically organized subspace. The symbols and words in the generated strings have a semantics given by the structure of a subspace's

In the drift behaviors explored above, the

Extending these observations, the Iterated Learning Model (ILM) of language evolution

ILM incorporates the sequential learning and propagation of error we discuss here and provides valuable insight into the effects of error and cultural mutations on the evolution of language for the “human niche”. There are various simulation approaches to ILM with both single and multiple agents based on, for example, neural networks and Bayesian inference, as well as experiments with human subjects. We suggest that structural drift could also serve as the basis for single-agent ILM experiments, as found in Swarup et al.

Beyond applications to knowledge transmission via serial communication channels, structural drift gives an alternative view of drift processes in population genetics. In light of new kinds of evolutionary behavior, it reframes the original questions about underlying mechanisms and extends their scope to phenomena that exhibit memory in the sampling process or that derive from structure in populations. Examples of the latter include niche construction

An intriguing parallel exists between structural drift and the longstanding question about the origins of

Epochal evolution, though, presented an alternative to the view of metastability posed by Fisher's model and Wright's adaptive landscapes

Given an adaptive system which learns structure by sampling its past organization, structural drift theory implies that its evolutionary dynamics are inevitably described by punctuated equilibria. Diffusion in an isostructural subspace corresponds to a period of structured equilibrium in a subbasin and subspace jumps correspond to rapid innovation or loss of organization during the transit of a portal. In this way, structural drift establishes a connection between evolutionary innovation and structural change, identifying the conditions for creation or loss of organization. Extending structural drift to include mutation and selection will provide a theoretical framework for epochal evolution using any number of structural constraints in a population.

We focused primarily on the drift of sequentially ordered populations in which the generator (an

Though they have not tracked the structural complexity embedded in populations as we have done here, a number of investigations consider various classes of structured populations. For example, the evolutionary dynamics of structured populations have been studied using undirected graphs to represent correlations between individuals. Edge weights

By studying fixation and selection behavior on different types of graphs, Lieberman et al. found that graph structures can sometimes amplify or suppress the effects of selection, even guaranteeing the fixation of advantageous mutations

Graph evolution is a model of population structure complementary to that presented by structural drift. In the latter,

The Fisher-Wright model of genetic drift can be viewed as a random walk of coin biases, a stochastic process that describes generational change in allele frequencies based on a strong statistical assumption: the sampling process is memoryless. Here, we developed a generalized structural drift model that adds memory to the process and examined the consequences of such population sampling memory.

Memoryful sampling is a substantial departure from modeling evolutionary processes with unordered populations. We do not view structural drift as a replacement for the well-understood theory of genetic drift, which is a special case of structurally drifting populations. Instead, we propose that structural drift be seen as a new avenue for theoretical invention. Given its additional ties to language and cultural evolution, we believe it will also provide a novel perspective on evolution in nonbiological domains.

The representation selected for the population sampling mechanism was the class of probabilistic finite-state hidden Markov models called

We revisited Kimura and Ohta's early results on the time to fixation of drifting alleles and showed that the generalized structural drift process reproduces these well-known results when it stays within the memoryless sampling process subspace. Starting with structured populations outside of that subspace led the sampling process to exhibit memory effects, including structural innovation and loss, complex transients, and greatly reduced stasis times.

Simulations demonstrated how an

Drift processes with memory generally describe the evolution of structured populations without mutation or selection. Nonetheless, we showed that structure leads to substantially shorter stasis times. This was seen in drifts starting with the Biased Coin and Golden Mean Processes, where the Golden Mean jumps into the Biased Coin subspace close to an absorbing state. This suggests that even without selection, population structure and sampling memory matter in evolutionary dynamics. The temporal or spatial memory captured by the

We demonstrated how structural drift (diffusion, structural innovation, and structural loss) is controlled by the architecture of connected isostructural subspaces. Many questions remain about these subspaces. What is the degree of subspace-jump irreversibility? Can we predict the likelihood of these jumps? What does the phase portrait of a drift process look like? To better understand structural drift, we need to analyze the high-level organization of generalized drift process space.

Fortunately,