Anomalous diffusion analysis of semantic evolution in major Indo-European languages

How do words change their meaning? Although semantic evolution is driven by a variety of distinct factors, including linguistic, societal, and technological ones, we find that there is one law that holds universally across five major Indo-European languages: that semantic evolution is subdiffusive. Using an automated pipeline of diachronic distributional semantic embedding that controls for underlying symmetries, we show that words follow stochastic trajectories in meaning space with an anomalous diffusion exponent α = 0.45 ± 0.05 across languages, in contrast with diffusing particles that follow α = 1. Randomization methods indicate that preserving temporal correlations in semantic change directions is necessary to recover strongly subdiffusive behavior; however, correlations in change sizes play an important role too. We furthermore show that strong subdiffusion is a robust phenomenon under a wide variety of choices in data analysis and interpretation, such as the choice of fitting an ensemble average of displacements or averaging best-fit exponents of individual word trajectories.


Introduction
Cumulative cultural evolution at enormous scale and speed makes us a strikingly different species from the rest of the living world [1]. The ability to accumulate techniques and solutions one bit at a time, aggregating them across time and space, has given us unprecedented power, which we use and abuse to shape the world. But how does this process of human cultural evolution unfold? Are there universal patterns that hold across cultures and eras? One human activity that allows us to zoom into the cumulative patterns of human thought is language production. We all comprehend and produce language, all the time, keeping the cogwheels of language evolution moving. How do these cogwheels move? Language has a unique property: it is made of atomic elements, morphemes, minimal units that possess "meaning", whose definition is well agreed upon. Much of the information in any intricate pattern of text is concentrated at the scale of selecting and combining these atoms in a combinatorial manner, just like much of the information that determines the bauplan of a biological organism is concentrated at the level of genes. Studying the dynamics of these elementary units, or a subset of them (i.e., units that determine content and not grammatical function), in a large-scale, data-driven way might shed light on basic universal mechanisms of cultural evolutionary processes. Motivated by this line of reasoning, recent efforts have focused on uncovering universal (i.e., language-, time-, and genre-independent) dynamical rules that govern frequency changes of (variants of) words or of ordered combinations of them, called n-grams [2]. This effort has provided us with a handful of remarkable discoveries, including peaks or valleys of individual terms mirroring specific societal-political processes (e.g., censorship, propaganda, ideology shifts, cultural-technological drift, natural or social catastrophes), aggregate behavior of groups of terms (e.g., the decay rate of fame of a cohort of people across historical time) [3], competition dynamics among linguistic variants (e.g., [4]) or synonyms, and even a marked difference between the temporal correlation patterns of word frequencies referring to natural versus social processes [5].
Language provides us with a unique opportunity to peer into abstract cultural processes in another way: meaning can be anchored by asking human subjects to report about it. This makes it possible to automate the process of meaning extraction (to some approximation) by relating algorithmic outcomes to those of large-scale psycholinguistic experiments, as well as other natural language processing tasks [6,7]. In particular, one data-driven technique for estimating semantic similarity of words has been dominant over the past decades: distributional semantics. It builds on the so-called distributional hypothesis, paraphrased as "a word is characterized by the company it keeps" [8]: similarity in meaning can be approximated by comparing neighborhoods of words over large corpora [9]. Many flavors of the distributional hypothesis have been formalized with classical methods [10,11], yet the advent of large-scale estimation of semantic similarity came with the "machine learning revolution" [6]. Among these techniques, a successful and computationally efficient variant is the Word2vec embedding algorithm [12,13]. Word2vec estimates semantic similarity based on sampling co-occurrences of words. It implements dimension reduction over the set of pairwise semantic similarities to embed words in a relatively low-dimensional space (e.g., tens of thousands of words in a few-hundred-dimensional Euclidean space) [14]. With a corpus of time-labelled co-occurrences in hand, one can, in principle, track changes of these semantic similarities in an automated way by comparing embeddings corresponding to different times. Indeed, a first endeavor utilizing this approach identified two novel statistical patterns governing the semantic change of words: the law of conformity, stating that words with lower frequency change their meaning faster, and the law of innovation, finding that more polysemous words also tend to exhibit a higher rate of semantic change [15]. Although the authors applied multiple word embedding variants, the validity of these results as mirroring social-technological-linguistic effects, as opposed to being mathematical artefacts, is still debated [16]. This is because, although state-of-the-art word embedding methods match reported semantic similarities across experiments, languages, and training corpora, they produce systematic biases over non-semantic features. One particular bias is due to the power-law distribution of word occurrences, known as Zipf's law [17,18], resulting in embeddings where low-frequency words tend to appear close to the center of the embedding whereas high-frequency words are pushed to the periphery. Such implicit biases point at the necessity of carefully comparing obtained results with those based on various randomized replicas of the dataset, systematically removing statistical dependencies until the phenomenon at hand is no longer apparent. Furthermore, with diachronic embeddings, three additional issues arise. First, embedding dimensions are arbitrary. There is no guarantee that dimensions match across subsequent timesteps. Second, Word2vec is a sampling-based method and is therefore non-deterministic. Third, there are underlying symmetries along which embeddings are degenerate: namely, a set of transformations that change embeddings (and the embeddings of context words) yet leave co-occurrences invariant. In order to tackle all four of the aforementioned obstacles, in this paper we develop a dynamical alignment method that is i) symmetry-agnostic, ii) averages over many runs to yield a robust estimate of embedding positions and their variances, iii) based on these principles, finds a best alignment over all timesteps, and iv) compares obtained results with those of systematically randomized versions of the input data, removing various statistical dependencies in a step-by-step manner. Current diachronic embedding methods mostly focus on point iii) [19,20,21]; here we suggest that all points above are needed for a robust estimation of semantic trajectories.
Although instantaneous statistical patterns of ensembles of words (e.g., positions, velocities) are interesting, there is a fundamental aspect in which evolutionary processes (be they cultural or biological) differ from most physical ones: historical contingency. We therefore shift focus from an ensemble-of-words to an ensemble-of-word-trajectories point of view, and ask: are there robust statistical regularities, universal across languages, that govern semantic evolution? Building on the theoretically underpinned pipeline of embedding, temporal alignment, and systematic randomization outlined above, we analyze ensembles of word trajectories using methods from non-equilibrium statistical mechanics. We focus on identifying and explaining systematic deviations from stochastic trajectories of standard diffusive particles, pointing at non-trivial long-range spatiotemporal correlations between word trajectories.

Maximally representation-agnostic temporal embedding
For constructing word embeddings, we use Word2vec's Skip-gram model with negative sampling (SGNS), which is one of the most widely used word embedding algorithms due to its computational efficiency and ability to capture semantic relationships in a simple mathematical form [12,22]. It uses a two-layer neural network to represent each word i with two D-dimensional vectors called the 'word vector' (v_i) and the 'context vector' (w_i) such that a global cost function C, depending on all word vectors and all context vectors, is (approximately) minimized. The objective of the Skip-gram model is to optimize the estimated empirical log-probabilities of word j occurring in the context of word i, using the cost function C defined as

C = −(1/L) Σ_{i=1}^{L} Σ_{j ∈ C(i)} log p̂(j|i),  with  p̂(j|i) = exp(v_i · w_j) / Σ_k exp(v_i · w_k),   (1)

where L is the length of the text and C(i) is the linguistic context of word i. Following the recommendations of [6] and [15], we set the dimension D of the embedding space to D = 300 and the context size to 2. Figure 1a visualizes a single embedding projected to 2 dimensions by t-SNE [23], a non-linear dimension reduction method.
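As a toy illustration of the objective above, the following sketch evaluates the cost with a full softmax over all context words; the actual SGNS training replaces this normalization with negative sampling for efficiency. The function name and the two-word example are illustrative, not taken from the paper.

```python
import numpy as np

def skipgram_cost(V, W, pairs):
    """Skip-gram cost with a full softmax (sketch; SGNS approximates
    the normalization term via negative sampling).

    V: (N, D) word vectors v_i; W: (N, D) context vectors w_i;
    pairs: list of (i, j) meaning word j occurred in the context of word i.
    """
    logits = V @ W.T                             # all dot products v_i . w_k
    log_Z = np.log(np.exp(logits).sum(axis=1))   # per-word normalization
    L = len(pairs)
    return -sum(logits[i, j] - log_Z[i] for i, j in pairs) / L

# Toy check: two words, two observed co-occurrences
V = np.array([[1.0, 0.0], [0.0, 1.0]])
W = np.array([[0.0, 1.0], [1.0, 0.0]])
cost = skipgram_cost(V, W, [(0, 1), (1, 0)])
```

For this toy input the cost reduces to log(1 + e) − 1, which a short calculation against the formula confirms.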
While constructing semantic trajectories of words from time-labelled co-occurrence data, one needs to account for two sources of arbitrariness in the process: stochasticity and symmetry. The first, stochasticity, comes from the nature of modern data-efficient semantic embedding methods, such as Word2vec. Since these approaches use a neural network to find the best possible embedding, sampling techniques, initial weights, and the choice of numerical optimization method (such as stochastic gradient descent) lead to non-deterministic embeddings. Second, the cost C itself exhibits multiple global minima associated with the same input data. In particular, the form of the estimated log-probabilities defined in (1) implies that any transformation that leaves all dot products v_i · w_j invariant results in the exact same cost C. The simplest example of such a transformation is a linear rescaling of word vectors v → λv and an inverse rescaling of context word vectors w → λ⁻¹w, but in principle, any invertible linear transformation of word vectors v → Rv, together with (the transpose of) its inverse applied to context vectors, w → (R⁻¹)ᵀw, leaves the dot product invariant (see Methods for details).
However, an additional constraint comes from focusing solely on words (and not contexts) when constructing temporal trajectories: the constraint that ensures that the ambient embedding space does not shrink or expand over time, formalized as

Tr V = Σ_{i=1}^{D} σ_i² = const.,   (3)

where V is the D × D empirical covariance matrix of word positions (with D = 300 being the embedding dimension) and σ_i is the standard deviation of word positions along principal dimension i. As shown in Methods, this reduces the possible transformations R to the orthogonal ones, obeying R⁻¹ = Rᵀ. Consequently, a single embedding is identified with an equivalence class containing {Rv_i, Rw_j} for any orthogonal R.
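The gauge freedom described above can be verified numerically. The following sketch (variable names are illustrative) checks that an arbitrary invertible R applied to the word vectors, paired with (R⁻¹)ᵀ applied to the context vectors, leaves every dot product v_i · w_j, and hence the cost C, unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 5
V = rng.normal(size=(N, D))                  # word vectors (one per row)
Wc = rng.normal(size=(N, D))                 # context vectors (one per row)

R = rng.normal(size=(D, D)) + 3 * np.eye(D)  # a generic invertible matrix
V2 = V @ R.T                                 # v -> R v, applied row-wise
W2 = Wc @ np.linalg.inv(R)                   # w -> (R^-1)^T w, applied row-wise

# All N x N dot products v_i . w_j are unchanged, so the cost C is too
dots_before = V @ Wc.T
dots_after = V2 @ W2.T
```

Restricting R to orthogonal matrices is the extra step needed to also preserve the word cloud size, as shown in Methods.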
We use this orthogonal freedom to define maximally representation-agnostic trajectories in two steps, as shown in Figure 1b. First, we minimize the effect of stochasticity, described below, at any single time. Without stochasticity, any embedding starting from the same co-occurrence data would belong to the same equivalence class, and therefore it would be possible to find a transformation R_j for each embedding j such that they all numerically coincide. With stochasticity, however, different embedding realizations starting from the same data land in (slightly) different equivalence classes, and as a consequence, perfect alignment among them is not possible. The best one can do is to find a transformation for each embedding such that an overall distance measure between all embeddings is minimized (see Methods for details). This allows us to "average out", i.e., minimize, the effect of stochasticity at any single timestep by taking the average of all aligned embeddings. Second, we align averaged embeddings corresponding to different times in the same way: we find an orthogonal transformation that minimizes the distance between the embeddings at subsequent times to construct word trajectories. Such word trajectories are thus maximally smoothened. Although this raises the question whether maximal smoothening washes away real phenomena, we will see that this smoothening procedure, applied to a random walk, does not change measured observables such as the anomalous diffusion exponent α of the process. Figure 1c shows the measured semantic change of three selected words within a decade, projected to 2 dimensions and visualized over a static background.

Semantic subdiffusion across languages
Starting from word trajectories, we construct the ensemble of all trajectories and ask whether such an ensemble obeys any robust statistical regularities. We focus on measuring the deviation of such trajectories from those of standard diffusing particles (i.e., a random walk), quantified by the anomalous diffusion exponent α, defined as [24,25,26,27]

⟨|∆x|²(t)⟩ ∼ t^α,

where ∆x = x(t) − x(0) is the D-dimensional displacement vector of a word at time t compared to its starting position at time t = 0. In particular, we are interested in potential deviations from various null-model trajectories that i) possess short memory, ii) are weakly interacting, iii) are driven by non-varying dynamical rules, and iv) have step sizes and waiting times between steps that are not overly heterogeneous. Such stochastic trajectories are characterized by an anomalous diffusion exponent α = 1. On the other hand, if either i), ii), iii), or iv) (or other conditions we do not discuss here) is violated, the process might belong to the anomalous diffusive regime with an observed exponent α ≠ 1 [24,25,26,27]. Figure 1d illustrates subdiffusive trajectories, corresponding to various exponents α < 1, generated by fractional Brownian motion (fBm) [28]. Note that we chose fBm for visualization purposes only; fBm generates trajectories with long-range temporal correlations, corresponding to the violation of point i) above. Actual semantic trajectories might be governed by (a mixture of) other underlying dynamical rules, as discussed below.
To measure the actual value of the anomalous diffusion exponent α, we proceed from single trajectories to an ensemble of trajectories in two alternative ways: (i) we first average |∆x|²(t) over individual trajectories to obtain ⟨|∆x|²(t)⟩ and then fit the ensemble-average anomalous diffusion exponent ⟨α⟩ based on ⟨|∆x|²(t)⟩ ∼ t^⟨α⟩; (ii) alternatively, we first fit the anomalous diffusion exponent α to single trajectories and then average over words to obtain the mean anomalous diffusion exponent ᾱ. Although individual trajectories deviate considerably from the simple scaling behavior given by |∆x|² ∼ t^α, somewhat surprisingly, their ensemble average, ⟨|∆x|²(t)⟩, follows the scaling given by t^⟨α⟩ with high accuracy. This is illustrated in Figure 2a, showing the squared displacement |∆x|²(t) of several individual English words as well as the ensemble average ⟨|∆x|²(t)⟩ of all English words.
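A minimal sketch of procedure (i), the ensemble-average fit, is given below (`fit_alpha` is an illustrative name; the log-log least-squares fit over the full time range is an assumption, chosen as the simplest standard choice). Applied to an ordinary random walk, it recovers α ≈ 1:

```python
import numpy as np

def fit_alpha(trajs):
    """Fit the ensemble-average anomalous diffusion exponent alpha from
    <|dx|^2(t)> ~ t^alpha via a log-log least-squares fit.

    trajs: array of shape (n_words, n_steps + 1, D)."""
    disp2 = ((trajs - trajs[:, :1, :]) ** 2).sum(axis=2)  # |x(t) - x(0)|^2
    msd = disp2.mean(axis=0)[1:]                          # ensemble average, t >= 1
    t = np.arange(1, len(msd) + 1)
    alpha, _ = np.polyfit(np.log(t), np.log(msd), 1)      # slope = alpha
    return alpha

# Sanity check on an ordinary random walk: alpha should be close to 1
rng = np.random.default_rng(1)
steps = rng.normal(size=(2000, 40, 3))
walk = np.concatenate([np.zeros((2000, 1, 3)), steps.cumsum(axis=1)], axis=1)
alpha_rw = fit_alpha(walk)
```

Procedure (ii) applies the same fit per trajectory before averaging the exponents; individual fits are far noisier, which is why the two estimates ⟨α⟩ and ᾱ are reported separately.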
The grey cells of Table 1 list the obtained exponents ⟨α⟩ and ᾱ for five languages: English, French, German, Italian, and Spanish. In all languages, both the ensemble-average anomalous diffusion exponent ⟨α⟩ and the mean anomalous diffusion exponent ᾱ are significantly lower than α = 1, i.e., semantic trajectories follow subdiffusion. In particular, they follow subdiffusion with an ensemble-average anomalous diffusion exponent 0.4 < ⟨α⟩ < 0.5 for all five languages. This is in strong contrast with a random walk generated using the same parameters (see Methods for details), corresponding to a fitted α ≈ 1, as shown in Figure 2b.

Comparison with randomized trajectories
Given the robust observation that ensemble-average semantic trajectories of words follow subdiffusion with ⟨α⟩ ≈ 0.4–0.5 across languages, one might ask the following two questions. (i) Is this an artifact of the diachronic alignment procedure? (ii) If not, what is behind the observed subdiffusion? In other words, what (combination of) microscopic models of stochastic collective dynamics might explain this macroscopic result? In the following, we focus on these two questions. We apply a series of randomization methods to the original trajectories, gradually removing temporal correlations in the step sizes and step directions of individual trajectories to see which of these, if any, play a role behind subdiffusion. As shown in Figure 2c and Table 1, step sizes of a trajectory are randomized in three different ways, which we call random sizes, sizes from distribution, and shuffled sizes; step directions are randomized in two ways, random directions and shuffled directions (see Methods). Together with the original trajectories, this gives four times three combinations.
Figure 2c and 2d show the average squared displacement ⟨(∆x)²⟩ of all English words, and the distribution of anomalous diffusion exponents α fitted to individual word trajectories separately, under all twelve combinations of trajectory randomization methods (including the original trajectories). The same plots for French, German, Italian, and Spanish are shown in the SI; the fitted anomalous diffusion exponents ⟨α⟩ and ᾱ are listed for all five languages in Table 1. If step sizes do not vary significantly, shuffling the step directions can alter the total displacement along a trajectory only by a small amount; thus, the last data points in the middle row (shuffled directions) must be very close to the last data points of the corresponding panels of the last row (original directions). On the other hand, shuffled directions do remove temporal correlations in step directions, making trajectories follow approximately diffusion-like behavior until the constraint on total displacement starts to affect them. This can be clearly seen in the middle row of Figure 2c; in Figure 2d, we decided not to fit exponents to individual word trajectories in the middle row (shuffled directions) to avoid systematic bias in the exponents depending on the fitting range.
We further investigate a related effect, the effect of keeping the total size of the word cloud constant, as formalized by Eq. (3). The ensembles of randomized trajectories in Figure 2c and 2d do not obey this constraint; we therefore generated a random walk, corresponding to random step directions and random step sizes, with the additional constraint of keeping the total cloud size constant. This simulated trajectory is illustrated in Figure 2b ("random walk"). While the total displacement is limited by the size of the word cloud, this only appears to affect the random walk trajectory when the displacement reaches the radius of the cloud (and then it converges to approximately √2 times the radius, corresponding to two orthogonal vectors with length equal to the radius). This, along with the middle row of Figure 2c, suggests that global constraints on total displacement do not cause the observed subdiffusive behavior.

Discussion
Cumulative culture is arguably one of the most distinctive features of human behavior. Understanding the "laws" that govern cultural evolution is a crucial step towards understanding Homo sapiens itself. Here we study cultural evolution through the evolution of the meaning of words, as formalized by the distributional hypothesis: "a word is characterized by the company it keeps" [29]. We rely on large time-labelled corpora, Google Ngram, available in several languages. Cultural evolution is notoriously difficult to study without making a large number of subjective assumptions and interpretations. One of the main contributions of this work is that it tries to make the underlying assumptions (that are in the form of mathematical formalizations and their interpretation) as explicit as possible.
Our data processing and analysis pipeline consists of three phases. First, semantic relations between all words are extracted through a state-of-the-art implementation of the distributional hypothesis: Word2vec, trained with the so-called Skip-gram with negative sampling method. This algorithm provides a high-dimensional embedding of all words such that pairwise distances reflect semantic similarity. Apart from being fast and data-efficient, it is also well anchored in human language representation through psycholinguistic studies. Second, to extract evolutionary trajectories, embeddings at different times need to be weaved together. This is a highly non-trivial process, since the mapping between embedding and co-occurrence statistics is degenerate: many embeddings are consistent with the same co-occurrence data. When constructing trajectories, we need to break this symmetry: one specific embedding from each equivalence class needs to be chosen. We jointly construct equivalence classes and choose one specific representative of each class, both informed by the dynamics. In particular, we choose trajectories to be maximally smoothened. As we show, this maximal smoothening, applied to an ensemble of diffusing particles, does not alter the anomalous diffusion exponent α of the process, suggesting that it would not alter α significantly when actual semantic trajectories are considered, either. The third phase of the process is the comparison of the ensemble of semantic trajectories of words with various randomized counterpart ensembles. The randomization methods we consider dissect various temporal correlations in semantic trajectories by randomizing step directions and step sizes separately, while still applying the same method for temporal alignment (i.e., trajectory smoothening). With all this, we seek to answer the following questions: (i) is there any robust statistical regularity regarding the ensemble of actual semantic trajectories of words? (ii) If yes, what might be the reason? What microscopic dynamical rules can explain the observed macroscopic (ensemble-level) statistical regularities? This work provides an answer to the first question: semantic trajectories are very different from an ordinary random walk (diffusion); semantic trajectories are strongly subdiffusive. In particular, actual semantic trajectories follow subdiffusion with an anomalous diffusion exponent α ≈ 0.4 to 0.5, in strong contrast with a random walk belonging to the α = 1 class.
We point out here that short-range temporal correlations within trajectories, moderately inhomogeneous step sizes, and weak correlations between trajectories do not cause the resulting trajectories to deviate from α = 1. Instead, subdiffusion can be explained by qualitatively different microscopic dynamical rules. These include (i) long temporal correlations within trajectories, (ii) extremely inhomogeneous step size distributions, (iii) stochastic dynamics of "jamming" (overly densely packed) particles, (iv) average step sizes changing over time (corresponding to a changing diffusion coefficient), (v) diffusion in disordered media, and many others. Although investigating possible combinations of these microscopic dynamics as explanations of semantic subdiffusion is a subject of future work, we can at least exclude some of them based on our results.
The step size distribution ("Brownian vs. Lévy flight") and even correlations in step sizes do not seem to contribute to subdiffusion at all. Correlations in step directions explain some, but far from all, of the effect: ensembles of trajectories where step sizes are randomized but step directions are kept still exhibit α ≈ 0.8–0.9, in contrast with actual trajectories, which follow α ≈ 0.4–0.5 (and which include correlations both among step sizes and among step directions, and possibly cross-correlations between these categories too).
Subdiffusive behavior has been observed in various within-cell processes, such as the stochastic trajectory of messenger RNA inside living E. coli cells, α ≈ 0.7 [30], channel proteins in the membranes of living cells, α ≈ 0.9 [31], and telomeres within eukaryotic cell nuclei, α ≈ 0.3 [32]. As in the biological examples above, the subdiffusive behavior of semantic change raises questions both at a mechanistic, proximal level (what microscopic dynamical rules underlie subdiffusion?) and at an evolutionary, distal level (is subdiffusion adaptive, or is it a consequence of physical-informational constraints? If it is adaptive, what is it for, and what selection pressures led to its emergence?). Although answering these questions is outside the scope of the current paper, we hope that our results stimulate extensive future discussions with a strong interdisciplinary focus along the lines of the questions listed above.

Diachronic embedding
We first subsample the co-occurrences corresponding to each time window to eliminate systematic sample size bias (the amount of data included in the Google Ngram database steadily increases with time in all five languages). We set this sample size to N_c = 10⁷ co-occurrences, set by the first (few) time windows with the least amount of data. We observe that N_c = 10⁷ sets a good tradeoff between sample size (and thus trajectory noise), vocabulary size, and trajectory length for this database.
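The subsampling step can be sketched as follows (function and variable names are illustrative; the paper's pipeline operates on Google Ngram co-occurrence counts, here replaced by a toy list):

```python
import random

def subsample_cooccurrences(cooc, n_c, seed=0):
    """Downsample a collection of (word, context) co-occurrence tokens to a
    fixed budget n_c (the paper uses N_c = 10**7), removing the systematic
    bias caused by corpus size growing over time."""
    cooc = list(cooc)
    if len(cooc) <= n_c:
        return cooc
    return random.Random(seed).sample(cooc, n_c)

# Windows with more data than the budget are cut down; small ones kept whole
window = [("cat", "sat")] * 150 + [("sat", "mat")] * 50
sample = subsample_cooccurrences(window, n_c=100)
```

Fixing the seed makes the subsampling reproducible across runs; the within-window averaging over M embeddings then absorbs the remaining sampling noise.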
After generating the D = 300 dimensional embeddings for each time window M = 80 times by Word2vec with Skip-gram, we first align all M embeddings within a time window. Then, word positions across embeddings are averaged to obtain a more robust estimate of the semantic position of each word at a given time window. These average embeddings, one for each time window, are then aligned across time, as shown in Figure 3h.
The alignment of two embeddings consists of two steps. First, one has to define the class of transformations that keep semantic relationships invariant, and second, the transformation within this class that minimizes the difference between the two embeddings has to be selected. For the first step, we find (see below for a proof) that the requirement of keeping the total size of the word cloud constant, as defined by Eq. (3), restricts all linear transformations to those that are orthogonal. This is in line both with measuring semantic distance as Euclidean distance, as discussed above, and with the most commonly used alignment method, the orthogonal Procrustes method, in which the best transformation R accounting for this rotational (orthogonal) freedom is given by

R = arg min_{R: RᵀR = 1} ||W R − W′||_F,

where the rows of the N × D matrices W and W′ contain the word vectors of the aligned and reference embedding, respectively, and || · ||_F is the Frobenius matrix norm. We use the analytical solution [36] for R to find optimal alignments both within a time window and among time windows.
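The SVD-based analytical solution of the orthogonal Procrustes problem can be sketched as follows (function and variable names are illustrative); as a check, it exactly recovers a hidden rotation applied to a reference embedding:

```python
import numpy as np

def procrustes_align(W, W_ref):
    """Orthogonal Procrustes: the R minimizing ||W R - W_ref||_F over
    orthogonal R is R = U V^T, where U S V^T is the SVD of W^T W_ref."""
    U, _, Vt = np.linalg.svd(W.T @ W_ref)
    return U @ Vt

rng = np.random.default_rng(2)
W_ref = rng.normal(size=(100, 10))            # reference embedding (rows = words)
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))  # a hidden random rotation
W = W_ref @ Q.T                                # rotated copy of the reference
R = procrustes_align(W, W_ref)                 # recovers the rotation
```

With noisy embeddings the recovery is no longer exact, but the same solution still yields the orthogonal transformation with the minimal Frobenius residual, which is what the within-window and across-window alignments use.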

Constant word cloud size defines an orthogonal embedding symmetry
Word cloud size, defined by Eq. (3), is written as

Tr V(W) = Σ_{j=1}^{D} Var_i(W_ij),

where V(W) is the empirical covariance matrix of word positions, extracted from W, the N × D matrix with its rows corresponding to the D-dimensional word vectors of all N words, and Var_i(W_ij) is the variance of the word positions along dimension j. Using the definition of variance,

Tr V(W) = (1/N) Σ_{i=1}^{N} Σ_{j=1}^{D} (W_ij − ⟨W⟩_j)²,

where ⟨W⟩_j denotes the mean word position along dimension j. The size of the transformed word cloud W′, defined as W′ = W Rᵀ (i.e., each word vector is transformed as v → Rv), is

Tr V(W′) = Tr (R V(W) Rᵀ) = Tr (V(W) Rᵀ R).

The difference between Tr V(W) and Tr V(W′) is

Tr V(W) − Tr V(W′) = Tr (V(W) (1 − Rᵀ R)).

The transformation R satisfies the requirement of constant cloud size, Tr V(W) − Tr V(W′) = 0 for every word cloud W, only if Rᵀ R = 1, also written in matrix form as R Rᵀ = 1, i.e., R needs to be orthogonal.
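This statement is easy to probe numerically: an orthogonal transformation leaves the cloud size Tr V(W) unchanged, while a generic invertible transformation does not (a sketch with illustrative names):

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 200, 6
W = rng.normal(size=(N, D))                   # rows = word vectors

def cloud_size(X):
    """Tr V(X): trace of the empirical covariance of word positions."""
    return np.trace(np.cov(X, rowvar=False))

Q, _ = np.linalg.qr(rng.normal(size=(D, D)))  # a random orthogonal R
A = rng.normal(size=(D, D)) + 2 * np.eye(D)   # a generic invertible R

size_orig = cloud_size(W)
size_orth = cloud_size(W @ Q.T)               # v -> Q v preserves Tr V
size_generic = cloud_size(W @ A.T)            # v -> A v generally does not
```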

Trajectory randomization methods
Figures 3i and 3j illustrate the two types of trajectory randomization methods we use in this paper to generate the results shown in Figure 2c,d and Table 1: randomization of step sizes and randomization of step directions. Within each type, "randomization" refers to drawing step sizes and directions from various distributions constructed from the original trajectories, described as follows.
Random sizes: step sizes were sampled from a normal distribution with mean and standard deviation matching those of the step sizes of the original trajectory. Sizes from distribution: step sizes were sampled from the set of all step sizes corresponding to all words. Shuffled sizes: step sizes were sampled from the set of step sizes corresponding to the same word; in other words, the temporal order of the step sizes of a trajectory has been shuffled. Original sizes: the step size for every word at every time step was set to its original value.
Random directions: directions were sampled uniformly over the D-dimensional sphere. Shuffled directions: the temporal order of the step directions of a trajectory has been shuffled. Original directions: the direction for every word at every time step was set to its original value.
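Two of the randomization methods above can be sketched as follows (illustrative function names; the remaining variants are analogous). Each trajectory is decomposed into step sizes and unit step directions, one of the two is randomized, and the trajectory is rebuilt:

```python
import numpy as np

def decompose(traj):
    """Split a trajectory of shape (T+1, D) into step sizes and unit directions."""
    steps = np.diff(traj, axis=0)
    sizes = np.linalg.norm(steps, axis=1)
    dirs = steps / sizes[:, None]
    return sizes, dirs

def rebuild(start, sizes, dirs):
    """Reassemble a trajectory from a start point, step sizes, and directions."""
    steps = sizes[:, None] * dirs
    return start + np.concatenate([np.zeros((1, len(start))), steps.cumsum(axis=0)])

def shuffled_sizes(traj, rng):
    """'Shuffled sizes': permute the temporal order of a word's own step
    sizes while keeping its original step directions."""
    sizes, dirs = decompose(traj)
    return rebuild(traj[0], rng.permutation(sizes), dirs)

def random_directions(traj, rng):
    """'Random directions': keep the original step sizes, draw directions
    uniformly on the D-dimensional unit sphere (normalized Gaussians)."""
    sizes, _ = decompose(traj)
    g = rng.normal(size=(len(sizes), traj.shape[1]))
    dirs = g / np.linalg.norm(g, axis=1)[:, None]
    return rebuild(traj[0], sizes, dirs)

# Round trip and invariants on a toy 3D trajectory
rng = np.random.default_rng(0)
traj = rng.normal(size=(11, 3)).cumsum(axis=0)
sizes, dirs = decompose(traj)
t_shuf = shuffled_sizes(traj, np.random.default_rng(1))
t_rand = random_directions(traj, np.random.default_rng(2))
```

Each randomized ensemble is then passed through the same temporal alignment (smoothening) as the original trajectories, so that differences in the fitted exponents cannot be attributed to the alignment itself.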

Figure 1 :
Figure 1: (a) 2D projection of a high-dimensional word embedding, illustrating semantic similarity. 300D embeddings of lemmas in the Google Ngram English fiction database are generated by the Word2vec Skip-gram model and are nonlinearly projected to 2D using t-SNE [23]. (b) Alignment of embeddings. First, multiple embeddings of the same year are aligned and averaged to reduce embedding noise; then, these averaged embeddings are aligned across time to achieve maximally smoothened word trajectories. (c) The resulting diachronic embedding makes it possible to visualize semantic change. Three selected words (gay, cancel, outlet) with the most change within a decade are shown over a static background. (d) Illustration of subdiffusive trajectories generated by fractional Brownian motion. Inset: the mean squared distance scales as ⟨(∆x)²⟩ ∼ t^α with anomalous diffusion exponent α.

Figure 2 :
Figure 2c and 2d show the average squared displacement ⟨(∆x)²⟩ of all English words, and the distribution of anomalous diffusion exponents α fitted to individual word trajectories separately, under all twelve combinations of trajectory randomization methods (including the original trajectories). The same plots for French, German, Italian, and Spanish are shown in the SI; the fitted anomalous diffusion exponents ⟨α⟩ and ᾱ are listed for all five languages in Table 1. The top left panel (random step sizes, random step directions) corresponds to a random walk; the bottom right panel (original step sizes, original step directions) corresponds to the original, non-randomized semantic trajectories. Comparing the results of the original trajectories with the randomized trajectories (Figure 2c) conveys three important messages: (i) temporal alignment, applied to an ensemble of uncorrelated trajectories (top left panel), results in ⟨α⟩ ≈ 1, equivalent to that of uncorrelated trajectories without alignment. (ii) Both correlations in step sizes and in step directions are important factors behind subdiffusion. Neither of them alone can produce trajectories with an ⟨α⟩ lower than 0.8, yet the two combined give ⟨α⟩ ≈ 0.5. (iii) If step sizes do not vary significantly, shuffling the step directions can alter the total displacement along a trajectory only by a small amount.

Figure 3 :
Figure 3: Key steps of the pipeline that creates diachronic embeddings from raw Google Ngram data. (a) Lemmatization of words. (b) Filtering stopwords. (c) Counting co-occurrences with a context window of length 2. (d) Smoothening co-occurrences by applying a sliding time window to co-occurrence counts. (e) The vocabulary, i.e., the final set of lemmas we include in our analysis, is defined by the lemmas that occur at least 100 times in all time windows. (f) The training dataset was created by subsampling all co-occurrences in order to have a constant number of co-occurrences N_c = 10⁷ at each time window. (g) Word embedding by Word2vec with Skip-gram. (h) Alignment of different embeddings (both within and across years). (i) Randomization of the temporal order of the step sizes of each word trajectory separately. (j) Randomization of the temporal order of the step directions of each word trajectory separately.

Table 1 :
Ensemble-average anomalous diffusion exponent ⟨α⟩ and the average ᾱ of the anomalous diffusion exponents of all words, under various combinations of trajectory randomization methods, for all five languages. Grey cells show the exponents of the original, non-randomized trajectories. Note that the exponents for English are extracted from Figures 2c and 2d; for the other four languages, the analogous figures are included in the SI.