Skip to main content
Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Semantics of European poetry is shaped by conservative forces: The relationship between poetic meter and meaning in accentual-syllabic verse


Recent advances in cultural analytics and large-scale computational studies of art, literature and film often show that long-term change in the features of artistic works happens gradually. These findings suggest that conservative forces that shape creative domains might be underestimated. To this end, we provide the first large-scale formal evidence of the association between poetic meter and semantics in 18-19th century European literatures, using Czech, German and Russian collections with additional data from English poetry and early modern Dutch songs. Our study traces this association through a series of unsupervised classifications using the abstracted semantic features of poems that are inferred for individual texts with the aid of topic modeling. Topics alone enable recognition of the meters in each observed language, as may be seen from the same-meter samples clustering together (median Adjusted Rand Index between 0.48 and 1 across traditions). In addition, this study shows that the strength of the association between form and meaning tends to decrease over time. This may reflect a shift in aesthetic conventions between the 18th and 19th centuries as individual innovation was increasingly favored in literature. Despite this decline, it remains possible to recognize semantics of the meters from past or future, which suggests the continuity in meter-meaning relationships while also revealing the historical variability of conditions across languages. This paper argues that distinct metrical forms, which are often copied in a language over centuries, also maintain long-term semantic inertia in poetry. Our findings highlight the role of the formal features of cultural items in influencing the pace and shape of cultural evolution.


Recent advances in cultural analytics [1] and large-scale computational studies of creative domains such as art, literature and film provide increasing evidence that change in features of artistic works happen gradually over extended periods of time. This can be seen in lexical choices in fiction [2, 3], writing styles [4, 5], the shortening of cinematic shot lengths [6], the long-term recognizability of literary genres [7] and aesthetic choices in poetry [8]. This picture is puzzling because it suggests continuity and slow global processes in traditions that scholars have often perceived as volatile fields of innovation, competition and constant conflict of elites [9, 10]. The contrary evidence for a punctuated equilibrium pattern of cultural evolution [11] or a cyclical turn-around of styles [1214] is also abundant, but a question remains: Are we underestimating the continuity and conservative forces at work in cultural traditions associated with creative freedom? This study asks this question about a practice which tends to be imagined as extremely individualistic: the composition of poetry. We apply a data-driven semantic analysis to the formal characteristics of texts across several languages. Our goal is to address one of the fundamental issues in versification studies: the relationship between poetic form and its meaning.

Modern poetry is often seen as a space of boundless innovation and individualized self-expression. However, there is at least one aspect of poetry that is defined by conservative persistence over centuries and millennia: poetic meter (Table 1). Meters are rarely invented individually; rather they are stable prosodic patterns that usually arrive in the hands of a poet after having long histories of use in local and global traditions. This persistence of formal features in poetry invites us to shift our focus from individuals to meters as cultural items that participate in long transmission chains [15] that diffuse them far and wide.

Table 1. Examples of accentual-syllabic metrical types and labeling strategies.

S—denotes a strong position in the foot (stress expected), W—weak.

Research into historical poetics and metrics strongly suggests that meter is not agnostic to meaning [1619]. Usage of meters is affected by what was previously written in these meters and by additional signals that poetic forms accumulate and carry over time. In oral traditions, a difference in meter was often functional: it supported genre diversification between the poles of epic and lyric poetry [20]. The rise of written and printed media allowed more diverse poetic metrical forms and expanded their sources. New forms could emerge from the standardization and restructuring of cultural borrowings: foreign traditions, classical Latin and Greek examples, adaptations of local oral versification systems. Through their usage, mnemonic capacities and generic conventions, metrical forms allegedly maintained fuzzy but distinct traditions of usage that were reproduced and updated by generations of poets. This theory of a relationship between a meter and its meaning is known as the “semantic halo of meter”.

In this study, we aim to computationally test the presence of the semantic halo across several modern European poetic traditions (18–20th c.). Most of the evidence for a semantic halo comes from sporadic informal studies of a few European (mostly Slavic) literatures [17, 1922]. Several recent attempts to formalize this concept have shown lexical differences between metrical forms in individual traditions [23, 24] while also broadly confirming the presence of the semantic halo in 18-to-20th century Russian verse [25]. This growing body of evidence suggests that the mechanism which binds form and meaning is widespread and could potentially be universal. However, a reliance on close reading, a lack of formalization and the sporadic nature of the research to date have limited scholars’ ability to generalize about the semantic halo, its nature and historical dynamics.

The current study relies on several assumptions about the nature of poetic form and its relationship to verse semantics. First, we assume that poetry is a socially learned practice that is subject to cultural evolutionary forces [2628]. Poetic formal features such as meter and rhyme are reproduced and developed through a copying process that may be influenced by various factors, or biases. For example, acquisition and popularity of meters might depend on their intrinsic features, or fit to a language prosody (stress-timed languages often adopt accentual-syllabic meters). It also might be driven by cultural prestige, like medieval and early modern Latin hexameter: despite not perceiving any distinction between long and short vowels anymore, scholars and students continued to write “correct” quantitative hexameters for centuries, relying solely on memorized rules [20].

Second, we assume that the meter and meaning association is mainly dependent on its historical environment—the ways it was copied and transmitted before. Despite increasing evidence for the role of iconicity and non-arbitrariness of form-meaning association in natural languages [29], available research on meters and their modern histories do not suggest that the relationships between poetic meter and meaning are fundamentally iconic (but these claims do exist [22]). Usage of a meter may differ across closely-related languages and cultures, can shift over time, be intentionally subverted and influenced by single individuals. It is also hard to defend any perceptual or motor foundation for the analogy between poetic form and meaning: even in special cases like work songs or sea shanties where rhythm organizes movement and action, the iconic connection is firstly established between form and function, not between form and semantics. If an alien would hear a recording of a sea shanty that did not use any maritime themes (which is common), liked it and started imitating this particular recording, the history of this form in the alien’s own tradition may very well lack any mentions of, or associations with the sea.

The halo effect, however, does resemble systematicity in languages. Systematicity describes non-random differences in word forms across abstract categories, like word classes or parts of speech [30, 31]: form regularities here provide cues to facilitate class distinction and learning. With poetic meters the relationship might be seen as reversed: many individual poems exhibit semantic regularities that map onto a limited amount of most frequent prosodic forms. Similarly to the documented emergence of systematicity in iterative learning experiments [32, 33], it is plausible that the distinct metrical patterns emerged in oral traditions to maximize semantic and structural differences in response to cognitive and environmental factors (e.g. aiding the memorization and recreation of different texts [34]). Treating meter-meaning relationship as systematic might provide support for claims of non-arbitrariness cases in semantic halo, like association between length of poetic line and mood, widespread in Europe (longer meters tend to be serious and convey importance, while shorter meters associate with lighter themes, songs and jokes) [17].

Given the lack of more detailed research in the domain, we look at the semantic halo in modern traditions as dependent on the power of structural differences and exposure to previous examples. In theory, nothing should stop a poet from choosing any theme when employing meter X, but in practice, poets often use meters somewhat similarly to how they were used in the past. The theory of semantic halo implies that meaning is copied preferentially within metrical forms, not across them. This might be partly explained by systematicity, since modern poetry inherited forms that were made different before: they did sound different, emerged from various sources [20], cast different systematic constrains on morphology and syntax [35] that guided the reproduction of syntactic formulas and provided mnemonic cues to a reader or a poet. Even under highly individualistic aesthetics, it would be hard to overcome the power of poetic meters and use them completely free. Indeed, to be “free” of a meter is to get rid of meters altogether, which is what happened in many Western traditions during the course of the 20th century.

Social transmission factors then might be seen as responsible for introducing variation to the grip that meters have on poets. For example, we might assume that at the population level poets are driven by a conformity bias and copy the most frequent examples of meter usage. Normative expectations were frequently taught explicitly, indicating appropriate meters for the occasion (like alexandrine verse for classic French tragedy, or iambic tetrameter for Russian or German ode). However, the pressure for conformity is unlikely to be constant across time: traditional scholarly narrative [36] suggests it is high under neo-classical aesthetics (that say: mirror the reality via certain rules of truth and beauty), and lower under romantic and nationalistic ideas (that say: a poem reflects a poet themselves, while also reflecting a little of a national spirit). This makes us expect a decrease in strength of semantic halo over time in European poetry. At the same time we don’t expect individualistic poetics to completely diffuse the halo effect: the structural difference in meters would continue to maintain distinction in the semantics.

Given the main assumptions, we argue that any poetic tradition that allows for structurally distinct poetic forms to exist over an extended time (whatever the case) should exhibit the semantic halo effect. This study, however, seeks to find the effect mainly in three closely related European accentual-syllabic (AS) traditions: Czech, German and Russian. In addition, we look for further evidence in English poetry and early modern Dutch songs. Since all traditions use the same versification system, metrical forms are inherently comparable, which makes cross-cultural study possible. Data come from corpora of different designs (S1 Appendix), and even include different textual domains (Dutch songs). The heterogeneity of sources ensures that our findings would not result from some artifact of corpus construction. We devise a language-independent methodology to represent each poem as a set of semantic features (latent topics inferred from all available texts within a corpus) and rely on unsupervised clustering to assess the strength of meter-meaning relationships.

The main question of this study is whether poems written in a particular meter also tend to employ similar topics (H1). We formalize this as clustering and classification tasks. Do poems written in one meter tend to group into coherent semantic clusters? And can a meter be recognized based on knowledge of nothing but the semantic features of corresponding poems?

These queries, in turn, raise the issue of semantic diffusion and retention. Diffusion would happen because of possible change in aesthetics and conformity pressure. We expect clustering to be less efficient in the later phases of a tradition compared to its early stages when we use the same sets of meters and similar sampling strategies (H2). This should be evident in all the corpora except for the Dutch songs: texts come from the early modern period and represent a tradition grounded in oral performance, which depends on the reproduction of existing forms and is presumably less prone to long-term innovations.

Finally, we expect that despite the diffusion of the semantic halo, poetic traditions retain a historical connection to the halo’s earlier states. If this is true, then it should be possible to recognize meters from earlier periods using models only exposed to later data and vice versa (H3).

To test these assumptions, we use the abstracted semantic features of poems (represented as topic probabilities) to recognize their metrical organization (represented as the unambiguous labels of metrical types, see Table 1). The distribution of topics or co-occurring groups of words in each poem is inferred with the aid of a generative Latent Dirichlet Allocation (LDA) model [37] trained for each lexically simplified corpus. Metrical labeling of poems combines information about the general metrical scheme used (e.g. iamb) and the particular type of this scheme (e.g. pentameter). We understand metrical types to be the main verse forms reproduced within and across traditions; they each have a distinct historical lineage that may be reconstructed to a greater or lesser extent. To take one example, English iambic pentameter, as introduced in the 14th century, may be traced to 10-syllable French isosyllabic meter, which developed from the Italian 11-syllable form. On the other hand, the so-called “ballad” meter (Iamb 4–3, “common measure”) originated from contacts between local English accentual verse and Latin vagant songs [20, 38, 39]. Where possible, the poems in our corpora were individually annotated with a single unambiguous label related to their metrical type. (The limitations of this approach are discussed in S2 Appendix).

We assume that meter & meaning association is impossible to trace at the level of a single poem since the semantic halo is not a deterministic mechanism [16] that prescribes meanings to texts. Rather this mechanism is probabilistic and observable at the level of central tendencies (which might, in turn, depend on the social transmission factors and aesthetic conventions of a group or a tradition). Given these factors and the skewed distribution of the metrical forms used in a tradition (see S3 Fig), we rely on an approximation of each meter’s semantic features based on random equal-sized samples of poems from a set of the most frequently used forms in the language.

Our analysis confirms the presence of the semantic halo of meter in all observed European traditions. In all cases except for the Dutch example, we also observe the predicted historical decline in the strength of this association. When the sample size is large enough,the cross-period classification also exceeds the random baseline in all cases. There are, however, significant differences among the corpora that may reflect different levels of exposure to change in the poetic traditions.


Meter and meaning (H1)

We approach the association between meters and topics with a clustering setup: we expect that independent samples of poems written in the same meter, as represented by vectors of topic probabilities, will tend to group together. We limit the data to meters that are sufficiently common (# of poems > 500) and then divide them into random samples of 100 poems per meter and aggregate topic probabilities within each sample. K-means clustering is then used to group the samples into k clusters, with k being set as the number of distinct meters. The resulting clusters are compared against the true groups based on the metrical types using Adjusted Rand Index. While Rand Index measures purely the similarity between two cluster groups on the scale from zero (no agreement between the clusters) to one (clusters are the same), Adjusted Rand Index is the corrected-for-chance version of this score where 0 indicates a situation when clustering is no different from a random assignment of data to clusters and 1 indicates perfect match between the two groups [40].

The procedure is repeated 10,000 times with each corpus (Fig 1A). All results point to a high level of association between metrical forms and the semantic features of corresponding poems in each of the observed corpora. This provides strong support for the presence of the semantic halo across all traditions and confirms H1. To illustrate the extent of the association between form and meaning, one such random sample from each corpus is represented by means of two-dimensional PCA biplots (Fig 1B–1F).

Fig 1. Random 100-poem samples taken without replacement per meter in vector spaces defined by LDA topic models.

(A) Adjusted Rand Index of k-means clustering (whiskers give the 5th- to 95th-percentile range). 10,000 random samplings. Crosses show the ARI of the samplings presented in PCA biplots. PCA biplots of (B) Czech (8 meters), (C) German (4 meters), (D) Russian (4 meters), (E) Dutch (4 meters) and (F) English data (2 meters) respectively with eigenvectors for the 5 most contributing topics. Single random sampling.

An additional supervised classification run corroborates the evidence with even smaller samples (Fig 2: first boxplot series).

Fig 2. Accuracy of SVM classifications.

Predicting meter with vectors defined by topic probabilities in random samples of poems (sample size ∈{1, 5}∪{10, 20, …, 100}). (1) trained and tested on the entire dataset (leave-one-out cross-validation), (2) trained on earlier data and tested on later data, (3) trained on later data and tested on earlier data. 10,000 iterations. (A) Czech, (B) German, (C) Russian, (D) Dutch, (E) English.

Diffusion over time (H2)

We expect the semantics of meter to become less recognizable over time: poems emerged within aesthetics that reportedly shifted from normative to individualistic [36] which would allow to untie meters from strict genre expectations. This hypothesis is, however, generally limited to canonical 19th-century poetry, and there is no reason to apply it to early modern popular songs (i.e. our Dutch corpus). As a tradition which relied on oral performance and the persistence of popular melodies, these songs are expected to maintain more stable generic lineages.

To test the hypothesis, we split each corpus into two subcorpora based on the date of publication of particular poems. (Both the amount of data at our disposal and the distribution of meters over time prevent us from dividing the corpora according to a more granular chronology or partitioning the English data in a similar way.) In each subcorpus, we test the strength of the association between metrical labels and sample-wide aggregated semantic features in the manner described above. To ensure comparability, we use the same set of meters and draw a fixed number of samples from the two parts of each corpus.

Based on H2, we expect to see a decrease in clustering accuracy in the later stage of the timeline when compared to the earlier part. We observe this trend in the Czech, German and Russian corpora but not in the the Dutch songs where forms and their semantics maintain similar levels of association over 200 years (Table 2). To formally address the trend in three main languages, we fit a Bayesian regression model on a subset of our clustering results allowing interaction between period and language groups. We directly estimate the difference in clustering (S6 Fig) between periods across languages using predictive draws from posterior (Czech: β = −0.08, lower 95% CI = −0.1, upper = −0.06; German: β = −0.48, lower CI = −0.52, upper = −0.44; Russian: β = −0.11, lower CI −0.15, upper = −0.06).

Table 2. Adjusted Rand Index of k-means clustering in different periods (random 100-poem samples).

10,000 iterations.

Retention of forms over time (H3)

Despite changes over time in the strength of the semantic halo, we expect to see historical continuity in the use of various forms so that we may predict the later stages of a meter’s semantics by knowing its earlier semantic features and vice versa. To test this, we perform supervised classification on each corpus: the training set is restricted to samples from one subcorpus while the test set only includes poems from the other one. The results show high variation (Fig 2) across the corpora, which may reflect differences in their sources and/or historical idiosyncrasies in the usage of meter. The Russian tradition exhibits the most stable use of the main metrical forms over the observed period; there is little difference between past and future forms. In contrast, both the German and Dutch corpora show higher semantic recognizability from the future to the past than in the opposite direction. The assymetry suggests semantic accumulation over time, a pattern where later metrical semantics “enclose” [41, 42] the early usage of a form but are already too different to be recognizable from the past. The recognizability of Czech forms stays stable at a level barely above the random baseline: it is likely connected to this tradition being in a volatile establishment state and changing its metrical preferences midway through the 19th century (S4 Fig).

Overall these results provide weak support for H3, but also highlight the variation in the histories of meter usage.


Our findings strongly support the association between poetic meter and meaning, providing evidence for the theory of the semantic halo. At least within certain accentual-syllabic systems, European poetic traditions demonstrate their use of conservative mechanisms to produce and retain meaning. Metrical forms attract arbitrary semantic features that are reproduced when the form itself is reproduced. The precise mechanisms of this attachment are yet to be understood but different dynamics may be at play. These could include forces intrinsic to the texts (e.g. the mnemonic function of a meter [34]), institutional factors (popularity, example-based teaching, the literary canon [43]) and transmission biases, like conformity and prestige.

By focusing on semantic generalization over poetic nuance, our approach is able to demonstrate the overarching tendencies that distinguish meters from one another. As Fig 2 and distinctive topics for each meter suggest, the most striking thematic distinction occurs between trochaic and iambic meters. Globally, the trochee is tied to a song genre and is associated with the themes of love, vernacular language and national romantic sentiment. This highlights the trochee’s historical roots: across Europe, folk songs were associated with the trochee or trochaic rhythm [16] and they often took form of regular trochaic meters in modern versification systems. Iambic forms, on the other hand, can be distinguished by topics that point to a learned poetic style: introspective reflections, religious and existential themes. Dutch songs demonstrate the difference most clearly: in Dutch, iambic songs cover religious topics and are the products of an educated, written tradition, while trochaic songs maintain vernacular themes (love, joy, work) that tie them directly to their origins in oral culture.

The semantic tradition that separates iambs from trochees should not, however, be seen as universal. The semantic similarity of trochee-4 and iamb-3 in German, for example, points to the similar usage of these two meters and their historical origins in the European tradition of “anacreontics” [16]: lyrical songs of feasts, wine and love that are associated with the neo-classical tradition. During the Romantic period, anacreontics easily mixed with rediscovered folk songs and vernacular language, which made iamb-3 gravitate towards the trochaic meaning space. The semantic proximity of these two forms is also evident in Russian verse [25], which partially inherited the use of this meter from German. Specific cases with a well-documented history are also captured by the model. For example, trochee-5 in Czech was the meter embraced by elites during the national revival and its distinctive topics reflect national pathos, folk imagery and civil uprising [44]. Russian trochee-5, on the other hand, was initially a rare form. A single extremely popular poem in this meter launched a tradition that bears some similarity to the “founder” poem to this day: the LDA model is able to recover topics that refer to its semantic halo that scholars summarise as “introspective travel on the road (real or metaphorical) at night” [16].

Our conclusions about similar meter and meaning association processes in European verse have obvious limitations. First, the three main traditions observed in our study are closely related and did not develop in isolation: German metrical models played a foundational role in establishing modern verse in both Russian and Czech. This means that deeply rooted metrical associations (e.g. the difference between a trochee and a iamb) may result from common origins or cultural proximity, and not from differentiation as a common function of meter. Second, in this work we aggregate different stanzas and rhyme schemes under a few metrical labels, but these forms often had their own distinct usage traditions. In fact, stanza organization may be more relevant than meter for distinguishing genres in languages where verse regularity is based exclusively on the number of syllables (e.g. French, Italian, Spanish). In other words, we cannot define a “poetic form” universally for every poetic tradition, and broader comparative research will need to operate at several levels of abstraction.

We are hopeful that our findings will advance the conversation in metrical studies since they provide the first large-scale formal evidence of an association between meter and meaning in Western poetry. They also pave the way for the incorporation of the explicit modeling of historical processes into literary studies: mechanisms that drive the formation and change of the semantic halo generally remain a mystery and would require well-defined models that link individual interactions to the observed patterns. This turn to cultural evolutionary framework and cultural transmission models could establish common ground with linguistics, anthropology and social sciences for understanding factors behind the changes and continuities in cultural traditions. Identifying the mechanisms that limit cultural transmission [45] and generate long-term patterns in creative domains is also important if we are to understand the rate of cultural evolution [46, 47] and the shape of cultural phylogenies [4851]. If trochaic tetrameters are more semantically similar to their trochaic tetrameter “ancestors” than to any other meter, the form alone could be responsible for establishing and maintaining the divergent traditions within a creative domain or, as we usually call them, genres.

Materials and methods


Our research uses five metrically annotated poetry collections in normalized orthography, each of which concerns one language tradition: Czech [52, 53], Dutch [54], English [55], German [56, 57] and Russian [58, 59]. These collections have disparate sources and vary in size, chronological scope, general composition principles and survivorship bias (the Russian corpus, for example, favors poems that were reprinted in 20th-century scholarly editions). A summary of the corpora before filtering and pre-processing is provided in Table 3. For more details about each collection and a summary statistics, see S1S4 Figs.

We rely on poetic meters to distinguish between verse forms. Meters—the idealized rules of a text’s prosodic composition—arise from the stable recurring rhythmic patterns that organize language prosody in particular ways. In accentual-syllabic (AS) verse, meters are organized around the distinction between weak and strong stress. Regularity in AS systems is based on the recurring groups of syllables: a two-syllable unit in a weak-strong sequence is called a iambic foot; a sequence of feet organizes a poetic line (e.g. iambic pentameter). As a group of contextually aligned lines, a poem gives us information about the metrical type it follows. In classical AS systems, homogeneous metrical types were dominant in poems (e.g. all lines followed the general pattern of iambic pentameter) although the use of alternating types and free form is not uncommon. In our research, we focus on iambic and trochaic metrical types—the most widespread ways to organize verse in European AS traditions.

Meter recognition

All the collections in our study were metrically pre-annotated on a line-by-line basis using language-specific rule-based algorithms [52, 57, 58, 60] or manual methods [54]. Since in all five traditions the dominant versification system is accentual-syllabic, all prosodic patterns and verse organization principles are comparable. However, the task of formally describing a poem in a collection is not straightforward since “metrical form” may be interpreted on several levels (see S2 Appendix).

Where possible, each poem in a corpus was assigned a single unambiguous metrical label (e.g. “iamb-4” (I4), “trochee-5” (T5), etc.). This process involved some heuristics to determine labels: in line with existing annotation simplification principles [16], we considered a poem to be of a particular metrical type if at least 80% of its lines conformed to that same pattern. All heterogeneous cases or non-metrical poems were left unmarked. (Our models were built on all of the available texts, but only the labeled ones were used in the analysis). The loss of data at different filtering steps is presented in S4 Table. Trochee and iamb cover the majority of recognized pool of metrical poems.


All corpora were initially filtered by size to exclude poems that were too short or too long so that the document sizes would remain comparable in the LDA model. We also excluded early modern poems from German and English (S2 Fig) to maintain a comparable chronological range across the corpora. Each corpus went through lemmatization and part-of-speech tagging (MorphoDiTa [61] was used for the Czech corpus while TreeTagger [62] was employed for all the other corpora). Afterwards, we applied lexical simplification to words outside the 1000-most-frequent list so that the frequency distribution was less sparse (LDA models are susceptible to noise and work better with a reduced vocabulary [6365]). Low frequency words were replaced with one of their more common contextual neighbors if that neighbor appeared in the list of the 10 semantically closest words. Semantic similarities were determined independently for each corpus with word-embedding models that had been trained on the respective collections to capture the specific semantic relationships of poetic language. For the word-embedding models, we used word2vec [66] implementation from the Python gensim framework [67].

Semantic features

Our study of the overarching semantic relationships of metrical forms required some abstract representation of poetic language that would allow us to summarize a poem’s “content” at a general level. We used LDA topic models [37] trained for each of the collections on the non-aggregated data. An LDA model infers topics—groups of co-occurring words—from a collection without supervision. Since LDA is a generative algorithm, each poem in a corpus can be uniformly represented by the topic probabilities that generate its distribution of words. In cultural analytics, topic modeling has become a common way of inferring higher-order semantic properties from a collection of texts so that they can be used for further analysis and reasoning [11, 46, 68, 69]. While it is less efficient when used with shorter texts [70], it has proven to be generally applicable to poetry without any major reported drawbacks [25, 71, 72].

Topic modeling, lexical simplification and lemmatization served another important goal in our study: they mitigated the effects of metrical patterns on morphology and sentence structure [5, 35]. We additionally checked that our results would not be fully explained by pure morphology-based clustering (S2 Table).

Our results are reported based on an LDA model trained with 100 topics, but our findings remain robust with the change in the number of topics (we tested models with 20, 50, 100 and 150 topics, S1 Table). The choice of model is therefore not particularly important for the study design. On the other hand, increasing the number of topics may increase the human interpretability of the relationships.

Clustering and classification

Most of the analysis in this study relies on unsupervised approach to classification. There were two reasons for this: the first was data scarcity because of uneven popularity of different meters. The second reason was the risk of over-fitting: in testing associations between meter and meaning, we wished to capture naturally emerging relationships rather than imposing a fixed view about known “classes” on the data.

We used supervised classification solely to model chronological perspectives on meter recognizability (H3). In this case, the training of the model was stratified by time so that learned classes only incorporated “past” or “future” knowledge about the meter usage in a tradition.

For k-means clustering and the Adjusted Rand Index [40] calculation, we used the implementation provided by the Python library scikit-learn [73] (sklearn.cluster.KMeans). The number of clusters was set as the number of distinct meters found in the dataset.

To test H3, Support Vector Machine classification was also performed with the scikit-learn library, (sklearn.SVM.SVC). We used the classifier with a degree-3 polynomial kernel.


To model difference between periods across languages we fit a Bayesian regression using brms interface in R. We use a random subset of 100 clustering results per language per period to introduce some uncertainty to the model. We predict clustering results (ARI) with two-way interaction between periods and languages. Since response ARI values cannot go over 1 and also exhibit extreme uniformity in some cases (early Czech and early German), we assume a truncated normal distribution for ARI with the upper bound on 1. To ensure convergence, we subtract a small dummy value (0.02) from ARI measures to push it slightly away from the truncation boundary (ARI can’t go over 1, but in some cases takes values below 0). We use weakly informative priors on “slope” parameters (βNormal(0, 0.1)). See coefficients (S5 Table) and predictive posterior distribution (S6 Fig) of ARI values across languages.

Supporting information

S1 Fig. Distribution of poem sizes.

Red vertical lines mark our filtering cutoffs.


S2 Fig. Chronological distribution of poems.


S3 Fig. Distribution of metrical types per tradition.


S4 Fig. Historical changes in the use of meters and proportion of given forms in the corpora.

In the Czech works, we see a radical shift from trochee- to iamb-based works. There is also a gradual rise in trochee usage in the Dutch song corpus. This likely reflects the increasing representation folklore texts in print.


S5 Fig. PCA biplots of (A) Czech and (B) German variants of iambic pentameter with eigenvectors of the five most influential topics.

Single random sampling. Each variant is defined as the shortest re-occurring pattern of line endings within the poem (i.e. whether these are masculine, feminine or dactylic).


S6 Fig. Posterior predictive distribution of ARI across languages and periods.

Colored points show posterior means, error bars show 95% credible intervals. Grey points represent empirical values (jitter added to x-axis for better readability).


S1 Table. Random 100-poem samples taken without replacement for each meter in vector spaces defined by LDA topic models(20, 50, 100, 150 topics).

The results of our H1-related analysis show no qualitative variation regardless of the number of topics used to train an LDA model. We repeated the analysis in its full form (10,000 clustering iterations) for four different LDA models and report the Adjusted Rand Index mean along with the interquartile range.


S2 Table. Random 100-poem samples taken without replacement for each meter in vector spaces defined by part-of-speech frequencies.

Adjusted Rand Index of k-means clustering. Accentual-syllabic meters are systems of limitations superimposed on language. As such, they transform natural morphological and syntactical affordances [35]. This is why we can distinguish poetry from prose so easily based on word frequencies [5]. In addition, the distribution of parts of speech differs across metrical forms. Some words are more common in ternary meters simply because they are less likely to appear in binary meters for prosodic reasons. This has little connection with the semantics of meter but instead reflects the structural properties of verse. To make sure our pre-processing steps mitigated the problem of morphological differences, we repeat the clustering procedure from the H1 analysis. Here we use the frequencies of parts of speech that were included in LDA model (nouns, adjectives, verbs) as a feature set. S2 Table. shows that accuracy is visibly lower in this case than when clustering is topic-based.


S3 Table. Distinctive topics for the most common meters.

The most distinctive topics for the most common meters. For each meter, topic probabilities are averaged across the entire corpus. These values are transformed into z-scores across particular meters. The table shows the five topics with the highest z-scores for each meter.


S4 Table. Corpora filtering.

Step 1 gives the number of poems after exclusion of other than accentual-syllabic poems (these include for instance free verse, accentual verse, syllabic verse), poems where less than 80% of lines are written in a single meter, poems outside the selected time span and poems outside the required length span (4 to 100 lines). The figure in parentheses gives how many poems remain after this step as compared to the entire corpus. Step 2 gives the number of poems when keeping only the most common iambic and trochaic meters. The figure in parentheses gives how many poems remain after this step as compared to the previous one.


S5 Table. H2 model coefficients, truncated normal.

Fit of a Bayesian regression model with Hamiltonian Monte Carlo sampling (4 chains, 2000 iterations, 1000 warmup); brms formula: ARI|trunc(up = 1) ∼ 1 + period * language. Categories are index-coded, reference level for predictors is Czech for languages and early for period. Bulk_ESS and Tail_ESS are measures of effective sample size. at 1 signifies chain convergence. effective sample size. at 1 signifies chain convergence.



We would like to thank Klemens Bobenhausen, Benjamin Hammerich, Arthur Jacobs and Yurii Zelenkov for their help with data acquisition and annotation.


  1. 1. Manovich L. Cultural Analytics. MIT Press; 2020.
  2. 2. Heuser R, Le-Khac L. A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method. Standford Literary Lab Pamphlet. 2012;4.
  3. 3. Morin O, Acerbi A. Birth of the cool: a two-centuries decline in emotional expression in Anglophone fiction. Cognition & Emotion. 2017;31(8):1663–1675. pmid:27910735
  4. 4. Hughes JM, Foti NJ, Krakauer DC, Rockmore DN. Quantitative patterns of stylistic influence in the evolution of literature. Proceedings of the National Academy of Sciences. 2012;109(20):7682–7686. pmid:22547796
  5. 5. Storey G, Mimno D. Like Two Pis in a Pod: Author Similarity Across Time in the Ancient Greek Corpus. Journal of Cultural Analytics. 2020;
  6. 6. Cutting JE, Brunick KL, DeLong JE, Iricinschi C, Candan A. Quicker, Faster, Darker: Changes in Hollywood Film over 75 Years. i-Perception. 2011;2(6):569–576. pmid:23145246
  7. 7. Underwood T. Distant Horizons: Digital Evidence and Literary Change. University of Chicago Press; 2019.
  8. 8. Underwood T. The literary uses of high-dimensional space. Big Data & Society. 2015;2.
  9. 9. Tynianov Y. Permanent Evolution: Selected Essays on Literature, Theory and Film. Boston: Academic Studies Press; 2019.
  10. 10. Simmel G. Fashion. International Quarterly. 1904;10(1):136.
  11. 11. Mauch M, McCallum RM, Levy M, Leroi AM. The evolution of popular music: USA 1960–2010 \textbar Royal Society Open Science. Royal Society Open Science. 2015;2. pmid:26064663
  12. 12. Yarkho B. Speech Distribution in Five-Act Tragedies (A Question of Classicism and Romanticism). Journal of Literary Theory. 2019;13(1):13–76.
  13. 13. Peterson RA, Berger DG. Cycles in Symbol Production: The Case of Popular Music. American Sociological Review. 1975;40(2):158–173.
  14. 14. Klimek P, Kreuzbauer R, Thurner S. Fashion and art cycles are driven by counter-dominance signals of elite competition: quantitative evidence from music styles. Journal of The Royal Society Interface. 2019;16(151):20180731. pmid:30958188
  15. 15. Morin O. How Traditions Live and Die. Oxford University Press USA; 2015.
  16. 16. Gasparov M. Metr i smysl: ob odnom iz mekhanizmov kulturnoi pamiati. Moscow: Izdatelskii tsentr RGGU; 1999.
  17. 17. Tarlinskaja M, Oganesova N. Meter and Meaning: The Semantic Halo of Verse Form in English Romantic Lyrical Poems (Iambic and Trochaic Tetrameter). The American Journal of Semiotics. 1986;4(3/4):85–106.
  18. 18. Tarlinskaja M. Meter and Meaning: Semantic Associations of the English “Dolnik” Verse Form. Style. 1989;23(2):238–260.
  19. 19. Červenka M. Z večerní školy versologie II: Sémantika a funkce veršových útvarů. Prague: Akcent; 1991.
  20. 20. Gasparov M. A History of European Versification. Oxford, New York: Oxford University Press; 1996.
  21. 21. Pszczołowska L, editor. Słowiańska metryka porównawcza 3: Semantyka form wierszowych. Wrocław: Zakład Narodowy Imienia Ossolińskich; 1988.
  22. 22. Dobrzyńska T. Verse Forms as Bearers of Semantic Values. Studia Metrica et Poetica. 2014;1(2):103–128.
  23. 23. Piperski A. Semantic halo of a meter: a keyword-based approach. In: Kompiuternaia lingvistka i intellectualnyie tekhnoloigii. vol. 2 of Kompiuternaia lingvistika: lingvisticheskiie issledovaniia. Moscow: RGGU; 2017. p. 342–354.
  24. 24. Orekhov B. Bashkirskii stikh XX veka. Korpusnoe issledovanie. St. Petersburg: Aletejia; 2019.
  25. 25. Šeļa A, Orekhov B, Leibov R. Weak Genres: Modeling Association Between Poetic Meter and Meaning in Russian Poetry. In: CHR 2020: Workshop on Computational Humanities Research. Amsterdam: CEUR-WS; 2020. p. 12–31. Available from:
  26. 26. Boyd R, Richerson PJ. Culture and the evolutionary process. Chicago: University of Chicago Press; 1988.
  27. 27. Mesoudi A. Cultural Evolution: How Darwinian Theory Can Explain Human Culture and Synthesize the Social Sciences. University of Chicago Press; 2011.
  28. 28. El Mouden C, André JB, Morin O, Nettle D. Cultural transmission and the evolution of human behaviour: a general approach based on the Price equation. Journal of Evolutionary Biology. 2014;27(2):231–241. pmid:24329934
  29. 29. Dingemanse M, Blasi DE, Lupyan G, Christiansen MH, Monaghan P. Arbitrariness, Iconicity, and Systematicity in Language. Trends in Cognitive Sciences. 2015;19(10):603–615. pmid:26412098
  30. 30. Monaghan P, Shillcock RC, Christiansen MH, Kirby S. How arbitrary is language? Philosophical Transactions of the Royal Society B: Biological Sciences. 2014;369(1651):20130299.
  31. 31. Pimentel T, McCarthy AD, Blasi D, Roark B, Cotterell R. Meaning to Form: Measuring Systematicity as Information. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics; 2019. p. 1751–1764. Available from:
  32. 32. Kirby S, Tamariz M, Cornish H, Smith K. Compression and communication in the cultural evolution of linguistic structure. Cognition. 2015;141:87–102. pmid:25966840
  33. 33. Nölle J, Staib M, Fusaroli R, Tylén K. The emergence of systematicity: How environmental and communicative factors shape a novel communication system. Cognition. 2018;181:93–104. pmid:30173106
  34. 34. Rubin DC. Memory in Oral Traditions: The Cognitive Psychology of Epic, Ballads, and Counting-Out Rhymes. Oxford University Press USA; 1995.
  35. 35. Gasparov ML, Tarlinskaja M. The Linguistics of Verse. The Slavic and East European Journal. 2008;52(2):198–207.
  36. 36. Abrams MH. The Mirror and the Lamp: Romantic Theory and the Critical Tradition. Oxford University Press; 1953.
  37. 37. Blei DM. Latent Dirichlet Allocation. Journal of Machine Learning Research. 2003;3:993–1022.
  38. 38. Saintsbury G. A History of English Prosody. Macmillan and Co.; 1906.
  39. 39. Tarlinskaja M. English Verse: Theory and History. De Gruyter Mouton; 1976. Available from:
  40. 40. Hubert LJ, Arabie P. Comparing partitions. Journal of Classification. 1985;2(2-3):193–218.
  41. 41. Chang KK, DeDeo S. Divergence and the Complexity of Difference in Text and Culture. Journal of Cultural Analytics. 2020;
  42. 42. Underwood T, So RJ. Can We Map Culture? Journal of Cultural Analytics. 2021; p. 24911.
  43. 43. Gronas M. Cognitive Poetics and Cultural Memory: Russian Literary Mnemonics. 1st ed. New York: Routledge; 2010.
  44. 44. Sgallová Květa. Dlouhé trocheje v české poezii. Česká literatura. 2012;60(3):321–337.
  45. 45. Durham WH. Advances in Evolutionary Culture Theory. Annual Review of Anthropology. 1990;19(1):187–210.
  46. 46. Lambert B, Kontonatsios G, Mauch M, Kokkoris T, Jockers M, Ananiadou S, et al. The pace of modern culture. Nature Human Behaviour. 2020;4(4):352–360. pmid:31959923
  47. 47. Perreault C. The Pace of Cultural Evolution. PLOS ONE. 2012;7(9):e45150. pmid:23024804
  48. 48. Gray RD, Jordan FM. Language trees support the express-train sequence of Austronesian expansion. Nature. 2000;405(6790):1052–1055. pmid:10890445
  49. 49. O’Brien MJ, Lyman RL. Evolutionary archeology: Current status and future prospects. Evolutionary Anthropology: Issues, News, and Reviews. 2002;11(1):26–36.
  50. 50. Barbrook AC, Howe CJ, Blake N, Robinson P. The phylogeny of The Canterbury Tales. Nature. 1998;394(6696):839–839.
  51. 51. Youngblood M, Baraghith K, Savage PE. Phylogenetic reconstruction of the cultural evolution of electronic music via dynamic community detection (1975-1999). arXiv:201102460 [q-bio, stat]. 2020;.
  52. 52. Plecháč P, Kolár R. The Corpus of Czech Verse. Studia Metrica et Poetica. 2015;2(1):107–118.
  53. 53. Plecháč P. Czech Verse Processing System KVĚTA—Phonetic and Metrical Components. Glottotheory. 2016;7(2):159–174.
  54. 54. van Kranenburg P, de Bruin M, Volk A. Documenting a song culture: the Dutch Song Database as a resource for musicological research. International Journal on Digital Libraries. 2019;20(1):13–23.
  55. 55. Jacobs AM. The Gutenberg English Poetry Corpus: Exemplary Quantitative Narrative Analyses. Frontiers in Digital Humanities. 2018;5:5.
  56. 56. Bobenhausen K. The Metricalizer: Automated Metrical Markup for German Poetry. In: Küper C, editor. Current Trends in Metrical Analysis. Frankfurt am Main: Peter Lang; 2011. p. 119–131.
  57. 57. Bobenhausen K, Hammerich B. Métrique littéraire, métrique linguistique et métrique algorithmique de l’allemand mises en jeu dans le programme Metricalizer2. Langages. 2015;199:67–87.
  58. 58. Grishina E, Korchagin K, Plungian V, Sichinava D. Poeticheskii korpus v ramkah NKRIA: obschaia struktura i perspektivy ispolzovania. In: Natsionalnii korpus russkogo iazyka: 2006-2008. Novye rezultaty i perspektivy. St. Petersburg: Nestor-Istoria; 2009. p. 71–113.
  59. 59. Korchagin K. Poezija XX veka v poeticheskom podkorpuse Natsional’nogo korpusa russkogo iazyka: problema reprezentativnosti. Trudy instituta im VV Vinogradova. 2015;6:235–256.
  60. 60. Heuser R, Falk J, Anttila A. Prosodic (software); 2010.
  61. 61. Straková J, Straka M, Hajič J. Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Baltimore, Maryland: Association for Computational Linguistics; 2014. p. 13–18.
  62. 62. Schmid H. TreeTagger-a language independent part-of-speech tagger. 1994;.
  63. 63. Boyd-Graber J, Mimno D, Newman D. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. In: Handbook of Mixed Membership Models and Their Applications. CRC Handbooks of Modern Statistical Methods. Boca Raton, Florida: CRC Press; 2014. p. 3–41.
  64. 64. Schofield A, Magnusson M, Thompson L, Mimno D. Understanding Text Pre-Processing for Latent Dirichlet Allocation. In: Proceedings of the 1st Workshop for Women and Underrepresented Minorities in Natural Language Processing. EMNLP; 2017. p. 432–436. Available from:
  65. 65. Uglanova I, Gius E. The Order of Things. A Study on Topic Modelling of Literary Texts. In: CHR 2020: Workshop on Computational Humanities Research. Amsterdam: CEUR-WS; 2020. p. 57–76. Available from:
  66. 66. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed Representations of Words and Phrases and their Compositionality. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, editors. Advances in Neural Information Processing Systems. vol. 26. Curran Associates, Inc.; 2013. p. 1–9. Available from:
  67. 67. Řehůřek R, Sojka P. Software Framework for Topic Modelling with Large Corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Valletta, Malta: ELRA; 2010. p. 45–50.
  68. 68. Barron ATJ, Huang J, Spang RL, DeDeo S. Individuals, institutions, and innovation in the debates of the French Revolution. Proceedings of the National Academy of Sciences. 2018;115(18):4607–4612. pmid:29666239
  69. 69. Murdock J, Allen C, DeDeo S. Exploration and Exploitation of Victorian Science in Darwin’s Reading Notebooks. Cognition. 2017;159:117–126. pmid:27939837
  70. 70. Li C, Duan Y, Wang H, Zhang Z, Sun A,Ma Z. Enhancing Topic Modeling for Short Texts with Auxiliary Word Embeddings. ACM Trans Inf Syst. 2017;36(2).
  71. 71. Asgari E, Ghassemi M, Finlayson MA. Confirming the themes and interpretive unity of Ghazal poetry using topic models. Neural Information Processing Systems (NIPS) Workshop for Topic Models. 2013;.
  72. 72. Navarro-Colorado B. On Poetic Topic Modeling: Extracting Themes and Motifs From a Corpus of Spanish Poetry. Frontiers in Digital Humanities. 2018;5:15.
  73. 73. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.