Conceived and designed the experiments: JLC RMZ. Analyzed the data: JLC RMZ SJS. Wrote the paper: JLC RMZ SJS.

The authors have declared that no competing interests exist.

Philologists reconstructing ancient texts from variously miscopied manuscripts anticipated information theorists by centuries in conceptualizing information in terms of probability. An example is the editorial principle difficilior lectio potior (DLP), by which the rarer of otherwise acceptable variant readings is judged the more likely original. In our case study, DLP implies a 0.79^{+0.09}_{−0.15} likelihood of the rarer word being the one accepted in the reference edition, which is consistent with the observed 547/756 = 0.72±0.03 (95%). Statistically informed application of DLP can recover substantial amounts of semantically meaningful entropy information from noise; hence the extension of DLP that we propose as a quantitatively testable hypothesis.

How accurately have culturally fundamental texts been transmitted to the present by way of variously miscopied manuscripts? If the accuracy can be measured, can it be improved, and if so, how? Philology traditionally has been concerned almost entirely with information of the semantic kind, that is, with meaning. Here we are concerned instead with what has been called entropy information, information entropy, and Shannon entropy (and sometimes negentropy in recognition that a higher information content corresponds to a higher degree of disorder). In the first study of its kind, we measure the accuracy of transmission in bits/word of meaningful entropy information. The case in point is one of acknowledged editorial excellence and cultural importance: Karl Lachmann's 1850 reconstruction of Lucretius's De Rerum Natura.

Information theory originated in twentieth-century telecommunications engineering, as is well known.

Beginning with notions developed independently by Wiener, Shannon established that information is a probabilistic phenomenon closely akin to entropy; that information entropy tends to be lost as noise during transmission in a manner analogous to the increase in physical entropy according to the second law of thermodynamics; and that the losses are recoverable from noise, sometimes completely, from redundancies in the information received.

In associating information with probability, philologists at least as early as Erasmus (?1466–1536)

As a statistical generalization, DLP is well grounded. Consider an author's original manuscript (autograph copy) of a text containing N =

As Shannon showed

It can be shown that the total amount of information in the two manuscripts collectively,

Because correlation in the co-occurrence of words and symbols is a characteristic of human language, copying error will tend to result in information loss.

Thus Tutivillus, like Maxwell's Demon, is a sorting demon with respect to entropy, but unlike its counterpart, has a dual nature as a randomizing demon with respect to semantic information. Whereas Maxwell's Demon decreases physical entropy by intelligently sorting gas molecules by energy level (which requires information about their energy levels), Tutivillus decreases information entropy by playing perversely on words' correlated co-occurrence.

Let us turn now to finite messages because it is to these that DLP applies. Consider a message so long that the relative abundance of each symbol effectively equals its probability; each such typical message then occurs with a probability close to 2^{−NH}, where N is the message's length in symbols and H the entropy per symbol.

If, as expected, Δ

Thus there is no question that DLP

C.P. Snow made awareness of the second law of thermodynamics his litmus test for dividing academics into his famous Two Cultures, humanistic and scientific.

The mean value 〈Δ〉 = 0.26±0.20 bits/word corresponds to a 0.79^{+0.09}_{−0.15} likelihood of the rarer word being the better choice, showing the value of the information-theoretic approach.

Ever since Erasmus, if not before, the favored approach to reconstructing a text has been first to reconstruct the “family tree” (stemma) of manuscripts based on the occurrence of major “mutations” (characteristic errors).

The steps in preparing a new edition are: identifying and studying comparatively the surviving manuscripts of the text (exemplars); identifying the characteristic errors that appear to distinguish the major branches of the stemma; reconstructing the stemma in detail by seeking the tree that accounts most parsimoniously for the occurrence of characteristic errors in terms of the relative recency of common descent among exemplars; selecting for further analysis only those readings evidently closest to the author's original, and eliminating from further consideration those variants that contain no additional information; collating the selected manuscripts word by word; and finally, choosing among the alternative wordings in the effort to reconstruct the closest possible approximation to the original text, footnoting the rejected alternatives in the new edition's apparatus criticus.
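The parsimony step can be illustrated with a minimal sketch. Everything below is invented for illustration (the stemma shape, exemplar names, and readings), and Fitch's small-parsimony algorithm stands in for whatever scoring an editor actually applies; it counts the fewest characteristic errors a candidate stemma needs in order to account for the exemplars' readings at one variant site, so that competing trees can be compared.

```python
def fitch_changes(tree, readings):
    """Minimum number of state changes on a rooted binary tree of exemplars.

    `tree` is a nested pair of leaf names; `readings` maps leaf -> reading.
    """
    changes = 0

    def states(node):
        nonlocal changes
        if isinstance(node, str):      # leaf: the exemplar's observed reading
            return {readings[node]}
        left, right = node
        a, b = states(left), states(right)
        if a & b:                      # subtrees can agree: no change needed
            return a & b
        changes += 1                   # disagreement: one change on a branch
        return a | b

    states(tree)
    return changes

# Hypothetical stemma of four exemplars and one variant site (readings x/y):
stemma = (("A", "B"), ("C", "D"))
print(fitch_changes(stemma, {"A": "x", "B": "x", "C": "y", "D": "x"}))  # 1
print(fitch_changes(stemma, {"A": "x", "B": "y", "C": "x", "D": "y"}))  # 2
```

A tree on which the x/y readings cluster by branch needs fewer characteristic errors, and so is preferred as the more parsimonious stemma.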

In its strictest form, Lachmann's method assumes that the manuscript tradition of a text, like a population of asexual organisms, originates with a single copy; that all branchings are dichotomous; and that characteristic errors steadily accumulate in each lineage, without “cross-fertilization” between branches.

Many types of scribal error have been catalogued at the levels of pen stroke, character, word, and line, among others.

Limiting ourselves to two manuscripts with a common ancestor (archetype), let us suppose as before that wherever an error has occurred, a word of lemma

Let us suppose further that copying mistakes in a manuscript are statistically independent events. The tacit assumption is that errors are rare and hence sufficiently separated to be practically independent in terms of the logical, grammatical, and poetic connections of words. With Lachmann's two manuscripts of Lucretius, the ∼2,100 variants in ∼50,000 words of text correspond to a net accumulation of about one error every four lines in Lachmann's edition in the course of about five removes, or of roughly one error every 20 lines by each successive scribe. The separation of any one scribe's errors in this instance seems large enough to justify the assumption that most were more or less independent of one another.
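The quoted rates follow from simple arithmetic, sketched here under one assumption not stated in the passage: that the poem runs to about 7,400 hexameter lines.

```python
# Back-of-envelope check of the error rates cited above.
variants = 2100   # ~2,100 variants in ~50,000 words of text
removes = 5       # about five copying removes from the archetype
lines = 7400      # assumed approximate length of De Rerum Natura in lines

net_spacing = lines / variants             # lines per accumulated error
per_scribe = lines / (variants / removes)  # lines per error by one scribe

print(round(net_spacing, 1))  # ~3.5, i.e., about one error every four lines
print(round(per_scribe, 1))   # ~17.6, i.e., roughly one every 20 lines
```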

Finally, let us suppose that an editor applying DLP chooses the author's original word of lemma k whenever it is the rarer of the two acceptable alternatives.

Under these conditions, the editor's probability

As equation (4) shows, a single bit of information entropy suffices to predict correctly the outcome of a Bernoulli trial (a choice between two equally likely alternatives).
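The exact forms of equations (4) and (5) do not survive in this excerpt, but the standard Shannon relation for a binary decision, C = 1 − H_b(p) with H_b the binary entropy function, reproduces the study's correspondence between the measured channel width of 0.26±0.20 bits/word and a likelihood of about 0.79 (+0.09/−0.15) that the choice is correct; at C = 1 bit the trial is decided with certainty. A sketch assuming that relation:

```python
import math

def binary_entropy(p):
    """H_b(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def success_probability(c_bits):
    """Invert C = 1 - H_b(p) for p in [1/2, 1] by bisection."""
    target = 1.0 - c_bits
    lo, hi = 0.5, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if binary_entropy(mid) > target:
            lo = mid  # still too uncertain: push p toward 1
        else:
            hi = mid
    return (lo + hi) / 2

print(round(success_probability(1.00), 2))  # 1.0: one full bit decides the trial
print(round(success_probability(0.26), 2))  # 0.79
```

Feeding in the ends of the 0.26±0.20 interval gives 0.88 and 0.64, which is where the asymmetric +0.09/−0.15 error bars come from.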

Redundancy is possible, which corresponds to the situation

What evidence is there that earlier philologists ever paid anything more than lip service to DLP, and that they indeed understood enough about information in the sense of entropy to recapture measurable amounts of it? Given a suitable text against which to judge the correctness of choices between alternative words, DLP becomes a testable hypothesis. The ideal standard of comparison is the archetype of the manuscripts being used to reconstruct the text. A problem is immediately apparent: an ideal test would be possible only in the seldom if ever realized case in which the archetype has been unequivocally identified subsequent to the reconstruction of its text; for if the archetype were already known, what incentive would there be to reconstruct it? Thus for testing DLP, we must be content with evaluating an earlier, more narrowly based edition against later, more broadly based editions. Ideally, all the editions would be statistically independent of one another, but this is exceedingly unlikely.

We need to test statistically whether the probability of the rarer word being correct exceeds 1/2, both as inferred from the entropy information recovered (p_1) and as observed directly as a proportion of instances (p_2).

But why be concerned with information at all if DLP maintains simply that an editor will more often be correct in choosing the less common of equally acceptable alternative words? As will be explained, it is quite possible for an editor to choose correctly by selecting the less common word more often than not, thereby satisfying DLP (p_2 > 1/2), while recovering little or no entropy information (p_1 scarcely above 1/2).

Let us turn now to the case of an archetype whose text contains N =

The expression on the right is the logarithm of the multinomial probability of the particular set of numbers of occurrences of the different lemmata.
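The connection can be checked numerically: by Stirling's approximation, the base-2 logarithm of the multinomial coefficient for a set of occurrence counts approaches N times the entropy per word as N grows. The counts below are invented for illustration.

```python
import math

def log2_multinomial(counts):
    """log2 of N! / (n_1! n_2! ... n_S!) via lgamma, in bits."""
    n = sum(counts)
    ln = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return ln / math.log(2)

def entropy_bits_per_word(counts):
    """Plug-in entropy H = -sum (n_i/N) log2 (n_i/N), in bits/word."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts)

counts = [500, 300, 200]                # hypothetical lemma frequencies
nh = sum(counts) * entropy_bits_per_word(counts)
print(round(nh))                        # 1485 bits = N*H here
print(round(log2_multinomial(counts)))  # slightly less, within about 1%
```

The multinomial figure always falls a little short of N·H; the Stirling correction terms shrink relative to the total as the text lengthens.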

Suppose now that a copyist has mistakenly replaced an original word of lemma

Questions about expression (8) in relation to continuous as opposed to discrete information are taken up in section 2.4 below.

The average of Δ

Whenever a copying error is made, an amount of information |Δ| stands to be gained or lost.

The firmest prediction for testing DLP comes from the second law as it applies to information: if the editor has successfully taken advantage of entropy information, then the average 〈Δ〉 over the editor's choices should be significantly greater than zero.

There is no doubt that editorial decisions are based primarily on semantic information. Hence there is reason to believe that entropic information ordinarily contributes less than half of the single bit needed to decide a binary choice, especially since DLP comes into play only when there is enough non-entropic information to establish that both alternatives are acceptable, and more or less equally so. Thus we have a second expectation: that 〈Δ〉 should amount to less than half a bit per word, on average.

All that can be estimated from 〈Δ

Evidence comes by way of equation (8): if 〈Δ〉 > 0, then on average the rarer alternative is the correct one, and both p_1 and p_2 should exceed 1/2.

To sum up, we expect 0 < 〈Δ〉 < 0.5 bits/word, with both p_1 and p_2 lying between 1/2 and 1.

Would the corresponding expression for Δ derived from equation (1) have served as well as that from equation (8)?

How could a text approach the theoretical minimum-information condition

We analyze Lachmann's 1850 reconstruction of Lucretius's De Rerum Natura. Lachmann deduced that the two principal surviving manuscripts descend from a lost common ancestor he designated ω^{II}, and that ω^{II} in turn is twice removed from a lost fourth- or fifth-century ancestor known as Ω.

We evaluate Lachmann's reconstruction using the later and much more broadly based reconstruction by Ernout.

Ernout's text contains N = 49,658 words. Its total entropy information comes to roughly 475×10^{3} bits (∼58 KB). The entropy information per word thus comes to a little over nine bits.
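The per-word figure follows from the totals: ∼58 KB spread over N = 49,658 words comes to roughly 9.3–9.6 bits/word, under either convention for the kilobyte.

```python
n_words = 49_658  # words in Ernout's text
for bytes_per_kb in (1000, 1024):      # both kilobyte conventions
    total_bits = 58 * bytes_per_kb * 8
    print(round(total_bits / n_words, 1))  # 9.3 and 9.6 bits/word
```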

We count 2,095 instances in which Lachmann's apparatus criticus records variant readings.

We calculate the entropy difference Δ from equation (8). In an instance where the alternative words' lemmata occur 6 and 23 times, for example, there are |log_2 [6/(23+1)]| = 2.0 bits to be gained or lost. We take Ernout's text as establishing the correct alternative, as if it were the text of the common ancestor ω^{ II}. In this instance, Lachmann chose the rarer word.
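The worked figure checks out; a sketch assuming, as the surrounding discussion suggests, that expression (8) is Δ = |log_2[n_j/(n_k + 1)]| for occurrence counts n_j and n_k:

```python
import math

def entropy_difference(n_rare, n_common):
    """Bits at stake in one binary choice: |log2[n_j / (n_k + 1)]|."""
    return abs(math.log2(n_rare / (n_common + 1)))

# The occurrence counts 6 and 23 from the example in the text:
print(entropy_difference(6, 23))  # 2.0 bits to be gained or lost
```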

The distribution of Δ

Similar results were obtained with the 830 instances in which Lachmann and Ernout, but not necessarily Martin, agree on the alternative lemmata: 〈Δ

As a channel width, 0.26±0.20 bits/word (in significant figures) corresponds by way of equation (5) to a p_1 = 0.79^{+0.09}_{−0.15} likelihood of the rarer word being correct, in agreement with the observed proportion p_2 = 547/756 = 0.72±0.03.
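The observed proportion and its quoted uncertainty can be reproduced with the normal approximation to the binomial, a sketch assuming a 95% (±1.96σ) interval:

```python
import math

successes, trials = 547, 756  # rarer word correct in 547 of 756 instances
p2 = successes / trials
half_width = 1.96 * math.sqrt(p2 * (1 - p2) / trials)  # 95% normal approx.

print(round(p2, 2))          # 0.72
print(round(half_width, 2))  # 0.03, matching the quoted 0.72±0.03
```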

The implication from 0 < 〈Δ〉 and from the agreement between p_1 and p_2 (0.79^{+0.09}_{−0.15}, 0.72±0.03) is that Lachmann's choices between alternative readings recovered entropy information from noise, on average.

Our results suggest that the

The results also suggest an extension of DLP as a quantitatively testable hypothesis:

The results call attention to the mathematical nature of philology, and to its connections with information science. They suggest that applications of information theory, particularly statistical aspects developed to high levels of sophistication in cryptography, have much to offer philology.

In the attempt to estimate each word's entropy information as objectively and unambiguously as possible, we treat grammatically justifiable words without regard to inflection, context, and semantic content (meaning); and we calculate entropy information by treating each word's lemma as if it were a symbol. If inflection or association in context were taken into account, it often would be impossible to classify an individual Latin word uniquely as belonging to one and only one symbol, and thus impossible to associate that word uniquely with a definite amount of information. For instance, the noun

Taking all of the inflections of a word like

Although the

The 2,095 textual variants we note in the apparatus criticus for Karl Lachmann's 1850 edition of Lucretius's De Rerum Natura.

(0.17 MB PDF)

We thank Bruce Lewenstein, Michael Reeve, Luis Amaral, Enrico Scalas, and anonymous reviewers for their help. Any errors are ours.