Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Long-Range Correlations in Sentence Series from A Story of the Stone

  • Tianguang Yang,

    Affiliation Department of Statistics, School of Mathematical Sciences, Nankai University, Tianjin 300071, P. R. China

  • Changgui Gu,

    Affiliation Department of Systems Science, Business School, University of Shanghai for Science and Technology, Shanghai 200093, P. R. China

  • Huijie Yang

    hjyang@ustc.edu.cn

    Affiliation Department of Systems Science, Business School, University of Shanghai for Science and Technology, Shanghai 200093, P. R. China

Long-Range Correlations in Sentence Series from A Story of the Stone

  • Tianguang Yang, 
  • Changgui Gu, 
  • Huijie Yang
PLOS
x

Abstract

A sentence is the natural unit of language. Patterns embedded in series of sentences can be used to model the formation and evolution of languages, and to solve practical problems such as evaluating linguistic ability. In this paper, we apply de-trended fluctuation analysis to detect long-range correlations embedded in sentence series from A Story of the Stone, one of the greatest masterpieces of Chinese literature. We identified a weak long-range correlation, with a Hurst exponent of 0.575±0.002 up to a scale of 104. We used the structural stability to confirm the behavior of the long-range correlation, and found that different parts of the series had almost identical Hurst exponents. We found that noisy records can lead to false results and conclusions, even if the noise covers a limited proportion of the total records (e.g., less than 1%). Thus, the structural stability test is an essential procedure for confirming the existence of long-range correlations, which has been widely neglected in previous studies. Furthermore, a combination of de-trended fluctuation analysis and diffusion entropy analysis demonstrated that the sentence series was generated by a fractional Brownian motion.

Introduction

Language is what makes us human [1, 2]. Over more than half a century, researchers have found rich patterns embedded in texts. For example, if we rank words in descending order of their frequencies, the frequency generally decreases according to a power-law of the rank, called Zipf’s law [3]. Additionally, there are long-range correlations in the use of words [47]; words reoccur according to a stretched exponential distribution [8, 9]; and network based approaches can expose interesting characteristics of the relationships between words (e.g., small-world properties) [1012]. These findings can shed light on the dynamical mechanisms behind the formation and evolution of languages. They also provide quantitative measures of language characteristics that can be used in various practical problems, such as distinguishing different authors and monitoring the development of linguistic abilities.

Relevant literature has mainly focused on words, but sentences are the natural unit of a text. Sentences supply a context to restrict a word to an exact meaning, and are organized logically into sentence groups, paragraphs, chapters, and entire texts to express ideas at different levels. Rather than syntactical principles, the sequential order of sentences is a result of many interacting factors such as logic, fluency, rhythm, harmony, intonation, and the author’s style. A very recent paper by Drozdz et al. [13] investigated series of sentence lengths and found almost perfect long-range correlations in more than one hundred classical novels from around the world.

It is well known that Chinese language is significantly different to western languages. In Chinese language, an elementary meaning is usually expressed with a single character instead of a word, and sometimes several characters are integrated into a group to express a unique meaning. Several short sentences with different subjects can occur within a single long sentence, if there are logical relationships. This is strictly forbidden in English. At a macroscopic level, Chinese people also have unique habits in terms of how they think. Hence, detecting scaling behaviors embedded in Chinese sentence series is an interesting and nontrivial task, and thus is the focus of this work. The contributions of our work can be summarized as follows.

We used de-trended fluctuation analysis (DFA) [14] to detect long-range correlations in the text of A Story of the Stone, which is one of the four most popular classical Chinese novels. We define the length of a sentence as the number of characters contained in the sentence. There exists a weak long-range correlation up to a scale of 104 (the Hurst exponent is 0.575 ± 0.002). The correlation structure is stable, i.e., different parts of the series have almost the same Hurst exponent. It is widely believed that this novel was written by two authors, i.e., the first to the 80th chapters were written by Xueqin Cao, and the 81th to the 120th chapters were written by E Gao, although there are still some disagreements [15]. However, we found no distinguishable difference in the long-range correlations of these two parts.

The 30th chapter of the novel uses the traditional Chinese punctuation symbol that represents a full stop (a small open circle), whereas the other chapters use the English full stop. This noise was used to test the impact of polluted data on long-range correlations detected using DFA. We found that introducing a small number of bad records (less than 1% of the total records) may result in incorrect conclusions. Therefore, to obtain reliable long-range correlation results we must test the structural stability of a series. To our knowledge, this procedure has been widely omitted in DFA calculations.

Furthermore, we combined DFA and diffusion entropy analysis (DEA) to find that the sentence series followed a fractional Brownian motion.

Materials and Methods

Data

A Story of the Stone is a masterpiece of Chinese vernacular literature and one of the four greatest Chinese classical novels [15]. It is remarkable not only for its huge cast of characters and psychological scope, but also for its precise and detailed observation of life and social structures in the 18th century China. It was written over approximately 10 years from 1749 to 1759. However, it was published anonymously until the 20th century, so there is some debate as to the author contributions. Currently, the 1st to 80th chapters are attributed to Xueqin Cao, and the remaining chapters are attributed to E Gao.

The text can be downloaded from the public repository FigShare with the accession number doi:10.6084/m9.figshare.3759300 (see also S1 File in the Supporting Information). From the text, we identified the positions of full stops (“.”), question marks (“?”), exclamation marks (“!”) and suspension points (“……”). Then, we calculated the increments from the successive positions to form the sentence length series, each element of which is regarded as the length of the corresponding sentence. The sentence length series for the whole text contains 34,759 elements. The two parts attributed to Xueqin Cao and E Gao, hereafter called X-part and E-part, are 23,504 and 11,255 elements long, respectively.

Long-range correlations

Consider the sentence length series, X = {x1, x2, ⋯, xN}, where xn is the length of the nth sentence, and N the total number of sentences. If X is stationary, a long-range power-law correlation implies that the autocorrelation function decays according to a power-law [16], (1) where σ and <X> are the standard deviation and average of X, respectively.

The series X can be described using a random walk with displacements (2) i.e., the nth displacement is the summation of all the first to nth elements in X. This integrated series is called the profile series, denoted by Y. Using the profile series, we can also define a set of displacements in a specified duration τ, Δy(τ) = {y1+τy1, y2+τy2, ⋯, yNyNτ}, and estimate the probability distribution function of Δy, denoted as py). If X has the long-range correlation defined in Eq (1), the profile series will be scale-invariant (self-similar) [17], namely, (3) where δ is the scaling exponent and F(.) a function. There exists a simple relation between δ and α, which is [18, 19]. Hence, the scaling exponent δ can be used to measure long-range correlation behavior in the initial series, X. However, the sentence series (X) is generally non-stationary, so special procedures are required to evaluate the scaling exponent, δ.

De-trended fluctuation analysis.

DFA is a widely used method for detecting long-range correlations embedded in non-stationary time series [14]. We briefly describe it, so that this paper is as self-contained as possible.

First, we extract all possible segments with a predefined length w (window size) from the profile series Y. That is, (4)

Second, we fit each segment with a q order polynomial function, , where ab(n), b = 0, 1, ⋯, q are the fitting parameters for the nth segment, and i = 1, 2, ⋯, w. The fitting curves are taken as estimates of trends in the corresponding segments. Subtracting the trends from the profile segments results in a set of segments of a stationary series, (5)

If there are long-range correlations, the standard deviation, DFAq, will obey a power-law, (6) For completely random, persistent, and anti-persistent series, we will have H = 0.5, H > 0.5, and H < 0.5, respectively.

In our calculations, the order of the polynomial function was q = 2. Because we are interested in scaling behaviors, the scale ranges, for instance, of 100–101, 101–102, 102–103, and 103–104 should have the same contributions when estimating the Hurst exponent. The window sizes were w = [1.2m],m = 1, 2, ⋯, and , where [⋅] is the integer part of a real number. If we plot log[Fq(w)] versus log(w), the points are distributed at almost identical intervals along the log(w) axis. Accordingly, in the least square estimate of the Hurst exponent, all the scales contain almost the same number of points and consequently have identical contributions.

Diffusion Entropy Analysis

Alternatively, the scaling exponent δ can be evaluated using diffusion entropy analysis (DEA) [2022]. Starting from Eq (3), a straightforward computation leads to a simple relation of Shannon entropy versus the scaling exponent. That is, (7) Hence, the slope of this relation is an unbiased estimate of δ if the sentence length series is stationary, which is generally not the case in reality.

We used the central moving average method [2326] to extract the trend in the displacement series, Δy(τ). Let a window with size τ slide along this series. The average value of the covered displacements is regarded as the trend of the central displacement, namely, the th displacement, where [⋅] is the integer part of a real number. Subtracting the trend from the displacement series results in a de-trended displacement series with length N − 2τ + 1.

We divided the interval so that the elements of the de-trended displacement series were distributed into R(t) bins, and calculated the number of de-trended displacements occurring in every bin, denoted as r(τ, j), j = 1, 2, ⋯, R(τ). Then, the probability distribution can be approximated using (8) The Shannon entropy is then estimated using (9) If DE(τ) obeys the relation in Eq (7), the sentence length series is scale-invariant. This method is called diffusion entropy analysis (DEA).

Joint use of DFA and DEA

DEA can produce unbiased evaluation of δ for any kind of process, whereas the Hurst exponent (H) obtained by DFA is dynamical process dependent (this is why we use H instead of δ to represent the calculated scaling exponent in the DFA method) [22]. For a fractional Brownian motion, the estimated value of H is unbiased, namely, H = δ. For a Levy walk process, we have a quantitative relation . For a Levy flight process, the variance diverges and the DFA fails to qualitatively detect the scaling behavior. Hence, a combination of DEA and DFA (i.e., the relationship between H and δ) can provide information on the dynamic mechanism [22, 2731].

Results

In the traditional Chinese punctuation system, a full stop is represented by a small open circle. However, with the increased computer usage, the English language full stop (“.”) has found its way into the Chinese language punctuation system. As a result, the two symbols have become acceptable. In the original downloaded text, the small open circle was used in the 30th chapter, while the rest of the chapters were punctuated with the English full stop (“.”) to signify the end of a sentence. We therefore recognize the symbol “.” as a full stop, and the small open circle as an independent Chinese character. This way, noise was introduced in the sentence series leading to a wrong number of identified sentences in the 30th chapter, and contaminating about 1% of the sentence lengths in the whole novel. Fig 1(a) presents the sentence length series. Most of the sentences are short, while some sentences are tens to hundreds of characters long. As shown in the inset panel of (a), some long sentences are clustered at around the 7850–7950th elements. This clustering comes from polluted records in the 30th chapter.

thumbnail
Fig 1. Statistics of sentence lengths.

(a) Sentence length series. Inset: some sentences with large lengths are clustered together. This cluster is due to the traditional Chinese full stop symbol (small open circle) that was used in the 30th chapter, which produced noisy data. (b) Sentence length follows a right-skewed distribution, which is the log-normal distribution in (c).

https://doi.org/10.1371/journal.pone.0162423.g001

The sentence length obeys a right-skewed distribution as shown by the black solid circles in Fig 1(b). This distribution is log-normal (see the red solid circles in Fig 1(c)), i.e., the logarithm of sentence length is normally distributed. The characteristic length is 17. This distribution can be understood in terms of stochastic models of language [32, 33]. Consider a sentence with length L, and for simplicity assume that every character appears in a certain position of the sentence with identical probability, namely, . The quantity of information contained in the sentence can be measured using Shannon entropy, . The log-normal distribution in Fig 1(c) tells us that the information contained in the sentences is distributed according to a Gaussian function.

Using the noisy series (extracted from the original text), we calculated DFA2 for the total series, the series for the X-part (Chapters01–80), and the series for the E-part (Chapters81–120) as shown in Fig 2. The curves for the entire text and the X-part decrease slightly at large scales (black and red solid circles), but can be regarded as roughly obeying the power-law with Hurst exponents of H = 0.618 ± 0.006 and H = 0.649 ± 0.006, respectively. The E-part follows a perfect power-law relationship with a Hurst exponent of H = 0.55 ± 0.002.

thumbnail
Fig 2. Long-range correlations in the noisy and cleaned series.

Noisy records can result in incorrect estimates of the Hurst exponents (solid circles) and consequently false conclusions, even if they only cover a limited proportion of the total series. The effect of noise can be removed using a cleansing procedure (open circles). The X-and E-part of the text are the first to 80th chapters, and the 81th to the 120th chapters, which are currently attributed to Xueqin Cao and E Gao, respectively.

https://doi.org/10.1371/journal.pone.0162423.g002

This result seems to be perfect, because we can conclude that the whole text, the X-part, and the E-part all have long-range correlations over a wide range (101–104). Furthermore, the X-part has a significantly high Hurst exponent compared with that of the E-part (the difference was ΔH = 0.649 − 0.55 ≈ 0.1). This appears to be because the two parts were written by different authors. Thus, it appears that we can distinguish between the language of the two authors.

But is this true? To answer this question, we used a cleansing procedure to filter out the effects of the noise. We replaced all the small open circles in the 30th chapter with the full stop “.” and re-extracted the sentence series, called cleaned series. As shown in Fig 2, the noise induced deformations were exactly corrected. The Hurst exponents were 0.575, 0.585, and 0.551 for the cleaned total, cleaned X-part, and the E-part (not polluted) respectively. They are almost identical, and the X-part and E-part are no longer distinguishable. So the polluted records may lead to exciting but incorrect conclusions. Hence, one must be careful when using the DFA method to detect long-range correlations, even when using sufficiently long series.

However, using the sentence length series in Fig 1(a), we cannot find significant evidence for the noise. Actually, when we investigate empirical series, there are typically some state changes of the measured objects. For example, the physiological state of a volunteer may change occasionally in a long-term experiment, e.g., walking or sleeping. These occasional changes lead to abrupt changes in the records. But it is very difficult to find evidence of this sort of noise. Determining the correctness of the DFA calculations is non-trivial and cannot be solved using a single result for a series with a specific length. Herein, we will show that a structural stability test may help to confirm the existence of long-range correlations.

We separated the noisy series into 12 non-overlapping segments that were 2881 characters long. The third segment covers the noisy records. For every segment, we calculated the DFA2 as shown in Fig 3(a). All the curves (gray open circles) almost exactly followed a power-law relationship, except for the 3rd segment (red solid circles), which significantly deviated from a straight line. A linear fit resulted in an unreasonably large estimated slope (0.87, see Fig 3(b)).

thumbnail
Fig 3. Structural stability test.

The total polluted series was separated into 12 non-overlapping segments with a length of 2881. (a) All the curves obey almost perfect power-law relationships (gray open circles), except the 3rd segment which deviates significantly (red solid circles). (b) Hurst exponents for the 12 segments. The unreasonable large value of 0.87 for the 3rd noisy segment was corrected to 0.621 (red solid circle) after applying the cleaning procedure. There is a slightly decreasing trend.

https://doi.org/10.1371/journal.pone.0162423.g003

Using the cleaned series, there was a slightly decreasing trend in the scaling exponents. The unreasonable scaling exponent for the 3rd segment reduced to 0.621 (see Fig 3(b)—red solid circle), while the 8th to 12th segments exhibited comparatively smaller values. Since the novel was written over a long period (approximately 10 years), it is difficult to attribute this finding to differences in the two authors’ style or to a style change of a single author. Moreover, since most of the segments portray very similar scaling exponents, we conclude that they are structurally stable thus confirming the existence of long-range correlations in the whole series.

Additionally, the DEA method was used to detect the scaling behaviors embedded in the cleaned total, cleaned X-part, and E-part series. Fig 4 contains plots of diffusion entropy to lnτ and the estimated values of the scaling-invariant exponent. For the total, X-part, and E-part series, the estimates were (H, δ) = (0.575, 0.595), (0.583, 0.593) and (0.551, 0.598), with corresponding values for of 0.541, 0.545 and 0.527, respectively. Hence, we have in all three cases. Because the error interval was [−0.002, +0.002], the results support the conclusion of δ = H. Consequently, the cleansed sentence series was generated by a fractional Brownian motion.

thumbnail
Fig 4. Diffusion entropy analysis of the cleaned total, cleaned X-part, and E-part series.

Estimated values of the scaling exponent for the three series were almost identical.

https://doi.org/10.1371/journal.pone.0162423.g004

To ensure that the scaling behaviors were dependent on the sequential order of the sentences, we shuffled the sentence series and its segments and recalculated the scaling exponents. This resulted in H = 0.5 with confidence intervals less than [−0.003, +0.003] (not shown).

Conclusion

In summary, non-trivial patterns in sentence length series can provide information on the structural behaviors of a text, which reflect characteristics of, for example, the author’s style and logic. This information can be used to construct theoretical models of language formation and evolution, and also be helpful to solve practical problems such as evaluating linguistic ability and distinguishing different authors. Unlike most western languages, the Chinese language differs with respect to its organization at different levels. This work thus focused on a Chinese famous novel, A Story of the Stone, which is attributed to two different authors over a span of ten years.

Weak long-range correlations of the text were found using the DFA and DEA methods. The scaling behavior was perfect over a long scale covering approximately 101–104. The scaling invariance was confirmed using the structural stability test. Although the Hurst exponent slightly decreased with the sequential number of segments of the whole sentence series, we could not attribute this to the difference between two authors or to changes to a single author’s style over ten years.

Results showed that when the DFA method is applied to noisy data, it can produce exciting but false results leading to incorrect conclusions. This was the case even if the noise affects a small proportion of the total series (e.g. 1%) and the series is sufficiently long (in this paper, ∼3 × 104). To confirm the existence of a long-range correlation, we should also apply the structural stability test to determine if different parts of the series have similar scale invariance. This procedure was rarely used in existing publications.

Furthermore, combining DFA and DEA we found that the sentence series of A Story of the Stone was produced by a fractional Brownian motion.

The structural stability test requires powerful new tools to detect scaling behaviors from short time series (see, for example, [3441]). To reach further conclusions on the Chinese language, we will investigate more Chinese novels in the future.

Supporting Information

S1 File. S1_File.txt.

Cleaned text of the A Story of the Stone. The data can also be downloaded from the public repository FigShare, uploaded with a title of A Story of the Stone and an accession number of doi:10.6084/m9.figshare.3759300.

https://doi.org/10.1371/journal.pone.0162423.s001

(TXT)

S2 File. S2_File.m.

Code for de-trended fluctuation analysis.

https://doi.org/10.1371/journal.pone.0162423.s002

(M)

Acknowledgments

The authors thank Stephen Makau Mutua for his polishing the language.

Author Contributions

  1. Conceptualization: HJY.
  2. Data curation: TGY.
  3. Formal analysis: TGY.
  4. Funding acquisition: HJY CGG.
  5. Investigation: TGY.
  6. Methodology: TGY CGG.
  7. Project administration: HJY.
  8. Software: TGY.
  9. Supervision: HJY.
  10. Validation: CGG.
  11. Visualization: TGY HJY.
  12. Writing – original draft: HJY.
  13. Writing – review & editing: HJY CGG.

References

  1. 1. Pinker S. The language instinct. NewYork, HarperCollins; 1994.
  2. 2. Bickerton D. Adam’s tongue: how humans made language, how language made humans. NewYork, Hill and Wang; 2009.
  3. 3. Zipf GK. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge; 1949.
  4. 4. Montemurro MA, Pury PA. Long-range fractal correlations in literary corpora. Fractals 2002 Dec;10:451–461.
  5. 5. Altmann EG, Cristadoro G, Esposti MD. On the origin of long-range correlations in texts. Proc. Natl. Acad. Sci. 2012 Jul;109:11582–11587. pmid:22753514
  6. 6. Ausloos M. Generalized hurst exponent and multifractal function of original and translated texts mapped into frequency and length time series. Phys. Rev. E 2012 Sept;86:031108.
  7. 7. Ausloos M. Measuring complexity with multifractals in texts: Translation effects. Chaos, Solit. Fract. 2012 Nov;45(11):1349–1357.
  8. 8. Laherrere J, Sornette D. Stretched exponential distributions in nature and economy: fat tails with characteristic scales. Eur. Phys. J. B 1998 April;2(4):525–539.
  9. 9. Altmann EG, Pierrehumbert JB, Motter AE. Beyond word frequency: bursts, lulls, and scaling in the temporal distributions of words. Plos ONE 2009 Nov;4(11):e7678. pmid:19907645
  10. 10. Cancho RF-i, Sole RV. The small world of human language. Proc. Roy. Soc. Lond. B 2001 Nov;268:2261–2265.
  11. 11. Cong J, Liu H. Approaching human language with complex networks. Phys. Life Rev. 2014 Dec;11:598618. And references there-in.
  12. 12. Kulig A, Drozdz S, Kwapien J, Oswiecimka P. Modeling the average shortest-path length in growth of word-adjacency networks. Phys. Rev. E 2015 Mar;91:032810.
  13. 13. Drozdz S, Oswiecimka P, Kulig A, Wkwapien J, Bazarnik K, Grabska-Gradzinska I, et al. Quantifying origin and character of long-range correlations in narrative texts. Information Sciences 2016 Feb;331:32–44.
  14. 14. Peng CK, Buldyrev SV, Havlin S, Simons M, Stanley HE, Goldberger AL. Mosaic organization of DNA nucleotides. Phys. Rev. E 1994 Feb;49:1685–1689.
  15. 15. Zhou RC. Between Noble and Humble: Cao Xueqin and the Dream of the Red Chamber, edited by Ronald RG and Mark SF. New York:Peter Lang; 2009.
  16. 16. Stanley HE. Introduction to Phase Transitions and Critical Phenomena. Oxford University Press; 1971.
  17. 17. Mandelbrot BB. The Fractal Geometry of Nature. Freeman, San Francisco; 1982.
  18. 18. Buldyrev SV, Goldberger AL, Havlin S, Mantegna RN, Matsa ME, Peng CK, et al. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis. Phys. Rev. E 1995 May;51:5084–5091.
  19. 19. Xiao Q, Pan X, Mutua S, Yang Y, Li XL, Yang HJ. Discrete scale-invariance in cross-correlations between time series. Physica A 2015 Mar;421:161–170.
  20. 20. Scafetta N, Hamilton P, Grigolini P. The thermodynamics of social processes: The teen birth phenomenon. Fractals 2001 Jun;9(2):193–208.
  21. 21. Grigolini P, Palatella L, Raffaelli G. Asymmetric anomalous diffusion: an efficient way to detect memory in time series. Fractals 2001 Dec;9:439–449.
  22. 22. Scafetta N, Grigolini P. Scaling detection in time series: Diffusion entropy analysis. Phys. Rev. E 2002 Sept;66:036130.
  23. 23. Alessio E, Carbone A, Castelli G, Frappietro V. Second-order moving average and scaling of stochastic time series. Eur. Phys. J. B 2002 May;27:197–200.
  24. 24. Xu L, Ivanov PCh, Hu K, Carbone A, Stanley HE. Quantifying signals with power-law correlations: A comparative study of detrended fluctuation analysis and detrended moving average techniques. Phys. Rev. E 2005 May;71:051101.
  25. 25. Jiang ZQ, Zhou WX. Multifractal detrending moving-avarage cross-correlation analysis. Phys. Rev. E 2011 Jul;84:016106.
  26. 26. Gao ZK, Fang PC, Ding MS, Jin ND. Multivariate weighted complex network analysis for characterizing nonlinear dynamic behavior in two-phase flow. Exp. Therm. & Flu. Sci. 2015 Jan;60:157–164.
  27. 27. Scafetta N, West BJ. Multiscaling Comparative Analysis of Time Series and a Discussion on Earthquake Conversations in California. Phys. Rev. Lett. 2004 April;92:138501. pmid:15089646
  28. 28. Scafetta N, West BJ. Solar flare intermittency and the Earth’s temperature anomalies. Phys. Rev. Lett. 2003 Jun;90:248701. pmid:12857233
  29. 29. Yang HJ, Zhao FC, Qi LY, Hu BL. Temporal series analysis approach to spectra of complex networks. Phys. Rev. E 2004 Jun;69:066104.
  30. 30. Scafetta N. Diffusion Entropy Analysis of Time Series: Theory, concepts, applications and computer codes for studying fractal noises and Levy walk signals. VDM Verlag Dr. Mller; Jul 2010.
  31. 31. Gao ZK, Yang YX, Zhai LS, Ding MS, Jin ND. Characterizing slug to churn flow transition by using multivariate pseudo Wigner distribution and multivariate multiscale entropy. Chem. Engi. J. 2016 May;291:74–81.
  32. 32. Russell DG, and Quentin DA. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 2003 Nov;426:435–439.
  33. 33. Peter F, and Alfred T. Toward a phylogenetic chronology of ancient Gaulish, Celtic, and Indo-European. Proc. Natl. Acad. Sci. USA 2003 Jul;100:9079–9084.
  34. 34. Qi JC, and Yang HJ. Hurst exponents for short time series. Phys. Rev. E 2011 Dec;84:066114.
  35. 35. Zhang WQ, Qiu L, Xiao Q, Yang HJ, Zhang QJ, and Wang JY. Evaluation of scale invariance in physiological signals by means of balanced estimation of diffusion entropy. Phys. Rev. E 2012 Nov;86:056107.
  36. 36. Pan X, Hou L, Stephen M, Yang HJ, and Zhu CP. Evaluation of scaling invariance embedded in short time series. Plos ONE 2014 Dec;9(12):e116128. pmid:25549356
  37. 37. Pan X, Hou L, Stephen M, and Yang HJ. Long-term memories in online users’ selection activities. Phys. Lett. A 2014 Jul;378(35):2591–2596.
  38. 38. Gao ZK, Jin ND. A directed weighted complex network for characterizing chaotic dynamics from time series. Nonlinear Analysis: Real World Appllications 2012,April;13:947–952.
  39. 39. Gao ZK, Yang YX, Fang PC, Jin ND, Xia CY, Hu LD. Multi-frequency complex network from time series for uncovering oil-water flow structure. Sci. Rept. 2015 Feb;5:8222.
  40. 40. Mutua S, Gu CG, Yang HJ. Visibility Graph Based Time Series Analysis. Plos ONE 2015 Nov;10(11):e0143015.
  41. 41. Mutua S, Gu CG, Yang HJ. Visibility Graphlet Approach to Chaotic Time Series. Chaos 2016 May;26(5):053107. pmid:27249947