Predicting Achievable Fundamental Frequency Ranges in Vocalization Across Species

Vocal folds are used as sound sources in various species, but it is unknown how vocal fold morphologies are optimized for different acoustic objectives. Here we identify two main variables affecting range of vocal fold vibration frequency, namely vocal fold elongation and tissue fiber stress. A simple vibrating string model is used to predict fundamental frequency ranges across species of different vocal fold sizes. While average fundamental frequency is predominantly determined by vocal fold length (larynx size), range of fundamental frequency is facilitated by (1) laryngeal muscles that control elongation and by (2) nonlinearity in tissue fiber tension. One adaptation that would increase fundamental frequency range is greater freedom in joint rotation or gliding of two cartilages (thyroid and cricoid), so that vocal fold length change is maximized. Alternatively, tissue layers can develop to bear a disproportionate fiber tension (i.e., a ligament with high density collagen fibers), increasing the fundamental frequency range and thereby vocal versatility. The range of fundamental frequency across species is thus not simply one-dimensional, but can be conceptualized as the dependent variable in a multi-dimensional morphospace. In humans, this could allow for variations that could be clinically important for voice therapy and vocal fold repair. Alternative solutions could also have importance in vocal training for singing and other highly-skilled vocalizations.

glide cartilages in the larynx. The second is a molecular composition factor: how much collagen density can be produced in the vocal cord ligament. Development and evolution have not been uniform with regard to these factors, suggesting that alternative choices are available for growth, training, and repair.

Introduction
A biological trait is usually the result of a trade-off between different selective forces and constraints [1]. Vocal behavior is no exception, and one important set of constraints is related to the mechanism of sound production. In order to understand the design of vocal organs (larynx and syrinx in vertebrates), investigators have often focused on size as the primary determining factor of fundamental frequency and acoustic power produced by a sound source. In fact, a number of size-dependent factors are responsible for the observation that species of larger body sizes tend to produce lower frequencies [2], [3], yet some observations cannot be explained by vocal fold size alone. First, the relation between fundamental frequency (f o ) and body size appears uncoupled within some species [4], [5], [6]. Considering that vocal fold size remains closely linked to body size, other mechanisms must facilitate the f o variations. Second, vocal fold morphology in the mammalian larynx [7], [8] and labial morphology in the avian syrinx [9] vary greatly within and among species. Mechanical properties, a direct consequence of morphological design, also show a large variation and contribute to vocal differences within and between species [10], [11], [12], [8]. Third, the exceptionally large f o range that some species rely on to generate large vocal versatility cannot be explained by size [11]. Here we present predictions from vibrating string theory that offer an explanation for why a larger than expected range of f o can be achievable in large and small species.
If all species had the same tissue construct and the same ability to strain the vocal folds, then a vibrating string model would predict a larger f o range (in Hz) for smaller animals, as will be shown. However, if the range is expressed in high/low ratios, or octaves, the range is normalized across species. It will be shown that, additionally, there is a large variation in this high/low ratio prediction because material properties are not the same and the ability to strain vocal fold tissues is also not the same.
The mechanism for achievement of a large f o range in animals stands in stark contrast to the design of man-made musical string instruments, which utilize multiple strings to cover a wide pitch range. Violins have four strings, classical guitars have six, and pianos have eighty-eight notes, whose strings are single, doubled, or tripled. With these multiple strings, violins and guitars can produce on the order of 4-5 octaves of pitch range and a piano can produce a little over 7 octaves. Vocalizations in mammals and birds [13], [14], [15], [16], [17], [18], [19] are generated by airflow-induced vibrations of vocal folds or labia, respectively. Humans, other mammals, and birds can produce 3 octaves, and in some cases 4-5 octaves, with a single pair of vocal folds in the larynx or labia in the syrinx. Vocal folds are basically the equivalent of one double string. What are the properties of these folds or labia that produce such versatile biological "strings"? We show here that geometry plays a role, but the dominant factor is the molecular structure of laminated tissue that can generate orders of magnitude variation in fiber tension.
The morphology of vibrating vocal fold tissue in the larynx is sufficiently complex that voice scientists and clinicians have debated for decades whether "vocal fold" or "vocal cord" is the best descriptor. Prior to Hirano's [7] pioneering work, the term vocal cord was most prevalent, but it was understood that only the vocal ligament, a portion of the entire tissue construct, was cord-like. In human speech, the ligament is not under much tension, making the entire system fold-like in the sense that the superior portion folds over the inferior portion in vibration. Simple mechanical models have been of the mass-spring type to represent folding tissue [20], but a vibrating string model was also introduced [21], [22].
The conceptualization of a fiber-gel construct, not claimed here to be novel, embraces both the fold and the string construct (Fig 1). The ground substance is a viscoelastic continuum in the form of a homogenous, isotropic gel, similar to the vitreous humor in the eye. With the inclusion of directional fibers in multiple layers (collagen and elastin in the lamina propria and muscle fibers in the thyroarytenoid muscle), the construct develops into an adult human vocal fold. The development is gradual, however, and is likely influenced by vocal demand. At birth, the vocal fold consists of a single layer of ground substance (gel) with sparse fibers randomly oriented [23]. Through childhood and puberty, the gel develops into multiple morphological layers of tissue [26]. The superficial layer of the vocal fold lamina propria remains mostly ground substance (gel-like), whereas the intermediate and deep layers develop into elastin and collagen fibers aligned in a ventral-dorsal direction [24], [25], [26]. The fibers originate and insert on cartilages in the larynx (not shown) that can be moved by laryngeal muscles. The moving boundaries apply variable tension to the fibers.

Methods
The main difference between multiple parallel strings on a violin and multiple "strings" embedded in vocal fold ground substance is the amount of mechanical coupling between the strings. The fibers cannot vibrate independently. There are cross-links in the form of an elastic matrix and there are proteoglycans and glycoproteins that fill the spaces in the form of a viscous liquid, leaving no air spaces between any of them. Such a viscoelastic medium, i.e., a laminated fiber-gel system, is subject to the laws of continuum mechanics. However, when the fibers of one layer are under considerable tension, the layer can be considered a "thick string vibrating in a viscous soup." The string modes of vibration then dominate over the gel modes of vibration [27]. Here we consider such a simplified string model to be appropriate because range of fundamental frequency is largely determined by the fiber component. Small variations near the lower bound of f o are determined by the combined viscoelastic properties of the gel and the fibers, but these variations contribute to a small part of the total range of normal mode frequencies [27].

How is vibration frequency controlled with tissue fibers?
In a string fixed at both ends and under tension, the fundamental frequency of the dominant mode of vibration is

$$f_o = \frac{1}{2L}\sqrt{\frac{\mu_0}{\rho}} \qquad (1)$$

where L is the length of the string, μ 0 is the combined shear and tensile stress for vibrational displacement transverse to the string, and ρ is the tissue density. Density is essentially constant in soft tissue (about 1.04 g/cm³), which leaves control of f o for any fibrous layer to L and μ 0 . In man-made string instruments, length is either held constant (e.g., piano) or varied with finger position (violin or guitar). In vocal folds, length can only be varied by moving boundary cartilages, which means that individual layers cannot be lengthened or shortened independently. Thus, with one common elongation, fiber stress μ 0 becomes the critical variable for f o control between layers. Based on Eq (1), the total variation in f o can be written as

$$\Delta f_o = \frac{\partial f_o}{\partial L}\,\Delta L + \frac{\partial f_o}{\partial \mu_0}\,\Delta \mu_0 \qquad (2)$$

which after partial differentiation yields the expression

$$\Delta f_o = f_o \left[ -\frac{\Delta L}{L} + \frac{1}{2}\,\frac{\Delta \mu_0}{\mu_0} \right] \qquad (3)$$

Here we see that an absolute frequency range Δf o (in Hz) varies directly with f o . If the terms in brackets were equal across species, smaller species with higher mean f o would have larger changes in f o . The above expression also shows that the positive contribution from the relative change in fiber stress, Δμ 0 /(2μ 0 ), must exceed the negative contribution from the relative elongation ΔL/L if a positive change Δf o is to occur.
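The sensitivity argument above can be checked numerically. The following is a minimal sketch that assumes Eq (1) has the standard ideal-string form f o = (1/2L)·sqrt(μ 0 /ρ); the function names and the sample length, stress, and increments are hypothetical illustration values, not measurements from this study. It compares the exact change in f o with the first-order estimate f o ·(−ΔL/L + ½·Δμ 0 /μ 0 ).

```python
import math

RHO = 1040.0  # soft-tissue density in kg/m^3 (about 1.04 g/cm^3, as stated in the text)


def string_f0(length_m, stress_pa, rho=RHO):
    """Fundamental frequency (Hz) of an ideal string: sqrt(stress/rho) / (2 L)."""
    return math.sqrt(stress_pa / rho) / (2.0 * length_m)


def linearized_delta_f0(length_m, stress_pa, d_length, d_stress, rho=RHO):
    """First-order change in f0: f0 * (-dL/L + 0.5 * dmu/mu)."""
    f0 = string_f0(length_m, stress_pa, rho)
    return f0 * (-d_length / length_m + 0.5 * d_stress / stress_pa)


if __name__ == "__main__":
    # Hypothetical operating point: 1.6 cm fold under 20 kPa fiber stress.
    length, stress = 0.016, 20e3
    # Hypothetical increments: 1% elongation accompanied by a 10% stress rise.
    d_len, d_str = 0.01 * length, 0.10 * stress
    exact = string_f0(length + d_len, stress + d_str) - string_f0(length, stress)
    approx = linearized_delta_f0(length, stress, d_len, d_str)
    print(f"f0 = {string_f0(length, stress):.1f} Hz")
    print(f"exact change = {exact:.2f} Hz, linearized change = {approx:.2f} Hz")
```

Setting the stress increment to zero in this sketch gives a negative frequency change, which is the "natural" lowering of f o with elongation that a nonlinear stress response has to overcome.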
Non-muscular tissue layers, known as the lamina propria in the vocal folds, can experience an increase in μ 0 only with an increase in length. The length-tension curve must be highly nonlinear for a large f o range. The degree of nonlinearity is related directly to the desired f o range. Stress-strain curves of the vocal ligament are typically exponential [28], [10], [12], of the form

$$\mu_0 = A\,e^{B (L - L_o)/L_o} \qquad (4)$$

where A and B are empirically-determined constants, L is an arbitrary length, and L o is a reference length. According to Eq 1, two fundamental frequencies are related as

$$\frac{f_{o2}}{f_{o1}} = \frac{L_1}{L_2}\,\exp\!\left[\frac{B\,(L_2 - L_1)}{2 L_o}\right] \qquad (5)$$

Note that for B = 0 (constant fiber stress at all lengths), the fundamental frequency ratio is inversely related to vocal fold length ratio. This is the general size principle. The larger the animal, the longer the vocal folds and the lower the frequency if stress is kept constant. The reference length L o is generally taken as the in situ cadaveric length for measurement purposes. From this reference length, the length for phonation can be increased and decreased on the order of ± 50%, but typically more like ± 30%, as will be shown later. Fig 2 shows two contrasting cases of how the same f o range can be produced. In Fig 2(A) the stress-strain curve is steep, with a large B value, and the elongation is small. In Fig 2(B) the stress-strain curve is shallow, with a small B value, but the elongation is large. Anatomically and physiologically, the trade-off is between range of motion between cartilages and fiber tension in the vocal folds. Thus, frequency range hinges on two variables: the ability to change vocal fold length and the nonlinearity of the dominant fiber stress-strain curve. Some data will now be given from various species.
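The trade-off between elongation and stress-strain steepness can be sketched numerically. The example below assumes an exponential stress law μ 0 = A·exp[B(L − L o )/L o ] combined with the ideal-string relation, which gives a high/low ratio f o2 /f o1 = (L 1 /L 2 )·exp[B(L 2 − L 1 )/(2L o )]. The function names and the two elongation cases are hypothetical choices made for illustration; they are not the values behind Fig 2.

```python
import math


def f0_ratio(l1, l2, l_ref, b):
    """High/low f0 ratio for an ideal string with stress A*exp(B*(L - Lo)/Lo).

    The constant A cancels in the ratio; all lengths share one unit.
    """
    return (l1 / l2) * math.exp(b * (l2 - l1) / (2.0 * l_ref))


def octaves(ratio):
    """Convert a frequency ratio to octaves."""
    return math.log2(ratio)


def b_for_ratio(l1, l2, l_ref, target_ratio):
    """Exponent B needed to reach target_ratio when length goes from l1 to l2."""
    return 2.0 * l_ref * math.log(target_ratio * l2 / l1) / (l2 - l1)


if __name__ == "__main__":
    l_ref = 1.0  # reference length Lo in arbitrary units
    cases = [
        ("small elongation", 0.9, 1.2),  # hypothetical: 0.9 Lo to 1.2 Lo
        ("large elongation", 0.6, 1.5),  # hypothetical: 0.6 Lo to 1.5 Lo
    ]
    for name, l1, l2 in cases:
        b = b_for_ratio(l1, l2, l_ref, target_ratio=8.0)  # 8:1 is 3 octaves
        span = octaves(f0_ratio(l1, l2, l_ref, b))
        print(f"{name}: L2/L1 = {l2 / l1:.2f}, B = {b:.1f}, range = {span:.2f} octaves")
```

Under these assumptions the small-elongation case needs a much larger B than the large-elongation case to reach the same three-octave range, which is the qualitative contrast described for the two panels of Fig 2.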

Measurements from human larynges
Vocal fold length change with f o has been quantified in several investigations. One study [29] used stereo videoscopy to measure the membranous vocal fold length during phonation in 4 female and 3 male human subjects. For the males, L 1 averaged 0.77 cm and L 2 averaged 1.3 cm, such that L 2 /L 1 was 1.7. For the females, L 1 averaged 0.71 cm and L 2 averaged 1.1 cm, such that L 2 /L 1 was 1.5. The fundamental frequency ranged on the order of 100-500 Hz for the males and 130-800 Hz for the females. Thus, an f o range of roughly 2½ octaves was achieved with an L 2 /L 1 ratio of 1.7 for males and 1.5 for females. Fig 1 would predict values
A more recent study by Cho et al. [30] on vocal fold length change in humans used an ultrasonic imaging technique to follow anterior and posterior landmarks on the vocal folds. Results showed that L 1 = 1.47 cm and L 2 = 2.0 cm for males for low and high pitch, with L 2 /L 1 = 1.4. For females, L 1 = 1.14 cm and L 2 = 1.65 cm also yielded an L 2 /L 1 ratio of 1.4. This is a little less than the ratios of 1.7 for males and 1.5 for females reported in [29]. The small discrepancy is probably related to a smaller f o range in the Cho et al. study, but unfortunately the f o ranges were not reported.
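The arithmetic behind the reported length ratios and octave ranges can be reproduced with a short sketch. The numbers are those quoted above for the stereo videoscopy study [29]; the function and variable names are hypothetical.

```python
import math


def octave_span(f_low_hz, f_high_hz):
    """Number of octaves between two frequencies."""
    return math.log2(f_high_hz / f_low_hz)


def length_ratio(l1_cm, l2_cm):
    """Ratio of the longest to the shortest membranous vocal fold length."""
    return l2_cm / l1_cm


# Values as quoted in the text for the study cited as [29].
REPORTED = {
    "males": {"f_low": 100.0, "f_high": 500.0, "l1": 0.77, "l2": 1.3},
    "females": {"f_low": 130.0, "f_high": 800.0, "l1": 0.71, "l2": 1.1},
}

if __name__ == "__main__":
    for group, d in REPORTED.items():
        ratio = length_ratio(d["l1"], d["l2"])
        span = octave_span(d["f_low"], d["f_high"])
        # At constant stress, elongation alone would lower f0 by a factor 1/ratio.
        print(f"{group}: L2/L1 = {ratio:.2f}, f0 range = {span:.2f} octaves, "
              f"constant-stress f0 factor = {1.0 / ratio:.2f}")
```

The last printed column is a reminder that the observed rise in f o occurs despite the lengthening, so the stress increase must more than compensate for it.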

Measurements from non-human species
Measurements for length change versus fundamental frequency are also available from studies using excised larynges [30], [31], [32]. For example, excised domestic dog (Canis familiaris) larynges were vibrated on a laboratory bench with an artificial air supply [31]. Self-sustained vocal fold oscillation was achievable from L 1 = 0.5 cm to L 2 = 1.2 cm, but these lengths were produced mechanically rather than by muscle control. The corresponding f o range was 50 to 230 Hz, somewhat greater than 2 octaves. For this large L 2 /L 1 ratio of 2.4, the value of B from Fig 3 would be predicted to be about 6.0. Some dogs do not have a vocal ligament, but measurements on canine mucosa produced a value of B = 4.4 and measurements on canine thyroarytenoid muscle fibers yielded a value of B = 6.5 [27]. Given that the thyroarytenoid muscle is the fibrous layer in vibration, it would dominate the f o range. The predicted and measured values of B are therefore in agreement. Table 1 shows measured stress-strain relations for various mammalian species [8], [10], [11], [12], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42]. In some cases, the frequency ranges are shown. Note that the rhesus monkey has an
Table 1. Raw data of body mass, vocal fold length (L 0 ), stress-strain relationship for vocal fold tissue, and average fundamental frequency range. Vocal fold lengths were measured in specimens available to us (unpublished data), except for the African elephant. The variable ε = (L-L 0 ) / L 0 .
Table 1 (mouse to giraffe, in some cases both male and female). Note the general increase in L o with size, plotted logarithmically with the regression line

The regression is a very tight fit over a length range of 1-40 mm and a body mass range of 0.05-1000 kg, reinforcing the earlier claim that vocal fold length and body mass are tightly related. It is clear, however, that much greater variability is associated with these frequency trends, suggesting that factors other than body mass play a role in fundamental frequency prediction. Combining Fig 4(A) and 4  than a difference f o2 -f o1, is essentially a constant. This is a strong validation of the simple vibrating string model (Eq 5). Taking the ratio of Eq (7) to Eq (8) yields the number 12.0, which constitutes about 3.5 octaves as an average across species.
Empirical data for exponent B versus L o are shown in Fig 4(D). Omitting the one outlier (the rhesus monkey), a mild trend is quantified by the relation

$$B = 6.285 + 0.0468\,L_o, \quad r^2 = 0.15, \quad p < 0.3 \quad \text{(rhesus monkey excluded)} \qquad (9)$$

However, when the outlier is included, the trend disappears. Thus, with the sparsity of data available across species, it is not possible to assert whether or not there is an increase in B with longer vocal folds. What is important to note, however, is the large variation in B across species. Since B is an exponent, a range of 3-15 leads to orders-of-magnitude variations in frequency range.
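To put the weak regression trend next to the much larger species-to-species spread in B, the following sketch evaluates the fitted line, read here as B = 6.285 + 0.0468·L o . It assumes L o is expressed in millimetres (the text quotes a 1-40 mm length range but does not state the unit in the relation itself), and it uses a 30% strain, the typical elongation mentioned earlier, to show how strongly the exponent affects fiber stress. Function names are hypothetical.

```python
import math


def b_from_length(l_o_mm):
    """Fitted trend line for the exponent B; L_o assumed to be in millimetres."""
    return 6.285 + 0.0468 * l_o_mm


def stress_gain(b, strain):
    """Factor by which fiber stress rises at a given strain: exp(B * strain)."""
    return math.exp(b * strain)


if __name__ == "__main__":
    # Trend across the quoted vocal fold length range of 1-40 mm.
    for l_o in (1.0, 10.0, 40.0):
        print(f"L_o = {l_o:4.0f} mm -> B (trend) = {b_from_length(l_o):.2f}")

    # Spread across species: B between 3 and 15, evaluated at 30% strain.
    strain = 0.30
    for b in (3.0, 15.0):
        print(f"B = {b:4.1f} -> stress gain at {strain:.0%} strain = {stress_gain(b, strain):.1f}x")
```

Under these assumptions the trend line moves B by less than two units over the whole length range, whereas the spread of 3 to 15 changes the stress gain at the same strain by more than an order of magnitude.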
With this empirical relation between B and L o , a better f o range prediction can be made with Eq 5. If we continue to assume that L 1 = 0.7 L o , as in humans, then Fig 5 shows a contour plot of the f o2 /f o1 range achievable in octaves. The two morphological variables are B on the vertical axis and L 2 /L 1 on the horizontal axis. The figure shows that a greater f o range is attainable with either greater B or greater L 2 /L 1 . The empirical B values allow some species to be identified on the figure. The greater the B value, the smaller L 2 /L 1 needs to be to achieve a large f o range. Conversely, the larger L 2 /L 1 is, the smaller B needs to be to achieve a large f o range. For example, the male rhesus monkey requires only an L 2 /L 1 ratio of 1.6 for a 4-octave range. Humans, lions, and tigers require an L 2 /L 1 ratio of about 2.2 for a 4-octave range. For animals that scream or roar, a larger B value may be a protective requirement for greater vibrational amplitude and vocal fold collision.
Fig 5. Contour plot of predicted fundamental frequency range (high/low, f o2 /f o1 ratio) for morphological variables B and L 2 /L 1 . The range depends on two important factors: the rotational flexibility of the laryngeal framework, which facilitates L 2 /L 1 ; and the B value that quantifies the tissue stress response to elongation. For a given B value, a larger fundamental frequency range can be achieved with greater rotational flexibility. For a given L 2 /L 1 ratio, a larger frequency range can be achieved with a greater B value. Note that only modest changes in the B value are needed to achieve a larger frequency range for a given L 2 /L 1 ratio.
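The dependence of octave range on B and L 2 /L 1 can be tabulated with a small sketch. It assumes the exponential-stress string relation f o2 /f o1 = (L 1 /L 2 )·exp[B(L 2 − L 1 )/(2L o )] together with L 1 = 0.7 L o as stated above. The function names, the grid values, and the two B values used in the bisection example (8.5 and 15.4) are illustrative assumptions rather than measured species data.

```python
import math

L1_FRACTION = 0.7  # L1 = 0.7 * Lo, the assumption stated in the text


def octave_range(b, length_ratio, l1_fraction=L1_FRACTION):
    """Octaves spanned when length goes from L1 to L2 = length_ratio * L1."""
    strain_span = l1_fraction * (length_ratio - 1.0)  # (L2 - L1) / Lo
    return (0.5 * b * strain_span - math.log(length_ratio)) / math.log(2.0)


def required_length_ratio(b, target_octaves, lo=1.0, hi=4.0, tol=1e-6):
    """Bisect for the L2/L1 giving target_octaves; None if not reached by hi."""
    if octave_range(b, hi) < target_octaves:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if octave_range(b, mid) < target_octaves:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    ratios = (1.4, 1.8, 2.2)
    print("B \\ L2/L1 " + "".join(f"{r:>8.1f}" for r in ratios))
    for b in (5.0, 10.0, 15.0):
        print(f"{b:>10.1f} " + "".join(f"{octave_range(b, r):>8.2f}" for r in ratios))

    # Illustrative B values only; not taken from Table 1.
    for b in (8.5, 15.4):
        r = required_length_ratio(b, target_octaves=4.0)
        print(f"B = {b}: L2/L1 needed for 4 octaves = {r:.2f}")
```

With these assumed exponents the required L 2 /L 1 for four octaves comes out near 2.2 and 1.6 respectively, the same magnitudes quoted in the paragraph above, which illustrates how a stiffer stress response substitutes for elongation.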

Results and Discussion
A simple theory of the range of fundamental frequency f o achievable in various species has been proposed. Laryngeal size, and specifically vocal fold length, is a good predictor of mean f o , but a poor predictor of f o range. When vocal fold tissues become layered and tissue fibers assume a ventral-dorsal direction, the layer with the densest and stiffest fiber composition produces string-like vibration and determines the f o range. This can be a vocal ligament or a layer of muscle fibers. The stress-strain curve of the fibrous layer must be highly nonlinear to overcome the natural tendency for f o to decrease with increased length. For an exponential stress increase with a factor e Bε , where ε is the strain (fractional length change) and B is a stiffness constant, a range of values 5 < B < 15 can produce a 4-5 octave f o range with greater or lesser length change. If the laryngeal framework mechanics allows a large length change, on the order of ± 50% from the resting length, B values on the order of 5-10 can produce the 4-5 octave range. If the larynx is restricted in its range of motion such that only a ± 20% length change is possible, a value of B on the order of 10-15 is necessary to obtain a 4-5 octave range. A laryngeal adaptation for greater length change is greater rotation or gliding between cartilages that anchor the ends of the vocal folds. Alternatively, a tissue layer that can bear a greater tension (i.e., a ligament with high density collagen fibers) can also increase the fundamental frequency range and thereby allow vocal versatility. As a consequence, fundamental frequency can become uncoupled from size. Two large frequency ranges produced by two species can overlap even if the two have dramatically different body sizes.
The proposed framework for fundamental frequency range regulation has three important implications. First, voice production is an example of "many-to-one" mapping, which occurs when the functional property of interest depends on more than one underlying morphologic parameter [43]. In the case of voice fundamental frequency, the parameters include laryngeal framework mechanics and all variables affecting the B value, i.e. the number and depth of vocal fold tissue layers, vocal fold boundary geometry, and tissue fiber stress. Consequently, there are surfaces in a morphospace that represent functionally neutral variations, which means that morphological diversity between vocal folds of different species is not necessarily indicative of functional diversity. The evolution of vibrating tissue design in laryngeal or syringeal sound sources may lead to different morphologies that function similarly. For example, multilayered characteristics have been described in vocal folds of different mammals [8], [12], [44] as well as in alligators [45] and even within the oscillating tissue masses ("labia") in the avian vocal organ, the syrinx [9]. Laryngeal design across mammals is morphologically distinct in each species, but fundamental frequency ranges remain overlapping. Findings in excised mammalian larynges [32] or the excised avian syrinx [19] suggest that multiple activation patterns of intrinsic muscles of the larynx and syrinx, respectively, produce a redundant output, i.e. they can facilitate similar vocal frequencies. In a complex laryngeal or syringeal cartilaginous framework, different muscle activations generate different tension settings of the oscillating tissue, yet in combination with the appropriate driving pressure, the soft tissue can vibrate at identical rates [46].
Our findings have a second, more practical implication as it pertains to the treatment of human voice disorders. The observation that multiple vocal fold morphologies can serve the same function, i.e. produce the same fundamental frequency, can be informative for surgical treatment of impaired vocal folds. Surgery to remove vocal fold lesions often results in irreparable loss of normal vibratory mucosa [47]. Restoration of normal human vocal fold morphology may not be feasible in many cases because the deficits are large. Our proposal that fundamental frequency range can be regulated through two distinct mechanisms, and its broader implication that multiple vocal fold morphologies can achieve the same vocal output, suggest vocal function may be restored with alternative strategies. Examples of alternative morphologies already exist in laryngeal surgery in which non-laryngeal tissue is used to restore voice production [48], [49], [50]. However, the concept of alternative morphologies as viable solutions has not been considered systematically in vocal fold repair and deserves further exploration. Computer simulation of voice production can provide the means for intelligent exploration of the vocal fold morphospace to search for viable alternatives. Simulations based on finite-element and finite-difference approaches have been reported over the past two decades [51], [52], [53], [54].
A single simulation produces one set of acoustic output variables given a defined input vocal fold morphology at a fixed subglottal pressure. A meaningful comparison between two different vocal fold morphologies should entail a range of possible acoustic outputs, given a clinically relevant range of subglottal pressures as well as a range of physiologic variations in the vocal fold morphologies. Such a comparison would entail thousands of simulation runs to fully cover the range of inputs. One approach to reduce the computational cost and to increase the efficiency of morphospace exploration is to combine a finite element model (FEM) voice simulation with multiobjective optimization [55]. This approach has been applied to vocal fold surgery simulation, in which the functional viabilities of two alternative vocal fold morphologies were demonstrated in silico [56].
Finally, the current findings relate well to vocal development and vocal training. If the density of collagen fibers in the vocal ligament is increased by exercise (frequent stretching), a speaker or a singer can increase the fundamental frequency range even if the laryngeal framework cannot be altered much due to tight spaces between cartilages. On the other hand, laryngeal massage and framework exercise could widen the spaces, allowing greater f o range with existing molecular constructs. It appears that the development of a theory for fundamental frequency range regulation based on comparative data across species in nature is paramount to understanding possible intervention strategies for improving human communication.