
Biomechanical simulation of vocal fold dynamics in adults based on laryngeal high-speed videoendoscopy

  • Michael Döllinger,

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Software, Supervision, Writing – original draft, Writing – review & editing

    Michael.doellinger@uk-erlangen.de

    Affiliation Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany

  • Pablo Gómez,

    Roles Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – review & editing

    Affiliation Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany

  • Rita R. Patel,

    Roles Conceptualization, Investigation, Resources, Writing – original draft, Writing – review & editing

    Affiliation Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana, United States of America

  • Christoph Alexiou,

    Roles Funding acquisition, Investigation, Resources, Writing – review & editing

    Affiliation Section of Experimental Oncology and Nanomedicine (SEON), Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Else Kröner-Fresenius-Stiftung-Professorship, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany

  • Christopher Bohr,

    Roles Investigation, Resources, Validation, Visualization, Writing – review & editing

    Affiliation Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany

  • Anne Schützenberger

    Roles Conceptualization, Methodology, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany

Abstract

Motivation

The human voice is generated in the larynx by the two oscillating vocal folds. Owing to the limited space and accessibility of the larynx, detailed endoscopic investigation of the actual phonatory process is challenging. Hence, the biomechanics of human phonation are not yet fully understood. Therefore, we adapt a mathematical model of the vocal folds to recorded vocal fold oscillations in order to quantify gender- and age-related differences expressed by computed biomechanical model parameters.

Methods

The vocal fold dynamics are visualized by laryngeal high-speed videoendoscopy (4000 fps). A total of 33 healthy young subjects (16 females, 17 males) and 11 elderly subjects (5 females, 6 males) were recorded. A numerical two-mass model is adapted to the recorded vocal fold oscillations by varying model masses, stiffness and subglottal pressure. To adapt the model to the recorded vocal fold dynamics, three different optimization algorithms (Nelder–Mead, Particle Swarm Optimization and Simulated Bee Colony) in combination with three cost functions were evaluated for their applicability. Gender differences and age-related kinematic differences reflected by the model parameters were analyzed.

Results and conclusion

The biomechanical model in combination with numerical optimization techniques allowed phonatory behavior to be simulated and the laryngeal parameters involved to be quantified. All three optimization algorithms showed promising results. However, only one cost function seems to be suitable for this optimization task. The obtained model parameters reflect the phonatory biomechanics of men and women well and show quantitative age- and gender-specific differences. The model parameters for younger females and males showed lower subglottal pressures, lower stiffness and higher masses than those of the corresponding elderly groups. Females exhibited higher subglottal pressures, smaller oscillation masses and larger stiffness than similarly aged males.

Optimizing numerical models towards vocal fold oscillations is useful to identify underlying laryngeal components controlling the phonatory process.

Introduction

The human voice represents an essential aspect of oral communication between human beings. Voice is formed by the interaction and coordination of applied air flow, vocal fold tissue and vocal fold movements. Accurate and precise physiologic interaction of several laryngeal muscle group movements is the basis for normal voice production [1]. The acoustic voice signal originates in the larynx where the two opposing vocal folds are excited by an airflow generated by the lungs (Fig 1). When starting the voice production process (i.e., phonation), the vocal folds are positioned close to each other. Airflow produced from the lungs streams upwards and increases the subglottal pressure below the vocal folds. After exceeding a certain subglottal pressure level, the vocal folds first start to produce small oscillatory motions that then result in a steady-state oscillation (i.e., periodic opening and closing of the vocal folds). A healthy voice signal is normally produced by periodic and symmetric vocal fold oscillations. Also, the closure of the glottis, where the vocal folds almost or entirely close, is considered to be an important part of the normal phonation process [2].

Fig 1.

(A) Sketch of the sagittal section of the head and neck, indicating the vocal folds and the rigid endoscope. (B) Glottis (dark) and on the left and right sides the two vocal folds as seen through the HSV. (C) Larynx with highlighted acoustic sound sources (arrows) resulting from the vocal fold vibrations.

https://doi.org/10.1371/journal.pone.0187486.g001

Depending on gender and age, vocal folds oscillate approximately 100 to 350 times per second during normal phonation [3]: women (~200–250 Hz), men (~100–150 Hz), children (~200–350 Hz). Because these movements are so fast, vocal fold dynamics are best captured and visualized by laryngeal high-speed videoendoscopy (HSV) at frame rates between 4000 and 20 000 frames per second [4–8] (Fig 1A).

Since HSV was first introduced, image processing methods have been proposed [9–11] to quantify vibratory behavior with objectively computed perturbation measures [12–14]. The signals extracted from HSV for analysis are either the glottal area waveform (i.e., the glottal area as a function of time) or the vocal fold trajectories at a specific vocal fold location, preferably at the mid-membranous position [15–17]. Both signals represent the oscillatory behavior of the vocal folds, i.e., the opening and closing process. Quantitative analysis based on HSV has added substantial knowledge regarding normal and pathological vocal fold dynamics [4,18,19].

Vocal fold dynamics are highly sensitive to anatomic tissue changes [2], dysfunctions of the involved muscles [20] and subglottal air pressure [21]. Alterations of these parameters may disturb the dynamics, resulting in hoarseness. Typical disturbances are left–right asymmetries, aperiodicities or insufficient glottis closure, where the vocal folds do not close entirely [2]. Early diagnosis and quantification of pathologic dynamic laryngeal alterations is highly desirable to prevent severe laryngeal tissue damage [22,23].

However, because of the limited space in the larynx, it is difficult to measure and quantify the mechanical laryngeal tissue characteristics directly in-vivo [24]. Additionally, HSV evaluation alone only describes vocal fold dynamics and does not give quantitative information on biomechanical parameters such as tissue elasticity and subglottal air pressure. Hence, indirect analysis methods based on adapting numerical biomechanical larynx models to in-vivo HSV-recorded vocal fold dynamics were suggested [25,26].

Initially, these biomechanical models were used to simulate the underlying processes during phonation. Models were developed to allow the investigation of parameter effects such as applied subglottal air pressure, vibrating masses, tissue stiffness and elongation characteristics with respect to the dynamic vocal fold behavior [27–29]. These so-called lumped mass models (LMMs) are fairly simple but still enable many dynamic characteristics in the larynx to be reproduced [30]. In the most basic models, the vocal folds are simulated by a self-vibrating source consisting of two spring-coupled masses (2MMs) for each vocal fold (Fig 2A) [31]. Since these 2MMs allowed the simulation of only one trajectory at one vocal fold position, models with more masses (multi-mass models, MMMs) were suggested to permit the simultaneous simulation of the vocal fold dynamics at different positions (Fig 2B) [32]. However, the 2MMs and MMMs only focused on the simulation of the lateral (i.e., horizontal) vocal fold displacements and on the phase differences along the inferior-superior plane. To simulate the often neglected vertical vocal fold movements [33] (i.e., vertical tissue displacement in inferior-superior direction), enhanced and more complex three-dimensional LMMs were introduced [34] (Fig 2C).

Fig 2.

(A) Two-mass model as used in this work with indicated subglottal pressure Ps. (B) Six-mass model allowing the simulation of the vocal fold trajectories along three positions, i.e. at posterior, medial and anterior positions of the vocal folds. (C) Three-dimensional multi-mass model that additionally allows the simulation of vertical dynamics and the vocal fold medial surface at 25 positions along each vocal fold.

https://doi.org/10.1371/journal.pone.0187486.g002

After LMMs had been developed to simulate human vocal fold dynamics by manually adjusting the model parameter settings [28,35], the automatic optimization of these numerical models towards HSV-recorded vocal fold oscillations was suggested in order to obtain information on the parameters responsible for chaotic behavior, specific dynamic conditions and left–right asymmetric oscillations.

The first fully automatic optimization method used the Nelder–Mead algorithm to optimize the parameters of a 2MM so that it reproduced HSV-recorded human vocal fold dynamics during sustained phonation [25,26]. Three parameters (vibrating mass, stiffness and subglottal pressure) were varied to minimize the cost function Γ, which measured the differences between the periodic oscillatory components, represented by discrete Fourier transform (DFT) coefficients, of the model trajectories and of the trajectories extracted from HSV recordings. Building on this work, the 2MM combined with a genetic algorithm for the optimization was successfully employed to reproduce the trajectories of patients suffering from unilateral vocal fold paralysis [36] and in ex-vivo larynx experiments [37]. By combining a genetic algorithm with a quasi-Newton method, a 2MM successfully reproduced the vocal fold dynamics of three subjects [38]; the Euclidean distance of the glottal area waveform was chosen as cost function. More recently, statistical methods such as a non-stationary Bayesian estimation approach have been suggested for optimizing 2MMs, but they have only been tested on theoretical (i.e., simulated) vocal fold oscillations [39]. Further, a time-dependent 2MM was successfully adapted to 20 healthy and pathological voices [40]. For the optimization, an adaptive Simulated Annealing approach was chosen to minimize the Euclidean distance between the model and experimental vocal fold trajectories.

Schwarz et al. (2008) [41] successfully optimized an MMM with six coupled masses (6-MMM) to match six disordered and two normal voices. They applied a genetic algorithm and split the optimization process into several sub-steps. As cost function Γ, they used a combination of glottal area and vocal fold trajectory consistency measures. MMMs were also used to reproduce vocal fold dynamics within pitch rise paradigms [42]. There, wavelet coefficients were chosen for the cost function to account for the time dependency of the system, and Powell’s Direction Set optimization algorithm was applied to the signal divided into intervals of four oscillation cycles. In this way, the model was adapted to 30 healthy and pathological adult voices.

Phonation after total laryngectomy due to cancer was simulated by coupling eight two-mass models arranged in a circle [43]. For the optimization, the authors combined Simulated Annealing for a preliminary global search with Powell’s Direction Set method for the final local approximation. A combination of area difference, intersection and distance measures was chosen as cost function. The method was tested on 75 synthetic data sets and four human subjects.

Finally, a three-dimensional model (3DM) was developed and applied to ex-vivo human vocal fold dynamics produced in a hemi-larynx setup [44]. This model allowed the simulation of the entire vocal fold surface from inferior to superior, including the vertical dynamics [44]. Owing to the increased number of masses in the 3DM (i.e., 25 on each side) and the rather unfavorable topology of the cost function (i.e., many local minima), the optimization was divided into several coarse steps followed by fine optimization steps. It started with a global optimization combining Particle Swarm Optimization and Simulated Annealing. The local optimization was achieved by Powell’s direction set method [45]. By adapting this model to ex-vivo human larynx dynamics, information was obtained on the distribution of vibrating masses and stiffness along the vocal fold surface. The optimization results accurately matched the actual fundamental frequencies as well as the experimentally measured subglottal pressure values [44].

Overviews of currently applied LMMs can be found in [46,47]. Owing to the increase in computational power, more accurate and therefore more computationally intensive Finite Element Methods (FEM) and Finite Volume Methods (FVM) simulating 2D and 3D laryngeal dynamics and airflow have become popular during the last decade [31,48–50]. However, the complexity of these models, and hence their computational cost, does not yet allow them to be used for optimization purposes.

As described above, numerical optimization of LMMs has so far focused on adults and on comparing model parameters for healthy vs. disordered phonation processes.

To the best of our knowledge, no studies have applied LMMs to analyze gender-specific differences. Previous ex-vivo and in-vivo studies on human larynges showed the following, partly contradictory, results. Regarding vocal fold stiffness, an ex-vivo study reported smaller stiffness at a medial position slightly above the vocal fold edge for an elderly woman than for an elderly man [51]. In contrast, it was suggested that vocal folds in men are less stiff than in women [52]; unfortunately, the authors did not report the age of the subjects. Gender-related subglottal pressure differences were reported for young adults [53], whereas other studies showed no gender-specific differences [54,55]. However, male vocal folds are known to be larger (i.e., to have greater mass) than female vocal folds [56].

Further, to the best of our knowledge, no studies have compared biomechanical LMM parameters between younger and elderly subjects. Previous ex-vivo and in-vivo studies on human larynges reported the following results. Glottal parameters extracted from HSV recordings were successfully applied to differentiate age groups; however, no details on the parameter values were presented [57]. Age-related morphological changes of the vocal folds that influence viscoelasticity have been described [58–61]. Previous studies suggested lower stiffness for younger males compared to elderly males [51,62]. Histologic analysis reported reduced lamina propria thickness and reduced epithelial cell density in elderly subjects [63]. However, an increase in vocal fold volume (i.e., mass) was also reported for 28.7% of the analyzed elderly women [62]. Furthermore, a thickening of the mucosa and vocal fold cover in elderly women was described [64]. Higher subglottal pressure in elderly males compared to younger males was suggested [65], whereas other studies reported no differences [66,67]. It was also suggested that aging effects on phonatory behavior differ in degree and kind between men and women [65]. In summary, a better understanding of the elderly voice and accompanying voice disorders is needed [66,68].

Since it is also not obvious which optimization algorithms and cost functions are the most appropriate, we decided to apply different approaches in our study. Therefore, by optimizing an LMM towards vocal fold oscillations recorded by endoscopic high-speed imaging, the aims of our study are:

  1. to analyze the performance of three different optimization algorithms and three different cost functions (Γ1, Γ2, Γ3) to automatically adapt a fairly simple LMM (i.e., in our study 2MM) to vocal fold dynamics.
  2. to quantify gender-related biomechanical differences in the larynx expressed by 2MM parameters. We hypothesize that healthy young males would have greater masses, lower stiffness and lower subglottal pressure compared to healthy young females.
  3. to quantify age related biomechanical differences in the larynx expressed by 2MM parameters. We hypothesize that healthy young adults would have greater vocal fold masses, lower vocal fold stiffness and lower subglottal pressure compared to elderly atrophic subjects. Also, it is expected to find higher kinematic asymmetry in the elderly subjects as reported previously [69,70].

By investigating these objectives, we want to illustrate and emphasize the informative value of LMMs for vocal fold physiology by analyzing interrelations between underlying laryngeal components such as vibrating mass, stiffness and applied subglottal pressure. Further, we want to illustrate the potential of LMMs for differentiating vocal fold vibratory characteristics.

Methods

Subjects

Four groups of subjects were investigated. Two younger age groups were analyzed with 16 healthy females (18–24 years) and 17 healthy males (19–38 years). The subjects were recruited at the FAU-Erlangen-Nürnberg and the University of Kentucky. The subjects had to fulfill the following criteria: negative history of vocal pathology, not a professional voice user and a normal voice, as confirmed by a speech–language pathologist. Adults with a history of smoking were excluded.

Further, six older male subjects (62–86 years) and five older female subjects (72–91 years) were analyzed. These subjects were recruited during regular office hours at the Division of Phoniatrics and Pediatric Audiology of the Department of Otorhinolaryngology, Head and Neck Surgery at the University Hospital Erlangen. The subjects had been diagnosed with vocal fold atrophy and had no further vocal fold pathologies.

The participants gave informed, written consent prior to the participation and this consent procedure was approved by the corresponding local ethics committees (Ethik-Kommission der Medizinischen Fakultät FAU-Erlangen-Nürnberg and Office of Research Integrity Expedited Review Board at the University of Kentucky). Experiments were performed in accordance with the Declaration of Helsinki (1964).

Data acquisition

To record the vocal fold vibrations, a PENTAX Medical (Montvale, NJ, USA) Model 9710 digital gray scale high-speed (HS) camera was used at both universities. The applied temporal resolution of 4000 fps enables the vocal fold oscillations to be captured [4]. The HS camera provides a series of pictures with an image resolution of 512 × 256 pixels with a maximum duration of 4 seconds to visualize the laryngeal dynamics.

The recordings were performed during sustained phonation of the vowel /i:/ with a PENTAX Medical 70° endoscope and a 300 Watt Xenon light source. For each subject, one sample of typical phonation was recorded and analyzed. For the optimization, a sequence length of 100 ms (N = 400 frames) of sustained phonation was chosen. Depending on the fundamental frequency, this interval covered between 10 and 50 oscillation cycles. The interval length is a compromise between sufficient observation time and computational cost and lies within the range of previously reported interval lengths.
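As a quick check of the numbers above, the following Python sketch derives the frame count and the number of oscillation cycles in the analysis window from the frame rate (4000 fps), the window length (100 ms) and the fundamental frequency. The function names are illustrative and not part of the authors' software; the bounds of 10 and 50 cycles correspond to fundamental frequencies of 100 Hz and 500 Hz.

```python
def frames_in_window(frame_rate_fps: float, window_s: float) -> int:
    """Number of HSV frames in an analysis window."""
    return round(frame_rate_fps * window_s)


def cycles_in_window(f0_hz: float, window_s: float) -> float:
    """Number of oscillation cycles at fundamental frequency f0 within the window."""
    return f0_hz * window_s


def frames_per_cycle(frame_rate_fps: float, f0_hz: float) -> float:
    """How many frames sample a single oscillation cycle."""
    return frame_rate_fps / f0_hz


if __name__ == "__main__":
    print(frames_in_window(4000, 0.100))  # N in the text
    for f0 in (100, 500):  # frequencies implied by 10 and 50 cycles in 100 ms
        print(f0, cycles_in_window(f0, 0.100), frames_per_cycle(4000, f0))
```

At 500 Hz each cycle is sampled by only 8 frames, compared with 40 frames at 100 Hz.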

Data processing

First, image processing was performed to determine the glottis area and the vocal fold edges [11] (Fig 3). The glottal axis was determined using the methods described in [71,72] (Fig 3). For these steps, the in-house software tool “Glottis Analysis Tool” (GAT) was used. GAT has already proven its validity and applicability in several studies [8,73] and is also used by other voice research groups [74,75].

Fig 3. Performed steps for image processing yielding the experimental vocal fold trajectories.

https://doi.org/10.1371/journal.pone.0187486.g003

For the adaptation of the 2MM, the trajectories at the mid-membranous position of the vocal folds (i.e., at 50% of the distance between the anterior and posterior ends) were chosen (Fig 3), since the largest vocal fold amplitudes are expected in this region [76]. The trajectories at the medial vocal fold position are also expected to be the most useful [16]. To guarantee the extraction of the vocal fold trajectories at 50% of the glottal midline, only HSV recordings were used in which the view was not obstructed by the epiglottis and both the most anterior and the visible posterior parts of the glottis were in view.
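The geometric step of sampling a vocal fold edge at the 50% position of the glottal axis can be sketched as follows. This is a simplified, hypothetical illustration rather than the GAT implementation: it assumes that the glottal axis is given by fixed anterior and posterior end points in pixel coordinates and that, for every frame, the edge point of one vocal fold at the mid-membranous location has already been found. The trajectory is then the perpendicular distance of that point from the axis.

```python
import math

Point = tuple[float, float]  # (x, y) in image pixels


def point_on_axis(anterior: Point, posterior: Point, fraction: float = 0.5) -> Point:
    """Point at `fraction` of the way from the anterior to the posterior axis end."""
    (ax, ay), (px, py) = anterior, posterior
    return (ax + fraction * (px - ax), ay + fraction * (py - ay))


def distance_to_axis(edge: Point, anterior: Point, posterior: Point) -> float:
    """Perpendicular distance (pixels) of a vocal fold edge point from the axis line."""
    (ax, ay), (px, py), (x, y) = anterior, posterior, edge
    dx, dy = px - ax, py - ay
    # |cross product| / |axis vector| gives the point-to-line distance
    return abs(dx * (y - ay) - dy * (x - ax)) / math.hypot(dx, dy)


def edge_trajectory(edges_per_frame, anterior: Point, posterior: Point) -> list[float]:
    """Distance-to-axis trajectory of one vocal fold edge over all frames."""
    return [distance_to_axis(e, anterior, posterior) for e in edges_per_frame]
```

For a vertical axis from (100, 0) to (100, 200), `point_on_axis` returns (100.0, 100.0), and an edge point at (88, 100) lies 12 pixels from the axis.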

Two-mass model (2MM)

The human voice is generated by three-dimensional vocal fold oscillations [76]. During phonation, oscillations of the vocal fold mucosa occur in the anterior–posterior, medio-lateral and vertical directions [20]. Anterior–posterior movements are fairly small and can therefore be neglected [76]. The vertical displacements (up to 2.4 mm) are approximately two-thirds of the dominant medio-lateral displacements [33,76]. However, the vertical component cannot be reconstructed with current HSV imaging techniques owing to the lack of a second imaging tool (e.g., a second camera or a laser projection system [77]). So far, extensive studies on the three-dimensional reconstruction of vocal fold surface dynamics have only been possible in ex-vivo or synthetic experiments [77,78]. For three-dimensional in-vivo reconstruction, only case studies [79] and proofs of concept [80] have been performed recently. Therefore, we focused on the dominant medio-lateral (i.e., horizontal) displacement characteristics at one vocal fold position, which can be simulated by the 2MM (Fig 4).

Fig 4. HSE image with indicated glottal axis (vertical blue line) and medial positions (~50% position between anterior and posterior) on left (blue dot) and right (red dot) vocal folds where the trajectories were extracted (left figure).

The 2MM used (middle figure). Extracted trajectories for left (blue) and right (red) vocal folds (right figure).

https://doi.org/10.1371/journal.pone.0187486.g004

Important laryngeal parameters that influence the oscillations are the vibrating vocal fold masses (m), tension, elasticity, damping (r), stiffness (k) and subglottal air pressure (Ps). The 2MM allows these parameters to be varied and their effects on the medio-lateral oscillatory behavior at one specific vocal fold location to be observed. Since the 2MM applied here has already been described extensively [28], we give only the information necessary to understand the functionality of the 2MM and the optimization procedure. The 2MM is based on [27,28] and assumes that each vocal fold is formed by two vertically arranged, coupled masses: a larger, lower mass (m1α) and a smaller, upper mass (m2α). Each part is a simple mechanical oscillator consisting of a mass and two springs, and the two masses of one vocal fold are connected by a coupling spring. The driving force of the 2MM is the subglottal pressure Ps below the masses. The 2MM is described by a system of eight first-order ordinary differential equations (i = 1, 2; α = l, r) [25]: (1)

The indices (i, α) denote the lower (i = 1) and upper (i = 2) masses and the left and right vocal fold (α = l, r), respectively. Ishizaka & Flanagan (1972) introduced a standard parameter set that represents the standard vocal fold vibration pattern [81]. The parameters of Eq (1) and their standard values as originally introduced [28,81] are given in Table 1. The nonlinear components (I1α, I2α and F1) describe the impact forces (I1α, I2α) and the subglottal pressure function (F1), which have also been described in detail in earlier work [25].

Table 1. Standard parameters of the 2MM.

In this study, the chosen vocal fold lengths l were 10 mm for women and 16 mm for men. The rest positions x01, x02 for the 2MM optimization were computed based on the mean amplitudes of the HSV trajectories yielding also individual rest areas a0i [26]. During the 2MM optimization, the mi, ki and Ps values are varied.

https://doi.org/10.1371/journal.pone.0187486.t001

For the optimization, the 2MM trajectories (TMα, α = l,r) of each side are a combination of the displacements of the lower and upper masses: the mass (m1α or m2α) that is closer to the glottal midline (i.e., visible from above) contributes to the corresponding vocal fold trajectory TMα [40].
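A minimal sketch of how the visible model trajectory of one vocal fold could be formed from the two mass displacements. It assumes that the displacement of mass i is measured from its rest position x0i with positive values pointing laterally, so that the distance of the mass from the glottal midline is x0i plus its displacement; the function name and this sign convention are assumptions for illustration.

```python
def visible_trajectory(x_lower, x_upper, x01: float, x02: float) -> list[float]:
    """Medio-lateral trajectory of one vocal fold as seen from above.

    x_lower, x_upper: displacements of the lower and upper masses over time,
    measured from their rest positions x01 and x02 (positive = lateral).
    At each time step the mass closer to the glottal midline is the visible one.
    """
    if len(x_lower) != len(x_upper):
        raise ValueError("lower and upper trajectories must have equal length")
    return [min(x01 + lo, x02 + up) for lo, up in zip(x_lower, x_upper)]
```

For example, with both rest positions at 0.5, displacements [1.0, 3.0] (lower) and [2.0, 1.0] (upper) give [1.5, 1.5]: the lower mass is visible in the first sample and the upper mass in the second.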

The HSV recordings do not allow the extraction of the vocal fold oscillations in metric units but only in pixels. Owing to the known gender differences in vocal fold length, we chose mean values for the vibrating vocal fold length of 10 mm in women and 16 mm in men. These values are the average membranous vocal fold lengths reported in previous work [82–86]. By mapping the vocal fold length in pixels to these metric lengths, an approximate metric equivalent of one pixel can be calculated and the extracted trajectories converted to metric units.
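The pixel-to-millimeter conversion described above amounts to a single scale factor per recording. The sketch below assumes that the vocal fold length has been measured in the image in pixels (for example, along the glottal axis); the helper names are hypothetical.

```python
# Mean membranous vocal fold lengths used in the text (mm).
VOCAL_FOLD_LENGTH_MM = {"female": 10.0, "male": 16.0}


def mm_per_pixel(sex: str, vocal_fold_length_px: float) -> float:
    """Approximate metric size of one pixel, assuming the vocal fold length
    measured in the image (pixels) equals the mean length for this sex."""
    if vocal_fold_length_px <= 0:
        raise ValueError("vocal fold length in pixels must be positive")
    return VOCAL_FOLD_LENGTH_MM[sex] / vocal_fold_length_px


def trajectory_to_mm(trajectory_px, sex: str, vocal_fold_length_px: float) -> list[float]:
    """Convert a pixel trajectory to millimeters with one scale factor."""
    scale = mm_per_pixel(sex, vocal_fold_length_px)
    return [value * scale for value in trajectory_px]
```

For a male subject whose vocal fold spans 160 pixels, one pixel corresponds to about 0.1 mm.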

During the optimization, the parameters vocal fold mass (m), stiffness (k) and subglottal pressure (Ps) are varied. This is based on the study by Steinecke and Herzel (1995) [28], who also introduced a scaling factor Qα (α = l,r) to vary the standard masses (miα0) and spring constants (kiα0). Laryngeal asymmetry is expressed by the scaling factors Qα. The scaling factors Ql (left vocal fold) and Qr (right vocal fold) influence the masses and spring constants in the following way [27,28]:

$$m_{i\alpha} = \frac{m_{i\alpha 0}}{Q_\alpha}, \qquad k_{i\alpha} = Q_\alpha\, k_{i\alpha 0}, \qquad k_{c\alpha} = Q_\alpha\, k_{c\alpha 0}, \qquad i = 1, 2;\ \alpha = l, r \qquad (2)$$

This reciprocal relationship between the vibrating masses m and the spring constants k is based on the assumption that the larger the vibrating mass, the smaller the stiffness of the vocal folds [28]. The 2MM oscillates symmetrically provided that Ql and Qr are equal or differ only slightly. If the difference between Ql and Qr is too large, the 2MM vibrations become left–right asymmetric.
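The Q scaling of Eq (2) can be expressed in a few lines of Python. The sketch assumes that the coupling stiffness kc is scaled like the other stiffnesses, and the numbers in the usage example are illustrative only (Table 1 lists the standard values actually used). Because every mass is divided by Q and every stiffness multiplied by Q, the natural frequency sqrt(k/m)/(2π) of each uncoupled oscillator scales linearly with Q.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldParameters:
    """Mass and stiffness parameters of one vocal fold in the 2MM."""
    m1: float  # lower mass
    m2: float  # upper mass
    k1: float  # stiffness of the lower mass
    k2: float  # stiffness of the upper mass
    kc: float  # coupling stiffness between lower and upper mass


def scale_by_q(standard: FoldParameters, q: float) -> FoldParameters:
    """Masses divided by Q, stiffnesses multiplied by Q."""
    if q <= 0:
        raise ValueError("Q must be positive")
    return FoldParameters(m1=standard.m1 / q, m2=standard.m2 / q,
                          k1=standard.k1 * q, k2=standard.k2 * q, kc=standard.kc * q)


def natural_frequency(k: float, m: float) -> float:
    """Natural frequency of an undamped single mass-spring oscillator."""
    return math.sqrt(k / m) / (2 * math.pi)


if __name__ == "__main__":
    standard = FoldParameters(m1=0.125, m2=0.025, k1=0.08, k2=0.008, kc=0.025)  # illustrative
    left, right = scale_by_q(standard, 1.2), scale_by_q(standard, 0.9)
    print(natural_frequency(left.k1, left.m1) / natural_frequency(right.k1, right.m1))
```

Different Ql and Qr therefore detune the left and right oscillators relative to each other, which is how the model produces left–right asymmetry.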

Optimization procedure

With the variation of Ql and Qr, reflecting mass and stiffness for each vocal fold, and of the subglottal pressure Ps, it is possible to reproduce physiologic and pathologic vocal fold oscillations [26,36]. The goal of the optimization is to vary these parameters so that the resulting 2MM trajectories (TM) accurately recreate the vocal fold trajectories (TE) recorded by HSV and extracted from the images. This is realized by a combination of several steps within the optimization algorithm (Fig 5). The cost function judges the quality of the 2MM optimization and compares the model trajectories TMl and TMr with the recorded vocal fold trajectories TEl and TEr. Three different cost functions (Γ1, Γ2, Γ3), as described below, are used to match the extracted vocal fold trajectories as closely as possible.

Frequency domain (Γ1).

The absolute and phase values of the dominant harmonics of TM and TE are compared, as suggested previously [25]. First, in a preprocessing step, a Fourier transform is applied to identify the harmonics contained in the left (TEl) and right (TEr) experimental trajectories. The Fourier spectrum of an experimental trajectory is dominated by only a small number of harmonics, represented by their Fourier coefficients eiα (i = index of the harmonic, α = left (l) or right (r)). Only harmonics eiα whose absolute value is at least 25% of that of the largest coefficient, which represents the fundamental frequency, are taken into account. For irregular or aperiodic vibrations, the largest coefficient corresponds to the dominant frequency. Additionally, the immediate left and right neighbors of the selected coefficients are included to account for slight variations in the harmonics [25]. For the simulated 2MM trajectories TMl and TMr, the equivalent Fourier coefficients siα (same indices) are chosen and considered in the cost function Γ1: (3)

In Γ1, L corresponds to the number of coefficients for the left and R for the right vocal fold. Also, a scaling factor s is included in Γ1 to balance the influence of phase and absolute values of the Fourier coefficients as described in [25]. Γ1 was constructed to reduce the number of local minima and to therefore potentially yield improved optimization results [87].
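The harmonic selection used for Γ1 (a 25% threshold relative to the largest coefficient, plus the immediate neighbors of each selected bin) can be sketched in plain Python. The comparison function at the end only illustrates comparing magnitudes and phases over the selected bins; it is not the exact Eq (3), and its `phase_weight` is a hypothetical stand-in for the scaling factor s. A plain O(N²) DFT keeps the example free of dependencies.

```python
import cmath
import math


def dft(signal):
    """Plain O(N^2) discrete Fourier transform (a library FFT is faster)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]


def select_harmonics(signal, threshold=0.25):
    """DFT bins with magnitude >= threshold * largest non-DC magnitude,
    plus their immediate left and right neighbors."""
    coeffs = dft(signal)
    half = len(signal) // 2
    mags = {k: abs(coeffs[k]) for k in range(1, half + 1)}  # skip the DC bin
    peak = max(mags.values())
    selected = set()
    for k, mag in mags.items():
        if mag >= threshold * peak:
            selected.update((k - 1, k, k + 1))
    return sorted(k for k in selected if 1 <= k <= half)


def harmonic_cost(exp_signal, model_signal, bins, phase_weight=1.0):
    """Illustrative distance: magnitude difference plus weighted wrapped phase difference."""
    e, s = dft(exp_signal), dft(model_signal)
    cost = 0.0
    for k in bins:
        dphi = cmath.phase(e[k]) - cmath.phase(s[k])
        dphi = abs((dphi + math.pi) % (2 * math.pi) - math.pi)  # wrap to [0, pi]
        cost += abs(abs(e[k]) - abs(s[k])) + phase_weight * dphi
    return cost


if __name__ == "__main__":
    fs, n = 4000, 400  # 100 ms at 4000 fps
    sig = [math.sin(2 * math.pi * 200 * t / fs) + 0.5 * math.sin(2 * math.pi * 400 * t / fs)
           + 0.1 * math.sin(2 * math.pi * 600 * t / fs) for t in range(n)]
    print(select_harmonics(sig))  # expected: [19, 20, 21, 39, 40, 41]
```

In the usage example the 600 Hz component stays below the 25% threshold and is ignored, while the 200 Hz fundamental (bin 20) and the 400 Hz harmonic (bin 40) are kept together with their neighbors.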

Time domain (Γ2).

The Euclidean distance between the model trajectories (TMl, TMr) and the recorded trajectories (TEl, TEr) is computed.
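Γ2 can be written down directly. The sketch below assumes that the left and right trajectories are pooled into a single distance; whether the original implementation pools the samples or combines per-side distances differently is not stated here.

```python
import math


def time_domain_cost(tm_l, tm_r, te_l, te_r) -> float:
    """Euclidean distance between model (TM) and experimental (TE) trajectories,
    pooling the samples of the left and right vocal folds."""
    if len(tm_l) != len(te_l) or len(tm_r) != len(te_r):
        raise ValueError("model and experimental trajectories must have equal length")
    squared = sum((m - e) ** 2 for m, e in zip(tm_l, te_l))
    squared += sum((m - e) ** 2 for m, e in zip(tm_r, te_r))
    return math.sqrt(squared)
```

Unlike Γ1, this distance is sensitive to small phase shifts between model and experiment, since every sample is compared pointwise.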

Normalized frequency domain (Γ3).

This cost function is identical to Γ1, except that the absolute values of the Discrete Fourier Transform (DFT) coefficients are normalized to 1 and that an additional regularization term is added: the Euclidean distance between the absolute values of the DFT coefficients representing the fundamental frequency.

Optimization was rated successful when the following three error criteria were met.

(1) Frequency deviation ≤ 5%; (2) the glottis closure seen in the HSV recordings was also achieved by the optimized 2MM trajectories; (3) the amplitudes of the optimized 2MM trajectories were within the amplitude variations of the HSV trajectories. For example, in Fig 4 the right trajectory (red) varies between 12.5 and 15 pixels; optimization would be rated successful for this trajectory when the corresponding right 2MM amplitude was also between 12.5 and 15 pixels.
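The three success criteria can be expressed as simple checks. The frequency and amplitude tests follow the text directly; the closure test is a hypothetical stand-in that treats the glottis as closed when the summed distances of both vocal folds from the midline reach zero (or a small tolerance) at least once, and criterion (2) is interpreted here as requiring the closure behavior of the model to match that seen in the HSV recording.

```python
def frequency_ok(f_model_hz: float, f_exp_hz: float, tolerance: float = 0.05) -> bool:
    """Criterion (1): relative fundamental frequency deviation of at most 5%."""
    return abs(f_model_hz - f_exp_hz) / f_exp_hz <= tolerance


def closes(traj_left, traj_right, tolerance: float = 0.0) -> bool:
    """Hypothetical closure test on distances from the glottal midline."""
    return min(left + right for left, right in zip(traj_left, traj_right)) <= tolerance


def amplitude_ok(model_amplitude: float, exp_min: float, exp_max: float) -> bool:
    """Criterion (3): model amplitude within the HSV amplitude variation."""
    return exp_min <= model_amplitude <= exp_max


def optimization_successful(f_model, f_exp, model_closes, hsv_closes,
                            model_amplitudes, exp_ranges) -> bool:
    """All three criteria must hold; exp_ranges holds (min, max) per vocal fold."""
    return (frequency_ok(f_model, f_exp)
            and model_closes == hsv_closes
            and all(amplitude_ok(a, lo, hi)
                    for a, (lo, hi) in zip(model_amplitudes, exp_ranges)))
```

With the example from Fig 4, a right model amplitude of 13 pixels passes `amplitude_ok(13.0, 12.5, 15.0)`, whereas 16 pixels would fail.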

For the optimization, three algorithms are run separately with each of the three cost functions Γ1, Γ2, Γ3: (1) the Nelder–Mead (NM) algorithm [25], (2) the Particle Swarm Optimization (PSO) algorithm [45] and (3) the Simulated Bee Colony (SBC) optimization [88]. This yielded altogether nine optimized parameter sets (Ql, Qr, Ps) approximating the recorded vocal fold trajectories TE. Since the cost functions Γ1, Γ2, Γ3 are computed in different ways and in different domains (frequency and time), their absolute values cannot be compared to judge which cost function actually performs better. Hence, the final and best parameter set (Ql, Qr, Ps) was determined as the one with the smallest normalized Euclidean distance Γ between the experimental trajectories TE and the simulated model trajectories TM: (4)
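Selecting the final parameter set among the nine candidates can be sketched as follows. Eq (4) is not reproduced here, so the normalization used in the sketch (the distance divided by the norm of the experimental trajectory, with left and right samples concatenated) is an assumption, as are the function names.

```python
import math


def normalized_distance(te, tm) -> float:
    """Assumed normalization: ||TE - TM|| / ||TE|| over concatenated left+right samples."""
    if len(te) != len(tm):
        raise ValueError("trajectories must have equal length")
    num = math.sqrt(sum((e - m) ** 2 for e, m in zip(te, tm)))
    den = math.sqrt(sum(e ** 2 for e in te))
    return num / den


def best_parameter_set(te, candidates):
    """Pick the candidate whose model trajectory is closest to the experiment.

    candidates maps (algorithm, cost function) -> (params, model trajectory),
    where params is a (Ql, Qr, Ps) tuple.
    """
    key = min(candidates, key=lambda k: normalized_distance(te, candidates[k][1]))
    return key, candidates[key][0]
```

Because every candidate is judged by the same time-domain distance, results obtained with frequency-domain and time-domain cost functions become comparable.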

Prior to the actual optimization, an initial value search for Ql, Qr and Ps was performed for each subject for the NM and PSO algorithms to reduce the search space and the computational time of the actual optimization process [25]. No initial search was performed for the stochastic SBC algorithm.
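For orientation, a textbook global-best Particle Swarm Optimization over the three parameters (Ql, Qr, Ps) might look like the sketch below. It is not the authors' C# implementation: the inertia and acceleration coefficients are common textbook choices, positions are simply clipped to the bounds, and the quadratic toy cost in the usage example is a hypothetical stand-in for Γ, which in practice requires simulating the 2MM for every candidate.

```python
import random


def pso_minimize(cost, bounds, n_particles=30, n_iter=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Global-best PSO; returns (best position, best cost)."""
    rng = random.Random(seed)  # seeded for reproducibility
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]     # personal best positions
    best_val = [cost(p) for p in pos]  # personal best costs
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm (global) best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to bounds
            val = cost(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


if __name__ == "__main__":
    target = (1.2, 0.9, 0.8)  # hypothetical "true" (Ql, Qr, Ps) of the toy problem

    def toy_cost(p):
        return sum((a - b) ** 2 for a, b in zip(p, target))

    bounds = [(0.5, 2.0), (0.5, 2.0), (0.2, 2.0)]  # illustrative search ranges
    best, value = pso_minimize(toy_cost, bounds)
    print([round(x, 3) for x in best], value)
```

Narrowing `bounds` around the result of an initial value search, as described above for NM and PSO, reduces the number of costly model evaluations needed.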

The entire optimization process was performed on an Intel® Core™ i5-4590 processor (3.30 GHz) using in-house software written in C#. The software included a GUI for easier handling and visual review of the results.

Parameter analysis

To judge the left–right asymmetry of the 2MM, and therefore of the vocal fold oscillations, a factor Qlr (≥ 1) adapted from [28] is used. The closer Qlr is to 1, the higher the dynamic left–right symmetry: (5)
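One plausible reading of Eq (5), assumed here because the equation itself is not reproduced, is the ratio of the larger to the smaller scaling factor. This quantity is at least 1, equals 1 for identical left and right scaling, and does not depend on which side is larger.

```python
def asymmetry_factor(q_left: float, q_right: float) -> float:
    """Assumed form of Qlr: larger Q divided by smaller Q (>= 1)."""
    if q_left <= 0 or q_right <= 0:
        raise ValueError("Q factors must be positive")
    return max(q_left, q_right) / min(q_left, q_right)
```

For example, Ql = 1.2 and Qr = 0.8 give Qlr = 1.5, the same value as Ql = 0.8 and Qr = 1.2.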

Pairwise group differences (young females vs. young males) in the computed parameters (Ql, Qr), Qlr, Ps and Γ were investigated statistically. Ql and Qr were pooled into one data set, since the absolute differences between the two groups are of interest. First, normality was tested with the Shapiro–Wilk test. None of the four parameters was normally distributed: (Ql, Qr) (df = 50, p < 0.001), Γ (df = 25, p = 0.005), Qlr (df = 25, p = 0.011) and Ps (df = 25, p = 0.001). Hence, Mann–Whitney U tests were applied for the four group comparisons; the significance level was set to 0.05, and no Bonferroni correction was applied.
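For readers who want to reproduce this type of comparison outside SPSS, the following pure-Python sketch computes a two-sided Mann–Whitney U test using the normal approximation with tie correction. It is not the SPSS procedure used in the study: SPSS may report exact p-values for small samples, and no continuity correction is applied here, so p-values can differ slightly. The function names and example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U, p)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    r = _ranks(pooled)
    u1 = sum(r[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical Ps values (cmH2O) for two small groups
group_a = [14.2, 15.8, 12.1, 19.5, 16.0]
group_b = [20.3, 18.9, 25.1, 22.4, 17.6, 21.0]
u, p = mann_whitney_u(group_a, group_b)
print(f"U = {u}, p = {p:.3f}")
```

The rank-based test only assumes that observations are independent, which is why it is a common choice when normality is rejected, as here.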

For comparing younger vs. elderly subjects and for gender-specific differences within the elderly groups, only descriptive statistics were applied, owing to the small number of elderly subjects. Hence, these observations are limited and are not backed by statistical testing. Statistical analysis was performed with IBM SPSS Statistics 21.

Results and discussion

Applicability of the 2MM optimization

Results of the 2MM optimization were deemed correct only when all three error criteria introduced above were met. Altogether, 12 HSV recordings (27.3%) could not be correctly optimized because they violated one or more of these criteria: (1) the fundamental frequency could not be matched (two times); (2) the amplitudes of the HSV trajectories and the optimized 2MM trajectories did not match (nine times); (3) the glottis closure or glottis closure insufficiency was not reproduced (five times). Fig 6 gives typical examples of failed optimizations: (A) For the young female, the Γ value was within the range of the correctly rated optimizations; however, glottis closure was not achieved. (B) For the young male, the Γ value was in the upper range of the correctly rated optimizations; however, the amplitudes did not match. (C) For the elderly female, the Γ value was higher than for the correctly rated optimizations, glottis closure was not achieved, and the amplitudes did not match. (D) For the elderly male, the Γ value was higher than for the correctly rated optimizations, glottis closure was not achieved, and the amplitudes did not match.

Fig 6. Examples for a young female (unmatched glottis closure and left amplitude), young male (unmatched amplitude), elderly female (unmatched glottis closure and amplitude) and elderly male (unmatched amplitudes and glottis closure), illustrating the extracted trajectories and the incorrectly optimized 2MM trajectories for the left and right vocal folds.

https://doi.org/10.1371/journal.pone.0187486.g006

Altogether, 72.7% of the HSV recordings were successfully optimized: 68.8% (11 of 16) of the young females, 82.4% (14 of 17) of the young males, 80.0% (4 of 5) of the elderly females, and 50.0% (3 of 6) of the elderly males. In the following, only these 32 successfully optimized HSV recordings are considered and discussed.

As can be seen in Table 2, each of the three optimization algorithms yielded the best result (i.e., the lowest Γ value) for some recordings. However, the NM (38%) and SBC (43%) algorithms yielded the best result more often than the PSO (19%) algorithm. Among the cost functions, Γ2 clearly performed best: it yielded the best result most often (94%), followed by Γ3 (6%), whereas Γ1 never yielded the best approximation. With regard to our first aim, the results suggest that all three optimization algorithms are suitable for the 2MM optimization, but that only Γ2 is a promising cost function.

Table 2. Overview of how often each optimization algorithm (Nelder–Mead, NM; Particle Swarm Optimization, PSO; Simulated Bee Colony, SBC) and each cost function Γ1, Γ2, and Γ3 yielded the best optimization result, i.e., the smallest Γ value (see Eq (4)).

https://doi.org/10.1371/journal.pone.0187486.t002

The low values of the objective function Γ confirm the applicability of the 2MM for all four groups (Table 3). A value of Γ = 0 would correspond to a perfect optimization without any discrepancy between the experimental and simulated curves. The highest mean Γ values were found for elderly males (0.63 ± 0.09), followed by elderly females (0.59 ± 0.18). The lowest (best) values were found for young males (0.45 ± 0.06), followed by young females (0.57 ± 0.20). The difference in Γ between young females and young males was not statistically significant (p = 0.434).

Table 3. Mean values, standard deviations and range (minimum–maximum) of Γ, the optimized parameters Ps [cmH2O], Ql, Qr and the symmetry quotient Qlr for the four subject groups are given.

https://doi.org/10.1371/journal.pone.0187486.t003

The values suggest that the varied mass and stiffness parameters of the 2MM may adapt slightly better to the two younger subject groups than to the elderly groups. Higher values of Γ were expected for the elderly groups, as elderly subjects have been reported to show lower laryngeal dynamic periodicity than younger adults [89,90]; i.e., their glottis, and therefore the extracted vocal fold trajectories, oscillate less periodically than in young adults. The 2MM does not allow the simulation of slight cycle-to-cycle changes in period length and amplitude (i.e., jitter and shimmer), which consequently yields higher Γ values for the elderly subjects.
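Jitter and shimmer are often quantified as local, cycle-to-cycle perturbation measures. The sketch below uses one common definition (mean absolute difference of consecutive cycle values divided by their mean); the study does not compute these measures, and the function name and period values are hypothetical. It illustrates the kind of variation that, according to the text above, the 2MM with fixed parameters does not reproduce: a strictly periodic output yields zero.

```python
from typing import Sequence


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive cycle values divided by
    the mean value. Applied to cycle durations this gives local jitter, applied
    to cycle peak amplitudes it gives local shimmer (both as fractions)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# A strictly periodic trajectory has no cycle-to-cycle variation
print(local_perturbation([5.0, 5.0, 5.0, 5.0]))  # 0.0
# Slightly irregular, hypothetical HSV cycle periods (ms) do
print(local_perturbation([5.0, 5.1, 4.9, 5.0]))
```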

The accuracy of the optimization with regard to the fundamental oscillation frequencies of the vocal folds is illustrated in Fig 7, where the frequencies of the experimental trajectories (fEl, fEr) are plotted against those of the optimized 2MM trajectories (fMl, fMr). The highest accuracy was achieved for young females, where the model and experimental frequencies matched for all subjects. For young males, the frequencies matched perfectly for eight subjects; for three subjects, the frequencies deviated for one vocal fold side (Δ = {6.7, 3.8, 3.8 Hz}), and for three subjects they deviated for both vocal folds (Δ = {3.8, 3.2, 2.6 Hz}). For elderly females, the frequencies matched for three subjects; for one subject, they deviated for both vocal folds (Δ = 10.5 Hz). For elderly males, the frequencies matched for two subjects; for one subject, they deviated for one vocal fold (Δ = 7.2 Hz).

Fig 7. Fundamental frequencies (fEl, fEr) of the experimental HSV-recorded trajectories versus the frequencies (fMl, fMr) of the optimized model trajectories, shown separately for the left and right vocal folds.

https://doi.org/10.1371/journal.pone.0187486.g007

In summary, 81% of the successfully optimized model trajectories matched the fundamental frequencies of the experimental trajectories. This value is similar to that reported previously [26], where 80% of the original fundamental frequencies were correctly reproduced. Fig 8 shows examples of correctly reproduced vocal fold trajectories.
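The 81% figure can be reproduced from the counts given for Fig 7 if it is read as a fraction of individual left and right trajectories (32 recordings × 2 sides = 64). This reading is our interpretation and is not stated explicitly in the text; a one-sided deviation counts as one mismatched trajectory and a two-sided deviation as two.

```python
# Counts from the paragraph on Fig 7, per group:
# (successful recordings, subjects with one-side deviation, with both-side deviation)
groups = {
    "young females": (11, 0, 0),
    "young males": (14, 3, 3),
    "elderly females": (4, 0, 1),
    "elderly males": (3, 1, 0),
}


def matched_fraction(groups):
    # Each recording contributes a left and a right trajectory
    total = sum(2 * n for n, _, _ in groups.values())
    mismatched = sum(one + 2 * both for _, one, both in groups.values())
    return (total - mismatched) / total, total, mismatched


fraction, total, mismatched = matched_fraction(groups)
print(f"{total - mismatched}/{total} = {100 * fraction:.0f}%")
```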

Fig 8. Examples for a young female, young male, elderly female and elderly male, illustrating the extracted trajectories and the correctly optimized 2MM trajectories for the left and right vocal folds.

The values of the cost function Γ are given.

https://doi.org/10.1371/journal.pone.0187486.g008

Optimized parameters

The computed values of Ps, Qlr, Ql and Qr and the observed group differences (gender and age) confirm our hypotheses as formulated in aims (2) and (3):

Table 3 gives an overview of the determined Ps, Ql, Qr and Qlr values. The symmetry quotient Qlr shows high symmetry for the two young healthy groups, confirming the previously performed medical diagnosis of normal voice production. Young females and males did not differ significantly in symmetry (p = 0.202), with the highest symmetry found for young males (Qlr = 1.07 ± 0.04). Young (Qlr = 1.12 ± 0.08) and elderly females (Qlr = 1.12 ± 0.13) exhibited equal symmetry. Deviations of dynamic left–right symmetry of up to 20% (i.e., Qlr ≈ 1.20) have been reported previously [42] and can still be considered physiologic. Further, slight physiologic and anatomic asymmetries have been reported in healthy young and elderly subjects [69,70]. However, the older the subjects, the more prominent and larger vocal fold dynamic asymmetries might become [91]. This was reflected only in the elderly male group (Qlr = 1.28 ± 0.20), which showed increased asymmetry (i.e., higher Qlr values). The elderly female group was much more symmetric than the elderly male group. These findings confirm our hypothesis of higher kinematic asymmetries in elderly subjects for men but not for women.

Young males showed the lowest subglottal pressure Ps, with a mean value of 16.49 ± 7.13 cmH2O, and also the lowest fundamental frequency (147 ± 38 Hz). In contrast, elderly males showed clearly higher subglottal pressure (22.61 ± 6.50 cmH2O). Their fundamental frequency was also increased (182 ± 22 Hz), confirming earlier studies [62]. For the elderly females, the fundamental frequencies (380 ± 117 Hz) were likewise increased compared with young females (328 ± 40 Hz), contradicting previous observations [92]. The subglottal pressures were also higher for the elderly (28.30 ± 12.17 cmH2O) than for the younger female group (21.12 ± 7.16 cmH2O).

Both elderly groups showed higher subglottal pressures and higher fundamental frequencies than the corresponding younger groups. The subglottal pressure of each male group was lower than that of the corresponding female group. The comparison of the young gender groups revealed a statistically significant difference (p = 0.021): young males showed lower Ps than young females. This contrasts with previous studies, in which males and females showed similar values (Table 4). Overall, the computed subglottal pressures (10.10–45.70 cmH2O) were much higher than previously reported in-vivo ranges (normal phonation: 3.5–12.8 cmH2O; loud phonation: 5.9–27.7 cmH2O), Table 4. However, the computed Ps values are in the same range as in previous studies (11.6 cmH2O ≤ Ps ≤ 46.3 cmH2O) that optimized the 2MM towards human in-vivo [26] and a 3DM towards human ex-vivo [44] vocal fold dynamics. In [44], the computed Ps values closely approximated the applied and measured Ps values, indicating that the values presented here may not be entirely unrealistic. High Ps values were also reported for human ex-vivo larynx experiments (up to 44.0 cmH2O in [96] and up to 35 cmH2O in [97]). Nevertheless, the Ps values computed in our study most likely overestimate the actual Ps values, although they remain within previously reported ranges.

Table 4. Overview of subglottal pressure values (cmH2O) reported in the literature for healthy subjects during normal and loud phonation.

https://doi.org/10.1371/journal.pone.0187486.t004

Ql and Qr are investigated with regard to their absolute values. Clear differences between the four groups were apparent. Young males showed the lowest values of Ql and Qr (0.76–1.99), followed by elderly males (1.15–2.36); these values are in the same range as reported previously [26]. Elderly females had the highest values (1.56–4.87), followed by young females (1.93–3.32). Physiologically, this means that younger men and women have larger oscillating masses with lower stiffness than their older comparison groups, and that men have larger masses and lower stiffness than the corresponding female groups. Note that the relation “increasing mass–decreasing stiffness” is induced by the modeling parameter Qα, as can be seen in Eq (2) [28]. Nevertheless, it is generally understood that the vibrating portion of the vocal fold mass usually becomes smaller when vocal fold tension is increased [81, 98]. For young females vs. young males, the difference in (Ql, Qr) was statistically significant (p < 0.001).
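The coupling of mass and stiffness through Qα can be made explicit. Assuming the scaling of [28] in which the masses of side α are divided by Qα and the stiffnesses are multiplied by Qα, the natural frequency of each uncoupled mass–spring element, f = sqrt(k/m)/(2π), grows linearly with Qα. The function name and the mass and stiffness values below are illustrative only, and the full 2MM frequency additionally depends on coupling, collision and aerodynamic forces.

```python
import math


def natural_frequency_hz(mass_kg: float, stiffness_n_per_m: float, q: float) -> float:
    """Natural frequency of one uncoupled mass-spring element after the
    assumed tension scaling m -> m / Q and k -> k * Q."""
    m_scaled = mass_kg / q
    k_scaled = stiffness_n_per_m * q
    return math.sqrt(k_scaled / m_scaled) / (2 * math.pi)


# Illustrative values only (not the parameters used in this study)
M1 = 0.125e-3  # kg
K1 = 80.0      # N/m
for q in (1.0, 1.5, 2.0):
    # sqrt(k*Q / (m/Q)) = Q * sqrt(k/m): frequency scales linearly with Q
    print(q, round(natural_frequency_hz(M1, K1, q), 1))
```

This linear scaling is one way to see why the oscillation frequency and the optimized Q values are so strongly related in this model.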

Regarding stiffness, the observed gender differences are consistent with an earlier study that used the amplitude quotient (AQ) as an indirect measure of the viscoelastic stiffness of the vocal folds [15]. The amplitude quotient is determined by the shape and amplitude of the glottal area waveform. That study reported smaller absolute AQ values in young females than in young males, indicative of increased stiffness in young females. However, the amplitude quotient is not an explicit measure of elasticity. Further, larger absolute values of the maximum area declination rate were reported in young females than in young males [15]. This indicates a larger absolute peak velocity during the closing phase in young females, hinting at increased stiffness, which is again consistent with the larger Ql, Qr values computed for young females (Fig 9A).
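The amplitude quotient is commonly computed as the peak-to-peak amplitude of the glottal waveform divided by its maximum declination rate (the magnitude of the most negative first derivative), so a larger declination rate at equal amplitude yields a smaller AQ. The sketch below assumes this definition, uses forward differences for the derivative, and checks itself on a synthetic raised-cosine waveform, for which AQ = 1/(π f0) analytically. It is not the processing pipeline of [15], and the function name is hypothetical.

```python
import math
from typing import Sequence


def amplitude_quotient(area: Sequence[float], fs_hz: float) -> float:
    """AQ = peak-to-peak amplitude of the glottal area waveform divided by the
    maximum area declination rate. Returns a value in seconds."""
    dt = 1.0 / fs_hz
    # Forward differences approximate the first derivative
    deriv = [(b - a) / dt for a, b in zip(area[:-1], area[1:])]
    madr = -min(deriv)  # maximum area declination rate (closing phase)
    if madr <= 0:
        raise ValueError("waveform has no closing (declining) phase")
    return (max(area) - min(area)) / madr


# Synthetic check: one period of a raised cosine at f0 = 100 Hz, fs = 100 kHz
f0 = 100.0
fs = 100_000.0
area = [0.5 * (1 - math.cos(2 * math.pi * f0 * n / fs)) for n in range(1001)]
aq = amplitude_quotient(area, fs)
print(aq, 1 / (math.pi * f0))
```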

Fig 9.

Scatterplots of the four groups relating (A) Ql vs. Qr, (B) the fundamental frequency f0 vs. Ps, (C) f0 vs. (Ql, Qr), and (D) (Ql, Qr) vs. Ps.

https://doi.org/10.1371/journal.pone.0187486.g009

Relationships between parameters

Notably, the lower computed Ps values (10–20 cmH2O, i.e., 28% of the entire Ps range) account for 97% of all fundamental frequencies (100–500 Hz), Fig 9B. This suggests that subglottal pressure might play a minor role in frequency changes, as observed before [99]. Hsiao et al. (2001) showed that the relationship between fundamental frequency and subglottal pressure depends on laryngeal tension [99]. For our results, this means that lower tension or stiffness (small Qα), as computed for young and elderly males, is accompanied by lower fundamental frequencies than in young and elderly females, as confirmed in Fig 9C, whereas the Ps values were only slightly reduced (see means in Table 3) and lay almost in the same range (Fig 9B). Conversely, the higher tension computed for both female groups is accompanied by higher fundamental frequencies (Fig 9C) at only slightly increased Ps values. The strong dependency between f0 and stiffness is also reflected in a high Pearson correlation coefficient of 0.986 (p < 0.001). This relationship has also been observed when different loudness levels (soft, normal, loud, i.e., increasing stiffness) were analyzed for a male and a female group [93]. However, that study reported slightly lower Ps values for women than for men. In summary, the Ps values computed in our study (Table 3) and those presented in [93] overlap across subject groups and tasks, showing a high inter-individual variability of Ps.
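The Pearson coefficient quoted above is the standard product-moment correlation. A minimal pure-Python sketch follows; the function name and the f0 and Q values are hypothetical and serve only as usage, and the p-value (which requires the t distribution) is omitted.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (Q, f0 in Hz) pairs for illustration only
q_values = [1.0, 1.4, 2.1, 2.8, 3.5]
f0_values = [120.0, 165.0, 240.0, 330.0, 405.0]
print(round(pearson_r(q_values, f0_values), 3))
```

A coefficient close to 1 indicates a nearly linear relation between the two quantities; on its own it says nothing about the underlying mechanism.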

Fig 9D shows the relationship of absolute stiffness and vibrating masses (Ql, Qr) to the subglottal pressure Ps. Young males are clearly separated from young and elderly females, whereas elderly males slightly overlap with both female groups. Further, Fig 9D shows that the values of both young groups are more concentrated, whereas those of both elderly groups are more spread out and appear less consistent.

Study limitations and outlook

This study has clear limitations due to the sample size. Statistical tests of the optimized 2MM parameters were only performed for young men vs. young women. For age-related differences and for elderly men vs. elderly women, no statistical tests were performed, and only untested trends were described. This lack of statistical testing is clearly a major limitation. Also, only healthy young subjects and atrophic elderly subjects were considered. Nevertheless, the study yielded clear trends and initial group data for healthy young and atrophic elderly subjects. Future studies should also investigate how the study parameters vary in elderly and young adults during modified phonation (e.g., pitch raise [40]).

Model and optimization limitations.

The applied 2MM simulates vocal fold oscillations only in the medio-lateral direction and not in the vertical direction, which has previously been captured with a three-dimensional model [44]. Also, vibrational characteristics and changes along the vocal fold length (anterior–posterior) cannot be captured by the 2MM, since only the trajectories at the mid-membranous position (50% of the vocal fold length) are simulated. Hence, anterior–posterior phase differences [100] and the posterior gaps typical of female phonation [69] are not captured. To analyze these characteristics, the 6-Mass-Model should be applied and optimized [41]. However, investigating such phenomena was not the focus of this study and will be addressed in our future work. Further, the trajectories were always extracted at the standardized position of 50% vocal fold length, where the largest amplitudes are assumed. However, the position of the largest amplitudes varies around this point (from posterior to anterior: females 41.1% ± 10.8%, males 46.5% ± 18.0%), as reported in [16]. Hence, future studies should investigate how this assumption, and the potential discrepancy from the individual position of largest amplitude, affects the optimization results.

The computed Ps values seem to be overestimated by the 2MM, since they are much higher than assumed and reported for in-vivo measurements (see Table 4), although such high Ps values have been reported in ex-vivo studies. This issue has to be clarified in future work. In this context, note that the goal of investigating and optimizing LMMs towards vocal fold dynamics is not to directly transfer the computed quantities of masses, stiffness and subglottal pressure, but to uncover underlying biomechanical differences between vocal fold dynamics [26,28,81,101].

No subgradient-based algorithms were applied for the optimization [102]. Applying such algorithms may increase the number of successful optimizations.

The success of the optimization procedure was assessed by three objective criteria. However, an explicit objective measure of how well the shape of the trajectories was reproduced remains a question for future studies.

Masses and stiffness parameters were varied dependently, as originally suggested in [28]. However, an independent variation and optimization of masses and stiffness within the 2MM should be considered, since otherwise an increase in mass is always accompanied by a reduction in stiffness. This dependency might reduce the applicability to certain vocal fold oscillations and might not reflect certain biomechanical constellations within the vocal folds. Time-dependent parameters should also be taken into account, since the 2MM with time-independent parameters cannot fully simulate inter-cycle changes such as those seen in Fig 8D, i.e., vocal folds that show closure for a few cycles and then do not.

For future classification purposes (e.g., normal vs. pathological voices), it might be interesting to vary additional biomechanical parameters such as collision and contact forces [103], frequency-dependent stiffness [104] and glottal flow [105].

Finally, to enable the use of LMMs in daily clinical routine, the computational time has to be reduced. Currently, the optimization, including the initial value search, takes approximately 60 minutes per HSV recording on a desktop computer.

HSV imaging limitations.

HSV imaging projects the three-dimensional vocal fold vibrations and surfaces onto two-dimensional images and movies. The image processing detects the dark region between the two vocal folds as the glottis. The positions of the most medial edges of the vocal fold tissue obtained from image processing are taken as experimental trajectories (T). The 2MM trajectories (T) are built from the positions of the upper or lower mass (mα), whichever is more medial, i.e., visible from above. Since the exact vertical positions of the trajectories within the HSV images cannot be determined, it is unclear whether the vertical position (upper or lower mass) of the model trajectories actually corresponds to the same vertical region (superior or inferior vocal fold edge) as the extracted trajectories. Applying HSV in combination with a laser projection unit that allows the reconstruction of the three-dimensional positions of the entire visible vocal fold surface would solve this shortcoming [80,106]. The use of a laser projection unit with HSV would further allow the extraction of the vertical trajectory components of the vocal folds [96]; however, a more complex three-dimensional LMM would then have to be applied for optimization, e.g., [34].
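The construction of the model trajectory from the two masses can be sketched as follows. The function name, the sign convention (positions measured from the glottal midline, positive displacement = lateral) and the clipping of negative values to zero during collision are assumptions of this illustration, not details taken from the study's software.

```python
from typing import List, Sequence


def visible_model_trajectory(x0_lower: float, x0_upper: float,
                             x_lower: Sequence[float],
                             x_upper: Sequence[float]) -> List[float]:
    """Distance of the more medial 2MM mass from the glottal midline.

    x0_* are rest positions and x_* displacements (positive = lateral) of the
    lower and upper mass of one vocal fold. The smaller distance is taken as
    the edge seen from above; negative values (a mass crossing the midline
    during collision) are clipped to zero, i.e., a closed glottis.
    """
    return [max(0.0, min(x0_lower + a, x0_upper + b))
            for a, b in zip(x_lower, x_upper)]


# Hypothetical displacements (cm) over three time steps
print(visible_model_trajectory(0.1, 0.1, [0.0, 0.05, -0.15], [0.02, 0.0, -0.05]))
```

Because the visible edge can switch between the lower and the upper mass within a cycle, the resulting trajectory does not correspond to a fixed vertical level of the vocal fold, which is the ambiguity described above.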

Owing to the lack of metric calibration in the HSV recordings, average vocal fold lengths were used for males and females, and the individual vocal fold lengths were not taken into account. Because of the absence of metric units, the recorded vocal fold trajectories are initially scaled in pixels and then converted to metric units using the averaged vocal fold length. Hence, the amplitudes may not accurately match the actual oscillation quantities. This shortcoming will also be addressed in future studies by using HSV in combination with a laser projection unit [106], which allows the extraction of metric vocal fold trajectories and the use of individual vocal fold lengths in the optimization procedure.
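The pixel-to-millimetre conversion described above amounts to a single scale factor. In the sketch below, the function name, the imaged vocal fold length in pixels and the 16.0 mm average length are hypothetical placeholders, since the averages used in the study are not given in this section.

```python
from typing import List, Sequence


def pixels_to_mm(trajectory_px: Sequence[float], fold_length_px: float,
                 mean_fold_length_mm: float) -> List[float]:
    """Scale a trajectory from pixels to millimetres, assuming the imaged vocal
    fold (fold_length_px long in the image) has the group-average length."""
    if fold_length_px <= 0:
        raise ValueError("fold length in pixels must be positive")
    mm_per_px = mean_fold_length_mm / fold_length_px
    return [v * mm_per_px for v in trajectory_px]


# Hypothetical: fold imaged as 200 px long, assumed average length 16.0 mm
print(pixels_to_mm([0.0, 10.0, 20.0], fold_length_px=200.0, mean_fold_length_mm=16.0))
```

Because the factor is linear, a 10% error in the assumed vocal fold length propagates directly into a 10% error in the metric amplitudes.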

Conclusion

This study is the first approach to use an LMM for comparing age- and gender-related differences based on vocal fold dynamics recorded with endoscopic high-speed imaging. The parameter optimization objectively quantified biomechanical differences in dynamic symmetry, subglottal pressure, vocal fold masses and stiffness across gender and age. The results are promising for quantifying vocal fold dynamics and suggest potential for differentiating normal from disordered voices as well as for differentiating between vocal fold pathologies. However, the 2MM parameters do not correspond one-to-one to the actual values of vocal fold masses, stiffness, and subglottal pressure, but they allow an objective evaluation of the biomechanical interrelationships between these variables.

Three different optimization algorithms were tested with three different cost functions. For future studies, the results do not favor a specific optimization algorithm but clearly show that the Euclidean distance of the trajectories (Γ2) should be chosen as the cost function to achieve the best results.

Supporting information

S1 Data. Individual successful and failed optimization results.

https://doi.org/10.1371/journal.pone.0187486.s001

(XLSX)

References

  1. 1. Chhetri DK, Neubauer J (2015) Differential roles for the thyroarytenoid and lateral cricoarytenoid muscles in phonation. Laryngoscope 125(12): 2772–2777. pmid:26198167
  2. 2. Inwald EC, Döllinger M, Schuster M, Eysholdt U, Bohr C (2011) Multiparametric analysis of vocal fold vibrations in healthy and disordered voices in high-speed imaging. J Voice 25(5): 576–590. pmid:20728308
  3. 3. Patel R, Dixon A, Richmond AM, Donohue KD (2012) Pediatric high speed digital imaging of vocal fold vibration: A normative pilot study of glottal closure characteristics. Int J Pediatr Otorhinolaryngol 76(7): 954–959. pmid:22445799
  4. 4. Döllinger M (2009) The next step in voice assessment: High-speed digital endoscopy and objective evaluation. Curr Bioinform 4(2):101–111.
  5. 5. Petermann S, Kniesburges S, Ziethe A, Schützenberger A, Döllinger M (2016) Evaluation of Analytical Modeling Functions for the Phonation Onset Process. Comput Math Methods Med 2016:8469139. pmid:27066108
  6. 6. Echternach M, Döllinger M, Sundberg J, Traser L, Richter B (2013) 2013 Vocal fold vibrations at high soprano fundamental frequencies. J Acoust Soc Am 133(2): EL82–87. pmid:23363198
  7. 7. Warhurst S, McCabe P, Heard R, Yiu E, Wang G, Madil C (2014) Quantitative measurement of vocal fold vibration in male radio performers and healthy controls using high-speed videoendoscopy. PLoS One 9(6): e101128. pmid:24971625
  8. 8. Patel RR, Dubrovskiy D, Döllinger M (2014) Characterizing vibratory kinematics in children and adults with high-speed digital imaging. J speech Lang Hear Res 57(2): 674–686. pmid:24686982
  9. 9. Gloger O, Lehnert B, Schrade A, Völzke H (2015) Fully automated glottis segmentation in endoscopic videos using local and shape features of glottal regions. IEEE Trans Biomed Eng 62(3): 795–806. pmid:25350912
  10. 10. Andrade-Miranda G, Godino-Llorente JI, Moro-Velazquez L, Gomez-Garcia JA (2015) An automatic method to detect and track the glottal gap from high speed videoendoscopic images. Biomed Eng Online 14: 100. pmid:26510707
  11. 11. Lohscheller J, Toy H, Rosanowski F, Eysholdt U, Döllinger M (2007) Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Med Image Anal 11(4): 400–413. pmid:17544839
  12. 12. Patel R, Unnikrishnan H, Donohue KD (2016) Effects of vocal nodules on glottal cycle measurements derived from high-speed videoendoscopy in children. PLoS One 11(4): e0154586. pmid:27124157
  13. 13. Yokonishi H, Imagawa H, Sakakibara K, Yamauchi A, Nito T, Yamasoba T, et al (2016) Relationship of Various Open Quotients With Acoustic Property, Phonation Types, Fundamental Frequency, and Intensity. J Voice 30(2): 145–157. pmid:25953586
  14. 14. Unger J, Schuster M, Hecker DJ, Schick B, Lohscheller J (2016) A generalized procedure for analyzing sustained and dynamic vocal fold vibrations from laryngeal high-speed videos using phonovibrograms. Artif Intell Med 66: 15–28. pmid:26597002
  15. 15. Patel RR, Dubrovskiy D, Döllinger M (2014) 2014 Measurement of glottal cycle characteristics between children and adults: physiological variations. J Voice 28(4): 476–486. pmid:24629646
  16. 16. Lohscheller J, Svec JG, Döllinger M (2013) Vocal fold vibration amplitude, open quotient, speed quotient and their variability along glottal length: kymographic data from normal subjects. Logoped Phoniatr Vocol 38(4): 182–192. pmid:23173880
  17. 17. Döllinger M, Dubrovskiy D, Patel RR (2012) Spatiotemporal analysis of vocal fold vibrations between children and adults. Laryngoscope 122(11):2511–2518. pmid:22965771
  18. 18. Ziethe A, Patel RR, Kunduk M, Eysholdt U, Graf S (2011) Clinical analysis methods of voice disorders. Curr Bioinform 6(3): 270–285.
  19. 19. Patel RR, Pickering J, Stemple J, Donohue KD (2012) A Case Report in Changes in Phonatory Physiology Following Voice Therapy: Application of High-Speed Imaging. J Voice 26(6):734–741. pmid:22717492
  20. 20. Patel RR, Liu L, Galatsanos N, Bless DM (2011) Differential vibratory characteristics of adductor spasmodic dysphonia and muscle tension dysphonia on high-speed digital imaging. Ann Otol Rhinol Laryngol 120(1): 21–32. pmid:21370677
  21. 21. Deguchi S (2011) Mechanism of and threshold biomechanical conditions for falsetto voice onset. PLoS One 6(3): e17503. pmid:21408178
  22. 22. Horácek J, Lakkanen AM, Sidlof P, Murphy P, Svec JG (2009) Comparison of acceleration and impact stress as possible loading factors in phonation: a computer modeling study. Folia Phoniatr Logop 2009: 61(3):137–145. pmid:19571548
  23. 23. Unger J, Lohscheller J, Reiter M, Eder K, Betz CS, Schuster M (2015) A noninvasive procedure for early-stage discrimination of malignant and precancerous vocal fold lesions based on laryngeal dynamics analysis. Cancer Res 75(1): 31–39. pmid:25371410
  24. 24. Miri AK (2014) Mechanical characterization of vocal fold tissue: a review study. J Voice 28(6): 657–667. pmid:25008382
  25. 25. Döllinger M, Hoppe U, Lohscheller J, Hettlich F, Schuberth S, Eysholdt U (2002) Vibration Parameter Extraction From Endoscopic Image Series of the Vocal Folds. IEEE Trans Biomed Eng 49(8): 773–781. pmid:12148815
  26. 26. Döllinger M, Braunschweig T, Lohscheller J, Eysholdt U, Hoppe U (2003) Normal voice production: computation of driving parameters from endoscopic digital high speed images. Methods Inf Med 42(3): 271–276. pmid:12874661
  27. 27. Ishizaka K, Isshiki N (1976) Computer simulation of pathological vocal-cord vibration. J Acoust Soc Am 60(5): 1193–1198. pmid:977846
  28. 28. Steinecke I, Herzel H (1995) Bifurcations in an asymmetric vocal-fold model. J Acoust Soc Am 97(3): 1874–1884. pmid:7699169
  29. 29. Zhang Y, Regner MF, Jiang JJ (2011) Theoretical Modeling and Experimental High-Speed Imaging of Elongated Vocal Folds. IEEE Trans Biomed Eng 58(10): 2725–2731. pmid:21118763
  30. 30. Robertson D, Zanartu M, Cook D (2016) Comprehensive, Population-Based Sensitivity Analysis of a Two-Mass Vocal Fold Model. PLoS One 11(2): e0148309. pmid:26845452
  31. 31. Alipour F, Brücker C, Cook DD, Gömmel A, Kaltenbacher M, Mattheus W, et al (2011) Mathematical Models and Numerical Schemes for the Simulation of Human Phonation. Curr Bioinform 6(3): 323–343.
  32. 32. Schwarz R (2007) Model-based quantification of pathological voice production. PhD thesis. FAU-Erlangen-Nürnberg, Technical Faculty, Shaker Verlag.
  33. 33. Döllinger M, Berry DA (2006) Computation of the three-dimensional medial surface dynamics of the vocal folds. J Biomech 39(2): 369–374. pmid:16321641
  34. 34. Yang A, Lohscheller J, Berry DA, Becker S, Eysholdt U, Voigt D, et al (2010) Biomechanical modeling of the three-dimensional aspects of human vocal fold dynamics. J Acoust Soc Am 127(2): 1014–1031. pmid:20136223
  35. 35. Mergell P, Herzel H, Titze IR (2000) Irregular vocal-fold vibratio—high-speed observation and modeling. J Acoust Soc Am 108(6): 2996–3002. pmid:11144591
  36. 36. Schwarz R, Hoppe U, Schuster M, Wurzbacher T, Eysholdt U, Lohscheller J (2006) Classification of unilateral vocal fold paralysis by endoscopic digital high-speed recordings and inversion of a biomechanical model. IEEE Trans Biomed Eng 53(6): 1099–1108. pmid:16761837
  37. 37. Tao C, Zhang Y, Jiang JJ (2007) Extracting physiologically relevant parameters of vocal folds from high-speed video image series. IEEE Trans Biomed Eng 54(5):794–801. pmid:17518275
  38. 38. Pinheiro AP, Stewart DE, Maciel CD, Pereira JC, Oliveira S (2012) Analysis of nonlinear dynamics of vocal folds using high-speed video observation and biomechanical modeling. Digital Signal Processing 22(2): 304–313.
  39. 39. Haldwin PJ, Galindo GE, Daun KJ, Zanartu M, Erath BD, Cataldo E, et al (2016) Non-stationary Bayesian estimation of parameters from a body cover model of the vocal folds. J Acoust Soc Am 139(5): 2683. pmid:27250162
  40. Wurzbacher T, Schwarz R, Döllinger M, Hoppe U, Eysholdt U, Lohscheller J (2006) Model-based classification of nonstationary vocal fold vibrations. J Acoust Soc Am 120(2): 1012–1027. pmid:16938988
  41. Schwarz R, Döllinger M, Wurzbacher T, Eysholdt U, Lohscheller J (2008) Spatio-temporal quantification of vocal fold vibrations using high-speed videoendoscopy and a biomechanical model. J Acoust Soc Am 123(5): 2717–2732. pmid:18529190
  42. Wurzbacher T, Döllinger M, Schwarz R, Hoppe U, Eysholdt U, Lohscheller J (2008) Spatio-temporal classification of vocal fold dynamics by a multi mass model comprising time-dependent parameters. J Acoust Soc Am 123(4): 2324–2334. pmid:18397036
  43. Hüttner B, Luegmair G, Patel RR, Ziethe A, Eysholdt U, Bohr C, et al (2015) Development of a time-dependent numerical model for the assessment of non-stationary pharyngoesophageal tissue vibrations after total laryngectomy. Biomech Model Mechanobiol 14(1): 169–184. pmid:24861998
  44. Yang A, Berry DA, Kaltenbacher M, Döllinger M (2012) Three-dimensional biomechanical properties of human vocal folds: Parameter optimization of a numerical model to match in vitro dynamics. J Acoust Soc Am 131(2): 1378–1390. pmid:22352511
  45. Yang A, Stingl M, Berry DA, Lohscheller J, Voigt D, Eysholdt U, et al (2011) Computation of physiological human vocal fold parameters by mathematical optimization of a biomechanical model. J Acoust Soc Am 130(2): 948–964. pmid:21877808
  46. Erath BD, Zanartu M, Stewart KC, Plesniak MW, Sommer DE, Peterson SD (2013) A review of lumped-element models of voiced speech. Speech Commun 55(5): 667–690.
  47. Cveticanin L (2012) Review on mathematical and mechanical models of the vocal cord. J Appl Math Article ID 928591. http://dx.doi.org/10.1155/2012/928591
  48. Döllinger M, Kniesburges S, Kaltenbacher M, Echternach M (2016) Current methods for modelling voice production. HNO 64(2): 82–90. pmid:26746639
  49. Döllinger M, Kaltenbacher M (2016) Preface: Recent Advances in Understanding the Human Phonatory Process. Acta Acustica united with Acustica 102(2): 195–208. http://dx.doi.org/10.3813/AAA.918936
  50. Mittal R, Erath BD, Plesniak MW (2013) Fluid dynamics of human phonation and speech. Annual Review of Fluid Mechanics 45: 437–467.
  51. Döllinger M, Berry DA, Hüttner B, Bohr C (2011) Assessment of local vocal fold deformation characteristics in an in vitro static tensile test. J Acoust Soc Am 130(2): 977–985. pmid:21877810
  52. Hsiao TY, Wang CL, Chen CN, Hsieh FJ, Shau YW (2002) Elasticity of human vocal folds measured in vivo using color Doppler imaging. Ultrasound Med Biol 28(9): 1145–1152. pmid:12401384
  53. Awan SN, Novaleski CK, Yingling JR (2013) Test-retest reliability for aerodynamic measures of voice. J Voice 27(6): 674–684. pmid:24119644
  54. Sulter AM, Wit HP (1996) Glottal volume velocity waveform characteristics in subjects with and without vocal training, related to gender, sound intensity, fundamental frequency, and age. J Acoust Soc Am 100(5): 3360–3373. pmid:8914317
  55. Perkell JS, Hillman RE, Holmberg EB (1994) Group differences in measures of voice production and revised values of maximum airflow declination rate. J Acoust Soc Am 96(2): 695–698. pmid:7930069
  56. Titze IR. Principles of Voice Production. University of Michigan, Prentice Hall; 1994.
  57. Forero Mendoza LA, Cataldo E, Vellasco MM, Silva MA, Apolinário JA (2014) Classification of vocal aging using parameters extracted from the glottal signal. J Voice 28(5): 532–537. pmid:24880675
  58. Awd Allah RS, Dkhil MA, Farhoud E (2009) Fibroblasts in the human vocal fold mucosa: an ultrastructural study of different age groups. Singapore Med J 50(2): 201–207. pmid:19296037
  59. Ishii K, Zhai WG, Akita M, Hirose H (1996) Ultrastructure of the lamina propria of the human vocal fold. Acta Otolaryngol 116(5): 778–782. pmid:8908260
  60. Hirano M, Sato K, Nakashima T (2000) Fibroblasts in geriatric vocal fold mucosa. Acta Otolaryngol 120(2): 336–340. pmid:11603802
  61. Sato K, Hirano M, Nakashima T (2002) Age-related changes of collagenous fibers in the human vocal fold mucosa. Ann Otol Rhinol Laryngol 111(1): 15–20. pmid:11800365
  62. Pontes P, Brasolotto A, Behlau M (2005) Glottic characteristics and voice complaint in the elderly. J Voice 19(1): 89–94. pmid:15766853
  63. Ximenes Filho JA, Tsuji DH, do Nascimento PH, Sennes LU (2003) Histologic changes in human vocal folds correlated with aging: a histomorphometric study. Ann Otol Rhinol Laryngol 112(10): 894–898. pmid:14587982
  64. Hirano M, Kurita S, Sakaguchi S (1989) Ageing of the vibratory tissue of human vocal folds. Acta Otolaryngol 107(5–6): 428–433. pmid:2756834
  65. Higgins MB, Saxman JH (1991) A comparison of selected phonatory behaviors of healthy aged and young adults. J Speech Hear Res 34(5): 1000–1010. pmid:1749230
  66. Ramig LO, Gray S, Baker K, Corbin-Lewis K, Buder E, Luschei E, Coon H, et al (2001) The aging voice: a review, treatment data and familial and genetic perspectives. Folia Phoniatr Logop 53(5): 252–265. pmid:11464067
  67. Goozee JV, Murdoch BE, Theodoros DG, Thompson EC (1998) The effects of age and gender on laryngeal aerodynamics. Int J Lang Commun Disord 33(2): 221–238. pmid:9709440
  68. Roy N, Kim J, Courey M, Cohen SM (2016) Voice disorders in the elderly: A national database study. Laryngoscope 126(2): 421–428. pmid:26280350
  69. Yamauchi A, Imagawa H, Yokonishi H, Nito T, Yamasoba T, Goto T, et al (2012) Evaluation of vocal fold vibration with an assessment form for high-speed digital imaging: comparative study between healthy young and elderly subjects. J Voice 26(6): 742–750. pmid:22521532
  70. Pontes P, Yamasaki R, Behlau M (2006) Morphological and functional aspects of the senile larynx. Folia Phoniatr Logop 58(3): 151–158. pmid:16636563
  71. Dubrovskiy D (2017) Bildverarbeitung bei endoskopischen Hochgeschwindigkeitsaufnahmen der Stimmlippenbewegungen [Image processing of endoscopic high-speed recordings of vocal fold movements]. PhD thesis, Faculty of Engineering at FAU Erlangen-Nürnberg, Germany.
  72. Chen J (2014) Vocal fold analysis from high speed videoendoscopic data. PhD thesis, Department of Electrical & Computer Engineering, LSU Baton Rouge, LA, USA.
  73. Bohr C, Kräck A, Dubrovskiy D, Eysholdt U, Svec J, Psychogios G, et al (2014) Spatiotemporal analysis of high-speed videolaryngoscopic imaging of organic pathologies in males. J Speech Lang Hear Res 57(4): 1148–1161. pmid:24686496
  74. Chen G, Kreiman J, Gerratt BR, Neubauer J, Shue YL, Alwan A (2013) Development of a glottal area index that integrates glottal gap size and open quotient. J Acoust Soc Am 133(3): 1656–1666. pmid:23464035
  75. Dippold S, Voigt D, Richter B, Echternach M (2015) High-Speed Imaging Analysis of Register Transitions in Classically and Jazz-Trained Male Voices. Folia Phoniatr Logop 67(1): 21–28. pmid:25967736
  76. Bössenecker A, Berry DA, Lohscheller J, Eysholdt U, Döllinger M (2007) Mucosal wave properties of a human vocal fold. Acta Acust United Acust 93(5): 815–823.
  77. Luegmair G, Mehta DD, Kobler JB, Döllinger M (2015) Three-dimensional optical reconstruction of vocal fold kinematics using high-speed videomicroscopy with a laser projection system. IEEE Trans Med Imaging 34(12): 2572–2582. pmid:26087485
  78. Luegmair G, Kniesburges S, Zimmermann M, Sutor A, Eysholdt U, Döllinger M (2010) Optical reconstruction of high-speed surface dynamics in an uncontrollable environment. IEEE Trans Med Imaging 29(12): 1979–1991. pmid:21118756
  79. Tokuda IT, Iwakawa M, Sakakibara KI, Imagawa H, Nito T, Yamasoba T, et al (2013) Reconstructing three-dimensional vocal fold movement via stereo matching. Acoust Sci & Tech 34(5): 374–377.
  80. Semmler M, Kniesburges S, Birk V, Ziethe A, Patel RR, Döllinger M (2016) 3D Reconstruction of Human Laryngeal Dynamics Based on Endoscopic High-Speed Recordings. IEEE Trans Med Imaging 35(7): 1615–1624. pmid:26829782
  81. Ishizaka K, Flanagan JL (1972) Synthesis of voiced sounds from a two-mass model of the vocal cords. Bell Syst Tech J 51: 1233–1268.
  82. Su MC, Yeh TH, Tan CT, Lin CD, Linne OC, Lee SY (2002) Measurement of adult vocal fold length. J Laryngol Otol 116(6): 447–449. pmid:12385357
  83. Titze IR. The Myoelastic Aerodynamic Theory of Phonation. National Center for Voice and Speech; 2006.
  84. Hertegard S, Hakansson A, Thorstensen Ö (1993) Vocal fold measurements with computer tomography. Logoped Phoniatr Vocol 18(2–3): 57–63.
  85. Patel RR, Donohue KD, Johnson WC, Archer SM (2011) Laser projection imaging for measurement of pediatric voice. Laryngoscope 121(11): 2411–2417. pmid:21993904
  86. Patel RR, Donohue KD, Lau D, Unnikrishnan H (2013) In vivo measurement of pediatric vocal fold motion using structured light laser projection. J Voice 27(4): 463–472. pmid:23809569
  87. Döllinger M (2002) Parameter estimation of vocal fold dynamics by inversion of a biomechanical model. PhD thesis, Faculty of Engineering at FAU Erlangen-Nürnberg, Germany.
  88. Pham DT, Castellani M (2009) The Bees Algorithm: Modelling foraging behavior to solve continuous optimization problems. Proc IMechE Part C: J Mechanical Engineering Science 223: 2919–2938.
  89. Ahmad K, Yan Y, Bless D (2012) Vocal fold vibratory characteristics of healthy geriatric females: analysis of high-speed digital images. J Voice 26(6): 751–759. pmid:22633334
  90. Lundy DS, Silva C, Casiano RR, Lu FL, Xue JW (1998) Cause of hoarseness in elderly patients. Otolaryngol Head Neck Surg 118(4): 481–485. pmid:9560099
  91. Yamauchi A, Yokonishi H, Imagawa H, Sakakibara K, Nito T, Tayama N, et al (2015) Quantitative analysis of digital videokymography: a preliminary study on age- and gender-related difference of vocal fold vibration in normal speakers. J Voice 29(1): 109–119. pmid:25228432
  92. Honjo I, Isshiki N (1980) Laryngoscopic and voice characteristics of aged persons. Arch Otolaryngol 106(3): 149–150. pmid:7356434
  93. Holmberg EB, Hillman RE, Perkell JS (1988) Glottal airflow and transglottal air pressure measurements for male and female speakers in soft, normal, and loud voice. J Acoust Soc Am 84(2): 511–529. pmid:3170944
  94. Baken RJ, Orlikoff RF. Clinical Measurement of Speech and Voice. 2nd Edition, Delmar Cengage Learning; 2010.
  95. Hertegard S, Gauffin J, Lindestad PA (1995) A comparison of subglottal and intraoral pressure measurements during phonation. J Voice 9(2): 149–155. pmid:7620537
  96. Döllinger M, Berry DA, Kniesburges S (2016) Dynamic vocal fold parameters with changing adduction in ex-vivo hemilarynx experiments. J Acoust Soc Am 139(5): 2372–2385. pmid:27250133
  97. Döllinger M, Gröhn F, Berry DA, Eysholdt U, Luegmair G (2014) Preliminary results on the influence of engineered artificial mucus layer on phonation. J Speech Lang Hear Res 57(2): 637–647. pmid:24686925
  98. Hollien H, Colton RH (1969) Four laminagraphic studies of vocal fold thickness. Folia Phoniatr 21(3): 179–198. pmid:5380734
  99. Hsiao TY, Liu CM, Luschei ES, Titze IR (2001) The effect of cricothyroid muscle action on the relation between subglottal pressure and fundamental frequency in an in vivo canine model. J Voice 15(2): 187–193. pmid:11411473
  100. Neubauer J, Mergell P, Eysholdt U, Herzel H (2001) Spatio-temporal analysis of irregular vocal fold oscillations: Biphonation due to desynchronization of spatial modes. J Acoust Soc Am 110(6): 3179–3192. pmid:11785819
  101. Mehta DD, Zanartu M, Quatieri TF, Deliyski DD, Hillman RE (2011) Investigating acoustic correlates of human vocal fold vibratory phase asymmetry through modeling and laryngeal high-speed videoendoscopy. J Acoust Soc Am 130(6): 3999–4009. pmid:22225054
  102. Alt W. Nichtlineare Optimierung [Nonlinear optimization]. 1st ed. Friedr. Vieweg & Sohn Verlagsgesellschaft, Braunschweig/Wiesbaden; 2002.
  103. Sommer DE, Erath BD, Zanartu M, Peterson SD (2012) Corrected contact dynamics for the Steinecke and Herzel asymmetric two-mass model of the vocal folds. J Acoust Soc Am 132(4): EL271–EL276. pmid:23039564
  104. Rupitsch SJ, Ilg J, Sutor A, Lerch R, Döllinger M (2011) Simulation Based Estimation of Dynamic Mechanical Properties for Viscoelastic Materials Used for Vocal Fold Models. Journal of Sound and Vibration 330(18–19): 4447–4459.
  105. De Vries MP, Schutte HK, Veldman AE, Verkerke GJ (2002) Glottal flow through a two-mass model: comparison of Navier-Stokes solutions with simplified models. J Acoust Soc Am 111(4): 1847–1853. pmid:12002868
  106. Semmler M, Kniesburges S, Parchent J, Jakubaß B, Zimmermann M, Bohr C, et al (2017) Endoscopic laser-based 3D imaging for functional voice diagnostics. Appl Sci 7: 600.