How to precisely measure the volume velocity transfer function of physical vocal tract models by external excitation

Recently, 3D printing has been increasingly used to create physical models of the vocal tract with geometries obtained from magnetic resonance imaging. These printed models allow measuring the vocal tract transfer function, which cannot be measured reliably in vivo. The transfer functions enable the detailed examination of the acoustic effects of specific articulatory strategies in speaking and singing, and the validation of acoustic plane-wave models for realistic vocal tract geometries in articulatory speech synthesis. To measure the acoustic transfer function of 3D-printed models, two techniques have been described: (1) excitation of the models with a broadband sound source at the glottis and measurement of the sound pressure radiated from the lips, and (2) excitation of the models with an external source in front of the lips and measurement of the sound pressure inside the models at the glottal end. The former method is more frequently used and more intuitive due to its similarity to speech production. However, the latter method avoids the intricate problem of constructing a suitable broadband glottal source and is therefore more effective. It has been shown to yield a transfer function similar, but not exactly equal, to the volume velocity transfer function between the glottis and the lips, which is usually used to characterize vocal tract acoustics. Here, we revisit this method and show, both theoretically and experimentally, how it can be extended to yield the precise volume velocity transfer function of the vocal tract.


Introduction
The vocal tract transfer function, i.e., the complex frequency-dependent ratio of the volume velocity (or alternatively sound pressure) at the lips to the volume velocity through the glottis, is widely used to characterize the acoustics of the vocal tract. It contains the information about the frequencies and bandwidths of the formants (resonances), which are of primary importance in many studies. Besides the formants, most transfer functions contain additional information in terms of close pole-zero pairs, which are caused by side cavities like the piriform sinus, the vallecula (a cavity between tongue base and epiglottis), interdental spaces, or the nasal cavity [1][2][3]. The measurement of the complete transfer function of real vocal tract geometries with a bandwidth of up to at least 6 kHz (speech range) and a high signal-to-noise ratio is therefore of paramount interest. The only known direct method to measure the vocal tract transfer function in vivo requires an external broadband excitation of the vocal tract with a transducer placed in the vicinity of the larynx, while measuring the sound pressure radiated from the mouth [4][5][6]. Because the sound of the source must pass the tissue of the neck to excite the air in the vocal tract, the source must be firmly pressed against the neck, which can be inconvenient. Furthermore, depending on the subject, the damping by the tissue may be so strong that the signal-to-noise ratio of the recorded signal is not high enough to be useful.
A more convenient direct method to determine the formant frequencies (but not the full volume velocity transfer function) excites the vocal tract with a volume velocity source close to the mouth opening and simultaneously measures its sound pressure response right next to the source [7][8][9]. The quotient of the recorded pressure and the emitted volume velocity is the impedance of the vocal tract in parallel with the radiation impedance, the peaks of which correspond to the radiation-loaded vocal tract resonances [7]. However, it is very difficult to construct a volume velocity source with a flat response over a sufficiently high bandwidth, and the radiation from the mouth is physically disturbed by the source and the microphone. Furthermore, the input impedance differs from the volume velocity transfer function that is usually used to characterize vocal tract acoustics.
A relatively new method to obtain a detailed vocal tract transfer function of a sustained speech sound consists of measuring the vocal tract shape using 3D magnetic resonance imaging (MRI), segmenting the vocal tract shape from the MRI data, printing a 1:1 physical model of the shape using a 3D printer, and measuring the transfer function of the physical model [2,10,11,12]. The advantage over measurements in humans is that the printed models impose no limitations on the placement of sound sources and microphones. Two approaches have been described to obtain the transfer function of these models.
One approach is to excite the models with a broadband sound source at the glottis and measure the sound pressure radiated from the lips [11][12][13]. Here, the sound source should ideally be a volume velocity source with an infinite output impedance, which is hence independent of the vocal tract load as assumed by the source-filter theory of speech production [14]. However, constructing and calibrating a broadband volume velocity source with a flat frequency response is intricate. Some studies used a loudspeaker or horn driver connected to an impedance matching horn with a small ( 4 mm) annular aperture at the distal end of the horn, which is attached to the glottal end of the vocal tract model [13,15]. Alternatively, the horn is omitted and the speaker is directly attached to a connector plate with a small hole [12]. For a good approximation of a volume velocity source, the hole must be so small that its acoustic resistance is much higher than the highest input impedance of the models. However, the high resistance of the hole and the cavity resonances of the horn can affect the loudspeaker behavior in an unpredictable way. Therefore, the usable bandwidth of this type of source is typically rather limited. For example, Speed et al. reported an upper band limit of 4 kHz, i.e., the frequency of the first marked zero that could not be equalized due to the limited dynamic range of the loudspeaker [13].
Similar to this is the use of an in-ear headphone as a glottal source [11]. However, to our knowledge, the effect of the vocal tract load on the acoustic excitation of such a headphone has not been examined yet. Another method to produce a well-defined glottal volume velocity is based on a calibrated impedance head connected to the glottal end of the model [16]. This technique allows high precision and dynamic range over a wide frequency range, but requires a sophisticated calibrated impedance head with three measurement microphones.
The other general approach to measure the transfer function is to excite the vocal tract model with an external sound source $P_s(\omega)$ in front of the lips and measure the sound pressure $P_1(\omega)$ inside the model at the glottal end [1,2,10], as illustrated in Fig 1A. This method avoids the intricate problem of constructing a suitable volume velocity source and only requires an ordinary wideband loudspeaker and a microphone. Kitamura et al. [10] related $P_1(\omega)$ to $P_s(\omega)$ in Eq (1) (see Eqs A·4 and A·10 in Kitamura et al. [10]), where $H(\omega) = U_2(\omega)/U_1(\omega)$ is the volume velocity transfer function between the volume velocities $U_1(\omega)$ through the glottis and $U_2(\omega)$ through the lips, which is usually used to characterize vocal tract acoustics, $Z_r(\omega)$ is the radiation impedance, and $Z_0$ is the characteristic impedance of a plane wave. Therefore, if the frequency response $P_s(\omega)$ of the source is assumed to be independent of frequency, $P_1(\omega)$ is close to $H(\omega)$ in terms of formant frequencies, but the spectral tilt is different because $Z_r$ increases monotonically with frequency. So far, the magnitude of this tilt, and hence the deviation of $P_1$ from the true volume velocity transfer function, has not been examined. In principle, the spectral tilt could be compensated by an adapted source. However, this presupposes exact knowledge of the source characteristics, i.e., one must be able to quantify the behavior of the source coupled with the vocal tract models. Since in many cases the sources are not independent of the model, this compensation would have to be determined explicitly for each configuration. According to Eq (1), it seems likely that the deviation of $P_1(\omega)$ from $H(\omega)$ depends on the model geometry, because the radiation impedance $Z_r(\omega)$ depends on the mouth aperture and the shape of the lips.
The purpose of this paper is to examine the extent to which $P_1$ differs from the true volume velocity transfer function for different vocal tract shapes and to propose an extension to Kitamura's method that allows the precise determination of the volume velocity transfer function. To this end, we present an alternative description of the measurement situation in terms of an acoustic circuit model. The analysis of this circuit model shows that the precise volume velocity transfer function can be obtained with an additional sound pressure measurement in front of the closed lips of the vocal tract model (without the need for an actual acoustic flow measurement). This method is used to measure the volume velocity transfer functions of four physical vocal tract models, and the results are compared to finite element simulations of the same models.
We must emphasize that the proposed method cannot be used to directly measure the volume velocity transfer function of the real vocal tract in vivo. Instead, the vocal tract performing the articulation of interest has to be scanned in an MRI scanner, and the vocal tract shape has to be segmented from the MRI data and printed as a 3D object. Despite this limitation, there are a range of applications for the proposed method. On the one hand, certain articulatory strategies during the production of phones in speech and singing can be precisely associated with changes in the acoustic transfer function. This may help to examine how professional singers tune the acoustic properties of their vocal tract when they sing at different pitches [11], or how we adapt vocal tract acoustics for different voice qualities (e.g., between spoken and shouted speech). On the other hand, the detailed and precise volume velocity transfer functions for realistic 3D vocal tract shapes can serve as ground truth for the validation of methods to transform 2D or 3D vocal tract models to plane-wave acoustic tube models in articulatory speech synthesis (e.g., [17,18]). The transformation from 3D vocal tract models to low-dimensional acoustic tube models is necessary because full 3D acoustic simulations are far too slow for articulatory speech synthesis in real-time.

Theory
In this section we show that the volume velocity transfer function of the vocal tract, i.e., the ratio of the volume velocity $U_2(\omega)$ through the lips to the volume velocity $U_1(\omega)$ through the glottis, corresponds exactly to a ratio of two pressures $P_1(\omega)$ and $P_3(\omega)$, which can be easily measured when the vocal tract is externally excited with a volume velocity source $U_s(\omega)$ as in Fig 1A and 1B.
The pressure $P_1$ is measured at the glottis while the mouth is open, and the pressure $P_3$ is measured right in front of the closed lips.
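We start by modeling the measurement situation in Fig 1A with the general acoustic circuit in Fig 2A. Here, the vocal tract is represented in terms of a two-port network where the input (= glottal) pressure $P_1$ and the input volume velocity $U_1$ are related to the output pressure $P_2$ and the output volume velocity $U_2$ by a $2 \times 2$ transmission matrix $\mathbf{M}(\omega) = [m_{ij}(\omega)]$ as follows:

$$\begin{pmatrix} P_1 \\ U_1 \end{pmatrix} = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix} \begin{pmatrix} P_2 \\ U_2 \end{pmatrix}. \tag{2}$$

The transmission of sound between the mouth opening and the location of the external sound source is correspondingly modeled by a two-port network with the transmission matrix $\mathbf{N}(\omega) = [n_{ij}(\omega)]$, which relates $(P_2, U_2)$ to the pressure $P_s$ and the volume velocity $U_s$ at the source. Let $\mathbf{O}(\omega) = \mathbf{M}\mathbf{N} = [o_{ij}(\omega)]$ denote the joint transfer matrix between the glottis and the external source, so that

$$\begin{pmatrix} P_1 \\ U_1 \end{pmatrix} = \begin{pmatrix} o_{11} & o_{12} \\ o_{21} & o_{22} \end{pmatrix} \begin{pmatrix} P_s \\ U_s \end{pmatrix}.$$

Due to the principle of reciprocity [19], $\det \mathbf{M} = 1$, $\det \mathbf{N} = 1$ and $\det \mathbf{O} = 1$. Considering that $U_1 = 0$, the second row of the joint matrix equation gives $P_s = -(o_{22}/o_{21})\,U_s$, and with $\det \mathbf{O} = 1$ the sound pressure $P_1$ measured at the glottis can be expressed as a function of the volume velocity $U_s$ at the sound source:

$$P_1 = -\frac{U_s}{o_{21}}. \tag{3}$$

The sign in Eq (3) only reflects that $U_s$ is counted here in the same direction as $U_1$ and $U_2$; it cancels in the pressure ratio formed below.

We now consider the case where the mouth of the vocal tract is closed with a plate and the sound source is at the same position as before, as shown in Fig 1B. In this case, the volume velocity $U_2$ through the mouth of the model is zero. The pressure $P_3$ that is measured in front of the closed lips is now an open-circuit pressure, as shown by the equivalent circuit in Fig 2B. With $\det \mathbf{N} = 1$, it can be expressed as a function of the volume velocity $U_s$ at the sound source as follows:

$$P_3 = -\frac{U_s}{n_{21}}. \tag{4}$$

From Eqs (3) and (4), and with $o_{21} = m_{21} n_{11} + m_{22} n_{21}$, we can form the ratio $P_1/P_3$:

$$\frac{P_1}{P_3} = \frac{n_{21}}{o_{21}} = \frac{1}{m_{21}\,\dfrac{n_{11}}{n_{21}} + m_{22}}. \tag{5}$$

The quotient $n_{11}/n_{21}$ on the right-hand side of Eq (5) equals the input impedance $P_2/U_2$ of the two-port network for the "environment" in Fig 2, which is equivalent to the radiation impedance $Z_r$ of the vocal tract. This can be proven by considering the two-port network described by the transmission matrix $\mathbf{N}(\omega)$, which represents the exterior space between the vocal tract and the external sound source (loudspeaker). The equations that describe the transfer characteristics are as follows:

$$P_2 = n_{11} P_s + n_{12} U_s, \tag{6}$$

$$U_2 = n_{21} P_s + n_{22} U_s. \tag{7}$$

For the case of an inactive loudspeaker, one gets $U_s = 0$, and the ratio

$$Z_r = \frac{P_2}{U_2} = \frac{n_{11}}{n_{21}} \tag{8}$$

can be derived.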
It should be noted that this equation is only an approximation that presumes that the presence of the loudspeaker does not essentially change the exterior space. This assumption holds better the smaller the loudspeaker and the greater its distance from the resonator. Combining Eqs (5) and (8), we obtain

$$\frac{P_1}{P_3} = \frac{1}{m_{21} Z_r + m_{22}}. \tag{9}$$

This pressure ratio is exactly the desired volume velocity transfer function $H = U_2/U_1$ of the vocal tract, which is easily verified with the second equation in (2) by setting $P_2 = U_2 Z_r$. Note that for both cases shown in Fig 2A and 2B, $U_s$ is assumed to be unaffected by the configuration (open or closed mouth) of the vocal tract model under test, because the whole vocal tract model constitutes a small reflecting surface at a certain distance from the loudspeaker in an otherwise open acoustic space. It is also worth noting that the measurement situation depicted in Fig 1A closely resembles the hearing situation when a sound wave emitted from a volume source $U_s$ hits the human ear. In this analogy, the vocal tract corresponds to the ear canal, and its glottal end corresponds to the eardrum. The hearing situation has been well studied in the context of binaural hearing [20] and can be adapted to obtain the same results as above.
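The identity between the pressure ratio and the volume velocity transfer function can be checked numerically. The following minimal sketch solves the two measurement situations directly from the two-port equations and compares $P_1/P_3$ with $H = 1/(m_{21} Z_r + m_{22})$. It assumes a lossless uniform tube with the dimensions of the fourth model as the vocal tract matrix, and the environment matrix entries are purely hypothetical numbers chosen only so that $\det \mathbf{N} = 1$; none of these values are measured quantities.

```python
import cmath
import math

RHO = 1.20   # ambient density in kg/m^3
C = 343.0    # speed of sound in m/s


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def tube_matrix(freq_hz, length_m, area_m2):
    """Transmission matrix M of a lossless uniform tube (assumed plane waves)."""
    k = 2.0 * math.pi * freq_hz / C
    z0 = RHO * C / area_m2
    return (
        (cmath.cos(k * length_m), 1j * z0 * cmath.sin(k * length_m)),
        (1j * cmath.sin(k * length_m) / z0, cmath.cos(k * length_m)),
    )


def environment_matrix():
    """Hypothetical environment matrix N; n22 is chosen so that det N = 1."""
    n11, n12, n21 = 0.4 + 0.9j, 1.0e3 + 2.0e2j, 2.0e-6 + 1.0e-6j
    return ((n11, n12), (n21, (1.0 + n12 * n21) / n11))


def pressure_ratio_and_h(m, n, u_s=1.0):
    """Return (P1/P3, H) by solving both measurement situations directly."""
    o = matmul(m, n)
    # Open mouth, U1 = 0: second row of (P1, U1) = O (Ps, Us) fixes Ps.
    p_s_open = -o[1][1] / o[1][0] * u_s
    p1 = o[0][0] * p_s_open + o[0][1] * u_s
    # Closed mouth, U2 = 0: second row of (P2, U2) = N (Ps, Us) fixes Ps.
    p_s_closed = -n[1][1] / n[1][0] * u_s
    p3 = n[0][0] * p_s_closed + n[0][1] * u_s
    # Radiation impedance and volume velocity transfer function.
    z_r = n[0][0] / n[1][0]
    h = 1.0 / (m[1][0] * z_r + m[1][1])
    return p1 / p3, h


AREA = math.pi * 0.0138 ** 2   # 27.6 mm inner diameter
for freq in (100.0, 500.0, 1000.0, 3000.0, 6000.0):
    ratio, h = pressure_ratio_and_h(tube_matrix(freq, 0.17, AREA), environment_matrix())
    print(freq, abs(ratio - h) / abs(h))
```

The source strength `u_s` cancels in the ratio, which mirrors the statement that the result does not depend on the excitation as long as the source is unaffected by the mouth configuration.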

Preparation of the physical vocal tract models
To test the theory above, four physical vocal tract models were created. Three of the models represent the vocal tract shapes for the vowels /a/, /u/ and /i/ produced by a 24-year-old male native German subject. The participant was a singing student at the Voice Research Laboratory, Hochschule für Musik Carl Maria von Weber Dresden, with whom one of the co-authors cooperates. The MRI images were taken in July 2015. Data acquisition within this study was approved by the ethical review committee of the medical faculty "Carl Gustav Carus" of the TU Dresden (EK153042011). After having been informed about risks and procedures, the participant provided written consent. The shapes were obtained from MRI data of the vocal tract according to a procedure presented in detail before [21,22]. In brief, the subject produced each of the vowels for about 12.1 s, while his vocal tract was scanned using a 3 T MRI machine (Magnetom Trio Trim, Siemens Medical Solutions, Erlangen, Germany). The MRI was performed with a 12-element head-neck coil using a 3D volume-interpolated breath-hold examination sequence with 1.22 ms/4.01 ms (echo time/repetition time), a flip angle of 9°, a field of view of 300 × 300 mm², a matrix of 288 × 288 pixels, 52 sagittal slices and a slice thickness of 1.8 mm.
The vocal tract cavities in the MRI data were segmented using IPTools [23], slightly smoothed, and exported as triangle meshes representing the vocal tract walls. The termination of the vocal tract models at the lips was approximated by a plane parallel to the coronal plane. The anteroposterior position of this plane was set to the place where the vertical distance between the midsagittal contours of the upper and lower lips reached its minimum. The lateral gaps of the vocal tract in the region of the lips between the corners of the mouth and the termination plane were manually closed during the segmentation.
Because the teeth are invisible in MRI, they were separately reconstructed as triangle meshes by scanning plaster models of the subject's mandible and maxilla using a NextEngine desktop 3D laser scanner. All triangle meshes were then voxelized with a voxel size of 0.25 × 0.25 × 0.25 mm³ using binvox [24] (www.patrickmin.com/binvox/) and merged into a single voxel model using 3DSlicer [25] (www.slicer.org). The teeth were positioned relative to the vocal tract cavities by careful visual inspection. The merged voxel models for /a/, /u/ and /i/ were then converted back into triangle meshes using 3DSlicer. The free software package Meshlab (www.meshlab.sourceforge.net) was then used for adaptive mesh simplification and extrusion of the surfaces to create vocal tract walls with a thickness of 3 mm. Finally, the programs netfabb Basic (www.netfabb.com) and ParaView (www.paraview.org) were used to repair defects in the triangle meshes and separate the models into two halves suitable for 3D printing. The model halves were printed with a 3D printer (ULTIMAKER 2, www.ultimaker.com) using polylactic acid (PLA) material, and the two halves of each of the models /a/, /u/ and /i/ were glued together (S1 File). In addition to the realistic models for /a/, /u/ and /i/, a uniform tube of 170 mm length and 27.6 mm inner diameter was created as a fourth model (denoted as /ə/ in the following) and printed in one piece with the 3D printer.
All four models were terminated at the glottal end with a uniform endplate of 3 mm thickness and a hole with 10 mm diameter to allow inserting the measurement microphone with a rubber adapter (Figs 1A and 3A).
Finally, the printed models were covered with plaster of a thickness of about 1 cm to increase the mass of the walls and avoid sound radiation from the model surfaces.

Measurement of the transfer functions of the physical models
For each of the four physical models, the volume velocity transfer function was obtained according to the theory outlined in the Theory section by two successive sound pressure measurements ($P_1$ and $P_3$) with the setups in Fig 1A and 1B. During both measurements, the model was excited by an external sound source $U_s$ (VISATON speaker, type FR 10-8 Ohm in a custom-made cylindrical enclosure) producing an exponential sweep with a power band from 100 Hz to 10,000 Hz and a duration of 21 s according to the method by Farina [26]. The sound source was located 25 cm in front of the mouth opening to prevent near-field effects on the model. The pressure $P_1$ was recorded with a 1/4" measurement microphone (type MK301E/MV310, www.microtechgefell.de) inserted into the glottal end of the model so that the microphone membrane was flush with the upper surface of the "vocal folds". For the measurement of $P_3$, the mouth opening of the model was closed with a stiff plate of 3 mm thickness and the size of the mouth, fixed to the model with double-sided tape. $P_3$ was measured right in front of this plate using a probe microphone (ER-7C, www.etymotic.com). For each model, the transfer function was calculated as $H(\omega) = P_1(\omega)/P_3(\omega)$. In addition to $P_1$ and $P_3$, the free-field sound pressure $P_\text{ref}(\omega)$ produced by the loudspeaker was measured in the absence of the model at the position where the mouth had been. All measurements were conducted in an anechoic chamber at a room temperature of 20 °C.
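The two signal-processing steps of this procedure, generating the exponential sweep and forming the ratio of the two pressure spectra, can be sketched as follows. This is a minimal illustration and not the processing chain actually used for the measurements: the function names are hypothetical, the sweep formula is the standard exponential sine sweep, and the spectra $P_1$ and $P_3$ are assumed to have already been estimated on a common frequency grid from the two recordings.

```python
import math


def exponential_sweep(f_start, f_stop, duration, sample_rate):
    """Return the samples of an exponential sine sweep from f_start to f_stop (Hz)."""
    rate = math.log(f_stop / f_start)
    n_samples = int(round(duration * sample_rate))
    scale = 2.0 * math.pi * f_start * duration / rate
    return [
        math.sin(scale * (math.exp(rate * n / (sample_rate * duration)) - 1.0))
        for n in range(n_samples)
    ]


def sweep_frequency(t, f_start, f_stop, duration):
    """Instantaneous frequency (Hz) of the sweep at time t (s)."""
    return f_start * math.exp(t / duration * math.log(f_stop / f_start))


def transfer_function(p1_spectrum, p3_spectrum):
    """Form H = P1/P3 bin by bin from two complex spectra of equal length."""
    if len(p1_spectrum) != len(p3_spectrum):
        raise ValueError("spectra must share the same frequency grid")
    return [p1 / p3 for p1, p3 in zip(p1_spectrum, p3_spectrum)]


def magnitude_db(spectrum):
    """Convert a complex spectrum to levels in dB."""
    return [20.0 * math.log10(abs(value)) for value in spectrum]


# Short demonstration sweep (the measurements used 100-10000 Hz over 21 s).
demo = exponential_sweep(100.0, 10000.0, 0.01, 48000)
print(len(demo), sweep_frequency(0.01, 100.0, 10000.0, 0.01))
```

Because both recordings are excited by the same sweep from the same loudspeaker position, the source spectrum appears in numerator and denominator and drops out of the ratio, so no separate equalization of the loudspeaker response is needed in this step.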

Calculation of the transfer functions with finite element models
For comparison with the physical measurements above, the volume velocity transfer functions of the four vocal tract models were calculated using the finite element method (FEM). The FE model creation and the numerical simulation were performed similarly to Fleischer et al. [22]. Accordingly, the volume meshes (S2 File) of the four vocal tract shapes described in the section on the preparation of the physical vocal tract models were created from the surface representations using the software Gmsh [27]. The mesh for the vowel /a/ had a mean element size of 1.91 mm and 101,240 degrees of freedom (DOF), the mesh for /u/ had a mean element size of 2.19 mm and 78,568 DOF, the mesh for /i/ had a mean element size of 1.68 mm and 124,650 DOF, and the mesh for /ə/ had a mean element size of 3.01 mm and 31,048 DOF. Note that the geometrically simple /ə/ has a slightly greater element size because it contains no tiny details. In order to validate the numerical results, the polynomial degree of the shape functions was varied (for second-order polynomials the DOF increased to about 230,000 for /ə/ and up to 900,000 for /i/). The comparison of the simulation results for first- and second-order polynomials showed that the chosen element size was sufficient for all finite element models in the investigated frequency range even with linear shape functions.
The acoustic simulation was performed with the open-source software FEniCS (http://fenicsproject.org; [28]) based on the Helmholtz equation

$$\nabla^2 P(\vec{x}, \omega) + \kappa^2 P(\vec{x}, \omega) = 0,$$

where $P$ is the complex-valued sound pressure as a function of the position $\vec{x}$ and the angular frequency $\omega$, $\kappa = \omega/c$ is the wave number, and $c = 343$ m/s is the speed of sound for a temperature of 20 °C. The particle velocity $\vec{V}(\vec{x}, \omega)$ is related to the sound pressure by $\nabla P = -j\omega\varrho\vec{V}$, where $\varrho = 1.20$ kg/m³ is the ambient density for a temperature of 20 °C. For the computation of the volume velocity transfer function, boundary conditions were applied to three surface regions: the glottis, where the model was excited, the lip opening, which was terminated by the radiation impedance $Z_r$, and the vocal tract walls, which were terminated by the wall impedance $Z_\text{wall}$. Here, $P_\text{glottis}$ is the pressure on the model surface region representing the glottis, $P_\text{lips}$ is the pressure on the surface region representing the lip opening, and $P_\text{wall}$ is the pressure on the surface of the vocal tract walls (see Fig 3B and 3C for the individual regions). Furthermore, $\vec{n}$ is the outward normal vector of the mesh surface, and the wall impedance $Z_\text{wall}$ was empirically set to $500 \cdot \varrho c = 205{,}800$ kg/(m²·s) for appropriate damping. Since the wall impedances of the printed 3D models are not known, the simplest model (/ə/) was used to adjust the wall impedance in such a way that the transfer function was well approximated but did not change too much in comparison to the solution for the hard-walled model. The rationale is that for this simple model, because of the small reflections within the model, the wall damping must have only a small influence. The estimated value was then adopted for the other models. It is conceivable that the wall impedance depends on location and frequency (see [22]), but in order to limit the calculation effort, a constant value was used. The radiation impedance $Z_r$ was set to that of a rigid piston with radius $r_\text{lips}$ acting into an infinite baffle [22,29]. The acoustic pressure $P_\text{lips}$ at the lip opening was determined in the center of the area representing the lip region.
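For orientation, the radiation impedance of a rigid circular piston in an infinite baffle can be evaluated with the classical textbook expression $Z_r = (\varrho c / A)\,[\,1 - J_1(2ka)/(ka) + j\,\mathbf{H}_1(2ka)/(ka)\,]$, where $J_1$ is the Bessel function and $\mathbf{H}_1$ the Struve function of first order. The sketch below is an illustration under stated assumptions, not the code used in the simulation: the piston radius is taken as the equal-area radius $a = \sqrt{A/\pi}$ of the lip opening, and both special functions are evaluated with truncated power series, which is adequate for the small arguments occurring up to 6 kHz.

```python
import math

RHO = 1.20                       # ambient density in kg/m^3
C = 343.0                        # speed of sound in m/s
WALL_IMPEDANCE = 500 * RHO * C   # wall impedance used in the simulation, kg/(m^2 s)


def bessel_j1(x, terms=30):
    """Bessel function J1 from its power series (fine for small x)."""
    return sum(
        (-1) ** m * (x / 2.0) ** (2 * m + 1) / (math.factorial(m) * math.factorial(m + 1))
        for m in range(terms)
    )


def struve_h1(x, terms=30):
    """Struve function H1 from its power series (fine for small x)."""
    return sum(
        (-1) ** m * (x / 2.0) ** (2 * m + 2) / (math.gamma(m + 1.5) * math.gamma(m + 2.5))
        for m in range(terms)
    )


def piston_radiation_impedance(freq_hz, area_m2):
    """Acoustic radiation impedance of a baffled rigid piston with the given area."""
    radius = math.sqrt(area_m2 / math.pi)   # assumption: equal-area radius
    ka = 2.0 * math.pi * freq_hz / C * radius
    if ka == 0.0:
        return 0j
    resistance = 1.0 - bessel_j1(2.0 * ka) / ka
    reactance = struve_h1(2.0 * ka) / ka
    return RHO * C / area_m2 * complex(resistance, reactance)


# Lip opening areas of the smallest (/u/) and largest (uniform tube) aperture.
for area_cm2 in (0.44, 5.98):
    z_r = piston_radiation_impedance(6000.0, area_cm2 * 1e-4)
    print(area_cm2, z_r)
```

At low frequencies the expression reduces to the familiar approximations $(ka)^2/2$ for the normalized resistance and $8ka/(3\pi)$ for the normalized reactance, which is a convenient sanity check for the series implementation.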
Based on these boundary conditions, the transfer function $H_\text{FEM}(\omega) = U_\text{lips}(\omega)/U_\text{glottis}(\omega)$ was calculated, where $U_\text{lips} = A_\text{lips} \cdot V_\text{lips}$, $U_\text{glottis} = A_\text{glottis} \cdot V_\text{glottis}$, and $A_\text{lips}$ and $A_\text{glottis}$ are the cross-sectional areas of the lips and the glottis, respectively. For each of the models, the transfer function was calculated with a frequency resolution of 3 Hz from 0 to 6 kHz, taking up to 8 h per model on a standard desktop computer.

Results and discussion
Fig 4A-4D show the pressure $P_1$ measured at the glottis, the pressure $P_3$ measured right in front of the closed lips, and $P_\text{ref}$ measured without the model in front of the loudspeaker for each of the models /a/, /u/, /i/ and /ə/. Here, it can be seen that the $P_1$ spectra resemble typical volume velocity transfer functions for these vowels, as claimed by Kitamura et al. [10]. However, we also see a clear drift between the spectra for $P_3$ and $P_\text{ref}$ with differences of up to 10 dB at 6 kHz. Fig 5 shows that the drifts are generally similar for all four models, but that they differ in detail.
The drift differences between the models are smaller than we initially expected, because the radiation impedance $Z_r$ in Eq (1) suggests that the drift depends on the mouth aperture (which varies from 0.44 cm² for /u/ to 5.98 cm² for /ə/ in our models). On the other hand, the similarity is less surprising in the context of the actual measurement setup, because $P_3$ was measured in front of the closed mouths of the models. Fig 6A-6D show the ratios of the pressure spectra $P_1/P_3$ (which is theoretically equivalent to the volume velocity transfer function) and $P_1/P_\text{ref}$ (which just compensates for the frequency response of the loudspeaker, as was done by Delvaux & Howard [2], for example), as well as the volume velocity transfer functions $H_\text{FEM}$ calculated with the FEM.
The spectra $P_1/P_3$ and $P_1/P_\text{ref}$ clearly reflect the difference between $P_3$ and $P_\text{ref}$ in Fig 4A-4D. It is also notable that the proposed pressure ratio $P_1/P_3$ is much closer to the FE calculation than $P_1/P_\text{ref}$ for all four models. The RMS spectral differences in the 0 to 6 kHz range between $P_1/P_3$ and $H_\text{FEM}$ are 3.6 dB, 2.8 dB, 3.8 dB and 1.0 dB for the models /a/, /u/, /i/ and /ə/, respectively, while they are as high as 7.8 dB, 7.0 dB, 7.3 dB and 5.4 dB between $P_1/P_\text{ref}$ and $H_\text{FEM}$. Notably, the use of $P_\text{ref}$ as the denominator in the volume velocity transfer function has limits, despite the fact that $P_\text{ref}$ is supposed to correct for the spectral characteristics of the loudspeaker. If one considers Fig 2B and assumes that the vocal tract model is not there ($U_2 \neq 0$), the governing equations are those of the two-port network $\mathbf{N}(\omega)$ with a non-zero volume velocity at the position of the mouth. After some analysis, and using the relation $Z_r = n_{11}/n_{21}$ shown above, one obtains Eq (14) for the ratio $P_1/P_\text{ref}$. Comparing Eq (14) with Eq (9), one can see at least two significant differences. First, the ratio $P_1/P_\text{ref}$ depends on $U_s$, which is not acceptable because a transfer function is, by definition, independent of the excitation. Second, this serious problem can only be bypassed if either $n_{11} = 0$ (which conflicts with the principle of reciprocity) or $U_2 = 0$ (lips closed). Neither option is valid. Furthermore, inserting an arbitrarily shaped stiff plate to force $U_2$ to zero is not a valid approach either, because in this case the transmission matrix $\mathbf{N}(\omega)$ would be changed significantly, which in turn would affect the subsequent analysis. In contrast, calculating the proposed pressure ratio $P_1/P_3$ not only prevents the general drift relative to the true volume velocity transfer function, but may also prevent spurious spectral "defects" that might be misinterpreted as true spectral information. For example, the spurious peaks in $P_1/P_\text{ref}$ at around 2 kHz disappear in $P_1/P_3$ due to the normalization by $P_3$ (see Fig 6A-6D).
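The RMS spectral differences quoted in this section compare two transfer functions on a common frequency grid. One plausible way to compute such a measure is sketched below. The exact definition used for the reported values is not spelled out in the text, so this is an assumption: the level difference in dB is taken bin by bin, without removing any constant offset, and then the root mean square over all bins is formed.

```python
import math


def level_db(value):
    """Level of a complex transfer function value in dB."""
    return 20.0 * math.log10(abs(value))


def rms_level_difference_db(h_a, h_b):
    """RMS of the bin-wise level difference (dB) between two transfer functions."""
    if len(h_a) != len(h_b) or not h_a:
        raise ValueError("both transfer functions need the same non-zero length")
    squared = [(level_db(a) - level_db(b)) ** 2 for a, b in zip(h_a, h_b)]
    return math.sqrt(sum(squared) / len(squared))


# Two-bin toy example: 20 dB difference in the first bin, 0 dB in the second.
print(rms_level_difference_db([10 + 0j, 1 + 0j], [1 + 0j, 1 + 0j]))
```

With a 3 Hz grid from 0 to 6 kHz, as used for the FE results, the measured ratios would first have to be interpolated onto the same frequencies before such a comparison.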
A notable feature for /a/, /i/ and /u/ (Fig 6A-6C), in contrast to /ə/ (Fig 6D), is that there are strong zeros between 4 kHz and 5 kHz. These zeros are known to be caused by the piriform sinuses, which are side cavities of the main vocal tract [2]. These side cavities are not present in /ə/. To assess a potential effect of the measurement method on the formant frequencies, the first four formant frequencies were determined by peak picking in the magnitude spectra of $P_1/P_3$, $P_1/P_\text{ref}$ and $H_\text{FEM}$ for all four models. The results are given in Table 1, together with the relative formant deviations of $P_1/P_3$ and of $P_1/P_\text{ref}$ from $H_\text{FEM}$. The average formant deviation between $P_1/P_3$ and $H_\text{FEM}$ is 1.074%, and the average formant deviation between $P_1/P_\text{ref}$ and $H_\text{FEM}$ is 1.131%. Hence, the formants of the two measured transfer functions are similarly close to the formants of the reference FE simulation. The overall deviation between measured and simulated formants is less than 2%, which is much smaller than the differences reported in the few previous studies that made similar comparisons [1,13]. In addition, Tables 2 and 3 show the bandwidths and amplitudes of the resonances of all models and their deviation from the finite element models. The average bandwidth and amplitude deviations between $P_1/P_3$ and $H_\text{FEM}$ are 25.3 Hz and 4.1 dB, and between $P_1/P_\text{ref}$ and $H_\text{FEM}$ these values are 31.6 Hz and 5.1 dB. Here, too, there are no large differences between the two measured transfer functions. It should be kept in mind that these values strongly depend on the selected wall impedance $Z_\text{wall}$. It is quite possible to optimize the values in such a way that the deviations of amplitudes and bandwidths are minimized. However, this would go beyond the scope of this work.
A limitation of this study is the approximation of the lip openings of the vocal tract models in terms of straight cuts. For most speech sounds, the lips form a wedge-like opening of the vocal tract, which has non-negligible acoustic effects [12]. When the proposed method is used to measure the volume velocity transfer functions of vocal tract models with such realistic lip shapes, the precise positioning of the microphone in front of the closed lips might play a role, and closing the mouth may become more complicated (modeling clay could be used). Furthermore, it might become necessary to measure the pressure at multiple points on the outer double-curved surface of the closed lips. It can be expected that all these individual transfer functions differ slightly from each other. The averaged transfer function should then be considered as the result. However, in most cases we would expect that the averaged transfer function is very close to the one that is obtained when $P_3$ is measured in the midsagittal plane in the middle between the upper and lower lip. This issue deserves further investigation.
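The formant comparison reported in Table 1 rests on peak picking in the magnitude spectra followed by a relative deviation from the FE reference. A minimal sketch of these two steps is given below. The peak criterion (a simple local maximum, without smoothing or a prominence threshold) and the function names are assumptions made for illustration and may differ from the procedure used for the reported values.

```python
def pick_peaks(freqs_hz, magnitudes_db, count=4):
    """Return the frequencies of the first `count` local maxima of a spectrum."""
    peaks = [
        freqs_hz[i]
        for i in range(1, len(magnitudes_db) - 1)
        if magnitudes_db[i - 1] < magnitudes_db[i] >= magnitudes_db[i + 1]
    ]
    return peaks[:count]


def relative_deviation_percent(measured_hz, reference_hz):
    """Relative deviation (%) of each measured formant from its reference."""
    return [
        100.0 * abs(measured - reference) / reference
        for measured, reference in zip(measured_hz, reference_hz)
    ]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Toy spectrum with two peaks, at 100 Hz and 400 Hz.
toy_freqs = [0, 100, 200, 300, 400, 500, 600]
toy_levels = [0.0, 5.0, 1.0, 2.0, 8.0, 3.0, 1.0]
print(pick_peaks(toy_freqs, toy_levels))
print(mean(relative_deviation_percent([505.0, 1485.0], [500.0, 1500.0])))
```

On measured spectra, which contain noise and the spurious fine structure discussed above, a plain local-maximum criterion would likely need smoothing or a minimum peak height to avoid picking non-formant peaks.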

Conclusion
In this paper we presented a precise method for the measurement of the volume velocity transfer function of 3D-printed models of the vocal tract based on acoustic excitation with an external sound source, which avoids the obstacles and limitations involved in transfer function measurements with a glottal source, requires little special equipment (except the necessity of an anechoic chamber), and is simple to conduct. This method is an extension of the approach presented by Kitamura et al. [10] and has the advantage that the relative levels of the measured resonance peaks correspond to those of the true volume velocity transfer function, and that the overall level of the transfer function corresponds to the true level (i.e., a level of 0 dB at a frequency of 0 Hz). Furthermore, we have investigated the resulting deviation that happens without the proposed normalization. This deviation consists of a general upward drift of the spectral level with increasing frequency, and is relatively independent from the vocal tract model geometry. However, the fine structure of the spectral drift may introduce spurious peaks or troughs into the transfer function, which may cause misinterpretations. The proposed technique prevents this problem and facilitates a more accurate acoustic characterization of the increasingly used 3D-printed vocal tract models in speech and singing research than before. Although the presented procedure is not applicable to in vivo situations, it has a range of applications in basic phonetic research and the potential to improve methods for articulatory speech synthesis. Finally, the proposed method is not limited to models of the vocal tract but can be used for most kinds of tube-like acoustic resonators.

Supporting information
S1 File. Printing models. Files containing the printing models of the vowels /a/, /u/, /i/, and /ə/. (ZIP)
S2 File. Volume meshes of the finite element models. Files containing the volume meshes as used for finite element modeling of the vowels /a/, /u/, /i/, and /ə/. (ZIP)