The Vocal Repertoire of the African Penguin (Spheniscus demersus): Structure and Function of Calls

The African Penguin (Spheniscus demersus) is a highly social and vocal seabird. However, currently available descriptions of the vocal repertoire of the African Penguin are mostly limited to basic descriptions of calls. Here we provide, for the first time, a detailed description of the vocal behaviour of this species by collecting audio and video recordings from a large captive colony. We combined visual examinations of spectrograms with spectral and temporal acoustic analyses to determine vocal categories. Moreover, we used a principal component analysis, followed by signal classification with a discriminant function analysis, for statistical validation of the vocalisation types. In addition, we identified the behavioural contexts in which calls were uttered. The results show that four basic vocalisations can be found in the vocal repertoire of the adult African Penguin, namely a contact call emitted by isolated birds, an agonistic call used in aggressive interactions, an ecstatic display song uttered by single birds, and a mutual display song vocalised by pairs at their nests. Moreover, we identified two distinct vocalisations interpreted as begging calls by nesting chicks (begging peep) and unweaned juveniles (begging moan). Finally, we discuss the importance of specific acoustic parameters in classifying calls and the possible use of the source-filter theory of vocal production to study penguin vocalisations.


Introduction
Establishing a comprehensive classification of bird vocalisations is important for avifaunal surveys, allows comparisons between species and individuals [1], and also contributes to planning effective management and conservation strategies [2]. Indeed, vocalisations have the potential to provide a variety of information about bird sex, age, behavioural state, condition, and relationships with surrounding animals [3]. Moreover, avian vocalisations are important to establish phylogenetic relationships and in the discovery of new species [1].
Bird calls are produced through the syrinx [4], which differs anatomically from the mammalian larynx in several respects. In particular, the syrinx is located at the base of the trachea, while the mammalian larynx sits above it [5]. Moreover, the syrinx is a two-part organ where the sound is produced by an independent set of muscles, along with membranes at the right and left sides [6]. Unlike mammalian vocal folds, this anatomical configuration allows many birds, including penguins, to produce two independent signals simultaneously [7]. However, syringeal constriction functionally resembles the larynx in mammalian phonation, and the trachea can act as a filter to damp or accentuate certain frequencies, creating formant peaks [5], thus modifying the spectrographic structure of calls. For these reasons, the source-filter theory of mammalian vocal production [8,9] has also been used to explain the acoustic output of many avian vocalisations [10,11]. Moreover, regarding birds, it has been demonstrated that the energy distribution in the spectrum can be affected by modifications of the pharyngeal cavity and the oesophagus [12].
Penguins have three basic call types: contact calls, agonistic calls, and display songs [13]. Display songs can be further divided into ecstatic display songs (uttered by single birds) and mutual display songs (uttered by pairs). Moreover, penguin songs are built from minimal units, namely syllables, which may be combined into phrases [13]. Historically, penguins' vocal behaviour has been extensively investigated in Antarctic, sub-Antarctic, and Australian species, which use display songs for recognition between mates and between chicks and parents [14]. In particular, Aubin et al. [7] demonstrated that non-nesting species, such as the Emperor Penguin (Aptenodytes forsteri) and the King Penguin (Aptenodytes patagonicus), use the two-voices system as the principal means of identifying each other. Further, Jouventin and Aubin [15] showed that in nesting species, such as the Adélie Penguin (Pygoscelis adeliae) and the Gentoo Penguin (Pygoscelis papua), the pitch of the song and the frequency and relative values of harmonics are the main cues for individual recognition. Conversely, much less research effort has been directed toward the study of the vocal behaviour of the temperate and equatorial species of the genus Spheniscus.
The African Penguin is highly social and breeds on islands and coastal areas of South Africa and Namibia [16]. This species makes use of several distinctive vocalisations for intra-specific communication [17]. However, currently available descriptions of the vocal repertoire of S. demersus (summarised in Table S1) are mostly limited to basic descriptions of calls. Thumser and Ficken [18] reported five distinct vocalisations made by two captive populations of African Penguin. These authors also measured some temporal parameters and three frequency parameters on two vocalisation types, that they labelled as haw and bray, and which correspond to the ecstatic display song and mutual display song, respectively, described by Eggleton and Siegfried [17] and Jouventin [13]. They also published spectrographic representations of these two calls. Overall, the data presented by Thumser and Ficken [18] are very limited as recordings were obtained from a restricted number of birds and acoustic signals, and only took place during the breeding season (Table S1). Moreover, the lack of acoustic measurements on the majority of the call types does not provide an adequate structural and quantitative description of the entire vocal repertoire of this species.
The African Penguin is seriously threatened, because the total population has dramatically decreased in recent years to less than 75-80,000 mature individuals [19]. The decline is mainly due to loss of habitat, reduction of fish stocks, environmental pollution (including oil spills), and egg collection [16,20,21]. For these reasons, this species is currently included in CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora) Appendix II, in CMS (Convention on the Conservation of Migratory Species of Wild Animals) Appendix II, and its classification within the Red List of Threatened Species of the IUCN (International Union for Conservation of Nature) was changed from ''Vulnerable'' to ''Endangered'' in 2010.
Animal sound recording and analysis technology have greatly advanced in recent years [22]. Technological improvements now enable the implementation of extended audio recordings, the automation of the process of signal analysis, and the measurement of a variety of spectral and temporal acoustic parameters with a limited computational effort [22,23]. Recent studies of animal vocalisations are also focussed on statistically quantifying the similarities or differences between acoustic signals by means of multivariate statistical techniques [24] or mathematical computational approaches [25], in order to eliminate subjectivity.
Here, we examined the vocalisations of the African Penguin by collecting audio and video recordings from a captive colony in Italy. Firstly, we categorised vocal signals by visual inspection of spectrograms, and by matching the vocalisations to the behavioural contexts in which they were produced. Subsequently, we measured a variety of spectral and temporal acoustic parameters that we used for statistical validation of the vocal categories. We aimed to provide a detailed description of the entire vocal repertoire of this species and to standardise terminology for use in future studies. Finally, we discuss the importance of the different acoustic parameters in characterising the vocal types.

Ethics Statement
The study complies with all applicable Italian laws, with the Guidelines for the Treatment of Animals in Behavioural Research and Teaching [26] and with the Ethical Guidelines for the Conduct of Research on Animals by Zoos and Aquariums [27]. The research was carried out with permission from ZOOM Torino (www.zoomtorino.it), Cumiana, Italy (44°56′N, 7°25′E). This zoological institution has rigorous standards for animal welfare and is accredited by the EAZA (European Association of Zoos and Aquaria) and UIZA (Unione Italiana Giardini Zoologici e Acquari). Since all recording procedures were non-invasive and did not cause any disturbance to the animals during their normal daily activity, this study does not fall into any of the categories for which approval of an ethics committee is required by Italian law.

Penguins and recordings
Vocalisations and associated behaviours were collected from a captive colony of 48 African Penguins at ZOOM Torino, Italy. The composition of the colony in December 2011 was 15 males, 17 females, 8 juveniles (3 to 12 months), and 8 nesting chicks (<3 months). Penguins were housed in an outdoor communal exhibit of 1500 m², including a pond of 120 m² (maximum depth 3 m), and each penguin was identified with a wing tag. Data were collected using the all-occurrence sampling method [28] over 24 non-consecutive days from September to October 2010, and 80 non-consecutive days from August to December 2011. All recordings were collected from outside the exhibit, without any manipulation of the penguins and without the use of playback stimuli.
Acoustic recordings were carried out with a RØDE NTG-2 semi-directional microphone (frequency response 20 Hz to 20 kHz, max SPL 131 dB) connected to a TASCAM DR-680 digital recorder (48 kHz sampling rate). During recording sessions, the microphone was mounted on a RØDE PG2 Pistol Grip to reduce handling noise and was placed at a distance of 1-10 m from the vocalising penguins. Segments containing acoustic recordings were saved in WAV format (16-bit amplitude resolution) and stored on a secure digital (SD) memory card for later analyses. Simultaneously with the acoustic recordings, we monitored the penguins' activities using a JVC Everio GZ-MG330 camcorder with 35× Optical Zoom for a detailed identification of the behavioural contexts in which calls were produced. In particular, we identified behaviours according to the ethogram for this species provided by Eggleton and Siegfried [17].

Spectrographic analysis
We analysed 271 hours of audio recordings. For each audio file, the waveform and the FFT (Fast Fourier Transform) spectrogram were generated with the Praat v. 5.3.39 [29] sound editor window, using a customised spectrogram setting [view range = 0 to 10000 Hz, window length = 0.02 s (Gaussian window shape, −3 dB bandwidth 65 Hz), number of time steps = 1000, number of frequency steps = 500 (frequency resolution 20 Hz), dynamic range = 50 dB]. The visual examinations of spectrograms allowed us to identify 1171 vocalisations that we subsequently divided into macro vocal categories. In particular, we identified: a contact call (n = 331), an agonistic call (n = 138), an ecstatic display song (n = 179), a mutual display song (n = 293), and a nesting chicks' vocalisation, namely the begging peep (n = 160). Moreover, we were able to distinguish an additional vocal type, namely the begging moan, emitted as a food request by juveniles (n = 70). Since the begging peep and begging moan were uttered by penguins in long sequences, in order to avoid the risk of pseudoreplication, we only considered one signal from each sequence. In a second phase, we selected 391 vocalisations on which to collect acoustic measurements. The large number of vocalisations excluded in this second phase (66.61%) was mainly due to the difficulties encountered during field recordings. In particular, 114 mutual display songs were discarded because of overlapping songs between mates (usually vocalising within the nest) and 36 ecstatic display songs were discarded because of overlapping between males vocalising at the same time in different areas of the exhibit. The rest of the excluded signals were not considered acceptable for the measurement of acoustic parameters because they showed an insufficient signal-to-noise ratio of the pitch. Indeed, although our recordings were collected in an outdoor enclosure without the severe reverberation and sound distortion effects that characterise many indoor exhibits, a high level of background noise was present, mainly due to the high number of visitors.
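The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the Gaussian bandwidth relation is the one given in the Praat manual for its Gaussian analysis window (an assumption about how the 65 Hz value was obtained, not a statement of the authors' procedure).

```python
import math

# Spectrogram settings quoted in the text
view_range_hz = 10000.0   # view range 0 to 10000 Hz
frequency_steps = 500     # number of frequency steps
window_length_s = 0.02    # Gaussian window length

# Frequency resolution implied by the view range and the number of steps
frequency_resolution_hz = view_range_hz / frequency_steps

# -3 dB bandwidth of a Gaussian window, assuming the Praat manual relation:
# bandwidth = 2 * sqrt(6 * ln 2) / (pi * window length)
gaussian_bandwidth_hz = 2 * math.sqrt(6 * math.log(2)) / (math.pi * window_length_s)

# Vocalisations identified per category by visual inspection
counts = {
    "contact call": 331,
    "agonistic call": 138,
    "ecstatic display song": 179,
    "mutual display song": 293,
    "begging peep": 160,
    "begging moan": 70,
}
total = sum(counts.values())

# Share of vocalisations excluded before acoustic measurement (66.61%)
excluded_fraction = 0.6661
retained = round(total * (1 - excluded_fraction))

print(frequency_resolution_hz, round(gaussian_bandwidth_hz), total, retained)
```

Under these assumptions the category counts add up to the 1171 vocalisations reported, and the exclusion rate leaves the 391 signals analysed in the rest of the study.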

Acoustic analysis
For each selected vocalisation, we measured 15 spectral and temporal acoustic parameters (Table 1) using semi-automated procedures with a custom-built program [30,31] in Praat v. 5.3.39 [29]. We used descriptors related to the 'source' component of calls (F0). Moreover, we considered the energy quartiles as filter-related vocal parameters, but we did not measure formant peaks: although they were evident in certain call types, they were only weakly detectable in others, to the extent of being unrecognisable in, for example, chicks' vocalisations. This decision was made in order to only include variables that could be collected from all signals.
We extracted the F0 contour of each call using a cross-correlation method [Sound: To Pitch (cc) command]. Depending on the acoustic characteristics of each vocal type, we used a time step of 0.01-0.02 s, a pitch floor of 150-1000 Hz, and a pitch ceiling of 350-2500 Hz. From each extracted F0 contour, we obtained the frequency value of F0 at the start (F0Start) and at the end (F0End) of the call; the F0 range (F0Range); the mean (F0Mean), minimum (F0Min) and maximum (F0Max) F0 frequency values across the call. In addition, we obtained the F0 mean absolute slope (F0AbsSlope), which is a measure of the average local variability in F0, by computing the average slope between adjacent points on the pitch curve. Furthermore, we measured the number of complete cycles of fundamental frequency modulation per second (FM rate), and we quantified the number of complete cycles of amplitude modulation per second (AM rate). We also calculated Jitter [the mean absolute difference between frequencies of consecutive F0 periods divided by the mean frequency of F0 (Jitter (local) command)] and Shimmer [the mean absolute difference between the amplitudes of consecutive F0 periods divided by the mean amplitude of F0 (Shimmer (local) command)] values. Jitter and Shimmer are measures of the cycle-to-cycle variations of fundamental frequency and amplitude, respectively [32-34]. For a detailed description of the algorithms used by Praat to calculate Jitter and Shimmer, please refer to Boersma [35]. These parameters have been widely used for the study of pathological disorders of the human voice [36], speaker recognition [37] and, above all, in the analysis of arousal and valence in human and non-human mammal vocalisations [38-40]. Finally, we measured the frequency values at the upper limit of the first (Q25%), second (Q50%) and third (Q75%) quartiles of energy, using a linear amplitude spectrum, and we included the total duration of each call (Dur) in the analyses.
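Several of the parameters defined above reduce to short formulas. The following is a minimal sketch of those definitions, not the Praat implementation or the authors' custom program: the function names and toy inputs are hypothetical, jitter is computed on period durations (the form used in Praat's documentation of Jitter (local)), the slope is expressed in Hz/s, and the energy quartile works on whatever per-bin spectral weights are supplied.

```python
from statistics import mean

def jitter_local(periods):
    """Mean absolute difference between consecutive periods / mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)

def shimmer_local(amplitudes):
    """Mean absolute difference between consecutive amplitudes / mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)

def f0_abs_slope(times, f0):
    """Average absolute slope (Hz/s, assumed unit) between adjacent F0 points."""
    points = list(zip(times, f0))
    slopes = [abs((f2 - f1) / (t2 - t1))
              for (t1, f1), (t2, f2) in zip(points, points[1:])]
    return mean(slopes)

def energy_quartile(freqs, weights, fraction):
    """Lowest frequency at which cumulative spectral weight reaches `fraction`."""
    threshold = fraction * sum(weights)
    running = 0.0
    for freq, weight in zip(freqs, weights):
        running += weight
        if running >= threshold:
            return freq
    return freqs[-1]

# Hypothetical toy inputs, for illustration only
periods_ms = [4.0, 4.2, 4.0, 4.2]          # consecutive glottal periods
amplitudes = [1.0, 0.9, 1.0, 0.9]          # peak amplitude of each period
times_s, f0_hz = [0.00, 0.01, 0.02], [250.0, 252.0, 249.0]
freqs_hz, weights = [500, 1000, 1500, 2000], [1.0, 1.0, 1.0, 1.0]

print(jitter_local(periods_ms), shimmer_local(amplitudes))
print(f0_abs_slope(times_s, f0_hz))
print([energy_quartile(freqs_hz, weights, q) for q in (0.25, 0.50, 0.75)])
```

Both Jitter and Shimmer are dimensionless ratios, which is why they can be compared across call types with very different F0 and amplitude ranges.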
Finally, for the ecstatic display song, in order to describe the structural properties of this complex call, we identified syllables (according to the terminology used by Jouventin [13]) and we measured the mean number of syllables per song, and the sum of all inter-syllable intervals (s). However, we limited the spectral analysis to the longest syllable of the song.

Statistical analysis
All analyses were performed in SPSS v. 20 (SPSS, Inc. 2010). Firstly, we log-transformed our data as they significantly deviated from a normal distribution (Kolmogorov-Smirnov test). In addition, to meet the assumption of independence between the acoustic variables, we performed a Principal Component Analysis (PCA) using an orthogonal varimax rotation [41]. The PCA reduces the original set of acoustic measurements to a new set of uncorrelated principal components (PCs). PCs showing eigenvalues >1 were used to classify vocalisations with a stepwise, cross-validated (leave-one-out) discriminant function analysis (DFA). In particular, we entered the type of call as the grouping variable and the PC scores as predictors. Finally, we used the Wilks' Lambda
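The leave-one-out scheme mentioned above can be sketched as follows: each call is held out in turn, the classifier is fitted on the remaining calls, and the held-out call is then classified. This is an illustration under stated assumptions, not the SPSS procedure used in the study: a simple nearest-centroid rule stands in for the stepwise DFA, and the PC scores and function names are hypothetical.

```python
def nearest_centroid_predict(train, query):
    """Return the label whose mean score vector is closest to `query`."""
    sums, counts = {}, {}
    for label, scores in train:
        if label not in sums:
            sums[label] = [0.0] * len(scores)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], scores)]
        counts[label] += 1
    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        dist = sum((c - q) ** 2 for c, q in zip(centroid, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_accuracy(samples):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i, (label, scores) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        if nearest_centroid_predict(train, scores) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical (PC1, PC2) scores for two call types, for illustration only
samples = [
    ("contact call", [-2.0, 0.1]),
    ("contact call", [-1.8, -0.2]),
    ("contact call", [-2.2, 0.0]),
    ("begging peep", [2.1, 0.3]),
    ("begging peep", [1.9, -0.1]),
    ("begging peep", [2.0, 0.2]),
]
accuracy = leave_one_out_accuracy(samples)
print(accuracy)
```

Because the held-out call never contributes to the model that classifies it, the resulting accuracy is a less optimistic estimate than one computed on the training data itself.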

Spectrographic classification of the vocal repertoire
A spectrographic representation of the vocal categories identified by visual inspection of spectrograms is presented in Figure 1. Below, we describe the call types in detail, including the contexts of emission.
-Contact call (Figure 1a; Video S1) The
The begging moan was only emitted by juveniles (3 to 12 months of age). This vocal signal shows a clear harmonic structure and a short duration (0.27 ± 0.11 s). Juvenile penguins emitted sequences of 1 to 10 begging moans, but they immediately stopped calling when they were fed, or when the parent moved away. During utterance, juveniles performed quick lateral movements with their heads.
-Begging peep (Figure 1f; Video S6) The peep is a begging call emitted by chicks (<3 months of age) inside the nest either in the presence or absence of their parents. The average duration of a single peep recorded in this study was only 0.36 ± 0.07 s but this call was repeated by chicks in long sequences lasting for several minutes, until they were fed. The peep is a high-pitched vocalisation (F0 mean = 1851 ± 199 Hz), and we observed harmonic frequencies of up to 17 kHz.

Statistical classification of the vocal repertoire
Descriptive statistics of vocal parameters for each vocalisation type are presented in Table 2. The original set of 15 acoustic parameters was transformed by the PCA into four PCs showing eigenvalues >1 (Table 3) that accounted for 91.33% of the total variance (PC1 = 60.0%, PC2 = 14.59%, PC3 = 9.64%, PC4 = 7.03%). In particular, PC1 was highly correlated (r > 0.70) with F0 values (source-related parameters), PC2 with Jitter and Shimmer (parameters related to F0 variation) and call duration, PC3 with the upper limit of the first, second and third quartiles of energy (filter-related parameters), and PC4 with both FM rate and AM rate.
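The eigenvalue criterion can be related to the variance shares listed above. The sketch below assumes a PCA on the correlation matrix of the 15 parameters, so that total variance equals the number of variables and each eigenvalue is the variance share multiplied by 15; whether the reported percentages are pre- or post-rotation values is not stated, so this is a rough consistency check rather than a reconstruction of the authors' output.

```python
# Number of acoustic parameters entered in the PCA
n_variables = 15

# Percentage of total variance reported for each component
variance_share_percent = {"PC1": 60.0, "PC2": 14.59, "PC3": 9.64, "PC4": 7.03}

# Assumed relation for a correlation-matrix PCA: eigenvalue = share * n_variables
eigenvalues = {pc: share / 100 * n_variables
               for pc, share in variance_share_percent.items()}

# Components passing the eigenvalue > 1 criterion
retained = [pc for pc, value in eigenvalues.items() if value > 1]

print(eigenvalues)
print(retained)
```

Under this assumption, a component must explain more than 1/15 (about 6.67%) of the total variance to be retained, that is, more than a single standardised variable would contribute on its own.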
The stepwise, cross-validated DFA correctly classified 90.5% of the vocal signals according to the predicted vocal categories that we assigned by inspection of spectrograms. The analysis generated four discriminant functions which revealed a highly significant difference between call types (Wilks' λ DF1/4 = 0.002, Table 4.

Discussion
Here we provide the first detailed acoustic analysis of the entire vocal repertoire of the African Penguin by selecting and analysing 391 vocal signals collected from a captive colony. Firstly, we categorised the vocalisations based on the visual inspection of spectrograms and behavioural contexts of vocal emissions. Following the general categorisation of penguin calls provided by Jouventin [13], we were able to identify four different call types uttered by adult African Penguins and two begging vocalisations [42] emitted by nesting chicks and unweaned juveniles, respectively. In particular, we found a contact call produced by single members of the colony when visually isolated from the rest of the group or from the partner. Specific behaviours associated with this vocalisation are the ''look around'' and ''slender walk'' [17]. In agreement with Jouventin [13], we suggest that this vocalisation enables isolated penguins to locate other members of the colony. Moreover, we report an agonistic call uttered during fights or when intruding penguins approached a nest already occupied by a pair. It was also produced by penguins that were chasing away other members of the colony. This vocalisation was frequently preceded or followed by a peck from the emitter. We occasionally recorded agonistic calls during the feeding sessions, especially when penguins were gathered together and there was a high level of arousal in the group. In this case, we suggest that this call functioned as an acoustic threat. Associated with the agonistic call are the specific behaviours of ''point'', ''gape'' and ''peck'' [17]. This utterance is perceived by human listeners as being rough and hoarse, probably due to the high Jitter and Shimmer values. The ecstatic display song is a call produced during the ecstatic display [17]. The African Penguin has the nickname of ''jackass'' as it makes a donkey-like sound. In our study, this vocalisation was exclusively observed in the breeding season. 
We hypothesise that it served both to attract mates and as an advertisement display of nest occupancy. Moreover, we observed that when a penguin performed the ecstatic display song, it was frequently followed by many other members of the colony in chorus. Conversely, the mutual display song was performed during the mutual ecstatic display [17], especially when a mate arrived at the nest. Partners often emitted this call simultaneously, overlapping in a duet. Specifically, mates stand facing each other with their wings held against or slightly away from their sides. We observed that many pairs also emitted this call as a threat towards penguins that came too close to their territory. Regarding begging vocalisations, we identified a begging peep emitted by chicks (<3 months of age) inside the nest, which probably has the function of stimulating food regurgitation by the parent. Finally, we detected a begging moan uttered by juveniles (3 to 12 months of age), which has not been previously reported in the literature, and is thus described here for the first time. At this age, penguins have not yet moulted for the first time and, therefore, they still have the characteristic juvenile plumage. During emission of this call, the juvenile bird stands up near a parent, places its beak perpendicular to the beak of its parent, and calls until it is fed. For this reason, we can state that this call still maintains a clear contextual use as a food request. However, it is important to note that acoustic features of this vocalisation have many more similarities with adult calls, in all the source-related parameters and energy quartiles (especially Q75%), than with begging peeps of chicks (Table 2). Moreover, the FM rate and AM rate values were similar to those measured on the adult contact calls (Table 2). These findings suggest complete development of the African Penguin vocal apparatus during the early months of life. 
Accordingly, Heath and Randall [43] observed that captive-reared chicks of this species can reach the body weight of the adults in approximately 120 days, with variations depending on the energy characteristics of the diet. For each vocal signal, we measured 15 spectral and temporal acoustic descriptors that we used to perform a principal component analysis followed by classification of signals with a stepwise, cross-validated discriminant function analysis (DFA). The DFA correctly classified 90.50% of the penguins' calls according to the predicted vocal category previously identified by visual inspection of spectrograms. The accuracy we achieved is higher than that obtained in recent vocal classification studies in both birds (e.g. 83.3% obtained by Baldo and Mennill [44]), and mammals (e.g. 79.6% obtained by Barros et al. [45]; 69.1% obtained by Déaux and Clarke [46]). To date, this is the first study to provide acoustic measurements and statistical validation for the entire vocal repertoire of the African Penguin.
Jitter and Shimmer parameters were important factor loadings in PC2, and we measured the highest values in the agonistic call and mutual display song vocalisations. Both these vocalisations were uttered when a high level of arousal was present in the emitter. In particular, the first call type is produced in aggressive behavioural contexts, while the second is uttered both when members return to the nest and towards intruders in territorial clashes. Jitter is known to provide human listeners with cues about the utterer's affective state [38], and several authors have suggested that Jitter and Shimmer could be reliable indicators of the level of arousal in non-human mammals [39,40]. Our findings demonstrate that these measurements could also be reliable indicators for detecting vocal types associated with behavioural contexts characterised by a high level of arousal in penguins.
The vocal categories we examined mostly correspond to those reported by Thumser and Ficken [18] in the repertoire of two captive colonies of African Penguin. However, these authors labelled calls with the terminology used by Boersma [47] to verbally describe vocalisations of wild Galapagos Penguins (Spheniscus mendiculus). In particular, for two vocal types for which acoustic measurements were performed by Thumser and Ficken [18], we found concordance for the contact call duration but not for the mean fundamental frequency. Concerning the ecstatic display song, we found compatible values for the total duration of the song, number of syllables, the duration of the longest syllable and mean fundamental frequency of the longest syllable. By contrast, we did not find a similar sum of the intersyllable intervals as our average value was three times greater than that reported by Thumser and Ficken [18]. Finally, we identified a new type of syllable in the ecstatic display song (Figure 1c, indicated by arrow number 3) emitted during the inspiration phase. Playback experiments will be necessary to investigate whether this utterance has a biological significance or is just the result of an intense inhalation of air.
Although we cannot exclude that the list of calls in this studied colony may be incomplete (given that a captive environment has been proven to restrict the acoustic repertoire of animals [48]) it is highly likely that our classification is exhaustive for the vocal repertoire of free-living African Penguins. Eggleton and Siegfried [17] provided a verbal description of six different vocalisations in wild adult African Penguin. In our study, we found a correspondence for two of these six calls, namely the ecstatic display song and the mutual display song. However, we were unable to identify vocalisations that could be specifically assigned to the ''aggressive barking'', ''growling'' and ''aggressive braying'' reported by this group, and keepers involved in the daily management of the colony confirmed this observation. These vocal categories were also not present in the studies of Thumser and Ficken [18] and Jouventin [13]. In the absence of spectrographic representations and quantitative acoustic measurements for comparison, we can only hypothesise, by the description of the behavioural contexts of emission, that these would merge into the agonistic call. The additional partitioning by Eggleton and Siegfried [17] could be the result of a subjective perception by different human listeners of the same call type heard in different agonistic contexts.
The source-related (F0) acoustic parameters measured in this study were the most important in discriminating between call types (PC1). However, we suggest, from observing the spectrograms (Figure 1), and from the heavy factor loadings (r > 0.70) of the frequency quartiles that were grouped together in PC3 (Q25%, Q50% and Q75%), that a filter effect of the vocal tract may exist in the vocal output of this species. In particular, we observed that the values of the frequency quartiles vary according to the call type uttered. Accordingly, previous studies [12] have related the energy distribution to the mode of production of bird calls, showing that birds can use the pharyngeal constriction and inflection of the oesophagus to induce a modification of the energy distribution in the spectrum.
To date, the ''two-voices'' system [7,14] in non-nesting species, and the pitch of the song and the relative values of harmonics in species that build nests [15] have been recognised as important acoustic cues for individual recognition in penguins [14]. Conversely, the ''source-filter'' theory of voice production [8], occasionally applied to birds [11], has never been extensively used to investigate whether acoustic cues of individuality, body size, gender or age could be encoded in penguin vocalisations. Further studies examining in detail the vocal behaviour of the African Penguin from a source-filter perspective would be especially valuable. In particular, research efforts should be directed towards measuring formant frequencies [5,11] in selected call types (particularly contact call and display songs), and evaluating whether individual variation in morphology and size of the vocal apparatus could result in individual acoustic distinctiveness [30]. Identifying reliable cues of vocal individuality in the African Penguin vocalisations would also be instrumental in developing technology for recognising and tracking wild penguins through emitted sounds, and estimating population sizes of this endangered species, whilst minimising any disturbance of the penguins. A recent study by Borker et al. [49] underlined the importance of vocal activity for studying large seabird colonies. In particular, they showed how the automated acoustic survey approach can both moderate biases common in standard survey approaches (e.g. collection of data by different observers), and even reduce costs in the monitoring of remote colonies.
Table 4. Classification results of the stepwise cross-validated (leave-one-out) discriminant function analysis.
In conclusion, this study (1) identifies and provides a statistical validation for six vocal categories in the repertoire of the African Penguin; (2) reports a new vocalisation (begging moan) used as a food request by juveniles towards parents, and a syllable emitted in the inspiration phase of the ecstatic display song, neither of which has previously been described in the literature; (3) standardises the terminology for the calls of this species; and (4) suggests the use of the source-filter theory to further study the vocal communication in nest-building penguins of the genus Spheniscus.

Supporting Information
Table S1 Published studies on the vocal repertoire of the African Penguin. (PDF)
Video S1 Contact calls uttered by adult African Penguins to maintain cohesion with colony members located out of visual range.