Subarctic singers: Humpback whale (Megaptera novaeangliae) song structure and progression from an Icelandic feeding ground during winter

Humpback whale songs associated with breeding behaviors are increasingly reported outside of traditional low latitude breeding grounds. Songs from a subarctic feeding ground during the winter were quantitatively characterized to investigate the structure and temporal changes of the songs at such an atypical location. Recordings were collected from 26. January to 12. March, 2011, using bottom mounted recorders. Humpback songs were detected on 91% of the recording days with peak singing activities during 9.–26. February. The majority of the recordings included multiple chorusing singers. The songs were characterized by a) common static themes which transitioned consistently to predictable themes, b) shifting themes which occurred less predictably and c) rare themes. A set median sequence was found for four different periods (sets) of recordings (approximately 1 week each). The set medians were highly similar and formed a single cluster, indicating that the sequences of themes sung in this area belonged to a single cluster of songs despite the variation caused by the shifting themes. These subarctic winter songs could, thus, represent a characteristic song type for this area which is comparable to extensively studied songs from traditional low latitude breeding grounds. An increase in the number of themes per sequence was observed throughout the recording period, along with minor changes in the application of themes in the songs, indicating a gradual song progression. The results confirm that continual singing of sophisticated songs occurs during the breeding season in the subarctic. In addition to being a well-established summer feeding ground, the study area appears to be an important overwintering site for humpback whales delaying or canceling their migration, where males engage in active sexual displays, i.e. singing. Importantly, such singing activity on a shared feeding ground likely aids the cultural transmission of songs in the North Atlantic.

With multiple singers chorusing in a single recording it is possible to evaluate the minimum number of singers that are singing at the same time within the detection range of the recorder. In this study, overlapping song units were used to estimate the minimum number of singers within each observed 10-minute recording. Within each of these recordings a minimum of two overlapping events were identified, where the latter would verify the first estimate. There can be more singers within the recording than the minimum estimate indicates, since the number of singers can only be evaluated with certainty when song units from different whales overlap in time. Therefore, the estimate only provides the minimum number of singers within the 10-minute time frame. That estimate can be useful when looking into the occurrence of peak singing events and when there are more or fewer than e.g. two singers in the area. The more singers that overlap their songs in time, the more likely it is that more singers are within the study area than when few whales are overlapping. Examples of overlapping events where the minimum number of singers was estimated are shown in Fig A. Importantly, the observers need to verify that none of the signals overlapping in time is a harmonic component of a lower frequency signal used in the estimate. The contours of two different signals used in the estimate should be either 1) significantly different, 2) not aligned exactly in time or 3) separated by a frequency gap that does not equal the frequency of the lower signal. The purpose of the third criterion is to avoid treating harmonics of a signal as independent signals from another whale if criteria 1) and 2) are not met. The frequency gap between two adjacent harmonics of a signal is the same as the frequency of the signal's fundamental component (the lowest tone in a harmonic signal).
Signals become distorted with distance; therefore, observers should not use vague signals that are similar to other signals in the estimate.
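The overlap count and the third criterion can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the study, where overlaps were assessed by observers on spectrograms. The function names, the 5 Hz tolerance and the representation of song units as (start, end) times in seconds are assumptions made for the example.

```python
def is_probable_harmonic(lower_hz, upper_hz, tolerance_hz=5.0):
    """Criterion 3: flag the upper signal as a likely harmonic when the
    frequency gap equals the frequency of the lower signal.
    tolerance_hz is a hypothetical measurement tolerance."""
    gap_hz = upper_hz - lower_hz
    return abs(gap_hz - lower_hz) <= tolerance_hz


def min_singers(units):
    """Minimum number of singers from song units already judged to be
    independent (i.e. not harmonics of each other).

    units: list of (start_s, end_s) tuples. The estimate is the largest
    number of units that overlap at any instant."""
    events = []
    for start_s, end_s in units:
        events.append((start_s, 1))   # a unit begins
        events.append((end_s, -1))    # a unit ends
    # At equal times, process ends before starts so that units which only
    # touch are not counted as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))
    depth = 0
    best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best
```

For example, three units at (0, 2), (1, 3) and (1.5, 2.5) seconds all overlap between 1.5 and 2 s, so the minimum estimate would be three singers, provided none of the three is a harmonic of another.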
The coloring of the song units in Fig A was done in Photoshop CC version 2014.2.0 using the magnetic lasso tool to track the contours of the signals and select them. Using this tool provides more accurate coloring of the signal contour than coloring by hand. Once a signal was selected, a paint bucket tool was used to fill the selected signal with color.

Supplemental material on humpback whale song unit detection rate
Estimating the influence of the 1) number of singers, 2) signal to noise ratio and 3) percentage of sound file with song units on the rate of song unit detections by the automatic detector
A subset of 87 sound files was randomly selected from the total of 1268 10-minute sound files containing humpback whale songs. The sound files constituting the subset were from the whole recording period. The resulting correlation between the number of singers per sound file and the detections per minute of effort per sound file is shown in Fig B (A), with a clearer demonstration of that result using a boxplot in Fig C. Additionally, the effect of the average signal-to-noise ratio (SNR) per sound file on the detection rate was tested, as was the effect of the proportion of each 10-minute sound file containing song units. The proportion of song units within a sound file was estimated manually in Adobe Audition 2.0. Since the tested variables were not normally distributed, the non-parametric Spearman's rank correlation (r = rho) was used on the data. All tested variables showed a significant positive correlation with the detection rate of the automatic detector (per minute of effort) in each sound file, as shown in Fig B (A-C), which demonstrates how these variables affected the detection rate.
Fig B. A) Correlation between the minimum number of singers per sound file and the detection rate, B) correlation between the signal to noise ratio (SNR) per sound file and the detection rate, C) correlation between the % of each sound file which included song units and the detection rate.
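Spearman's rank correlation is the Pearson correlation computed on ranks, with tied values given their average rank. The sketch below shows the calculation using only the Python standard library. The function names are illustrative, the example values mentioned afterwards are invented and are not data from the study, and the software actually used for the analysis is not stated here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    n = len(rank_x)
    mean_x = sum(rank_x) / n
    mean_y = sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    return cov / sqrt(var_x * var_y)
```

With invented values such as singers = [1, 1, 2, 3, 4] and detections per minute = [5, 8, 16, 27, 33], spearman_rho returns a value close to 1, reflecting a monotonic increase. The sketch does not compute a p-value.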

Fig C. Automatic detections of songs vs. the number of singers.
A boxplot showing the distribution of automatic detections with an increasing number of singers per sound file. All examined sound files with more than 15 detections per minute of effort, apart from one, included two or more singers. If the detection rate exceeded 25 detections per minute of effort, the sound files included at least three singers, while sound files with more than 30 detections per minute of effort included four or more singers.

Supplemental information about song sequences and delineation
Usually, the delineated sequences were not full sequences since the recordings were set to record for 10 minutes every 5 minutes. A full 10 minute recording is provided in Fig D and the same recording with delineated phrases is provided in Fig E. When chorusing whales are recorded, the chorusing can prevent the observer from delineating songs from a spectrogram. This happens when chorusing whales are singing phrases which include song units of similar intensity (likely due to a similar distance from the hydrophone) and similar frequency, particularly when two or more whales are singing the same or similar phrases at the same time. When delineating songs qualitatively (since no autonomous methods are currently available) the observer needs to know where one phrase ends and another one starts. Transitional phrases help the observer to know with good certainty which phrase type follows another phrase, which can be useful when more than one singer is within a recording. However, such transitional phrases do not always occur or may be masked in a chorusing event. An example of a recording is provided in Fig F where phrases from at least four whales are overlapping. This one minute clip from a ten minute recording includes overlapping phrases of similar frequency and intensity. The overlapping of phrases in this recording was too intense for a human observer to know with certainty where one phrase ends and another one starts. Such parts of a recording can, thus, not be used and prevent the observers from delineating a full song or, as in this case, a full 10-minute recording.
The dendrograms in Fig G show the variation in sequences obtained from delineated recordings from each recording period. The variation would likely be smaller if the delineated sequences had not been cut off, so that only full sequences, and thus the true variation, were shown. The dendrograms in Fig G nonetheless show the predominant phrases in the sequences, i.e. "14a-13b-12-4b", and provide a good overview of the variation in song sequences from the study period.
The assigned set medians (SM) from each period contained the predominant sequence found in the majority of the delineated sequences, i.e. "14a-13b-12-4b". However, with an increased number of different sequences and phrase types the summed distance of the set medians increased (Table A), indicating that the greater the number of song sequence variants, the less representative a single song sequence is for a given period. The mean Levenshtein Distance (LD) score of each SM increased from the 1st period to the 4th; the number of song sequence groups (created with hierarchical clustering) also increased, though not as substantially. Since the set medians were mostly the same in all the periods, it is likely that only one song was sung in the study area during the recording period, with a shared sequence within the majority of the songs. However, the variance in the usage of phrases apart from the static sequences was very high, particularly during the latter periods (3 and 4). According to the Levenshtein Similarity Index (LSI), 3-5 groups of sequences (with a minimum of 40% similarity) were identified during each period, with the greatest number of groups (5) during the last period, i.e. 2.-12. March (Table A). The mean Levenshtein Distance (LD) for each set median sequence (SM) was highest during that same period (Table A). From Fig G it is evident that the SMs were quite commonly observed within sequences, though the other, more dynamic (shifting) themes that were included varied. The SM for period 4 was perhaps the least representative for its period compared to the SMs of the previous three periods.

Table B. Delineated song sequences of full song cycles.
Delineated sequences of full song cycles captured within a 10 minute recording. The sequences are all from different sound files despite some being detected during the same day.
F = Full recorded phrase sequence within a 10-minute recording; T = Theme types in a full song which terminates when the start theme reoccurs; D = Delineated sequences according to the delineation protocol. Different delineated sequences are separated with a comma (,). Bold sequences are full songs according to the protocol and italic sequences are not full songs according to the protocol.
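The Levenshtein-based measures used above can be illustrated with a short Python sketch that treats each phrase label as one symbol. It rests on two assumptions: the set median is the sequence in a set with the smallest summed Levenshtein Distance to all sequences in that set, and the LSI is normalized as 1 - LD / (length of the longer sequence). The study's exact settings may differ. The function names and the example sequences mentioned afterwards are illustrative only.

```python
def levenshtein(seq_a, seq_b):
    """Minimum number of insertions, deletions and substitutions of
    phrases needed to turn seq_a into seq_b."""
    previous = list(range(len(seq_b) + 1))
    for i, phrase_a in enumerate(seq_a, start=1):
        current = [i]
        for j, phrase_b in enumerate(seq_b, start=1):
            cost = 0 if phrase_a == phrase_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_index(seq_a, seq_b):
    """Assumed LSI normalization: 1 - LD / length of the longer sequence."""
    longest = max(len(seq_a), len(seq_b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(seq_a, seq_b) / longest


def set_median(sequences):
    """Sequence with the smallest summed distance to all sequences in the set."""
    def summed_distance(candidate):
        return sum(levenshtein(candidate, other) for other in sequences)
    return min(sequences, key=summed_distance)
```

As an invented example, for the three sequences 14a-13b-12-4b, 14a-13b-12-4b-17 and 3c-14a-13b-12-4b, the first is the set median (summed distance 2, against 3 for each of the others), and its LSI with the second is 0.8.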

Phrase sequence development within the subarctic songs
The smallest differences in phrase usage according to Dice's Similarity Index (DSI) were between period-1 and -2 (92% similarity), period-2 and -4 (97% similarity) and period-3 and -4 (93% similarity). The occurrence of the most common phrases, i.e. phrases 14a, 13b, 12 and 4b, remained relatively stable between periods. The occurrence of phrase-3c gradually decreased while the occurrence of phrase-17 gradually increased from the first to the last period (Fig G). The occurrence of the less common phrases, such as phrase-6, -8, -4a, -13c and -11b, fluctuated more between periods (Fig 6). The most commonly observed themes were composed of either static or shifting phrases. The phrases 13b, 12 and 4b were static phrases which transitioned very consistently to a certain phrase in all periods, while the shifting phrases 17, 13c, 14a and 3c transitioned to various phrases throughout the whole recording period (Table C). Other phrases were less common and contributed less to the transition variance. Two transitional phrases were sometimes repeated once or twice by some singers before the transition was completed. During these events the transitional phrases were categorized as regular phrases and assigned the phrase names '14b' and '13a'; these are two versions of a typical transition between phrase-14a and -13b (Fig 4). These were the only transitional phrases observed to be repeated sequentially as if they were regular phrases.
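In its simplest presence/absence form, Dice's Similarity Index for two periods is twice the number of shared phrase types divided by the total number of phrase types in both periods. The sketch below shows that form only. Whether the study applied the index to presence/absence or to weighted phrase occurrence is not stated here, so this is an assumption, and the function name and example sets are illustrative.

```python
def dice_similarity(phrases_a, phrases_b):
    """Dice's Similarity Index on the sets of phrase types in two periods:
    2 * |A intersect B| / (|A| + |B|)."""
    set_a = set(phrases_a)
    set_b = set(phrases_b)
    if not set_a and not set_b:
        return 1.0  # two empty repertoires are treated as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))
```

For two invented repertoires {14a, 13b, 12, 4b, 3c} and {14a, 13b, 12, 4b, 17}, four of the five phrase types in each are shared, giving 2 * 4 / 10 = 0.8, i.e. 80% similarity.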

The differences in signal quality between periods
Measurements were made on song units from the analysed sound files to investigate the differing trends in the signal-to-noise ratio (SNR), the received signal level and the detection rate between periods.
A pairwise comparison using the Wilcoxon rank sum test was conducted to estimate whether there was a statistical difference between periods for each measured value in Table 4. The signal-to-noise ratio differed significantly between all periods (p < 0.001), although the difference was smallest between periods 3 and 4 (p = 0.04). The detection rate was statistically different between all periods (p < 0.001) except periods 2 and 4. The received signal level did not differ significantly between periods 1 and 2 or between periods 3 and 4, but differed significantly between the first two and the two latter periods. That indicates that the whales were in closer proximity to the recording station during the two latter periods (Table D).
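For a single pair of periods, the Wilcoxon rank sum test can be sketched as follows. This minimal version uses the large-sample normal approximation without a tie or continuity correction, and it applies no adjustment for multiple pairwise comparisons, so its p-values will differ somewhat from those of a statistics package. The function name and the example values in the note below are illustrative and are not data from the study.

```python
from math import erfc, sqrt


def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Returns (U statistic for sample_a, z score, approximate p-value)."""
    pooled = sorted(list(sample_a) + list(sample_b))
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    n_a = len(sample_a)
    n_b = len(sample_b)
    rank_sum_a = sum(rank_of[value] for value in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return u_a, z, p_value
```

Calling rank_sum_test on two invented sets of SNR values, one per period, returns the U statistic, the z score and the approximate p-value for that pair. Repeating the call for every pair of periods gives the pairwise comparison.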
These analyses show that the quality of the signals and the detection rate increased significantly in period 3 and continued to be relatively high in period 4. There was nonetheless a large variation in signal quality and detection rate during all periods, particularly during the 3rd period (18.-25. Feb). All analyzed sound files included full phrases with at least 10 dB SNR but some phrases had lower SNR. The lower SNR values were more common in the recordings from the first period (27. Jan-2. Feb) compared to the three following periods, which suggests that some higher frequency phrases may have been lost (e.g. phrases 17 and 15) (Fig H).

The mean signal-to-noise level (SNR), the mean received signal level and the mean number of detections per analyzed sound file per period. Each analyzed sound file contained song units at least 10 dB above the background noise; however, some song units in the sound files were lower and were included in the measurements, which explains the standard deviation (SD) for each period.