The role of learned song in the evolution and speciation of Eastern and Spotted towhees

doi:10.1371/journal.pcbi.1013135

Fig 1.

Example spectrograms of Spotted and Eastern towhee songs.

Examples of Spotted towhee songs (A-E) and Eastern towhee songs (F-J) from Macaulay Library (ML) and Xeno-canto (XC). The repeated syllable at the end of each song is often called the ‘trill’. Recording ID numbers: (A) ML191126; (B) ML90009051; (C) XC575461; (D) XC577591; (E) XC577593; (F) ML15276; (G) ML200973; (H) ML54283841; (I) ML76939751; (J) ML450438861; see metadata (https://github.com/CreanzaLab/TowheeAnalysis) for detailed recording information.

Fig 2.

Spectrogram illustrating the definition of specific song elements.

On this spectrogram of an Eastern towhee song, we indicate a syllable, silence, bout, maximum syllable frequency, and minimum syllable frequency. The color of the syllable indicates its unique syllable type. Spectrogram generated from Macaulay Library recording 52506381, https://macaulaylibrary.org/asset/52506381.

Table 1.

Results of generalized linear models for the relationship between song feature data and longitude, latitude, species classification, and file conversion status of Spotted and Eastern towhees.

Table 2.

Results for machine learning models for analyses of song feature data from song recordings of Spotted and Eastern towhees.

Fig 3.

Results of a Linear Discriminant Analysis trained on towhee song data.

Plot of LD1 scores for a test subset of towhee song bouts (N_test = 697) from a Linear Discriminant Analysis trained on raw song feature data from a balanced training set of Spotted and Eastern towhee bouts (N_train = 1592). The model achieved 86.8% prediction accuracy (balanced accuracy = 86.9%; Cohen’s κ = 0.73). Points are jittered vertically for visualization.
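
The three summary statistics quoted in this caption (accuracy, balanced accuracy, and Cohen’s κ) can all be derived from a two-class confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code; the function name and the example counts are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (accuracy, balanced accuracy, Cohen's kappa) for a 2x2 confusion matrix.

    tp/fn: class-A bouts predicted as A / as B.
    fp/tn: class-B bouts predicted as A / as B.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n

    # Balanced accuracy is the mean of the per-class recalls.
    recall_a = tp / (tp + fn)
    recall_b = tn / (tn + fp)
    balanced_accuracy = (recall_a + recall_b) / 2

    # Chance agreement expected from the row and column marginals.
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return accuracy, balanced_accuracy, kappa


# Hypothetical counts for illustration only (not the paper's results).
acc, bal, kappa = classification_metrics(tp=40, fn=10, fp=5, tn=45)
print(f"accuracy={acc:.2f}, balanced accuracy={bal:.2f}, kappa={kappa:.2f}")
```

With a balanced test set, accuracy and balanced accuracy are nearly identical, which is consistent with the two values reported above being so close.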

Fig 4.

Spatial distribution of Spotted towhee and Eastern towhee song data and genetic data.

Song data are shown in panels A–C (N_{total_recordings} = 2785; N_{Spotted_towhee} = 1067; N_{Eastern_towhee} = 1718) and genetic data are shown in panels D–F (N_total = 23; N_{Spotted_towhee} = 18; N_{Eastern_towhee} = 5). (A) Distribution of song recordings in North America. (B) Principal component analysis (PCA) of song bouts using 16 song features. (C) Procrustes analysis of song data using PC1 and PC2 from the PCA in panel B. (D) Distribution of genetic sequences obtained from the Barcode of Life Data Systems database and NCBI. (E) PCA using single nucleotide polymorphisms of aligned sequences of the cytochrome oxidase subunit I regions of the mitochondrial genome. (F) Procrustes analysis of genetic data using PC1 and PC2 from the PCA in panel E. Ellipses indicate 95% confidence intervals. Base maps (panels A, C, D, and F) were made with Natural Earth (http://www.naturalearthdata.com/).

Fig 5.

UMAP projection of Eastern and Spotted towhee song-feature data.

Each point represents an analyzed song bout (N_{total_bouts} = 2785; N_{Spotted_towhee} = 1067; N_{Eastern_towhee} = 1718), with Eastern towhee songs shown in shades of red and Spotted towhee songs in shades of blue. The lighter colors represent recordings from the zone of species overlap. Black dots indicate the 27 recordings from individuals that were classified as potential hybrids (“hybrid/unsure”).

Fig 6.

Geographic distribution of random forest model predictions of Spotted and Eastern towhee species identity.

Predictions were based on a model trained on 16 song features from samples of Spotted towhees and Eastern towhees. (A) We trained a model on song data from the entire geographic range of both species (N_{Spotted_towhee} = 796; N_{Eastern_towhee} = 796) and tested how well it predicted the species identification of a subset of all song samples (N_test = 697; accuracy = 89.5%). (B) The same results from (A) are plotted based on the percent of trees in the random forest classifier that supported the correct species identification; when fewer than 50% of trees supported the correct identification, the classifier made an incorrect species prediction. The average confidence in species classifications tended to decrease toward the zone of overlap for both species. (C) We then trained a second model on a subset of samples obtained only from the non-overlap zone (N_{Spotted_towhee} = 796; N_{Eastern_towhee} = 796) and tested it on a random subsample of song bouts from both the zone of non-overlap (N_{test_nonoverlap} = 216; accuracy = 93.5%) and the zone of overlap (i.e., 102°W - 91°W; N_{test_overlap} = 216; accuracy = 84.3%). (D) The same results from (C) are plotted based on the percent of trees in the random forest classifier that supported the correct species identification; again, the average confidence of species identifications tended to decrease toward the zone of overlap for both species. (E) We used the same model from panels C and D to predict species identity of song bouts from recordings of “hybrid/unsure” towhees (N_predict = 27). The model predicted that 16 of these “hybrid/unsure” recordings were Spotted towhees and 11 were Eastern towhees, with no discernible longitudinal gradient in the predictions. The dotted line represents the zone of overlap determined by the co-occurrence of Eastern towhee and Spotted towhee song recordings (102°W - 91°W). Base maps (panels A, C, and E) were made with Natural Earth (http://www.naturalearthdata.com/).
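
The "percent of trees" measure in panels B and D can be illustrated with a small sketch. It assumes each tree casts one hard vote for a species label, which is a simplification; the authors' random forest implementation may aggregate trees differently (for example, by averaging class probabilities). The function names and the vote list are hypothetical.

```python
from collections import Counter


def vote_support(tree_votes, true_label):
    """Fraction of trees whose vote matches the true species label."""
    return sum(1 for vote in tree_votes if vote == true_label) / len(tree_votes)


def majority_prediction(tree_votes):
    """Label chosen by the most trees (ties go to the label seen first)."""
    return Counter(tree_votes).most_common(1)[0][0]


# Hypothetical forest of 10 trees voting on one song bout.
votes = ["Spotted"] * 7 + ["Eastern"] * 3

print(majority_prediction(votes))        # label with the most votes
print(vote_support(votes, "Spotted"))    # support if the bout is truly Spotted
print(vote_support(votes, "Eastern"))    # support if the bout is truly Eastern
```

In a two-class problem, support below 0.5 for the true label means the majority voted for the other species, which is why such bouts appear as misclassifications on the maps.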

Table 3.

Results for Mantel tests of each song feature distance matrix versus geographic distance matrix.
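
For readers unfamiliar with the Mantel test, the sketch below shows the general idea: correlate the off-diagonal entries of two distance matrices, then build a null distribution by permuting the rows and columns of one matrix together. This is a generic, one-sided permutation version written with the standard library only; it is an assumption that it resembles the authors' procedure, and the function names, the permutation count, and the toy coordinates are all hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after reordering rows and columns."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Return (Mantel r, one-sided permutation p-value) for two distance matrices."""
    rng = random.Random(seed)
    flat_b = upper_triangle(dist_b)
    observed = pearson(upper_triangle(dist_a), flat_b)
    order = list(range(len(dist_a)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute rows and columns of dist_a together
        if pearson(upper_triangle(dist_a, order), flat_b) >= observed:
            hits += 1
    # Count the observed statistic as one member of the null distribution.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: five sites on a line, with a feature distance proportional
# to geographic distance (made-up values, not the paper's data).
coords = [0, 1, 3, 6, 10]
geo = [[abs(a - b) for b in coords] for a in coords]
feature = [[2 * abs(a - b) for b in coords] for a in coords]
r, p = mantel_test(feature, geo)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting whole rows and columns, rather than individual matrix entries, preserves the dependence among distances that share a site, which is the reason an ordinary correlation test is not appropriate here.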
