Physically Challenging Song Traits, Male Quality, and Reproductive Success in House Wrens

Physically challenging signals are likely to honestly indicate signaler quality. In trilled bird song, two physically challenging parameters are vocal deviation (the speed of sound frequency modulation) and trill consistency (how precisely syllables are repeated). As predicted, in several species they correlate with male quality, are preferred by females, and/or function in male-male signaling. Species may experience different selective pressures on their songs, however; for instance, there may be opposing selection between song complexity and song performance difficulty, such that in species where song complexity is strongly selected, there may not be strong selection on performance-based traits. I tested whether vocal deviation and trill consistency are signals of male quality in house wrens (Troglodytes aedon), a species with complex song structure. Males' singing ability did not correlate with male quality, except that older males sang with higher trill consistency, and males with more consistent trills responded more aggressively to playback (although a previous study found no effect of stimulus trill consistency on males' responses to playback). Males singing more challenging songs did not gain in polygyny, extra-pair paternity, or annual reproductive success. Moreover, none of the standard male quality measures I investigated correlated with mating or reproductive success. I conclude that vocal deviation and trill consistency do not signal male quality in this species.


Introduction
In species with traditional sex roles, intrasexual selection favors male traits that enhance their ability to out-compete other males for mating opportunities, and intersexual selection favors traits that make males more attractive to females [1]. Sexual signals are generally thought to be honest signals of male quality, because signal receivers should rapidly evolve to disregard dishonest signals [2,3]. For a signal to honestly indicate male quality, there must be a cost of or constraint on signal production that makes it expensive or impossible for low quality males to produce high quality signals [4,5] (reviewed in [2,3]). Signals that incorporate challenging motor displays may be particularly likely to be costly or constrained, and therefore to be honest signals [6]. Signal complexity may also be under strong selection (e.g., [7]), and there may be divergent selective pressures such that species selected to have more complex songs are not under selection for performance-based signals, while species with strong selection on performance-based signals may not be under strong selection for signal complexity [8].
Birds' songs are elaborate signals that probably represent a substantial motor challenge because they involve coordinating movements of the respiratory system, the vocal organ (the syrinx), and the upper vocal tract [9,10]. As such, they have been extensively studied with regard to honest signaling [7,11]. Vocal deviation and consistency are two aspects of song that have recently received a great deal of attention as potential honest signals of male quality, because they appear to represent particularly challenging motor displays.
Vocal deviation is a measure of how quickly the bird modifies sound frequency in a trill, or a series of repeated syllables [12,13]. In trill production, a bird cannot simultaneously maximize frequency bandwidth and trill rate [12]: a broad frequency bandwidth requires a large-magnitude change in the volume of the oropharyngeal cavity [10] and in beak gape [14], while a high trill rate requires rapid repetition of those changes. Due to mechanical constraints, then, there is an upper limit on the combination of frequency bandwidth and trill rate [12]. Deviation from this performance limit is thought to reflect trill difficulty: "low deviation" trills that combine a relatively broad frequency bandwidth with a relatively fast trill syllable repetition rate represent a greater physical challenge [12]. In several species, females prefer males with lower-deviation, more challenging trills [15-20], and may even alter investment in eggs depending on the vocal deviation of males' songs ([21] and references therein). Vocal deviation correlates with male quality (e.g., age and mass [22,23], but see [24]) and affects how males respond to playback in several species ([25-29], but see [30]).
Consistency is a measure of how precisely a sound is reproduced each time the bird repeats it, and it can be measured at the level of either whole songs or individual, repeated syllables. Producing consistent songs and trills might require an especially high degree of integration across multiple brain regions, including the direct motor control of respiratory, syringeal, and vocal tract muscles [9,31-33]. Complicating this putative honesty mechanism, the anterior forebrain pathway of song learning actively introduces variability (i.e., reduces song consistency) under many conditions, suggesting that males do not sing at their maximum song consistency at all times (reviewed in [33]). Though the mechanism of signal honesty is not fully elucidated, a growing body of literature supports the hypothesis that consistency is an important signal of male quality in birds: consistency is positively associated with field indicators of female preference [20,32,34] and male quality [34-37], and trills with different consistencies elicit different responses from males in playbacks [36,38] (but see [30]). Consistency in the timing of notes within a song can be negatively affected by experimentally induced stressful rearing conditions in zebra finches (Taeniopygia guttata) [39].
Consistency and vocal deviation have attracted substantial attention, and to date most evidence suggests that they carry honest information about male quality. However, behavioral ecologists recognize that other aspects of song can also affect the difficulty of song production [40,41] and that sexual selection can favor different traits in different taxa (e.g., [8]). Further study is therefore needed, especially in species with complex song structure, to determine how widely vocal deviation and trill consistency are used as signals. I studied whether vocal deviation and trill consistency affect male mating success in the house wren (Troglodytes aedon). In this species, song is the most probable target of sexual selection, as the species is dull-colored and males are only slightly larger than females [42], but males sing much more frequently and with more elaborate song structure than females do [43,44]. Males sing at a high rate during territory establishment and mate attraction [45], and females are more likely to visit a nest box if male song is broadcast from it [46], though they may choose mates based on territory characteristics rather than male or song characteristics [47].
In this study, I tested the hypotheses that vocal deviation and trill consistency are honest indicators of male quality and that they affect mating and reproductive success in house wrens. Neither the influence of fine-scale acoustic song features on female choice, nor the relative importance of female choice and male-male competition in determining reproductive success, is known in this species. Variation in male reproductive success depends heavily on success in attracting a secondary (i.e., polygynous) female and, to a lesser extent, on success in extra-pair (EP) paternity [48], which accounts for about 15% of offspring [49]. Both polygyny and EP paternity can be affected by female preferences and by male competitive ability (e.g., [50,51]), so I discuss success in polygyny and EP paternity as variation in "mating success" rather than in terms of attractiveness. A direct test of female preference in this species is not feasible, as females do not respond well to captivity (pers. obs.). However, a previous playback study found that male house wrens do not respond differently to songs that differ in vocal deviation and trill consistency [30].
After investigating how vocal deviation and trill consistency related to each other, I tested the following predictions of the hypotheses that vocal deviation and trill consistency reflect male quality and affect mating and reproductive success. 1) Singing ability and male phenotypic quality should be correlated. Specifically, male quality measures should negatively correlate with vocal deviation (since lower vocal deviation indicates a more challenging trill) and positively correlate with trill consistency. 2) Singing ability should relate to mating success. Polygynous males, males that sire EP offspring in other broods, and males that maintain a high proportion of within-pair (WP) paternity within their own broods should sing with lower vocal deviation and higher trill consistency. 3) Males with higher annual reproductive success (i.e., total number of offspring sired) should sing with lower vocal deviation and higher trill consistency. I further tested for relationships among quality measures and mating and reproductive success, for completeness.

Field Procedures
I studied house wrens nesting in boxes at two partially-wooded sites at the Cornell University Research Ponds in Ithaca, NY (see [52,53] for details on study sites). I captured, banded, and bled most breeding adults and offspring between April and August 2008-2011. For adults, I measured wing chord (Avinet wing rule, 0.5 mm accuracy), tarsus length (SPI Dial calipers, 0.1 mm accuracy), and weight (to the nearest 0.1 g with a Pesola spring scale). I monitored all breeding attempts on the field sites and banded chicks at approximately 7 days of age. To prevent premature fledging, I did not continue to count offspring after banding; in estimating reproductive success, I assumed that all banded chicks fledged unless I saw obvious signs of depredation or nestling starvation. Annual reproductive success for each male was the sum of the number of chicks fledged from all his nests, accounting for gains or losses due to EP paternity. Some nests were involved in brood size manipulations, and these males were not included in analyses of total reproductive success.
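The annual reproductive success tally described above can be sketched as follows. This is a minimal illustration, assuming hypothetical per-chick records that already carry the genetic sire (from the paternity analysis) and a fledged flag; the function and field names are illustrative, not part of the original workflow. Crediting each fledged chick to its genetic sire automatically adds EP gains and removes EP losses.

```python
from collections import Counter


def annual_reproductive_success(chicks, males, excluded=()):
    """Count the fledged chicks each male sired in one breeding season.

    Hypothetical input: each chick is a dict with
      "sire":    genetic father ID (within-pair or extra-pair), or None if unassigned
      "fledged": True unless depredation or starvation was observed after banding
    Males in `excluded` (e.g., brood-size-manipulated nests) are dropped.
    """
    # Start every eligible male at zero so males with no fledged young are kept.
    rs = Counter({male: 0 for male in males if male not in excluded})
    for chick in chicks:
        sire = chick["sire"]
        # Chicks with an unassigned or off-site sire are simply not credited.
        if chick["fledged"] and sire in rs:
            rs[sire] += 1
    return dict(rs)


chicks = [
    {"sire": "A", "fledged": True},   # within-pair chick in A's nest
    {"sire": "B", "fledged": True},   # extra-pair chick that B sired in A's nest
    {"sire": "A", "fledged": False},  # depredated, not counted
    {"sire": None, "fledged": True},  # sire unassigned
]
print(annual_reproductive_success(chicks, males=["A", "B", "C"]))  # {'A': 1, 'B': 1, 'C': 0}
```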

Trill Measurements
House wren songs typically begin with relatively low-amplitude introductory notes and end with a series of trills, with each trill composed of a different syllable type. Within each trill, mean pitch is generally fairly constant, but each succeeding trill usually occurs at a lower mean pitch than the one before [30].
For trill measurements, I used recordings from playbacks conducted in 2009 and 2010 (see details in [30]), made with a Marantz PMD 690 recorder and a Sennheiser ME 67 or MKH 816 shotgun microphone at a 48 kHz sampling rate and 16-bit depth. I isolated individual songs from each playback recording in Syrinx PC [54]. I measured frequency bandwidth (the bandwidth encompassing 99% of the sound energy in the syllable) and trill rate (syllables/s) for each trill, using spectrograms in RavenPro 1.4 [55] (Hann window with 80.1% overlap in the time domain, giving a 111-sample hop size, a 4096-point DFT, and an 11.7 Hz frequency grid). Though measurements were visualized on the spectrogram, they were calculated from the power spectrum and so should be robust to variation in amplitude due to differences in the distance between the recordist and the bird. I performed upper-bound regression on the relationship between frequency bandwidth and trill rate to estimate the performance limit on frequency modulation [12] and found the predicted triangular distribution [30]. Vocal deviation was the orthogonal distance from each trill to this performance limit (estimated limit: frequency bandwidth = −168.50 × trill rate + 6019 Hz, with trill rate in Hz, i.e., syllables/s).
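The upper-bound regression and deviation calculation can be illustrated with a short Python sketch. It assumes one common variant of upper-bound regression (take the widest-bandwidth trill within each 1-syllable/s trill-rate bin, then fit ordinary least squares through those maxima); the bin width, function names, and sign convention are assumptions for illustration. The default line uses a slope of −168.50 Hz per syllable/s and an intercept of 6019 Hz.

```python
import numpy as np


def upper_bound_regression(trill_rate, bandwidth, bin_width=1.0):
    """Fit a line through the widest-bandwidth trill in each trill-rate bin.

    Binning at 1 syllable/s is an assumed choice for illustration.
    Returns (slope, intercept) for: bandwidth = slope * trill_rate + intercept.
    """
    rate = np.asarray(trill_rate, dtype=float)
    bw = np.asarray(bandwidth, dtype=float)
    bins = np.floor(rate / bin_width).astype(int)
    xs, ys = [], []
    for b in np.unique(bins):
        in_bin = np.flatnonzero(bins == b)
        top = in_bin[np.argmax(bw[in_bin])]  # index of the widest trill in this bin
        xs.append(rate[top])
        ys.append(bw[top])
    slope, intercept = np.polyfit(xs, ys, 1)  # ordinary least squares on the bin maxima
    return slope, intercept


def vocal_deviation(trill_rate, bandwidth, slope=-168.50, intercept=6019.0):
    """Signed orthogonal distance from each trill to the performance limit.

    Positive values fall below the limit; smaller values mean harder trills.
    Both axes are used in their raw units (syllables/s and Hz).
    """
    rate = np.asarray(trill_rate, dtype=float)
    bw = np.asarray(bandwidth, dtype=float)
    return (slope * rate + intercept - bw) / np.sqrt(slope ** 2 + 1.0)


print(vocal_deviation([10.0, 10.0], [4334.0, 3334.0]))  # approximately [0.0, 5.93]
```

Because the two axes are in different units (Hz versus syllables/s), the orthogonal distance is dominated by the bandwidth axis; rescaling either axis would change deviation values, so any reanalysis should keep the units used to estimate the limit.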
For trill consistency, I cross-correlated the spectrograms of individual syllables within trills using SoundXT [56] with the following settings: FFT length 1024; data length 50%; Hann window; 80% overlap; masking method broadband; 50% masking; masking adjustment bias; spec pairwise; correlator type matrix standard method. I bounded the cross-correlation at 200 Hz above the highest high frequency and 200 Hz below the lowest low frequency for the trill to minimize interference from background noise. Trill consistency was the mean cross-correlation score within a single trill. I did not allow the cross-correlator to shift sounds in frequency, which would have allowed a comparison of note "shape" regardless of pitch, because the signaling value of pitch changes within a trill is unknown. Without an a priori expectation that either total similarity (including similarity of pitch) or shape similarity (eliminating pitch differences) is the biologically relevant signal, I cross-correlated notes at their actual pitches for comparability with other studies on consistency. I measured four acoustic covariates of vocal deviation and trill consistency: pitch, trill duration, timing of the trill within the song, and trill type. I defined pitch as the mean high frequency of the trill [30]. The timing within the song was the time from the beginning of the song to the beginning of the trill. All trills used in this study could be assigned to one of eight syllable types that are shared among males in the population (approximately 96% of trills can be assigned to one of these eight types [30]), and 5.88 ± 0.18 (mean ± SE) syllable types were included per male per year (range: 2-8).
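The cross-correlation step can be approximated by the sketch below. It is a simplified stand-in for SoundXT, not a reimplementation of it: there is no masking, the spectrograms are assumed to be non-negative magnitude arrays on a shared frequency grid already cropped to the trill's band, and the normalization (dividing by the geometric mean of the two syllables' total energies) is an assumed choice. As in the text, syllables may slide in time but not in frequency, so pitch differences reduce the score, and consistency is the mean over all syllable pairs.

```python
from itertools import combinations

import numpy as np


def peak_xcorr(a, b):
    """Peak normalized cross-correlation of two syllable spectrograms.

    a, b: non-negative arrays shaped (frequency_bins, time_frames) on the same
    frequency grid. Syllables slide in time only; frequency rows stay aligned.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm = np.sqrt(np.sum(a * a) * np.sum(b * b))
    n_a, n_b = a.shape[1], b.shape[1]
    best = 0.0
    # lag = frame of `a` where the first frame of `b` lands
    for lag in range(-(n_b - 1), n_a):
        a0, b0 = max(0, lag), max(0, -lag)
        width = min(n_a - a0, n_b - b0)
        score = np.sum(a[:, a0:a0 + width] * b[:, b0:b0 + width]) / norm
        best = max(best, score)
    return best


def trill_consistency(syllables):
    """Mean peak cross-correlation over all pairs of syllables in one trill."""
    if len(syllables) < 2:
        raise ValueError("need at least two syllables")
    scores = [peak_xcorr(s1, s2) for s1, s2 in combinations(syllables, 2)]
    return float(np.mean(scores))


syl = np.array([[0.0, 1.0, 2.0], [0.0, 0.0, 1.0]])
shifted = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 0.0]])  # same syllable, one frame earlier
print(trill_consistency([syl, syl, shifted]))  # 1.0 (a pure time shift costs nothing)
```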
The total sample size was 4569 trills (mean ± SE: 62.66 ± 4.14 trills per male per year; range: 8-193), distributed across 59 males, with 14 males measured in two years. The unit of analysis in this study was the trill; individual songs contributed 2.11 ± 0.02 measurable trills to the study (range: 1-6). Five hundred thirty-eight trills were recorded before playback, and the remaining 4031 were recorded during or immediately after playback. Vocal deviation did not differ between the pre-playback and the during/post-playback time periods, while trill consistency increased slightly but significantly from pre-playback to during/post-playback (unpublished results). To maintain high statistical power, I included all trills in the analyses. Results were qualitatively unchanged if I instead restricted analyses to trills recorded during/post-playback, which should equalize motivational state across males and allow a more accurate between-male comparison.
Table 3. Polygyny in relation to trill and male quality; no effects remained significant after correcting for multiple testing. Trill and male quality measures were the dependent variables, and polygyny status (monogamous vs. polygynous) was the independent variable in mixed-effects models controlling for male identity, year, and (for trill measures only) trill type, pitch, trill duration, and the time of the trill in the song. Age, the only categorical dependent variable, was modeled with logistic regression; the estimated difference is the difference in log-odds of being an after-second-year male if the male is polygynous, and the effect size is partial r. For continuous variables, estimated differences are between successful and unsuccessful males, with a positive difference indicating that the more successful males had a higher score for quality, and the effect size is Cohen's d.
Abbreviations: obs., observations. CI, confidence interval. H:L, Heterophile:Lymphocyte ratio. PB, playback.
doi:10.1371/journal.pone.0059208.t003
Table 4. Within-pair paternity success relative to trill and male quality, with effects that remained significant after correcting for multiple testing in bold. Trill and male quality measures were the dependent variables, and the proportion of social offspring sired was the independent variable in mixed-effects models controlling for male identity, year, and (for trill measures only) trill type, pitch, trill duration, and the time of the trill in the song. Age, the only categorical dependent variable, was modeled with logistic regression; the estimated difference is the change in log-odds of being after-second-year with increasing proportion of social offspring sired. Following the recommendation of [74], the effect size for age is a partial r using within-pair success as a categorical variable (all social offspring sired versus at least one social offspring sired by another male). For continuous variables, effect estimates are the slope, and the effect size is partial r.
Abbreviations: obs., observations. CI, confidence interval. H:L, Heterophile:Lymphocyte ratio. PB, playback.
doi:10.1371/journal.pone.0059208.t004

Male Phenotypic Quality
I captured 125 males a total of 253 times over four breeding seasons, at varying stages of nesting. I measured the following putative male quality attributes: size, body condition, age, health, and aggressiveness. I tested for correlations between measures of male quality and trill quality, mating success, and reproductive success from the same year only. Two males banded in a previous year were recorded in 2009 or 2010 but not captured, and therefore are not included in male quality correlations. Of the remaining 71 male-years with recordings, 45 males were captured on the same day as song recording, and 26 were recorded 26 ± 3.6 days before capture. A subset of males and quality measurements from 2009-2011 are also included in Cramer et al. [57], addressing other questions. I estimated body size as wing chord, tail length, and tarsus length; for males measured multiple times in a year, I used the mean of the measurements from that year. I could not collapse these variables using principal components analysis: wing and tail measures increased with age, so it was necessary to use measurements from the appropriate year for males captured in multiple years, and it is statistically inappropriate to include multiple captures for only a subset of individuals. For body condition, I used the standardized residual of a regression of weight on tarsus, controlling for date and time of day captured. In correlations with trill quality, I used the body condition score closest to the recording date for males captured multiple times in a year.
For correlations with mating and reproductive success, I used the first measure of body condition from that year, though results were unchanged if I instead randomly chose a measurement occasion (not shown). Measurement repeatability was highly statistically significant (sensu [58]; r > 0.68, F > 5.18, p < 0.0001 for tarsus, wing, tail, and weight; n = 112-114 measurements on 50-51 males for tarsus, tail, and weight, and n = 62 measurements on 28 males for wing, for captures within the same year).
Table 5. Extra-pair paternity success relative to trill and male quality, with effects that remained significant after correcting for multiple testing in bold.
Table 6. Paired comparisons of song and male quality measures for extra-pair (EP) males and the within-pair (WP) males they cuckolded; no effect was statistically significant after correcting for multiple testing.
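Two of the quantities in the preceding paragraphs, the body condition index and measurement repeatability, can be sketched in Python. The sketch assumes that "standardized residual" means z-scored ordinary least-squares residuals, and it implements a one-way ANOVA repeatability (intraclass correlation with the n0 correction for unequal numbers of measurements per male), which yields the r and F statistics reported above; the function names are hypothetical.

```python
import numpy as np


def body_condition(weight, tarsus, date, time_of_day):
    """Standardized residuals of weight regressed on tarsus, date, and time of day.

    'Standardized' is read here as z-scored residuals (an assumption).
    """
    y = np.asarray(weight, dtype=float)
    X = np.column_stack([np.ones_like(y), tarsus, date, time_of_day])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)


def anova_repeatability(groups):
    """One-way ANOVA repeatability (intraclass correlation) and its F ratio.

    groups: one sequence of repeated measurements per individual.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_i = np.array([len(g) for g in groups])
    n_total = n_i.sum()
    grand_mean = np.concatenate(groups).mean()
    ss_among = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)
    # n0 corrects for unequal numbers of measurements per individual
    n0 = (n_total - np.sum(n_i ** 2) / n_total) / (k - 1)
    s2_among = (ms_among - ms_within) / n0
    r = s2_among / (ms_within + s2_among)
    return r, ms_among / ms_within


r, f_ratio = anova_repeatability([[1, 2], [3, 4], [5, 6]])
print(round(r, 3), f_ratio)  # 0.882 16.0
```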
I could assign age only for a subset of individuals (84 male-years) that had been banded on-site in a previous year. I categorized males as second-year (SY) if they had been banded as nestlings the previous year (i.e., this was their first breeding season) and after-second-year (ASY) if they had been banded as adults in a previous season. I did not make finer-scale age assignments among ASY males that were present in multiple years.
In 2009, I used two ecoimmunology techniques to assess male health (see [57] for details). Briefly, I followed procedures in [59] to measure the bactericidal capacity of 10-µl whole blood samples collected from the brachial vein after ethanol sterilization. Scores for the bactericidal assay are thought to increase with improved innate immunity [59]. All samples for this data set were collected during pre-nestling breeding stages. I also took blood smears and had the heterophile:lymphocyte ratio counted by the Animal Health Diagnostic Center at Cornell University Veterinary College; the heterophile:lymphocyte ratio increases in response to stress [60]. For males that had two blood smears taken, I used the one closer to the recording date for song analyses, and I used the first measure for analyses with mating and reproductive success. Results are unchanged if I instead randomly chose which measure to include in the latter analyses (not shown).
I derived aggression scores from playback experiments conducted in 2008, 2009, and 2010 as part of other studies [30,61]. Briefly, for each playback experiment, I conducted a series of presentations to each male, with "song-bouts" during which a single stimulus song was repeated consecutively at a biologically relevant song rate. Song-bouts were separated by periods of silence. In 2008, a single song stimulus was repeated for six song-bouts [61]. In 2009 and 2010, each male heard three song-bouts, with each song-bout repeating a different manipulation of a single song. I found no evidence that the stimulus manipulation affected male response [30]. Each year's experiment had other unique attributes (e.g., speaker brand and distance to the nest box) that could affect responses to playback, so I included a year/experiment variable in analyses. For all playback experiments, I calculated the mean song rate during the entire playback trial. I calculated the mean proportion of time the male spent within 5 m of the speaker and the mean number of flights across the speaker (i.e., within a 2 m ring across the speaker) during song-bouts only, since there were many zero values during silent periods. No measure of aggressiveness was repeatable across years (sensu [62], controlling for experimental protocol; all r < 0.3, all p > 0.9; calculated with 20 males exposed to 2 or 3 playback experiments each, for a total of 43 playbacks). Each male was the subject of only one playback experiment per year, and each male heard a unique stimulus set.
Paternity Analysis, Male Mating Success, and Reproductive Success
I followed the PCR protocol of [53] and genotyped all adults as well as 857 offspring from 182 nests using a panel of 7 microsatellite markers. Because of financial constraints, no genotyping was conducted at one study site in 2008. I conducted paternity analysis using Cervus 3.0, including the social mother as a known parent [63]. I confirmed mismatching alleles by regenotyping both the parent and the offspring. To estimate EP paternity most conservatively, I attributed a chick to EP paternity only if it had more than one trio-wise mismatch with its social parents that could not be attributed to a null allele. In assigning EP sires, I allowed EP fathers to have a single null-allele mismatch [64] with their putative offspring. Nests from 2008 were also included in [53]. I defined WP success as the proportion of social offspring that a male sired, calculated separately for each year but combining all social nests within a year. Results were unchanged if I weighted analyses by the number of social offspring with paternity data (not shown). A male was considered to have EP success if he was an EP sire of at least one chick in that year. Similar results were found if I instead analyzed the total number of EP offspring sired in a year (not shown). For EP success analyses, I excluded males that deserted the study site immediately after capture. Because there was suitable habitat for house wrens surrounding my study site, I may have failed to detect EP offspring of males that bred on site but gained EP success only off-site. Failed detections should not bias the results, as they should be random with respect to the song variables measured.
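The conservative exclusion rule for EP paternity can be sketched as a simple trio check. This is a hypothetical, simplified illustration, not the likelihood-based assignment performed in Cervus: a locus counts as a possible null allele when the offspring and the mismatching parent both appear homozygous, and a chick is flagged as EP only when more than one locus is a mismatch that a null allele cannot explain. Function names and the genotype encoding (tuples of allele sizes) are assumptions.

```python
def locus_status(offspring, mother, father):
    """Classify one locus as 'match', 'null', or 'mismatch'.

    Genotypes are (allele1, allele2) tuples of fragment sizes. 'null' means the
    mismatch could be explained by an undetected null allele shared by the
    offspring and an apparently homozygous parent.
    """
    o1, o2 = offspring
    if (o1 in mother and o2 in father) or (o2 in mother and o1 in father):
        return "match"
    homozygous = o1 == o2
    # e.g., offspring o/null with a father f/null, both scored as homozygotes
    if homozygous and o1 in mother and father[0] == father[1]:
        return "null"
    if homozygous and o1 in father and mother[0] == mother[1]:
        return "null"
    return "mismatch"


def is_extra_pair(offspring, mother, father):
    """Flag a chick as EP if more than one locus has a non-null trio mismatch."""
    statuses = [locus_status(o, m, f) for o, m, f in zip(offspring, mother, father)]
    return statuses.count("mismatch") > 1


mother = [(100, 102), (200, 204), (300, 300)]
social_father = [(104, 106), (208, 208), (310, 312)]
chick_wp = [(102, 104), (204, 204), (300, 312)]
chick_ep = [(100, 110), (204, 210), (300, 312)]
print(is_extra_pair(chick_wp, mother, social_father))  # False (one possible null allele)
print(is_extra_pair(chick_ep, mother, social_father))  # True (two non-null mismatches)
```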
I considered a male polygynous if he attracted a female to a second nest box while his primary female still had an active nest (i.e., simultaneous polygyny, as in [65]; results remain qualitatively unchanged if I also consider males polygynous if they had different females for each brood). For three nests, I was unable to determine whether the male was simultaneously polygynous, and these nests were excluded. For ease of discussion, I consider males to have higher "mating success" if they were polygynous, sired a high proportion of their social offspring, and/or gained EP success elsewhere. In paired comparisons of EP males and the WP males they cuckolded, the EP males are considered to have higher mating success.
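The simultaneous-polygyny criterion amounts to an interval-overlap test on a male's nests. A minimal sketch, assuming hypothetical nest records that store the female and the first and last days the nest was active (e.g., as day of year); a male counts as polygynous when nests of two different females were active at the same time.

```python
from itertools import combinations


def is_simultaneously_polygynous(nests):
    """True if a male had nests of two different females with overlapping activity.

    nests: hypothetical (female_id, first_active_day, last_active_day) tuples for
    one male in one season. Serial mates whose nests do not overlap return False.
    """
    for (f1, start1, end1), (f2, start2, end2) in combinations(nests, 2):
        # Two closed intervals overlap when each starts before the other ends.
        if f1 != f2 and start1 <= end2 and start2 <= end1:
            return True
    return False


print(is_simultaneously_polygynous([("F1", 120, 160), ("F2", 140, 185)]))  # True
print(is_simultaneously_polygynous([("F1", 120, 160), ("F2", 161, 200)]))  # False
```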
For an additional four nests, I could not distinguish rapid mate switching from complete loss of WP success, due to a gap in field observations, and I excluded these males from analyses of WP success. Two additional males were excluded from analyses of WP and reproductive success because a majority of the offspring had two mismatches that could be attributed to null alleles. While having two null alleles is relatively unlikely, it is possible given the null allele rates in the population (Table 1), so the genetic sire is unclear. For these six nests, chicks whose genetic fathers could be assigned with high confidence were counted toward those males' total genetic reproductive success. Reproductive success was defined as the number of chicks a male sired (WP and EP) that fledged.

Statistical Analysis
I first tested for associations among different measures of mating success using logistic regression with year as a fixed effect and male identity as a random effect. To test whether the likelihood of losing WP paternity depended on a nest's polygyny status (i.e., whether that nest belonged to a monogamous male, or was the primary versus secondary nest of a polygynous male), I coded each nest as containing all WP or at least one EP offspring, and also as belonging to one of these three polygyny statuses.
To determine how much variation in trill measures was between-male versus within-male, I assessed the proportion of variation in vocal deviation and trill consistency that was attributable to a random effect of male identity, in a model including fixed effects of year and four acoustic covariates: trill type, pitch, the time of the trill within the song, and trill duration (repeatability, sensu [62]). Multicollinearity among acoustic covariates was not problematic, as the variance inflation factors from models without the random effect were all less than 5. To test whether vocal deviation and trill consistency were correlated, I followed the methods of [66]: I first calculated the mean consistency for each male-trill type-year combination separately, and I then calculated the difference from each trill to the mean for that male-trill type-year combination. I used both of these variables together in a model including the four acoustic covariates (trill type, pitch, the time of the trill within the song, and trill duration) and year as fixed effects and male identity as a random effect, to predict vocal deviation as a function of between- and within-male variation in consistency [66]. The between-male effect was estimated using the male-trill type-year mean term, and the within-male effect was estimated using the difference from each trill to this mean. I assessed whether vocal deviation and trill consistency were associated with male quality measures by fitting general linear mixed models with the song measure as the response variable, fixed effects of year, a single male quality measure, the four acoustic covariates (above), and a random effect of male identity.
Because the heterophile:lymphocyte ratio and the bactericidal assay were analyzed in a single year, those models did not include year effects.
Table 7. Annual reproductive success relative to trill quality, male quality, and male mating success, with the effects that remained statistically significant after correction for multiple testing in bold.
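The between/within decomposition of consistency [66] reduces to group-mean centering. The sketch below computes, for each trill, the male-trill type-year mean and the trill's deviation from it; both values would then enter the mixed model for vocal deviation as fixed effects. Field and function names are hypothetical, and the mixed model itself is not shown.

```python
from collections import defaultdict


def within_subject_centering(trills):
    """Split trill consistency into between- and within-male components.

    trills: list of dicts with hypothetical keys "male", "trill_type", "year",
    and "consistency". Returns one (group_mean, deviation) pair per trill, in
    input order, where group_mean is the male-trill type-year mean.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for t in trills:
        key = (t["male"], t["trill_type"], t["year"])
        totals[key] += t["consistency"]
        counts[key] += 1
    centered = []
    for t in trills:
        key = (t["male"], t["trill_type"], t["year"])
        mean = totals[key] / counts[key]
        centered.append((mean, t["consistency"] - mean))  # between, within
    return centered
```

Because the deviations sum to zero within each male-trill type-year group, the two predictors are uncorrelated by construction, which is what lets the model estimate the between-male and within-male slopes separately.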
Next, I assessed whether any of the male or trill quality measures was related to mating success. I constructed a separate general linear mixed model for each measure of mating success (polygyny, WP success, and EP success) and for each male quality or trill measure. Data were missing from different variables for different males, so constructing separate models allowed me to maximize sample size for each analysis. Each model used the song or male quality measure as a response variable, and the following predictors: a measure of mating success, a fixed effect of year (except for health measures), and a random effect of male identity (except for health measures). For analyses of trill measures, I also included the four acoustic covariates (above). To test for relationships between reproductive success and trill quality, I used the trill quality measure as the response variable, with reproductive success, year, and the four acoustic covariates as fixed effects and male identity as a random effect. This approach reverses the logical response and independent variables, but the reversal is necessary to account for the non-independence of trill measurements (i.e., multiple trills were measured per male, and the random effect of male identity can only control for this pseudoreplication if trill measures are the response variable), and it allows me to control for the acoustic covariates. Moreover, the goal of the analysis is to measure the association between the two variables, and the strength of the association should be unaffected by which variable is response versus independent.
For analyzing the relationship between reproductive success and male quality and mating success, I used reproductive success as the response variable, with fixed effects of year and the male quality/ mating success variable of interest, and male identity as a random effect.
For paired comparisons of EP males to the WP males they cuckolded, I conducted paired t-tests for each male quality measure. For trill measures, because I had many measurements for each male, I constructed general linear mixed models to predict vocal deviation or trill consistency, with role (EP versus WP), year, and the four acoustic covariates as fixed effects, and random effects of male identity and a grouping variable to associate EP males with the WP males they cuckolded.
All analyses used response and independent variables measured in the same year (e.g., trills measured in 2009 were compared to male quality measures, mating success, and reproductive success in 2009 only). Vocal deviation, size measures, body condition, and song rate in response to playback approached normal distributions and were not improved by transformation. Trill consistency was transformed as −log(1 − trill consistency), and reproductive success as the square root of (reproductive success + 1). The heterophile:lymphocyte ratio was log-transformed, flights across the speaker were raised to the power of 0.55, and the proportion of time within 5 m of the speaker was arcsine-square-root transformed. Following [67], variables were not transformed when they were used as predictors. Transformation was unnecessary for paired tests of EP males and the WP males they cuckolded, as the differences were normally distributed. The percent of bacteria killed was strongly skewed and could not be transformed to normality.
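The transformations listed above are simple element-wise functions. A minimal NumPy sketch follows; the logarithm base is not stated in the text, so natural logarithms are assumed, and the function names are illustrative. The consistency transform stretches scores that crowd near 1: for example, 0.90 maps to about 2.30 and 0.99 to about 4.61.

```python
import numpy as np


def transform_trill_consistency(c):
    # -log(1 - c); natural log assumed because the base is not stated
    return -np.log(1.0 - np.asarray(c, dtype=float))


def transform_reproductive_success(rs):
    # square root of (reproductive success + 1)
    return np.sqrt(np.asarray(rs, dtype=float) + 1.0)


def transform_hl_ratio(hl):
    # log-transformed heterophile:lymphocyte ratio (natural log assumed)
    return np.log(np.asarray(hl, dtype=float))


def transform_flights(flights):
    # flights across the speaker raised to the power 0.55
    return np.asarray(flights, dtype=float) ** 0.55


def transform_proportion(p):
    # arcsine-square-root transform for proportion of time within 5 m
    return np.arcsin(np.sqrt(np.asarray(p, dtype=float)))


print(transform_trill_consistency([0.90, 0.99]))  # approximately [2.303, 4.605]
```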
Most tests were performed in JMP 7.0 [68], which uses the Kenward-Roger approximation for degrees of freedom in mixed models; degrees of freedom are therefore intermediate between the number of individuals and the total number of measurement events. Mixed models with a categorical response variable (e.g., whether male age differed between different levels of mating success) were fitted in R version 2.15.1 [69] using the lmer function with a binomial error distribution [70]. The statistical significance of the repeatability of song and aggression measures was determined using the package nlme [71] following [67]. Where necessary, I used the Wald test in the package aod [72] to test the significance of a factor with multiple levels.
To correct for multiple testing, I used the false discovery rate procedure [73], implemented in R. I conducted table-wise corrections (with tests of mating success combined across tables). P-values listed in the tables are uncorrected, and I note whenever statistical significance changed after correction for multiple testing. I calculated standardized effect sizes and their confidence intervals according to [74], using R code from [75] for non-central confidence intervals in mixed-effects models. Following [74], I consider effect sizes small, medium, or large with r = 0.1, 0.3, or 0.5, or d = 0.2, 0.5, or 0.8, respectively.
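The multiple-testing correction and effect-size conversions can be sketched as follows. The text does not name the specific false discovery rate procedure, so the Benjamini-Hochberg step-up adjustment (equivalent to R's `p.adjust(p, method = "BH")`) is assumed here; the t-to-r and t-to-d conversions are standard formulas and may differ from model-specific variants in [74], and the non-central confidence intervals from [75] are not reproduced. Function names are illustrative.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # step-up: each adjusted value is the minimum over its own and all larger ranks
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def partial_r_from_t(t, df):
    """Effect size r from a t statistic and its degrees of freedom."""
    return t / np.sqrt(t ** 2 + df)


def cohens_d_from_t(t, n1, n2, df):
    """Cohen's d for a two-group comparison from a t statistic."""
    return t * (n1 + n2) / (np.sqrt(n1 * n2) * np.sqrt(df))


print(bh_adjust([0.01, 0.04, 0.03, 0.20]))  # approximately [0.04, 0.0533, 0.0533, 0.20]
print(partial_r_from_t(3.0, 16))  # 0.6
```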

Paternity
The seven microsatellite markers gave high power for determining paternity (non-exclusion probability for the parental pair, 0.0003%, estimated using all adult genotypes). Several markers had low frequencies of null alleles (Table 1): single pairwise mismatches consistent with null alleles occurred in over 40 chicks for each parental sex. Sixteen of 857 offspring had single mismatches with their social mothers that could not be attributed to null alleles, and nine offspring had single non-null mismatches with their social fathers. Non-null mismatches between mothers and offspring may be due to mutation, since intraspecific brood parasitism has not been reported in this species [65]. I therefore allowed these single mismatches with putative parents of either sex. Across all years, 13.5% (116/857) of offspring in 37.6% (68/181) of nests were EP young. The results presented here do not change substantially if these offspring with single non-null mismatches are instead attributed to EP paternity.
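The exclusion rule described above, which tolerates a single locus mismatch with a putative parent, can be sketched as follows. The sketch assumes diploid genotypes stored as allele pairs, counts a mismatch when offspring and parent share no allele at a locus, and assumes independent loci so that the multi-locus non-exclusion probability is the product of per-locus values. The function names and any per-locus values are hypothetical.

```python
from math import prod

def shares_allele(offspring, parent):
    """True if offspring and parent share at least one allele at a locus."""
    return bool(set(offspring) & set(parent))

def count_mismatches(offspring_genotype, parent_genotype):
    """Number of loci at which offspring and putative parent share no allele."""
    return sum(
        not shares_allele(o, p)
        for o, p in zip(offspring_genotype, parent_genotype)
    )

def is_excluded(offspring_genotype, parent_genotype, tolerated=1):
    """Exclude a putative parent only if mismatches exceed the tolerated number."""
    return count_mismatches(offspring_genotype, parent_genotype) > tolerated

def combined_non_exclusion(per_locus):
    """Multi-locus non-exclusion probability, assuming independent loci."""
    return prod(per_locus)
```

Under this rule, a social father mismatching at one of seven loci is retained as the sire, and a father mismatching at two or more loci is excluded, which marks the offspring as EP young.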

Acoustic Covariates and Repeatability of Trill Measures
In most models, the four acoustic covariates significantly affected vocal deviation and trill consistency measurements (not shown). Vocal deviation was generally lower (i.e., higher performance trills) when the trill was higher-pitched (also see [30]), had a longer duration, and occurred later in the song. Trill consistency generally increased (i.e., higher performance trills) when the trill was lower-pitched, shorter, and later in the song.
Syllable types differed consistently in both vocal deviation and trill consistency.
Vocal deviation and trill consistency were weakly but significantly repeatable: 17.6% of the variation in vocal deviation and 29.5% of the variation in trill consistency was attributable to a random effect of male identity (controlling for trill type, pitch, time in the song, and trill duration; p < 0.0001; sensu [62]). Performance was significantly and positively related between the two measures: trills with lower vocal deviation (i.e., higher performance) had higher consistency, owing to both between-male effects (effect estimate ± SE, −5.55 ± 0.44, t = −12.50, p < 0.0001) and within-male effects (−3.02 ± 0.29, t = −10.57, p < 0.0001). The effect sizes of these relationships were small to medium: the partial r (95% confidence interval) was −0.16 (−0.19, −0.14) for the between-male effect and −0.14 (−0.16, −0.12) for the within-male effect.
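Three quantities in this paragraph can be made concrete. Repeatability is the male-identity variance divided by the total (male plus residual) variance. Between-male and within-male effects are commonly separated by splitting a predictor into each male's mean and each trill's deviation from that mean; the text does not state that this exact centering was used, so treat it as an assumption. A t statistic can be converted to a partial r with r = t / sqrt(t² + df), one common conversion, which may differ from the non-central interval code of [75]. The function names are hypothetical.

```python
import math
from collections import defaultdict

def repeatability(var_male, var_residual):
    """Share of variance attributable to male identity."""
    return var_male / (var_male + var_residual)

def within_between_split(male_ids, values):
    """Split a predictor into male means (between) and deviations from them (within)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for m, v in zip(male_ids, values):
        sums[m] += v
        counts[m] += 1
    means = {m: sums[m] / counts[m] for m in sums}
    between = [means[m] for m in male_ids]
    within = [v - means[m] for m, v in zip(male_ids, values)]
    return between, within

def partial_r_from_t(t, df):
    """Convert a t statistic and its degrees of freedom to a partial r."""
    return t / math.sqrt(t * t + df)
```

Entering both `between` and `within` as fixed effects in the same mixed model yields separate slopes for differences among males and for trill-to-trill variation within a male.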

Correlations between Trill Quality and Male Quality
Few correlations between trill quality and male quality were statistically significant (Table 2). Vocal deviation correlated with tail length, such that males with longer tails sang lower-deviation (i.e., more challenging) trills (Table 2), though this result was not robust to correction for multiple testing. Trill consistency correlated with age, wing chord, and response to playback, and these relationships remained statistically significant following correction for multiple testing (Table 2). The effect size for age was medium to large, while the other effect sizes were small.
In the cross-sectional analysis, after-second-year (ASY) males sang more consistently than second-year (SY) males (Table 2). The apparent effect of wing chord on trill consistency is likely driven by age effects on both size and consistency: older males sang more consistently and had longer wings than SY males (ASY wing chord 51.2 ± 0.17 mm, SY 49.7 ± 0.36 mm, F(1, 63.33) = 15.63, p = 0.0002, n = 80 observations). When I simultaneously assessed the effects of age and wing chord on consistency, the age effect remained highly significant while wing chord did not (not shown). Age effects did not appear to drive the relationship between tail length and vocal deviation.
Most measures of male quality were not strongly intercorrelated in simple regressions, with the following exceptions. All three playback response measures were positively correlated (song rate with flights across the speaker: r² = 0.11, p = 0.001; song rate with proportion of time close to the speaker: r² = 0.06, p = 0.01; flights across the speaker with time close to the speaker: r² = 0.36, p < 0.0001). These three variables were not collapsed with principal components analysis because they showed different patterns in analyses [27]. Wing chord correlated positively with body condition, tarsus length, and tail length. The correlation between tarsus and tail length only approached significance (p = 0.07). Age affected only wing and tail measures (see above).

Trill and Male Quality: Relation to Male Mating Success
Trill quality did not correlate with polygyny (Table 3). Males that lost a higher proportion of WP success had lower vocal deviation and higher trill consistency, both putatively "better" song characteristics (Table 4). Similarly, males that did not gain EP success in other nests on the site had lower vocal deviation than males that gained EP success, although the males that gained EP success did have higher trill consistency (Table 5). Despite the statistical significance of these patterns at the population level, paired comparisons of EP males to the WP males they cuckolded revealed no differences in trill quality, even with a large sample size (Table 6).
Male quality did not correlate with mating success (Tables 3, 4, 5) and did not differ between EP and WP males in paired comparisons (Table 6). Before correction for multiple testing, males that flew across the speaker more in response to playback tended to be more likely to be polygynous (Table 3), and WP males sang at a higher song rate in response to playback than the EP males that cuckolded them (Table 6). None of these patterns was robust to correction for multiple testing.
Age effects are of particular interest, as they are important in other house wren populations. However, there were no significant relationships between age and mating success (Tables 3, 4, and 5; these analyses account for non-independence using a random effect of male identity). For paired comparisons of EP and WP sires, I had age data on both males for only 19 pairs: in 16 pairs, both males were the same age; in two pairs, a younger male cuckolded an older male; and in the final pair, the older male cuckolded the younger male.

Trill Quality, Male Quality, and Male Mating Success: Relation to Annual Reproductive Success
Neither trill consistency nor vocal deviation correlated with reproductive success, though there was a trend, with a small effect size, toward a positive correlation between consistency and reproductive success (Figure 1, Table 7). The three measures of aggression correlated positively with reproductive success, with moderate effect sizes, but these correlations were not significant after correcting for multiple testing (Table 7). Polygyny, WP success, and EP success all had positive and moderate to strong effects on reproductive success, though the effect of EP success was not significant after correction for multiple testing (Table 7).

Discussion
These results indicate that low vocal deviation and high trill consistency are not used as signals of high male quality in house wrens. Vocal deviation and trill consistency did appear to reflect underlying singing ability, since trill measures were repeatable and correlated with each other. However, these measures of trill quality mostly did not correlate with body condition or health (Table 2), and males with "better" songs did not have higher mating success (Tables 3, 4, 5, and 6) or higher reproductive success (Table 7). In three of the four analyses where mating success and song quality were significantly related, less successful males had better songs, the opposite of what I had predicted, and the effect sizes were small. I therefore conclude that there is little, if any, evidence that vocal deviation and trill consistency affect mating interactions in house wrens.

Song and Male Quality
While most measures of male quality did not correlate with trill quality (also see [24]), older males did sing with higher trill consistency (Table 2), a result that is highly consistent with the current literature. In several other species (reviewed by [33,76]), as in house wrens, older males sing more consistently than males in their first breeding season. Moreover, a longitudinal study in great tits (Parus major) showed that consistency decreased with age among relatively old males [37], a pattern also present in house wrens (though with limited sample size). Relatively young males are thought to improve their song consistency over time as they practice singing, generating the prediction that males with higher song output should also sing relatively consistently (reviewed in [33]). Supporting this prediction, house wrens with more consistent trills sang at higher rates in response to playback. If song rate during playback reflects a male's overall song rate, perhaps these males have simply practiced their songs more and therefore have higher trill consistency. Thus, trill consistency might honestly indicate male age, or the extent to which he has practiced singing, in house wrens.
However, older males did not have higher mating or reproductive success, suggesting that a signal of age may not be useful in this population. Older males did not have higher WP or EP success, were not more likely to become polygynous, and did not fledge more offspring. Sample sizes for age comparisons were somewhat limited.
Trill consistency also correlated positively with another aspect of playback response, the amount of time spent close to the speaker. This result is surprising, since the trill consistency of a playback stimulus does not affect how house wren males respond to playback [30]. However, the effect size is small, suggesting that the statistical significance may simply reflect the high statistical power conferred by the large number of trills measured.

Song Quality Relative to Mating and Reproductive Success
Males that lost a higher proportion of WP paternity sang "better" trills (lower vocal deviation and higher trill consistency), and males that gained EP success in other nests sang with "worse" vocal deviation than males that failed to gain EP offspring elsewhere. These patterns run opposite to the predictions. It is plausible that investment in song trades off against investment in an unmeasured aspect of male quality that confers higher mating success, although in this case the benefit of investing in song, which apparently gives little or no reproductive advantage, is unclear. Males that gained EP success did have higher trill consistency than males that did not, a relationship in the predicted direction, and this relationship may contribute to the trend for males with higher trill consistency to have higher reproductive success (Table 7). However, effect sizes for all of these relationships were small. Moreover, paired comparisons of EP males to the WP males they cuckolded did not reveal the same patterns (Table 6). Because a paired test should be more powerful at detecting biologically relevant patterns of cuckoldry, I suspect that the significant differences between males with and without EP success, and the correlations between trill measures and the proportion of social offspring sired, also reflect the high statistical power conferred by the large number of trills measured.
Differences in mating success could be driven either by male-male competition or by female choice. If male-male competition has the stronger effect on mating success, the lack of a strong effect of song on mating success is not surprising: vocal deviation and trill consistency do not appear to play an important role in male-male interactions in house wrens [30]. If female choice has the stronger effect on mating success, it appears that females do not base their choices on vocal deviation or trill consistency, since these trill measures do not relate to mating success. Alternatively, females may assess males based on spontaneous song but not on songs sung during territorial conflicts; since this study relied primarily on songs recorded during and immediately after playback, such an effect would not have been detected. However, house wrens do not have qualitatively different singing styles during playback and spontaneous singing, and moreover, females may be better able to compare two males' singing ability when those males are countersinging [77], which would suggest that songs recorded during playback should be particularly relevant to females. A direct test of female preferences was not possible in this species.
It is thought that birds perceive frequency ratios rather than frequency bandwidths [78,79], which suggests that it would be more biologically relevant to calculate vocal deviation based on frequency ratio rather than the standard measurement, frequency bandwidth (B. Lohr, pers. comm.; also see [30]). I investigated a ratio-based measure and how it related to male quality, mating success, and reproductive success [80]. However, the ratio-based and bandwidth-based measures of vocal deviation showed similar patterns, and the bandwidth-based measure had, if anything, stronger relationships with male quality and success measures [80].
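The difference between the two frequency measures can be shown with a small sketch. It assumes the usual definition of vocal deviation: the orthogonal distance of a trill from an upper-bound regression line of frequency bandwidth on trill rate. The line's intercept and slope below are placeholders, since in practice they are fitted to the population's trills. A ratio-based version would substitute bandwidth in octaves, log2(f_max / f_min), and refit the line on that scale. All function names are hypothetical.

```python
import math

def bandwidth_hz(f_min, f_max):
    """Standard measure: frequency bandwidth in Hz."""
    return f_max - f_min

def bandwidth_octaves(f_min, f_max):
    """Ratio-based alternative: bandwidth in octaves, log2(f_max / f_min)."""
    return math.log2(f_max / f_min)

def vocal_deviation(trill_rate, bandwidth, intercept, slope):
    """Signed orthogonal distance below the line bandwidth = intercept + slope * trill_rate.

    Larger positive values place the trill further below the performance limit
    (lower performance); the coefficients are hypothetical placeholders.
    """
    return (intercept + slope * trill_rate - bandwidth) / math.sqrt(1 + slope ** 2)
```

Two trills spanning 2000 to 4000 Hz and 4000 to 6000 Hz have identical 2000 Hz bandwidths but differ on the ratio scale (1.0 versus about 0.58 octaves). This is why the two versions of vocal deviation could, in principle, rank trills differently.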
The hypothesis that sexual signals are pivotal in EP mating decisions is commonly cited, but it was not supported in a recent meta-analysis of EP paternity in birds [81]. Similarly, a recent, thorough study of song sparrows (Melospiza melodia), a species where a great deal is understood about how song functions in male-male communication, found no differences in song between EP and WP males [82]. Perhaps researchers have not yet determined which aspects of signals are the salient ones for EP mating, or perhaps EP mating in many species is not driven by differences in male quality or signaling ability.

Male Quality and Mating Success Relative to Reproductive Success
Polygynous males and males that maintained a higher proportion of WP paternity, unsurprisingly, had higher annual reproductive success than monogamous males and males with a lower proportion of WP paternity (Table 7). Males that gained some EP success showed a strong tendency to have higher reproductive success. This result is consistent with the finding that polygyny has a stronger effect on variation in male reproductive success than EP paternity does [48]. In contrast to previous studies, polygynous males in this study did not lose more WP paternity in their secondary nests than in their primary nests (cf. [65]). As a further contrast with previous work, I found no relationship between age and mating success, whereas Soukup and Thompson [65] found that older male house wrens tend to be less likely to be cuckolded and are significantly more likely to be polygynous than younger males. There may be intraspecific variation in EP mating behaviors (e.g., [83]).
Other measures of male quality did not relate to mating or reproductive success, though there were intriguing trends for more aggressive males to have higher reproductive success. Previous work in house wrens shows that EP offspring are not healthier than their WP half-siblings [84], so it is perhaps unsurprising that health measures were not related to EP or WP success. Moreover, body condition generally does not differ between EP and WP males across many species of birds [81], suggesting that the typical measures of body condition are not meaningful in birds, or that body condition is not relevant to EP mating decisions.
While much of the work on house wrens has focused on life history rather than sexual selection, it is interesting to note that we still do not know what male traits confer a mating and reproductive success advantage in this intensively studied species. Nests initiated late in the season may be more likely to contain EP offspring [85] (though not in this population [53]), but this effect appears to be independent of the quality of the male himself [85]. Rare alleles at some loci [86] correlate with EP success, but that pattern was not observed in the same individuals when additional loci were considered [87]. As described in the introduction, male song appears to play a role in attracting females, but the nature of that role, and what aspects of a song make it particularly attractive, remain unclear [45,46,47]. Post-copulatory processes could, theoretically, also create variation in male EP and WP success, but no correlations have been found between EP paternity success and sperm characteristics [80]. The dynamics of mate choice in this species remain a mystery.

Conclusions
Physically challenging aspects of song production may be likely to be honest signals of male quality [6]. However, sexual selection can promote different signal properties in different lineages [8,88], and neither vocal deviation nor trill consistency appears to be under sexual selection in house wrens. Perhaps complexities of house wren song structure complicate the interpretation of these particular parameters for listening birds (as I argue in [30]), and another song parameter, such as complexity itself, could be the target of sexual selection in house wrens.