Determinants of stadium attendance in Italian Serie A: New evidence based on fan expectations

This article aims to analyse the impact of the main determinants of match-day stadium attendance for seven seasons—2012–13 to 2018–19—of the Italian football Serie A. The main element of novelty is that the dataset is split into three sub-categories based on the pre-season fans’ expectations to verify whether the impact of attendance determinants varies depending on teams’ expected performance. Our results—based on Tobit model regressions—identify some significant differences across the three subsets. However, the difference that seems to be the most significant revealed a common preference of Italian fans towards higher quality opponents.


Introduction
Sports economists have a long history of modelling demand at live sports events since seminal work [1][2][3]. Research especially soared in the early 2000s, perhaps due to the exponential commercial growth of professional sport. Generally, sports economists analyse stadium attendance following Borland and Macdonald's model, encompassing consumer preferences, economic factors, quality of viewing, sporting contest and supply capacity [4]. Among these categories of attendance determinants, Rottenberg's [1] uncertainty of outcome hypothesis has garnered the most interest, suggesting that the more uncertain the outcome of a sporting event, the higher the interest will be [5][6][7].
However, empirical research investigating the uncertainty-of-outcome hypothesis provides inconsistent evidence: especially at match level, this hypothesis rarely holds in European football [8][9][10][11][12][13][14]. Since many leagues have a multi-prize structure leading to multiple sub-competitions, the fixture-specific uncertainty may not matter, whereas the importance of that fixture within the league sub-competitions does. Therefore, the number of teams in contention for the different sub-competitions may drive match-level demand in European football rather than uncertainty-of-outcome, as shown by more recent empirical studies [12,13,15-19].
Another explanation of the limited evidence supporting the uncertainty-of-outcome hypothesis in European football is the reference-dependent preferences model [20], which bases fans' decisions on prospect theory [21]. Here, demand is a function of individuals' decision-making under uncertainty, where demand increases as the opponent becomes inferior (loss aversion) or superior (David v Goliath effect [10]): either way, demand increases as match certainty increases. Furthermore, prospect theory suggests that non-expected utility maximisation significantly influences aspects of decisions made under uncertainty [22]. Consequently, competing for a sub-competition within a league which is higher than expected should significantly increase demand in the context of competitive intensity. Likewise, competing for a sub-competition within a league which is lower than expected should decrease demand. Therefore, this article analyses Italian Serie A match-day attendance from 2012-13 to 2018-19 (seven seasons), split into three sub-categories based on the pre-season fans' expectations. This approach follows previous work [16] and allows us to verify whether the impact of attendance determinants varies depending on teams' expected performance.

Materials and methods
The study uses a dataset of 2498 fixtures covering seven seasons of the Italian Serie A, from 2012-13 to 2018-19. The league is made up of 20 teams playing each other in 38 game weeks within each season. Games were analysed starting from the third fixture of each season, as one or two fixtures are not sufficient to differentiate among the different sporting prizes, which is instrumental to creating the variables capturing competitive intensity. Moreover, 22 games were excluded for the following reasons: (i) not played, (ii) played behind closed doors, (iii) played in a city different from the one where the home team was based, due to stadium renovation works or other stadium-related issues, or (iv) played with a reduced stadium capacity, lower than the number of season ticket holders. The dataset was then divided into three subsets based on the expected final position for each club at the beginning of the season. Two types of pre-season expectations were considered, one based on Eurobet "ante-post" odds, the other on the teams' overall payroll (Table 1). The first subset includes teams expected to end the season in the first seven positions, the second subset comprises teams expected to finish in the middle of the table (8th to 13th place), and the third includes those expected to finish at the bottom (14th to 20th place).
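The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and ranking teams by descending payroll to obtain an expected position is one plausible reading of the payroll-based expectation, not a procedure confirmed by the text.

```python
def expectation_subset(expected_position):
    """Map an expected final position (1-20) to subset 1, 2 or 3."""
    if not 1 <= expected_position <= 20:
        raise ValueError("Serie A positions run from 1 to 20")
    if expected_position <= 7:
        return 1  # expected to finish in the first seven positions
    if expected_position <= 13:
        return 2  # expected mid-table finish (8th to 13th)
    return 3      # expected bottom finish (14th to 20th)


def expected_positions_from_payroll(payrolls):
    """Assumption: a higher payroll implies a better expected position.

    payrolls maps a team name to its overall payroll for the season.
    """
    ranked = sorted(payrolls, key=payrolls.get, reverse=True)
    return {team: pos for pos, team in enumerate(ranked, start=1)}
```

With odds-based expectations, the same `expectation_subset` function would apply once the teams have been ordered by their ante-post odds.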
For each of the three groups, we estimated different specifications of a demand model, including variables capturing most of the determinants of attendance identified by [4] and inspired by other works investigating these determinants without differentiating the analysis against fans' expectations [16,17]. Our dependent variable is the number of match-day tickets sold, excluding season tickets, obtained from www.stadiapostcards.com. It is essential to exclude season-ticket holders for match-level analysis as they pre-purchase all fixtures regardless of the peculiar characteristics of a game [8]. In the model, X_ijt is a vector of independent variables, Z a vector of dummy variables, and S a vector of dummies capturing season fixed effects; α, β, and γ are the associated coefficients, and e_ijt is the disturbance term.
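A specification consistent with these symbol definitions might be written as follows. This is a hedged reconstruction rather than the authors' stated equation: the label T for match-day tickets sold, the reading of i, j and t as home team, away team and match date, the subscripts on Z and S, and the absence of an explicit constant are all assumptions made here for illustration.

```latex
T_{ijt} = \alpha X_{ijt} + \beta Z_{ijt} + \gamma S_{t} + e_{ijt}
```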
Tobit regressions [22] with individual cut-off points were estimated to account for the supply capacity constraint on attendance [4] and the consequential truncation of the number of match-day tickets sold at the upper boundary [23]. We used "available" tickets, measured as the difference between the stadium capacity and the number of season ticket sales, as the individual cut-off points. Consequently, 18 observations within the Tobit model were right-censored for the whole dataset. Regressions were performed using the metobit command in Stata 16 software [24]. The explanatory variables were aligned to [4]. To account for economic factors, we included a) the annual unemployment rate of the municipality where the game was played (unemployment); b) market size measured by the total number of supporters of the two teams across the whole Italian territory (home_fans and away_fans) as a result of a survey conducted by the specialised website www.tifosobilanciato.it; c) distance, in km, between the two cities of teams involved in the game (distance), proxying the travel cost for away fans and used in previous works [25,26]; and d) substitution effect, measured by the number of games scheduled at the same time and broadcast on TV (substitutes). It is worth mentioning that, in the seven seasons under investigation, all the Serie A matches were broadcast on Sky: therefore, the inclusion of a dummy variable capturing the "direct" substitute effect is redundant [4,9,10,27-33].
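To make the role of the individual cut-off points concrete, the sketch below computes the "available" tickets for a match and evaluates a right-censored Tobit log-likelihood with a separate upper limit per observation. It is a plain single-level Tobit written for illustration, not a reproduction of Stata's metobit; the function names are hypothetical, and whether the dependent variable enters in levels or logs is left open.

```python
import math


def available_tickets(capacity, season_tickets):
    """Observation-specific upper limit: capacity minus season tickets sold."""
    return capacity - season_tickets


def is_right_censored(tickets_sold, capacity, season_tickets):
    """A match is censored when sales reach the available tickets."""
    return tickets_sold >= available_tickets(capacity, season_tickets)


def _norm_logpdf(z):
    # Log density of the standard normal distribution.
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _norm_logsf(z):
    # Log of the standard normal upper-tail probability 1 - Phi(z).
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def tobit_loglik(y, xb, upper, sigma):
    """Right-censored Tobit log-likelihood.

    y: observed outcomes; xb: linear predictions; upper: per-observation
    censoring limits; sigma: standard deviation of the disturbance.
    """
    total = 0.0
    for yi, mi, ui in zip(y, xb, upper):
        if yi >= ui:
            # Censored: only know that latent demand is at least ui.
            total += _norm_logsf((ui - mi) / sigma)
        else:
            # Uncensored: ordinary normal density contribution.
            total += _norm_logpdf((yi - mi) / sigma) - math.log(sigma)
    return total
```

Maximising this function over the coefficients and sigma would give Tobit estimates; the censored branch is what distinguishes it from ordinary least squares on sold-out matches.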
To account for the quality of viewing factors, we included a) a set of integer (temperature, humidity) and dummy (rain, storm, fog, snow) variables for weather conditions; and b) a set of dummy variables for the timing of contest: working_day, indicating whether a match was scheduled on a weekday or not [17,34,35]; sat_aft, for matches scheduled at 3 pm on Saturdays; sat_eve, for matches scheduled at 6 pm on Saturdays; sat_nig, for matches with an 8.45 pm kick-off on Saturdays; sun_eve, for matches scheduled at 6 pm on Sundays; sun_nig and sun_noon for matches with an 8.45 pm and 12.30 pm kick-off on Sundays, respectively. The dummy capturing matches scheduled at 3 pm on Sundays was excluded from the model since it showed a Variance Inflation Factor (VIF) higher than 10, indicating strong collinearity in our estimates. Its high VIF can be explained by its strong correlation with the variable substitutes. This is unsurprising, since the games with the highest number of substitutes are the traditional 3 pm Sunday kick-offs. We decided to exclude a variable capturing the quality of stadium facilities since there has been limited stadium investment in Italian football. Finally, we included a set of variables capturing the characteristics of the sporting contest [4]. The count of matches in each season (fixture) was included in quadratic form to verify the existence of a non-linear relationship with the number of spectators, as first suggested in [11]. home_rank indicates the position in the standings of the home team before the game, and home_wages is a control variable capturing their relative wage, that is, the ratio between the team's payroll and the average seasonal payroll. The opponent's quality is proxied by their position in the standings (away_rank) and their relative wage (away_wages). home_promotion and away_promotion are dummy variables aimed at capturing whether newly promoted teams attract more fans in home and away games.
goal_average is the sum of the goal averages scored by home and away teams before the game, whereas rivalry is a dummy equal to 1 if the game involves two rival clubs and 0 otherwise. We considered the rivalries identified by Paliotto [36], which are based not only on the clubs' geographical location but also on the historical sensibility of fan groups [37]. The uncertainty of outcome variable (outcome_uncertainty) is calculated as the absolute difference between the home and the away team win probabilities [9], which is more sensitive to the actual gap between teams than draw probabilities [10] and is obtained from the BET365 dataset. The two dummy variables ncs_prize and pcs_prize measure any negative and positive change in the standings during the home team's last two games that determines a change in the prize gained, and aim to capture the league standing effect [2]. Three variables were then used to account for competitive intensity [38][39][40], implying that demand is stimulated when more teams are in contention for multi-prize sub-competitions (qualification to playoffs or international competitions and promotion-relegation, among others). top indicates whether the home team was fighting for the title (1st place), a direct entry to the Champions League (2nd place until 2016-17; 2nd to 4th place in 2017-18 and 2018-19) or an entry to the Champions League qualifying round (3rd place until 2016-17). europa indicates direct entry to the Europa League (4th and 5th place until 2016-17; 5th and 6th place in 2017-18 and 2018-19) or an entry to the Europa League qualifying round (6th place until 2016-17; 7th place in 2017-18 and 2018-19). bottom indicates the avoidance of the last three positions, determining relegation. As of 2017-18, the Italian league qualifies four teams directly to the UEFA Champions League group stage, following a reform in the admission rules to the competition and an improved UEFA ranking coefficient [41].
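As a worked illustration of the outcome_uncertainty variable defined above, the following sketch derives win probabilities from bookmaker odds and takes their absolute difference. The use of decimal odds and the proportional removal of the bookmaker margin are assumptions made for this example; the text only states that the win probabilities come from the BET365 dataset. The function names are hypothetical.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds to probabilities that sum to one.

    Assumption: the bookmaker margin is removed proportionally.
    """
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    total = sum(raw)
    return tuple(r / total for r in raw)


def outcome_uncertainty(home_odds, draw_odds, away_odds):
    """Absolute difference between home and away win probabilities.

    Larger values indicate a more unbalanced (more certain) match.
    """
    p_home, _, p_away = implied_probabilities(home_odds, draw_odds, away_odds)
    return abs(p_home - p_away)
```

Note that a higher value of this variable means less uncertainty, so a negative coefficient on it is what would support the uncertainty-of-outcome hypothesis.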
Moreover, if the Coppa Italia winner did not finish the season in the first five positions (six as of 2017-18), they would gain direct qualification to the Europa League group stage, so that the team finishing 5th (6th as of 2017-18) would gain an entry to the Europa League qualifying round rather than the team finishing 6th (7th as of 2017-18). This happened twice during the period under investigation, in 2012-13 and 2018-19. However, the Coppa Italia final is scheduled at the end of the season; therefore, it is reasonable to assume that, with the season underway, fans still look at the position in the standings as decisive for the Europa League qualification.
The three dummies capturing competitive intensity are functions of the point difference for the home team relating to the league prizes [12,16,17]. These are determined by taking into account three temporal horizons: the next match, the next two (i) and the next three matches (ii). For example, a team was considered in contention for a prize in the first temporal horizon if there was a gap of no more than 3 points. If the home team was in contention for more than one prize among Champions League and Europa League qualification within the same temporal horizon, only the highest prize was taken into account (1 for this prize, 0 for the other prizes). For example, if a team was 3 points ahead of the first team out of the Europa League zone and 2 points behind the Champions League zone, it was considered in contention for the Champions League qualification. In the two-match temporal horizon, if one match (e.g., 3 points) was sufficient for a higher prize (e.g., Europa League qualification) whereas two matches (e.g., 5 points) were required for a lower prize (e.g., relegation battle), we took into account the higher prize. In the three-match temporal horizon, if two matches (e.g., 6 points) were sufficient for a lower prize and three matches (e.g., 7 points) were required for a higher prize, we took into account the lower prize. When we estimated our model with the inclusion of the three-match temporal horizon competitive intensity, games were analysed starting from the fourth fixture of each season, which reduced our dataset from 2498 to 2428 games.
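The selection rule in the examples above can be read as follows: a prize is within reach in a horizon of h matches if the point gap is at most 3h, the prize needing the fewest matches is chosen, and ties go to the higher prize. The sketch below encodes that reading. It is an interpretation inferred from the three examples, not a rule stated explicitly in the text, and the names are hypothetical.

```python
import math

# Prizes ordered from highest to lowest.
PRIZE_ORDER = ["top", "europa", "bottom"]


def matches_needed(point_gap):
    """Matches required to close a gap at 3 points per win (at least one)."""
    return max(1, math.ceil(point_gap / 3))


def contention_dummies(gaps, horizon):
    """Return the three competitive-intensity dummies for one home team.

    gaps maps a prize name to the absolute point gap from its cut-off.
    horizon is the number of upcoming matches considered (1, 2 or 3).
    """
    reachable = [
        (matches_needed(gap), PRIZE_ORDER.index(prize), prize)
        for prize, gap in gaps.items()
        if matches_needed(gap) <= horizon
    ]
    dummies = {prize: 0 for prize in PRIZE_ORDER}
    if reachable:
        # Fewest matches first; ties are broken in favour of the higher prize.
        _, _, chosen = min(reachable)
        dummies[chosen] = 1
    return dummies
```

For instance, `contention_dummies({"europa": 3, "top": 2}, horizon=1)` flags only top, matching the first example in the paragraph above.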

Results and discussion
Our results are shown in Tables 3-10. Our regressions were performed using the whole set of explanatory variables. However, for a clearer presentation of the results, we have divided them into four groups based on Borland and Macdonald's categorisation of the determinants of attendance [4]. Results seem relatively consistent regardless of the type of pre-season expectations used to create the three subsets. All the explanatory variables, except for ordinal (fixture, home_rank and away_rank) and dummy variables, are expressed in natural logs, and the estimated coefficients are interpreted as elasticities. Dummies capturing season fixed effects are not reported for the sake of brevity. Variance inflation factors (VIF), calculated for the whole dataset, are well below 10 for our independent variables, indicating the absence of strong collinearity (see S1 Appendix). Robust standard errors are estimated as the Breusch-Pagan test reveals the presence of heteroskedasticity.

Economic factors
There are similarities and differences among the three subsets regarding the economic factors of attendance, as shown in Tables 3 and 4. Distance impacts all the clubs' attendance regardless of the fan expectations. Less travel time is a good predictor of attendance since increased travel time often represents increased opportunity cost associated with attending a game for the supporters of the away team [27,42-46]. The home team's market size is another factor that shows the same (positive) impact on attendance in all three subsets: a larger fanbase corresponds to a higher demand for tickets. However, the unemployment rate of the city hosting the home team is significant in both tables only for the first subset. The positive coefficient of this variable is not surprising if we consider that previous research showed that the relationship between unemployment and attendance is ambiguous [47,48]. Since the first subset includes all the clubs from the four most populous Italian cities (Rome, Milan, Naples and Turin), the positive coefficient of the unemployment variable may indicate that football attendance is considered a social outlet for unemployed people in the metropolitan areas. This conclusion could indeed be extended to high unemployment areas, if we consider that in Table 4 the coefficient is significant and positive also for the third subset, but negative for the second, and the average unemployment rates across the period considered indicate that the first and third subsets show values (11.14% and 10.76% respectively) significantly higher than the second subset (8.93%). Conversely, the away team's market size was significant only for the other two subsets. Therefore, when clubs are not expecting to fight for top finishing positions, the opposing team's market size is significantly positive. Since those clubs with larger market sizes often resemble those competing for top prizes, three reasons may explain this.
Firstly, there is the loss-acceptance from prospect theory [21] and the possibility of an upset against a team competing for a top prize. Secondly, fans gain utility from watching higher level performances from the opposition. Thirdly, teams with larger market sizes have fan bases spanning across the country. Therefore, demand increases as they play in other areas outside of their locality. The number of games scheduled simultaneously and broadcast on TV does not have any impact on any of the three subsets. Therefore, there is no evidence of a substitution effect with other games. This finding suggests that even single ticket buyers in the Italian Serie A demonstrate a certain degree of loyalty, since the concurrence of other, potentially more appealing, games does not influence the decision to attend a game regardless of the expected team performance.

Quality of viewing
Tables 5 and 6 show the coefficients of the variables linked to the quality of viewing in line with [4]. When considering weather conditions, the only variable that consistently affects attendance is rain, with the expected negative sign [5,28,49-51]. In contrast, the temperature only impacts positively the top 7 clubs' fans in both tables and negatively the second subset in Table 6. This last result may depend on the fact that most games played with warmer weather conditions are scheduled towards the end of the season, when teams expected to finish between the eighth and the thirteenth position could see an increase in meaningless games if they are not in contention for any prize.
Interestingly, kick-off time does not have a consistent impact on attendance at games scheduled for weekends across the three subsets. We can only observe a clear aversion of the top 7 clubs' fans to the Sunday night games in both tables and Saturday evening games in Table 5, where there is also a positive coefficient for the games scheduled at noon on Sundays in the second subset. In contrast, games scheduled for working days attract fewer spectators to home games for teams expected to finish in the first 13 positions. In the analysis of these variables, we find more similarities than differences across the three subsets, as both weather conditions and scheduling of contests do not generally show robust evidence of impact on attendance.

Sporting contest: Game characteristics
Within the game characteristics category of factors (Tables 7 and 8), fixture consistently negatively impacts attendance for teams expected to finish between the eighth and the thirteenth positions. Such a finding may reflect the increase in meaningless games if they are not in contention for any prize as the season progresses, consistent with our findings regarding the negative impact of temperature on the same subset. There is no consistent evidence that the home team position in the standings positively influences attendance, as the home_rank coefficient is significant only for the second subset in Table 7 and for the other subsets, especially the third one, in Table 8. This may suggest that a better position in the standings could have a positive impact especially on supporters of the teams expected to fight to avoid relegation. Regarding away_rank, its coefficient is significant only for the second subset in Table 7. Therefore, only supporters of the teams expected to finish between the eighth and the thirteenth position seem attracted by opponents that are having a better season performance. However, if we consider the away teams' payroll, there was a consistent significant positive impact across all the sub-groups, especially in the third one. Consequently, the model demonstrates, consistently with previous studies [14,17,31,34,35], that the opponents' quality matters to all fans, but more so to fans of lower-performing clubs. Interestingly, the home_wages coefficients show a negative sign for the first and second subsets and a positive sign for the third. This finding possibly reflects how a home team's talent stimulates season-ticket sales rather than match tickets, especially for fans who are used to higher quality players, unlike fans of teams expected to fight to avoid relegation.
The enthusiasm of newly promoted teams' fans, potentially stimulating demand, is not consistently reflected by an increase in attendance at home games (the home_promotion coefficients are significant only in Table 8) but in the away games' attendance, indicating that the promotion effect impacts more on loyal fans, who are most likely to buy season tickets and follow their team to away games. The fact that away_promotion coefficients are higher in the first subset may again indicate the relevance of the opponents' quality: that said, we would need specific data on the number of tickets sold to away team supporters to obtain definitive conclusions. A higher expected number of goals scored in a game does not impact attendance across the three subsets. When teams face a rival opponent instead, ticket sales increase for clubs expecting to finish after the seventh place (sub-groups 2 and 3), implying lower table clubs' fans are more attracted by historical rivalry than big clubs' fans. We also conducted further tests, replacing rivalry with a dummy variable accounting for matches between clubs in the same region and a dummy variable accounting for matches between clubs in the same city. We first kept distance among the explanatory variables, then removed it because it may capture part of the variability, as in regional or local derbies the distance between the two cities is limited. Including distance among the regressors, coefficients for both regional derbies and city derbies are still not significant for the first subset. Conversely, without distance, they become significant for positions 1-7, yet still smaller than for positions 8-20. This further confirms the conclusion that rivalry is more critical to lower-performing clubs' fans than top-performing clubs' fans within the Italian Serie A. Results of these further tests are available upon request.
Table 9. "Uncertainty-of-outcome" determinants of attendance in Italian Serie A: Sub-groups based on BET 365 antepost.

Sporting contest: Uncertainty-of-outcome
The most interesting result of this research is undoubtedly related to the much-debated uncertainty-of-outcome hypothesis (Tables 9 and 10). There is a significant difference between the first subset and the other two. The models for the first subset (positions 1-7) show negative outcome_uncertainty coefficients. However, models for the other two subsets (positions 8-20) show the opposite. Consequently, the uncertainty-of-outcome hypothesis holds for the high performing clubs, suggesting their fans prefer a balanced game. In contrast, all other clubs experience higher demand when one of the two teams shows a clear competitive edge. Consequently, this demonstrates how clubs expecting to finish below seventh place demonstrate reference-dependent preferences, exhibiting traits of loss-aversion and loss-acceptance. The fact that the uncertainty-of-outcome hypothesis is verified for the top 7 clubs but not the other two subsets may be simply a confirmation that the quality of opponents matters to all fans [17,34,35]. Comparing the top 7 teams' expected performance with their actual performance, regardless of whether predictions are based on betting odds or team wages, for five out of the seven seasons considered only one team did not manage to finish in the top 7, and only two teams for the other two seasons (2013-14 and 2014-15). Considering that the difference between home and away team win probabilities heavily depends on the quality and current performance of the teams involved, arguably the most balanced games for the top 7 teams are those played against teams in the same subset.
Our evidence confirms this by a significantly lower average absolute difference between the two teams' winning probabilities when both teams involved in a match belonged to the first subset (0.13 when based on "ante-post" betting odds, 0.23 when based on team wages) than when a team in the first subset hosted one in the second (0.53 for both grouping methods) or the third (0.61 and 0.60 respectively).
Fans supporting clubs expected to finish in the middle of the table are attracted by games where the absolute difference between the two teams' winning probabilities is higher. Therefore, this is consistent with a convex relation between match-day tickets sold and home win probabilities emerging from an attendance model with reference-dependent preferences and loss-aversion [19]. The analysis of the average absolute difference between the two teams' winning probabilities shows that the most unbalanced games were those where a team in the second subset hosted a team in the third (0.30 for both grouping methods). In these games the odds were, on average, in favour of the home teams, as the average non-absolute difference is equal to 0.30 for the first grouping method and 0.20 for the second. Consequently, this confirms loss-aversion characteristics for fans of teams expected to finish in the middle of the table as they are more attracted by games where their favourite team has higher winning probabilities. Games against teams in the first subset are, as expected, more unbalanced (average absolute difference equal to 0.25 for the first grouping method, 0.24 for the second) than games against teams in the same subset (0.21 and 0.19, respectively). The odds, on average, logically favour the away teams from the first subset (average non-absolute difference equal to -0.22 for both methods) and the home teams when playing games against teams in the same subset (0.18 for the first grouping method, 0.17 for the second). Therefore, regardless of whether home fans enjoy the possibility to admire or upset stronger opponents, quality still matters more than uncertainty-of-outcome for fans of clubs in the second subgroup.
In the third subset, comprising teams with the lowest overall quality (the average team wage is the lowest regardless of grouping method, see S2 Appendix), we also verify a higher preference for unbalanced games. The analysis of the average absolute difference between the two teams' winning probabilities indicates that games where teams expected to fight to avoid relegation hosted teams in the first subset show higher average values (0.35 for the first grouping method and 0.36 for the second) than the other games (0.19 and 0.20 respectively). If we consider the average non-absolute difference, we also find that the differential in favour of the home teams is higher when playing against teams in the same subset (0.17 for both grouping methods) than when playing against teams in the second subset (0.04 for the first grouping method and 0.06 for the second). This seems consistent with the above-mentioned attendance model with reference-dependent preferences [20].
Therefore, if we consider the quality of the opponents, the positive impact of uncertainty-of-outcome in the first subset is not necessarily in contradiction with the negative impact shown in the other two subsets. A higher opponents' quality corresponds to a higher uncertainty of outcome for the top 7 clubs and a lower uncertainty of outcome for the others. Under this consideration, it indicates that opponent quality matters to all fans regardless of expected performance, a notion also demonstrated by the above-mentioned positive significance of away team wages coefficients for all the subsets.
Finally, further differences emerge when analysing the league standing effect, with the first subset (positions 1-7) sensitive both to positive changes in the current position leading to contention for a better prize and to the competitive intensity variables. Not surprisingly, clubs in the first subset see increased match-day ticket demand when fighting for the league title, Champions League qualification or Europa League qualification, with a clear preference for the first two. In contrast, bottom shows a significant coefficient only when the three-game horizon is taken into account, a situation that is more likely to occur in the early season when a top team may have had a bad start but is still not very distant from better prizes either. Europa League qualification seems to be an attractive prize for fans of clubs in subgroup 2 (positions 8-13), but only in Table 10, whereas demand consistently increases when they are fighting relegation, which, based on the pre-season expectations, would be a considerable underperformance. This is similar for teams expected to finish in the last seven positions, as coefficients for bottom are significant in all the specifications, showing increased ticket sales for games when league survival is in doubt. Arguably, this demonstrates it could be financially more profitable, in terms of match-day revenue, to be in a relegation battle than secure in a mid-table position. In Table 9, we can also see that, not surprisingly, fighting for better prizes, a situation that is again more likely to occur in the early season, positively impacts attendance in the last subset. These results shed new light on previous work on European football, where the analysis was conducted without differentiating teams based on fan expectations [12,16,17] and the league standing effect turned out to be mostly insignificant, whereas being in contention for sporting prizes generally showed a positive impact on attendance.

Conclusions
This article investigates the main determinants of attendance for the Italian Serie A. The large dataset enables us to create, for the first time, three subsets to explore potential similarities and differences in the behaviour of people supporting clubs expected to have different seasonal performances. This allows a more accurate consideration of the effects of fan expectations on attendance compared to Bond & Addesa [18], where they were simply embedded in the model for the whole dataset.
While we find some similarities across the three subsets, especially concerning weather conditions, scheduling and number of TV substitutes, a significant difference between the first subset and the other two seems to emerge when considering the sporting contest characteristics. Fans supporting teams expected to finish in the first seven positions seem to be particularly attracted by more balanced games, those supporting the other teams by more unbalanced games. A more in-depth analysis shows how these two results are not necessarily in contradiction, as this may indicate a general preference of Italian fans towards higher quality opponents. This notion is further supported by the positive significance of the variables capturing the quality of away teams for all the subsets. That said, higher home win probabilities are also a decisive factor in the second and third subsets, which seems consistent with the attendance model with reference-dependent preferences.
Regarding competitive intensity, Europa League qualification still has a certain appeal to top 7 clubs' fans, perhaps representing a consolation prize, and for fans of clubs in the second subset. Surprisingly, fans of teams expected to finish between 8th and 13th positions are more attracted by games where their team is fighting to avoid relegation, which, less surprisingly, is also the most appealing prize for fans of teams expected to be in contention for it. The positive coefficients of rivalry only in the second and third subsets suggest that winning against rival teams may be considered an additional "prize" for mid-table and small clubs.
While our analysis presents significant differences in fan behaviour when accounting for the sub-groups of fans, based on their expectations, the main difference turned out to reveal a common preference of Italian fans towards higher quality opponents. Future research may try to replicate this study for other countries/leagues to verify whether this fan preference would also emerge in other contexts.