
Social media perceptions of college football performance and season length 2019–2023

  • Michael L. Smith ,

    Contributed equally to this work with: Michael L. Smith, Austin B. Berenda

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing

    smit4785@purdue.edu

    Affiliation Department of Agricultural Economics, Purdue University, West Lafayette, Indiana, United States of America

  • Austin B. Berenda ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Agricultural Economics, Purdue University, West Lafayette, Indiana, United States of America

  • Valerie Kilders ,

    Roles Conceptualization, Methodology, Validation, Writing – review & editing

    ‡ VK, NOW, CB,BN,ZN and LB authors also contributed equally to this work.

    Affiliation Department of Agricultural Economics, Purdue University, West Lafayette, Indiana, United States of America

  • Nicole Olynk Widmar ,

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Writing – review & editing

    Affiliation Department of Agricultural Economics, Purdue University, West Lafayette, Indiana, United States of America

  • Courtney Bir ,

    Roles Methodology, Validation, Writing – review & editing

    Affiliation Department of Agricultural Economics, Oklahoma State University, Stillwater, Oklahoma, United States of America

  • F. Bailey Norwood ,

    Roles Methodology, Validation, Writing – review & editing

    Affiliation Department of Agricultural Economics, Oklahoma State University, Stillwater, Oklahoma, United States of America

  • Zack Neuhofer ,

    Roles Conceptualization, Data curation

    Affiliation Department of Agricultural Economics, Purdue University, West Lafayette, Indiana, United States of America

  • Lauren Bales

    Roles Data curation, Investigation

    Affiliation Department of Agricultural Economics, Purdue University, West Lafayette, Indiana, United States of America

Abstract

College Football carries a strong social, cultural, and economic importance in the United States with teams, universities, the public and others leveraging social media and the online sphere to promote and discuss events and happenings. In our study, we quantify the mentions and underlying public sentiment of online posts related to American College Football originating in the 2019–2023 seasons on social and news media in the United States. We complement this with an analysis of how team performance clustered into conference levels relates to social media data using ordinary least squares. We find the impact of the 2020 season, which had pandemic-induced schedule adjustments, was felt across the sports landscape, but impacted different conferences in diverse ways. We further observe that public sentiment during the season tends to be higher in Power Five conferences that have a lower winning percentage. Additionally, our results suggest that the COVID season corresponds to decreased mentions. The effect on sentiment is less clear, but we more generally find that the winning percentage (positively) predicts sentiment.

1. Introduction

North American college football has emerged as a culturally significant sport over the past century, specifically in the United States. The game has continually evolved over the decades to its current form. Long gone are the days of ‘three yards and a cloud of dust’ football [1], which was often heard over the radio and was accessible only to local audiences in the mid-twentieth century. Today, college football is the most profitable National Collegiate Athletic Association (NCAA) sport [2] and is shown on TV all over the country. Top matchups consistently draw over 5 million viewers in the regular season and nearly 30 million viewers in the post-season [3]. The TV contracts are correspondingly large: the television contract for the 2024 college football playoffs alone exceeds $470 million, and contracts with individual conferences to broadcast college football and other sports can surpass $1 billion annually [4].

In recent seasons, programs have expanded to not only focus on TV contracts, ticket sales, and merchandise sales but also embrace the synergistic opportunities provided by social media. Some hope the growth of a program’s online presence can build and expand a brand and enhance program interaction with fans [5], but it also adds to the proliferation of college football’s influence on society and culture.

We extend previous studies by employing social media listening to quantify online mentions and the underlying public sentiment related to college football during the 2019–2023 seasons. Relevant literature on social media engagement and sentiment concerning sports is not widespread. Where it does exist, it is often limited to studying the effect of a few games in a professional sports league. For example, research into emotional contagion and solidarity in social media users commenting on NFL games finds that anger (among social media users) encourages discussion on social media, while sadness reduces one’s likelihood of discussing the outcome of a game [6]. Relatedly, previous research into the spatial and temporal relationships of tweets about professional basketball games (specifically in the NBA playoffs) has found that winning increases the share of posts with a positive sentiment and that losing produces relatively more negative posts for a short period, but the authors did not calculate the percentage change in the share of positive posts caused by a loss [7].

Similarly, also looking at the NBA, Gong et al. (2020) found that when consumers (or fans) begin to suspect a team is tanking, this is reflected on social media, which subsequently impacts game attendance [8]. “Tanking” refers to a team deliberately underperforming to improve prospects for a better draft pick in the off-season, in hopes of improving its long-run performance [9]. Specifically, they found that a higher volume of social media mentions suspecting the team of tanking is followed by lower attendance at the next game, but it does not generate long-run effects into future seasons. The same question cannot be directly applied to collegiate teams because they have a different incentive structure: new players are acquired through recruiting efforts, not a league-wide draft. Instead, when looking at college football, understanding the relationship between conference performance and social media activity requires a longer-term examination of historical data on both metrics.

We aim to fill this gap by leveraging social media listening data in conjunction with conference performance metrics to uncover patterns and trends that reveal how on-field performance translates to digital engagement and online sentiment. In addition, we examine how the public responded online to the differing strategies—resulting in varied season lengths—adopted by college football teams nationwide in response to the COVID-19 pandemic. The COVID-19 pandemic presents a natural case study for this research question as it had a major impact on professional as well as amateur sports throughout the United States and beyond.

Following the initial nationwide COVID-19 outbreak in the United States, significant and widespread closures, including schools, public buildings, and most public places, disrupted the normalcy of nearly all daily activities by March 2020. By March 9th, 2020, all major sporting events had ceased in the US, including in the National Basketball Association (NBA) and the National Hockey League (NHL), both of which were nearing completion of their regular seasons. Additionally, Major League Baseball postponed its season start. In collegiate sports (all of which are governed by the NCAA), the women’s and men’s basketball tournaments were canceled. The most obvious result of this was reduced revenue from ticket sales [10]. Some of these leagues were able to resume activity during 2020 and offered enhanced digital platforms to immerse fans. These platforms often provided data analytics and player tracking, while actively leveraging social media to engage fans [11].

One of the primary effects of the pandemic on college football was a significant reduction in season length and travel schedules. While the college football playing season runs from August to early January, the operation runs year-round [12]. Spring and summer practices are an important time for a coaching staff to develop talent, integrate new players, and, more recently, court transferring players while convincing current players to stay. The emergence of COVID-19 restrictions on facilities and gatherings created significant hurdles to bringing teams together for spring training [13].

In 2020, the college football season was slated to begin in late August. Each of the Power Five conferences committed to reworking its season schedule to reduce or eliminate games played against opponents outside of the conference (known as non-conference games). Power Five Conferences are defined as the Atlantic Coast Conference (ACC), the Big Ten, the Big 12, the PAC 12, and the Southeastern Conference (SEC). The exact date of Week 0 depends on the season but generally falls around Labor Day weekend. In 2020, the Power Five conferences decided to delay the start of their seasons by varying lengths of time [14–17]. Additionally, college football has a few teams that are not members of any conference, including the University of Notre Dame, the United States Military Academy (West Point), the University of Connecticut (UConn), and the University of Massachusetts Amherst (UMASS). Some of these “independents” delayed the start of their seasons as well.

Once the 2020 college football season began, some additional games were canceled due to COVID-19 case levels on team rosters. As in other sports, attendance at these events was strictly controlled. As the season concluded, a post-season bowl boycott movement emerged, resulting in 39% of bowl games being canceled [18].

We leverage variation in season structures over time—comparing the 2020 season with other seasons in our dataset—as well as across the Power 5 conferences, due to their differing responses to the pandemic. We find that the impact of the 2020 season was felt across the sports landscape but impacted different conferences in diverse ways.

We build on previous studies of social media reactions to the COVID-19 pandemic’s impacts on sports and the intersection of public health and related social phenomena [19]. Our study can help researchers better understand the level of engagement that social and digital media users have with college football. We further provide insights into the relationship between public discourse, team performance, and season length. This information can inform decisions by brand managers and marketers in the athletic departments of the Power Five football teams as well as beyond the confines of college football.

Furthermore, we present a combined dataset that merges social media data with on-field team performance data, both aggregated by conference. The result is a novel dataset that can be used by researchers and students to examine public opinions of a quickly changing sports environment.

2. Methods

Social media listening

To capture the public’s social media discourse about college football and assess responses to different season lengths, we employ social media listening facilitated via the Quid platform (formerly known as NetBase and then NetBase Quid). The methodology employed in our analysis follows and builds upon previous studies using social media listening. Widmar et al. (2020) hypothesized that social media sentiment can be viewed as a performance measure, and Lai et al. (2023) highlighted the relationship between public sentiment towards cruises and resulting stock performance, which was unique in probing public sentiment on difficult-to-quantify goods [20,21]. Similar explorations include a study of the connection between social media sentiment and federal data on water quality [22]. Focusing more on the impact of specific events, other studies employing social media listening have looked at the relationship between social interactions and disaster events [23], baby formula shortages [24], as well as threats posed by mosquitoes [25].

In line with these studies, we assessed online and social mentions of topic-related terms and the corresponding sentiment of the mentions across social media platforms, online news forums, blogs, etc. A mention constitutes a post about, or with reference to, a keyword or “included term” used to parameterize the searches. The research team generated inclusionary search terms such as the school name (e.g., “Ohio State Football” and “Michigan Football”, two of the teams in the Big Ten conference cluster), each school’s team brand name (e.g., “Buckeye Football” and “Wolverine Football”), each bowl game played in the past 4 seasons (e.g., “Rose Bowl” and “Fiesta Bowl”), and key terms referring or relevant to the college football season. A complete list of schools and bowls included in the primary terms is provided in Table 1 (and the complete list of primary terms is in Appendix Table 1 in S1 File). Each keyword term was duplicated with a hashtag to account for those mentions (e.g., “#BuckeyeFootball” and “#WolverineFootball”, which would both be collected as part of the Big Ten mentions and net sentiment).

Independents, which are teams that are not members of any conference, include the United States Military Academy (West Point), the University of Connecticut (UConn), and the University of Massachusetts Amherst (UMASS). The University of Notre Dame is included in our dataset as a member of the ACC despite playing football as an Independent most seasons. Notre Dame joined the ACC for the 2020 season and regularly plays ACC opponents during seasons as an Independent. This paper analyzes Power Five quality teams at a conference level; omitting Notre Dame due to its independent status would eliminate a major fanbase, as major teams have been defined as Power Five or Notre Dame. While Notre Dame is not officially a member of the ACC, it had to play only ACC opponents in 2020 due to COVID restrictions, and it is considered a member of the ACC for scheduling purposes, including frequent play against ACC opponents and the ability to qualify for bowl games under bids reserved for the ACC [26,27].

Table 1. List of schools searched with conference groupings included.

https://doi.org/10.1371/journal.pone.0325840.t001

When using social media listening, it is common for some returned posts to match a keyword but carry a different meaning. As such, researchers must tune the search to include only the intended meaning of the keywords. Tuning is completed by excluding terms. A list of excluded terms is provided in Appendix Table 2 in S1 File. This list was developed by examining returned results that were not relevant to the intent of the search. These terms relate to other versions of football, including football in primary school, professional football, and the sport of soccer, which is widely known as fútbol.
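The include/exclude logic described here can be illustrated with a minimal sketch. It is not Quid's query syntax, which the paper does not show; the function name `is_relevant`, the sample posts, and the excluded terms below are hypothetical (the actual excluded terms are listed in Appendix Table 2 in S1 File).

```python
def is_relevant(post, include_terms, exclude_terms):
    """Keep a post only if it matches an included term and no excluded term."""
    text = post.lower()
    # Any excluded term disqualifies the post, even if an included term matches.
    if any(term.lower() in text for term in exclude_terms):
        return False
    return any(term.lower() in text for term in include_terms)


# Included terms follow the paper's examples; excluded terms are hypothetical.
include_terms = ["Buckeye Football", "#BuckeyeFootball"]
exclude_terms = ["soccer", "high school"]

posts = [
    "Buckeye Football looked sharp on Saturday",
    "Our high school hosts a Buckeye Football camp",
    "Great soccer match tonight",
]
kept = [p for p in posts if is_relevant(p, include_terms, exclude_terms)]
```

Simple substring matching is used for brevity; a production search would likely need tokenization and phrase handling.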

A series of thematic sub-searches, which allow the parsing out of mentions and sentiment related to specific topics, were conducted. A sub-search was conducted for each of the Power Five Conferences, which are the top performers in the sport, to better understand how online and social media conversation developed around each of the major conferences. Doing so also allows us to test whether online and social media engagement and tone of conversation differ with season length, which was determined by the conference a team played in. Each conference sub-search uses team names in the same format as the primary search.

Quid quantifies mentions about a topic, but also analyzes the tone, or positivity versus negativity, of the search results (sentiment). The sentiment is summarized via a sentiment score that represents the balance of positive relative to negative mentions. As such it can be expressed as

$$NS_t = \frac{P_t - N_t}{P_t + N_t} \times 100$$

where $NS_t$ expresses the net sentiment in week $t$, and $P_t$ and $N_t$ are the counts of positive and negative mentions in that week. As a ratio it is bounded between +100 (only positive posts) and −100 (only negative posts). Whether a mention is positive, negative, or neutral is determined by Quid’s proprietary Natural Language Processing (NLP) model, which assigns an individual sentiment to search results by examining the context of the inclusionary search terms. Posts can be classified as positive, negative, or neutral, building upon previous literature studying the use of NLP on social media posts about professional sports [28]. Neutral posts are not used in the calculation of net sentiment scores; these neutral posts are used in other calculations and insights including mention counts [29]. Table 2 shows the annual count of positive, negative, and neutral posts as well as sources for data collection.
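As a worked illustration of a net sentiment score with the stated bounds, the sketch below assumes the common form (positive − negative) / (positive + negative) × 100, with neutral posts excluded. Quid's internal calculation is proprietary, so the function name and the zero-post convention are assumptions.

```python
def net_sentiment(positive: int, negative: int) -> float:
    """Net sentiment on a -100..+100 scale; neutral posts are ignored."""
    total = positive + negative
    if total == 0:
        # No polarized posts: the ratio is undefined, so 0.0 is returned
        # by convention (an assumption, not taken from the paper).
        return 0.0
    return 100.0 * (positive - negative) / total


# Made-up weekly counts: 75 positive and 25 negative posts.
example_score = net_sentiment(75, 25)
```

With these made-up counts the score is +50, meaning positive posts outnumber negative posts three to one among polarized mentions.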

Table 2. Sources (and counts) of data collection and counts of positive, negative, and neutral scores on sentiment of posts.

https://doi.org/10.1371/journal.pone.0325840.t002

While Quid’s NLP is highly advanced, we made minor adjustments to recategorize the sentiment of certain terms which may be initially misclassified once accounting for the specific context of our search and analysis. For example, in the context of college football, terms like offensive are not necessarily negatively connotated. Thus, we reclassified such terms to better reflect the actual sentiments that were being conveyed by the posts’ authors. In total, we reclassified three terms: defensive, offensive, and upset.

Searches were conducted for the geographies of the US and US minor outlying islands, exclusively in English. The overall search period ran from August 18, 2019, through January 20, 2024. Weekly data on the quantity of mentions and sentiment of search results were collected for the 2019, 2020, 2021, and 2022 seasons on November 16–17, 2023. Data on the 2023 season were collected February 13–14, 2024, to allow for the completion of the season prior to data collection. The exact dates used for data collection for each season are detailed in Appendix Table 4 in S1 File.

On-field conference performance data

A secondary dataset on individual team performance was developed using the Sports Reference website (sports-reference.com). Sports Reference is a website collecting and reporting data relevant to sports in the US. We collected information about the weekly performance of the teams, game scores, game locations, dates of the games, the status of the game as conference or non-conference, and the status of the team’s opponent as Football Bowl Subdivision (FBS) or Football Championship Subdivision (FCS) (“Non-Major” in the dataset). Conferences were allocated according to how they were structured in the social media dataset. The dataset also compiled season performance as the season progressed. These weekly team performance data were aggregated at the conference level to create a breakdown of overall performance by week of each conference. Note that the Power Five conference winning percentages may be above 50% because each team will play a handful of non-Power Five opponents per season. Non-Power Five teams are not included in this research.
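The aggregation from team-level results to a conference-by-week winning percentage can be sketched as follows. This is a minimal illustration, assuming each record is one team's result in one game and that the weekly percentage uses only that week's games. The paper does not say whether the weekly figure is per-week or cumulative, and the function name and sample records are hypothetical.

```python
from collections import defaultdict


def weekly_conference_win_pct(team_games):
    """Aggregate (conference, week, won) team-game records to a winning
    percentage for each (conference, week) pair."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for conference, week, won in team_games:
        key = (conference, week)
        played[key] += 1
        wins[key] += 1 if won else 0
    return {key: 100.0 * wins[key] / played[key] for key in played}


# Made-up records: three Big Ten teams and two SEC teams in week 1.
sample = [
    ("Big Ten", 1, True),
    ("Big Ten", 1, True),
    ("Big Ten", 1, False),
    ("SEC", 1, True),
    ("SEC", 1, False),
]
result = weekly_conference_win_pct(sample)
```

A cumulative version would carry the win and game counts forward across weeks instead of resetting them each week.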

The complementary secondary data permit us to examine the relationship between on-field performance and online and social media results. Specifically, we can assess whether the quantity or sentiment of online and social media mentions exhibits a relationship to conference weekly performance.

Statistical and econometric analysis

To assess whether mentions or net sentiment reveals a significant relationship to conference weekly performance, we employed linear regression using ordinary least squares (OLS) [30]. The baseline models are specified as:

$$NS_{it} = \beta_0 + \beta_1 \mathit{COVID}_t + \beta_2 \mathit{WinPct}_{it} + \varepsilon_{it} \tag{1}$$

$$M_{it} = \gamma_0 + \gamma_1 \mathit{COVID}_t + \gamma_2 \mathit{WinPct}_{it} + \nu_{it} \tag{2}$$

where $NS_{it}$ and $M_{it}$ correspond to the net sentiment and mentions in conference i in week t, respectively. $\mathit{COVID}_t$ is a dummy variable equal to 1 if week t falls into the 2020 regular season (and the accompanying post-season), and 0 otherwise. $\mathit{WinPct}_{it}$ is the winning percentage of the conference in the given week. Including this variable allows us to test the relationship between on-field performance and social media conversations. Lastly, $\varepsilon_{it}$ and $\nu_{it}$ are error terms for sentiment and mentions, respectively. For the model which includes all conferences, we include robust standard errors, assuming that these would cluster at the conference level. We also isolate mentions and sentiment of each conference and use their on-field performance as a predictor term (along with the COVID-19 dummy variable).
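To make the baseline specification concrete, the sketch below fits an OLS regression of an outcome on a COVID dummy and a winning percentage by solving the normal equations in pure Python. The data are made up, the function names are hypothetical, and the clustered robust standard errors described above are not computed; a statistics package would normally be used for those.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def ols(rows, y):
    """Return [intercept, slope_1, ...] minimizing squared error.
    rows holds the regressors for each observation, without the intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up observations: (COVID dummy, weekly winning percentage).
rows = [(0, 50.0), (0, 60.0), (1, 50.0), (1, 70.0), (0, 80.0)]
# Outcome generated exactly as 30 - 5*COVID + 0.2*WinPct, so OLS recovers it.
y = [30 - 5 * c + 0.2 * w for c, w in rows]
coefs = ols(rows, y)
```

Because the outcome here is generated without noise, the fitted coefficients match the generating values; real data would leave residuals and require inference on the estimates.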

To control for unobserved, time-invariant characteristics of each conference we further estimate expanded OLS regressions (equations 3 and 4) that include dummy variables for each of the Power Five conferences, omitting the Pac 12 as the reference category. These dummy variables capture conference-specific fixed effects, accounting for any latent, stable factors unique to each conference. Robust standard errors are computed, clustered at the conference level, to address potential intra-cluster correlation in the error terms. These expanded OLS take the form:

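$$NS_{it} = \beta_0 + \beta_1 \mathit{COVID}_t + \beta_2 \mathit{WinPct}_{it} + \boldsymbol{\delta}' \mathbf{C}_i + \varepsilon_{it} \tag{3}$$

$$M_{it} = \gamma_0 + \gamma_1 \mathit{COVID}_t + \gamma_2 \mathit{WinPct}_{it} + \boldsymbol{\theta}' \mathbf{C}_i + \nu_{it} \tag{4}$$

Where $\boldsymbol{\delta}$ (and $\boldsymbol{\theta}$ in the mentions equation) is the vector of coefficients, $\mathbf{C}$ is the vector of dummy variables j through n, and the other variables are the same as they were in equations (1) and (2).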
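The conference dummy variables, with the Pac 12 omitted as the reference category, can be built as in the sketch below. The ordering of the conferences and the function name are assumptions made for illustration.

```python
CONFERENCES = ["ACC", "Big 12", "Big Ten", "Pac 12", "SEC"]
REFERENCE = "Pac 12"  # omitted category, as stated in the text


def conference_dummies(conference):
    """Return 0/1 indicators for every conference except the reference.
    The reference conference maps to all zeros."""
    if conference not in CONFERENCES:
        raise ValueError(f"unknown conference: {conference}")
    return [
        1 if conference == name else 0
        for name in CONFERENCES
        if name != REFERENCE
    ]


# Indicator order is ACC, Big 12, Big Ten, SEC.
sec_row = conference_dummies("SEC")
pac12_row = conference_dummies("Pac 12")
```

Dropping one category avoids perfect collinearity with the intercept, so each remaining coefficient is read as a difference from the Pac 12.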

While fixed effects control for unobserved heterogeneity across conferences, the structure of our dataset (observations nested within weeks across multiple conferences) introduces a hierarchical or multilevel structure. Specifically, each calendar week is observed repeatedly across all conferences.

The repeated nature of the data precludes formal declaration of time series, making tests like the Durbin-Watson test untenable. To account for potential week-level unobserved heterogeneity and correlation among observations that share the same week, we estimate multilevel mixed-effects models with random intercepts for weeks. This approach allows for a unique baseline level for each week, capturing shared shocks or temporal patterns that affect all conferences during a given week. These mixed-effects models take the form:

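$$NS_{it} = \beta_0 + \beta_1 \mathit{COVID}_t + \beta_2 \mathit{WinPct}_{it} + u_t + \varepsilon_{it} \tag{5}$$

$$M_{it} = \gamma_0 + \gamma_1 \mathit{COVID}_t + \gamma_2 \mathit{WinPct}_{it} + u_t + \nu_{it} \tag{6}$$

Where $u_t$ is a random intercept for an index of weeks and each other parameter remains as it was before.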

Lastly, to compare average weekly mentions and sentiment between COVID-affected and non-COVID seasons, we conduct two-tailed t-tests assuming unequal variances. Recall that the 2020 season was shortened and had no inter-conference play until the postseason. This comparison is conducted for the national dataset, as well as the Power Five conference individual datasets. The timeline studied began the week of the first college football game, which is generally around the 3rd full week of August. Due to the high cost of collecting social media datasets specific to each individual team, we analyze playing performance and social media activity at the conference level. We contribute the attached dataset to provide opportunities for future analysis and classroom exercises, which could entail analysis of individual team performance on their conference social media data or collecting social media data at the team level.
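A t-test assuming unequal variances is commonly known as Welch's t-test. The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for two made-up samples. The function name and data are hypothetical, and the p-value step, which needs the t distribution, is left to a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean (sample variance / n).
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Made-up weekly values for a shortened season and for other seasons.
covid_weeks = [10, 12, 14]
other_weeks = [20, 22, 24, 26]
t_stat, dof = welch_t(covid_weeks, other_weeks)
```

A negative statistic here indicates a lower mean in the first sample; whether the difference is significant depends on the two-tailed p-value at the computed degrees of freedom.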

3. Results and discussion

Descriptive results

Table 3 displays the total and average mentions and sentiment broken down by conference; only mentions occurring during the playing season were included. The SEC and the Big Ten dominate each season in terms of mentions. Rankings of the highest sentiment conferences vary by season, but over the 5 seasons the Big 12 and the Pac 12 have the highest sentiment, indicating the highest ratio of positive to negative posts. The Big Ten and the SEC sentiment scores rank 4th and 5th over the 5 seasons, albeit still with a positive sentiment score of around 37%. Curiously, this suggests the most successful conferences have the highest number of mentions but also the lowest weekly sentiment. This could be because it is difficult to attribute sentiment to a single team, especially in comments mentioning multiple teams. Effectively, a fan might love watching their team win as much as they love watching another team lose. Such metrics could be a result of fans from less successful conferences expressing their disdain for the Big Ten and SEC. Additionally, this might suggest the games elicit strong levels of emotion due to the nature of the game.

Table 3. Rankings of top mentions and sentiment by power five conference.

https://doi.org/10.1371/journal.pone.0325840.t003

Looking at performance on the field, we find the SEC had the highest winning percentage in 4 out of the 5 seasons (Appendix Table 3 in S1 File). The Big Ten was the second-place finisher in 3 of the 5 seasons and the PAC 12 consistently ranked last in yearly conference winning percentage. However, each conference except the PAC 12 had a seasonal winning percentage above 50% and the PAC 12 is rarely far below 50%. This makes sense given the fact that most games played during the season are against conference opponents. It can be inferred that each conference will have a winning percentage of 50% for in-conference games. In the Big Ten, Big 12, and Pac 12, every team generally plays 9 in-conference games during the season, while teams in the SEC and ACC generally play 8. If each of the 14 Big Ten teams plays 9 conference games, the conference records 126 team-game results from 63 contests, and each contest yields a winner and a loser. Thus, each conference has a winning percentage of around 50%. Deviations from 50% are determined by non-conference games. The more non-conference games that are won by individual teams, the higher the conference’s total winning percentage will be.
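The arithmetic behind this argument can be checked with a short sketch. It counts team-game results, so each in-conference contest appears twice (once as a win, once as a loss). The 14-team, 9-game figures follow the Big Ten example, while the non-conference record of 30 wins and 12 losses is purely hypothetical.

```python
def conference_win_pct(n_teams, conf_games_per_team, nonconf_wins, nonconf_losses):
    """Season winning percentage of a conference from team-game results."""
    conf_results = n_teams * conf_games_per_team  # each contest counted twice
    conf_wins = conf_results / 2  # every in-conference contest has one winner
    wins = conf_wins + nonconf_wins
    total = conf_results + nonconf_wins + nonconf_losses
    return 100.0 * wins / total


# In-conference play alone always nets out to 50%.
in_conference_only = conference_win_pct(14, 9, 0, 0)
# Hypothetical non-conference record of 30 wins and 12 losses.
with_nonconference = conference_win_pct(14, 9, 30, 12)
```

Under these assumed numbers the conference finishes at roughly 55%, with the entire deviation from 50% coming from non-conference results.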

Statistical results

Table 4 presents the results of the OLS model, analyzing weekly mentions and sentiment related to college football and teams affiliated with individual conferences over the five seasons studied. Looking first at the national data, we find no statistically significant relationship between either conference winning percentage or the COVID indicator and either net sentiment or mentions. The same holds for the ACC.

However, results for the other conferences are more mixed. We find a statistically significant positive relationship between net sentiment and weekly winning percentage for the Big 12, Big Ten, Pac 12, and SEC, with the strongest response observed for the SEC. In contrast, the COVID variable is negatively and significantly associated with net sentiment for the Big 12, Big Ten, and Pac 12. These results could suggest that fans are marginally more passionate in reaction to a win in the SEC, or that fans who comment on SEC and ACC games cared less about COVID-19 (possibly because their conferences decided to start the playing season earlier than the other conferences). Turning to mentions, COVID is negatively associated with weekly mentions at a statistically significant level (at least at the 10% level) for all conferences except the ACC. Additionally, weekly winning percentage is positively associated with mentions for the Big 12, Big Ten, and Pac 12.

Table 5 shows results from the expanded OLS model for sentiment and mentions. Omitting the Pac 12 dummy variable requires that we interpret conference fixed effects against the Pac 12 as a baseline. From this viewpoint, net sentiment is lower in the Big Ten, ACC, and SEC, but higher in the Big 12. We observe that, at the 10% significance level, an increased winning percentage has a positive effect on net sentiment. In contrast, we do not observe that the COVID-19 2020 season is a statistically significant predictor of changes in net sentiment.

Table 5. Expanded OLS with dummy variables for Mentions and Net Sentiment at the Conference level.

https://doi.org/10.1371/journal.pone.0325840.t005

Examining results concerning mentions, we see that the conference level fixed effects have strong levels of statistical significance. Relative to the PAC 12, we see the Big Ten and SEC have considerably higher levels of weekly mentions, while the Big 12 coefficient on mentions is very close to 0. The ACC variable reflects roughly 1700 fewer mentions per week compared to the Pac 12. We do not find evidence that weekly winning percentage is a strong predictor of mentions in the expanded model. However, the COVID-affected 2020 season is associated with a statistically significant reduction in mentions, indicating lower online fan engagement during that period. This effect is significant at the 10% level in the expanded OLS model, whereas it was significant at the 5% level in the reduced-form OLS specification.

Turning to the results from equations (5) and (6), shown in Table 6, a few notable takeaways emerge. When predicting conference mentions, the intercept gives a baseline count of mentions we could expect to see if all other variables were equal to zero. At a high level of statistical significance, the 2020 season, which was impacted by COVID-19, saw considerably fewer mentions of the teams during the college football season. We do not find evidence that better on-field performance translates to higher levels of engagement (measured in mention count) on social media. Examining the random effects, we observe high levels of variation across weeks of the season and high levels of unexplained variance, which suggests additional factors at play in the data (discussed further in the limitations).

When predicting conference sentiment with mixed effects, shown in Table 6, the model intercept provides a useful baseline, indicating that net sentiment is generally positive, with an average of +26. The results show that weekly winning percentage has a strong positive effect on net sentiment, suggesting that better on-field performance is associated with more favorable fan reactions. Additionally, the 2020 season, which was disrupted by COVID-19 and characterized by a shortened schedule and limited inter-conference play, was associated with significantly lower net sentiment. The residual variance on conference net sentiment was 23.1. Considering the possible range of net sentiments [−100,100], this indicates substantial unexplained variation remains in the model. This underscores the complexity of factors that shape online sentiment beyond just performance and season structure.

Table 6. Mixed effects models of mentions and net sentiment at the conference level.

https://doi.org/10.1371/journal.pone.0325840.t006

More succinctly, we find that engagement increases with season length and winning percentage. The public reaction to the 2020 season (which saw fewer games and almost no inter-conference play) was lower engagement (in mentions) and lower sentiment (among those who did engage) on social media. These findings suggest that having a winning team may generate positive knock-on effects for an athletic department and conference brand.

However, it would be difficult to isolate the effects of season length and performance from the broader context of the pandemic, as the unique circumstances of COVID-19 are not easily replicable. Ultimately, the extent to which athletic departments can leverage these dynamics for storytelling, brand development, and fan engagement depends on the level of social media interaction—interaction that, according to our findings, is closely tied to both the quality (i.e., success) and quantity (i.e., number of games) of on-field performances.

Many of our coefficients are statistically significant and interpretable. Comparing conferences, movement in sentiment and mentions is explained in the Big Ten and in CFB altogether; however, we lack the statistical significance to explain these movements in the ACC. It is difficult to say what this variation in statistical significance between conferences may suggest. Given the nature of our combined dataset, with limited games per season, it is possible that differences in sample size and timing generate different findings. Alternatively, these findings might make sense if, in the ACC and PAC 12, there is a lower average level of general football interest among the community.

To further analyze the effect of season length on fan engagement, we conduct a T-test comparing the shortened season to the four other seasons collected. The results of the T-tests of mentions and sentiment are shown in Tables 7 and 8, respectively. Taken together, the national dataset exhibits a significant difference in both sentiment and mentions between the 2020 and non-2020 seasons (based on the two-tail result). In both cases, the national dataset showed a decline for the shortened 2020 season, in the quantity of attention and in the sentiment of the associated media. Effectively, this can be interpreted as a decline in the national level of engagement during the 2020 season. Results by conference are mixed, however. We find a significant difference in mentions in the Big Ten, Big 12, and SEC; in these three conferences, alterations to the season had an impact on fan engagement, and mentions were lower. However, only the Big 12 exhibits a significant difference in sentiment, which also fell in 2020. These differences suggest that the varying approaches taken by each conference to managing the 2020 season (such as start dates, number of games, and policies on inter-conference play) may have influenced how fans engaged with and emotionally responded to their teams online.
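As an illustration of a two-sample comparison that assumes unequal variances (commonly called Welch's test), the following minimal Python sketch computes the test statistic and the Welch-Satterthwaite degrees of freedom. It is not the code used for this study: the function name `welch_t` and the mention counts are hypothetical and serve only to show the calculation.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Degrees of freedom are adjusted because the variances are not pooled.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical weekly mention counts (in thousands), for illustration only.
mentions_2020 = [110.0, 95.0, 120.0, 105.0, 100.0]
mentions_other = [150.0, 170.0, 140.0, 165.0, 155.0, 160.0]

t_stat, df = welch_t(mentions_2020, mentions_other)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

A negative statistic here means the first sample (the shortened season) has the lower mean. Obtaining the two-tailed p-value additionally requires the Student's t distribution, which a statistics library can supply; for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs the same test.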

Table 7. Results of T-test of two samples differing by season length. Assuming unequal variances.

https://doi.org/10.1371/journal.pone.0325840.t007

Table 8. Results of the T-test for two samples differing by season length. Assuming unequal variances.

https://doi.org/10.1371/journal.pone.0325840.t008

4. Conclusion

We examine and quantify public engagement with US college football at a unique time in the history of the sport. The game and the league of teams have seen considerable changes in the past few seasons, including the emergence of the COVID-19 pandemic, which impacted the 2020 playing season. Using social media listening techniques, we assess the volume and sentiment of public discourse surrounding college football, with a particular focus on the Power Five conferences. We relate this engagement to two key factors: the effects of the COVID-19 season and each conference’s winning percentage.

Our results reveal that, at the individual conference level, higher winning percentages are associated with increased mentions and more favorable net sentiment. Looking at the aggregated data, we similarly observe a positive effect of winning percentage on net sentiment across both the expanded OLS model, which includes conference-level fixed effects, and our mixed-effects model. However, this pattern does not hold at the aggregate level where the winning percentage does not significantly predict the volume of mentions in either model.

Both models do show that the COVID-19 season is associated with a decline in engagement, as measured by mentions. The impact of the COVID-19 season on sentiment, however, is mixed. The fixed effects model does not identify it as a significant predictor, while the mixed-effects model suggests it led to decreased sentiment among online users. For the individual Power Five conferences, we saw a varied impact of the 2020 COVID-19 season. While sentiment remained largely unchanged, with the exception of the PAC 12, the shortened season notably decreased mentions in the Big Ten, Big 12, and SEC.

These findings offer several insights. First, winning improves online sentiment, a valuable insight for athletic brand managers aiming to anticipate and shape public engagement. In particular, given the NCAA rule change allowing college athletes to profit from their Name, Image, and Likeness (NIL), social media attention has grown in importance for young athletes seeking to market themselves and profit from their talent. Indeed, a survey of 1,100 Division I athletes found that 72% of commercial activity by college athletes falls into the social media category [31].

Additionally, this relationship offers a compelling case study for educational contexts, where students can test and refine such findings using novel datasets. Second, the observed decline in online engagement during the COVID-19 season highlights the importance of gameplay and season continuity in maintaining fan interest. This suggests that playtime is important to encourage fan interaction, and reduced season lengths will not help engage fans in the sport.

Limitations and avenues for future research

Our study is subject to several limitations and, importantly, opens several promising avenues for future research. First, online media data is aggregated at the conference level. While it is costly to capture, and complex to communicate, future research could examine the mention counts and net sentiment of each team along with each team’s respective on-field performance figures.

Secondly, intra-seasonal (weekly) fluctuations in the number of games played within a conference (regardless of the opponent being in or out of the conference) might be hypothesized to influence mentions and net sentiment in the same week (and possibly the week after). Our model, which posits mention counts and net sentiment as a function of winning percentage and the existence of COVID protocols, accounts for neither whether there was a game in time t nor the number of games played. However, winning teams are often talked about whether or not they play in a given week because of the way rankings are discussed, which is proxied by the winning percentage. Subsequent studies could establish a finer temporal resolution and restructure their dataset to account for such short-term dynamics.

Across models, we receive repeated indications that there is unexplained variation. The introduction of additional predictor terms could enhance the efficiency of our models. Additional variables that may help include dummy variables for rivalry games and upsets (unexpected losses), and a variable reflecting the number of people following a game in person, on TV, or on the radio, among other possibilities. The nature of our dataset precludes robust examination of the temporal dependencies that one would expect in time-varying social media data. Two-way clustered standard errors would account for within-conference correlation as well as within-week correlation, but these are untenable without transformation because the week index is not perfectly nested into conferences. Alternatively, generating lagged mentions and sentiment variables could be useful for detecting temporal relationships that persist over time. However, the structure of the dataset would require transformation to avoid pushing lags for one conference into the lagged series of another. We thus encourage future studies to further examine this issue and build upon the dataset generated for this study.

Methodologically, our dataset’s structure limits the ability to model time-varying dynamics. The hierarchical and nested nature of the data—teams within conferences, games within weeks—poses challenges for traditional time-series or panel analysis. For example, two-way clustered standard errors (e.g., by week and conference) would be ideal to address potential correlations within groups but are infeasible here due to imperfect nesting of time periods across conferences. Likewise, generating lagged versions of engagement or sentiment measures could illuminate temporal dependencies, but such transformations risk contamination between conference-specific series unless the data are carefully segmented. A future line of work could involve restructuring the dataset into a more formal panel format, enabling richer temporal modeling techniques, such as autocorrelation tests or dynamic panel regression models.
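The segmentation concern raised above can be sketched in a few lines of Python. The example below is illustrative only and is not part of the study's analysis: the helper `add_group_lag`, the column names, and the mention counts are all hypothetical. It assumes one row per conference, season, and week with no missing weeks, so that the previous row within a group is also the previous week.

```python
from collections import defaultdict


def add_group_lag(rows, group_keys, time_key, value_key, lag=1):
    """Return copies of rows with a lag of value_key computed within groups."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[tuple(row[k] for k in group_keys)].append(row)

    lag_name = f"{value_key}_lag{lag}"
    lagged_rows = []
    for group_rows in by_group.values():
        ordered = sorted(group_rows, key=lambda r: r[time_key])
        for i, row in enumerate(ordered):
            # Look back only inside the same group; the earliest rows have
            # no predecessor, so their lag is None rather than another
            # conference's value.
            previous = ordered[i - lag][value_key] if i >= lag else None
            lagged_rows.append({**row, lag_name: previous})
    return lagged_rows


# Hypothetical stacked weekly data, for illustration only.
rows = [
    {"conference": "Big Ten", "season": 2021, "week": 1, "mentions": 120},
    {"conference": "SEC", "season": 2021, "week": 1, "mentions": 200},
    {"conference": "Big Ten", "season": 2021, "week": 2, "mentions": 135},
    {"conference": "SEC", "season": 2021, "week": 2, "mentions": 180},
]

lagged = add_group_lag(rows, ("conference", "season"), "week", "mentions")
for row in lagged:
    print(row)
```

Grouping by both conference and season is a design choice in this sketch: it prevents the last week of one season from becoming the lag of the first week of the next. Whether that is appropriate depends on the research question.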

Finally, future studies may replicate our findings in other collegiate sports or professional leagues, allowing for comparisons across athletic domains. They may also consider using open-weight or open-source NLP models for sentiment scoring to increase replicability and transparency. While our use of Quid offers access to licensed social media data and NLP pipelines, we recognize the increasing emphasis on open science, as discussed by Abdurahman et al. [32]. We encourage the development of complementary studies using reproducible, open-source frameworks to validate and extend our results.

Supporting information

S2 File. Raw data used in the analyses presented in the manuscript.

This Excel file contains multiple sheets: a README sheet; a Time Series sheet with conference-level performance, online media mentions, and net sentiment; and additional sheets showing annual game-level performance by conference.

https://doi.org/10.1371/journal.pone.0325840.s002

(XLSX)

Acknowledgments

The authors would like to thank Alyx Fisk, Anam Ali, and Sachina Kagaya for their support of this work.

References

  1. Johnson W. Goodbye to three yards and a cloud of dust. 1969. https://vault.si.com/vault/1969/01/27/goodby-to-three-yards-and-a-cloud-of-dust 2024 May.
  2. Malone G. Which college sports make the most money?. https://finance.yahoo.com/news/college-sports-most-money-130012417.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAIXQY3XfrQtR-WjGrv-K__ABefgAUy43zyvtfbwaQWHql04Eks2GZwRlYOOy3Fa7cK4WiEwyyZcTeXNrOQ2ERmxvs9fr4P5VdYGcAp. 2022. 2024 May.
  3. Sports Media Watch. College Football TV Ratings. 2023. https://www.sportsmediawatch.com/college-football-tv-ratings/. 2023.
  4. Business of College Sports. Current College Sports Television Contracts. 2024. https://businessofcollegesports.com/current-college-sports-television-contracts/
  5. Hipke M, Hachtmann F. Game-Changer: A Case Study of Social-Media Strategy in Big Ten Athletic Departments. 2014.
  6. Margolin D, Liao W. The emotional antecedents of solidarity in social media crowds. New Media & Society. 2018;20(10):3700–19.
  7. Gong X, Wang Y. Exploring dynamics of sports fan behavior using social media big data - A case study of the 2019 National Basketball Association Finals. Applied Geography. 2021.
  8. Gong H, Watanabe N, Soebbing B, Brown M, Nagel M. Do consumer perceptions of tanking impact attendance at national basketball association games? A sentiment analysis approach. Journal of Sport Management. 2020.
  9. Landgraf LA. A defence of tanking in sports. Journal of the Philosophy of Sport. 2024;51(1):89–101.
  10. Ehrlich JA, Ghimire S, Khraiche M, Raza MF. COVID-19 countermeasures, sporting events, and the financial impacts to the North American leagues. Managerial Finance. 2020.
  11. Skinner J, Smith ACT. Introduction: sport and COVID-19: impacts and challenges for the future (Volume 1). European Sport Management Quarterly. 2021;21(3):323–32.
  12. Fox Sports. Fox Sports. 2024. https://www.foxsports.com/stories/college-football/college-football-spring-games-schedule-dates-tv 2024 May.
  13. Dellinger R. How the Coronavirus Impacts College Football’s Spring Schedule. 2020. https://www.si.com/college/2020/03/11/spring-practices-games-coronavirus. 2024 May.
  14. Sports Reference. Sports-reference.com/cfb. Sports Reference. 2024. https://www.sports-reference.com/cfb 2024.
  15. SEC Sports. SEC announces new 2020 football schedule. SEC Sports. https://www.secsports.com/news/2020/08/sec-announces-new-2020-football-schedule. 2020.
  16. ACC. ACC unveils 2020 football schedule. https://theacc.com/news/2020/8/6/acc-unveils-2020-football-schedule.aspx. 2020 August 6.
  17. PAC 12. Pac-12 announces 2020 football schedule. 2020. https://pac-12.com/news/2020/10/3/pac-12-announces-2020-football-schedule
  18. Forde P, Dellenger R. Was the 2020 College Football Season Worth It?. 2021. https://www.si.com/college/2021/01/11/college-football-2020-season-covid-19-daily-cover 2024 May.
  19. Mehra V, Singh P, Bharany S, Singh Sawhney R. Sports, crisis, and social media: a Twitter-based exploration of the Tokyo Olympics in the COVID-19 era. Social Network Analysis and Mining. 2024.
  20. Widmar NO, Bir C, Clifford M, Slipchenko N. Social media sentiment as an additional performance measure? Examples from iconic theme park destinations. Journal of Retailing and Consumer Services. 2020;56:102157.
  21. Lai J, Bir C, Widmar NO. Public sentiment towards cruises and resulting stock performance in 2017–2021. Journal of Hospitality and Tourism Management. 2023;56:1–7.
  22. Lambert LH, Bir C. Evaluating water quality using social media and federal agency data. J Water Health. 2021;19(6):959–74. pmid:34874903
  23. Widmar NO, Rash K, Bir C, Bir B, Jung J. The anatomy of natural disasters on online media: hurricanes and wildfires. Natural Hazards. 2022;961–98.
  24. Jung J, Widmar NO, Ellison B. The Curious Case of Baby Formula in the United States in 2022: Cries for Urgent Action Months after Silence in the Midst of Alarm Bells. Food Ethics. 2023.
  25. Widmar NO, Bir C, Long E, Ruple A. Public perceptions of threats from mosquitoes in the U.S. using online media analytics. Pathogens and Global Health. 2021.
  26. Moore T. Why Is Notre Dame Independent and Not in a College Football Conference?. 2023. https://www.nbc.com/nbc-insider/why-is-notre-dame-independent-and-not-in-a-football-conference#:~:text=What%20conference%20does%20Notre%20Dame,for%20poll%20and%20playoff%20contention 2024.
  27. Fiutak P. Will Notre Dame Football Fully Join the ACC? Potential Targets for ACC Expansion. https://www.si.com/college/notredame/will-notre-dame-football-fully-join-acc-potential-targets-for-acc-expansion. 2024. 2024 May.
  28. Zadeh AH. Quantifying fan engagement in sports using text analytics. Journal of Data, Information and Management. 2021;197–208.
  29. Quid. Intro to Natural Language Processing. https://help.quid.com/en/articles/8167490-intro-to-natural-language-processing-nlp. 2024.
  30. Wooldridge J. Introductory Econometrics. 2002.
  31. Smith M. Social media dominates NIL activity, latest data shows. Sports Business Journal. https://www.sportsbusinessjournal.com/Daily/Issues/2022/02/17/Marketing-and-Sponsorship/NIL-data/. 2022. 2024 May.
  32. Abdurahman S, Atari M, Karimi-Malekabadi F, Xue MJ, Trager J, Park PS, et al. Perils and opportunities in using large language models in psychological research. PNAS Nexus. 2024;3(7):pgae245. pmid:39015547