SNAP judgments into the digital age: Reporting on food stamps varies significantly with time, publication type, and political leaning

The Supplemental Nutrition Assistance Program (SNAP) is the second-largest and most contentious public assistance program administered by the United States government. The media forums where SNAP discourse occurs have changed with the advent of social and web-based media. We used machine learning techniques to characterize media coverage of SNAP over time (1990–2017), between outlets with national readership and those with narrower scopes, and, for a subset of web-based media, by the outlet’s political leaning. We applied structural topic models, a machine learning methodology that categorizes and summarizes large bodies of text that have document-level covariates or metadata, to a corpus of print media retrieved via LexisNexis (n = 76,634). For comparison, we complied a separate corpus via web-scrape algorithm of the Google News API (2012–2017), and assigned political alignment metadata to a subset documents according to a recent study of partisanship on social media. A similar procedure was used on a subset of the print media documents that could be matched to the same alignment index. Using linear regression models, we found some, but not all, topics to vary significantly with time, between large and small media outlets, and by political leaning. Our findings offer insights into the polarized and partisan nature of a major social welfare program in the United States, and the possible effects of new media environments on the state of this discourse.


Introduction
The Supplemental Nutrition Assistance Program (SNAP, formerly known as the Food Stamp Program) is the United States federal government's primary form of food assistance to lowerincome Americans, and is the second-largest welfare program, with a budget of more than $68 billion in 2017 [1]. The program, which serves over 46 million individuals, includes means- testing and work requirements, both of which have been the topic of Congressional debate and reform efforts since the program's formalization under the federal Food Stamp Act of 1964. Following a large program expansion under the 2008 American Recovery and Reinvestment Act (ARRA), the size, scope, and nature of SNAP has been increasingly scrutinized by elected officials. Concerns over program fraud, unhealthy eating, and luxury purchasing have also punctuated recent ethical and administrative debates about SNAP [2]. The federal budget proposed by the Trump Administration in February 2018 would codify many of the earlier policy proposals of conservative lawmakers, such as increased work requirements, devolution of funding to state agencies based on block-grant formulas, and significant budget cuts. While the tenor of policymaking debates over SNAP has shifted since the program's formation in 1964, many key points of dissention are recurrent themes. For instance, frequent debates include whether or not the program disincentivizes work, encourages welfare dependency, or subsidizes unhealthy eating [3,4]. The tenor of these political debates takes on new dimensions in the modern era, where one in ten households receives SNAP benefits and millions more are eligible, but not enrolled in the program [1]. Government reports indicate that the program now serves a vital role in preventing severe poverty [3,5], and progressive thinktanks and politicians have pointed to the program's prevalence among hourly wage-earners, suggesting that private sector employers effectively "subsidize" low employee wages with broad-based categorical welfare programs such as SNAP [6,7].

Influence of media on policy discourse
Media studies have demonstrated how, through the selection and framing of topics, the news media acts as an agenda-setter-impacting the salience of issues and influencing public opinion and policymakers alike [8][9][10]. Several studies have specifically examined how recipients of social benefits are portrayed in the media (as deserving or undeserving), and hypothesized how this may also influence public opinion and policymakers [11,12]. A content analysis of editorial pages about perceptions of immigrants on welfare, found that prior to a major welfare reform in 1996, older immigrants were often portrayed as undeserving of federal assistance [11]. After the reform, which restricted immigrants' access to certain welfare benefits, the number of editorial pieces advocating for the worthiness of older immigrants increased dramatically. The framing of social welfare recipients or other populations targeted by social policies in the media can have significant implications for the decisions of policymakers [13,14].
Typically, content analyses of print journalism have included a sample of articles published in major newspapers; however, with the advent of online journalism and social media, an increasing number of individuals access news via digital platforms. Digital news sources now play a major role in the media sector; a 2016 Pew Research Center survey about news use found that 38% of Americans reported getting news from digital news platforms, which was second only to those who got their news from TV sources [15]. As the media landscape has rapidly shifted, researchers have turned to analyses of popular media-sharing platforms to better understand how news content is disseminated [16]. Given these shifts in news media platforms, a much broader approach-requiring new tools for qualitative content analysisis needed to understand the full scope of coverage for particular social policies and programs.
This study used a machine learning technique, structural topic modeling, to assess when, where, and how SNAP has been covered by news media. In particular, we were interested in understanding how SNAP was portrayed by different types of news media outlets and considering how media discourse on SNAP could affect public opinion and policymaking related to this large social welfare program.

Materials and methods
This study was completed in three general steps: 1) assembling corpora of online and print media about SNAP, 2) refining and preparing these databases for statistical analyses using structural topic models to account for possible temporal variation in topics [17], and 3) identifying topics and estimating their prevalence according to different document metadata, such as the date of publication. Fig 1 provides a graphical representation of these steps.

Corpus assembly
A corpus of print news media (subsequently called the "Print Corpus") was assembled from a series of queries to the LexisNexis Academic Newspaper Stories/Combined Papers archive, which includes 2,702 major news publications in the United States and elsewhere. Given the size of the archive, we applied simple primary search terms: "food stamp," "SNAP," or "Supplemental Nutrition Assistance Program," and we also included items classified by Lexis-Nexis under the index term "Food Stamps." We limited the search to articles published between January 1, 1990 and December 31, 2017.
To create an online media corpus (called the "Online Corpus"), we used a Google News web-scraping algorithm based on several broad search keywords to include all mentions of SNAP in online media from 1990 to the present. Primary inclusion terms were: "food stamps," "SNAP," and "Supplemental Nutrition Assistance Program," with additional search terms to identify media that uses state-specific program names (e.g., "CalFresh" in California). A full list of search terms is available in the Supplemental Materials (S1 File). Additional refining of both media corpora was completed in R with pattern-matching functions (available in the tidyr package [18]) to remove documents that contained words that may be related to food assistance (e.g., agriculture, hunger, famine) though ultimately were not about SNAP (e.g., "Farmers experience a cold snap" or "Stamp out hunger"). To our knowledge, the research procedures described above complied with the terms of service for websites from which data were collected.

Data reduction
The initial search queries within the LexisNexis database generated 79,481 documents. Of these, 58,778 documents were included in the Print Corpus. For the Online Corpus, the initial web-scrape collected 13,987 documents, and ultimately 3,850 documents were included in the final corpus. Fig 1 illustrates the methods used to compile, reduce, and analyze these datasets.

Corpus pre-processing
With functions available in the stm R package [17], several preprocessing steps were taken to prepare the corpora for modeling, such as removing common English "stopwords" (e.g., "the"), punctuation, numbers, and capitalization, as well as word stemming, which retains word stems and discards non-stem parts such as "ing," "ed," or "s" (e.g., stemming yields "shop" for "shopping," "shopped," and "shops"). We also removed "rare" terms, occurring in less than 1% of all documents in a corpus, as well as common term found in more than 90% of corpus documents. Finally, documents with missing data were removed.

Metadata extraction and assignment
To allow qualitative investigation of change over time, as well as comparison between media sources and outlets, various document metadata were attached to all documents. For both the Print and Online Corpora, the date of publication was included as metadata, as well as the source's status as a "Major Paper" or "Other Paper" based on the URLs of the 50 most widelycirculated print newspaper URLs, as reported in a recent Pew Research Center report [15]. This allowed us to introduce a general proxy for national versus local/regional influence or reputation, especially within the Online Corpus, given previous study findings regarding false or misleading "news" by some web content providers [19]. To consider the partisan nature of SNAP coverage, we assigned a "partisanship score" to the Online Corpus ranging from -1.0 (very liberal) to 1.0 (very conservative) [20], based on prior work by Bakshy and colleagues' [16] assessment of the 500 most prevalent "hard content" URLs shared on Facebook. To apply this same partisanship measure to the Print Corpus, we assigned scores to all "Major Paper" sources (n = 21) that were able to be matched to the Bakshy et al. URLs. This subset of the Print Corpus ("Print Subset Corpus") included 11,984 documents.

Statistical analyses
Machine learning methods are generally characterized in one of two ways: supervised or unsupervised analysis. In supervised machine learning analyses, pre-classified documents or data are used to "train" a computer algorithm that can then classify new, un-classified data. These analyses can be useful when distinct categories or typologies are known, a priori. Alternatively, unsupervised learning analyzes patterns within a text corpus to identify a given number of themes or categories. This approach is ideal for handling large datasets that do not inherently have well-defined categories, and/or would be prohibitively costly in terms of time or other resources to pre-classify. Here, we apply a particular unsupervised machine learning approach, the "structural topic model," which has been previously used in analyses of large text databases related to political discourse and attitudes [21]. A specialized R statistical package, stm, allows for automated selection of optimal model parameters within a set number of topics, and enables flexible estimation of metadata effects (e.g., changes over time or between liberal and conservative media sources) on topic prevalence [17].
One of the primary outputs of structural topic models are estimates of topic proportions for each document (θ), whereby each document is associated with n estimates, where n is the number of topics in the model. These estimates can be used to assign topics to documents or assess the general distribution of topics across a corpus. The manyTopics function (part of the stm package) identifies an optimal model for a given number of topics, though inspection of average topic semantic coherence (i.e., the relatedness of words within a topic to one another) and exclusivity (i.e., the extent to which a topic contains words that do not appear in other topics) for each model was also used to assess the optimal number of topics across all models in a corpus.
For both the Print and Online Corpora, a series of structural topic models were generated for three to ten possible topics with the date of publication and a binary variable for major newspapers (based on the aforementioned Pew study [15]) included as metadata. For the Online Corpus and Print Subset Corpus, a continuous variable for average political alignment was also included as metadata. Descriptive statistics were generated for all expected topic proportions (θ), and documents were assigned to topics by their maximum θ value. The stm package helps summarize each topic by representative words according to several measures: probability, FREX (i.e., its frequency and exclusivity), "lift" (i.e., its weighted frequency in one topic relative to its frequency in other topics), and "score" (i.e., its log-frequency in one topic relative to its log-frequency in other topics) [22]. Documents with the highest θ values for a given topic were also inspected to better understand the topics indicated by the representative words. The estimated prevalence of topics and effects of the various metadata were measured with linear regressions fit by the estimateEffects function of stm [17]. All statistical estimations used 95% confidence intervals and are summarized as Supplemental Material (S2 Appendix).
Eight-topic models were selected for the Print, Print Subset and Online Corpora, based on their average topic semantic coherence and exclusivity, compared to models with higher or lower numbers of topics. These values for all numbers of topics are provided as Supplemental Material (S1 Table). Table 1 provides a full summary of key words from all topic models, as well as the distribution of topics across different document characteristics. Topics in the Print Corpus were related to administrative and community notices (Topic 1, FREX: shall, applic, section, appropri, lifelin,  1.3), 22% of documents were from major news outlets, and the average political alignment score was -0.14 (SD = 0.55).

Topic variation
To explore the variance of topic proportions over time, both unadjusted and adjusted regression models were fit with parameters from the structural topic models. The expected prevalence of topics was observed to significantly vary between time intervals in all corpora, adjusting for average political alignment (for the Print Subset and Online corpora only) and major news outlet status (for the Print and Online corpora). All topics were found to vary significantly between time periods in the Print Corpus and Print Subset corpora, and five of eight topics had significant time variation in the Online Corpus. Fig 2 provides a visualization of how estimated topic proportions vary within two years of the introduction and passage of the 2014 Farm Bill. Significant differences between major and non-major media outlets were also observed. Among the Print Corpus, documents from major newspapers had significantly more coverage for half of the topics, and significantly less coverage for the remaining topics. Major media outlets were less predictive of Online Corpus topics and were significantly associated with less coverage for two topics, and more coverage for three topics. Table 2 gives a full summary of these estimates.
For the Online Corpus, the average political alignment of a media source was significantly associated with expected topic coverage: three topics were significantly associated with rightleaning outlets, and five topics with left-leaning outlets. For the Print Subset Corpus, four topics were significantly associated with right-leaning outlets, and one with left-leaning outlets.

Maximum topic prevalence
On average, maximum topic θ values for Print documents were 0.53 (SD = 0.17), 0.48 (SD = 0.15) for Print Subset documents, and 0.49 (SD = 0.14) for Online documents. For "topic-dedicated" documents with a maximum θ value of 0.50 or greater (i.e., at least 50% of a document's text was estimated to pertain to a single topic), a similar distribution is evident between the Print, Print Subset and Online Corpora (average maximum θ values 0.68, 0.63, and 0.64, respectively). Among these more focused documents in the Print Corpus, 23% were classified as Print Topic 8 (FREX: got, mother, son, daughter, didnt, apart, feel); in the Print Subset Corpus, 20% were classified as Print Subset Topic 8 (FREX: billion, budget, bill, deficit, farm, medicar, cut), and in the Online Corpus, 27% were classified as Online Topic 1 (FREX: fraud, investig, jeff, alleg, attorney, prison, arrest). Table 3 summarizes the distribution of topics across all documents in greater detail.
Overall, 46% of Print Media documents predominantly pertained to a single topic ("topicdedicated"), compared to 37% of Print Subset documents and 42% of Online documents. This indicates that the majority of documents included multiple topics, with a possibility of some documents having almost an equal split of two or more topics. Among certain topics, however, the average document's maximum theta value was much higher; for example, approximately 77% of the documents classified as Online Topic 1 (FREX: fraud, investing, jeff, alleg, attorney, prison, arrest) were topic-dedicated, but only 20% of Online Topic 7 (FREX: medicaid, elig, immigr, care, servic, grant, militari) documents were similarly topic-dedicated, indicating that the topic more often co-occurred with other topics. These patterns are further detailed in Table 3.

Discussion
Our findings show how the discourse around SNAP in print and online media significantly varies based on when and where news is reported. Below we highlight two key findings, which may have particularly implications for the broader political discourse and the public perception of SNAP: 1) changes in topic coverage of SNAP over time and in print versus online media, and 2) thematic reporting on SNAP by partisan leaning of news source.

Changing topic coverage of SNAP in print and online media may influence policymaking
In both the print and online media, we found significant variations over time in topic coverage around SNAP (see Fig 2). These fluctuations were more dramatic for some topics than others. For example, within the Online Corpus, discussion of federal budget (e.g., the Farm Bill) (Online Topic 5), anti-hunger programs (Online Topic 2), SNAP shopping and retailers (Online Topic 3), work requirements (Online Topic 4) and fraud (Online Topic 1) varied significantly, while changes in coverage of the other three topics were less dramatic. Within the Print and Print Subset corpora, all topics varied significantly with time, and many topics were significantly more likely to occur in certain time periods and significantly less likely to occur in others.
As other media content studies have indicated, coverage and framing of a policy issue is often different before versus after a major policy shift [11]. Not surprisingly, within the print corpus we see a significant increase in coverage of the Farm Bill (e.g., Print Topic 4) around both 2008 and 2012-years in which a new Farm Bill was drafted and debated in Congress. Within the online corpus, there is a temporal shift in topic coverage related to policy changes in work requirements (e.g., Online Topic 4) implemented as part of the American Recovery and Reinvestment Act (ARRA). Prior to 2015, state-level work requirement waivers were sunset as part of ARRA, and we notice a marked uptick in Topic 5 coverage in this time period.
Given that SNAP is an income-eligible entitlement program used by one in seven Americans [1,23], it is not surprising that the substantial participant population would generate interest in questions around eligibility (e.g., Online Topic 5). However, while this topic is well  represented in online articles (13% of the corpus), deeper investigation suggests that these articles often detail efforts to restrict or curtail eligibility, including proposals to impose drug testing requirements or reinstate work requirements for "able-bodied adults without dependents," or ABAWDs. Thus, within online media, the topic of eligibility emerges predominantly at times of major policy discussions and often uses negative framing of SNAP recipients, which has implications for policy decisions around issues such as work requirements. However, shifts in SNAP topic coverage are not just reactions to current events. Given what is known about how media outlets, both print and online, act as agenda-setters for public discourse and for policymakers [8][9][10]24], the significant variations we found in topic coverage of SNAP may also have implications for direction of SNAP policymaking, as well. The changes in coverage within the Online Corpus of the fraud and abuse topic (a topic in which SNAP recipients are often portrayed in a negative light) are also notable, as they highlight the potential influence of negative media framing on SNAP policy decisions. Alternatively, the episodic nature of certain topics' media coverage may instead be attributable to Congressional policy schedules, whereby certain topics are perennially brought into the public consciousness via news media as they are more relevant for motivating or achieving political outcomes. Still, it is worth considering media as a mutually reinforcing mechanism by which policymakers' partisan attitudes and debates are amplified and disseminated to the general public, perhaps generating support or opposition for a particular proposal or idea. This study finds significant variation in the prevalence of topics over time, and that these temporal trends appear stronger for some topics more than others, though it is beyond the scope of our analysis to conclusively determine the direction of the relationship between news media and policy.

Reporting by partisan leaning of news media source
Based on our analysis, it is evident that the political leaning of media outlets influences the focus of the discussion around SNAP. Within the Online Corpus, all of the topic areas received significantly more coverage in either conservative or liberal media outlets (see Table 2 and Fig 2). Substantively, this looks like more left-leaning media outlets (e.g., Huffington Post or the Washington Post) framing the discussion of SNAP through topics related to poverty rates, federal budget, hunger, SNAP shopping and food access, while more conservative media outlets tend to use the framing of work requirements, program enrollment or eligibility, and fraud. While public dislike of welfare has been previously documented among Americans [4,12], recent research by political scientist Suzanne Mettler highlights the potency of particular words in shaping public opinion for government-funded social support [25]. According to Mettler's study, Americans who disliked the term "welfare" tended to be more distrustful of government policy in general, even if they themselves have benefited from some kind of government support (e.g., Medicaid, unemployment insurance). Word choice is powerful and particularly polarizing words such as "welfare" can frame discussions about social programs in a negative light. Indeed, within our study, the word "welfare" was associated with Print Topic 2, Print Subset Topic 3, and Online Topics 4 and 6. Of these topics with political alignment scores, two significant associations were identified: Online Topic 4 (right-leaning) and Online Topic 6 (left-leaning), suggesting that perhaps the intended news audience is also an important dimension to the use of polarizing terminology.
Currently the largest federal food assistance program and one of the largest forms of federally funded financial assistance, SNAP has persistently been the focus of fiscal conservatives. The debate around SNAP has been laden with many of the same contentious issues as with conventional cash welfare including the deservingness of recipients and dependency or abuse of the program by the poor [4,26]. The framing in conservative media outlets of SNAP through the lens of fraud, eligibility and welfare, rather than as a discussion around hunger, poverty and nutrition may also influence the ongoing political debate that surrounds the program.

Limitations
One major limitation to this study is its chronological focus (1990-2017). While out of necessity given the nature of archived online content and web-scraping technologies, this does limit our ability to monitor changes in topics over broader time horizons. Nevertheless, both the magnitude and reach of online media sources (including traditional media outlets with an online presence and new online-only publications) underscores and mirrors broader shifts in the media landscape, and reinforces our interest in online-accessible documents. Additionally, structural topic models are but one method of assessing large qualitative datasets, and, given their reliance on machine learning, may not always produce externally valid results. However, because we reviewed and qualitatively assessed model outputs, thus refining and enhancing our media corpora based on preliminary findings, we believe that the final topics described have a higher degree of validity.

Conclusions
We attempted to document the various ways in which the Supplemental Nutrition Assistance Program has been represented in news media during the last several decades. Our findings illustrate how the current era of online media complicates previous content analysis strategies. To adapt to these challenges, we employed a topic modeling procedure that was sensitive to changes over time in order to find internally and externally valid topics. In an eight-topic model, themes included a variety of social, political, and economic issues, with expected topic proportions ranging from approximately 9 to 21 percent, and wide variability for certain topics over time. We also found that SNAP coverage fluctuates or remains steady depending on the topic, and while most news media documents featured at least partial representation of multiple topics, some topics, especially fraud, frequently dominated the documents in which they appeared. Combined with the significant political alignment of most topics, we conclude that certain types of SNAP media coverage are both highly partisan in nature and stand relatively apart from the broader media ecosystem.