What can we learn from corporate sustainability reporting? Deriving propositions for research and practice from over 9,500 corporate sustainability reports published between 1999 and 2015 using topic modelling technique

Abstract

Organizations are increasingly using sustainability reports to inform their stakeholders and the public about their sustainability practices. We apply topic modelling to 9,514 sustainability reports published between 1999 and 2015 in order to identify common topics and, thus, the most common practices described in these reports. In particular, we identify forty-two topics that reflect sustainability and focus on the coverage and trends of economic, environmental, and social sustainability topics. Among the first to analyse such a large amount of data on organizations’ sustainability reporting, the paper serves as an example of how to apply natural language processing as a strategy of inquiry in sustainability research. The paper also derives from the data analysis ten propositions for future research and practice that are of immediate value for organizations and researchers.

Introduction

Growing legislative pressure and increasing public concern about the global climate and the carrying capacity of the earth have led to increasing demands for organizations to act in sustainable ways [1]. Consequently, the number of organizations that publish information on their sustainability practices has grown steadily [2]. One way in which organizations communicate these practices to stakeholders is through sustainability reports—usually published annually with financial reports [3]—that report on the organization’s “economic, environmental and social impacts caused by its everyday activities” [4].

For the last fifteen years, researchers have sought to shed light on the publication of organizational sustainability practices in sustainability reports in order to determine how organizations interpret the challenge of sustainability. Some researchers have focused on the frequency of reporting and other high-level information in order to gain insights into the general development of sustainability reporting [2,5], while others have used qualitative content analysis techniques to provide an overview of certain organizations’ reporting practices [1]. Other research has examined the content of sustainability reports in a more quantitative way through text-mining techniques, focusing in particular on the frequency of certain terms that are related to sustainability practices [1,3,6,7]. Another study explored references to ecological limits by analysing the context in which a predefined list of related terms was used [8]. With the exception of this last study, all of these studies have taken only a limited number of reports into consideration.

In contrast, the present study employs text-mining techniques to conduct topic modelling on 9,514 sustainability reports published between 1999 and 2015. In particular, we apply Latent Dirichlet Allocation (LDA), which is used to identify themes and their distribution in large collections of documents [9].

We extend the current research on sustainability reports in three ways. First, we use a recent data set that includes sustainability reports published as recently as the beginning of 2015, but we also extend the time frame back to 1999. Second, we analyse a considerably larger number of sustainability reports (9,514) than earlier studies. By extending the time frame and the number of reports, we include a high diversity of reports in terms of sector and publication year, which allows us to show the development of topics over time and their distribution among sectors. Third, to the best of our knowledge, we apply a methodology (LDA) that has not yet been used to examine sustainability reports. This method allows us to examine the documents without a predefined list of terms and thus provides us with a broader view of the content of the sustainability reports than other studies were able to gain.

In seeking to shed light on organizations’ sustainability reporting, we focus on identifying (1) sustainability practices and their development over time; (2) the coverage of economic, environmental, and social aspects in sustainability reports; and (3) the differences in sustainability reporting (and practices) among certain sectors [3].

The paper proceeds as follows. The next section provides background on corporate sustainability and sustainability reports. Then we describe the data and methods used. After presenting and discussing the results of our analysis, we conclude in the last section.

Research background

Corporate sustainability

Through the Brundtland Commission’s publication of the report, Our Common Future, the concept of sustainability and particularly the definition of sustainable development as “meet(ing) the needs of the present without compromising the ability of future generations to meet their own needs” [10] has gained popularity [11]. The term corporate sustainability is often used in the context of organizations, but it has no commonly accepted definition [11]. Some authors focus on the environmental aspects of sustainability, others on the social aspects, and others take an integrated view, combining sustainability’s environmental, social, and economic aspects without prioritizing any one dimension [11–13]. We see corporate sustainability as lying at the interface of economic contribution, environmental performance, and social responsibility [14]. Further, we agree with Dyllick and Hockerts [15] that these three dimensions of corporate sustainability can be seen as distinct on an operational level but should be integrated on the strategic level.

Definitions of sustainable practices differ among the sectors in which organizations operate, but sustainable practices can generally be divided into environmental, social, and economic sustainability practices. The environmental practices refer to the consumption of natural resources and the release of emissions, both of which should stay below a rate that ensures the health of the ecosystem [15]. Thus, the environmental practices are concerned with reducing environmental degradation through the conservation of resources, including energy [16], and sustainable waste management [17]. Reporting on the environmental practices focuses on eco-control, environmental cost accounting, and life cycle analysis [17,18]. Potential indicators of environmental performance are air emissions, biodiversity, energy use, noise, resource depletion, solid waste, transport, and water use and discharge [14].

The social practices aim at adding value to the local community [15] and helping to maintain stable communities and their quality of life [16] under the umbrella of human rights [17]. Besides corporate citizenship, corporate philanthropy [17], social partnership, and social sponsorship [14], the social practices focus on the development of human capital [17] through, for example, employee training programs, improvement management, apprentice programs, fringe benefits, flexible work time models, health and prevention programs, flexible workplace design, qualification programs for job returnees, minority-promotion programs, and occupational child care [18]. Other topics include stakeholder involvement and customer satisfaction [14].

The economic sustainability practices reflect the guarantee of long-term liquidity and above-average return to the stakeholders [15]. These practices include corporate governance, risk and crisis management, codes of conduct and compliance, corruption and bribery, talent attraction and retention [17], promotion of economic viability [16], economic profitability, and economic equity [19].

Organizations are driven to engage in sustainability by regulatory pressures [14,20,21], pressure from customers and employees [21], and pressure from the organization’s management, as investment in sustainable practices may improve financial performance [22]. In order to implement sustainability in an organization, it must be ensured that sustainability is embedded in the overall strategy, that it has organizational support (including top-management support, bottom-up support), that it includes all business units, that stakeholders are intrinsically and extrinsically motivated, and that the sustainability performance data can be tracked [23]. Barriers to the adoption of sustainable practices include concerns about the ease of implementation and production risks [24].

One way that organizations’ sustainability activities can become visible is through the publication of corporate sustainability reports [11]. Here, we review the history of sustainability reports and the corresponding research on these reports.

Sustainability reports

Organizations’ reporting of non-financial data started in the 1970s with “social balance sheets” [25]. At first, organizations reported on the social benefits they paid to their employees quantitatively [25]. Later they also included information on product quality and social engagement [25]. After several environmental catastrophes in the 1980s, organizations started to report on the environmental aspects of their efforts as well [25], with the first publication of a separate environmental report in 1989 [2]. In the following years, the focus shifted solidly to environmental reports and from somewhat argumentative reporting to proactive reporting with competitive elements [25]. Consequently, today’s sustainability reports are often seen as marketing instruments. Involvement of public relations departments and third parties in the compilation of sustainability reports, as well as industry-specific foci because of industries’ differing stakeholders, also suggest that sustainability reports are often used as marketing instruments [1,26]. Around 2000, the focus shifted again to include more social and financial aspects of companies’ sustainability efforts [13,25]. While in 1999, 98 percent of the reports published by the largest 250 multinationals were concerned only with environmental issues, by 2002 this percentage had declined to 71 percent [2]. In addition, the names of the reports changed from corporate citizenship report (emphasizing the social aspect), to corporate (social) responsibility report and then finally to sustainability report [25]. The number of organizations that report on their sustainability activities has steadily increased to the point at which sustainability reports have become standard procedure [1]. 
Today many reports follow the format published by the Global Reporting Initiative (GRI) [2], but even though the reports increasingly focus on performance indicators, the credibility of these figures needs improvement, as organizations often report only a few indicators, sometimes provide only summarized figures, and do not indicate whether the figures are estimates or measurements or how changes were made [1]. Many organizations follow the GRI standard to increase the credibility of their reports [27], and stakeholders are often directly involved in determining the content of the reports [28].

The number of companies that publish sustainability reports differs from sector to sector [2]. In the past, in industrial sectors like chemicals, computers and electronics, cars, utilities, oil and gas, and food and beverages, the number of organizations that publish sustainability reports was higher than average [2], while financial companies, trade and retail, services, communications, and media were less active in reporting their sustainability activities [2]. Because the number of companies publishing a sustainability report for the first time has been decreasing since 2003 [1], one might expect this distribution among sectors to still hold today.

Methods

We employ a semi-automated text-mining technique on publicly available sustainability reports to determine the topics they address. Such techniques usually represent documents as vectors. In its simplest form, such a vector contains, for each term, the number of times the term appears in the document. However, such a vector has a high number of dimensions (one per term). Thus, we need to reduce the dimensionality of the resulting vector [29] in order to be able to handle this large amount of data.
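The term-count representation mentioned here can be illustrated with a minimal sketch. The two toy documents and the helper name `term_count_vector` are hypothetical and serve only as an illustration; they are not taken from the study's scripts.

```python
from collections import Counter

def term_count_vector(tokens, vocabulary):
    """Return one count per vocabulary term, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

# Two toy, already tokenized "documents" (hypothetical data).
docs = [
    ["energy", "emissions", "energy", "waste"],
    ["employee", "safety", "training", "employee"],
]

# The vocabulary is the sorted set of all terms in the corpus,
# so every distinct term adds one dimension to each vector.
vocabulary = sorted({term for doc in docs for term in doc})
vectors = [term_count_vector(doc, vocabulary) for doc in docs]

for doc_vector in vectors:
    print(dict(zip(vocabulary, doc_vector)))
```

Even in this tiny corpus each vector has six dimensions; with thousands of reports the vocabulary grows to many thousands of terms, which is why a dimensionality reduction step is needed.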

For this purpose, we use LDA because each dimension of the resulting vector corresponds to one topic or concept [29]. A topic is a probability distribution over all of the terms that co-occur in the underlying documents [29], and a document is itself a probability distribution over all topics in the corpus [30,31]. In other words, when describing a topic, the author draws words with a certain probability from the pool of terms related to that topic [31]. For instance, when writing about the topic of climate change, terms like climate, CO2, emissions, GHG, warming, or temperature have a high probability of appearing, while terms like employee benefits, social responsibility, or gains have a lower probability.

Topics are identified by considering which terms often occur together: it is assumed that the more often terms occur in the same document, the more likely it is that they belong to the same topic. Each sustainability report consists of several topics. The probability distribution of one of these documents shows how prominent the identified topics are in that specific report.
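The two layers of probability distributions can be made concrete with a small worked example. All numbers, topic names, and the helper `term_probability` below are invented for illustration; the sketch only shows the standard mixture relationship in which the probability of a term in a document is the sum, over topics, of the topic's share in the document times the term's probability in that topic.

```python
# Hypothetical topics: probability of each term given the topic.
topics = {
    "climate": {"emissions": 0.5, "energy": 0.4, "employee": 0.1},
    "labour": {"emissions": 0.1, "energy": 0.1, "employee": 0.8},
}

# Hypothetical report: share of each topic in the document.
doc_topics = {"climate": 0.75, "labour": 0.25}

def term_probability(term, doc_topics, topics):
    """Mix the per-topic term probabilities by the document's topic shares."""
    return sum(
        share * topics[name].get(term, 0.0)
        for name, share in doc_topics.items()
    )

for term in ("emissions", "energy", "employee"):
    print(term, round(term_probability(term, doc_topics, topics), 3))
```

A report dominated by the climate topic therefore makes "emissions" more likely than "employee", even though both terms have non-zero probability in both topics.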


Before the analysis begins, a few data pre-processing steps are necessary: (1) collecting data, (2) converting documents from PDF to text files, (3) filtering out documents that are not in English (to avoid issues related to translation), (4) tokenizing (splitting the documents into words [32]), (5) cleaning the text, (6) lemmatizing, and (7) removing stop words.

We collected 15,351 sustainability reports published between 1999 and 2015 for our study. We retrieved the PDF documents from the GRI website ( by crawling and scraping the contents automatically. For each document, we collected metadata like publication year, company, and sector. With the help of a Python script we created, we converted these pdf files into text files for further analysis. In the course of these steps, we excluded 1,152 encrypted documents that could not be extracted easily. To avoid issues related to translation, we limited the documents to those written in English, so we used another Python script to identify those documents. The remaining 9,514 sustainability reports written in English serve as the basis for all further analyses.

Next, we tokenized the documents, that is, split them into tokens that include words and special symbols like punctuation marks, and cleaned up the text by converting all characters to lower case and removing special characters and numbers. Then we lemmatized the words using the WordNetLemmatizer and eliminated standard stop words (i.e., general-purpose words like articles, pronouns, and conjunctions). For this, we used the stop word list “English” provided by the NLTK package of Python. Further, we removed terms that appeared in fewer than two documents. We manually checked the remaining vocabulary to exclude other irrelevant terms like country names.
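The cleaning steps described above can be sketched roughly as follows. This is a simplified, standard-library-only illustration, not the study's script: the stop-word set is a tiny hypothetical stand-in for NLTK's English list, lemmatization is omitted, and the function names and sample sentences are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; the study used NLTK's English list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "to", "we", "our", "by"}

def preprocess(text):
    """Lower-case, keep alphabetic tokens only, and drop stop words.

    Lemmatization (WordNetLemmatizer in the study) is left out here.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

def drop_rare_terms(docs, min_docs=2):
    """Remove terms that appear in fewer than `min_docs` documents."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [[t for t in doc if doc_freq[t] >= min_docs] for doc in docs]

# Hypothetical snippets standing in for converted report text.
raw = [
    "We reduced emissions by 12% in 2014.",
    "Emissions and energy use of our plants.",
    "Energy efficiency and the training of employees.",
]
docs = drop_rare_terms([preprocess(text) for text in raw])
print(docs)
```

In this toy corpus only "emissions" and "energy" occur in at least two documents, so all other terms are dropped by the document-frequency filter.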

Latent Dirichlet Allocation (LDA)

The purpose of the LDA process is to find in each document a mix of topics, where each topic is described by a mix of terms [31]. Thus, the probability distribution of the mix of topics differs from that of the mix of terms [31]. The hyper-parameter α describes the shape of the per-document topic distribution, and the hyper-parameter β describes the shape of the per-topic word distribution [32]. The distributions are estimated by the algorithm using Dirichlet priors [31]. Gensim (for Python) and Mallet (for Java) are among the extant efficient and effective implementations of LDA [29].

We use the Mallet implementation, which automatically estimates the hyper-parameters α and β, to conduct the LDA analysis. The number of topics, defined in advance, depends on the intended level of topic specialization [31]. We wanted to assign labels to each of the resulting topics, but when there are too few dimensions, the topics tend to be more general, as they are a broad mixture of terms that makes it difficult to assign specific labels, and when there are too many, the topics become too specific. We ran the algorithm with three, five, ten, twenty, fifty, seventy, and one hundred dimensions and compared the results, deciding to focus on seventy dimensions, as this number gave us a broad variety of topics without going into too much detail.
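To make the roles of α, β, and the number of topics more tangible, the following is a minimal collapsed Gibbs sampler for LDA written with the standard library only. It is an illustrative sketch, not the Mallet implementation used in the study: the hyper-parameters are fixed and symmetric rather than estimated, the function name `lda_gibbs`, its defaults, and the toy corpus are assumptions, and it is far too slow for thousands of reports.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on lists of tokens.

    Returns (doc_topic, topic_term, vocab): per-document topic
    distributions and per-topic term distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_term_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for term in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic_counts[d][k] += 1
            topic_term_counts[k][index[term]] += 1
            topic_totals[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, term in enumerate(doc):
                w, k = index[term], assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][k] -= 1
                topic_term_counts[k][w] -= 1
                topic_totals[k] -= 1
                # Conditional weight of each topic for this token.
                weights = [
                    (doc_topic_counts[d][j] + alpha)
                    * (topic_term_counts[j][w] + beta)
                    / (topic_totals[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic_counts[d][k] += 1
                topic_term_counts[k][w] += 1
                topic_totals[k] += 1

    # Smoothed estimates of the two families of distributions.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_term = [
        [(c + beta) / (topic_totals[k] + vocab_size * beta)
         for c in topic_term_counts[k]]
        for k in range(num_topics)
    ]
    return doc_topic, topic_term, vocab

# Toy corpus (hypothetical) with two loosely separated themes.
toy_docs = [
    ["energy", "emissions", "energy"],
    ["employee", "safety", "employee"],
    ["emissions", "energy"],
    ["safety", "employee"],
]
doc_topic, topic_term, vocab = lda_gibbs(toy_docs, num_topics=2, iterations=50)
```

Smaller values of α push each document towards fewer topics, and smaller values of β push each topic towards fewer terms; rerunning the sketch with different values of `num_topics` mirrors, in miniature, the comparison of different numbers of dimensions described above.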

The algorithm produced two result sets per topic. The first result set consists of all terms of the corpus and the degree to which they are likely to contribute to the topic [31]. The second result set contains all documents in the corpus and the probability that the corresponding topic occurs in each document.

In interpreting the results, the five to twenty most probable terms for each topic are usually examined in order to identify the degree of commonality and, thus, specify the label of the topic [29]. Our analysis focused on the terms with the twenty highest probabilities. Five researchers, including the first author of this work, examined the seventy topics and classified each as describing environmental sustainability, social sustainability, economic sustainability, sustainability in general, or no sustainability at all. All researchers were provided with definitions of environmental, social, and economic sustainability based on the practices described in the research background of this paper. Since sustainability reports also contain information that is not related to sustainability, we expected to find several topics that were not relevant to our further analysis.
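Selecting the most probable terms of a topic for labelling can be sketched as below. The topic dictionary and the helper name `top_terms` are hypothetical; the default of twenty terms follows the number examined in this study.

```python
def top_terms(term_probs, n=20):
    """Return the n most probable (term, probability) pairs of a topic."""
    ranked = sorted(term_probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical topic: term -> probability of the term within the topic.
topic = {
    "emissions": 0.30,
    "energy": 0.25,
    "waste": 0.20,
    "water": 0.15,
    "board": 0.10,
}

for term, prob in top_terms(topic, n=3):
    print(f"{term}: {prob:.0%}")
```

The ranked list is what a human coder would inspect when deciding which sustainability dimension, if any, a topic belongs to.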

In the next step, we continued the examination of the topics that are relevant to sustainability by analysing the prominence of industries in each topic in terms of their mean probability of occurring in the topic. We excluded a few topics that consisted of terms that seemed to describe sustainability practices but instead described the business of the most probable sectors. For instance, one topic contained words like oil, gas, and energy, which appear to describe energy sources as a topic of environmental sustainability. However, analysis of the most probable sectors showed that this topic is used primarily by the energy sector, so it probably describes their business activities. Therefore, we excluded this topic. For the remaining topics, we analysed the mean probability per year, per country, per continent, and per organization size. We found all information except that for the continent in the meta-data provided by the GRI database and assigned countries to their continents based on a map from the United Nations Statistics Division (

We also assigned a label that describes the topic’s content to each topic that was relevant to sustainability. To this end, the first author of this study made one proposal based on the twenty most probable terms of each topic, discussed it with the second author, and resolved any disagreement. Labels were selected to represent the twenty most probable terms of the specific topic (those terms that are used with high probability when describing the topic).

Table 1 provides an overview of the steps conducted as well as the main decisions that had to be made in order to conduct the analysis.

Results

We analyse 9,514 sustainability reports published between 1999 and 2015 by 3,906 different organizations. The most common industries were financial services, followed by the energy sector, the mining sector, and food and beverage products.

We find forty-two topics that are related to sustainability. We conduct several analyses for each topic, including its development over time and its distribution over industries, countries, continents, and organization sizes. Based on these analyses, we derive ten observations, which are summarized in Table 2 and described in the following. Further, the Appendix contains an overview of all seventy topics, including the most prominent terms of each one and their probability of occurrence (the chance that the specific term appears in the context of the topic).

Observation 1: Organizations report on environmental, social, and economic sustainability

During our first interpretation phase, we looked at each topic (that is, each collection of terms) and assigned it to environmental sustainability, social sustainability, economic sustainability, general sustainability, or not related to sustainability. The corresponding assignment can be found in the table in the Appendix. We find topics that are related to environmental sustainability as well as topics that are related to social or economic sustainability. Thus, we state that organizations report on environmental, social, and economic sustainability.

Observation 2: Topics on environmental, social, and economic sustainability are equally distributed

In total, we identify 42 topics that are related to sustainability, of which we assigned 31 to environmental, social, or economic sustainability: eight topics relate to environmental sustainability, thirteen to social sustainability, and ten to economic sustainability. Thus, all three dimensions are covered by roughly the same number of topics.

Observation 3: Economic sustainability topics are of increasing importance for organizations

For each topic, the LDA algorithm provides us with the probability that the topic appears in a specific document, and for each document we know the year in which it was published. In order to understand how the probability of a topic changed over the years, we calculated the mean probability of the topic over all documents of a specific year. We calculated this mean probability not only for single topics but also for groups of topics, e.g., all topics that we previously assigned to economic sustainability. Fig 1 provides an overview of the development of the mean probabilities of each of the three dimensions. As the linear trend lines show, the probability of environmental sustainability topics is slightly decreasing, while that of social sustainability topics is more or less stable. The trend line of economic topics shows a constant increase. In particular, the probability that an economic topic appeared in a sustainability report increased strongly from 2010 to 2011. Thus, we observe that economic sustainability topics are of increasing importance for organizations.

Fig 1. Trend analysis of mean probability of occurrence of environmental, social, and economic topics in the years 1999 until 2014.
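The yearly means and the linear trend line described above can be approximated as in the following sketch. The records are invented, and the helper names `mean_by_group` and `linear_trend_slope` are assumptions; the slope is a plain ordinary least-squares fit, which may differ in detail from how the published trend lines were produced.

```python
from collections import defaultdict

def mean_by_group(records):
    """Average the topic probability of all documents sharing a key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, prob in records:
        sums[key] += prob
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def linear_trend_slope(points):
    """Ordinary least-squares slope of y over x for a dict {x: y}."""
    xs, ys = list(points), list(points.values())
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Hypothetical (publication year, probability of a topic) pairs,
# one pair per document.
records = [(2010, 0.10), (2010, 0.20), (2011, 0.30), (2012, 0.25), (2012, 0.35)]
yearly = mean_by_group(records)
slope = linear_trend_slope(yearly)
print(yearly, slope)
```

A positive slope corresponds to a rising trend line. Because `mean_by_group` accepts any key, the same helper could also average probabilities per industry, country, or continent.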

Observation 4: As to environmental sustainability, organizations report on emissions and energy consumption

We find eight topics that are related to environmental sustainability. Two topics (no. 16 and no. 32) refer to environmental sustainability performance and environmental sustainability data, respectively. Four topics are concerned with environmental sustainability in the supply chain. Topic no. 10 (green supplier) focuses on the supplier as part of the global supply chain; its environment-related terms are energy and emissions. Topic no. 21 (production) and topic no. 45 (green production) are concerned with production. Among the most probable terms are safety, emissions, and fuel in topic no. 21 and material, energy, waste, reduction, recycling, emissions, and impact in topic no. 45. Topic no. 35 (production and packaging) broadens the focus by including packaging, particularly recycling. The most probable terms for this topic are recycling, waste, water, safety, and health. Two topics focus on environmental sustainability in certain contexts: Topic no. 53 summarizes terms from building construction, with the environmental-sustainability-related terms energy, green, sustainable, environmental, water, and material. Topic no. 63 summarizes terms that relate to water management, including wastewater treatment.

Table 3 provides an overview of all these topics, including the most probable terms for each topic. The most probable terms are those that have the highest probability (noted as a percentage after each term) of appearing if a text is about the specific topic.

For most of the environmental sustainability related topics, the terms energy and emissions as well as related terms such as gas, ton, waste, or consumption are among the twenty most probable terms, showing that organizations focus on energy consumption and emissions (including waste) specifically.

Observation 5: Biodiversity and renewable energy sources receive little attention in reports by organizations

While energy is well covered in the topics related to environmental sustainability, we could not find evidence that renewable energy was discussed in the reports. Further, the probability of the term biodiversity was close to zero, meaning that the term is very unlikely to appear in one of the sustainability reports. Consequently, we conclude that biodiversity and renewable energy sources receive little attention in the analysed sustainability reports.

Observation 6: Regarding social sustainability, organizations report on labour practices

We find thirteen topics that are related to social sustainability. Six of these topics are concerned with employees and labour practices, and each has a unique focus. Table 4 shows the most probable terms of each of these topics. Employee safety and work time are among the five topics with the highest mean probability over all analysed topics, which further shows the relevance of these topics for organizations.

Table 4. Social sustainability topics concerned with employees including most probable occurring terms ordered by their probability.

Observation 7: Customer orientation is in organizations’ focus

Two topics related to social sustainability focus on stakeholder involvement. Topic no. 69 refers to stakeholder information and consists of probable terms like program, community, organization, management, performance, government, work, people, information, reporting, and development. Topic no. 20 (customer orientation) focuses on one specific stakeholder, the customer. Probable terms in this topic include customer, service, product, responsibility, satisfaction, information, online, and survey. Analysing the development of the mean probability of both topics between 2000 and 2014, we find that the mean probability of topic no. 20 is slightly increasing over time, while the trend for topic no. 69 shows a slight decrease (see Fig 2). To summarize, the customer is the only stakeholder that is covered by a separate topic, and this topic is of increasing prominence in the sustainability reports, which shows its importance for organizations.

Fig 2. Trend analysis of mean probability of occurrence of topic no. 20 and no. 69 in the years 2000 until 2014.

Observation 8: Sponsorship activities for social sustainability focus on schools and education

One topic related to social sustainability (topic no. 60) specifically addresses sponsorship, more precisely, school sponsorship. Among the most probable terms are school, project, education, child, development, support, foundation, initiative, and partnership. After a peak in 2004, the probability of occurrence of this topic in a sustainability report remained more or less stable.

Observation 9: Economic sustainability reporting is based on financial data

Of the ten topics related to economic sustainability, six are related to financial data. Table 5 shows the most probable terms for each of these financial data topics. Common probable terms are share, board, risk, tax, consolidated, shareholder, asset, euro, cost, and loss.

Table 5. Economic sustainability topics related to financial data including most probable occurring terms ordered by their probability.

The trend analysis of the probability of occurrence for these topics over time (Fig 3) shows that all financial data topics increase in probability over time, meaning that it is more likely that they are mentioned in sustainability reports.

Fig 3. Trend analysis of mean probability of occurrence of financial data topics in the years 2000 until 2014.

Observation 10: Sustainability actions are both general and context-specific in nature

For each topic, we also analyse how it is distributed over industries and over countries or continents. That is, for each topic, we calculate the mean of the probabilities of all sustainability reports that were published by organizations belonging to one specific industry or country. We obtained the information about industry and country from the meta-data that we downloaded together with each sustainability report.

Many topics are distributed more or less equally over the available industries and countries; however, several topics are remarkably prominent in specific industries or countries. In the following, we highlight only those topics that differ from the average.

For the environmental sustainability topics, we find several prominent industries. For instance, in topics no. 10 (green supplier), no. 21 (production), and no. 35 (production and packaging), two industries each are prominent: in topic no. 10, the computer industry and the technology hardware industry; in topic no. 21, the chemicals industry and the construction material industry; and in topic no. 35, the food and beverages industry and the households and personal products industry. In topic no. 45 (green production), the technology hardware industry, the consumer durables industry, and the equipment industry are the most prominent sectors. The construction and the real estate industries are concerned with building construction, while the water utilities industry and the waste management industry report on water management. Regarding countries, we find further significant differences between the two general environmental sustainability topics, as environmental sustainability performance is prominent mainly in North America, particularly in the United States. We also find differences among the supply chain topics. Green supplier is prominent in North America, while the production topics are most likely to occur in Asian reports. Production and packaging is most probable in Europe, followed by North America and Africa.

Analysing the topics related to social sustainability, topic no. 11 (social sustainability data) is particularly probable in the forest and paper products industry and in the agriculture industry, while the topic work time is most probable in reports from the toy industry. Topic no. 25 (sustainable development) is most probable in reports from Morocco or France. Topics no. 11 (social sustainability data) and no. 44 (corporate social responsibility) are most probable in the countries of South America; topic no. 11 is particularly common in Brazil. Employee safety is particularly prominent in North America and Asia, while employee diversity occurs primarily in the reports of multinational enterprises from North America. In Europe, employee responsibility is the most probable topic, while management is the most probable topic in reports from Asia. The topic work time is most probable in reports of small and medium-sized enterprises headquartered in Ecuador, while stakeholder information is more probable in small and medium-sized enterprises in Australia and New Zealand.

The topics related to economic sustainability are the least focused on specific industries and countries. Only topics no. 39 (financial data 4) and no. 66 (financial data 6) are particularly probable in reports from Africa, as is investment.

We also find some topics that are related to sustainability in general. These topics are rather specific to certain industries and countries. For instance, topic no. 30 (sustainability activities) is most probable in reports from the energy utilities sector, while nuclear power is, understandably, of particular interest to the energy industry. Stakeholder issues are mainly a topic of the tobacco industry. Topic no. 33 (CSR activities) is most probable in reports from the toy industry and the technology hardware industry. Topic no. 56 (development) is highly probable in reports from the mining industry. Further, topics no. 14 (organizational sustainability), no. 31 (sustainability program), and no. 33 (CSR activities) are most probable in reports from Asia. The probability of topic no. 30 (sustainability activities) is particularly high in Italian reports, while topic no. 41 (general sustainability) is more probable in reports from Oceania, and topic no. 50 (corporate sustainability) is more likely to appear in European reports. Sustainability projects are prominent only in Europe, and there particularly in Germany and Austria. The analysis of the companies’ nationality shows distinct results for the nuclear power topic, which is prominent mainly in Europe, and particularly in Belarus and the Russian Federation. The stakeholder issues topic is probable mainly in reports from Uganda, while the topic annual meeting has a high probability of appearing in reports from Africa, particularly South Africa and Namibia. Topic no. 56 (development) is most probable in reports from North America.
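The comparisons above rest on how probable a topic is, on average, in the reports of an industry or region. A minimal sketch of that aggregation follows, assuming each report comes with a group label and a mapping from topic label to probability. The function name, the data layout, and all numbers are hypothetical and serve only as illustration; they are not values from our data set.

```python
from collections import defaultdict


def mean_topic_probability_by_group(reports, topic):
    """Average the probability of one topic over all reports in each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, topic_probs in reports:
        # A topic that is absent from a report contributes probability 0.
        totals[group] += topic_probs.get(topic, 0.0)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical input: (region, {topic label: probability in that report}).
reports = [
    ("Europe", {"nuclear power": 0.20, "green supplier": 0.05}),
    ("Europe", {"nuclear power": 0.10, "green supplier": 0.05}),
    ("North America", {"nuclear power": 0.02, "green supplier": 0.30}),
    ("North America", {"nuclear power": 0.04, "green supplier": 0.10}),
]

means = mean_topic_probability_by_group(reports, "nuclear power")
# The group with the highest mean is the one in which the topic is "most probable".
most_probable_region = max(means, key=means.get)
```

The same function can be applied to industries or company sizes by changing the group label attached to each report.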


Our analysis applies topic modelling to more than 9,000 sustainability reports in order to identify sustainability practices. We identify forty-two topics that are related to sustainability from which we make ten observations. In the following, we discuss these observations and develop ten related recommendations for organizations and researchers.

Observation 1: Organizations report on environmental, social, and economic sustainability

Coding the topics identified in the sustainability reports confirms the notion of the so-called triple bottom line [33], in that topics relate to environmental, social, and economic sustainability. Of the forty-two topics, we assigned thirty-one to one of these dimensions. Even though the triple bottom line has been criticized for being difficult to implement [34], our results suggest that the three dimensions are suitable for structuring organizations’ sustainability topics in practice, confirming previous results that organizations report on all of these dimensions [13,25]. The remaining eleven topics that were not assigned to one of the three dimensions are general sustainability topics that consist of a mix of terms belonging to all three dimensions, thus representing the integration aspect of the sustainability definition [35]. Hence, our results show that organizations report on the distinct dimensions of sustainability, but their reports also reflect the required integration of environmental, social, and economic sustainability. Particularly in the general sustainability topics, the focus seems to be more on strategic elements; for instance, terms like business, group, and management appear among the most probable terms. This is in line with our understanding that the dimensions of sustainability can be treated as distinct at the operational level but should be integrated at the strategic level. We recommend that organizations keep this distinction at the operational level but focus on integrating the three dimensions at the strategic level [15].

Observation 2: Topics on environmental, social, and economic sustainability are equally distributed

While the common definitions of sustainability highlight the integration of its environmental, social, and economic dimensions [35], these dimensions have been seen historically as distinct. For instance, corporate sustainability has its origins in environmental sustainability, while corporate social responsibility, which is today often synonymous with corporate sustainability, has its origins in social sustainability [12]. Before these two terms gained prominence, profit maximization and, therefore, economic value were seen as the core business functions [36]. Our study shows that the topics organizations report on can be more or less equally assigned to all three dimensions. Of the forty-two topics, slightly less than a quarter relate to environmental sustainability, slightly more than a quarter relate to social sustainability, and a quarter relate to economic sustainability. Thus, at least based on the number of topics, and despite their different origins, the three dimensions are nearly equally covered in the reports. However, this does not mean that the three dimensions are also equally covered in terms of depth or occurrence. Previous studies also found that organizations report on all three dimensions but that the dimensions are not equally covered, and that efforts should be made to balance all three dimensions [37].

Observation 3: Economic sustainability topics are of increasing importance in organizations

The mean probability of most economic sustainability topics is increasing, indicating that, while economic, social, and environmental sustainability overall are covered equally in sustainability reports, mentions of economic sustainability are increasing. This finding confirms previous results concerning an increasing relevance of economic topics in sustainability reports [2] that might be a consequence of the 2008 financial crisis [1]. The mean probability of all environmental sustainability topics and the mean probability of all social sustainability topics have been largely stable since 1999; however, these two areas’ mean probabilities across all of the reports in our analysis are higher than that of all economic sustainability topics. One reason for the increase in economic topics might be economic pressures such as the European crises [38,39], the concerns about the Chinese economy [40], or organizations’ growing interest in digital innovation and transformation in their businesses [41]. In this regard, the data might confirm that organizations prioritize economic concerns during crises and that ecological and social interests are more likely to be considered in stable economic times [42,43]. However, research indicates that sustainability transformation can also offer economic potential for organizations and that digital technologies in particular can open up new business opportunities and business models in areas of environmental and social sustainability [37,44,45], such as smart houses and energy supply solutions [46]. Consequently, organizations should leverage the economic potential of including environmental and social sustainability in their activities.
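A statement such as "the mean probability is increasing" can be made concrete by averaging a topic's probability per publication year and then looking at the direction of the yearly means. The sketch below does this with an ordinary least-squares slope. Using a slope as the trend measure is an assumption made for illustration, and the function names and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def yearly_means(observations):
    """Mean probability of one topic per publication year."""
    by_year = defaultdict(list)
    for year, probability in observations:
        by_year[year].append(probability)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (year, mean) points."""
    xs = list(points.keys())
    ys = list(points.values())
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical input: (publication year, probability of one economic topic in a report).
observations = [
    (2013, 0.02), (2013, 0.04),
    (2014, 0.04), (2014, 0.06),
    (2015, 0.06), (2015, 0.08),
]

means_per_year = yearly_means(observations)
slope = least_squares_slope(means_per_year)
# A positive slope indicates an increasing mean probability; a slope near
# zero corresponds to a topic whose coverage is largely stable.
is_increasing = slope > 0
```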

Observation 4: As to environmental sustainability, organizations report on emissions and energy consumption

Diverse measures for environmental sustainability have been discussed in the literature, including air emissions, energy use, resource depletion, waste, and water use [14]. Our study reveals that organizations predominantly report on their emissions and energy consumption data. Emissions and consumption are also the most frequently mentioned environmental issues found in a previous analysis, but in that analysis consumption data for energy and water were reported equally often [1]. In our results, energy was more probable than water in the context of environmental sustainability performance. We found a specific topic on water, but this topic focused on waste water. Consequently, we conclude that organizations should increase their range of measures for environmental sustainability by, for instance, additionally reporting on fuel and paper consumption, waste, and emission of certain gases [1].

Observation 5: Biodiversity and renewable energy sources receive little attention in reports by organizations

Research has identified both biodiversity and renewable energy sources as important aspects of environmental sustainability [14,47], but these topics were absent or rare in the sustainability reports we analysed. The probability of the term biodiversity is close to zero, and we found no evidence that the term occurs in the context of general environmental sustainability reporting. One reason for this rarity might be the complexity of biodiversity, as the related impacts of some of organizations’ actions are often distant in time and space [48]. Furthermore, other than some well-known threats to biodiversity, such as pollution, many threats are not yet fully understood [48], making it difficult for organizations to address the issue. Another explanation might be that loss of biodiversity is a result of environmentally unsustainable behaviour, including over-abstraction of water, increasing demand for resources, and rising consumption levels [48], topics that are well covered in the sustainability reports. Terms that relate to renewable energy are also rare in the reports. Our study suggests that organizations have not taken significant action to invest in renewable energy, nor are they reporting on projects to come. Future investigations into why this is the case, and into whether organizations see sufficient potential in renewable energy and biodiversity and plan to address them in the future, would be useful.

Observation 6: Regarding social sustainability, organizations report on labour practices

The literature has discussed diverse issues concerning social sustainability, such as employee training programs, health and prevention programs, stakeholder involvement, customer satisfaction, and sponsorship [13,14]. Our study also supports the importance of topics like safety, work time, diversity, and development. These topics tend to cover the indicators mentioned in the global reporting principles and standards under the sub-category of labour practices and decent work, and organizations seem to cover the most important human rights in their reports. Among all forty-two topics that we analysed, two of the employee-related topics are among the five most probable topics (those with the highest mean probability of occurring in a sustainability report), so employees play an important role in organizational sustainability reports. However, organizations should scope their sustainability initiatives beyond legal requirements [37] in order to differentiate themselves from their competitors. In particular, continuing with efforts concerning labour practices beyond existing regulations can greatly improve an employer’s profile and attract top talent on the competitive global job market [49].

Observation 7: Customer orientation is in organizations’ focus

The literature has identified several motives for organizations to engage in sustainability transformations, including regulations [14,20,21], pressure from customers [21], and new market creation [45]. Against this background, our analysis revealed a dominant topic related to customer orientation. The most probable terms (customer, responsibility, satisfaction, information, online, and survey) show that organizations are concerned with customer satisfaction and use online surveys to measure it. These findings suggest that customer orientation is a valid strategy for organizations, as they report on their sustainability initiatives from their customers’ perspective. Drawing from the data, we conclude that organizations should scope sustainability initiatives in order to consider a wider range of stakeholders. Stakeholder theory, in particular, shows that multiple views should be balanced in order to achieve business success over the mid- to long-term [50]. Sustainability research has also shown that corporate sustainability requires that one considers interactions with and value creation for all stakeholders [37,51].

Observation 8: Sponsorship activities for social sustainability focus on schools and education

Sponsorship activities are part of organizations’ social sustainability practices [14]. Since our analysis reveals a focus of such sponsorship on funds for schools and education, we find that decision makers appear to believe in the role of education in improving (social) sustainability over the long term [52]. Building on our data, we conclude that, in addition to their current activities, organizations should invest in their employees’ education as well in order to achieve bottom-up support of their sustainability activities, which has been shown to help organizations adopt sustainable practices [23].

Observation 9: Economic sustainability reporting is based on financial data

The topics on economic sustainability focus on financial data, particularly data that is part of the corporate balance sheet. Our study did not find topics on practices like compliance or on codes of conduct, both of which have been identified as valuable in the effort to sustain economic success. We argue that sustainability reports should be enriched by statements on how to sustain and develop economic results in pursuit of economic sustainability [37].

Observation 10: Sustainability actions are both general and context-specific in nature

Research on sustainability transformations has identified a number of action potentials, such as guiding behaviour by sense-making and sustainable practices [53]. Building on the data from the sustainability reports of 3,900 companies, our study reveals a broad range of topics covered in sustainability reports, some of which are related to certain industries or certain geographic regions, while others are well distributed among industries and regions. In this regard, our study confirms previous assumptions about industry-specific practices [3]. We further assume that the differences between industries and regions might be due to different stakeholders, e.g. different national initiatives. Previous research has also shown that the content of sustainability reports is influenced by the organization’s stakeholders [1]. Therefore, we argue that future research and practice should be more specific in characterizing and understanding the context of sustainability behaviour.

In several cases, we can make assumptions concerning why a certain industry or region focuses on a particular topic. For instance, green production is most likely to be a topic in reports from companies in Asia, especially those in the technology hardware, consumer durables, and equipment industries, with which companies in China, Japan, and Taiwan are typically associated. Therefore, we see a link among the continent, the industry, and the most probable terms in the topic. The production and packaging topic is most likely to occur in reports from companies in Europe, possibly because of the European directive on packaging and packaging waste that aims to improve packaging’s environmental performance. Another example is sustainability projects, which are most likely to occur in German reports, possibly because of the German Energy Transition (“Energiewende”), a movement toward alternative energy sources that started in the 1970s and gained popularity in 2011 [54]. Therefore, we agree with the statement from Liew et al. [3] that sustainability practices are industry-specific. We also show how different stakeholders—particularly governments through regulations—influence the content of sustainability reports [1].

We summarize these ten observations and ten recommendations in Table 6.


Increasing numbers of organizations are publishing sustainability reports about their sustainability practices [2]. The present work used topic-modelling techniques to analyse 9,514 sustainability reports published by organizations between 1999 and 2015 and derives ten specific propositions to guide future research and practice.

More specifically, we identified forty-two topics related to sustainability that are distributed approximately equally in the areas of environmental, social, economic, and general sustainability. We showed that topics related to environmental sustainability consist mainly of emissions and consumption, particularly related to energy; biodiversity and renewable energy do not appear in our results. The focus of social sustainability is on employees, but customer orientation and sponsorship are also covered. In addressing economic sustainability, organizations tend simply to present their financial data. We also show the influence of industry and country on the content of the topics.

We advise organizations to balance their activities in and to use the potential of all three dimensions of sustainability, to increase their measures for environmental sustainability, to continue with their efforts concerning labour practices, to consider their interactions with all stakeholders, to invest in their employees’ education concerning sustainability, and to provide information in their sustainability reports on how to sustain and develop their economic results. Researchers are advised to investigate why organizations have not focused on biodiversity or renewable energy and to be precise on the contexts of the sustainability behaviours they examine.

Our work is not without limitations. We used text-mining techniques that reduce the content of the documents to simple collections of terms, so our findings depend on our interpretation of the results. In particular, the labelling of the topics is based on the subjective opinion of the authors, and other researchers might come up with different labels. Still, we are convinced that our labels represent the content of each topic (based on the most probable terms that describe this topic) well, so different labels would have little influence on our findings and observations. Nevertheless, we encourage other researchers to evaluate our results through an in-depth qualitative analysis of sustainability reports. In addition, our data sources are sustainability reports that organizations publish to report on their sustainability activities, but they are also used as marketing instruments [1], so they might not reflect corporate sustainability practices in all details. Furthermore, we received these reports from one single source, the GRI database, so there might be a certain bias in the data. Future research can address this limitation by complementing our findings with interviews of those who participate in sustainability activities in organizations or by using another data source such as the Corporate Register, a database with more than eighty thousand corporate responsibility reports. To the best of our knowledge, our application of LDA in the context of sustainability reports is new, so questions could arise regarding its reliability. While the use of LDA is not without risk, we are confident that, in this case, it provided valuable insights.

Despite its risks, we propose that other researchers use LDA in their research on sustainability reports or organizational reports in general, as its use in these contexts has several advantages compared to manual coding techniques. First, LDA allows the researcher to take a large amount of data into consideration, and, as opposed to manual coding, the data analysis costs are minimal [32]; in our case, it allowed us to provide a broad picture of sustainability practices over several industries and years. Second, LDA requires no restrictions on the content of the topics, such as the requirement that one focuses on certain indicators as previous studies [8] have done; we had only to restrict the number of topics to be modelled. Third, LDA allows the resulting topics to be used for further analysis, as we did in analysing the topic distribution over industries, years, and regions, limited only by the nearly forty industries that published their sustainability reports in the GRI database. For example, the small number of reports from the toy industry might otherwise have kept it from being the focus of any sustainability research, but using LDA allowed it to be included, and our analysis revealed two topics, work time and corporate responsibility activities, that were highly probable in reports from this industry. Fourth, applying LDA provides a broad overview of the many definitions and conceptualizations of sustainability that exist in organizations.
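To illustrate that the number of topics is the only content-related restriction LDA needs, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling in plain Python. It is not the implementation used in this study. The sampler, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions chosen for illustration, and a real analysis would rely on an established, tested library.

```python
import random


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns theta (topic probabilities per document) and
    phi (term probabilities per topic).
    """
    rng = random.Random(seed)
    vocab = sorted({term for doc in docs for term in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_term = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for term in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_term[k][term] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, term in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_term[k][term] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_term[t][term] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_term[k][term] += 1
                topic_total[k] += 1

    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        {term: (topic_term[k][term] + beta) / (topic_total[k] + vocab_size * beta)
         for term in vocab}
        for k in range(num_topics)
    ]
    return theta, phi


def most_probable_terms(phi, topic, n):
    """The n terms with the highest probability in one topic."""
    return sorted(phi[topic], key=phi[topic].get, reverse=True)[:n]


# Hypothetical, already tokenized toy corpus (one list of terms per report).
docs = [
    ["emission", "energy", "consumption", "energy", "emission"],
    ["energy", "emission", "water", "consumption"],
    ["employee", "safety", "training", "employee", "diversity"],
    ["safety", "employee", "training", "diversity"],
]

theta, phi = fit_lda(docs, num_topics=2)
top_terms = [most_probable_terms(phi, k, 3) for k in range(2)]
```

Each row of `theta` gives the probability of every topic in one report, which is the quantity that is averaged over industries, years, and regions in the analysis above. The most probable terms of each topic are what a researcher would read in order to assign a label.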

Our research can also guide practitioners in their sustainability activities, as it provides a comprehensive overview of potential sustainability-related efforts among the ten recommendations.

The study contributes to research in two ways. First, researchers can use our results to either explore the topics that resulted from the analysis or to explore the reasons for missing topics. Second, we propose a new technique for analysing sustainability reports or corporate reports in general that other researchers might use to analyse other types of documents.


Table 7 provides an overview of all topics, including the label (if related to sustainability), the most probable terms, and the sustainability dimension.

Table 7. Overview of all 70 topics, including the twenty most frequently occurring terms ordered by their probability of occurrence.

Author Contributions

  1. Conceptualization: NS JvB.
  2. Data curation: NS.
  3. Formal analysis: NS.
  4. Investigation: NS.
  5. Methodology: NS.
  6. Project administration: NS.
  7. Resources: JvB.
  8. Supervision: JvB.
  9. Visualization: NS.
  10. Writing – original draft: NS.
  11. Writing – review & editing: JvB NS.


  1. Freundlieb M, Teuteberg F (2013) Corporate social responsibility reporting-a transnational analysis of online corporate social responsibility reports by market–listed companies: contents and their evolution. International Journal of Innovation and Sustainable Development 7: 1–26.
  2. Kolk A (2004) A decade of sustainability reporting: developments and significance. International Journal of Environment and Sustainable Development 3: 51–64.
  3. Liew WT, Adhitya A, Srinivasan R (2014) Sustainability trends in the process industries: A text mining-based analysis. Computers in Industry 65: 393–400.
  4. Global Reporting Initiative (2015) About Sustainability Reporting.
  5. Kolk A (2003) Trends in sustainability reporting by the Fortune Global 250. Business Strategy and the Environment 12: 279–291.
  6. Modapothala JR, Issac B. Evaluation of corporate environmental reports using data mining approach; 2009. IEEE. pp. 543–547.
  7. Modapothala JR, Issac B, Jayamani E (2010) Appraising the corporate sustainability reports–text mining and multi-discriminatory analysis. Innovations in Computing Sciences and Software Engineering: Springer. pp. 489–494.
  8. Bjørn A, Bey N, Georg S, Røpke I, Hauschild MZ (2016) Is Earth recognized as a finite system in corporate responsibility reporting? Journal of Cleaner Production.
  9. Blei DM (2012) Probabilistic topic models. Communications of the ACM 55: 77–84.
  10. World Commission on Environment and Development (1987) Report of the World Commission on Environment and Development: Our common Future. United Nations.
  11. Linnenluecke MK, Griffiths A (2010) Corporate sustainability and organizational culture. Journal of World Business 45: 357–366.
  12. Montiel I (2008) Corporate social responsibility and corporate sustainability separate pasts, common futures. Organization & Environment 21: 245–269.
  13. Hahn T, Pinkse J, Preuss L, Figge F (2014) Tensions in corporate sustainability: Towards an integrative framework. Journal of Business Ethics 127: 297–316.
  14. Azapagic A (2003) Systems approach to corporate sustainability: a general management framework. Process Safety and Environmental Protection 81: 303–316.
  15. Dyllick T, Hockerts K (2002) Beyond the business case for corporate sustainability. Business Strategy and the Environment 11: 130–141.
  16. Krug BA, Burnett SE, Dennis JH, Lopez RG (2008) Growers look at operating a sustainable greenhouse. GMPro 28: 43–45.
  17. Belu C (2009) Ranking corporations based on sustainable and socially responsible practices. A data envelopment analysis (DEA) approach. Sustainable Development 17: 257–268.
  18. Hahn T, Scheermesser M (2006) Approaches to corporate sustainability among German companies. Corporate Social Responsibility and Environmental Management 13: 150–165.
  19. Zucca G, Smith DE, Mitry DJ (2009) Sustainable viticulture and winery practices in California: What is it, and do customers care. International Journal of Wine Research 2.
  20. Daily BF, Huang S-c (2001) Achieving sustainability through attention to human resource factors in environmental management. International Journal of Operations & Production Management 21: 1539–1552.
  21. Wilkinson A, Hill M, Gollan P (2001) The sustainability debate. International Journal of Operations & Production Management 21: 1492–1502.
  22. Ameer R, Othman R (2012) Sustainability practices and corporate financial performance: A study based on the top global corporations. Journal of Business Ethics 108: 61–79.
  23. Seidel S, Recker J, Pimmer C, vom Brocke J. Enablers and Barriers to the Organizational Adoption of Sustainable Business Practices; 2010. pp. 1–10.
  24. Hall TJ, Dennis JH, Lopez RG, Marshall MI (2009) Factors affecting growers' willingness to adopt sustainable floriculture practices. HortScience 44: 1346–1351.
  25. Fifka MS (2015) Zustand und Perspektiven der Nachhaltigkeitsberichterstattung. Corporate Social Responsibility: Springer. pp. 835–848.
  26. Sweeney L, Coughlan J (2008) Do different industries report corporate social responsibility differently? An investigation through the lens of stakeholder theory. Journal of Marketing Communications 14: 113–124.
  27. Hedberg CJ, von Malmborg F (2003) The Global Reporting Initiative and corporate sustainability reporting in Swedish companies. Corporate Social Responsibility and Environmental Management 10: 153–164.
  28. Searcy C, Buslovich R (2014) Corporate perspectives on the development and use of sustainability reports. Journal of Business Ethics 121: 149–169.
  29. Crain SP, Zhou K, Yang S-H, Zha H (2012) Dimensionality reduction and topic modeling: From latent semantic indexing to latent dirichlet allocation and beyond. In: Aggarwal CC, Zhai C, editors. Mining Text Data: Springer. pp. 129–161.
  30. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. The Journal of Machine Learning Research 3: 993–1022.
  31. Krestel R, Fankhauser P, Nejdl W. Latent dirichlet allocation for tag recommendation; 2009. ACM. pp. 61–68.
  32. Debortoli S, Müller O, Junglas I, vom Brocke J (2016) Text Mining For Information Systems Researchers: An Annotated Topic Modeling Tutorial. Communications of the Association for Information Systems (CAIS).
  33. Elkington J (1997) Cannibals with forks. Oxford, United Kingdom: Capstone Publishing Limited.
  34. Amini M, Bienstock CC (2014) Corporate sustainability: an integrative definition and framework to evaluate corporate practice and guide academic research. Journal of Cleaner Production 76: 12–19.
  35. Mitchell C (2000) Integrating sustainability in chemical engineering practice and education: concentricity and its consequences. Process Safety and Environmental Protection 78: 237–242.
  36. Davis K (1973) The case for and against business assumption of social responsibilities. Academy of Management Journal 16: 312–322.
  37. Székely F, Knirsch M (2005) Responsible leadership and corporate social responsibility: Metrics for sustainable performance. European Management Journal 23: 628–647.
  38. Sannajust A (2014) Impact of the World Financial Crisis to SMEs: The determinants of bank loan rejection in Europe and USA.
  39. Acharya VV, Eisert T, Eufinger C, Hirsch CW (2014) Real effects of the sovereign debt crisis in Europe: Evidence from syndicated loans. CEPR Discussion Paper No DP10108. Available at SSRN: http://ssrn.com/abstract=2501580.
  40. Matthews C (2015) Will the crisis in China sink the U.S. economy? Fortune.
  41. Schmiedel T, vom Brocke J (2015) Business process management: Potentials and challenges of driving innovation. In: Vom Brocke J, Schmiedel T, editors. BPM-Driving Innovation in a Digital World: Springer. pp. 3–15.
  42. Zenghelis D (2012) A strategy for restoring confidence and economic growth through green investment and innovation.
  43. Geels FW (2013) The impact of the financial–economic crisis on sustainability transitions: Financial investment, governance and public discourse. Environmental Innovation and Societal Transitions 6: 67–95.
  44. Ambec S, Lanoie P (2008) Does it pay to be green? A systematic overview. The Academy of Management Perspectives 22: 45–62.
  45. Hockerts K (2015) A cognitive perspective on the business case for corporate sustainability. Business Strategy and the Environment 24: 102–122.
  46. Székely N, Seidel S, vom Brocke J. Green IS: Are We Still Thinking in Mere Economic Imperatives or Are We Striving for Eco-Effectiveness?; 2015; Puerto Rico.
  47. Panwar N, Kaushik S, Kothari S (2011) Role of renewable energy sources in environmental protection: a review. Renewable and Sustainable Energy Reviews 15: 1513–1524.
  48. Rands MR, Adams WM, Bennun L, Butchart SH, Clements A, Commes D, et al. (2010) Biodiversity conservation: challenges beyond 2010. Science 329: 1298–1303. pmid:20829476
  49. Johnson M (2014) Winning The War for Talent: How to Attract and Keep the People to Make the Biggest Difference to Your Bottom Line: John Wiley & Sons.
  50. Svendsen A (1998) The stakeholder strategy: profiting from collaborative business relationships: Berrett-Koehler Publishers.
  51. Van Marrewijk M, Werre M (2003) Multiple levels of corporate sustainability. Journal of Business Ethics 44: 107–119.
  52. Rowe D (2007) Education for a sustainable future. Science 317: 323. pmid:17641184
  53. Seidel S, Recker J, vom Brocke J (2013) Sensemaking and sustainable practicing: Functional affordances of information systems in green transformations. MIS Quarterly 37: 1275–1299.
  54. Energy Transition (2015).