Both sharing and using open research data have revolutionary potential for advancing science. Although previous research gives insight into researchers’ drivers and inhibitors for sharing and using open research data, these drivers and inhibitors have not yet been integrated through a thematic analysis, and a theoretical argument is lacking. The purpose of this study is to systematically review the literature on individual researchers’ drivers and inhibitors for sharing and using open research data. This study systematically analyzed 32 open data studies (published between 2004 and 2019 inclusive) and elicited drivers and inhibitors for both open research data sharing and use in eleven categories: ‘the researcher’s background’, ‘requirements and formal obligations’, ‘personal drivers and intrinsic motivations’, ‘facilitating conditions’, ‘trust’, ‘expected performance’, ‘social influence and affiliation’, ‘effort’, ‘the researcher’s experience and skills’, ‘legislation and regulation’, and ‘data characteristics’. This study discusses these categories extensively and, using a thematic analysis, argues how the categories and factors are connected. In addition, this study discusses several opportunities for applying, extending, using, and testing theories in open research data studies. The resulting overview of categories and factors can be applied to examine researchers’ drivers and inhibitors in different research disciplines, for example, disciplines with low rates of data sharing and use versus disciplines with high rates. Moreover, this study serves as an important first step towards developing effective incentives for open data sharing and use behavior.
Citation: Zuiderwijk A, Shinde R, Jeng W (2020) What drives and inhibits researchers to share and use open research data? A systematic literature review to analyze factors influencing open research data adoption. PLoS ONE 15(9): e0239283. https://doi.org/10.1371/journal.pone.0239283
Editor: Frantisek Sudzina, Aalborg University, DENMARK
Received: March 3, 2020; Accepted: September 3, 2020; Published: September 18, 2020
Copyright: © 2020 Zuiderwijk et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Nearly all relevant data are within the manuscript and its Supporting Information files. Additional data, as well as the data included in the manuscript, have also been made available as raw open data through the 4TU.Centre for Research Data of Delft University of Technology in the Netherlands (doi: 10.4121/12820631.v1).
Funding: The following institutions supported our study: Delft University of Technology (Dr. Anneke Zuiderwijk received salary from this institution), ETH Zurich (Rhythima Shinde MSc. received salary from this institution) and National Taiwan University (Dr. Wei Jeng received salary from this institution). The following grants supported our study: MOST109-2636-H-002-002 and MOST109-3017-F-002-004 (both from Ministry of Science and Technology, Taiwan) and NTU-109L900204 (from Ministry of Education, Taiwan) (grants received by Dr. Wei Jeng). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Both sharing and using open research data have revolutionary potential for advancing science [1–4]. Combined with new Information and Communication Technologies (e.g., new semantic standards, increasing computing power, and larger, cheaper data-storage capacity), which have shortened geographical, disciplinary, and expertise-related distances, open research data use now offers tremendous opportunities. Researchers worldwide can now more efficiently reproduce each other’s research, uncover poor analyses and fraud, make novel scientific discoveries, and thus work more efficiently overall.
Previous research already provides insight into researchers’ drivers and inhibitors for sharing and using open research data. For example, Piwowar, Day, as well as Piwowar and Vision, have found that researchers may be driven to share their data openly because this can increase the researcher’s visibility and thus lead to a higher citation rate. Researchers may also want their study results to be transparent and verifiable, or the journal in which they want to publish may require them to share their data openly. Conversely, researchers may be reluctant to share data openly for fear of not receiving credit, of losing publication opportunities [13–15], or of facing criticism about data quality, or because of data sensitivity. Furthermore, previous research has found that researchers may be driven to use open data because doing so saves time and effort, or because it can accelerate their research progress. Yet researchers might be inhibited from using open research data because data are fragmented and their quality is difficult to assess [19, 20], or because reusable data are difficult to find or access, data are difficult to integrate, and data may be misinterpreted.
Despite various data sharing initiatives that have emerged in the past few decades, most raw datasets have still not been openly shared. Prior research has pointed out that the current reward system does not sufficiently encourage individual researchers to adopt open science best practices, such as those involving transparency, reproducibility, openness, and data reuse. In addition, previous research lacks a comprehensive thematic analysis that explains and integrates the drivers and inhibitors for both sharing and using open research data. According to Hossain, Dwivedi, existing literature has separately explored and reported results on several antecedents to open data adoption, such as community participation. Yet such results are scattered, and a comprehensive overview of factors has not yet been developed. Many studies have addressed drivers and inhibitors for sharing and using open research data, but each reveals only a rather small part of the full picture. Investigating both data sharing and use, together with individual drivers and organizational contexts and arrangements, creates a more holistic understanding of open research data sharing and reuse.
To fill this gap in the literature, the purpose of this study is to systematically review the literature on individual researchers’ drivers and inhibitors for sharing and using open research data. This study defines open research data as structured, machine-readable data that can be actively published or shared on the Internet and that ideally also reflect the FAIR principles: Findable, Accessible, Interoperable, Reusable [24, 25]. Open research data can be raw, derived from primary data for subsequent analysis or interpretation, or derived from existing sources held by others. Data derived from both qualitative and quantitative research are within this study’s scope.
The next section explains our approach to the Systematic Literature Review. The results that follow include a descriptive analysis and the principal themes derived from the literature. Finally, this study discusses the implications of these findings for future research and practice and draws conclusions from them.
Research approach: Systematic literature review
A literature review reflects “the selection of available documents (both published and unpublished) on the topic that altogether contain information, ideas, data and evidence written from a particular standpoint to fulfil certain aims or express certain views on the nature of the topic and how it is to be investigated, and the effective evaluation of these documents in relation to the research being proposed”. One of the main advantages of the systematic literature review approach lies in its rigor and in the overall transparency of the applied processes. Literature reviews have proven useful in diverse research disciplines, such as software engineering, evidence-based medicine, social networks, and supply-chain management. In the context of open research data, Fecher, Friesike also found that the systematic literature review approach can be a useful way to “systematically retrieve research papers from literature databases and analyze them according to a pre-defined research question” [p. 3].
Despite these advantages, one should also be aware that the validity of systematic reviews may be reduced by ‘publication bias’, which occurs when researchers selectively report and publish statistically significant positive results of experiments rather than negative or null results. With this in mind, this study is scoped to a specific selection of academic articles on open research data and excludes grey literature, news articles, blog posts, and preprints. Literature reviews can be used for various purposes, such as positioning research relative to existing knowledge and building on it, gaining insights into the research topic, introducing relevant terminology and defining key terms, learning which research methods other scholars have used to study the topic, and relating research results to those of others. In this study, a literature review was applied for three reasons. First, it positions this research relative to existing knowledge and builds on that knowledge. To this end, the following questions were formulated:
- a) In which contexts (e.g., research disciplines, countries, types of institutions) have open research data sharing and use been investigated by previous research?
- b) What are the objectives and contributions of previous research about open research data sharing and use?
- c) What theories and theoretical models have been indicated (e.g., applied, developed, used, tested) in studies about open research data sharing and use?
Second, it provides insights into the research methods other scholars have applied to study the research topic. This led to the following question:
- d) What research designs have been applied in previous research about open research data sharing and use?
Third, it provides insights into this research’s topic, namely researchers’ drivers and inhibitors for sharing and using open research data. This led to the following questions:
- e) What factors drive researchers to openly share their research data with others?
- f) What factors inhibit researchers from openly sharing their research data with others?
- g) What factors drive researchers to use openly available research data from other researchers?
- h) What factors inhibit researchers from using openly available research data from other researchers?
In this study, the Systematic Literature Review approach of Kitchenham was applied. This approach involves five steps: (1) identification of studies; (2) study selection; (3) study quality assessment; (4) data extraction; and (5) data synthesis. The following paragraphs detail these steps. The first two steps concern the identification of research articles and the selection of relevant studies. We determined the study selection criteria and selection process and then discussed the inclusion decisions. To identify as many relevant articles as possible, several databases were searched, namely Web of Science, ACM Digital Library, and Scopus (which includes Elsevier/ScienceDirect, Springer, Taylor & Francis, Wiley Blackwell, IEEE, Sage, Emerald, and Cambridge University Press). For each database, the first 50 results, sorted by relevance, were scanned by carefully reading their titles and abstracts. In addition, three prominent journals in library and information science were searched for articles on data sharing research: (1) “International Journal of Digital Curation”; (2) “Journal of the Association for Information Science and Technology”; (3) “Electronic Library”.
Table 1 lists the search terms applied in this study. The search terms were not limited to a particular discipline or geographical area, because this would yield a set of studies with too narrow a scope. Instead, articles on research data sharing and use worldwide, from all types of research disciplines, were included. Studies were identified in the summer of 2020, and studies published after December 2019 were excluded. To ensure that the review includes up-to-date information, the inclusion period was limited to the last 16 years, and papers published earlier were excluded. Ultimately, 101 articles were identified.
As recommended by Jalali and Wohlin, the pool of articles was expanded and complemented using a snowballing technique: 35 additional relevant articles were identified via the reference lists of the publications already found with the search strings, which enriched the overall literature base. After combining the results of the systematic literature search with those of the snowballing approach and removing duplicates, 119 studies were identified that describe research about open data sharing and use. EndNote was used as the bibliography management tool, and Excel spreadsheets were used to document the searches and search results. The raw data from this study’s analyses are available via the 4TU.Centre for Research Data: https://doi.org/10.4121/12820631.v1.
For each of the 119 identified records, the title and abstract were examined. In this step, 69 studies were excluded for the following reasons:
- Many studies focused on open government data or open data for businesses (n = 45). As this study focuses on researchers’ data sharing and use, factors that affect open data sharing and use by businesses or governments were not considered.
- Several studies were excluded as irrelevant to this study’s research question (n = 21), such as studies focused on motivations related to e-commerce or open source. Relevance was determined by how well an article fits this study’s aim of developing a comprehensive overview of factors that explain why researchers are, or are not, motivated to openly share and use research data.
- Two of the identified records turned out to be workshop descriptions. They appeared in our search because they were published in conference proceedings. As these records did not describe research, they were removed from our sample.
- One record was excluded as it was not accessible.
After this step, 50 studies remained.
The third step of a systematic literature review is to assess the quality of the studies. Especially in the appraisal of qualitative research, this study concurs with Estabrooks, Field that papers of weaker quality should be excluded from systematic literature reviews. Yet what determines the quality of qualitative research has been the subject of heated debate and criticism. In systematic reviews of qualitative research in particular, assessing study quality remains a challenge and may lead different assessors to reach different quality assessments. Although this challenge cannot be removed completely, this study undertook various measures to reduce the resulting bias as much as possible. For example, we provide transparency about our assessment procedure and openly share the research data underlying our analysis and findings, thus enabling other scholars to cross-check our findings and examine whether other interpretations are possible.
Batini et al. reported that the four criteria considered most important in most literature on data quality assessment are accuracy, completeness, consistency, and timeliness. In this systematic literature review, each study was assessed against these dimensions. The quality assessment criteria were defined in detail using insights from the systematic literature review protocol developed by Bano and Zowghi, which resulted in the first version of our rubric. When we started the quality assessment using this rubric, all three authors independently assessed the first six papers. We then discussed the outcomes of these assessments, which revealed minor differences in the interpretation of the quality assessment criteria. After further improving the rubric, the final rubric was applied to assess the quality of the studies (see Table 2). The remaining studies in the sample were then divided into two halves. The first half was assessed by the first and second authors, and the second half by the first and third authors. Thus, each article was independently assessed by at least two assessors. All assessors have extensive experience in the open data field and training in qualitative research assessment. No conflicting assessments were found in the second round of assessment.
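The two-round assignment procedure described above can be sketched in Python. This is a hypothetical illustration rather than the authors’ tooling: the paper identifiers P01 to P50 and the assessor labels A1, A2 and A3 (first, second and third author) are placeholders, and the sketch assumes the six pilot papers were drawn from the 50 studies that passed screening.

```python
# Hypothetical sketch of the two-round quality-assessment assignment.
# Assumptions: paper IDs and assessor labels are placeholders; A1, A2 and A3
# stand for the first, second and third author.

def assign_assessors(paper_ids, n_pilot=6):
    """Map each paper ID to the set of assessors who rate it."""
    assignment = {}
    pilot, remaining = paper_ids[:n_pilot], paper_ids[n_pilot:]
    # Round 1: all three authors independently assess the pilot papers,
    # whose outcomes are used to refine the rubric.
    for pid in pilot:
        assignment[pid] = {"A1", "A2", "A3"}
    # Round 2: split the remaining papers into two halves.
    half = len(remaining) // 2
    for pid in remaining[:half]:
        assignment[pid] = {"A1", "A2"}  # first and second author
    for pid in remaining[half:]:
        assignment[pid] = {"A1", "A3"}  # first and third author
    return assignment


def has_conflict(ratings):
    """ratings: assessor -> verdict; True if the assessors disagree."""
    return len(set(ratings.values())) > 1


papers = [f"P{i:02d}" for i in range(1, 51)]
plan = assign_assessors(papers)
# Every paper is rated independently by at least two assessors.
assert all(len(assessors) >= 2 for assessors in plan.values())
print(has_conflict({"A1": "include", "A3": "exclude"}))  # True
```

Under these assumptions, the six pilot papers go to all three authors and the remaining 44 are split 22/22 between the two assessor pairs.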
From the 50 identified studies, eighteen were removed for the following reasons:
- Nine studies did not have clear research questions and/or did not describe the collection of empirical data. Instead, these studies were essays, opinion articles, conceptual studies, or studies describing a proposed method, prototype, or architecture.
- Seven articles provided insufficient information for quality assessment. Quality is defined following Kitchenham: an article’s quality depends on the credibility of how the study was conducted and analyzed, and on the importance of its findings. In addition, some studies were subject only to editorial review, not peer review; these were also left out.
- One study presented a combined quantitative and qualitative analysis of eleven questionnaire responses. This small number of responses does not support quantitative analysis of the kind the authors applied, and the study population was not explained.
- One study appeared to be a shorter version of an extended paper already included in the selection.
These steps resulted in a final selection of 32 articles concerned with drivers and inhibitors for sharing and using open research data (Fig 1).
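The selection flow reported above can be checked with a few lines of arithmetic. The sketch below is illustrative only: all counts are taken from the text, and the number of duplicates (17) is not reported explicitly but is implied by the other figures.

```python
# Arithmetic check of the study-selection flow (counts taken from the text).
identified_by_search = 101
identified_by_snowballing = 35
after_deduplication = 119

# Exclusions after reading titles and abstracts.
screening_exclusions = {
    "open government or business data": 45,
    "irrelevant to the research question": 21,
    "workshop descriptions": 2,
    "not accessible": 1,
}
# Exclusions during quality assessment.
quality_exclusions = {
    "no clear research question or empirical data": 9,
    "insufficient information for quality assessment": 7,
    "too few responses, population not explained": 1,
    "shorter version of an included paper": 1,
}

# Duplicates are implied, not reported: 101 + 35 - 119.
duplicates = identified_by_search + identified_by_snowballing - after_deduplication
after_screening = after_deduplication - sum(screening_exclusions.values())
final_selection = after_screening - sum(quality_exclusions.values())

print(duplicates, after_screening, final_selection)  # 17 50 32
```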
In the fourth step of the systematic literature review (data extraction), a spreadsheet was used to keep track of the metadata of each selected study. Table 3 shows the metadata collected for the 32 selected studies: general information, context-related information, research design-related information, content-related information, and information concerning drivers and inhibitors for sharing and using open research data. In the final step, the information obtained through this research approach was synthesized. The findings of this synthesis are detailed in the next section.
Results: Data extraction and data synthesis
Following Kitchenham, this section reports the results of synthesizing the studies collected via the literature review. Extensive descriptive analyses and content analysis, which are common in information systems research, were carried out to summarize the article attributes and report the descriptive results. Before the content analysis, several preparatory phases were completed: metadata extraction, context analysis, and quality analysis (see the following sections). After accessing all sampled articles (n = 32), the three assessors first identified and captured metadata and descriptive information from each article, including publication type and year. All metadata attributes and descriptive information were collected, cleaned, and organized in a spreadsheet-style dataset.
Based on the descriptive data, S1 Table (‘Overview of studies included in our literature review’) provides an overview of the 32 selected studies on open data sharing and use that were used to develop the comprehensive factor overview. This appendix also details the objectives of these studies. Most were published between 2010 and 2019 inclusive; the exceptions are one article published in 2004 and one in 2007. Most studies (n = 30) were published in journals, such as “PLOS ONE” (n = 7), “Data Science Journal” (n = 3), and “International Journal of Information Management” (n = 3). One dissertation was also included.
Because the descriptive information did not cover all attributes this study required, we also collected context-related information, such as the discipline addressed and the period under investigation, and information about the design of the examined studies, such as the research approach and possible quality concerns. These analyses were followed by the content analysis of the factors that affect open research data sharing and use. To reduce the risk of bias in collecting the data, we specified how many studies report each factor in the synthesis and made the underlying raw research data available so that the findings can be examined further. The data underlying this section can be found at https://doi.org/10.4121/12820631.v1. The following sections report the findings of the context analysis, the research design analysis, and the content analysis.
Of the 32 studies, nearly half (n = 13) examined data sharing and use in a global context or across multiple countries, mostly the United States together with several European countries. Nine studies focus on the United States as the primary country under investigation. Eight studies focus on open data sharing and use in an individual country, such as the Netherlands, Argentina, Brazil, or the United Kingdom, and one study focused on both Kenya and South Africa. Twenty-four studies specified the period in which they were conducted, while eight did not.
Regarding the research disciplines under investigation, most articles (n = 25) focused on specific disciplines, such as biodiversity, sociology, microarray science, psychology, health sciences, earth and space science, and genetic and genomic sciences. Eight articles covered multiple research disciplines, such as the social sciences, humanities, natural sciences, information sciences, engineering, biology, education, law, and business. Two articles did not specify the research discipline(s) under investigation.
Analysis of research design
As shown in Table 3, the analysis of the research design considered: (1) research methods (e.g., quantitative) and approaches (e.g., survey); (2) the availability of the underlying research data; (3) the transparency of the literature review approach; and (4) overall quality concerns. In this study’s sample, qualitative and quantitative studies were nearly equally divided: fifteen of the 32 selected studies were exclusively quantitative and twelve were qualitative. Five studies applied a mixed-methods approach that combined qualitative and quantitative research. Fifteen of the 32 studies used questionnaires as the primary data collection approach. Other methods often used in open data research were interviews (n = 8) and case studies (n = 5). Thirteen studies applied other data collection approaches, such as quasi-experiments, expert panels, observations, dataset analysis, desktop research, and analysis of the number of citations of published papers (i.e., a scientometric approach).
For nearly half of the studies (n = 14), it is either unclear whether the underlying research data are openly available, or the data are not shared openly, since the study makes no reference to data availability. Sometimes there are references to similar cases in other publications or to reports that use the same research approach, without specifying where the raw research data can be found. Note that a lack of information about where the underlying research data can be found does not necessarily mean that the data are not openly available; they may have been shared openly without being mentioned in the study itself, for example when the data are shared only after the article is published. Some studies state that all data are already included in the publication, but in those cases the data were not shared in a machine-readable format. Sixteen studies do specify where the underlying research data can be found, for example in Dryad, GitHub, Mendeley Data, or an institutional data repository. Some of the shared data are not in a machine-readable format. Two studies mention that the underlying research data cannot be shared openly because of confidentiality issues.
As a final research design topic, we examined whether there were overall quality concerns about the 32 analyzed studies. For four articles, there are at least some concerns. For example, in one study the investigated cases were described and analyzed, but the case selection criteria were not specified. In another study, it was unclear how many case studies had been conducted and exactly what they were about, as there was only a reference to an OECD report containing this information. In yet another study, some information about the sources used in the case studies was missing.
Most of the investigated studies (n = 18) did not mention any theory (note that this study took a narrow view of what constitutes theory), while fourteen studies mention one or more theories. Seven of these fourteen mention the “Theory of Planned Behavior” (TPB), two mention “Institutional Theory”, two mention the “Technology Acceptance Model” (TAM), and two mention an integrated theory combining the “Unified Theory of Acceptance and Use of Technology” (UTAUT) and the two-stage “Expectation Confirmation Theory” (ECT) of Information Systems (IS) continuance. The following theories were each mentioned by only one study: the “Theory of Reasoned Action” (TRA), organizational theories (commons-based peer production, wisdom of the crowds, and collective intelligence), the “Unified Theory of Acceptance and Use of Technology”, “Grounded Theory”, “Motivation Theories” (e.g., Expectancy Theory, Reinforcement Theory, the Multi-Motive Information Systems Continuance Model), and “Coordination Theory”.
The fourteen studies that mention theory use it in various ways. Eleven studies applied theory to develop a theoretical research framework or model and/or to test hypotheses; their authors reflect on the theory in relation to their research model. One of these eleven developed a theory as its research outcome, building on existing theories. One study mentioned theory in its discussion section and examined the study’s implications for existing theories, without using the theory elsewhere in the research. One study mentions theory only in its recommendations for future research, without using it elsewhere (Table 4). The discussion section further explores the potential and opportunities for using theories in open research data studies.
Analysis of factors influencing open research data sharing and use
The focus on open research data sharing, use or both.
For the 32 analyzed studies, we examined how many mentioned: (1) researchers’ drivers for sharing research data openly; (2) researchers’ inhibitors for sharing research data openly; (3) researchers’ drivers for using open research data; and (4) researchers’ inhibitors for using open research data (see Table 5 and S2–S5 Tables).
Of the 32 studies, six focused exclusively on data sharing and do not mention any factors related to the motivation to use open research data. Four studies focused exclusively on open research data use and do not mention factors related to open research data sharing. The remaining twenty-two articles mention factors related to both open data sharing and use, which can be explained by the interdependence of these two activities: data users depend on data providers for research data, while data providers make research data available to data users and depend on them for feedback, the development of the research field, and possible future collaborations. However, with a few exceptions [e.g., 17, 19, 40, 50], most studies addressing both data sharing and use focus on research data sharing and mention factors related to open data use only briefly, as it is not their main topic. Our study thus confirms the finding of Joo, Kim that “a relatively smaller body of research has focused on data reuse as compared to data sharing” (p. 390).
For each of the 32 analyzed articles, we identified the factors that may drive researchers to openly share their research data with others or inhibit them from doing so, as well as the factors that may drive or inhibit their use of open research data shared by others. S2–S5 Tables provide the detailed results of this analysis. We found that various articles refer to similar constructs, and we categorized the constructs of the influencing factors into the following eleven categories:
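Classifying each study by whether it mentions sharing factors, use factors, or both amounts to simple set operations on study identifiers. A minimal sketch, assuming hypothetical study IDs S01 to S32 arranged to match the six sharing-only and four use-only studies reported above, and assuming every study mentions at least one of the two activities; under these assumptions the remaining studies necessarily fall into the ‘both’ group.

```python
def focus_breakdown(all_studies, mentions_sharing, mentions_use):
    """Count studies by focus using set operations on study IDs."""
    return {
        "sharing only": len(mentions_sharing - mentions_use),
        "use only": len(mentions_use - mentions_sharing),
        "both": len(mentions_sharing & mentions_use),
        "neither": len(all_studies - mentions_sharing - mentions_use),
    }


# Hypothetical coding: S01-S06 sharing only, S07-S10 use only, S11-S32 both.
studies = {f"S{i:02d}" for i in range(1, 33)}
sharing = {f"S{i:02d}" for i in list(range(1, 7)) + list(range(11, 33))}
use = {f"S{i:02d}" for i in range(7, 33)}

counts = focus_breakdown(studies, sharing, use)
# The four groups partition the sample, so they must sum to its size.
assert sum(counts.values()) == len(studies)
print(counts)  # {'sharing only': 6, 'use only': 4, 'both': 22, 'neither': 0}
```

Because the four groups are disjoint and cover the whole sample, the group sizes always add up to the number of studies, which makes this a convenient consistency check on reported counts.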
- The researcher’s background. This category concerns factors related to the researcher’s personal characteristics and research background that might affect their open data sharing and use behavior.
- Requirements and formal obligations. This refers to whether formal obligations are in place, such as those imposed by the project’s funder, and whether other requirements are experienced, such as (in)formal policies.
- Personal drivers and intrinsic motivations. This refers to intrinsic motivations for both open research data sharing and use.
- Facilitating conditions. This refers to anything that can facilitate open research data sharing or use.
- Trust. This refers to how a researcher’s level of trust influences their open research data sharing and use behavior.
- Expected performance. This concerns factors that may influence the performance of researchers who do or do not share and use open research data.
- Social influence and affiliation. This concerns factors related to social influence and affiliation that affect whether a researcher is driven to share and use open research data.
- Effort. This refers to the effort needed for a researcher to openly share or use research data.
- The researcher’s experience and skills. This refers to a researcher’s previous experience with open research data sharing and use and the skills required for these activities, as well as how these affect future research data sharing and use.
- Legislation and regulation. This concerns the impact of factors related to legislation and regulation on research data sharing and use behavior.
- Data characteristics. This refers to the influence of data characteristics on whether a researcher shares and uses open research data.
In the following sections, we discuss the factors that drive and inhibit researchers to openly share their research data with others, along with the factors that drive and inhibit researchers to use open research data shared by others, organized according to these categories.
Factors driving and inhibiting researchers to openly share their research data.
This section answers the questions ‘What factors drive researchers to openly share their research data with others?’ and ‘What factors inhibit researchers from openly sharing their research data with others?’ Table 6 presents these drivers and inhibitors. It shows that several factors are two sides of the same coin. For example, the factor ‘level of involvement in research activities’ refers to the finding that individuals who work solely in research, in contrast to researchers who have time-consuming teaching obligations, are more likely to make their data available to other researchers . Thus, for researchers who work solely in research, the ability to focus on research without having to teach can be considered a driving factor, whereas for researchers with time-consuming teaching obligations, these obligations can be considered an inhibiting factor. Other factors relate specifically either to drivers for open data sharing, such as the increased pressure to release data , or to inhibitors for data sharing, such as the time and effort it takes to openly share research data .
Some factors might fit in multiple categories. For example, one study refers to the inhibiting factor “cost of sharing (e.g., time and effort)” . This factor relates to the effort that a researcher needs to put into openly sharing research data, but also to facilitating conditions such as time restrictions. When a factor related to multiple categories, we chose the category that we found to be most closely related. For this particular example, we chose the category of effort, as effort was explicitly mentioned by the study’s authors.
Many of the identified drivers for openly sharing research data relate to ‘personal drivers and intrinsic motivations’, the ‘expected performance’ of researchers, and the ‘effort’ required for openly sharing research data. The identified inhibitors for open data sharing mostly relate to ‘legislation and regulation’, ‘facilitating conditions’, and ‘expected performance’, in the sense that opening up research data can also lead to worse performance.
Factors driving and inhibiting researchers to use open research data from other researchers.
This section discusses the factors that drive or inhibit researchers to use openly available research data from other researchers. Table 7 depicts the inhibitors for researchers to use open research data from other researchers. As with research data sharing, several factors can be either drivers or inhibitors, depending on their level. For example, both “trust in data producers”  and “trust in other researchers’ measurement”  are factors that can drive researchers to use open research data, whereas lower levels of trust and trust concerns  can inhibit open research data use. Additionally, for open research data use, we identified several factors that could fit in multiple categories. For instance, the factor “costs associated to training potential data users”  could fit in either the category of experience and skills or that of facilitating conditions. We placed this factor in the category of experience and skills, as training is strongly related to the experience and skills needed for open data use. However, it would also have fit in the category of facilitating conditions, as training might be seen as a condition that facilitates open data use. Drivers for open research data use mainly relate to personal drivers and intrinsic motivations and to researchers’ expected performance, whereas the identified inhibitors for open research data use mainly relate to effort and data characteristics.
Open research data adoption: Thematic analysis
This section presents the thematic analysis of the studies included in the literature review. The previous section provided insight into the factors driving and inhibiting open research data sharing and use. In this section, the categories that play key roles in open research data adoption are combined (Fig 2). Each of the eleven categories that the factors relate to is detailed in the following section, followed by an overview of the categories and factors.
Description of open research data adoption categories and factors
We found that various factors related to a researcher’s background affect both open data sharing and use behavior. Such factors should be considered in relation to the broader social, organizational, and cultural factors that influence people’s behavior. Research data sharing can be driven by disciplinary practice, organizational and academic culture and practice, and/or the researcher’s level of involvement in research and teaching activities.
First, research data sharing is more common in certain disciplines than in others [11, 40]. It has been argued that disciplines such as genetic genealogy, atmospheric science, and oceanography have well-developed traditions of free and open access and robust databases, whereas disciplines such as wildlife ecology, medicine and many of the social sciences do not . Others have argued that biology researchers tend to openly share research data more than medical/pharmaceutical-related researchers . Likewise, political science researchers are more inclined to openly share data than sociology-related researchers . Various studies have found that certain research disciplines might have nuances, traditions, cultures, or “climates” that can encourage researchers to share open research data [17, 40, 43, 44, 56]. Conversely, the culture or habits of a specific research discipline might inhibit research data sharing, although none of the selected studies mentioned disciplinary practices as an inhibiting factor.
Second, open research data sharing can be driven or inhibited by organizational culture , academic culture , a supportive data sharing culture , and organizational practices. In the literature, cultural and organizational factors are mainly mentioned as driving factors. This study argues that if the culture and organizational practices by default do not involve sharing research data openly, an individual researcher is less likely to openly share research data on their own. Organizational culture and practices might be related to disciplinary culture and practices, since research in different disciplines is often organized in different organizational units (e.g., university faculties).
Third, researchers’ levels of involvement in research and teaching activities affect whether they openly share their data. Researchers who only conduct research, in contrast to researchers who have time-consuming teaching obligations, are more likely to make their research data available to others . Thus, exclusive involvement in research can be considered a factor driving open research data sharing, whereas involvement in both research and teaching inhibits it.
Fourth, some studies included in our review refer to demographic factors that differ between researchers who openly share data to a smaller or larger degree. Such demographic factors do not by themselves explain researchers’ data sharing and use behavior, yet their occurrence differs between researchers who openly share research data and those who do not. For example, Sayogo and Pardo  found that the probability of research data sharing among respondents from primarily North American jurisdictions differs between male and female researchers. In addition, research data sharing and use behavior is more common in some countries than in others [17, 56], and non-tenured researchers are less likely to share their research data openly than tenured researchers . Correlations between age and data sharing behavior have also been found, although the findings are inconsistent. Tenopir, Allard  observed that older people (over 50) show more interest in sharing data and that younger people are less likely to make their data available to others. Schmidt, Gemeinholzer  found that younger researchers (age 20 to 35) are more concerned about the impact of openly sharing research data than older researchers (age 51 and older). In contrast, da Costa and Leite  found that younger researchers are more, not less, inclined to openly share their data, owing both to their skills in using technologies and to their interest in collaborating with researchers working on other research projects. It is likely that various intervening factors affect the relationship between age and the likelihood of openly sharing research data. In general, demographic factors such as age, gender, country, and tenure status need to be viewed in the context of the broader social, organizational, and cultural factors that play a role in researchers’ decisions on whether to openly share research data.
For example, Enke, Thessen  observed that, in general, researchers from Germany and Canada often feel less willing to share research data than researchers from the United States or Europe. This difference might be related to socio-economic characteristics, the data sharing policies currently in place in these countries,  or cultural differences [40, 49]. In the set of studies included in our systematic literature review, such factors have been examined only for particular countries, and their impact and direction still require extensive future research.
With regard to open research data use, drivers found in the literature include research discipline practices [17, 40], disciplinary climate (a sense of community and openness with other researchers affiliated with the same field) , the research climate , whether data reuse is considered a prevalent practice in the researchers’ research community , existing traditions , and the sector the researcher works in . Just as for openly sharing data, there might be differences in open data use behavior between researchers from different countries, and between older and younger researchers , although such factors are not considered drivers of openly sharing research data.
Requirements and formal obligations.
Most of the factors found in relation to requirements and formal obligations concern the sharing of research data rather than its use. In the context of data sharing, requirements and formal obligations relate to the increased pressure to release data . These can be considered soft requirements, such as the pressure and policies to openly share research data defined by funding bodies, government agencies, or journal publishers, the existence of government directives, or encouragement by the federal government to create a robust data management plan. This category differs from the category of legislation and regulation (see Section Legislation and regulation), which is based on hard regulations such as government rules that forbid or mandate data releases, for example the European Union’s “General Data Protection Regulation” (GDPR) and the United States’ “Health Insurance Portability and Accountability Act” (HIPAA).
In the category of requirements and formal obligations, Fecher, Friesike , Kim and Adler  and Schmidt, Gemeinholzer  refer to the impact of funding policies and grant requirements, as funding agencies demand data sharing in return for (financial) support. Such factors drive researchers to openly share their research data. Occasionally, researchers receive research data from external agencies and use this data as secondary data for their research. Often, the external agencies provided them with the data on the condition that these agencies would also share the data openly with the public after a certain period (usually one year), and researchers therefore considered this a form of ‘automatically’ sharing research data openly .
A second factor related to requirements and formal obligations concerns the requirements [41, 42, 47, 55] or even mandates  of scientific journals to openly share the underlying research data when an article using that data is published. Openly sharing research data is also driven by ethical codes  and by federal agencies’ mandates to create data management plans . Generating data management plans forces researchers to think about what they will do with their data and requires an explanation if their data will not be published openly. Likewise, according to Curty, Crowston , compliance with governmental directives can be a premise for opening up research data.
Our literature review also identified university policies as a possible driver of openly sharing research data . Similarly, the policies of research institutes might play a vital role in the decision to openly share research data. For example, a university or research institute might require that all research data and code supporting the results described in a doctoral thesis be published openly before the candidate can complete the graduation requirements. Likewise, a university policy stating that all research should be open unless the researcher explains why this cannot be done might drive researchers to share their research data openly.
Factors inhibiting the open sharing of research data identified in the literature include the possible loss of funding opportunities : if the data is already openly available, there is no need to obtain funding to gather the data again. Furthermore, if funders do not require researchers to openly share research data, or if too many data policies apply, this has been said to inhibit research data sharing . The latter might confuse researchers and thus have an adverse effect. Another inhibiting factor is that study sponsors, particularly from industry, might not agree to release raw detailed information . Companies might risk losing their competitive advantage if the collected data is openly shared .
In the context of using open research data, drivers include the existence of policies that stimulate researchers to use available open research data  and whether researchers experience peer pressure . Another driver, not mentioned in the literature, is researchers’ need to use open research data for their job or a particular study, for example when a particular question can only be answered using available open data. This driver is particularly strong when it is difficult to obtain the data and when there is a strong need to answer a particular (research) question for which the available open research data is vital. The use of open research data is inhibited by the many differing policies on access and reuse across countries , which might confuse researchers and thus make them reluctant to use open research data. Moreover, possible ethical bottlenecks might hinder open data use .
Personal drivers and intrinsic motivations.
The third category of factors affecting both open research data sharing and use concerns personal drivers and intrinsic motivations. Fecher, Friesike  refer to five character traits influencing researchers to openly share their data: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism. Higher or lower levels of these character traits in individual researchers can either drive or inhibit them to openly share their data. Scholars also refer to personal drivers  and a positive attitude toward data sharing  as important individual drivers for openly sharing research data. Besides character traits, other drivers for openly sharing research data relate to either individual incentives  (e.g. wanting to learn about yourself , perceived behavioral autonomy  and self-efficacy to be able to share data ) or societal incentives (e.g. better informing society and fostering new learning processes ). Equal access to publicly funded data can likewise be considered a driver in itself , as it offers individuals the opportunity to better understand our social-physical world  and provides decision-makers with the vital facts needed to address complex and often transnational challenges .
Researchers might be driven to openly share their data by strong beliefs. They might be convinced that all data generated with public money should be made public , especially when this data has reuse value for many years . Researchers might be personally committed to open data and to responding to requests from data users . They might also have a strong sense of responsibility for both the dissemination and recognition of research results . Research data should be accessible to multiple disciplines and to researchers from different disciplines . This is expected to encourage the validation and verification of research results [2, 56] and to enable falsification . Open research data can help identify errors and discourage research fraud [8, 9]. The public can scrutinize the data in depth by analyzing, processing and combining it. Opening up research data also encourages multiple perspectives [8, 42] and allows other researchers to explore new data interpretations [17, 56], ask new questions , pursue new lines of research , and test different hypotheses . Thus, valuable resources can contribute far beyond their original analysis . Opening up research data is beneficial not only for researchers but also for society as a whole, as it provides a democratic scientific knowledge-sharing platform: “Open access increases the pool of information available to anyone not just scientists” [4, p. 466]. A lack of concerns about ethics and the commercial potential of data also contributes to more data sharing .
Opening up research data can be driven by the intrinsic motivation to facilitate comparisons between methods and sites , increase knowledge in the field at hand , move the field forward more quickly and easily , encourage economic development and spur innovation , identify synergies , accelerate scientific progress [11, 17, 55, 57], contribute to the advancement of research [18, 42, 52], gain new insights for data-driven research , and enable citizen science and encourage public activism . The usability of the research data , the size of the research community , and the extent to which data is viewed as a vital asset  also affect research data sharing levels. Other factors include the improved discoverability of research [9, 17]; extending research from prior results ; a focus on best work via data availability ; the generation of new datasets, information, and knowledge when data from various sources is combined ; educating researchers about the consumer side of open data practices ; and providing the opportunity to review works derived from the dataset . Drivers that are not mentioned in the literature but that may play a role include enthusiasm, curiosity, joy, and moral obligation. Many drivers for openly sharing research data have been mentioned in the studied literature, while only a few inhibitors have been mentioned: the fear that the data will only be reused by a few , laziness , a negative attitude towards data sharing , and the commercialization of research findings . If research findings are openly shared, the possibility of commercializing them becomes more limited.
Moreover, personal drivers for using open research data were identified in the literature. Researchers can be motivated to use open research data because of their beliefs and attitudes . For instance, they may believe that exploring data is fun  or that data reuse is good , be individually willing to reuse data , believe that open data use reinforces open scientific inquiry , or value its encouragement of analysis and opinion  or its promotion of new research . Open data use is also driven by the belief that it might stimulate economic growth and the replication and validation of research , as it might enhance the transparency and reproducibility of the scientific enterprise . Using open research data might be influenced by researchers’ feeling of worth, namely the feeling that the time spent on data reuse is well spent . Through open data use, research results may be replicated , which can advance researchers’ understanding in specific domains, such as health and disease , or in general. Other personal drivers for using open research data are that it accelerates research , allows the exploration of new interpretations of data , and increases knowledge in the field , as well as a strong intention to reuse data [17, 43] and the belief that data use enhances public trust in and knowledge of the discipline at hand .
In the studied literature, only a few inhibitors for using open research data are mentioned. Curty, Crowston  reflected on the influence of researchers’ beliefs and attitudes on whether they will use open research data or not. Joo, Kim  also refer to attitudes, along with researchers’ perceived concerns. Finally, Yoon  refers to a negative first impression that might inhibit researchers from using openly shared research data.
Facilitating conditions can drive researchers to both openly share their data and use open data shared by others. Conversely, a lack of facilitating conditions can inhibit open research data sharing and use. Facilitating conditions for open data sharing mentioned in the analyzed studies concern the availability of infrastructure [17, 57], more specifically an appropriately designed (technological) infrastructure , appropriate information systems  and better ICT facilitation (e.g. Internet hosts per person; percentage of computers per household; continued growth of chip, storage, and network technology capacity) . Wallis, Rolando  describe how researchers working in the hard sciences, which have richer investments of funding, labor, scale, and infrastructure, are more motivated to openly share their data than those working in sciences where such investments are uncommon. A lack of appropriate infrastructure, in turn, inhibits openly sharing research data . Moreover, open data infrastructures need to be sustainable, flexible and robust in the long term, as researchers are less likely to openly share their data if it is unclear whether the infrastructure enables long-term access to their data. Flexibility allows for adaptation to the latest technological and other developments in society. This need for sustainable, flexible, and robust infrastructure is a driver for openly sharing research data that we did not find in the studies selected for our literature review.
Another driver for openly sharing research data in the category of facilitating conditions concerns the availability of large data repositories [13, 17, 41, 42, 47] and archives  in which researchers can store data. One could even consider this a critical factor, since without these storage facilities, the data cannot be opened up. Storage and access capabilities should also be able to grow while still operating reliably and efficiently , as datasets in some domains can be extremely large. Other drivers for openly sharing research data include continued and dedicated budgetary planning and appropriate financial support , a short embargo period , and consent, such as informed consent or contractual consent , for opening the data. While these types of support relate to facilitating conditions, other types of support relate more to effort (see Section ‘Effort’). With regard to funding, da Costa and Leite  argue that “adequate funding for the treatment and availability of data can generate savings in resources in future research funding” (p. 920). Moreover, when funding specifically for the management of research data is available, this might motivate researchers to openly share their research data .
Inhibitors for openly sharing research data are often found in the area of financial arrangements and budgets , and financial resources [11, 41]. For example, the loss of potential licensing revenue that would accrue to inventors of patentable discoveries has been considered a financial barrier . Inhibitors also include technical challenges [17, 50], such as the limited openness of ICT tools that help in opening the data . They may also be organizational, such as when institutional members resist change , when there are structural conflicts and managerial practices in the organization (e.g. security reasons, financial interest)  or when there is not enough time , for example to organize the data . Other inhibitors for openly sharing research data include the lack of a data repository , the lack of facilitating platforms , the lack of information systems to disclose research data in certain research disciplines (e.g. medicine) , difficulties with the communication of the open data results , the lack of tools to observe data metrics , a long embargo period , the perceived short reuse value  and science that can be considered ‘small’ (science that has less investment in funding, labor, scale, and infrastructure) .
Specifically in the context of Kenyan and South African chemistry laboratories, Bezuidenhout  refers to factors that inhibit research data sharing by researchers in low-resourced research settings. First, such researchers experience a lack of available resources, equipment and infrastructure, which slows down the pace of research and makes it even more important to share research data openly only once the related publication is out . For instance, research data sharing is limited in this context because of a lack of power, older equipment, poor maintenance, a lack of technical support, a lack of ICTs, a lack of platforms, and a lack of appropriate software for openly sharing research data .
With regard to facilitating conditions related to the use of openly available research data, we identified various drivers. First, several drivers are related to technical aspects such as digital tools . The potential to involve more actors in data collection through citizen science platforms, unrestricted by physical or cognitive distance, has facilitated more data collection from various sources . Other technical drivers for open data use concern the availability of an open data infrastructure , particularly a robust infrastructure for long-term usage , along with the availability of data repositories [17, 43, 44]. An initial large data repository can foster a culture of both data sharing and reuse . Technical support, for example through specialized software or programs , might also ease the process of open data use. A final technical driver is the possibility to cite and attribute datasets, fostering a scholarly communication system that allows for the identification, retrieval, and attribution of research data . Drivers for using open research data in relation to facilitating conditions are also organizational. These include the organizational environment  and institutional support [17, 43], such as any assistance, particularly technical or human help, that researchers can acquire from their affiliated institutions or organizations . Kim and Yoon  also mention human resources for questions, referring to advisors, data reuser groups, and data producers as human sources of support.
Inhibitors for open research data use in relation to facilitating conditions mainly concern technical bottlenecks  and the functionality of the infrastructures and portals. Examples of the latter are the lack of the necessary infrastructure support for quick data analysis , the lack of approaches that offer both precision and recall when it comes to locating data for reuse , the lack of interaction support and tools, the limited availability of search options for open datasets, the lack of support for searching for data in multiple languages, the lack of support for data analysis functions, and the limited availability of functionalities related to interaction with other open data users or data providers . The lack of availability of the data itself , heavy reliance on the methods and techniques data producers employed to obtain, organize, and code the data , along with doubts about the long-term availability of the infrastructure  are other inhibitors for using open research data.
Trust can be a very influential driver of, and inhibitor for, open research data sharing [17, 52]. In the literature review, we identified several aspects of trust that drive openly sharing research data, namely the trust of peers and society in general in the research findings, open data users’ trust of individual researchers, researchers’ trust in their own research findings, and individual researchers’ trust in the open data portal and the long-term preservation of their data. First, researchers may openly share their data to make them transparent and to show others, including other researchers and society at large, that they can trust the research findings, as this might lead to greater credibility of the research findings . Transparency of study results , research methods and processes  can enhance the trustworthiness of the research results and drive open data sharing. It can also increase the reproducibility of the research results . It has also been found that data availability provides safeguards against misconduct related to data fabrication and falsification , since it makes it easier to interpret the data . Second, if researchers better understand what users may or may not do with data in online data repositories, their drive to open up their data may be enhanced . Researchers often want to have a say in data use  and want to have the ability to place conditions on data access , such as data security conditions . Such conditions lower the likelihood of misconduct with the data and enhance a researcher’s trust in the user of the data. Furthermore, the lower the privacy risks, the lower the risk of trust issues . Third, researchers might trust their own study’s conclusions more when multiple users reach the same conclusions using the same data. Thus, validation of the data by multiple users can be considered another driver for openly sharing research data .
Fourth, another factor that might drive researchers to openly share their data concerns individual researchers’ trust in the open data portal and, particularly, in the data’s long-term preservation. Researchers publish their data on a certain open data portal expecting that the data will be available in the long term and that potential users will be able to easily access it. According to Tenopir, Allard , well-managed, long-term preservation helps retain data integrity. Openly sharing research data can then be considered good management of data integrity over time .
Trust-related inhibitors for openly sharing research data include the fear of losing control over the data; the fear of possible unethical data use (including data misinterpretation and misuse), of the data’s commercialization, and of harm to the researcher; the level of trust in other researchers’ data; the level of knowledge about the data user; and losing a valuable resource that could have been used to obtain other data. First, the loss of control , such as the lack of control over the scientific findings and conclusions derived from the original data that a researcher shared, inhibits open data sharing . Once research data has been published online, the data can be copied, changed, and published elsewhere in various forms. Second, there might be issues regarding the ethically responsible use of shared data  and possible data integrity concerns . Someone might draw the wrong conclusions , for instance as the result of flawed interpretations of the data [11, 52, 55] or even misinterpretation and misuse of the data [19, 41, 49, 52, 55], and data misuse incidents may take place . Researchers might also fear commercial or competitive misuse of the data , which could harm the data publisher’s reputation [11, 19]. Third, the difficulty of establishing trust in others' data inhibits openly sharing research data . If a researcher has little trust in others’ data, the researcher might assume that others would also have little trust in his or her data if it were openly shared, which demotivates the researcher from sharing it. Fourth, the level of knowledge about the data user  has been found to influence the trust a researcher has in the ethical use of his or her data when it is shared openly online. If the intent of the data user is unclear, this can inhibit data sharing . The more the researcher knows about the user of his or her data, the more he or she may trust this person and the use of the data.
Fifth, by openly publishing their data, researchers might fear losing a valuable resource that could have been used to obtain other data. Wallis, Rolando refer to the “gift culture of scholarship”, meaning that researchers sometimes exchange valuable data through trusted relationships with other individual researchers. Consequently, if researchers no longer have data to offer to other individuals, they might not obtain valuable data from them in return. Sixth, a lack of trust in the data portal may inhibit open research data sharing, for instance because supplementary information and laboratory sites are transient. Finally, one factor was not mentioned in the literature but is expected to be an important inhibitor for openly sharing research data: researchers’ lack of trust in their own research findings.
Trust is important not only in the context of sharing research data, but also in the context of using it. Higher levels of trust are linked with increased use of open research data. In the literature, seven trust-related drivers for using open research data were identified. The first is the data user’s wish to improve data integrity [2, 40]: open research data might be used to investigate whether research is both reproducible and trustworthy. A second trust-related driver for using open research data is the trust that a data user has in the data’s producer. Researchers might be more motivated to use a certain open dataset if they trust the dataset’s producer or provider. Trust in the dataset’s producer may increase when this person is honest and transparent, has received appropriate training, and is a member of a trusted community. The reputation of the researcher who originally collected the data is thus important. Although this was not mentioned in the literature, we expect that trust in the data producer also increases when the potential data user knows the researcher who collected the data or the organization that provided it. This factor is related to the “social influence and affiliation” category. Third, open research data use is influenced by the sources that funded the study. If the study’s funder has no commercial interests and no apparent conflict of interest, researchers are more willing to use the open research data. A fourth trust-related driver for using open research data is the availability of credible information about the study, for instance when the metadata and related documentation explain the data collection procedures. This factor is related to the “data characteristics” category. Fifth, open data use might be driven by a data user’s trust in the researchers’ measurements and thus in the data itself.
Data quality, data validity, attribution, and the soundness of contextual information have become critical factors influencing researchers’ motivations to use open research data. A positive first impression of the dataset is also important when a researcher decides whether to use an openly available dataset. This factor is strongly related to the aforementioned “data characteristics” category. Finally, existing evaluations of the data increase the likelihood that a researcher will use open research data. For example, when many articles have been published using the same dataset, or when a dataset has been reused and cited often, trust in the data increases.
The use of open research data is inhibited by trust-related concerns [19, 43, 44, 46], such as the aforementioned concerns about possible data misinterpretation and unintentional misuse [17, 40, 43]: data users might unintentionally make mistakes in interpreting and using the data, and open data can be reused for unintended or unexpected purposes. Inhibitors for using open research data that were not explicitly mentioned in the studied literature are a lack of trust in the producer and provider of the data, in the methods used to collect the data, and in the data itself. These new factors were added to the factor overview.
Expected performance.
Many drivers for openly sharing research data relate to researchers’ expected performance, since by opening up their data they expect to perform better [11, 48]. The performance-related drivers found are as follows. First, researchers are driven to openly share their data by collaboration and networking opportunities. For example, openly sharing data creates opportunities to participate in new international projects, widens local scientists’ networks, and allows networking with other scientists for interdisciplinary studies. Data sharing also enhances the potential for collaboration among scholars with similar research interests [41, 48]. Second, opportunities to obtain research data through professional exchanges can drive researchers to openly share their data. Third, openly sharing data might increase scientific efficiency, since it is an effective way to both archive and preserve data. Fourth, openly sharing research data can enhance the capacity to solve specific problems. For example, through interactions with other actors, research agendas could be better guided towards solving problems that affect a specific group, and cheaper solutions to societal problems might be found. Furthermore, by opening up their data, researchers can help make local problems more visible and better communicated, and other people can offer input to develop final solutions. Fifth, researchers might be driven to openly share their data when appropriate reward structures are in place [13, 50], and especially when they are recognized for doing so [11, 47]. This recognition can be both institutional and professional. Sixth, openly sharing research data can increase the visibility of both researchers and their research. Formal citation and receiving proper data citation credit can be considered one form of recognition.
Another form is acknowledgement of the dataset’s originator (e.g. co-authorship on publications, formal acknowledgement of the data providers, or the opportunity to collaborate with others). Recognition can also take the form of citations and the visibility of research, researchers, and research institutions, such as systematic visibility of the data source, increased visibility and relevance of research output [17, 47], increased visibility of researchers in the community [10, 48], increased visibility of the institution in which the research was carried out, and increased citation rates of datasets and publications [8, 9, 40, 41, 48]. Thus, openly sharing research data is a robust way to demonstrate the value of a researcher’s own accomplishments. Seventh, data may also be shared openly because of the perceived career benefits of doing so. This factor is strongly related to the aforementioned reward structures and other forms of recognition. Openly sharing research data can be considered one aspect of professionalism, namely building upon the codes of conduct and ethics of the scientific community. A specific example of a career benefit that drives researchers to openly share their data is the opportunity to publish the research results in internationally prestigious journals. This factor is also related to the category ‘requirements and formal obligations’. Eighth, openly sharing research data can lead to improvements in data scrutiny, comprehensive analyses, hypothesis testing, and data quality. When comparable datasets are widely available, comprehensive analyses become possible. These comparisons may improve the understandability and quality of the data, since multiple researchers can then work with and scrutinize the data. Both review and quality improvements, along with additional evaluation capability, are drivers for openly sharing research data.
For example, other researchers might test the data and hypotheses, allowing them to confirm the findings of the original publication or to test different hypotheses. Ninth, data might be shared openly because researchers want to promote technology as a basis for others’ research. Tenth, openly sharing research data could result in greater returns on public investment in research. For instance, wealth might be generated through proactive downstream commercialization of outputs. Finally, research data may be shared to improve decision-making on a particular topic: researchers can provide evidence to support an analytic framework and related decisions.
With regard to performance, researchers might feel inhibited from openly sharing their data for the following reasons. First, they might fear losing control over unpublished data in publicly accessible online databases or over their research products. They might be concerned about losing an advantage in their research area. Second, researchers might fear receiving no credit or reward for data sharing [13, 50, 52, 55, 59]. Someone else might publish using their data without any reward in return, since there is no system of acknowledgement. As stated by Mooney and Newton, references to the names of data creators and publishers are scarce or not prominently featured (mostly, there are only references to the dataset title). Data is often not cited properly, and, compounding this, citations of research data are insufficiently recognized and valued. Thus, researchers are not compensated for the effort required of them. Current incentive and merit systems, which lack sufficient rewards for researchers, inhibit open research data sharing. Third, researchers might not openly share their data because they fear being deluged with requests for assistance. Fourth, researchers might be inhibited from openly sharing their data because they fear losing their competitive advantage. Openly sharing research data can also entail a perceived career risk, related to losing funding opportunities, potentially profitable intellectual property, and commercialization opportunities, and to missing out on future publishing opportunities [8, 13, 41]. The latter especially concerns the fear of being scooped on additional analyses that researchers have planned for the future [9, 48]. Other concerns involve protecting researchers’ right to publish their results first.
This inhibiting factor is strengthened by the fact that most academic incentive systems favor publishing articles over publishing data [47, 57]. Researchers prefer to publish their results before openly sharing their data. Furthermore, researchers might fear losing trade-in-kind offers of information with other labs. Researchers might lose the ability to barter data privately, which creates a disincentive for openly sharing research data. Additionally, researchers might be afraid of criticism of their data or analyses. Investigators might worry that other researchers will find errors in their results [9, 48], which might harm their reputation. Once research data is openly shared, the original conclusions might be challenged by a re-analysis, whether due to errors in the original study, misunderstanding or misinterpretation of the data, or simply more refined analysis methods. This relates to the fear of having to spend time reviewing and possibly rebutting future re-analyses. Finally, openly sharing research data might be inhibited when researchers believe that their data has limited value to others. In the context of research into data sharing in developing countries, it has also been stated that researchers might not openly share their data because they are concerned that, if released, it would not be reused by their international peers. The underlying fear is that the equipment used to produce the data is not as advanced as that of researchers in developed countries.
Various performance-related factors that affect the use of open research data were also identified. Drivers for open data use include perceived usefulness, the ability to gain new insights and push science forward, collaboration across diverse groups, the exploration of topics not envisioned by the initial investigators, the testing of new or alternative hypotheses and methods of analysis, new data combinations, and a shorter research process. First, researchers’ opinions about whether a particular dataset can be useful for their purposes may drive them to use it [17, 43, 44, 46]. Perceived usefulness might be influenced by the second driver, namely the ability to arrive at new findings and obtain new insights. With open research data, researchers become more aware of the state of the art and of the need for certain data and facilities, rather than ‘reinventing the wheel’. Reproducing key research findings and experimental methods could push science forward and enables the application of old data in new contexts. Third, when a researcher finds out that another researcher has openly shared data on a topic of interest to both of them, they might start collaborating on the use of the shared data. Thus, open data use allows proactive collaboration across diverse groups, especially when resources are limited, and offers more opportunities for co-authorship, through which peers can give each other recognition for their efforts. Fourth, using open research data enables the exploration of topics not envisioned by the initial investigators. Fifth, using open research data makes it possible to test new or alternative hypotheses and methods of analysis, particularly when data are combined with other publicly available datasets. Open data use thus permits the creation of new datasets when data from multiple sources are combined, which can lead to novel combinations of data and new scientific discoveries.
These demonstrate the use value of data. Finally, researchers are driven to use open research data to shorten the research process. This is especially important when researchers are limited in both time and resources.
Inhibitors for using open research data include existing restrictions on data use, which prevent the data from being used as desired. The data might also be perceived as not useful, with the risk that effort might be wasted on flawed data and time might thus be lost. As another performance-related factor, researchers might be inhibited from using open research data because of negative reactions to data reuse. Moreover, it can be difficult to access the information needed to cite the dataset and attribute the data producers. Finally, the quality of reused data depends on the context of the study in which the data was created. If data has been managed inappropriately or mistakes have been made, this reduces researchers’ motivation to use open research data. Likewise, carelessness on the part of the original investigators in managing the data and possible misinterpretation risks due to inappropriate data use might inhibit open research data use.
Social influence and affiliation.
The analyzed studies also refer to social influence and affiliation as drivers and inhibitors for both sharing and using open research data. Drivers for sharing open research data mainly reflect social responsiveness; perceived normative pressure; standard social norms and subjective norms; pressure by journals; peer pressure; attitudes about data sharing; worldwide attention to the need to share and preserve data; and the codes of conduct and related normative standards of professional scientists and their communities. Arza and Fressoli have stated that social responsiveness can drive researchers to share their research data openly. Kim and Adler, and Harper and Kim, have referred to perceived normative pressure and to standard social and subjective norms, respectively. Normative pressure can relate to pressure by journals, as mentioned in the section “Requirements and Formal Obligations”. Zenk-Möltgen, Akdeniz refer to the perceived social pressure to share data with others. Social influence, such as peer pressure, can be a driver for sharing research data. Such influence can also work in the opposite direction, for example when the norm is not to share data openly or when a supervisor or colleagues simply tell you not to share your research data openly. Other influencing factors concern attitudes about data sharing [17, 42] and the growing worldwide attention to the need to both share and preserve data. Finally, there are the codes of conduct and related normative standards of professional scientists and their communities.
For the “social influence” category, the only inhibitor for openly sharing research data mentioned in the literature is the lack of an open sharing culture. Sayogo and Pardo have stated that, culturally, academic promotion is tied to publications and gives little weight to sharing research data, which results in researchers prioritizing the publication of articles over data. Other possible social inhibitors for sharing open research data may mirror the identified drivers. For example, researchers might perceive normative pressure from their organization or colleagues not to openly share their data, as they may need to prioritize other tasks, such as teaching. Other inhibitors not identified in the literature but considered important include standard social norms and subjective norms against openly sharing data, as well as negative attitudes towards data sharing.
In the “social influence” category, the literature refers to constructs influencing whether researchers use open research data that are similar to those influencing their open data sharing behavior. For instance, Curty, Crowston state that the factors driving researchers to use open research data include social pressure, perceptions of close colleagues, positive reactions to data reuse, and norms. For example, colleagues might recommend that researchers use certain data, which can increase their motivation to do so. Having an emotional or interpersonal relationship with the original investigator was also identified as a driver for researchers to use open research data. Finally, Joo, Kim refer to the driver of “social norms” (i.e. a researcher’s perception that other researchers think positively about data reuse practices).
The examined literature mentions one social influence-related inhibitor for using open research data, namely low social influence, for example from colleagues. We hypothesize that other social influence-related factors might also inhibit open research data use, such as social pressure and perceptions of research supervisors not to use open research data, the perception or perceived norm that other researchers are not using open research data, negative reactions to data reuse, and a researcher’s belief that other researchers think negatively about data reuse practices. These inhibitors need to be examined further in future research.
Effort.
In the context of open research data, perceived effort is believed to influence researchers’ intentions to openly share their data and to use data that others have openly shared. This study’s analysis of effort-related factors has shown that researchers are driven to openly share their data because this prevents the duplication of work [2, 41, 48, 57]. The shared data can serve as a source for researchers to consult when considering how to build upon existing studies, so that data sharing can accelerate scientific progress. Because data does not have to be collected again, openly sharing data also reduces research costs [17, 41, 42] and saves the time involved in the data collection process [41, 48]. Ultimately, this means a more efficient and optimized use of resources [1, 8, 9, 48, 56]. Researchers are particularly driven to openly share their data when they expect that it will be reused, leading to increased data use. Moreover, organizational support for data management has been found to both reduce effort and drive data sharing. Research data sharing can be stimulated when tailored data management approaches and institutional models that meet researchers’ needs are used. Previous research has found that when data is cleaned, processed, refined, and analyzed during the research rather than afterwards, researchers are more willing to openly share their data. The fact that anyone can access the data and contribute to it may improve the quality of the research. It has also been stated that researchers conducting quantitative analytic work are more motivated to openly share their data than those conducting qualitative work, as preparing qualitative research data for sharing requires more effort. Overall, the use of software, equipment, and data repositories can reduce the effort researchers need to invest in openly sharing their data.
Other effort-related drivers for openly sharing research data include assistance with data management across the data lifecycle, technical support, being able to identify the web Application Programming Interface (API) for dataset access, adapting the query-result parser to distinguish between invalid UIDs, datasets that have been released, and datasets that remain private, and the option of openly sharing parts of a dataset rather than the whole dataset. Finally, previous research has found that researchers who were not involved in the data collection themselves (e.g. when another researcher or an external institution took care of this) were more motivated to openly share the data.
The effort, or perceived effort, of openly sharing research data has been considered an important inhibitor [11, 41, 42, 47, 49]. Sometimes this effort is manual and may require a large amount of work. Several effort-related inhibitors for openly sharing research data relate to the individual investment required to preserve and manage data, including time investment (i.e. the amount of time researchers would have to invest to get the data ready to share) [8, 10, 11, 49]. To enable open data sharing, researchers might need to structure the dataset according to a particular standard [47, 55], describe the data more thoroughly than required for the original research, or properly document the data so that it becomes reusable for other researchers. Making data from the long tail discoverable and reusable is emerging as a major challenge. The effort needed to format, document, and release the data inhibits research data sharing [8, 9], and this effort appears to be higher for qualitative analytic work than for quantitative analytic work. Effort can also be technology-related. For instance, researchers may be reluctant to use online databases because complex user interfaces make data entry time-consuming. Opening up research data can be complicated and thus hinder data release. Other effort-related inhibitors for openly sharing research data include issues with the quality and credibility of open data platforms, the lack of acknowledgement of researchers’ effort, the experience that conveying information to the public is not always straightforward, and possible issues with authorship and with obtaining permission from all partners involved in larger collaborations.
Open research data use is driven by the fact that it may prevent the duplication of research data, as researchers can efficiently take advantage of more opportunities for data use without the burden of data collection or the repetition of effort. Likewise, researchers are more motivated to use open research data when they expect the required effort to be lower, and the ease of accessing open research data drives researchers to use it [10, 57]. Motivation also increases when data is easy to find, when the relevance of the data is clear, and when the data is easy to use. Moreover, researchers are more driven to use open research data when they can identify the web API for dataset access. Finally, when researchers experience issues with open data use, collaboration can help them overcome these issues.
Effort or perceived effort can inhibit open research data use [43, 44, 46, 48]. Sometimes the data is not accessible, which immediately blocks the possibility of using it. Sometimes the data might exist but cannot be found among hundreds of data repositories. Thus, it can be difficult to discover available and relevant data, and the available data and information may become overwhelming. Datasets might also be fragmented, since they are offered in many different places. Consequently, researchers may have difficulty locating reusable data [17, 48, 59]. Searching for data requires researchers to invest time [17, 48] and resources, without knowing in advance whether the time spent will be wasted or useful. Researchers might be inhibited from using open research data because of low ease of use, possibly caused by technology-related limitations, such as reluctance to use online databases with complex user interfaces. Once data has been found, it might be very difficult to analyze and interpret, since it is often separated from contextual information [19, 57], such as information about how the data were processed, or because appropriate metadata is lacking. Tools to use such data are often fragmented and poorly integrated. These factors also complicate the integration of multiple datasets. Finally, open research data use is inhibited by complex terminology heterogeneity (each discipline has its own terminology) and by a lack of tools provided with the data (e.g. visualization tools that data users must look for themselves).
Researchers’ experience and skills.
The identified literature shows that experience- and skill-related drivers for openly sharing research data include having access to data specialists, the possibility of data management consultation, researchers’ own mastery of data management skills, researchers’ knowledge of metadata and its practices, and researchers’ belief that open research data may be useful for training or educating students and new researchers [8, 9, 50, 56]. It was also found that a researcher’s experience with openly sharing research data and his or her satisfaction with previous data-sharing experience(s) might drive data sharing behavior [45, 48]. Success stories of other researchers openly sharing research data might also drive researchers to openly share their data, although this factor was not identified in the studies selected for the literature review.
In contrast, a lack of skills, knowledge, and expertise inhibits openly sharing research data [11, 51]. Underlying this might be a lack of data management skills and a lack of knowledge about metadata and its practices, although these were not explicitly mentioned in previous research. Other inhibitors that were not identified in the literature, but that we believe might inhibit openly sharing research data, concern a researcher’s lack of experience with openly sharing data, a researcher’s dissatisfaction with previous data-sharing experience(s), and the dissatisfaction of other researchers (e.g. colleagues) with openly sharing research data. Negative experiences might result in reluctance to openly share research data.
Open research data use is driven by two main experience- and skill-related factors. First, researchers who have positive past experiences with open data use might be more motivated to use open research data [40, 48, 58], as they might already be familiar with what data is available and find this data useful, have experience with collecting such data, and know how to handle data, all of which could save them time in finding and using data relevant to their own research. In particular, knowledge of specific (comparable) types of data and of other research areas and trends, as well as knowledge about who is working in which areas, can drive open data use. Second, a researcher’s education, his or her ability to understand open data, and formal training for researchers in finding, acquiring, and validating data collected by others can drive the use of open research data. Zimmerman refers specifically to the usefulness of knowledge gained through disciplinary training.
Experience- and skill-related inhibitors for using open research data can be divided into three main factors. First, open research data use might be inhibited by a lack of experience with open data use and a lack of familiarity with such data use. Second, researchers might be less motivated to use open research data when they lack the skills required to analyze datasets that can be quite complex [48, 54]. A third inhibitor in this category concerns the costs associated with training potential data users. Other factors that were not identified in the literature, but that might inhibit the use of open research data, include a lack of education, an inability to understand open data, and a researcher’s dissatisfaction with previous open data use. These inhibitors are closely related to the experience- and skill-related drivers for open data use, and often concern either the presence of a certain skill or positive experience (drivers) or its absence (inhibitors).
Legislation and regulation.
In the context of open data, legislation and regulation can either drive or inhibit researchers’ open data sharing and use behavior. Legislation- and regulation-related drivers for openly sharing research data include a clear and transparent data policy, a data sharing policy, journal policy [11, 42], and/or formal organizational policy. It is especially useful when policies concerning data management cover the whole data lifecycle. Other drivers include support from national and local governments in terms of policies, programs, and management practices; national laws and international agreements that stimulate data sharing; regulatory pressure; and legal and policy requirements that concern, for example, the significance of citation, legal agreements, statements of use, conditions of use, and approval for reuse.
With regard to legislation and regulation, openly sharing research data may be inhibited by legal rights and restrictions [2, 19, 49] and by other legal issues. Data sources might be copyrighted, so that data subsets cannot be freely shared [8, 11]. Another issue related to licensing terms is that researchers must choose from a large variety of licenses, which can be confusing. Researchers might consider licenses a burden, might be concerned about overly restrictive licenses, or might have difficulty understanding licenses. The law prohibits the publication of certain types of data. Researchers might also not be allowed by law to openly share their data because of intellectual property rights issues [13, 17, 55] and restrictions on use arising from private intellectual property rights. They might also fear violating property rights and have other concerns, such as legal liability for the data or its release, including intellectual property or patent issues. For some data, there might also be priority rights for publication. Furthermore, ownership [11, 50, 59], the right of use, confidentiality [10, 11, 42, 55], and contracts with industry sponsors are influential inhibitors for data sharing. Data might also be sensitive [17, 19] or contain personal information, which leads to privacy-related concerns [11, 17–19, 41, 42], particularly because sharing privacy-sensitive data is prohibited by law. Data can be anonymized, but anonymization techniques cannot guarantee that individuals will not be identified using certain re-identification techniques. Moreover, privacy and the protection of trade secrets can also be valid reasons for not openly sharing research data. Another inhibitor concerns different levels of security: public access may negatively affect national security [1, 50].
In addition, datasets are sometimes created by multiple organizations that have different levels of security, different policies, and different laws to adhere to, so all parties need to give permission before the data can be disclosed. Finally, informed consent agreements might not clearly cover subsequent uses of the data, and de-identification can be complex; both also inhibit the open sharing of research data.
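To illustrate why anonymization cannot guarantee that individuals remain unidentifiable, the following minimal Python sketch computes k-anonymity, a widely used privacy measure. It is our own illustrative choice, not a method taken from the reviewed studies, and the records and the choice of quasi-identifiers (postcode, birth year, sex) are hypothetical. Even though names have been removed, any record whose combination of quasi-identifiers is unique (k = 1) could potentially be linked to an external source containing the same attributes.

```python
from collections import Counter

# Hypothetical "anonymized" records: names have been removed, but the
# quasi-identifiers postcode, birth year and sex are still present.
records = [
    {"postcode": "2628", "birth_year": 1980, "sex": "F", "diagnosis": "A"},
    {"postcode": "2628", "birth_year": 1980, "sex": "F", "diagnosis": "B"},
    {"postcode": "2628", "birth_year": 1975, "sex": "M", "diagnosis": "A"},
    {"postcode": "1012", "birth_year": 1990, "sex": "F", "diagnosis": "C"},
]

# Assumed quasi-identifiers: attributes an attacker might find elsewhere.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def qi_key(row, keys):
    """Tuple of the row's quasi-identifier values."""
    return tuple(row[k] for k in keys)


def equivalence_classes(rows, keys):
    """Number of rows sharing each combination of quasi-identifier values."""
    return Counter(qi_key(row, keys) for row in rows)


def k_anonymity(rows, keys):
    """Size of the smallest equivalence class; k == 1 means a unique record."""
    return min(equivalence_classes(rows, keys).values())


def unique_records(rows, keys):
    """Rows whose quasi-identifier combination appears exactly once."""
    classes = equivalence_classes(rows, keys)
    return [row for row in rows if classes[qi_key(row, keys)] == 1]


print("k =", k_anonymity(records, QUASI_IDENTIFIERS))
print("unique records:", len(unique_records(records, QUASI_IDENTIFIERS)))
```

Coarsening or suppressing quasi-identifiers can raise k, but a higher k still does not rule out every disclosure risk (for example, when all records in a class share the same sensitive value), which is consistent with the concern described above.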
In the “legislation and regulation” category, no drivers for using open research data were identified: no regulation or legislation forces researchers to use open research data. At the same time, there are various legislation- and regulation-related inhibitors of open research data use, also referred to as “legal bottlenecks”. These include the sensitivity of the data, concerns about violating privacy when using such data [19, 50], legal restrictions related to national security and trade secrets that could further complicate data use, challenges related to data ownership, and unclear conditions for data use, such as confusion about what is and is not allowed under a specific license.
The last category, data characteristics, concerns the nature of the research data itself. Given the variety of methodologies, theories and research approaches used in different disciplines, data is clearly diverse in domain, volume and type, and may consequently be more or less difficult to use. Accordingly, the analyzed studies suggest that data characteristics might be linked to researchers’ willingness to share and reuse data.
With regard to data-related drivers, many factors make it more likely that researchers will openly share their data, including effective data quality controls, good management practices, the use of dataset identifiers such as DOIs, appropriate data documentation and metadata, and adherence to metadata standards and formatting standards. Furthermore, the chance that research data is shared increases when the data is in an easily digestible and appropriate form [52, 53] and format, when it is interoperable and complies with international agreements on interoperability [11, 50], and when it does not involve human subjects, such as medical research patients. Openly sharing data is also more likely when the data is sufficiently secure and when tools and applications for its use exist. Cragin et al. found that researchers are more likely to share data resulting from quantitative research than data from qualitative research. This might be because qualitative data is more likely to contain privacy-sensitive information, and removing sensitive information from qualitative data requires more effort than removing it from quantitative data. Finally, scholars have stated that the more data is produced and stored, the more data is shared.
Various data-related inhibitors for openly sharing research data are interdependent with the drivers, since they are often two sides of the same coin. For example, while the use of data standards drives the open sharing of research data [e.g., 11], the lack of data standards inhibits it [10, 55]. Issues with data standards and protection also inhibit research data sharing. Similarly, while quantitative data collection increases the likelihood that researchers openly share their data, qualitative data might be considered an inhibitor. Other inhibitors include inconsistent metadata, biased data, and problems related to data mobility (i.e., data that is difficult to move to other facilities). There might also be quality issues [10, 11, 19, 49] and issues related to local context and specificity, such as the specificity of purpose, events and/or methodology, and the duration of research. Moreover, data might be too sensitive to share openly, for example when privacy issues arise, or the data format and form may not be appropriate for data use. The data may be too large to share as a dataset, or its size may make sharing more difficult [48, 55].
Many of the aforementioned drivers and inhibitors also play a role in the decision whether to use open research data. The analyzed studies found that the use of open research data is driven by appropriate data documentation, in particular comprehensive documentation of datasets and of how to access them, and documentation of the methodology and measurements used to collect the data. Metadata (data about the data) also plays an important role in driving researchers to use open research data. Researchers are more likely to use open research data when datasets are accompanied by sufficient metadata [54, 56], by accurate and relevant metadata attributes [13, 54], and by consistent metadata. Other drivers for open data use concern the data’s interoperability [50, 54], its standardization, the exchange of data via a standardized communication protocol, and the availability of technical and software standards that can be used to analyze the data. An example of standardization in the context of open data use is the use of digital identifiers, which give each dataset a unique identifier so that it can be more easily found and cited. Finally, researchers are more driven to use open research data when the data is of good quality, trustworthy and free of errors, and in general when it meets the standards of scientific research concerning objectivity and representativeness.
Data-related inhibitors for open data use concern issues with data quality [19, 50, 55, 58], such as missing variables and errors and flaws in the data. This relates to data users’ trust that the open data are what they purport to be, which is also affected by changes to the data over time. When a researcher is unable to determine data quality, this hinders or even blocks the use of the data. Difficulties in determining quality might be caused by poor data documentation [48, 58], data heterogeneity, inconsistency between datasets, inconsistent or missing metadata, and the inability to discern a dataset’s content and hence its suitability for analysis (e.g., because of a lack of metadata). Researchers might also experience a lack of references to other qualified metadata systems. Likewise, open research data use might be inhibited by a lack of interoperability [2, 54]. For instance, researchers are less likely to use open research data when the data is not machine-readable, when it is not provided according to standards [48, 55] or with standardized and well-known protocols or ontologies, or simply when opening the data requires proprietary software. Research data is available in varying formats, and the lack of harmonization of data formats, processing, analyses and data transfers inhibits open data use. Other inhibitors include the nature of the data itself (i.e., some data is more easily reused than other data), the multiplicity of data types, the lack of a clear data usage license, the large volume and size of the data, the lack of awareness of existing standards for data citation, and fees required to access the data.
The previous section examined researchers’ drivers and inhibitors for openly sharing and using research data, as derived from the selected studies, and detailed the identified factors for each of the eleven factor categories. This section discusses the findings and their implications.
Open research data theory development
The results section shows that of the 32 selected studies, fourteen mention theories. Few theories have been used or applied, and even fewer have been extended or developed. This finding is in line with Kim and Adler, who reported a similar finding specifically for studies on open data sharing. This section discusses the potential for theory development in research on open research data sharing and use.
There are several possible explanations for the limited use, application, development and testing of theories in open research data research. First, researchers interested in research data might not be aware of existing theories that could be used for open data research. This might be related to the fact that no dedicated open research data theory exists. Open research data is multifaceted, as explained in previous sections, which indicates that different theories with different foci are required. Related disciplines, such as public administration, information systems, and psychology, provide many theories containing constructs that are similar or related to the categories and factors derived from our thematic analysis (Table 8). Such theories can serve as a basis for building, extending or further developing an open research data theory. For example, “New Institutional Theory” [61, 62] refers to regulative pressures, and “Cognitive Evaluation Theory” [63, 64] refers to intrinsic motivations. Elements of various existing theories might be combined to create a more comprehensive theory that can be used to better understand, explain and address challenges related to open research data sharing and use.
Another possible explanation for the limited mention, use, application and development of theory in the reviewed studies is that open data researchers might have found existing theories unsuitable for examining open research data sharing and use. None of the theories listed in Table 8 is readily suited to addressing the challenges surrounding open research data sharing and use. This calls for the development of a new theory, for which the categories and factors derived from our thematic analysis can serve as a basis. Such a theory should build on existing theories by integrating, testing and complementing them.
Potential application of categories and factors
This study’s overview of categories and factors can be used in future research on drivers and inhibitors for open research data sharing and use. The overview can also provide insights and guidance to other stakeholders at the institutional level and to national funders developing open science policies. This potential is discussed in the following subsections.
Potential for related research fields.
This study conducted a thorough, comprehensive systematic literature review that collected metadata and facts from 32 prior open research data studies. Based on the review results, an overview of categories and factors influencing open research data adoption was developed to help researchers in related fields understand various factors, including individual considerations such as trust and perceived effort, a researcher’s context, and many other motivational factors, such as disciplinary practices and expectations. The literature review shows that the overview of categories and factors provides a more holistic explanation of why researchers are driven or inhibited to share and use open research data than existing research has provided so far. In the future, the overview can be used to further examine researchers’ drivers and inhibitors for sharing and using open data in different research disciplines and contexts, such as disciplines with low versus high rates of data sharing and reuse. With the factor overview as a starting point, researchers can investigate under which conditions different types of researchers (from different research disciplines, working in different institutional contexts) can be stimulated and incentivized to share and use open research data. This is vital for realizing the envisioned benefits of sharing and using open research data and, ultimately, for generating new insights and advancing scientific knowledge.
Developers of open research data infrastructures.
Developers of open research data infrastructures need to take the factors in the overview into account, since the needs of individual researchers can be derived from them. For example, the inhibitor “lack of large data repositories” indicates to developers that such repositories might need to be developed. Infrastructure developers can further examine which drivers and inhibitors should be prioritized according to researchers in different research disciplines, countries and positions, and can use the factor overview to develop infrastructures that support open research data sharing and use.
The derived overview of categories and factors influencing open research data adoption can assist institutions that need to serve and support their researchers. The eleven categories and their underlying factors can be a first step for academic libraries and other research support organizations (e.g., the office of research or grant management services) to develop effective data services, workflows and consultations for their researchers. More specifically and practically, survey instruments can be developed to measure researchers’ maturity levels in open data sharing and reuse, both at the macro level of categories (Fig 2) and at the micro level of specific factors (Table 8).
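As a sketch of how such a two-level measurement might be operationalized, the Python example below averages hypothetical Likert-scale survey answers per factor (micro level) and then per category (macro level). The item names, the mapping of items to categories, the five-point scale, and the reverse-coding of inhibitor-style items are all assumptions made for illustration; they are not the survey instruments, Fig 2, or Table 8 of this study.

```python
from statistics import mean

# Hypothetical survey items (micro level), each mapped to one of the
# eleven categories (macro level). Item names and mapping are illustrative.
FACTOR_TO_CATEGORY = {
    "clear_data_sharing_policy_exists": "legislation and regulation",
    "journal_policy_supports_sharing": "legislation and regulation",
    "suitable_repository_available": "facilitating conditions",
    "institutional_support_available": "facilitating conditions",
    "preparing_data_takes_too_long": "effort",
}

# Items worded as inhibitors are reverse-coded so that a higher score
# always means conditions more favorable to open data sharing.
REVERSE_CODED = {"preparing_data_takes_too_long"}
LIKERT_MAX = 5  # 1 = strongly disagree ... 5 = strongly agree


def factor_scores(responses):
    """Mean (reverse-coded where needed) score per factor across respondents."""
    scores = {}
    for factor, answers in responses.items():
        if factor in REVERSE_CODED:
            answers = [LIKERT_MAX + 1 - a for a in answers]
        scores[factor] = mean(answers)
    return scores


def category_scores(per_factor):
    """Unweighted mean of the factor scores belonging to each category."""
    grouped = {}
    for factor, score in per_factor.items():
        grouped.setdefault(FACTOR_TO_CATEGORY[factor], []).append(score)
    return {cat: round(mean(vals), 2) for cat, vals in grouped.items()}


# Hypothetical answers from three researchers.
responses = {
    "clear_data_sharing_policy_exists": [4, 5, 3],
    "journal_policy_supports_sharing": [2, 3, 4],
    "suitable_repository_available": [5, 5, 5],
    "institutional_support_available": [2, 2, 2],
    "preparing_data_takes_too_long": [4, 5, 3],
}

micro = factor_scores(responses)
macro = category_scores(micro)
print(macro)
```

Unweighted means are used here only for readability; because the relative importance of individual factors has not yet been established, any weighting would need empirical grounding.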
Open data and open science policy makers, advisors and funding bodies.
Finally, the overview of categories and factors affecting open research data adoption can serve as a strong reference for open data and open science policy makers, advisors, and funding bodies in recognizing the drivers and inhibitors of researchers’ open data sharing and use practices. The factor overview is an important first step that allows them to create strategies that incentivize open research data sharing and use. Such incentive mechanisms should incorporate the factors included in the overview.
This study’s purpose is to systematically review the literature on individual researchers’ drivers and inhibitors for sharing and using open research data. Using a systematic literature review approach complemented by snowballing, 32 studies describing research into open data sharing and use were selected. All studies were published between 2004 and 2019 inclusive. Nearly half of the selected studies (n = 15) used quantitative approaches, twelve were qualitative, and five used a mixed-method approach. Most studies (n = 22) focus on a specific research discipline, such as biodiversity, social sciences, or microarray science. The majority of the investigated studies (n = 18) do not mention any theory. Of the fourteen studies that do mention theory, eleven use theory to develop a theoretical research framework or model and/or to test hypotheses. Theories mentioned more than once are the “Theory of Planned Behavior” (n = 7), “Institutional Theory” (n = 2), the “Technology Acceptance Model” (n = 2), and the model integrating the “Unified Theory of Acceptance and Use of Technology” with the two-stage expectation confirmation theory of information systems continuance (n = 2).
From the identified studies, we synthesized a comprehensive list of (1) factors driving researchers to openly share research data; (2) factors inhibiting researchers from openly sharing research data; (3) factors driving researchers to use open research data; and (4) factors inhibiting researchers from using open research data. The influencing factors were grouped into eleven categories: “the researcher’s background”, “requirements and formal obligations”, “personal drivers and intrinsic motivations”, “facilitating conditions”, “trust”, “expected performance”, “social influence and affiliation”, “effort”, “the researcher’s experience and skills”, “legislation and regulation”, and “data characteristics”. We also found that the factors affecting open data sharing and open data use are often similar (e.g., in “the researcher’s background” category), which shows the strong interdependency between these two activities.
Most drivers for openly sharing research data are related to personal and intrinsic motivations, to the expected performance of researchers, and to the effort of openly sharing research data. The identified inhibitors for open data sharing mostly relate to legislation and regulation, facilitating conditions, and expected performance, in the sense that openly sharing research data can lead to worse performance. Drivers for open research data use mainly relate to personal and intrinsic motivations and the expected performance of researchers. The identified inhibitors for open research data use mainly relate to effort and data characteristics. However, the number of identified drivers and inhibitors for research data sharing and use does not indicate their importance, and further research is needed to examine whether certain drivers and inhibitors are more important than others in specific contexts and research disciplines.
The large diversity of factors influencing open research data sharing and use shows that theory on this topic needs to combine insights from various fields. In the discussion section, we highlighted various theories from the information science, information systems, and motivational psychology literature that might be combined to further develop theory in research on open research data sharing and use. This study’s analysis of theory development with regard to open research data could inspire other researchers studying specific aspects of open research data sharing and use.
This study contributes to filling the gap in theory development in the open data literature by providing a coherent and comprehensive overview of the categories and underlying factors that need to be considered when studying open research data sharing and use behavior. Drawing on a scattered body of knowledge, this study developed an argument about how the categories and factors are connected, providing the basis for a comprehensive overview of factors influencing open research data adoption. The developed overview is needed to further examine the importance of researchers’ drivers and inhibitors for research data sharing and use in different research disciplines and contexts, such as disciplines with low versus high rates of data sharing and reuse. Moreover, while most inhibitors of open research data sharing and use cannot be mitigated completely, the negative impact of many of them may be reduced with the right infrastructure and related institutional arrangements. With this in mind, this study is an essential first step towards designing infrastructures and institutional arrangements that stimulate and incentivize open research data sharing and use behavior, since these need to take into account the factors driving and inhibiting researchers to adopt open research data.
Systematic literature reviews carry a risk of bias at both the review level (i.e., analysis of studies) and the outcome level (i.e., reporting bias). Moreover, especially in systematic reviews of qualitative research, robust study quality assessment remains a challenge. Although these risks and challenges cannot be removed completely, various measures were taken to reduce bias as much as possible. For example, multiple assessors were used for each study included in our review, and detailed information was provided about how we collected, assessed and analyzed the studies. By making the review protocol transparent and openly sharing the research data underlying our analysis and findings, we enable other scholars to cross-check our findings and examine whether other interpretations are possible.
In addition, some of the identified factors driving or inhibiting the adoption of open research data have been found in only a single study. More evidence is therefore needed to improve our understanding of these factors and to investigate whether they play a role in different contexts. Future research is recommended to empirically test the usability and completeness of the factor overview and to adapt it to specific contexts of open data sharing and use behavior. In particular, future research should examine whether the factor overview needs to be adapted for research data provision and use in specific research disciplines (e.g., astrophysics, genomics, humanities, social sciences, computer science). Furthermore, it should be investigated whether certain factors carry more weight in researchers’ trade-offs about whether to openly share research data and whether to use open research data. Moreover, most of the examined studies focused on research data sharing and use in the United States and Europe, and to a much smaller extent on Asian, African, and other jurisdictions; the latter should receive more attention. Finally, future research should focus on designing infrastructures and institutional arrangements that stimulate and incentivize open research data sharing and use behavior.
S1 Table. Overview of studies included in our literature review.
S2 Table. Overview of drivers for openly sharing research data by researchers, identified in the studies included in our literature review.
S3 Table. Overview of inhibitors for openly sharing research data by researchers, identified in the studies included in our literature review.
S4 Table. Overview of drivers for using open research data by researchers, identified in the studies included in our literature review.
S5 Table. Overview of inhibitors for using open research data by researchers, identified in the studies included in our systematic literature review.
We would like to thank our research assistants, Yu-Jen Chen, Chieh-Yun Lin, and Wayland Chang. Moreover, we are grateful to the anonymous reviewers, whose comments served to make this a stronger contribution.
- 1. Sá C, Grieco J. Open data for science, policy, and the public good. Review of Policy Research. 2016;33(5):526–43.
- 2. Campbell J. Access to scientific data in the 21st century: Rationale and illustrative usage rights review. Data Science Journal. 2015;13:203–30.
- 3. European Union. Riding the wave: How Europe can gain from the rising tide of scientific data. Brussels; 2010.
- 4. Arza V, Fressoli M. Systematizing benefits of open science practices. Information Services & Use. 2017;37(4):463–74.
- 5. Rouder JN. The what, why, and how of born-open data. Behavior Research Methods. 2016;48(3):1062–9. pmid:26428912
- 6. Grechkin M, Poon H, Howe B. Wide-Open: Accelerating public data release by automating detection of overdue datasets. PLoS Biology. 2017;15(6):e2002477. pmid:28594819
- 7. Nielsen M. Reinventing Discovery: The New Era of Networked Science. New Jersey: Princeton University Press; 2012.
- 8. Piwowar HA, Day RS, Fridsma DB. Sharing detailed research data is associated with increased citation rate. PLoS ONE. 2007;2(3):e308.
- 9. Piwowar HA, Vision TJ. Data reuse and the open data citation advantage. PeerJ. 2013;1:e175. pmid:24109559
- 10. Enke N, Thessen A, Bach K, Bendix J, Seeger B, Gemeinholzer B. The user's view on biodiversity data sharing – Investigating facts of acceptance and requirements to realize a sustainable use of research data. Ecological Informatics. 2012;11:25–33.
- 11. Fecher B, Friesike S, Hebing M. What drives academic data sharing? PLoS ONE. 2015;10(2):e0118053. pmid:25714752
- 12. Molloy JC. The Open Knowledge Foundation: Open Data Means Better Science. PLoS Biology. 2011;9(12):1–4.
- 13. Mooney H, Newton MP. The anatomy of a data citation: Discovery, reuse, and credit. Journal of Librarianship and Scholarly Communication. 2012;1(1):eP1035.
- 14. Ceci SJ. Scientists' attitudes toward data sharing. Science, Technology, & Human Values. 1988;13(1/2):45–52. pmid:25309997
- 15. Savage CJ, Vickers AJ. Empirical study of data sharing by authors publishing in PLoS journals. PLoS ONE. 2009;4(9):e7078. pmid:19763261
- 16. Boulton G, Rawlins M, Vallance P, Walport M. Science as a public enterprise: The case for open data. The Lancet. 2011;377(9778):1633–5.
- 17. Joo S, Kim S, Kim Y. An exploratory study of health scientists’ data reuse behaviors: Examining attitudinal, social, and resource factors. Aslib Journal of Information Management. 2017;69(4):389–407.
- 18. Haeusermann T, Greshake B, Blasimme A, Irdam D, Richards M, Vayena E. Open sharing of genomic data: Who does it and why? PLoS ONE. 2017;12(5):e0177158. pmid:28486511
- 19. Zuiderwijk A. Open data infrastructures: The design of an infrastructure to enhance the coordination of open data use. 's-Hertogenbosch: Uitgeverij BOXPress; 2015.
- 20. Zuiderwijk A, Janssen M, Dwivedi YK. Acceptance and use predictors of open data technologies: Drawing upon the unified theory of acceptance and use of technology. Government Information Quarterly. 2015;32(4):429–40.
- 21. Von St. Vieth B, Rybicki J, Brzezniak M. Towards flexible open data management solutions. 41st International Convention on Information and Communication Technology, Electronics and Microelectronics; May 22–26, 2017; Opatija, Croatia; 2017. p. 233–7.
- 22. Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, et al. Promoting an open research culture. Science. 2015;348(6242):1422–5. pmid:26113702
- 23. Hossain MA, Dwivedi YK, Rana NP. State-of-the-art in open data research: Insights from existing literature and a research agenda. Journal of Organizational Computing and Electronic Commerce. 2016;26(1–2):14–40.
- 24. Force11. The FAIR data principles. 2016 [Available from: https://www.force11.org/group/fairgroup/fairprinciples].
- 25. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3:160018.
- 26. Research Council UK. Concordat on Open Research Data. 2016 [Available from: https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/].
- 27. Hart C. Doing a literature review: Releasing the research imagination. London: Sage Publications; 1998.
- 28. Dixon-Woods M, Bonas S, Booth A, Jones DR, Miller T, Sutton AJ, et al. How can systematic reviews incorporate qualitative research? A critical perspective. Qualitative Research. 2006;6(1):27–44.
- 29. Kitchenham B, Brereton OP, Budgen D, Turner M, Bailey J, Linkman S. Systematic literature reviews in software engineering – A systematic literature review. Information and Software Technology. 2009;51(1):7–15.
- 30. Higgins JP, Green S. Cochrane handbook for systematic reviews of interventions: John Wiley & Sons; 2011.
- 31. Martinez-Rojas M, del Carmen Pardo-Ferreira M, Rubio-Romero JC. Twitter as a tool for the management and analysis of emergency situations: A systematic literature review. International Journal of Information Management. 2018;43:196–208.
- 32. Soheilirad S, Govindan K, Mardani A, Zavadskas EK, Nilashi M, Zakuan N. Application of data envelopment analysis models in supply chain management: A systematic review and meta-analysis. Annals of Operations Research. 2018;271(2):915–69.
- 33. Torgerson CJ. Publication bias: the Achilles' heel of systematic reviews? British Journal of Educational Studies. 2006;54(1):89–102.
- 34. Sekaran U, Bougie R. Research Methods for Business: A Skill Building Approach. 7th ed. West Sussex: Wiley; 2016.
- 35. Kitchenham B. Procedures for performing systematic reviews. Keele, UK: Keele University; 2004. p. 1–26.
- 36. Jalali S, Wohlin C. Systematic literature studies: Database searches vs. backward snowballing. Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement; 2012: IEEE.
- 37. Estabrooks CA, Field PA, Morse JM. Aggregating qualitative findings: an approach to theory development. Qualitative Health Research. 1994;4(4):503–11.
- 38. Batini C, Cappiello C, Francalanci C, Maurino A. Methodologies for data quality assessment and improvement. ACM Computing Surveys. 2009;41(3):1–52.
- 39. Bano M, Zowghi D. A systematic review on the relationship between user involvement and system success. Information and Software Technology. 2015;58:148–69.
- 40. Curty RG, Crowston K, Specht A, Grant BW, Dalton ED. Attitudes and norms affecting scientists’ data reuse. PLoS ONE. 2017;12(12):e0189288. pmid:29281658
- 41. Harper LM, Kim Y. Attitudinal, normative, and resource factors affecting psychologists’ intentions to adopt an open data badge: An empirical analysis. International Journal of Information Management. 2018;41:23–32.
- 42. Kim Y, Adler M. Social scientists’ data sharing behaviors: Investigating the roles of individual motivations, institutional pressures, and data repositories. International Journal of Information Management. 2015;35(4):408–18.
- 43. Kim Y, Yoon A. Scientists' data reuse behaviors: A multilevel analysis. Journal of the Association for Information Science and Technology. 2017;68(12):2709–19.
- 44. Yoon A, Kim Y. Social scientists' data reuse behaviors: Exploring the roles of attitudinal beliefs, attitudes, norms, and data repositories. Library & Information Science Research. 2017;39(3):224–33.
- 45. Zenk-Möltgen W, Akdeniz E, Katsanidou A, Naßhoven V, Balaban E. Factors influencing the data sharing behavior of researchers in sociology and political science. Journal of Documentation. 2018;74(5):1053–73.
- 46. Zuiderwijk A, Cligge M. The Acceptance and Use of Open Data Infrastructures: Drawing upon UTAUT and ECT. Electronic Government and Electronic Participation: Joint Proceedings of Ongoing Research, PhD Papers, Posters and Workshops of IFIP EGOV and EPart 2016; 2016; Guimaraes, Portugal: IOS Press.
- 47. da Costa MP, Leite FCL. Factors influencing research data communication on Zika virus: a grounded theory. Journal of Documentation. 2019;75(5):910–26.
- 48. Zuiderwijk A, Spiers H. Sharing and re-using open data: A case study of motivations in astrophysics. International Journal of Information Management. 2019;49:228–41.
- 49. Sayogo DS, Pardo T. Exploring the determinants of scientific data sharing: Understanding the motivation to publish research data. Government Information Quarterly. 2013;30(1):S19–S31.
- 50. Arzberger P, Schroeder P, Beaulieu A, Bowker G, Casey K, Laaksonen L, et al. Promoting access to public research data for scientific, economic, and social development. Data Science Journal. 2004;3(29):135–52.
- 51. Bezuidenhout L. Technology transfer and true transformation: implications for Open Data. Data Science Journal. 2017;16(26):1–13.
- 52. Cragin MH, Palmer CL, Carlson JR, Witt M. Data sharing, small science and institutional repositories. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2010;368(1926):4023–38.
- 53. Ganzevoort W, van den Born RJ, Halffman W, Turnhout S. Sharing biodiversity data: citizen scientists’ concerns and motivations. Biodiversity and Conservation. 2017:1–17.
- 54. Raffaghelli JE, Manca S. Is there a social life in open data? The case of open data practices in educational technology research. Publications. 2019;7(1):9.
- 55. Schmidt B, Gemeinholzer B, Treloar A. Open data in global environmental research: The Belmont Forum’s open data survey. PLoS ONE. 2016;11(1):e0146695. pmid:26771577
- 56. Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, et al. Data sharing by scientists: Practices and perceptions. PLoS ONE. 2011;6(6):e21101. pmid:21738610
- 57. Wallis JC, Rolando E, Borgman CL. If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology. PLoS ONE. 2013;8(7):e67332. pmid:23935830
- 58. Yoon A. Data reusers' trust development. Journal of the Association for Information Science and Technology. 2017;68(4):946–56.
- 59. Zimmerman A. Not by metadata alone: the use of diverse forms of knowledge to locate data for reuse. International Journal on Digital Libraries. 2007;7(1–2):5–16.
- 60. Rocher L, Hendrickx JM, De Montjoye Y-A. Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications. 2019;10(1):1–9. pmid:30602773
- 61. DiMaggio PJ, Powell WW. The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields. American Sociological Review. 1983;48(2):147–160.
- 62. Scott WR. Institutions and organizations. 2nd ed. Thousand Oaks: Sage Publications; 2001.
- 63. Deci EL, Cascio WF, Krusell J. Cognitive evaluation theory and some comments on the Calder and Staw critique. Journal of Personality and Social Psychology. 1975;31(1):81–5.
- 64. Deci EL, Porac J. Cognitive evaluation theory and the study of human motivation. The hidden costs of reward: New perspectives on the psychology of human motivation. 1978;149:155–7.
- 65. Venkatesh V, Thong JYL, Xu X. Consumer acceptance and use of information technology: Extending the unified theory of acceptance and use of technology. MIS Quarterly. 2012;36(1):157–78.
- 66. Venkatesh V, Morris MG, Davis GB, Davis FD. User Acceptance of Information Technology: Toward a Unified View. MIS Quarterly. 2003;27(3):425–78.
- 67. Lowry PB, Gaskin J, Twyman N, Hammer B, Roberts T. Proposing the hedonic-motivation system adoption model (HMSAM) to increase understanding of adoption of hedonically motivated systems. Journal of the Association for Information Systems. 2013;14(11):617–71.
- 68. Venkatesh V, Thong JYL, Chan FKY, Hu PJ-H, Brown SA. Extending the two-stage information systems continuance model: incorporating UTAUT predictors and the role of context. Information Systems Journal. 2011;21(6):527–55.
- 69. Adams JS. Towards an understanding of inequity. The Journal of Abnormal and Social Psychology. 1963;67(5):422.
- 70. Walster E, Berscheid E, Walster GW. New directions in equity research. Journal of Personality and Social Psychology. 1973;25(2):151.
- 71. Walster E, Berscheid E, Walster GW. New directions in equity research: Academic Press; 1976.
- 72. Lundberg C, Gudmundson A, Andersson TD. Herzberg's Two-Factor Theory of work motivation tested empirically on seasonal workers in hospitality and tourism. Tourism Management. 2009;30(6):890–9.
- 73. Herzberg F. Work and the nature of man. New York: World Publishing; 1971.
- 74. Herzberg F, Mausner B, Bloch Snyderman B. The motivation to work. New Jersey: Transaction Publishers; 2005.
- 75. Herzberg F, Snyderman BB, Mausner B. The Motivation to Work. 2nd ed: Wiley; 1967.
- 76. Vroom VH. Work and motivation. New York: Wiley; 1964.
- 77. Rogers EM. Diffusion of innovations. 1st ed. New York: Free Press; 1962.
- 78. Keller JM. Motivational design of instruction. In: Reigeluth CM, editor. Instructional design theories and models: An overview of their current status. 1st ed.: Lawrence Erlbaum Associates; 1983. p. 383–434.
- 79. Keller JM. Development and use of the ARCS model of instructional design. Journal of Instructional Development. 1987;10(3):2.
- 80. Lowry PB, Gaskin JE, Moody GD. Proposing the Multimotive Information Systems Continuance Model (MISC) to better explain end-user system evaluations and continuance intentions. Journal of the Association for Information Systems. 2015;16(7):515–79.