Abstract
Scientific data management is at a critical juncture, driven by exponential data growth, increasing cross-domain dependencies, and a severe reproducibility crisis in modern research. Traditional centralized data management approaches not only struggle with data volume but also fail to address the fragmentation of research results across domains. This hinders scientific reproducibility and cross-domain collaboration and raises concerns about data sovereignty and governance. This article proposes FAIR and federated Data Ecosystems as an improved architectural pattern for future research data ecosystems, incorporating recent advancements in decentralized, distributed systems into existing research infrastructure to promote cross-domain collaboration. Building on established patterns from Data Commons, Data Meshes, and Data Spaces, our approach centers on a layered architecture consisting of governance, data, service, and application layers. This design can preserve domain-specific expertise and control while facilitating data integration through standardized interfaces and semantic enrichment. Key requirements include adaptive metadata management, simplified user interaction, robust security, and transparent data transactions. Our architecture supports both compute-to-data and data-to-compute paradigms, implementing a decentralized peer-to-peer network that scales horizontally. This article aims to provide both an impulse for the technical architecture and concepts for a governance framework so that FAIR and federated Data Ecosystems can enable researchers to build on existing work while maintaining control over their data and computing resources. This offers a practical path towards an integrated research infrastructure that respects both domain autonomy and interoperability requirements.
Citation: Beyvers S, Schlegel J, Brehm L, Hansen M, Goesmann A, Förster F (2026) Towards FAIR and federated data ecosystems for interdisciplinary research. PLoS Comput Biol 22(2): e1013806. https://doi.org/10.1371/journal.pcbi.1013806
Editor: Virginia E. Pitzer, Yale School of Public Health, UNITED STATES OF AMERICA
Published: February 13, 2026
Copyright: © 2026 Beyvers et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the Federal Ministry of Research, Technology and Space (BMFTR) via the German Network for Bioinformatics Infrastructure (de.NBI; W-de.NBI-010 to SB and FF); the German Research Foundation (DFG; 460129525 (NFDI4Microbiota) to JS and LB; 442032008 (NFDI4Biodiversity) to MH). Open Access funding enabled and organized by Projekt DEAL. All funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Exponential scientific data growth and increasing cross-domain dependencies create an urgent need for new data management solutions [1]. Next-generation sequencing technologies, for example, have profoundly transformed genomic research [2] but also revealed a critical challenge: our current infrastructure struggles to handle not just the volume of data but also the intricate relationships between research outputs across domains. In the face of these challenges, traditional solutions like data warehouses, data lakes, and domain-specific repositories are reaching their limits when dealing with large, interconnected datasets with strong sovereignty constraints [3].
This challenge extends beyond storage and processing. The reproducibility crisis in modern research [4,5], worsened by fragmented data management practices, demands immediate action. While initiatives like National Research Data Infrastructure [6], European Open Science Cloud [7], and various Data Commons [8,9] have begun addressing these challenges, they often operate in isolation.
From a technological standpoint, Cloud technologies offer elastic resource allocation [10,11] but can create data sovereignty and privacy challenges. Recent architectural patterns such as Data Commons [8] and Data Spaces [12] demonstrate the potential of distributed approaches aligned with Findable, Accessible, Interoperable, Reusable (FAIR) principles [13]. However, these solutions often rely on centralized control or focus more on domain-specific solutions. This often results in the formation of new silos rather than achieving true cross-domain integration.
We propose FAIR and federated Data Ecosystems (FFDEs): an architectural pattern that combines the benefits of distributed systems with the reality of existing research infrastructures. This approach integrates established domain-specific repositories with support for intermediate research outputs, including experimental data, analysis workflows, and computational results. By providing both a technical architecture and the outline of a governance framework for cross-domain collaboration, FFDEs could enable researchers to build upon existing work while maintaining control over their data and computational resources.
This paper argues that the technology for such systems already exists. Federated peer-to-peer systems [14] and distributed governance models [15] are already in productive use today. However, the fragmented understanding of their requirements and benefits makes it difficult to convince governing bodies and researchers to integrate such systems. We illustrate how combining proven patterns from Data Commons, Data Spaces, and cloud computing can create a foundation for future scientific collaboration, one that respects both the sovereignty of individual research groups and the need for easy data integration across domains.
2 Drawing from existing architectural patterns
Three architectural patterns have emerged for modern research data management: Data Commons, Data Mesh, and Data Spaces.
The Data Commons pattern [8] (e.g., the NIH Data Commons or the Australian Research Data Commons [16–18]) establishes a centralized, collaborative platform where communities can access, integrate, and analyze data collectively, benefiting from unified standards and shared resources for improved discovery and consistent quality. However, this centralized approach faces two main challenges. First, as data from diverse sources continues to grow, managing and processing these large volumes can strain the central infrastructure, often requiring significant investment in cloud resources. Second, it is difficult for new organizations to participate in a Data Commons: most are designed for a predefined set of members, and participating organizations may lose control over their data management practices because they must adhere to the unified standards and policies established by the central platform.
Another very popular pattern is the Data Mesh pattern [19,20]. It treats data as a product owned by teams from specific domains, emphasizing decentralized data ownership and federated governance to manage large-scale data across organizations. Although this approach increases interoperability and scalability, it introduces coordination and integration complexities between different domain-owned data products. Each team may implement its own standards, technologies, and governance practices, which makes ensuring consistent data quality and collaboration across the organization challenging. While domain-oriented thinking is clearly a good approach for data management, Data Meshes were developed for enterprise environments where standardized infrastructure and consistent regulatory frameworks are typical. Research environments present different challenges due to heterogeneous systems and varying institutional policies, which hinders a direct application.
Lastly, Data Spaces [12] provide a decentralized framework that enables secure, sovereign data sharing between independent organizations through trust frameworks and granular access controls. This offers benefits such as enhanced data sovereignty and standardized interoperability. Yet, they come with challenges like high implementation costs, since establishing the necessary infrastructure, security measures, and compliance processes requires significant investment and coordination among participants. Complex multi-party governance also poses difficulties, as aligning the interests, policies, and responsibilities of multiple organizations demands robust mechanisms for decision-making and conflict resolution. Adoption is also challenging due to network effects: the value of participating in a Data Space increases as more organizations join, making it difficult to achieve critical mass and to incentivize early adopters.
Although each architecture offers valuable features, the biggest challenge is combining their complementary strengths to create a unified architecture. FFDEs, therefore, try to selectively build on the strengths of these patterns. From Data Commons, FFDEs adopt standardized metadata schemas for cross-domain interoperability and community curation approaches for data quality assurance. Data Mesh contributes the fundamental principle of domain ownership, which allows participants to maintain control over their data while participating in larger ecosystems, alongside federated governance mechanisms and technological flexibility that accommodate diverse research practices. Data Spaces provide essential sovereignty preservation mechanisms, policy enforcement systems for fine-grained access control, and trust frameworks that enable secure multi-institutional collaboration while maintaining compliance with regulatory requirements.
3 FAIR and federated data ecosystems
Modern research increasingly relies on collaboration across different fields. For instance, findings from fluid dynamics can improve weather simulations, which in turn influence agricultural research on resilient plant breeds [21]. As we cannot predict which domains will need to collaborate in the future, research infrastructures must support flexible, cross-domain integration while preserving domain expertise and data sovereignty.
FFDEs represent a vision for addressing these challenges by combining the complementary strengths of the Data Commons, Data Mesh, and Data Spaces architectures to extend the FAIR principles across organizational boundaries. To implement FFDEs successfully, several key requirements must be met. Table 1 outlines the detailed considerations for each requirement, along with examples of currently existing approaches.
These requirements balance technical capabilities with practical usability while ensuring broad adoption across research communities.
One of the most important decisions is the network protocol, which determines how participants interact with each other. For the greatest autonomy, we believe that a fully decentralized, peer-to-peer network is the best starting point for data and metadata exchange. In this type of network, no single entity controls the infrastructure. While such networks have been difficult to implement in the past, large infrastructures like IPFS [22] now make them accessible through libraries like libp2p. Peer-to-peer (P2P) architectures are particularly suited for scientific data exchange: they enable direct connections between research institutions regardless of their network configurations, automatically handle the NAT traversal and firewall constraints that often isolate academic networks, and provide resilient data access even when individual nodes or entire institutions temporarily go offline. Modern P2P technologies typically include distributed hash tables like Kademlia [23] for discovering peers and datasets, enabling decentralized data discovery.
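To make the lookup mechanism concrete, the following Python sketch illustrates Kademlia's XOR distance metric, which underpins decentralized discovery of peers and datasets in such distributed hash tables. The peer names and dataset key are hypothetical and serve only as illustration; a production system would use libp2p or a comparable library rather than this simplified routine.

```python
import hashlib

def node_id(name: str) -> int:
    """Derive a 160-bit identifier from an arbitrary name, as Kademlia does."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the XOR of the two identifiers."""
    return a ^ b

def closest_peers(target: int, peers: dict, k: int = 3) -> list:
    """Return the k peers whose IDs are XOR-closest to the target key."""
    return sorted(peers, key=lambda p: xor_distance(peers[p], target))[:k]

# Hypothetical peer names for illustration only.
peers = {name: node_id(name) for name in
         ["uni-giessen", "uni-mainz", "climate-node", "genomics-node"]}
dataset_key = node_id("example-dataset-identifier")  # content key to locate
print(closest_peers(dataset_key, peers))
```

Because both content keys and node identifiers live in the same 160-bit space, any node can route a query toward the data without a central index.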
The second challenge is building trust between participants. For this, it is crucial to reliably identify users, institutions, and even services. To implement this, we can adopt ideas from Data Spaces, which describe a trust framework in which trust is derived from a chain of cryptographic certificates and signatures proving the identity of a participant. Rather than building new identity systems, we can leverage existing federated infrastructure like eduGAIN [24], which already connects thousands of research institutions worldwide. This approach builds on decades of investment in academic identity federation while extending trust relationships to include not just users but also institutions and computational services.
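The chain-of-trust idea can be sketched as follows. The snippet below is a deliberately simplified toy: HMACs over shared secrets stand in for the asymmetric signatures that real certificate chains (e.g., X.509 as used in eduGAIN federations) rely on, and all participant names and keys are invented. It shows only the structural principle of walking a chain from a service certificate up to a trusted root.

```python
import hashlib
import hmac

def sign(issuer_key: bytes, payload: bytes) -> bytes:
    """Toy 'signature': an HMAC stands in for the issuer's real signature."""
    return hmac.new(issuer_key, payload, hashlib.sha256).digest()

def verify_chain(chain: list, issuer_keys: dict, trust_anchor: str) -> bool:
    """Walk a certificate chain leaf-to-root; every entry must be signed by
    its declared issuer, and the chain must end at the trust anchor."""
    for cert in chain:
        payload = f"{cert['subject']}|{cert['issuer']}".encode()
        expected = sign(issuer_keys[cert["issuer"]], payload)
        if not hmac.compare_digest(expected, cert["signature"]):
            return False
    return chain[-1]["issuer"] == trust_anchor

# Hypothetical participants: a federation root vouches for an institution,
# which in turn vouches for one of its services.
keys = {"federation-root": b"root-secret", "uni-example": b"inst-secret"}

def make_cert(subject: str, issuer: str) -> dict:
    return {"subject": subject, "issuer": issuer,
            "signature": sign(keys[issuer], f"{subject}|{issuer}".encode())}

chain = [make_cert("data-service.uni-example", "uni-example"),
         make_cert("uni-example", "federation-root")]
print(verify_chain(chain, keys, "federation-root"))  # True
```

The essential property carried over from Data Spaces is that a participant is trusted because a chain of verifiable statements leads back to an anchor the verifier already trusts, not because of any central registry.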
The third component addresses data discovery and access governance through federated catalogs. Standardized protocols like Open Digital Rights Language (ODRL) [25] or Data Use Conditions [26] communicate governance policies and access terms, while DCAT-based [27] semantic descriptions enable automated discovery of available datasets, services and conditions. Each participant maintains sovereignty over their data models and quality standards, exposing only the minimal technical metadata necessary for discovery and access through standardized interfaces.
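As an illustration of such a catalog entry, the following sketch combines a DCAT-style dataset description with an attached ODRL policy in JSON-LD. The dataset title, publisher, and access URL are placeholders, not real endpoints; only the vocabulary namespaces and property names follow the published DCAT and ODRL specifications.

```python
import json

# A minimal DCAT-style catalog entry with an attached ODRL policy.
entry = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "odrl": "http://www.w3.org/ns/odrl/2/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Long-term temperature observations (example)",
    "dct:publisher": "Example Climate Research Centre",
    "dcat:distribution": {
        "@type": "dcat:Distribution",
        "dcat:accessURL": "https://example.org/data/temperature",
        "dct:format": "application/netcdf",
    },
    # Machine-readable terms: use is offered, attribution is a duty.
    "odrl:hasPolicy": {
        "@type": "odrl:Offer",
        "odrl:permission": [{
            "odrl:action": "odrl:use",
            "odrl:duty": [{"odrl:action": "odrl:attribute"}],
        }],
    },
}
print(json.dumps(entry, indent=2)[:160])
```

Exposing only such a compact, standardized record lets other participants discover the dataset and its conditions while the publishing institution keeps its internal data models private.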
Together, these three components create a data plane that handles networking, identity, trust, and discovery without requiring participants to sacrifice their existing infrastructure or domain expertise. This data plane is the technical foundation for enabling higher-level services: federated search engines that span multiple institutions, analytical workflows accessing cross-domain datasets, web portals that provide familiar interfaces to distributed resources, and AI systems that can identify patterns across previously isolated data silos.
The data plane can thus serve as the foundation for three additional architectural planes that complete the FFDE ecosystem (see Fig 1). The Governance and Access Control Plane implements decentralized governance through community clusters and federated identity management. The Service Plane transforms distributed data sources into integrated secondary resources through automated processes and semantic enrichment, providing a transformed view into the existing data. Finally, the Application Plane constructs user-facing interfaces that abstract the distributed architecture complexity, allowing researchers to focus on scientific questions rather than system management. This plane can also include traditional repositories and other user-facing applications. A more detailed description of these planes is found in Table 2.
Storage and compute nodes are interconnected through an internal peer-to-peer network; connections to existing systems are established via adapters. All services and applications build on top of the data plane and connect via standardized interfaces like DCAT.
The four-plane architecture enables distributed data governance, peer-to-peer data management, semantic data integration, and user-accessible research applications while maintaining institutional autonomy and data sovereignty. Examples of existing technologies and research applications that could be used to implement these planes.
The important point is that these planes can be implemented today using existing technologies. The challenge is not technical innovation but organizational coordination: establishing the governance frameworks that allow institutions to connect their existing infrastructure into a federated research ecosystem. What remains is filling the gaps and coordinating these technologies through unified governance. Successful implementations must therefore integrate with established repository standards such as OAI-PMH [28], CKAN [29], and DCAT [27]. Additionally, they should expose these same standards so that other implementations can harvest their data, ensuring bidirectional interoperability across the ecosystem.
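As a sketch of such repository integration, the snippet below constructs an OAI-PMH ListRecords request from the protocol's standard query parameters (verb, metadataPrefix, set, and the optional from date). The repository base URL is hypothetical; the parameter names follow the OAI-PMH specification.

```python
from urllib.parse import urlencode

def oai_request(base_url: str, verb: str, **params) -> str:
    """Build an OAI-PMH request URL; `verb` is one of the six protocol verbs
    (e.g., ListRecords, Identify, ListMetadataFormats)."""
    return f"{base_url}?{urlencode({'verb': verb, **params})}"

# Hypothetical repository endpoint for illustration only.
# 'from' is a Python keyword, so it is passed via an unpacked dict.
url = oai_request("https://repo.example.org/oai", "ListRecords",
                  metadataPrefix="oai_dc", set="climate",
                  **{"from": "2024-01-01"})
print(url)
```

A federated node that both issues such requests and answers them for its own holdings achieves the bidirectional interoperability described above with no new protocol work.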
The benefits compound with scale. Each new institution that joins brings not just data but also domain expertise, computational resources, and research communities. In an era of AI-driven discovery, where machine learning models can identify patterns across seemingly unrelated datasets, the value of cross-domain data access grows disproportionately. Knowledge graphs can automatically link concepts across fields, while automated reasoning systems can suggest novel research directions, but only if the data is accessible.
3.1 The researcher’s perspective
Consider research investigating plant flowering responses to temperature changes, requiring the integration of climate records with genomics data across institutions. Today, finding and gathering the necessary data is largely a manual, labor-intensive process. With the widespread adoption of FFDEs, this process could become considerably easier and faster.
Finding suitable data and access requirements would become the researcher’s starting point. Cross-domain discovery operates through federated catalogs that aggregate metadata from participating institutions while maintaining distributed control. These catalogs are built on widely adopted metadata standards such as Schema.org [32] or Dublin Core [33] that serve as the base metadata layer for all participating nodes. These can be extended with domain-specific metadata schemas and profiles like CF standards for climate data or BioSchemas [34] for genomics datasets, all synchronized through the P2P protocol to enable automated linking and discovery. Crucially, the metadata includes machine-readable rights and duties expressed in standards like ODRL, allowing researchers to immediately understand what permissions they need, what usage restrictions apply, and what obligations they must fulfill to access each dataset. Moreover, this may only be the first step in the discovery process. If the domain already has an established ontology and/or metadata format, more detailed metadata files can be retrieved in this highly specialized, domain-specific form.
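A minimal sketch of how such machine-readable conditions might be evaluated during discovery is shown below. The policy structure is a simplified, ODRL-inspired shape defined only for this illustration, not the full ODRL information model, and the example policy is invented.

```python
def requirements_for(policy: dict, action: str):
    """Return the duties a requester must fulfil to perform `action`,
    or None if the action is not permitted at all.
    Uses a simplified, ODRL-inspired policy shape."""
    for perm in policy.get("permission", []):
        if perm.get("action") == action:
            return [duty["action"] for duty in perm.get("duty", [])]
    return None

# Hypothetical policy: reuse is allowed if the dataset is attributed.
policy = {"permission": [{"action": "use",
                          "duty": [{"action": "attribute"}]}]}
print(requirements_for(policy, "use"))         # ['attribute']
print(requirements_for(policy, "distribute"))  # None
```

The point is that a researcher (or an agent acting on their behalf) can learn the obligations attached to a dataset before requesting access, rather than discovering them in a license text after the fact.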
Locating distributed data follows once relevant datasets and their access requirements are identified. The federated ecosystem relies on content-addressed storage systems, which ensure data integrity and enable efficient distribution. These systems are proven technology that has been in use for decades (e.g., BitTorrent), allowing researchers to locate exact copies of datasets across multiple institutions without concerns about data corruption or version mismatches.
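The principle of content addressing can be sketched in a few lines: a dataset's identifier is derived from its bytes, so any copy retrieved from any peer can be verified against that identifier. The record below is an invented example.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive a location-independent identifier from the bytes themselves."""
    return "sha256-" + hashlib.sha256(data).hexdigest()

def verify(data: bytes, address: str) -> bool:
    """Any copy, fetched from any peer, can be checked against its address."""
    return content_address(data) == address

record = b"station;date;temperature\nGI-01;2024-06-01;18.4\n"
addr = content_address(record)
assert verify(record, addr)             # an intact copy verifies
assert not verify(record + b"x", addr)  # a corrupted copy is rejected
```

Because the address changes whenever the content changes, version mismatches and silent corruption are detectable by construction, which is exactly the property the ecosystem relies on when datasets are replicated across institutions.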
Accessing data through institutional networks becomes possible through a unified peer-to-peer networking protocol that connects all participating nodes regardless of their current network setup. Genomics institutions might implement specialized interfaces following established specifications like GA4GH standards [30], while climate research centers could provide domain-optimized protocols for temporal environmental datasets, yet all participate in the same federated network through the underlying P2P synchronization mechanism.

Processing data with portable workflows and tasks eliminates the need for large-scale data movement. Computational workflows defined using portable standards like the Common Workflow Language can execute across federated computing resources, allowing researchers to process distributed datasets through familiar analytical platforms. Individual researchers can directly join the network with their own computational resources, creating a resilient distributed system that requires no central coordinating instance and has no single point of failure.
Maintaining compliance and audit trails operates through existing academic federation systems, allowing researchers to authenticate with their institutional credentials while fine-grained policy engines enforce the machine-readable data usage agreements discovered during the metadata phase. Immutable audit logs capture every data access and usage event across the distributed network, providing transparent accountability and enabling compliance verification while supporting reproducibility requirements. The institutional benefits extend beyond technical efficiency: participating organizations retain complete control over their data while gaining access to previously siloed resources, reduce infrastructure costs through shared computational resources, and enhance their research impact through expanded collaboration opportunities. For researchers, this architecture removes the traditional barriers between domains, making the discovery of relevant datasets across institutions as straightforward as searching within a single repository, while portable workflows ensure that analyses remain reproducible and shareable regardless of where they execute. The resulting research environment preserves the specialized expertise and governance structures that different domains require while creating new opportunities for interdisciplinary discovery, ultimately enabling scientific investigations that would be impossible within traditional institutional boundaries.
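The idea of an immutable, hash-chained audit log can be sketched as follows: each entry's hash covers its predecessor, so retroactive tampering with any recorded event breaks the chain and is detectable. The actor and dataset names are hypothetical.

```python
import hashlib
import json

def append_event(log: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry,
    so later tampering with history breaks the chain."""
    prev = log[-1]["hash"] if log else "genesis"
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    log.append({"event": event, "prev": prev,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def chain_intact(log: list) -> bool:
    """Recompute every hash from the genesis marker; any edit is detected."""
    prev = "genesis"
    for entry in log:
        body = json.dumps({"event": entry["event"], "prev": prev},
                          sort_keys=True)
        if (entry["prev"] != prev or
                entry["hash"] != hashlib.sha256(body.encode()).hexdigest()):
            return False
        prev = entry["hash"]
    return True

log = []
append_event(log, {"actor": "alice@uni-example", "action": "read",
                   "dataset": "ds-42"})
append_event(log, {"actor": "bob@uni-example", "action": "read",
                   "dataset": "ds-42"})
print(chain_intact(log))              # True
log[0]["event"]["action"] = "delete"  # tampering with history...
print(chain_intact(log))              # ...is detected: False
```

In a federated deployment, each node would maintain and cross-check such chains (or anchor them in a shared ledger), giving all participants the transparent accountability described above without a central log keeper.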
4 Discussion and conclusion
The architectural components outlined in this paper show that the technical foundation for FFDEs already exists. Production systems already handle distributed data at petabyte scale. Identity federations connect thousands of institutions. Governance frameworks from existing initiatives provide working templates for federated control. When these mature technologies are combined with new AI capabilities, knowledge graphs, and automated reasoning, the opportunities for cross-domain scientific discovery grow rapidly.
The potential benefits of such an architecture extend beyond technical efficiency. By preserving institutional sovereignty while enabling collaboration, this approach addresses the fundamental tension between data control and data sharing. It promises reduced infrastructure costs through resource pooling, better data quality through semantic enrichment, and faster discovery through cross-domain integration. Today’s breakthroughs increasingly happen at disciplinary boundaries: climate data informs epidemiology; genomics advances materials science. In this context, widespread data integration becomes increasingly valuable.
The framework we have proposed is deliberately flexible, accommodating everything from fully open science to heavily regulated, closed research environments. It can also support many forms of data management, from rapidly changing “hot” datasets under active investigation to finalized data publications. In theory, it could also accommodate textual publications, although this is not realistic in the near future.
However, realizing this vision involves navigating significant non-technical challenges. Establishing federated governance structures requires reconciling different institutional policies, regulatory frameworks, and cultural practices around data sharing. Legal complexities arise from varying data protection regulations across jurisdictions, intellectual property concerns, and liability questions in collaborative research. The heterogeneity of existing systems, each with established workflows, user communities, and investments, creates substantial integration challenges beyond mere technical compatibility. Community acceptance is critical to the success of federated data sharing. Organizations like the ELIXIR network are already developing standards, providing valuable collaboration opportunities. Interoperating with existing repositories reduces adoption barriers without disrupting current workflows. This approach rapidly aggregates datasets using existing infrastructure, avoiding major hardware investments. Demonstrating immediate value with minimal investment builds momentum and community trust.
Another adoption barrier involves metadata harmonization in domains with established standards. Our multi-tiered approach requires maintaining multiple metadata representations: broad summaries in standards like Schema.org for cross-domain discovery alongside rich domain-specific metadata for specialized use. This dual maintenance burden may deter communities that have invested heavily in their metadata ecosystems. Successfully bridging domains will require robust terminology and ontology services to map concepts between specialized vocabularies. While such services exist, integrating them into federated workflows and convincing communities of their value represents another coordination challenge beyond technical implementation.
Perhaps most critically, the benefits of federated ecosystems must be effectively communicated to diverse stakeholders. Researchers need assurance that federation enhances rather than complicates their work. Institutions require evidence that sovereignty and security are maintained. Funding bodies seek demonstration of tangible returns on infrastructure investment. Without clear articulation of these benefits and honest acknowledgment of the challenges, even technically superior solutions may face adoption barriers.
The transition from isolated repositories to federated ecosystems represents both a technical evolution and a cultural shift in how scientific communities approach data. While the technology stands ready, success will ultimately depend on building trust, demonstrating value, and creating governance frameworks that balance openness with control. The growing urgency of global challenges, from climate change to pandemic response, underscores the importance of this transition. Each day of continued fragmentation represents missed opportunities for discovery and innovation that our interconnected world can no longer afford.
References
- 1. Demchenko Y, Grosso P, de Laat C, Membrey P. Addressing big data issues in scientific data infrastructure. In: 2013 International Conference on Collaboration Technologies and Systems (CTS). 2013. https://doi.org/10.1109/cts.2013.6567203
- 2. Katz K, Shutov O, Lapoint R, Kimelman M, Brister JR, O’Sullivan C. The sequence read archive: a decade more of explosive growth. Nucleic Acids Res. 2022;50(D1):D387–90. pmid:34850094
- 3. Nargesian F, Zhu E, Miller RJ, Pu KQ, Arocena PC. Data lake management. Proc VLDB Endow. 2019;12(12):1986–9.
- 4. Miyakawa T. No raw data, no science: another possible source of the reproducibility crisis. Mol Brain. 2020;13(1):24. pmid:32079532
- 5. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):e124. pmid:16060722
- 6. Hartl N, Wössner E, Sure-Vetter Y. Nationale Forschungsdateninfrastruktur (NFDI). Informatik Spektrum. 2021;44(5):370–3.
- 7. European Commission, Directorate-General for Research and Innovation. Realising the European open science cloud. Publications Office; 2016. https://doi.org/10.2777/940154
- 8. Grossman RL, Heath A, Murphy M, Patterson M, Wells W. A case for data commons: toward data science as a service. Comput Sci Eng. 2016;18(5):10–20. pmid:29033693
- 9. Barker M, Wilkinson R, Treloar A. The Australian research data commons. Data Science Journal. 2019;18:44.
- 10. Alzakholi O, Haji L, Shukur H, Zebari R, Abas S, Sadeeq M. Comparison among cloud technologies and cloud performance. JASTT. 2020;1(1):40–7.
- 11. Patel YS, Mehrotra N, Soner S. Green cloud computing: a review on Green IT areas for cloud computing environment. In: 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE). 2015. p. 327–32. https://doi.org/10.1109/ablaze.2015.7155006
- 12. Otto B, ten Hompel M, Wrobel S. Designing data spaces. Springer International Publishing; 2022. https://doi.org/10.1007/978-3-030-93975-5
- 13. Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018. pmid:26978244
- 14. Wang S, Zhang Y, Zhang Y. A blockchain-based framework for data sharing with fine-grained access control in decentralized storage systems. IEEE Access. 2018;6:38437–50.
- 15. Dolhopolov A, Castelltort A, Laurent A. Implementing federated governance in data mesh architecture. Future Internet. 2024;16(4):115.
- 16. ARDC. Making a difference through data-driven research. 2020. https://doi.org/10.5281/zenodo.6529949
- 17. Rahaman T. Open Data and the 2023 NIH Data Management and Sharing Policy. Med Ref Serv Q. 2023;42(1):71–8. pmid:36862609
- 18. Jensen MA, Ferretti V, Grossman RL, Staudt LM. The NCI Genomic Data Commons as an engine for precision medicine. Blood. 2017;130(4):453–9. pmid:28600341
- 19. Wider A, Verma S, Akhtar A. Decentralized data governance as part of a data mesh platform: concepts and approaches. In: 2023 IEEE International Conference on Web Services (ICWS). 2023. https://doi.org/10.1109/icws60048.2023.00101
- 20. Goedegebuure A, Kumara I, Driessen S, Van Den Heuvel W-J, Monsieur G, Tamburri DA, et al. Data mesh: a systematic gray literature review. ACM Comput Surv. 2024;57(1):1–36.
- 21. Welsh E, Jirotka M, Gavaghan D. Post-genomic science: cross-disciplinary and large-scale collaborative research and its organizational and technological challenges for the scientific research process. Philos Trans A Math Phys Eng Sci. 2006;364(1843):1533–49. pmid:16766359
- 22. Trautwein D, Raman A, Tyson G, Castro I, Scott W, Schubotz M, et al. Design and evaluation of IPFS. In: Proceedings of the ACM SIGCOMM 2022 Conference. 2022. p. 739–52. https://doi.org/10.1145/3544216.3544232
- 23. Maymounkov P, Mazières D. Kademlia: a peer-to-peer information system based on the XOR metric. Lecture Notes in Computer Science. Berlin Heidelberg: Springer; 2002. p. 53–65. https://doi.org/10.1007/3-540-45748-8_5
- 24. Lopez D, Solberg A, Stanica M. eduGAIN profiles and implementation guidelines. GÉANT2. 2006.
- 25. Iannella R. Open Digital Rights Language (ODRL). 2007.
- 26. Jeanson F, Gibson SJ, Alper P, Bernier A, Woolley JP, Mietchen D, et al. Getting your DUCs in a row - standardising the representation of Digital Use Conditions. Sci Data. 2024;11(1):464. pmid:38719839
- 27. Albertoni R, Browning D, Cox S, Beltran AG, Perego A, Winstanley P. Data Catalog Vocabulary (DCAT) - Version 3. World Wide Web Consortium (W3C). 2024. https://www.w3.org/TR/vocab-dcat-3/
- 28. Lagoze C, Van de Sompel H, Nelson M, Warner S. The Open Archives Initiative protocol for metadata harvesting. 2002. http://www.openarchives.org/OAI/openarchivesprotocol.html
- 29. Open Knowledge Foundation. CKAN: Comprehensive Knowledge Archive Network. 2024. https://ckan.org/
- 30. Rehm HL, Page AJH, Smith L, Adams JB, Alterovitz G, Babb LJ, et al. GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genom. 2021;1(2):100029. pmid:35072136
- 31. Freeberg MA, Fromont LA, D’Altri T, Romero AF, Ciges JI, Jene A, et al. The European Genome-phenome Archive in 2021. Nucleic Acids Res. 2022;50(D1):D980–7. pmid:34791407
- 32. Guha RV, Brickley D, Macbeth S. Schema.org. Commun ACM. 2016;59(2):44–51.
- 33. Weibel S, Kunze J, Lagoze C, Wolf M. Dublin Core Metadata for Resource Discovery. RFC 2413. Internet Engineering Task Force. 1998. https://datatracker.ietf.org/doc/rfc2413/
- 34. Gray AJ, Goble C, Jimenez RC. Bioschemas: from potato salad to protein annotation. In: The 16th International Semantic Web Conference 2017. RWTH Aachen University; 2017. https://bioschemas.org/
- 35. Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, Giovanni R, et al. Darwin Core: an evolving community-developed biodiversity data standard. PLoS One. 2012;7(1):e29715. pmid:22238640
- 36. Carroll SR, Garba I, Figueroa-Rodríguez OL, Holbrook J, Lovett R, Materechera S, et al. The CARE principles for indigenous data governance. Data Science Journal. 2020;19.
- 37. Buck M, Hamilton C. The nagoya protocol on access to genetic resources and the fair and equitable sharing of benefits arising from their utilization to the convention on biological diversity. Review of European Community & International Environmental Law. 2011;20(1):47–61.
- 38. Berman F, Crosas M. The research data alliance: benefits and challenges of building a community organization. Harvard Data Science Review. 2020.
- 39. Hedges M, Hasan A, Blanke T. Management and preservation of research data with iRODS. In: Proceedings of the ACM First Workshop on CyberInfrastructure: Information Management in eScience. 2007. p. 17–22. https://doi.org/10.1145/1317353.1317358
- 40. Galaxy Community. The Galaxy platform for accessible, reproducible and collaborative data analyses: 2024 update. Nucleic Acids Res. 2024;52(W1):W83–94. pmid:38769056
- 41. Fajardo EM, Dost JM, Holzman B, Tannenbaum T, Letts J, Tiradani A, et al. How much higher can HTCondor fly? J Phys: Conf Ser. 2015;664(6):062014.
- 42. Michener W, Vieglais D, Vision T, Kunze J, Cruse P, Janee G. DataONE: data observation network for earth—preserving data and enabling innovation in the biological and environmental sciences. D-Lib Magazine. 2011;17(1/2):12.
- 43. Cinquini L, Crichton D, Mattmann C, Harney J, Shipman G, Wang F, et al. The earth system grid federation: an open infrastructure for access to distributed geospatial data. Future Generation Computer Systems. 2014;36:400–17.
- 44. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9. pmid:10802651
- 45. Schriml LM, Arze C, Nadendla S, Chang Y-WW, Mazaitis M, Felix V, et al. Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012;40(Database issue):D940-6. pmid:22080554
- 46. Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35(4):316–9. pmid:28398311
- 47. Köster J, Rahmann S. Snakemake–a scalable bioinformatics workflow engine. Bioinformatics. 2012;28(19):2520–2. pmid:22908215
- 48. Siemer S. Exploring the Apache Jena framework. Göttingen: Georg August University; 2019.
- 49. King G. An introduction to the dataverse network as an infrastructure for data sharing. Sociological Methods & Research. 2007;36(2):173–99.