
The IPCC Interactive Atlas DataLab: Online reusability for regional climate change assessment

  • Ezequiel Cimadevilla,

    Roles Conceptualization, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Instituto de Física de Cantabria (IFCA), CSIC-Universidad de Cantabria, Santander, Spain

  • Maialen Iturbide,

    Roles Data curation, Methodology, Software, Writing – original draft, Writing – review & editing

    Affiliation Instituto de Física de Cantabria (IFCA), CSIC-Universidad de Cantabria, Santander, Spain

  • Antonio S. Cofiño,

    Roles Data curation, Writing – review & editing

    Affiliation Instituto de Física de Cantabria (IFCA), CSIC-Universidad de Cantabria, Santander, Spain

  • Jesús Fernández,

    Roles Writing – review & editing

    Affiliation Instituto de Física de Cantabria (IFCA), CSIC-Universidad de Cantabria, Santander, Spain

  • Lina E. Sitz,

    Roles Writing – review & editing

    Affiliations Instituto de Física de Cantabria (IFCA), CSIC-Universidad de Cantabria, Santander, Spain, IPCC Working Group I Technical Support Unit (WGI TSU), Université Paris Saclay, Paris, France

  • Aida Palacio,

    Roles Writing – review & editing

    Affiliation Instituto de Física de Cantabria (IFCA), CSIC-Universidad de Cantabria, Santander, Spain

  • Andrés Heredia,

    Roles Writing – review & editing

    Affiliation Instituto de Física de Cantabria (IFCA), CSIC-Universidad de Cantabria, Santander, Spain

  • José M. Gutiérrez

    Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

    jose.gutierrez@csic.es

    Affiliation Instituto de Física de Cantabria (IFCA), CSIC-Universidad de Cantabria, Santander, Spain

Abstract

The Sixth Assessment Report (AR6) of the IPCC highlights the regionally varying impacts of climate change on both mean and extreme conditions, with significant socio-economic consequences. A key innovation of AR6 is the Interactive Atlas, which provides spatial and temporal analysis of over 20 Climatic Impact-Drivers (CIDs), integrating global and regional projection datasets from the CMIP and CORDEX model intercomparison projects. The Interactive Atlas allows users to explore, compare, and download products relevant to regional climate change risk assessment, while adhering to FAIR principles to ensure data accessibility and reusability. This paper describes additional work undertaken to further support these principles, acknowledging the challenges of processing large-scale datasets. We present a cloud-enabled data laboratory (DataLab) developed using R and Python frameworks for advanced analysis and visualization. The DataLab seamlessly integrates the gridded dataset underpinning the Interactive Atlas with computational resources, combining traditional remote data access services and modern cloud-based solutions to balance cost-effectiveness with technological advances. This comprehensive resource supports regional climate change assessment and could be used to inform national adaptation plans and strengthen climate change research and policy development.

1. Introduction

The Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC) highlights that climate change is impacting all regions worldwide and will increasingly do so in the coming decades [1,2]. Each region experiences diverse changes in mean and extreme climate conditions, which drive manifold socio-economic impacts [3,4]. To assess these changes, AR6 introduced Climatic Impact-Drivers (CIDs), encompassing the key hazards that characterize climate risks, such as heat and drought [5], described by Essential Climate Variables (ECVs) and (extreme) indices derived from climate data. The AR6 report assessed regional past trends and future changes across emission scenarios and used Global Warming Levels (GWLs) as a new policy-relevant dimension to convey information about what the future will look like, depending on the mitigation efforts we make today. The Interactive Atlas (https://interactive-atlas.ipcc.ch, see Fig 1) was an innovation introduced in AR6 to expand the assessment, allowing flexible spatial and temporal analysis for most of the datasets and CIDs used in the report. Users can explore, interact with, and download global maps, spatially aggregated time series, and other regional products displaying recent trends and future changes across emission scenarios for over twenty CIDs. The Atlas allows comparing different lines of evidence based on different emission/forcing scenario families, such as the Representative Concentration Pathways (RCPs) used in CMIP5 [6] and CORDEX [7] and the Shared Socio-Economic Pathways (SSPs) used in CMIP6 [8].

Fig 1. Screenshot of the IPCC AR6 Interactive Atlas web application (regional information components).

The Interactive Atlas is a novel tool for flexible spatial and temporal analyses of the observed and projected climate change information underpinning the Working Group I contribution to the Sixth Assessment Report. The products provided by the Interactive Atlas include maps of past trends and future changes across scenarios, including uncertainty information, as well as time series, stripes, seasonal plots, and other products for regionally aggregated information over the subcontinental IPCC AR6 regions (screenshot from http://interactive-atlas.ipcc.ch, accessed 21 May 2025).

https://doi.org/10.1371/journal.pclm.0000644.g001

The AR6 and the Interactive Atlas are invaluable sources of information for national adaptation plans and regional climate change studies, serving as an authoritative reference for aligning regional analyses with global assessments. Facilitating the accessibility and reusability of this information was a key objective of AR6, guided by the FAIR principles and focusing in the first instance on the datasets and the code underlying the figures in the report and in the Atlas. This work was supervised by the IPCC Task Group on Data Support for Climate Change Assessments (TG-Data), the IPCC Technical Support Unit (TSU) of Working Group I (WGI), and the IPCC Data Distribution Centres (IPCC-DDC) [9-12]. The Interactive Atlas served as a comprehensive test case for implementing the FAIR principles, with full publication of the underlying software and datasets. The code recipes (regridding, index calculation, bias adjustment, etc.) and auxiliary information (common grids, masks, shapefiles for regions, etc.) are available for reproducibility and reusability through the IPCC-WGI/Atlas repository (https://github.com/IPCC-WG1/Atlas; [13]). Additionally, the gridded monthly dataset that underpins the Atlas is accessible via the IPCC Data Distribution Centre (IPCC-DDC, https://www.ipcc-data.org/). These resources support the traditional “download-and-analyze” model for data processing [14]. However, the scale of the datasets presents significant challenges for users and computational infrastructures, leading to an increasing adoption of “next-to-data” computing approaches, including cloud-based solutions (see e.g. [14-16]).

To address these challenges, the DataLab introduced in this work provides a notebook-based online platform that facilitates streamlined next-to-data analysis and visualization. It complements the Interactive Atlas by offering transparent access to the underlying datasets and supporting advanced climate data processing through integrated tools in R (via climate4R [17]) and Python (via xarray [18]). With a strong emphasis on reusability and accessibility, the DataLab enables collaborative and reproducible research workflows, helping users to more effectively leverage authoritative IPCC resources and tailor them to diverse regional and sectoral applications.

2. The IPCC-WGI AR6 Interactive Atlas dataset

The IPCC-WGI AR6 Interactive Atlas Gridded Monthly Dataset provides global and regional climate change projections for the 22 Climatic Impact-Drivers (CIDs) featured in the IPCC Interactive Atlas (see Table 1) [19]. Bias-adjusted results (TX35ba, TX40ba) are included for the two threshold-dependent indices (TX35, TX40) using the ISIMIP3 bias adjustment method [20], enabling assessment of the impact of model biases on these indices [21]. This dataset offers gridded information at monthly (and in some instances, annual) temporal resolution, derived from historical and future emission scenarios (Representative Concentration Pathways and Shared Socioeconomic Pathways) in CMIP5 [6], CMIP6 [8], and CORDEX [7]. An overview of the indices available for each dataset is provided in Table 2. Note that certain indices are excluded for specific CORDEX domains where they are not relevant, such as days with maximum near-surface temperature above 35 °C (TX35) in the CORDEX-ANT domain (Antarctica).

Table 1. Table of variables/indices available in the Interactive Atlas Dataset, grouped by type. Air temperatures refer to near-surface measurements (usually at 2 meters). See Table 2 for the availability of the variables across different dataset sources.

https://doi.org/10.1371/journal.pclm.0000644.t001

Table 2. Summary of data availability for the IPCC-WGI AR6 Interactive Atlas Dataset. Asterisks denote dataset source availability, with variables and indices grouped as in Table 1. The columns correspond to CMIP5, CMIP6, and the various CORDEX domains—AFR: Africa, ANT: Antarctica, ARC: Arctic, AUS: Australasia, CAM: Central America, EAS: East Asia, EUR: Europe, NAM: North America, SAM: South America, SEA: Southeast Asia, and WAS: South Asia—as well as the temporal frequency. CMIP5 and CMIP6 data are available at 2° and 1° resolution, respectively, and CORDEX data is provided at 0.5° resolution, except for CORDEX-EUR, which is available at a finer 0.25° resolution.

https://doi.org/10.1371/journal.pclm.0000644.t002

This dataset was derived from a volume of 200 TB of data accessed through the Earth System Grid Federation (ESGF) [22]. The ensembles were harmonized using common regular grids with horizontal resolutions of 2° (CMIP5), 1° (CMIP6), 0.5° (CORDEX), and 0.25° (European CORDEX domain), resulting in a collection of 863 NetCDF files totaling 500 GB, with NetCDF compression enabled. The files include comprehensive metadata adhering to the CF conventions, ensuring correct data interpretation by applications and users. The corresponding data source inventory, reference grids, and code are available in the IPCC-WGI/Atlas repository (https://github.com/IPCC-WG1/Atlas; [13]).
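The common grids themselves are simple regular latitude-longitude meshes. As a minimal sketch (the cell-center alignment convention assumed here is an illustration; the authoritative reference grids are published in the IPCC-WG1/Atlas repository), the four harmonized resolutions can be generated as:

```python
def regular_grid(res):
    """Cell-center coordinates of a global regular lon-lat grid.

    Assumes centers sit at half-resolution offsets from -180/-90;
    the reference grids in the IPCC-WG1/Atlas repository may follow
    a different alignment convention.
    """
    n_lon = round(360 / res)
    n_lat = round(180 / res)
    lons = [-180 + res * (i + 0.5) for i in range(n_lon)]
    lats = [-90 + res * (j + 0.5) for j in range(n_lat)]
    return lons, lats

# The four harmonized resolutions used by the Atlas dataset:
grids = {name: regular_grid(r)
         for name, r in [("CMIP5", 2.0), ("CMIP6", 1.0),
                         ("CORDEX", 0.5), ("CORDEX-EUR", 0.25)]}
```

For the 1° CMIP6 grid this yields 360 × 180 cells with centers from −179.5° to 179.5° and −89.5° to 89.5°.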

The dataset is accessible for download from the IPCC Data Distribution Centre (IPCC-DDC, https://www.ipcc-data.org/), through its long-term archival site at DIGITAL.CSIC (https://digital.csic.es/), as well as from a copy stored at the Copernicus Data Store [19]. These platforms support the traditional “download-and-analyze” processing model [14], which remains suitable for users who prefer or require local data storage and processing capabilities, but is becoming increasingly impractical for others. As a result, “next-to-data” computing services are being increasingly adopted to enable more scalable and efficient analysis workflows (e.g. [14-16]).

3. The IPCC Atlas DataLab

Climate researchers are quickly adopting “next-to-data” computing services for data analysis. These infrastructures are powered by recent technological developments, particularly notebook-based solutions such as JupyterLab [23] and cloud infrastructures integrating storage and computing resources [14,15,24]. Depending on the specific context and their different characteristics, these environments are commonly referred to as Cloud Native Repositories [14], Virtual Research Environments or Digital Libraries [25], Climate Analytics as a Service [16], Science Gateways [26], Collaboratories [27], or Inhabited Information Spaces [28]. This paper uses the term DataLab to refer to such systems [29]. Despite the differences in their architectures, available resources, and scope, all these systems share the common goal of providing more effective ways to use computer systems for information sharing and processing.

The Atlas DataLab presented here has been developed in the framework of IPCC-DDC CSIC activities to provide a notebook-based reproducibility and reusability platform, integrating the software and the data (see Sect 2) underpinning the Interactive Atlas. Fig 2 shows a schematic illustration of the architecture highlighting the different computing infrastructures supporting the DataLab. The foundation of the Atlas DataLab is a GitHub repository (see Sect 3.1) including the software and data access components together with illustrative notebooks reproducing the products of the Interactive Atlas and illustrating reusability. The GitHub repository contains two launchers that connect to interactive computing environments for executing the notebooks via JupyterHub: BinderHub and the IPCC-DDC CSIC Hub (see Sect 3.2). Both provide ready-to-use computational environments for running the notebooks. BinderHub offers free access but is subject to hardware and resource limitations [23]. In contrast, the IPCC-DDC CSIC Hub provides a more robust and scalable infrastructure, with enhanced CPU, memory, and bandwidth performance, leveraging cloud and high-performance computing (HPC) resources hosted at the Instituto de Física de Cantabria (IFCA) to support IPCC-DDC activities. Access to the CSIC Hub is currently restricted to authorized users within IPCC initiatives, though efforts are underway to make it available as a Software-as-a-Service (SaaS) offering through the European Open Science Cloud (EOSC) marketplace [30], enabling broader open access via EOSC resources.

Fig 2. Diagram illustrating the architecture of the Atlas DataLab, emphasizing its three usage modes: (1) local computing with remote data access (blue), (2) BinderHub with remote data access (red), and (3) CSIC Hub with local data access (black).

Remote data access is enabled via a THREDDS Data Server hosted at the CSIC Hub.

https://doi.org/10.1371/journal.pclm.0000644.g002

The development of the Atlas DataLab carefully considered the strengths and limitations of common approaches used in the climate community for implementing data access layers. Traditional services, such as those based on the OPeNDAP protocol [31], enable remote access and support federated architectures. In contrast, modern cloud-based solutions integrate computing and storage within cloud infrastructures. The traditional approach is advantageous when working with data hosted in established community repositories such as the ESGF [22], as it avoids data duplication or reformatting and ensures access to the latest versions, including updates and bug fixes. Cloud-based approaches, meanwhile, are better suited to the rapid development of data-intensive technologies and align more closely with current technological developments. This paper compares the performance of these two paradigms by reproducing key products from the Interactive Atlas in Sect 3.3. The comparison supports a fit-for-purpose design of the data access layer, ensuring that the solutions chosen for the DataLab effectively meet user needs while harnessing the complementary strengths of both traditional and cloud-based approaches.

3.1. The DataLab GitHub repository

As previously noted, the Atlas DataLab is maintained through a dedicated GitHub repository (https://github.com/SantanderMetGroup/IPCC-Atlas-Datalab; archived version available at [32]), which hosts the software components, including illustrative Python and R notebooks that provide seamless access to the Interactive Atlas dataset (see Fig 2). GitHub provides a collaborative environment for code development with integrated version control and issue tracking [33] which has been previously used in IPCC activities, in particular for the reproducibility and reusability of some of the AR6 products, including the Interactive Atlas via the IPCC-WGI/Atlas repository [13]. Researchers with a GitHub account can contribute to the repository of the DataLab by suggesting improvements, reporting issues, or submitting pull requests, for example, to modify or add notebooks or additional software components. The repository is self-documented using Markdown README files.

The Atlas DataLab repository includes configuration files used to set up a research environment conforming to the Reproducible Execution Environment Specification (REES). Software dependencies for data access and analysis are managed using Conda (https://docs.conda.io), while system dependencies are handled with Docker [34]. Users can easily build this environment using the launch badges on the main README page. Once initiated, the repository content is automatically cloned, granting users access to ready-to-run resources through interactive online services such as BinderHub [23], where the data is accessed remotely (red line in Fig 2), or the CSIC Hub, which supports next-to-data computing (black line in Fig 2). Alternatively, users can run Docker images on their local workstations, allowing them to reproduce the environment and run notebooks locally thanks to remote data access (blue line in Fig 2). The CSIC Hub infrastructure is explained in more detail in Sect 3.2.
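A REES-compliant repository typically declares its Conda dependencies in an environment.yml file. The fragment below is purely illustrative (the package list and versions are assumptions for this sketch; the authoritative configuration files live in the DataLab repository itself):

```yaml
# Illustrative REES-style Conda environment (hypothetical pinning).
name: atlas-datalab
channels:
  - conda-forge
dependencies:
  - python=3.11
  - xarray
  - netcdf4      # NetCDF/OPeNDAP backend
  - dask         # parallel, chunked computation
  - matplotlib
```

BinderHub-style services detect such files at the repository root and build the corresponding container image automatically.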

The Atlas DataLab repository extends the functionalities of the IPCC-WGI/Atlas repository. While the IPCC-WGI/Atlas repository includes code recipes (e.g., regridding, index calculation, bias adjustment) and auxiliary resources (such as common grids, masks, and shapefiles for regions) used to prepare Interactive Atlas products, the DataLab repository offers direct access to the Interactive Atlas dataset (see Sect 3.2). Additionally, it includes notebooks that demonstrate how to access and load the data within the working environment, as well as perform subsequent step-by-step operations to reproduce, reuse, and customize key Interactive Atlas products, thereby enhancing their reproducibility and reusability. These products include global change maps (for future periods across different scenarios or Global Warming Levels) and regionally aggregated outputs, such as time series and climate stripes. In this paper, we present two typical use cases (Sect 4).

End-to-end data processes, from data access and loading to generating and visualizing final results, are carried out using specialized Python and R tools within separate Conda environments. In Python, the primary tool is xarray [18], while in R, the climate4R [17] framework is utilized. Both libraries are widely recognized in the climate community for their efficiency in handling climate datasets and for providing robust and flexible analysis frameworks. This dual-language approach enables streamlined workflows that leverage the unique strengths of each programming language.

3.2. The DataLab CSIC Hub infrastructure

The CSIC Hub supports IPCC-DDC activities with scalable cloud and HPC infrastructures hosted in CSIC premises at IFCA. The next-to-data computing component of the DataLab (Fig 2) has been deployed in the cloud infrastructure [29], providing access to cloud computing machines with sufficient capacity to handle workloads that require extensive CPU and memory resources, which is essential for some specialized data analysis tasks required to reproduce some Interactive Atlas products.

The data access component of the Atlas DataLab is deployed in the HPC infrastructure (Fig 2), with the NetCDF files of the Atlas Dataset (see Sect 2) in a General Parallel File System (GPFS). A THREDDS Data Server (TDS) [35] facilitates ex-situ remote access to the files of the Atlas Dataset via OPeNDAP [31]. This access method is essential for retrieving data from BinderHub or from environments installed on local workstations. In the case of next-to-data computing within the CSIC Hub, in-situ access is also possible by pointing directly to the path of the stored NetCDF files. In this case, the volume of the GPFS file system including the Interactive Atlas dataset is shared via NFS (Network File System) within the OpenStack cloud environment, so files are available via a local area network where Jupyter Hub is hosted. This hybrid in-situ/ex-situ data access setup allows access to the Atlas Dataset from different infrastructures and allows testing the performance of the alternative approaches for the calculation of illustrative products from the IPCC Interactive Atlas (see Sect 3.3).

Both in-situ (NFS) and ex-situ (OPeNDAP) data endpoints are cataloged in a CSV inventory file (data_inventory.csv), which enables convenient querying across various programming environments, such as Python (via Pandas) and R (through native CSV processing). This catalog allows users to locate dataset endpoints based on several facets: type (indicating whether the data is accessed via OPeNDAP or NFS); variable (the climate variable or index, as detailed in Table 1); project (the dataset source, including CMIP5, CMIP6, or the different CORDEX domains, as shown in Table 2); experiment (including historical, RCP26, SSP126, etc.); and frequency (with values "mon" for monthly and "yr" for yearly, as shown in Table 2). All Jupyter Notebooks feature step-by-step examples showing how to search for relevant data in the CSV inventory, as well as how to load and work with it efficiently.
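A minimal sketch of such a facet query, using only the Python standard library and a hypothetical three-row excerpt of the inventory (the `location` column name and the sample endpoints are illustrative, not the actual file's contents; the facet columns follow the description above):

```python
import csv
import io

# Hypothetical excerpt of data_inventory.csv; the real file ships with
# the DataLab repository and exposes the same facet columns.
inventory_csv = """type,variable,project,experiment,frequency,location
opendap,tas,CMIP6,ssp585,mon,https://example.org/thredds/dodsC/tas_ssp585.nc
nfs,tas,CMIP6,ssp585,mon,/gpfs/atlas/tas_ssp585.nc
opendap,pr,CMIP6,historical,mon,https://example.org/thredds/dodsC/pr_hist.nc
"""

def query_inventory(rows, **facets):
    """Return the inventory rows matching all given facet values."""
    return [r for r in rows if all(r.get(k) == v for k, v in facets.items())]

rows = list(csv.DictReader(io.StringIO(inventory_csv)))
# Locate the OPeNDAP endpoint for monthly CMIP6 tas under SSP5-8.5:
hits = query_inventory(rows, type="opendap", variable="tas", experiment="ssp585")
```

The same query is a one-liner with Pandas (`df.query(...)`) or base-R subsetting, which is how the DataLab notebooks use the inventory in practice.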

3.3. Performance analysis: Next-to-data vs remote access

Traditional climate data services rely on client/server technologies such as OPeNDAP, which facilitates remote access to climate data subsets using a flexible protocol and a well-defined transmission format. Moreover, several software frameworks, such as climate4R and xarray, are OPeNDAP-enabled, allowing transparent data processing without file downloads. This approach ensures resource-efficient access to the specific data subsets required for particular tasks. The Atlas DataLab implements this strategy, as does the ESGF, which manages and distributes the massive volumes of data from global CMIP and regional CORDEX climate change projection initiatives. Additionally, a next-to-data version of the Atlas DataLab is deployed in the CSIC Hub, with local data access.
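The bandwidth saving of protocol-level subsetting can be illustrated with a toy stand-in for a DAP request, in which a plain function plays the role of the server applying the requested hyperslab (the field values and index ranges are made up for illustration):

```python
# A toy 180x360 global field held "server-side"; values encode position.
full_field = [[lat * 1000 + lon for lon in range(360)] for lat in range(180)]

def opendap_subset(lat_slice, lon_slice):
    """Stand-in for a DAP constraint-expression request: the server
    applies the hyperslab and only the selected cells cross the network."""
    return [row[lon_slice] for row in full_field[lat_slice]]

# The client requests only a regional window (hypothetical indices),
# transferring 30 x 50 cells instead of the full 180 x 360 field.
region = opendap_subset(slice(120, 150), slice(170, 220))
cells_transferred = sum(len(r) for r in region)
```

With OPeNDAP-enabled clients such as xarray or climate4R, this subsetting happens transparently: lazy indexing on the remote dataset is translated into constraint expressions, so only the needed slice is downloaded.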

This section compares the performance of the different data access alternatives using an illustrative data-intensive product of the Interactive Atlas described in Sect 4.1, where annual time series of mean global warming signals are computed for alternative scenarios. The Python notebook provided for this use case is run under different DataLab configurations, and the resulting performance is analyzed focusing on three primary metrics: latency, throughput, and the amount of data transferred over the network. Two usage scenarios were tested:

  • Next-to-data: The Atlas DataLab instance deployed on the CSIC Hub, leveraging a cloud infrastructure (black line in Fig 2).
  • Remote data access: An Atlas DataLab instance running on an HPC system outside the IFCA premises (blue line in Fig 2).

The remote data access use cases were tested with and without HTTP compression, using different numbers of processes to account for parallelism. Compression over HTTP reduces the size of the information sent over the network; thus, the latency added by compressing the data is offset by the reduced data transfers, especially when the compression ratio is high enough. This effect is further enhanced when dealing with slow Internet connections. Note that although NetCDF already uses chunk compression, OPeNDAP decompresses the data before sending it through the network.
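The compression trade-off can be sketched with the standard-library `zlib` module, which implements the same deflate algorithm underlying HTTP gzip compression. The synthetic field below is highly repetitive and therefore compresses far better than a real climate field (which shrank by roughly half in the experiments reported here); the point is only the mechanics of the size/CPU trade:

```python
import struct
import zlib

# Synthetic 360x180 float32 field with a simple zonal pattern. The byte
# sequence repeats every 360 values, so deflate finds long back-references;
# real climate fields are noisier and compress less.
values = [20.0 + 10.0 * ((i % 360) / 360.0) for i in range(360 * 180)]
payload = b"".join(struct.pack("<f", v) for v in values)

compressed = zlib.compress(payload, level=6)  # deflate, as in HTTP gzip
ratio = len(compressed) / len(payload)        # fraction actually transmitted
```

Server-side, level 6 is a common default balancing CPU cost against size; the experiments below show that this CPU cost reduces throughput unless spread over multiple cores.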

Fig 3 summarizes the computing times from five executions of the global warming time-series notebook (see Sect 4.1). These results demonstrate the repository’s capability to handle extensive data processing with consistent performance. The rows show the results for the two usage scenarios: remote data access from an external network (first row) and the CSIC Hub with intranet access to data services (second row). Access to data via OPeNDAP from the THREDDS Data Server is tested in both cases (first two columns, with and without compression, respectively), whereas in-situ access via NFS is only available from the CSIC Hub (third column). As expected, direct in-situ access within the CSIC Hub demonstrated the best performance (smallest runtimes), since no intermediate servers were involved. Remote data access via the THREDDS server with HTTP compression effectively reduced network data transfer by nearly 50% (see the inset numbers), albeit at the cost of throughput due to the additional CPU load for server-side compression. Parallel requests improved overall performance, allowing the server to leverage multiple cores for network traffic compression.

Fig 3. Experimental results of data retrieval from two Atlas DataLab setups: next-to-data deployment in the CSIC-DDC Hub (bottom) and remote data access from a user’s workstation (top).

The bars represent the mean retrieval time across ten experiment replicates for each worker configuration, with error bars indicating variability (minimum and maximum times). Inset numbers denote the volume of network data transferred. The first two columns illustrate THREDDS data access via OPeNDAP, while the last column shows results for in-situ access via file system.

https://doi.org/10.1371/journal.pclm.0000644.g003

This analysis highlights the importance of physical proximity to the THREDDS server. HTTP compression proves advantageous when multiple CPUs are available to perform compression, reducing latency and nearly halving the data volume transmitted over the network. These findings underscore the critical role of compression in optimizing remote data access. Emerging cloud-optimized file formats [14] are anticipated to further enhance remote data access by eliminating redundant decompression/compression cycles inherent in OPeNDAP data access.

4. Illustrative case studies

The “getting started” notebooks of the Atlas DataLab GitHub repository [32], implemented in both Python and R, provide an introduction to the capabilities of the DataLab in both programming languages, with illustrative examples. Two additional notebooks illustrate end-to-end reproducibility and reusability workflows for two case studies corresponding to key AR6 products [1]: 1) global warming time series across different scenarios and 2) maps of projected changes and robustness for various Global Warming Levels (GWLs). The first example replicates the time series of global surface air temperature changes from Fig 4.2 (first panel) in WGI AR6 Chapter 4 [36], while the second recreates an Interactive Atlas result for Europe, depicting projected precipitation changes at +3 °C warming. These two examples, shown in Fig 4, are described in Sects 4.1 and 4.2 and correspond to data-intensive and data-light cases, respectively; the former is used to evaluate the performance of different DataLab configurations in Sect 3.3. They are implemented in separate Python and R notebooks, providing flexibility to analyze different variables/indices, scenarios, GWLs, regions, and other parameters. Fig 5 displays the end-to-end code used to generate the results of the second case study (Fig 4B), illustrating the type and length of code that users will encounter in the DataLab notebooks.

Fig 4. Illustrative case studies of relevant products from the Interactive Atlas directly reproducible from the notebooks included in the Atlas DataLab GitHub repository [32].

The first example (A) replicates the time series of global surface air temperature changes from Fig 4.2 (first panel) in WGI AR6 Chapter 4 [36]. The second example (B) recreates an Interactive Atlas result for Europe, depicting projected precipitation changes at +3 °C warming. Locations with low model agreement are displayed using diagonal lines [2]. The base layer of the map can be obtained from the GSHHG database [37], available at https://www.ngdc.noaa.gov/mgg/shorelines/data/gshhg/latest/gshhg-shp-2.3.7.zip.

https://doi.org/10.1371/journal.pclm.0000644.g004

Fig 5. Two-column representation of the R code generating Fig 4B.

The base layer of the map is provided by the visualizeR R package [38].

https://doi.org/10.1371/journal.pclm.0000644.g005

Additionally, users will find other notebooks for performing additional analyses, such as extracting results for land-only or pre-delimited regions (e.g., the IPCC-WGI Reference Regions [39]), or creating different types of visuals, such as stripes (https://showyourstripes.info) or global warming level plots [40].

4.1. Global change time series across scenarios

The notebook GSAT-change_time-series.ipynb provides step-by-step, annotated code to generate an annual time series of global temperature changes projected by CMIP6 (see Fig 4A). This analysis covers the period from 1950 to 2100 and includes scenarios representing low, medium, and high emissions (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5). For this case study, the notebook retrieves and processes data for the entire global domain, all available SSP scenarios, and all CMIP6 models, thereby demonstrating the DataLab’s capability to handle large datasets effectively. Changes are presented relative to the 1995-2014 period (left axis) and the pre-industrial 1850-1900 period (right axis). This replicates one of the figures from WGI AR6, specifically the first panel of Fig 4.2 in Chapter 4 [36]. Although the ensemble of models used differs slightly from that of AR6, the results are virtually identical. The notebook allows users to easily change the variable of interest, enabling the generation of similar plots for every index and variable included in the IPCC WGI Atlas dataset. The execution of this notebook takes between 15 minutes and 1 hour, depending on whether it is run on the CSIC-DDC Hub with local data access or over an external home network connection, respectively.
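The core arithmetic of the notebook (a baseline-relative anomaly, re-expressed against pre-industrial on the second axis) can be sketched with a synthetic annual series. The linear trend is purely illustrative, and the offset between the two baselines is the approximate AR6-assessed value of 0.85 °C warming from 1850-1900 to 1995-2014:

```python
# Synthetic annual global-mean temperature series standing in for the
# CMIP6 ensemble mean (the notebook derives the real series from the
# Atlas dataset; this toy trend is made up).
years = list(range(1950, 2101))
gsat = [14.0 + 0.012 * (y - 1950) for y in years]

# Anomaly relative to the 1995-2014 baseline (left axis of Fig 4A).
baseline = [t for y, t in zip(years, gsat) if 1995 <= y <= 2014]
ref = sum(baseline) / len(baseline)
anomaly = [t - ref for t in gsat]

# Right axis: shift by the warming between the pre-industrial 1850-1900
# baseline and 1995-2014 (approximately 0.85 C, as assessed in AR6).
PREINDUSTRIAL_OFFSET = 0.85
anomaly_pi = [a + PREINDUSTRIAL_OFFSET for a in anomaly]
```

In the actual notebook this computation is applied per model and per SSP scenario before aggregating into the ensemble plume, which is what makes the use case data-intensive.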

4.2. Maps of changes and robustness for different GWLs

The notebook Maps_of_change_under_global-warming-levels.ipynb reproduces climate change anomaly maps for a specified global warming level (GWL). To do so, it extracts information on the time periods during which the various GCMs reach different levels of global surface temperature warming (relative to the 1850-1900 period). This example builds on the information produced for the Atlas chapter in the framework of the IPCC FAIR activities described above [13], such as the GWL periods for the different CMIP6 models (also available for CMIP5 and CORDEX), the IPCC-WGI Reference Regions [39], land-sea masks, etc., thus ensuring end-to-end reproducibility.

This example focuses on the +3 °C GWL by extracting the corresponding time windows from the SSP5-8.5 scenario. Data for each GCM is then loaded separately by requesting the corresponding warming level period, along with other analysis dimensions, such as the target season and variable (DJF precipitation in this example) and the coordinates delimiting the desired study area (Europe in this example). Historical data is also retrieved to compute the relative anomaly and associated uncertainty, both of which are presented in the ensemble-mean map of Fig 4B. Uncertainty is calculated using both the simple and advanced methods described for the Interactive Atlas [2]. The notebook not only reproduces the final figure but also provides relevant auxiliary context information and figures. For instance, Fig 6 shows the periods in which the different models forming the ensemble (in rows) reach +3 °C of global warming. The execution of this notebook takes just a handful of minutes, even from remote workstations.
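A simplified version of the window extraction can be sketched as follows. Note the simplification: the AR6 recipe centers a 20-year window on the year the running mean first crosses the level, whereas this sketch takes the first window whose mean reaches it, and the warming trajectory is made up:

```python
def gwl_window(years, warming, level, span=20):
    """First `span`-year period whose mean warming (relative to
    pre-industrial) reaches `level`; None if the level is never reached.

    Simplified criterion: the AR6 recipe instead centers the window on
    the year the running mean first crosses the level.
    """
    for s in range(len(years) - span + 1):
        if sum(warming[s:s + span]) / span >= level:
            return years[s], years[s + span - 1]
    return None

# Illustrative warming trajectory for one GCM under SSP5-8.5
# (made up: 0.85 C in 2015, then +0.03 C per year).
years = list(range(2015, 2101))
warming = [0.85 + 0.03 * (y - 2015) for y in years]
window = gwl_window(years, warming, 3.0)
```

Applying this per model yields the row-wise periods of Fig 6; models that never reach the level within the scenario are simply excluded from the GWL ensemble.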

Fig 6. Auxiliary information on the (twenty-year) periods in which the different models forming the ensemble (in rows) reach the +3 °C global warming level under the SSP5-8.5 scenario.

The figure shows the variability across the ensemble of the timing to reach the GWL and allows users to compute relevant context information for the case study Maps of changes and robustness for different GWLs (Sect 4.2).

https://doi.org/10.1371/journal.pclm.0000644.g006

5. Conclusions

Reproducibility is of paramount importance in the context of the Intergovernmental Panel on Climate Change (IPCC), as it ensures transparency, credibility, and robustness in the findings that inform global climate policy. By enabling scientists and policymakers to verify results and reproduce analyses, reproducibility fosters trust in conclusions drawn from complex datasets and models. Moreover, it facilitates collaboration and accelerates scientific progress. In IPCC reports, whose conclusions have far-reaching implications for climate action and adaptation, reproducibility ensures that all stakeholders can critically evaluate the evidence supporting the recommendations.

To address these needs, we have developed a DataLab designed to reproduce real examples from the Sixth Assessment Report (AR6) and to assess the content of the Interactive Atlas developed for the IPCC. This DataLab strikes an effective balance between advancing the current state of FAIR data principles and managing the costs of the supporting infrastructure. The DataLab explores and demonstrates technologies for data sharing, emphasizing their practical application in public climate services within the framework of climate change and the international exchange of climate data. The design of the DataLab aims to enhance the FAIR data practices applied in the IPCC-WGI AR6 Atlas, and real examples from AR6 are used to analyze its performance in relevant use cases.

A key component of the DataLab is the data access service. We selected NetCDF (Network Common Data Form) as the storage format due to its ability to handle multidimensional data, including latitude, longitude, altitude, and time. NetCDF supports the storage of vast amounts of structured data in a highly compressed, self-describing format, enabling easy access, manipulation, and sharing across different software platforms. OPeNDAP (Open-source Project for a Network Data Access Protocol) further enhances the utility of NetCDF by enabling remote access to NetCDF datasets over the internet without requiring users to download entire files. This integration ensures that data access is language-agnostic, supported by a diverse range of programming languages through compatible bindings and clients. To evaluate the performance of various data access methods (specifically, in situ versus remote OPeNDAP access), we conducted a performance analysis experiment. This experiment highlights the advantages and costs associated with the different data access approaches, demonstrating the benefits of executing computations close to the data source.
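As an illustration of how OPeNDAP avoids full-file downloads, a client can append a DAP constraint expression to the dataset URL to request only an index-range subset of a variable. The helper function and server URL below are hypothetical, shown only to make the mechanism concrete; in practice, clients such as xarray or the netCDF4 library generate these constraints transparently when a remote dataset is sliced.

```python
def opendap_subset_url(base_url, variable, **index_ranges):
    """Build a DAP2 constraint expression requesting an index-range
    subset of `variable`, e.g. tas[0:11][100:120][200:240].
    Keyword order must match the variable's dimension order."""
    ce = variable + "".join(f"[{lo}:{hi}]" for lo, hi in index_ranges.values())
    return f"{base_url}?{ce}"

# Hypothetical THREDDS/OPeNDAP endpoint and indices, for illustration only
url = opendap_subset_url(
    "https://example.org/thredds/dodsC/atlas/tas_CMIP6.nc",
    "tas", time=(0, 11), lat=(100, 120), lon=(200, 240))
print(url)
# https://example.org/thredds/dodsC/atlas/tas_CMIP6.nc?tas[0:11][100:120][200:240]
```

The server evaluates the constraint and returns only the requested hyperslab, which is what makes the remote-access pattern viable over limited bandwidth.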

While the current implementation demonstrates a promising proof of concept, important challenges remain for the potential use of the DataLab in future IPCC assessments, such as AR7, to provide broader on-the-fly analyses and interactive support for authors. Such integration would not only reinforce the reproducibility and transparency of the assessment process but also promote a more dynamic, collaborative environment among authors and contributors. Although the DataLab has proven scalable under current testing conditions, ensuring its scalability as usage grows remains a challenge. Future developments include testing cloud-optimized technologies for climate data analysis [14] that leverage data formats and access patterns optimized for object storage (such as S3 buckets and Zarr stores), thereby enabling extended workflows capable of handling more complex, data-intensive use cases.
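The motivation for chunked, object-storage-optimized formats such as Zarr can be illustrated with a simple count of the storage chunks (objects) a read must fetch. The array shape, chunk layouts, and access pattern below are hypothetical, chosen only to show how chunking aligned with the access pattern reduces the number of requests.

```python
def chunks_touched(chunks, slices):
    """Number of storage chunks a read must fetch from a regularly
    chunked array; `slices` holds half-open index ranges, one per
    dimension, in the same order as `chunks`."""
    count = 1
    for (start, stop), c in zip(slices, chunks):
        count *= (stop - 1) // c - start // c + 1
    return count

# Hypothetical daily, 0.25-degree global array: (time, lat, lon) =
# (36500, 721, 1440). Extract the full time series at one grid point.
point_series = [(0, 36500), (300, 301), (700, 701)]
per_map = chunks_touched((1, 721, 1440), point_series)   # one map per chunk
cubes = chunks_touched((365, 90, 180), point_series)     # 3-D chunks
print(per_map, cubes)  # 36500 100
```

With one-map-per-time-step chunking the time-series request touches every chunk in the store, whereas chunks extended along time reduce it to a few hundred requests; this kind of trade-off is what cloud-optimized layouts tune for.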

Beyond these aspects, establishing and maintaining technologies such as the DataLab presented in this work is not a trivial undertaking; it requires considerable infrastructure, resources, and human effort for both development and ongoing management, including consistent updating of datasets, ensuring tool accessibility across diverse user groups, providing standards for metadata and provenance, and coordinating skilled personnel to sustain these services. In line with this, future research should explore variations in performance and reliability in regions with less robust digital infrastructure, ultimately helping to design more inclusive solutions for global accessibility, such as refining data compression techniques to reduce bandwidth consumption.

Overall, the IPCC Interactive Atlas DataLab represents a significant step forward in enhancing reproducibility and reusability within the IPCC framework, laying the groundwork for a future where real-time, interactive data analysis becomes integral to the climate assessment process. Importantly, the DataLab is also accessible to a broader audience through BinderHub, albeit with hardware and resource limitations that may restrict its performance in high-demand scenarios. To address these constraints, our next step is to integrate the DataLab into the European Open Science Cloud (EOSC), enabling it to operate within a federated cloud environment that supports long-term storage, advanced computational capabilities (including HPC integration), and standardized data management practices.

References

  1. Arias PA, Bellouin N, Coppola E, Jones RG, Krinner G, Marotzke J. Technical summary. In: Masson-Delmotte V, Zhai P, Pirani A, Connors SL, Péan C, Berger S, editors. Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge, United Kingdom and New York, NY, USA: Cambridge University Press. 2021. p. 33–144.
  2. Gutiérrez JM, Jones RG, Narisma GT, Alves LM, Amjad M, Gorodetskaya IV. Atlas. In: Masson-Delmotte V, Zhai P, Pirani A, Connors SL, Péan C, Berger S, editors. Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge, United Kingdom and New York, NY, USA: Cambridge University Press. 2021. p. 1927–2058.
  3. Birkmann J, Liwenga E, Pandey R, Boyd E, Djalante R, Gemenne F. Poverty, livelihoods and sustainable development. In: Pörtner HO, Roberts DC, Tignor M, Poloczanska ES, Mintenbeck K, Alegría A, editors. Climate Change 2022: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge, UK and New York, USA: Cambridge University Press. 2022. p. 1171–274.
  4. Bezner Kerr R, Hasegawa T, Lasco R, Bhatt I, Deryng D, Farrell A, et al. Food, fibre, and other ecosystem products. In: Pörtner HO, Roberts DC, Tignor M, Poloczanska ES, Mintenbeck K, Alegría A, et al., editors. Climate Change 2022: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge, UK and New York, USA: Cambridge University Press. 2022. p. 713–906.
  5. IPCC. Annex VIII: acronyms. In: Masson-Delmotte V, Zhai P, Pirani A, Connors SL, Péan C, Berger S, editors. Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge, United Kingdom and New York, NY, USA: Cambridge University Press. 2021. p. 2257–66.
  6. Taylor KE, Stouffer RJ, Meehl GA. An overview of CMIP5 and the experiment design. Bull Am Meteorol Soc. 2012;93(4):485–98.
  7. Diez-Sierra J, Iturbide M, Gutiérrez JM, Fernández J, Milovac J, Cofiño AS, et al. The worldwide C3S CORDEX grand ensemble: a major contribution to assess regional climate change in the IPCC AR6 Atlas. Bull Am Meteorol Soc. 2022;103(12):E2804–26.
  8. Eyring V, Bony S, Meehl GA, Senior CA, Stevens B, Stouffer RJ, et al. Overview of the coupled model intercomparison project phase 6 (CMIP6) experimental design and organization. Geosci Model Dev. 2016;9(5):1937–58.
  9. Pirani A, Alegria A, Al Khourdajie A, Gunawan W, Gutiérrez JM, Holsman K. The implementation of FAIR data principles in the IPCC AR6 assessment process. Task Group on Data Support for Climate Change Assessments (TG-Data) guidance document. Zenodo. 2022.
  10. Stockhause M, Huard D, Al Khourdajie A, Gutiérrez JM, Kawamiya M, Klutse NAB, et al. Implementing FAIR data principles in the IPCC seventh assessment cycle: lessons learned and future prospects. PLOS Clim. 2024;3(12):e0000533.
  11. Pirani A, Cammarano D, Fisher E, Krüss B, Matthews R, Pascoe C. Experience in the implementation of FAIR data principles in the WGI AR6 assessment. Zenodo. 2022. https://zenodo.org/record/6992173
  12. Pirani A, Matthews R, Sitz L. AR6 Working Group I FAIR Supplementary Material. 2022. https://zenodo.org/record/6451137
  13. Iturbide M, Fernández J, Gutiérrez JM, Pirani A, Huard D, Al Khourdajie A, et al. Implementation of FAIR principles in the IPCC: the WGI AR6 Atlas repository. Sci Data. 2022;9(1):629. pmid:36243817
  14. Abernathey RP, Augspurger T, Banihirwe A, Blackmon-Luca CC, Crone TJ, Gentemann CL, et al. Cloud-native repositories for big scientific data. Comput Sci Eng. 2021;23(2):26–35.
  15. Gray J, Liu DT, Nieto-Santisteban M, Szalay A, DeWitt DJ, Heber G. Scientific data management in the coming decade. SIGMOD Rec. 2005;34(4):34–41.
  16. Schnase JL, Lee TJ, Mattmann CA, Lynnes CS, Cinquini L, Ramirez PM, et al. Big Data challenges in climate science. IEEE Geosci Remote Sens Mag. 2016;4(3):10–22. pmid:31709380
  17. Iturbide M, Bedia J, Herrera S, Baño-Medina J, Fernández J, Frías MD, et al. The R-based climate4R open framework for reproducible climate data access and post-processing. Environ Model Softw. 2019;111:42–54.
  18. Hoyer S, Hamman J. xarray: N-D labeled arrays and datasets in python. JORS. 2017;5(1):10.
  19. Copernicus Climate Change Service. Gridded monthly climate projection dataset underpinning the IPCC AR6 Interactive Atlas. Copernicus Climate Change Service. 2023. https://cds.climate.copernicus.eu/doi/10.24381/cds.5292a2b0
  20. Lange S. Trend-preserving bias adjustment and statistical downscaling with ISIMIP3BASD (v1.0). Geosci Model Dev. 2019;12(7):3055–70.
  21. Iturbide M, Casanueva A, Bedia J, Herrera S, Milovac J, Gutiérrez JM. On the need of bias adjustment for more plausible climate change projections of extreme heat. Atmos Sci Lett. 2021;23(2):e1072.
  22. Cinquini L, Crichton D, Mattmann C, Harney J, Shipman G, Wang F, et al. The Earth System Grid Federation: an open infrastructure for access to distributed geospatial data. In: 2012 IEEE 8th International Conference on E-Science. Chicago, IL, USA: IEEE; 2012. p. 1–10. http://ieeexplore.ieee.org/document/6404471/
  23. Project Jupyter, Bussonnier M, Forde J, Freeman J, Granger B, Head T, et al. Binder 2.0 - Reproducible, interactive, sharable environments for science at scale. In: Akici F, Lippa D, Niederhut D, Pacer M, editors. Proceedings of the 17th Python in Science Conference; 2018. p. 113–20.
  24. Hey T, Gannon D, Pinkelman J. The future of data-intensive science. Computer. 2012;45(5):81–2.
  25. Candela L, Castelli D, Pagano P. Virtual research environments: an overview and a research agenda. Data Sci J. 2013;12:GRDI75–81.
  26. Wilkins‐Diehr N. Special issue: science gateways—common community interfaces to grid resources. Concurr Comput. 2006;19(6):743–9.
  27. Wulf WA. The collaboratory opportunity. Science. 1993;261(5123):854–5. pmid:8346438
  28. Snowdon D, Churchill EF, Munro AJ. Collaborative virtual environments: digital spaces and places for CSCW: an introduction. Computer supported cooperative work. London: Springer; 2001. p. 3–17. https://doi.org/10.1007/978-1-4471-0685-2_1
  29. Hoz AP, Heredia Canales A, Cimadevilla Álvarez E, Obregón Ruiz M, López García Á. DataLab as a service: distributed computing framework for multi-interactive analysis environments. IEEE Access. 2025;13:22566–77.
  30. Saenen B, Borrell-Damian L. Federating research infrastructures in Europe for FAIR access to data: Science Europe briefing on EOSC. European Open Science Cloud (EOSC). 2022. https://zenodo.org/record/7346887
  31. Garcia J, Fox P, West P, Zednik S. Developing service-oriented applications in a grid environment. Earth Sci Inform. 2009;2(1–2):133–9.
  32. Cimadevilla E, Iturbide M. SantanderMetGroup/IPCC-Atlas-Datalab. PLOSv6. 2025. https://zenodo.org/doi/10.5281/zenodo.15363392
  33. Lima A, Rossi L, Musolesi M. Coding together at scale: GitHub as a collaborative social network. ICWSM. 2014;8(1):295–304.
  34. Merkel D. Docker: lightweight linux containers for consistent development and deployment. Linux J. 2014;2014(239):2.
  35. Caron J, Davis E, Hermida M, Heimbigner D, Arms S, Ward-Garrison C. Unidata THREDDS data server. 1997. http://www.unidata.ucar.edu/software/tds/
  36. Lee JY, Marotzke J, Bala G, Cao L, Corti S, Dunne JP, et al. Future global climate: scenario-based projections and near-term information. In: Masson-Delmotte V, Zhai P, Pirani A, Connors SL, Péan C, Berger S, et al., editors. Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge, United Kingdom and New York, NY, USA: Cambridge University Press. 2021. p. 553–672.
  37. Wessel P, Smith WHF. A global, self‐consistent, hierarchical, high‐resolution shoreline database. J Geophys Res. 1996;101(B4):8741–3.
  38. Frías MD, Iturbide M, Manzanas R, Bedia J, Fernández J, Herrera S, et al. An R package to visualize and communicate uncertainty in seasonal climate prediction. Environ Model Softw. 2018;99:101–10.
  39. Iturbide M, Gutiérrez JM, Alves LM, Bedia J, Cimadevilla E, Cofiño AS, et al. An update of IPCC climate reference regions for subcontinental analysis of climate model data: definition and aggregated datasets. Copernicus GmbH. 2020. https://doi.org/10.5194/essd-2019-258
  40. Diez-Sierra J, Iturbide M, Fernández J, Gutiérrez JM, Milovac J, Cofiño AS. Consistency of the regional response to global warming levels from CMIP5 and CORDEX projections. Clim Dyn. 2023;61(7–8):4047–60.