A Legacy Data Space for Impetus for Change (I4C)
The core datasets and tools produced in I4C will be consolidated in an open cloud data space, based on the European Open Science Cloud (EOSC), for reproducibility and reusability.
Traditionally, climate data analysis has been carried out by downloading datasets from data servers and analysing them on local workstations or local computing infrastructures. This approach is inefficient in the era of big data because of the high cost of transferring raw data over the Internet. Climate data analysis involves processing raw data from a wide range of sources, including observations and models, to provide distilled, actionable local climate data for various applications. Part of this processing workflow is standard (subsetting, regridding, bias adjustment, etc.) and can be carried out with existing packages that implement best practices, avoiding duplicated effort and potential errors. Recently, web-based computing frameworks such as Jupyter Notebook, together with cloud computing, have emerged as an alternative computing infrastructure that facilitates code reproducibility and reusability. Cloud systems are often built on top of object storage, and new data formats and libraries have been created to take advantage of this type of storage. These technologies have converged in the development of data spaces: web-based virtual research environments that make data analysis faster and more efficient by bringing together data and computational resources with ready-to-use software frameworks.
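The standard steps mentioned above (subsetting and regridding) can be sketched with NumPy alone. This is a minimal illustration on a synthetic grid, not the I4C tooling: the function names `subset_box` and `coarsen`, the 1-degree grid and the bounding box are all assumptions made for the example, and simple block averaging stands in for the more careful regridding methods that dedicated packages provide.

```python
import numpy as np


def subset_box(field, lats, lons, lat_range, lon_range):
    """Return the part of a (lat, lon) field inside a bounding box.

    Hypothetical helper: lat_range and lon_range are (min, max) tuples
    in degrees, with both bounds inclusive.
    """
    lat_mask = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lon_mask = (lons >= lon_range[0]) & (lons <= lon_range[1])
    return field[np.ix_(lat_mask, lon_mask)], lats[lat_mask], lons[lon_mask]


def coarsen(field, factor):
    """Regrid to a coarser grid by averaging factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    No area weighting is applied, which is a simplification.
    """
    nlat_c = field.shape[0] // factor
    nlon_c = field.shape[1] // factor
    trimmed = field[:nlat_c * factor, :nlon_c * factor]
    blocks = trimmed.reshape(nlat_c, factor, nlon_c, factor)
    return blocks.mean(axis=(1, 3))


# Synthetic 1-degree global field (illustrative values only, not real data)
lats = np.arange(-89.5, 90.0, 1.0)
lons = np.arange(0.5, 360.0, 1.0)
field = np.cos(np.deg2rad(lats))[:, None] * np.ones(lons.size)

# Subset a 10 x 10 degree box, then coarsen it from 1 to 2 degrees
region, region_lats, region_lons = subset_box(
    field, lats, lons, lat_range=(35.0, 45.0), lon_range=(0.0, 10.0)
)
coarse = coarsen(region, factor=2)
```

In a data space, steps like these would run next to the data, so only the small regional result needs to leave the server.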
As part of the dissemination activities of I4C, a legacy data space will be produced that seamlessly integrates core datasets and user-relevant software packages, allowing users to reproduce relevant results (e.g. preparing local data using bias adjustment methods). This data space will be integrated into the European Open Science Cloud to ensure the legacy of project results and their use by society beyond the lifetime of the project.
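As an illustration of what a bias adjustment step might look like, the sketch below applies empirical quantile mapping, one common family of bias adjustment methods. The text does not say which methods I4C uses, so this is an assumption; the function name `quantile_map`, its parameters and the synthetic temperature series are hypothetical.

```python
import numpy as np


def quantile_map(obs_hist, mod_hist, mod_target, n_quantiles=100):
    """Adjust model values with empirical quantile mapping.

    Each target value is located in the historical model distribution
    and replaced by the observed value at the same quantile. Values
    outside the historical model range are clamped to the observed
    extremes, because np.interp does not extrapolate.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    mod_q = np.quantile(mod_hist, probs)
    obs_q = np.quantile(obs_hist, probs)
    return np.interp(mod_target, mod_q, obs_q)


# Synthetic daily temperatures (illustrative only): the "model" is too
# warm and too variable compared with the "observations".
rng = np.random.default_rng(0)
obs = rng.normal(loc=15.0, scale=3.0, size=5000)
mod = rng.normal(loc=17.0, scale=4.0, size=5000)

adjusted = quantile_map(obs, mod, mod)
```

After adjustment, the mean and spread of `adjusted` should be close to those of `obs`. A production workflow would typically also handle seasonality, trends and values beyond the calibration range, which this sketch leaves out.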