The Macro-Roadmap for the implementation of EOSC is a visual mapping of the results of EU projects developing EOSC and the in-kind contributions of EOSC Association member organisations.
These two inputs will be mapped over time and categorized according to selected high-level objectives and the respective Action Areas of the EOSC Partnership’s Strategic Research & Innovation Agenda. As the data become available, the in-kind contributions of the EOSC-A members (Additional Activities) will be added to the roadmap.
The COVID-19 Data Platform is an infrastructure that provides access to open literature and data on both the virus and the disease. It supports researchers by giving them access to critical information resources. The European COVID-19 Data Platform consists of three components: the COVID-19 Data Portal, the SARS-CoV-2 Data Hubs, and the Federated EGA (European Genome-phenome Archive).
Data across different data types is integrated and linked to promote FAIR and open sharing and usage for the benefit of the global scientific community. Long-term infrastructure is provided for SARS-CoV-2 and COVID-19 data that will remain publicly available and re-usable. The infrastructure can also be repurposed for future pandemics.
COVID-19 Portal is a service where users may find related, comprehensive and scientific information through one portal. It harmonises and brings together different repositories and providers for different data types (open/public and sensitive data).
Provision of comprehensive open data (on infectious agents and related diseases during outbreaks) supports evidence-based decision-making across scientific, medical, public health and policy domains and promotes reproducibility of research outcomes.
Providers and repositories can be found and (if possible) accessed, while considering the challenge of dealing with sensitive data. The Portal forms part of the pandemic preparedness toolkit to address future pathogen outbreaks.
BY-COVID demonstrates how EOSC technologies, workflows, principles and standards can be shaped to serve transnational access users and to meet the needs of large user groups.
BY-COVID addresses three overarching topics:
Consolidating the rapid responses and technical developments from the first two years of the pandemic (funded through a range of H2020 projects) into sustainable services for pandemic preparedness.
Enabling interdisciplinary data mapping, with technologies that extend beyond pandemic preparedness. The linking of the CESSDA catalogues to the life science catalogues is the project flagship to date.
Developing a unified framework for pandemic preparedness by collecting data, software and tools from disparate sources and bringing them together.
Workshop Report ‘What Does EOSC Bring to Research Infrastructure (RI) Users?’
This report summarizes the key findings of the workshop “What does EOSC bring to the RI users?”. It provides key recommendations for the Research Infrastructures (RIs) from the user perspective.
The report aims to visualise how RIs and end users will fully benefit from the EOSC added value and further progress towards a fully operational EOSC for thematic communities and researchers. It also addresses the role of ESFRI and other thematic Research Infrastructures (RIs) as crucial for the development of EOSC.
Building a distributed network across the whole of Europe (12 distributed computing nodes in total), so that a computational job (including VREs, like Jupyter notebooks) can be executed on national resource providers.
Development of additional public (national) nodes that can be used by a large number of users.
Connecting countries that do not have the resources to run their own Galaxy server, so that their computational jobs (including VREs like Jupyter notebooks) can be executed on national resource providers.
Bring your own compute (BYOC). Develop a mechanism to plug a user’s own or institutional computing resources into the Galaxy European Science Gateway.
Bring your own storage (BYOS). Develop a mechanism to plug a user’s own or institutional storage into the Galaxy European Science Gateway.
Development of a scheduler that takes into account proximity to the data source, energy efficiency and environmental overhead when scheduling the job distribution over the European servers.
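The scheduling idea in the last item can be illustrated with a small Python sketch. It is not the Galaxy scheduler: the Node fields, the use of grid carbon intensity as a stand-in for "environmental overhead", the weights and the example figures are all assumptions made for illustration. Each criterion is normalised against the worst candidate and combined into a weighted cost, and the job goes to the node with the lowest cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one compute node (all fields are assumptions)."""
    name: str
    distance_km: float         # distance to the data source, a proxy for proximity
    energy_kwh_per_job: float  # estimated energy needed to run the job here
    carbon_g_per_kwh: float    # grid carbon intensity, a proxy for environmental overhead


def pick_node(nodes, weights=(1.0, 1.0, 1.0)):
    """Return the node with the lowest weighted, normalised cost."""
    if not nodes:
        raise ValueError("no candidate nodes")
    w_dist, w_energy, w_carbon = weights
    # Normalise each criterion by the largest value among the candidates;
    # fall back to 1.0 so an all-zero column does not divide by zero.
    max_dist = max(n.distance_km for n in nodes) or 1.0
    max_energy = max(n.energy_kwh_per_job for n in nodes) or 1.0
    max_carbon = max(n.carbon_g_per_kwh for n in nodes) or 1.0

    def cost(n):
        return (w_dist * n.distance_km / max_dist
                + w_energy * n.energy_kwh_per_job / max_energy
                + w_carbon * n.carbon_g_per_kwh / max_carbon)

    return min(nodes, key=cost)


# Made-up example nodes.
NODES = [
    Node("node-a", distance_km=50, energy_kwh_per_job=2.0, carbon_g_per_kwh=400),
    Node("node-b", distance_km=900, energy_kwh_per_job=1.0, carbon_g_per_kwh=50),
    Node("node-c", distance_km=300, energy_kwh_per_job=1.5, carbon_g_per_kwh=200),
]

balanced = pick_node(NODES)                             # equal weights
data_local = pick_node(NODES, weights=(5.0, 1.0, 1.0))  # proximity dominates
```

With equal weights the sketch selects the node that is moderate on all three criteria; raising the proximity weight moves the job to the node closest to the data. A real scheduler would also have to account for queue length, data transfer cost and access policies, which this sketch ignores.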
Zotero and Zenodo Libraries Including Training Material on Open Science and FAIR Principles from All over Europe
Zotero and Zenodo libraries to share the compilation of references and training materials on Open Science and Open Data collected from different Competence Centres, User support Networks, Professional networks for lifelong learning, etc.
Create awareness of the Open Science and Open Data training resources currently available in Europe and encourage their reuse.
A curated portfolio of workflows has been maintained and extended, with releases made widely available through WorkflowHub. The implementation of RO-Crate support for workflow runs further enhances reusability of workflows and transparency of data derivation steps. These developments are also available on the Galaxy Europe server.
Data processing tasks can be slow and produce results of poor quality due to lack of interoperability between individual applications and the loss of data provenance. Workflows and processing pipelines speed up data processing, enable transparent quality and provenance management, and ensure that methods and data are openly shared and reproducible.
The Pathogens Portal, launched in July 2023, is an invaluable resource for researchers, clinicians, and policymakers who need access to the latest and most comprehensive datasets on pathogens.
The COVID-19 pandemic has highlighted the critical importance of pathogen data sharing and the need for focused access to relevant datasets. The Pathogens Portal addresses this need by providing a centralised platform for sharing and accessing data on key pathogens, including those that can cause diseases in humans and animals, and on vectors and hosts. By bringing together data from multiple sources, and linking these data where possible, the portal enables researchers to better understand the biology and life cycles of pathogens.
In addition to facilitating data sharing and research, the Pathogens Portal plays a vital role in enabling preparedness for future pandemics. By providing access to open data on emerging and endemic pathogens, the Portal helps researchers, public health officials, policymakers, clinicians and other stakeholders to access a comprehensive range of information with which to arm themselves in tackling an array of questions.
The Pathogen Data Hubs form a component of the Pathogens Platform and are targeted at public health agencies and other scientific groups responsible for generating pathogen sequencing data. The Pathogen Data Hubs are in place and offer a variety of configurations for upload tools and specific analysis workflows and visualisations.
Implementing metrics for automated FAIR digital objects assessment in a disciplinary context for F-UJI tool
FAIR-IMPACT will extend and adapt the FAIRsFAIR data object assessment metrics and F-UJI tool to be more disciplinary-context aware and to include more discipline-specific tests (social domain). F-UJI and the FAIR-Aware tool will be further developed to improve the user interface and efficiency, and to identify and mitigate bias. Discipline-aware metrics and tests will be developed with use case partners, domain data repositories, research infrastructures and e-infrastructures. A reference collection of test datasets is provided for verification and benchmarking of FAIR assessment tools’ results. Pilots will test FAIR assessment tools including additional disciplinary-extended tests.
Disciplinary extensions into FAIR assessment tools and enabling technical and organisational integration of assessment tools within EOSC infrastructure.
Metrics for automated FAIR software assessment in a disciplinary context
FAIR-IMPACT will build on the outputs of the RDA/ReSA/FORCE11 FAIR for Research Software WG and existing guidelines & metrics for research software to adapt and enhance the FAIR principles for research software.
Transform and project the FAIR principles and metrics into practical tests, relevant for research software, to be implemented in assessment tools for infrastructures in the scholarly ecosystem and in disciplinary contexts
RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. Provenance standards have been matured and extended in collaboration with other initiatives. These include workflow-related RO-Crate profiles and a Common Provenance Model; stronger support for RO-Crate has also been added to the Galaxy data analysis platform.
Data are only useful when their context is understood, and this context is provided by metadata. However, metadata are not always captured, and when they are, they are often not in a format that allows data analysis and reuse. RO-Crate is a user-friendly way to package data with their metadata, ensuring the data remain meaningful.
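As an illustration of what packaging data with their metadata looks like in practice, the Python sketch below builds a minimal ro-crate-metadata.json document following the layout of the RO-Crate 1.1 specification (a metadata descriptor, a root dataset and one entity per file). The helper name, the example file and all values are hypothetical, and a real crate would normally carry richer metadata such as authors and provenance.

```python
import json


def build_ro_crate_metadata(name, description, date_published, license_url, files):
    """Return a minimal RO-Crate 1.1 metadata document as a dict.

    `files` is a list of (relative_path, human_readable_name) pairs.
    """
    # The metadata descriptor identifies this file and points at the root dataset.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # The root dataset describes the crate as a whole and lists its parts.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": path} for path, _ in files],
    }
    # One data entity per packaged file.
    file_entities = [
        {"@id": path, "@type": "File", "name": label} for path, label in files
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


# Hypothetical example values.
crate = build_ro_crate_metadata(
    name="Example variant calls",
    description="Illustrative crate with one results table.",
    date_published="2024-01-01",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=[("results/variants.tsv", "Variant table")],
)
crate_json = json.dumps(crate, indent=2)
```

Writing crate_json to a file named ro-crate-metadata.json next to the listed data files yields a directory that RO-Crate-aware tools can read. The sketch does not validate that the files exist.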
The Blue-Cloud Metadata Broker is one of the technical service components of the Blue-Cloud Data Discovery & Access Service (DD&AS). It is based upon the Discovery and Access Broker (DAB) service as developed and managed by CNR-IIA. This middleware harvests metadata at collection level from each of the Blue Data Infrastructures (BDIs), using their web services. The DAB service transforms the harvested XML files from each of the BDIs into a common ISO Blue-Cloud collection profile, which is encoded using the recent ISO 19115-3:2016 XML profile and provides a common denominator of metadata fields. The DAB service then publishes this profile by means of a Blue-Cloud OGC CSW - ISO v. 2.0.2 service. The common denominator allows users to discover data sets in each BDI in a common way at collection level. This service is fully dynamic and currently runs every week to provide the latest index of the data collections.
More technical background information can be found in the Blue-Cloud architecture document.
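The "common denominator" idea can be sketched in Python as follows. This is not the DAB implementation: the two source layouts, the field names and the FIELD_PATHS table are invented for illustration, and the output is a plain dictionary rather than an ISO 19115-3 record. The sketch only shows how records harvested in different XML layouts can be mapped onto one shared set of fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-source mappings: common field -> element path in that source's XML.
FIELD_PATHS = {
    "source_a": {
        "title": "title",
        "abstract": "summary",
        "keywords": "keywords/keyword",
    },
    "source_b": {
        "title": "metadata/name",
        "abstract": "metadata/description",
        "keywords": "metadata/tags/tag",
    },
}


def to_common_profile(source, xml_text):
    """Map one harvested XML record onto the shared set of fields."""
    root = ET.fromstring(xml_text)
    paths = FIELD_PATHS[source]

    def text(path):
        element = root.find(path)
        if element is None or not element.text:
            return None  # field missing in this source record
        return element.text.strip()

    return {
        "source": source,
        "title": text(paths["title"]),
        "abstract": text(paths["abstract"]),
        "keywords": [
            el.text.strip() for el in root.findall(paths["keywords"]) if el.text
        ],
    }


# Made-up records in two different layouts.
RECORD_A = (
    "<collection><title>Sea surface temperature</title>"
    "<summary>Daily grids</summary>"
    "<keywords><keyword>SST</keyword><keyword>ocean</keyword></keywords>"
    "</collection>"
)
RECORD_B = (
    "<dataset><metadata><name>Plankton counts</name>"
    "<description>Monthly samples</description>"
    "<tags><tag>biology</tag></tags></metadata></dataset>"
)

index = [
    to_common_profile("source_a", RECORD_A),
    to_common_profile("source_b", RECORD_B),
]
```

Once every record carries the same keys, a single search over index can cover all sources, which is the practical benefit of brokering at collection level.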
Different stakeholder groups in marine science use various metadata schemes and ontologies. The Metadata Broker service supports configuring and providing a common denominator for the metadata schema at data collection level. In Blue-Cloud 2026, the Data Discovery & Access Service (DD&AS) will also be expanded with a Semantic Broker service that uses ontologies to provide common vocabulary terms in the metadata output. It will also be explored how the common denominator of metadata fields and their semantic support could be expanded in cooperation with the BDIs.
A cross-disciplinary resource that maps and interlinks databases, standards and policies, FAIRsharing enables efficient onboarding of new data sources and a means to ensure these are more discoverable in the European Open Science Cloud (EOSC) ecosystem. The collection contains over 20 data sources, developed by BY-COVID members, from social science and humanities, health and clinical data, images, genomic and phenotypic data and chemical biology.
Making a range of infectious disease data sources widely discoverable, accessible and interoperable is important for research and innovation, which is increasingly multidisciplinary in nature. For example, pathogen research is accelerated by the availability of data from clinical trials, biobanks, behavioural and socioeconomic studies, particularly if the data is combined with host and pathogen omics information. Many of these data types, for example clinical records or bioactivity data, may contain high resolution images, the availability of which extends the potential research questions which can be explored.
Multidisciplinary data is also critical for public health decision-making, where policy questions are complex and evidence from biomolecular research, clinical studies and social sciences must be taken into account. One lesson from the COVID-19 pandemic was that data-driven decision-making needs high quality, real-time data from many research disciplines and geographic areas in an integrated format. The BY-COVID project is building on these learnings and creating solutions for COVID-19 that can be extended to other pathogens. Resources like the COVID-19 Data Portal and FAIRsharing are pivotal to meet these goals.
Guidelines for recommended metadata standard for research software within EOSC
FAIR-IMPACT developed guidelines for the collection and curation of metadata to archive, reference, describe and cite research software by surveying the ecosystem of scholarly infrastructures and reviewing existing guidelines in this area. In this, FAIR-IMPACT will follow recommendations from the EOSC SIRS report and actions from FAIR4RS and evaluate how standard metadata impacts software reproducibility.
To evaluate how standard metadata impacts software reproducibility
Research Analysis Identifier (RAI) of experiments will become a new Persistent Identifier (PID) that can be used in addition to existing PIDs. This identifier combines the result with the dataset and algorithm employed by a researcher to process the dataset.
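The source does not say how a RAI is constructed, so the following Python sketch is purely illustrative and should not be read as the RAI scheme. It only shows one way an identifier could bind together the three components named above (dataset, algorithm and result) so that changing any one of them yields a different identifier. The function name, the "rai-sketch:" prefix and the example PIDs are hypothetical.

```python
import hashlib


def make_analysis_id(dataset_pid, algorithm_pid, result_bytes):
    """Derive a deterministic identifier from dataset, algorithm and result.

    Hypothetical scheme: hash the result, then hash the three parts together.
    """
    result_digest = hashlib.sha256(result_bytes).hexdigest()
    combined = "\n".join([dataset_pid, algorithm_pid, result_digest])
    digest = hashlib.sha256(combined.encode("utf-8")).hexdigest()
    return "rai-sketch:" + digest[:32]


# Made-up example inputs.
analysis_id = make_analysis_id(
    dataset_pid="doi:10.0000/example-dataset",
    algorithm_pid="swh:1:rev:example-algorithm",
    result_bytes=b"mean=4.2\n",
)
```

Because the identifier is derived only from digests and PIDs, it can be shared without exposing the underlying data, which may matter for controlled-access datasets. A production identifier would additionally need registration and resolution services, which are outside this sketch.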
To identify and reproduce research analyses and to acknowledge both the person who performed the analysis and the data owner, especially in the context of controlled data.
The data-integration platform cBioPortal is a well-established solution empowering (non-bioinformatic) researchers to query, visualize and analyze (gen)omics data – in combination with clinical and sample characteristics – in an intuitive and user-friendly manner. At the patient level, longitudinal views are available, offering additional insight into phenotypic and genotypic data over time, e.g. before, during and after cancer treatment.
In EOSC4Cancer, colorectal cancer study data are made available through cBioPortal as part of different use cases of the project. These data include the underlying data models and codebooks used in the original studies, to facilitate harmonization efforts and the development of protocols, recommendations and best practices on how to make cancer research data available in a form compatible with this platform.
Standardization, harmonization, visualization and analysis of sensitive data (cancer research data). This work will contribute towards the creation of an interoperable cancer data research ecosystem where data from controlled access data sources, e.g. EGA, can connect with cBioPortal using at least one of the virtual research environments chosen by the project (Galaxy). It will also contribute towards increasing the discoverability of data thanks to the use of individual beacon instances and the overall beacon network.
White Paper 2022 on Cooperation of e-Infrastructures
This is a report that provides recommendations to strengthen communication between e-Infrastructure related organisations at European level, and their cooperation towards an enhanced and coordinated strategy setting. The White Paper 2022 discusses user environments and the need for an ecosystem of portals, ultimately accessible via a smart (AI-based) personalized dashboard.
The current cooperation and coordination is ad-hoc and not properly framed among European e-Infrastructure initiatives and related organisations under the EOSC umbrella or other efforts.
e-IRG sees a clear need to enhance the coordination and cooperation among major European e-Infrastructure-related initiatives, which are an indispensable part of EOSC.
The recommendations will eventually benefit the end-users so that they can focus on their (cross-)disciplinary research, and not the infrastructures and tools.
Virtual Research Environment (VRE) for marine research
The VRE is a cloud-based platform where users can seamlessly exploit computing environments, algorithms and data sources to support marine research.
Providing a common environment for all researchers in marine science, the VRE enables the collaborative setup of new algorithms and the sharing, reuse, and reproducibility of marine data.
Earth Analytic Lab (EAL) is an integrated, collaborative, web-based platform for an Earth System Model that describes the atmospheric and oceanic circulation and thermodynamics, and the biological and chemical processes that feed back on to the physics of the climate and grid over the Earth’s surface and underneath the surface of the oceans. The EAL has three main pillars, based on the FAIR-EASE pilots: (1) The Environmental BioGeochemical Asset, (2) Earth and Environmental Dynamics and (3) Biodiversity Observation.
To visualise, analyse and process multi-domain heterogeneous environmental data on demand, i.e. according to users’ specific objectives, themes, geographical areas and temporal slots of interest, and to provide direct access to data and data sources for the VRE, improving data harmonisation and the technical efficiency of data access.
The new joint ESFRI-EOSC Task Force is an official instrument of collaboration. It will coordinate the activities between ESFRI and EOSC at the policy level and provide a structured interface between the two bodies.
The objectives are to align the interests and maximise the impact of Member States/Associated Countries and European Commission efforts on Research Infrastructures (RIs) in EOSC. This includes the uptake of Open Science and FAIR principles by the RIs and the federation of RIs in EOSC and the use of the EOSC horizontal services.
Open Science and Open Commons Initiatives Discovery Tool and Service
A tool to provide information about global Open Science and Open Commons Initiatives. The tool will also include information on the (new) RDA working groups. This involves analysing the initiatives and including them in a public graph database, which is part of the RDA maintenance facility (a connection to other graph databases is being investigated). The database is also connected to the Landscape analysis service, which is provided as an optional component of Result 1 (Service Framework to facilitate international collaboration).
To find information about global Open Science and Open Commons Initiatives including the RDA Working Groups and their outputs in an easy way.
Service Framework to Facilitate International Collaboration
A set of services to support international collaboration in the area of research data sharing. Services include: i) tools to make international collaboration efficient (facilitation of the creation and operation of an RDA Working Group), ii) tools to bring outputs to international discussions (facilitation and communication services of an RDA Working Group) and iii) tools to locate global partners (engagement activity together with landscape analysis). The service framework is designed to help streamline the processes of finding and engaging international partners in a structured way and to make the results more sustainable by facilitating community uptake.
Research data challenges are a global issue. For organizations and initiatives, it is difficult to navigate the complexity of the research data sharing ecosystem. Tools to facilitate international collaboration are needed.
Network Starter Kit for professional networks: data steward
The Starter Kit provides support for the creation of new professional and sustainable communities. Professional communities play a key role in the context of the Skills4EOSC Competence Centre network, as they are seen as a tool for lifelong learning through peers.
The starter kit addresses the challenges related to the emergence of data stewardship as a professional identity within Open Science and Open Data initiatives. It helps in defining responsibilities, roles, and skills for data stewards, facilitates the formation of data steward communities, and provides guidance on organizing and sustaining these communities. It also draws from different community organization models and references social psychology and management theory to support effective team building.
Community-endorsed quality assurance and certification framework for professional training and qualifications
The Quality Assurance and certification Framework (QAF) for Open Science professional training and qualifications offers a reference framework (not a prescriptive requirement) to assure the quality of training materials and resources. It is based on four sections or sub-frameworks important in the context of the Skills4EOSC project’s principles and philosophy: Minimum Viable Skillset, FAIR-by-design and ELSI (Ethical, Legal and Social Issues), in conjunction with a comprehensive evaluation framework and learning resources catalogues.
Lack of standardized quality assurance criteria and frameworks has been a significant challenge in addressing the issues related to the creation and curation of high-quality learning/training materials on Open Science, FAIR data, and related topics.
Catalogue of Open Science Career Profiles - Minimum Viable Skillsets
The Catalogue of Minimum Viable Skillset (MVS) Profiles is a resource that outlines essential skills and competences for roles in Open Science with the European Open Science Cloud (EOSC). It helps with skills development and curriculum design by connecting key competences to specific Open Science practices.
Each MVS Profile within the catalogue is developed by drawing upon existing skills resources and competence frameworks, consolidating them into a concise and standardized format. The primary purpose of these profiles is to facilitate skills development, particularly in the context of curriculum and training course design.
The Catalogue of Minimum Viable Skillset (MVS) Profiles is a significant outcome of the project to unify the training landscape, define competencies, clarify professional roles, and reduce fragmentation in training resources, in Open Science and FAIR data.
The methodology is based on the backward instructional process, extended with additional activities focused on the implementation of the FAIR principles. The document includes a discussion on granularity, scope, metadata schema, interoperability and publication in relevant repositories, together with a step-by-step six-stage workflow and checklist. The methodology will be used as a blueprint for a train-the-trainer course that presents the practical FAIR-by-design instructional design.
Existing training material is fragmented and generally not FAIR-compliant, with no information on target group, length/duration, key messages, learning objectives, or documentation to ensure re-usability. By creating FAIR material, trainers (especially those who did not participate in the “train the trainers” sessions) will be able to use it as they see fit, adapting the material as needed by e.g. merging or restructuring. Proper documentation will allow the material to be reused in any field.
The BY-COVID Educational Toolkit provides teachers with resources to create lessons with young people (aged 14-19+) in English, French, Spanish, Czech and Dutch to discuss the ethics of data use in infectious disease research and pandemic preparedness.
To mount a successful public health response to a pandemic, citizen data is required to inform decision making (for example the number of positive diagnostic tests in a given postal area). In many cases citizens are not aware that this data is being used, how their privacy is being protected, and what the potential risks are. This educational toolkit is aimed at the adults of tomorrow, to inspire fact-based discussions around data sharing in public health.
The COVID-19 Disease Map enables the understanding of the molecular mechanism of COVID-19. Three main categories of data have been integrated: i) omics, ii) drug targets and iii) imaging/3D structure data, implemented as an interactive plugin for end-users. Omics data are integrated as visual data overlays, allowing calculation of diagram enrichment and visual analytics of expression profiles. Drug target data are used for identifier mappings to visualise the molecules targeted by the drugs. Imaging/3D structure data are used for side-by-side visualisations, displaying the imaging content and at the same time indicating relevant molecules in corresponding pathways. The omics category is integrated with broader computational workflows, allowing end-users to define their analytical steps before visualising data in the COVID-19 Disease Map.
Understanding the molecular mechanisms of the SARS-CoV-2 virus enables research into treatments, vaccines and diagnostics, as well as improving knowledge of how the virus functions. It is difficult to understand the wealth of this public domain information due to the sheer volume of interrelated data involved and, especially in a pandemic situation, the rapid production of additional data. The COVID-19 Disease Map brings together information in a visual format, to enable rapid interrogation.
List of existing and emerging standards corresponding to a set of functional requirements to support the next level of interoperability. This provides a ‘lingua franca’ consisting of a set of cross-domain standards which can be adopted, or against which current domain practices (metadata standards etc.) can be mapped.
The aim is to create a cross-domain interoperability framework that provides functionality far beyond simple catalogue-level metadata. It provides a set of recommendations for the functional requirements necessary to implement the FAIR principles, including assessment of data, controlled access, interoperability and data integration / combination, and reuse. It does so through a set of existing or emerging cross-domain standards and will provide a strong basis to support the interoperability and reuse of data for interdisciplinary grand challenge research areas.
FAIR-IMPACT will provide a complete legal framework facilitating technical interoperability.
Organisational (policy, procedures) and legal interoperability have been identified as persistent issues in the EOSC IF; solutions and recommendations are needed in relation to each, applicable in various contexts (national, European and international: UNESCO, OECD, WIPO).
OJS plugin for the EOSC Interoperability Framework on Research Product Publishing
The plugin will implement alignment with the EOSC Interoperability Framework on Research Product Publishing and take a step towards harmonising institutional publishing platforms by providing solutions to known problems. It will improve the interoperability among institutional publishing platforms and the journals that use them.
Increasing interoperability of Open Access Diamond publishing platforms. Open Access Diamond publishers are usually small and designed for niche research communities, so they are not made interoperable in the general case. CRAFT-OA will contribute to ensuring that publishing platforms incorporate all necessary Open Science aspects (e.g. PID, FAIR metrics & certification, AAI, etc.).
MoU and Service Level Agreement templates for data interoperability
Building on work from FAIRsFAIR, RDA and elsewhere, FAIR-IMPACT will work with the integrated use case partners and other stakeholders to provide key documents and recommendations for legal interoperability and organisational interoperability frameworks.
Organisational (policy, procedures) and legal interoperability have been identified as persistent issues in the EOSC IF; solutions and recommendations are needed in relation to each, applicable in various contexts (national, European and international: UNESCO, OECD, WIPO).
Adopting recommendations of RDA FAIR Data Maturity, piloting for ML community. Extension of FAIR EVA to support ML model provenance.
Improving FAIR aspects of AI and ML assets by adopting recommendations of RDA FAIR Data Maturity, piloting for ML community. Extension of FAIR EVA to support ML model provenance.
This report describes requirements for aggregators and indexes of metadata standards and protocols that could be integrated into journal publishing services. It considers operational recommendations for journals without dedicated publishing software.
Metadata and ontologies are required by Open Access publishing platforms to enable discovering relevant articles.
Provenance Tracking System for Artificial Intelligence (AI)
Provenance tracking system using semantic standards and domain specific ontologies for specific use cases with a particular focus on AI.
Provenance is key to understanding how data is derived. This applies to data and research in general, but meeting the need for provenance in AI research is particularly challenging, including on an ethical level. With provenance, we are addressing a key aspect of the development of AI in EOSC and of AI research.
Specification of shared metadata description of semantic artefacts and their catalogues including common reference API
Standardisation mechanisms to describe and serve semantic artefacts within EOSC. Via use cases, the team will demonstrate the impact of semantic artefact management and sharing for data FAIRification and data-driven science. Liaison with multiple domain SA-catalogues, and technological component developers will demonstrate cross-domain benefits, brought to new communities.
The report will standardise mechanisms to describe and serve semantic artefacts within EOSC. Via use cases, we will demonstrate the impact of semantic artefact management and sharing, for data FAIRification and data-driven science.
Report on Semantic artefact governance models and disciplinary approaches for inclusion within EOSC
The report will provide governance models and disciplinary approaches by reviewing and analysing semantic artefact community practices and governance models to establish their role for EOSC. FAIR-IMPACT will survey integrated use cases and other relevant domain and national level initiatives of existing communities that show very different levels of governance.
Existing communities show very different levels and approaches of governance.
Recommendations on how to model data for cancer researchers, which ontologies to use and a reporting of best practices
This report will outline recommendations on how to provide better metadata description for cancer research datasets, how to better use ontologies for the content of datasets and how to facilitate the reuse of data.
Guidelines for creating a user tailored EOSC compliant PID policy
Research communities and infrastructures will be supported in defining PID policies that are in line with the EOSC PID policy as well as the needs of the designated communities.
This involves identifying and analysing different EOSC actors and mapping PID policies available, e.g., policies for repositories and RIs on the ESFRI roadmap. This work will produce a blueprint supporting communities in defining and writing machine actionable PID policies aligned with the EOSC PID policy including the PID assessment toolkit.
Improvement of the Open Access Publishing Software
Improve existing Open Access publishing software for the three most widely used platforms: Open Journal Systems (OJS), Janeway, and Lodel.
Open Access publishing platforms at research institutions are usually small and often focus on niche research disciplines, so some areas are better covered than others. A community-oriented approach will respond better to community needs and help find solutions to recognized issues.
The roadmap will outline operating models, responsibilities, access mechanisms for a sustainable cancer dataspace.
Creation of a sustainable cancer data space. Alignment of architectures and services created in the project with overall European cancer research landscape: EU Cancer mission, EOSC, European Health Data Space, other relevant HE cancer projects.
Interdisciplinary Data Discovery and Access Service (IDDAS) for Earth and Environmental Research
FAIR-EASE IDDAS provides a harmonised interface to the data discovery, access and download services of selected data infrastructures in the EU. The IDDAS is a central search and access interface to a distributed system of systems that uses an inventory from the FE pilots as the basis for listing the involved systems, such as EMODnet, SeaDataNet, NOAA/NCEI, ESA, NASA, and others. It will provide an overview of available datasets via their metadata and facilitate access to them. Discovery will be supported by the GEODAB brokering software as well as I-ADOPT smart mapping solutions.
Many datasets in earth and environmental sciences cannot be accessed in a common interface, using common search terms and central data handling, because of the lack of metadata harmonisation and aligned discovery and access services. IDDAS provides a common interface for location of and access to all data sources through a central query dialogue, and a “shopping” mechanism that allows users to compose and submit mixed request baskets for data sets from multiple BDIs.
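The "mixed request basket" idea can be illustrated with a minimal sketch. This is not the IDDAS implementation: the function name and the dataset identifiers are hypothetical, and the sketch only assumes that a basket is a list of (source, dataset) pairs that must be split into one sub-request per data infrastructure.

```python
from collections import defaultdict


def split_basket(basket):
    """Group (source, dataset_id) pairs into one sub-request per source."""
    per_source = defaultdict(list)
    for source, dataset_id in basket:
        per_source[source].append(dataset_id)
    return dict(per_source)


# Hypothetical dataset identifiers; the source names come from the text above.
basket = [
    ("EMODnet", "bathymetry-tile-01"),
    ("SeaDataNet", "ctd-profile-17"),
    ("EMODnet", "seabed-habitat-04"),
]

for source, dataset_ids in split_basket(basket).items():
    print(source, dataset_ids)
```

A central query dialogue would then forward each sub-request to the matching infrastructure and collect the results for the user.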
The AI4OS software stack allows providers to build customized platforms (similar to the AI4EOSC platform) that can be tailor-made for the specific needs of a community. One-size-fits-all and generic platforms cannot be enough for some research communities that may need a dedicated and custom platform.
The AI4EOSC platform (powered by the AI4OS software stack) provides EU researchers and data scientists with a comprehensive set of tools to share, develop and deploy AI models following the open science and FAIR principles. EU researchers need state-of-the-art user environments in the EOSC to develop AI models.
The Landscape Analysis provides an overview of the European Research Infrastructure (RI) ecosystem by identifying the main RIs operating transnational access in Europe, in all fields of research, and major new or ongoing projects, as well as an outlook to the global landscape of relevance. In order to underline the strategic relevance of the Landscape Analysis, ESFRI has decided to separate the Landscape Analysis from the ESFRI Roadmap.
According to the new ERA Action 8, ESFRI will publish a more strategic Landscape Analysis report in mid-2024, which must provide the framework for the next ESFRI Roadmap, contribute to the EOSC Strategic Research and Innovation Agenda, and promote the development of new research infrastructure services. It includes, when needed, infrastructure clustering for pan-European thematic or interdisciplinary services.
Case study led by leading global institutions in the respective fields: Cultural Heritage
This case study provides recommendations related to Cultural Heritage. Through this case study, the DRI will produce a mapping report of existing policies and practices that support image sharing across diverse collecting institutions, develop a set of broadly applicable recommendations for shifting these practices into closer alignment with FAIR, and implement the recommendations at the Repository. Establishing FAIR practices in the GLAM sector would have a very significant effect on the sharing of cultural heritage data, and on the research data management practices across the arts, humanities and social sciences disciplines.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Disaster Risk Reduction
This case study provides recommendations related to Disaster Risk Reduction. The application of the FAIR principles for EO data, including domain-specific FAIR vocabularies for disaster, climate change and global health for the Pacific and Africa, will facilitate the easier, and lower cost, reuse of data and the extraction of key information.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Ocean Science & Sustainable Development
This case study provides recommendations related to Ocean Science & Sustainable Development. The case study will examine how the ODIS Interoperability Architecture (ODIS-Arch) being piloted with regional partners can be coordinated with other case studies and central guidelines of CODATA and RDA to support digital policy alignment. The key objective will be to ensure policies support regional and local specificity, but allow the concrete implementation of global FAIRness around key (meta)data types. Through these actions, this case study aims to sustainably interface the ODIS digital ecosystem with many others.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Agricultural Biodiversity
This case study provides recommendations related to Agricultural Biodiversity. A survey of existing initiatives handling plant-pollinator interaction data will be conducted and an overview of the current status of best practices for plant-pollinator data management will be provided and discussed within the community for improvement. FAIR data assessment rubrics will be adapted for the plant-pollinator domain, to be accompanied by guidelines for their use.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Biodiversity
This case study provides recommendations related to Biodiversity, and is led by GBIF. In this project, the team consults community members on the development of a new FAIR data model that encompasses long-term biodiversity monitoring data from newly developing biodiversity monitoring projects around the world and makes it easier to integrate, share and reuse. The aim of the consultation is to identify improvements to data models and processes that could then in turn lead to improvements in the Darwin Core (DwC) standard and its implementation of FAIR principles.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Urban Health
This case study provides recommendations related to Urban Health. It is led by the SALURBAL project, which has systematized a process for city definition and operationalization that integrates multiple ways in which a city can be delimited; created a data structure that allows the incorporation of data from different sources, making it shareable across several cores and disciplines; and developed procedures and standards that systematically document issues related to data access, quality, and completeness during data harmonization. The WorldFAIR case study explores and further refines this approach to provide recommendations for urban health data that reflect the FAIR and CARE principles and help promote best practices in data sharing and use within and beyond the Urban Health field.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Population Health
This case study provides recommendations related to Population Health. It will improve the interoperation of OMOP with other standards to enable machine-actionable descriptions of data structure and provenance (e.g., DDI-CDI, PROV-O, SDTL); the composition of measurements focused on the objects of research (e.g., I-ADOPT); record linkage modeling for creating and evaluating bridges that connect domains, vocabularies (e.g., SKOS); and data discovery (e.g., Schema.org, DCAT). This suite of standards forms the basis of an ‘AI-Ready’ description of data suitable for use across domain and institutional boundaries.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Social Surveys
This case study provides recommendations related to Social Surveys. This case study undertakes a comparative study of the data management, harmonization and integration practices of one of the satellite countries – Australia, through the AUSSI-ESS – and the core ESS, an ERIC social science infrastructure. It leverages the DDI metadata standards to understand how such multi-national collections could be made increasingly interoperable and reusable through shared procedural and technical development, and establishes a set of guidelines and tools for the development of cross-national collections into the future.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Geochemistry
This case study provides recommendations related to Geochemistry. Through OneGeochemistry, an informal international network of national geochemical data infrastructure organisations, the geochemistry community works to define the minimum common variables for a set of geochemical data types and build them into FAIR Implementation Profiles that can also be used by laboratories, repositories and publishers for QA/QC validation of data.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Nanomaterials
This case study provides recommendations related to Nanomaterials. The case study will foster development and piloting of interoperability standards and guidelines for increasing FAIRness in the interlinked scientific disciplines (chemical toxicity, nanomaterials toxicity and characterisation, risk assessment, advanced materials, environmental science), and across the different domains. The results will include complete human- and machine-readable nanomaterials data provenance trails that can be implemented in a straightforward way using the distributed FAIRification approach.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Case study led by leading global institutions in the respective fields: Chemistry
This case study provides recommendations related to Chemistry. The case study works on:
- Development of guidelines, tools and validation services that enable scientists to share and store data in a FAIR manner.
- Addressing gaps in standards that currently restrain chemistry in both academic and industrial areas.
- Engaging critical stakeholders in the adoption of standards and best practices to significantly increase the amount of chemical data available for all scientific disciplines.
The recommendations contained in the CDIF will consider standards in particular domains being used in EOSC, as well as in other regions of the world.
Training material on AI-related aspects like technical aspects, DEEP EOSC platform, the AI marketplace, the use cases in the AI4EOSC project etc.
Disseminating knowledge on different aspects of AI usage. Training material on AI-related aspects like technical aspects, DEEP EOSC platform, the AI marketplace, the use cases in the AI4EOSC project etc.
EOSC4Cancer will develop training material on how to use the different tools and elements developed by EOSC4Cancer to support cancer data research. The project will establish a training portfolio based on a review of skill needs. They will scale training activities through a European support network.
The roadmap will outline operating models, responsibilities, access mechanisms for a sustainable cancer dataspace.
Creation of a sustainable cancer data space. Alignment of architectures and services created in the project with overall European cancer research landscape: EU Cancer mission, EOSC, European Health Data Space, other relevant HE cancer projects.
EOSC4Cancer will develop training material on how to use the different tools and elements developed by EOSC4Cancer to support cancer data research. The project will establish a training portfolio based on a review of skill needs. They will scale training activities through a European support network.
Recommendations on how to model data for cancer researchers, which ontologies to use and a reporting of best practices
This report will outline recommendations on how to provide better metadata description for cancer research datasets, how to better use ontologies for the content of datasets and how to facilitate the reuse of data.
The Infectious Disease Toolkit sources and shares best practices for rapidly managing data across domains. It captures the pan-European knowledge of the infectious disease community brought together in the BY-COVID, ISIDORe and EOSC4Cancer projects for future preparedness.
Cross-domain recommendations and feedback for the EOSC Interoperability Framework
FAIR-IMPACT will build on existing international frameworks and standards in scientific disciplines and on non-scientific large data sources of interest for research, and will contribute to fostering the alignment between these frameworks (e.g. data sharing practices being proposed in the context of European data spaces, including open and private data, and industrial data management frameworks such as DAMA). It will strengthen collaboration with EOSC Association Task Forces supporting FAIR interoperability uptake via cross-community best practice recommendations, thus feeding into the next SRIA.
Fostering alignment between frameworks and strengthening collaboration with EOSC Association Task Forces supporting FAIR interoperability uptake via cross community best practice recommendations.
Guidelines for the usage of components for technical and semantic interoperability in cross-domain use cases
FAIR-IMPACT will analyse and demonstrate semantic and technical interoperability in use cases across multiple domains. It will generate guidelines about usage of the components identified in the EOSC Interoperability Framework (EOSC IF) and that are, or will be, implemented and deployed in EOSC-related projects. This analysis will result in recommendations usable by these initiatives, and guidelines for user communities on cross-domain interoperability problems.
Given the importance of research data description in the EOSC IF, FAIR-IMPACT will make a detailed analysis of how several initiatives on research data description (e.g. FAIR Digital Objects, DCAT 2.0, DDI-CDI), identified in the EOSC IF, can be applied in cross-domain use cases in order to facilitate interoperability. This analysis will result in recommendations usable by these initiatives, and guidelines for user communities on cross-domain interoperability problems.
Interoperability of Neuroscience, Cancer, Transportation and Energy Scientific Knowledge Graphs (SKGs) with the OpenAIRE Graph
SciLake will identify databases, ontologies, and taxonomies relevant to the pilots' scientific domains (neuroscience, cancer, transportation and energy) and will use these to create or extend domain-specific Scientific Knowledge Graphs (SKGs) that are interoperable with the OpenAIRE Graph. To this end, the project will identify associations between domain-specific ontologies and the main ontologies in the scholarly domains already well-represented in EOSC.
Improvement in the accessibility of domain-specific knowledge.
Improvement in the interoperability of domain-specific Scientific Knowledge Graphs (SKGs) with the OpenAIRE Graph.
Assistance in the creation of customisable added-value services for various research communities.
SciLake is building a data storage and analytics service to accommodate, manage, and expose heterogeneous scholarly content through domain-agnostic or domain-specific “Scientific Knowledge Graphs” together with relevant unstructured data (e.g., scientific texts). This is expected to facilitate the creation of customisable added-value services for various research communities.
SciLake is building upon the OpenAIRE Graph and existing domain-specific knowledge graphs (e.g. the EBRAINS Knowledge Graph).
“Scientific Lake as a Service” provides researchers with tools to create their own knowledge graphs and to integrate them with domain-agnostic science graphs to facilitate knowledge discovery and access. Because structured data is easier to retrieve, SciLake also includes tools for domain-specific experts to create structured knowledge graphs from unstructured information. As an example, the service will include text mining components that will allow users to extract information from texts and incorporate it into Scientific Knowledge Graphs. As a result, even researchers without data mining expertise will be able to access the data. The tools will also attempt to handle content in other languages besides English.
Final recommendations on implementing and exposing FAIR assessments for data and code
FAIR-IMPACT supports development of guidelines and mechanisms improving connections between repository registries and discovery portals (including EOSC Portal), repository trustworthiness mechanisms (e.g. CoreTrustSeal) and FAIR digital object assessments.
To enable seamless discovery of repositories (including their trustworthiness) and datasets and code (including their FAIRness).
Report on FAIR semantic artefact lifecycle from engineering, to sharing and FAIR assessment
Establishing guidelines and community practices for FAIR semantic artefacts from creation to sharing and reuse, via repositories and catalogues, i.e. ontology repositories, registries, vocabulary/terminology services and metadata schema catalogues.
Need for guidelines and community practices for FAIR semantic artefacts and standardisation mechanisms across communities and stages of semantic artefact management.
Use case driven validation of semantic artefact exploitation within data repositories
FAIR-IMPACT will demonstrate the impact of FAIR semantic artefacts for data repositories and metadata catalogues. It will validate methodologies and guidelines with data-driven scenarios from the use cases using semantic artefacts from their data repositories.
Validation of methodologies and guidelines with data driven scenarios from the use cases using semantic artefacts from their data repositories
Guidelines and methodology to create, document and share mappings and crosswalks
FAIR-IMPACT will develop a framework establishing bridges between metadata description of various research objects based on metadata schema crosswalks and ontology/vocabulary mappings. Such a framework will enable identification of a common “minimal metadata” representation, identified as a core requirement for EOSC (SRIA, EOSC IF); it will be based on existing community resources, models (e.g. SSSOM, CodeMeta, EDOAL, RO-Crate) and recommendations (SEMAF, FAIRsFAIR).
Need for identification of a common “minimal metadata” representation, identified as a core requirement for EOSC and the establishment of a common conceptual model for mappings, crosswalks and guidelines to share them in a machine actionable format, also supporting their governance.
The EOSC PID Meta Resolver (PIDMR) is a generalized resolver to map items into records and integrate them in a global network (e.g. Global Handle). It will also develop (1) a new user interface that supports easy integration and search, and (2) the metadata of different PID systems (i.e. the “Kernel Information Profiles”) to query metadata information of a PID without having to resolve it.
The increasing use of Persistent Identifiers (PIDs) to reference all types of research results is a major step forward in meeting future requirements for the FAIRness of (research) data. However, this brings new challenges to process the PIDs, and to integrate them in research activities. The biggest difficulty is the multitude of systems used to create and maintain PIDs. The challenge is to know which system is responsible for the resolution process and which provides the PID metadata. The PIDMR will tackle this problem to simplify PID resolution through a uniform interface.
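The routing problem described here (working out which system is responsible for resolving a given PID) can be sketched as a small dispatch table. This is an illustrative sketch only, not the PIDMR design: the `RESOLVERS` table and `route_pid` function are hypothetical names, the patterns are deliberately simplified, and only four PID systems with well-known public resolvers (doi.org, orcid.org, n2t.net, hdl.handle.net) are covered.

```python
import re

# Hypothetical routing table: (system name, simplified pattern, resolver base URL).
# Order matters: DOIs are also Handles, so the DOI pattern is tried first.
RESOLVERS = [
    ("doi", re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE), "https://doi.org/"),
    ("orcid", re.compile(r"^(?:orcid:)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$", re.IGNORECASE), "https://orcid.org/"),
    ("ark", re.compile(r"^(ark:/?\d{5,}/\S+)$", re.IGNORECASE), "https://n2t.net/"),
    ("handle", re.compile(r"^(?:hdl:)?(\d[\d.]*/\S+)$", re.IGNORECASE), "https://hdl.handle.net/"),
]


def route_pid(pid):
    """Return (system, resolution URL) for a PID string, or raise ValueError."""
    pid = pid.strip()
    for system, pattern, base in RESOLVERS:
        match = pattern.match(pid)
        if match:
            return system, base + match.group(1)
    raise ValueError(f"no known resolver for {pid!r}")


print(route_pid("doi:10.5281/zenodo.1234567"))
print(route_pid("hdl:21.11101/0000-0000-0001-2"))
```

A real meta resolver would also need to return PID metadata without resolving the identifier, which this sketch does not attempt.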
Bringing RO-Crate to Technology Readiness Level (TRL) 9
RO-Crate is a generic packaging format for datasets and their metadata description that uses standards for FAIR Linked Data (JSON-LD). EuroScienceGateway is completing its development by connecting all elements, tying up the loose ends and making it reach TRL 9.
The integration of RO-Crate into Galaxy enables institutions that use Invenio RDM or Zenodo instances to export directly, so that complete analysis histories, provenance information and identifiers are published following standards. Commonly, data will be stored locally in long-term archives. Galaxy will keep track of where the data is stored as well as the metadata. Both data and metadata can be accessed through Galaxy when needed, enabling the re-import of all assets necessary for validation or re-use.
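To make the packaging format concrete, here is a minimal sketch of the JSON-LD metadata file at the heart of a crate, following the general shape of the RO-Crate 1.1 specification (a metadata descriptor plus a root dataset). It is not the Galaxy export: the helper name `minimal_ro_crate`, the file names and the example values are hypothetical.

```python
import json


def minimal_ro_crate(name, description, date_published, license_url, files):
    """Build the content of a minimal ro-crate-metadata.json as a dict."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": path} for path in files],
    }
    file_entities = [{"@id": path, "@type": "File"} for path in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


# Hypothetical example values.
crate = minimal_ro_crate(
    name="Example analysis history",
    description="Outputs of an illustrative workflow run.",
    date_published="2024-01-01",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=["results.tsv", "workflow.ga"],
)
print(json.dumps(crate, indent=2))
```

A production crate would normally carry richer provenance (authors, workflow runs, identifiers) than this skeleton.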
EOSC Metadata Schema and Crosswalk Registry (MSCR)
The EOSC Metadata Schema and Crosswalk Registry (MSCR) provides the steps to translate data from different repositories with incompatible metadata schemas into a single metadata schema, enabling data analysis. It does so by allowing users to deposit or describe the metadata schemas used in different repositories and to define the crosswalks between them.
The EOSC Metadata Schema and Crosswalk Registry (MSCR) allows users and communities to create, register and version schemas and crosswalks using Persistent Identifiers (PIDs). The content published can be searched, browsed and downloaded without restrictions. It also provides an API to facilitate the transformation of data from one schema to another via registered crosswalks.
The EOSC MSCR will thus enable projects and individual researchers to manage their metadata schema and/or relevant metadata schema crosswalks. Schema and crosswalks are shared with the community for reuse and extension, supported by a proper versioning mechanism. The MSCR will be integrated with all relevant EOSC-Core services: AAI, monitoring and helpdesk.
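The idea of transforming a record through a registered, versioned crosswalk can be sketched as follows. This does not reflect the actual MSCR API, which is not described here: the `CROSSWALKS` registry, the `transform` function, the schema names and the field names are all hypothetical, and the crosswalk is reduced to a simple field-renaming table.

```python
# Hypothetical in-memory registry: (source schema, target schema, version) -> field mapping.
CROSSWALKS = {
    ("schema-a", "schema-b", "1.0"): {
        "title": "name",
        "author": "creator",
        "issued": "datePublished",
    },
}


def transform(record, source, target, version):
    """Rename the fields of `record` using a registered crosswalk.

    Returns the converted record and the sorted list of fields that had
    no mapping, so that nothing is dropped silently.
    """
    mapping = CROSSWALKS[(source, target, version)]
    converted = {mapping[key]: value for key, value in record.items() if key in mapping}
    unmapped = sorted(key for key in record if key not in mapping)
    return converted, unmapped


record = {"title": "Sea surface temperature", "author": "A. Researcher", "format": "netCDF"}
converted, unmapped = transform(record, "schema-a", "schema-b", "1.0")
print(converted)
print("unmapped fields:", unmapped)
```

Keying the registry on a version makes it possible to keep older crosswalks usable after a schema evolves, in line with the versioning mechanism mentioned above.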
By integrating use cases and other stakeholders to support research communities, FAIR-IMPACT establishes best practices around versioning, granularity, type registries, and PID minting practices for different types of datasets, workflows and research objects, as well as with other solutions that support reproducible research, actionable publications and FAIR. The developments will support coordination and build on existing work (e.g. the PID Graph), promoting efficient models for ingestion into the Graph. Identifying user needs and PID practices in different scientific communities is the top priority of this task.
Document best practices around versioning, granularity, type registries, and Persistent Identifier (PID) minting practices for different types of datasets, workflows and research objects, as well as with other solutions that support reproducible research, actionable publications and FAIR.
Shared Long-term Vision for PID Service Providers on PID Usage in EOSC
Establishment of the coordination mechanism for Persistent Identifier (PID) service providers, to help them align with the EOSC PID policy and the implementation of PID services in EOSC. It will cover PIDs for various entities, including research outputs, instruments, services, people, organisations and software; emerging new PIDs will be identified and analysed. It will establish a collaborative and shared high level vision for PID use within EOSC, thus supporting FAIRCORE4EOSC on managing PIDs in a sustainable way and the organisation of common issues.
Align Persistent Identifier (PID) providers with EOSC PID Policy
The RDGraph is an improvement of the OpenAIRE Graph that integrates new components like RAiD. FAIRCORE4EOSC will improve several existing features (e.g. data dumps, and APIs to support data re-use) and will add some new ones (e.g. “impact-based search”, which provides a way to search for data while taking usage statistics into account; “natural language search”, which enables the use of human language to search for research data; and community recommendation profiles, which allow sub-profiles to be created for different communities).
The EOSC Research Discovery Graph Service (RDGraph) delivers advanced discovery tools across EOSC resources and communities and builds upon the EOSC Catalogue’s content.
RAiD is a new Persistent Identifier (PID) developed by the Australian Research Data Commons (ARDC) to mint persistent, unique and resolvable information for research projects. RAiD enables users and services to manage information about project-related participants, services, and outcomes. RAiD also collects related identifiers (of contributors, organisations, inputs, outputs, etc.), plus descriptive information about the project (title, description, subject, etc.), and stores them in an associated metadata record. It has recently been certified by ISO.
RAiD connects existing persistent identifiers for researchers, institutions, outputs and tools with key project information to create a timeline of research projects.
Enrichment of the DataCite PID graph to systematically integrate the connections between the nodes by harvesting from different/new sources, and incorporate them in AAI and data dump access.
The nodes in the Persistent Identifier (PID) Graph are relatively well known, but the connections between them are not properly inventoried and systematically integrated.
The EOSC Data Type Registry (DTR) allows the registration of Persistent Identifier (PID) metadata elements as data types, including provenance information and human readable description. The data types will be assigned a PID, which allows their use in PID metadata. Based on a validation schema, PID metadata records can then be validated.
PIDs carry their own metadata, e.g. to specify provenance or to provide human-readable descriptions. This metadata needs to be classified in a registry in order to enable machine readability and standardisation of PID schemes.
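The relationship between registered data types and the validation of PID metadata records can be sketched as below. This is an assumption-laden illustration, not the DTR data model: the `DataType` class, the placeholder type identifiers under `example/` and the validation rules are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


def _is_iso_date(value):
    """Return True if value is a string holding an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
    except (TypeError, ValueError):
        return False
    return True


@dataclass(frozen=True)
class DataType:
    pid: str                       # identifier assigned to the data type itself
    name: str
    description: str               # human-readable description
    is_valid: Callable[[object], bool]


# Hypothetical registry keyed by placeholder type identifiers.
REGISTRY = {
    t.pid: t
    for t in [
        DataType("example/date-created", "dateCreated", "Creation date (ISO 8601)", _is_iso_date),
        DataType("example/checksum", "checksum", "Hex-encoded checksum",
                 lambda v: isinstance(v, str) and v != "" and all(c in "0123456789abcdef" for c in v)),
    ]
}


def validate_record(record, registry=REGISTRY):
    """Return a list of problems found in a PID metadata record."""
    errors = []
    for key, value in record.items():
        data_type = registry.get(key)
        if data_type is None:
            errors.append(f"{key}: not a registered data type")
        elif not data_type.is_valid(value):
            errors.append(f"{key}: invalid value for {data_type.name}")
    return errors


print(validate_record({"example/date-created": "2024-03-01", "example/checksum": "XYZ"}))
```

Because each metadata key is itself an identifier for a registered type, a machine can look up both the human-readable description and the validation rule for any element it encounters.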
The Compliance Assessment Toolkit (CAT) will support the EOSC PID policy by providing vocabulary services, API services and user interfaces. The CAT provides user and application interfaces to encode, record, and query compliance with the policy.
There is currently no unambiguous way to assess the compliance of Persistent Identifier (PID) schemes in use. In order for EOSC to