This project deploys and validates a cross-node genomics scientific use case within the EOSC Federation, demonstrating the practical added value of interoperable data and service integration across Nodes. Coordinated by Area Science Park (AREA), the initiative integrates the ORFEO HPC/AI infrastructure with the CERN EOSC Node through Rucio for federated data management, and with the Italian EOSC Node (Fondazione ICSC) for metadata registration and discoverability.
The pilot validates an end-to-end “find–access–process” workflow: open genomics datasets are ingested and enriched within ORFEO, registered in the Italian Node catalogue, accessed via policy-aware mechanisms through Rucio, and analysed in EOSC-integrated environments (e.g. REANA, Galaxy, Jupyter) without manual data transfers. The project builds on mature, production-grade infrastructures and open-source technologies, focusing on interoperability orchestration rather than new tool development. The main outcome is a reusable integration pattern (National Catalog + Rucio + Federated Analysis Environment) that can be adopted by other research infrastructures and future EOSC Nodes. By strengthening intra-node and cross-node workflows and aligning metadata standards, the project contributes to the EOSC Build-Up portfolio and supports scalable, FAIR-aligned data federation for life science research.
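The “find–access–process” workflow described above can be illustrated with a minimal, self-contained sketch. It is not the project's implementation and does not call the real Rucio, REANA, or catalogue APIs: the classes `Catalogue` and `DataManager`, the function `find_access_process`, and all identifiers and storage URLs are hypothetical in-memory stand-ins, chosen only to show how discovery, policy-aware resolution, and processing are chained without a manual transfer step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalogue entry: metadata plus a data identifier."""
    did: str                # assumed "scope:name" style identifier
    title: str
    keywords: tuple
    access: str = "open"    # the pilot covers open data only


class Catalogue:
    """In-memory stand-in for a national metadata catalogue (find step)."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def find(self, keyword):
        return [r for r in self._records if keyword in r.keywords]


class DataManager:
    """In-memory stand-in for a federated data manager (access step)."""

    def __init__(self):
        self._replicas = {}

    def add_replica(self, did, url):
        self._replicas.setdefault(did, []).append(url)

    def resolve(self, record):
        # Toy access policy: only open datasets are resolvable.
        if record.access != "open":
            raise PermissionError(f"{record.did} is not open data")
        return list(self._replicas.get(record.did, []))


def process(urls):
    """Stand-in for the analysis step: reports which inputs would be read."""
    return {"inputs": len(urls), "sources": urls}


def find_access_process(catalogue, manager, keyword):
    """Chain the three steps; datasets the policy rejects are skipped."""
    results = []
    for record in catalogue.find(keyword):
        try:
            urls = manager.resolve(record)
        except PermissionError:
            continue
        results.append((record.did, process(urls)))
    return results
```

In this sketch a caller registers a `DatasetRecord` in the `Catalogue`, declares a replica location in the `DataManager`, and calls `find_access_process(catalogue, manager, "genomics")`. The analysis step receives resolved locations rather than copied files, which is the property the pilot aims to validate across Nodes.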
The pilot establishes a reusable integration pattern (National Catalog + Rucio + Federated Analysis Environment) that other research infrastructures can replicate to connect local storage and compute resources to EOSC Nodes. The architecture is domain-agnostic and extends naturally beyond genomics to other data-intensive disciplines, providing a scalable model for federation-ready infrastructures.
Scalability is addressed at multiple levels. Technically, the solution builds on production-grade, open-source components (Rucio and REANA) already adopted and maintained by large international research communities, whose active development and broad user base support long-term robustness and interoperability. Institutionally, ORFEO is embedded within AREA’s strategic roadmap and connected to the Italian EOSC Node through Fondazione ICSC, ensuring alignment with national and European EOSC developments.
The use case also demonstrates strengthened intra-node synergies. The interplay between metadata discovery via the Italian EOSC Node catalogue and policy-aware access mediated by Rucio highlights the importance of harmonised metadata profiles and access standards within a federated ecosystem. In addition, the integration of ORFEO S3 storage into Rucio validates transparent, rule-based data orchestration across heterogeneous backends.
The current pilot focuses on open, non-sensitive datasets. Progressive integration of advanced Authentication and Authorisation Infrastructure (AAI) mechanisms, already foreseen in the strategic agendas of both ORFEO and Rucio, will enable future extension to controlled-access, GDPR-compliant life science data.
