The KMD4EOSC project presented at SC25 in St. Louis

The KMD4EOSC project is the fourth project in the National Data Warehouse series. It aims to build and provide access to data infrastructure, together with services and applications integrated with that infrastructure, for storing, sharing and processing data. In the current edition, this includes in particular support for AI workloads: computations in AI models, supplying AI models with data, and supporting training and inference.

The infrastructure and services are being developed and implemented jointly by PCSS (coordination), five KDM centers (TASK, NCBJ, WCSS, Cyfronet, ICM) and five MAN centers (PCz/CzestMAN, BIAMAN, LodMAN, PŚk/KielMAN, UZ/ZielMAN). The fourth edition of the project aims to support the scientific community in Poland in collecting, securing, and sharing research data, and in ensuring the openness of scientific data, including the compliance of open data repositories with the EOSC. The project’s main products include repository services for open scientific data, tools supporting the opening of data in accordance with the FAIR principles, and services integrating data repositories with the computing infrastructure.

This is one of the key topics presented at SC25, and it is especially relevant in the context of PIONIER, as the work is being carried out by five HPC centers and five MANs, all of which are members of the PIONIER Consortium.

KMD4EOSC is building a comprehensive solution for storing and accessing data used in artificial intelligence processes. The hardware infrastructure being developed will be made available to users through an intuitive platform. Using this platform, researchers will be able to draw on both shared data storage and the computing centers to develop their own AI models or to use the tools and models available in the catalog. In the area of artificial intelligence, the KMD4EOSC project places a strong emphasis on generative AI solutions, including large language models (LLMs). Among other things, the platform will provide an environment in which users can apply large language models for in-depth analysis of, and advanced knowledge extraction from, research datasets, both public and private, which they can then feed into the available tools securely and in a personalized manner. This environment will significantly support the scientific community in conducting and optimizing research and in managing resources.

The project is being conducted in accordance with the Polish Artificial Intelligence Development Policy until 2030, developed by the Ministry of Digital Affairs. Thanks to integration with the European Open Science Cloud (EOSC), it also meets the objectives of the Open Access Policy to publicly funded research data, established by the Ministry of Science and Higher Education. Services and applications integrated into the KMD4EOSC infrastructure also address the requirements of the Ministry of Economy’s policy on AI in Poland, in particular those concerning the high-quality data needed to build advanced AI models based on scientific datasets generated in Poland.