Minisymposium Presentation
The FDB: Developments Supporting a Semantic Approach to Scientific Data Management
Presenter
Description
Data management plays a vital role in complex scientific workflows. As systems and workflows expand, become more heterogeneous, and integrate components across the HPC-cloud ecosystem, the data management challenges grow with them.
We introduce the FDB, a specialised object store for meteorological data developed in-house at ECMWF, along with its metadata-driven API and access semantics optimised for time-critical forecasting workflows. The FDB was initially developed to absorb Numerical Weather Prediction model output and to manage access to the global parallel filesystem in an HPC environment, but it has since grown into a larger, more general, multi-system data ecosystem.
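As a rough illustration of what metadata-driven access means in general, the following minimal Python sketch stores objects under key-value metadata and retrieves every object matching a partial request. It is a toy in-memory stand-in and not the FDB API: the class `ToyMetadataStore`, its `archive` and `retrieve` methods, and the example keys `param`, `step` and `levelist` are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# A canonical, hashable form of a metadata dictionary.
Key = Tuple[Tuple[str, str], ...]


class ToyMetadataStore:
    """Hypothetical in-memory store addressed by metadata (not the FDB API)."""

    def __init__(self) -> None:
        self._objects: Dict[Key, bytes] = {}

    @staticmethod
    def _canonical(metadata: Dict[str, str]) -> Key:
        # Sort the items so that key order in the caller's dict does not matter.
        return tuple(sorted(metadata.items()))

    def archive(self, metadata: Dict[str, str], data: bytes) -> None:
        # Objects are identified purely by their metadata, not by a path.
        self._objects[self._canonical(metadata)] = data

    def retrieve(self, request: Dict[str, str]) -> List[bytes]:
        # A request may be partial: return every object whose metadata
        # contains all of the requested key-value pairs.
        wanted = set(request.items())
        return [data for key, data in self._objects.items()
                if wanted.issubset(key)]


# Illustrative keys and values only.
store = ToyMetadataStore()
store.archive({"param": "t", "step": "0", "levelist": "500"}, b"field-a")
store.archive({"param": "t", "step": "6", "levelist": "500"}, b"field-b")
store.archive({"param": "u", "step": "0", "levelist": "500"}, b"field-c")

temperature_fields = store.retrieve({"param": "t"})
single_field = store.retrieve({"param": "u", "step": "0", "levelist": "500"})
```

The point of the sketch is only that callers describe the data they want rather than where it lives, which leaves the store free to choose the physical layout and backend.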
New developments in this ecosystem include a remote protocol that provides access between distinct HPC and cloud systems, and GRIBJump, a library integrated with the FDB that lets users directly and efficiently extract sub-features from large data objects (from a single data point to large multi-dimensional subdomains).
We are developing further backends for the FDB, using Fabric Attached Memory (FAM) to facilitate direct in-memory transfer of data between HPC and cloud partitions within the OpenCUBE system. We are also exploring the role of additional flexibility in the metadata language. We discuss how these developments support scientific workflows in HPC and cloud domains.