Minisymposium: The Exabyte Data Challenge

Various data-intensive scientific domains must deal with Exabytes of data before they reach the Exaflop. Data management at these extreme scales is challenging and covers pre-processing, data production, and data analysis workflows. While many research approaches and scientific databases aim to manage data and have pushed their limits over time, practitioners still struggle to manage their data in the Petabyte era, for instance to achieve high performance and to provide means to easily locate data upon request. With billions of files, manual and fine-grained data management in HPC environments reaches its scalability limits. Various domain-specific solutions have been developed that mitigate performance and management issues, enabling data management in the Petabyte era. However, due to new storage technologies and heterogeneous environments, the challenges increase and so does the development effort for individual solutions.

In this minisymposium, speakers from environmental science (Met Office and ECMWF), CERN, and the Square Kilometre Array will address this matter for their respective domains. Each speaker will present the challenges faced in their scientific domain today, give an outlook for the future, and present state-of-the-art approaches the community follows to mitigate the data deluge.

This minisymposium is organized as part of the PASC official schedule.

Date Friday, June 14th, 2019
Venue HG D 1.1
Contact Dr. Julian Kunkel

  • 13:30 SKA - Handling 0.2 EB/sec Bandwidth (Peter Braam)
    Analysis of the SKA radio telescope imaging algorithms resulted in a requirement of 200 PB/sec memory bandwidth, which surprisingly may have come within reach due to recent developments. We will give an overview of the data flow through the SKA system, and transition into new technology areas that may be helpful, with emphasis on data properties, precision, and co-processors.
  • 14:00 The CERN Tape Archive: Preparing for the Exabyte Storage Era (Michael Davis)
  • 14:30 The Met Office Cold Storage Future: Tape or Cloud? (Richard Lawrence)
  • 14:30 ECMWF's Extreme Data Challenges Towards an Exascale Weather Forecasting System (Tiago Quintino, Simon Smart, James Hawkes, Baudouin Raoult)
    ECMWF's operational weather forecast generates massive I/O in short bursts, currently approaching 100 TiB per day, in two hour-long windows. From this output, millions of user-defined daily products are generated and disseminated to member states and commercial clients all over the world. As ECMWF aims to achieve Exascale NWP by 2025, we expect to handle around 1 PiB of model data per day and generate hundreds of millions of daily products. This poses a strong challenge to a complex workflow that is already facing I/O bottlenecks. To help tackle this challenge, ECMWF is developing multiple solutions and changes to its workflows, and incrementally bringing them into operations. For example, it has developed a high-performance distributed object store that manages the model output for the needs of our NWP and climate simulations, making data available via scientifically meaningful requests that integrate seamlessly with the rest of the operational workflow. We will present how ECMWF is leveraging this and other technologies to address current performance issues in our operations, while at the same time preparing for technology changes in the hardware and system landscape and the convergence between HPC and Cloud provisioning.
  • 15:30 End

