ESiWACE2
The Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE2) is an H2020-funded project and the successor of the ESiWACE project.
Within the project, the research group is responsible for various contributions to the work packages, particularly WP4.
Please see the official web page of ESiWACE for further information.
Contact Dr. Julian Kunkel
People
Collaboration
Deutsches Klimarechenzentrum GmbH (coordinator)
Centre National de la Recherche Scientifique
European Centre for Medium-Range Weather Forecasts
Barcelona Supercomputing Center
Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V./ Max-Planck-Institut für Meteorologie
Sveriges meteorologiska och hydrologiska institut
Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique
National University of Ireland Galway (Irish Centre for High End Computing)
Met Office
Fondazione Centro Euro-Mediterraneo sui Cambiamenti Climatici
The University of Reading
Science and Technology Facilities Council
BULL SAS
Seagate Systems UK Limited
ETH Zürich
The University of Manchester
Netherlands eScience Center
Federal Office of Meteorology and Climatology
DataDirect Networks
Mercator Océan
Goals for the University of Reading
While we are involved in various work packages, the main focus lies on WP4, which will provide the necessary toolchain to handle data at pre-exascale and exascale, for single simulations and ensembles.
Specifically, we will
Support data reduction in ensembles and avoid unnecessary subsequent data manipulations by providing tools to carry out ensemble statistics “in-flight” and compress ensemble members on the way to storage (see the sketch after this list).
Provide tools to: a) transparently hide the complexity of multiple storage tiers from applications at runtime by developing middleware that lies between the familiar NetCDF interface and storage, and prototype commercially credible storage appliances which can appear at the backend of such middleware; and b) support manual migration of semantically important content between primary storage on disk, tape, and object stores, including appropriate user-space caching tools (thus allowing some portable data management within weather and climate workflows).
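The “in-flight” ensemble statistics can be illustrated with a minimal sketch (the class, field shapes, and member loop below are illustrative assumptions, not part of the ESiWACE2 tooling): a running mean and variance are folded in member by member using Welford's online algorithm, so that only aggregate statistics, plus optionally compressed members, need to travel to storage.

    import numpy as np

    class EnsembleStats:
        """Accumulate mean and variance over ensemble members one at a time
        (Welford's online algorithm), so the full ensemble never has to be
        gathered in memory or written out uncompressed."""

        def __init__(self, field_shape):
            self.n = 0
            self.mean = np.zeros(field_shape)
            self.m2 = np.zeros(field_shape)  # running sum of squared deviations

        def add_member(self, field):
            """Fold one ensemble member (a gridded field) into the statistics."""
            self.n += 1
            delta = field - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (field - self.mean)

        @property
        def variance(self):
            return self.m2 / (self.n - 1) if self.n > 1 else np.zeros_like(self.m2)

    # Hypothetical usage: ten members arrive one by one "in flight".
    stats = EnsembleStats(field_shape=(96, 192))  # e.g. a coarse lat/lon grid
    for member in range(10):
        field = np.random.default_rng(member).standard_normal((96, 192))
        stats.add_member(field)
    print(stats.mean.shape, float(stats.variance.mean()))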
Flexible Storage Layout for Earth-System Data
The work builds upon the prototypes developed in ESiWACE1.
architecture-d4.2.pdf: ESiWACE2 Architecture Milestone Document
ESDM builds upon a data model similar to NetCDF and utilizes a self-describing on-disk data format for storing structured data. We aim to deliver the NetCDF-integrated version by the end of the ESiWACE1 project. This version can then be used as a drop-in replacement for typical use cases without changing anything from the application perspective. While the current version relies on manual configuration by data-centre experts, the ultimate long-term goal is to employ machine learning to automate the decision making and reduce the burden on users and experts.
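To illustrate what "drop-in replacement" means from the application side, a typical NetCDF write is shown below using the Python netCDF4 bindings (the file name, dimensions, and variable are made-up examples). With the ESDM-integrated NetCDF library, code of this shape would stay unchanged while the middleware decides where and how the data are stored.

    from netCDF4 import Dataset
    import numpy as np

    # Ordinary NetCDF application code with no ESDM-specific calls.
    # Under the ESDM-backed NetCDF library, the same code would write through
    # the middleware instead of directly onto a single file system.
    with Dataset("temperature.nc", "w") as nc:  # hypothetical output file
        nc.createDimension("time", None)
        nc.createDimension("lat", 96)
        nc.createDimension("lon", 192)
        temp = nc.createVariable("temperature", "f4", ("time", "lat", "lon"))
        temp.units = "K"
        temp[0, :, :] = 273.15 + np.random.default_rng(0).standard_normal((96, 192))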
Here are some results achieved in the ESiWACE1 project. We ran our ESDM prototype on Mistral with larger numbers of processes. The results for running the benchmarks on 200 nodes with varying numbers of processes are shown in Figure 1. The figure shows the results for different numbers of processes per node (x-axis), considering ten timesteps of 300 GB of data each.
As the baseline for exploring the efficiency, we ran the IOR benchmark using optimal settings (i.e., large sequential I/O). The graphic shows two IOR results: storing file-per-process (fpp) on Lustre (ior-fpp), as this yielded better performance than the results for shared file access, and storing fpp on local storage (ior-fpptmp).
Mistral has two file systems (Lustre01 and Lustre02), and five configurations with ESDM were tested: storing data only in Lustre02, settings where data are stored on both Lustre file systems concurrently (both), and environments with in-memory storage (local tmpfs). We also explored whether fragmenting data into 100 MB or 500 MB files is beneficial (the large configurations). Note that the performance achieved on a single file system is slightly faster than the best-case performance achieved with optimal settings in the benchmarks. We conclude that the fragmentation into chunks accelerates the benchmark.
By utilizing the two file systems together, effectively resembling a heterogeneous environment, we can improve the performance from 150 GB/s to 200 GB/s (133% of a single file system). While this was only benchmark testing, it shows that we are able to exploit the performance available across heterogeneous storage.
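To make the fragmentation and multi-file-system idea concrete, here is a toy sketch of the planning step (the paths, fragment naming, and round-robin policy are illustrative assumptions, not the ESDM implementation): one 300 GB timestep is split into fixed-size fragments, and the fragments are distributed across the available backends.

    import os

    def plan_fragments(total_bytes, fragment_bytes, backends):
        """Toy planner: split one write into fixed-size fragments and assign
        them round-robin across the available storage backends (for example
        two Lustre file systems, or Lustre plus node-local tmpfs)."""
        n_fragments = -(-total_bytes // fragment_bytes)  # ceiling division
        plan = []
        for i in range(n_fragments):
            size = min(fragment_bytes, total_bytes - i * fragment_bytes)
            backend = backends[i % len(backends)]
            plan.append((os.path.join(backend, f"fragment_{i:06d}"), size))
        return plan

    # One 300 GB timestep split into 500 MB fragments over two file systems.
    backends = ["/mnt/lustre01/esdm", "/mnt/lustre02/esdm"]  # illustrative paths
    plan = plan_fragments(300 * 10**9, 500 * 10**6, backends)
    print(len(plan), "fragments; first:", plan[0])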
Publications
Toward Understanding I/O Behavior in HPC Workflows (Jakob Lüttgau, Shane Snyder, Philip Carns, Justin M. Wozniak, Julian Kunkel, Thomas Ludwig), 2019-02-11
BibTeX DOI
Beating data bottlenecks in weather and climate science (Bryan N. Lawrence, Julian Kunkel, Jonathan Churchill, Neil Massey, Philip Kershaw, Matt Pritchard), 2019-01-25
BibTeX URL PDF
Cost and Performance Modeling for Earth System Data Management and Beyond (Jakob Lüttgau, Julian Kunkel), 2019-01-25
BibTeX DOI PDF
Poster:
Modeling and Simulation of Tape Libraries for Hierarchical Storage Systems (Jakob Lüttgau, Julian Kunkel), 2016-11-15
BibTeX URL PDF
Talks
Potential of I/O-Aware Workflows in Climate and Weather (Dr. Julian Kunkel), Supercomputing Frontiers Europe, Virtual/Warsaw, Poland, 2020-03-25
Presentation
Challenges and Approaches for Extreme Data Processing (Dr. Julian Kunkel), EPSRC Centre for Doctoral Training Mathematics of Planet Earth, University of Reading, Reading, UK, 2020-03-11
Presentation
Smarter Management using Metadata and Workflow Expertise (Dr. Julian Kunkel), BoF: Knowledge Is Power: Unleashing the Potential of Your Archives Through Metadata, Supercomputing, Denver, USA, 2019-11-21
Presentation
High-Level Workflows – Potential for Innovation? The NGI Initiative and More (Dr. Julian Kunkel), BoF: Data-Centric I/O, ISC HPC, Frankfurt, Germany, 2019-06-19
Presentation
Fighting the Data Deluge with Data-Centric Middleware (Dr. Julian Kunkel), PASC Minisymposium: The Exabyte Data Challenge, Zürich, Switzerland, 2019-06-14
Presentation
Opportunities for Integrating I/O Capabilities with Cylc (Dr. Julian Kunkel), Cylc Weather User Group; Supercomputing 2018, Dallas, USA, 2018-11-15
Presentation
Status of WP4: Exploitability (Dr. Julian Kunkel, Bryan N. Lawrence, Jakob Lüttgau, Neil Massey, Alessandro Danca, Sandro Fiore, Huang Hu), ESiWACE review meeting, DKRZ, Hamburg, Germany, 2018-11-06
Presentation
Overcoming Storage Issues of Earth-System Data with Intelligent Storage Systems (Dr. Julian Kunkel), 18th Workshop on High Performance Computing in Meteorology, ECMWF, Reading, UK, 2018-09-27
Presentation