Summer School on Effective HPC for Climate and Weather
News: April 15th: The registration for the 2021 Summer School on Effective HPC for Climate and Weather is now open!
Aim and Scope
Due to COVID-19, we will organise the summer school as a virtual event again.
Making effective use of HPC environments is becoming increasingly challenging for PhD students and young researchers. As their primary intent is to generate insight, they often struggle with the technical nature of the tools and environments that enable their computer-aided research: computation, integration, and analysis of relevant data.
The scope of the summer school is the training of young researchers and software engineers in methods, tools, and theoretical knowledge to make effective use of HPC environments and generate insights.
Date | 23-27 August 2021
Venue | Virtual event
Contact | Julian Kunkel (University of Göttingen, GWDG)
Communication | Mailing List
While the school aims to prepare the attendees for large-scale simulation runs and data processing, it also covers a representative selection of modern concepts such as machine learning, domain-specific languages, containerisation, and analysis of climate/weather data using Python.
We will also provide an outlook on challenges and strategies for HPC in climate and weather. Additionally, we aim to foster networking among scientists by bringing together users of specific models and tools and enabling them to exchange their knowledge.
A certificate of attendance will be provided to the attendees.
The summer school will also support the mission of the European Network for Earth System Modelling (ENES).
The ESiWACE project funds this summer school.
Summer School Programme
The ESiWACE Summer School is structured into topical sessions in the morning and afternoon.
A topical session typically consists of an academic lecture and may include hands-on/lab practicals, group work, and discussion. Experts in the respective field will organise each of these sessions.
The hands-on/lab practicals are pre-recorded and provide an introduction/walk-through of the topic. Dedicated slots allow independent, self-paced learning, and participants can decide which practicals to engage with and when. Additional support will be provided via the mailing list, where participants may post questions and cooperate with other students and the organisers. The Q&A session on Friday will offer participants the opportunity to contact the organisers of each hands-on session and ask questions about the pre-recorded material.
The hands-on tutorials/lab practicals work as follows:
- A video tutorial is pre-recorded and will provide an introduction/walk-through to the topic.
- Tutorials are scheduled for each session; however, whether you use them is up to you.
- We provide a tutorial on setting up the Virtual Machine (VM). The VM runs Ubuntu and comes with all the software for the training pre-installed. Attendees may install the VM on their PC to perform most of the training; some training may take place on a dedicated cluster. Please set up the VM before the summer school if you would like to participate in the tutorial sessions.
- A lab practical may list additional exercises and suggestions for further learning.
- At the end of each day, a Virtual Lab Session slot allows independent, self-paced learning; participants can decide which material to engage with and when.
- On Friday, a Q&A slot is scheduled for all lab practicals.
The session will offer participants the opportunity to contact the organisers of each hands-on session and ask questions about the topic, particularly about the tutorial and exercises.
- Additional support will be provided via the mailing list, where participants may post questions and cooperate with other students and the organisers.
Detailed information will be announced to registered attendees at the beginning of August.
Topics
The topics covered in the summer school are as follows:
Extreme-Scale Computation
This session will introduce the concept of extreme-scale computing with an explanation of the trends in the computer architectures that provide the underlying computing power. In particular, the increasing use of parallelism and heterogeneity in these architectures will be discussed.
A high-level overview will then be given of the performance, portability and productivity (3P's) requirements that weather and climate models must meet to run successfully on these computer architectures. It will be shown how current approaches can struggle to meet all three of these requirements.
Lastly, a relatively new approach to programming weather and climate models, based on Domain-Specific Languages (DSLs), will be introduced with examples from two existing DSLs: DAWN and PSyclone. It will be shown that the DSL approach makes it possible to support all three of the above requirements by separating the implementation of the science code from its parallelisation and optimisation on the underlying computer architecture.
Learning Objectives
- Illustrate the complexity and diversity of extreme-scale computing using examples from climate and weather
- State the Performance, Portability and Productivity requirements of Weather and Climate models (3P's)
- Describe how Domain-Specific Languages (DSLs) can help meet all three requirements (3P's)
- Use the PSyclone and GridTools DSLs for small applications
Parallel Programming in Practice
In this session, we will provide an overview of how the main concepts of parallel programming are implemented in weather and climate codes. We will detail the different parallel programming models for distributed and shared memory systems and describe the resulting scalability of commonly used algorithms implementing those models. Particular attention will be devoted to specific features that may inhibit the scaling and performance of weather and climate codes. This analysis will be done at the level of individual code routines and also in the broader context of code coupling, which is a specific form of coarse-grained parallelism.
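As a generic illustration of how a non-parallelisable part of a code inhibits scaling, the sketch below evaluates Amdahl's law for strong scaling. This is a standard textbook model, assumed here for illustration; it is not necessarily how the session presents scalability. The 1% serial fraction is a hypothetical value.

```python
"""Amdahl's law: an upper bound on strong-scaling speedup (illustrative sketch)."""


def amdahl_speedup(serial_fraction: float, processes: int) -> float:
    """Speedup on `processes` workers when `serial_fraction` of the runtime cannot be parallelised."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if processes < 1:
        raise ValueError("processes must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processes)


def parallel_efficiency(serial_fraction: float, processes: int) -> float:
    """Speedup divided by the number of processes (1.0 means perfect scaling)."""
    return amdahl_speedup(serial_fraction, processes) / processes


if __name__ == "__main__":
    # Hypothetical code in which 1% of the runtime (e.g. serial I/O) is not parallelised.
    for p in (1, 16, 256, 4096):
        s = amdahl_speedup(0.01, p)
        e = parallel_efficiency(0.01, p)
        print(f"{p:5d} processes: speedup {s:6.1f}, efficiency {e:5.1%}")
```

Under this model, a serial fraction of 1% caps the speedup below 100 no matter how many processes are used, which is why even small serial sections matter at extreme scale.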
Learning Objectives
- Describe the scaling characteristics of commonly used algorithms in weather and climate models
- Discuss issues that may inhibit scaling and performance
- Classify programming models for distributed and shared memory systems
- Identify performance features and potential issues of processor architectures
- Describe the concepts of coupling software
- Classify coupling software implementations given their main characteristics
- Evaluate qualitatively the impact of different coupling configurations (sequential vs concurrent, multi vs mono-executable, …) on coupled model performance
- Describe the most widely used coupling software in climate and weather applications
Modern Storage
Learning Objectives
- Describe modern storage architectures, including object stores, suitable for extreme-scale computing, and their architectural implications
- Discuss the storage stack, its semantics and potential performance implications at different levels, in particular POSIX vs. MPI-IO vs. NetCDF and high-level I/O middleware
- Run the Darshan tool to identify I/O patterns and assess performance
- Apply benchmarking tools to assess storage performance
Input/Output and Middleware
Climate and weather research is typically data-intensive, and applications must use input/output efficiently. Often, users struggle to assess the observed performance, which leads to superfluous attempts to tune the application and optimise performance in the wrong layer of the stack. The content of this session is twofold. Firstly, we discuss storage layers, focusing on the NetCDF middleware, and provide a performance model that helps users identify inefficient I/O. Secondly, we introduce the NetCDF Climate and Forecast (CF) conventions, which are often used as a standard to exchange data.
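To illustrate the idea of using a performance model to spot inefficient I/O, the sketch below uses a simple latency-plus-bandwidth model: every operation pays a fixed latency plus a transfer time proportional to its size. This generic model, its parameter values and the helper names (`predicted_io_time`, `is_inefficient`) are assumptions for illustration, not the model presented in the session.

```python
"""Minimal latency/bandwidth I/O performance model (illustrative sketch).

Assumption: each I/O operation costs a fixed latency plus size / bandwidth.
"""


def predicted_io_time(num_ops: int, bytes_per_op: float,
                      latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Expected time for `num_ops` operations of `bytes_per_op` bytes each."""
    return num_ops * (latency_s + bytes_per_op / bandwidth_bytes_per_s)


def io_efficiency(observed_s: float, predicted_s: float) -> float:
    """Ratio of modelled to observed time: 1.0 means the model is met."""
    return predicted_s / observed_s


def is_inefficient(observed_s: float, predicted_s: float,
                   threshold: float = 0.5) -> bool:
    """Flag runs that achieve less than `threshold` of the modelled performance."""
    return io_efficiency(observed_s, predicted_s) < threshold


if __name__ == "__main__":
    # Hypothetical storage system: 1 ms latency per operation, 1 GB/s bandwidth.
    LATENCY = 1e-3
    BANDWIDTH = 1e9
    TOTAL_BYTES = 1e9  # write 1 GB in total

    # The same data volume written with many small vs. few large operations.
    small = predicted_io_time(250_000, TOTAL_BYTES / 250_000, LATENCY, BANDWIDTH)
    large = predicted_io_time(100, TOTAL_BYTES / 100, LATENCY, BANDWIDTH)
    print(f"4 kB ops:  {small:.1f} s")
    print(f"10 MB ops: {large:.1f} s")
```

With these assumed parameters, writing 1 GB takes about 251 s in 4 kB operations but about 1.1 s in 10 MB operations. A measured run that is much slower than the model predicts points to a problem worth investigating, for example in the layer that issues the requests.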
Learning Objectives
- Identify typical I/O performance issues and their causes
- Apply performance models to assess and optimise I/O performance
- Design a data model for NetCDF/CF
- Analyse, manipulate and visualise NetCDF data
- Execute programs in C and Python that read and write NetCDF files in a metadata-aware manner
Machine Learning
(1) Predicting weather and climate requires modelling the Earth System, a huge system that consists of many individual components that show chaotic behaviour and for which conventional tools often struggle to provide satisfying results. (2) A huge amount of data on the Earth System is available from both observations and modelling. (3) Machine learning methods can learn complex non-linear behaviour from data if enough data is available, and the trained models can be applied efficiently on modern supercomputers. Combining (1), (2) and (3), it is easy to see that there is a large number of potential application areas for machine learning in weather and climate science that are currently being explored. However, whether these approaches will succeed is still unclear, as there are also a number of challenges for the application of machine learning tools in weather prediction. This talk will provide an introduction to machine learning, outline how to apply machine learning in Earth System modelling, show examples of the application of machine learning throughout the weather and climate modelling workflow, and discuss the challenges that will need to be tackled.
Learning Objectives
- Describe the relevance of machine learning and its applications, and judge why there is currently so much hype around the topic
- Explore how machine learning can be used in weather and climate modelling
- List a number of specific examples for the use of machine learning at ECMWF
- Discuss challenges for machine learning in weather and climate science
ECMWF - Virtual Visit
- Computer Hall Tour
Learn about the performance and specifications of the ECMWF High-Performance Computing Facilities, and the way this supercomputer is used for operations, storage and research by ECMWF and its 34 Member & Co-operating States. The presentation will include a video tour of the computing facilities currently located in our HQ in Reading and a preview of what the new data centre in Bologna (Italy) will look like when it opens in September 2021.
- Weather Room Tour
Learn about ECMWF Forecasting products and activities. A member of the ECMWF Forecasting team will introduce you to the maps, charts and plots that are produced daily in the “Weather Room” for weather prediction and analysis.
High-Performance Data Analytics and Visualisation
Analysis and visualisation of scientific data, such as those in the field of climate and weather, require solutions capable of handling massive data effectively and efficiently. In this session, we will discuss some of the main challenges concerning scientific data management, in particular those related to data analytics and visualisation. Software solutions for high-performance data analytics and visualisation will be presented, as well as examples of applications of these systems to real use cases in the climate and weather domain. The lab tutorial will provide a more practical introduction to some tools and modules for data analysis and how to apply them to climate data, as well as a walk-through of the VMI for the virtual lab.
Learning Objectives
- Discuss the main challenges of joining big data and HPC for scientific data management, in particular for data analytics and visualisation
- Put into action practical hints on selected HPDA tools and their application to scientific data at scale
- Apply techniques and knowledge acquired during the course to real case studies in the weather and climate domain
Performance Analysis
Learning Objectives
- Define performance analysis fundamentals (objectives, methods, metrics, hardware counters, etc.)
- Describe the BSC performance analysis tools suite (Extrae, Paraver, Dimemas)
- Interpret use cases from Earth System Models (IFS, NEMO, etc.) that illustrate how to identify and solve performance issues
- Apply profiling techniques to identify performance bottlenecks in your code
- Summarise typical performance problems
- Discuss performance analysis as applied specifically to Earth system modelling
Containers
This session will present an introduction to an end-to-end scientific computing workflow using Docker containers. Attendees will learn about the fundamentals of containerisation and the advantages it brings to scientific software. Participants will then familiarise themselves with Docker technologies and tools, discovering how to manage and run containers on personal computers and how to build applications of increasing complexity into portable container images. Particular emphasis will be given to software resources that enable highly efficient scientific applications, such as MPI libraries and the CUDA Toolkit. The final part of the lecture will briefly introduce HPC-focused tools capable of deploying containers on high-end computing systems starting from Docker images.
Learning Objectives
- Describe the difference between a container and a virtual machine
- Explain the relationship between a container and a container image
- Outline the basic workflow for the distribution of an image
- List advantages of using containers for scientific applications
- Write a Dockerfile
- Build a container image using Docker
- Run containers on personal computers using Docker
- Perform basic management of Docker containers and images
- Explain the motivations which drove the creation of HPC-focused container solutions
Agenda
We are currently preparing the agenda. The following shows the general (tentative) structure of the agenda.
The video playlist is available.
All times are in CEST.
Monday - Computing
- 09:00 Extreme-Scale Computation – Chair: Rupert Ford (STFC, UK), Carlos Osuna (MeteoSwiss, Switzerland) – Video
- 12:30 Virtual Lunch Break
- 13:30 Parallel Programming in Practice – Chair: Sophie Valcke (Cerfacs, France), Christopher Maynard (University of Reading, UK)
Tuesday - Storage
- 09:00 Modern Storage – Chair: Sai Narasimhamurthy (Seagate, UK), Jean-Thomas Acquaviva (DDN, France)
- 12:30 Virtual Lunch Break
- 13:30 Middleware and File Formats – Chair: Julian Kunkel, Sadie Bartholomew (University of Reading, UK)
- 15:00 Virtual Refreshment Break
- 15:30 Lab Session
- 16:15 CF-NetCDF with cfdm, cf-python and cf-plot – Sadie Bartholomew – Video
- 17:00 Session ends
Wednesday - Data Analytics
- 09:30 Machine Learning – Chair: Peter Dueben (ECMWF, UK)
- 09:30 Morning Academic Session – Peter Dueben (ECMWF, UK) Video
- This is a pre-recorded video; Peter will join for the Q&A session at the end
- 10:30 Virtual Refreshment Break
- 11:00 ECMWF - Virtual Visit
- Overview of ECMWF – Sam Hatfield (30 min) – Slides – Video Part 1, Video Part 2
ECMWF is an intergovernmental organisation supported by 34 European nations. Learn about its role within Europe and beyond.
- Computer Hall tour – Jenny Rourke (30 min) – Video
Learn about the performance and specifications of the ECMWF High-Performance Computing Facilities, and the way this supercomputer is used for operations, storage and research by ECMWF and its 34 Member & Co-operating States. The presentation will include a video tour of the computing facilities currently located in our HQ in Reading and a preview of our new data centre due to open in Bologna (Italy) in September 2021.
- 12:30 Virtual Lunch Break
- 13:30 High-Performance Data Analytics and Visualisation – Chair: Donatello Elia (CMCC, Italy), Niklas Röber (DKRZ, Germany)
Thursday - Supporting Tools
- 09:00 Performance Analysis – Chair: Kim Seradell, Mario Acosta (BSC, Spain)
- 12:30 Virtual Lunch Break
- 13:30 Containers – Chair: Alberto Madonna (ETH Zürich, Switzerland), Simon Wilson (NCAS, UK)
- 13:30 Afternoon Academic Session
- 15:00 Virtual Refreshment Break
- 17:00 Session ends
Friday - Keynote and conclusions
- 10:00 Project FORESTCARE: An HPC application case study for a high spatial and temporal resolution forest vitality assessment - Sebastian Paczkowski (Georg-August-Universität Göttingen) – Video
The project develops a data-intensive, semi-automated forest vitality assessment procedure. A workflow consisting of forest ground data assessment, drone and satellite data acquisition, normalised data array generation, and HPC-enabled deep CNN computation will enable a correlation between ground labels and tree crown features that allows the identification of single trees with decreased vitality. Thereby, the increasing stress factors in forests, which are mainly associated with climate change (e.g. drought, bark beetle infestation), can be monitored continuously. The resulting high spatial and temporal resolution view of forest ecosystem dynamics can enable improved, climate-change-adapted forest management, e.g. the selection of drought-resistant tree species depending on the specific growth conditions of forest regions.
- 11:00 Joint lab session – all lab leaders are there to answer any questions regarding the labs
- 12:00 End of the Summer School Programme
Attendance
Attending the virtual summer school is free; however, registration is required!
Important Dates
- June - Final summer school programme is released
- 31 July - Registration deadline for the summer school
- 23-27 August 2021 – Summer School on Effective HPC for Climate and Weather
- Mid-September – Selected presentations will be made available, and certificates will be sent to all attendees
Registration
The registration is now closed.
Target Audience
The target audience for the summer school is Earth system scientists, including PhD students and young researchers, and software engineers in the domain. While each of our topics will be introduced, the attendees should have a basic understanding of:
- Python
- Linux
- The general computational aspects of a climate/weather model
For attendees without prior experience, the following links provide some references to cover significant aspects of the contents mentioned above.
Organisation
Organisers
- Julian Kunkel (University of Göttingen / GWDG), julian.kunkel@gwdg.de
- Jack Ogaja (GWDG), jack.ogaja@gwdg.de
Programme Committee
- Julian Kunkel (University of Reading)
- Sophie Valcke (Cerfacs)
ESiWACE2
The Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE) addresses global challenges that push the limits of science. It benefits the broader community by providing services and training opportunities.
As part of the ESiWACE2 project, we are organising this summer school to bridge the gap between scientists and computational science and to increase the effectiveness of young scientists. The main goal of this event is to train representative scientists from different institutions in state-of-the-art concepts that are tailored to the domain but also extend beyond climate and weather, allowing them to act as multipliers and increase productivity overall.
ESiWACE is funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823988.