Practical: High-Performance Computing System Administration

High-Performance Computing System Administration is essential for managing HPC resources, not only as a user but also as a cluster administrator. As part of this practical course, you will take part in a hands-on one-week block course, which introduces the basics of Linux and of using HPC resources and then goes into depth on HPC system administration. At the end of the block course, you will choose a topic in the form of a tool related to HPC system administration, evaluate that tool, and hand in a report at the end of the semester. You will be assigned a supervisor who is an expert on the chosen tool and can guide you.

Key information

Contact: Julian Kunkel, Jonathan Decker
Location: Virtual (Main Room, Support Room)
Time: 16.10.23-20.10.23 (5-day block course)
Language: English
Module: M.Inf.1831: High-Performance Computing System Administration
SWS: 4
Credits: 6
Contact time: up to 84 hours (63 full hours), depending on the course
Independent study: up to 186 hours

Please note that we plan to record sessions (lectures and seminar talks) in order to provide the recordings to other students via BBB and to publish and link them on YouTube for future terms. If you appear in any of the recordings via voice, camera, or screen share, we need your consent to publish them. See also this Slide.

Required Prior Knowledge

Learning Objectives

Topics for Practical Works

Agenda

Block Seminar 16.10.23-20.10.23

This part is attended by BSc/MSc students and GWDG academy participants.

Note: The schedule lists breaks only for lecture slots. You can take a break during exercises as necessary. Preparation sheets: Preparation

Monday 16.10.2023

Tuesday 17.10.2023

Wednesday 18.10.2023

Thursday 19.10.2023

RzGö live hardware demonstration and hands-on session. If you are a remote participant, we ask that you revisit the previous material and prepare questions for the Q&A sessions.

On-site participation is limited to 20 participants.

Friday 20.10.2023

Student Project Work

Examination

The exam is conducted through a report. The report should cover the evaluation of the assigned tool. The report should describe:

The report should not exceed 15 pages (counting only raw text in the main part; the full report, including cover pages and appendix, may be longer). It is not sufficient to repeat the documentation of the tool in your own words.

We recommend using the LaTeX templates we provide here: https://hps.vi4io.org/teaching/ressources/start#templates

Examination Requirement

To be admitted to the examination, you have to show that you attended the majority of the sessions of the block course. As proof, please send us 1-2 pages of notes on the course. These can be the personal notes you took during the sessions; they do not need to be a formatted document and serve only to show that you attended the course. They do NOT need to be complete solutions to the exercises; a few sentences on your takeaways per section are enough.

If you joined the course late or had to miss out on some of the sessions, you can find the recordings on BBB and the materials on this web page. The exercises can be completed on a personal VM.

Topic Distribution

Student | Supervisor | Topic
Jakob Hampel | Stefanie Mühlhausen | Ticketing Systems: interfaces/performance/comparison
Joao Soares | Timon Vogt | Web Hosting Software Stacks: Supabase vs Pocketbase
Andre Buderus | Hauke Kirchner | Scalable software management and distribution for Python
Jakob Dieterle | Freja Nordsiek | File system management (NFSv4, Ceph, BeeGFS)
Qumeng Sun | Marcus Merz | Intrusion detection tools for HPC
Mohamed Basuony | Hendrik Nolte | Scalable software management and distribution for Python
Zilin Song | Timon Vogt | Resource Management with SLURM
Abdellah Omar Adolf | Marcus Merz | Monitoring System Performance
Michael Hubert Duah | Jaromir Nemecek | Image Management and network booting with Warewulf
Tim Dettmar | Sebastian Krey | HPC networking with libibverbs and libfabric (fallback: Managing cluster file systems in user space)
Frederik Hennecke | Zoya Masih | Berkeley Packet Filters
Mehmet Niyazi Kayi | Julian Rüger | Cluster-wide User/Group management (e.g. LDAP)
Surendhar Muthukumar | Freja Nordsiek | Managing cluster file systems in user space (GlusterFS, FUSE, SeaWeedFS)
Ashutosh Jaiswal | Narges Lux | Application and System Benchmarks
Pranay Bhatia | Jonathan Decker | Kubernetes for HPC
Sunny Jain | Lars Quentin | Scalable databases with e.g. Elasticsearch, Postgres
Lars Quentin | Marcus Merz | Prometheus Scalability Evaluation for HPC Monitoring
Chinaza Ogo Obiagazie | Julian Rüger | Cluster-wide User/Group management (e.g. LDAP)