Practical: High-Performance Computing System Administration

We are still finalizing this course; information on this page is subject to change.

High-Performance Computing System Administration is essential for managing HPC resources, not only as a user but also as a cluster administrator. As part of this practical course, you receive an introduction to the basics of Linux and to using HPC resources in two sessions. At the end of these sessions, you will be assigned a topic, namely a tool related to HPC system administration, which you will test and evaluate. After the end of the term, a one-week block course will take place that goes into more depth on HPC system administration. At the end of the semester, you will hand in a report describing your evaluation of your assigned topic.

Contact Julian Kunkel, Jonathan Decker
Location Virtual Support Room
Time 26.10.22 14:15-17:45, 02.11.22 14:15-17:45, 20-24.02.23 5-day block course
Language English
Module M.Inf.1831: High-Performance Computing System Administration
Credits 5,6(,9) (depending on the course)
Contact time up to 84 hours (63 full hours), depending on the course
Independent study up to 186 hours

Please note that we plan to record sessions (lectures and seminar talks), both to provide the recordings to other students via BBB and to publish and link them on YouTube for future terms. If you appear in any of the recordings via voice, camera, or screen share, we need your consent to publish them. See also this slide.

  • No prior skills or knowledge are required
  • Understanding Linux basics, having used Linux before, and being able to operate a Bash shell are beneficial
  • We will provide a short crash course at the beginning of the course and link supplementary training material
  • Discuss theoretical facts related to networking, compute and storage resources
  • Integrate cluster hardware consisting of multiple compute and storage nodes into a “supercomputer”
  • Configure system services that allow the efficient management of the cluster hardware and software, including network services such as DHCP, DNS, NFS, IPMI, SSHD
  • Install software and provide it to multiple users
  • Compile end-user applications and execute them on multiple nodes
  • Analyze system and application performance using benchmarks and tools
  • Formulate security policies and good practice for administrators
  • Apply tools for hardening the system such as firewalls and intrusion detection
  • Describe and document the system configuration
  • Intrusion detection tools for HPC
  • Encryption tools
  • Image Management and network booting with Warewulf
  • Software Management with modules/spack
  • Resource Management with SLURM
  • Managing object storage
  • Managing cluster file systems in user space (GlusterFS, FUSE, SeaweedFS)
  • File system management (NFSv4, Ceph, BeeGFS)
  • Performance analysis tools
  • Monitoring system performance
  • Application and system benchmarks
  • Virtualization tools for HPC (e.g., Charliecloud, Singularity, Shifter)
  • Scalable databases (e.g., Elasticsearch, Postgres)
  • Kernel compilation and configuration
  • Security infrastructures and intrusion systems
  • Deep packet analysis and filtering
  • Berkeley Packet Filters (eBPF)
  • Firewalls
  • Kernel splicing
  • Scalable software management and distribution for Python
  • Forensic tools
  • Cluster-wide user/group management (e.g., LDAP)
  • Scalable logging and log-file analysis
  • 26.10.22 14:15 - 17:45
    • 14:15 Welcome/Structure of the Course – Julian Kunkel (slides)
      • Forming support groups
    • 14:30 Linux Crash Course – Jonathan Decker (preparation exercise sheet, slides, exercise sheet)
      • Command line
      • Some basic commands
      • Remote access to the Scientific Compute Cluster
    • 16:00 break
    • 16:15 Linux Exercise – Jonathan Decker
    • 16:45 First steps running applications on the cluster using Slurm – Ruben Kellner (slides)
      • Running applications on multiple nodes using srun
      • Getting an overview of the available hardware (documentation, sinfo)
      • Outlook on running a parallel program and measuring different types of applications
    • 17:15 SLURM Exercise (exercise)
    • 17:30 Exercise - Homework – Jonathan Decker (homework, primes.c)
      • Virtual Linux machine setup
      • Assessing the performance of running applications
  • 02.11.22 14:15 - 17:45
    • 14:15 Homework discussion – Jonathan Decker
    • 14:35 Introduction to Git – Christian Köhler (slides)
    • 15:20 break
    • 15:30 Compilation of applications via cmake, Autotools, make – Trevor Khwam (slides)
      • Exercise for cmake, Autotools, make (exercise)
    • 16:10 Software management with Spack – Trevor Khwam (slides)
    • 16:30 break
    • 16:45 Running containers with Singularity – Azat Khuziyakhmetov (slides)
    • 17:15 Assignment information and topics – Julian Kunkel, Jonathan Decker (slides)
  • You work on your topic, with occasional meetings with your supervisor.
    • We encourage you to collaborate in teams on your independent topics.
  • 20-24.02.23 4.5-day block course 9:00 - 18:00
    • Schedule to be announced
  • 31.03.23 Deadline for the submission of the report
  • Presentation of the project results
    • Forensic Tools – Dominik Mann
    • Scalable logging and log-file analysis – Linus Weber
    • Encryption Tools – Sonal Lakhotia

The exam takes the form of a report covering your evaluation of the assigned tool. The report should describe:

  • What the tool is, what it is used for
  • How the tool was set up
  • How you evaluated it
  • The results of your evaluation
  • Discussion of problems and potential of the tool
  • Conclusion

We recommend using the LaTeX templates provided by us here.

The topics were assigned as follows:

  • Encryption Tools – Sonal Lakhotia (Supervisor: Hendrik Nolte)
  • Encryption Tools – Julius Sieg (Supervisor: Hendrik Nolte)
  • Forensic Tools – Dominik Mann (Supervisor: Artur Wachtel)
  • Security infrastructures and intrusion systems – Matthias Mildenberger (Supervisor: Artur Wachtel)
  • Scalable logging and log-file analysis – Linus Weber (Supervisor: Christoph Hottenroth)
  • Scalable databases (e.g., Elasticsearch, Postgres) – Jakob Schmitz (Supervisor: Zoya Masih)
  • Virtualization tools for HPC (e.g., Charliecloud, Singularity, Shifter) – Winfired Oed (Supervisor: Azat Khuziyakhmetov)
  • Virtualization tools for HPC (e.g., Charliecloud, Singularity, Shifter) – Frederik Hennecke (Supervisor: Azat Khuziyakhmetov)
  • Resource Management with SLURM – Aaron Kurda (Supervisor: Vanessa End)
  • Resource Management with SLURM – David Nelles (Supervisor: Vanessa End)
  • Application and system benchmarks – Silin Zhao (Supervisor: Marcus Merz)
  • Application and system benchmarks – Johannes Richter (Supervisor: Marcus Merz)
  • Evaluation of Time-Series Databases – Lars Quentin (Supervisor: Marcus Merz)
  • Monitoring System Performance – Lukas Steinegger (Supervisor: Marcus Merz)
  • Managing cluster file systems in user space – Tim Dettmar (Supervisor: Sebastian Krey)
  • Performance analysis tools – Nicolas Alqas Alyas (Supervisor: Jack Ogaja)
  • Performance analysis/measurements with Cassandra and HBase – Abdul Rafay (Supervisor: Julian Kunkel)
  • Last modified: 2022-11-15 11:13
  • by Jonathan Decker