Open Theses

The theses offered below are intended as MSc theses but can also be reduced in scope and handed out as BSc theses.

Benchmarking of VAST protocol flavors for ML workloads

A VAST storage system will be installed as part of the new KISSKI data center. VAST storage systems offer different protocol flavors to access the storage backend, i.e. NFS, S3, SMB, and mixed. Since projects at the new data center should be executed efficiently, it is important to gain insight into the potential performance of machine learning workloads. The proposed thesis will fill this gap and provide recommendations for future projects.
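As a first orientation, a minimal throughput micro-benchmark of this kind can be run against each mount point exposed by a protocol flavor. The script below is a simplified sketch using only the Python standard library; the file size and the temp-directory target are placeholders, and serious measurements would rather use an established tool such as IOR or fio against the actual VAST mounts.

```python
import os, time, tempfile

def measure_throughput(path, size_mb=64, block_kb=1024):
    """Write, then re-read a test file; return (write_MBps, read_MBps)."""
    block = os.urandom(block_kb * 1024)
    n_blocks = size_mb * 1024 // block_kb

    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())          # force data out of the page cache
    write_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):
            pass
    read_s = time.perf_counter() - t0

    os.remove(path)
    return size_mb / write_s, size_mb / read_s

# Example run against the local temp directory (substitute an NFS or S3-FUSE mount)
w, r = measure_throughput(os.path.join(tempfile.gettempdir(), "bench.dat"), size_mb=8)
print(f"write: {w:.1f} MB/s, read: {r:.1f} MB/s")
```

Note that ML workloads are often dominated by many small, random reads (dataloader access patterns) rather than the sequential streaming measured here, so a full evaluation would cover both.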

Prototype for GPU computing in LIGGGHTS

LIGGGHTS is a common code used for the simulation of macroscopic particles. It is based on the well-known molecular dynamics code LAMMPS. The variant used within the thesis is the academic fork LIGGGHTS-PFM, which is under active development. Since LAMMPS already has some modules for GPU processing, the goal of the thesis is to modify LIGGGHTS-PFM to make use of these capabilities. It is expected that the enhancement will improve the run-time performance and pave the road to particle simulations on GPUs.

Implementation of a preCICE Adapter for LIGGGHTS

preCICE, as already presented at the GöHPCoffee, is a multiphysics framework which allows the combination of various simulation codes to perform coupled simulations. These include coupled thermal problems as well as topics related to fluid-structure interaction. So far, there is no possibility to perform a coupled particle simulation using preCICE, since the only particle solver is not publicly available. The aim of this thesis is to mitigate this limitation by implementing a preCICE adapter for the particle solver LIGGGHTS-PFM. In addition, the thesis will compare the achievable performance with other coupling libraries using LIGGGHTS and its derivatives.

Concept for a real-time drilling simulator using Python

Although Python is not the obvious choice for the implementation of real-time simulations, it is an appropriate choice for the creation of an easily modifiable code aimed at a deeper understanding of the drilling process. The proposed thesis topic is based on previous work of the supervisor, which resulted in a number of models describing the drilling process. As a next step, the successful applicant will create a concept to couple these models to provide a more comprehensive description of the drilling process. Within this scope, different methods for the coupling of the various models will be implemented and evaluated, with the goal of providing a working first prototype.
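To illustrate the kind of coupling concept meant here, the sketch below couples two deliberately simple toy sub-models (a torsional drill-string model and a rate-of-penetration model) in an explicit, staggered time-stepping scheme. Both equations and all parameters are invented for illustration and are not taken from the supervisor's previous work.

```python
# Explicit (staggered) coupling of two toy sub-models; the real drilling
# models would replace rotation_step and penetration_step.
def rotation_step(omega, torque, dt, inertia=50.0, damping=2.0):
    """Angular velocity of the drill string under applied torque (toy model)."""
    return omega + dt * (torque - damping * omega) / inertia

def penetration_step(depth, omega, wob, dt, k=1e-4):
    """Rate of penetration proportional to bit speed times weight on bit (toy model)."""
    return depth + dt * k * omega * wob

def simulate(t_end=10.0, dt=0.01, torque=100.0, wob=5000.0):
    omega, depth = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        omega = rotation_step(omega, torque, dt)          # sub-model 1
        depth = penetration_step(depth, omega, wob, dt)   # sub-model 2 uses the new omega
    return omega, depth

omega, depth = simulate()
print(f"bit speed {omega:.2f} rad/s, depth {depth:.2f} m")
```

A staggered scheme like this is only one option; the thesis would also consider e.g. iterative (implicit) coupling and compare stability and real-time capability.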

Characterizing HPC storage systems

HPC storage systems exhibit complex behavior that is unfortunately often not well understood. As part of this work, the student would execute various storage benchmarks on different storage systems at GWDG and aim to understand the system performance creating a characterization for these systems. We will also document the results and aim to create a publication.

Containers for Parallel Applications

Parallel applications on HPC systems often rely on system specific MPI (Message Passing Interface) and interconnect libraries, for example for Infiniband or OmniPath networks. This partially offsets one main advantage of containerizing such applications, namely the portability between different platforms. The goal of this project is to evaluate different ways of integrating system specific communication libraries into containers, allowing for porting these containers to a different platform with minimal effort. A PoC should be implemented and benchmarked against running natively on a system.

Fixing Shortcomings of Kubernetes Serverless Technologies

Serverless Computing, or Function-as-a-Service (FaaS), has emerged as a new paradigm for computing over the last few years. A number of open-source FaaS platforms are based on Kubernetes, as the container orchestration platform maps well to the components required for FaaS. However, most approaches to FaaS are still relatively naive and leave many performance improvements on the table. This work focuses on said limitations, aims to solve at least one of them, and implements a proof of concept. Finally, the performance improvements should be benchmarked in a virtualized environment and on the HPC system.

Evaluating the Capabilities of K8SGPT

K8SGPT is a Kubernetes tool that can use the OpenAI API or self-hosted AI APIs (such as LocalAI) to analyse a given cluster. On paper this sounds great, as it allows finding and fixing the relevant information within the complexity of a K8s cluster. But how capable is it really? What limitations apply, what is the overhead, and how do the OpenAI API and LocalAI compare? For this topic, methods for the evaluation should be developed and applied to test clusters. Finally, a recommendation should be given on which use cases can benefit from K8SGPT and which cannot.

Confidential GPU Inference

For customer-facing systems that handle sensitive data such as patient information, it is required to comply with strict data protection laws. In order to comply with these laws even during a security breach, confidential computing should be used; however, modern use cases require scalable multi-user systems with GPU acceleration for ML inference workloads. This thesis comprises setting up confidential computing on top of a Kubernetes cluster using Kata Containers, Confidential Containers, and Nvidia Confidential GPU Computing, as well as measuring the performance costs of using a confidential compute stack.

Scientific container monitoring: Development of a tool to monitor a vast number of scientific containers in a multi-node, multi-user HPC environment

In the realm of scientific computing, containerization is gaining an ever-growing relevance, as the advantages of containerized applications, especially in the multi-user environment of an HPC system, are numerous: encapsulation of dependencies, ease of portability to other systems, ease of deployment, and much more. Yet, while by now a multitude of container runtimes and container management solutions exist, HPC monitoring software that specifically takes containerized applications into account, is capable of generating and displaying monitoring data that is "container-aware", and can resolve down to the level of individual containers within individual computing nodes is still very much lacking. In this thesis, you will develop your own HPC monitoring software that specifically targets containerized applications on the GWDG's HPC system. Your software will then be deployed on the Scientific Compute Cluster (SCC) of the GWDG to monitor the containers that researchers are running on it, and you will analyze which additional insights administrators of HPC systems can gain if their monitoring software is "container-aware".
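A "container-aware" monitor will likely read per-container resource accounting from the kernel. On cgroup-v2 systems, each container corresponds to a cgroup whose cpu.stat interface file can be sampled periodically. The sketch below parses two such samples and derives a utilisation figure; the exact cgroup path layout is runtime-dependent, and the sample values here are invented.

```python
def parse_cpu_stat(text):
    """Parse the key/value lines of a cgroup-v2 cpu.stat file into a dict (values in microseconds)."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

def cpu_utilisation(stat_before, stat_after, wall_us):
    """Fraction of one CPU used by the cgroup between two samples."""
    used = stat_after["usage_usec"] - stat_before["usage_usec"]
    return used / wall_us

# On a real node one would read e.g. /sys/fs/cgroup/<...>/<container>/cpu.stat;
# here we parse captured samples taken one second apart instead.
sample_t0 = "usage_usec 1000000\nuser_usec 800000\nsystem_usec 200000"
sample_t1 = "usage_usec 1500000\nuser_usec 1200000\nsystem_usec 300000"
util = cpu_utilisation(parse_cpu_stat(sample_t0), parse_cpu_stat(sample_t1), wall_us=1_000_000)
print(f"container used {util:.0%} of one CPU")
```

Mapping cgroup directories back to container and user identities, and shipping such samples from many nodes to a central dashboard, is where the actual thesis work would lie.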

Development and Evaluation of a GPU Performance Model for HPC Clusters

This thesis aims to enhance the computational efficiency of GPU-based applications on GWDG clusters. A performance model will be developed considering the GPU architecture, application characteristics, and GWDG cluster configuration. The model will be implemented and its accuracy will be evaluated using a set of benchmark applications. The model will then be used to identify performance bottlenecks and optimize these applications. The expected outcome is an improved understanding of GPU performance on GWDG clusters, leading to more efficient utilization of these resources. This work has the potential to significantly impact the performance of GPU-based applications on GWDG clusters.
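One common starting point for such a performance model is the roofline model, which bounds attainable throughput by peak compute rate and memory bandwidth as a function of a kernel's arithmetic intensity. A minimal sketch follows; the hardware numbers and kernel intensities are merely illustrative and were not measured on a GWDG cluster.

```python
def roofline(peak_gflops, mem_bw_gbs, intensity_flops_per_byte):
    """Attainable performance (GFLOP/s) under the roofline model."""
    return min(peak_gflops, mem_bw_gbs * intensity_flops_per_byte)

# Illustrative numbers only (roughly FP32 peak and HBM bandwidth of a
# modern data-center GPU, not a specific GWDG device):
peak, bw = 19500.0, 2000.0
for name, ai in [("SpMV", 0.25), ("stencil", 1.0), ("GEMM", 60.0)]:
    perf = roofline(peak, bw, ai)
    bound = "compute-bound" if perf >= peak else "memory-bound"
    print(f"{name:8s} AI={ai:5.2f} FLOP/B -> {perf:8.1f} GFLOP/s ({bound})")
```

The thesis would refine this baseline with measured ceilings (cache levels, tensor cores) and validate it against profiled benchmark applications.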

AI-Based Algorithm Development for Early Fault Detection in Compute Continuum Systems

In this thesis, the student will explore current methods in AI to create smart algorithms that can catch problems in computer systems before they turn serious. Think of it as developing a high-tech 'early warning system'. The journey will involve playing with data, crafting algorithms, and running simulations to see how well they work. Plus, you'll get to integrate your creations into real computing systems, making them more reliable and reducing downtimes.
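As a flavour of what such an 'early warning system' might look like at its very simplest, the sketch below flags metric samples that deviate strongly from an exponentially weighted running mean. The thresholds, warm-up length, and sample readings are all invented for illustration; the thesis would replace this with learned models.

```python
class EarlyWarning:
    """Flag samples that deviate strongly from an exponentially weighted mean."""
    def __init__(self, alpha=0.1, threshold=4.0, warmup=5):
        self.alpha, self.threshold, self.warmup = alpha, threshold, warmup
        self.mean, self.var, self.n = None, 0.0, 0

    def update(self, x):
        if self.mean is None:           # first sample initialises the mean
            self.mean, self.n = x, 1
            return False
        diff = x - self.mean
        std = self.var ** 0.5
        alert = self.n >= self.warmup and abs(diff) > self.threshold * std
        # update the exponentially weighted estimates of mean and variance
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * self.var + self.alpha * diff * diff
        self.n += 1
        return alert

detector = EarlyWarning()
readings = [50, 51, 49, 50, 52, 50, 51, 90]   # e.g. a temperature sensor; the
alerts = [i for i, r in enumerate(readings)   # last sample jumps suspiciously
          if detector.update(r)]
print("alert at samples:", alerts)
```

Even this crude detector raises the alarm on the outlier only; AI-based methods aim to catch subtler, multi-signal precursors of failure.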

Implementing Edge Computing for Real-Time Predictive Maintenance in Compute Continuum Systems

This research explores the potential of edge computing technologies in enabling real-time predictive maintenance within compute continuum systems. The objective is to develop a framework that utilizes edge computing for immediate data processing and decision-making, enhancing the overall efficiency and responsiveness of maintenance protocols. The thesis will involve both theoretical and practical aspects, including system design, implementation, and testing in real-world scenarios.

Scalability Challenges and Solutions in AI-Based Predictive Maintenance for Large-Scale Compute Continuum Systems

To answer the question of how to make AI-driven maintenance work smoothly in huge computing systems, we will need to find out what makes scaling up so tricky and come up with efficient ways to make it better. The student will investigate the scalability challenges associated with implementing AI-based predictive maintenance in large-scale compute continuum systems. They'll get to analyze existing systems, brainstorm new methods, and test how well they work in the real world of large-scale computing maintenance. The research will focus on identifying key scalability issues and developing innovative solutions to enhance the performance and effectiveness of predictive maintenance strategies. It will also include a thorough analysis of current systems, a proposal of new methodologies, and an evaluation of their impact on large-scale system maintenance.

Comparison of Distributed Computing Frameworks

While the data analytics tool Apache Spark has already been available on GWDG systems for multiple years, Dask is an upcoming topic. Spark is primarily used with Scala (and supports Python as well), Dask on the other hand is a part of the Python ecosystem. The project proposal is to compare the deployment methods on an HPC system (via Slurm in our case), the monitoring possibilities and tooling available, and to develop, run and evaluate a concrete application example on both platforms.

How to efficiently access free earth observation data for data analysis on HPC Systems?

In recent years, the availability of free Earth observation data has increased. Besides ESA's Sentinel mission [1] and NASA's Landsat mission [2], various open data initiatives have arisen. For example, several federal states in Germany publish geographical and Earth observation data, such as orthophotos or lidar data, free of charge [3,4]. However, one bottleneck at the moment is the accessibility of this data. Before analyzing it, researchers need to put a substantial amount of work into downloading and pre-processing it. Big platforms such as Google [5] and Amazon [6] offer these data sets, making working in their environments significantly more comfortable. To promote and simplify data analysis in Earth observation on HPC systems, approaches for convenient data access need to be developed. In a best-case scenario, the resulting data is analysis-ready so that researchers can directly jump into their research. The goal of this project is to explore the current state of services and technologies available (data cubes [7], INSPIRE [8], STAC [9]) and to implement a workflow that provides a selected data set to users of our HPC system.
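As a small illustration of STAC-based access, the sketch below assembles a request body for a STAC API item search (POST /search). The collection ID, bounding box, and time range are hypothetical placeholders, and no request is actually sent; a real workflow would post this to a catalog endpoint and download the returned assets.

```python
import json

def stac_search_body(collections, bbox, start, end, limit=100):
    """Build a request body for a STAC API 'Item Search' (POST /search)."""
    return {
        "collections": collections,
        "bbox": bbox,                      # [min_lon, min_lat, max_lon, max_lat]
        "datetime": f"{start}/{end}",      # closed interval, RFC 3339 timestamps
        "limit": limit,
    }

# Hypothetical query: Sentinel-2 L2A scenes over Göttingen for June 2023
body = stac_search_body(["sentinel-2-l2a"],
                        [9.8, 51.5, 10.0, 51.6],
                        "2023-06-01T00:00:00Z", "2023-06-30T23:59:59Z")
print(json.dumps(body, indent=2))
```

The thesis would wrap such queries in a workflow that stages the matched scenes onto the HPC file system in an analysis-ready form.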

Performance optimization of deep learning model training and inference

Recent advances in deep learning, such as image (Rombach et al. 2022) and text generation (OpenAI 2023), have led to an increase in the number of AI publications in the world (Zhang et al. 2022). The breakthrough in deep learning is only possible because of evolving hardware and software that allow the efficient processing of big data sets. Further, most of the accuracy gains of these models result from increasingly complex models (Schwartz et al. 2019). From 2013 to 2019, the required computing power for training deep learning models increased by a factor of 300,000 (Schwartz et al. 2019). Therefore, performance optimization of deep learning model training and inference is highly relevant. Profiling with tools such as DeepSpeed [1] and the built-in PyTorch Profiler [2] helps identify an existing model's bottlenecks. Depending on the profiling results, different optimization strategies, such as data and model parallelism, can be applied. Further, tools such as PyTorch Lightning's trainer [3] and Horovod [4] can be tested to use the cluster's resources efficiently.

References:
Dodge, Jesse et al. (2022). "Measuring the Carbon Intensity of AI in Cloud Instances." doi: 10.48550/ARXIV.2206.05229.
OpenAI (2023). "GPT-4 Technical Report." arXiv: 2303.08774 [cs.CL].
Rombach, Robin et al. (2022). "High-Resolution Image Synthesis with Latent Diffusion Models." In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Schwartz, Roy et al. (2019). "Green AI." In: CoRR abs/1907.10597. arXiv: 1907.10597.
Zhang, Daniel et al. (2022). "The AI Index 2022 Annual Report." arXiv: 2205.03468 [cs.AI].
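Independently of the profiling frameworks named above, the core measurement loop can be illustrated with the standard library alone. Here fake_step is a stand-in for a real training step; for GPU code, the device would additionally have to be synchronized (e.g. torch.cuda.synchronize) before each clock reading.

```python
import time, statistics

def benchmark(step, n_warmup=3, n_iters=10):
    """Time a callable per iteration, discarding warm-up iterations
    (JIT compilation, cache warm-up, CUDA context creation, ...)."""
    for _ in range(n_warmup):
        step()
    times = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        step()
        times.append(time.perf_counter() - t0)
    return statistics.mean(times), statistics.stdev(times)

# Stand-in for one training step on a batch of 32 samples
def fake_step():
    return sum(i * i for i in range(50_000))

mean_s, std_s = benchmark(fake_step)
print(f"{mean_s * 1e3:.2f} ms/step -> {32 / mean_s:.0f} samples/s")
```

Reporting samples/s alongside the standard deviation makes runs on different hardware and parallelization strategies directly comparable.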

Benchmarking phylogenetic tree reconstructions

In phylogenetic tree reconstructions, we describe the evolutionary relationship between biological sequences in terms of their shared ancestry. To reconstruct such a tree, multiple approaches exist, including maximum likelihood and Bayesian methods. Among the most commonly used implementations of these methods are RAxML and MrBayes, both of which are available on SCC. In this project, you will identify a suitable benchmarking suite and use it to benchmark RAxML and MrBayes on SCC.

Prototyping common workflows in phylogenetic tree reconstructions

In phylogenetic tree reconstructions, we describe the evolutionary relationship between biological sequences in terms of their shared ancestry. To reconstruct such a tree, multiple approaches exist, including maximum likelihood and Bayesian methods. Among the most commonly used implementations of these methods are RAxML and MrBayes, both of which are available on SCC. In this project, you will identify and establish a typical workflow on SCC, from data management to documentation. This project is especially suitable for students enrolled in the Computer Science (M.Ed.) programme.

Benchmarking Applications on Cloud vs. HPC Systems

In this day and age, everybody has heard of the Cloud, is using cloud services and most people know that you can deploy parallel applications on cloud infrastructure. Meanwhile, HPC is still stuck in its narrow niche of a select few power users and experts. Few everyday people even know what HPC means. It is easy to get access to large amounts of computing power by renting time on various cloud services. But how do applications deployed on a cloud service like the GWDG cloud compare to their twins deployed on HPC clusters in terms of performance? How well suited are different parallelization schemes to run on both systems? The goal of this project is to get some insight into these questions and benchmark a few applications to get concrete numbers, compare both approaches and present the results in an accessible and clear way.

Putting a RISC-V Eval Board into Operation: Linux and Toolchains

While the HPC world is dominated by x86 architectures, RISC-V is a promising, evolving alternative. To prepare for work with RISC-V based HPC and to get familiar with architecture-specific details, StarFive VisionFive 2 eval boards have been procured [1]. These need to be configured to run Linux according to the documentation; compiler toolchains and libraries need to be set up and tested; and a benchmark or other proof of operability needs to be performed. Familiarity with electronics equipment and the Linux command line is an advantage.

Benchmarking AlphaFold and alternative models for protein structure prediction

Proteins are involved in every biological process in every living cell. To assess how a protein functions exactly, knowing its amino acid sequence alone is not enough. Instead, its three-dimensional structure needs to be determined as well. In recent years, we have seen a number of AI-based approaches put forward. In this project, you will compare and benchmark the performance of AlphaFold and alternative models on the SCC.

Integration of HPC systems and Quantum Computers

Especially in the noisy intermediate-scale quantum computing era, hybrid quantum-classical approaches are among the most promising to achieve some early advantages over classical computing. For these approaches, an integration with HPC systems is mandatory. The goal of this project is to design and implement a workflow that allows running hybrid codes using our HPC systems and, as a first step, quantum computing simulators; to extend this to cloud-available real quantum computers; and to provide perspectives for future systems made available otherwise. Possible aspects of this work are Jupyter-based user interfaces, containerization, scheduling, and costs of hybrid workloads. The final result should be a PoC covering at least some important aspects.
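The essence of such a hybrid workflow is a classical optimization loop wrapped around quantum circuit evaluations. The toy sketch below stands in for the simulator with a closed-form expectation value and optimizes a single circuit parameter via the parameter-shift rule; the circuit (one RY rotation) and observable (Z) are chosen purely for illustration.

```python
import math

def expectation_z(theta):
    """<Z> for the state RY(theta)|0>; stands in for a quantum simulator call."""
    return math.cos(theta)

# Classical outer loop: gradient descent using the parameter-shift rule,
# which evaluates the "circuit" at theta +/- pi/2 instead of differentiating it.
theta, lr = 0.1, 0.4
for _ in range(100):
    grad = (expectation_z(theta + math.pi / 2) - expectation_z(theta - math.pi / 2)) / 2
    theta -= lr * grad
print(f"optimal theta = {theta:.3f}, energy = {expectation_z(theta):.3f}")
```

In the envisioned workflow, expectation_z would be a job submitted to a simulator or cloud quantum backend, which is exactly where scheduling and cost questions enter.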

Using Neuromorphic Computing in Optimization Problems

Neuromorphic computers, i.e., computers whose design is inspired by the human brain, are mostly intended for machine learning. However, recent results show that they may prove advantageous for NP-complete optimization problems as well. In this area they compete with (future) quantum computers, especially with quantum annealing and adiabatic approaches. The goal of this project is to explore the SpiNNaker systems available at GWDG regarding their use for this type of problem. A successful project would encompass the implementation of a toy problem and a comparison to implementations on other platforms.

Parallelization of Iterative Optimization Algorithms for Image Processing using MPI

Iterative optimization algorithms are used in various areas of computer science and related fields, including machine learning, artificial intelligence, and image reconstruction. For large-scale problems, these algorithms can be parallelized to run on multiple CPUs and GPUs. In this work, an existing image-reconstruction framework for computational Magnetic Resonance Imaging (MRI) will be parallelized using the Message Passing Interface (MPI) standard. Benchmarks and performance analysis of the parallel implementation will be performed on a national supercomputer.
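A first ingredient of such an MPI parallelization is the block distribution of image rows across ranks. The helper below computes each rank's contiguous slice; it is written as a pure function so it runs without an MPI launcher, whereas with mpi4py the rank and communicator size would come from MPI.COMM_WORLD.

```python
def block_range(n, size, rank):
    """Contiguous index range [lo, hi) owned by `rank` out of `size` ranks,
    distributing any remainder one extra row at a time to the lowest ranks
    (the same layout MPI_Scatterv displacements would describe)."""
    base, rem = divmod(n, size)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    return lo, hi

# Example: 10 image rows over 4 ranks -> local sizes 3, 3, 2, 2
ranges = [block_range(10, 4, r) for r in range(4)]
print(ranges)
```

Iterative reconstruction then alternates local updates on each slice with halo or reduction communication between neighbouring ranks, which is where most of the performance analysis effort goes.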

Performance Analysis of Generative Neural Networks

Training Generative Adversarial Networks (GANs) involves training both generator and discriminator networks in an alternating procedure. This procedure can be complex and consumes a relatively large amount of computational resources. In this work, HPC performance tools will be used to profile the training of GANs and characterize their performance on GPUs.

Network Anomaly Detection using DPUs

Data Processing Units (DPUs) are programmable SoC-based SmartNICs which have the capability to offload processing tasks that are normally performed by CPUs. Using their onboard processors, DPUs can be used to perform in-network data analysis besides performing the traditional NIC functions. Specifically, the host system can offload data-intensive workloads to DPUs for Big Data analytics and AI/ML acceleration. Anomalies in computer networks can be attributed to hardware or software failures, cyber-attacks, or misconfigurations. In-network analysis of network data can help reduce serious damage in the case of a cyber-attack or similar security breach. Big Data analytics tools like Spark Streaming can be used to enable real-time data processing before applying ML/DL algorithms for anomaly detection. In this work, machine learning models will be trained and deployed on DPUs to perform in-network inference on network data for anomaly detection. The results are expected to demonstrate the potential of deploying DPUs for cybersecurity.

Workflow Optimization in Data Management Planning

1) Theory: To explore and present "Efficient Workflows" in "Data Management Planning". 2) Practical: To prepare a model of one of the use cases in Data Management Planning showing an optimized workflow.

Theoretical Analysis of Mapping Problem Using Quantum Approach

1) Theory: To explore and present how the HPC resource mapping problem can be solved using a quantum approach. 2) Practical: To prepare a model of one of the use cases in HPC workflows showing improvement techniques.

Theoretical Analysis of Scheduling Problem Using Quantum Approach

1) Theory: To explore and present how the HPC resource scheduling problem can be solved using a quantum approach. 2) Practical: To prepare a model of one of the use cases in HPC workflows showing improvement techniques.

Practical Analysis of Mapping Problem Using AI/ML Approach

1) Theory: To explore and present how the HPC resource mapping problem can be solved using an AI/ML approach. 2) Practical: To prepare a model of one of the use cases in HPC workflows showing improvement techniques.

Practical Analysis of Scheduling Problem Using AI/ML Approach

1) Theory: To explore and present how the HPC resource scheduling problem can be solved using an AI/ML approach. 2) Practical: To prepare a model of one of the use cases in HPC workflows showing improvement techniques.

Containerizing On-Premise HPC Services with Singularity

Containerized applications are becoming more popular than ever. One of the biggest problems when compiling software on an OS is breaking the dependencies of other installed software. Containers encapsulate an application as a single executable package that puts the application code together with all of the related configuration files, libraries, and dependencies required for it to run. They also maximize scalability and flexibility during the deployment process, which are the most important points of the DevOps culture. This project aims to containerize one of the most popular HPC software packages, Slurm. Thereby, Slurm can be easily deployed and upgraded without compatibility issues using CI/CD pipelines.

Evolutionary Algorithm for Global Optimization

Evolutionary algorithms are an established means for optimization tasks in a variety of fields. An existing code used for molecular clusters shall be investigated with a simpler target system, with regard to, e.g., another parallelization scheme, more efficient operators, and better convergence behavior of the optimization routines used therein.
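For orientation, the basic structure of such an evolutionary optimization can be sketched in a few lines. The sphere function below is a toy stand-in for a molecular-cluster energy, and the population size, mutation width, and elitist selection are arbitrary choices for illustration, not properties of the existing code.

```python
import random

def evolve(fitness, dim, pop_size=20, generations=200, sigma=0.3, seed=1):
    """Minimal (mu+lambda)-style evolutionary loop with Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # each parent produces one mutated child
        children = [[x + rng.gauss(0, sigma) for x in parent] for parent in pop]
        # elitist selection: keep the best pop_size of parents + children
        pop = sorted(pop + children, key=fitness)[:pop_size]
    return pop[0]

def sphere(x):
    """Toy objective with global minimum 0 at the origin."""
    return sum(v * v for v in x)

best = evolve(sphere, dim=3)
print("best:", [round(v, 3) for v in best], "f =", round(sphere(best), 5))
```

Points of investigation in the thesis map directly onto this skeleton: the mutation/crossover operators, the selection scheme, and how the per-generation fitness evaluations are parallelized.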

Secure Real-Time Inference on Medical Images: A Comparative Study on High-Performance Computing Systems

This master's thesis explores the cutting-edge domain of real-time medical image processing, focusing on secure inference using MRI/CT scan data. The study encompasses segmentation/detection/classification of MRI/CT data, leveraging datasets from UMG and other public sources. The core of the research involves training deep learning models on the SCC/Grete cluster, followed by real-time inference on various high-performance computing (HPC) systems. A comparative analysis of inference performance across these HPC systems forms a crucial part of this investigation. The thesis aims to contribute significant insights into the optimization of real-time medical image processing in secure environments, adhering to stringent data privacy standards. This research necessitates a master’s student with a background in applying deep learning to image data and some proficiency in PyTorch or TensorFlow.

Advancing Education in High Performance Computing: Exploring Personalized Teaching Strategies and Adaptive Learning Technologies

The present thesis delves into the exciting research field of personalized teaching in High Performance Computing (HPC). The objective is to identify innovative methods and technologies that enable tailoring educational content in the field of high-performance computing to the individual needs of students. By examining adaptive learning platforms, machine learning, and personalized teaching strategies, the thesis will contribute to the efficient transfer of knowledge in HPC courses. The insights from this research aim not only to enhance teaching in high-performance computing but also to provide new perspectives for the advancement of personalized teaching approaches in other technology-intensive disciplines.

Integrated Analysis of High Performance Computing Training Materials: A Fusion of Web Scraping, Machine Learning, and Statistical Insights

This thesis focuses on the compilation and analysis of training materials from various scientific institutions in the High Performance Computing (HPC) domain. The initial phase involves utilizing scraping techniques to gather diverse training resources from different sources. Subsequently, the study employs methods derived from Machine Learning and Statistics to conduct a comprehensive analysis of the collected materials. The research aims to provide insights into the existing landscape of HPC training materials, identify commonalities, and offer recommendations for optimizing content delivery in this crucial field.

Revolutionizing High Performance Computing Education: Harnessing Large Language Models for Interactive Training Content Generation and Coding Support

This groundbreaking thesis endeavors to transform the landscape of High Performance Computing (HPC) education by leveraging the capabilities of Large Language Models (LLMs). The primary focus is on developing an interactive training environment where LLMs are employed to dynamically generate tailored instructional content for HPC courses. Additionally, the study explores the proficiency of LLMs in providing coding support, assessing the quality of their output, and discerning their effectiveness in facilitating a seamless learning experience.

Evaluating Pedagogical Strategies in High Performance Computing Training: A Machine Learning-driven Investigation into Effective Didactic Approaches

This thesis delves into the realm of computer science education with a particular focus on High Performance Computing (HPC). Rather than implementing new tools, the research centers on the field of didactics, aiming to explore and assess various pedagogical concepts applied to existing HPC training materials. Leveraging Machine Learning tools, this study seeks to identify prevalent didactic approaches, analyze their effectiveness, and ascertain which strategies prove most promising. This work is tailored for those with an interest in computer science education, emphasizing the importance of refining instructional methods in the dynamic and evolving landscape of High Performance Computing.

Reimagining and Porting a Prototype for High Performance Computing Certification: Enhancing Knowledge and Skills Validation

This thesis focuses on the evolution of the certification processes within the High Performance Computing (HPC) domain, specifically addressing the adaptation and porting of an existing prototype from the HPC Certification Forum. The objective is to redefine, optimize and automate the certification procedures, emphasizing the validation of knowledge and skills in HPC. The study involves the redevelopment of the prototype to align with current industry standards and technological advancements. By undertaking this project, the research aims to contribute to the establishment of robust and up-to-date certification mechanisms and standards that effectively assess and endorse competencies in the dynamic field of High Performance Computing.

Facilitating the Fastest Data Exchange between Containers in a Single Pod via IPC (Shared Memory) in Kubernetes

Shared memory is a powerful interprocess communication mechanism in the Linux operating system that allows multiple processes to access and manipulate a common block of memory. It allows processes to access common structures and data by placing them in shared memory segments. It is the fastest form of inter-process communication available, since no kernel involvement occurs when data is passed between the processes, and no complex data transfer methods such as message passing or file I/O are needed. A segment can be created by one process, and subsequently written to and read from by any number of processes. Shared memory thus provides a fast and direct means of communication, making it ideal for scenarios where processes need to share large amounts of data or collaborate closely; in fact, data does not need to be copied between the processes. Shared memory is commonly used by databases and custom-built high-performance applications (typically C with OpenMPI, or C++ using Boost libraries) in scientific computing and the financial services industry.
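In Python, the mechanism can be demonstrated with the standard library's multiprocessing.shared_memory module. The sketch below uses two handles within one process for simplicity; in a Kubernetes pod, the containers would additionally need a shared /dev/shm, e.g. via an emptyDir volume with medium Memory (an assumption about the deployment, not shown here).

```python
from multiprocessing import shared_memory

# "Container A": create a named shared memory segment and write into it
writer = shared_memory.SharedMemory(create=True, size=16)
writer.buf[:5] = b"hello"

# "Container B": attach to the same segment by its name; no data is copied
reader = shared_memory.SharedMemory(name=writer.name)
data = bytes(reader.buf[:5])
print(data.decode())

reader.close()
writer.close()
writer.unlink()   # remove the segment once all users are finished
```

On Linux, such segments appear under /dev/shm, which is why the containers of a pod must see the same tmpfs mount for this exchange to work across container boundaries.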

Last modified: 2023-08-28 10:40