Open Theses
PhD
MSc & BSc
The theses offered below are intended for MSc students but can also be reduced in scope and offered as BSc theses.
Fixing Shortcomings of Kubernetes Serverless Technologies
Serverless computing, or Function-as-a-Service (FaaS), has emerged as a new computing paradigm over the last few years. A number of open-source FaaS platforms are based on Kubernetes, because the container orchestration platform maps well to the components required for FaaS. However, most approaches to FaaS are still relatively naive and leave many performance improvements on the table. This work focuses on these limitations and aims to solve at least one of them with a proof-of-concept implementation. Finally, the performance improvements should be benchmarked in a virtualized environment and on the HPC system.
Performance optimization of numerical simulation of condensed matter systems
The naive simulation of interacting condensed matter systems is an ocean-boiling problem because of the exponential growth of the Hilbert space dimension. This offers a great opportunity to apply many analytical approximations and advanced numerical methods in HPC.
AI-Driven Anomaly Detection in Compute Continuum Systems
This thesis focuses on developing advanced AI techniques for anomaly detection in Compute Continuum systems. By leveraging state-of-the-art machine learning models, the research will explore effective methods for identifying anomalies in diverse environments, ranging from edge devices to HPC clusters. The work includes creating datasets, designing algorithms, and implementing prototypes to validate the models' efficiency in real-world scenarios.
Edge AI for Real-Time Predictive Maintenance in Distributed Systems
Edge computing is revolutionizing real-time predictive maintenance by processing data locally, minimizing latency, and enhancing system reliability. This thesis aims to design an edge AI-based framework for predictive maintenance in distributed systems. The student will develop and test a real-time solution to detect and predict potential failures using sensor data and logs from edge devices, ensuring robust and efficient operations.
Scalability Challenges in AI-Enhanced Workload Scheduling for HPC Systems
This research addresses the scalability challenges associated with implementing AI-driven workload scheduling in HPC systems. By analyzing bottlenecks and proposing innovative solutions, the thesis will contribute to optimizing workload distribution for large-scale computing environments. The student will explore and test novel approaches for enhancing the scalability and performance of scheduling algorithms tailored to HPC systems.
Development of a new application for the SpiNNaker-2 neuromorphic computing platform
SpiNNaker is a new kind of computer architecture, initially designed to efficiently simulate spiking neural networks. It consists of a large number of low-power ARM cores connected by an efficient message-passing network. This architecture, together with the flexibility of the spiking neuron model, also makes it well suited for accelerating other types of algorithms, such as optimization problems, constraint problems, live image and signal processing, AI/ML, cellular automata, finite element simulations, distributed partial differential equations, and embedded, robotics, and low-power applications in general. As part of the Future Technology Platform, the GWDG has acquired a number of SpiNNaker boards that will be available for the thesis. In this thesis, you will develop one or more applications for SpiNNaker, using either the high-level Python or the low-level C/C++ software stack, characterize your solution, compare it to a pure CPU/GPU solution (or other hardware in the Future Technology Platform), apply it to a real case study if possible, and study the power consumption of your program.
Parallelization of Iterative Optimization Algorithms for Image Processing using MPI
Iterative optimization algorithms are used in various areas of computer science and related fields, including machine learning, artificial intelligence, and image reconstruction. For large-scale problems, these algorithms can be parallelized to run on multiple CPUs and GPUs. In this work, an existing image-reconstruction framework for computational Magnetic Resonance Imaging (MRI) will be parallelized using the Message Passing Interface (MPI) standard. Benchmarks and a performance analysis of the parallel implementation will be performed on a national supercomputer.
Performance Analysis of Generative Neural Networks
Training Generative Adversarial Networks (GANs) involves training both the generator and the discriminator network in an alternating procedure. This procedure can be complex and consumes a relatively large amount of computational resources. In this work, HPC performance tools will be used to profile the training of GANs and characterize their performance on GPUs.
Network Anomaly Detection using DPUs
Data Processing Units (DPUs) are programmable SoC-based SmartNICs that can offload processing tasks normally performed by CPUs. Using their onboard processors, DPUs can perform in-network data analysis in addition to the traditional NIC functions. Specifically, the host system can offload data-intensive workloads to DPUs for Big Data analytics and AI/ML acceleration. Anomalies in computer networks can be attributed to hardware or software failures, cyber-attacks, or misconfigurations. In-network analysis of network data can help limit serious damage in the event of cyber-attacks or similar security breaches. Big Data analytics tools like Spark Streaming can be used to enable real-time data processing before applying ML/DL algorithms for anomaly detection. In this work, machine learning models will be trained and deployed on DPUs to perform in-network inference on network data for anomaly detection. The results are expected to demonstrate the potential of deploying DPUs for cybersecurity.
Comparison of Distributed Computing Frameworks
While the data analytics tool Apache Spark has been available on GWDG systems for multiple years, Dask is an upcoming topic. Spark is primarily used with Scala (and supports Python as well), whereas Dask is part of the Python ecosystem. The project proposal is to compare the deployment methods on an HPC system (via Slurm in our case) as well as the available monitoring options and tooling, and to develop, run, and evaluate a concrete application example on both platforms.
Evolutionary Algorithm for Global Optimization
Evolutionary algorithms are an established means for optimization tasks in a variety of fields. An existing code used for molecular clusters shall be investigated, now using a simpler target system, with regard to, e.g., another parallelization scheme, more efficient operators, or better convergence behavior of the optimization routines used therein.
Constraint Programming for Workload Optimization in HPC
Constraint programming offers a robust approach to optimize workload scheduling in HPC systems. This project explores formulating resource allocation and task scheduling as constraint satisfaction problems to minimize makespan and maximize resource utilization.
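To make the formulation above concrete, the following is a minimal sketch of task-to-node assignment as a small constraint satisfaction and optimization problem. It is an illustration only: it uses exhaustive search rather than a real constraint solver, and the function name `schedule`, the GPU constraint, and the example numbers are assumptions made for this sketch, not part of the project description.

```python
from itertools import product


def schedule(durations, needs_gpu, node_has_gpu):
    """Exhaustively search task->node assignments (illustrative, not a CP solver).

    Constraint: a task that needs a GPU may only run on a node that has one.
    Objective: minimize the makespan, i.e. the largest total duration on any
    node, assuming the tasks on one node run one after another.
    Returns (makespan, assignment) or None if no assignment is feasible.
    """
    n_nodes = len(node_has_gpu)
    best = None
    for assignment in product(range(n_nodes), repeat=len(durations)):
        # Check the hardware constraint for every task.
        if any(needs_gpu[t] and not node_has_gpu[n]
               for t, n in enumerate(assignment)):
            continue
        # Sum the durations per node to obtain the node loads.
        load = [0] * n_nodes
        for task, node in enumerate(assignment):
            load[node] += durations[task]
        makespan = max(load)
        if best is None or makespan < best[0]:
            best = (makespan, assignment)
    return best


if __name__ == "__main__":
    # Hypothetical example: four tasks, two nodes, only node 1 has a GPU.
    print(schedule([4, 3, 2, 2], [True, False, False, False], [False, True]))
```

The search space grows exponentially with the number of tasks, which is why a dedicated constraint solver with propagation and pruning would be needed for realistic workloads.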
Integration of Constraint Programming and Machine Learning for HPC Scheduling
Machine learning can enhance constraint programming by predicting constraints or guiding solution search. This project investigates hybrid approaches to improve scheduling performance in HPC systems.
Quantum Constraint Programming for Workload Optimization in HPC
Constraint programming offers a robust approach to optimize workload scheduling in HPC systems. This project explores formulating resource allocation and task scheduling as constraint satisfaction problems to minimize makespan and maximize resource utilization.
Advancing Education in High Performance Computing: Exploring Personalized Teaching Strategies and Adaptive Learning Technologies
The present thesis delves into the exciting research field of personalized teaching in High Performance Computing (HPC). The objective is to identify innovative methods and technologies that enable tailoring educational content in the field of high-performance computing to the individual needs of students. By examining adaptive learning platforms, machine learning, and personalized teaching strategies, the thesis will contribute to the efficient transfer of knowledge in HPC courses. The insights from this research aim not only to enhance teaching in high-performance computing but also to provide new perspectives for the advancement of personalized teaching approaches in other technology-intensive disciplines.
Integrated Analysis of High Performance Computing Training Materials: A Fusion of Web Scraping, Machine Learning, and Statistical Insights
This thesis focuses on the compilation and analysis of training materials from various scientific institutions in the High Performance Computing (HPC) domain. The initial phase involves utilizing scraping techniques to gather diverse training resources from different sources. Subsequently, the study employs methods derived from Machine Learning and Statistics to conduct a comprehensive analysis of the collected materials. The research aims to provide insights into the existing landscape of HPC training materials, identify commonalities, and offer recommendations for optimizing content delivery in this crucial field.
Evaluating Pedagogical Strategies in High Performance Computing Training: A Machine Learning-driven Investigation into Effective Didactic Approaches
This thesis delves into the realm of computer science education with a particular focus on High Performance Computing (HPC). Rather than implementing new tools, the research centers on the field of didactics, aiming to explore and assess various pedagogical concepts applied to existing HPC training materials. Leveraging Machine Learning tools, this study seeks to identify prevalent didactic approaches, analyze their effectiveness, and ascertain which strategies prove most promising. This work is tailored for those with an interest in computer science education, emphasizing the importance of refining instructional methods in the dynamic and evolving landscape of High Performance Computing.
Reimagining and Porting a Prototype for High Performance Computing Certification: Enhancing Knowledge and Skills Validation
This thesis focuses on the evolution of the certification processes within the High Performance Computing (HPC) domain, specifically addressing the adaptation and porting of an existing prototype from the HPC Certification Forum. The objective is to redefine, optimize and automate the certification procedures, emphasizing the validation of knowledge and skills in HPC. The study involves the redevelopment of the prototype to align with current industry standards and technological advancements. By undertaking this project, the research aims to contribute to the establishment of robust and up-to-date certification mechanisms and standards that effectively assess and endorse competencies in the dynamic field of High Performance Computing.
Integration of HPC systems and Quantum Computers
Especially in the noisy intermediate-scale quantum computing era, hybrid quantum-classical approaches are among the most promising ways to achieve some early advantages over classical computing. For these approaches, integration with HPC systems is mandatory. The goal of this project is to design and implement a workflow that allows hybrid codes to run on our HPC systems together with, as a first step, quantum computing simulators, to extend this to real quantum computers available in the cloud, and to provide perspectives for future systems made available by other means. Possible aspects of this work are Jupyter-based user interfaces, containerization, scheduling, and the costs of hybrid workloads. The final result should be a PoC covering at least some important aspects.
Using Neuromorphic Computing in Optimization Problems
Neuromorphic computers, i.e., computers whose design is inspired by the human brain, are mostly intended for machine learning. However, recent results show that they may prove advantageous for NP-complete optimization problems as well. In this area they compete with (future) quantum computers, especially with quantum annealing and adiabatic approaches. The goal of this project is to explore the SpiNNaker systems available at GWDG regarding their use for this type of problem. A successful project would encompass the implementation of a toy problem and a comparison with implementations on other platforms.
Benchmarking of VAST protocol flavors for ML workloads
A VAST storage system will be installed as part of the new KISSKI data center. VAST storage systems offer different protocol flavors to access the storage backend, i.e., NFS, S3, SMB, and mixed. Since projects at the new data center should be executed efficiently, it is important to gain some insight into the potential performance of machine learning workloads. The proposed thesis will fill this gap and provide recommendations for future projects.
Concepts for GPU computing for particle transport simulations using LIGGGHTS
LIGGGHTS is a common code used for the simulation of macroscopic particles. It is based on the well-known molecular dynamics code LAMMPS. The variant used within the thesis is the academic fork LIGGGHTS-PFM, which is under active development. Since LAMMPS already has some modules for GPU processing, the goal of the thesis is to modify LIGGGHTS-PFM to make use of these capabilities. In a first step, the best strategy for implementing LIGGGHTS-PFM on GPUs should be evaluated. Based on this, a concept and initial steps of the implementation are expected. However, it is not required that all features of LIGGGHTS-PFM are implemented within the scope of the thesis. It is expected that the enhancement will improve the run-time performance and pave the way for particle simulations on GPUs. General programming experience is required. Knowledge of GPU computing and particle transport is beneficial but not mandatory.
Implementation of a preCICE Adapter for the particle transport simulator LIGGGHTS
preCICE, as already presented at the GöHPCoffee, is a multiphysics framework which allows the combination of various simulation codes to perform coupled simulations. These can include both coupled thermal problems and topics related to fluid-structure interaction. So far, it is not possible to perform a coupled particle simulation using preCICE, since the only particle solver is not publicly available. The aim of this thesis is to remove this limitation by implementing a preCICE adapter for the particle solver LIGGGHTS-PFM. One possibility could be the modification of an existing OpenFOAM adapter in preCICE. In addition, the thesis will compare the achievable performance with other coupling libraries using LIGGGHTS and its derivatives. General programming experience is required. Knowledge of simulation technology and particle transport, especially in LIGGGHTS, is beneficial but not mandatory.
Understanding I/O behavior in HPC systems for AI workloads
Improving I/O performance is critical for optimizing the overall efficiency of HPC systems. Enhanced I/O performance leads to faster data processing, which is crucial for AI workloads that require quick and efficient handling of large datasets. The first step toward good performance is a good understanding of it. The project should therefore focus on analyzing different I/O metrics in HPC systems and/or developing models to predict I/O performance, especially under the pressure of AI workloads, which are data-intensive and have unique I/O patterns.
How to efficiently access free earth observation data for data analysis on HPC Systems?
In recent years, the amount of freely available Earth observation data has increased, and this data could be made accessible to all HPC users through our data pools [0]. Besides ESA's Sentinel mission [1] and NASA's Landsat mission [2], various open data initiatives have arisen. For example, several federal states in Germany publish geographical and earth observation data, such as orthophotos or lidar data, free of charge [3,4]. However, one bottleneck at the moment is the accessibility of this data. Before analyzing it, researchers need to put a substantial amount of work into downloading and pre-processing. Big platforms such as Google [5] and Amazon [6] offer these data sets, making work in their environments significantly more comfortable. To promote and simplify data analysis in earth observation on HPC systems, approaches for convenient data access need to be developed. In a best-case scenario, the resulting data is analysis-ready so that researchers can directly jump into their research. The goal of this project is to explore the current state of the available services and technologies (data cubes [7], INSPIRE [8], STAC [9]) and to implement a workflow that provides a selected data set to users of our HPC system.
[0] https://docs.hpc.gwdg.de/services/datapool/index.html
[1] https://sentinels.copernicus.eu/
[2] https://landsat.gsfc.nasa.gov/
[3] https://www.geoportal-th.de/de-de/
[4] https://www.geodaten.niedersachsen.de/startseite/
[5] https://developers.google.com/earth-engine/datasets
[6] https://aws.amazon.com/de/earth/
[7] https://datacube.remote-sensing.org/
[8] https://inspire.ec.europa.eu/
[9] https://stacspec.org/
Performance optimization of deep learning model training and inference
Recent advances in deep learning, such as image generation (Rombach et al. 2022) and text generation (OpenAI 2023), have led to an increase in the number of AI publications worldwide (Zhang et al. 2022). These breakthroughs are only possible because of evolving hardware and software that allow big data sets to be processed efficiently. Further, most of the accuracy gains of these models result from increasingly complex models (Schwartz et al. 2019). From 2013 to 2019, the computing power required for training deep learning models increased by a factor of 300,000 (Schwartz et al. 2019). Therefore, performance optimization of deep learning model training and inference is highly relevant. Profiling with tools such as DeepSpeed [1] and the built-in PyTorch Profiler [2] helps identify the bottlenecks of an existing model. Depending on the profiling results, different optimization strategies, such as data and model parallelism, can be applied. Further, tools such as PyTorch Lightning's trainer [3] and Horovod [4] can be tested to use the cluster's resources efficiently.
[1] https://github.com/microsoft/DeepSpeed
[2] https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html
[3] https://lightning.ai/docs/pytorch/latest/accelerators/gpu_intermediate.html
[4] https://github.com/horovod/horovod
Dodge, Jesse et al. (2022). Measuring the Carbon Intensity of AI in Cloud Instances. doi: 10.48550/ARXIV.2206.05229. url: https://arxiv.org/abs/2206.05229.
OpenAI (2023). GPT-4 Technical Report. arXiv: 2303.08774 [cs.CL].
Rombach, Robin et al. (2022). “High-Resolution Image Synthesis with Latent Diffusion Models”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). url: https://github.com/CompVis/latent-diffusion, https://arxiv.org/abs/2112.10752.
Schwartz, Roy et al. (2019). “Green AI”. In: CoRR abs/1907.10597. arXiv: 1907.10597. url: http://arxiv.org/abs/1907.10597.
Zhang, Daniel et al. (2022). The AI Index 2022 Annual Report. arXiv: 2205.03468 [cs.AI].
Development of a Benchmarking Suite for Energy Efficiency in the Context of HPC Centers
In times of rising energy costs and climate change, energy efficiency is a high priority for HPC center operators. To be able to compare the energy efficiency of HPC centers with meaningful measurements, we need standardized KPIs and benchmarking tools. In order to reliably measure different aspects of energy efficiency in HPC centers, a benchmarking suite should be developed that can include a variety of benchmarks and can easily be deployed on any HPC system.
Data Analysis Pipeline for HPC Monitoring Data
High-Performance Computing (HPC) systems generate vast amounts of monitoring data, which, when effectively analyzed, can provide critical insights into system performance, resource utilization, and potential issues. This project delves into the creation and implementation of a data analysis pipeline tailored for HPC monitoring data. The key components and stages of the pipeline, including data collection, preprocessing, storage solutions, and advanced analytics will be studied.
AI-Powered Penetration Testing
This thesis focuses on developing an AI-driven penetration testing framework. The project integrates both beginner and expert ethical hacking projects to provide a comprehensive understanding of cybersecurity.
Secure Real-Time Inference on Medical Images: A Comparative Study on High-Performance Computing Systems
This master's thesis explores the cutting-edge domain of real-time medical image processing, focusing on secure inference using MRI/CT scan data. The study encompasses segmentation/detection/classification of MRI/CT data, leveraging datasets from UMG and other public sources. The core of the research involves training deep learning models on the SCC/Grete cluster, followed by real-time inference on various high-performance computing (HPC) systems. A comparative analysis of inference performance across these HPC systems forms a crucial part of this investigation. The thesis aims to contribute significant insights into the optimization of real-time medical image processing in secure environments, adhering to stringent data privacy standards. This research necessitates a master’s student with a background in applying deep learning to image data and some proficiency in PyTorch or TensorFlow.
Behavioral Anomaly Detection in HPC Systems Using Machine Learning
This research project presents an exciting opportunity to innovate in the realm of cybersecurity within HPC systems. The goal is to design and implement a cutting-edge system that leverages sophisticated machine learning algorithms to detect malicious user behavior effectively. By analyzing user behavior on the system, one can uncover patterns that signify potential threats and develop a high-accuracy anomaly detection system that enhances the security landscape of HPC environments. Join us in safeguarding critical infrastructures through advanced intelligence.
AI-Based Approach to Firewall Rule Refinement on HPC Service Networks
Dive into the fascinating intersection of AI and network security through this innovative research project! This study aims to harness the power of machine learning to refine and optimize firewall rules within HPC systems. You will analyze vast network traffic data to identify and respond to patterns of malicious activity, ultimately creating a more robust firewall policy framework. Join us to learn and improve the security protocols of HPC networks, ensuring they are effective and adaptive to the ever-evolving threat landscape.
Assessing the Viability of Wazuh as a SIEM Solution for HPC
Embark on a critical evaluation project that explores the integration of Wazuh as a Security Information and Event Management (SIEM) solution in HPC systems. This research will assess whether Wazuh can seamlessly coexist with HPC environments while preserving performance integrity. You will analyze its ability to deliver actionable insights into potential security threats and contribute significantly to enhancing cybersecurity frameworks. Be part of a pioneering investigation that could redefine security management in HPC and cloud computing!
Preliminary Analysis of Log Data for Enhanced Model Selection in HPC
In HPC environments, log files generated by various systems contain critical insights essential for performance monitoring, anomaly detection, and operational optimization. However, the vast volume and heterogeneous nature of log data can pose significant challenges for data scientists seeking to effectively analyze this information. This project aims to conduct a preliminary analysis of log data to equip data scientists with a clearer understanding of the dataset, facilitating informed model selection and development for subsequent analysis tasks. The project will commence by collecting log data and performing data parsing to convert raw log entries into structured formats, identifying key attributes relevant for analysis. Next, exploratory data analysis (EDA) [1] will be implemented to identify patterns, outliers, and statistical distributions within the log data. Key activities will include visualizing log frequency distributions, analyzing temporal trends, and performing keyword analysis to uncover common issues and behaviors. To enhance the understanding of log structure and content, we will categorize log entries based on severity and extract important features, such as timestamps, error codes, and operation types. This feature engineering will provide a clearer view of the log attributes that are most relevant for subsequent modeling efforts. Ultimately, the output of this project will serve as a foundational analysis report, equipping data scientists with essential insights and recommendations for model selection, enhancing their ability to derive meaningful conclusions from the log data in future analyses.
[1] https://link.springer.com/content/pdf/10.1007/978-3-319-43742-2_15.pdf
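The parsing and severity categorization steps described above could start out roughly as in the following sketch. The log line layout (timestamp, host, severity, message), the pattern `LOG_PATTERN`, and the function names are assumptions chosen for illustration; real HPC logs come in many formats and would need format-specific parsers.

```python
import re
from collections import Counter

# Assumed (hypothetical) layout: "<timestamp> <host> <SEVERITY> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Convert one raw log line into a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def severity_counts(lines):
    """Count parsed entries per severity level, skipping unparseable lines."""
    records = (parse_line(line) for line in lines)
    return Counter(r["severity"] for r in records if r is not None)


if __name__ == "__main__":
    sample = [
        "2024-05-01T12:00:00 node01 ERROR disk quota exceeded",
        "2024-05-01T12:00:05 node02 INFO job 42 started",
        "not a structured line",
    ]
    print(severity_counts(sample))
```

Counting unparseable lines separately would be a natural next step, since a high share of them indicates that the assumed format does not fit the data.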
Facilitate Fastest Data Exchange between Containers in a Single Pod via IPC (Shared Memory) in Kubernetes
Shared memory is a powerful and efficient interprocess communication (IPC) mechanism in the Linux operating system that enables multiple processes to access and manipulate a common block of memory. By placing shared structures and data in shared memory segments, processes can efficiently collaborate without requiring complex data transfer methods such as message passing or file I/O. This approach is the fastest form of IPC because no kernel involvement is necessary when data is passed between processes. A shared memory segment can be created by one process and subsequently accessed, read, or written by any number of other processes. This direct and rapid communication method is ideal for scenarios requiring the sharing of large data sets or intensive collaboration between processes. Crucially, shared memory eliminates the need for data copying, significantly reducing overhead. Shared memory is widely adopted in high-performance applications, such as databases and scientific computing. Common implementations often use custom-built solutions in C (using OpenMPI) or C++ (leveraging Boost libraries). Modern tools like eBPF (Extended Berkeley Packet Filter) can further enhance shared memory systems by monitoring system calls such as shmget, shmat, and shmdt. eBPF enables fine-grained tracking of shared memory operations, providing real-time insights into performance, latency, and potential bottlenecks. Initial benchmarks demonstrate that shared memory significantly outperforms standard data transfer methods in terms of speed, especially for large datasets, due to its zero-copy nature and lack of kernel overhead. By integrating eBPF for monitoring and benchmarking, developers can optimize shared memory usage and make informed decisions when comparing it to alternative IPC mechanisms.
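As a small illustration of the mechanism described above, the following sketch creates a named shared memory segment and attaches to it by name using Python's standard `multiprocessing.shared_memory` module. It is a simplification under stated assumptions: both handles live in one process here, whereas in the thesis setting the producer and consumer would be separate processes in different containers of a pod; the function names `produce` and `consume` are hypothetical.

```python
from multiprocessing import shared_memory


def produce(payload):
    """Create a segment large enough for the payload and write it in place.

    The payload must be non-empty, because a segment cannot have size 0.
    """
    segment = shared_memory.SharedMemory(create=True, size=len(payload))
    segment.buf[:len(payload)] = payload
    return segment


def consume(name, length):
    """Attach to an existing segment by its name and read `length` bytes.

    The segment may be larger than requested (page rounding), so the
    length is passed explicitly. bytes() copies the data out here only
    for convenience; segment.buf itself is a view on the shared pages.
    """
    segment = shared_memory.SharedMemory(name=name)
    try:
        return bytes(segment.buf[:length])
    finally:
        segment.close()


if __name__ == "__main__":
    data = b"frame-0001"
    seg = produce(data)
    try:
        print(consume(seg.name, len(data)))
    finally:
        seg.close()
        seg.unlink()  # remove the segment once no one needs it anymore
```

Only the segment name and the payload length have to be communicated between producer and consumer; the payload itself never passes through a pipe or socket.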
Browser RAG (Retrieval-Augmented Generation)
Large language models (LLMs) perform inference based on their pretraining data. Each LLM has a knowledge cut-off and has typically not been trained on domain-specific data. When big data is lacking, fine-tuning the model parameters seems of little use and can be inefficient (cf. LoRA). RAG instead makes use of the context of the LLM, which nowadays covers up to 132k tokens, to teach the LLM some facts about our domain-specific data in its recent memory. Downstream, this context can be converted back into input for LLMs in the GWDG HPC data center, where the data lives only in memory and is never stored, even in anonymized fashion. The problem is that this data may be confidential and therefore should not leave the local machine. Installing software on the local machine can also be cumbersome for less technical users. Therefore, using the browser to make use of the online resources would be a desirable modality. We would like to make use of technologies including, but not limited to, WebAssembly, C/C++/Rust/Go, and WebGL/GPU to build this indexing engine. The next step in the thesis would be to investigate vLLM to determine the effects of embedding compatibility between the RAG and LLM models, and whether converting the index back to text before the downstream LLM is triggered would be a viable option.
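The retrieval step of RAG can be sketched as follows. This is a deliberately simplified stand-in: it ranks documents by cosine similarity of word counts instead of learned embeddings, it is written in Python rather than one of the browser technologies named above, and the function names and prompt layout are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (not a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Place the retrieved text into the context that is sent to the LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Slurm is the batch scheduler used on the cluster.",
        "The cafeteria opens at noon.",
    ]
    print(build_prompt("Which scheduler does the cluster use?", docs))
```

In this sketch only the retrieved text ends up in the prompt, which mirrors the option of converting index hits back to text before the downstream LLM is called.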
Developing an Inference Engine with WebGPU
WebGPU is an emerging graphics API for modern GPU use in the browser. In this thesis, its potential for AI inference directly in the browser is explored.
Utilising Apache TVM to speed up AI inference
Apache TVM is a machine learning compiler framework. In this work, its performance is evaluated against alternative options.
Evaluating network protocols for AI inference
The de facto standard API for LLM inference tasks is the OpenAI API. However, it is not optimal with regard to performance and other characteristics. In this work, other existing and novel protocol designs for common AI inference tasks are explored and evaluated.