Jonathan is a scientific employee of the Georg-August-University of Göttingen and a PhD student of Julian Kunkel.
He takes the role of a system architect and focuses on designing systems that enable novel ways of utilizing Cloud and HPC resources while remaining efficient, secure, and scalable. Most notably, he strives to combine HPC with Kubernetes.
ORCID: 0000-0002-7384-7304
While vLLM is a widely used inference backend engine for operating LLMs, there are alternative options with the potential to deliver better performance by replacing or extending vLLM. Notable options are the Modular platform with MAX, ServerlessLLM, and LMCache. Performance improvements may be limited to certain use cases. The overarching goal of this topic is to explore potential performance improvements for the Chat AI platform.
Projects such as K8sGPT, as well as MCP servers for Kubernetes, enable LLMs to interact directly with Kubernetes clusters. This project aims to explore how well LLM-based agents can maintain a given Kubernetes cluster by completing typical maintenance tasks such as adjusting workloads and migrating between versions.
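As a minimal illustration of how such an agent setup could be kept safe, LLM-proposed maintenance actions can be validated against an allow-list before anything touches the cluster. The action schema, tool names, and limits below are illustrative assumptions, not the API of K8sGPT or any real MCP server:

```python
# Hypothetical sketch: gate LLM-proposed cluster actions behind an allow-list.
# The action schema and names are assumptions for illustration only; they do
# not come from K8sGPT or a real MCP server implementation.

ALLOWED_ACTIONS = {"scale_deployment", "restart_deployment"}

def validate_action(action: dict) -> bool:
    """Accept only known action types with sane parameters."""
    if action.get("type") not in ALLOWED_ACTIONS:
        return False
    if action["type"] == "scale_deployment":
        replicas = action.get("replicas")
        # Cap replica counts so a hallucinated value cannot overload the cluster.
        return isinstance(replicas, int) and 0 <= replicas <= 20
    return True

def execute(action: dict) -> str:
    """Stand-in for an MCP tool call against the Kubernetes API."""
    if not validate_action(action):
        return f"rejected: {action.get('type')}"
    return f"applied: {action['type']} on {action.get('target')}"

# An LLM agent might propose actions like these; only safe ones are applied.
proposals = [
    {"type": "scale_deployment", "target": "web", "replicas": 5},
    {"type": "delete_namespace", "target": "kube-system"},  # unknown type, rejected
]
results = [execute(a) for a in proposals]
```

In a real deployment the `execute` stand-in would be replaced by calls to the cluster API, with the validation layer remaining the human-defined safety boundary around the agent.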
As part of our goals for the SAIA platform, which operates Chat AI, we want it to operate with geo-redundancy so that the service stays operational even if a given geo-location experiences an outage. To achieve this, a geo-redundancy engine should be prototyped that can itself run as multiple redundant instances and synchronize service configurations across multiple geo-locations.
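One minimal way such synchronization could be prototyped is last-write-wins replication of per-service configuration keyed by a logical timestamp. This is a sketch under assumptions (site names, merge rule, and data model are all illustrative), not the actual SAIA design:

```python
# Minimal sketch of last-write-wins config sync across geo-locations.
# Site names and the merge rule are illustrative assumptions, not the
# actual SAIA geo-redundancy design.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    # service name -> (logical timestamp, config value)
    configs: dict = field(default_factory=dict)

    def set(self, service: str, ts: int, value: str) -> None:
        self.configs[service] = (ts, value)

    def merge_from(self, other: "Site") -> None:
        """Keep the entry with the higher timestamp; ties keep the local entry."""
        for service, (ts, value) in other.configs.items():
            local = self.configs.get(service)
            if local is None or ts > local[0]:
                self.configs[service] = (ts, value)

# Two geo-locations update the same service config independently.
goettingen = Site("goettingen")
berlin = Site("berlin")
goettingen.set("chat-ai", ts=1, value="model=llama3")
berlin.set("chat-ai", ts=2, value="model=llama3.1")  # newer update

# Pairwise merging converges both sites on the newest configuration.
goettingen.merge_from(berlin)
berlin.merge_from(goettingen)
```

A production engine would additionally need conflict handling beyond timestamps (e.g. version vectors) and a consensus or quorum mechanism so that the redundant engine instances themselves agree on the winning configuration.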