Seminar: Newest Trends in High-Performance Data Analytics
High-Performance Data Analytics is a means of extracting findings from large data sets. It is an indispensable tool in science and business, but also a rapidly changing field. As part of this seminar, you will create a presentation and a report, in German or English, on a selected hot topic. You will learn to research literature and may conduct small experiments to provide a holistic view of the selected topic. You will meet regularly with an assigned supervisor and work towards the presentation and report.
Key information
Contact | Julian Kunkel, Jonathan Decker
Location | Virtual
Time | Thursday 16:15-17:45
Language | English or German (individual presentation)
Module | M.Inf.1237: Seminar Neueste Trends in High-Performance Data Analytics
SWS | 2
Credits | 5
Contact time | 28 hours
Independent study | 122 hours
As part of this seminar, you will create a presentation (and report) on a research topic in German or English (your choice!). To this end, you will meet regularly with an assigned supervisor and work towards the presentation and report.
This seminar is also available as a pro-seminar. In the pro-seminar, the focus is on learning presentation techniques, while in the seminar your focus must be on presenting scientific facts and leading a scientific discussion. There are also two additional sessions that are mandatory for pro-seminar attendees (optional for seminar attendees).
The presentation time is 35 minutes (plus discussion). A short report accompanying the slides is expected (max 15 pages).
Please note that we plan to record sessions (lectures and seminar talks) in order to provide the recordings to other students via BBB and to publish and link them on YouTube for future terms. If you appear in any recording via voice, camera, or screen share, we need your consent to publish it. See also this Slide.
Learning Objectives
- Appraise research in the area of high-performance data analytics
- Compose a presentation covering their selected topic in depth
- Evaluate findings (tools or theory) of other researchers
- Explain theory and application covering their topic
Topics
This is the list of topics that we will assign to students during the first meeting. You will have some room for developing the topic in the direction of your choice. Feel free to propose your own great topic.
- Retrieval Augmented Generation (RAG) State-of-the-art and use cases
- LLM Open Source Agents
- LLM Benchmarking Frameworks and their limitations
- LLM Inference Optimization techniques
- LLM Compression and Quantization Techniques
- LLM Trustworthiness and Fact Validation
- Impact of GIL-less CPython on performance and compatibility
- Parallelization of async Python with trio-parallel
- Machine Learning for Predictive Maintenance on a HPC Single Node
- Understanding GPU performance, e.g., using MLCommons ML Benchmarks
- Confidential Computing (HPC/Cloud)
- Python Performance Optimization leveraging Native Implementations (Numba/CPython/PyO3/Nuitka/transpyle)
- AI for monitoring
- Neuromorphic Computing
- Effective Intrusion Detection System (IDS) Strategies in HPC Environments
- The compute continuum - IoT, edge and HPC computing
- Use cases for integration of edge and IoT with HPC simulations
- AI and HPDA use cases for critical infrastructure from the medical and energy domains
- Data management concepts in HPC - potential of data lakes and data warehousing
- Scalable quantum computer simulation on HPC systems
- FPGA Computing with SciEngine
- RISC-V: State of the union
- Regression Testing for HPC
- Global Optimization (of Clusters) with Genetic Algorithms
- Rust Programming for HPC applications
- Benchmarking of HPC Systems
- Security in Cloud and HPC
- InfiniBand DPU
- DevOps strategies in HPC
- What's new in the Kubernetes ecosystem
- Containers in HPC
- Function-as-a-service in HPC
- Object storage systems
- Encryption tools
- Scalable databases with e.g., Elasticsearch, Postgres
- Kernel compilation and configuration
- Extended Berkeley Packet Filter (eBPF)
- Forensic tools
- Distributed computing paradigms in Cloud and HPC
- Detection of AI-generated content
- Service Discovery and Traffic Management in Cloud Applications
Examination
The exam consists of the presentation (50% of the mark) and the report (50%). For pro-seminars, the focus lies on effective presentation, while for seminars the focus is the depth of the scientific topic (the marking schemes differ slightly).
Agenda
- 24.10.24 Introduction & Scientific Presentation – Julian Kunkel, Jonathan Decker
- If you cannot attend, contact us asap!
- 31.10.24 Holiday
- 01.11.24 You have submitted your selected topic by email to jonathan.decker@uni-goettingen.de
- 07.11.24 LaTeX Crash Course & Scientific Writing – Julian Kunkel, Jonathan Decker
- Introduction to LaTeX Slides
- Showcasing our LaTeX templates https://hps.vi4io.org/teaching/ressources/start#templates
- Talk: Scientific Writing Slides
- 08.11.24 You have been assigned a supervisor and presentation date
- 14.11.24 Effective Literature Search & Discussion of example reports – Julian Kunkel, Jonathan Decker
- Talk: Effective Literature Search Slides
- Discussion of example reports from previous semesters
- 21.11.24 Student presentations
- 28.11.24 Student presentations
- 05.12.24 Student presentations
- 12.12.24 Student presentations
- 19.12.24 Student presentations
- 09.01.25 Student presentations
- 16.01.25 Student presentations
- Friedrich Schwarz: Dask and Zarr vs. HDF5 in the context of Neuroscience
- 23.01.25 Student presentations
- Frederik Hennecke: Python Performance Optimization leveraging Native Implementations (Numba/CPython/PyO3/Nuitka/transpyle)
- Anila Ghazanfar: Scalable databases with e.g., Elasticsearch, Postgres
- 30.01.25 Student presentations
- 06.02.25 Student presentations
- Backup
- 31.03.25 Deadline for the submission of the report
Topic Distribution
Student | Supervisor | Topic | Submissions
Your Name | Your Supervisor | Your Topic | Report
Friedrich Schwarz | Zoya Masih | Dask and Zarr vs. HDF5 in the context of Neuroscience |
Frederik Hennecke | Lars Quentin | Python Performance Optimization leveraging Native Implementations (Numba/CPython/PyO3/Nuitka/transpyle) |
Anila Ghazanfar | Aasish Kumar Sharma | Scalable databases with e.g., Elasticsearch, Postgres |