====== Seminar: Newest Trends in High-Performance Data Analytics ======
High-Performance Data Analytics is a means of extracting findings from large data sets. It
is an indispensable tool in science and business, but it is also a rapidly changing field. As part of
this seminar, you will create a presentation and a report on a selected hot
topic, in German or English. You will learn to research literature and may conduct small
experiments to provide a holistic view of the selected topic. You will meet regularly with
an assigned supervisor and work towards the presentation and report.
===== Key information =====
|| Contact || [[about:people:julian_kunkel|Julian Kunkel]], [[about:people:jonathan_decker|Jonathan Decker]] ||
|| Location || [[https://meet.gwdg.de/b/jul-yha-uqh-vrl|Virtual]] ||
|| Time || Thursday 16:15-17:45 ||
|| Language || English or German (individual presentation) ||
|| Module || M.Inf.1237: Seminar Neueste Trends in High-Performance Data Analytics ||
|| SWS || 2 ||
|| Credits || 5 ||
|| Contact time || 28 hours ||
|| Independent study || 122 hours ||
As part of this seminar, you will create a presentation (and report) on a research topic in German or English (your choice!).
To this end, you will meet regularly with an assigned supervisor and work towards the presentation and report.
This seminar is also available as a pro-seminar.
In the pro-seminar, the focus is on learning presentation techniques, while in the seminar your focus must be on presenting scientific facts and leading a scientific discussion.
There are also two additional mandatory sessions for pro-seminar attendees (optional for seminar attendees).
The presentation time is 35 minutes (plus discussion).
A short report accompanying the slides is expected (max 15 pages).
Please note that we plan to record sessions (lectures and seminar talks) in order to provide the recordings
to other students via BBB and to publish and link them on YouTube for future terms.
If you appear in any of the recordings via voice, camera or screen share, we need your consent to publish the recordings.
See also this {{ :teaching:templates:dataprivacy_student_notice_slide.pdf |Slide}}.
===== Learning Objectives =====
* Appraise research in the area of high-performance data analytics
* Compose a presentation covering the selected topic in depth
* Evaluate findings (tools or theory) of other researchers
* Explain the theory and application of the selected topic
===== Topics =====
This is the list of topics that we will assign to students during the first meeting.
You will have some room for developing the topic in the direction of your choice.
Feel free to propose your own great topic.
* Understanding GPU performance e.g. using MLCommons ML Benchmarks
* Usage of data lakes and/or data warehouses
* The compute continuum - IoT, edge and HPC computing
* Use cases for integration of edge and IoT with HPC simulations
* AI and HPDA use cases for critical infrastructure from the medical and energy domains
* Data management concepts in HPC - potential of data lakes and data warehousing
* Scalable quantum computer simulation on HPC systems
* Seagate CORTX storage system
* FPGA Computing with SciEngine
* RISC-V: State of the union
* Regression Testing for HPC
* Global Optimization (of Clusters) with Genetic Algorithms
* Julia Programming Language for deep learning
* Rust Programming for HPC applications
* Sustainability for data centers
* The HPC Community
* Benchmarking of HPC Systems
* History and Development of System Architectures
* Security in Cloud and HPC
* DevOps strategies in HPC
* InfiniBand DPU
* Convergence of HPC and High-Performance Data Analytics
* Using Data Analytics in HPC Applications
* GPU Computing with Python
* Parallelization with Dask + Xarray
* What's new in the Kubernetes ecosystem (Sedna, Volcano, ...)
* What's new with Spark
* What's new with PyTorch/TensorFlow
* Containers in HPC
* WebAssembly for Function-as-a-service
* Function-as-a-service in HPC
* Key-value stores for HPDA
* Object storage systems
* HPDA Benchmarks
* Performance Analysis using Scalasca and Vampir
* Data Streaming and Workflows using Apache Airflow
===== Examination =====
The examination consists of the presentation (50% of the mark) and the report (50%).
For pro-seminars, the focus lies on effective presentation, while for seminars it lies on the depth of the scientific topic (the marking schemes differ slightly).
The presentation should last 35 minutes, and the report should be 10 to 15 pages long (not counting cover, table of contents, and appendix).
===== Agenda =====
* 11.04.24 **Introduction & Scientific Presentation** -- //Julian Kunkel, Jonathan Decker// \\ If you cannot attend, contact us as soon as possible!
  * Introduction of the course format and requirements
  * Talk on Scientific Presentations
  * Assignment of topics to the participants on a first-come-first-served basis
  * Introduction {{ :teaching:summer_term_2024:nthpda-welcome.pdf |Slides}}
  * Talk: Scientific Presentation {{ :teaching:summer_term_2024:scientific-presentation.pdf |Slides}}
  * Recording: https://youtu.be/NrahVjkUFls?si=SLzX8dxGpfO3SdgW
* 18.04.24 **LaTeX Crash Course & Scientific Writing** -- //Julian Kunkel, Jonathan Decker//
  * Introduction to LaTeX {{ :teaching:summer_term_2024:latex-intro.pdf |Slides}}
  * Showcasing our LaTeX templates https://hps.vi4io.org/teaching/ressources/start#templates
  * Talk: Scientific Writing {{ :teaching:summer_term_2024:scientific-writing.pdf |Slides}}
* 19.04.24 **You have submitted your selected topic by email to jonathan.decker@uni-goettingen.de**
* 25.04.24 **Effective Literature Search & Discussion of example reports** -- //Julian Kunkel, Jonathan Decker//
  * Talk: Effective Literature Search {{ :teaching:summer_term_2024:scientific-literature.pdf |Slides}}
  * Discussion of example reports from previous semesters
* 26.04.24 **You have been assigned a supervisor and presentation date**
* 06.06.24 **Student presentations**
* 13.06.24 **Student presentations**
* 20.06.24 **Student presentations**
  * Yahya Raja - Usage of data lakes and/or data warehouses
  * Ossama Bin Raza - Understanding GPU performance e.g. using MLCommons ML Benchmarks
* 27.06.24 **Student presentations**
  * Mohd Uwaish - GPU Computing with Python
  * Sunny Jain - History and Development of System Architectures
* 04.07.24 **Student presentations**
  * Priyanshu Gupta - DevOps strategies in HPC
* 11.07.24 **Student presentations**
  * Abdallah Abdelnaby - RISC-V: State of the union
  * Lars Quentin - MPI-based Creation and Benchmarking of a Dynamic Elasticsearch Cluster (SCAP result presentation)
  * Asmus Barth - Seagate CORTX storage system (SCAP result presentation)
* 30.09.24 **Deadline for the submission of the report**
===== Topic Distribution =====
| **Student** | **Supervisor** | **Topic** | **Submissions** |
| Yahya Raja | Aasish Sharma | Usage of data lakes and/or data warehouses | |
| Priyanshu Gupta | Mirac Aydin | DevOps strategies in HPC | |
| Ossama Bin Raza | Chirag Mandal | Understanding GPU performance e.g. using MLCommons ML Benchmarks | {{ :teaching:summer_term_2024:stud:nthpda:ossama-bin-raza-report.pdf |Report}} {{ :teaching:summer_term_2024:stud:nthpda:ossama-bin-raza-slides.pdf |Slides}} |
| Sunny Jain | Aasish Sharma | History and Development of System Architectures | {{ :teaching:summer_term_2024:stud:nthpda:sunny-jain-report.pdf |Report}} {{ :teaching:summer_term_2024:stud:nthpda:sunny-jain-slides.pptx |Slides}} |
| Mohd Uwaish | Michael B.Khani | GPU Computing with Python | |
| Abdallah Abdelnaby | Freja Nordsiek | RISC-V: State of the Union | {{ :teaching:summer_term_2024:stud:nthpda:abdallah-abdelnaby-report.pdf |Report}} {{ :teaching:summer_term_2024:stud:nthpda:abdallah-abdelnaby-slides.pdf |Slides}} |