Workshop: Data Lakes

Abstract

In recent years, classic HPC users have shown an ever-increasing interest in using the public cloud as part of traditional HPC workflows. There are many reasons for this; for example, special hardware components such as TPUs or special GPUs are available in the cloud earlier than in a local data center. In addition, users need to store data for analysis with AI methods in different data silos and to access it flexibly from both HPC and cloud systems. Flexible data migration and provisioning in the data lake play a central role in data analytics workflows. For this purpose, highly scalable object storage, mostly used via an S3 interface, has long been established in the cloud. From the user's point of view, a further advantage of the consistent data management strategy offered by a data lake is the uniform and consistent view it provides across the individual data silos.

This workshop aims to start a discussion with researchers who believe that a data lake solution can improve their project workflows. At this event, we would like to share with you our services and future prospects regarding the data lake pipeline.

More importantly, we would like to have a fruitful discussion with you on how your ideas and needs could be realized. Hence, we kindly invite you to contribute to this event by presenting a few slides about your goals and big data use case(s).

Our motto is: Let's build a bridge to the data lake together

Date		Friday, 12 November 2021
Time		14:00-18:00
Venue		Virtual, bbb room

This workshop is funded by the GWDG and supported by the NHR.

Organization

The workshop is organized by

Julian Kunkel (Georg-August-Universität Göttingen/GWDG), julian.kunkel@gwdg.de
Masoud Rezai (GWDG), masoud.rezai@gwdg.de
Alexander Goldmann, alexander.goldmann@gwdg.de

Agenda

Prospective agenda:

14:00 Welcome and motivation – Julian Kunkel, Piotr Kasprzak
Slides
- Short introductory round of all attendees
Presentation of individual use cases (~10 min per user presentation)
- CryoEM facility at UMG – Tat Cheng (UMG, Göttingen)
  Slides
- UMG-MeDIC: Establishing a Medical Research Data Service Unit – Markus Suhr (UMG, Göttingen)
  Slides
- Prediction of neurodevelopmental disorders in young children using multi sensory data analysis – Tomas Kulvicius (Uni Göttingen)
  Slides
15:00 GWDG data lake services and future plans
- 15:00 Activities at the GWDG – Julian Kunkel
- 15:10 Approaches for the GWDG data lake – Hendrik Nolte
  Slides
- 15:20 Outlook GWDG Infrastructure Development – Piotr Kasprzak
15:30 Break and networking
16:00 Groupwork: similarities of use cases and potential approaches
17:00 Concluding group discussion
17:30 Conclusions

Registration

Please register here. If you would like to give a talk, please contact Julian Kunkel.
