HPC-IODC: HPC I/O in the Data Center Workshop
Abstract
Managing scientific data at a large scale is challenging for both scientists and the host data centre.
The storage and file systems deployed within a data centre are expected to meet users' requirements for data integrity and high performance across heterogeneous and concurrently running applications.
With new storage technologies and layers in the memory hierarchy, the picture is becoming even murkier. To effectively manage the data load within a data centre, I/O experts must understand how users expect to use the storage and what services they should provide to enhance user productivity.
In this workshop, we bring together I/O experts from data centres and application workflows to share current practices for scientific workflows, issues and obstacles in both hardware and the software stack, and R&D to overcome these issues. We seek to ensure that a systems-level perspective is included in these discussions.
The workshop is built on three tracks, each with a call for papers/talks (see our topic list):
Research paper track – Submit original research papers on state-of-the-practice and research related to I/O in the data center.
Talks from I/O experts – Share your experiences and solutions for data center workloads.
Student Mentoring Sessions – Students are encouraged to submit abstracts and receive constructive feedback from the community to advance their research.
We are excited to announce that research papers will be published in the Journal of High-Performance Storage as well.
Contributions to all tracks are peer-reviewed and require submission of the respective research paper or presentation idea via EasyChair (see the complete description in Track: Research Papers).
The workshop is held in conjunction with ISC-HPC during the ISC workshop day.
Note that attendance at ISC workshops requires a workshop pass.
See also last year's workshop web page.
Date: Friday 2025-06-13, 14:00-18:00
Venue: Hall X11, 1st floor, Hamburg, Germany
Contact: Dr. Julian Kunkel
This workshop is powered by the Virtual Institute for I/O and the Journal of High-Performance Storage.
Organisation
The workshop is organised by Julian Kunkel (GWDG/Uni Göttingen), Jay Lofstead (Sandia National Laboratories), and Jean-Thomas Acquaviva (DDN).
Agenda
14:00 - Welcome – Julian Kunkel (GWDG/Uni Göttingen), Jay Lofstead (Sandia National Laboratories), Jean-Thomas Acquaviva (DDN)
Slides
14:10 - Experience with the first Scalable Storage Competition and S3 Summer School - Julian Kunkel
Slides
14:30 - I/O time prediction transfer learning - Radita Liem
Slides
15:00 - Optimizing the Longhorn Cloud-native Software Defined Storage Engine for High Performance - Konstantinos Kampadais, Antony Chazapis, Angelos Bilas
Slides
15:30 - Assessing Machine Learning I/O Behavior Using IO500 - Zoya Masih
Slides
16:00 - Break
16:30 - Enhancing Storage Semantics in MCSE - Sebastian Oeste
Slides
17:00 - The role of storage for AI workloads - Radu Stoica
Slides
As AI workloads rapidly evolve, they are reshaping expectations around storage functionality and interfaces. Traditional storage architectures are no longer sufficient to meet the demands of modern AI systems, particularly in enterprise environments where performance, scalability, and integration are crucial. In this talk, I will explore how storage systems can be reimagined to become an integral component of AI workflows. I will focus on two key areas. First, I will discuss content-aware functionality that enables storage systems to understand, index, and optimize access to data based on its content, thereby facilitating smarter data retrieval and integration with AI pipelines. Second, I will present the role storage can play in storing model KVCache state, providing persistent and low-latency support for key-value caches used in LLM inference, significantly accelerating model response times. For each of these areas, I will highlight current challenges and present our approach to addressing them, including architectural innovations and interface design strategies that enable AI-native storage capabilities.
17:30 - Enhancing Parallel Computing CPU and I/O Performance through Malleable Resource Management - David E. Singh
Slides
In the context of parallel computing, malleability refers to the ability to dynamically adjust computing resources during runtime to optimize performance and improve program efficiency. This talk presents a system that applies malleability at both the application and parallel file system levels. The developed environment combines traditional and machine learning prediction models, scalability models, and dynamic programming models to meet various computing and I/O performance criteria. The system was evaluated using EpiGraph, a parallel, data-intensive agent-based epidemiological simulator, and Hercules, an ad hoc in-memory parallel file system. Results show that the system can improve computing performance by over 80%, I/O performance by over 14%, reduce total execution time by 42%, and lower resource usage time by 36%.
Program Committee
Thomas Bönisch (HLRS)
Matthew Curry (Sandia National Laboratories)
Sandro Fiore (University of Trento)
Javier Garcia Blas (Carlos III University)
Adrian Jackson (The University of Edinburgh)
George S. Markomanolis (AMD)
Sandra Mendez (Barcelona Supercomputing Center (BSC))
Feiyi Wang (Oak Ridge National Laboratory)
Participation
The workshop is integrated into ISC-HPC.
We welcome everybody to join the workshop, including:
I/O experts from data centres and industry.
Researchers/Engineers working on high-performance I/O for data centres.
Domain scientists and computer scientists interested in discussing I/O issues.
Vendors are also welcome, but their presentations must align with data centre topics (e.g., how they manage their own clusters) and not focus on commercial aspects.
The call for papers and talks is already open. We accept early submissions and typically process them within 45 days.
We particularly encourage early submission of abstracts to indicate your interest in submitting.
We especially welcome participants who are willing to give a presentation about the I/O of their institution's data centre.
Note that such presentations should cover the topics mentioned below.
Call for Papers/Contributions (CfP)
Track: Research Papers
The research track accepts papers covering state-of-the-practice and research dedicated to storage in the data centre.
We will apply the review criteria from JHPS and use the open workflow of the JHPS journal for managing the proceedings. For interaction, we will rely on Easychair, so please submit the metadata to EasyChair before the deadline.
For the workshop, we accept papers with up to 12 pages (excluding references).
You may already submit an extended version suitable for JHPS. There are two JHPS templates: LaTeX and Word.
For accepted papers, the length of the talk during the workshop depends on the controversiality and novelty of the approach (the length is decided based on the authors' stated preference and reviewer feedback).
All relevant work in the area of data centre storage will be published in our joint workshop proceedings; we simply believe the available time is best used to discuss controversial topics.
Topics
The relevant topics for papers cover all aspects of data centre I/O, including:
Application workflows
User productivity and costs
Performance monitoring
Dealing with heterogeneous storage
Data management aspects
Archiving and long term data management
State-of-the-practice (e.g., using or optimising a storage system for data centre workloads)
Research that tackles data centre I/O challenges
Cloud/Edge storage aspects
Application of AI methods in storage
Paper Deadlines
Review Criteria
The main acceptance criterion is the relevance of the approach to be presented, i.e., the core idea is novel and worth discussing in the community.
Considering that the camera-ready version of the papers is due after the workshop, we pursue two rounds of reviews:
Acceptance for the workshop (as a talk).
Acceptance as a paper *after* the workshop, incorporating feedback from the workshop.
After the first review, all papers undergo a shepherding process.
The criteria for The Journal of High-Performance Storage are described on its webpage.
Track: Expert Talks
The topics of interest in this track include, but are not limited to:
We also accept industry talks, provided they focus on operational issues in data centres and omit marketing.
We use Easychair for managing the interaction with the program committee.
If you are interested in participating, please submit a short (half-page) abstract of your intended talk together with a brief bio.
Abstract Deadlines
Content
Where possible, the following items should be integrated into a talk covering your data centre.
We hope your site's administrators will help you gather this information with little effort.
Workload characterisation
Scientific Workflow (give a short introduction)
A typical use-case (if multiple are known, feel free to present more)
Involved number of files/amount of data
Job mix
Node utilisation (relative to peak performance)
System view
Architecture
Schema of the client/server infrastructure
Capacities (Tape, Disk, etc.)
Potential peak-performance of the storage
Theoretical
Optional: Performance results of acceptance tests.
Software/Middleware used, e.g. NetCDF 4.X, HDF5, …
Monitoring infrastructure
Tools and systems used to gather and analyse utilisation
Actual observed performance in production
Throughput graphs of the storage (e.g., from Ganglia)
Metadata throughput (Ops/s)
Files on the storage
Number of files (if possible, per file type)
Distribution of file sizes
Issues/Obstacles
Hardware
Software
Pain points (what is seen as the most significant problem(s) and suggested solutions, if known)
Conducted R&D (that aims to mitigate issues)
Future perspective
Known or projected future workload characterisation
Scheduled hardware upgrades and new capabilities we should focus on exploiting as a community
Ideal system characteristics and how they address current problems or challenges
What hardware should be added
What software should be developed to make things work better (capabilities perspective)
Items requiring discussion
Track: Student Mentoring Sessions
To foster the next generation of data-related practitioners and researchers, students are encouraged to submit an abstract following the expert talk guidelines above, as long as their research aligns with these topics. At the workshop, each student will be given 10 minutes to present their work, followed by 10-15 minutes of conversation with the community about how to further the work, its potential impact, alternative research directions, and other topics to help the student progress in their studies. We encourage students to work with a shepherd towards a JHPS paper describing their research, based on the feedback obtained during the workshop.