Understanding I/O Behavior in Scientific and Data-Intensive Computing

Philip Carns (Argonne National Laboratory, US)
Julian Kunkel (Gesellschaft f. wissenschaftl. Datenverarbeitung, DE)
Kathryn Mohror (LLNL – Livermore, US)
Martin Schulz (TU München, DE)

Two key changes are driving an immediate need for a deeper understanding of I/O workloads in high-performance computing (HPC): applications are evolving beyond the traditional bulk-synchronous models to include integrated, multi-step workflows, in-situ analysis, AI, and data analytics methods; and storage system designs are evolving beyond a two-tiered file system and archive model to complex hierarchies containing temporary, fast tiers of storage, located close to compute resources, with markedly different performance properties. Both of these changes represent a significant departure from the decades-long status quo and require investigation from storage researchers and practitioners to understand their impacts on overall I/O performance. Without an in-depth understanding of I/O workload behavior, storage system designers, I/O middleware developers, facility operators, and application developers will not know how best to design or utilize the additional tiers for optimal performance of a given I/O workload.

The goal of this Dagstuhl Seminar is to bring together experts in I/O performance analysis and storage system architecture to collectively evaluate how our community is capturing and analyzing I/O workloads on HPC systems, identify any gaps in our methodologies, and determine how to develop a better, in-depth understanding of their impact on HPC systems. We expect our discussions to result in a) a set of common terminology across the community to describe I/O workloads; b) concrete recommendations to homogenize measurement and analysis of I/O workloads across centers; c) a roadmap showing how the collected I/O data can have practical impact for users; and d) a special issue of a journal documenting our findings and providing the needed outreach to the wider community.

In the seminar, we will discuss key topic areas and related questions aimed towards our goal of improving our understanding of HPC application I/O behavior. We anticipate the following topics and questions will generate lively discussions:

I/O workflow analysis: What data do we need to collect in order to understand I/O patterns? What analysis do we need to perform in order to know how to support these emerging I/O patterns?
Tools for I/O analysis: Are our current tools adequate for understanding I/O behavior? If not, what new capabilities do we need? How can we couple our tools to meet the needed capabilities? What lessons can we learn from instrumenting applications in the past that we can apply to our future endeavors?
Changing workloads and their requirements: How are workflows changing on HPC systems? How do their I/O patterns differ from what we have seen in the past? How do we expect workflows to behave in the future?
Data center support: What data do HPC centers need to collect about their workloads to ensure they stay current with the I/O needs of their applications and workflows? What do HPC system administrators need to know to tune their systems for high I/O performance?
Storage system designs: Are there advanced storage system designs that could aid in improving the performance of anticipated future workflows? Can we influence the designs such that adequate I/O monitoring and analysis is built into the hardware?

The results of this seminar will have broad applicability for those interested in improving the I/O performance of HPC applications, an often-overlooked bottleneck in system efficiency. We anticipate that our meeting will spark long-term, international collaboration among HPC I/O performance researchers who share the goal of understanding and improving HPC I/O.

For more details, see the official Dagstuhl webpage.
