BoF: Analyzing Parallel I/O
Abstract
Parallel I/O performance can be a critical bottleneck for applications, yet users are often ill-equipped to identify and diagnose I/O performance issues. The increasingly complex hierarchies of storage hardware and software deployed on many systems only compound this problem. Tools that can effectively capture, analyze, and tune I/O behavior on these systems empower users to realize performance gains for many applications.
In this BoF, we form a community around best practices in analyzing parallel I/O and cover recent advances to help address the problem presented above, drawing on the expertise of users, I/O researchers, and administrators in attendance.
The primary objectives of this BoF are to: 1) highlight recent advances in tools and techniques for monitoring I/O activity in data centers, 2) discuss experiences with and limitations of current approaches, and 3) discuss and derive a roadmap for future I/O tools with the goal of capturing, assessing, predicting, and optimizing I/O.
The BoF is held in conjunction with the Supercomputing conference. The official announcement is listed here.
Date | 17 November 2022 |
Time | 12:15pm - 1:15pm CST |
Venue | C146, see the SC schedule for details |
The BoF is powered by the Virtual Institute for I/O and ESiWACE.
Organization
The BoF is organized by
- Shane Snyder (ANL, USA), ssnyder@mcs.anl.gov
- Julian Kunkel (Georg-August-Universität Göttingen/GWDG), julian.kunkel@gwdg.de
Agenda
We have a series of 8-minute talks followed by a longer discussion:
- Welcome – Shane Snyder, Julian Kunkel
  Slides
- Detecting data races on relaxed systems using Recorder – Chen Wang (LLNL)
  Slides
- Non-Intrusive Monitoring and I/O Classification with IOFS – Christian Boehme (GWDG)
  Slides
- Monitoring with Vast – Rob Mallory (VAST)
  Video
- Visualizing I/O bottlenecks with DXT Explorer 2.0 – Jean-Luca Bez (LBL)
  Slides
  DXT Explorer is an interactive web-based log analysis tool to visualize Darshan DXT logs and aid in understanding the I/O behavior of scientific applications. In recent work, we have enriched DXT Explorer with novel visualizations toward detecting root causes of performance bottlenecks. By detecting and highlighting I/O phases, stragglers, and unbalanced workloads, we can guide users to solve I/O slowdowns when transferring data. Our tool is open-source and available at https://github.com/hpc-io/dxt-explorer.
- Darshan I/O Runtime Monitoring – Ann Gentile (Sandia National Laboratories)
  Slides
- Panel and discussion