HPC-IODC: HPC I/O in the Data Center Workshop


Managing scientific data at large scale is challenging for scientists but also for the host data center. The storage and file systems deployed within a data center are expected to meet users' requirements for data integrity and high performance across heterogeneous and concurrently running applications.

With new storage technologies and layers in the memory hierarchy, the picture is becoming murkier. To effectively manage the data load within a data center, I/O experts must understand how users expect to use these new storage technologies and what services they should provide in order to enhance user productivity. We seek to ensure a systems-level perspective is included in these discussions.

In this workshop, we bring together I/O experts from data centers and application workflows to share current practices for scientific workflows, issues, and obstacles for both hardware and the software stack, and R&D to overcome these issues.

The workshop content is built on two tracks with calls for papers/talks:

  • research paper track – requesting submissions on state-of-the-practice and research about I/O in the data center (see our topic list below).
  • talks from I/O experts – requesting submissions of talks.

Contributions to both tracks are peer-reviewed and require submission of the respective research paper or idea for your presentation via Easychair (see the descriptions below).

The sessions are jointly organized with the Workshop on Performance and Scalability of Storage Systems (WOPSSS) hosting performance-oriented research papers.

The workshop is held in conjunction with ISC-HPC during the ISC workshop day. Note that attending ISC workshops requires a workshop pass. See also last year's workshop web page.

Date: Thursday, June 20th, 2019
Venue: Marriott Hotel, Frankfurt, Germany (see the details about the ISC-HPC venue)
Contact: Dr. Julian Kunkel

This workshop is powered by the Virtual Institute for I/O and ESiWACE 1).

The workshop is organized by

Agenda

We will publish links to the papers on this page.

  • 09:00 Welcome – Julian Kunkel (Slides)
  • 09:10 Data management session – chair: Jay Lofstead
    • 09:10 Data-Centric I/O and Next Generation Interfaces
      Julian Kunkel (Slides)
    • 09:30 Adventures in NoSQL for Metadata Management
      Jay Lofstead, Ashleigh Ryan and Margaret Lawson (Slides)
    • 10:00 Towards High Performance Data Analytics for Climate Change
      Sandro Fiore, Donatello Elia, Cosimo Palazzo, Fabrizio Antonio, Alessandro D’Anca, Ian Foster and Giovanni Aloisio (Slides)
    • 10:30 Mediating data center storage diversity in HPC applications with FAODEL
      Patrick Widener, Craig Ulmer, Scott Levy, Gary Templet and Todd Kordenbrock (Slides)
  • 11:00 Coffee break
  • 11:30 Expert talks – chair: Julian Kunkel
    • 11:30 An overview of the storage and post-processing environment at RIKEN R-CCS
      Jorji Nonaka (Slides)
    • 12:00 Running HPC-like workloads on the public cloud
      Vinay Gaonkar (Slides)
    • 12:30 An I/O analysis of HPC workloads on CephFS and Lustre
      Alberto Chiusole, Stefano Cozzini, D. van der Ster, M. Lamanna, G. Giuliani (Slides)
  • 13:00 Lunch break
  • 14:00 Research paper session – chair: Jean-Thomas Acquaviva
    • 14:00 Lustre - the next 20 years
      Andreas Dilger (Slides)
    • 14:30 Media-Based Work Unit
      Jianshen Liu, Philip Kufeldt and Carlos Maltzahn (Slides)
    • 15:00 Tracking User-Perceived I/O Slowdown via Probing
      Julian Kunkel and Eugen Betke (Slides)
    • 15:30 A quantitative approach to architecting all-flash Lustre file systems
      Glenn Lockwood, Kirill Lozinskiy, Lisa Gerhardt, Ravi Cheema, Damian Hazen and Nicholas Wright (Slides)
  • 16:00 Coffee break
  • 16:30 Research paper session – chair: Jay Lofstead
    • 16:30 An Architecture for High Performance Computing and Data Systems using Byte-Addressable Persistent Memory
      Adrian Jackson, Michele Weiland, Mark Parsons and Bernhard Homoelle (Slides)
    • 17:00 Predicting File Lifetimes With Convolutional Neural Networks
      Florent Monjalet (Slides)
    • 17:20 Footprinting Parallel I/O – Machine Learning to Classify Application’s I/O Behavior
      Eugen Betke and Julian Kunkel (Slides)
    • 17:40 Conclusion and Discussion
  • 18:00 End

Program Committee

  • Anthony Kougkas (Illinois Institute of Technology)
  • Suzanne McIntosh (New York University)
  • Jay Lofstead (Sandia National Laboratories)
  • George S. Markomanolis (Oak Ridge National Laboratory)
  • Suren Byna (Lawrence Berkeley National Laboratory)
  • Adrian Jackson (The University of Edinburgh)
  • Javier Garcia Blas (Carlos III University)
  • Bing Xie (Oak Ridge National Lab)
  • Sandro Fiore (CMCC)
  • Glenn Lockwood (Lawrence Berkeley National Laboratory)
  • Michael Kluge (TU Dresden)
  • Jean-Thomas Acquaviva (DDN)
  • Robert Ross (Argonne National Laboratory)
  • Wolfgang Frings (Juelich Supercomputing Centre)
  • Feiyi Wang (Oak Ridge National Laboratory)
  • Thomas Boenisch (High performance Computing Center Stuttgart)
  • Matthew Curry (Sandia National Laboratories)

Participation

The workshop is integrated into ISC-HPC. We welcome everybody to join the workshop, including:

  • I/O experts from data centers and industry.
  • Researchers/Engineers working on high-performance I/O for data centers.
  • Domain scientists and computer scientists interested in discussing I/O issues.
  • Vendors are also welcome, but their presentations must align with data center topics (e.g., how they manage their own clusters) and not focus on commercial aspects.

The call for papers and talks is already open. We also accept early submissions and typically process them within 45 days. We particularly encourage early submission of abstracts so that you can indicate your interest in submitting.

You may be interested in joining our mailing lists at the Virtual Institute for I/O.

We especially welcome participants who are willing to give a presentation about the I/O of their institution's data center. Note that such presentations should cover the topics mentioned below.

The research track accepts papers covering state-of-the-practice and research dedicated to storage in the data center. We accept papers of up to 12 pages (excl. references) in LNCS format (upon request, we may allow a small extension). Please submit your paper anonymously for blind review, i.e., remove any references that identify you (see here). Please see the instructions and templates for authors provided by Springer.

Our targeted proceedings are ISC's post-conference workshop proceedings in Springer's LNCS. We use Easychair for managing the proceedings and PC interaction.

For accepted papers, the length of the talk during the workshop depends on how controversial and novel the approach is (the length is decided based on the preference provided by the authors and feedback from the reviewers). We also allow virtual participation (without attending the workshop in person). All relevant work in the area of data center storage can be published in our joint workshop proceedings; we simply believe the available time is best used to discuss controversial topics.

Topics

The relevant topics for papers cover all aspects of data center I/O including:

  • application workflows
  • user productivity and costs
  • performance monitoring
  • dealing with heterogeneous storage
  • data management aspects
  • archiving and long-term data management
  • state-of-the-practice (e.g., using or optimizing a storage system for data center workloads)
  • research that tackles data center I/O challenges

Paper Deadlines

  • Submission deadline: 2019-04-19 AoE
    • Note: The call for papers and talks is already open.
    • You can submit an abstract anytime.
    • We also appreciate early full submissions and typically review them within 45 days.
  • Author notification: 2019-05-03
  • Pre-final submission: 2019-06-10 (to be shared during the workshop)
  • Workshop: 2019-06-20
  • Camera-ready papers 2): 2019-07-21 – They are needed for ISC's post-conference workshop proceedings. We embrace the chance for authors to improve their papers based on the feedback received during the workshop.

Review Criteria

The main acceptance criterion is the relevance of the approach to be presented, i.e., whether the core idea is novel or worth discussing in the community. Since the camera-ready version of the papers is due after the workshop, we pursue two rounds of reviews:

  1. Acceptance for the workshop (as a talk)
  2. Acceptance as a paper after the workshop; this round incorporates feedback from the workshop.

After the first review, all papers undergo a shepherding process.

The topics of interest in the expert talks track include but are not limited to:

  • A description of the operational aspects of your data center
  • A particular solution for certain data center workloads in production

We also accept industry talks, given that they focus on operational issues in data centers and omit marketing.

We use Easychair for managing the acceptance and PC interaction. If you are interested in participating, please submit a short (half-page) abstract of your intended talk together with a short bio.

Deadlines for the submission of the abstract

  • Submission deadline: 2019-04-19 AoE
  • Author notification: 2019-05-03

Content

If possible, a talk covering your data center should try to integrate the following items. We hope your site's administrators will help you gather the information with little effort.

  1. Workload characterization
    1. Scientific Workflow (give a short introduction)
      1. A typical use-case (if multiple are known, feel free to present more)
      2. Involved number of files / amount of data
    2. Job mix
      1. Node utilization (rel. to peak-performance)
  2. System view
    1. Architecture
      1. Schema of the client/server infrastructure
        1. Capacities (Tape, Disk, etc.)
      2. Potential peak-performance of the storage
        1. Theoretical
        2. Optional: performance results of acceptance tests.
      3. Software / Middleware used, e.g. NetCDF 4.X, HDF5, …
    2. Monitoring infrastructure
      1. Tools and systems used to gather and analyse utilization
    3. Actual observed performance in production
      1. Throughput graphs of the storage (e.g. from Ganglia)
      2. Metadata throughput (Ops/s)
    4. Files on the storage
      1. Number of files (if possible per file type)
      2. Distribution of file sizes
  3. Issues / Obstacles
    1. Hardware
    2. Software
    3. Pain points (what is seen as the biggest problem(s) and suggested solutions, if known)
  4. Conducted R&D (that aims to mitigate issues)
    1. Future perspective
    2. Known or projected future workload characterization
    3. Scheduled hardware upgrades and new capabilities we should focus on exploiting as a community
    4. Ideal system characteristics and how it addresses current problems or challenges
    5. What hardware should be added
    6. What software should be developed to make things work better (capabilities perspective)
    7. Items requiring discussion to work through how to address
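
For the "Files on the storage" items above (number of files per file type and distribution of file sizes), the following is a minimal sketch of how such numbers could be gathered for a single directory tree. It is an illustration only: the function names `scan_tree` and `format_report`, the use of the file extension as "file type", and the power-of-two size buckets are assumptions made here, not something the workshop prescribes. On a large parallel file system a full tree walk can put considerable load on the metadata servers, so a site would more likely rely on its own scanning or monitoring tools.

```python
import os
from collections import Counter


def scan_tree(root):
    """Return (files per extension, files per power-of-two size bucket)."""
    per_type = Counter()
    per_size = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so links are not counted twice
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # "File type" is approximated by the extension (an assumption)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            per_type[ext] += 1
            # Bucket b holds sizes in [2**(b-1), 2**b); bucket 0 holds empty files
            per_size[size.bit_length()] += 1
    return per_type, per_size


def format_report(per_type, per_size):
    """Render both counters as a small plain-text report."""
    lines = ["files per type:"]
    for ext, count in per_type.most_common():
        lines.append(f"  {ext}: {count}")
    lines.append("files per size bucket (exclusive upper bound in bytes):")
    for bucket in sorted(per_size):
        lines.append(f"  < {2 ** bucket}: {per_size[bucket]}")
    return "\n".join(lines)


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(format_report(*scan_tree(root)))
```

Running the script with a directory as its only argument prints both tables; the size buckets can then be plotted as a histogram for the talk.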
1) ESiWACE has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No 675191
2) tentative