
HPC-IODC: HPC I/O in the Data Center Workshop


Due to COVID-19, the workshop will be organised as a free virtual event using video conferencing; the videos of the presentations will be published on this page.

Managing scientific data at a large scale is challenging for both scientists and the host data centre.

The storage and file systems deployed within a data centre are expected to meet users' requirements for data integrity and high performance across heterogeneous and concurrently running applications.

With new storage technologies and layers in the memory hierarchy, the picture is becoming even murkier. To effectively manage the data load within a data centre, I/O experts must understand how users expect to use the storage and what services they should provide to enhance user productivity.

In this workshop, we bring together I/O experts from data centres and application workflows to share current practices for scientific workflows, issues, and obstacles for both hardware and the software stack, and R&D to overcome these issues. We seek to ensure that a systems-level perspective is included in these discussions.

The workshop content is built on two tracks with calls for papers/talks:

We are excited to announce that research papers will be published open access in Springer LNCS, and extended manuscripts in the Journal of High-Performance Storage as well. Contributions to both tracks are peer-reviewed and require submission of the respective research paper or idea for your presentation via EasyChair (see the complete description in Track: Research Papers).

The workshop is held in conjunction with ISC-HPC during the ISC workshop day. Note that attendance at ISC workshops requires a workshop pass. See also last year's workshop web page.

Date Thursday, June 25th, 2020
Venue Virtual Event (free registration is required)
Contact Dr. Julian Kunkel

This workshop is powered by the Virtual Institute for I/O, the Journal of High-Performance Storage, and ESiWACE 1).

Organisation

The workshop is organised by

Agenda

The videos are available on YouTube. Please see our workshop summary paper.

Times are listed in BST (GMT+1); CEST is +1 hour and US Central (CDT) is -6 hours.

Registration

We will provide the link for the video conference to registered attendees. To register for the workshop, fill in the linked form.

Program Committee

Participation

The workshop is integrated into ISC-HPC. We welcome everybody to join the workshop, including:

The call for papers and talks is already open. We accept early submissions and typically process them within 45 days. We particularly encourage early submission of abstracts to indicate your interest in submitting.

You may be interested in joining our mailing lists at the Virtual Institute for I/O.

We especially welcome participants who are willing to give a presentation about the I/O of their institution's data centre. Note that such presentations should cover the topics mentioned below.

CFP text

Track: Research Papers

The research track accepts papers covering state-of-the-practice and research dedicated to storage in the data centre.

Proceedings will appear in ISC's post-conference workshop proceedings in Springer's LNCS. Extended versions have a chance of acceptance in the first issue of the JHPS journal. We will apply the more restrictive review criteria from JHPS and use the open workflow of the JHPS journal for managing the proceedings. For interaction, we will rely on EasyChair, so please submit the metadata to EasyChair before the deadline.

For the workshop, we accept papers of up to 12 pages (excluding references) in LNCS format. You may already submit an extended version suitable for the JHPS in JHPS format. Upon submission, please mark potential sections for the extended version by setting a light red background colour. The JHPS template can be easily converted to the LNCS Word format, so authors can obtain both publications with minimal effort. Alternatively, you can use the Springer LNCS LaTeX or Word template and convert it to a Google Doc. See the Manuscript Preparation, Layout & Templates, Springer.

For accepted papers, the length of the talk during the workshop depends on how controversial and novel the approach is (the length is decided based on the preference provided by the authors and feedback from the reviewers). We also allow virtual participation (without attending the workshop in person). All relevant work in the area of data centre storage will be published with our joint workshop proceedings. We simply believe the available time is best used to discuss controversial topics.

Topics

The relevant topics for papers cover all aspects of data centre I/O, including:

Paper Deadlines

Review Criteria

The main acceptance criterion is the relevance of the approach to be presented, i.e., the core idea is novel and worth discussing in the community. Considering that the camera-ready version of the papers is due after the workshop, we pursue two rounds of reviews:

  1. Acceptance for the workshop (as a talk).
  2. Acceptance as a paper *after* the workshop, incorporating feedback from the workshop.

After the first review, all papers undergo a shepherding process.

The criteria for the Journal of High-Performance Storage are described on its webpage.

Track: Talks by I/O Experts

The topics of interest in this track include, but are not limited to:

We also accept industry talks, provided that they focus on operational issues in data centres and omit marketing.

We use EasyChair for managing the interaction with the program committee. If you are interested in participating, please submit a short (1/2 page) abstract of your intended talk together with a brief bio.

Abstract Deadlines

Content

If possible, a talk covering your data centre should try to integrate the following items. We hope your site's administrator will help you gather the information with little effort.

  1. Workload characterisation
    1. Scientific Workflow (give a short introduction)
      1. A typical use-case (if multiple are known, feel free to present more)
      2. Involved number of files/amount of data
    2. Job mix
      1. Node utilisation (related to peak-performance)
  2. System view
    1. Architecture
      1. Schema of the client/server infrastructure
        1. Capacities (Tape, Disk, etc.)
      2. Potential peak-performance of the storage
        1. Theoretical
        2. Optional: Performance results of acceptance tests.
      3. Software/Middleware used, e.g. NetCDF 4.X, HDF5, …
    2. Monitoring infrastructure
      1. Tools and systems used to gather and analyse utilisation
    3. Actual observed performance in production
      1. Throughput graphs of the storage (e.g., from Ganglia)
      2. Metadata throughput (Ops/s)
    4. Files on the storage
      1. Number of files (if possible, per file type)
      2. Distribution of file sizes
  3. Issues/Obstacles
    1. Hardware
    2. Software
    3. Pain points (what is seen as the most significant problem(s) and suggested solutions, if known)
  4. Conducted R&D (that aims to mitigate issues)
    1. Future perspective
    2. Known or projected future workload characterisation
    3. Scheduled hardware upgrades and new capabilities we should focus on exploiting as a community
    4. Ideal system characteristics and how it addresses current problems or challenges
    5. What hardware should be added
    6. What software should be developed to make things work better (capabilities perspective)
    7. Items requiring discussion
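
For the "Files on the storage" items above (number of files per type and distribution of file sizes), a minimal sketch of how such numbers could be gathered is shown below. It is only an illustration, not a tool recommended by the workshop: the function names `scan_tree` and `size_bucket` are hypothetical, the file extension is used as a rough stand-in for the file type, and sizes are grouped into power-of-two buckets.

```python
import os
from collections import Counter


def size_bucket(size):
    """Map a size in bytes to a power-of-two bucket.

    Bucket 0 holds empty files; bucket b holds sizes in [2**(b-1), 2**b).
    """
    return size.bit_length()


def scan_tree(root):
    """Walk `root` and count files per extension and per size bucket.

    Assumption: the extension approximates the file type. Symbolic links
    are not followed; their own size is recorded (lstat).
    """
    per_type = Counter()
    per_bucket = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            ext = os.path.splitext(name)[1].lower() or "(none)"
            per_type[ext] += 1
            per_bucket[size_bucket(size)] += 1
    return per_type, per_bucket
```

Calling `scan_tree` on a project directory returns two counters that can be turned directly into a table or histogram for a talk. A full tree walk creates metadata load of its own, so on a production file system it would probably be limited to a subtree or run at a quiet time.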
1) ESiWACE is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 823988.
2), 3) Anywhere on Earth
4) tentative