HPC-IODC: HPC I/O in the Data Center Workshop

Managing scientific data at large scale is challenging not only for scientists but also for the hosting data center. The storage and file systems deployed within a data center are expected to meet users' requirements for data integrity and high performance across heterogeneous and concurrently running applications.

With new storage technologies and layers in the memory hierarchy, the picture is becoming murkier. To effectively manage the data load within a data center, I/O experts must understand how users expect to use these new storage technologies and what services they should provide in order to enhance user productivity. We seek to ensure a systems-level perspective is included in these discussions.

In this workshop we bring together I/O experts from data centers and application workflows to share current practices for scientific workflows, issues and obstacles for both hardware and the software stack, and R&D to overcome these issues.

The workshop content is built on two tracks with calls for papers/talks:

research paper presentation – you'll need to submit a short paper on research relevant to I/O in the data center.
talks from I/O experts – you'll need to submit a rough outline for your talk.

Contributions to both tracks are peer reviewed and require submission of the respective research paper or idea for your presentation via Easychair (see the descriptions below).

The workshop is held in conjunction with the ISC-HPC during the ISC workshop day. Note that attending ISC workshops requires a workshop pass. The HPC-IODC workshop is a half-day workshop embedded in a full-day I/O program that we organize together with the WOPSSS team. Our cooperation includes the alignment of sessions and the potential to shift papers between the two workshops. We encourage participants to take the opportunity to attend both events. See also last year's workshop web page.

Date		Thursday June 22nd, 2017
Venue		Marriott Hotel, Frankfurt, Germany, Details about the ISC-HPC venue
Contact		Dr. Julian Kunkel

We stream the workshop on YouTube.

This workshop is powered by our partner workshop WOPSSS, the Virtual Institute for I/O and ESiWACE ¹⁾.

Program committee

Wolfgang Frings (Jülich Supercomputing Center)
Javier Garcia Blas (University Carlos III of Madrid)
Rob Ross (Argonne National Laboratory)
Carlos Maltzahn (University of California, Santa Cruz)
Thomas Boenisch (HLRS)
Sai Narasimhamurthy (Seagate)
Jean-Thomas Acquaviva (DDN)
Julian Kunkel (DKRZ, Germany)
Jay Lofstead (Sandia National Laboratory)
Colin McMurtrie (CSCS, Switzerland)

Agenda

09:00 Welcome – Julian Kunkel
Slides
09:10 Research paper session, chair Jay Lofstead
- 09:10 GPU Erasure Coding for Campaign Storage
  Walker Haddock, Matthew Curry, Purushotham Bangalore and Tony Skjellum
  Slides
- 09:20 Real-Time I/O-Monitoring of HPC Applications with SIOX, Elasticsearch, Grafana and FUSE
  Eugen Betke and Julian Kunkel
  Slides
- 09:30 Simulation of Hierarchical Storage Systems for TCO
  Jakob Luettgau and Julian Kunkel
  Slides
- 09:40 Characterizing Output Bottlenecks in a Supercomputer
  Bing Xie, Jay Lofstead, David Dillow, Sarp Oral, Scott Klasky and Jeff Chase
  Slides
- 09:50 PIOM-PX: A Framework for Modeling the I/O Behavior of Parallel Scientific Applications
  Pilar Gomez-Sanchez, Sandra Mendez, Dolores Rexachs and Emilio Luque
  Slides
10:00 Expert talk session 1, chair Julian Kunkel
- 10:00 The UK JASMIN Environmental Data Commons
  Bryan Lawrence
  Slides
- 10:20 ECMWF's IO Challenges and the path to Exascale Numerical Weather Prediction
  Tiago Quintino
  Slides
- 10:40 Extraordinary HPC file system solutions at KIT
  Roland Laifer
  Slides
11:00 Coffee break
11:30 Expert talk session 2, chair Colin McMurtrie
- 11:30 High Availability Operation of Parallel File Systems at the K computer
  Yuichi Tsujita
  Slides
- 11:45 Understanding Monitored I/O Patterns on LRZ HPC systems
  Sandra Mendez
  Slides
- 12:00 High-Performance Modelling and Simulation for Big Data Applications
  Clemens Grelck
  Slides
- 12:15 Catching rogue jobs before they overload the storage: the importance of I/O profiling
  Rosemary Francis
  Slides
12:30 Discussion: Benchmarking HPC Storage and the IO-500 – Jay Lofstead
13:00 End

Participation

The call for papers and talks is now closed. The workshop is integrated into ISC-HPC. We welcome everybody to join the workshop, including:

I/O experts from data centers and industry.
Researchers/Engineers working on high-performance I/O for data centers.
Domain scientists and computer scientists interested in discussing I/O issues.
Vendors are also welcome, but their presentations must align with data center topics (e.g. how they manage their own clusters) and not focus on commercial aspects.

You may be interested in joining our mailing lists: HPC-IODC-16, which is open for discussing HPC I/O topics, or the list of the Virtual Institute for I/O.

We especially welcome participants who are willing to give a presentation about the I/O of their institution's data center. Note that such presentations should cover the topics mentioned below.

Track: research papers

We accept short papers of up to 12 pages (excluding references) in LNCS format. Please see the instructions and templates for authors provided by Springer.

Our targeted proceedings are ISC's post-conference workshop proceedings in Springer's LNCS. We use Easychair for managing the proceedings and PC interaction.

For accepted papers, the length of the talk during the workshop depends on how controversial and novel the approach is (the length is decided based on the preference provided by the authors and feedback from the reviewers). We also allow virtual participation (without attending the workshop in person). All relevant work in the area of data center storage can be published in our joint workshop proceedings; we simply believe the available workshop time is best spent discussing contentious topics.

Paper Deadlines

Submission deadline: 2017-04-12 AoE
Author notification: 2017-04-25
Pre-final submission: 2017-06-10 (to be shared during the workshop)
Workshop: 2017-06-22
Camera-ready papers: 2017-07-22 – needed for ISC's post-conference workshop proceedings. We welcome the chance for authors to improve their papers based on the feedback received during the workshop.

Track: Talks by I/O experts

The topics of interest in this track include but are not limited to:

A description of the operational aspects of your data center
A particular solution for certain data center workloads in production

We use Easychair for managing the acceptance and PC interaction. If you are interested in participating, please submit a short (1/2 page) abstract of your intended talk together with a short bio.

Deadlines for the submission of the abstract

Submission deadline: 2017-04-13 AoE
Author notification: 2017-04-25

Content

A talk covering your data center should, where possible, address the following items. We hope your site's administrator will help you gather the information with little effort.

Workload characterization
1. Scientific Workflow (give a short introduction)
  1. A typical use-case (if multiple are known, feel free to present more)
  2. Involved number of files / amount of data
2. Job mix
  1. Node utilization (rel. to peak-performance)
System view
1. Architecture
  1. Schema of the client/server infrastructure
    1. Capacities (Tape, Disk, etc.)
  2. Potential peak-performance of the storage
    1. Theoretical
    2. Optional: performance results of acceptance tests.
  3. Software / Middleware used, e.g. NetCDF 4.X, HDF5, …
2. Monitoring infrastructure
  1. Tools and systems used to gather and analyse utilization
3. Actual observed performance in production
  1. Throughput graphs of the storage (e.g. from Ganglia)
  2. Metadata throughput (Ops/s)
4. Files on the storage
  1. Number of files (if possible per file type)
  2. Distribution of file sizes
Issues / Obstacles
1. Hardware
2. Software
3. Pain points (what is seen as the biggest problem(s) and suggested solutions, if known)
Conducted R&D (that aims to mitigate issues)
1. Future perspective
2. Known or projected future workload characterization
3. Scheduled hardware upgrades and new capabilities we should focus on exploiting as a community
4. Ideal system characteristics and how they address current problems or challenges
5. What hardware should be added
6. What software should be developed to make things work better (capabilities perspective)
7. Items requiring discussion to work through how to address
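
For the "Files on the storage" items above (number of files per file type, distribution of file sizes), the following is a minimal Python sketch of how such figures could be gathered for a single directory tree. The function names `size_bucket` and `scan_tree` are hypothetical and not part of any workshop tooling. It is an illustration only: it assumes the file extension is an acceptable stand-in for the file type and groups sizes into power-of-two buckets. A full tree walk generates substantial metadata load on a production parallel file system, so it should be run only on a small subtree or in agreement with the site's administrators.

```python
import os
from collections import Counter


def size_bucket(size_bytes):
    """Return the lower bound (in bytes) of the power-of-two bucket for a size."""
    if size_bytes <= 0:
        return 0  # empty files get their own bucket
    return 1 << (size_bytes.bit_length() - 1)


def scan_tree(root):
    """Walk `root` and count files per extension and per size bucket.

    Returns two Counters: extension -> file count, bucket -> file count.
    """
    by_type = Counter()
    by_size = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so links are not double-counted
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Assumption: the (lower-cased) extension approximates the file type
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_type[ext] += 1
            by_size[size_bucket(size)] += 1
    return by_type, by_size
```

Calling `by_type, by_size = scan_tree(path)` on a project directory yields two counters that can be sorted and plotted, for example as a histogram of file counts per size bucket.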

¹⁾

ESiWACE has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No 675191
