{{ :events:2019:hpc-iodc-logo.png?200&nolink|Logo}}
  
<color #ed1c24>Due to COVID-19, the workshop will be organised as a free virtual event using video conferencing; the videos of the presentations will be published on this page.</color>

Managing scientific data at a large scale is challenging for both scientists and the host data centre.

The storage and file systems deployed within a data centre are expected to meet users' requirements for data integrity and high performance across heterogeneous and concurrently running applications.

With new storage technologies and layers in the memory hierarchy, the picture is becoming even murkier. To effectively manage the data load within a data centre, I/O experts must understand how users expect to use the storage and what services they should provide to enhance user productivity.

In this workshop, we bring together I/O experts from data centres and application workflows to share current practices for scientific workflows, issues, and obstacles for both hardware and the software stack, and R&D to overcome these issues. We seek to ensure that a systems-level perspective is included in these discussions.
  
The workshop content is built on two tracks with calls for papers/talks:
  * Research paper track -- requesting submissions regarding state-of-the-practice and research about I/O in the data centre (see our [[iodc#topics|topic list]]).
  * Talks from I/O experts -- requesting submissions of talks.
  
We are excited to announce that research papers will be published in Springer LNCS open access and extended manuscripts in the [[https://jhps.vi4io.org|Journal of High-Performance Storage]] as well.
Contributions to both tracks are peer-reviewed and require submission of the respective research paper or idea for your presentation via [[https://easychair.org/conferences/?conf=hpciodc20|Easychair]] (see the complete description in [[iodc#trackresearch_papers|Track: Research Papers]]).
  
The workshop is held in conjunction with the [[http://www.isc-hpc.com/|ISC-HPC]] during the ISC workshop day.
Note that attendance at ISC workshops requires a **workshop pass**.
See also our last year's [[https://hps.vi4io.org/events/2019/iodc|workshop web page]].
  
  
|| Date || Thursday, June 25th, 2020 ||
|| Venue || Virtual event (free [[#registration|registration]] required) ||
|| Contact || [[about:people:julian kunkel]] ||
  
This workshop is powered by the [[https://www.vi4io.org|Virtual Institute for I/O]], the [[https://jhps.vi4io.org|Journal of High-Performance Storage]], and [[http://www.esiwace.eu|ESiWACE]] ((ESiWACE is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 823988.)).
  
{{:events:2017:vi4io.png?200&nolink|}} \w {{:research:projects:esiwace-logo.png?300&nolink|}} \w {{:events:2020:jhps-logo.png?250&nolink|}}
  
===== Organisation =====
  
The workshop is organised by:
  * Julian Kunkel ([[https://www.reading.ac.uk/computer-science/|Department of Computer Science]], University of Reading, UK), [[j.m.kunkel@reading.ac.uk]]
  * Jay Lofstead (Sandia National Lab, USA), [[gflofst@sandia.gov]]
  * Jean-Thomas Acquaviva (DDN, France), [[jtacquaviva@ddn.com]]
====== Agenda ======
  
The videos are available on [[https://www.youtube.com/watch?v=1iqmRGqvusk&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2|YouTube]].
Please see our {{ :events:2020:hpc_iodc_2020.pdf |workshop summary paper}}.

Times are listed in BST (GMT+1); CEST is +1 hour, US Central (CDT) is -6 hours.

  * 9:45 **Welcome to the HPC IODC workshop** -- Julian Kunkel, Jay Lofstead, Jean-Thomas Acquaviva \\ {{ :events:2020:hpciodc20-intro.pdf |Slides}} -- [[https://www.youtube.com/watch?v=1iqmRGqvusk&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=1|Video]] \\ //This talk provides an introduction to the HPC IODC workshop, covering the motivation and wider scope behind the workshop.//
  * 10:00 **Research paper session** -- chair Jean-Thomas Acquaviva
    * **Characterizing I/O Optimization Effect Through Holistic Log Data Analysis of Parallel File Systems and Interconnects** -- Yuichi Tsujita (RIKEN) \\ {{ :events:2020:hpciodc20-riken.pdf |Slides}} -- [[https://www.youtube.com/watch?v=1hUF9GcqOV0&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=2|Video]] \\ //Recent HPC systems utilize parallel file systems such as GPFS and Lustre to cope with the huge demand of data-intensive applications. Although most of the HPC systems provide performance tuning tools on compute nodes, there is not enough chance to tune I/O activities on parallel file systems, including high-speed interconnects among compute nodes and file systems. We propose an I/O performance optimization framework using log data of parallel file systems and interconnects in a holistic way for effective use of HPC systems including I/O nodes and parallel file systems. We demonstrate our framework at the K computer with two I/O benchmarks for the original and the enhanced MPI-IO implementations. Its I/O analysis has revealed that the I/O performance improvements achieved by the enhanced MPI-IO implementation are due to effective utilization of parallel file systems and interconnects among I/O nodes compared with the original MPI-IO implementation.//
    * **Investigating the Overhead of the REST Protocol to Reveal the Potential for Using Cloud Services for HPC Storage** -- Frank Gadban (University of Hamburg) \\ {{ :events:2020:hpciodc20-rest.pdf |Slides}} -- [[https://www.youtube.com/watch?v=b8T81S2-M0U&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=3|Video]] \\ In this paper, we investigate the overhead of the REST protocol via HTTP compared to the HPC-native communication protocol MPI when storing and retrieving objects. Albeit we compare the MPI for a communication use case, we can still evaluate the impact of data communication and, therewith, the efficiency of data transfer for data access patterns. We accomplish this by modeling the impact of data transfer using measurable performance metrics. Hence, our contribution is the creation of a performance model based on hardware counters that provide an analytical representation of data transfer over current and future protocols. We validate this model by comparing the results obtained for REST and MPI on two different cluster systems, one equipped with Infiniband and one with Gigabit Ethernet. The evaluation shows that REST can be a viable, performant and resource-efficient solution, in particular for accessing large files.
    * **Classifying Temporal Characteristics of Job I/O Patterns Using Machine Learning Techniques** -- Eugen Betke (DKRZ) \\ {{ :events:2020:hpciodc20-temporal-io.pdf |Slides}} -- [[https://www.youtube.com/watch?v=dcloWCBUfKI&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=4|Video]] \\ Every day, supercomputers execute thousands of jobs with different characteristics. Data centers monitor the behavior of jobs to support the users and improve the infrastructure, for instance, by optimizing jobs or by determining guidelines for the next procurement. The classification of jobs into groups that express similar run-time behavior aids this analysis as it reduces the number of representative jobs to look into. \\ This work utilizes machine learning techniques to cluster and classify parallel jobs based on the similarity in their temporal I/O behavior. Our contribution is the qualitative and quantitative evaluation of different I/O characterizations and similarity measurements and the development of a suitable clustering algorithm. \\ In the evaluation, we explore I/O characteristics from monitoring data of one million parallel jobs and cluster them into groups of similar jobs. Therefore, the time series of various I/O statistics is converted into features using different similarity metrics that customize the classification. When using general-purpose clustering techniques, suboptimal results are obtained. Additionally, we extract phases of I/O activity from jobs. Finally, we simplify the grouping algorithm in favor of performance. We discuss the impact of these changes on the clustering quality.
  * 11:30 **Research talks** -- chair Julian Kunkel
    * **A Reinforcement Learning Strategy to Tune Request Scheduling at the I/O Forwarding Layer** \\ {{ :events:2020:hpciodc20-io-forwarding.pdf |Slides}} -- [[https://www.youtube.com/watch?v=xl9UJe2ghgQ&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=5|Video]] -- __Jean Luca Bez__, Francieli Zanon Boito, Ramon Nou, Alberto Miranda, Toni Cortes, Philippe O. A. Navaux \\ //I/O optimization techniques can improve performance for the access patterns they were designed to target, but they often decrease it for others. Moreover, these techniques usually depend on the precise tuning of their parameters, which commonly falls back to the users. We propose an approach to tune parameters dynamically at runtime based on the I/O workload observed by the system. Our focus is on the I/O forwarding layer as it is transparent to applications and file system independent. Our approach uses a reinforcement learning technique to make the system capable of learning the best parameter value for each observed access pattern during its execution, eliminating the need for a complex and time-consuming training phase. We evaluate our proposal for the TWINS scheduling algorithm designed for the I/O forwarding layer, which seeks to reduce contention and coordinate accesses to the data servers. We demonstrate our approach can reach a precision of 88% on the parameter selection in the first hundreds of observations of an access pattern, achieving 99% of the optimal performance.//
    * **Data Systems at Scale in Climate and Weather: Activities in the ESiWACE Project** -- Julian Kunkel (University of Reading) \\ {{ :events:2020:hpciodc20-esiwace.pdf |Slides}} -- [[https://www.youtube.com/watch?v=aVelQJFOhp0&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=6|Video]] \\ The ESiWACE project aims to enable global eddy-resolving weather and climate simulations on the upcoming (pre-)Exascale supercomputers. In this talk, a selection of efforts to mitigate the effects of the data deluge from such high-resolution simulations is introduced. In particular, we describe the advances in the Earth System Data Middleware (ESDM), which enables scalable data management, supports the inhomogeneous storage stack, and provides a NetCDF-compatible layer in a high-performance and portable fashion. A selection of performance results is given, and ongoing efforts for workflow support and active storage are discussed.
    * **Phobos, a scale-out object store implementing tape library support** -- __Patrice Lucas (CEA)__, Philippe Deniel (CEA), Thomas Leibovici (CEA) \\ {{ :events:2020:hpciodc20-phobos.pdf |Slides}} -- [[https://www.youtube.com/watch?v=ad4PlAWlnsU&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=7|Video]] \\ Phobos is an open-source scale-out distributed object store providing access to multiple backends, from flash and hard drives to tape libraries. Very large datasets can be efficiently managed on inexpensive storage media without giving up performance, scalability or fault-tolerance. Phobos is designed to offer several data layouts, such as mirroring or erasure coding. I/Os through tape drives are optimized by dedicated resource scheduling policies. Developed at CEA, Phobos has been in production since 2016 to manage the France Genomique multi-petabyte dataset at TGCC.
  * //13:00 Virtual lunch break//
  * 14:00 **Expert talks** -- chair Jean-Thomas Acquaviva
    * **The ALICE data management pipeline** -- Massimo Lamanna (CERN) \\ Slides -- [[https://www.youtube.com/watch?v=HgW4OE92EDg&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=8|Video]] \\ ALICE is a major experiment at the CERN LHC with more than 1500 physicists, engineers and technicians, including around 350 graduate students, from 154 physics institutes in 37 countries across the world. ALICE primarily focuses on the study of high-energy nucleus-nucleus collisions. This allows the physicists to study strongly interacting matter at the highest energy densities reached so far in the laboratory, with the goal to understand mechanisms and phenomena in particle physics and astrophysics. \\ After 7 years of data taking, the experiment is currently being upgraded with new detectors, providing novel computing challenges, notably in high data rates, processing and storage needs. In this talk, I will describe the computing challenges we need to solve in order to fully benefit from the performance of the new detectors. I will discuss and present the overall design of the ALICE computing farm O^2. Particular emphasis will be given to the technical choices in setting up a 60-PB disk farm to sustain rates of the order of 100 GB/s to the mass storage during the online data-processing.
    * **Accelerating your Application I/O with UnifyFS** -- Kathryn Mohror (Lawrence Livermore National Laboratory) \\ {{ :events:2020:hpciodc20-unifyfs.pdf |Slides}} -- [[https://www.youtube.com/watch?v=r4gGPtO-qpE&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=9|Video]] \\ UnifyFS is a user-level file system that is highly specialized for fast shared file access on high-performance computing (HPC) systems with distributed burst buffers. UnifyFS delivers significant performance improvements over general-purpose file systems by supporting the specific needs of HPC workloads with reduced POSIX semantics support called "lamination semantics". In this talk, we will give an introductory overview of how to use the lightweight UnifyFS file system to improve the I/O performance of HPC applications. We will describe how UnifyFS works with burst buffers, the benefits and limitations of lamination semantics, and how users can incorporate UnifyFS into their jobs. Finally, we will detail the current implementation status of UnifyFS and our plans for the future.
    * **How to recognise I/O bottlenecks and what to do about them** -- Rosemary Francis (Ellexus) \\ {{ :events:2020:hpciodc20-ellexus.pdf |Slides}} -- [[https://www.youtube.com/watch?v=JGKEeQfKB_0&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=9|Video]] \\ Dr Rosemary Francis is CEO and technical founder of Ellexus, the I/O profiling company. Rosemary will be sharing industry perspectives on how to recognise I/O bottlenecks and what to do about them. The delicate and often dynamic balance between I/O, CPU and memory can hide some easy wins in terms of improving throughput on-prem and reducing costs in the cloud. Equally, improving I/O is also about reducing the load on shared storage and not just about the incremental improvements of individual applications.
  * 15:30 **Discussion of hot topics** -- chair Julian Kunkel
  * 16:00 **Expert talks** -- chair Jay Lofstead
    * **Managing Decades of Scientific Data in Practice at NERSC** -- Glenn Lockwood (NERSC) \\ {{ :events:2020:hpciodc20-nersc.pdf |Slides}} -- [[https://www.youtube.com/watch?v=O3977O94FzE&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=12|Video]] \\ The National Energy Research Scientific Computing Center (NERSC) has been operating since 1974 and, as a result, has been storing and preserving user data continuously for over 45 years. This has resulted in NERSC building significant expertise in how to store and manage user data for long periods of time -- a decade or more -- and in the practical factors that must be considered when data must be retained for longer than the lifetime of the physical components of the data center, including the entire data center facility itself. As the relevance of HPC extends beyond modeling and simulation and the usable lifetime of data extends from months to years or decades, these best practices in long-term data stewardship are likely to become more important to more HPC facilities. To this end, we present here some of the practical considerations, best practices, and lessons learned from managing the scientific data of NERSC's thousands of users over a period of four decades.
    * **Portable Validations of Scientific Explorations with Container-native Workflows** -- Ivo Jimenez (UC Santa Cruz) \\ {{ :events:2020:hpciodc20-workflow.pdf |Slides}} -- [[https://www.youtube.com/watch?v=wlZls0yE9xg&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=13&t=0s|Video]] \\ Researchers working in computer, computational or data science often find it difficult to reproduce experiments from artifacts like code, data, diagrams and results which are left behind by previous researchers. The code developed on one machine often fails to run on other machines due to differences in hardware architecture, OS, software dependencies, among others. This is accompanied by the difficulty in understanding how artifacts are organized, as well as in using them in the correct order. Software container technology such as Docker can solve most of the practical issues of portability, and in particular, container-native workflow engines can significantly aid experimenters in their work. In this talk, we introduce Popper, a container-native workflow engine that executes each step of a workflow in a separate dedicated container without assuming the presence of a Kubernetes cluster or any cloud-based Kubernetes service. With Popper, researchers can build and validate workflows easily in almost any environment of their choice, including local machines, SLURM-based HPC clusters, CI services or Kubernetes-based cloud computing environments. To exemplify the suitability of this workflow engine, we present three case studies where we take examples from Machine Learning and High Performance Computing and turn them into Popper workflows. We also discuss how Popper can be used to aid in preparing artifacts associated with article submissions to conferences and journals, and in particular give an overview of the Journal of High-Performance Storage, a new eJournal that combines open reviews, living papers, digital reproducibility, and open access.
    * **Tuning I/O Performance on Summit: HDF5 Write Use Case Study** -- Xie Bing (Oak Ridge National Laboratory) \\ {{ :events:2020:hpciodc20-hdf5.pdf |Slides}} -- [[https://www.youtube.com/watch?v=NOEkoG1PPAA&list=PL_PBXYC_ExoOggQRV98QLNMhT8u-iRxt2&index=14|Video]] \\ The HDF5 I/O library is widely used in HPC across a variety of domain sciences for its simplicity, flexibility, and rich performance-tuning space. In this work, we address an observed HDF5 write performance issue on Summit at OLCF, in particular the poor write performance of HDF5 with the default configuration. To identify the performance issue, we developed an I/O benchmarking methodology to profile the HDF5 performance on Summit across scales, compute-node allocations, I/O configurations and times. We developed a solution to the issue by altering the HDF5 alignment configuration, which resulted in a 100x write performance improvement for the VPIC benchmark. We expect our methodology and solution to be applicable to other platforms and technologies.
  * 17:30 **Discussion of hot topics** -- chair Jay Lofstead
  * 18:00 //End//
  
  
  
===== Registration =====

We will provide the link for the video conference to registered attendees. Please fill in the linked [[https://forms.gle/G2pbA9MEDS52jXfk8|form]] to register for the workshop.

===== Program Committee =====

  * Thomas Boenisch (High-Performance Computing Center Stuttgart)
  * Suren Byna (Lawrence Berkeley National Laboratory)
  * Matthew Curry (Sandia National Laboratories)
  * Adrian Jackson (The University of Edinburgh)
  * Ivo Jimenez (University of California, Santa Cruz)
  * Anthony Kougkas (Illinois Institute of Technology)
  * Glenn Lockwood (Lawrence Berkeley National Laboratory)
  * Feiyi Wang (Oak Ridge National Laboratory)
  * Bing Xie (Oak Ridge National Lab)
  
  
The workshop is integrated into ISC-HPC.
We welcome everybody to join the workshop, including:
  * I/O experts from data centres and industry.
  * Researchers/Engineers working on high-performance I/O for data centres.
  * Domain scientists and computer scientists interested in discussing I/O issues.
  * Vendors are also welcome, but their presentations must align with data centre topics (e.g., how they manage their own clusters) and not focus on commercial aspects.
  
The call for papers and talks is already open. We accept early submissions and typically proceed with them within 45 days.
We particularly encourage early submission of abstracts to indicate your interest in submitting.
  
You may be interested in joining our [[https://www.vi4io.org/listinfo|mailing lists]] at the [[https://www.vi4io.org/|Virtual Institute for I/O]].
  
We especially welcome participants who are willing to give a presentation about the I/O of their institution's data centre.
Note that such presentations should cover the topics mentioned below.
  
{{:events:2020:cfp-iodc.txt|CFP text}}
  
===== Track: Research Papers =====
  
The research track accepts papers covering state-of-the-practice and research dedicated to storage in the data centre.
  
Proceedings will appear in ISC's post-conference workshop proceedings in Springer's LNCS. Extended versions have a chance for acceptance in the first issue of the [[https://jhps.vi4io.org|JHPS journal]].
We will apply the more restrictive review criteria from [[https://jhps.vi4io.org/authors/|JHPS]] and use the open workflow of the JHPS journal for managing the proceedings. For interaction, we will rely on [[https://easychair.org/conferences/?conf=hpciodc20|Easychair]], so please submit the metadata to EasyChair before the deadline.
  
For the workshop, we accept papers with up to 12 pages (excluding references) in [[http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0|LNCS format]].
You may already submit an extended version suitable for the JHPS in [[https://drive.google.com/file/d/1LysF9H_86dguwW-w5_Fg2mKoD2aGGGak/view?usp=sharing|JHPS format]]. Upon submission, please indicate potential sections for the extended version (setting a light red background colour).
The JHPS template can be easily converted to the LNCS Word format such that the effort for the authors to obtain both publications is minimal. Alternatively, you can use the Springer LNCS LaTeX or Word template and convert it to a Google Doc. See the [[https://www.springer.com/gp/authors-editors/book-authors-editors/resources-guidelines/book-manuscript-guidelines/manuscript-preparation/5636|Manuscript Preparation, Layout & Templates]] guidelines provided by Springer.
  
For accepted papers, the length of the talk during the workshop depends on the controversiality and novelty of the approach (the length is decided based on the preference provided by the authors and feedback from the reviewers).
We also allow virtual participation (without attending the workshop personally).
All relevant work in the area of data centre storage will be published in our joint workshop proceedings; we just believe the available time is best used to discuss controversial topics.
  
==== Topics ====
  
The relevant topics for papers cover all aspects of data centre I/O, including:
  
  * Application workflows
  * User productivity and costs
  * Performance monitoring
  * Dealing with heterogeneous storage
  * Data management aspects
  * Archiving and long-term data management
  * State-of-the-practice (e.g., using or optimising a storage system for data centre workloads)
  * Research that tackles data centre I/O challenges
  
  
==== Paper Deadlines ====
  
  * <del>**2020-02-24**: Submission deadline: AoE ((Anywhere on Earth))</del>
    * //Note: The call for papers and talks is already open.//
    * //We appreciate early submissions of abstracts and full papers and review them within 45 days.//
  * **2020-04-15**: Extended submission deadline: AoE, due to the coronavirus
    * //Please submit abstracts as soon as possible.//
  * **2020-05-03**: Author notification
  * **2020-05-10**: Camera-ready papers for JHPS (this depends on the authors' ability to incorporate feedback into their submission in the incubator)
  * **2020-06-10**: Pre-final submission for ISC (papers to be shared during the workshop; we will also use the JHPS papers, if available)
  * **2020-06-25**: Workshop
  * **2020-07-24**: Camera-ready papers for ISC ((tentative)) -- as they are needed for ISC's post-conference workshop proceedings. We embrace the opportunity for authors to improve their papers based on the feedback received during the workshop.
  
  
==== Review Criteria ====
  
The main acceptance criterion is the relevance of the approach to be presented, i.e., whether the core idea is novel and worth discussing in the community.
Considering that the camera-ready version of the papers is due after the workshop, we pursue two rounds of reviews:
  - Acceptance for the workshop (as a talk).
  - Acceptance as a paper //after// the workshop, incorporating feedback from the workshop.
  
After the first review, all papers undergo a shepherding process.
  
The criteria for the [[https://jhps.vi4io.org/|Journal of High-Performance Storage]] are described on its webpage.

===== Track: Talks by I/O Experts =====

The topics of interest in this track include, but are not limited to:
  * A description of the operational aspects of your data centre
  * A particular solution for specific data centre workloads in production

We also accept industry talks, provided that they focus on operational issues in data centres and omit marketing.

We use [[https://easychair.org/conferences/?conf=hpciodc20|Easychair]] for managing the interaction with the program committee.
If you are interested in participating, please submit a short (1/2 page) abstract of your intended talk together with a brief bio.

==== Abstract Deadlines ====
  
  * Submission deadline: 2020-04-10 AoE
==== Content ====
  
If possible, the following items should be integrated into a talk covering your data centre.
We hope your site's administrators will support you in gathering the information with little effort.
  
  - Workload characterisation
    - Scientific Workflow (give a short introduction)
      - A typical use-case (if multiple are known, feel free to present more)
      - Number of files and amount of data involved
    - Job mix
      - Node utilisation (relative to peak performance)
  - System view
    - Architecture
      - Potential peak-performance of the storage
        - Theoretical
        - Optional: Performance results of acceptance tests
      - Software/Middleware used, e.g., NetCDF 4.X, HDF5, ...
    - Monitoring infrastructure
      - Tools and systems used to gather and analyse utilisation
    - Actual observed performance in production
      - Throughput graphs of the storage (e.g., from Ganglia)
      - Metadata throughput (Ops/s)
    - Files on the storage
      - Number of files (if possible, per file type)
      - Distribution of file sizes
  - Issues/Obstacles
    - Hardware
    - Software
    - Pain points (what is seen as the most significant problem(s) and suggested solutions, if known)
  - Conducted R&D (that aims to mitigate issues)
    - Future perspective
    - Known or projected future workload characterisation
    - Scheduled hardware upgrades and new capabilities we should focus on exploiting as a community
    - Ideal system characteristics and how they address current problems or challenges
    - What hardware should be added
    - What software should be developed to make things work better (capabilities perspective)
    - Items requiring discussion