Statistical File Scanner
The statistical-file-scanner (SFS) uses statistical sampling to estimate data characteristics of large data sets without scanning the full data set.
Each characteristic is computed relative to the occupied size (file size) of the data. For example, one may want to estimate the proportion of the overall data volume that a given file type occupies, or how well the data compresses, i.e., what compression ratio would result from compressing 10 petabytes of data with compression scheme X.
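The idea of estimating a size-weighted proportion from a sample can be sketched as follows. This is a minimal illustration and not the tool's actual implementation: it assumes files are drawn with replacement with probability proportional to their size, and it uses a normal-approximation 95% confidence interval. The function name `estimate_byte_share`, the `(name, size)` catalog layout, and the predicate are hypothetical names chosen for this example.

```python
import math
import random


def estimate_byte_share(files, predicate, sample_size, seed=0):
    """Estimate the share of total bytes held by files matching `predicate`.

    files:       list of (name, size_in_bytes) tuples (hypothetical layout)
    predicate:   function(name) -> bool, e.g. a file-type check
    sample_size: number of files to draw
    Returns (estimate, half_width) where half_width is the half-width of an
    approximate 95% confidence interval (normal approximation).
    """
    rng = random.Random(seed)
    weights = [size for _, size in files]
    # Draw files with probability proportional to their size, with
    # replacement. Under this scheme the fraction of sampled files that
    # match is an unbiased estimate of the matching share of bytes.
    sample = rng.choices(files, weights=weights, k=sample_size)
    hits = sum(1 for name, _ in sample if predicate(name))
    p = hits / sample_size
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / sample_size)
    return p, half_width


if __name__ == "__main__":
    # Synthetic catalog: 300 large NetCDF files and 700 small text files.
    catalog = [(f"data{i}.nc", 1000) for i in range(300)]
    catalog += [(f"log{i}.txt", 100) for i in range(700)]
    share, err = estimate_byte_share(
        catalog, lambda name: name.endswith(".nc"), sample_size=2000
    )
    print(f"estimated .nc share of bytes: {share:.3f} +/- {err:.3f}")
```

In the synthetic catalog the true share of bytes in `.nc` files is 300000 / 370000, roughly 0.81, even though those files are only 30% of the file count; this is why the sampling is weighted by size. The same scheme could in principle be applied to compression by compressing only the sampled data, though the estimator for a ratio differs from the simple proportion shown here.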
Key Information
Contact | Dr. Julian Kunkel
Repository | Public on GitHub
Publications
- SFS: A Tool for Large Scale Analysis of Compression Characteristics (Julian Kunkel), 2017-05-05 BibTeX PDF
- Identifying Relevant Factors in the I/O-Path using Statistical Methods (Julian Kunkel), 2015-03-14 BibTeX PDF
Talks
- Statistical File Characterization and Status Update: Monitoring at DKRZ (Dr. Julian Kunkel), BoF: Analyzing Parallel I/O, Supercomputing Conference, Salt Lake City, USA, 2016-11-17 Presentation
- Analyzing Data Properties Using Statistical Sampling Techniques – Illustrated on Scientific File Formats and Compression Features (Dr. Julian Kunkel), HPC-IODC Workshop, Frankfurt, Germany, 2016-06-23 Presentation
- Analyzing Data Properties Using Statistical Sampling Techniques – Illustrated on Scientific File Formats and Compression Features (Dr. Julian Kunkel), ISC High Performance, Frankfurt, Germany, 2016-06-21 Presentation