The statistical-file-scanner uses statistical sampling to estimate characteristics of large data sets without scanning the full data set.
Each characteristic is computed relative to the occupied size (file size) of the data. For example, one may want to estimate the proportion of the overall data volume that a given file type occupies, or how well the data compresses, i.e., what compression ratio would result from compressing 10 petabytes of data with compression scheme X.
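To illustrate the idea, here is a minimal sketch of one way a size-based estimate could be computed. It is not taken from the tool itself: the function names `estimate_type_shares` and `exact_type_shares`, the `(file_type, size_in_bytes)` catalog format, and the choice of size-proportional sampling with replacement are all assumptions made for this example. Files are drawn with probability proportional to their size, so the fraction of draws that hit a given file type estimates that type's share of the total bytes.

```python
import random
from collections import Counter


def estimate_type_shares(files, sample_size, seed=0):
    """Estimate each file type's share of the total bytes.

    files: list of (file_type, size_in_bytes) tuples (hypothetical catalog format).
    sample_size: number of draws; more draws give a tighter estimate.
    """
    rng = random.Random(seed)
    types = [file_type for file_type, _ in files]
    sizes = [size for _, size in files]
    # Draw with replacement, with probability proportional to file size,
    # so larger files are proportionally more likely to be picked.
    sample = rng.choices(types, weights=sizes, k=sample_size)
    counts = Counter(sample)
    return {file_type: n / sample_size for file_type, n in counts.items()}


def exact_type_shares(files):
    """Reference value from a full scan, only feasible for small catalogs."""
    total = sum(size for _, size in files)
    by_type = Counter()
    for file_type, size in files:
        by_type[file_type] += size
    return {file_type: size / total for file_type, size in by_type.items()}
```

Calling `estimate_type_shares(catalog, 10000)` on a small catalog and comparing against `exact_type_shares(catalog)` shows how close the sample gets. The same sampling scheme could in principle be applied to compression: averaging the per-file compression ratio over size-proportionally sampled files estimates the ratio for the whole data set.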
Contact | Dr. Julian Kunkel
Repository | Public on GitHub