SZ: Fast Error-Bounded Lossy HPC Data Compressor

Today’s HPC applications produce extremely large volumes of data, so efficient compression is necessary before the data are stored to parallel file systems.

We developed SZ, an error-bounded HPC data compressor, based on a novel compression method that works very effectively on large-scale HPC data sets.

The compression method starts by linearizing multi-dimensional snapshot data. The key idea is to predict successive data points by selecting the best fit among several curve-fitting models. Each data point that can be predicted precisely is replaced by the code of the corresponding curve-fitting model. The remaining unpredictable data, which no curve-fitting model can approximate within the error bound, are compressed by an optimized lossy encoding based on binary-representation analysis.
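The best-fit prediction step above can be illustrated with a minimal sketch. The three predictors (preceding value, linear, and quadratic extrapolation), the function names, and the stream format are assumptions for illustration, not SZ's actual implementation. Note that prediction is driven by previously *reconstructed* values, so the decompressor can replay it exactly:

```python
# Illustrative sketch of best-fit curve-fitting prediction (not SZ's real code).
# Three hypothetical predictors, each using up to three preceding values
# (a = v[i-3], b = v[i-2], c = v[i-1]):
PREDICTORS = {
    "preceding": lambda a, b, c: c,                  # constant fit
    "linear":    lambda a, b, c: 2 * c - b,          # linear extrapolation
    "quadratic": lambda a, b, c: 3 * c - 3 * b + a,  # quadratic extrapolation
}

def sz_sketch_compress(data, eb):
    """Replace each predictable point by its model code; keep others raw."""
    stream, recon = [], []          # recon mirrors what the decompressor sees
    for i, v in enumerate(data):
        if i < 3:                   # not enough history: store the raw value
            stream.append(("raw", v)); recon.append(v); continue
        a, b, c = recon[-3], recon[-2], recon[-1]
        # best-fit selection: the predictor with the smallest error here
        tag, pv = min(((t, f(a, b, c)) for t, f in PREDICTORS.items()),
                      key=lambda tp: abs(tp[1] - v))
        if abs(pv - v) <= eb:
            stream.append((tag, None)); recon.append(pv)  # model code only
        else:
            stream.append(("raw", v)); recon.append(v)    # unpredictable
    return stream

def sz_sketch_decompress(stream):
    """Replay the same predictions from the reconstructed history."""
    recon = []
    for tag, payload in stream:
        if tag == "raw":
            recon.append(payload)
        else:
            a, b, c = recon[-3], recon[-2], recon[-1]
            recon.append(PREDICTORS[tag](a, b, c))
    return recon
```

Because prediction uses reconstructed values, the decompressor reproduces exactly the same predictions, which is what guarantees the per-point error bound. (In SZ itself, the unpredictable values go through the binary-representation analysis rather than being stored raw as in this sketch.)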

The key features of SZ are listed below. 

1. Compression: Input: a data set (a floating-point array of any dimensionality); Output: the compressed byte stream.

    Decompression: Input: the compressed byte stream; Output: the reconstructed data set, with the compression error of each data point guaranteed to be within a pre-specified error bound ∆.

2. SZ supports C, Fortran, and Java. 

3. SZ supports two types of error bounds: absolute and relative. Users can set either bound alone, or a combination of the two (with operator AND or OR).

4. Detailed usage and examples can be found in doc/user-guide.pdf and the example/ directory in the package. 

5. Version history: 

SZ 0.2-0.4: The compression ratio is the same as in SZ 0.5; the key difference is the implementation, which makes SZ 0.5 much faster than SZ 0.2-0.4.

6. Download

-->>> Code download <<<-- (coming soon, pending DOE approval of the distribution license)

(The code is ready to use, but it cannot be released yet because the BSD license is still in the approval process. Until the official release, the code is available upon request. Contact: disheng222@gmail.com or sdi1@anl.gov)
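The error-bound modes described in item 3 can be sketched as a small verification helper. The function name and the interpretation of the relative bound (relative to the global value range of the original data) are assumptions for illustration:

```python
def point_ok(err, abs_eb, rel_eb, vrange, op):
    """Check one point's error against the requested bound(s)."""
    checks = []
    if abs_eb is not None:
        checks.append(err <= abs_eb)                # absolute error bound
    if rel_eb is not None:
        checks.append(err <= rel_eb * vrange)       # relative error bound
    # AND: every requested bound must hold; OR: at least one must hold
    return all(checks) if op == "AND" else any(checks)

def within_error_bounds(orig, dec, abs_eb=None, rel_eb=None, op="AND"):
    """Verify decompressed data against absolute/relative bounds (a sketch).

    The relative bound is interpreted here with respect to the global
    value range of the original data -- an assumption for this sketch.
    """
    vrange = max(orig) - min(orig)
    return all(point_ok(abs(o - d), abs_eb, rel_eb, vrange, op)
               for o, d in zip(orig, dec))
```

Applying the check per point (rather than once over the whole array) matters for the OR mode: each individual point only has to satisfy one of the two bounds, which can differ from point to point.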