SZ: Fast Error-Bounded Lossy HPC Data Compressor

Today’s HPC applications produce extremely large amounts of data, so the data must be compressed efficiently before being stored on parallel file systems.

We developed SZ, an error-bounded lossy compressor for HPC data, based on a novel compression method that works very effectively on large-scale HPC data sets.

The compression method starts by linearizing multi-dimensional snapshot data. The key idea is to fit/predict each successive data point with the best-fit selection among several curve-fitting models. Data points that a model predicts accurately enough are replaced by the code of the corresponding curve-fitting model. Unpredictable data points, which no curve-fitting model approximates well enough, are instead compressed lossily through an analysis of their binary representation.
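The prediction step can be illustrated with a minimal Python sketch. It is not the SZ implementation: the three models (constant, linear, and quadratic extrapolation from previously reconstructed values), the model codes 1 to 3, the code 0 for unpredictable points, and the function names are all assumptions made for illustration. Unpredictable values are stored exactly here, whereas SZ compresses them lossily via binary representation analysis.

```python
def predictions(recon):
    """Candidate predictions from the most recent reconstructed values.

    The model set and its codes (1, 2, 3) are hypothetical.
    """
    preds = {}
    n = len(recon)
    if n >= 1:
        preds[1] = recon[-1]                                  # constant: repeat previous value
    if n >= 2:
        preds[2] = 2 * recon[-1] - recon[-2]                  # linear extrapolation
    if n >= 3:
        preds[3] = 3 * recon[-1] - 3 * recon[-2] + recon[-3]  # quadratic extrapolation
    return preds


def encode(data, delta):
    """Replace each predictable point by a model code; keep the rest as-is."""
    codes, unpredictable, recon = [], [], []
    for value in data:
        preds = predictions(recon)
        # Pick the model whose prediction is closest to the true value.
        best = min(preds, key=lambda c: abs(preds[c] - value), default=0)
        if best and abs(preds[best] - value) <= delta:
            codes.append(best)
            recon.append(preds[best])      # decoder will see this value
        else:
            codes.append(0)                # 0 marks an unpredictable point
            unpredictable.append(value)    # stored exactly in this sketch
            recon.append(value)
    return codes, unpredictable


def decode(codes, unpredictable):
    """Rebuild the data by replaying the same predictions."""
    recon = []
    stored = iter(unpredictable)
    for code in codes:
        if code == 0:
            recon.append(next(stored))
        else:
            recon.append(predictions(recon)[code])
    return recon
```

Predicting from reconstructed values rather than from the original ones is a design choice in this sketch: the decoder only has access to reconstructed data, so using the same inputs on both sides keeps every decoded point within delta of the original.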

The key features of SZ are listed below. 

1. Input: a data set (a floating-point array of any dimensionality). Output: the compressed byte stream.

2. SZ supports C, Fortran, and Java. 

3. SZ supports two types of error bounds. Users can set either an absolute error bound or a relative error bound.

The absolute error bound (denoted δ) is a constant, such as 1E-6. That is, each decompressed value Di′ must lie in the range [Di − δ, Di + δ], where Di is the original data value. The relative error bound (denoted ∆) is a linear function of the global data value range size, i.e., ∆ = λr, where λ ∈ (0,1) is the error bound ratio and r is the range size.

For example, given a data set of M values, the range size r is equal to max_{i=1…M}(Di) − min_{i=1…M}(Di), and the error bound can be written as ∆ = λ(max_{i=1…M}(Di) − min_{i=1…M}(Di)). The relative error bound ensures that the compression error for any data point is no greater than λ×100 percent of the global data value range size.
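The two error bound definitions can be checked with a short Python sketch. The function names and the sample values are hypothetical and are not part of the SZ API; the sketch only restates the definitions of the range size r, the ratio λ, and the per-point bound.

```python
def relative_error_bound(data, ratio):
    """Return the bound ratio * r, where r = max(data) - min(data)."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must lie in (0, 1)")
    value_range = max(data) - min(data)   # global data value range size r
    return ratio * value_range


def within_bound(original, decompressed, bound):
    """True if every decompressed value lies in [Di - bound, Di + bound]."""
    return all(abs(d - o) <= bound for o, d in zip(original, decompressed))


original = [10.0, 12.5, 20.0, 15.0]           # range size r = 10.0
bound = relative_error_bound(original, 0.01)  # 1 percent of the range
print(within_bound(original, [10.05, 12.45, 20.0, 15.08], bound))
```

The same `within_bound` check applies to an absolute error bound by passing the constant δ (for example 1E-6) directly as `bound`.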
