...

The key features of SZ are listed below. 

1. Compression: Input: a data set (a floating-point array with any dimensions); Output: the compressed byte stream.

    Decompression: Input: the compressed byte stream; Output: the original data set, with the compression error of each data point within a pre-specified error bound ∆.

2. SZ supports C, Fortran, and Java. 
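SZ itself predicts each value from its neighbors via curve fitting; as a toy illustration of the input/output contract only (not SZ's actual algorithm, and not its API), plain uniform scalar quantization already satisfies the absolute error-bound guarantee:

```python
import numpy as np

def toy_compress(data, delta):
    # Round each value to the nearest multiple of 2*delta; the
    # reconstruction error is then at most delta per data point.
    # (Illustration only -- SZ's real compressor is prediction-based.)
    return np.round(data / (2 * delta)).astype(np.int64)

def toy_decompress(codes, delta):
    return codes * (2 * delta)

data = np.array([0.1234, -5.678, 3.14159])
delta = 1e-3                       # pre-specified error bound
recon = toy_decompress(toy_compress(data, delta), delta)
assert np.all(np.abs(recon - data) <= delta)
```

The same bound-checking assertion applies to any error-bounded lossy compressor: every decompressed value must stay within ∆ of the original.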

...

  • The absolute error bound (denoted δ) is a constant, such as 1E-6. That is, the decompressed value Di′ must lie in the range [Di − δ, Di + δ], where Di′ is referred to as the decompressed value and Di is the original data value.
  • The relative error bound is a linear function of the global data value range size, i.e., ∆ = λr, where λ (∈ (0,1)) and r denote the error bound ratio and the range size, respectively. For a given data set, the range size is r = max(Di) − min(Di), so the error bound can be written as λ(max(Di) − min(Di)). The relative error bound thus ensures that the compression error for any data point is no greater than λ × 100 percent of the global data value range size.
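The relative-bound computation above can be worked through numerically (the data values and λ here are illustrative, not from the SZ package):

```python
data = [1.5, -2.0, 7.0, 3.25]   # example data set (illustrative values)
lam = 0.01                      # error bound ratio, lambda in (0, 1)
r = max(data) - min(data)       # global value range size: 7.0 - (-2.0) = 9.0
delta = lam * r                 # effective absolute bound: 1% of the range
# Every decompressed value D_i' must then satisfy |D_i' - D_i| <= delta.
```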

4. Detailed usage and examples can be found in doc/user-guide.pdf and example/ in the package.

5. Version history: 

 

SZ 0.2-0.4: The compression ratio is the same as in SZ 0.5. The key difference lies in the implementation, which makes SZ 0.5 much faster than SZ 0.2-0.4.

  • SZ 0.5.1: Support for version checking.
  • SZ 0.5.2: Finer compression granularity for unpredictable data; redundant Java storage bytes removed.
  • SZ 0.5.3: Integrated dynamic segmentation support.
  • SZ 0.5.4: Gzip_mode: default --> fast_mode; support for reserved values.
  • SZ 0.5.5: Runtime memory shrunk (by changing int xxx to byte xxx in the code).
      • Fixed a bug where writing decompressed data could raise exceptions.
      • Fixed a memory-leak bug on the PPC architecture.
  • SZ 0.5.6: Improved compression ratio in some cases (when the values in a segment are all the same, the segment is merged forward).
  • SZ 0.5.7: Improved decompression speed in some cases.
  • SZ 0.5.8: Refined the leading-zero granularity (from bytes to bits, based on the distribution). For example, in SZ 0.5.7 the leading-zero length is always a whole number of bytes (0, 1, 2, or 3); in SZ 0.5.8 the leading-zero part could be xxxx xxxx xx xx xx xx xxxx xxxx (where each x is a bit in the leading-zero part).
  • SZ 0.5.9: Optimized the offset using a simple right-shifting method. Experiments show this does not actually improve the compression ratio, because simple right-shifting multiplies each value by 2^{-k}, where k is the number of right-shifted bits. The pro is the bits saved by the additional leading-zero bytes; the con is that many more bits are required elsewhere. A good solution is SZ 0.5.10.

 

  • SZ 0.5.10: Optimized the offset using an optimized formula for computing the median_value, based on the optimized right-shifting method. SZ 0.5.10 improves the compression ratio significantly for hard-to-compress datasets (i.e., datasets whose compression ratios are usually very limited).

6. Download

-->>> Code download <<<-- (soon, pending DoE approval of the distribution license)

(The code is ready to use, but it cannot be released yet because the BSD license is still in the approval process. Before the official release, the code is available upon request. Contact: disheng222@gmail.com or sdi1@anl.gov)