SZ: Fast Error-Bounded Scientific Data Compressor

Keywords: floating point data compressor, lossy compressor, error bounded compression

Key developers: Sheng Di, Dingwen Tao, Xin Liang; Other contributors: Ali M. Gok (PaSTRI version), Sihuan Li (time-based compression for the HACC simulation); Supervisor: Franck Cappello

Today’s HPC applications produce extremely large amounts of data, so efficient compression is necessary before the data are stored to parallel file systems.

We developed SZ, an error-bounded HPC data compressor based on a novel compression method that works very effectively on large-scale HPC data sets.

The key features of SZ are listed below. 

1. Usage: 

    Compression: Input: a data set (a floating-point array with any number of dimensions); Output: the compressed byte stream.

    Decompression: Input: the compressed byte stream; Output: the reconstructed data set, with the compression error of each data point guaranteed to be within a pre-specified error bound ∆. (A minimal usage sketch in C is given below.)
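
    As a quick illustration, the minimal C sketch below follows this compress/decompress workflow using the API declared in sz.h (SZ_Init, SZ_compress, SZ_decompress, SZ_Finalize). It is only a sketch: exact signatures and constants may vary slightly between releases, so please check sz.h and the example/ directory in the package you download.

        #include <stdio.h>
        #include <stdlib.h>
        #include "sz.h"                    /* SZ public header from the package */

        int main(void)
        {
            /* Load the error-bound settings (error bound mode, bounds, etc.) from sz.config */
            SZ_Init("sz.config");

            /* Example input: a 3D single-precision array of size 128 x 64 x 32 */
            size_t r3 = 128, r2 = 64, r1 = 32, nbEle = r3 * r2 * r1;
            float *data = (float *)malloc(nbEle * sizeof(float));
            for (size_t i = 0; i < nbEle; i++)
                data[i] = (float)i / nbEle;

            /* Compression: floating-point array in, compressed byte stream out.
               Unused leading dimensions (r5, r4) are passed as 0. */
            size_t outSize = 0;
            unsigned char *bytes = SZ_compress(SZ_FLOAT, data, &outSize, 0, 0, r3, r2, r1);
            printf("compressed %zu floats into %zu bytes\n", nbEle, outSize);

            /* Decompression: byte stream in, reconstructed array out; each value is
               within the pre-specified error bound of the corresponding original value. */
            float *decData = (float *)SZ_decompress(SZ_FLOAT, bytes, outSize, 0, 0, r3, r2, r1);

            free(decData);
            free(bytes);
            free(data);
            SZ_Finalize();
            return 0;
        }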

2. Environment: SZ supports C, Fortran, and Java. It has been tested on Linux and Mac, with different architectures (x86, x64, ppc, etc.).

3. Error control: SZ supports many types of error bounds. Users can set an absolute error bound, a value-range-based relative error bound, or a combination of the two (with operator AND or OR). Users can also set the error bound mode to fixed-PSNR, point-wise relative error bound, etc. More details can be found in the configuration file (sz.config); an illustrative excerpt is shown below.
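
For example, the excerpt below illustrates the error-control settings in sz.config. The values are illustrative and the comments are added here only for explanation; the field names follow the configuration file shipped with the package, and the sz.config in your release documents the exact options.

    errorBoundMode = ABS        # ABS, REL, ABS_AND_REL, ABS_OR_REL, PSNR, PW_REL, ...
    absErrBound = 1E-4          # absolute error bound (used with ABS)
    relBoundRatio = 1E-3        # value-range based relative error bound (used with REL)
    pw_relBoundRatio = 1E-2     # point-wise relative error bound (used with PW_REL)
    psnr = 80                   # target PSNR (used with the fixed-PSNR mode)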

4. Parallelism: An OpenMP version is included in the package. We also implemented an OpenCL version derived directly from the OpenMP code, but it is deprecated for GPU use. An optimized GPU version is under development and will be released later.

5. SZ supports two compression modes (similar to Gzip): SZ_BEST_SPEED and SZ_BEST_COMPRESSION. SZ_BEST_SPEED yields the fastest compression, while the best compression ratio is reached by using SZ_BEST_COMPRESSION together with ZSTD_FAST_SPEED. The default setting is SZ_BEST_COMPRESSION + Zstd (see the configuration sketch below).
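
For reference, the speed/ratio trade-off can also be selected in sz.config. The entries below are an illustrative sketch: the field names and the list of supported lossless back ends should be checked against the sz.config shipped with your release, and the Zstd speed level (ZSTD_FAST_SPEED above) is set by a separate field documented there.

    szMode = SZ_BEST_COMPRESSION          # or SZ_BEST_SPEED
    losslessCompressor = ZSTD_COMPRESSOR  # Zstd back end (the default); a Gzip back end is also available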

6. User guide: More detailed usage instructions and examples can be found in doc/user-guide.pdf and under the example/ directory in the package.

7. Citations: If you mention SZ in your paper, please cite the following references.

8. Download

Version SZ 2.1.7.0

-->>> Package Download (including everything) <<<--

-->>> Github of SZ <<<--

-->>> User Guide, hands-on document <<<--

(Contact: disheng222@gmail.com or sdi1@anl.gov)

If you download the code, please let us know who you are. We are very keen to help you use the SZ library.

9. Publications: 

  1. Sheng Di, Franck Cappello, "Fast Error-bounded Lossy HPC Data Compression with SZ," in the International Parallel and Distributed Processing Symposium (IEEE/ACM IPDPS 2016), 2016. [download]

  2. Dingwen Tao, Sheng Di, Franck Cappello, "A Novel Algorithm for Significantly Improving Lossy Compression of Scientific Data Sets," in the International Parallel and Distributed Processing Symposium (IEEE/ACM IPDPS 2017), Orlando, Florida, 2017. [download]

  3. Dingwen Tao, Sheng Di, Zizhong Chen, and Franck Cappello, "Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets," in Proceedings of the 1st International Workshop on Data Reduction for Big Scientific Data (DRBSD-1), in conjunction with ISC'17, Frankfurt, Germany, June 22, 2017.

  4. Ian T. Foster, Mark Ainsworth, Bryce Allen, Julie Bessac, Franck Cappello, Jong Youl Choi, Emil M. Constantinescu, Philip E. Davis, Sheng Di, et al., "Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales", in 23rd International European Conference on Parallel and Distributed Computing (Euro-Par 2017), 2017. pp. 3-19.

  5. Sheng Di, Franck Cappello, "Optimization of Error-Bounded Lossy Compression for Hard-to-Compress HPC Data," in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), 2017.

  6. Ali Murat Gok, Dingwen Tao, Sheng Di, Vladimir Mironov, Yuri Alexeev, Franck Cappello, "PaSTRI: A Novel Data Compression Algorithm for Two-Electron Integrals in Quantum Chemistry", in the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC2017). [poster]

  7. Dingwen Tao, Sheng Di, Zizhong Chen, and Franck Cappello, "In-Depth Exploration of Single-Snapshot Lossy Compression Techniques for N-Body Simulations", Proceedings of the 2017 IEEE International Conference on Big Data (BigData2017), Boston, MA, USA, December 11 - 14, 2017. [short paper]

  8. Dingwen Tao, Sheng Di, Hanqi Guo, Zizhong Chen, and Franck Cappello, "Z-checker: A Framework for Assessing Lossy Compression of Scientific Data", in The International Journal of High Performance Computing Applications (IJHPCA), 2017. [download]

  9. Sheng Di, Dingwen Tao, Xin Liang, and Franck Cappello, "Efficient Lossy Compression for Scientific Data based on Pointwise Relative Error Bound", in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), 2018.
  10. Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen and Franck Cappello, "Optimization of Fault Tolerance for Iterative Methods with Lossy Checkpointing", in 27th ACM Symposium on High-Performance Parallel and Distributed Computing (ACM HPDC2018), 2018.

  11. Ali Murat Gok, Sheng Di, Yuri Alexeev, Dingwen Tao, Vladimir Mironov, Xin Liang, Franck Cappello, "PaSTRI: Error-bounded Lossy Compression for Two-Electron Integrals in Quantum Chemistry", in IEEE CLUSTER 2018, 2018. [best paper award (in the application, algorithms and libraries track)]

  12. Xin Liang, Sheng Di, Dingwen Tao, Zizhong Chen, and Franck Cappello, "Efficient Transformation Scheme for Lossy Data Compression with Point-wise Relative Error Bound", in IEEE CLUSTER 2018. [best paper award (in the Data, Storage, and Visualization track)]

  13. Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, and Franck Cappello, "Fixed-PSNR Lossy Compression for Scientific Data", in IEEE CLUSTER 2018. (short paper)

  14. Xin Liang, Sheng Di, Dingwen Tao, Zizhong Chen, Franck Cappello, "Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets", in IEEE BigData 2018, 2018.
  15. Sihuan Li, Sheng Di, Xin Liang, Zizhong Chen, Franck Cappello, "Optimizing Lossy Compression with Adjacent Snapshots for N-body Simulation", in IEEE BigData 2018, 2018.
  16. Xin Liang, Sheng Di, Dingwen Tao, Sihuan Li, Zizhong Chen, Franck Cappello, "Improving In-situ Lossy Compression with Spatio-Temporal Decimation based on SZ Model", in Proceedings of the 4th International Workshop on Data Reduction for Big Scientific Data (DRBSD-4), in conjunction with the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC2018).
  17. Xin-Chuan Wu, Sheng Di, Franck Cappello, Hal Finkel, Yuri Alexeev, Frederic T. Chong, "Amplitude-Aware Lossy Compression for Quantum Circuit Simulation", in Proceedings of the 4th International Workshop on Data Reduction for Big Scientific Data (DRBSD-4), in conjunction with the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC2018).
  18. Xin-Chuan Wu, Sheng Di, Franck Cappello, Hal Finkel, Yuri Alexeev, Frederic T. Chong, "Memory-Efficient Quantum Circuit Simulation by Using Lossy Data Compression", in the 3rd International Workshop on Post-Moore Era Supercomputing (PME), in conjunction with the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC2018).
  19. Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, Franck Cappello, "Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP", in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), 2019.
  20. XiangYu Zou, Tao Lu, Wen Xia, Xuan Wang, Weizhe Zhang, Sheng Di, Dingwen Tao, Franck Cappello, "Accelerating Relative-error Bounded Lossy Compression for HPC datasets with Precomputation-Based Mechanisms", in Proceedings of the 35th International Conference on Massive Storage Systems and Technology (MSST19), 2019.
  21. XiangYu Zou, Tao Lu, Sheng Di, Dingwen Tao, Wen Xia, Xuan Wang, Weizhe Zhang, Qing Liao, "Accelerating Lossy Compression on HPC datasets via Partitioning Computation for Parallel Processing", in The 21st IEEE International Conference on High Performance Computing and Communications (IEEE HPCC19), 2019.
  22. Sian Jin, Sheng Di, Xin Liang, Jiannan Tian, Dingwen Tao, Franck Cappello, "DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression", Proceedings of the 28th ACM International Symposium on High-Performance Parallel and Distributed Computing (ACM HPDC19), Phoenix, AZ, USA, June 24 - 28, 2019.
  23. Xin-Chuan Wu, Sheng Di, Emma Maitreyee Dasgupta, Franck Cappello, Yuri Alexeev, Hal Finkel, Frederic T. Chong, "Full State Quantum Circuit Simulation by Using Data Compression", in the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC2019), 2019.
  24. Xin Liang, Sheng Di, Sihuan Li, Dingwen Tao, Bogdan Nicolae, Zizhong Chen, Franck Cappello, "Significantly Improving Lossy Compression Quality based on An Optimized Hybrid Prediction Model", in the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC2019), 2019.
  25. Xin Liang, Sheng Di, Dingwen Tao, Sihuan Li, Bogdan Nicolae, Zizhong Chen, Franck Cappello, "Improving Performance of Data Dumping with Lossy Compression for Scientific Simulation," in IEEE CLUSTER2019, 2019.
  26. Franck Cappello, Sheng Di, Sihuan Li, Xin Liang, Ali M. Gok, Dingwen Tao, Chun Hong Yoon, Xin-Chuan Wu, Yuri Alexeev, Frederic T. Chong, "Use cases of lossy compression for floating-point data in scientific datasets", in The International Journal of High Performance Computing Applications (IJHPCA), 2019.

10. Version history: We recommend the latest version.

11. Other versions are available upon request.