Keywords: floating-point data compressor, lossy compressor, error-bounded compression
Key developers: Sheng Di, Dingwen Tao, Xin Liang; Other contributors: Ali M. Gok (PaSTRI version), Sihuan Li (time-based compression for HACC simulation); Supervisor: Franck Cappello
Today's HPC applications produce extremely large amounts of data, so efficient compression is needed before the data are stored to parallel file systems.
We developed SZ, an error-bounded HPC data compressor based on a novel compression method that is highly effective on large-scale HPC data sets.
The key features of SZ are listed below.
1. Usage:
Compression: Input: a data set (a floating-point array of any dimensionality); Output: the compressed byte stream.
Decompression: Input: the compressed byte stream; Output: the reconstructed data set, in which the compression error of every data point is within a pre-specified error bound ∆.
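The error-bound contract above (every reconstructed value within ∆ of the original) can be illustrated with a toy uniform quantizer and a checker. This is a minimal sketch, assuming an absolute bound ∆; it is NOT the SZ algorithm and does not use the SZ library API. The function names `toy_quantize`, `toy_dequantize`, `max_abs_error`, and `within_bound` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical toy quantizer (NOT the SZ algorithm): maps x to an integer
   bin of width 2*delta, so that dequantization is off by at most delta. */
long toy_quantize(double x, double delta) {
    double t = x / (2.0 * delta);
    /* Round half away from zero without needing libm. */
    return (long)(t >= 0.0 ? t + 0.5 : t - 0.5);
}

/* Maps a bin index back to the bin center. */
double toy_dequantize(long q, double delta) {
    return (double)q * 2.0 * delta;
}

/* Maximum point-wise absolute error |orig[i] - recon[i]| over n values. */
double max_abs_error(const double *orig, const double *recon, size_t n) {
    double max_err = 0.0;
    for (size_t i = 0; i < n; i++) {
        double err = orig[i] - recon[i];
        if (err < 0.0) err = -err;
        if (err > max_err) max_err = err;
    }
    return max_err;
}

/* Returns 1 if every reconstructed point lies within delta of the original. */
int within_bound(const double *orig, const double *recon, size_t n, double delta) {
    return max_abs_error(orig, recon, n) <= delta;
}
```

A checker like `within_bound` is a convenient way to confirm, on your own data, that a decompressed array honors the bound you configured.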
2. Environment: SZ supports C, Fortran, and Java. It has been tested on Linux and macOS on several architectures (x86, x64, ppc, etc.).
3. Error control: SZ supports many types of error bounds. Users can set an absolute error bound, a value-range-based relative error bound, or a combination of the two (with operator AND or OR). Users can also set the error bound mode to fixed-PSNR, point-wise relative error bound, etc. More details can be found in the configuration file (sz.config).
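The sketch below shows one natural reading of how these error-bound types translate into a single absolute bound, assuming that the value-range-based relative bound is the ratio times (max - min) of the data, that AND means both bounds must hold (so the tighter one applies), and that OR means either suffices (so the looser one applies). The point-wise relative bound is checked per value as |x - x'| <= ratio·|x|. The enum and function names are hypothetical and are not the SZ API or the sz.config keys.

```c
#include <stddef.h>

/* Hypothetical mode names for illustration only (not SZ identifiers). */
typedef enum { BOUND_ABS, BOUND_REL, BOUND_ABS_AND_REL, BOUND_ABS_OR_REL } bound_mode;

/* Value range (max - min) of the data set. Assumes n >= 1. */
double value_range(const double *data, size_t n) {
    double lo = data[0], hi = data[0];
    for (size_t i = 1; i < n; i++) {
        if (data[i] < lo) lo = data[i];
        if (data[i] > hi) hi = data[i];
    }
    return hi - lo;
}

/* Absolute bound implied by the chosen mode (assumed semantics, see above). */
double effective_abs_bound(bound_mode mode, double abs_bound, double rel_ratio,
                           const double *data, size_t n) {
    double rel_bound = rel_ratio * value_range(data, n);
    switch (mode) {
    case BOUND_ABS:         return abs_bound;
    case BOUND_REL:         return rel_bound;
    /* AND: both bounds must hold, so the tighter (smaller) one governs. */
    case BOUND_ABS_AND_REL: return abs_bound < rel_bound ? abs_bound : rel_bound;
    /* OR: either bound suffices, so the looser (larger) one governs. */
    case BOUND_ABS_OR_REL:  return abs_bound > rel_bound ? abs_bound : rel_bound;
    }
    return abs_bound;
}

/* Point-wise relative bound: returns 1 if |orig[i] - recon[i]| <= ratio * |orig[i]| for all i. */
int pointwise_rel_ok(const double *orig, const double *recon, size_t n, double ratio) {
    for (size_t i = 0; i < n; i++) {
        double err = orig[i] - recon[i];
        double mag = orig[i];
        if (err < 0.0) err = -err;
        if (mag < 0.0) mag = -mag;
        if (err > ratio * mag) return 0;
    }
    return 1;
}
```

For example, with data spanning a range of 8, an absolute bound of 0.5 and a relative ratio of 0.125 give 0.5 under AND and 1.0 under OR. Consult sz.config for the exact options SZ accepts.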
4. Compression modes: SZ supports two compression modes (similar to Gzip): SZ_BEST_SPEED and SZ_BEST_COMPRESSION. SZ_BEST_SPEED gives the fastest compression. The best compression factor is reached by combining SZ_BEST_COMPRESSION with ZSTD_FAST_SPEED. The default setting is SZ_BEST_COMPRESSION + Zstd.
5. User guide: More detailed usage instructions and examples can be found in doc/user-guide.pdf and in the example/ directory of the package, respectively.
6. Citations: If you mention SZ in your paper, please cite the publications listed below.
7. Download
Version SZ 2.0.2.0
Package Download (including everything)
User Guide, hands-on document
(Contact: disheng222@gmail.com or sdi1@anl.gov)
If you download the code, please let us know who you are. We are very keen to help you use the SZ library.
8. Publications:
Sheng Di, Franck Cappello, "Fast Error-bounded Lossy HPC Data Compression with SZ," in International Parallel and Distributed Processing Symposium (IEEE/ACM IPDPS 2016), 2016.
Dingwen Tao, Sheng Di, Franck Cappello, "A Novel Algorithm for Significantly Improving Lossy Compression of Scientific Data Sets," in International Parallel and Distributed Processing Symposium (IEEE/ACM IPDPS 2017), Orlando, Florida, 2017.
Dingwen Tao, Sheng Di, Zizhong Chen, and Franck Cappello, "Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets," in Proceedings of the 1st International Workshop on Data Reduction for Big Scientific Data (DRBSD1) in Conjunction with ISC'17, Frankfurt, Germany, June 22, 2017.
Ian T. Foster, Mark Ainsworth, Bryce Allen, Julie Bessac, Franck Cappello, Jong Youl Choi, Emil M. Constantinescu, Philip E. Davis, Sheng Di, et al., "Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales," in 23rd International European Conference on Parallel and Distributed Computing.
Sheng Di, Franck Cappello, "Optimization of Error-Bounded Lossy Compression for Hard-to-Compress HPC Data," in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), 2017.
Ali Murat Gok, Dingwen Tao, Sheng Di, Vladimir Mironov, Yuri Alexeev, Franck Cappello, "PaSTRI: A Novel Data Compression Algorithm for Two-Electron Integrals in Quantum Chemistry," in the 29th IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC2017). (poster)
Dingwen Tao, Sheng Di, Zizhong Chen, and Franck Cappello, "In-Depth Exploration of Single-Snapshot Lossy Compression Techniques for N-Body Simulations," in Proceedings of the 2017 IEEE International Conference on Big Data (BigData2017), Boston, MA, USA, December 11-14, 2017. (short paper)
Dingwen Tao, Sheng Di, Hanqi Guo, Zizhong Chen, and Franck Cappello, "Z-checker: A Framework for Assessing Lossy Compression of Scientific Data," in The International Journal of High Performance Computing Applications (IJHPCA), 2017.
Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, and Franck Cappello, "Optimization of Fault Tolerance for Iterative Methods with Lossy Checkpointing," in 27th ACM Symposium on High-Performance Parallel and Distributed Computing (ACM HPDC2018), 2018.
Ali Murat Gok, Sheng Di, Yuri Alexeev, Dingwen Tao, Vladimir Mironov, Xin Liang, Franck Cappello, "PaSTRI: Error-bounded Lossy Compression for Two-Electron Integrals in Quantum Chemistry," in IEEE CLUSTER 2018, 2018. (Best Paper Award in the Application, Algorithms, and Libraries track)
Xin Liang, Sheng Di, Dingwen Tao, Zizhong Chen, and Franck Cappello, "Efficient Transformation Scheme for Lossy Data Compression with Point-wise Relative Error Bound," in IEEE CLUSTER 2018, 2018. (Best Paper Award in the Data, Storage, and Visualization track)
Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, and Franck Cappello, "Fixed-PSNR Lossy Compression for Scientific Data," in IEEE CLUSTER 2018, 2018. (short paper)
9. Version history: We recommend the latest version.
10. Other versions are available upon request.