Lead: Franck Cappello, ANL
Main collaborators: Marc Snir (ANL and UIUC), Jon Calhoun (Clemson), Bill Kramer (UIUC), Bogdan Nicolae (IBM Dublin), Thomas Ropars (EPFL), Amina Guermouche (UVSQ), Frederic Vivien (Inria), Yves Robert (LIP), Satoshi Matsuoka (Titech), Mitsuhisa Sato (U. Tsukuba), Omer Subasi (BSC), Osman Unsal (BSC), Leonardo Bautista Gomez (BSC)
X. Zou, T. Lu, W. Xia, X. Wang, W. Zhang, S. Di, D. Tao, F. Cappello, Accelerating Relative-error Bounded Lossy Compression for HPC datasets with Precomputation-Based Mechanism, in Proceedings of the 35th International Conference on Massive Storage Systems and Technology (MSST19), 2019.
S. Di, H. Guo, R. Gupta, E. Pershey, M. Snir, F. Cappello, Exploring Properties and Correlations of Fatal Events in a Large-Scale HPC System, in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), 2018.
W. He, H. Guo, T. Peterka, S. Di, F. Cappello, H.-W. Shen, Parallel Partial Reduction for Large-Scale Data Analysis and Visualization, in The 8th IEEE Symposium on Large Data Analysis and Visualization (IEEE LDAV) in conjunction with IEEE VIS 2018, Berlin, Germany, October 21, 2018.
C. Wang, N. Dryden, F. Cappello, and M. Snir. Neural Network Based Silent Error Detector, in IEEE CLUSTER 2018, 2018. [best paper award (in the programming and system software track)]
A. Murat Gok, S. Di, Y. Alexeev, D. Tao, V. Mironov, F. Cappello. PaSTRI: Error-bounded Lossy Compression for Two-Electron Integrals in Quantum Chemistry, in IEEE CLUSTER 2018, 2018. [best paper award (in the application, algorithms and libraries track)]
X. Liang, S. Di, D. Tao, Z. Chen, and F. Cappello. Efficient Transformation Scheme for Lossy Data Compression with Point-wise Relative Error Bound, in IEEE CLUSTER 2018. [best paper award (in the Data, Storage, and Visualization track)]
D. Tao, S. Di, X. Liang, Z. Chen, and F. Cappello. Fixed-PSNR Lossy Compression for Scientific Data, in IEEE CLUSTER 2018. (short paper)
D. Tao, S. Di, X. Liang, Z. Chen and F. Cappello. Optimization of Fault Tolerance for Iterative Methods with Lossy Checkpointing, in 27th ACM Symposium on High-Performance Parallel and Distributed Computing (ACM HPDC2018), 2018.
S. Di, D. Tao, X. Liang, and F. Cappello. Efficient Lossy Compression for Scientific Data based on Pointwise Relative Error Bound, in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), 2018.
H. Guo, S. Di, R. Gupta, T. Peterka, F. Cappello, La VALSE: Scalable Visual Analysis of Logs for Fault Characterization on Supercomputers, in EG Symposium on Parallel Graphics and Visualization (EGPGV2018), 2018.
D. Tao, S. Di, Z. Chen, and F. Cappello. In-Depth Exploration of Single-Snapshot Lossy Compression Techniques for N-Body Simulations, Proceedings of the 2017 IEEE International Conference on Big Data (BigData2017), Boston, MA, USA, December 11 - 14, 2017, short paper.
A. Murat Gok, D. Tao, S. Di, V. Mironov, Y. Alexeev, F. Cappello. PaSTRI: A Novel Data Compression Algorithm for Two-Electron Integrals in Quantum Chemistry, in IEEE/ACM 29th International Conference for High Performance Computing, Networking, Storage and Analysis (SC2017). [poster]
D. Tao, S. Di, H. Guo, Z. Chen, and F. Cappello. Z-checker: A Framework for Assessing Lossy Compression of Scientific Data, in The International Journal of High Performance Computing Applications (IJHPCA), 2017.
S. Di, F. Cappello. Optimization of Error-Bounded Lossy Compression for Hard-to-Compress HPC Data, in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), 2017.
E. Berrocal, L. Bautista-Gomez, S. Di, Z. Lan, and F. Cappello. Toward General Software Level Silent Data Corruption Detection for Parallel Applications, in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), 2017.
F. Cappello, R. Gupta, S. Di, E. Constantinescu, T. Peterka, and S. M. Wild. Understanding and improving the trust in results of numerical simulations and scientific data analytics, in 10th Workshop on Resilience in High Performance Computing (Resilience) in Clusters, Clouds, and Grids, in conjunction with the 23rd International European Conference on Parallel and Distributed Computing (Euro-Par), 2017.
I. T. Foster, M. Ainsworth, B. Allen, J. Bessac, F. Cappello, J. Youl Choi, E. M. Constantinescu, P. E. Davis, S. Di, et al. Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales, in 23rd International European Conference on Parallel and Distributed Computing (Euro-Par 2017), 2017, pp. 3-19.
D. Tao, S. Di, Z. Chen, and F. Cappello. Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets, in Proceedings of the 1st International Workshop on Data Reduction for Big Scientific Data (DRBSD1) in conjunction with ISC'17, Frankfurt, Germany, June 22, 2017.
S. Di, Y. Robert, F. Vivien, and F. Cappello. Toward an Optimal Online Checkpoint Solution under a Two-Level HPC Checkpoint Model, in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), 2017.
S. Di, R. Gupta, E. Pershey, M. Snir, F. Cappello. LogAider: A tool for mining potential correlations in HPC Log Events, in 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ACM CCGrid2017), Spain, 2017.
D. Tao, S. Di, F. Cappello. A Novel Algorithm for Significantly Improving Lossy Compression of Scientific Data Sets, in International Parallel and Distributed Processing Symposium (IEEE/ACM IPDPS 2017), Orlando, Florida, 2017.
Pierre-Louis Guhur, Hong Zhong, Tom Peterka, Emil Constantinescu and Franck Cappello, Lightweight and Accurate Silent Data Corruption Detection in Ordinary Differential Equation Solvers, Euro-Par 2016
E. Berrocal, L. Bautista Gomez, S. Di, Z. Lan and F. Cappello, Exploring Partial Replication to Improve Lightweight Silent Data Corruption Detection for HPC Applications, Euro-Par 2016
Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era, IEEE/ACM CCGRID'2016
E. Berrocal, L. Bautista-Gomez, S. Di, Z. Lan, F. Cappello, Lightweight Silent Data Corruption Detection Based on Runtime Data Analysis for HPC Applications, short paper, ACM HPDC 2015
S. Di, E. Berrocal, K. Heisey, L. Bautista-Gomez, R. Gupta, F. Cappello, Towards Effective Detection of Silent Data Errors for HPC Applications, Poster, IEEE/ACM SC14
L. Bautista Gomez, P. Balaprakash, S. Bouguerra, S. Wild, F. Cappello and P. Hovland, Energy-Performance Tradeoffs in Multilevel Checkpoint Strategies, Poster, IEEE Cluster 2014
S. Di, F. Cappello, GloudSim: Google Trace based Cloud Simulator with Virtual Machines, in Journal of Software: Practice and Experience (Wiley SPE), 2014.