AID: Adaptive Impact-Driven Detection library for corruption detection
AID lets HPC users who run dynamic simulations over multiple time steps detect data corruptions that affect the results of their execution.
AID is designed to monitor the state data of the application: variables that are the outcome of the execution.
AID is a library offering functions that help programmers define which variables should be monitored.
AID offers detection only. For recovery, we suggest combining AID with FTI, but AID can be used in combination with any other recovery library.
AID is simple to use. Users annotate their MPI application codes in only four steps:
(1) initialize the detector by calling SDC_Init();
(2) specify the key variables to protect by calling SDC_Protect(var,ierr);
(3) annotate the execution iterations by inserting SDC_Snapshot() into the key loop;
(4) release the memory by calling SDC_Finalize() at the end.
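The four steps above can be sketched as follows. This is a minimal, self-contained illustration of the annotation pattern only: the SDC_* functions below are hypothetical stand-in stubs written for this example, not the AID implementation, and the argument types (a double pointer and an int pointer for ierr, with 0 assumed to mean success) are assumptions, since the actual signatures are not given here. In a real MPI code the calls would presumably sit between MPI_Init and MPI_Finalize, and the stubs would be replaced by the AID header and library.

```c
#include <stddef.h>

/* ---- Hypothetical stand-ins for the AID calls (NOT the real library). ----
 * Only the function names come from the text above; argument types and
 * behavior are assumptions made so the example compiles on its own. */
static double *protected_var = NULL;  /* the single variable this mock tracks */
static int snapshots_taken = 0;       /* how many times SDC_Snapshot() ran */

static void SDC_Init(void) {
    protected_var = NULL;
    snapshots_taken = 0;
}

static void SDC_Protect(double *var, int *ierr) {
    protected_var = var;
    *ierr = (var == NULL) ? 1 : 0;    /* assumed convention: 0 means success */
}

static void SDC_Snapshot(void) {
    /* A real detector would inspect the protected data here;
     * the mock only counts the calls. */
    if (protected_var != NULL) {
        snapshots_taken++;
    }
}

static void SDC_Finalize(void) {
    protected_var = NULL;             /* a real library would free its buffers */
}

/* Toy time-stepping loop annotated with the four steps.
 * Returns the number of snapshots taken, or -1 if protection failed. */
int run_annotated_loop(int steps) {
    double temperature = 300.0;       /* the "state data" to monitor */
    int ierr = 0;

    SDC_Init();                               /* step 1: initialize the detector */
    SDC_Protect(&temperature, &ierr);         /* step 2: register the key variable */
    if (ierr != 0) {
        SDC_Finalize();
        return -1;
    }

    for (int t = 0; t < steps; t++) {
        temperature *= 0.99;                  /* placeholder for the simulation */
        SDC_Snapshot();                       /* step 3: once per iteration */
    }

    SDC_Finalize();                           /* step 4: release the memory */
    return snapshots_taken;
}
```

Calling run_annotated_loop(5) in this mock takes one snapshot per iteration. The point of the sketch is the placement of the calls: initialization and protection happen once before the key loop, the snapshot call goes inside it, and finalization follows it.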
AID supports both C and Fortran.
Code download: coming soon, pending DoE approval of the distribution license.
(The code is ready to use, but it cannot be released yet because the BSD license is still under approval. Until the official release, the code is available upon request. Contact: firstname.lastname@example.org)
If you download the code, please let us know who you are. We are keen to help you use the AID library.
A paper describing AID and its detection performance is under submission.
Spatial Support-vector-machines Detector (SSD)
SSD is an effective detector of silent data corruption (SDC) with low memory overhead, based on epsilon-insensitive support vector machine regression.
Like AID, SSD is simple to use: users annotate their MPI application codes in only four steps. It offers both C and Fortran interfaces, which are exactly the same as those of AID.
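To make the term "epsilon-insensitive" concrete, the sketch below implements the standard epsilon-insensitive loss used in support vector regression: residuals no larger than epsilon cost nothing, and larger residuals are penalized linearly by the excess. The function names and the outside_tube check are hypothetical and written for this illustration. How SSD builds its predictions and decides that a value is corrupted is not described here, so the check should be read as one plausible use of the loss, not as SSD's detection rule.

```c
/* Standard epsilon-insensitive loss from support vector regression:
 * zero while |observed - predicted| <= epsilon, linear in the excess beyond. */
double eps_insensitive_loss(double observed, double predicted, double epsilon) {
    double residual = observed - predicted;
    if (residual < 0.0) {
        residual = -residual;         /* absolute value without <math.h> */
    }
    return (residual > epsilon) ? (residual - epsilon) : 0.0;
}

/* Hypothetical check (an assumption, not SSD's documented rule):
 * report 1 when the observed value lies outside the epsilon tube
 * around the regression prediction, 0 otherwise. */
int outside_tube(double observed, double predicted, double epsilon) {
    return eps_insensitive_loss(observed, predicted, epsilon) > 0.0;
}
```

For example, with epsilon = 0.5 an observed value of 3.0 against a prediction of 1.0 has a residual of 2.0 and therefore a loss of 1.5, while any observation within 0.5 of the prediction has a loss of zero.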