AID: Adaptive Impact-Driven Detection library for corruption detection

AID lets HPC users who run dynamic simulations over many time steps detect corruptions that affect the results of their execution.

AID monitors the application's state data, that is, the variables that hold the outcome of the execution.

AID is a library whose functions let programmers specify which variables should be monitored.

AID provides detection only. For recovery, we suggest combining AID with FTI, although AID can be used together with any other recovery library.

AID is simple to use:

   Users annotate their MPI application code in only four steps:

   (1) initialize the detector by calling SDC_Init();

   (2) specify the key variables to protect by calling SDC_Protect(var,ierr);

   (3) annotate the execution iterations by inserting SDC_Snapshot() into the key loop;

   (4) release the memory by calling SDC_Finalize() at the end.
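
The four steps above can be sketched in C as follows. This is only an illustration of where the calls go, not the AID implementation: the SDC_* functions below are hypothetical stand-ins written so that the example compiles without the library, and their C signatures (in particular the pointer arguments of SDC_Protect) are assumptions based on the call names listed above. The function run_simulation and the toy update rule are also invented for the example. In a real MPI application these calls would sit between MPI_Init and MPI_Finalize.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Hypothetical stand-ins for the AID calls (NOT the real library). ----
 * They only count registrations and snapshots so the call pattern can be
 * compiled and checked without AID installed. */
static int n_protected = 0;   /* number of variables registered */
static int n_snapshots = 0;   /* number of annotated iterations */

static void SDC_Init(void)
{
    n_protected = 0;
    n_snapshots = 0;
}

/* Assumed signature: pointer to the state variable plus an error flag. */
static void SDC_Protect(double *var, int *ierr)
{
    assert(ierr != NULL);
    *ierr = (var == NULL) ? 1 : 0;
    if (*ierr == 0) {
        n_protected++;
    }
}

static void SDC_Snapshot(void)
{
    n_snapshots++;
}

static void SDC_Finalize(void)
{
    n_protected = 0;
}

/* ---- Toy time-stepping loop annotated with the four steps. ---- */
int run_simulation(int steps)
{
    double temperature = 100.0;   /* state variable: outcome of the run */
    int ierr = 0;

    SDC_Init();                          /* (1) initialize the detector   */
    SDC_Protect(&temperature, &ierr);    /* (2) register the key variable */
    if (ierr != 0) {
        return -1;
    }
    assert(n_protected == 1);

    for (int t = 0; t < steps; t++) {
        /* Toy update: relax toward an ambient value of 20.0. */
        temperature += 0.1 * (20.0 - temperature);
        SDC_Snapshot();                  /* (3) annotate each iteration   */
    }

    int taken = n_snapshots;
    SDC_Finalize();                      /* (4) release the memory        */
    return taken;
}
```

With these stand-ins, run_simulation(steps) returns the number of snapshots taken, which equals the number of iterations. When linking against the actual library, the stand-in definitions would be dropped in favor of the library's own header, and the argument types should be checked against it.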

AID supports both C and Fortran.

-->>> Code download <<<-- (coming soon, pending DOE approval of the distribution license)

(The code is ready to use, but it cannot be released yet because the BSD license is still going through the approval process. Until the official release, the code is available upon request. Contact: disheng222@gmail.com)

If you download the code, please let us know who you are. We are keen to help you use the AID library.

A paper describing AID and its detection performance is under submission.

 

 
