OPAL_CRS(7)			   Open MPI			   OPAL_CRS(7)

NAME
       OPAL_CRS	 -  Open PAL MCA Checkpoint/Restart Service (CRS): Overview of
       Open PAL's CRS framework, and selected modules.	Open MPI 1.6.4.

DESCRIPTION
       Open PAL can involuntarily checkpoint and restart sequential  programs.
       Doing  so  requires  that Open PAL was compiled with thread support and
       that the back-end checkpointing systems are available at run-time.

   Phases of Checkpoint / Restart
       Open PAL defines three phases for checkpoint / restart support in a
       process:

       Checkpoint
           When the checkpoint request arrives, the process is notified of
           the request before the checkpoint is taken.

       Continue
           After a checkpoint has successfully completed, the process that
           was checkpointed is notified that it is continuing execution.

       Restart
	   After a checkpoint has successfully completed, a  new  /  restarted
	   process is notified of its successful restart.

       The Continue and Restart phases are identical except for the process in
       which they are invoked. The Continue phase is invoked in the same
       process in which the Checkpoint phase was invoked. The Restart phase is
       invoked only in newly restarted processes.
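
       The relationship between the phases can be sketched as follows. This is
       a minimal illustration only: the type crs_phase_t and the function
       phase_after_checkpoint() are hypothetical names and are not part of
       Open PAL.

```c
/* Hypothetical names for the three phases described above. */
typedef enum {
    PHASE_CHECKPOINT,
    PHASE_CONTINUE,
    PHASE_RESTART
} crs_phase_t;

/*
 * Return the phase that follows a successful checkpoint.
 * The process that was checkpointed sees Continue; a process
 * started from the snapshot sees Restart.
 */
crs_phase_t phase_after_checkpoint(int is_restarted_process)
{
    return is_restarted_process ? PHASE_RESTART : PHASE_CONTINUE;
}
```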

GENERAL PROCESS REQUIREMENTS
       In order for a process to use the Open PAL CRS components it must
       adhere to a few programmatic requirements.

       First,  the  program  must  call OPAL_INIT early in its execution. This
       should only be called once, and it is not possible  to  checkpoint  the
       process without it first having called this function.

       The  program  must  call	 OPAL_FINALIZE before termination. This does a
       significant amount of cleanup. If it is not called,  then  it  is  very
       likely that remnants are left in the filesystem.

       To  checkpoint and restart a process you must use the Open PAL tools to
       do so. Using the backend checkpointer's checkpoint  and	restart	 tools
       will   lead  to	undefined  behavior.   To  checkpoint  a  process  use
       opal_checkpoint	(opal_checkpoint(1)).	To  restart  a	 process   use
       opal_restart (opal_restart(1)).

AVAILABLE COMPONENTS
       Open PAL ships with two CRS components: self and blcr.

       The following MCA parameters apply to all components:

       crs_base_verbose
	   Set the verbosity level for all components. Default is 0, or silent
	   except on error.

       crs_base_snapshot_dir
	   The directory to store the checkpoint snapshots. Default is /tmp.

   self CRS Component
       The self component invokes user-defined functions to save  and  restore
       checkpoints.  It is simply a mechanism for user-defined functions to be
       invoked at Open PAL's Checkpoint, Continue, and Restart phases.	Hence,
       the only data that is saved during the checkpoint is what is written in
       the user's checkpoint function. No library state is saved at all.

       As such, the model for the self component is slightly different from
       that of other components. Specifically, the Restart function is not
       invoked in the same process image as the process that was checkpointed.
       The Restart phase is invoked during OPAL_INIT of the new instance of the
       application (i.e., it starts over from main()).

       The self component has the following MCA parameters:

       crs_self_prefix
           Specify a string prefix for the names of the checkpoint, continue,
           and restart functions that Open PAL will invoke during the
           respective stages. For example, specifying "-mca crs_self_prefix
           foo" means that Open PAL expects to find three functions at
           run-time:

	      int foo_checkpoint()

	      int foo_continue()

	      int foo_restart()

	   By default, the prefix is set to "opal_crs_self_user".

       crs_self_priority
           Set the self component's default priority.

       crs_self_verbose
	   Set the verbosity level. Default is 0, or silent except on error.

       crs_self_do_restart
	   This is mostly internally used. A general user should never need to
           set this value. This is set to non-0 when the new process should
	   invoke  the	restart callback in OPAL_INIT. Default is 0, or normal
	   execution.
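
       As an illustration of the crs_self_prefix parameter, the following is a
       minimal sketch of the three callbacks for the prefix "foo". Several
       details are assumptions made for the example and do not come from Open
       PAL: the variable app_counter, the file name foo_state.txt, and the
       convention that a return value of 0 signals success. Only the function
       names and their int return type follow the text above.

```c
#include <stdio.h>

/* Hypothetical application state: a single counter. */
static int app_counter = 0;

/* Hypothetical file name chosen for this example only. */
static const char *STATE_FILE = "foo_state.txt";

/* Checkpoint phase: save whatever the application needs to resume.
 * Nothing else is saved by the self component. */
int foo_checkpoint(void)
{
    FILE *fp = fopen(STATE_FILE, "w");
    if (fp == NULL) {
        return -1;
    }
    int ok = fprintf(fp, "%d\n", app_counter) > 0;
    if (fclose(fp) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;   /* assumption: 0 means success */
}

/* Continue phase: same process as the checkpoint, so the state is
 * still in memory and there is nothing to reload. */
int foo_continue(void)
{
    return 0;
}

/* Restart phase: a new instance of the application, so the state
 * must be read back from the file written by foo_checkpoint(). */
int foo_restart(void)
{
    FILE *fp = fopen(STATE_FILE, "r");
    if (fp == NULL) {
        return -1;
    }
    int ok = fscanf(fp, "%d", &app_counter) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

       In a real application these functions would be invoked by Open PAL, not
       called directly; the application would be run with "-mca
       crs_self_prefix foo" so that the functions are found at run-time.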

   blcr CRS Component
       The Berkeley Lab Checkpoint/Restart (BLCR) single-process checkpointer
       is a software system developed at Lawrence Berkeley National Laboratory.
       See the project website for more details:

	   http://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml

       The blcr component has the following MCA parameters:

       crs_blcr_priority
           Set the blcr component's default priority.

       crs_blcr_verbose
	   Set the verbosity level. Default is 0, or silent except on error.

   none CRS Component
       The none component simply selects no CRS	 component.  All  of  the  CRS
       function calls return immediately with OPAL_SUCCESS.

       This component is the last component to be selected by default. This
       means that if another component is available, and the none component
       was not explicitly requested, then Open PAL will attempt to activate
       all of the available components before falling back to this component.
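
       The fallback behavior can be pictured with a simplified model of
       priority-based selection. This sketch is not Open PAL's selection code:
       the struct crs_component_t, the function select_component(), and any
       priority values passed to it are hypothetical and serve only to
       illustrate the idea that none is chosen last unless it is requested by
       name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a CRS component. */
typedef struct {
    const char *name;
    int priority;    /* higher value is preferred */
    int available;   /* non-zero if usable at run-time */
} crs_component_t;

/*
 * Pick a component. If 'requested' is non-NULL, only that component is
 * considered. Otherwise the available component with the highest
 * priority wins, so a lowest-priority "none" entry is chosen only when
 * nothing else is available. Returns NULL if no component qualifies.
 */
const crs_component_t *select_component(const crs_component_t *list,
                                        size_t count,
                                        const char *requested)
{
    const crs_component_t *best = NULL;

    for (size_t i = 0; i < count; i++) {
        if (!list[i].available) {
            continue;
        }
        if (requested != NULL) {
            if (strcmp(list[i].name, requested) == 0) {
                return &list[i];
            }
            continue;
        }
        if (best == NULL || list[i].priority > best->priority) {
            best = &list[i];
        }
    }
    return best;
}
```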

SEE ALSO
	 opal_checkpoint(1), opal_restart(1)

1.6.4				 Feb 19, 2013			   OPAL_CRS(7)