SANLOCK(8)							    SANLOCK(8)

NAME
       sanlock - shared storage lock manager

SYNOPSIS
       sanlock [COMMAND] [ACTION] ...

DESCRIPTION
       The sanlock daemon manages leases for applications running on a cluster
       of hosts with shared storage.  All lease management and coordination is
       done  through  reading  and  writing blocks on the shared storage.  Two
       types of leases are used, each based on a different algorithm:

       "delta leases" are slow to acquire and require regular  i/o  to	shared
       storage.  A delta lease exists in a single sector of storage.
       Acquiring a delta lease involves reads and writes to that sector
       separated by specific delays.  Once acquired, a lease must be renewed
       by updating a timestamp in the sector regularly.  sanlock uses a delta
       lease internally to hold a lease on a host_id.  host_id leases prevent
       two hosts from using the same host_id and provide basic host liveness
       information based on the renewals.

       "paxos  leases"	are  generally	fast to acquire and sanlock makes them
       available to applications as general purpose resource leases.  A	 paxos
       lease  exists in 1MB of shared storage (8MB for 4k sectors).  Acquiring
       a paxos lease involves reads and writes to max_hosts (2000) sectors  in
       a  specific  sequence  specified	 by  the  Disk Paxos algorithm.	 paxos
       leases use host_ids internally to indicate the owner of the lease, and
       the algorithm fails if different hosts use the same host_id.  So, delta
       leases provide the unique host_ids used in paxos leases.  paxos leases
       also refer to delta leases to check if a host_id is alive.

       Before sanlock can be used, the user must assign each host a host_id,
       which is a number between 1 and 2000.  Two hosts should not be given
       the same host_id (even though delta leases attempt to detect this
       mistake).

       sanlock views a pool of storage as a "lockspace".  Each	distinct  pool
       of  storage, e.g. from different sources, would typically be defined as
       a separate lockspace, with a unique lockspace name.

       Part of this storage space must be reserved and initialized for sanlock
       to  store delta leases.	Each host that wants to use the lockspace must
       first acquire a delta lease on its host_id number within the lockspace.
       (See the add_lockspace action/api.)  The space required for 2000 delta
       leases in the lockspace (for 2000 possible host_ids) is 1MB (8MB for
       4k sectors).  (This is the same size required for a single paxos
       lease.)
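
       The sizes above are consistent with one sector per host_id.  The
       following is a minimal sketch of that arithmetic, assuming "1MB"
       means 1 MiB (1048576 bytes) and that the area is rounded to whole
       MiB; the function names are illustrative and not part of sanlock.

```shell
# Bytes needed for MAX_HOSTS delta leases, one sector each (default 2000).
# Usage: delta_lease_bytes SECTOR_SIZE [MAX_HOSTS]
delta_lease_bytes() {
    echo $(( $1 * ${2:-2000} ))
}

# Round a byte count up to whole MiB (assumption: 1 MiB = 1048576 bytes).
# Usage: round_up_mib BYTES
round_up_mib() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

round_up_mib "$(delta_lease_bytes 512)"    # 2000 * 512  = 1024000 bytes
round_up_mib "$(delta_lease_bytes 4096)"   # 2000 * 4096 = 8192000 bytes
```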

       More storage space must be reserved and initialized for	paxos  leases,
       according to the needs of the applications using sanlock.

       The  following  steps illustrate these concepts using the command line.
       Applications may choose to do these same steps through libsanlock.

       1. Create storage pools and reserve and initialize host_id leases
       two different LUNs on a SAN: /dev/sdb, /dev/sdc
       # vgcreate pool1 /dev/sdb
       # vgcreate pool2 /dev/sdc
       # lvcreate -n hostid_leases -L 1MB pool1
       # lvcreate -n hostid_leases -L 1MB pool2
       # sanlock direct init -s LS1:0:/dev/pool1/hostid_leases:0
       # sanlock direct init -s LS2:0:/dev/pool2/hostid_leases:0

       2. Start the sanlock daemon on each host
       # sanlock daemon

       3. Add each lockspace to be used
       host1:
       # sanlock client add_lockspace -s LS1:1:/dev/pool1/hostid_leases:0
       # sanlock client add_lockspace -s LS2:1:/dev/pool2/hostid_leases:0
       host2:
       # sanlock client add_lockspace -s LS1:2:/dev/pool1/hostid_leases:0
       # sanlock client add_lockspace -s LS2:2:/dev/pool2/hostid_leases:0
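
       Only the host_id field differs between hosts in step 3.  As a
       sketch, a small helper can print the commands for a given host_id
       so they can be reviewed before being run.  The helper name and the
       hard-coded lockspace/pool pairs are assumptions taken from this
       example only.

```shell
# Print (do not run) the add_lockspace commands for one host.
# Usage: add_lockspace_cmds HOST_ID
add_lockspace_cmds() {
    for entry in LS1:pool1 LS2:pool2; do
        name=${entry%%:*}    # lockspace name, e.g. LS1
        pool=${entry##*:}    # volume group, e.g. pool1
        echo "sanlock client add_lockspace -s ${name}:${1}:/dev/${pool}/hostid_leases:0"
    done
}

add_lockspace_cmds 1   # commands for host1
add_lockspace_cmds 2   # commands for host2
```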

       4. Applications can now reserve/initialize space for  resource  leases,
       and then acquire the leases as they need to access the resources.

       Which resource leases are created and how they are used depends on
       the application.  For example, say application A, running on host1 and
       host2, needs to synchronize access to data it stores on
       /dev/pool1/Adata.  A could use a resource lease as follows:

       5. Reserve and initialize a single resource lease for Adata
       # lvcreate -n Adata_lease -L 1MB pool1
       # sanlock direct init -r LS1:Adata:/dev/pool1/Adata_lease:0

       6. Acquire the lease from the app using libsanlock (see
       sanlock_register, sanlock_acquire).  If the app is already running as
       pid 123, and has registered with the sanlock daemon, the lease can be
       added for it manually.
       # sanlock client acquire -r LS1:Adata:/dev/pool1/Adata_lease:0 -p 123

       offsets

       offsets	must  be  1MB aligned for disks with 512 byte sectors, and 8MB
       aligned for disks with 4096 byte sectors.

       offsets may be used to place leases on  the  same  device  rather  than
       using  separate	devices	 and offset 0 as shown in examples above, e.g.
       these commands above:
       # sanlock direct init -s LS1:0:/dev/pool1/hostid_leases:0
       # sanlock direct init -r LS1:Adata:/dev/pool1/Adata_lease:0
       could be replaced by:
       # sanlock direct init -s LS1:0:/dev/pool1/leases:0
       # sanlock direct init -r LS1:Adata:/dev/pool1/leases:1048576
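
       When several lease areas share one device, each offset is a
       multiple of the alignment given above.  A minimal sketch, assuming
       lease areas are packed back to back starting at offset 0; the
       function name is illustrative.

```shell
# Byte offset of the Nth lease area (N = 0, 1, 2, ...) on a shared device.
# Usage: lease_offset INDEX SECTOR_SIZE
lease_offset() {
    case $2 in
        512)  align=1048576 ;;   # 1MB alignment for 512 byte sectors
        4096) align=8388608 ;;   # 8MB alignment for 4096 byte sectors
        *)    echo "unsupported sector size: $2" >&2; return 1 ;;
    esac
    echo $(( $1 * align ))
}

# Print the two init commands from the example above.
echo "sanlock direct init -s LS1:0:/dev/pool1/leases:$(lease_offset 0 512)"
echo "sanlock direct init -r LS1:Adata:/dev/pool1/leases:$(lease_offset 1 512)"
```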

       failures

       If a process holding resource leases fails or exits  without  releasing
       its leases, sanlock will release the leases for it automatically.

       If  the	sanlock daemon cannot renew a lockspace host_id for a specific
       period of time (usually because storage access is lost),	 sanlock  will
       kill any process holding a resource lease within the lockspace.

       If  the	sanlock	 daemon crashes or gets stuck, it will no longer renew
       the expiry time of its per-host_id connections to the wdmd daemon,  and
       the watchdog device will reset the host.

       watchdog

       sanlock uses the wdmd(8) daemon to access /dev/watchdog.  A separate
       connection is maintained with wdmd for each host_id being renewed.
       Each host_id connection has an expiry time some seconds in the
       future.  After each successful host_id renewal, sanlock updates the
       associated expiry time in wdmd.  If wdmd finds any connection expired,
       it will not pet /dev/watchdog.  After enough successive expired/failed
       checks, the watchdog device will fire and reset the host.
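
       The pet decision described above can be modelled as a check over
       the expiry times of all connections.  This is only an illustrative
       model, not wdmd's code; it assumes a connection counts as expired
       once its expiry time is less than or equal to the current time.

```shell
# Decide whether the watchdog would be petted at time NOW, given the
# expiry time of each registered connection (all values in seconds).
# Usage: should_pet NOW [EXPIRY...]
should_pet() {
    now=$1
    shift
    for expiry in "$@"; do
        if [ "$expiry" -le "$now" ]; then
            echo "skip"      # one expired connection is enough to skip
            return 0
        fi
    done
    echo "pet"               # no connection has expired
}

should_pet 100 130 140   # both connections still valid
should_pet 100 130 90    # second connection expired at 90
```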

       After a number of failed attempts to renew a host_id, sanlock kills any
       process using that lockspace.  Once all those processes have exited,
       sanlock will unregister the associated wdmd connection.  wdmd will no
       longer find the expired connection, and will resume petting
       /dev/watchdog (assuming it finds no other failed/expired tests).  If
       the killed processes do not exit quickly enough, the expired wdmd
       connection will not be unregistered, and /dev/watchdog will reset the
       host.

       Because these timeout values are known, sanlock on another host can
       calculate, from the last host_id renewal, when the failed host will
       have been reset by its watchdog (or killed all the necessary
       processes).

       If the sanlock daemon itself fails, crashes, or gets stuck, it will no
       longer update the expiry time for its host_id connections to wdmd,
       which will also lead to the watchdog resetting the host.

       safety

       sanlock leases are meant to guarantee that two processes on two hosts
       are never allowed to hold the same resource lease at once.  If they
       were, the resource being protected could be corrupted.  There are
       three levels of protection built into sanlock itself:

       1. The paxos leases and delta leases themselves.

       2. If the leases cannot function because storage access is lost
       (host_ids cannot be renewed), the sanlock daemon kills any pids using
       resource leases in the lockspace.

       3. If the pids do not exit after being killed, or if the sanlock daemon
       fails, the watchdog device resets the host.

OPTIONS
       COMMAND can be one of three primary top-level choices:

       sanlock daemon  start daemon
       sanlock client  send request to daemon (default command if none given)
       sanlock direct  access storage directly (no coordination with daemon)

       sanlock daemon [options]

       -D no fork and print all logging to stderr

       -Q 0|1 quiet error messages for common lock contention

       -R 0|1 renewal debugging, log debug info for each renewal

       -L pri write logging at priority level and up to logfile (-1 none)

       -S pri write logging at priority level and up to syslog (-1 none)

       -U uid user id

       -G gid group id

       -t num max worker threads

       -g sec seconds for graceful recovery

       -w 0|1 use watchdog through wdmd

       -h 0|1 use high priority (RR) scheduling

       -l num use mlockall (0 none, 1 current, 2 current and future)

       -a 0|1 use async i/o

       -o sec io timeout in seconds

       sanlock client action [options]

       sanlock client status

       Print processes, lockspaces, and resources being managed by the sanlock
       daemon.  Add -D to show extra internal daemon status for debugging.
       Add -o p to show resources by pid, or -o s to show resources by
       lockspace.

       sanlock client host_status

       Print state of host_id delta  leases  read  during  the	last  renewal.
       State  of  all  lockspaces  is shown (use -s to select one).  Add -D to
       show extra internal daemon status for debugging.

       sanlock client log_dump

       Print the sanlock daemon internal debug log.

       sanlock client shutdown

       Ask the sanlock daemon to exit.	Without the force option (-f  0),  the
       command will be ignored if any lockspaces exist.	 With the force option
       (-f 1), any registered processes will be killed, their resource	leases
       released, and lockspaces removed.

       sanlock client init -s LOCKSPACE
       sanlock client init -r RESOURCE

       Tell  the  sanlock  daemon to initialize storage for lease areas.  (See
       sanlock direct init.)

       sanlock client align -s LOCKSPACE

       Tell the sanlock daemon to report the required lease  alignment	for  a
       storage path.  Only path is used from the LOCKSPACE argument.

       sanlock client add_lockspace -s LOCKSPACE

       Tell  the  sanlock  daemon  to  acquire	the  specified	host_id in the
       lockspace.  This will allow resources to be acquired in the lockspace.

       sanlock client inq_lockspace -s LOCKSPACE

       Ask the sanlock daemon whether the lockspace is acquired or not.

       sanlock client rem_lockspace -s LOCKSPACE

       Tell the sanlock	 daemon	 to  release  the  specified  host_id  in  the
       lockspace.   Any	 processes  holding  resource leases in this lockspace
       will be killed, and the resource leases not released.

       sanlock client command -r RESOURCE -c path args

       Register with the sanlock daemon, acquire the specified resource lease,
       and  exec  the  command at path with args.  When the command exits, the
       sanlock daemon will release the lease.  -c must be the final option.

       sanlock client acquire -r RESOURCE -p pid
       sanlock client release -r RESOURCE -p pid

       Tell the sanlock daemon to acquire or release the specified resource
       lease for the given pid.  The pid must be registered with the sanlock
       daemon.  acquire can optionally take a versioned RESOURCE string
       RESOURCE:lver, where lver is the version of the lease that must be
       acquired; otherwise the acquire fails.

       sanlock client inquire -p pid

       Print the resource leases held by the given pid.  The format is a
       versioned RESOURCE string "RESOURCE:lver" where lver is the version of
       the lease held.

       sanlock client request -r RESOURCE -f force_mode

       Request that the owner of a resource take the action specified by
       force_mode.  A versioned RESOURCE:lver string must be used with a
       greater version than is presently held.  Setting both lver and
       force_mode to zero clears the request.

       sanlock client examine -r RESOURCE

       Examine the request record for the currently held  resource  lease  and
       carry out the action specified by the requested force_mode.

       sanlock client examine -s LOCKSPACE

       Examine	requests  for  all resource leases currently held in the named
       lockspace.  Only lockspace_name is used from the LOCKSPACE argument.

       sanlock direct action [options]

       -a 0|1 use async i/o

       -o sec io timeout in seconds

       sanlock direct init -s LOCKSPACE
       sanlock direct init -r RESOURCE

       Initialize storage for  2000  host_id  (delta)  leases  for  the	 given
       lockspace,  or initialize storage for one resource (paxos) lease.  Both
       options require 1MB of space.  The host_id in the LOCKSPACE  string  is
       not  relevant to initialization, so the value is ignored.  (The default
       of 2000 host_ids	 can  be  changed  for	special	 cases	using  the  -n
       num_hosts and -m max_hosts options.)

       sanlock direct read_leader -s LOCKSPACE
       sanlock direct read_leader -r RESOURCE

       Read a leader record from disk and print the fields.  The leader record
       is the single sector of a delta lease, or the first sector of  a	 paxos
       lease.

       sanlock direct read_id -s LOCKSPACE
       sanlock direct live_id -s LOCKSPACE

       read_id reads a host_id and prints the owner.  live_id reads a host_id
       once a second until the timestamp or owner changes (prints live 1),
       or until host_dead_seconds have passed (prints live 0).
       (host_dead_seconds is derived from the io_timeout option.  The live
       0|1 conclusion will not match the sanlock daemon's conclusion unless
       the configured timeouts match.)
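
       The live_id decision can be sketched as a loop over timestamps
       sampled once a second.  This is a simplified model, not sanlock's
       implementation: it compares only the timestamp (not the owner), and
       the host_dead_seconds value passed in is an arbitrary example.

```shell
# Model of the live_id decision, fed with timestamps as they would be
# read once a second.
# Usage: live_check HOST_DEAD_SECONDS FIRST_TS LATER_TS...
live_check() {
    dead=$1
    first=$2
    shift 2
    elapsed=0
    for ts in "$@"; do
        elapsed=$(( elapsed + 1 ))      # one more second has passed
        if [ "$ts" != "$first" ]; then
            echo "live 1"               # timestamp changed: host renewed
            return 0
        fi
        if [ "$elapsed" -ge "$dead" ]; then
            echo "live 0"               # no change for host_dead_seconds
            return 0
        fi
    done
    echo "undecided"                    # ran out of samples
}

live_check 3 100 100 101       # timestamp changes on the second read
live_check 3 100 100 100 100   # no change within 3 seconds
```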

       sanlock direct dump path[:offset]

       Read disk sectors and print leader records for delta or	paxos  leases.
       Add  -f	1  to  print  the  request record values for paxos leases, and
       host_ids set in delta lease bitmaps.

   LOCKSPACE option string
       -s lockspace_name:host_id:path:offset

       lockspace_name name of lockspace
       host_id local host identifier in lockspace
       path path to storage reserved for leases
       offset offset on path (bytes)

   RESOURCE option string
       -r lockspace_name:resource_name:path:offset

       lockspace_name name of lockspace
       resource_name name of resource
       path path to storage reserved for leases
       offset offset on path (bytes)

   RESOURCE option string with version
       -r lockspace_name:resource_name:path:offset:lver

       lver leader version or SH for shared lease
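
       As an illustration of the field layout, the following sketch splits
       a RESOURCE string on its colons.  It assumes the path itself
       contains no colon; the function name is illustrative and not part
       of sanlock.

```shell
# Split a RESOURCE string into its fields, one "key=value" per line.
# Usage: parse_resource lockspace_name:resource_name:path:offset[:lver]
parse_resource() {
    IFS=: read -r lockspace resource path offset lver <<EOF
$1
EOF
    echo "lockspace_name=$lockspace"
    echo "resource_name=$resource"
    echo "path=$path"
    echo "offset=$offset"
    # lver is only present in the versioned form of the string.
    if [ -n "$lver" ]; then echo "lver=$lver"; fi
}

parse_resource LS1:Adata:/dev/pool1/Adata_lease:0
parse_resource LS1:Adata:/dev/pool1/leases:1048576:SH
```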

   Defaults
       sanlock help shows the default values for the options above.

       sanlock version shows the build version.

USAGE
   Request/Examine
       The first part of making a  request  for	 a  resource  is  writing  the
       request	record	of  the	 resource  (the	 sector	 following  the leader
       record).	 To make a successful request:

       ·  RESOURCE:lver must be greater than the lver presently	 held  by  the
	  other host.  This implies the leader record must be read to discover
	  the lver, prior to making a request.

       ·  RESOURCE:lver must be greater than or equal to  the  lver  presently
	  written to the request record.  Two hosts may write a new request at
	  the same time for the same lver, in which case both  would  succeed,
	  but the force_mode from the last would win.

       ·  The force_mode must be greater than zero.

       ·  To unconditionally clear the request record (set both lver and
	  force_mode to 0), make a request with RESOURCE:0 and force_mode 0.
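
       The rules above can be summarized as a single check.  This is a
       sketch of the stated conditions only, not sanlock's code; the
       function name and argument order are illustrative.

```shell
# Classify a request against the rules listed above.
# Usage: request_ok NEW_LVER NEW_FORCE_MODE HELD_LVER RECORD_LVER
request_ok() {
    # lver 0 with force_mode 0 unconditionally clears the request record.
    if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
        echo "clear"
        return 0
    fi
    # Otherwise: force_mode > 0, lver > held lver, lver >= recorded lver.
    if [ "$2" -gt 0 ] && [ "$1" -gt "$3" ] && [ "$1" -ge "$4" ]; then
        echo "ok"
    else
        echo "rejected"
    fi
}

request_ok 5 1 4 0   # newer than the held lver 4
request_ok 4 1 4 0   # not greater than the held lver
request_ok 0 0 4 9   # clears the request record
```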

       The owner of the requested resource will not know of the request unless
       it is explicitly told to examine its resources via the "examine"
       api/command, or otherwise notified.

       The second part of making a request is  notifying  the  resource	 lease
       owner  that  it	should	examine	 the  request  records of its resource
       leases.	The notification will cause the lease owner  to	 automatically
       run  the	 equivalent  of	 "sanlock client examine -s LOCKSPACE" for the
       lockspace of the requested resource.

       The notification is made using a bitmap in each host_id delta lease.
       Each bit represents one of the possible host_ids (1-2000).  If host A
       wants to notify host B to examine its resources, A sets the bit in its
       own bitmap that corresponds to the host_id of B.  When B next renews
       its delta lease, it reads the delta leases for all hosts and checks
       each bitmap to see if the bit for its own host_id has been set.  It
       finds the bit for its own host_id set in A's bitmap, and examines its
       resource request records.  (The bit remains set in A's bitmap for
       request_finish_seconds.)
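
       To make the bitmap idea concrete, the sketch below maps a host_id
       to a byte and bit position.  The layout is purely an assumption for
       illustration (host_id 1 in bit 0 of byte 0, eight bits per byte);
       this page does not describe the actual on-disk bit order.

```shell
# Hypothetical position of a host_id's bit within a 2000-bit bitmap.
# Usage: bitmap_pos HOST_ID    (HOST_ID in 1..2000)
bitmap_pos() {
    idx=$(( $1 - 1 ))                          # host_ids start at 1
    echo "byte=$(( idx / 8 )) bit=$(( idx % 8 ))"
}

bitmap_pos 1      # first host_id
bitmap_pos 2000   # last host_id, in the 250th byte (index 249)
```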

       force_mode determines the action the resource lease owner should take:

       1 (KILL_PID): kill the process holding the resource  lease.   When  the
       process	has  exited, the resource lease will be released, and can then
       be acquired by anyone.  The kill	 signal	 is  SIGKILL  (or  SIGTERM  if
       SIGKILL is restricted.)

       2 (SIGUSR1): send SIGUSR1 to the process holding the resource lease.
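
       The force_mode values described above can be expressed as a simple
       lookup.  A minimal sketch; the function name is illustrative, and
       the SIGTERM fallback for restricted SIGKILL is not modelled.

```shell
# Describe the action associated with a force_mode value.
# Usage: force_mode_action FORCE_MODE
force_mode_action() {
    case $1 in
        0) echo "clear request" ;;   # zero lver and force_mode clear it
        1) echo "SIGKILL" ;;         # KILL_PID: kill the lease holder
        2) echo "SIGUSR1" ;;         # send SIGUSR1 to the lease holder
        *) echo "unknown"; return 1 ;;
    esac
}

force_mode_action 1
force_mode_action 2
```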

   Graceful recovery
       When a lockspace host_id cannot be renewed for a specific period of
       time, sanlock enters a recovery mode in which it attempts to forcibly
       release any resource leases in that lockspace.  If not all of the
       leases are released within 60 seconds, the watchdog will fire,
       resetting the host.

       The  most  immediate way of releasing the resource leases in the failed
       lockspace is by sending SIGKILL to all pids  holding  the  leases,  and
       automatically  releasing	 the  resource leases as the pids exit.	 After
       all pids have exited, no resource leases are held in the lockspace, the
       watchdog	 expiration  is	 removed,  and the host can avoid the watchdog
       reset.

       A slightly more graceful approach is to send SIGTERM to	a  pid	before
       escalating  to  SIGKILL.	  sanlock does this by sending SIGTERM to each
       pid, once a second, for the first N  seconds,  before  sending  SIGKILL
       once a second for the remaining M seconds (N/M can be tuned with the -g
       daemon option.)
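
       The escalation schedule can be sketched as a function of the
       elapsed second.  This is an illustrative model only: seconds are
       counted from 0, and the value N = 40 used below is an arbitrary
       example, not a sanlock default.

```shell
# Signal sent at a given second of recovery, with SIGTERM used for the
# first N seconds and SIGKILL afterwards.
# Usage: recovery_signal SECOND N
recovery_signal() {
    if [ "$1" -lt "$2" ]; then
        echo "SIGTERM"
    else
        echo "SIGKILL"
    fi
}

recovery_signal 0 40    # start of the graceful phase
recovery_signal 39 40   # last second of the graceful phase
recovery_signal 40 40   # escalation begins
```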

       An even more graceful approach is to configure a program for sanlock to
       run that will terminate or suspend each pid, and explicitly release the
       leases it held.	sanlock will run this program for each pid.  It has  N
       seconds	to  terminate  the pid or explicitly release its leases before
       sanlock escalates to SIGKILL for the remaining M seconds.

SEE ALSO
       wdmd(8)

				  2011-08-05			    SANLOCK(8)