    pbs_mom

    
    
    

    SYNOPSIS

           pbs_mom   [-a alarm]   [-C chkdirectory]   [-c config]   [-d directory]
           [-H hostname] [-L logfile] [-M MOMport] [-R RPPport] [-p|-q|-r] [-x]
    
    
    

    DESCRIPTION

           The pbs_mom command starts the operation of a  batch  Machine  Oriented
           Mini-server,  MOM,  on the local host.  Typically, this command will be
           in a local boot file such as /etc/rc.local.  To ensure that the pbs_mom
           command is not runnable by the general user community, pbs_mom will
           only execute if its real and effective uid is zero.
    
           One function of pbs_mom is to place jobs into execution as directed  by
           the  server,  establish resource usage limits, monitor the job's usage,
           and notify the server when the job completes.  If they  exist,  pbs_mom
           will  execute  a prologue script before executing a job and an epilogue
           script after executing the job.  The next function  of  pbs_mom  is  to
           respond to resource monitor requests.  This was done by a separate pro-
           cess in previous versions of PBS but has now  been  combined  into  one
           process.   The resource monitor function is provided mainly for the PBS
           scheduler.  It provides information about the status of  running  jobs,
           memory  available  etc.   The next function of pbs_mom is to respond to
           task manager requests.  This involves communicating with running  tasks
           over a tcp socket as well as communicating with other MOMs within a job
           (aka a "sisterhood").
    
           Pbs_mom will record a diagnostic message in a log file  for  any  error
           occurrence.   The  log  files  are maintained in the mom_logs directory
           below the home directory of the server.  If  the  log  file  cannot  be
           opened, the diagnostic message is written to the system console.
    
    
    

    OPTIONS

           -a alarm        Used  to  specify the alarm timeout in seconds for com-
                           puting a resource.  Every time a  resource  request  is
                           processed,  an  alarm  is  set  for the given amount of
                           time.  If the request  has  not  completed  before  the
                           given  time, an alarm signal is generated.  The default
                           is 5 seconds.
    
           -C chkdirectory Specifies the path  of  the  directory  used  to  hold
                           checkpoint  files.   [Currently  this  is only valid on
                           Cray    systems.]     The    default    directory    is
                           PBS_HOME/spool/checkpoint,  see  the  -d  option.   The
                           directory specified with the -C option must be owned by
                           root  and  accessible (rwx) only by root to protect the
                           security of the checkpoint files.
    
           -c config       Specify an alternative configuration file, see  descrip-
                           tion below.  If this is a relative file name it will be
                           relative to PBS_HOME/mom_priv, see the -d  option.   If
                           the  specified  file  cannot  be  opened,  pbs_mom will
                           abort.  If the -c option is not supplied, pbs_mom  will
                           attempt to open the default [...]

           -L logfile      [...] If not specified, MOM will open a file named for
                           the current date in the PBS_HOME/mom_logs directory,
                           see the -d option.
    
           -M port         Specifies  the  port  number  on  which the mini-server
                           (MOM) will listen for batch requests.
    
           -R port         Specifies the port  number  on  which  the  mini-server
                           (MOM)  will  listen for resource monitor requests, task
                           manager requests and inter-MOM messages.   Both  a  UDP
                           and a TCP port of this number will be used.
    
           -p              (Default after version 2.4.0) (Preserve running jobs)
                           -- Specifies the impact on jobs which were in execution
                           when the mini-server shut-down.  The -p option tries
                           to preserve any running jobs when the MOM restarts.
                           The new mini-server will not be the parent of any
                           running jobs; MOM has lost control of her offspring
                           (not a new situation for a mother).  The MOM will allow
                           the jobs to continue to run and monitor them indirectly
                           via polling.  All recovered jobs will report an exit
                           code of 0 when they are complete.  The -p option is
                           mutually exclusive with the -r, -P and -q options.
    
           -P              (Terminate  all jobs and remove them from the queue) --
                           Specifies the impact on jobs which  were  in  execution
                           when the mini-server shut-down.  With the -P option, it
                           is assumed that  either  the  entire  system  has  been
                           restarted  or the MOM has been down so long that it can
                           no longer guarantee that the pid of any running process
                           is the same as the recorded job process pid of a recov-
                           ering job. Unlike the -p option no attempt is  made  to
                           try  and preserve or recover running jobs. All jobs are
                           terminated and removed from the queue.  The  -P  option
                           is mutually exclusive with the -p, -q and -r options.
    
           -q              (Requeue all jobs - This is the default behavior in
                           versions prior to 2.4.0) -- Specifies the impact on
                           jobs which were in execution when the mini-server
                           shut-down.  Do not terminate running processes.  With
                           the -q option, it is assumed that either the entire
                           system has been restarted or the MOM has been down so
                           long that it can no longer guarantee that the pid of
                           any running process is the same as the recorded job
                           process pid of a recovering job.  No attempt is made to
                           kill job processes.  The MOM will mark the jobs as
                           terminated and notify the batch server which owns the
                           job.  Re-runnable jobs will be requeued.  The -q option
                           is mutually exclusive with the -p, -P and -r options.
    
           -r              (Terminate running processes and requeue all jobs) --
                           Specifies the impact on jobs which were in execution
                           [...]

           [...]           [...] match the port value which was used to start
                           pbs_server.
    
           -x              Disables the check for privileged port resource monitor
                           connections.  This is used mainly for testing since the
                           privileged port is the only mechanism used  to  prevent
                           any ordinary user from connecting.
    
    
    

    CONFIGURATION FILE

           The  configuration file may be specified on the command line at program
           start with the -c flag.  The use of this file  is  to  provide  several
           types  of  run  time  information to pbs_mom: static resource names and
           values, external resources provided by a program to be run  on  request
           via  a shell escape, and values to pass to internal set up functions at
           initialization (and re-initialization).
    
           Each item type is on a single line with the component  parts  separated
           by  white  space.  If the line starts with a hash mark (pound sign, #),
           the line is considered to be a comment and is skipped.
    
           Static Resources
                   For static resource names and values, the configuration file
                   contains a list of resource name/value pairs, one pair per
                   line, with the name and value separated by white space.  An
                   example of static resource names and values could be the
                   number of tape drives of different types, specified by
    
                  tape3480      4
                  tape3420      2
                  tapedat       1
                  tape8mm       1
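
                   As a rough illustration of the line format described above,
                   the following Python sketch reads name/value pairs and skips
                   comment lines.  The function name parse_static_resources is
                   hypothetical and not part of PBS.  The sketch handles only
                   plain static resources, keeps values as strings, and is not
                   a description of how pbs_mom itself parses the file.

```python
def parse_static_resources(text):
    """Return {name: value} for static resource lines in config text."""
    resources = {}
    for line in text.splitlines():
        # A line starting with a hash mark is a comment and is skipped.
        if line.startswith("#") or not line.strip():
            continue
        # Name and value are separated by white space.
        parts = line.split(None, 1)
        if len(parts) == 2:
            resources[parts[0]] = parts[1].strip()
    return resources


example = """# tape drives on this host
tape3480      4
tape3420      2
tapedat       1
tape8mm       1
"""
print(parse_static_resources(example))
```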
    
           Shell Commands
                  If the first character of the value is an exclamation mark  (!),
                  the  entire rest of the line is saved to be executed through the
                  services of the system(3) standard library routine.
    
                  The shell escape provides a means for the  resource  monitor  to
                  yield arbitrary information to the scheduler.  Parameter substi-
                  tution is done such that the value of any  qualifier  sent  with
                  the  query,  as explained below, replaces a token with a percent
                  sign (%) followed by the name of the  qualifier.   For  example,
                  here is a configuration file line which gives a resource name of
                  "escape":
    
                  escape     !echo %xxx %yyy
    
                   If a query for "escape" is sent with no qualifiers, the command
                   executed would be "echo %xxx %yyy".  If one qualifier is sent,
                   "escape[xxx=hi there]", the command executed would be "echo hi
                   there %yyy".  If two qualifiers are sent, [...]
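
                   The substitution rule above can be sketched in Python as
                   follows.  The helper name expand_escape is hypothetical, and
                   the assumption that qualifier names consist of word
                   characters is ours; the man page does not specify how
                   pbs_mom tokenizes names.

```python
import re


def expand_escape(command, qualifiers):
    """Replace %name tokens with the value of the matching qualifier."""
    def replace(match):
        name = match.group(1)
        # Tokens without a matching qualifier are left unchanged.
        return qualifiers.get(name, match.group(0))
    return re.sub(r"%(\w+)", replace, command)


print(expand_escape("echo %xxx %yyy", {}))
print(expand_escape("echo %xxx %yyy", {"xxx": "hi there"}))
```

                   The two calls reproduce the no-qualifier and one-qualifier
                   cases given in the text.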
    
           Initialization Value
                  An initialization value directive has a name which starts with a
                  dollar sign ($) and must be known to MOM via an internal  table.
                  The entries in this table now are:
    
                  pbsserver
                         which  defines  hostnames running pbs_server that will be
                         allowed to  submit  jobs,  issue  Resource  Monitor  (RM)
                         requests,  and  get status updates.  MOM will continually
                         attempt to contact all server hosts for node  status  and
                         state  updates.   Like  $PBS_SERVER_HOME/server_name, the
                         hostname may be followed by a colon and  a  port  number.
                         This  parameter  replaces  the  oft-confused  $clienthost
                         parameter from TORQUE 2.0.0p0 and earlier.  Note that the
                         hostname  in  $PBS_SERVER_HOME/server_name  is used if no
                          $pbsserver parameters are found.
    
                  pbsclient
                         which causes a host name to be added to the list of hosts
                         which  will  be allowed to connect to MOM as long as they
                          are using a privileged port for the purposes of  resource
                         monitor  requests.   For example, here are two configura-
                         tion file lines which will allow  the  hosts  "fred"  and
                         "wilma" to connect:
    
                         $pbsclient      fred
                         $pbsclient      wilma
    
                          Two host names are always allowed to connect to pbs_mom:
                          "localhost" and the name returned to pbs_mom by
                          the  system  call gethostname().  These names need not be
                         specified in the configuration file.  The hosts listed as
                         "clients"  can  issue  Resource  Monitor  (RM)  requests.
                         Other MOM nodes and servers do not need to be  listed  as
                         clients.
    
                  restricted
                         which causes a host name to be added to the list of hosts
                         which will be allowed to connect to MOM  without  needing
                          to use a privileged port.  These names allow for wildcard
                         matching.  For example, here is a configuration file line
                         which  will  allow  queries from any host from the domain
                         "ibm.com".
    
                         $restricted      *.ibm.com
    
                         The restriction which applies  to  these  connections  is
                         that  only  internal  queries  may be made.  No resources
                         from a config file will be found.  This is to prevent any
                          shell commands from being run by a non-root process.

                   [...]  This parameter is generally not required except for some
                          versions of OSX.

                   cputmult
                          which sets a factor used to adjust cpu time used by a
                          job.  This is provided to allow adjustment of time
                          charged and limits enforced where the job might run on
                          systems with different cpu performance.  If MOM's system
                          is faster than the reference system, set cputmult to a
                          decimal value greater than 1.0.  If MOM's system is
                          slower, set cputmult to a value between 0.0 and 1.0.  For
                          example:
    
                         $cputmult 1.5
                         $cputmult 0.75
    
                  usecp  specifies  which  directories  should  be  staged with cp
                         instead of rcp/scp.  If a shared filesystem is  available
                         on all hosts in a cluster, this directive is used to make
                         these filesystems known to MOM.  For example, if /home is
                         NFS mounted on all nodes in a cluster:
    
                         $usecp *:/home  /home
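
                          One way to read the example above is as a rule that
                          maps a host pattern and remote path prefix to a local
                          path prefix.  The Python sketch below illustrates that
                          reading only; the function name map_usecp, the rule
                          tuple layout, and the use of shell-style wildcard
                          matching are assumptions, not pbs_mom's documented
                          algorithm.

```python
import fnmatch


def map_usecp(remote, rules):
    """Return the local path for 'host:/path' if a rule matches, else None.

    rules is a list of (host_pattern, remote_prefix, local_prefix) tuples.
    """
    host, _, path = remote.partition(":")
    for host_pattern, remote_prefix, local_prefix in rules:
        prefix = remote_prefix.rstrip("/")
        # Match the whole directory name, so /home does not match /homework.
        in_prefix = path == prefix or path.startswith(prefix + "/")
        if fnmatch.fnmatch(host, host_pattern) and in_prefix:
            return local_prefix.rstrip("/") + path[len(prefix):]
    return None


rules = [("*", "/home", "/home")]  # corresponds to: $usecp *:/home  /home
print(map_usecp("node01:/home/alice/out.txt", rules))
```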
    
                  wallmult
                          which sets a factor used to adjust wall time usage by a
                          job to a common reference system.  The factor is used for
                          walltime calculations and limits in the same way that
                          cputmult is used for cpu time.
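
                          As a worked example of how such a factor might be
                          applied, the sketch below simply multiplies the
                          measured time by the factor.  This formula is an
                          assumption consistent with the guidance for cputmult
                          (a value above 1.0 for a faster system); the man page
                          does not state the exact calculation, and the function
                          name adjusted_time is hypothetical.

```python
def adjusted_time(raw_seconds, multiplier):
    """Scale a measured time to the reference system (assumed formula)."""
    return raw_seconds * multiplier


# 100 seconds measured on a host configured with a factor of 1.5
print(adjusted_time(100.0, 1.5))
# 100 seconds measured on a host configured with a factor of 0.75
print(adjusted_time(100.0, 0.75))
```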
    
                  configversion
                         specifies the version of the config file data, a  string.
    
                  check_poll_time
                         specifies  the  MOM interval in seconds.  MOM checks each
                         job for updated resource usages, exited processes,  over-
                          limit  conditions,  etc.  once  per interval.  This value
                          should be equal to or lower than pbs_server's job_stat_rate.
                         High  values  result  in  stale  information  reported to
                         pbs_server.  Low values result in increased system  usage
                         by MOM.  Default is 45 seconds.
    
                  down_on_error
                         causes MOM to report itself as state "down" to pbs_server
                         in the event of a failed health check.  This  feature  is
                         EXPERIMENTAL and likely to be removed in the future.  See
                         HEALTH CHECK below.
    
                  ideal_load
                         ideal processor load.  Represents a low  water  mark  for
                          the load average.  A node that is currently busy will
                          consider itself free after falling below ideal_load.
    
                   auto_ideal_load
                          if jobs are running, sets ideal_load based on a simple
                          expression.  The expressions start with the variable 't'
                          [...]

                   [...]  [...] log_file_max_size.  This value is interpreted as
                          kilobytes.
    
                  log_file_roll_depth
                         If  this is set to a value >=1 and  log_file_max_size  is
                         set then  pbs_mom  will continue rolling the log files to
                         log-file-name.log_file_roll_depth.
    
                  max_load
                         maximum processor load.  Nodes over this load average are
                         considered busy (see ideal_load above).
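
                          Taken together, ideal_load and max_load describe a
                          high and low water mark.  The Python sketch below
                          shows that behavior under our reading of the two
                          entries; the function name next_busy_state is
                          hypothetical and the strict comparisons are an
                          assumption.

```python
def next_busy_state(busy, load, ideal_load, max_load):
    """Return the node's busy flag after observing a new load average."""
    if not busy and load > max_load:
        return True   # crossed the high water mark
    if busy and load < ideal_load:
        return False  # fell below the low water mark
    return busy       # in between: keep the previous state


state = False
history = []
for load in [0.5, 4.2, 3.0, 1.5]:
    state = next_busy_state(state, load, ideal_load=2.0, max_load=4.0)
    history.append(state)
print(history)
```

                          A load of 3.0 lies between the two marks, so the node
                          stays busy until the load drops below ideal_load.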
    
                  auto_max_load
                         if jobs are running, sets  max_load  based  on  a  simple
                         expression.   The expressions start with the variable 't'
                         (total assigned CPUs) or 'c' (existing CPUs), an operator
                         (+ - / *), and followed by a float constant.
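
                          The expression form described above (a variable, an
                          operator, and a float constant) could be evaluated as
                          in this Python sketch.  The function name
                          eval_auto_load is hypothetical, and the tolerance for
                          white space between the parts is an assumption; the
                          man page does not give the exact grammar.

```python
import operator
import re

_OPS = {"+": operator.add, "-": operator.sub,
        "*": operator.mul, "/": operator.truediv}
# variable 't' or 'c', one operator, then an unsigned float constant
_EXPR = re.compile(r"^\s*([tc])\s*([-+*/])\s*(\d+(?:\.\d*)?|\.\d+)\s*$")


def eval_auto_load(expr, total_assigned_cpus, existing_cpus):
    """Evaluate an expression such as 't*1.5' or 'c-0.25'."""
    match = _EXPR.match(expr)
    if match is None:
        raise ValueError("unsupported expression: %r" % expr)
    variable, op, constant = match.groups()
    base = total_assigned_cpus if variable == "t" else existing_cpus
    return _OPS[op](base, float(constant))


print(eval_auto_load("t*1.5", total_assigned_cpus=4, existing_cpus=8))
```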
    
                  enablemomrestart
                         enable  automatic  restarts of MOM.  If enabled, MOM will
                         check if its binary has been updated and  restart  itself
                         at  a  safe  point  when no jobs are running; thus making
                         upgrades easier.  The check  is  made  by  comparing  the
                         mtime  of the pbs_mom executable.  Command-line args, the
                         process name, and the PATH  env  variable  are  preserved
                         across  restarts.   It  is  recommended  that this not be
                         enabled in the config file, but enabled when desired with
                         momctl (see RESOURCES for more information.)
    
                  node_check_script
                         specifies  the  fully  qualified  pathname  of the health
                         check script to run (see HEALTH CHECK for  more  informa-
                         tion).
    
                  node_check_interval
                         specifies  when  to  run the MOM health check.  The check
                          can be either periodic, event-driven, or both.  The value
                         starts  with  an  integer  specifying  the  number of MOM
                         intervals between subsequent executions of the  specified
                         health  check.   After  the integer is an optional comma-
                         separated list of event names.  Currently  supported  are
                         "jobstart"  and  "jobend".  This value defaults to 1 with
                         no events indicating the check is run every MOM interval.
                         (see HEALTH CHECK for more information)
    
                          $node_check_interval 0                   Disabled.
                          $node_check_interval 0,jobstart          Only [...]
                         $node_check_interval 10,jobstart,jobend
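
                          The value format described above (an integer followed
                          by an optional comma-separated event list) can be
                          split as in this Python sketch.  The function name
                          parse_node_check_interval is hypothetical, and
                          rejecting unknown event names is our choice rather
                          than documented pbs_mom behavior.

```python
SUPPORTED_EVENTS = ("jobstart", "jobend")


def parse_node_check_interval(value):
    """Split a value such as '10,jobstart,jobend' into (interval, events)."""
    fields = [field.strip() for field in value.split(",")]
    interval = int(fields[0])  # number of MOM intervals between checks
    events = fields[1:]
    unknown = [event for event in events if event not in SUPPORTED_EVENTS]
    if unknown:
        raise ValueError("unsupported event(s): %s" % ", ".join(unknown))
    return interval, set(events)


print(parse_node_check_interval("10,jobstart,jobend"))
```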
    
                  prologalarm
                          Specifies maximum duration (in seconds) which the MOM
                          [...]
    
                  remote_checkpoint_dirs
                         Specifies what server checkpoint directories are remotely
                         mounted.   This  directive  is used to tell the MOM which
                         directories are shared with  the  server.   Using  remote
                         checkpoint  directories  eliminates  the need to copy the
                         checkpoint files back and forth between the MOM  and  the
                         server. This parameter is available in 2.4.1 and later.
    
                         $remote_checkpoint_dirs /var/spool/torque/checkpoint
    
                  remote_reconfig
                         Enables  the ability to remotely reconfigure pbs_mom with
                         a new config file.  Default is disabled.  This  parameter
                         accepts various forms of true, yes, and 1.
    
                  timeout
                         Specifies  the number of seconds before TCP messages will
                         time out.  TCP messages include job  obituaries,  and  TM
                         requests if RPP is disabled.  Default is 60 seconds.
    
                  tmpdir Sets  the  directory  basename  for  a  per-job temporary
                         directory.  Before job launch, MOM will append the  jobid
                          to  the  tmpdir basename and create the directory.  After
                          the job exits, MOM will recursively delete it.  The env
                         variable  TMPDIR  will be set for all pro/epilog scripts,
                         the job script, and TM tasks.
                         Directory creation and removal is done as the  job  owner
                         and  group,  so  the  owner must have write permission to
                         create the directory.  If the  directory  already  exists
                         and  is  owned  by  the job owner, it will not be deleted
                         after the job.  If the directory already  exists  and  is
                         NOT  owned  by  the  job  owner,  the  job  start will be
                         rejected.
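
                          The three cases in the preceding paragraph can be
                          summarized as a small decision function.  This Python
                          sketch only restates the rules above; the function
                          name tmpdir_action and the action labels are
                          hypothetical.

```python
def tmpdir_action(exists, owned_by_job_owner):
    """Return (action, delete_after_job) for the per-job directory."""
    if not exists:
        # Normal case: MOM creates the directory and removes it afterwards.
        return ("create", True)
    if owned_by_job_owner:
        # Pre-existing directory owned by the job owner is left in place.
        return ("reuse", False)
    # Pre-existing directory owned by someone else: the job start is rejected.
    return ("reject", False)


print(tmpdir_action(exists=False, owned_by_job_owner=False))
```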
    
                  status_update_time
                         Specifies (in seconds) how often MOM updates  its  status
                         information  to  pbs_server.  This value should correlate
                          with  the  server's  scheduling  interval.   Low  values
                          increase  the  load  of  pbs_server and the network.  High
                          values cause  pbs_server  to  report  stale  information.
                         Default is 45 seconds.
    
                  varattr
                         This  is  similar to a shell escape above, but includes a
                         TTL.  The command will only be run every TTL seconds.   A
                         TTL  of  -1  will  cause  the command to be executed only
                          once.  A TTL of 0 will cause the command to be run every
                          time varattr is requested.  This parameter may be used
                          multiple times, but all output will be grouped into a
                          single "varattr" attribute in the request and status
                          output.  The command should output data in the form of
                          varattrname=value1[+value2]...
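
                          The TTL rules and the output format above can be
                          illustrated with the Python sketch below.  Both
                          function names are hypothetical, treating every
                          negative TTL like -1 is an assumption, and the sketch
                          is not pbs_mom's implementation.

```python
def should_run(ttl, last_run, now):
    """Decide whether the varattr command should be executed again."""
    if last_run is None:
        return True   # never run before
    if ttl < 0:
        return False  # TTL of -1: run only once
    if ttl == 0:
        return True   # TTL of 0: run on every request
    return now - last_run >= ttl


def parse_varattr(line):
    """Split 'name=value1+value2' into (name, [value1, value2])."""
    name, _, values = line.strip().partition("=")
    return name, values.split("+") if values else []


print(should_run(30, last_run=0, now=10))
print(parse_varattr("disks=sda+sdb"))
```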
    
                   [...]  Specifies the path to the xauth binary to enable X11
                          forwarding.
    
                   ignvmem
                          If set to true, then pbs_mom will not enforce
                          vmem/pvmem limits.

                   ignwalltime
                          If set to true, then pbs_mom will not enforce walltime
                          limits.

                   mom_host
                          Sets the local hostname as used by pbs_mom.
    
    
    

    RESOURCES

           Resource Monitor queries  can  be  made  with  momctl's  -q  option  to
           retrieve  and  set pbs_mom options.  Any configured static resource may
           be retrieved with a request of  the  same  name.   These  are  resource
           requests not otherwise documented in the PBS ERS.
    
           cycle  forces an immediate MOM cycle
    
           status_update_time
                  retrieve or set the $status_update_time parameter
    
           check_poll_time
                  retrieve or set the $check_poll_time parameter
    
           configversion
                  retrieve the config version
    
           jobstartblocktime
                  retrieve or set the $jobstartblocktime parameter
    
           enablemomrestart
                  retrieve or set the $enablemomrestart parameter
    
           loglevel
                  retrieve or set the $loglevel parameter
    
           down_on_error
                  retrieve or set the EXPERIMENTAL $down_on_error parameter
    
           diag0 - diag4
                  retrieves various diagnostic information
    
           rcpcmd retrieve or set the $rcpcmd parameter
    
           version
                  retrieves the pbs_mom version
    
    
    

    HEALTH CHECK

           The health check script is executed directly by the pbs_mom daemon
           [...]

           If the script detects a failure when run from "jobstart", then the  job
           will  be  rejected.   This  should  probably only be used with advanced
           schedulers like Moab so that the job can be routed to another node.
    
           TORQUE currently ignores ERROR messages by default, but advanced sched-
           ulers like Moab can be configured to react appropriately.
    
           If the experimental $down_on_error MOM setting is enabled, MOM will set
           itself to state down and report  to  pbs_server;  and  pbs_server  will
           report   the   node   as   "down".    Additionally,   the  experimental
           "down_on_error" server attribute can be  enabled  which  has  the  same
           effect but moves the decision to pbs_server.  It is redundant to have
           both MOM's $down_on_error and pbs_server's down_on_error features
           enabled.
           See "down_on_error" in pbs_server_attributes(7B).
    
    
    

    FILES

           $PBS_SERVER_HOME/server_name
                     contains the hostname running pbs_server.
    
           $PBS_SERVER_HOME/mom_priv
                     the  default  directory  for  configuration  files, typically
                     (/usr/spool/pbs)/mom_priv.
    
           $PBS_SERVER_HOME/mom_logs
                     directory for log files recorded by the server.
    
           $PBS_SERVER_HOME/mom_priv/prologue
                     the administrative script to be run before job execution.
    
           $PBS_SERVER_HOME/mom_priv/epilogue
                     the administrative script to be run after job execution.
    
    
    

    SIGNAL HANDLING

           pbs_mom handles the following signals:
    
           SIGHUP causes pbs_mom to re-read  its  configuration  file,  close  and
                  reopen the log file, and reinitialize resource structures.
    
           SIGALRM
                   results in a log file entry.  The signal is used to limit the
                   time taken by certain child processes, such as the prologue
                   and epilogue.
    
           SIGINT and SIGTERM
                   result in pbs_mom exiting without terminating any running jobs.
                  This is the action for the following signals as  well:  SIGXCPU,
                  SIGXFSZ, SIGCPULIM, and SIGSHUTDN.
    
           SIGUSR1, SIGUSR2
                   cause MOM to increase and decrease logging levels,
                   respectively.
    
    
    

    SEE ALSO

           pbs_server(8B), pbs_scheduler_basl(8B), pbs_scheduler_tcl(8B), the  PBS
           External Reference Specification, and the PBS Administrator's Guide.
    
    
    

    Local pbs_mom(8B)

    
    
Copyright © 1999 - 2016 by LinuxGuruz