TAA Tools
CHKSYSCND       CHECK SYSTEM CONDITION                 TAAMSGM

The  Check  System   Condition  command  checks  for   critical  system
messages and sends break messages to a list of users.

The system  sends critical system messages such  as 'mirroring has been
suspended'  or 'a disk storage capacity  threshold has been reached' to
QSYSOPR (See  the later  discussion for  more details).   Because  many
messages  may exist in  QSYSOPR, critical  messages may  be overlooked.
If  QSYSOPR is not  in break  mode, no one  may be aware  of a critical
system condition.    Critical system  messages  are  also sent  to  the
QSYSMSG message queue if it exists.

CHKSYSCND submits a never-ending batch job that continually monitors
QSYSMSG.

The major advantages of CHKSYSCND are:

  **   It  selects the critical  message IDs rather  than every message
       that arrives in QSYSOPR.

  **   It uses the SHOUT TAA tool to  send messages to a named list  of
       users  so  that  a critical  condition  is  more  likely  to  be
       noticed.

To use  CHKSYSCND, you must  first create the QSYSMSG  message queue in
QSYS  if it does  not already exist.   No other  program can be reading
this message queue  and it cannot  be in break  mode to a  workstation.
Any  message  ID sent  to  QSYSMSG will  be  removed  from the  message
queue.   If you want to process some of  the message IDs not handled by
CHKSYSCND, see the later instructions for modifying the programs.

The  QSYSMSG  message   queue  is  described  in   detail  in  the   CL
Programmer's Guide.  To create the queue specify:

      CRTMSGQ     MSGQ(QSYS/QSYSMSG) +
                    TEXT('Message queue for critical system messages')
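
If you prefer to create the queue only when it is missing, a CL program
can check for it first.  The following is a minimal sketch, not part of
the tool.  It assumes that CHKOBJ signals CPF9801 when the object is
not found and that you are authorized to create objects in QSYS.

             PGM
             /* CPF9801 is signaled if QSYSMSG does not exist */
             CHKOBJ     OBJ(QSYS/QSYSMSG) OBJTYPE(*MSGQ)
             MONMSG     MSGID(CPF9801) EXEC(DO)
             CRTMSGQ    MSGQ(QSYS/QSYSMSG) +
                          TEXT('Message queue for critical system messages')
             ENDDO
             ENDPGM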

A typical approach  to start CHKSYSCND would  be to use the  command in
the  auto  start job  for the  controlling  subsystem.   See  the later
discussion for  how  to  do  this.    You  must  have  *JOBCTL  special
authority to use CHKSYSCND.  A typical command would be:

      CHKSYSCND   USERS(QSYSOPR QSECOFR JONES *FIRSTUSER)

This  would  cause  the CHKSYSCND  job  to  be  submitted  which  would
continually monitor QSYSMSG.

The  value *FIRSTUSER  is a special  value intended  for the  case when
none  of the users  specified are active.   If this  occurs, the active
users are  checked  and the  user  with the  highest user  class  (e.g.
*SYSOPR,  *PGMR  etc.) is  sent  a  generic  message and  requested  to
inform the system administrator of the critical condition.

The  value *ALLACTIVE may be specified  (instead of *FIRSTUSER) to send
a similar message to all active users.

Because the CHKSYSCND  function is assumed  to be a  critical job,  any
failures  found   within  the  job   will  cause  a   'critical  system
condition'  message to  be sent.   The CHKSYSCND  command can  be ended
externally (ENDJOB) without causing this message.

Excess interruptions
--------------------

The intent of  CHKSYSCND is  to bring to  the attention  of the  system
administrator the fact that  a critical system condition exists.   Once
this is known  and a plan has been made  for correction, it is probably
desirable  to  re-adjust CHKSYSCND  to avoid  the  list of  users being
annoyed on an hourly basis until the problem is fixed.

For example, if you  are running out of addresses, the  normal recovery
action would  be to cause an IPL  during the evening.   Or if mirroring
has  been suspended, the Service representative  may be scheduled in to
repair the problem later in the  day.  In the meantime, CHKSYSCND  will
continue to send  the 'critical system condition'  message hourly until
the problem is fixed.

You can avoid this by ending the CHKSYSCND job.  You could then
resubmit CHKSYSCND with a smaller list of users to be notified until
the known problem is fixed, and afterward return to the original list.
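
As a rough illustration, the sequence might look like the following.
The job name CHKSYSCND on ENDJOB is an assumption (this text does not
state the name of the submitted job), so check the actual name with
WRKACTJOB first.  The user JONES is taken from the earlier example.

       ENDJOB     JOB(CHKSYSCND)
       CHKSYSCND  USERS(JONES)

When the problem has been fixed, end the job again and submit
CHKSYSCND with your normal list of users.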

Because you may want  to use CHKSYSCND independently of  the auto start
job, it  may be desirable  to place the  normal CHKSYSCND command  in a
CL program and call it from the auto start job or when needed.
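
A minimal sketch of such a CL program follows.  The program name
STRCHKSYS is hypothetical, and the MONMSG assumes that any escape
message from the command is in the CPF range.  Adjust the list of
users for your installation.

             /* STRCHKSYS - Start CHKSYSCND (hypothetical name) */
             PGM
             CHKSYSCND  USERS(QSYSOPR QSECOFR JONES *FIRSTUSER)
             /* Do not let a failure here end the calling program */
             MONMSG     MSGID(CPF0000)
             ENDPGM

The same program can then be called from the auto start job or from a
command line, so the list of users is maintained in one place.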

Command parameters                                    *CMD
------------------

   USERS         This value is passed through to the TAA tool SHOUT.
                 It is a list of up to 10 user names that will be sent
                 a message if a critical condition occurs.  If the user
                 is active, the message is sent as a break message to
                 the workstation message queue where the user is
                 signed on.  If the user is not active, a message is
                 sent to the user's message queue.

                 For  additional   details  and  a  discussion  of  the
                 *FIRSTUSER and  *ALLACTIVE  special  values,  see  the
                 SHOUT TAA tool.

   JOBQ          The qualified  name  of the  job queue  to submit  the
                 batch  job to.   The default is  QINTER in QGPL.   The
                 intent  of QINTER is  to allow the job  to act like an
                 interactive job and  not use one  of the normal  batch
                 job activity levels.

   JOBD          The qualified  name of the job description  to use for
                 the  batch job.  The default is  QBATCH in QGPL.  This
                 allows  control  of  other  job  attributes  for   the
                 CHKSYSCND job.

System handling of critical conditions
--------------------------------------

You should review the discussion of QSYSMSG in the CL Programmer's
Guide if you are interested in the details of the messages which are
sent.  The following describes the highlights of the messages which
are sent to QSYSMSG in QSYS if it exists.  For the detailed list of
message IDs, see member TAAMSGMC2 in TAATOOL/QATTCL.

  **   Address threshold.   The system  will send a  message every hour
       if   the   addresses  used   percentage  (either   permanent  or
       temporary) exceeds 90%.

  **   Storage capacity  threshold.   A  message will  be  sent if  the
       storage  used  percentage  exceeds  the threshold  value.    The
       threshold  value is specified in SST for  each ASP.  The default
       is 90%.

  **   Disk errors.   Some disk units perform  a threshold checking  of
       recoverable  error conditions.   This  allows the  system to  be
       informed  when   the  disk  unit  is  operating,  but  excessive
       recoverable errors are  occurring.  When  an internal  threshold
       is reached,  a critical system  message is sent  requesting that
       Service be  informed.  In most cases,  the system will re-signal
       the message on an hourly basis (up to 10 times).

  **   Mirroring  suspended.   If  a mirroring  unit fails,  the system
       will send  a  single message  when mirroring  is  suspended.   A
       similar message will also be sent every hour.

  **   Parity protection  suspended.   If parity protection  exists and
       a  parity  unit fails,  the system  will  send a  single message
       when parity protection  is suspended.   A  similar message  will
       also be sent every hour.

  **   Battery weak  or failed.   Some  battery protection devices  can
       be  tested to determine  if they are  weak or have  failed.  The
       system sends a message if it senses either condition.

  **   Hardware failures.   Certain  hardware internal  units (such  as
       the bus)  are tested regularly  and the  system sends a  message
       if errors are found.

  **   Significant security violations.  Certain critical security
       errors are also sent to QSYSMSG.  For example, a message is
       sent if a user makes more invalid sign-on attempts at a
       workstation than the QMAXSIGN system value allows.  Security
       messages are ignored by CHKSYSCND.  You may modify the program
       to process these separately.

All of the system critical messages are  sent to QSYSOPR.  It is up  to
the user  to place  QSYSOPR in  break mode  and be  sensitive to  which
messages  are critical  and which  are  not.   All the  critical system
messages are alertable and will also appear in QHST.

Alerts coming  into the  system  (from a  remote  system) are  sent  to
QSYSOPR.   The System Management  Utility supports  an option to  allow
you  to send  the Alerts  to a  specific queue.    You can  process the
queue in the same manner as QSYSMSG.

Using CHKSYSCND in an auto start job
------------------------------------

CHKSYSCND  can  be  easily  placed  into  an  auto  start  job  for the
controlling subsystem.  This  is normally the best place to  ensure the
job  will be  active  whenever  the system  is  up and  is  not in  the
restricted state.

The  source for the system supplied auto  start job can be retrieved by
use of the RTVCLSRC command  and naming a source file/member where  you
want the source placed.

       RTVCLSRC   PGM(QSTRUP)     SRCFILE(yyyyyyy) SRCMBR(zzzz)

It would be  normal to place the  CHKSYSCND command as one  of the last
functions  performed (e.g.  just  prior to the RETURN  statement in the
startup program).  Create your own version of the startup program.

Change  the  system  value  QSTRUPPGM  to  specify  the  name  of   the
library/program  you  created.   You  should  consider  a  separate  CL
program containing  the CHKSYSCND command as described  in the previous
section on 'Excess interruptions'.
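
As a sketch only, the end of a modified startup program might look
like this.  The comment stands in for the statements retrieved from
QSTRUP, and the MONMSG is an assumption intended to keep a CHKSYSCND
failure from ending the startup program.

             /* ... statements retrieved from QSTRUP ... */
             CHKSYSCND  USERS(QSYSOPR QSECOFR JONES *FIRSTUSER)
             MONMSG     MSGID(CPF0000)
             RETURN

The program could then be created and named in the system value.  The
names MYSTRUP, MYLIB and QCLSRC are hypothetical.  The VALUE format
shown (program name, then library, separated by a blank) is an
assumption to verify against the CHGSYSVAL help text.

       CRTCLPGM   PGM(MYLIB/MYSTRUP) SRCFILE(MYLIB/QCLSRC)
       CHGSYSVAL  SYSVAL(QSTRUPPGM) VALUE('MYSTRUP MYLIB')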

Message text sent
-----------------

There are three forms of messages sent:

  **   List of users on the command.  The list of users specified on
       the command will receive the message ID and the actual first
       level message text of the message that caused the critical
       system condition.  The message is 'wrapped' with standard text
       before and after the actual text.  Because a 'break message'
       is sent, there is no second level text available with the
       message.

       For   example,  the  message   text  supplied   in  the  program
       produces:

                        ***** A critical system condition has
                        occurred. ***** The message ID is xxxxxxx.
                         The text is- yyyyyyyyyyyyyyyyyyyyyyyyyyy
                         Contact the system administrator immediately.

  **   If none of the specified users are active and *FIRSTUSER was
       specified, the text sent to the selected user is:

                          ***** A critical system condition
                          has been found by CHKSYSCND. ******
                          No other active user has been
                          informed. Contact the system
                          administrator immediately.

  **   If *ALLACTIVE  is specified, specific  users may also  be named.
       Any specifically  named users always receive  the text described
       previously.   If  the user  is active, but  is not  in the named
       list, the following text is sent:

                          ***** A critical system condition
                          has been found by CHKSYSCND. ******
                          Contact the system administrator
                          immediately.

You may modify this text for your own requirements.

Testing CHKSYSCND
-----------------

There are two functions you can consider testing:

  **   SHOUT command.   You  can test your  list of  users by  directly
       executing the SHOUT command.

  **   CHKSYSCND  command.   You  can  simulate  the  system sending  a
       critical  message by  sending  the same  message ID  directly to
       QSYSMSG.   The  following TSTCHKSYS  program  can be  used  (the
       source from  this text can be  directly copied into a  CL source
       member).

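             /* TSTCHKSYS - Test the CHKSYSCND command */
             PGM        PARM(&MSGID)
             /* Message ID to simulate, such as CPI0997 */
             DCL        &MSGID *CHAR LEN(7)
             /* Send the message to QSYSMSG as the system would */
             SNDPGMMSG  MSGID(&MSGID) MSGF(QCPFMSG) +
                          TOMSGQ(QSYS/QSYSMSG)
             ENDPGM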

       To test the address threshold condition, you would specify
       message CPI0997 as:

             CALL       TSTCHKSYS PARM(CPI0997)
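
The test program must be created before it can be called.  A minimal
sketch, assuming the source was copied to member TSTCHKSYS in a source
file QCLSRC in a library MYLIB (both names are hypothetical):

       CRTCLPGM   PGM(MYLIB/TSTCHKSYS) SRCFILE(MYLIB/QCLSRC)
       CALL       MYLIB/TSTCHKSYS PARM(CPI0997)

If CHKSYSCND is active, the users named on the command should receive
the wrapped message text shortly after the call.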

Modifying the programs
----------------------

You may want to modify the  program that is controlling the reading  of
QSYSMSG and the message  text used for the SHOUT command.   The program
to review  is TAAMSGMC2 in  TAATOOL.  If  you are going  to modify this
program,  you  should make  a copy  of the  entire CHKSYSCND  tool (See
CPYTAA2).  Make the changes  and then create the tool  using CRTTAATOOL
and specify your source library.

The following are typical changes you may want to make:

  **   Not  all of  the  messages arriving  at  QSYSMSG are  considered
       'critical'.    For  example, security  messages  are  ignored as
       well as  the fact  that mirroring  has been  resumed.   You  may
       want  to have  some  unique handling  of  these  messages.   You
       should  review  the QSYSMSG  discussion in  the  CL Programmer's
       Guide.   Any  message  received is  removed  from  QSYSMSG  (the
       messages also  exist in QSYSOPR).   If you  want to save  any of
       the  messages,  you  would  need  to modify  the  program  (e.g.
       resend them to a different queue).

  **   The  same  message  ID received  from  QSYSMSG is  sent  via the
       SHOUT command  for the  critical  conditions.   The first  level
       message  text is  wrapped with  the standard  text  as described
       previously.   You  may want  to alter  either the  standard text
       that is  wrapped around  the message  or the  text for  specific
       conditions.   Because a  break message  will be  sent, only  the
       first  level of  the text (no  message ID  or 2nd level)  can be
       sent.

  **   The SHOUT command  provides for  the case of  a generic  message
       to be  sent  to *FIRSTUSER  or *ALLACTIVE.   The  text does  not
       identify  the specific  problem, but  requests the  user contact
       the  system administrator.   TAAMSGMC2  provides a  text message
       (described earlier)  that is designed  to be  sent to a  typical
       end user.  You may want to modify this text.
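
To illustrate the kind of logic involved in the first change above
(saving messages by resending them to a different queue), the
following sketch receives each message from QSYSMSG and resends it.
It is not the TAAMSGMC2 source.  The queue MYLIB/SAVMSGQ and the
256-byte message data length are assumptions.  Logic like this would
belong inside your modified copy of the program, because no other
program can read QSYSMSG while CHKSYSCND is active.

             /* Hypothetical sketch, not the TAAMSGMC2 source */
             PGM
             DCL        &MSGID   *CHAR LEN(7)
             DCL        &MSGDTA  *CHAR LEN(256)
             DCL        &MSGF    *CHAR LEN(10)
             DCL        &MSGFLIB *CHAR LEN(10)
             /* Wait for the next message; RCVMSG removes it */
 LOOP:       RCVMSG     MSGQ(QSYS/QSYSMSG) WAIT(*MAX) RMV(*YES) +
                          MSGDTA(&MSGDTA) MSGID(&MSGID) +
                          MSGF(&MSGF) MSGFLIB(&MSGFLIB)
             /* Skip messages that have no message ID */
             IF         (&MSGID *EQ ' ') GOTO LOOP
             /* Keep a copy in a queue of your own (assumed name) */
             SNDPGMMSG  MSGID(&MSGID) MSGF(&MSGFLIB/&MSGF) +
                          MSGDTA(&MSGDTA) TOMSGQ(MYLIB/SAVMSGQ)
             GOTO       LOOP
             ENDPGM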

Restrictions
------------

The  QSYSMSG message queue  must exist  and it  cannot be  allocated to
any other program (e.g.  be in break mode to a user).

You must have *JOBCTL special authority to use CHKSYSCND.

Prerequisites
-------------

The following TAA Tools must be on your system:

     EXTLST       Extract list
     SHOUT        Shout message
     SNDCOMPMSG   Send completion message
     SNDESCMSG    Send escape message

Implementation
--------------

The  tool  is  ready to  use,  but  you must  ensure  that  the QSYSMSG
message queue exists  in QSYS.   If it  has not been  created, see  the
earlier discussion.

Objects used by the tool
------------------------

   Object        Type        Attribute      Src member    Src file
   ------        ----        ---------      ----------    ----------

   CHKSYSCND     *CMD                       TAAMSGM       QATTCMD
   TAAMSGMC      *PGM           CLP         TAAMSGMC      QATTCL
   TAAMSGMC2     *PGM           CLP         TAAMSGMC2     QATTCL

Structure
---------

CHKSYSCND  command
  TAAMSGMC submits batch pgm TAAMSGMC2

      TAAMSGMC2   CL pgm
        SHOUT        TAA tool

Added to TAA Productivity tools April 1, 1995

