The Analyze Field command sequences an externally described file on a
named field and allows analysis by percentiles, ranges, or unique
values. For each summary line printed, decimal fields may be summed,
averaged, the maximum and minimum values determined, and the standard
deviation calculated. Selection criteria may be specified to limit
what is processed.
A physical, logical, or joined logical file may be processed (only
single format files are valid).
A typical command to print the unique values for the field CODE and
summarize the AMT field would be:
ANZFLD FILE(xxx) ANZTYPE(*UNIQUE) ANZFLD(CODE)
SUMFLD((AMT *SUM))
The TAA SORTDBF command is used to sequence the file on the CODE
field and create a temporary file in QTEMP. SORTDBF uses OPNQRYF for
selection and the Sort API to sort the data.
The ANZFLD program reads the sorted file and outputs a spooled file.
There would be one line for each unique CODE value including the sum
of the AMT field and a count of the number of records. A final total
line is also printed along with the median value found (the record at
the 50th percentile), and the mode value found (the value with the
most number of occurrences).
The default is to display the information. The spooled file is
retained.
Up to 30 fields may be summarized. The width of the spooled file is
378 bytes (maximum supported by DSPSPLF). If you want a hard copy of
the results, you will probably not be able to request 30 summary
fields.
You may also get the average, maximum, minimum, and standard
deviation for AMT by modifying the SUMFLD parameter:
SUMFLD((AMT *AVG)(AMT *MAX)(AMT *MIN)(AMT *STDDEV)
You may use different fields such as a SUMFLD parameter which
specifies the SALES and COST fields:
SUMFLD((SALES *AVG)(COST *AVG)
Percentiles
-----------
The percentile function allows you to print a line for a group of
records that make up one or more percentiles. The records are
sequenced by the named field to determine the percentiles. For
example, if there are 1000 records in the file, each 10 records
represent one percentile.
You can print all 100 percentiles or a group of percentiles. For
example, to print a line for every 10 percentiles you would specify:
ANZFLD FILE(xxx) ANZTYPE(*PERCENTILE) ANZFLD(CODE)
PERCENTILE(10) SUMFLE(...)
The PERCENTILE(10) request provides for 10 percentiles (1-10, 11-20,
... 91-100). Each number represents how many percentile groups will
be printed. You may request 2, 3, 4, 5, 10, 20, or 100 percentiles.
Ranges
------
You may identify groups to be summarized by providing a range for
each group. Up to 50 ranges may be entered. Either a decimal or
character field may be used as the field name for the ANZFLD
parameter.
The ranges must be specified in ascending sequence and cannot
overlap. A typical command would be:
ANZFLD FILE(xxx) ANZTYPE(*RANGES) ANZFLD(CODE)
RANGES((AAA BBB)(CCC DDA)(DDB EEE))
SUMFLD(...)
The 'From value' for each range must be less than the 'To value' for
the same range and must be greater than the 'To value' of the
previous range.
One line would be printed for each range that contains records.
If any records exist that do not fall into any of the specified
ranges, an 'Other' total is summarized.
When a decimal field is specified for a range of values, a decimal
type of comparison is used. Consequently, you may specify a low
value of 0 or 1 even though the field may have several digits.
Negative values are valid such as a range of -10 to -2. Since
ascending values are required, you must begin with the largest
negative value.
Range values should be entered with a decimal notation. For example,
if you have a 5 digit field with 2 decimal positions and want a range
of 1.00 to 2.00, enter 1.00 to 2.00. You must enter the same number
of digits to the right of the decimal character as are defined for
the field specified for the ANZFLD parameter.
When a character field is specified for a range of values, a
character comparison is used. You may use a shorter length than the
full field, but you must be careful to specify the proper range. For
example, assume you have a 2 byte character field and you specify
ranges of A to B, and C to D. If the data in the field to be
analyzed is BA, it will not fall in either range (it would appear as
an 'Other' value). You could use a range such as A to B9 to allow
all data such as BA to fall into the first range.
Selection
---------
Selection may be specified, to minimize the number of records that
are to be processed such as:
ANZFLD FILE(xxx) ANZTYPE(*UNIQUE) ANZFLD(CODE)
SUMFLD((AMT *SUM))
SELFLD((STATE *EQ NY))
You may select on the field named for ANZFLD or different fields.
Multiple selection fields may be named and an *AND/*OR relationship
described.
The typical boolean operators *EQ, *NE, etc are supported along with:
** The *CT operator provides 'contains' which causes a scan of
the data. For example, you may select based on a string of
characters being anywhere in the field.
** The *WC operator provides for a 'wildcard' function. The
simplest type is a generic value such as ABC* meaning any
value beginning with ABC. Other wildcard functions are
supported.
Saving the ANZFLD command
-------------------------
The ANZFLD command is typically entered interactively and refined
with multiple iterations until you achieve the results you are
looking for. A subset of a large file may be used to provide better
performance until you are satisfied with the specifications.
Once you have a version of ANZFLD that meets your requirements, you
should consider saving the command for future use.
A good tool to consider is the FAVCMD TAA tool. This allows you to
store the command with an ID that may be used in the future.
To use FAVCMD, you must first create the required files in a library
on your library list with the command:
CRTFAVCMD FAVCMDPLIB(xxx)
Then you follow the ANZFLD command with ADDFAVCMD and describe an ID
such as:
ANZFLD ...
ADDFAVCMD FAVID(yyy)
ADDFAVCMD extracts the previous command from the job's message queue
and places it in the FAVCMD file.
If you want to run the command at a later date, you would enter:
EXCFAVCMD FAVID(yyy)
There is also a shorthand version of the command:
FV FAVID(yyy)
The WRKFAVCMD command will let you work with your stored versions.
You can copy, change, display, etc from a standard work type of
display. Using FAVCMD allows you the advantage of having a solution
that is like a CL program. You can easily display, rerun, modify,
and copy to meet the same or similar requirements.
Reusing the SORTDBF results
---------------------------
When ANZFLD runs, a file named TAATMPQRY is output to QTEMP as the
result of the internal SORTDBF TAA command. The TAATMPQRY file is
processed by the ANZFLD program to produce the spooled output.
TAATMPQRY is deleted by default at the end of the command. You can
retain the file by specifying RTNOUTFILE(*YES).
The file may be used for your own purposes, but the typical use is in
conjunction with another use of ANZFLD to avoid re-running SORTDBF.
When FILE(*QTEMP) is specified, the SORTDBF command is bypassed and
processing begins immediately on the TAATMPQRY file.
Thus the sequence of commands would be:
ANZFLD FILE(xxx) ... RTNOUTFILE(*YES)
ANZFLD FILE(*QTEMP) ...
ANZFLD FILE(*QTEMP) ...
After the first use of ANZFLD, the TAATMPQRY file is used again for
additional ANZFLD commands. This allows you to vary the ANZTYPE and
SUMFLD parameters to see different results.
When FILE(*QTEMP) is specified, the TAATMPQRY file is automatically
retained (the RTNOUTFILE parameter is ignored).
Note that when FILE(*QTEMP) is specified, the library qualifier for
the FILE parameter (such as *LIBL) is ignored. The following
parameters are also ignored because they are only specified when
SORTDBF is used:
SELFLD
MBR
TRNTBL
FLTWLDCRD
FIXWLDCRD
SORTTBL
CCSID
LANGID
OPNQRYFCMD
When FILE(*QTEMP) is specified, the value of the ANZFLD parameter
must be the same as that used to create the TAATMPQRY file.
ANZFLD escape messages you can monitor for
------------------------------------------
TAA9891 The member to be processed has no records
TAA9895 No records met the selection criteria
Escape messages from based on functions will be re-sent.
Command parameters *CMD
------------------
FILE The qualified name of the file to be read. The
library value defaults to *LIBL. *CURLIB may also
be used.
A physical or logical file (including a join logical
file) may be named. Logical files may have only a
single format.
The special value *QTEMP may be entered if a prior
use of ANZFLD has specified RTNOUTFILE(*YES). The
allows reuse of the TAATMPQRY file in QTEMP and
avoids the selection and sorting function. See the
RTNOUTFILE parameter for a complete discussion.
ANZTYPE The type of analysis to be performed.
*PERCNTILE. One line is listed for the number of
percentiles specified for the PERCENTILE parameter.
*RANGES. One line is listed for each range
identified in the RANGES parameter.
*UNIQUE. The values in the ANZFLD field are
sequenced in ascending order and one line for each
unique value is listed.
*FNLTOT. One line is listed for all selected
records. Fields named in SUMFLD will be listed in
the total as well as a count of the selected
records.
ANZFLD The field to be analyzed. Either a decimal field
(packed, zoned, or binary) or a character field may
be specified. The field will be sequenced in
ascending order.
The median value (the value of the record at the
50th percentile) is always printed at the end of the
listing.
The mode value (the value having the most number of
frequencies) is always printed at the end of the
listing.
A character field may be up to 32 bytes in length.
A decimal field may be up to 15 bytes in length.
PERCENTILE The number of percentiles to be listed. An entry
must be made when ANZTYPE(*PERCNTILE) is specified.
The values that may be entered are 2, 3, 4, 5, 10,
20, or 100.
RANGES The ranges to be summarized. Up to 50 ranges may be
entered when ANZTYPE(*RANGES) is specified.
Each range is specified as a 'From' and 'To' value.
The 'From value' must be less than or equal to the
'To value' for each range.
Ranges must be specified in ascending order and
cannot overlap. The 'From value' for each range
must be greater than the 'To value' of the preceding
range.
When a decimal field is specified for a range of
values, a decimal type of comparison is used.
Consequently, you may specify a low value of 0 or 1
even though the field may have several digits.
Negative values are valid such as a range of -10 to
-2. Since ascending values are required, you must
begin with the largest negative value.
Range values should be entered with a decimal
notation. For example, if you have a 5 digit field
with 2 decimal positions and want a range of 1.00 to
2.00, enter 1.00 to 2.00. You must enter the same
number of digits to the right of the decimal
character as are defined for the field specified for
the ANZFLD parameter.
When a character field is specified for a range of
values, a character comparison is used. You may use
a shorter length than the full field, but you must
be careful to specify the proper range.
For example, assume you have a 2 byte character
field and you specify ranges of A to B, and C to D.
If the data in the field to be analyzed is BA, it
will not fall in either range (it would appear as an
'Other' value). You could use a range such as A to
B9 to allow all data such as BA to fall into the
first range.
If a blank is required for a character field range,
enter *BLANK or *BLANKS.
If a zero is required for a decimal field range,
enter 0, *ZERO, or *ZEROS.
SUMFLD Up to 30 fields to be summarized. Only decimal
fields of packed, zoned, or binary types of up to 15
digits in length may be specified. The number of
decimal positions may not exceed 9.
Each field will appear for each line that is printed
along with a count of the number of records that
exist for the summary line.
For each field specified, the type of summary must
be included:
*SUM. The sum of the values.
*AVG. The average of the values (the sum divided by
the number of records making up the sum). The same
number of decimal positions will be shown in the
answer as exist in the definition of the field being
averaged.
*MAX. The maximum value of the values.
*MIN. The minimum value of the values.
*STDDEV. The standard deviation of the values.
Standard deviation allows you to express as a number
the shape of the distribution curve. The value
shown for standard deviation is a number which if
added and subtracted from the average would provide
a range for approximately 67% of the values in a
normal distribution.
For example, if numbers 1,2,3...100 exist as values,
the median value is 50 and the standard deviation
would be 28. This means that between 22 (50-28) and
78 (50+28) you should find approximately 67% of the
values. Of the 100 values, 67 lie between 22 and
78.
The lower the standard deviation value, the tighter
the curve is. A value of 0 means that all values
fall at the median.
Two standard deviations should account for
approximately 95% of the values. Three standard
deviations should account for approximately 95+% of
the values.
Standard deviation is calculated by using the square
root of the variance. The variance is calculated as
((b-(a*a)/c))/c where
a = the sum of the values
b = the sum of the square of the values
c = the count of the values
SELFLD The field to perform selection on. This is a 'list
parameter' that allows up to 30 fields to be
described for selection.
1) Field name. The field name to perform selection
on. The default is *NONE. The field must exist in
the file unless *NONE is specified. Decimal fields
(DDS Data types of P = Packed, S = Zoned decimal, B
= Binary, F = Floating point) have certain
restrictions as described in the following parts.
You may use *NONE if you have multiple selection
fields that are primed by multiple conditions and
not all conditions have the same number of selection
values. For example if CODE = A you want to select
on FLDB, but if CODE = B you want to select on FLDA
and FLDB. You may prime the selection fields with
IF logic and then have a single ANZFLD command.
2) Operator. The type of comparison to perform.
*EQ is the default. *NE, *GT, *LT, *GE, *LE, *CT
(Contains) and *WC (Wildcard) are supported.
Decimal fields do not allow *CT or *WC operators.
*CT (contains) means the entire field is scanned for
the value). Only character fields may be used with
*CT.
*WC (Wildcard) causes an *EQ search against the data
base data with one or more wildcards. Only
character fields may be used with *WC.
Both 'fixed' and 'floating' wildcard functions are
supported. The simplest use of wildcards is for
either generic requests or where a position within a
field should be ignored when selecting.
When a generic request is needed such as selecting
all values beginning with ABC, just enter ABC or
ABC*. The default floating wildcard character is
'*'. If the floating wildcard character does not
exist at the end of the value, it is added
automatically (there must be room at the end of the
value to add a wildcard if none exists).
The fixed form of wildcard support allows any value
to exist in the search field at a single position.
You would enter A_C to select ABC, ADC, AEC, etc.
If the field is longer than 3 positions, the
floating wildcard is automatically added at the end
(if not specified) so you would select values such
as ABC, ABCDEF, ADC, and ADCX. '_' is the default
fixed wildcard character.
When *WC is used, if the length of the compare value
is equal to the length of the field in the data
base, at least one wild card character (either fixed
or floating) must exist within the compare value.
Multiple fixed wildcards can be specified such as
A_D_F. Multiple floating wildcards can be specified
such as A*A meaning that any number of values may
exist between the A's. You would select AA, ABA,
ABBBA, etc.
You may specify your own characters for the fixed
and floating wildcards. See the FLTWLDCRD and
FIXWLDCRD parameters.
For additional examples and explanation, see the
section on 'Wildcard processing' described for the
SORTDBF command.
3) Compare value. The compare value (literal) to be
used to select data. Up to 32 characters may be
entered. The length of data may not exceed the
length of the field length in the data base. See
the Translate option and *WC operator for special
handling.
For decimal data types, enter a value left adjusted.
For example, to select on a 5 digit field with a
value of 10, enter 10. If the field to select on
has decimal positions, you can select using a whole
number (such as 10) or a number with decimal
positions (such as 10.5). The decimal notation (US
= '.') should be entered in job decimal format.
A compare value must be entered (a blank value is
invalid). The special values *BLANK or *BLANKS must
be used for character fields to specify a 'blank
value'.
The special values *ZERO or *ZEROS may be used (or
enter a 0) for decimal fields to specify a 'zero
value'.
Another field name in the same data base file cannot
be used as a compare value.
4) Translate. Whether to translate the values in
the data base to upper case before making a
comparison. The default is *XLATE. Translate is
ignored for Decimal fields. If *XLATE is used, any
character fields in the data base are translated
using a translate table (see the TRNTBL parameter).
For example, if you enter a compare value of 'ABC',
the data will match in the data base for ABC, abc,
Abc, AbC, etc. Note that the default is for the
situation where you want any 'ABC' value regardless
of the case (upper or lower) of the data in the data
base. Just enter any case (upper or lower) into the
command prompt without surrounding it with quotes
(the command prompter will fold the value to upper
case).
Note that translation operates on the data in the
data base and not on the compare value.
If you enter a compare value with surrounding quotes
such as 'Abc' and take the default for translation,
you will not find any values in the data base. All
the data base values have been translated to upper
case by default before the comparison is made.
If you only want the 'Abc' values and not values
like 'ABC' or 'AbC', enter a value of 'Abc' (use
quotes surrounding the compare value) and specify
*NOXLATE for translation.
There is some performance degradation by using
translation. In most applications this is of minor
concern. If your data base data is all the same
case, you can save some overhead by specifying
*NOXLATE.
5) And/Or. The relationship of this select field
with the next select field. The entry allows you to
have one or more fields that form a group. All
fields in the group must match the selection
criteria to select the record.
*AND is the default meaning the next selection is
considered part of the same group. The value is
ignored for the last select field.
*OR may be specified to start a new group of
selections. For example, you may specify FLD1 and
FLD2 as one group and FLD3 and FLD4 as a second
group. All And/Or values would specify *AND except
for FLD2 which would specify *OR.
If you want either FLDA or FLDB equal to certain
values as long as FLDX is greater than some value,
you must specify two groups (FLDA and FLDX is one
group, FLDB and FLDX is a second group).
For more details, see the section on 'And/Or
Relationships' described for the SORTDBF tool.
OUTPUT How to output the results. * is the default which
displays the spooled file if the command is run
interactively. The spooled file is retained. If
the command is run in batch, the spooled file is
output.
*PRINT may be specified to just output the spooled
file. The spooled file is not automatically
displayed.
MBR The member to be processed. The default is *FIRST
for the first member of the file.
RTNOUTFILE Whether to retain the TAATMPQRY outfile created
internally into QTEMP. The default is *DFT meaning
the file is not retained unless FILE(*QTEMP) was
specified. If you are doing a single ANZFLD and do
not need the outfile, take the default.
*YES may be specified to retain the outfile. The
purpose of retaining the outfile is to allow the use
of multiple ANZFLD commands that reuse the file
output by the TAA SORTDBF command. This provides a
faster solution in that no selection or sorting is
done.
To reuse the TAATMPQRY file in subsequent ANZFLD
commands, specify FILE(*QTEMP) and use the same name
for the ANZFLD that was used to create the outfile.
You may specify a different ANZTYPE value, different
ranges, percentiles, summary fields, etc.
When reusing TAATMPQRY (by specifying FILE(*QTEMP),
the SELFLD parameter and other parameters such as
TRNTBL, and SORTTBL which relate to SORTDBF are
ignored. The RTNOUTFILE parameter is also ignored.
TRNTBL The translate table to be used for field
translation. *DFT is the default meaning to use the
translate table defined in the TAATRNTBL data area
in TAASECURE. The shipped default is QSYSTRNTBL
(the system supplied translate table for translating
lower case US English letters to upper case).
The data area may be changed with the command
EDTCONARR DTAARA(TAASECURE/TAATRNTBL). For more
information about this function, see the TAA Tool
RTVTRNTBL.
FLTWLDCRD The Floating Wildcard character to be used. The
default is '*'. A blank is not valid. For more
information, see the discussion with the SELFLD
parameter and/or the SORTDBF command.
FIXWLDCRD The Fixed Wildcard character to be used. The
default is '_'. A blank is not valid. For more
information, see the discussion with the SELFLD
parameter and/or the SORTDBF command.
SORTTBL The qualified name of the sort table to use. The
default is *JOB meaning to use the sort sequence of
the job.
*LANGIDUNQ may be entered to mean the unique-weight
sort sequence table that is associated with the
LANGID parameter.
*LANGIDSHR may be entered to mean the shared-weight
sort sequence table that is associated with the
LANGID parameter.
CCSID The sort sequence CCSID to be used along with the
LANGID value for retrieving the national language
sort sequence table for sorting character data. The
default is *JOB meaning to use the CCSID of the job.
A specific CCSID in the range of 1 to 65535 may be
entered.
LANGID The language ID to be used to obtain a national
language sort sequence table for sorting character
data. The default is *JOB meaning to use the
language ID of the job.
OPNQRYFCMD A *YES/*NO parameter for whether the SORTDBF command
that is generated and the OPNQRYF command used by
SORTDBF should be sent as messages.
*NO is the default.
*YES may be specified to assist in understanding how
SORTDBF operates or for problem determination.
Restrictions
------------
The maximum record length supported is 9999.
The maximum printer width supported is 378 bytes.
The maximum number of fields that may exist in a file to be analyzed
is 999.
Character fields to be analyzed or summarized cannot exceed 32
characters.
Decimal fields to be analyzed or summarized cannot exceed 15 digits
nor 9 decimal positions.
When a field is summed, 3 extra digits are added to the value to a
maximum of 15 digits.
Prerequisites
-------------
The following TAA Tools must be on your system:
CHKOBJ2 Check object 2
CHKOBJ3 Check object 3
EDTVAR Edit variable
EDTVAR2 Edit variable 2
EXTLST Extract list
HLRMVMSG HLL Remove message
RTVDBFA Retrieve data base file attributes
RTVFLDA Retrieve field attributes
RTVSYSVAL3 Retrieve system value 3
RTVTRNTBL Retrieve translate table
RTVVALA Retrieve value attributes
SCNVAR Scan variable
SNDCOMPMSG Send completion message
SNDESCMSG Send escape message
SNDSTSMSG Send status message
Implementation
--------------
None, the tool is ready to use.
Objects used by the tool
------------------------
Object Type Attribute Src member Src file
------ ---- --------- ---------- ----------
ANZFLD *CMD TAAQRYD QATTCMD
TAAQRYDC *PGM CLP TAAQRYDC QATTCL
TAAQRYDR *PGM RPG TAAQRYDR QATTRPG
|