Performance Questions and a little need for some education.
Bob Deskin
Bob.Deskin at ca.ibm.com
Tue Aug 18 13:51:30 CDT 2009
Guy's comments reminded me that you can use CHOOSE with an index name
(CHOOSE VIAINDEX) to retrieve data via that index name. You may be able to
get rid of a sort (use SORTED) or at least make a sort more efficient
because the data will already be partly in the required order.
What follows are some tips from the Advanced PowerHouse Topics Seminar.
They should still be useful.
Bob
ACCESS
• The ACCESS statement is intended to provide a data structure that can be
used to report or update. The actual retrieval sequence should be
optimized for that purpose.
• ACCESS statements should be coded keeping data structure and content in
mind, in order to achieve significant I/O reductions.
Example
> ACCESS fileA LINK TO fileB LINK TO fileC
> SELECT IF conditions
If the conditions that apply to fileC are seldom satisfied, but the
conditions that apply to fileB are frequently satisfied, then a
significant I/O reduction is achieved by coding:
> ACCESS fileA LINK TO fileC LINK TO fileB
• Similarly, specify required files early in the ACCESS list, and OPTIONAL
files later.
• Take advantage of records with unique keys. QUIZ/QTP has "smart" file
retrieval where these values are concerned, and only rereads them when the
key value changes.
• Sorting can be avoided by linking via a key which would have been the
sort-key. This is especially true in QTP. SORTED can be used instead of
SORT when it is important to have all transactions with a specific key
value together, but the sequence of key values is unimportant.
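
The effect of link order on I/O can be sketched with a toy read counter.
This is not PowerHouse code and not a description of QUIZ internals. It
assumes each fileA record links to exactly one record in each linked file,
and that a record complex is abandoned as soon as a linked record fails its
condition. The function name and the pass rates (90% for fileB, 5% for
fileC) are hypothetical.

```python
# Hypothetical pass rates, in percent, for the condition on each linked file.
PASS_PERCENT = {"fileB": 90, "fileC": 5}


def expected_reads(link_order, pass_percent, n_primary=10_000):
    """Count record reads for a primary file linked to files in link_order.

    Toy model (an assumption): one linked record per primary record, and no
    further links are followed once a linked record fails its condition.
    """
    reads = n_primary          # every primary (fileA) record is read
    surviving = n_primary      # record complexes still being built
    for name in link_order:
        reads += surviving     # one linked read per surviving complex
        surviving = surviving * pass_percent[name] // 100
    return reads


b_then_c = expected_reads(["fileB", "fileC"], PASS_PERCENT)
c_then_b = expected_reads(["fileC", "fileB"], PASS_PERCENT)
print(b_then_c, c_then_b)
```

Under these assumptions, putting the rarely satisfied file first means far
fewer reads of the last file in the list.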
CHOOSE Statement
• The CHOOSE statement forces a keyed retrieval on the primary file in the
ACCESS statement. When no CHOOSE statement is specified, the primary file
is read sequentially. Employing CHOOSE can improve performance by
reducing the number of records read. Always use CHOOSE instead of SELECT,
if specific key item values are known.
Example
> ACCESS fileA ; contains 10,000 records
> SELECT IF key-item-of-A = "ok" ; true for only 100 records
will read 10,000 records, whereas the statement
> ACCESS fileA
> CHOOSE key-item-of-A "ok"
will read only 100 - a 100-fold improvement.
• CHOOSE can be used without a key value for KSAM files to avoid sorting,
since the read is in key sequence.
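
The 10,000 versus 100 comparison above can be reproduced with a small
Python model. It is only an analogy: a dictionary stands in for the key
file, and the function names are hypothetical.

```python
from collections import defaultdict

# 10,000 records, of which exactly 100 carry the key value "ok".
records = [{"id": i, "key": "ok" if i % 100 == 0 else "other"}
           for i in range(10_000)]


def sequential_select(records, wanted):
    """Model of SELECT IF: read every record, keep those that match."""
    reads, hits = 0, []
    for rec in records:
        reads += 1
        if rec["key"] == wanted:
            hits.append(rec)
    return hits, reads


def build_index(records):
    """Stand-in for the key file: key value -> positions of the records."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec["key"]].append(pos)
    return index


def keyed_choose(records, index, wanted):
    """Model of CHOOSE: read only the records the index points at."""
    reads, hits = 0, []
    for pos in index.get(wanted, []):
        reads += 1
        hits.append(records[pos])
    return hits, reads


select_hits, select_reads = sequential_select(records, "ok")
choose_hits, choose_reads = keyed_choose(records, build_index(records), "ok")
print(select_reads, choose_reads)
```

Both paths return the same 100 records; only the number of data records
read differs.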
SELECT Statement
• SELECT file IF is more efficient than SELECT IF, due to the timing of
the condition evaluation. Use SELECT file IF when the condition is based
on one file.
• The one exception to this is when a file in the SELECT file IF is
retrieved via a unique key. If the condition fails, the buffer is
initialized and the record must be reread for the next complex, even if
the key value is the same (the "smart" retrieval cannot be done). In this
case, use SELECT IF.
DEFINE Statement
• DEFINE statements in QUIZ are evaluated at most once per record complex,
as soon as the required data is read. A DEFINE is re-evaluated only when
the records it depends on are reread. Evaluation can be
delayed by conditioning on a file at the end of the ACCESS list, using
RECORD file EXISTS. This is useful when the DEFINE is based on items in a
file that occur early in the ACCESS list, but a large percentage of record
complexes will be rejected.
• DEFINE statements in QTP are evaluated when the name is referenced.
Combine expressions whenever possible to avoid extra evaluations.
Example
> DEFINE A = 1
> DEFINE B = A + A
> DEFINE C = B + B
When C is referenced, A must be evaluated four times. Note that in
QUIZ, each DEFINE is evaluated at most once per record complex.
• If the condition and result are constants, the CASE option can be used
and is more efficient than IF/ELSE. With either option, sequence the
conditions such that the most likely occurs first.
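
The evaluation counts in the A, B, C example can be checked with a small
Python model. The class below is hypothetical: with cache=False it
re-evaluates a define on every reference (the QTP behaviour described
above), and with cache=True it evaluates each define at most once (the
QUIZ behaviour per record complex).

```python
class Defines:
    """Toy model of DEFINE A = 1, B = A + A, C = B + B."""

    def __init__(self, cache):
        self.cache = cache
        self.values = {}                          # cached results, if any
        self.evaluations = {"A": 0, "B": 0, "C": 0}

    def _evaluate(self, name, compute):
        if self.cache and name in self.values:
            return self.values[name]              # reuse, no re-evaluation
        self.evaluations[name] += 1
        value = compute()
        if self.cache:
            self.values[name] = value
        return value

    def a(self):
        return self._evaluate("A", lambda: 1)

    def b(self):
        return self._evaluate("B", lambda: self.a() + self.a())

    def c(self):
        return self._evaluate("C", lambda: self.b() + self.b())


on_reference = Defines(cache=False)
once_only = Defines(cache=True)
print(on_reference.c(), on_reference.evaluations)
print(once_only.c(), once_only.evaluations)
```

Referencing C once gives four evaluations of A in the first model and one
in the second, which is why combining expressions matters more in QTP.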
SORT vs SORTED
• Overuse of the SORT statement is a common error.
Example
> ACCESS fileA LINK TO fileB
> SORT ON sort-key-1
> REPORT SUMMARY sort-key-1 sort-key-2 other-item
> SET SUBFILE
> GO
> ACCESS *QUIZWORK LINK TO fileC
> SORT ON sort-key-1
In this case, the sort in the second pass should read SORTED because the
subfile is already in the correct sequence.
• SORTED, instead of SORT, is especially useful in QTP when the sort-key
is also the key used for linkage and retrieval. Because all of the
records for one key value are retrieved together, (and if no other file
items must be sorted), there is no need to sort. It is the grouping by
key value that is important, not the sequence of groups. Note that SORTED
does not check to ensure that records are in true sequence.
• For indexed files only, a sorted read on the key can be forced using the
CHOOSE statement with no key values. In this case, the KSAM key file
structure eliminates the need for a SORT.
Example
> ACCESS indexed-file
> CHOOSE VIAINDEX indexed-key
> SORTED ON indexed-key-segment
• Because the limiting factor in sorting is generally physical I/Os,
efficiency can be improved by presorting a smaller record. This helps
most where fileB is a big record (many bytes per record).
Example
> REQUEST ONE
> ACCESS fileA
> SORT ON segment
> SUBFILE SORTKEY INCLUDE segment
> REQUEST TWO
> ACCESS *SORTKEY LINK segment TO segment OF fileA &
> LINK segment OF fileA TO segment OF fileB
> SORTED ON segment
> ---> other statements <---
The above example is more efficient than
> ACCESS fileA LINK TO fileB
> SORT ON segment
> ---> other statements <---
because the sort is performed on a smaller record so that a single
physical I/O transfers many more logical records.
• This technique is most useful in QTP, which sorts the entire
transaction, but is also useful in QUIZ if a large record complex is to be
sorted. This technique not only improves speed, but also requires less
disk space for sorting.
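
The benefit of sorting a smaller record is simple block arithmetic. The
sketch below uses made-up figures (a 4096-byte block, a 16-byte sort-key
record, a 1024-byte record complex, 100,000 records). None of them come
from PowerHouse or any particular platform.

```python
import math


def blocks_needed(record_count, record_bytes, block_bytes=4096):
    """Physical blocks needed to hold record_count fixed-size records."""
    if record_bytes > block_bytes:
        raise ValueError("record does not fit in one block")
    per_block = block_bytes // record_bytes   # logical records per block
    return math.ceil(record_count / per_block)


key_only_blocks = blocks_needed(100_000, 16)      # sort-key subfile
full_record_blocks = blocks_needed(100_000, 1024)  # whole record complex
print(key_only_blocks, full_record_blocks)
```

With these assumed sizes, one pass over the sort input touches 391 blocks
for the key-only subfile against 25,000 for the full record complex.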
• The previous technique works if the subfile can include the key item
that links to the other files. Sometimes the sort-keys are not
the key items, and the key items cannot be conveniently added to the
subfile. This may be the case in a complex linkage where the subfile
construction requires the complete ACCESS statement. In effect, all
linkages and record retrieval would be done twice.
• If disk space is a concern, and the transaction is large, a three pass
technique can be used. The linkage and record retrieval is done in the
first pass, which creates two subfiles - one with the sort-keys and a
counter, the other with the transaction. Any selection should be done in
the first pass. The sort-key subfile is sorted and then linked to the
second subfile by record number.
Example
> REQUEST ONE
> ACCESS file1 LINK TO file2…
> TEMPORARY RECORD-COUNT INTEGER SIZE 4
> ITEM RECORD-COUNT COUNT
> SUBFILE SORTKEY INCLUDE RECORD-COUNT, sort-key1, sort-key2…
> SUBFILE TRANS INCLUDE file1, file2…
> REQUEST TWO
> ACCESS *SORTKEY
> SORT ON sort-key1, sort-key2…
> SUBFILE SORTKEYS INCLUDE SORTKEY
> REQUEST THREE
> ACCESS *SORTKEYS &
> LINK TO RECORD (RECORD-COUNT - 1) OF *TRANS
> SORTED ON sort-key1, sort-key2…
> ---> other statements <---
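
The same three-pass idea can be sketched in Python, with lists standing in
for the two subfiles. The function name and sample data are hypothetical.
The 1-based counter and the (count - 1) lookup mirror RECORD-COUNT and
LINK TO RECORD (RECORD-COUNT - 1) in the example above.

```python
def three_pass_sort(transactions, sort_key):
    """Sort large transactions by sorting only (key, counter) pairs."""
    # Pass 1: build a small sort-key subfile with a 1-based record counter,
    # and a transaction subfile holding the full records in arrival order.
    sortkey_subfile = [(sort_key(t), count)
                       for count, t in enumerate(transactions, start=1)]
    trans_subfile = list(transactions)

    # Pass 2: sort the small records only. Ties fall back to the counter,
    # so records with equal keys keep their arrival order.
    sorted_keys = sorted(sortkey_subfile)

    # Pass 3: fetch each full transaction directly by record number.
    return [trans_subfile[count - 1] for _key, count in sorted_keys]


sample = [
    {"region": "west", "amount": 30, "notes": "x" * 50},
    {"region": "east", "amount": 10, "notes": "y" * 50},
    {"region": "west", "amount": 20, "notes": "z" * 50},
]
ordered = three_pass_sort(sample, lambda t: t["region"])
print([t["amount"] for t in ordered])
```

Only the short (key, counter) pairs go through the sort; the large records
are read once, by position, in the final pass.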
QTP Techniques
• Use a TEMPORARY with ITEM statements instead of a DEFINE. The DEFINE is
evaluated when referenced. The request can be constructed to evaluate the
TEMPORARY once only. This is useful if the item value does not change
during the request, and therefore should not be re-evaluated. Constant
values include items based only on execution-time parameters. If a value
does not change for an entire run, use a GLOBAL TEMPORARY item.
• Update at control-breaks whenever possible.
• Make changes to record items conditional for files to be updated, if the
record may not actually change. QTP updates only if item values change.
• Evaluate the structure of large runs and requests (and reports in QUIZ).
It may be possible to combine passes that access the same file, thereby
reducing I/O. If this is done, ensure that protective mechanisms are not
being bypassed, as when using one pass to update master files and the next
pass to delete the transactions already used. Alternatively, it may be
more efficient to split a large request into two, using subfiles to pass
data.