IJSTEP = 1. Data in the lower triangle [AL] is stored by rows. Data in the upper triangle [AU] is stored by columns.

IJSTEP = NEQBLK. Data in the lower triangle [AL] is first stored in columns NEQBLK equations tall. These columns are then stored in a direction proceeding toward the diagonal. Data in the upper triangle [AU] is first stored in rows NEQBLK equations wide. These rows are then stored in a direction proceeding toward the diagonal.

IJSTEP = NEQBIO (BLOCK and SLAB matrices only). Data in the lower triangle [AL] is stored in the blocks by columns (not transposed). Data in the upper triangle [AU] is stored in the blocks by rows (transposed).

This option is provided in FMS to accommodate different machine architectures. In all cases, fetching data from memory sequentially (incremental addressing) is desirable. The three algorithms that follow are listed in the same order as the storage options above (IJSTEP = 1, NEQBLK, NEQBIO). Each is matched to its storage option so that data is addressed incrementally.
      DO J = 1,N
         DO I = 1,N
            S = 0
C           Loop across row I of [AL], down column J of [AU]:
            DO K = 1,N
               S = S + AL(I,K)*AU(K,J)
            END DO
            C(I,J) = C(I,J) + S
         END DO
      END DO
On most machines, dot products perform well because the inner loop has only two memory loads and no memory store. However, some machines with vector hardware do not implement the accumulation of data into the register S.
      DO J = 1,N,NEQBLK
         DO I = 1,N,NEQBLK
            S11 = 0
            S21 = 0
            S12 = 0
            S22 = 0
            DO K = 1,N
C              Fetch next NEQBLK terms from [AL] and [AU]:
               S11 = S11 + AL(I  ,K)*AU(K,J  )
               S21 = S21 + AL(I+1,K)*AU(K,J  )
               S12 = S12 + AL(I  ,K)*AU(K,J+1)
               S22 = S22 + AL(I+1,K)*AU(K,J+1)
            END DO
            C(I  ,J  ) = C(I  ,J  ) + S11
            C(I+1,J  ) = C(I+1,J  ) + S21
            C(I  ,J+1) = C(I  ,J+1) + S12
            C(I+1,J+1) = C(I+1,J+1) + S22
         END DO
      END DO
Note that all data is fetched from [AL] and [AU] incrementally. In addition, the inner loop has only 4 memory loads for 4 multiply and 4 add operations, which uses only half the memory bandwidth of the dot product. Increasing NEQBLK further reduces the memory bandwidth required. This algorithm is preferred for RISC processors, where NEQBLK is picked as large as possible to use all the floating-point registers.
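The bandwidth claim can be checked by counting operations in one pass of the inner K loop. The following is a sketch of that count, not a formula taken from FMS. The symbol b is introduced here as shorthand for NEQBLK, and the listing above is assumed to be the b = 2 case written out in full. Each pass loads b terms from [AL] and b terms from [AU], and performs b*b multiply-add pairs:

```latex
\[
\frac{\text{memory loads}}{\text{multiply-add pairs}}
  = \frac{2b}{b^{2}} = \frac{2}{b},
\qquad b = \texttt{NEQBLK}
\]
\[
b = 1:\ 2 \ \text{(dot product)} \qquad
b = 2:\ 1 \qquad
b = 4:\ \tfrac{1}{2}
\]
```

Under the same counting, the inner loop keeps b*b accumulators (S11 through S22 when b = 2) plus the 2b loaded terms in registers, which is presumably why the number of floating-point registers limits how large NEQBLK can be.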
      DO K = 1,N
         DO J = 1,N
C           Loop across row K of [AU]:
            S = AU(K,J)
C           Loop down column J of [C] and column K of [AL]:
            DO I = 1,N
               C(I,J) = C(I,J) + AL(I,K)*S
            END DO
         END DO
      END DO
This algorithm avoids the accumulation of the dot product and is optimal for some vector machines.
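The three loop orderings above differ only in the order in which terms are fetched and accumulated, so they should produce the same result [C]. The following free-form Fortran program is a minimal sketch of that check. It is not part of FMS: the program name, the array names C1, C2, C3, the size N = 4, and the test data are all hypothetical, and NEQBLK is fixed at 2 to match the blocked listing. The arrays are ordinary dense Fortran arrays, so the sketch verifies the arithmetic only and says nothing about how FMS lays out [AL] and [AU] in memory.

```fortran
program ijstep_orderings
  implicit none
  ! Hypothetical sizes for illustration; n must be a multiple of neqblk.
  integer, parameter :: n = 4, neqblk = 2
  double precision :: al(n,n), au(n,n)
  double precision :: c1(n,n), c2(n,n), c3(n,n)
  double precision :: s, s11, s21, s12, s22
  integer :: i, j, k

  ! Small integer-valued test data, so every sum is exact in any order.
  do j = 1, n
     do i = 1, n
        al(i,j) = dble(i + 2*j)
        au(i,j) = dble(3*i - j)
     end do
  end do
  c1 = 0.0d0
  c2 = 0.0d0
  c3 = 0.0d0

  ! Ordering 1: dot product of row i of AL with column j of AU.
  do j = 1, n
     do i = 1, n
        s = 0.0d0
        do k = 1, n
           s = s + al(i,k)*au(k,j)
        end do
        c1(i,j) = c1(i,j) + s
     end do
  end do

  ! Ordering 2: four dot products at once (neqblk = 2).
  do j = 1, n, neqblk
     do i = 1, n, neqblk
        s11 = 0.0d0
        s21 = 0.0d0
        s12 = 0.0d0
        s22 = 0.0d0
        do k = 1, n
           s11 = s11 + al(i  ,k)*au(k,j  )
           s21 = s21 + al(i+1,k)*au(k,j  )
           s12 = s12 + al(i  ,k)*au(k,j+1)
           s22 = s22 + al(i+1,k)*au(k,j+1)
        end do
        c2(i  ,j  ) = c2(i  ,j  ) + s11
        c2(i+1,j  ) = c2(i+1,j  ) + s21
        c2(i  ,j+1) = c2(i  ,j+1) + s12
        c2(i+1,j+1) = c2(i+1,j+1) + s22
     end do
  end do

  ! Ordering 3: scale column k of AL by AU(k,j) and add into column j of C.
  do k = 1, n
     do j = 1, n
        s = au(k,j)
        do i = 1, n
           c3(i,j) = c3(i,j) + al(i,k)*s
        end do
     end do
  end do

  ! Exact comparison is safe here only because the test data is integer-valued.
  if (any(c1 /= c2) .or. any(c1 /= c3)) then
     print *, 'orderings disagree'
     stop 1
  end if
  print *, 'all three orderings agree'
end program ijstep_orderings
```

With general floating-point data the three results can differ in the last bits because the additions are performed in a different order, so a tolerance-based comparison would be needed instead of the exact test used here.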
If you attempt to set IJSTEP to a value that is not permitted, FMS will correct it to the closest reasonable value.
The default value of IJSTEP is designed to work in conjunction with the optimized matrix kernels specified with NEQBLK. Changing the value of IJSTEP may significantly affect performance.
NOTE: If you are performing substructuring, values of IJSTEP=NEQBLK are not permitted. In some cases, it may be necessary to override the default values.