There are two options for the inner loop when performing back substitution, the outer product form and the inner (dot) product form:
Outer product form:
      DO J = N,1,-1
         S = X(J)
         DO I = 1,(J-1)
            X(I) = X(I) - S*U(I,J)    ! Addresses [U] by columns
         END DO                       ! Loads X(I), U(I,J) and stores X(I)
      END DO
Inner product form:
      DO I = N,1,-1
         S = 0
         DO J = (I+1),N
            S = S + U(I,J)*X(J)       ! Addresses [U] by rows
         END DO                       ! Loads X(J), U(I,J) and no stores
         X(I) = X(I) - S
      END DO
The outer product form has the advantage that it addresses the matrix [U] by columns, which is how the data is stored. However, it requires 3 memory references (2 loads and 1 store) per cycle of the inner loop.
The inner product form has the advantage that it requires only 2 memory references and no store per cycle of the inner loop. However, the matrix [U] is addressed by rows, which is across the direction of storage.
To overcome the addressing difficulty of the inner (dot) product form, it is possible to first copy the ith row of [U] into a temporary vector. The algorithm then proceeds with both the advantage of the dot product and incremental (unit stride) addressing.
This strategy only pays off when several vectors are being processed (multiple right-hand sides), so that the cost of loading the temporary vector with the row of [U] is amortized over all of them.
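The temporary-vector strategy can be sketched as follows. This is only an illustration and not the code FMS actually uses: the module name BACK_SUBST_SKETCH, the routine name BACK_ROW_COPY, the temporary vector T and the argument NRHS are hypothetical. As in the loops above, it assumes there is no division by the diagonal of [U], and it assumes the solution vectors are stored as the columns of X.

```fortran
module back_subst_sketch
  implicit none
contains

  ! Inner product back substitution applied to NRHS right-hand sides
  ! stored as the columns of X. The diagonal of U is never referenced,
  ! matching the loops in the text.
  subroutine back_row_copy(n, nrhs, u, x)
    integer, intent(in) :: n, nrhs
    double precision, intent(in) :: u(n, n)
    double precision, intent(inout) :: x(n, nrhs)
    double precision :: t(n), s      ! T holds one row of U (hypothetical name)
    integer :: i, j, k

    do i = n, 1, -1
      ! Gather row I of U once. This is the only place where U is
      ! addressed across the direction of storage.
      do j = i + 1, n
        t(j) = u(i, j)
      end do
      ! Reuse the gathered row for every right-hand side. Both T and
      ! X(:,K) are traversed with unit stride, and the inner loop
      ! performs 2 loads and no store per cycle.
      do k = 1, nrhs
        s = 0.0d0
        do j = i + 1, n
          s = s + t(j) * x(j, k)
        end do
        x(i, k) = x(i, k) - s
      end do
    end do
  end subroutine back_row_copy

end module back_subst_sketch
```

The gather costs one strided pass over row I regardless of NRHS, while the unit stride dot products are repeated NRHS times. With a single solution vector the copy does as much strided work as it saves, which is why the approach needs multiple right-hand sides to be worthwhile.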
The options provided by the IPBACK parameter direct FMS how to make these choices. These options are:
This is a fine-tuning parameter whose best setting is problem and machine dependent. It is recommended that you use the default value unless you are processing a large number of solution vectors on a small memory machine.