At the lowest level, FMS implements highly optimized matrix kernel operations designed to achieve peak performance on each specific machine. The block size used by these operations is often dictated by the machine architecture (number of floating-point registers, cache size, etc.). These optimized kernels are therefore written for specific values of NEQBLK that make the fullest use of these low-level machine resources.
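To make the role of the block size concrete, the following is a minimal sketch of a blocked matrix multiply in which every update works on one block at a time. It is an illustration only, not FMS code: the function name blocked_matmul and its neqblk argument are hypothetical, and a real kernel would be hand-tuned for registers and cache rather than written as plain loops.

```python
def blocked_matmul(a, b, neqblk):
    """Multiply square matrices a and b (lists of lists) block by block.

    Illustrative only: neqblk plays the role of the block size, and the
    three innermost loops stand in for one machine-specific kernel call.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, neqblk):
        for j0 in range(0, n, neqblk):
            for k0 in range(0, n, neqblk):
                # One "kernel" call: update a block of c of at most
                # neqblk-by-neqblk entries from one block of a and one of b.
                for i in range(i0, min(i0 + neqblk, n)):
                    for j in range(j0, min(j0 + neqblk, n)):
                        s = c[i][j]
                        for k in range(k0, min(k0 + neqblk, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c
```

The product is the same for any block size; only the order of the work, and therefore the reuse of data held in registers and cache, changes.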
The matrix kernels are designed to be run on a single processor. They become the atomic building blocks for all FMS operations. FMS allocates these kernel operations to the processors to obtain the parallel solution.
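The idea of handing single-processor kernel operations to several processors can be sketched as follows. This is a hedged illustration, not a description of how FMS schedules work: block_kernel and parallel_matmul are hypothetical names, a thread pool stands in for the processors, and each task computes one independent output block.

```python
from concurrent.futures import ThreadPoolExecutor


def block_kernel(a, b, i0, j0, neqblk):
    """Single-processor task: compute one output block of the product a*b."""
    n = len(a)
    rows = range(i0, min(i0 + neqblk, n))
    cols = range(j0, min(j0 + neqblk, n))
    block = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in cols]
             for i in rows]
    return i0, j0, block


def parallel_matmul(a, b, neqblk, workers=2):
    """Hand independent block tasks to a pool of workers (assumed scheme)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    tasks = [(i0, j0)
             for i0 in range(0, n, neqblk)
             for j0 in range(0, n, neqblk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(block_kernel, a, b, i0, j0, neqblk)
                   for i0, j0 in tasks]
        for fut in futures:
            i0, j0, block = fut.result()
            # Copy the finished block into its place in the result.
            for di, row in enumerate(block):
                c[i0 + di][j0:j0 + len(row)] = row
    return c
```

Because each task writes a different block of the result, the tasks need no coordination beyond collecting their outputs. In standard CPython the threads here illustrate the task structure rather than deliver a real speedup.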
FMS may have more than one optimized kernel available. For example, it is typical to have a different kernel and value of NEQBLK for real and complex data.
You may change the value of NEQBLK, but be prepared for dramatic changes in performance. For example, setting NEQBLK = 1 reduces all processing to Level 1 BLAS performance.
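A rough way to see why the block size matters so much is to compare arithmetic with memory traffic for one block update C = C + A*B. The sketch below uses a deliberately simple model that is an assumption of this example, not an FMS formula: 2*NEQBLK^3 floating-point operations against 4*NEQBLK^2 words moved (read A, B and C, write C). The function name flops_per_word is hypothetical.

```python
def flops_per_word(neqblk):
    """Estimate floating-point operations per word of memory traffic
    for one block update C = C + A*B (simplified, assumed model)."""
    flops = 2 * neqblk ** 3   # one multiply and one add per inner step
    words = 4 * neqblk ** 2   # read blocks A, B, C and write block C
    return flops / words


print(flops_per_word(1))    # 0.5
print(flops_per_word(64))   # 32.0
```

Under this model the work done per word moved grows in proportion to the block size. With a block size of 1 the kernel does about as much memory traffic as arithmetic, which is the regime of vector (Level 1 BLAS) operations.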