Version 7.1 has the following enhancements over Version 7.0:

Performance Enhancements to GPU Matrix Kernels

NVIDIA's CuBlas library now takes advantage of architectural enhancements available on the latest GPUs. On hardware that supports it, the new library delivers significantly better performance than the earlier libraries. These enhancements are now a standard part of this distribution.

Unfortunately, GPUs with compute capability 2.0 or older (Fermi architecture) do not contain the hardware to execute this new library. To remain compatible with all GPU models, FMSlib detects the compute capability of the current hardware at execution time. If it detects hardware that is incompatible with the latest software, FMSlib automatically switches to a compatible shared object version. These legacy shared object versions of the CuBlas library are provided as part of this distribution.

If you know you will never run on earlier hardware, you may delete these files. They are used only when earlier hardware is detected.
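The selection rule described above can be sketched in a few lines. This is only an illustration, not FMSlib's actual code: the names `ComputeCapability`, `MIN_CURRENT` and `select_cublas` are hypothetical, the Windows 64 file names are taken from these notes, and the 3.0 cutoff is an assumption based on the statement that Fermi-class hardware cannot run the new library.

```python
from typing import NamedTuple


class ComputeCapability(NamedTuple):
    """Compute capability of a GPU, e.g. (7, 0) for a Tesla V100."""
    major: int
    minor: int


# Assumption: anything below 3.0 (Fermi and older) needs the legacy library.
MIN_CURRENT = ComputeCapability(3, 0)

# Windows 64 file names as listed in these release notes.
CURRENT_LIBRARY = "cublas64_91.dll"
LEGACY_LIBRARY = "cublas64_80.dll"


def select_cublas(cc: ComputeCapability) -> str:
    """Return the CuBlas file name suited to the detected compute capability."""
    # NamedTuples compare element by element, so (2, 1) < (3, 0) < (7, 0).
    return CURRENT_LIBRARY if cc >= MIN_CURRENT else LEGACY_LIBRARY


if __name__ == "__main__":
    for cc in (ComputeCapability(7, 0), ComputeCapability(2, 0)):
        print(f"compute capability {cc.major}.{cc.minor}: {select_cublas(cc)}")
```

In a real application the compute capability would come from the GPU itself at start-up; here it is passed in so the rule can be read and tested without hardware.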

The following table lists the CuBlas libraries which are provided:

CuBlas used with FMSlib Version 7.1

Operating System      Linux 64                   Windows 64        Windows 32
Current CuBlas 9.1    Linked in                  cublas64_91.dll   N/A
Legacy CuBlas 8.0     Provided as libcublas.so   cublas64_80.dll   cublas32_65.dll

New Dashboard WEB Pages for GPU Properties

New WEB pages GPU-Fixed, GPU-Chg., GPU-Dyn. and GPU-RTL have been added to the Dashboard reports, which are available in MatrixWarrior and FMSlib. These pages summarize information about the GPU's hardware properties and operating environment.

Two layers of NVIDIA software are interrogated to obtain this information:

  1. NVML, the NVIDIA Management Library.
    This library is installed with the device driver, the lowest level of software managing the GPU. It provides the following types of information:
    • Static
      This includes properties of the GPU which do not change with time. Examples include model number and where it is installed on the PCI bus.
    • Changeable
      Settings which can be changed, either through the nvidia-smi utility or by an application. Examples include power and temperature operating limits.
    • Dynamic
      Performance information which is continuously changing while the GPU is operating. Examples include temperature, clock rate and power usage.
  2. Run Time Library
    The Run Time Library is layered on the device driver. It extracts some information from the device driver, as well as other settings which control the runtime environment.
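To make the three NVML categories concrete, the sketch below groups a handful of readings the same way. It assumes the `pynvml` Python bindings for NVML; the function name `read_gpu_snapshot` and the choice of which readings go in each group are illustrative and are not taken from FMSlib. The NVML module is passed in as a parameter so the function can be exercised without a GPU.

```python
def _text(value):
    """Older pynvml releases return bytes where newer ones return str."""
    return value.decode() if isinstance(value, bytes) else value


def read_gpu_snapshot(nvml, index):
    """Collect a few NVML readings, grouped as static, changeable and dynamic.

    `nvml` is the pynvml module (or any object exposing the same names),
    already initialised with nvmlInit().
    """
    handle = nvml.nvmlDeviceGetHandleByIndex(index)
    return {
        # Properties that do not change with time.
        "static": {
            "model": _text(nvml.nvmlDeviceGetName(handle)),
            "pci_bus_id": _text(nvml.nvmlDeviceGetPciInfo(handle).busId),
        },
        # Settings that nvidia-smi or an application may change.
        "changeable": {
            # NVML reports power in milliwatts.
            "power_limit_watts": nvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000.0,
        },
        # Readings that vary while the GPU is operating.
        "dynamic": {
            "temperature_c": nvml.nvmlDeviceGetTemperature(handle, nvml.NVML_TEMPERATURE_GPU),
            "sm_clock_mhz": nvml.nvmlDeviceGetClockInfo(handle, nvml.NVML_CLOCK_SM),
            "power_watts": nvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,
        },
    }
```

With `pynvml` installed and a driver present, a call would look like `pynvml.nvmlInit()` followed by `read_gpu_snapshot(pynvml, 0)` and finally `pynvml.nvmlShutdown()`.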

Temperature, Clock Rate and Power Consumption on the Performance Page

Component Performance (Gflops)

Routine            All    40 CPUs  GPU 1     GPU 2     GPU 3     GPU 4     GPU 5     GPU 6     GPU 7     GPU 8
                                   33 °C     34 °C     34 °C     32 °C     31 °C     33 °C     35 °C     31 °C
                                   1312 MHz  1312 MHz  1312 MHz  1312 MHz  1312 MHz  1312 MHz  1312 MHz  1312 MHz
                                   57 Watts  55 Watts  56 Watts  54 Watts  55 Watts  56 Watts  56 Watts  54 Watts
Matrix Multiply    51730  0        6502      6519      6530      6515      6493      6494      6504      6503
  CPU(0%)
Triangle Solve     48636  0        6173      6229      6221      6203      6100      6080      6161      6250
  CPU(0%)
Diagonal Factor    23893  9        3123      3111      3130      3122      3133      3129      3129      3124

GPU model = Tesla V100-SXM2-16GB

Depending on the GPU model, the Performance page now displays the current temperature, SM clock rate and power consumption for each GPU. This information may be used for monitoring and for adjusting temperature, clock or power limits. It is also useful for obtaining a snapshot of how well an algorithm divides its work among the GPUs.
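One simple way to read such a snapshot is to compare the per-GPU rates of a single routine. The sketch below uses the Matrix Multiply figures from the table above; the function name `balance_report` and the spread measure (highest minus lowest rate, as a percentage of the mean) are illustrative choices, not something the Dashboard computes.

```python
from statistics import mean


def balance_report(gflops_per_gpu):
    """Summarize how evenly a routine's work is spread across the GPUs."""
    avg = mean(gflops_per_gpu)
    low, high = min(gflops_per_gpu), max(gflops_per_gpu)
    return {
        "mean": avg,
        "min": low,
        "max": high,
        # Gap between fastest and slowest GPU, relative to the mean rate.
        "spread_percent": (high - low) / avg * 100.0,
    }


if __name__ == "__main__":
    # Matrix Multiply rates for GPU 1 through GPU 8, in Gflops.
    matrix_multiply = [6502, 6519, 6530, 6515, 6493, 6494, 6504, 6503]
    report = balance_report(matrix_multiply)
    print(f"mean {report['mean']:.1f} Gflops, spread {report['spread_percent']:.2f}%")
```

A small spread suggests the work is divided evenly; a large one may point to a GPU that is throttled or underused, which the temperature, clock and power readings can help confirm.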