NVIDIA GPU Monitoring

Glouton supports monitoring NVIDIA GPU devices with NVIDIA SMI (System Management Interface). SMI is supported by NVIDIA's Tesla, Quadro and GRID devices from the Fermi architecture family onward. Only very limited information is available for GeForce devices.

Configuration

To enable NVIDIA SMI, first make sure the NVIDIA driver is installed.

Then check that the user running Glouton is able to run the following command:

nvidia-smi -q -x
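If Glouton runs as a dedicated system user (commonly named glouton, but adjust to your installation), a quick way to check is:

sudo -u glouton nvidia-smi -q -x | head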

Next, add the following to your Bleemeo agent configuration (/etc/glouton/conf.d/60-nvidia.conf):

nvidia_smi:
  enable: true
  bin_path: "/path/to/nvidia-smi"
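Glouton reads its configuration at startup, so restart the agent after creating the file. On a package-based installation running under systemd this is typically (the service name may differ on your system):

sudo systemctl restart glouton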

If Glouton is running in a container, see the Docker section below.

Metrics

Glouton retrieves the following metrics:

| Metric name | Description |
| --- | --- |
| nvidia_smi_fan_speed | GPU fan speed in percent |
| nvidia_smi_memory_free | GPU memory free in bytes |
| nvidia_smi_memory_used | GPU memory used in bytes |
| nvidia_smi_memory_total | GPU memory total in bytes |
| nvidia_smi_power_draw | GPU power draw in watts |
| nvidia_smi_temperature_gpu | GPU temperature in °C |
| nvidia_smi_utilization_gpu | GPU utilization in percent |
| nvidia_smi_utilization_memory | GPU memory utilization in percent |
| nvidia_smi_utilization_encoder | GPU encoder utilization in percent |
| nvidia_smi_utilization_decoder | GPU decoder utilization in percent |
| nvidia_smi_clocks_current_graphics | Current frequency of the graphics (shader) clock, in Hz |
| nvidia_smi_clocks_current_sm | Current frequency of the Streaming Multiprocessor clock, in Hz |
| nvidia_smi_clocks_current_memory | Current frequency of the memory clock, in Hz |
| nvidia_smi_clocks_current_video | Current frequency of the video (encoder plus decoder) clocks, in Hz |
| nvidia_smi_fbc_stats_session_count | Count of active Frame Buffer Capture sessions |
| nvidia_smi_fbc_stats_average_fps | Frame Buffer Capture average FPS |
| nvidia_smi_fbc_stats_average_latency | Frame Buffer Capture average latency in seconds |
| nvidia_smi_pcie_link_gen_current | Current PCI-E link generation |
| nvidia_smi_pcie_link_width_current | Current PCI-E link width |
| nvidia_smi_encoder_stats_session_count | Count of encoder sessions |
| nvidia_smi_encoder_stats_average_fps | Average frames encoded per second |
| nvidia_smi_encoder_stats_average_latency | Encoder average latency in seconds |
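These values are parsed from the XML that nvidia-smi -q -x prints. A heavily abridged example is shown below; the exact fields and their values depend on your driver version and GPU model:

<nvidia_smi_log>
  <gpu id="00000000:01:00.0">
    <product_name>GeForce GTX 1070 Ti</product_name>
    <fan_speed>31 %</fan_speed>
    <fb_memory_usage>
      <total>8192 MiB</total>
      <used>1024 MiB</used>
      <free>7168 MiB</free>
    </fb_memory_usage>
    <utilization>
      <gpu_util>35 %</gpu_util>
      <memory_util>12 %</memory_util>
    </utilization>
    <temperature>
      <gpu_temp>52 C</gpu_temp>
    </temperature>
    <power_readings>
      <power_draw>98.53 W</power_draw>
    </power_readings>
  </gpu>
</nvidia_smi_log>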

Each metric has the following labels:

  • name (the GPU model, e.g. GeForce GTX 1070 Ti)
  • compute_mode (the compute mode of the GPU, e.g. Default)
  • index (the index of the port the GPU is connected to on the motherboard, e.g. 1)
  • uuid (a unique identifier for the GPU, e.g. GPU-f9ba66fc-a7f5-94c5-da19-019ef2f9c665)
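As an illustration, a GPU utilization point with its labels could look like this in Prometheus exposition format (the values are made up):

nvidia_smi_utilization_gpu{name="GeForce GTX 1070 Ti",compute_mode="Default",index="0",uuid="GPU-f9ba66fc-a7f5-94c5-da19-019ef2f9c665"} 35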

Docker

To enable NVIDIA SMI monitoring in Docker, you need to pass through the /dev/nvidia* devices, the nvidia-smi binary, and the NVIDIA libraries. If you run Glouton with docker-compose and the host filesystem is mounted at /hostroot, add the nvidia runtime and devices, and load the NVIDIA shared libraries:

glouton:
  image: bleemeo/bleemeo-agent
  runtime: nvidia
  devices:
    - /dev/nvidiactl:/dev/nvidiactl
    - /dev/nvidia0:/dev/nvidia0
  environment:
    - GLOUTON_NVIDIA_SMI_ENABLE=true
    - GLOUTON_NVIDIA_SMI_BIN_PATH=/hostroot/usr/bin/nvidia-smi
    - LD_PRELOAD=/hostroot/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so
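Once the container is up, you can check that the binary is reachable from inside it (assuming the compose service is named glouton as above):

docker compose exec glouton /hostroot/usr/bin/nvidia-smi -q -x | head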