NVIDIA GPU Monitoring
Glouton supports monitoring NVIDIA GPU devices with NVIDIA SMI (System Management Interface). SMI is supported by NVIDIA's Tesla, Quadro and GRID devices from the Fermi and higher architecture families. Very limited information is also provided for GeForce devices.
Configuration
To enable NVIDIA SMI, make sure you have the NVIDIA driver installed.
Make sure the user running Glouton is able to run the following command:
nvidia-smi -q -x
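For example, assuming Glouton runs under a dedicated glouton user (adjust the user name to your installation), you can verify this with:
sudo -u glouton nvidia-smi -q -x
The command should print an XML report of the GPU state without any permission error.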
Then you need to add the following to your Bleemeo agent configuration (/etc/glouton/conf.d/60-nvidia.conf):
nvidia_smi:
  enable: true
  bin_path: "/path/to/nvidia-smi"
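After saving the file, restart Glouton so the new configuration is taken into account. On a typical package installation managed by systemd, this would be something like:
sudo systemctl restart glouton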
If Glouton is running in a container, see Docker configuration.
Metrics
Glouton retrieves the following metrics:
| Metric name | Description |
|---|---|
| nvidia_smi_fan_speed | GPU fan speed in percent |
| nvidia_smi_memory_free | GPU memory free in bytes |
| nvidia_smi_memory_used | GPU memory used in bytes |
| nvidia_smi_memory_total | GPU memory total in bytes |
| nvidia_smi_power_draw | GPU power draw in watts |
| nvidia_smi_temperature_gpu | GPU temperature in °C |
| nvidia_smi_utilization_gpu | GPU utilization in percent |
| nvidia_smi_utilization_memory | GPU memory utilization in percent |
| nvidia_smi_utilization_encoder | GPU encoder utilization in percent |
| nvidia_smi_utilization_decoder | GPU decoder utilization in percent |
| nvidia_smi_clocks_current_graphics | Current frequency of the graphics (shader) clock, in Hz |
| nvidia_smi_clocks_current_sm | Current frequency of the Streaming Multiprocessor clock, in Hz |
| nvidia_smi_clocks_current_memory | Current frequency of the memory clock, in Hz |
| nvidia_smi_clocks_current_video | Current frequency of the video (encoder plus decoder) clocks, in Hz |
| nvidia_smi_fbc_stats_session_count | Count of active Frame Buffer Capture sessions |
| nvidia_smi_fbc_stats_average_fps | Frame Buffer Capture average FPS |
| nvidia_smi_fbc_stats_average_latency | Frame Buffer Capture average latency in seconds |
| nvidia_smi_pcie_link_gen_current | Current PCI-E link generation |
| nvidia_smi_pcie_link_width_current | Current PCI-E link width |
| nvidia_smi_encoder_stats_session_count | Count of encoder sessions |
| nvidia_smi_encoder_stats_average_fps | Average frames encoded per second |
| nvidia_smi_encoder_stats_average_latency | Encoder average latency in seconds |
Each metric has the following labels:
- name (the type of GPU, e.g. GeForce GTX 1070 Ti)
- compute_mode (the compute mode of the GPU, e.g. Default)
- index (the index where the GPU is connected to the motherboard, e.g. 1)
- uuid (a unique identifier for the GPU, e.g. GPU-f9ba66fc-a7f5-94c5-da19-019ef2f9c665)
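As an illustration, a GPU utilization metric carrying these labels could look like the following, shown here in Prometheus-style notation with an arbitrary sample value:
nvidia_smi_utilization_gpu{name="GeForce GTX 1070 Ti",compute_mode="Default",index="0",uuid="GPU-f9ba66fc-a7f5-94c5-da19-019ef2f9c665"} 42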
Docker
To enable NVIDIA SMI monitoring in Docker, you will need to pass through the /dev/nvidia* devices, the nvidia-smi binary and the NVIDIA libraries. If you use Glouton in a docker-compose setup where the host is mounted at /hostroot, you should add the NVIDIA runtime and devices, and load the NVIDIA shared libraries:
glouton:
  image: bleemeo/bleemeo-agent
  runtime: nvidia
  devices:
    - /dev/nvidiactl:/dev/nvidiactl
    - /dev/nvidia0:/dev/nvidia0
  environment:
    - GLOUTON_NVIDIA_SMI_ENABLE=true
    - GLOUTON_NVIDIA_SMI_BIN_PATH=/hostroot/usr/bin/nvidia-smi
    - LD_PRELOAD=/hostroot/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so
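If you do not use docker-compose, the same settings can be passed directly to docker run. A minimal sketch, assuming the host root is mounted read-only at /hostroot, in addition to any volumes and options you already use to run Glouton:
docker run -d --runtime=nvidia \
  --device /dev/nvidiactl:/dev/nvidiactl \
  --device /dev/nvidia0:/dev/nvidia0 \
  -v /:/hostroot:ro \
  -e GLOUTON_NVIDIA_SMI_ENABLE=true \
  -e GLOUTON_NVIDIA_SMI_BIN_PATH=/hostroot/usr/bin/nvidia-smi \
  -e LD_PRELOAD=/hostroot/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so \
  bleemeo/bleemeo-agent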