Operating System Metrics

The Bleemeo agent automatically monitors your operating system metrics (check that your operating system is a supported OS).

The monitoring covers:

  • System resource utilization: CPU, memory, disk, etc.
  • S.M.A.R.T. metrics to monitor the health of your disks
  • Sensor temperatures
  • Notifications for:
    • overutilization of resources: CPU, memory, disk space and swap. Default thresholds are 80% for warning status and 90% for critical status
    • loss of connection to the Bleemeo Cloud platform
    • network errors
    • pending security updates
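
As a rough illustration of how the default thresholds above translate into a status, here is a minimal sketch. It is not the agent's implementation: the function name `status_for` and the "ok" label are assumptions made for this example, and the strict "above" comparison follows the wording used for the cpu_used thresholds on this page.

```python
def status_for(value, warning=80.0, critical=90.0):
    """Classify a utilization percentage against warning/critical thresholds.

    Hypothetical helper: the defaults mirror the documented 80% / 90%
    thresholds, and "ok" is an assumed name for the non-alerting state.
    """
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for sample in (42.0, 85.0, 97.5):
        print(sample, status_for(sample))
```

With the defaults, a value of exactly 80 is still "ok" in this sketch, because the thresholds are described as "above 80%".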

The agent gathers the following metrics:

| Metric | Description | OS Supported | Alerting |
|---|---|---|---|
| agent_config_warning | Bleemeo agent configuration files issues | linux, windows | |
| agent_gather_time | Time spent to gather metrics by Bleemeo agent in seconds | linux, windows | |
| agent_status | Status of Agent connection | linux, windows | Yes |
| cpu_idle | CPU idle in percent | linux, windows | |
| cpu_interrupt | CPU used by low-level driver in percent | linux | |
| cpu_nice | CPU used by niced applications in percent | linux | |
| cpu_other | CPU not used by user or system in percent | linux, windows | |
| cpu_softirq | CPU used by driver in percent | linux, windows | |
| cpu_steal | CPU used by hypervisor in percent | linux | |
| cpu_system | CPU used by system call in percent | linux, windows | |
| cpu_used | CPU used in percent | linux, windows | Default thresholds are: above 80% for warning status and above 90% for critical status. |
| cpu_used_status | Status of CPU usage | linux, windows | |
| cpu_user | CPU used by applications in percent | linux, windows | |
| cpu_wait | CPU idle while waiting for IO operation in percent | linux | |
| cpu_guest_nice | CPU used by niced guest VM in percent | linux | |
| cpu_guest | CPU used by guest VM in percent | linux | |
| disk_free | Filesystem space available in bytes | linux, windows | |
| disk_inodes_free | Number of inodes available | linux, windows | |
| disk_inodes_total | Number of inodes for this filesystem | linux, windows | |
| disk_inodes_used | Number of used inodes | linux, windows | |
| disk_total | Filesystem size in bytes | linux, windows | |
| disk_used | Filesystem space used in bytes | linux, windows | |
| disk_used_perc | Filesystem space used in percent | linux, windows | Default thresholds are: above 80% for warning status and above 90% for critical status. |
| disk_used_perc_status | Status of disk usage | linux, windows | |
| io_read_merged | Number of read operations that were merged before hitting disk | linux, windows | |
| io_write_merged | Number of write operations that were merged before hitting disk | linux, windows | |
| io_read_bytes | Disk read throughput in bytes per second | linux, windows | |
| io_read_utilization | Disk IO read utilization | linux, windows | |
| io_reads | Number of reads completed per second | linux, windows | |
| io_utilization | Disk IO utilization in percent | linux, windows | |
| io_write_bytes | Disk write throughput in bytes per second | linux, windows | |
| io_write_utilization | Disk IO write utilization | linux, windows | |
| io_writes | Number of writes completed per second | linux, windows | |
| mem_available | Memory available for application in bytes | linux, windows | |
| mem_available_perc | Memory available for application in percent | linux, windows | |
| mem_buffered | Memory used for raw block cache in bytes | linux, windows | |
| mem_cached | Memory used for file cache in bytes | linux, windows | |
| mem_free | Memory unused in bytes | linux, windows | |
| mem_total | Memory size in bytes | linux, windows | |
| mem_used | Memory used by applications in bytes | linux, windows | |
| mem_used_perc | Memory used by applications in percent | linux, windows | Default thresholds are: above 80% for warning status and above 90% for critical status. |
| mem_used_perc_status | Status of memory usage | linux, windows | |
| net_bits_recv | Network traffic received in bits per second | linux, windows | |
| net_bits_sent | Network traffic sent in bits per second | linux, windows | |
| net_drop_in | Number of received packets dropped per second | linux, windows | |
| net_drop_out | Number of sent packets dropped per second | linux, windows | |
| net_err_in | Number of errors per second while receiving packet | linux, windows | Default thresholds are: above 0 for critical status. |
| net_err_in_status | Status of network errors for received packets | linux, windows | |
| net_err_out | Number of errors per second while sending packet | linux, windows | Default thresholds are: above 0 for critical status. |
| net_err_out_status | Status of network errors for sent packets | linux, windows | |
| net_packets_recv | Number of packets received per second | linux, windows | |
| net_packets_sent | Number of packets sent per second | linux, windows | |
| process_status_blocked | Number of processes blocked in system call | linux, windows | |
| process_status_paging | Number of processes blocked by paging operation | linux, windows | |
| process_status_running | Number of processes currently running | linux, windows | |
| process_status_sleeping | Number of idle processes | linux, windows | |
| process_status_stopped | Number of stopped processes | linux, windows | |
| process_status_zombies | Number of zombie processes | linux, windows | |
| process_total | Number of processes | linux, windows | |
| process_total_threads | Number of threads | linux, windows | |
| swap_free | Swap unused in bytes | linux, windows | |
| swap_in | Swap read throughput in bytes per second | linux | |
| swap_out | Swap write throughput in bytes per second | linux | |
| swap_total | Swap size in bytes | linux | |
| swap_used | Swap used in bytes | linux | |
| swap_used_perc | Swap used in percent | linux, windows | Default thresholds are: above 80% for warning status and above 90% for critical status. |
| swap_used_perc_status | Status of swap usage | linux, windows | |
| system_load1 | System load over last minute | linux, windows | |
| system_load5 | System load over last 5 minutes | linux, windows | |
| system_load15 | System load over last 15 minutes | linux, windows | |
| system_pending_updates | Number of pending system updates | linux, windows | |
| system_pending_security_updates | Number of pending system security updates | linux | Yes, after 24h |
| time_drift | Difference between local time and reference time in seconds | linux, windows | Default thresholds are: 3 minutes for warning status and 5 minutes for critical status |
| uptime | Time elapsed since last boot in seconds | linux, windows | |
| users_logged | Number of users currently logged in the system | linux, windows | |

Power consumption metrics

On Linux and TrueNAS, the agent gathers the system power consumption using IPMI. This requires your server to support IPMI and to report its power consumption through IPMI, which should be the case for all server hardware. The agent uses the freeipmi command (ipmi-dcmi) and falls back to the ipmitool command if it is not available, so one of those commands needs to be accessible in the PATH. To configure where to find ipmi-dcmi (freeipmi) or ipmitool, see the configuration page.
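
The lookup order described above (freeipmi first, then ipmitool) can be sketched as follows. This is only an illustration of the fallback idea, not the agent's code: the function name `pick_ipmi_command` and the injectable `which` parameter are assumptions made for this example.

```python
import shutil


def pick_ipmi_command(candidates=("ipmi-dcmi", "ipmitool"), which=shutil.which):
    """Return the path of the first candidate command found in PATH, else None.

    Hypothetical helper: candidates are tried in order, so ipmi-dcmi
    (freeipmi) wins over ipmitool when both are installed.
    """
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = pick_ipmi_command()
    print(found or "neither ipmi-dcmi nor ipmitool is in the PATH")
```

Running it on a server shows which of the two commands would be picked up from the current PATH.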

On TrueNAS, ipmitool is installed by default, so no additional setup is needed.

On Linux, if you don't have ipmi-dcmi, you can install it with:

# On Ubuntu/Debian
apt install freeipmi
# On Fedora, CentOS, Almalinux, RockyLinux or similar
yum install freeipmi

If you installed Glouton as a package or with wget, you don't need any additional setup. Otherwise, you need to allow the glouton user to run the commands as root. In /etc/sudoers.d/glouton, add the following line (you may need to change the locations of ipmi-dcmi and ipmi-sensors; use which ipmi-dcmi and which ipmi-sensors to find them):

glouton     ALL=(ALL) NOPASSWD: /usr/sbin/ipmi-dcmi, /usr/sbin/ipmi-sensors
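
Because the command locations vary between distributions, the line can be generated from whatever `which` reports. The sketch below is a hypothetical helper (`sudoers_line` is not part of Glouton) that resolves each command in the PATH and builds a line in the same format as the one above.

```python
import shutil


def sudoers_line(commands, user="glouton", which=shutil.which):
    """Build a NOPASSWD sudoers line for the given commands.

    Hypothetical helper: each command name is resolved to an absolute
    path, and a missing command raises instead of producing a bad line.
    """
    paths = []
    for name in commands:
        path = which(name)
        if path is None:
            raise FileNotFoundError(f"{name} not found in PATH")
        paths.append(path)
    return f"{user}     ALL=(ALL) NOPASSWD: " + ", ".join(paths)


if __name__ == "__main__":
    print(sudoers_line(["ipmi-dcmi", "ipmi-sensors"]))
```

After writing the result to /etc/sudoers.d/glouton, checking the file with `visudo -cf /etc/sudoers.d/glouton` is a reasonable precaution before relying on it.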

To test whether your server supports IPMI, run the following commands:

sudo ipmi-dcmi --get-system-power-statistics
sudo ipmi-sensors -W discretereading --sdr-cache-recreate

The following metric will be gathered:

| Metric | Description |
|---|---|
| system_power_consumption | System power consumption in Watt |

SMART metrics

On Linux, the agent gathers metrics using smartctl if it's accessible in the PATH. To configure where to find smartctl and what devices to monitor, see the configuration page.

If you don't have smartctl, you can install it with:

# On Ubuntu/Debian
apt install smartmontools
# On Fedora, CentOS, Almalinux, RockyLinux or similar
yum install smartmontools

If you installed Glouton as a package or with wget, you don't need any additional setup. Otherwise, you need to allow the glouton user to run the command as root. In /etc/sudoers.d/glouton, add the following line (you may need to change the location of smartctl; use which smartctl to find it):

glouton     ALL=(ALL) NOPASSWD: /usr/sbin/smartctl

The following metrics will be gathered:

| Metric | Description |
|---|---|
| smart_device_health_status | Disk health status |
| smart_device_read_error_rate | Read error rate |
| smart_device_seek_error_rate | Seek error rate |
| smart_device_temp_c | Disk temperature in °C |
| smart_device_udma_crc_errors | Count of errors in data transfer via the interface cable |
| smart_device_media_wearout_indicator | Media wearout indicator |
| smart_device_percent_lifetime_remain | Lifetime remaining in percent |
| smart_device_wear_leveling_count | Wear leveling count |

Note that some metrics may not be available depending on your disk. These metrics report the raw value of SMART attributes; how to interpret them may depend on your drive manufacturer.
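
To see which raw SMART values your own drive exposes, you can inspect the JSON output of smartctl (available in smartmontools 7.0 and later). The sketch below is not how the agent collects its metrics; the helper names are assumptions, and it only reads the `ata_smart_attributes` table, which is absent for NVMe drives.

```python
import json
import subprocess


def raw_smart_values(report):
    """Map SMART attribute names to raw values from a parsed smartctl JSON report.

    Hypothetical helper: returns an empty dict when the report has no
    ATA attribute table (for example on NVMe devices).
    """
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {attr["name"]: attr["raw"]["value"] for attr in table}


def read_report(device):
    """Run `smartctl --json -A <device>` (usually requires root) and parse it.

    The exit status is not checked because smartctl uses it as a bitmask
    that can be non-zero even when a report was produced.
    """
    result = subprocess.run(
        ["smartctl", "--json", "-A", device],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, `raw_smart_values(read_report("/dev/sda"))` lists the attributes reported by that disk, where /dev/sda is only a placeholder device name.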

Sensor temperatures

On Linux and Windows, the agent gathers the temperature of your hardware components in the metric sensor_temperature. By default, only the CPU temperature is allowed (with the label sensor="coretemp_package_id_*"); you can add more metrics using metric filtering.

For example, to allow metrics from all sensors, add the following to your configuration:

metric:
  allow_metrics:
    - sensor_temperature

Multi-disk arrays

On Linux systems, the agent can collect information about MD arrays by using the /proc/mdstat file and the mdadm command.

If you don't have mdadm, you can install it with:

# On Ubuntu/Debian
apt install mdadm
# On Fedora, CentOS, Almalinux, RockyLinux or similar
yum install mdadm

If you installed Glouton as a package or with wget, you don't need any additional setup. Otherwise, you need to allow the glouton user to run the command as root. In /etc/sudoers.d/glouton, add the following line (you may need to change the location of mdadm; use which mdadm to find it):

glouton     ALL=(ALL) NOPASSWD: /usr/sbin/mdadm --detail *

The following metrics will be gathered:

| Metric | Description |
|---|---|
| mdstat_health_status | Array health status |
| mdstat_disks_active_count | Count of active disks |
| mdstat_disks_down_count | Count of down disks |
| mdstat_disks_failed_count | Count of failing disks |
| mdstat_disks_spare_count | Count of spare disks |
| mdstat_disks_total_count | Total count of disks |

The mdstat_health_status metric description may give additional information about the time remaining for the array recovery / resynchronization.
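
To get a feel for the kind of information /proc/mdstat contains, the sketch below parses the `[total/active]` counters of each array. It is a simplified illustration and not the agent's parser: the function name, the dictionary keys and the sample text are assumptions for this example, and details such as failed or spare disks (which the agent may obtain from mdadm) are ignored.

```python
import re

# Array header, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\S*)\s*:\s*(\S+)")
# Device counters, e.g. "[2/1] [_U]" = 2 expected devices, 1 currently active
COUNTS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Illustrative sample in the /proc/mdstat layout, not taken from a real host.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1046528 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdd1[1] sdc1[0](F)
      1046528 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array: {"state", "total", "active", "down"}} from mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {"state": header.group(2)}
            continue
        counts = COUNTS_RE.search(line)
        if counts and current is not None:
            total, active = int(counts.group(1)), int(counts.group(2))
            arrays[current].update(total=total, active=active, down=total - active)
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        print(name, info)
```

On a real host, the same function can be applied to the content of /proc/mdstat, for instance `parse_mdstat(open("/proc/mdstat").read())`.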