
Operating System Metrics

The Bleemeo agent automatically monitors your operating system metrics (check that your operating system is supported).

The monitoring covers:

  • System resource utilization: CPU, memory, disk, and more
  • S.M.A.R.T. metrics to monitor the health of your disks
  • Sensor temperatures
  • Notifications for:
    • overutilization of resources: CPU, memory, disk space and swap. Default thresholds are above 80% for warning status and above 90% for critical status
    • loss of connection to the Bleemeo Cloud platform
    • network errors
    • pending security updates

The agent gathers the following metrics:

| Metric | Description | OS supported | Alerting |
|---|---|---|---|
| agent_config_warning | Issues in the Bleemeo agent configuration files | Linux, Windows | |
| agent_gather_time | Time spent by the Bleemeo agent gathering metrics, in seconds | Linux, Windows | |
| agent_status | Status of the agent connection | Linux, Windows | Yes |
| cpu_idle | CPU idle in percent | Linux, Windows | |
| cpu_interrupt | CPU used by low-level drivers (hardware interrupts) in percent | Linux | |
| cpu_nice | CPU used by niced applications in percent | Linux | |
| cpu_other | CPU not used by user or system in percent | Linux, Windows | |
| cpu_softirq | CPU used by drivers (software interrupts) in percent | Linux, Windows | |
| cpu_steal | CPU taken by the hypervisor (steal time) in percent | Linux | |
| cpu_system | CPU used by system calls in percent | Linux, Windows | |
| cpu_used | CPU used in percent | Linux, Windows | Default thresholds: above 80% for warning status, above 90% for critical status |
| cpu_used_status | Status of CPU usage | Linux, Windows | |
| cpu_user | CPU used by applications in percent | Linux, Windows | |
| cpu_wait | CPU idle while waiting for I/O operations in percent | Linux | |
| cpu_guest_nice | CPU used by niced guest VMs in percent | Linux | |
| cpu_guest | CPU used by guest VMs in percent | Linux | |
| disk_free | Filesystem space available in bytes | Linux, Windows | |
| disk_inodes_free | Number of available inodes | Linux, Windows | |
| disk_inodes_total | Number of inodes in this filesystem | Linux, Windows | |
| disk_inodes_used | Number of used inodes | Linux, Windows | |
| disk_total | Filesystem size in bytes | Linux, Windows | |
| disk_used | Filesystem space used in bytes | Linux, Windows | |
| disk_used_perc | Filesystem space used in percent | Linux, Windows | Default thresholds: above 80% for warning status, above 90% for critical status |
| disk_used_perc_status | Status of disk usage | Linux, Windows | |
| io_read_merged | Number of read operations merged before reaching the disk | Linux, Windows | |
| io_write_merged | Number of write operations merged before reaching the disk | Linux, Windows | |
| io_read_bytes | Disk read throughput in bytes per second | Linux, Windows | |
| io_read_time | Time spent reading, in milliseconds per second | Linux, Windows | |
| io_reads | Number of reads completed per second | Linux, Windows | |
| io_time | Time spent doing I/O, in milliseconds per second | Linux, Windows | |
| io_utilization | Disk I/O utilization in percent | Linux, Windows | |
| io_write_bytes | Disk write throughput in bytes per second | Linux, Windows | |
| io_write_time | Time spent writing, in milliseconds per second | Linux, Windows | |
| io_writes | Number of writes completed per second | Linux, Windows | |
| mem_available | Memory available for applications in bytes | Linux, Windows | |
| mem_available_perc | Memory available for applications in percent | Linux, Windows | |
| mem_buffered | Memory used for raw block cache in bytes | Linux, Windows | |
| mem_cached | Memory used for file cache in bytes | Linux, Windows | |
| mem_free | Unused memory in bytes | Linux, Windows | |
| mem_total | Memory size in bytes | Linux, Windows | |
| mem_used | Memory used by applications in bytes | Linux, Windows | |
| mem_used_perc | Memory used by applications in percent | Linux, Windows | Default thresholds: above 80% for warning status, above 90% for critical status |
| mem_used_perc_status | Status of memory usage | Linux, Windows | |
| net_bits_recv | Network traffic received in bits per second | Linux, Windows | |
| net_bits_sent | Network traffic sent in bits per second | Linux, Windows | |
| net_drop_in | Number of received packets dropped per second | Linux, Windows | |
| net_drop_out | Number of sent packets dropped per second | Linux, Windows | |
| net_err_in | Number of errors per second while receiving packets | Linux, Windows | Default threshold: above 0 for critical status |
| net_err_in_status | Status of network errors for received packets | Linux, Windows | |
| net_err_out | Number of errors per second while sending packets | Linux, Windows | Default threshold: above 0 for critical status |
| net_err_out_status | Status of network errors for sent packets | Linux, Windows | |
| net_packets_recv | Number of packets received per second | Linux, Windows | |
| net_packets_sent | Number of packets sent per second | Linux, Windows | |
| process_status_blocked | Number of processes blocked in a system call | Linux, Windows | |
| process_status_paging | Number of processes blocked by paging operations | Linux, Windows | |
| process_status_running | Number of processes currently running | Linux, Windows | |
| process_status_sleeping | Number of sleeping (idle) processes | Linux, Windows | |
| process_status_stopped | Number of stopped processes | Linux, Windows | |
| process_status_zombies | Number of zombie processes | Linux, Windows | |
| process_total | Number of processes | Linux, Windows | |
| process_total_threads | Number of threads | Linux, Windows | |
| swap_free | Unused swap in bytes | Linux, Windows | |
| swap_in | Swap read throughput in bytes per second | Linux | |
| swap_out | Swap write throughput in bytes per second | Linux | |
| swap_total | Swap size in bytes | Linux | |
| swap_used | Swap used in bytes | Linux | |
| swap_used_perc | Swap used in percent | Linux, Windows | Default thresholds: above 80% for warning status, above 90% for critical status |
| swap_used_perc_status | Status of swap usage | Linux, Windows | |
| system_load1 | System load over the last minute | Linux, Windows | |
| system_load5 | System load over the last 5 minutes | Linux, Windows | |
| system_load15 | System load over the last 15 minutes | Linux, Windows | |
| system_pending_updates | Number of pending system updates | Linux, Windows | |
| system_pending_security_updates | Number of pending system security updates | Linux | Yes, after 24h |
| time_drift | Difference between local time and reference time in seconds | Linux, Windows | Default thresholds: 3 minutes for warning status, 5 minutes for critical status |
| uptime | Time elapsed since last boot in seconds | Linux, Windows | |
| users_logged | Number of users currently logged in to the system | Linux, Windows | |
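The default thresholds in the Alerting column can be expressed as a small check. The POSIX shell sketch below is illustrative only: the function name threshold_status is hypothetical, and it assumes that "above" means strictly greater than and that time_drift is compared by its absolute value in seconds (3 minutes = 180 s, 5 minutes = 300 s). It does not reproduce how the agent itself evaluates thresholds (for example, whether a value must stay above a threshold for some time before the status changes).

```shell
#!/bin/sh
# Hypothetical helper: map a metric value to ok / warning / critical.
# $1: current value, $2: warning threshold ("" if there is none),
# $3: critical threshold. awk is used so decimal values compare correctly.
threshold_status() {
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v + 0 > c + 0) print "critical"             # strictly above critical
    else if (w != "" && v + 0 > w + 0) print "warning"
    else print "ok"
  }'
}
```

For example, threshold_status 85 80 90 prints warning (cpu_used at 85%), threshold_status 0.5 "" 0 prints critical (net_err_in only has a critical threshold), and threshold_status "${drift#-}" 180 300 checks a time_drift value stored in drift after stripping a leading minus sign.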

SMART metrics

On Linux, the agent gathers SMART metrics using smartctl when it is available in the PATH. To configure the location of smartctl and which devices to monitor, see the configuration page.

If you don't have smartctl, you can install it with:

# On Ubuntu/Debian
apt install smartmontools
# On Fedora, CentOS, Almalinux, RockyLinux or similar
yum install smartmontools

If you installed Glouton as a package or with wget, you don't need to set anything up. Otherwise, you need to allow the glouton user to run smartctl as root: add the following line to /etc/sudoers.d/glouton (you may need to change the path to smartctl; run which smartctl to find it):

glouton     ALL=(ALL) NOPASSWD: /usr/sbin/smartctl
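If you prefer not to edit the file by hand, the POSIX shell sketch below writes the same rule using the path returned by command -v smartctl, checks it with visudo -c -f, and only then installs it with mode 0440. The function names sudoers_rule and install_sudoers_rule are hypothetical and are not part of the Glouton packages; install_sudoers_rule must run as root.

```shell
#!/bin/sh
# Hypothetical helper: print the sudoers rule for a given smartctl path.
sudoers_rule() {
  printf 'glouton ALL=(ALL) NOPASSWD: %s\n' "$1"
}

# Hypothetical helper: look up smartctl, validate the rule, then install it.
# Must run as root.
install_sudoers_rule() {
  smartctl_path=$(command -v smartctl) || {
    echo "smartctl not found in PATH" >&2
    return 1
  }
  tmp=$(mktemp) || return 1
  sudoers_rule "$smartctl_path" > "$tmp"
  # Check the syntax first: a broken file in /etc/sudoers.d can break sudo.
  if visudo -c -f "$tmp" >/dev/null; then
    install -m 0440 "$tmp" /etc/sudoers.d/glouton
    status=$?
  else
    echo "invalid sudoers rule, nothing installed" >&2
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Validating a temporary copy before installing it avoids leaving a syntax error in /etc/sudoers.d, which could otherwise prevent sudo from working at all.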

The following metrics will be gathered:

| Metric | Description |
|---|---|
| smart_status | Disk health status |
| smart_device_exit_status | Exit status of the smartctl command |
| smart_device_health_ok | 1 if the disk health is OK, otherwise 0 |
| smart_device_read_error_rate | Read error rate |
| smart_device_seek_error_rate | Seek error rate |
| smart_device_udma_crc_errors | Number of errors in data transfers over the interface cable |
| smart_device_media_wearout_indicator | Media wearout indicator |
| smart_device_percent_lifetime_remain | Remaining lifetime in percent |
| smart_device_wear_leveling_count | Wear leveling count |
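Because smart_device_exit_status is the exit status of smartctl, it can be decoded with the bit meanings documented in the RETURN VALUES section of the smartctl(8) manual page. The POSIX shell sketch below (the function name decode_smartctl_status is hypothetical) prints one line per bit that is set; a value of 0 means smartctl reported no problem.

```shell
#!/bin/sh
# Hypothetical helper: explain each bit set in a smartctl exit status,
# following the RETURN VALUES section of smartctl(8).
decode_smartctl_status() {
  status=$1
  bit=0
  while [ "$bit" -le 7 ]; do
    if [ $(( status & (1 << bit) )) -ne 0 ]; then
      case $bit in
        0) echo "bit 0: command line did not parse" ;;
        1) echo "bit 1: device open failed or device in low-power mode" ;;
        2) echo "bit 2: a SMART or ATA command failed, or SMART data checksum error" ;;
        3) echo "bit 3: SMART status check returned DISK FAILING" ;;
        4) echo "bit 4: prefail attributes at or below threshold" ;;
        5) echo "bit 5: some attributes were at or below threshold in the past" ;;
        6) echo "bit 6: device error log contains errors" ;;
        7) echo "bit 7: self-test log contains errors" ;;
      esac
    fi
    bit=$((bit + 1))
  done
}
```

For example, run smartctl -H /dev/sda followed by decode_smartctl_status $? (the device name is only an example), or pass the value of smart_device_exit_status reported by the agent. A value of 72 decodes to bits 3 and 6.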

Note that some metrics may not be available, depending on your disk. These metrics report the raw values of SMART attributes; how to interpret them may depend on your drive manufacturer.

Sensor temperatures

On Linux and Windows, the agent gathers the temperature of your hardware components in the metric sensor_temperature. By default, only the CPU temperature is allowed (sensors with the label sensor="coretemp_package_id_*"); you can allow more sensors using metric filtering.

For example, to allow metrics from all sensors, add the following to your configuration:

metric:
  allow_metrics:
    - sensor_temperature
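To see which sensors a Linux machine exposes and which of them the default filter keeps, you can list the temperature sensors published by the kernel hwmon interface in /sys/class/hwmon. This is a Linux-only sketch with hypothetical function names. The exact naming rule the agent uses is not documented on this page: the sketch assumes the sensor label is the hwmon chip name joined to the sensor label, lowercased, with spaces replaced by underscores. That matches the coretemp_package_id_* pattern but may differ for other chips.

```shell
#!/bin/sh
# Hypothetical: build a sensor name the way the "coretemp_package_id_*"
# pattern suggests (chip name + label, lowercased, spaces -> "_").
sensor_label() {
  printf '%s_%s\n' "$1" "$2" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

# Default filter: only CPU package temperatures are allowed.
allowed_by_default() {
  case $1 in
    coretemp_package_id_*) return 0 ;;
    *) return 1 ;;
  esac
}

# Linux only: list temperature sensors exposed through hwmon in sysfs.
list_hwmon_sensors() {
  for label_file in /sys/class/hwmon/hwmon*/temp*_label; do
    [ -r "$label_file" ] || continue   # glob did not match, or unreadable
    chip=$(cat "${label_file%/*}/name")
    name=$(sensor_label "$chip" "$(cat "$label_file")")
    if allowed_by_default "$name"; then
      echo "$name (allowed by default)"
    else
      echo "$name (needs metric filtering)"
    fi
  done
}
```

On a typical Intel machine, list_hwmon_sensors would report the package temperature as allowed by default and per-core sensors such as coretemp_core_0 as needing metric filtering, under the naming assumption above.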