Operating System Metrics

The Bleemeo agent automatically monitors your operating system metrics (check that your operating system is a supported OS).

The monitoring covers:

- System resource utilization: CPU, memory, disk, ...
- S.M.A.R.T. metrics to monitor the health of your disks
- Sensor temperatures
- Notifications for:
  - overutilization of resources: CPU, memory, disk space and swap. The default thresholds are 80% for warning status and 90% for critical status
  - loss of connection to the Bleemeo Cloud platform
  - network errors
  - pending security updates
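
The default thresholds can be read as a simple mapping from a usage percentage to a status. The following is only a sketch of that mapping, not the agent's implementation: status_for_usage is a hypothetical helper, the "ok" status name is an assumption, and percentages are taken as integers for simplicity.

```shell
# Hypothetical helper (not part of the Bleemeo agent): map an integer usage
# percentage to a status using the documented defaults
# (above 80% is warning, above 90% is critical).
status_for_usage() {
    usage=$1
    if [ "$usage" -gt 90 ]; then
        echo critical
    elif [ "$usage" -gt 80 ]; then
        echo warning
    else
        # "ok" is an assumed name for the non-alerting status.
        echo ok
    fi
}

status_for_usage 85   # prints "warning"
```

The same reading applies to cpu_used, disk_used_perc, mem_used_perc and swap_used_perc, which share these default thresholds.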

The agent gathers the following metrics:

Metric	Description	Alerting
agent_config_warning	Issues found in the Bleemeo agent configuration files
agent_gather_time	Time spent by the Bleemeo agent gathering metrics, in seconds
agent_status	Status of the agent connection	Yes
cpu_idle	CPU idle in percent
cpu_interrupt	CPU used by low-level driver in percent
cpu_nice	CPU used by niced applications in percent
cpu_other	CPU not used by user or system in percent
cpu_softirq	CPU used by driver in percent
cpu_steal	CPU used by hypervisor in percent
cpu_system	CPU used by system call in percent
cpu_used	CPU used in percent	Default thresholds are: above 80% for warning status and above 90% for critical status.
cpu_used_status	Status of CPU usage
cpu_user	CPU used by applications in percent
cpu_wait	CPU idle while waiting for IO operation in percent
cpu_guest_nice	CPU used by niced guest VM in percent
cpu_guest	CPU used by guest VM in percent
disk_free	Filesystem space available in bytes
disk_inodes_free	Number of inodes available
disk_inodes_total	Number of inodes for this filesystem
disk_inodes_used	Number of used inodes
disk_total	Filesystem size in bytes
disk_used	Filesystem space used in bytes
disk_used_perc	Filesystem space used in percent	Default thresholds are: above 80% for warning status and above 90% for critical status.
disk_used_perc_status	Status of disk usage
io_read_merged	Number of read operations that were merged before hitting disk
io_write_merged	Number of write operations that were merged before hitting disk
io_read_bytes	Disk read throughput in bytes per second
io_read_utilization	Disk IO read utilization
io_reads	Number of reads completed per second
io_utilization	Disk IO utilization in percent
io_write_bytes	Disk write throughput in bytes per second
io_write_utilization	Disk IO write utilization
io_writes	Number of writes completed per second
mem_available	Memory available for application in bytes
mem_available_perc	Memory available for application in percent
mem_buffered	Memory used for raw block cache in bytes
mem_cached	Memory used for file cache in bytes
mem_free	Memory unused in bytes
mem_total	Memory size in bytes
mem_used	Memory used by applications in bytes
mem_used_perc	Memory used by applications in percent	Default thresholds are: above 80% for warning status and above 90% for critical status.
mem_used_perc_status	Status of memory usage
net_bits_recv	Network traffic received in bits per second
net_bits_sent	Network traffic sent in bits per second
net_drop_in	Number of received packets dropped per second
net_drop_out	Number of sent packets dropped per second
net_err_in	Number of errors per second while receiving packets	Default threshold is: above 0 for critical status.
net_err_in_status	Status of network errors for received packets
net_err_out	Number of errors per second while sending packets	Default threshold is: above 0 for critical status.
net_err_out_status	Status of network errors for sent packets
net_packets_recv	Number of packets received per second
net_packets_sent	Number of packets sent per second
process_status_blocked	Number of processes blocked in system call
process_status_paging	Number of processes blocked by paging operation
process_status_running	Number of processes currently running
process_status_sleeping	Number of idle processes
process_status_stopped	Number of stopped processes
process_status_zombies	Number of zombie processes
process_total	Number of processes
process_total_threads	Number of threads
swap_free	Swap unused in bytes
swap_in	Swap read throughput in bytes per second
swap_out	Swap write throughput in bytes per second
swap_total	Swap size in bytes
swap_used	Swap used in bytes
swap_used_perc	Swap used in percent	Default thresholds are: above 80% for warning status and above 90% for critical status.
swap_used_perc_status	Status of swap usage
system_load1	System load over last minute
system_load5	System load over last 5 minutes
system_load15	System load over last 15 minutes
system_pending_updates	Number of pending system updates
system_pending_security_updates	Number of pending system security updates	Yes, after 24h
time_drift	Difference between local time and reference time in seconds	Default thresholds are: 3 minutes for warning status and 5 minutes for critical status.
uptime	Time elapsed since last boot in seconds
users_logged	Number of users currently logged in the system

Power consumption metrics

On Linux and TrueNAS, the agent gathers the system power consumption using IPMI. This requires your server to support IPMI and to report its power consumption through IPMI, which should be the case for all server hardware. The agent uses the ipmi-dcmi command (from freeipmi) and falls back to the ipmitool command if ipmi-dcmi is not available, so one of those commands needs to be accessible in the PATH. To configure where to find ipmi-dcmi (freeipmi) or ipmitool, see the configuration page.
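
To check which of the two commands would be found in your PATH, the lookup order described above can be sketched as follows. This is an illustration, not the agent's code: find_ipmi_command is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of the first IPMI command found in the
# PATH, preferring ipmi-dcmi (freeipmi) over ipmitool, as described above.
find_ipmi_command() {
    for tool in ipmi-dcmi ipmitool; do
        if command -v "$tool" >/dev/null 2>&1; then
            command -v "$tool"
            return 0
        fi
    done
    # Neither command is available.
    return 1
}

if ipmi_path=$(find_ipmi_command); then
    echo "IPMI command found: $ipmi_path"
else
    echo "Neither ipmi-dcmi nor ipmitool is accessible in the PATH"
fi
```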

On TrueNAS, ipmitool is installed by default, so no additional setup is needed.

On Linux, if you don't have ipmi-dcmi, you can install it with:

# On Ubuntu/Debian
apt install freeipmi
# On Fedora, CentOS, AlmaLinux, Rocky Linux or similar
yum install freeipmi

If you installed Glouton as a package or with wget, you don't need any additional setup. Otherwise, you need to allow the glouton user to run the commands as root. In /etc/sudoers.d/glouton, add the following line (you may need to change the location of ipmi-dcmi and ipmi-sensors; use which ipmi-dcmi and which ipmi-sensors to find them):

glouton     ALL=(ALL) NOPASSWD: /usr/sbin/ipmi-dcmi, /usr/sbin/ipmi-sensors
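
If the commands live somewhere other than /usr/sbin, the line can be generated from the detected paths. The sketch below only prints the line and does not write to /etc/sudoers.d/glouton; sudoers_line is a hypothetical helper, and the /usr/sbin fallbacks are the paths from the line above.

```shell
# Hypothetical helper: print a sudoers line for the glouton user from the
# two command paths given as arguments.
sudoers_line() {
    dcmi_path=$1
    sensors_path=$2
    printf 'glouton     ALL=(ALL) NOPASSWD: %s, %s\n' "$dcmi_path" "$sensors_path"
}

# Use the detected location of each command, or the documented default path
# when the command is not found in the PATH.
sudoers_line "$(command -v ipmi-dcmi || echo /usr/sbin/ipmi-dcmi)" \
             "$(command -v ipmi-sensors || echo /usr/sbin/ipmi-sensors)"
```

After adding the line to the file, checking it with visudo -c -f /etc/sudoers.d/glouton is a common precaution against syntax errors.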

To test whether your server supports IPMI, run the following commands:

sudo ipmi-dcmi --get-system-power-statistics
sudo ipmi-sensors -W discretereading --sdr-cache-recreate

The following metric will be gathered:

Metric	Description
system_power_consumption	System power consumption in watts

SMART metrics

On Linux, the agent gathers metrics using smartctl if it's accessible in the PATH. To configure where to find smartctl and what devices to monitor, see the configuration page.

If you don't have smartctl, you can install it with:

# On Ubuntu/Debian
apt install smartmontools
# On Fedora, CentOS, AlmaLinux, Rocky Linux or similar
yum install smartmontools

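As with the IPMI commands above, the agent may need to run smartctl as root. If your installation did not set this up for you, add the following line to /etc/sudoers.d/glouton (you may need to change the location of smartctl; use which smartctl to find it):

glouton     ALL=(ALL) NOPASSWD: /usr/sbin/smartctl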

The following metrics will be gathered:

Metric	Description
smart_device_health_status	Disk health status
smart_device_read_error_rate	Read error rate
smart_device_seek_error_rate	Seek error rate
smart_device_temp_c	Disk temperature in °C
smart_device_udma_crc_errors	Count of errors in data transfer via the interface cable
smart_device_media_wearout_indicator	Media wearout indicator
smart_device_percent_lifetime_remain	Lifetime remaining in percent
smart_device_wear_leveling_count	Wear leveling count

Note that some metrics may not be available depending on your disk. These metrics report the raw values of SMART attributes, and how to interpret them may depend on your drive manufacturer.

Sensor temperatures

On Linux and Windows, the agent gathers the temperature of your hardware components in the metric sensor_temperature. By default, only the CPU temperature is allowed (with the label sensor="coretemp_package_id_*"). You can allow more sensors using metric filtering.

For example, to allow metrics from all sensors, add the following to your configuration:

metric:
  allow_metrics:
    - sensor_temperature
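
To see which sensors the default filter covers, the label pattern coretemp_package_id_* can be tried against a few label values. This sketch assumes the * behaves like a shell glob; is_default_sensor is a hypothetical helper, and coretemp_core_1 and acpitz are made-up label values used only for illustration.

```shell
# Hypothetical helper: succeed when a sensor label matches the default
# pattern coretemp_package_id_* (treated here as a shell glob).
is_default_sensor() {
    case $1 in
        coretemp_package_id_*) return 0 ;;
        *) return 1 ;;
    esac
}

# The last two labels are illustrative values, not taken from a real host.
for label in coretemp_package_id_0 coretemp_core_1 acpitz; do
    if is_default_sensor "$label"; then
        echo "$label: allowed by default"
    else
        echo "$label: needs an allow_metrics entry"
    fi
done
```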

Multi-disk arrays

On Linux systems, the agent can collect information about MD arrays using the /proc/mdstat file and the mdadm command.

If you don't have mdadm, you can install it with:

# On Ubuntu/Debian
apt install mdadm
# On Fedora, CentOS, AlmaLinux, Rocky Linux or similar
yum install mdadm

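As with the IPMI commands above, the agent may need to run mdadm as root. If your installation did not set this up for you, add the following line to /etc/sudoers.d/glouton (you may need to change the location of mdadm; use which mdadm to find it):

glouton     ALL=(ALL) NOPASSWD: /usr/sbin/mdadm --detail *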

The following metrics will be gathered:

Metric	Description
mdstat_health_status	Array health status
mdstat_disks_active_count	Count of active disks
mdstat_disks_down_count	Count of down disks
mdstat_disks_failed_count	Count of failing disks
mdstat_disks_spare_count	Count of spare disks
mdstat_disks_total_count	Total count of disks

The mdstat_health_status metric description may give additional information about the time remaining for the array recovery / resynchronization.
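
To get a feel for the information available in /proc/mdstat, the sketch below extracts the disk counts of each array from the "[n/m]" field, where n is the number of disks the array should have and m the number currently in use. It is not the agent's parser: mdstat_summary is a hypothetical helper, and it assumes array names of the form md followed by digits.

```shell
# Hypothetical helper: read mdstat-formatted text on stdin and print, for
# each array, its name, expected disk count and active disk count.
mdstat_summary() {
    awk '
        # An array line looks like "md0 : active raid1 sdb1[1] sda1[0]".
        /^md[0-9]+ :/ { name = $1 }
        # The following status line contains the "[n/m]" field.
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            printf "%s total=%s active=%s\n", name, n[1], n[2]
            name = ""
        }
    '
}

# Only read the file when it exists (hosts without MD arrays may lack it).
if [ -r /proc/mdstat ]; then
    mdstat_summary < /proc/mdstat
fi
```

For a two-disk RAID 1 array named md0 with one missing disk, this prints a line such as "md0 total=2 active=1".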
