Operating System Metrics
The Bleemeo agent automatically monitors your operating system metrics (check that your operating system is a supported OS).
The monitoring covers:
- System resource utilization: CPU, memory, disk, …
- S.M.A.R.T. metrics to monitor the health of your disks
- Sensor temperatures
- Notifications for:
  - overutilization of resources: CPU, memory, disk space and swap. Default thresholds are 80% for warning status and 90% for critical status.
  - loss of connection to the Bleemeo Cloud platform
  - network errors
  - pending security updates
The agent gathers the following metrics:
| Metric | Description | OS | Alerting |
|---|---|---|---|
| agent_config_warning | Issues in the Bleemeo agent configuration files | | — |
| agent_gather_time | Time spent by the Bleemeo agent gathering metrics, in seconds | | — |
| agent_status | Status of the agent connection | | |
| cpu_idle | CPU idle in percent | | — |
| cpu_interrupt | CPU used by low-level drivers in percent | | — |
| cpu_nice | CPU used by niced applications in percent | | — |
| cpu_other | CPU not used by user or system in percent | | — |
| cpu_softirq | CPU used by drivers in percent | | — |
| cpu_steal | CPU used by the hypervisor in percent | | — |
| cpu_system | CPU used by system calls in percent | | — |
| cpu_used | CPU used in percent | | Above 80% warning, above 90% critical |
| cpu_used_status | Status of CPU usage | | — |
| cpu_user | CPU used by applications in percent | | — |
| cpu_wait | CPU idle while waiting for IO operations in percent | | — |
| cpu_guest_nice | CPU used by niced guest VMs in percent | | — |
| cpu_guest | CPU used by guest VMs in percent | | — |
| disk_free | Filesystem space available in bytes | | — |
| disk_inodes_free | Number of inodes available | | — |
| disk_inodes_total | Number of inodes for this filesystem | | — |
| disk_inodes_used | Number of used inodes | | — |
| disk_total | Filesystem size in bytes | | — |
| disk_used | Filesystem space used in bytes | | — |
| disk_used_perc | Filesystem space used in percent | | Above 80% warning, above 90% critical |
| disk_used_perc_status | Status of disk usage | | — |
| io_read_merged | Number of read operations that were merged before hitting disk | | — |
| io_write_merged | Number of write operations that were merged before hitting disk | | — |
| io_read_bytes | Disk read throughput in bytes per second | | — |
| io_read_utilization | Disk IO read utilization | | — |
| io_reads | Number of reads completed per second | | — |
| io_utilization | Disk IO utilization in percent | | — |
| io_write_bytes | Disk write throughput in bytes per second | | — |
| io_write_utilization | Disk IO write utilization | | — |
| io_writes | Number of writes completed per second | | — |
| mem_available | Memory available for applications in bytes | | — |
| mem_available_perc | Memory available for applications in percent | | — |
| mem_buffered | Memory used for raw block cache in bytes | | — |
| mem_cached | Memory used for file cache in bytes | | — |
| mem_free | Memory unused in bytes | | — |
| mem_total | Memory size in bytes | | — |
| mem_used | Memory used by applications in bytes | | — |
| mem_used_perc | Memory used by applications in percent | | Above 80% warning, above 90% critical |
| mem_used_perc_status | Status of memory usage | | — |
| net_bits_recv | Network traffic received in bits per second | | — |
| net_bits_sent | Network traffic sent in bits per second | | — |
| net_drop_in | Number of received packets dropped per second | | — |
| net_drop_out | Number of sent packets dropped per second | | — |
| net_err_in | Number of errors per second while receiving packets | | Above 0 is critical |
| net_err_in_status | Status of network errors for received packets | | — |
| net_err_out | Number of errors per second while sending packets | | Above 0 is critical |
| net_err_out_status | Status of network errors for sent packets | | — |
| net_packets_recv | Number of packets received per second | | — |
| net_packets_sent | Number of packets sent per second | | — |
| process_status_blocked | Number of processes blocked in a system call | | — |
| process_status_paging | Number of processes blocked by a paging operation | | — |
| process_status_running | Number of processes currently running | | — |
| process_status_sleeping | Number of idle processes | | — |
| process_status_stopped | Number of stopped processes | | — |
| process_status_zombies | Number of zombie processes | | — |
| process_total | Number of processes | | — |
| process_total_threads | Number of threads | | — |
| swap_free | Swap unused in bytes | | — |
| swap_in | Swap read throughput in bytes per second | | — |
| swap_out | Swap write throughput in bytes per second | | — |
| swap_total | Swap size in bytes | | — |
| swap_used | Swap used in bytes | | — |
| swap_used_perc | Swap used in percent | | Above 80% warning, above 90% critical |
| swap_used_perc_status | Status of swap usage | | — |
| system_load1 | System load over the last minute | | — |
| system_load5 | System load over the last 5 minutes | | — |
| system_load15 | System load over the last 15 minutes | | — |
| system_pending_updates | Number of pending system updates | | — |
| system_pending_security_updates | Number of pending system security updates | | Yes, after 24h |
| time_drift | Difference between local time and reference time in seconds | | 3 min warning, 5 min critical |
| uptime | Time elapsed since last boot in seconds | | — |
| users_logged | Number of users currently logged in to the system | | — |
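To make the default percentage thresholds concrete, here is a minimal Python sketch of how a value such as cpu_used or disk_used_perc maps to a status. It is only an illustration of the documented 80% / 90% defaults, not the agent's actual implementation; the function name `classify_usage` and the status strings are hypothetical.

```python
def classify_usage(percent: float, warning: float = 80.0, critical: float = 90.0) -> str:
    """Return "ok", "warning" or "critical" for a utilization percentage.

    The 80/90 defaults mirror the thresholds documented above; the function
    name and return values are hypothetical, chosen for this illustration.
    """
    # "Above 90%" and "above 80%" are read as strict comparisons.
    if percent > critical:
        return "critical"
    if percent > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for value in (42.0, 85.5, 97.0):
        print(f"{value:5.1f}% -> {classify_usage(value)}")
```

The comparisons are strict because the thresholds are stated as "above" a value, so exactly 80% is still treated as ok in this sketch.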
Power consumption metrics
On Linux and TrueNAS, the agent gathers the system power consumption using IPMI.
This requires your server to support IPMI and to report its power consumption through
IPMI, which should be the case for all server hardware. The agent uses the freeipmi
commands and falls back on the ipmitool command if they are not available, so one of them
needs to be accessible in the PATH. To configure where to find ipmi-dcmi (freeipmi)
or ipmitool, see the configuration page.
On TrueNAS, ipmitool is installed by default, so no additional setup is needed.
On Linux, if you don’t have ipmi-dcmi, you can install it with:

    # On Ubuntu/Debian
    apt install freeipmi
    # On Fedora, CentOS, Almalinux, RockyLinux or similar
    yum install freeipmi

If you installed Glouton as a package or with wget, you don’t need any additional
setup. Otherwise, you need to allow glouton to run the commands as root. In
/etc/sudoers.d/glouton, add the following line (you may need to change the location of
ipmi-dcmi and ipmi-sensors; use which ipmi-dcmi and which ipmi-sensors to find them):

    glouton ALL=(ALL) NOPASSWD: /usr/sbin/ipmi-dcmi, /usr/sbin/ipmi-sensors

To test whether your server supports IPMI, run the following commands:

    sudo ipmi-dcmi --get-system-power-statistics
    sudo ipmi-sensors -W discretereading --sdr-cache-recreate

The following metric will be gathered:
| Metric | Description |
|---|---|
| system_power_consumption | System power consumption in watts |
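As an illustration of what the ipmi-dcmi test command reports, the following Python sketch extracts a power reading from its text output. The sample output is an assumption about the typical format of ipmi-dcmi --get-system-power-statistics; the exact fields can vary with the freeipmi version and the hardware, and this is not necessarily how the agent itself parses it. The names `SAMPLE_OUTPUT` and `parse_current_power` are hypothetical.

```python
import re
from typing import Optional

# Assumed, illustrative output; real output depends on freeipmi and the BMC.
SAMPLE_OUTPUT = """\
Current Power                        : 120 Watts
Minimum Power over sampling duration : 90 watts
Maximum Power over sampling duration : 210 watts
Average Power over sampling duration : 118 watts
Power Measurement                    : Active
"""


def parse_current_power(output: str) -> Optional[float]:
    """Return the "Current Power" reading in watts, or None if it is absent."""
    match = re.search(
        r"^Current Power\s*:\s*(\d+(?:\.\d+)?)\s*watts",
        output,
        re.IGNORECASE | re.MULTILINE,
    )
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_current_power(SAMPLE_OUTPUT))
```

Returning None when the line is missing lets a caller distinguish "IPMI does not report power" from a reading of zero.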
SMART metrics
On Linux, the agent gathers metrics using smartctl if it’s accessible in the
PATH. To configure where to find smartctl and what devices to monitor, see the
configuration page.
If you don’t have smartctl, you can install it with:

    # On Ubuntu/Debian
    apt install smartmontools
    # On Fedora, CentOS, Almalinux, RockyLinux or similar
    yum install smartmontools

If you installed Glouton as a package or with wget, you don’t need any additional
setup. Otherwise, you need to allow glouton to run the command as root. In
/etc/sudoers.d/glouton, add the following line (you may need to change the
location of smartctl; use which smartctl to find it):

    glouton ALL=(ALL) NOPASSWD: /usr/sbin/smartctl

The following metrics will be gathered:
| Metric | Description |
|---|---|
| smart_device_health_status | Disk health status |
| smart_device_read_error_rate | Read error rate |
| smart_device_seek_error_rate | Seek error rate |
| smart_device_temp_c | Disk temperature in °C |
| smart_device_udma_crc_errors | Count of errors in data transfer via the interface cable |
| smart_device_media_wearout_indicator | Media wearout indicator |
| smart_device_percent_lifetime_remain | Lifetime remaining in percent |
| smart_device_wear_leveling_count | Wear leveling count |
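To check by hand what smartctl reports for a disk, recent smartmontools versions (7.0 and later) can print JSON with the --json option, for example sudo smartctl --json --all /dev/sda. The Python sketch below reads the health flag and the temperature from such a report. The sample document is a trimmed, assumed example, the helper name `summarize_smart_report` is hypothetical, and nothing here implies the agent uses the JSON output itself.

```python
import json
from typing import Any, Dict

# Trimmed, assumed example of a `smartctl --json --all` report.
SAMPLE_REPORT = """
{
  "device": {"name": "/dev/sda", "type": "sat"},
  "smart_status": {"passed": true},
  "temperature": {"current": 34}
}
"""


def summarize_smart_report(report_json: str) -> Dict[str, Any]:
    """Extract the overall health flag and temperature from a JSON report.

    Missing fields are returned as None, since not every device exposes them.
    """
    report = json.loads(report_json)
    return {
        "device": report.get("device", {}).get("name"),
        "healthy": report.get("smart_status", {}).get("passed"),
        "temp_c": report.get("temperature", {}).get("current"),
    }


if __name__ == "__main__":
    print(summarize_smart_report(SAMPLE_REPORT))
```

The health flag and temperature roughly correspond to what smart_device_health_status and smart_device_temp_c describe, though the agent's exact mapping is not documented here.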
Sensor temperatures
On Linux and Windows, the agent gathers the temperature of your hardware
components in the metric sensor_temperature. By default, only the CPU
temperature is allowed (with the label sensor="coretemp_package_id_*"); you
can add more metrics using metric filtering.
For example, to allow metrics from all sensors, you can add the following to your configuration:

    metric:
      allow_metrics:
        - sensor_temperature

Multi-disks arrays
On Linux systems, the agent can collect information about MD arrays
by using the /proc/mdstat file and the mdadm command.
If you don’t have mdadm, you can install it with:

    # On Ubuntu/Debian
    apt install mdadm
    # On Fedora, CentOS, Almalinux, RockyLinux or similar
    yum install mdadm

If you installed Glouton as a package or with wget, you don’t need any additional
setup. Otherwise, you need to allow glouton to run the command as root. In
/etc/sudoers.d/glouton, add the following line (you may need to change the
location of mdadm; use which mdadm to find it):

    glouton ALL=(ALL) NOPASSWD: /usr/sbin/mdadm --detail *

The following metrics will be gathered:
| Metric | Description |
|---|---|
| mdstat_health_status | Array health status |
| mdstat_disks_active_count | Count of active disks |
| mdstat_disks_down_count | Count of down disks |
| mdstat_disks_failed_count | Count of failing disks |
| mdstat_disks_spare_count | Count of spare disks |
| mdstat_disks_total_count | Total count of disks |
The mdstat_health_status metric description may give additional information
about the time remaining for the array recovery / resynchronization.
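For readers unfamiliar with /proc/mdstat, the Python sketch below shows where disk counts and the remaining recovery time appear in that file. It parses an assumed sample of a degraded RAID 1 array during recovery; it is a simplified illustration rather than the agent's parser, and the function name `parse_mdstat` and its result keys are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed sample: a two-disk RAID 1 array with one disk being rebuilt.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1046528 blocks super 1.2 [2/1] [U_]
      [==>..................]  recovery = 12.6% (132096/1046528) finish=127.5min speed=33440K/sec

unused devices: <none>
"""


def parse_mdstat(text: str) -> Dict[str, Dict[str, Optional[object]]]:
    """Return the state, disk counts and recovery estimate for each md array."""
    arrays: Dict[str, Dict[str, Optional[object]]] = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+) : (\w+)", line)
        if header:
            current = {"state": header.group(2), "total": None,
                       "active": None, "finish_min": None}
            arrays[header.group(1)] = current
            continue
        if current is None:
            continue
        # "[2/1]" means 2 disks expected in the array, 1 currently active.
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts:
            current["total"] = int(counts.group(1))
            current["active"] = int(counts.group(2))
        # "finish=127.5min" is the kernel's estimate of the time remaining.
        finish = re.search(r"finish=([\d.]+)min", line)
        if finish:
            current["finish_min"] = float(finish.group(1))
    return arrays


if __name__ == "__main__":
    print(parse_mdstat(SAMPLE_MDSTAT))
```

This sketch only recognizes array names of the form md followed by digits; arrays with other names would need a broader pattern.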