Operating System Metrics
The Bleemeo agent automatically monitors your operating system metrics (check that your operating system is a supported OS).
The monitoring covers:
- System resource utilization: CPU, memory, disk, etc.
- S.M.A.R.T. metrics to monitor the health of your disks
- Sensor temperatures
- Notifications for:
  - overutilization of resources: CPU, memory, disk space and swap. Default thresholds are 80% for warning status and 90% for critical status
  - loss of connection to the Bleemeo Cloud platform
  - network errors
  - pending security updates
The agent gathers the following metrics:
Metric | Description | OS Supported | Alerting |
---|---|---|---|
agent_config_warning | Bleemeo agent configuration files issues | | |
agent_gather_time | Time spent to gather metrics by Bleemeo agent in seconds | | |
agent_status | Status of Agent connection | | Yes |
cpu_idle | CPU idle in percent | | |
cpu_interrupt | CPU used by low-level driver in percent | | |
cpu_nice | CPU used by niced applications in percent | | |
cpu_other | CPU not used by user or system in percent | | |
cpu_softirq | CPU used by driver in percent | | |
cpu_steal | CPU used by hypervisor in percent | | |
cpu_system | CPU used by system call in percent | | |
cpu_used | CPU used in percent | | Default thresholds are: above 80% for warning status and above 90% for critical status. |
cpu_used_status | Status of CPU usage | | |
cpu_user | CPU used by applications in percent | | |
cpu_wait | CPU idle while waiting for IO operation in percent | | |
cpu_guest_nice | CPU used by niced guest VM in percent | | |
cpu_guest | CPU used by guest VM in percent | | |
disk_free | Filesystem space available in bytes | | |
disk_inodes_free | Number of inodes available | | |
disk_inodes_total | Number of inodes for this filesystem | | |
disk_inodes_used | Number of used inodes | | |
disk_total | Filesystem size in bytes | | |
disk_used | Filesystem space used in bytes | | |
disk_used_perc | Filesystem space used in percent | | Default thresholds are: above 80% for warning status and above 90% for critical status. |
disk_used_perc_status | Status of disk usage | | |
io_read_merged | Number of read operations that were merged before hitting disk | | |
io_write_merged | Number of write operations that were merged before hitting disk | | |
io_read_bytes | Disk read throughput in bytes per second | | |
io_read_utilization | Disk IO read utilization | | |
io_reads | Number of reads completed per second | | |
io_utilization | Disk IO utilization in percent | | |
io_write_bytes | Disk write throughput in bytes per second | | |
io_write_utilization | Disk IO write utilization | | |
io_writes | Number of writes completed per second | | |
mem_available | Memory available for application in bytes | | |
mem_available_perc | Memory available for application in percent | | |
mem_buffered | Memory used for raw block cache in bytes | | |
mem_cached | Memory used for file cache in bytes | | |
mem_free | Memory unused in bytes | | |
mem_total | Memory size in bytes | | |
mem_used | Memory used by applications in bytes | | |
mem_used_perc | Memory used by applications in percent | | Default thresholds are: above 80% for warning status and above 90% for critical status. |
mem_used_perc_status | Status of memory usage | | |
net_bits_recv | Network traffic received in bits per second | | |
net_bits_sent | Network traffic sent in bits per second | | |
net_drop_in | Number of received packets dropped per second | | |
net_drop_out | Number of sent packets dropped per second | | |
net_err_in | Number of errors per second while receiving packet | | Default thresholds are: above 0 for critical status. |
net_err_in_status | Status of network errors for received packets | | |
net_err_out | Number of errors per second while sending packet | | Default thresholds are: above 0 for critical status. |
net_err_out_status | Status of network errors for sent packets | | |
net_packets_recv | Number of packets received per second | | |
net_packets_sent | Number of packets sent per second | | |
process_status_blocked | Number of processes blocked in system call | | |
process_status_paging | Number of processes blocked by paging operation | | |
process_status_running | Number of processes currently running | | |
process_status_sleeping | Number of idle processes | | |
process_status_stopped | Number of stopped processes | | |
process_status_zombies | Number of zombie processes | | |
process_total | Number of processes | | |
process_total_threads | Number of threads | | |
swap_free | Swap unused in bytes | | |
swap_in | Swap read throughput in bytes per second | | |
swap_out | Swap write throughput in bytes per second | | |
swap_total | Swap size in bytes | | |
swap_used | Swap used in bytes | | |
swap_used_perc | Swap used in percent | | Default thresholds are: above 80% for warning status and above 90% for critical status. |
swap_used_perc_status | Status of swap usage | | |
system_load1 | System load over last minute | | |
system_load5 | System load over last 5 minutes | | |
system_load15 | System load over last 15 minutes | | |
system_pending_updates | Number of pending system updates | | |
system_pending_security_updates | Number of pending system security updates | | Yes, after 24h |
time_drift | Difference between local time and reference time in seconds | | Default thresholds are: 3 minutes for warning status and 5 minutes for critical status. |
uptime | Time elapsed since last boot in seconds | | |
users_logged | Number of users currently logged in the system | | |
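The percentage thresholds in the table can be read as a simple comparison. The sketch below is only an illustration of that rule, not the agent's implementation: the function name `status_for_percent` and the `ok` label for the non-alerting case are assumptions made for this example.

```shell
# Hypothetical helper, not part of the Bleemeo agent: maps a percentage to a
# status using the documented defaults (warning above 80, critical above 90).
# Optional second and third arguments override the two thresholds.
status_for_percent() {
    value=$1
    warning=${2:-80}
    critical=${3:-90}
    # awk is used because sh arithmetic cannot compare fractional values
    awk -v v="$value" -v w="$warning" -v c="$critical" 'BEGIN {
        if (v > c)      print "critical"
        else if (v > w) print "warning"
        else            print "ok"
    }'
}

status_for_percent 75.2   # ok
status_for_percent 85     # warning
status_for_percent 90     # warning (90 is not above 90)
status_for_percent 93.4   # critical
```

Because the comparison is strict, a value of exactly 80% or 90% stays in the lower status, matching the "above" wording in the table.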
Power consumption metrics
On Linux and TrueNAS, the agent gathers the system power consumption using IPMI.
This requires your server to support IPMI and to report its power consumption through
IPMI, which should be the case for all server hardware. The agent uses the freeipmi
commands and falls back to the `ipmitool` command if they are not available, so one of
these needs to be accessible in the PATH. To configure where to find `ipmi-dcmi`
(freeipmi) or `ipmitool`, see the configuration page.
On TrueNAS, `ipmitool` is installed by default, so no additional setup is needed.
On Linux, if you don't have `ipmi-dcmi`, you can install it with:
# On Ubuntu/Debian
apt install freeipmi
# On Fedora, CentOS, Almalinux, RockyLinux or similar
yum install freeipmi
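To check which IPMI tool is reachable through the PATH, a lookup in the same order of preference (freeipmi first, then `ipmitool`) can be sketched as follows. The function name `find_ipmi_tool` is hypothetical, and this only mirrors the fallback order described above; it is not how the agent itself performs the lookup.

```shell
# Sketch: print the path of the first IPMI tool found in PATH,
# preferring ipmi-dcmi (freeipmi) over ipmitool.
find_ipmi_tool() {
    for tool in ipmi-dcmi ipmitool; do
        if command -v "$tool" >/dev/null 2>&1; then
            command -v "$tool"
            return 0
        fi
    done
    echo "no IPMI tool found in PATH" >&2
    return 1
}

# Show the result without aborting when neither tool is installed
find_ipmi_tool || true
```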
If you installed Glouton as a package or with `wget`, you don't need any additional
setup. Otherwise, you need to allow `glouton` to run the commands as `root`. In
`/etc/sudoers.d/glouton`, add the following line (you may need to change the locations of
`ipmi-dcmi` and `ipmi-sensors`; use `which ipmi-dcmi` and `which ipmi-sensors` to find them):
glouton ALL=(ALL) NOPASSWD: /usr/sbin/ipmi-dcmi, /usr/sbin/ipmi-sensors
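Since the command locations differ between distributions, the line can also be generated from whatever is found in the PATH. This is a minimal sketch under the assumption that the commands are already installed; `glouton_sudoers_line` is a hypothetical helper name, and it only prints the line for review without writing to any file.

```shell
# Hypothetical helper: print a sudoers line for the glouton user, using the
# absolute paths of the given commands as resolved from the current PATH.
glouton_sudoers_line() {
    paths=""
    for cmd in "$@"; do
        path=$(command -v "$cmd") || { echo "$cmd not found in PATH" >&2; return 1; }
        # Join the resolved paths with ", " as sudoers expects
        paths="${paths:+$paths, }$path"
    done
    printf 'glouton ALL=(ALL) NOPASSWD: %s\n' "$paths"
}

# Print the line for review; nothing is written automatically
glouton_sudoers_line ipmi-dcmi ipmi-sensors || true
```

Editing the file with `visudo -f /etc/sudoers.d/glouton` is a reasonable precaution, since `visudo` checks the syntax before saving.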
To test whether your server supports IPMI, run the following commands:
sudo ipmi-dcmi --get-system-power-statistics
sudo ipmi-sensors -W discretereading --sdr-cache-recreate
The following metric will be gathered:
Metric | Description |
---|---|
system_power_consumption | System power consumption in watts |
SMART metrics
On Linux, the agent gathers metrics using `smartctl` if it's accessible in the
PATH. To configure where to find `smartctl` and which devices to monitor, see the
configuration page.
If you don't have `smartctl`, you can install it with:
# On Ubuntu/Debian
apt install smartmontools
# On Fedora, CentOS, Almalinux, RockyLinux or similar
yum install smartmontools
If you installed Glouton as a package or with `wget`, you don't need any additional
setup. Otherwise, you need to allow `glouton` to run the command as `root`. In
`/etc/sudoers.d/glouton`, add the following line (you may need to change the
location of `smartctl`; use `which smartctl` to find it):
glouton ALL=(ALL) NOPASSWD: /usr/sbin/smartctl
The following metrics will be gathered:
Metric | Description |
---|---|
smart_device_health_status | Disk health status |
smart_device_read_error_rate | Read error rate |
smart_device_seek_error_rate | Seek error rate |
smart_device_temp_c | Disk temperature in °C |
smart_device_udma_crc_errors | Count of errors in data transfer via the interface cable |
smart_device_media_wearout_indicator | Media wearout indicator |
smart_device_percent_lifetime_remain | Lifetime remaining in percent |
smart_device_wear_leveling_count | Wear leveling count |
Note that some metrics may not be available depending on your disk. These metrics report the raw values of SMART attributes; how to interpret them may depend on your drive manufacturer.
Sensor temperatures
On Linux and Windows, the agent gathers the temperature of your hardware
components in the metric `sensor_temperature`. By default, only the CPU
temperature is allowed (with the label `sensor="coretemp_package_id_*"`); you
can add more metrics using metric filtering.
For example, to allow metrics from all sensors, you can add the following to your configuration:
metric:
  allow_metrics:
    - sensor_temperature
Multi-disk arrays
On Linux systems, the agent can collect information about MD arrays
using the `/proc/mdstat` file and the `mdadm` command.
If you don't have `mdadm`, you can install it with:
# On Ubuntu/Debian
apt install mdadm
# On Fedora, CentOS, Almalinux, RockyLinux or similar
yum install mdadm
If you installed Glouton as a package or with `wget`, you don't need any additional
setup. Otherwise, you need to allow `glouton` to run the command as `root`. In
`/etc/sudoers.d/glouton`, add the following line (you may need to change the
location of `mdadm`; use `which mdadm` to find it):
glouton ALL=(ALL) NOPASSWD: /usr/sbin/mdadm --detail *
The following metrics will be gathered:
Metric | Description |
---|---|
mdstat_health_status | Array health status |
mdstat_disks_active_count | Count of active disks |
mdstat_disks_down_count | Count of down disks |
mdstat_disks_failed_count | Count of failing disks |
mdstat_disks_spare_count | Count of spare disks |
mdstat_disks_total_count | Total count of disks |
The `mdstat_health_status` metric description may give additional information
about the time remaining for the array recovery or resynchronization.
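To get a feel for what `/proc/mdstat` exposes, the following sketch summarizes each array from that file. It is an illustration only, not the agent's parser: the function name `mdstat_summary` and the output layout are assumptions, and it relies on the usual `[total/active]` counter on the status line and the `(F)` marker on failed members.

```shell
# Hypothetical sketch: summarize arrays from /proc/mdstat-formatted input.
# Prints one line per array: name, state, expected disks, active disks, failed members.
mdstat_summary() {
    awk '
        # Array line, e.g. "md0 : active raid1 sdb1[1](F) sda1[0]"
        /^md[^ ]* :/ {
            name = $1; state = $3; failed = 0
            # Members flagged (F) in the device list are failed
            for (i = 4; i <= NF; i++) if ($i ~ /\(F\)/) failed++
            next
        }
        # Status line carrying "[total/active]", e.g. "1048512 blocks [2/1] [U_]"
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            printf "%s %s total=%d active=%d failed=%d\n", name, state, n[1], n[2], failed
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Only run against the live file when it exists (Linux with the md driver loaded)
if [ -r /proc/mdstat ]; then
    mdstat_summary
fi
```

Arrays without a `[total/active]` counter, such as RAID 0, produce no output line in this sketch.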