Services Monitoring Overview

Free

Starter

Professional

The Bleemeo agent can discover services running on your system and automatically monitor metrics specific to them. For example, with the Apache HTTP server, the number of requests served is monitored automatically. For each detected service, a tag with the service name is created, which lets you filter your agents by the services running on them.

If you run a service that is not listed on this page, you can define a custom check or a custom metric.

If you want to disable metrics for a service, you can ignore that service.

Common Features

Service Status

The agent checks TCP sockets for each service. By default, a simple TCP connection is used to test the service, but some services support a specific check. See the service details below for the supported specific checks. Those checks are executed every minute. They may be run earlier for TCP services: the Bleemeo agent keeps a connection open with the service and if that connection is broken the check is executed immediately.

The current status of the service can be viewed in the Status Dashboard and you can configure a notification to be alerted when the status changes.

The history of the status is also stored in a metric named service_status. One metric is created per service, with two labels to identify the service:

service: the kind of the service, like “apache”, “nginx”…
service_instance: the container name. This label is absent when the service isn’t running in a container.

The value of this metric is:

0 when the check passed successfully
1 when the check passed with a warning. For example, an Apache server responded with a 404 page
2 when the check detected an issue with the service
3 when the check doesn’t know the status of the service. This happens when the check itself failed, usually due to a timeout

Overridable auto-discovery settings

It’s possible to override the auto-discovery parameters of any service using configuration files, Docker labels or Kubernetes annotations.

Using configuration file

To use Bleemeo agent configuration files, add entries like the following to /etc/glouton/conf.d/90-service-override.conf:

service:
  - type: "apache"
    ignore_ports:
      - 8000
      - 9000
  - type: "mysql"
    instance: "name_of_a_container"
    username: root
    password: root

The service key contains a list of services whose settings you want to override. Add one entry per service. Each service is identified by the pair “type” and “instance”. The service “type” is not a customizable name: unless you are creating a custom check, the value should match one of the supported service types (“apache”, “nginx”, “postgresql”…). See below for the full list of supported services. The “instance” is the container name for a containerized service and can be omitted when the service runs outside a container.

All other values (ignore_ports, username and password in the example above) are overridable settings, which are described below.

Using Docker labels or Kubernetes annotations

If you are using Docker or Kubernetes, you can use Docker labels or Kubernetes annotations instead of the Bleemeo agent configuration file. Any overridable setting can be added using the name glouton.SETTING.

For instance, to ignore ports 8000 and 9000 on a Docker container, use:

docker run --label glouton.ignore_ports="8000,9000" [...]

The same thing for a Kubernetes deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: "my-application"
spec:
  template:
    metadata:
      # Create the annotations on the pod, not on the deployment
      annotations:
        glouton.ignore_ports: "8000,9000"
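
The same label can also be declared in a Docker Compose file, since Compose labels are ordinary Docker labels. The fragment below is only a sketch: Docker Compose is not mentioned elsewhere on this page, and the service name `web` and the image are hypothetical.

```yaml
# Hypothetical Compose file; only the glouton.ignore_ports label comes from this page
services:
  web:
    image: nginx
    labels:
      # Same value format as the docker run example above
      glouton.ignore_ports: "8000,9000"
```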

Overridable settings

Sample of a service with all settings overridden:

# This is only a sample listing all possible values. It doesn't make sense to set all of
# them on a single service. Only add the settings you need to override.

service:
  - type: apache
    instance: my_container
    address: 127.0.0.1
    port: 1234
    ignore_ports:
      - 8000
      - 9000
    tags:
      - mytag1
      - mytag2
    interval: 60
    http_path: /my_custom_path
    http_status_code: 200
    http_host: example.com
    check_type: nagios
    match_process: command-to-check
    check_command: command-to-run
    nagios_nrpe_name: nagios_nrpe_name
    username: username
    password: secret
    metrics_unix_socket: /var/run/mysqld/mysqld.sock
    stats_url: http://localhost:9000/status
    stats_port: 9000
    stats_protocol: http
    detailed_items:
      - table1
      - table2
    included_items:
      - job1
    excluded_items:
      - job2
    jmx_port: 3333
    jmx_username: monitorRole
    jmx_password: secret
    jmx_metrics: []
    ssl: false
    ssl_insecure: false
    starttls: false
    ca_file: /etc/ssl/certs/ca-certificates.crt
    cert_file: /etc/ssl/certs/client.crt
    key_file: /etc/ssl/private/client.key
    log_files: []
    log_format: custom_apache_format
    log_filter: custom_apache_filter

address

This is the IP address on which the service is listening.

port

This is the TCP port on which the service is listening.

ignore_ports

This is a list of ports to exclude from auto-discovery. Auto-discovery may find that a service is listening on multiple ports, in which case the service is monitored on every port. This can be wrong: for example, a Docker nginx image may expose two ports in its Dockerfile (80 and 443) while nginx only listens on 80. In that case, you will want to ignore port 443.
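
The nginx scenario described above could be expressed as the following override. This is a minimal sketch: the container name `my_nginx_container` is hypothetical and must match your own container.

```yaml
service:
  - type: "nginx"
    # Hypothetical container name
    instance: "my_nginx_container"
    ignore_ports:
      # Exposed in the Dockerfile, but nginx does not listen on it
      - 443
```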

interval

This is the interval, in seconds, between service status checks. It cannot be smaller than 60 seconds, which is the default, but it can be increased if your service check is expensive.
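
As an illustration, the following sketch runs the check every five minutes instead of every minute. The value 300 is an arbitrary example, not a recommendation.

```yaml
service:
  - type: "apache"
    # Check every 300 seconds instead of the default 60
    interval: 300
```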

http_path, http_status_code, http_host

For services that expose an HTTP endpoint, the agent by default uses the service’s address and port to query http://address:port/ and expects an HTTP 200 status code. The service’s address is an IP address, likely “127.0.0.1” when not using a container, which may not match your service configuration, especially when using virtual hosts.

http_host specifies the HTTP Host header sent in the request. This is useful when the HTTP server is configured with virtual hosts and doesn’t reply to requests on something like http://127.0.0.1.

http_path lets you check a sub-path, like “/ready”, for services that provide a page dedicated to health checking.

http_status_code lets the check expect a different HTTP status code.

This is supported by the following services:

Apache HTTP
InfluxDB
Nginx
Squid
custom HTTP check

For example, with the following service override:

service:
  - type: apache
    address: 127.0.0.1
    port: 8080
    http_path: /ready
    http_status_code: 204
    http_host: example.com

The Bleemeo agent will connect to 127.0.0.1 on port 8080, send the HTTP request http://example.com/ready and expect a 204 status code.

check_type, match_process, check_command

These settings are used to configure a custom check. See Custom checks for details.

nagios_nrpe_name

If the NRPE server is enabled in the agent configuration (see nrpe.enable), you can expose any service check to your Nagios server.

Example:

service:
  - type: "apache"
    nagios_nrpe_name: check_apache

username, password

Some services require authentication for metrics collection and/or the service check. These settings configure the credentials to use.

This is supported by the following services:

Jenkins
MySQL / MariaDB
OpenLDAP
PostgreSQL
RabbitMQ
UPSD
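
For example, credentials for a PostgreSQL service might be provided as follows. The user name and password are placeholders; replace them with an account that exists on your server.

```yaml
service:
  - type: "postgresql"
    # Placeholder credentials
    username: "monitoring"
    password: "secret"
```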

stats_url

Some services use a different port for the service itself and for exposing monitoring information. This setting specifies the URL where the service exposes its metrics.

This is supported by the following services:

HAProxy
Jenkins
PHP-FPM

For example:

service:
  - type: haproxy
    port: 80
    stats_url: "http://localhost:8080/statistics"

The agent will check that the HAProxy service is listening on port 80, but it will use the URL on port 8080 to collect HAProxy metrics.

stats_port

Some services use a different port for the service itself and for exposing monitoring information. This setting specifies the port on which the service exposes its metrics.

This setting is similar to stats_url but only specifies the port rather than the full URL.

This is supported by the following services:

NATS
RabbitMQ
uWSGI
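
As a sketch for RabbitMQ, the override below checks the broker on its default AMQP port and collects metrics on a separate port. It assumes that the service type is spelled `rabbitmq` and that metrics are served by the management plugin on its default port 15672; adjust both ports to your setup.

```yaml
service:
  # Assumed spelling of the service type
  - type: "rabbitmq"
    # Default AMQP port, used for the service check
    port: 5672
    # Assumed: management plugin on its default port, used for metrics
    stats_port: 15672
```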

stats_protocol

Some services can use multiple protocols to expose monitoring information. This setting specifies which protocol to use. It works together with stats_port.

This is supported by the following services:

uWSGI

For example:

service:
  - type: "uwsgi"
    address: "127.0.0.1"
    port: 8080
    stats_port: 1717
    # This assumes uWSGI uses the --stats-http option and exposes its stats on port 1717
    stats_protocol: "http"

metrics_unix_socket

Some services can listen on a Unix socket rather than a TCP socket. This setting configures the path to the Unix socket, and the agent will try to use that socket to collect the service’s metrics.

This is supported by the following services:

MySQL / MariaDB
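
A minimal sketch for MySQL, reusing the socket path and the credentials shown in the samples earlier on this page. The actual socket path depends on your distribution and MySQL configuration.

```yaml
service:
  - type: "mysql"
    # Socket path taken from the full sample above; check your own MySQL configuration
    metrics_unix_socket: "/var/run/mysqld/mysqld.sock"
    username: "root"
    password: "root"
```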

detailed_items

Some services expose metrics per item (per table, per database…). This setting configures the items for which metrics should be collected. Those services also expose global metrics, which are enabled by default.

This is supported by the following services:

PostgreSQL: for detailed metrics on databases
Cassandra: for detailed metrics on tables
Kafka: for detailed metrics on topics
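
For instance, detailed metrics for two PostgreSQL databases could be requested as follows. The database names `shop` and `billing` are hypothetical.

```yaml
service:
  - type: "postgresql"
    detailed_items:
      # Hypothetical database names
      - "shop"
      - "billing"
```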

included_items, excluded_items

Some services expose metrics per item (per job…). These settings configure the items for which metrics should be collected. Unlike detailed_items, they are used on services that don’t expose a global metric aggregating the per-item metrics.

This is supported by the following services:

Jenkins: for metrics per job
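
A sketch for Jenkins is shown below. It assumes the service type is spelled `jenkins`, and the job names are hypothetical. This page does not describe how the two lists interact when both are set, so treat the combination as something to verify on your own agent.

```yaml
service:
  # Assumed spelling of the service type
  - type: "jenkins"
    included_items:
      # Hypothetical job name to collect metrics for
      - "build-main"
    excluded_items:
      # Hypothetical job name to leave out
      - "nightly-cleanup"
```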

jmx_port, jmx_username, jmx_password, jmx_metrics

For a Java service, configure how to collect metrics using JMX. See Java metrics for details.

ssl, ssl_insecure, starttls, ca_file, cert_file, key_file

Some services can expose both a plain-text and a TLS version of the service. These settings let you enable TLS or stay in plain text (default).

This is supported by the following services:

OpenLDAP

For example:

service:
  - type: "openldap"
    address: "127.0.0.1"
    port: 389
    starttls: true
    # By default, SSL certificates are verified. This means that your OpenLDAP service needs either
    # to use a certificate signed by a trusted authority or to have ca_file point to your self-signed CA.
    ssl_insecure: false
    ca_file: "/path/to/your-self-signed-ca.crt"

log_files, log_format, log_filter

To customize the handling of log files related to a service, two approaches are possible:

Specifying a log format and/or a log filter that will apply to all related log files:

service:
  - type: "my-app"
    ...
    log_format: "my-app-format"
    log_filter: "my-app-filter"

Specifying the format/filter for each file (or file pattern):

service:
  - type: "my-app"
    ...
    log_files:
      - file_path: "/var/log/my-app/access.log"
        log_format: "my-app-access-format"
        log_filter: "my-app-access-filter"
      - file_path: "/var/log/my-app/error.*.log"
        log_format: "my-app-error-format"
        log_filter: "my-app-error-filter"

When using the second approach, if an item of the log_files list has the log_format or log_filter property left blank, the value of the service’s log_format and log_filter will be used (if defined).
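
The fallback rule can be illustrated by combining both approaches. This is a sketch that assumes omitting a property counts as leaving it blank; the format and filter names are the placeholder names used in the examples above.

```yaml
service:
  - type: "my-app"
    # Service-level values, used as fallback for the files below
    log_format: "my-app-format"
    log_filter: "my-app-filter"
    log_files:
      # Neither property set: both service-level values should apply
      - file_path: "/var/log/my-app/access.log"
      # Only the format is set: the filter should fall back to "my-app-filter"
      - file_path: "/var/log/my-app/error.*.log"
        log_format: "my-app-error-format"
```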

Services with default log support

The services for which Glouton comes with default log configuration are: