Services Monitoring
The Bleemeo agent can discover services running on your system and automatically monitor service-specific metrics. For example, with the Apache HTTP server, the number of requests served is monitored automatically. For each detected service, a tag with the service name is created, which lets you filter your agents by the services running on them.
If you have any service not listed on this page, you can define a custom check or define a custom metric.
If you want to disable metrics for a service, you can ignore some services.
Common Features
Service Status
The agent checks TCP sockets for each service. By default, a simple TCP connection is used to test the service, but some services support a specific check. See the service details below for the supported specific checks. Those checks are executed every minute. They may be run earlier for TCP services: the Bleemeo agent keeps a connection open with the service and if that connection is broken the check is executed immediately.
The current status of the service can be viewed in the Status Dashboard and you can configure a notification to be alerted when the status changes.
The history of the status is also stored in a metric named service_status. One metric is created per service, with two labels to identify it:
- service: the kind of service, such as “apache”, “nginx”…
- service_instance: the container name. This label is absent when the service isn’t running in a container.
The value of this metric is:
- 0 when the check passed successfully
- 1 when the check passed with a warning. For example an Apache server responded with a 404 page
- 2 when the check detected an issue with the service
- 3 when the check doesn’t know the status of the service. This happens when the check itself failed, usually due to a timeout
Overridable auto-discovery settings
You can override the auto-discovery parameters of any service using configuration files, Docker labels or Kubernetes annotations.
Using configuration file
To use Bleemeo agent configuration files, add entries like the following to /etc/glouton/conf.d/90-service-override.conf:

    service:
      - type: "apache"
        ignore_ports:
          - 8000
          - 9000
      - type: "mysql"
        instance: "name_of_a_container"
        username: root
        password: root

The service key contains the list of services whose settings are overridden. Add one entry per service.
Each service is identified by the pair “type” and “instance”. The service “type” is not a free-form name: unless you are creating a custom check, the value must match one of the supported service types (“apache”, “nginx”, “postgresql”…). See below for the full list of supported services.
The service instance can be omitted when the service runs outside a container. For a containerized service, it is the container name.
All other values (ignore_ports, username and password in the example above) are overridable settings, which are described below.
Using Docker labels or Kubernetes annotations
If you are using Docker or Kubernetes, you can use Docker labels or Kubernetes annotations instead of the Bleemeo agent configuration file. Any overridable setting can be set using the name glouton.SETTING.
For instance, to ignore ports 8000 and 9000 on a Docker container, use:

    docker run --label glouton.ignore_ports="8000,9000" [...]

The same thing for a Kubernetes deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: "my-application"
    spec:
      template:
        metadata:
          # Create the annotations on the pod, not on the deployment
          annotations:
            glouton.ignore_ports: "8000,9000"

Overridable settings
Sample of a service with all settings overridden:

    # This is only a sample to list all possible values. It doesn't make sense to have all
    # settings on a single service. Only add the settings you need to override.
    service:
      - type: apache
        instance: my_container
        address: 127.0.0.1
        port: 1234
        ignore_ports:
          - 8000
          - 9000
        tags:
          - mytag1
          - mytag2
        interval: 60
        http_path: /my_custom_path
        http_status_code: 200
        http_host: example.com
        check_type: nagios
        match_process: command-to-check
        check_command: command-to-run
        nagios_nrpe_name: nagios_nrpe_name
        username: username
        password: secret
        metrics_unix_socket: /var/run/mysqld/mysqld.sock
        stats_url: http://localhost:9000/status
        stats_port: 9000
        stats_protocol: http
        detailed_items:
          - table1
          - table2
        included_items:
          - job1
        excluded_items:
          - job2
        jmx_port: 3333
        jmx_username: monitorRole
        jmx_password: secret
        jmx_metrics: []
        ssl: false
        ssl_insecure: false
        starttls: false
        ca_file: /etc/ssl/certs/ca-certificates.crt
        cert_file: /etc/ssl/certs/client.crt
        key_file: /etc/ssl/private/client.key
        log_files: []
        log_format: custom_apache_format
        log_filter: custom_apache_filter

address
This is the IP address on which the service is listening.
port
This is the TCP port on which the service is listening.
ignore_ports
This is a list of ports to exclude from auto-discovery. Auto-discovery may find that a service is listening on multiple ports, in which case the service is monitored on every port. This can be wrong: for example, a Docker nginx image may expose two ports in its Dockerfile (80 and 443) but only listen on 80. In that case you will want to ignore port 443.
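The nginx case described above could be written roughly as follows. This is only a sketch: the container name my_nginx_container is a hypothetical placeholder, not a name taken from this documentation.

```yaml
service:
  - type: "nginx"
    # Hypothetical container name; use the name of your own container.
    instance: "my_nginx_container"
    # Port 443 is exposed by the image but nothing listens on it.
    ignore_ports:
      - 443
```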
tags
This is a list of tag names to associate with the service.
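As an illustration, tags could be attached to an Apache service like this. The tag names below are made up for the example.

```yaml
service:
  - type: "apache"
    tags:
      # Hypothetical tag names; pick names that suit your environment.
      - "frontend"
      - "production"
```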
interval
This is the interval in seconds between service status checks. It cannot be lower than 60 seconds, which is the default, but it can be increased if your service check is expensive.
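For instance, to run the check every five minutes instead of every minute, an override along these lines should work. The choice of PostgreSQL and of 300 seconds is arbitrary and only serves as an example.

```yaml
service:
  - type: "postgresql"
    # Check every 300 seconds rather than the default 60.
    interval: 300
```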
http_path, http_status_code, http_host
For services that expose an HTTP endpoint, the agent by default uses the service’s address and port to query http://address:port/ and expects an HTTP 200 status code. The service’s address is an IP address, likely “127.0.0.1” when not using a container, which may not match your service configuration, especially when using virtual hosts.
http_host sets the HTTP Host header sent in the request. This is useful when the HTTP server is configured with virtual hosts and doesn’t reply to requests on something like http://127.0.0.1.
http_path lets the check target a sub-path, like “/ready”, for services that provide a page dedicated to health checking.
http_status_code lets the check expect a different HTTP status code.
This is supported by the following services:
- Apache HTTP
- InfluxDB
- Nginx
- Squid
- custom HTTP check
For example, with the following service override:

    service:
      - type: apache
        address: 127.0.0.1
        port: 8080
        http_path: /ready
        http_status_code: 204
        http_host: example.com

The Bleemeo agent will connect to 127.0.0.1 on port 8080, send the HTTP request http://example.com/ready and expect a 204 status code.
check_type, match_process, check_command
These settings are used to configure a custom check, see Custom checks for details.
nagios_nrpe_name
If the NRPE server is enabled in the agent configuration (see nrpe.enable), you can expose any service check to your Nagios server.
Example:

    service:
      - type: "apache"
        nagios_nrpe_name: check_apache

username, password
Some services require authentication for metrics collection and/or the service check. These settings configure the credentials to use.
This is supported by the following services:
- Jenkins
- MySQL / MariaDB
- OpenLDAP
- PostgreSQL
- RabbitMQ
- UPSD
stats_url
Some services use a different port for the service itself and for exposing monitoring information. This setting specifies where the service exposes its metrics.
This is supported by the following services:
- HAProxy
- Jenkins
- PHP-FPM
For example:

    service:
      - type: haproxy
        port: 80
        stats_url: "http://localhost:8080/statistics"

The agent will check that the HAProxy service is listening on port 80, but it will use the URL on port 8080 to collect HAProxy metrics.
stats_port
Some services use a different port for the service itself and for exposing monitoring information. This setting specifies the port on which the service exposes its metrics.
This setting is similar to stats_url but only specifies the port rather than the full URL.
This is supported by the following services:
- NATS
- RabbitMQ
- uWSGI
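A possible override for NATS is sketched below. It assumes that the service type is named nats and that the server uses its usual client port 4222 and HTTP monitoring port 8222; adjust both to your deployment.

```yaml
service:
  - type: "nats"       # assumed type name for the NATS service
    port: 4222         # assumed client port
    stats_port: 8222   # assumed monitoring port
```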
stats_protocol
Some services can use multiple protocols to expose monitoring information. This setting specifies which protocol to use. It works together with stats_port.
This is supported by the following services:
- uWSGI
For example:

    service:
      - type: "uwsgi"
        address: "127.0.0.1"
        port: 8080
        stats_port: 1717
        # This assumes uWSGI uses the --stats-http option and exposes stats on port 1717
        stats_protocol: "http"

metrics_unix_socket
Some services can listen on a Unix socket rather than a TCP socket. This setting configures the path to the Unix socket, and the agent will try to use that socket to collect service metrics.
This is supported by the following services:
- MySQL / MariaDB
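For example, a MySQL override using a Unix socket might look like the following. The socket path is the one used in the sample above and may differ on your system; the credentials are placeholders.

```yaml
service:
  - type: "mysql"
    # Path taken from the full sample above; check your own MySQL configuration.
    metrics_unix_socket: /var/run/mysqld/mysqld.sock
    # Placeholder credentials.
    username: root
    password: root
```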
detailed_items
Some services expose metrics per item (per table, per database…). This setting configures the items for which metrics should be collected. Those services also expose global metrics, which are enabled by default.
This is supported by the following services:
- PostgreSQL: for detailed metrics on databases
- Cassandra: for detailed metrics on tables
- Kafka: for detailed metrics on topics
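As a sketch, detailed metrics for two PostgreSQL databases could be requested as follows. The database names are hypothetical.

```yaml
service:
  - type: "postgresql"
    detailed_items:
      # Hypothetical database names.
      - "orders_db"
      - "billing_db"
```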
included_items, excluded_items
Some services expose metrics per item (per job…). These settings configure the items for which metrics should be collected. Unlike detailed_items, they are used on services that don’t expose a global metric aggregating the per-item metrics.
This is supported by the following services:
- Jenkins: for metrics per job
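A minimal illustration for Jenkins is shown below. It assumes that the service type is named jenkins; the job name is invented for the example.

```yaml
service:
  - type: "jenkins"      # assumed type name for the Jenkins service
    included_items:
      # Hypothetical job name.
      - "build-main"
```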
jmx_port, jmx_username, jmx_password, jmx_metrics
For a Java service, configure how to collect metrics using JMX. See Java metrics for details.
ssl, ssl_insecure, starttls, ca_file, cert_file, key_file
Some services can expose a plain-text or a TLS version of the service. These settings let you enable TLS or stay in plain text (the default).
This is supported by the following services:
- OpenLDAP
For example:

    service:
      - type: "openldap"
        address: "127.0.0.1"
        port: 389
        starttls: true
        # By default, SSL certificates are verified. This means that your OpenLDAP service needs to
        # use a certificate signed by a trusted authority, or you must provide the ca_file of your self-signed CA.
        ssl_insecure: false
        ca_file: "/path/to/your-self-signed-ca.crt"

log_files, log_format, log_filter
To customize the handling of log files related to a service, two approaches are possible:
- Specifying a log format and/or a log filter that will apply to all related log files:

      service:
        - type: "my-app"
          ...
          log_format: "my-app-format"
          log_filter: "my-app-filter"

- Specifying the format/filter for each file (or file pattern):

      service:
        - type: "my-app"
          ...
          log_files:
            - file_path: "/var/log/my-app/access.log"
              log_format: "my-app-access-format"
              log_filter: "my-app-access-filter"
            - file_path: "/var/log/my-app/error.*.log"
              log_format: "my-app-error-format"
              log_filter: "my-app-error-filter"

When using the second approach, if an item of the log_files list has the log_format or log_filter property left blank, the value of the service’s log_format and log_filter will be used (if defined).
Services with default log support
Glouton comes with a default log configuration for the following services: