:title: Monitoring
Monitoring
==========
Statsd reporting
----------------
Zuul supports the statsd protocol. When enabled and configured (see below),
the Zuul scheduler emits raw metrics to a statsd receiver, which in turn lets
you generate graphs.
Configuration
~~~~~~~~~~~~~
Statsd support uses the statsd Python module. Note that Zuul will start without
the statsd Python module, so an existing Zuul installation may be missing it.
The configuration is done via the environment variables ``STATSD_HOST`` and
``STATSD_PORT``. They are interpreted by the statsd module directly; there is no
corresponding parameter in zuul.conf yet. Your init script must set both
of them before executing Zuul.
Your init script most probably loads a configuration file named
``/etc/default/zuul`` which would contain the environment variables::

    $ cat /etc/default/zuul
    STATSD_HOST=10.0.0.1
    STATSD_PORT=8125
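
As a rough illustration of what these two variables control, the sketch below
reads them with the standard library and sends one counter datagram over UDP.
It is not the statsd module's own code: the helper names ``statsd_address`` and
``send_counter`` are hypothetical, and the fallback to ``localhost`` and
``8125`` when a variable is unset is an assumption made for this example.

```python
import os
import socket


def statsd_address(environ=os.environ):
    """Return the (host, port) pair described by STATSD_HOST and STATSD_PORT."""
    # The fallback values are an assumption for this sketch.
    host = environ.get("STATSD_HOST", "localhost")
    port = int(environ.get("STATSD_PORT", "8125"))
    return host, port


def send_counter(name, value=1, environ=os.environ):
    """Send one statsd counter datagram ("<name>:<value>|c") over UDP."""
    payload = "{}:{}|c".format(name, value).encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # UDP is connectionless, so this does not confirm that a receiver exists.
        sock.sendto(payload, statsd_address(environ))
    finally:
        sock.close()
    return payload


if __name__ == "__main__":
    print(statsd_address({"STATSD_HOST": "10.0.0.1", "STATSD_PORT": "8125"}))
```

Because the values come from the process environment, they must be set before
the Zuul process starts; changing the file afterwards has no effect until a
restart.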
Metrics
~~~~~~~
The metrics are emitted by the Zuul scheduler (`zuul/scheduler.py`):
**gerrit.event.<type> (counters)**
Gerrit emits different kinds of messages over its `stream-events`
interface. Zuul reports a counter for each type of event it
receives from Gerrit.
Some of the events emitted are:
* patchset-created
* draft-published
* change-abandoned
* change-restored
* change-merged
* merge-failed
* comment-added
* ref-updated
* reviewer-added
Refer to your Gerrit installation documentation for an exhaustive list of
Gerrit event types.
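
To make the naming concrete, here is a small sketch of how one event read from
`stream-events` could be mapped to its counter name. Gerrit sends each event as
a JSON object with a ``type`` field. The function name
``gerrit_event_counter`` and the sample event are illustrative and not taken
from Zuul.

```python
import json


def gerrit_event_counter(raw_event):
    """Map one JSON line from stream-events to its statsd counter name."""
    event = json.loads(raw_event)
    return "gerrit.event.{}".format(event["type"])


if __name__ == "__main__":
    # Hypothetical, heavily trimmed event; real events carry many more fields.
    sample = '{"type": "patchset-created", "change": {"project": "example"}}'
    print(gerrit_event_counter(sample))
```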
**zuul.pipeline.**
Holds metrics specific to jobs. The hierarchy is:
#. **<pipeline name>** as defined in your `layout.yaml` file (ex: `gate`,
`test`, `publish`). It contains:
#. **all_jobs** counter of jobs triggered by the pipeline.
#. **current_changes** A gauge for the number of Gerrit changes being
processed by this pipeline.
#. **job** subtree detailing per-job statistics:
#. **<jobname>** The triggered job name.
#. **<build result>** Result as defined in your triggering system. For
Jenkins that would be SUCCESS, FAILURE, UNSTABLE, LOST. The
metric holds an increasing counter and, whenever the result is
SUCCESS or FAILURE, a timing reporting the duration of the build.
#. **resident_time** timing representing how long the Change has been
known by Zuul (which includes build time and Zuul overhead).
#. **total_changes** counter of the number of changes processed since
Zuul started.
#. **wait_time** counter and timer of the wait time, measured as the
difference between the job start time and the execute time, in milliseconds.
Additionally, the `zuul.pipeline.<pipeline name>` hierarchy contains
`current_changes` (gauge), `resident_time` (timing) and `total_changes`
(counter) metrics for each project, with the slash separator used in Gerrit
project names replaced by dots.
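
The key layout described above can be sketched as two small helpers. The
function names and the project name ``group/project`` are hypothetical, and the
sketch only mirrors the naming scheme documented here rather than reproducing
the scheduler's code.

```python
def job_result_key(pipeline, job, result):
    """Key for a build result, e.g. zuul.pipeline.gate.job.myjob.SUCCESS."""
    return "zuul.pipeline.{}.job.{}.{}".format(pipeline, job, result)


def project_metric_key(pipeline, project, metric):
    """Key for a per-project metric under a pipeline."""
    # Gerrit separates project name components with "/", while statsd uses
    # "." to express hierarchy, so the slashes are replaced by dots.
    return "zuul.pipeline.{}.{}.{}".format(
        pipeline, project.replace("/", "."), metric)


if __name__ == "__main__":
    print(job_result_key("gate", "myjob", "SUCCESS"))
    print(project_metric_key("gate", "group/project", "total_changes"))
```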
As an example, given a job named `myjob` triggered by the `gate` pipeline
which took 40 seconds to build successfully, the Zuul scheduler will emit the
following statsd events:
* `zuul.pipeline.gate.job.myjob.SUCCESS` +1
* `zuul.pipeline.gate.job.myjob.SUCCESS` 40 seconds
* `zuul.pipeline.gate.all_jobs` +1
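
On the wire, a statsd receiver sees each event as a text line of the form
``<name>:<value>|<type>``, where the type is ``c`` for a counter, ``ms`` for a
timing and ``g`` for a gauge. The sketch below parses such lines, which can
help when inspecting raw traffic. The function name ``parse_statsd_line`` is
illustrative. Timings are expressed in milliseconds, so a 40 second build would
appear with the value 40000.

```python
def parse_statsd_line(line):
    """Split "<name>:<value>|<type>" into (name, value, kind)."""
    name, _, rest = line.partition(":")
    value, _, kind = rest.partition("|")
    # Drop an optional sample-rate suffix such as "|@0.1".
    kind = kind.split("|")[0]
    kinds = {"c": "counter", "ms": "timing", "g": "gauge"}
    return name, float(value), kinds.get(kind, kind)


if __name__ == "__main__":
    for line in ("zuul.pipeline.gate.job.myjob.SUCCESS:1|c",
                 "zuul.pipeline.gate.all_jobs:1|c"):
        print(parse_statsd_line(line))
```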