:title: Monitoring

Monitoring
==========

.. _statsd:

Statsd reporting
----------------

Zuul supports the statsd protocol. When enabled and configured (see
below), the Zuul scheduler emits raw metrics to a statsd receiver,
which in turn lets you generate graphs.

Configuration
~~~~~~~~~~~~~

Statsd support uses the ``statsd`` Python module. Note that support
is optional and Zuul will start without the ``statsd`` Python module
present.

Configuration is in the :attr:`statsd` section of ``zuul.conf``.

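As a rough illustration of where these settings live, the following
sketch parses a ``[statsd]`` section with Python's ``configparser``.
The option names ``server`` and ``port``, the sample values, and the
fallback port are assumptions made for this example; consult the
:attr:`statsd` reference for the options your Zuul version accepts.

.. code-block:: python

   import configparser

   # Hypothetical zuul.conf fragment; the option names and values are
   # assumptions for illustration only.
   SAMPLE_CONF = """
   [statsd]
   server = statsd.example.com
   port = 8125
   """


   def read_statsd_settings(text):
       """Return (server, port) from the [statsd] section, or None."""
       parser = configparser.ConfigParser()
       parser.read_string(text)
       if not parser.has_section("statsd"):
           # Statsd reporting is optional, so a missing section is
           # not treated as an error.
           return None
       server = parser.get("statsd", "server")
       port = parser.getint("statsd", "port", fallback=8125)
       return server, port


   settings = read_statsd_settings(SAMPLE_CONF)

Here ``settings`` holds the host and port a statsd client would be
pointed at. A file without a ``[statsd]`` section yields ``None``,
mirroring the fact that reporting is optional.
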
Metrics
~~~~~~~

These metrics are emitted by the Zuul :ref:`scheduler`:

.. stat:: zuul.event.<driver>.<type>
   :type: counter

   Zuul will report counters for each type of event it receives from
   each of its configured drivers.

.. stat:: zuul.tenant.<tenant>.pipeline

   Holds metrics specific to jobs. This hierarchy includes:

   .. stat:: <pipeline name>

      A set of metrics for each pipeline named as defined in the Zuul
      config.

      .. stat:: all_jobs
         :type: counter

         Number of jobs triggered by the pipeline.

      .. stat:: current_changes
         :type: gauge

         The number of items currently being processed by this
         pipeline.

      .. stat:: project

         This hierarchy holds more specific metrics for each project
         participating in the pipeline.

         .. stat:: <canonical_hostname>

            The canonical hostname for the triggering project.
            Embedded ``.`` characters will be translated to ``_``.

            .. stat:: <project>

               The name of the triggering project. Embedded ``/`` or
               ``.`` characters will be translated to ``_``.

               .. stat:: <branch>

                  The name of the triggering branch. Embedded ``/`` or
                  ``.`` characters will be translated to ``_``.

                  .. stat:: job

                     Subtree detailing per-project job statistics:

                     .. stat:: <jobname>

                        The triggered job name.

                        .. stat:: <result>
                           :type: counter, timer

                           A counter for each type of result (e.g.,
                           ``SUCCESS``, ``FAILURE``, ``ERROR``) for
                           the job. If the result is ``SUCCESS`` or
                           ``FAILURE``, Zuul will additionally report
                           the duration of the build as a timer.

                  .. stat:: current_changes
                     :type: gauge

                     The number of items of this project currently being
                     processed by this pipeline.

                  .. stat:: resident_time
                     :type: timer

                     A timer metric reporting how long each item for this
                     project has been in the pipeline.

                  .. stat:: total_changes
                     :type: counter

                     The number of changes for this project processed by the
                     pipeline since Zuul started.

      .. stat:: resident_time
         :type: timer

         A timer metric reporting how long each item has been in the
         pipeline.

      .. stat:: total_changes
         :type: counter

         The number of changes processed by the pipeline since Zuul
         started.

      .. stat:: wait_time
         :type: timer

         How long each item spent in the pipeline before its first job
         started.

.. stat:: zuul.executor.<executor>

   Holds metrics emitted by individual executors. The ``<executor>``
   component of the key will be replaced with the hostname of the
   executor.

   .. stat:: merger.<result>
      :type: counter

      Incremented to represent the status of a Zuul executor's merger
      operations. ``<result>`` can be either ``SUCCESS`` or ``FAILURE``.
      A failed merge operation, which is counted as a ``FAILURE``, is
      what Zuul ultimately reports as a ``MERGER_FAILURE``.

   .. stat:: builds
      :type: counter

      Incremented each time the executor starts a build.

   .. stat:: starting_builds
      :type: gauge

      The number of builds starting on this executor. These are
      builds which have not yet begun their first pre-playbook.

   .. stat:: running_builds
      :type: gauge

      The number of builds currently running on this executor. This
      includes starting builds.

   .. stat:: phase

      Subtree detailing per-phase execution statistics:

      .. stat:: <phase>

         ``<phase>`` represents a phase in the execution of a job.
         This can be an *internal* phase (such as ``setup`` or
         ``cleanup``) or a *job* phase (such as ``pre``, ``run`` or
         ``post``).

         .. stat:: <result>
            :type: counter

            A counter for each type of result.
            These results do not, by themselves, determine the status
            of a build but are indicators of the exit status provided
            by Ansible for the execution of a particular phase.

            Examples of possible counters for each phase are:
            ``RESULT_NORMAL``, ``RESULT_TIMED_OUT``,
            ``RESULT_UNREACHABLE``, ``RESULT_ABORTED``.

   .. stat:: load_average
      :type: gauge

      The one-minute load average of this executor, multiplied by 100.

   .. stat:: pct_used_ram
      :type: gauge

      The used RAM (excluding buffers and cache) on this executor, as
      a percentage multiplied by 100.

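Because ``load_average`` and ``pct_used_ram`` are both reported
multiplied by 100, a consumer has to divide by 100 to recover the
original figure. The helper below is a minimal sketch of that step.
The function name and the sample gauge readings are hypothetical.

.. code-block:: python

   def decode_scaled_gauge(raw_value):
       """Undo the x100 scaling used by load_average and pct_used_ram."""
       return raw_value / 100.0


   # Hypothetical raw gauge readings received from an executor.
   load = decode_scaled_gauge(250)       # one-minute load average
   ram_pct = decode_scaled_gauge(4375)   # percentage of RAM in use

With these sample readings, ``load`` is ``2.5`` and ``ram_pct`` is
``43.75``.
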
.. stat:: zuul.nodepool

   Holds metrics related to the node requests Zuul submits to Nodepool.

   .. stat:: requested
      :type: counter

      Incremented each time a node request is submitted to Nodepool.

      .. stat:: label.<label>
         :type: counter

         Incremented each time a request for a specific label is
         submitted to Nodepool.

      .. stat:: size.<size>
         :type: counter

         Incremented each time a request of a specific size is submitted
         to Nodepool. For example, a request for 3 nodes would use the
         key ``zuul.nodepool.requested.size.3``.

   .. stat:: canceled
      :type: counter, timer

      The counter is incremented each time a node request is canceled
      by Zuul. The timer records the elapsed time from request to
      cancelation.

      .. stat:: label.<label>
         :type: counter, timer

         The same, for a specific label.

      .. stat:: size.<size>
         :type: counter, timer

         The same, for a specific request size.

   .. stat:: fulfilled
      :type: counter, timer

      The counter is incremented each time a node request is fulfilled
      by Nodepool. The timer records the elapsed time from request to
      fulfillment.

      .. stat:: label.<label>
         :type: counter, timer

         The same, for a specific label.

      .. stat:: size.<size>
         :type: counter, timer

         The same, for a specific request size.

   .. stat:: failed
      :type: counter, timer

      The counter is incremented each time Nodepool fails to fulfill a
      node request. The timer records the elapsed time from request
      to failure.

      .. stat:: label.<label>
         :type: counter, timer

         The same, for a specific label.

      .. stat:: size.<size>
         :type: counter, timer

         The same, for a specific request size.

   .. stat:: current_requests
      :type: gauge

      The number of outstanding Nodepool requests from Zuul.

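To make the ``zuul.nodepool`` key layout concrete, the sketch below
lists the keys associated with a single request event. It only
illustrates how the key names are composed. The function name and the
node labels are hypothetical, and the sketch does not reproduce Zuul's
own code.

.. code-block:: python

   def nodepool_keys(event, labels, size):
       """List the statsd keys associated with one node request event.

       event: "requested", "canceled", "fulfilled" or "failed".
       labels: the node labels named in the request.
       size: the number of nodes in the request.
       """
       base = "zuul.nodepool." + event
       keys = [base]
       keys.extend("%s.label.%s" % (base, label) for label in labels)
       keys.append("%s.size.%d" % (base, size))
       return keys


   # Hypothetical request for three nodes with three different labels.
   keys = nodepool_keys("requested", ["controller", "compute", "storage"], 3)

The last entry of ``keys`` is ``zuul.nodepool.requested.size.3``, the
key mentioned in the ``size.<size>`` description.
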
.. stat:: zuul.mergers

   Holds metrics related to Zuul mergers.

   .. stat:: online
      :type: gauge

      The number of Zuul merger processes online.

   .. stat:: jobs_running
      :type: gauge

      The number of merge jobs running.

   .. stat:: jobs_queued
      :type: gauge

      The number of merge jobs queued.

.. stat:: zuul.executors

   Holds metrics related to Zuul executors.

   .. stat:: online
      :type: gauge

      The number of Zuul executor processes online.

   .. stat:: accepting
      :type: gauge

      The number of Zuul executor processes accepting new jobs.

   .. stat:: jobs_running
      :type: gauge

      The number of executor jobs running.

   .. stat:: jobs_queued
      :type: gauge

      The number of executor jobs queued.

As an example, given a job named ``myjob`` in ``mytenant``, triggered
by a change to ``myproject`` on the ``master`` branch in the ``gate``
pipeline, which took 40 seconds to build, the Zuul scheduler will emit
the following statsd events:

  * ``zuul.tenant.mytenant.pipeline.gate.project.example_com.myproject.master.job.myjob.SUCCESS`` +1
  * ``zuul.tenant.mytenant.pipeline.gate.project.example_com.myproject.master.job.myjob.SUCCESS`` 40 seconds
  * ``zuul.tenant.mytenant.pipeline.gate.all_jobs`` +1
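
The key in this example can be derived mechanically from the
hierarchy and translation rules described earlier. The following
sketch shows one way to do so and formats the counter and timer in the
statsd text format (``<key>:<value>|<type>``). The function names are
hypothetical, the hostname ``example.com`` is inferred from the
``example_com`` component of the key, and sending the timer in
milliseconds is an assumption based on the usual statsd convention.

.. code-block:: python

   def sanitize(component, chars):
       """Replace each character listed in ``chars`` with an underscore."""
       for char in chars:
           component = component.replace(char, "_")
       return component


   def job_result_key(tenant, pipeline, hostname, project, branch,
                      job, result):
       """Build the per-job result key from its components."""
       return ".".join([
           "zuul.tenant", tenant,
           "pipeline", pipeline,
           "project",
           sanitize(hostname, "."),    # "." becomes "_"
           sanitize(project, "/."),    # "/" and "." become "_"
           sanitize(branch, "/."),     # "/" and "." become "_"
           "job", job, result,
       ])


   def statsd_lines(key, seconds):
       """Format a counter increment and a timer for one build result."""
       millis = int(round(seconds * 1000))  # assumed unit: milliseconds
       return ["%s:1|c" % key, "%s:%d|ms" % (key, millis)]


   key = job_result_key("mytenant", "gate", "example.com", "myproject",
                        "master", "myjob", "SUCCESS")
   lines = statsd_lines(key, 40)

Here ``key`` matches the first two entries in the list above, and
``lines`` holds the counter increment followed by the 40-second timer.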