:title: Monitoring

Monitoring
==========

.. _statsd:

Statsd reporting
----------------

Zuul supports the statsd protocol. When enabled and configured (see
below), the Zuul scheduler emits raw metrics to a statsd receiver,
which in turn lets you generate graphs.

Configuration
~~~~~~~~~~~~~

Statsd support uses the ``statsd`` Python module.  Note that support
is optional and Zuul will start without the ``statsd`` Python module
present.

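Because the dependency is optional, a deployment script may want to
check for it before relying on metrics. The following is a minimal
sketch using only the standard library; it is not Zuul's start-up
code, and the helper name ``module_available`` is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    # find_spec returns None when no such top-level module is installed.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("statsd"):
        print("statsd module found; metrics can be reported")
    else:
        print("statsd module missing; continuing without metrics")
```
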
Configuration is in the :attr:`statsd` section of ``zuul.conf``.

Metrics
~~~~~~~

These metrics are emitted by the Zuul :ref:`scheduler`:

.. stat:: zuul.event.<driver>.<type>
   :type: counter

   Zuul will report counters for each type of event it receives from
   each of its configured drivers.

.. stat:: zuul.tenant.<tenant>.pipeline

   Holds metrics specific to jobs. This hierarchy includes:

   .. stat:: <pipeline name>

      A set of metrics for each pipeline named as defined in the Zuul
      config.

      .. stat:: all_jobs
         :type: counter

         Number of jobs triggered by the pipeline.

      .. stat:: current_changes
         :type: gauge

         The number of items currently being processed by this
         pipeline.

      .. stat:: project

         This hierarchy holds more specific metrics for each project
         participating in the pipeline.

         .. stat:: <canonical_hostname>

            The canonical hostname for the triggering project.
            Embedded ``.`` characters will be translated to ``_``.

            .. stat:: <project>

               The name of the triggering project.  Embedded ``/`` or
               ``.`` characters will be translated to ``_``.

               .. stat:: <branch>

                  The name of the triggering branch.  Embedded ``/`` or
                  ``.`` characters will be translated to ``_``.

                  .. stat:: job

                     Subtree detailing per-project job statistics:

                     .. stat:: <jobname>

                        The triggered job name.

                        .. stat:: <result>
                           :type: counter, timer

                           A counter for each type of result (e.g.,
                           ``SUCCESS``, ``FAILURE``, ``ERROR``, etc.)
                           for the job.  If the result is ``SUCCESS``
                           or ``FAILURE``, Zuul will additionally
                           report the duration of the build as a
                           timer.

                  .. stat:: current_changes
                     :type: gauge

                     The number of items of this project currently being
                     processed by this pipeline.

                  .. stat:: resident_time
                     :type: timer

                     A timer metric reporting how long each item for this
                     project has been in the pipeline.

                  .. stat:: total_changes
                     :type: counter

                     The number of changes for this project processed by the
                     pipeline since Zuul started.

      .. stat:: resident_time
         :type: timer

         A timer metric reporting how long each item has been in the
         pipeline.

      .. stat:: total_changes
         :type: counter

         The number of changes processed by the pipeline since Zuul
         started.

      .. stat:: wait_time
         :type: timer

         How long each item spent in the pipeline before its first job
         started.

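The key layout and character translation described above can be
sketched as follows. This is an illustration, not Zuul's
implementation: the function names ``sanitize`` and ``job_result_key``
are hypothetical, and the sketch assumes that only ``.`` is replaced
in hostnames while both ``/`` and ``.`` are replaced in project and
branch names, as stated above.

```python
def sanitize(component, chars="/."):
    """Replace each character listed in `chars` with an underscore."""
    for ch in chars:
        component = component.replace(ch, "_")
    return component


def job_result_key(tenant, pipeline, hostname, project, branch, job, result):
    """Build the per-job result key from the documented hierarchy."""
    parts = [
        "zuul", "tenant", tenant,
        "pipeline", pipeline,
        "project",
        sanitize(hostname, "."),  # only dots are translated in hostnames
        sanitize(project),        # slashes and dots are translated
        sanitize(branch),
        "job", job, result,
    ]
    return ".".join(parts)


if __name__ == "__main__":
    print(job_result_key("mytenant", "gate", "example.com",
                         "myproject", "master", "myjob", "SUCCESS"))
```
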
.. stat:: zuul.executor.<executor>

   Holds metrics emitted by individual executors.  The ``<executor>``
   component of the key will be replaced with the hostname of the
   executor.

   .. stat:: builds
      :type: counter

      Incremented each time the executor starts a build.

   .. stat:: running_builds
      :type: gauge

      The number of builds currently running on this executor.

   .. stat:: load_average
      :type: gauge

      The one-minute load average of this executor, multiplied by 100.

   .. stat:: pct_available_ram
      :type: gauge

      The available RAM (including buffers and cache) on this
      executor, as a percentage multiplied by 100.

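Both ``load_average`` and ``pct_available_ram`` are reported
multiplied by 100, so a consumer has to divide by 100 to recover the
original value. A minimal sketch of that scaling follows; the helper
names are hypothetical, and rounding to the nearest integer is an
assumption rather than documented Zuul behavior.

```python
def encode_scaled(value):
    """Scale a float (load average or percentage) by 100 for reporting."""
    # Rounding to an integer is an assumption made for this sketch.
    return int(round(value * 100))


def decode_scaled(gauge):
    """Recover the original value from a gauge that was scaled by 100."""
    return gauge / 100.0
```

For example, a reported ``load_average`` gauge of 150 corresponds to a
one-minute load average of 1.5.
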
.. stat:: zuul.nodepool

   Holds metrics related to the node requests that Zuul submits to
   Nodepool.

   .. stat:: requested
      :type: counter

      Incremented each time a node request is submitted to Nodepool.

      .. stat:: label.<label>
         :type: counter

         Incremented each time a request for a specific label is
         submitted to Nodepool.

      .. stat:: size.<size>
         :type: counter

         Incremented each time a request of a specific size is submitted
         to Nodepool.  For example, a request for 3 nodes would use the
         key ``zuul.nodepool.requested.size.3``.

   .. stat:: canceled
      :type: counter, timer

      The counter is incremented each time a node request is canceled
      by Zuul.  The timer records the elapsed time from request to
      cancelation.

      .. stat:: label.<label>
         :type: counter, timer

         The same, for a specific label.

      .. stat:: size.<size>
         :type: counter, timer

         The same, for a specific request size.

   .. stat:: fulfilled
      :type: counter, timer

      The counter is incremented each time a node request is fulfilled
      by Nodepool.  The timer records the elapsed time from request to
      fulfillment.

      .. stat:: label.<label>
         :type: counter, timer

         The same, for a specific label.

      .. stat:: size.<size>
         :type: counter, timer

         The same, for a specific request size.

   .. stat:: failed
      :type: counter, timer

      The counter is incremented each time Nodepool fails to fulfill a
      node request.  The timer records the elapsed time from request
      to failure.

      .. stat:: label.<label>
         :type: counter, timer

         The same, for a specific label.

      .. stat:: size.<size>
         :type: counter, timer

         The same, for a specific request size.

   .. stat:: current_requests
      :type: gauge

      The number of outstanding Nodepool requests from Zuul.

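For a single node request, the keys under one state (``requested``,
``canceled``, ``fulfilled`` or ``failed``) can be derived as in the
sketch below. The function name and the representation of a request as
a list of label names are assumptions made for illustration. Whether a
label that appears several times in one request is counted once or
several times is not specified here, so the sketch emits one key per
distinct label.

```python
def nodepool_keys(state, labels):
    """Return the statsd keys touched by one node request.

    `state` is one of "requested", "canceled", "fulfilled" or "failed";
    `labels` holds one label name per requested node.
    """
    base = "zuul.nodepool.%s" % state
    keys = [base]
    # One key per distinct label (an assumption, see the text above).
    for label in sorted(set(labels)):
        keys.append("%s.label.%s" % (base, label))
    # The request size is the number of nodes requested.
    keys.append("%s.size.%d" % (base, len(labels)))
    return keys
```
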
.. stat:: zuul.mergers

   Holds metrics related to Zuul mergers.

   .. stat:: online
      :type: gauge

      The number of Zuul merger processes online.

   .. stat:: jobs_running
      :type: gauge

      The number of merge jobs running.

   .. stat:: jobs_queued
      :type: gauge

      The number of merge jobs queued.

.. stat:: zuul.executors

   Holds metrics related to Zuul executors.

   .. stat:: online
      :type: gauge

      The number of Zuul executor processes online.

   .. stat:: accepting
      :type: gauge

      The number of Zuul executor processes accepting new jobs.

   .. stat:: jobs_running
      :type: gauge

      The number of executor jobs running.

   .. stat:: jobs_queued
      :type: gauge

      The number of executor jobs queued.


As an example, given a job named ``myjob`` in ``mytenant`` triggered by a
change to ``myproject`` (canonical hostname ``example.com``) on the
``master`` branch in the ``gate`` pipeline which took 40 seconds to
build and succeeded, the Zuul scheduler will emit the following statsd
events:

  * ``zuul.tenant.mytenant.pipeline.gate.project.example_com.myproject.master.job.myjob.SUCCESS`` +1
  * ``zuul.tenant.mytenant.pipeline.gate.project.example_com.myproject.master.job.myjob.SUCCESS`` 40 seconds
  * ``zuul.tenant.mytenant.pipeline.gate.all_jobs`` +1
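
In the plain statsd text format, a counter is sent as
``<key>:<value>|c`` and a timer as ``<key>:<milliseconds>|ms``. The
sketch below formats the three events above in that way and sends them
over UDP. It assumes a receiver on ``127.0.0.1:8125`` and uses raw
sockets only for illustration; Zuul itself relies on the ``statsd``
Python module, and the function names here are hypothetical.

```python
import socket


def counter(key, delta=1):
    """Format a statsd counter increment."""
    return "%s:%d|c" % (key, delta)


def timer(key, seconds):
    """Format a statsd timer; statsd timers are expressed in milliseconds."""
    return "%s:%d|ms" % (key, int(round(seconds * 1000)))


def send(lines, host="127.0.0.1", port=8125):
    """Send each metric line as one UDP datagram (assumed receiver address)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    job = ("zuul.tenant.mytenant.pipeline.gate.project."
           "example_com.myproject.master.job.myjob.SUCCESS")
    send([
        counter(job),
        timer(job, 40),
        counter("zuul.tenant.mytenant.pipeline.gate.all_jobs"),
    ])
```
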