lab: save the most recent logs whenever a service crashes

We have a central systemd-journald "syslog" server these days, but the
logs are very, very verbose; they include a full copy of the SPI traffic,
for example. That has some merit, but the log volume is simply too much,
even in a lab setup. So let's store the most recent minute's worth of
logs whenever something crashes on any of the lab devices.
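Turning "the most recent minute" into concrete `journalctl --since`/`--until` arguments is a small but easy-to-get-wrong step. Below is a minimal sketch, assuming the crash time is taken from the failure entry's `__REALTIME_TIMESTAMP` field (microseconds since the epoch) and that the server's `journalctl` accepts timestamps with a trailing `UTC`. The names `capture_window`, `WINDOW` and `FMT` are hypothetical, not taken from the actual script.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical constants: a one-minute window, printed in a format that
# journalctl's --since/--until are assumed to accept ("YYYY-MM-DD HH:MM:SS UTC").
WINDOW = timedelta(minutes=1)
FMT = "%Y-%m-%d %H:%M:%S UTC"


def _fmt(epoch_seconds: int) -> str:
    """Render whole seconds since the epoch as a UTC timestamp string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(FMT)


def capture_window(realtime_usec: str | int, window: timedelta = WINDOW) -> tuple[str, str]:
    """Return (since, until) covering `window` up to and including the event.

    `realtime_usec` is a journal entry's __REALTIME_TIMESTAMP, which
    `journalctl -o json` emits as a string of microseconds since the epoch.
    """
    usec = int(realtime_usec)
    # Round *up* to whole seconds so that --until still includes the failure entry.
    end_s = -(-usec // 1_000_000)
    start_s = end_s - int(window.total_seconds())
    return _fmt(start_s), _fmt(end_s)
```

Rounding the end up rather than truncating matters: a failure logged at `…:20.5` would otherwise fall just outside `--until …:20`.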

This is implemented by a simple Python script which listens for
systemd messages reporting that a service has failed. When that
happens, the script spawns two processes: a `journalctl` which exports
the relevant part of the recent logs, and a `systemd-journal-remote`
which stores that freshly exported data in a native journal file on
disk. This two-step setup is needed because `journalctl` cannot produce
a native journal file on disk by itself, and I think it's worth keeping
these snippets in the native format, if only because that makes
filtering easy. The script also dumps (a part of) that log into a text
file, just for convenience.
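A minimal sketch of that flow follows. It uses only the standard library and shells out to `journalctl` rather than assuming any particular Python journal bindings; the real script may well do this differently. The function names, the journal directory argument, and the assumption that PID 1 reports failures as `<unit>: Failed with result '<result>'.` with a `UNIT=` field are all illustrative. The `journalctl -o export | systemd-journal-remote --output=FILE.journal -` pipe is the two-step export/store described above, followed by a plain-text dump of the stored snippet.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path
from typing import Iterator

# Assumption: PID 1 logs a failed unit as e.g.
# "foo.service: Failed with result 'exit-code'." and tags the entry with UNIT=.
FAILED_MARKER = ": Failed with result "


def follow(journal_dir: str) -> Iterator[dict]:
    """Yield each new entry appended to the journals under `journal_dir`."""
    proc = subprocess.Popen(
        ["journalctl", "--directory", journal_dir, "--follow", "--lines=0", "-o", "json"],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        yield json.loads(line)


def failed_unit(entry: dict) -> str | None:
    """Return the unit name if `entry` is a "unit failed" message from PID 1."""
    message = entry.get("MESSAGE")
    # MESSAGE can be a list of byte values for non-UTF-8 data; ignore those.
    if entry.get("_PID") != "1" or not isinstance(message, str):
        return None
    if FAILED_MARKER not in message:
        return None
    return entry.get("UNIT") or message.split(":", 1)[0]


def capture_commands(journal_dir: str, hostname: str, since: str, until: str,
                     native: Path) -> tuple[list[str], list[str], list[str]]:
    """Build the export, store and text-dump command lines for one crash."""
    if native.suffix != ".journal":
        # systemd-journal-remote expects a *.journal name when writing to a file.
        raise ValueError("native journal file name must end in .journal")
    export = ["journalctl", "--directory", journal_dir,
              "--since", since, "--until", until,
              "-o", "export", f"_HOSTNAME={hostname}"]
    store = ["systemd-journal-remote", f"--output={native}", "-"]  # "-" reads stdin
    text = ["journalctl", f"--file={native}", "-o", "short-precise"]
    return export, store, text


def run_capture(export: list[str], store: list[str], text: list[str],
                text_path: Path) -> None:
    """Pipe the export stream into systemd-journal-remote, then dump it as text."""
    producer = subprocess.Popen(export, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(store, stdin=producer.stdout)
    producer.stdout.close()  # lets journalctl get SIGPIPE if the consumer dies early
    rc_store = consumer.wait()
    rc_export = producer.wait()
    if rc_store != 0 or rc_export != 0:
        raise RuntimeError(f"capture failed (export={rc_export}, store={rc_store})")
    with text_path.open("w") as fh:
        subprocess.run(text, stdout=fh, check=True)
```

Wiring these together is a loop over `follow()` that calls `failed_unit()` on each entry and, on a hit, builds the commands for that entry's `_HOSTNAME` and runs them. Keeping the command builders pure makes them easy to test without a live journal.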

To deploy this, simply run:

  ansible-playbook -i production site.yml -l czl-logs
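The same `ansible-playbook -i production site.yml -l <group>` pattern is used for every host group. A hypothetical helper like the one below, assuming you prefer driving it from Python, builds that command line and can optionally add Ansible's real `--check --diff` flags for a dry run first. The names `playbook_argv` and `deploy` are illustrative, and a dry run may not be meaningful for roles that rely on command or shell tasks.

```python
from __future__ import annotations

import shlex
import subprocess


def playbook_argv(limit: str, inventory: str = "production",
                  playbook: str = "site.yml", dry_run: bool = False) -> list[str]:
    """Build the ansible-playbook command line for one host group."""
    argv = ["ansible-playbook", "-i", inventory, playbook, "-l", limit]
    if dry_run:
        # --check predicts changes without applying them; --diff shows file diffs.
        argv += ["--check", "--diff"]
    return argv


def deploy(limit: str, dry_run: bool = False) -> None:
    """Echo and run the playbook for `limit`; raises on a non-zero exit status."""
    argv = playbook_argv(limit, dry_run=dry_run)
    print("+", shlex.join(argv))
    subprocess.run(argv, check=True)
```

For example, `deploy("czl-logs", dry_run=True)` previews the changes, and `deploy("czl-logs")` applies them.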

This change also includes a workaround for "too old" systemd versions
which, by default, just wouldn't rotate the log files captured from a
remote journal. The new files containing the "relevant snippet of the
logs", however, are *not* rotated in any way; in my testing they take
about 16MB per crash. That leaves room for about 1500 crashes on the
30GB rootfs, which Should Be Enough For Everybody™.
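Since nothing rotates the snippets, it can be handy to check the remaining headroom now and then. A minimal sketch, assuming the ~16MB-per-crash figure above and that the snippets live on the root filesystem (the `path` default and both function names are hypothetical):

```python
import shutil

PER_CRASH_BYTES = 16 * 1024**2  # ~16MB per crash, per the measurement above


def crashes_that_fit(total_bytes: int, per_crash: int = PER_CRASH_BYTES) -> int:
    """How many crash snapshots fit into `total_bytes`."""
    return total_bytes // per_crash


def crashes_remaining(path: str = "/", per_crash: int = PER_CRASH_BYTES) -> int:
    """How many more snapshots fit into the free space of the filesystem at `path`."""
    return shutil.disk_usage(path).free // per_crash
```

The raw quotient for an empty 30GB disk, `crashes_that_fit(30 * 1024**3)`, is 1920, so the ~1500 estimate above leaves some headroom for everything else on the rootfs.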

Change-Id: I9261247608cfcc4afe373e72935489c66064e8dd
7 files changed
tree: c920c8223a5bd22b6e060002fb60e9d5e9d14b96
  README.md
  ansible.cfg
  doc/
  files/
  group_vars/
  production
  requirements.yml
  roles/
  site.yml
README.md

Continuous Integration (CI) Setup via Ansible

This is what currently powers the CI infrastructure tied to our Gerrit: mostly Zuul v3 with Nodepool, plus log storage, etc.

Note that some pieces (Gerrit itself in particular) are still deployed via Puppet for legacy reasons. That configuration is internal.

  # Example: provision the Zuul server
  ansible-playbook -i production site.yml -l zuul-server