commit | dc6e7726bf742fae4f576dd8b95ae2d800378420 | |
---|---|---|
author | Jan Kundrát <jan.kundrat@cesnet.cz> | Thu Mar 14 23:13:41 2024 +0100 |
committer | Jan Kundrát <jan.kundrat@cesnet.cz> | Thu Mar 14 23:23:41 2024 +0100 |
tree | c920c8223a5bd22b6e060002fb60e9d5e9d14b96 | |
parent | 7d61c678b26a9db60cb752476d450dba2b2e5b59 | |
lab: save the most recent logs whenever a service crashes

We have a central systemd-journald "syslog" server these days, but the logs are very, very verbose, including a full copy of the SPI traffic, for example. This has some merit, but at the same time the log volume is just too much, even in a lab setup. Let's store the most recent one minute's worth of logging whenever something crashes on any given lab device.

This is implemented through a simple Python script which sets up a filter listening for all systemd messages which say that a service has failed. Once that happens, the code spawns two processes: a `journalctl` for exporting the relevant part of the recent logs, and a `systemd-journal-remote` for storing that just-exported stream in a native journal file on disk. This two-step dance is required because `journalctl` cannot produce a native journal file on disk, and I think it's a good idea to have these stored in a native format -- if only because it allows for some easy filtering. The code also dumps (a part of) that log into a text file, just for convenience.

To deploy this, simply run:

    ansible-playbook -i production site.yml -l czl-logs

This includes a workaround for a "too old" systemd which by default just wouldn't rotate the log files captured from a remote journal. The new files with the "relevant snippet of the logs", however, are *not* rotated in any manner; in my testing it's about 16 MB per crash. This means that we have space for about 1500 crashes on that 30 GB rootfs, which Should Be Enough For Everybody™.

Change-Id: I9261247608cfcc4afe373e72935489c66064e8dd
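The script itself is not shown here, but a minimal sketch of such a watcher, assuming the python-systemd bindings, could look like the following. The message ID constant, the binary path, and the output directory are assumptions, not taken from the commit:

    #!/usr/bin/env python3
    # Hypothetical sketch of the crash-log watcher described above.
    import select
    import subprocess

    from systemd import journal

    # systemd's well-known message ID for "Unit ... entered the 'failed'
    # state" (SD_MESSAGE_UNIT_FAILED in sd-messages.h) -- an assumption
    # about which filter the actual script uses.
    UNIT_FAILED = 'be02cf6855d2428ba40df7e9d022f03d'

    reader = journal.Reader()
    reader.add_match(MESSAGE_ID=UNIT_FAILED)
    reader.seek_tail()
    reader.get_previous()  # position the cursor at the current journal end

    poller = select.poll()
    poller.register(reader.fileno(), reader.get_events())

    while True:
        poller.poll()
        if reader.process() != journal.APPEND:
            continue
        for entry in reader:
            unit = entry.get('UNIT', 'unknown')
            # Step 1: export the last minute of logs in the journal
            # export format, since journalctl cannot write a native
            # journal file itself.
            export = subprocess.Popen(
                ['journalctl', '--since', '-1min', '-o', 'export'],
                stdout=subprocess.PIPE)
            # Step 2: feed that stream to systemd-journal-remote, which
            # stores it as a native journal file on disk. Binary path
            # and output directory are assumptions.
            subprocess.run(
                ['/usr/lib/systemd/systemd-journal-remote', '-o',
                 f'/var/log/crash-dumps/{unit}.journal', '-'],
                stdin=export.stdout)
            export.stdout.close()
            export.wait()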
This repository currently powers the CI infrastructure tied to our Gerrit. It's mostly about Zuul v3 with Nodepool, log storage, etc.
Note that some pieces (Gerrit itself in particular) are still deployed via Puppet for legacy reasons. That configuration is internal.
    # Example: provision the Zuul server
    ansible-playbook -i production site.yml -l zuul-server