Zuul references cleaner
Zuul mergers create a vast number of git references under /refs/zuul
which are never garbage collected.
With hundreds of thousands of references, git fetch operations become
very slow since git uploads all references to Gerrit to synchronize the
Zuul maintained repository. On one of Wikimedia's busy repositories
(mediawiki/core) we had 55000 such references and a fetch could take up
to 18 seconds to complete. I have seen occurrences of a merge
taking 2 minutes to complete.
As such, this tiny script clears out references for which the commit date
of the pointed commit object is older than 360 days (the default).
It is not perfect since a recent reference may well point to an old
object. That would be the case on repositories that are barely active.
In such a case the ref will be gone despite having been recently created.
A better way would be to vary Zuul references by using month/day which
will let one easily garbage collect them. But I am being lazy and that
would not let us clear out references using the current scheme.
Example usage:
zuul-clear-refs.py --verbose --dry-run --until 90 /srv/zuul/git/project
Would show a list of references pointing to commits dated more than 90
days ago and output a message whenever the script would delete them.
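For reviewers, the selection rule described above can be sketched
roughly as follows. This is an illustration only and not the code of
zuul-clear-refs.py: the helper names stale_refs and list_zuul_refs are
hypothetical, and it assumes a git new enough to support the
%(committerdate:raw) format of git for-each-ref. It only reports
candidates and deletes nothing.

```python
import subprocess
import time

SECONDS_PER_DAY = 86400


def stale_refs(refs, until_days=360, now=None):
    """Return names of refs whose commit date is older than until_days.

    refs is an iterable of (refname, commit_timestamp) pairs, where the
    timestamp is seconds since the epoch. Hypothetical helper.
    """
    if now is None:
        now = time.time()
    cutoff = now - until_days * SECONDS_PER_DAY
    return [name for name, commit_ts in refs if commit_ts < cutoff]


def list_zuul_refs(repo_path):
    """List (refname, committer timestamp) for refs under refs/zuul.

    Hypothetical helper that shells out to git for-each-ref.
    """
    out = subprocess.check_output(
        ["git", "for-each-ref",
         "--format=%(refname) %(committerdate:raw)", "refs/zuul"],
        cwd=repo_path, text=True)
    refs = []
    for line in out.splitlines():
        fields = line.split()
        if len(fields) < 2:
            # No committer date available for this ref, skip it.
            continue
        refs.append((fields[0], int(fields[1])))
    return refs


if __name__ == "__main__":
    import sys
    # Dry run: print what would be considered stale after 90 days.
    for name in stale_refs(list_zuul_refs(sys.argv[1]), until_days=90):
        print("would delete", name)
```

Since the comparison is on the commit date of the pointed object and
not on the creation time of the reference, the sketch shares the
limitation mentioned above for barely active repositories.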
Hint about the utility in our merger documentation.
Reference:
https://phabricator.wikimedia.org/T70481
Change-Id: Id4e55f5d571ebd5e8271e516f53f8e05c1f78c1a
diff --git a/doc/source/merger.rst b/doc/source/merger.rst
index e01bc8c..82e204b 100644
--- a/doc/source/merger.rst
+++ b/doc/source/merger.rst
@@ -58,3 +58,17 @@
depending on what the state of Zuul's repository is when the clone
happens). They are, however, suitable for automated systems that
respond to Zuul triggers.
+
+Clearing old references
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The references created under refs/zuul are not garbage collected. Since
+git fetch sends them all to Gerrit to sync the repositories, the time
+spent on merges will slowly grow over time and become noticeable.
+
+To clean them up you can use the ``tools/zuul-clear-refs.py`` script on
+each repository. It will delete Zuul references that point to commits
+for which the commit date is older than a given number of days (default
+360)::
+
+ ./tools/zuul-clear-refs.py /path/to/zuul/git/repo