b4bed1d2c36a40047ad6ed68ea1f2c8ba5e110c8 - github/openstack-infra/zuul

commit	b4bed1d2c36a40047ad6ed68ea1f2c8ba5e110c8
author	James E. Blair <jeblair@redhat.com>	Tue Feb 06 13:43:50 2018 -0800
committer	James E. Blair <jeblair@redhat.com>	Tue Feb 06 15:40:28 2018 -0800
tree	c80d74b297a4bc2b2984684194a618b86e77339e
parent	acb632d51b06e3977fc27daafd134a23408796d0

Fix stuck node requests across ZK reconnection

When a request is fulfilled by nodepool, we add it to the scheduler's
event queue, and later, the scheduler processes the event and accepts
the nodes.  If there is a ZooKeeper disconnection in the interim, we
will have noticed it and not locked the nodes.  However, the scheduler
will still pass the request on to the pipeline manager, and we will
attempt to run jobs on the unlocked nodes, which will continually
fail.

This change extends the handling of a lost request so that if a request
is lost after it has been fulfilled, we retry it (which is what would
already happen if the request were lucky enough to have been lost
before fulfillment).
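
The retry behavior described above can be sketched roughly as follows.
This is an illustrative model only, not the code touched by this change:
NodeRequest, FakeZooKeeper, Scheduler, and all of their methods are
hypothetical names invented for the example, and the ZooKeeper
disconnection is simulated by simply dropping the stored requests.

```python
# Hypothetical sketch; none of these names are Zuul's actual API.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeRequest:
    uid: str
    fulfilled: bool = False
    attempts: int = 1  # number of times the request has been submitted


@dataclass
class FakeZooKeeper:
    """Stands in for the ZooKeeper view of outstanding requests."""
    requests: Dict[str, NodeRequest] = field(default_factory=dict)

    def submit(self, request: NodeRequest) -> None:
        # A (re)submitted request starts out unfulfilled.
        request.fulfilled = False
        self.requests[request.uid] = request

    def fulfill(self, uid: str) -> None:
        self.requests[uid].fulfilled = True

    def lose_session(self) -> None:
        # Simulate a disconnection that causes the requests to be lost.
        self.requests.clear()

    def exists(self, uid: str) -> bool:
        return uid in self.requests


@dataclass
class Scheduler:
    zk: FakeZooKeeper
    event_queue: List[NodeRequest] = field(default_factory=list)
    accepted: List[str] = field(default_factory=list)

    def on_nodes_provisioned(self, request: NodeRequest) -> None:
        # Fulfillment only queues an event; it is processed later.
        self.event_queue.append(request)

    def process_events(self) -> None:
        while self.event_queue:
            request = self.event_queue.pop(0)
            if not self.zk.exists(request.uid):
                # Lost after fulfillment: retry instead of handing
                # unlocked nodes to the pipeline manager.
                request.attempts += 1
                self.zk.submit(request)
                continue
            self.accepted.append(request.uid)
```

In this model, a request that is fulfilled, queued, and then lost before
the scheduler processes its event is resubmitted rather than accepted;
it is accepted only once the retried request has itself been fulfilled
and processed.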

This extends the fix in 94e95886e2179f4a6aeecad687509bc7b1ab7fd3.

Change-Id: If81a790ed8b16594f4f9186d9256200b8d5e707e

3 files changed
