Merge "Merger: log non Git exceptions on merge"
diff --git a/doc/source/gating.rst b/doc/source/gating.rst
index f3f2d3c..43a5928 100644
--- a/doc/source/gating.rst
+++ b/doc/source/gating.rst
@@ -28,6 +28,9 @@
Zuul was designed to handle the workflow of the OpenStack project, but
can be used with any project.
+Testing in parallel
+-------------------
+
A particular focus of Zuul is ensuring correctly ordered testing of
changes in parallel. A gating system should always test each change
applied to the tip of the branch exactly as it is going to be merged.
@@ -208,3 +211,72 @@
}
}
+
+Cross-project dependencies
+--------------------------
+
+When your projects are closely coupled, you want to make sure that changes
+entering the gate are tested with the versions of the other projects
+currently enqueued in the gate (since those will eventually be merged and
+might introduce breaking changes).
+
+Such dependencies can be defined in the Zuul configuration by registering a
+job in a DependentPipeline for several projects. Whenever a change enters
+such a pipeline, Zuul creates references for the other projects as well. As
+an example, given a main project ``acme`` and a plugin ``plugin``, you can
+define a job ``acme-tests`` that should be run for both projects:
+
+.. code-block:: yaml
+
+  pipelines:
+    - name: gate
+      manager: DependentPipelineManager
+
+  projects:
+    - name: acme
+      gate:
+        - acme-tests
+    - name: plugin
+      gate:
+        - acme-tests  # Register job again
+
+Whenever a change enters the ``gate`` pipeline queue, Zuul creates a reference
+for it. For each subsequent change, an additional reference is created for the
+changes ahead in the queue. As a result, you will always be able to fetch the
+future state of your project dependencies for each change in the queue.
+
+Based on the pipeline and project definitions above, three changes are
+inserted in the ``gate`` pipeline with the associated references:
+
+  ======== ======= ====== =========
+  Change   Project Branch Zuul Ref.
+  ======== ======= ====== =========
+  Change 1 acme    master master/Z1
+  Change 2 plugin  stable stable/Z2
+  Change 3 plugin  master master/Z3
+  ======== ======= ====== =========
+
+Since the changes enter a DependentPipelineManager pipeline, Zuul creates
+additional references:
+
+  ====== ======= ========= =============================
+  Change Project Zuul Ref. Description
+  ====== ======= ========= =============================
+  1      acme    master/Z1 acme master + change 1
+  ------ ------- --------- -----------------------------
+  2      acme    master/Z2 acme master + change 1
+  2      plugin  stable/Z2 plugin stable + change 2
+  ------ ------- --------- -----------------------------
+  3      acme    master/Z3 acme master + change 1
+  3      plugin  stable/Z3 plugin stable + change 2
+  3      plugin  master/Z3 plugin master + change 3
+  ====== ======= ========= =============================
+
+In order to test change 3, you would clone both repositories and fetch the
+Z3 reference for each project/branch combination you are interested in
+testing. For example, you could fetch ``acme`` with master/Z3 and ``plugin``
+with master/Z3, and thus have ``acme`` with change 1 applied as the expected
+state at the time change 3 would merge. When a repository has no changes
+ahead in the queue, it may not have a Z reference for that branch; in that
+case, simply check out the branch itself (see the sketch below).
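+
+A minimal sketch (not part of Zuul itself) of how a job could apply this
+fallback with plain git commands follows. The helper and its arguments are
+hypothetical; ``zuul_url`` and ``zuul_ref`` stand in for the ZUUL_URL and
+ZUUL_REF parameters the launcher passes to the job:
+
+.. code-block:: python
+
+  import os
+  import subprocess
+
+  def checkout_future_state(workdir, project, branch, zuul_url, zuul_ref):
+      """Check out the future state Zuul prepared for this change."""
+      repo = os.path.join(workdir, project)
+      try:
+          # Fetch the speculative reference Zuul created for this change,
+          # e.g. the master/Z3 reference from the tables above.
+          subprocess.check_call(
+              ['git', 'fetch', '%s/%s' % (zuul_url, project), zuul_ref],
+              cwd=repo)
+          subprocess.check_call(['git', 'checkout', 'FETCH_HEAD'], cwd=repo)
+      except subprocess.CalledProcessError:
+          # No Z reference exists for this project/branch combination, so
+          # the branch tip is already the expected state.
+          subprocess.check_call(['git', 'checkout', branch], cwd=repo)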
diff --git a/doc/source/launchers.rst b/doc/source/launchers.rst
index c56d6e9..db49933 100644
--- a/doc/source/launchers.rst
+++ b/doc/source/launchers.rst
@@ -87,8 +87,8 @@
**ZUUL_PIPELINE**
The Zuul pipeline that is building this job
**ZUUL_URL**
- The url for the zuul server as configured in zuul.conf.
- A test runner may use this URL as the basis for fetching
+ The url for the zuul server as configured in zuul.conf.
+ A test runner may use this URL as the basis for fetching
git commits.
The following additional parameters will only be provided for builds
@@ -195,6 +195,30 @@
The URL with the status or results of the build. Will be used in
the status page and the final report.
+To help with debugging builds, a worker may send back some optional
+metadata (a sketch of such a packet follows the list below):
+
+**worker_name** (optional)
+ The name of the worker.
+
+**worker_hostname** (optional)
+ The hostname of the worker.
+
+**worker_ips** (optional)
+ A list of IPs for the worker.
+
+**worker_fqdn** (optional)
+ The FQDN of the worker.
+
+**worker_program** (optional)
+  The program name of the worker. For example, Jenkins or turbo-hipster.
+
+**worker_version** (optional)
+ The version of the software running the job.
+
+**worker_extra** (optional)
+ A dictionary of any extra metadata you may want to pass along.
+
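+
+As a rough sketch (assuming the Python ``gear`` library that Zuul itself
+uses), a worker might attach this metadata to the WORK_DATA packet it sends
+back; every name and value shown is a placeholder:
+
+.. code-block:: python
+
+  import json
+  import socket
+
+  import gear
+
+  worker = gear.Worker('example-worker')
+  worker.addServer('zuul.example.org')
+  worker.waitForServer()
+  worker.registerFunction('build:example-job')
+
+  job = worker.getJob()  # blocks until Zuul submits a build
+  data = {
+      # ... the build data described above (for example the build url) ...
+      'worker_name': 'example-worker-01',
+      'worker_hostname': socket.gethostname(),
+      'worker_ips': ['127.0.0.1'],
+      'worker_fqdn': socket.getfqdn(),
+      'worker_program': 'example-builder',
+      'worker_version': '0.1',
+      'worker_extra': {'rack': '42'},
+  }
+  job.sendWorkData(json.dumps(data))
+  job.sendWorkStatus(0, 100)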
It should then immediately send a WORK_STATUS packet with a value of 0
percent complete. It may then optionally send subsequent WORK_STATUS
packets with updated completion values.
diff --git a/doc/source/zuul.rst b/doc/source/zuul.rst
index ee70523..1a6a23d 100644
--- a/doc/source/zuul.rst
+++ b/doc/source/zuul.rst
@@ -87,11 +87,11 @@
``layout_config=/etc/zuul/layout.yaml``
**log_config**
- Path to log config file. Used by all Zuul commands.
+ Path to log config file. Used by zuul-server only.
``log_config=/etc/zuul/logging.yaml``
**pidfile**
- Path to PID lock file. Used by all Zuul commands.
+ Path to PID lock file. Used by zuul-server only.
``pidfile=/var/run/zuul/zuul.pid``
**state_dir**
@@ -143,6 +143,14 @@
"http://zuul.example.com/p" or "http://zuul-merger01.example.com/p"
depending on whether the merger is co-located with the Zuul server.
+**log_config**
+ Path to log config file for the merger process.
+ ``log_config=/etc/zuul/logging.yaml``
+
+**pidfile**
+ Path to PID lock file for the merger process.
+ ``pidfile=/var/run/zuul-merger/merger.pid``
+
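+
+A rough illustration of how the merger process resolves these options,
+mirroring the fallback in ``zuul/cmd/merger.py`` (the paths shown are the
+defaults, not requirements):
+
+.. code-block:: python
+
+  import os
+  import ConfigParser
+
+  config = ConfigParser.ConfigParser()
+  config.read('/etc/zuul/zuul.conf')
+
+  # Prefer the [merger] section; otherwise fall back to the built-in
+  # default used by zuul-merger.
+  if config.has_option('merger', 'pidfile'):
+      pid_fn = os.path.expanduser(config.get('merger', 'pidfile'))
+  else:
+      pid_fn = '/var/run/zuul-merger/merger.pid'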
smtp
""""
diff --git a/requirements.txt b/requirements.txt
index 170b5152..92bb296 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -13,5 +13,5 @@
extras
statsd>=1.0.0,<3.0
voluptuous>=0.7
-gear>=0.4.0,<1.0.0
+gear>=0.5.1,<1.0.0
apscheduler>=2.1.1,<3.0
diff --git a/tests/fixtures/layout-no-jobs.yaml b/tests/fixtures/layout-no-jobs.yaml
new file mode 100644
index 0000000..ee8dc62
--- /dev/null
+++ b/tests/fixtures/layout-no-jobs.yaml
@@ -0,0 +1,43 @@
+includes:
+ - python-file: custom_functions.py
+
+pipelines:
+ - name: check
+ manager: IndependentPipelineManager
+ trigger:
+ gerrit:
+ - event: patchset-created
+ success:
+ gerrit:
+ verified: 1
+ failure:
+ gerrit:
+ verified: -1
+
+ - name: gate
+ manager: DependentPipelineManager
+ failure-message: Build failed. For information on how to proceed, see http://wiki.example.org/Test_Failures
+ trigger:
+ gerrit:
+ - event: comment-added
+ approval:
+ - approved: 1
+ success:
+ gerrit:
+ verified: 2
+ submit: true
+ failure:
+ gerrit:
+ verified: -2
+ start:
+ gerrit:
+ verified: 0
+ precedence: high
+
+projects:
+ - name: org/project
+ merge-mode: cherry-pick
+ check:
+ - noop
+ gate:
+ - noop
diff --git a/tests/fixtures/layout.yaml b/tests/fixtures/layout.yaml
index 98dfe86..b1c94de 100644
--- a/tests/fixtures/layout.yaml
+++ b/tests/fixtures/layout.yaml
@@ -231,3 +231,7 @@
- conflict-project-merge:
- conflict-project-test1
- conflict-project-test2
+
+ - name: org/noop-project
+ gate:
+ - noop
diff --git a/tests/test_scheduler.py b/tests/test_scheduler.py
index b2106f8..9576440 100755
--- a/tests/test_scheduler.py
+++ b/tests/test_scheduler.py
@@ -498,12 +498,23 @@
'name': self.name,
'number': self.number,
'manager': self.worker.worker_id,
+ 'worker_name': 'My Worker',
+ 'worker_hostname': 'localhost',
+ 'worker_ips': ['127.0.0.1', '192.168.1.1'],
+ 'worker_fqdn': 'zuul.example.org',
+ 'worker_program': 'FakeBuilder',
+ 'worker_version': 'v1.1',
+ 'worker_extra': {'something': 'else'}
}
+ self.log.debug('Running build %s' % self.unique)
+
self.job.sendWorkData(json.dumps(data))
+ self.log.debug('Sent WorkData packet with %s' % json.dumps(data))
self.job.sendWorkStatus(0, 100)
if self.worker.hold_jobs_in_build:
+ self.log.debug('Holding build %s' % self.unique)
self._wait()
self.log.debug("Build %s continuing" % self.unique)
@@ -2813,6 +2824,29 @@
self.assertReportedStat('test-timing', '3|ms')
self.assertReportedStat('test-guage', '12|g')
+ def test_stuck_job_cleanup(self):
+ "Test that pending jobs are cleaned up if removed from layout"
+ self.gearman_server.hold_jobs_in_queue = True
+ A = self.fake_gerrit.addFakeChange('org/project', 'master', 'A')
+ A.addApproval('CRVW', 2)
+ self.fake_gerrit.addEvent(A.addApproval('APRV', 1))
+ self.waitUntilSettled()
+ self.assertEqual(len(self.gearman_server.getQueue()), 1)
+
+ self.config.set('zuul', 'layout_config',
+ 'tests/fixtures/layout-no-jobs.yaml')
+ self.sched.reconfigure(self.config)
+ self.waitUntilSettled()
+
+ self.gearman_server.release('noop')
+ self.waitUntilSettled()
+ self.assertEqual(len(self.gearman_server.getQueue()), 0)
+ self.assertTrue(self.sched._areAllBuildsComplete())
+
+ self.assertEqual(len(self.history), 1)
+ self.assertEqual(self.history[0].name, 'noop')
+ self.assertEqual(self.history[0].result, 'SUCCESS')
+
def test_file_jobs(self):
"Test that file jobs run only when appropriate"
A = self.fake_gerrit.addFakeChange('org/project', 'master', 'A')
@@ -3568,3 +3602,41 @@
self.assertEqual(queue.window, 2)
self.assertEqual(queue.window_floor, 1)
self.assertEqual(C.data['status'], 'MERGED')
+
+ def test_worker_update_metadata(self):
+ "Test if a worker can send back metadata about itself"
+ self.worker.hold_jobs_in_build = True
+
+ A = self.fake_gerrit.addFakeChange('org/project', 'master', 'A')
+ A.addApproval('CRVW', 2)
+ self.fake_gerrit.addEvent(A.addApproval('APRV', 1))
+ self.waitUntilSettled()
+
+ self.assertEqual(len(self.launcher.builds), 1)
+
+ self.log.debug('Current builds:')
+ self.log.debug(self.launcher.builds)
+
+ start = time.time()
+ while True:
+ if time.time() - start > 10:
+ raise Exception("Timeout waiting for gearman server to report "
+ + "back to the client")
+ build = self.launcher.builds.values()[0]
+ if build.worker.name == "My Worker":
+ break
+ else:
+ time.sleep(0)
+
+ self.log.debug(build)
+ self.assertEqual("My Worker", build.worker.name)
+ self.assertEqual("localhost", build.worker.hostname)
+ self.assertEqual(['127.0.0.1', '192.168.1.1'], build.worker.ips)
+ self.assertEqual("zuul.example.org", build.worker.fqdn)
+ self.assertEqual("FakeBuilder", build.worker.program)
+ self.assertEqual("v1.1", build.worker.version)
+ self.assertEqual({'something': 'else'}, build.worker.extra)
+
+ self.worker.hold_jobs_in_build = False
+ self.worker.release()
+ self.waitUntilSettled()
diff --git a/zuul/cmd/merger.py b/zuul/cmd/merger.py
index e9722cf..f046235 100644
--- a/zuul/cmd/merger.py
+++ b/zuul/cmd/merger.py
@@ -94,7 +94,7 @@
# See comment at top of file about zuul imports
import zuul.merger.server
- self.setup_logging('zuul', 'log_config')
+ self.setup_logging('merger', 'log_config')
self.merger = zuul.merger.server.MergeServer(self.config)
self.merger.start()
@@ -135,10 +135,10 @@
print
raise
- if server.config.has_option('zuul', 'pidfile'):
- pid_fn = os.path.expanduser(server.config.get('zuul', 'pidfile'))
+ if server.config.has_option('merger', 'pidfile'):
+ pid_fn = os.path.expanduser(server.config.get('merger', 'pidfile'))
else:
- pid_fn = '/var/run/zuul/merger.pid'
+ pid_fn = '/var/run/zuul-merger/merger.pid'
pid = pid_file_module.TimeoutPIDLockFile(pid_fn, 10)
if server.args.nodaemon:
diff --git a/zuul/cmd/server.py b/zuul/cmd/server.py
index 5d83959..79a2538 100755
--- a/zuul/cmd/server.py
+++ b/zuul/cmd/server.py
@@ -154,7 +154,13 @@
os.close(pipe_write)
self.setup_logging('gearman_server', 'log_config')
import gear
- gear.Server(4730)
+ statsd_host = os.environ.get('STATSD_HOST')
+ statsd_port = int(os.environ.get('STATSD_PORT', 8125))
+ gear.Server(4730,
+ statsd_host=statsd_host,
+ statsd_port=statsd_port,
+ statsd_prefix='zuul.geard')
+
# Keep running until the parent dies:
pipe_read = os.fdopen(pipe_read)
pipe_read.read()
@@ -185,6 +191,7 @@
self.start_gear_server()
self.setup_logging('zuul', 'log_config')
+ self.log = logging.getLogger("zuul.Server")
self.sched = zuul.scheduler.Scheduler()
@@ -213,10 +220,13 @@
self.sched.registerReporter(gerrit_reporter)
self.sched.registerReporter(smtp_reporter)
+ self.log.info('Starting scheduler')
self.sched.start()
self.sched.reconfigure(self.config)
self.sched.resume()
+ self.log.info('Starting Webapp')
webapp.start()
+ self.log.info('Starting RPC')
rpc.start()
signal.signal(signal.SIGHUP, self.reconfigure_handler)
diff --git a/zuul/launcher/gearman.py b/zuul/launcher/gearman.py
index 37fc743..3a690dc 100644
--- a/zuul/launcher/gearman.py
+++ b/zuul/launcher/gearman.py
@@ -298,7 +298,7 @@
if not self.isJobRegistered(gearman_job.name):
self.log.error("Job %s is not registered with Gearman" %
gearman_job)
- self.onBuildCompleted(gearman_job, 'LOST')
+ self.onBuildCompleted(gearman_job, 'NOT_REGISTERED')
return build
if pipeline.precedence == zuul.model.PRECEDENCE_NORMAL:
@@ -312,14 +312,14 @@
self.gearman.submitJob(gearman_job, precedence=precedence)
except Exception:
self.log.exception("Unable to submit job to Gearman")
- self.onBuildCompleted(gearman_job, 'LOST')
+ self.onBuildCompleted(gearman_job, 'EXCEPTION')
return build
if not gearman_job.handle:
self.log.error("No job handle was received for %s after 30 seconds"
" marking as lost." %
gearman_job)
- self.onBuildCompleted(gearman_job, 'LOST')
+ self.onBuildCompleted(gearman_job, 'NO_HANDLE')
return build
@@ -380,6 +380,8 @@
if build:
# Allow URL to be updated
build.url = data.get('url') or build.url
+ # Update information about worker
+ build.worker.updateFromData(data)
if build.number is None:
self.log.info("Build %s started" % job)
@@ -394,7 +396,7 @@
def onDisconnect(self, job):
self.log.info("Gearman job %s lost due to disconnect" % job)
- self.onBuildCompleted(job, 'LOST')
+ self.onBuildCompleted(job)
def onUnknownJob(self, job):
self.log.info("Gearman job %s lost due to unknown handle" % job)
diff --git a/zuul/merger/server.py b/zuul/merger/server.py
index 5d52041..d8bc1b8 100644
--- a/zuul/merger/server.py
+++ b/zuul/merger/server.py
@@ -61,11 +61,14 @@
port = 4730
self.worker = gear.Worker('Zuul Merger')
self.worker.addServer(server, port)
+ self.log.debug("Waiting for server")
+ self.worker.waitForServer()
+ self.log.debug("Registering")
+ self.register()
+ self.log.debug("Starting worker")
self.thread = threading.Thread(target=self.run)
self.thread.daemon = True
self.thread.start()
- self.worker.waitForServer()
- self.register()
def register(self):
self.worker.registerFunction("merger:merge")
diff --git a/zuul/model.py b/zuul/model.py
index 2a52306..22475e6 100644
--- a/zuul/model.py
+++ b/zuul/model.py
@@ -627,9 +627,36 @@
self.canceled = False
self.retry = False
self.parameters = {}
+ self.worker = Worker()
def __repr__(self):
- return '<Build %s of %s>' % (self.uuid, self.job.name)
+ return ('<Build %s of %s on %s>' %
+ (self.uuid, self.job.name, self.worker))
+
+
+class Worker(object):
+ """A model of the worker running a job"""
+ def __init__(self):
+ self.name = "Unknown"
+ self.hostname = None
+ self.ips = []
+ self.fqdn = None
+ self.program = None
+ self.version = None
+ self.extra = {}
+
+ def updateFromData(self, data):
+ """Update worker information if contained in the WORK_DATA response."""
+ self.name = data.get('worker_name', self.name)
+ self.hostname = data.get('worker_hostname', self.hostname)
+ self.ips = data.get('worker_ips', self.ips)
+ self.fqdn = data.get('worker_fqdn', self.fqdn)
+ self.program = data.get('worker_program', self.program)
+ self.version = data.get('worker_version', self.version)
+ self.extra = data.get('worker_extra', self.extra)
+
+ def __repr__(self):
+ return '<Worker %s>' % self.name
class BuildSet(object):
diff --git a/zuul/scheduler.py b/zuul/scheduler.py
index eaa5eae..815da8c 100644
--- a/zuul/scheduler.py
+++ b/zuul/scheduler.py
@@ -565,9 +565,9 @@
self.log.warning("No old pipeline matching %s found "
"when reconfiguring" % name)
continue
- self.log.debug("Re-enqueueing changes for pipeline %s" %
- name)
+ self.log.debug("Re-enqueueing changes for pipeline %s" % name)
items_to_remove = []
+ builds_to_remove = []
for shared_queue in old_pipeline.queues:
for item in shared_queue.queue:
item.item_ahead = None
@@ -582,19 +582,26 @@
items_to_remove.append(item)
continue
item.change.project = project
+ for build in item.current_build_set.getBuilds():
+ job = layout.jobs.get(build.job.name)
+ if job:
+ build.job = job
+ else:
+ builds_to_remove.append(build)
if not new_pipeline.manager.reEnqueueItem(item):
items_to_remove.append(item)
- builds_to_remove = []
- for build, item in old_pipeline.manager.building_jobs.items():
- if item in items_to_remove:
+ for item in items_to_remove:
+ for build in item.current_build_set.getBuilds():
builds_to_remove.append(build)
- self.log.warning("Deleting running build %s for "
- "change %s while reenqueueing" % (
- build, item.change))
for build in builds_to_remove:
- del old_pipeline.manager.building_jobs[build]
- new_pipeline.manager.building_jobs = \
- old_pipeline.manager.building_jobs
+ self.log.warning(
+ "Canceling build %s during reconfiguration" % (build,))
+ try:
+ self.launcher.cancel(build)
+ except Exception:
+ self.log.exception(
+ "Exception while canceling build %s "
+ "for change %s" % (build, item.change))
self.layout = layout
for trigger in self.triggers.values():
trigger.postConfig()
@@ -655,9 +662,12 @@
if self.merger.areMergesOutstanding():
waiting = True
for pipeline in self.layout.pipelines.values():
- for build in pipeline.manager.building_jobs.keys():
- self.log.debug("%s waiting on %s" % (pipeline.manager, build))
- waiting = True
+ for item in pipeline.getAllItems():
+ for build in item.current_build_set.getBuilds():
+ if build.result is None:
+ self.log.debug("%s waiting on %s" %
+ (pipeline.manager, build))
+ waiting = True
if not waiting:
self.log.debug("All builds are complete")
return True
@@ -875,7 +885,6 @@
def __init__(self, sched, pipeline):
self.sched = sched
self.pipeline = pipeline
- self.building_jobs = {}
self.event_filters = []
if self.sched.config and self.sched.config.has_option(
'zuul', 'report_times'):
@@ -1144,7 +1153,6 @@
build = self.sched.launcher.launch(job, item,
self.pipeline,
dependent_items)
- self.building_jobs[build] = item
self.log.debug("Adding build %s of job %s to item %s" %
(build, job, item))
item.addBuild(build)
@@ -1160,24 +1168,17 @@
def cancelJobs(self, item, prime=True):
self.log.debug("Cancel jobs for change %s" % item.change)
canceled = False
- to_remove = []
+ old_build_set = item.current_build_set
if prime and item.current_build_set.ref:
item.resetAllBuilds()
- for build, build_item in self.building_jobs.items():
- if build_item == item:
- self.log.debug("Found build %s for change %s to cancel" %
- (build, item.change))
- try:
- self.sched.launcher.cancel(build)
- except:
- self.log.exception("Exception while canceling build %s "
- "for change %s" % (build, item.change))
- to_remove.append(build)
- canceled = True
- for build in to_remove:
- self.log.debug("Removing build %s from running builds" % build)
+ for build in old_build_set.getBuilds():
+ try:
+ self.sched.launcher.cancel(build)
+ except:
+ self.log.exception("Exception while canceling build %s "
+ "for change %s" % (build, item.change))
build.result = 'CANCELED'
- del self.building_jobs[build]
+ canceled = True
for item_behind in item.items_behind:
self.log.debug("Canceling jobs for change %s, behind change %s" %
(item_behind.change, item.change))
@@ -1293,31 +1294,17 @@
self.sched.launcher.setBuildDescription(build, desc)
def onBuildStarted(self, build):
- if build not in self.building_jobs:
- # Or triggered externally, or triggered before zuul started,
- # or restarted
- return False
-
self.log.debug("Build %s started" % build)
self.updateBuildDescriptions(build.build_set)
return True
def onBuildCompleted(self, build):
- if build not in self.building_jobs:
- # Or triggered externally, or triggered before zuul started,
- # or restarted
- return False
-
self.log.debug("Build %s completed" % build)
- change = self.building_jobs[build]
- self.log.debug("Found change %s which triggered completed build %s" %
- (change, build))
+ item = build.build_set.item
- del self.building_jobs[build]
-
- self.pipeline.setResult(change, build)
- self.log.debug("Change %s status is now:\n %s" %
- (change, self.pipeline.formatStatus(change)))
+ self.pipeline.setResult(item, build)
+ self.log.debug("Item %s status is now:\n %s" %
+ (item, self.pipeline.formatStatus(item)))
self.updateBuildDescriptions(build.build_set)
return True