Merge "Merger: log non Git exceptions on merge"
diff --git a/doc/source/gating.rst b/doc/source/gating.rst
index f3f2d3c..43a5928 100644
--- a/doc/source/gating.rst
+++ b/doc/source/gating.rst
@@ -28,6 +28,9 @@
 Zuul was designed to handle the workflow of the OpenStack project, but
 can be used with any project.
 
+Testing in parallel
+-------------------
+
 A particular focus of Zuul is ensuring correctly ordered testing of
 changes in parallel.  A gating system should always test each change
 applied to the tip of the branch exactly as it is going to be merged.
@@ -208,3 +211,72 @@
     }
   }
 
+
+Cross-project dependencies
+--------------------------
+
+When your projects are closely coupled, you want to make sure that
+changes entering the gate are tested against the versions of the
+other projects currently enqueued in the gate, since those will
+eventually be merged and might introduce breaking changes.
+
+Such dependencies can be defined in the Zuul configuration by registering
+a job in a DependentPipeline of several projects.  Whenever a change enters
+such a pipeline, Zuul creates references for the other projects as well.
+As an example, given a main project ``acme`` and a plugin ``plugin``, you
+can define a job ``acme-tests`` which should be run for both projects:
+
+.. code-block:: yaml
+
+  pipelines:
+    - name: gate
+      manager: DependentPipelineManager
+
+  projects:
+    - name: acme
+      gate:
+       - acme-tests
+    - name: plugin
+      gate:
+       - acme-tests  # Register job again
+
+Whenever a change enters the ``gate`` pipeline queue, Zuul creates a reference
+for it.  For each subsequent change, an additional reference is created for the
+changes ahead in the queue.  As a result, you will always be able to fetch the
+future state of your project dependencies for each change in the queue.
+
+Based on the pipeline and project definitions above, three changes are
+inserted in the ``gate`` pipeline with the associated references:
+
+  ========  ======= ====== =========
+  Change    Project Branch Zuul Ref.
+  ========  ======= ====== =========
+  Change 1  acme    master master/Z1
+  Change 2  plugin  stable stable/Z2
+  Change 3  plugin  master master/Z3
+  ========  ======= ====== =========
+
+Since the changes enter a DependentPipelineManager pipeline, Zuul creates
+additional references:
+
+  ====== ======= ========= =============================
+  Change Project Zuul Ref. Description
+  ====== ======= ========= =============================
+  1      acme    master/Z1 acme master + change 1
+  ------ ------- --------- -----------------------------
+  2      acme    master/Z2 acme master + change 1
+  2      plugin  stable/Z2 plugin stable + change 2
+  ------ ------- --------- -----------------------------
+  3      acme    master/Z3 acme master + change 1
+  3      plugin  stable/Z3 plugin stable + change 2
+  3      plugin  master/Z3 plugin master + change 3
+  ====== ======= ========= =============================
+
+In order to test change 3, you would clone both repositories and simply
+fetch the Z3 reference for each combination of project/branch you are
+interested in testing.  For example, fetching master/Z3 for both ``acme``
+and ``plugin`` gives you ``acme`` with change 1 applied and ``plugin``
+with change 3 applied, which is the expected state once change 3 merges.
+When your job fetches several repositories without changes ahead in the
+queue, they may not have a Z reference, in which case you can simply
+check out the branch.
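The reference layout in the tables above can be modelled in a few lines. The following is a minimal Python sketch, assuming a queue is just an ordered list of ``(change, project, branch)`` tuples; ``zuul_refs``, ``ref_name`` and ``checkout_target`` are hypothetical helpers for illustration, not part of Zuul.

```python
from typing import Dict, List, Tuple

# (change number, project, branch), listed from the head of the queue.
QueueEntry = Tuple[int, str, str]
RefState = Dict[Tuple[str, str], List[int]]


def zuul_refs(queue: List[QueueEntry]) -> Dict[int, RefState]:
    """Map each change to the (project, branch) refs created for it.

    For change N, a ref exists for every project/branch targeted by N or
    by a change ahead of N, and it contains exactly those changes.
    """
    refs: Dict[int, RefState] = {}
    for position, (change, _project, _branch) in enumerate(queue):
        state: RefState = {}
        for ahead, project, branch in queue[:position + 1]:
            state.setdefault((project, branch), []).append(ahead)
        refs[change] = state
    return refs


def ref_name(change: int, branch: str) -> str:
    # Hypothetical naming that mirrors "master/Z3" in the tables above.
    return "%s/Z%d" % (branch, change)


def checkout_target(refs: Dict[int, RefState], change: int,
                    project: str, branch: str) -> str:
    """Return the Zuul ref to fetch, or the plain branch if none exists."""
    if (project, branch) in refs[change]:
        return ref_name(change, branch)
    return branch


queue = [(1, "acme", "master"), (2, "plugin", "stable"),
         (3, "plugin", "master")]
refs = zuul_refs(queue)
for change, state in refs.items():
    for (project, branch), applied in sorted(state.items()):
        print(change, project, ref_name(change, branch), applied)
```

For change 3 this yields ``acme`` master/Z3 with change 1, ``plugin`` master/Z3 with change 3 and ``plugin`` stable/Z3 with change 2, matching the second table. ``checkout_target(refs, 1, "plugin", "master")`` falls back to ``"master"`` because no change at or ahead of change 1 touches ``plugin``.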
diff --git a/doc/source/launchers.rst b/doc/source/launchers.rst
index c56d6e9..db49933 100644
--- a/doc/source/launchers.rst
+++ b/doc/source/launchers.rst
@@ -87,8 +87,8 @@
 **ZUUL_PIPELINE**
   The Zuul pipeline that is building this job
 **ZUUL_URL**
-  The url for the zuul server as configured in zuul.conf.  
-  A test runner may use this URL as the basis for fetching 
+  The url for the zuul server as configured in zuul.conf.
+  A test runner may use this URL as the basis for fetching
   git commits.
 
 The following additional parameters will only be provided for builds
@@ -195,6 +195,30 @@
   The URL with the status or results of the build.  Will be used in
   the status page and the final report.
 
+To help with debugging builds, a worker may send back some optional
+metadata:
+
+**worker_name** (optional)
+  The name of the worker.
+
+**worker_hostname** (optional)
+  The hostname of the worker.
+
+**worker_ips** (optional)
+  A list of IP addresses of the worker.
+
+**worker_fqdn** (optional)
+  The FQDN of the worker.
+
+**worker_program** (optional)
+  The name of the worker program, for example Jenkins or turbo-hipster.
+
+**worker_version** (optional)
+  The version of the software running the job.
+
+**worker_extra** (optional)
+  A dictionary of any extra metadata you may want to pass along.
+
 It should then immediately send a WORK_STATUS packet with a value of 0
 percent complete.  It may then optionally send subsequent WORK_STATUS
 packets with updated completion values.
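A worker assembles the optional metadata into the same JSON document as the other WORK_DATA fields. A minimal Python sketch follows; ``build_work_data`` is a hypothetical helper, and the field values (worker name, program, version, extra) are made-up placeholders. Only the key names come from the list above.

```python
import json
import socket

# Optional worker metadata keys accepted in a WORK_DATA packet.
WORKER_KEYS = ('worker_name', 'worker_hostname', 'worker_ips',
               'worker_fqdn', 'worker_program', 'worker_version',
               'worker_extra')


def build_work_data(name, number, url=None, **metadata):
    """Return a JSON WORK_DATA payload with optional worker metadata."""
    unknown = set(metadata) - set(WORKER_KEYS)
    if unknown:
        raise ValueError('unexpected metadata keys: %s' % sorted(unknown))
    data = {'name': name, 'number': number}
    if url is not None:
        data['url'] = url
    # Every metadata key is optional; omitted keys are simply not sent.
    data.update(metadata)
    return json.dumps(data)


payload = build_work_data(
    'project-test', 42, url='http://logs.example.org/42',
    worker_name='worker01', worker_hostname=socket.gethostname(),
    worker_ips=['127.0.0.1'], worker_program='example-runner',
    worker_version='0.1', worker_extra={'region': 'example'})
```

A gear-based worker would pass ``payload`` to ``job.sendWorkData()`` and then call ``job.sendWorkStatus(0, 100)``, as the fake builder in the test suite does. Rejecting unknown keys is a design choice of this sketch, meant to catch typos such as ``worker_ip``.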
diff --git a/doc/source/zuul.rst b/doc/source/zuul.rst
index ee70523..1a6a23d 100644
--- a/doc/source/zuul.rst
+++ b/doc/source/zuul.rst
@@ -87,11 +87,11 @@
   ``layout_config=/etc/zuul/layout.yaml``
 
 **log_config**
-  Path to log config file.  Used by all Zuul commands.
+  Path to log config file.  Used by zuul-server only.
   ``log_config=/etc/zuul/logging.yaml``
 
 **pidfile**
-  Path to PID lock file.  Used by all Zuul commands.
+  Path to PID lock file.  Used by zuul-server only.
   ``pidfile=/var/run/zuul/zuul.pid``
 
 **state_dir**
@@ -143,6 +143,14 @@
   "http://zuul.example.com/p" or "http://zuul-merger01.example.com/p"
   depending on whether the merger is co-located with the Zuul server.
 
+**log_config**
+  Path to log config file for the merger process.
+  ``log_config=/etc/zuul/logging.yaml``
+
+**pidfile**
+  Path to PID lock file for the merger process.
+  ``pidfile=/var/run/zuul-merger/merger.pid``
+
 smtp
 """"
 
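With this change the merger reads ``log_config`` and ``pidfile`` from its own ``[merger]`` section and no longer looks at ``[zuul]``. A minimal Python 3 sketch of the PID file lookup, mirroring the logic in ``zuul/cmd/merger.py`` (Zuul itself used the Python 2 ``ConfigParser`` module); ``merger_pidfile`` is a hypothetical helper name.

```python
import configparser
import os

DEFAULT_MERGER_PIDFILE = '/var/run/zuul-merger/merger.pid'


def merger_pidfile(config):
    """Return the merger PID file path, honouring [merger] pidfile only."""
    if config.has_option('merger', 'pidfile'):
        return os.path.expanduser(config.get('merger', 'pidfile'))
    # A pidfile set under [zuul] belongs to zuul-server and is ignored.
    return DEFAULT_MERGER_PIDFILE


config = configparser.ConfigParser()
config.read_string("""
[zuul]
pidfile=/var/run/zuul/zuul.pid

[merger]
zuul_url=http://zuul.example.com/p
""")
```

Here ``merger_pidfile(config)`` returns the default path even though ``[zuul]`` sets a pidfile, so zuul-server and zuul-merger no longer contend for the same lock file.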
diff --git a/requirements.txt b/requirements.txt
index 170b5152..92bb296 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -13,5 +13,5 @@
 extras
 statsd>=1.0.0,<3.0
 voluptuous>=0.7
-gear>=0.4.0,<1.0.0
+gear>=0.5.1,<1.0.0
 apscheduler>=2.1.1,<3.0
diff --git a/tests/fixtures/layout-no-jobs.yaml b/tests/fixtures/layout-no-jobs.yaml
new file mode 100644
index 0000000..ee8dc62
--- /dev/null
+++ b/tests/fixtures/layout-no-jobs.yaml
@@ -0,0 +1,43 @@
+includes:
+  - python-file: custom_functions.py
+
+pipelines:
+  - name: check
+    manager: IndependentPipelineManager
+    trigger:
+      gerrit:
+        - event: patchset-created
+    success:
+      gerrit:
+        verified: 1
+    failure:
+      gerrit:
+        verified: -1
+
+  - name: gate
+    manager: DependentPipelineManager
+    failure-message: Build failed.  For information on how to proceed, see http://wiki.example.org/Test_Failures
+    trigger:
+      gerrit:
+        - event: comment-added
+          approval:
+            - approved: 1
+    success:
+      gerrit:
+        verified: 2
+        submit: true
+    failure:
+      gerrit:
+        verified: -2
+    start:
+      gerrit:
+        verified: 0
+    precedence: high
+
+projects:
+  - name: org/project
+    merge-mode: cherry-pick
+    check:
+      - noop
+    gate:
+      - noop
diff --git a/tests/fixtures/layout.yaml b/tests/fixtures/layout.yaml
index 98dfe86..b1c94de 100644
--- a/tests/fixtures/layout.yaml
+++ b/tests/fixtures/layout.yaml
@@ -231,3 +231,7 @@
       - conflict-project-merge:
         - conflict-project-test1
         - conflict-project-test2
+
+  - name: org/noop-project
+    gate:
+      - noop
diff --git a/tests/test_scheduler.py b/tests/test_scheduler.py
index b2106f8..9576440 100755
--- a/tests/test_scheduler.py
+++ b/tests/test_scheduler.py
@@ -498,12 +498,23 @@
             'name': self.name,
             'number': self.number,
             'manager': self.worker.worker_id,
+            'worker_name': 'My Worker',
+            'worker_hostname': 'localhost',
+            'worker_ips': ['127.0.0.1', '192.168.1.1'],
+            'worker_fqdn': 'zuul.example.org',
+            'worker_program': 'FakeBuilder',
+            'worker_version': 'v1.1',
+            'worker_extra': {'something': 'else'}
         }
 
+        self.log.debug('Running build %s' % self.unique)
+
         self.job.sendWorkData(json.dumps(data))
+        self.log.debug('Sent WorkData packet with %s' % json.dumps(data))
         self.job.sendWorkStatus(0, 100)
 
         if self.worker.hold_jobs_in_build:
+            self.log.debug('Holding build %s' % self.unique)
             self._wait()
         self.log.debug("Build %s continuing" % self.unique)
 
@@ -2813,6 +2824,29 @@
         self.assertReportedStat('test-timing', '3|ms')
         self.assertReportedStat('test-guage', '12|g')
 
+    def test_stuck_job_cleanup(self):
+        "Test that pending jobs are cleaned up if removed from layout"
+        self.gearman_server.hold_jobs_in_queue = True
+        A = self.fake_gerrit.addFakeChange('org/project', 'master', 'A')
+        A.addApproval('CRVW', 2)
+        self.fake_gerrit.addEvent(A.addApproval('APRV', 1))
+        self.waitUntilSettled()
+        self.assertEqual(len(self.gearman_server.getQueue()), 1)
+
+        self.config.set('zuul', 'layout_config',
+                        'tests/fixtures/layout-no-jobs.yaml')
+        self.sched.reconfigure(self.config)
+        self.waitUntilSettled()
+
+        self.gearman_server.release('noop')
+        self.waitUntilSettled()
+        self.assertEqual(len(self.gearman_server.getQueue()), 0)
+        self.assertTrue(self.sched._areAllBuildsComplete())
+
+        self.assertEqual(len(self.history), 1)
+        self.assertEqual(self.history[0].name, 'noop')
+        self.assertEqual(self.history[0].result, 'SUCCESS')
+
     def test_file_jobs(self):
         "Test that file jobs run only when appropriate"
         A = self.fake_gerrit.addFakeChange('org/project', 'master', 'A')
@@ -3568,3 +3602,41 @@
         self.assertEqual(queue.window, 2)
         self.assertEqual(queue.window_floor, 1)
         self.assertEqual(C.data['status'], 'MERGED')
+
+    def test_worker_update_metadata(self):
+        "Test if a worker can send back metadata about itself"
+        self.worker.hold_jobs_in_build = True
+
+        A = self.fake_gerrit.addFakeChange('org/project', 'master', 'A')
+        A.addApproval('CRVW', 2)
+        self.fake_gerrit.addEvent(A.addApproval('APRV', 1))
+        self.waitUntilSettled()
+
+        self.assertEqual(len(self.launcher.builds), 1)
+
+        self.log.debug('Current builds:')
+        self.log.debug(self.launcher.builds)
+
+        start = time.time()
+        while True:
+            if time.time() - start > 10:
+                raise Exception("Timeout waiting for gearman server to report "
+                                "back to the client")
+            build = self.launcher.builds.values()[0]
+            if build.worker.name == "My Worker":
+                break
+            else:
+                time.sleep(0)
+
+        self.log.debug(build)
+        self.assertEqual("My Worker", build.worker.name)
+        self.assertEqual("localhost", build.worker.hostname)
+        self.assertEqual(['127.0.0.1', '192.168.1.1'], build.worker.ips)
+        self.assertEqual("zuul.example.org", build.worker.fqdn)
+        self.assertEqual("FakeBuilder", build.worker.program)
+        self.assertEqual("v1.1", build.worker.version)
+        self.assertEqual({'something': 'else'}, build.worker.extra)
+
+        self.worker.hold_jobs_in_build = False
+        self.worker.release()
+        self.waitUntilSettled()
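The polling loop in ``test_worker_update_metadata`` spins with ``time.sleep(0)``. A generic helper could express the same wait with a real sleep interval; the following is a minimal sketch, and ``wait_until`` is a hypothetical name that does not exist in the Zuul test suite.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.01):
    """Call ``predicate`` until it returns a true value or time runs out.

    Sleeping for ``interval`` seconds between polls avoids the near
    busy-wait that ``time.sleep(0)`` produces.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met within %.1f seconds"
                               % timeout)
        time.sleep(interval)
```

In the test this would read ``wait_until(lambda: list(self.launcher.builds.values())[0].worker.name == "My Worker")``. The ``list()`` call is only needed on Python 3, where ``dict.values()`` is not indexable.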
diff --git a/zuul/cmd/merger.py b/zuul/cmd/merger.py
index e9722cf..f046235 100644
--- a/zuul/cmd/merger.py
+++ b/zuul/cmd/merger.py
@@ -94,7 +94,7 @@
         # See comment at top of file about zuul imports
         import zuul.merger.server
 
-        self.setup_logging('zuul', 'log_config')
+        self.setup_logging('merger', 'log_config')
 
         self.merger = zuul.merger.server.MergeServer(self.config)
         self.merger.start()
@@ -135,10 +135,10 @@
         print
         raise
 
-    if server.config.has_option('zuul', 'pidfile'):
-        pid_fn = os.path.expanduser(server.config.get('zuul', 'pidfile'))
+    if server.config.has_option('merger', 'pidfile'):
+        pid_fn = os.path.expanduser(server.config.get('merger', 'pidfile'))
     else:
-        pid_fn = '/var/run/zuul/merger.pid'
+        pid_fn = '/var/run/zuul-merger/merger.pid'
     pid = pid_file_module.TimeoutPIDLockFile(pid_fn, 10)
 
     if server.args.nodaemon:
diff --git a/zuul/cmd/server.py b/zuul/cmd/server.py
index 5d83959..79a2538 100755
--- a/zuul/cmd/server.py
+++ b/zuul/cmd/server.py
@@ -154,7 +154,13 @@
             os.close(pipe_write)
             self.setup_logging('gearman_server', 'log_config')
             import gear
-            gear.Server(4730)
+            statsd_host = os.environ.get('STATSD_HOST')
+            statsd_port = int(os.environ.get('STATSD_PORT', 8125))
+            gear.Server(4730,
+                        statsd_host=statsd_host,
+                        statsd_port=statsd_port,
+                        statsd_prefix='zuul.geard')
+
             # Keep running until the parent dies:
             pipe_read = os.fdopen(pipe_read)
             pipe_read.read()
@@ -185,6 +191,7 @@
             self.start_gear_server()
 
         self.setup_logging('zuul', 'log_config')
+        self.log = logging.getLogger("zuul.Server")
 
         self.sched = zuul.scheduler.Scheduler()
 
@@ -213,10 +220,13 @@
         self.sched.registerReporter(gerrit_reporter)
         self.sched.registerReporter(smtp_reporter)
 
+        self.log.info('Starting scheduler')
         self.sched.start()
         self.sched.reconfigure(self.config)
         self.sched.resume()
+        self.log.info('Starting Webapp')
         webapp.start()
+        self.log.info('Starting RPC')
         rpc.start()
 
         signal.signal(signal.SIGHUP, self.reconfigure_handler)
diff --git a/zuul/launcher/gearman.py b/zuul/launcher/gearman.py
index 37fc743..3a690dc 100644
--- a/zuul/launcher/gearman.py
+++ b/zuul/launcher/gearman.py
@@ -298,7 +298,7 @@
         if not self.isJobRegistered(gearman_job.name):
             self.log.error("Job %s is not registered with Gearman" %
                            gearman_job)
-            self.onBuildCompleted(gearman_job, 'LOST')
+            self.onBuildCompleted(gearman_job, 'NOT_REGISTERED')
             return build
 
         if pipeline.precedence == zuul.model.PRECEDENCE_NORMAL:
@@ -312,14 +312,14 @@
             self.gearman.submitJob(gearman_job, precedence=precedence)
         except Exception:
             self.log.exception("Unable to submit job to Gearman")
-            self.onBuildCompleted(gearman_job, 'LOST')
+            self.onBuildCompleted(gearman_job, 'EXCEPTION')
             return build
 
         if not gearman_job.handle:
             self.log.error("No job handle was received for %s after 30 seconds"
                            " marking as lost." %
                            gearman_job)
-            self.onBuildCompleted(gearman_job, 'LOST')
+            self.onBuildCompleted(gearman_job, 'NO_HANDLE')
 
         return build
 
@@ -380,6 +380,8 @@
         if build:
             # Allow URL to be updated
             build.url = data.get('url') or build.url
+            # Update information about worker
+            build.worker.updateFromData(data)
 
             if build.number is None:
                 self.log.info("Build %s started" % job)
@@ -394,7 +396,7 @@
 
     def onDisconnect(self, job):
         self.log.info("Gearman job %s lost due to disconnect" % job)
-        self.onBuildCompleted(job, 'LOST')
+        self.onBuildCompleted(job)
 
     def onUnknownJob(self, job):
         self.log.info("Gearman job %s lost due to unknown handle" % job)
diff --git a/zuul/merger/server.py b/zuul/merger/server.py
index 5d52041..d8bc1b8 100644
--- a/zuul/merger/server.py
+++ b/zuul/merger/server.py
@@ -61,11 +61,14 @@
             port = 4730
         self.worker = gear.Worker('Zuul Merger')
         self.worker.addServer(server, port)
+        self.log.debug("Waiting for server")
+        self.worker.waitForServer()
+        self.log.debug("Registering")
+        self.register()
+        self.log.debug("Starting worker")
         self.thread = threading.Thread(target=self.run)
         self.thread.daemon = True
         self.thread.start()
-        self.worker.waitForServer()
-        self.register()
 
     def register(self):
         self.worker.registerFunction("merger:merge")
diff --git a/zuul/model.py b/zuul/model.py
index 2a52306..22475e6 100644
--- a/zuul/model.py
+++ b/zuul/model.py
@@ -627,9 +627,36 @@
         self.canceled = False
         self.retry = False
         self.parameters = {}
+        self.worker = Worker()
 
     def __repr__(self):
-        return '<Build %s of %s>' % (self.uuid, self.job.name)
+        return ('<Build %s of %s on %s>' %
+                (self.uuid, self.job.name, self.worker))
+
+
+class Worker(object):
+    """A model of the worker running a job"""
+    def __init__(self):
+        self.name = "Unknown"
+        self.hostname = None
+        self.ips = []
+        self.fqdn = None
+        self.program = None
+        self.version = None
+        self.extra = {}
+
+    def updateFromData(self, data):
+        """Update worker information if contained in the WORK_DATA response."""
+        self.name = data.get('worker_name', self.name)
+        self.hostname = data.get('worker_hostname', self.hostname)
+        self.ips = data.get('worker_ips', self.ips)
+        self.fqdn = data.get('worker_fqdn', self.fqdn)
+        self.program = data.get('worker_program', self.program)
+        self.version = data.get('worker_version', self.version)
+        self.extra = data.get('worker_extra', self.extra)
+
+    def __repr__(self):
+        return '<Worker %s>' % self.name
 
 
 class BuildSet(object):
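``Worker.updateFromData`` only overwrites attributes whose keys are present, so successive WORK_DATA packets (for example one carrying only ``url``) do not erase metadata already received. The snippet below is a stand-alone copy of the ``Worker`` class from the hunk above, reproduced so it runs without Zuul installed.

```python
class Worker(object):
    """Stand-alone copy of zuul.model.Worker for experimentation."""

    def __init__(self):
        self.name = "Unknown"
        self.hostname = None
        self.ips = []
        self.fqdn = None
        self.program = None
        self.version = None
        self.extra = {}

    def updateFromData(self, data):
        # Keys absent from ``data`` keep their previous values; a new
        # 'worker_extra' dict replaces the old one rather than merging.
        self.name = data.get('worker_name', self.name)
        self.hostname = data.get('worker_hostname', self.hostname)
        self.ips = data.get('worker_ips', self.ips)
        self.fqdn = data.get('worker_fqdn', self.fqdn)
        self.program = data.get('worker_program', self.program)
        self.version = data.get('worker_version', self.version)
        self.extra = data.get('worker_extra', self.extra)

    def __repr__(self):
        return '<Worker %s>' % self.name


worker = Worker()
worker.updateFromData({'url': 'http://logs.example.org/1'})
print(worker)              # <Worker Unknown>
worker.updateFromData({'worker_name': 'My Worker',
                       'worker_ips': ['127.0.0.1']})
print(worker, worker.ips)  # <Worker My Worker> ['127.0.0.1']
worker.updateFromData({'worker_version': 'v1.1'})
print(worker.name)         # My Worker
```

Because the mutable defaults are created in ``__init__``, each ``Build`` gets its own ``Worker`` and its own ``ips`` list and ``extra`` dict.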
diff --git a/zuul/scheduler.py b/zuul/scheduler.py
index eaa5eae..815da8c 100644
--- a/zuul/scheduler.py
+++ b/zuul/scheduler.py
@@ -565,9 +565,9 @@
                         self.log.warning("No old pipeline matching %s found "
                                          "when reconfiguring" % name)
                     continue
-                self.log.debug("Re-enqueueing changes for pipeline %s" %
-                               name)
+                self.log.debug("Re-enqueueing changes for pipeline %s" % name)
                 items_to_remove = []
+                builds_to_remove = []
                 for shared_queue in old_pipeline.queues:
                     for item in shared_queue.queue:
                         item.item_ahead = None
@@ -582,19 +582,26 @@
                             items_to_remove.append(item)
                             continue
                         item.change.project = project
+                        for build in item.current_build_set.getBuilds():
+                            job = layout.jobs.get(build.job.name)
+                            if job:
+                                build.job = job
+                            else:
+                                builds_to_remove.append(build)
                         if not new_pipeline.manager.reEnqueueItem(item):
                             items_to_remove.append(item)
-                builds_to_remove = []
-                for build, item in old_pipeline.manager.building_jobs.items():
-                    if item in items_to_remove:
+                for item in items_to_remove:
+                    for build in item.current_build_set.getBuilds():
                         builds_to_remove.append(build)
-                        self.log.warning("Deleting running build %s for "
-                                         "change %s while reenqueueing" % (
-                                         build, item.change))
                 for build in builds_to_remove:
-                    del old_pipeline.manager.building_jobs[build]
-                new_pipeline.manager.building_jobs = \
-                    old_pipeline.manager.building_jobs
+                    self.log.warning(
+                        "Canceling build %s during reconfiguration" % (build,))
+                    try:
+                        self.launcher.cancel(build)
+                    except Exception:
+                        self.log.exception(
+                            "Exception while canceling build %s for change %s"
+                            % (build, build.build_set.item.change))
             self.layout = layout
             for trigger in self.triggers.values():
                 trigger.postConfig()
@@ -655,9 +662,12 @@
         if self.merger.areMergesOutstanding():
             waiting = True
         for pipeline in self.layout.pipelines.values():
-            for build in pipeline.manager.building_jobs.keys():
-                self.log.debug("%s waiting on %s" % (pipeline.manager, build))
-                waiting = True
+            for item in pipeline.getAllItems():
+                for build in item.current_build_set.getBuilds():
+                    if build.result is None:
+                        self.log.debug("%s waiting on %s" %
+                                       (pipeline.manager, build))
+                        waiting = True
         if not waiting:
             self.log.debug("All builds are complete")
             return True
@@ -875,7 +885,6 @@
     def __init__(self, sched, pipeline):
         self.sched = sched
         self.pipeline = pipeline
-        self.building_jobs = {}
         self.event_filters = []
         if self.sched.config and self.sched.config.has_option(
             'zuul', 'report_times'):
@@ -1144,7 +1153,6 @@
                 build = self.sched.launcher.launch(job, item,
                                                    self.pipeline,
                                                    dependent_items)
-                self.building_jobs[build] = item
                 self.log.debug("Adding build %s of job %s to item %s" %
                                (build, job, item))
                 item.addBuild(build)
@@ -1160,24 +1168,17 @@
     def cancelJobs(self, item, prime=True):
         self.log.debug("Cancel jobs for change %s" % item.change)
         canceled = False
-        to_remove = []
+        old_build_set = item.current_build_set
         if prime and item.current_build_set.ref:
             item.resetAllBuilds()
-        for build, build_item in self.building_jobs.items():
-            if build_item == item:
-                self.log.debug("Found build %s for change %s to cancel" %
-                               (build, item.change))
-                try:
-                    self.sched.launcher.cancel(build)
-                except:
-                    self.log.exception("Exception while canceling build %s "
-                                       "for change %s" % (build, item.change))
-                to_remove.append(build)
-                canceled = True
-        for build in to_remove:
-            self.log.debug("Removing build %s from running builds" % build)
+        for build in old_build_set.getBuilds():
+            try:
+                self.sched.launcher.cancel(build)
+            except Exception:
+                self.log.exception("Exception while canceling build %s "
+                                   "for change %s" % (build, item.change))
             build.result = 'CANCELED'
-            del self.building_jobs[build]
+            canceled = True
         for item_behind in item.items_behind:
             self.log.debug("Canceling jobs for change %s, behind change %s" %
                            (item_behind.change, item.change))
@@ -1293,31 +1294,17 @@
                 self.sched.launcher.setBuildDescription(build, desc)
 
     def onBuildStarted(self, build):
-        if build not in self.building_jobs:
-            # Or triggered externally, or triggered before zuul started,
-            # or restarted
-            return False
-
         self.log.debug("Build %s started" % build)
         self.updateBuildDescriptions(build.build_set)
         return True
 
     def onBuildCompleted(self, build):
-        if build not in self.building_jobs:
-            # Or triggered externally, or triggered before zuul started,
-            # or restarted
-            return False
-
         self.log.debug("Build %s completed" % build)
-        change = self.building_jobs[build]
-        self.log.debug("Found change %s which triggered completed build %s" %
-                       (change, build))
+        item = build.build_set.item
 
-        del self.building_jobs[build]
-
-        self.pipeline.setResult(change, build)
-        self.log.debug("Change %s status is now:\n %s" %
-                       (change, self.pipeline.formatStatus(change)))
+        self.pipeline.setResult(item, build)
+        self.log.debug("Item %s status is now:\n %s" %
+                       (item, self.pipeline.formatStatus(item)))
         self.updateBuildDescriptions(build.build_set)
         return True
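With ``building_jobs`` removed, whether the scheduler is idle is derived from the builds attached to queue items: any build whose ``result`` is still ``None`` counts as outstanding. A simplified Python sketch of that check follows. ``Build``, ``Item``, ``pending_builds`` and ``all_builds_complete`` are hypothetical stand-ins; the real code reaches builds through ``item.current_build_set.getBuilds()``.

```python
class Build(object):
    def __init__(self, name, result=None):
        self.name = name
        self.result = result  # None while the build is still running


class Item(object):
    def __init__(self, builds):
        self.builds = builds


def pending_builds(items):
    """Return builds that have not reported a result yet."""
    return [build for item in items for build in item.builds
            if build.result is None]


def all_builds_complete(items):
    return not pending_builds(items)


items = [Item([Build('noop', 'SUCCESS')]),
         Item([Build('project-merge'), Build('project-test1', 'CANCELED')])]
```

Deriving state from build results, rather than from a separate dict that must be kept in sync, is what lets a reconfiguration swap pipeline managers without losing track of running builds.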