.. James E. Blair | cdd0007 | 2012-06-08 19:17:28 -0700

:title: Project Gating

Project Gating
==============

Traditionally, many software development projects merge changes from
developers into the repository, and then identify regressions
resulting from those changes (perhaps by running a test suite with a
continuous integration system such as Jenkins), followed by more
patches to fix those bugs. A broken mainline frustrates developers
and costs productivity, particularly when the number of contributors
or contributions is large.

The process of gating attempts to prevent changes that introduce
regressions from being merged. This keeps the mainline of development
open and working for all developers: a change is merged only once it
is confirmed to work without disruption.

Many projects practice an informal method of gating where developers
with mainline commit access ensure that a test suite runs before
merging a change. With more developers, more changes, and more
comprehensive test suites, that process does not scale very well, and
is not the best use of a developer's time. Zuul can help automate
this process, with a particular emphasis on ensuring large numbers of
changes are tested correctly.

Zuul was designed to handle the workflow of the OpenStack project, but
can be used with any project.

A particular focus of Zuul is ensuring correctly ordered testing of
changes in parallel. A gating system should always test each change
applied to the tip of the branch exactly as it is going to be merged.
A simple way to do that would be to test one change at a time, and
merge it only if it passes tests. That works very well, but if
changes take a long time to test, developers may have to wait a long
time for their changes to make it into the repository. With some
projects, it may take hours to test changes, and it is easy for
developers to create changes at a rate faster than they can be tested
and merged.

Zuul's DependentQueueManager allows for parallel execution of test
jobs for gating while ensuring changes are tested correctly, exactly
as if they had been tested one at a time. It does this by performing
speculative execution of test jobs: it assumes that every change ahead
in the queue will pass, and tests all queued changes in parallel on
that assumption. If they all succeed, they can all be merged.
However, if one fails, then changes that were expecting it to succeed
are re-tested without the failed change. In the best case, as many
changes as there are available execution contexts may be tested in
parallel and merged at once. In the worst case, changes are tested
one at a time (as each subsequent change fails, changes behind it
start again). In practice, the OpenStack project observes something
closer to the best case.
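
The best and worst cases described above can be sketched with a small simulation. This is only an illustration, not Zuul's implementation: the function name ``rounds_needed`` and the ``bad`` set are hypothetical, and the model assumes that a test run fails exactly when the tested stack contains a defective change.

```python
def rounds_needed(queue, bad):
    """Count parallel test rounds needed to drain a dependent queue.

    queue -- change names in approval order
    bad   -- set of changes that break the tests (hypothetical input)

    Assumed model: a change's jobs fail if the change itself, or any
    change ahead of it in the same round, is in `bad`.
    """
    queue = list(queue)
    rounds = 0
    while queue:
        rounds += 1
        # Every change ahead of the first bad one passes and is merged.
        first_bad = next(
            (i for i, change in enumerate(queue) if change in bad), None
        )
        if first_bad is None:
            break  # the whole queue passed and merges at once
        # The bad change is reported and dropped; changes behind it
        # are re-tested in the next round without it.
        queue = queue[first_bad + 1:]
    return rounds


best = rounds_needed("ABCDE", bad=set())          # every change passes
mixed = rounds_needed("ABCDE", bad={"C"})         # one failure mid-queue
worst = rounds_needed("ABCDE", bad=set("ABCDE"))  # every change fails
```

Under this model the best case needs a single round for all five changes, one failure in the middle costs one extra round, and the worst case degrades to one round per change, which matches testing changes one at a time.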

For example, if a core developer approves five changes in rapid
succession::

  A, B, C, D, E

Zuul queues those changes in the order they were approved, and notes
that each subsequent change depends on the one ahead of it merging::

  A <-- B <-- C <-- D <-- E

Zuul then immediately starts testing all of the changes in parallel.
For each change that depends on others, it instructs the test system
to include the changes ahead of it, on the assumption that they pass.
That means jobs testing change *B* include change *A* as well::

  Jobs for A: merge change A, then test
  Jobs for B: merge changes A and B, then test
  Jobs for C: merge changes A, B and C, then test
  Jobs for D: merge changes A, B, C and D, then test
  Jobs for E: merge changes A, B, C, D and E, then test
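
The job list above follows a simple rule: the jobs for a change merge that change plus everything ahead of it in the queue. A minimal sketch of that rule, where the helper name ``speculative_job_plan`` is hypothetical and not part of Zuul:

```python
def speculative_job_plan(queue):
    """Map each queued change to the changes its jobs merge before testing.

    The jobs for the change at position i merge queue[0..i], in order,
    on the assumption that every change ahead of it will pass.
    """
    return {change: list(queue[:i + 1]) for i, change in enumerate(queue)}


plan = speculative_job_plan(["A", "B", "C", "D", "E"])
for change, stack in plan.items():
    print(f"Jobs for {change}: merge {', '.join(stack)}, then test")
```

Each entry depends only on the queue order, so all five job sets can be started at the same time.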

If changes *A* and *B* pass tests, and *C*, *D*, and *E* fail::

  A[pass] <-- B[pass] <-- C[fail] <-- D[fail] <-- E[fail]

Zuul will merge change *A* followed by change *B*, leaving this queue::

  C[fail] <-- D[fail] <-- E[fail]

Since *D* was dependent on *C*, it is not clear whether *D*'s failure is the
result of a defect in *D* or *C*, and the same is true of *E*::

  C[fail] <-- D[unknown] <-- E[unknown]

Since *C* failed, Zuul will report the failure and drop *C* from the queue::

  D[unknown] <-- E[unknown]

This queue is the same as if two new changes had just arrived, so Zuul
starts the process again, testing *D* against the tip of the branch, and
*E* against *D*.
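
The walk-through above amounts to one pass over the queue once a round of results is in. The sketch below is an illustration under the same assumptions as the example, not Zuul's code; ``settle_round`` and its arguments are hypothetical names.

```python
def settle_round(queue, passed):
    """Resolve one round of speculative test results.

    queue  -- change names in approval order
    passed -- dict mapping each change to True (pass) or False (fail)

    Returns (merged, reported, requeue):
      merged   -- changes at the head of the queue that passed
      reported -- the first failing change, dropped from the queue
      requeue  -- changes behind it, whose results are now unknown
    """
    merged = []
    for i, change in enumerate(queue):
        if not passed[change]:
            # Results behind a failure were computed on top of the
            # failed change, so they are discarded and re-tested.
            return merged, change, list(queue[i + 1:])
        merged.append(change)
    return merged, None, []


results = {"A": True, "B": True, "C": False, "D": False, "E": False}
merged, reported, requeue = settle_round(["A", "B", "C", "D", "E"], results)
```

With the results from the example, *A* and *B* are merged, *C* is reported, and *D* and *E* are returned for a fresh round, regardless of what their discarded results were.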