:title: Project Gating

Project Gating
==============

Traditionally, many software development projects merge changes from
developers into the repository, and only then identify regressions
resulting from those changes (perhaps by running a test suite with a
continuous integration system such as Jenkins) and write further
patches to fix them. When the mainline of development is broken, it
can be very frustrating for developers and can cause lost
productivity, particularly when the number of contributors or
contributions is large.

The process of gating attempts to prevent changes that introduce
regressions from being merged. This keeps the mainline of development
open and working for all developers: a change is merged only once it
is confirmed to work without disruption.

Many projects practice an informal method of gating where developers
with mainline commit access ensure that a test suite runs before
merging a change. With more developers, more changes, and more
comprehensive test suites, that process does not scale very well, and
is not the best use of a developer's time. Zuul can help automate
this process, with a particular emphasis on ensuring large numbers of
changes are tested correctly.

Zuul was designed to handle the workflow of the OpenStack project, but
can be used with any project.

A particular focus of Zuul is ensuring correctly ordered testing of
changes in parallel. A gating system should always test each change
applied to the tip of the branch exactly as it is going to be merged.
A simple way to do that would be to test one change at a time, and
merge it only if it passes tests. That works very well, but if
changes take a long time to test, developers may have to wait a long
time for their changes to make it into the repository. With some
projects, it may take hours to test changes, and it is easy for
developers to create changes at a rate faster than they can be tested
and merged.

Zuul's DependentPipelineManager allows for parallel execution of test
jobs for gating while ensuring changes are tested correctly, exactly
as if they had been tested one at a time. It does this by performing
speculative execution of test jobs; it assumes that all jobs will
succeed and tests the changes in parallel accordingly. If they do
succeed, they can all be merged. However, if one fails, then changes
that were expecting it to succeed are re-tested without the failed
change. In the best case, as many changes as there are available
execution contexts may be tested in parallel and merged at once. In
the worst case, changes are tested one at a time (as each subsequent
change fails, changes behind it start again). In practice, the
OpenStack project observes something closer to the best case.

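The best and worst cases can be illustrated with a small simulation.
This is only a sketch under simplifying assumptions, not Zuul's
implementation: the ``gate`` function and its ``defective`` argument
are hypothetical names, a change's jobs are assumed to fail only when
a defective change is present in the stack being tested, and there is
no limit on the number of execution contexts.

```python
def gate(queue, defective):
    """Simulate a dependent queue and count the rounds of testing.

    `defective` is the (hypothetical) set of changes that break the tests.
    Returns (merged, rejected, rounds).
    """
    queue = list(queue)
    merged, rejected, rounds = [], [], 0
    while queue:
        rounds += 1
        # One speculative round: every queued change is tested in parallel on
        # top of the changes ahead of it. The first defective change fails, and
        # so does everything stacked on top of it.
        failed_at = next(
            (i for i, change in enumerate(queue) if change in defective),
            len(queue),
        )
        # Changes ahead of the first failure passed and are merged in order.
        merged.extend(queue[:failed_at])
        if failed_at < len(queue):
            rejected.append(queue[failed_at])
        # Changes behind the failure start again in the next round.
        queue = queue[failed_at + 1:]
    return merged, rejected, rounds


print(gate("ABCDE", set()))         # best case: one round for all five changes
print(gate("ABCDE", set("ABCDE")))  # worst case: one round per change
```

With no defective changes, all five changes merge after a single round;
when every change is defective, the simulation needs five rounds, one
per change.
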
For example, if a core developer approves five changes in rapid
succession::

  A, B, C, D, E

Zuul queues those changes in the order they were approved, and notes
that each subsequent change depends on the one ahead of it merging:

.. blockdiag::

  blockdiag foo {
    node_width = 40;
    span_width = 40;
    A <- B <- C <- D <- E;
  }

Zuul then immediately starts testing all of the changes in parallel.
But in the case of changes that depend on others, it instructs the
test system to include the changes ahead of them, with the assumption
that they pass. That means jobs testing change *B* include change *A*
as well::

  Jobs for A: merge change A, then test
  Jobs for B: merge changes A and B, then test
  Jobs for C: merge changes A, B and C, then test
  Jobs for D: merge changes A, B, C and D, then test
  Jobs for E: merge changes A, B, C, D and E, then test

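The listing above follows a simple rule: the jobs for a change merge
every change ahead of it in the queue, followed by the change itself.
A minimal sketch of that rule follows; ``speculative_states`` is a
hypothetical helper name used for illustration and is not part of
Zuul.

```python
def speculative_states(queue):
    """Map each queued change to the changes its jobs would merge, in order."""
    return {change: list(queue[: i + 1]) for i, change in enumerate(queue)}


states = speculative_states(["A", "B", "C", "D", "E"])
for change, stack in states.items():
    print(f"Jobs for {change}: merge {', '.join(stack)}, then test")
```
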
Hence the jobs triggered to test *A* will only test *A* and ignore
*B*, *C*, *D*, and *E*:

.. blockdiag::

  blockdiag foo {
    node_width = 40;
    span_width = 40;
    master -> A -> B -> C -> D -> E;
    group jobs_for_A {
      label = "Merged changes for A";
      master -> A;
    }
    group ignored_to_test_A {
      label = "Ignored changes";
      color = "lightgray";
      B -> C -> D -> E;
    }
  }

The jobs for *E* would include the whole dependency chain: *A*, *B*,
*C*, *D*, and *E*. *E* will be tested assuming *A*, *B*, *C*, and *D*
passed:

.. blockdiag::

  blockdiag foo {
    node_width = 40;
    span_width = 40;
    group jobs_for_E {
      label = "Merged changes for E";
      master -> A -> B -> C -> D -> E;
    }
  }

If changes *A* and *B* pass tests (green), and *C*, *D*, and *E* fail (red):

.. blockdiag::

  blockdiag foo {
    node_width = 40;
    span_width = 40;

    A [color = lightgreen];
    B [color = lightgreen];
    C [color = pink];
    D [color = pink];
    E [color = pink];

    master <- A <- B <- C <- D <- E;
  }

Zuul will merge change *A* followed by change *B*, leaving this queue:

.. blockdiag::

  blockdiag foo {
    node_width = 40;
    span_width = 40;

    C [color = pink];
    D [color = pink];
    E [color = pink];

    C <- D <- E;
  }

Since *D* was dependent on *C*, it is not clear whether *D*'s failure is the
result of a defect in *D* or in *C* (the same applies to *E*):

.. blockdiag::

  blockdiag foo {
    node_width = 40;
    span_width = 40;

    C [color = pink];
    D [label = "D\n?"];
    E [label = "E\n?"];

    C <- D <- E;
  }

Since *C* failed, Zuul will report its failure and drop *C* from the queue,
keeping *D* and *E*:

.. blockdiag::

  blockdiag foo {
    node_width = 40;
    span_width = 40;

    D [label = "D\n?"];
    E [label = "E\n?"];

    D <- E;
  }

This queue is the same as if two new changes had just arrived, so Zuul
starts the process again testing *D* against the tip of the branch, and
*E* on top of *D*:

.. blockdiag::

  blockdiag foo {
    node_width = 40;
    span_width = 40;
    master -> D -> E;
    group jobs_for_D {
      label = "Merged changes for D";
      master -> D;
    }
    group ignored_to_test_D {
      label = "Skip";
      color = "lightgray";
      E;
    }
  }

.. blockdiag::

  blockdiag foo {
    node_width = 40;
    span_width = 40;
    group jobs_for_E {
      label = "Merged changes for E";
      master -> D -> E;
    }
  }
