Have a formal way for developers to ensure their patch won't break the Continuous Integration checks with a relatively high confidence level.
Before the Chromium Commit Queue, it was on each developer's shoulders to manually run multiple try jobs on the Try Server and check their results before committing.
This is “brain wasted time” as this can be completely automated. The Commit Queue aims at automating this manual verification so the developer can start right away working on the next patch. This is necessary to scale at a sustained ~100 commits/day.
The CQ polls Rietveld for CLs that are ready to be committed. Chromium's fork of Rietveld has a ‘Commit’ checkbox. When the author or the reviewer checks it, the CL is included in the next poll. Once the CQ learns about the CL, it verifies that the author or one of the reviewers approving this CL is a full committer, then runs the presubmit checks, then runs try jobs, and then commits the patch on behalf of the author, faking the author's credentials. The whole project is written in Python.
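The sequence of checks described above can be sketched as follows. This is a minimal illustration, not the CQ's actual code: the PendingCL class, the process function, the returned status strings, and the three callables passed in are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class PendingCL:
    """Hypothetical stand-in for a Rietveld CL whose 'Commit' box is checked."""
    issue: int
    author: str
    approving_reviewers: List[str] = field(default_factory=list)


def has_committer_involvement(cl: PendingCL, committers: Set[str]) -> bool:
    """True if the author or any approving reviewer is a full committer."""
    return cl.author in committers or any(
        reviewer in committers for reviewer in cl.approving_reviewers)


def process(cl: PendingCL,
            committers: Set[str],
            run_presubmit: Callable[[PendingCL], bool],
            run_try_jobs: Callable[[PendingCL], bool],
            commit: Callable[[PendingCL], None]) -> str:
    """Run the verification steps in order and stop at the first failure."""
    if not has_committer_involvement(cl, committers):
        return "rejected: no full committer involved"
    if not run_presubmit(cl):
        return "rejected: presubmit checks failed"
    if not run_try_jobs(cl):
        return "rejected: try jobs failed"
    commit(cl)  # In the real CQ this commits on behalf of the author.
    return "committed"
```

Passing the presubmit, try job, and commit steps in as callables keeps the ordering logic separate from the systems it drives, which matches the description of the CQ as a thin logic layer.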
It runs as a single-threaded process. The infrastructure is really minimal since it's just a logic layer above the existing infrastructure to automate something that was done manually by the developers. In particular, the CQ reuses:
Runs the following loop:
Issues are processed asynchronously, so whichever CL's try jobs complete first is committed first. This is important because a flaky CL won't bottleneck the remaining ones. It matters mainly because the number of commits per hour is disproportionate to the try server latency, i.e. the full build and test cycle time for all the platforms.
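To illustrate the asynchronous processing, here is a minimal sketch of one non-blocking pass over the pending issues. The poll_once function, the status strings, and the try_job_status callable are assumptions made for this example; they are not taken from the CQ source.

```python
from typing import Callable, Dict, List


def poll_once(pending: List[int],
              try_job_status: Callable[[int], str]) -> Dict[str, List[int]]:
    """Check every pending issue once, without waiting on any of them.

    try_job_status is a hypothetical callable that returns 'running',
    'passed' or 'failed' for an issue number.
    """
    result: Dict[str, List[int]] = {
        "commit": [], "reject": [], "still_pending": []}
    for issue in pending:
        status = try_job_status(issue)
        if status == "passed":
            result["commit"].append(issue)
        elif status == "failed":
            result["reject"].append(issue)
        else:
            # A slow or flaky issue stays pending and does not hold up others.
            result["still_pending"].append(issue)
    return result
```

With this shape, an issue that entered the queue later can be committed before an earlier one whose try jobs are still running.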
To commit on behalf of the author, an unconventional technique is used in Subversion with a server-side pre-commit hook. The code is at: http://src.chromium.org/viewvc/chrome/trunk/tools/depot_tools/tests/sample_pre_commit_hook?view=markup
The control flow is:
This works much better than using svn propset --revprop since there is no race condition and the revision property modifications are done during the commit. This technique can only be used when it is possible to set a server-side hook, so, for example, it can't be used with projects hosted on code.google.com.
The main problem is test flakiness. The CQ partially works around this problem by retrying failed tests a second time. It retries compile failures with a full rebuild, instead of the normal incremental build, to work around cases of broken incremental compiles, which do happen relatively frequently. Fixing these two problems is outside the scope of the CQ project.
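The retry behavior described above can be expressed as a small decision function. This is a sketch under stated assumptions: the Failure enum, the next_action name, the action strings, and the rule of giving up after a single retry are illustrative choices, not the CQ's actual implementation.

```python
from enum import Enum


class Failure(Enum):
    """Hypothetical classification of a failed try job."""
    COMPILE = "compile"
    TEST = "test"


def next_action(failure: Failure, attempt: int) -> str:
    """Decide what to do after a failed try job.

    attempt is 1 for the first run. Assumption: only one retry is allowed,
    so any failure on the second attempt rejects the CL.
    """
    if attempt >= 2:
        return "reject"
    if failure is Failure.COMPILE:
        # Work around broken incremental compiles with a clean build.
        return "retry_full_rebuild"
    # Work around flaky tests by running the failed tests again.
    return "retry_failed_tests"
```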
Rietveld doesn't enforce a coherent SVN URL mapping on its CLs, which causes some CLs to be ignored by the CQ.
The CQ is very slow at the moment. This is why the test isolation effort was started. The CQ is bound by:
The main scalability issue is running the presubmit checks and sending the try jobs. Many of the presubmit checks assume they are not running concurrently on the system, and running them in parallel on one VM could cause random problems. If the CQ used a more decentralized approach, it would scale much better in this respect.
There are multiple single points of failure:
The commit queue requires the involvement of a full committer, either as the author of the CL or as a reviewer giving approval. The security depends on Rietveld assumptions about its metadata:
All communication happens over HTTPS.
A comprehensive set of unit tests was written. The code itself sends stack traces when an exception occurs, for monitoring purposes.