site/developers/testing/commit-queue/design/index.md - website.git - Git at Google

 ---
 breadcrumbs:
 - - /developers
   - For Developers
 - - /developers/testing
   - Testing and infrastructure
 - - /developers/testing/commit-queue
   - Chromium Commit Queue
 page_name: design
 title: 'Design doc: Chromium Commit queue'
 ---

 ## Objective

 Have a formal way for developers to ensure their patch won't break the
 Continuous Integration checks with a relatively high confidence level.

 ## Background

 Before the Chromium Commit Queue, it's on each developer's shoulder to manually
 run multiple try jobs on the [Try Server](/system/errors/NodeNotFound) and check
 their results before committing.

 This is "brain wasted time" as this can be completely automated. The Commit
 Queue aims at automating this manual verification so the developer can start
 right away working on the next patch. This is necessary to scale at a sustained
 ~100 commits/day.

 ## Overview

 The CQ polls rietveld for CLs that are ready to be committed. The Chromium's
 fork of Rietveld has a 'Commit' checkbox. When the author or the reviewer checks
 it, it will be included in the next poll. Once the CQ learns about the CL, it
 verifies the author or one of the reviewers approving this CL is a full
 committer, then runs the presubmit checks, then runs try jobs and then commit
 the patch on the behalf of the author, faking its credential. The whole project
 is written in python.

 ## Infrastructure

 It runs as a single-thread process. The infrastructure is really minimal since
 it's just a logic layer above the current infrastructures to automate something
 that was done manually by the developers. In particular, the CQ reuses:

 *   the [try server](/system/errors/NodeNotFound)
 *   chromium's branch of [rietveld](http://code.google.com/p/rietveld/)
 *   [presubmit
             scripts](/developers/how-tos/depottools/presubmit-scripts) included
             in [depot_tools](/developers/how-tos/depottools)

 ## Detailed Design

 Runs the following loop:

 1.  Polls <http://codereview.chromium.org/search?closed=3&commit=2> to
             find new issues to attempt to commit.
 2.  For each issue found,
     1.  Runs preliminary checks,
         1.  Make sure someone is a full committer
         2.  Make sure there's a LGTM or the CL is TBR'ed.
     2.  Runs presubmit checks, including OWNERS check.
     3.  Sends new try jobs with -r HEAD.
         1.  They are publicly visible at
                     <http://build.chromium.org/p/tryserver.chromium/waterfall?committer=commit-bot@chromium.org>.
         2.  Astute users will see that it runs a subset of the tests.
                     Please contribute to Flaky tests fight so more tests can be
                     used.
         3.  If ToT is broken, the commit queue will retry your patch on
                     an older revision automatically.
     4.  Makes sure no new comments were added to the issue.
     5.  If everything passes, once the tree is open;
         1.  Commits the change on the behalf of the issue owner, even if
                     the owner is not a Chromium commiter.
     6.  If one of the steps fails, the 'commit' checkbox is cleared and
                 the issue is removed from the queue.
         1.  However, they author or a reviewer can re-check the box and
                     the CQ will try again.

 Issues are processed asynchronously so whatever faster try job completes first
 wins. This is important as a flaky CL won't bottleneck the remaining one. This
 is mainly because the number of commit per hour is disproportionate to the try
 server latency, e.g. the full build and test cycle time for all the platforms.

 ### Faking author in subversion

 To commit on the behalf of the author, an unconventional technique is used in
 subversion with a server-side pre-commit hook. The code is at:
 <http://src.chromium.org/viewvc/chrome/trunk/tools/depot_tools/tests/sample_pre_commit_hook?view=markup>

 The control flow is:

 1.  On the client side, with a checkout with a special committer
             credential;
     1.  svn commit --with-revprop realauthor=&lt;author to attribute the
                 commit to&gt;
 2.  On the server side;
     1.  A pre_commit_hook intercepts the commits
     2.  It opens &lt;repo&gt;/db/transactions/&lt;tx&gt;.txn/props and
                 parses its data
     3.  If the realauthor svn property if found,
         1.  It verifies svn:author is a special committer
         2.  It does simple sanity checks
         3.  It replaces svn:author with realauthor's value
         4.  It sets the commit-bot svn property to know this revision
                     was committed by the CQ.
     4.  The updated data is saved in the transaction file.

 This works much better than using svn propset --revprop since there is no race
 condition and the revision property modifications are done *during* the commit.
 This technique can only be used when it is possible to set a server side hook so
 for example, it can't be used with projects hosted on code.google.com.

 ## Project Information

 *   maruel@ wrote it and dpranke@ did the code reviews.
 *   Code:
             <http://src.chromium.org/viewvc/chrome/trunk/tools/commit-queue/>

 ## Caveats

 The main problem is test flakiness. The CQ works around this problem partially
 by retrying failed tests a second time. It retries compile failure with a full
 rebuild, versus an incremental build normal, to work around cases of broken
 incremental compiles, which does happen relatively frequently. Fixing these two
 problems is outside the scope of the CQ project.

 Rietveld doesn't enforce a coherent svn url mapping on its CLs, causing CLs to
 be ignored by the CQ.

 ## Latency

 The CQ is very slow at the moment. This is why the [test
 isolation](/system/errors/NodeNotFound) effort was started. The CQ is bound by:

 *   Rietveld polling, which is at best 10 seconds.
 *   Synchronous presubmit check execution, which is synchronous and
             single threaded, but with a timeout.
 *   Try job execution, including automatic retries.
 *   Waiting for the tree to open.
 *   The actual commit, which is fairly fast.

 ## Scalability

 The main scalability issue is running the presubmit checks and sending the try
 jobs. Many of the presubmit checks assume they are not running concurrently on
 the system and running them in parallel on one VM could cause random problems.
 If the CQ used a more decentralized approach, it would scale much better for
 that.

 ## Redundancy and Reliability

 There are multiple single points of failure;

 *   The CQ itself, running on a single process.
 *   The try server, which is itself not redundant.

 ## Security Considerations

 The commit queue require a full committer involvement, either to be the author
 of the CL or to be a reviewer giving approval. The security depends on Rietveld
 assumptions about its meta data:

 *   A rietveld issue cannot change of owner.
 *   A rietveld comment cannot be faked by a third party.

 All communications happens over https.

 ## Testing Plan

 Comprehensive set of unit tests was written. The code itself sends stack traces
 in case of exception for monitoring purposes.
	---
	breadcrumbs:
	- - /developers
	- For Developers
	- - /developers/testing
	- Testing and infrastructure
	- - /developers/testing/commit-queue
	- Chromium Commit Queue
	page_name: design
	title: 'Design doc: Chromium Commit queue'
	---

	## Objective

	Have a formal way for developers to ensure their patch won't break the
	Continuous Integration checks with a relatively high confidence level.

	## Background

	Before the Chromium Commit Queue, it's on each developer's shoulder to manually
	run multiple try jobs on the [Try Server](/system/errors/NodeNotFound) and check
	their results before committing.

	This is "brain wasted time" as this can be completely automated. The Commit
	Queue aims at automating this manual verification so the developer can start
	right away working on the next patch. This is necessary to scale at a sustained
	~100 commits/day.

	## Overview

	The CQ polls rietveld for CLs that are ready to be committed. The Chromium's
	fork of Rietveld has a 'Commit' checkbox. When the author or the reviewer checks
	it, it will be included in the next poll. Once the CQ learns about the CL, it
	verifies the author or one of the reviewers approving this CL is a full
	committer, then runs the presubmit checks, then runs try jobs and then commit
	the patch on the behalf of the author, faking its credential. The whole project
	is written in python.

	## Infrastructure

	It runs as a single-thread process. The infrastructure is really minimal since
	it's just a logic layer above the current infrastructures to automate something
	that was done manually by the developers. In particular, the CQ reuses:

	* the [try server](/system/errors/NodeNotFound)
	* chromium's branch of [rietveld](http://code.google.com/p/rietveld/)
	* [presubmit
	scripts](/developers/how-tos/depottools/presubmit-scripts) included
	in [depot_tools](/developers/how-tos/depottools)

	## Detailed Design

	Runs the following loop:

	1. Polls <http://codereview.chromium.org/search?closed=3&commit=2> to
	find new issues to attempt to commit.
	2. For each issue found,
	1. Runs preliminary checks,
	1. Make sure someone is a full committer
	2. Make sure there's a LGTM or the CL is TBR'ed.
	2. Runs presubmit checks, including OWNERS check.
	3. Sends new try jobs with -r HEAD.
	1. They are publicly visible at
	<http://build.chromium.org/p/tryserver.chromium/waterfall?committer=commit-bot@chromium.org>.
	2. Astute users will see that it runs a subset of the tests.
	Please contribute to Flaky tests fight so more tests can be
	used.
	3. If ToT is broken, the commit queue will retry your patch on
	an older revision automatically.
	4. Makes sure no new comments were added to the issue.
	5. If everything passes, once the tree is open;
	1. Commits the change on the behalf of the issue owner, even if
	the owner is not a Chromium commiter.
	6. If one of the steps fails, the 'commit' checkbox is cleared and
	the issue is removed from the queue.
	1. However, they author or a reviewer can re-check the box and
	the CQ will try again.

	Issues are processed asynchronously so whatever faster try job completes first
	wins. This is important as a flaky CL won't bottleneck the remaining one. This
	is mainly because the number of commit per hour is disproportionate to the try
	server latency, e.g. the full build and test cycle time for all the platforms.

	### Faking author in subversion

	To commit on the behalf of the author, an unconventional technique is used in
	subversion with a server-side pre-commit hook. The code is at:
	<http://src.chromium.org/viewvc/chrome/trunk/tools/depot_tools/tests/sample_pre_commit_hook?view=markup>

	The control flow is:

	1. On the client side, with a checkout with a special committer
	credential;
	1. svn commit --with-revprop realauthor=<author to attribute the
	commit to>
	2. On the server side;
	1. A pre_commit_hook intercepts the commits
	2. It opens <repo>/db/transactions/<tx>.txn/props and
	parses its data
	3. If the realauthor svn property if found,
	1. It verifies svn:author is a special committer
	2. It does simple sanity checks
	3. It replaces svn:author with realauthor's value
	4. It sets the commit-bot svn property to know this revision
	was committed by the CQ.
	4. The updated data is saved in the transaction file.

	This works much better than using svn propset --revprop since there is no race
	condition and the revision property modifications are done during the commit.
	This technique can only be used when it is possible to set a server side hook so
	for example, it can't be used with projects hosted on code.google.com.

	## Project Information

	* maruel@ wrote it and dpranke@ did the code reviews.
	* Code:
	<http://src.chromium.org/viewvc/chrome/trunk/tools/commit-queue/>

	## Caveats

	The main problem is test flakiness. The CQ works around this problem partially
	by retrying failed tests a second time. It retries compile failure with a full
	rebuild, versus an incremental build normal, to work around cases of broken
	incremental compiles, which does happen relatively frequently. Fixing these two
	problems is outside the scope of the CQ project.

	Rietveld doesn't enforce a coherent svn url mapping on its CLs, causing CLs to
	be ignored by the CQ.

	## Latency

	The CQ is very slow at the moment. This is why the [test
	isolation](/system/errors/NodeNotFound) effort was started. The CQ is bound by:

	* Rietveld polling, which is at best 10 seconds.
	* Synchronous presubmit check execution, which is synchronous and
	single threaded, but with a timeout.
	* Try job execution, including automatic retries.
	* Waiting for the tree to open.
	* The actual commit, which is fairly fast.

	## Scalability

	The main scalability issue is running the presubmit checks and sending the try
	jobs. Many of the presubmit checks assume they are not running concurrently on
	the system and running them in parallel on one VM could cause random problems.
	If the CQ used a more decentralized approach, it would scale much better for
	that.

	## Redundancy and Reliability

	There are multiple single points of failure;

	* The CQ itself, running on a single process.
	* The try server, which is itself not redundant.

	## Security Considerations

	The commit queue require a full committer involvement, either to be the author
	of the CL or to be a reviewer giving approval. The security depends on Rietveld
	assumptions about its meta data:

	* A rietveld issue cannot change of owner.
	* A rietveld comment cannot be faked by a third party.

	All communications happens over https.

	## Testing Plan

	Comprehensive set of unit tests was written. The code itself sends stack traces
	in case of exception for monitoring purposes.