site/developers/testing/commit-queue/integration-with-rietveld/index.md - website.git - Git at Google

 ---
 breadcrumbs:
 - - /developers
   - For Developers
 - - /developers/testing
   - Testing and infrastructure
 - - /developers/testing/commit-queue
   - Chromium Commit Queue
 page_name: integration-with-rietveld
 title: 'Design: 3-way Integration with Rietveld and the Try Server'
 ---

 [TOC]

 ## Objective

 Drastically reduce the *perceived time* (wall-clock time) from asking the
 infrastructure to commit a change list (CL) to the time it is committed. This
 reduces the time a developer has to care about a particular CL, by getting it in
 or refusing it as fast as possible.

 ## Background

 The Chromium team uses the [Commit
 Queue](/developers/testing/commit-queue/design), which in turn uses the [Try
 Server](/system/errors/NodeNotFound), to perform automated testing of their
 pending CL on multiple platforms in multiple configurations.
 [Rietveld](http://code.google.com/p/rietveld/) is used for code reviews and the
 Chromium team uses a [slightly forked
 version](http://code.google.com/p/rietveld/source/browse/?name=chromium) to
 better integrate it with the rest of the Chromium infrastructure.

 Before the integration, the Commit Queue is awaken as soon as the CL author, or
 one of his reviewer, checks the infamous *Commit checkbox* on Rietveld. The
 Commit Queue polls updates from Rietveld at a rate of 0.1hz to get new CLs to
 test.

 This means that pre-commit verifications, which include code styling, OWNERS
 check, a proper LGTM from at least one Chromium Committer, properly testing the
 patches on the Try Server, only starts at the moment the checkbox is set. This
 means a maximal time is spent between the moment the checkbox is checked and the
 change is committed.

 Also, before the integration, because of the way try jobs are triggered, the
 Commit Queue cannot trust the try jobs triggered by anyone but itself. The
 "previous way" of triggering try jobs is to commit a raw diff into subversion.
 Let's just say that it is *suboptimal*.

 ## Overview

 This project is about implementing techniques in the integration so the Commit
 Queue can efficiently run presubmit checks and Try Jobs on the Try Server before
 the CL author request the commit to be merged into the source tree;

 The Commit Queue is also trusting the try jobs triggered by others than itself.
 The new way involves the Try Server polling Rietveld directly to fetch the diff
 *directly* from the code review. This is the reason the try jobs can be
 "trusted".

 Rietveld acts as a datastore to map a Verification steps to a Patchset.

 ## Infrastructure

 The infrastructure consist of:

 1.  Rietveld is running on AppEngine, using AppEngine's supplied GAIA
             account management.
     1.  It becomes the primary database for the try job results.
 2.  The Commit Queue process, which is a stand alone process.
 3.  The Try Server, which is an infrastructure itself.

 ## Detailed Design

 ### 1. Trusting try jobs

 The first task is to make sure Try Jobs results on Rietveld can be trusted. This
 is done by differentiating try jobs triggered directly from Rietveld from any
 other try jobs. The polling code is in
 [try_job_rietveld.py](http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/master/try_job_rietveld.py?view=markup)
 and sets the build property `try_job_key` that is then used to certify this try
 jobs was indeed triggered from Rietveld in
 [view_chromium.py](http://code.google.com/p/rietveld/source/browse/codereview/views_chromium.py?name=chromium).
 The json packets are pushed via Buildbot's [HttpStatusPush
 implementation](http://src.chromium.org/viewvc/chrome/trunk/tools/build/third_party/buildbot_8_4p1/buildbot/status/status_push.py?view=markup)
 from the Try Server back into Rietveld's /status_listener URL as HTTP POST
 requests.

 ### 2. Polling for pending CLs

 Rietveld already has a mechanism to search for issues with specific criteria.
 For example,

 <https://codereview.chromium.org/search?closed=3&commit=3&modified_after=2012-09-18&format=json&limit=10>

 While polling it really not optimal, it permits reusing the current API. The
 `/search` interface supports specifying an opaque datastore `cursor` so only new
 issues are returned. The cursor to use on the next request is returned inside
 the json data.

 ### 3. Reacting to CL modifications

 For each new CL modified, the CQ must decide if it reacts to the CL. To enable
 preemptive CL verification, the following steps are used on each new CL;

 1.  Determine if the CL author is a trusted author, e.g. a [Chromium
             Committer](/getting-involved/become-a-committer).
     1.  If it is and the user asked for review, initiate checks.
     2.  If not, wait for a LGTM from a trusted reviewer. Note that it's
                 not waiting for the Commit checkbox to be checked.
 2.  Keep track of the previous checks for the last patchset, discarding
             data as new patchsets are uploaded.
 3.  Add a note on the issue if a check failure occurs.

 Then for commit time;

 1.  Upon Commit checkbox signal, only re-run the Secure Checks and
             commit.
     1.  Secure checks include things that must not change; the CL
                 description must have stayed the same, the tree is open, etc.

 ## Project Information

 This project is driven primary by rogerta@ and maruel@. It benefits immediately
 from Try Server enhancement projects like the [Swarm integration
 project](/system/errors/NodeNotFound) but it is still designed to work without
 Try Server performance boost and work around its general sluggishness.

 ## Caveats

 1.  Presubmit checks are not yet run on the Try Server but are run
             locally instead. This needs to be changed to accelerate the CQ and
             keep its integrity.
 2.  This significantly increases the Try Server utilization. Without the
             [Swarm integration project](/system/errors/NodeNotFound), this would
             likely overload the Try Server.

 ## Latency

 The whole purpose of this project is to reduce the latency of the pre-commit
 checks. The latency will boils down to

 1.  Poll rate.
 2.  Pre-commit forced checks.

 At that point, we are not aiming at optimizing the commit delay below 30
 seconds, lower than that is a nice to have.

 ## Scalability

 The Commit Queue is a single threaded process. As such, it is bound by its
 slowest operations. This currently include the PRESUBMIT checks and
 applying/commiting CLs. The goal is to make checks happen earlier and reuse the
 results from previous checks whenever they are trusted, not to make the checks
 faster.

 ## Redundancy and Reliability

 There are multiple single points of failure;

 1.  Rietveld on AppEngine. If AppEngine blows up, we're screwed.
 2.  The Try Server, which is a single-threaded process. Restarting it,
             in case of hard failure, is only lossy for the jobs in progress at
             the time of shutdown. In case of catastrophic failure, all the
             previous try jobs would need to be re-run. This already happened in
             the past and can be handled relatively fast, within an hour or so.
 3.  The Commit Queue process itself is also single-threaded and
             single-homed. It is designed to be mostly lossless when restarted or
             if an exception occurs. The Try Jobs database can be reconstructed
             from the information stored on Rietveld.

 ## Security Consideration

 The security is using the following assumptions;

 1.  Rietveld administrators are not rogue.
 2.  Rietveld does not permit a CL author to fake its identity and the
             patchsets, once created, are immutable.
     1.  The commit queue ensures the CL description is not modified
                 between the moment it is running its presubmit check and the
                 moment it is committing.
 3.  The Commit Queue is doing proper verification w.r.t. which is a
             trusted committer.
     1.  In the end, the Commit Queue process is the one deciding if a
                 commit gets in or not and doing it.
 4.  The committer are trusted. They already can commit anything they
             want. It is an non-goal to restrict what committers can commit.

 ## Testing Plan

 The integration itself, being an integration project, is mostly tested directly
 "on prod". While there are a few unit tests to check basic functionality, most
 of the bugs are at integration points. This being a productivity tool for
 developers, it is expected that they have some clue about what's going wrong.
	---
	breadcrumbs:
	- - /developers
	- For Developers
	- - /developers/testing
	- Testing and infrastructure
	- - /developers/testing/commit-queue
	- Chromium Commit Queue
	page_name: integration-with-rietveld
	title: 'Design: 3-way Integration with Rietveld and the Try Server'
	---

	[TOC]

	## Objective

	Drastically reduce the perceived time (wall-clock time) from asking the
	infrastructure to commit a change list (CL) to the time it is committed. This
	reduces the time a developer has to care about a particular CL, by getting it in
	or refusing it as fast as possible.

	## Background

	The Chromium team uses the [Commit
	Queue](/developers/testing/commit-queue/design), which in turn uses the [Try
	Server](/system/errors/NodeNotFound), to perform automated testing of their
	pending CL on multiple platforms in multiple configurations.
	[Rietveld](http://code.google.com/p/rietveld/) is used for code reviews and the
	Chromium team uses a [slightly forked
	version](http://code.google.com/p/rietveld/source/browse/?name=chromium) to
	better integrate it with the rest of the Chromium infrastructure.

	Before the integration, the Commit Queue is awaken as soon as the CL author, or
	one of his reviewer, checks the infamous Commit checkbox on Rietveld. The
	Commit Queue polls updates from Rietveld at a rate of 0.1hz to get new CLs to
	test.

	This means that pre-commit verifications, which include code styling, OWNERS
	check, a proper LGTM from at least one Chromium Committer, properly testing the
	patches on the Try Server, only starts at the moment the checkbox is set. This
	means a maximal time is spent between the moment the checkbox is checked and the
	change is committed.

	Also, before the integration, because of the way try jobs are triggered, the
	Commit Queue cannot trust the try jobs triggered by anyone but itself. The
	"previous way" of triggering try jobs is to commit a raw diff into subversion.
	Let's just say that it is suboptimal.

	## Overview

	This project is about implementing techniques in the integration so the Commit
	Queue can efficiently run presubmit checks and Try Jobs on the Try Server before
	the CL author request the commit to be merged into the source tree;

	The Commit Queue is also trusting the try jobs triggered by others than itself.
	The new way involves the Try Server polling Rietveld directly to fetch the diff
	directly from the code review. This is the reason the try jobs can be
	"trusted".

	Rietveld acts as a datastore to map a Verification steps to a Patchset.

	## Infrastructure

	The infrastructure consist of:

	1. Rietveld is running on AppEngine, using AppEngine's supplied GAIA
	account management.
	1. It becomes the primary database for the try job results.
	2. The Commit Queue process, which is a stand alone process.
	3. The Try Server, which is an infrastructure itself.

	## Detailed Design

	### 1. Trusting try jobs

	The first task is to make sure Try Jobs results on Rietveld can be trusted. This
	is done by differentiating try jobs triggered directly from Rietveld from any
	other try jobs. The polling code is in
	[try_job_rietveld.py](http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/master/try_job_rietveld.py?view=markup)
	and sets the build property `try_job_key` that is then used to certify this try
	jobs was indeed triggered from Rietveld in
	[view_chromium.py](http://code.google.com/p/rietveld/source/browse/codereview/views_chromium.py?name=chromium).
	The json packets are pushed via Buildbot's [HttpStatusPush
	implementation](http://src.chromium.org/viewvc/chrome/trunk/tools/build/third_party/buildbot_8_4p1/buildbot/status/status_push.py?view=markup)
	from the Try Server back into Rietveld's /status_listener URL as HTTP POST
	requests.

	### 2. Polling for pending CLs

	Rietveld already has a mechanism to search for issues with specific criteria.
	For example,

	<https://codereview.chromium.org/search?closed=3&commit=3&modified_after=2012-09-18&format=json&limit=10>

	While polling it really not optimal, it permits reusing the current API. The
	`/search` interface supports specifying an opaque datastore `cursor` so only new
	issues are returned. The cursor to use on the next request is returned inside
	the json data.

	### 3. Reacting to CL modifications

	For each new CL modified, the CQ must decide if it reacts to the CL. To enable
	preemptive CL verification, the following steps are used on each new CL;

	1. Determine if the CL author is a trusted author, e.g. a [Chromium
	Committer](/getting-involved/become-a-committer).
	1. If it is and the user asked for review, initiate checks.
	2. If not, wait for a LGTM from a trusted reviewer. Note that it's
	not waiting for the Commit checkbox to be checked.
	2. Keep track of the previous checks for the last patchset, discarding
	data as new patchsets are uploaded.
	3. Add a note on the issue if a check failure occurs.

	Then for commit time;

	1. Upon Commit checkbox signal, only re-run the Secure Checks and
	commit.
	1. Secure checks include things that must not change; the CL
	description must have stayed the same, the tree is open, etc.

	## Project Information

	This project is driven primary by rogerta@ and maruel@. It benefits immediately
	from Try Server enhancement projects like the [Swarm integration
	project](/system/errors/NodeNotFound) but it is still designed to work without
	Try Server performance boost and work around its general sluggishness.

	## Caveats

	1. Presubmit checks are not yet run on the Try Server but are run
	locally instead. This needs to be changed to accelerate the CQ and
	keep its integrity.
	2. This significantly increases the Try Server utilization. Without the
	[Swarm integration project](/system/errors/NodeNotFound), this would
	likely overload the Try Server.

	## Latency

	The whole purpose of this project is to reduce the latency of the pre-commit
	checks. The latency will boils down to

	1. Poll rate.
	2. Pre-commit forced checks.

	At that point, we are not aiming at optimizing the commit delay below 30
	seconds, lower than that is a nice to have.

	## Scalability

	The Commit Queue is a single threaded process. As such, it is bound by its
	slowest operations. This currently include the PRESUBMIT checks and
	applying/commiting CLs. The goal is to make checks happen earlier and reuse the
	results from previous checks whenever they are trusted, not to make the checks
	faster.

	## Redundancy and Reliability

	There are multiple single points of failure;

	1. Rietveld on AppEngine. If AppEngine blows up, we're screwed.
	2. The Try Server, which is a single-threaded process. Restarting it,
	in case of hard failure, is only lossy for the jobs in progress at
	the time of shutdown. In case of catastrophic failure, all the
	previous try jobs would need to be re-run. This already happened in
	the past and can be handled relatively fast, within an hour or so.
	3. The Commit Queue process itself is also single-threaded and
	single-homed. It is designed to be mostly lossless when restarted or
	if an exception occurs. The Try Jobs database can be reconstructed
	from the information stored on Rietveld.

	## Security Consideration

	The security is using the following assumptions;

	1. Rietveld administrators are not rogue.
	2. Rietveld does not permit a CL author to fake its identity and the
	patchsets, once created, are immutable.
	1. The commit queue ensures the CL description is not modified
	between the moment it is running its presubmit check and the
	moment it is committing.
	3. The Commit Queue is doing proper verification w.r.t. which is a
	trusted committer.
	1. In the end, the Commit Queue process is the one deciding if a
	commit gets in or not and doing it.
	4. The committer are trusted. They already can commit anything they
	want. It is an non-goal to restrict what committers can commit.

	## Testing Plan

	The integration itself, being an integration project, is mostly tested directly
	"on prod". While there are a few unit tests to check basic functionality, most
	of the bugs are at integration points. This being a productivity tool for
	developers, it is expected that they have some clue about what's going wrong.