Writing Layout Tests

"Layout tests" is a bit of a misnomer. This term is part of our WebKit heritage, and we use it to refer to every test that is written as a Web page (HTML, SVG, or XHTML) and lives in third_party/WebKit/LayoutTests/.

Overview

Layout tests should be used to accomplish one of the following goals:

The entire surface of Blink that is exposed to the Web should be covered by tests that we contribute to web-platform-tests (WPT). This helps us avoid regressions, and helps us identify Web Platform areas where the major browsers don't have interoperable implementations. Furthermore, by contributing to projects such as WPT, we share the burden of writing tests with the other browser vendors, and we help all the browsers get better. This is very much in line with our goal to move the Web forward.
When a Blink feature cannot be tested using the tools provided by WPT, and cannot be easily covered by C++ unit tests, the feature must be covered by layout tests, to avoid unexpected regressions. These tests will use Blink-specific testing APIs that are only available in content_shell.

Test Types

There are four broad types of layout tests, listed in the order of preference.

JavaScript Tests are the layout test implementation of xUnit tests. These tests contain assertions written in JavaScript, and pass if the assertions evaluate to true.
Reference Tests render a test page and a reference page, and pass if the two renderings are identical, according to a pixel-by-pixel comparison. These tests are less robust, harder to debug, and significantly slower than JavaScript tests, and are only used when JavaScript tests are insufficient, such as when testing paint code.
Pixel Tests render a test page and compare the result against a pre-rendered baseline image in the repository. Pixel tests are less robust than the first two types, because the rendering of a page is influenced by many factors such as the host computer's graphics card and driver, the platform's text rendering system, and various user-configurable operating system settings. For this reason, it is common for a pixel test to have a different reference image for each platform that Blink is tested on, and the reference images are quite cumbersome to manage. You should only write a pixel test if you cannot use a reference test. By default a pixel test will also dump the layout tree as text output, so they are similar to ...
Layout tree tests, which output a textual representation of the layout tree, which is the key data structure in Blink's page rendering system. The test passes if the output matches a baseline text file in the repository. Layout tree tests are used as a last resort to test the internal quirks of the implementation, and they should be avoided in favor of one of the earlier options.

General Principles

Tests should be written under the assumption that they will be upstreamed to the WPT project. To this end, tests should follow the WPT guidelines.

There is no style guide that applies to all layout tests. However, some projects have adopted style guides, such as the ServiceWorker Tests Style guide.

Our document on layout tests tips summarizes the most important WPT guidelines and highlights some JavaScript concepts that are worth paying attention to when trying to infer style rules from existing tests. If you're unopinionated and looking for a style guide to follow, the document also suggests some defaults.

JavaScript Tests

Whenever possible, the testing criteria should be expressed in JavaScript. The alternatives, which are described in later sections, result in slower and less reliable tests.

All new JavaScript tests should be written using the testharness.js testing framework. This framework is used by the tests in the web-platform-tests repository, which is shared with all the other browser vendors, so testharness.js tests are more accessible to browser developers.

See the API documentation for a thorough introduction to testharness.js.

Layout tests should follow the recommendations of the above documentation. Furthermore, layout tests should include relevant metadata. The specification URL (in <link rel="help">) is almost always relevant, and is incredibly helpful to a developer who needs to understand the test quickly.

Below is a skeleton for a JavaScript test embedded in an HTML page. Note that, in order to follow the minimality guideline, the test omits the tags <html>, <head>, and <body>, as they can be inferred by the HTML parser.

<!doctype html>
<title>JavaScript: the true literal is immutable and equal to itself</title>
<link rel="help" href="https://tc39.github.io/ecma262/#sec-boolean-literals">
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
'use strict';

// Synchronous test example.
test(() => {
  const value = true;
  assert_true(value, 'true literal');
  assert_equals(value.toString(), 'true', 'the string representation of true');
}, 'The literal true in a synchronous test case');

// Asynchronous test example.
async_test(t => {
  const originallyTrue = true;
  setTimeout(t.step_func_done(() => {
    assert_equals(originallyTrue, true);
  }), 0);
}, 'The literal true in a setTimeout callback');

// Promise test example.
promise_test(() => {
  return new Promise((resolve, reject) => {
    resolve(true);
  }).then(value => {
    assert_true(value);
  });
}, 'The literal true used to resolve a Promise');

</script>

Some points that are not immediately obvious from the example:

When calling an assert_ function that compares two values, the first argument is the actual value (produced by the functionality being tested), and the second argument is the expected value (known good, golden). The order is important, because the testing harness relies on it to generate the error messages that developers use when debugging test failures.
The assertion description (the string argument to assert_ methods) conveys the way the actual value was obtained.
- If the expected value doesn't make it clear, the assertion description should explain the desired behavior.
- Test cases with a single assertion should omit the assertion's description when it is sufficiently clear.
Each test case describes the circumstance that it tests, without being redundant.
- Do not start test case descriptions with redundant terms like “Testing” or “Test for”.
- Test files with a single test case should omit the test case description. The file's <title> should be sufficient to describe the scenario being tested.
Asynchronous tests have a few subtleties.
- The async_test wrapper calls its function with a test case argument that is used to signal when the test case is done, and to connect assertion failures to the correct test.
- t.done() must be called after all the test case's assertions have executed.
- Test case assertions (actually, any callback code that can throw exceptions) must be wrapped in t.step_func() calls, so that assertion failures and exceptions can be traced back to the correct test case.
- t.step_func_done() is a shortcut that combines t.step_func() with a t.done() call.
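The asynchronous subtleties above can be sketched with a test case that has two callbacks, where t.step_func_done() is not enough and t.done() has to be called explicitly. This is an illustration rather than a recommended pattern: the function name defineTwoTimerTest and the harness parameter are hypothetical, and exist only so the snippet can be read outside a test page. In a real test, async_test and assert_true are globals provided by testharness.js.

```javascript
'use strict';

// Hypothetical helper: registers one asynchronous test case on `harness`,
// an object that exposes testharness.js's async_test and assert_true.
function defineTwoTimerTest(harness) {
  harness.async_test(t => {
    let pending = 2;

    // The callback contains an assertion, so it is wrapped in t.step_func().
    // A failure is then attributed to this test case.
    const onTimer = t.step_func(() => {
      pending -= 1;
      harness.assert_true(pending >= 0, 'number of timers still pending');

      // t.done() runs only after the last assertion has executed.
      if (pending === 0)
        t.done();
    });

    setTimeout(onTimer, 0);
    setTimeout(onTimer, 0);
  }, 'Two zero-delay timers both fire');
}

// In a page that has loaded testharness.js, the harness functions are
// globals, so the global object can be passed in directly.
if (typeof async_test === 'function')
  defineTwoTimerTest(globalThis);
```

If the two timers shared a single t.step_func_done() callback, the test case would be marked as done when the first timer fired, and an assertion failure in the second callback would go unreported.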

Layout tests that load from file:// origins must currently use relative paths to point to /resources/testharness.js and /resources/testharnessreport.js. This is contrary to the WPT guidelines, which call for absolute paths. This limitation does not apply to the tests in LayoutTests/http, which rely on an HTTP server, or to the tests in LayoutTests/external/wpt, which are imported from the WPT repository.
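As a rough illustration of the relative paths that file:// tests need, the sketch below computes the path from a test file to a file in LayoutTests/resources/. The helper name relativeResourcePath and the test paths in the usage note are hypothetical. The only assumption taken from this document is that the shared files live in the resources/ directory directly under LayoutTests/.

```javascript
'use strict';

// Hypothetical helper. `testPath` is the test's path relative to
// LayoutTests/, using forward slashes, e.g. 'fast/dom/example.html'.
function relativeResourcePath(testPath, resourceName) {
  // Each directory between LayoutTests/ and the test file needs one '../'.
  const depth = testPath.split('/').length - 1;
  return '../'.repeat(depth) + 'resources/' + resourceName;
}
```

For a test at the hypothetical path fast/dom/example.html, relativeResourcePath('fast/dom/example.html', 'testharness.js') yields ../../resources/testharness.js, which is the value that would go in the script tag's src attribute.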

WPT Supplemental Testing APIs

Some tests simply cannot be expressed using the Web Platform APIs. For example, some tests that require a user to perform a gesture, such as a mouse click, cannot be implemented using Web APIs. The WPT project covers some of these cases via supplemental testing APIs.

When writing tests that rely on supplemental testing APIs, please consider the cost and benefits of having the tests gracefully degrade to manual tests in the absence of the testing APIs.

Relying on Blink-Specific Testing APIs

Tests that cannot be expressed using the Web Platform APIs or WPT's testing APIs use Blink-specific testing APIs. These APIs are only available in content_shell, and should only be used as a last resort.

A downside of Blink-specific APIs is that they are not as well documented as the Web Platform features. Learning to use a Blink-specific feature requires finding other tests that use it, or reading its source code.

For example, the most popular Blink-specific API is testRunner, which is implemented in components/test_runner/test_runner.h and components/test_runner/test_runner.cc. By skimming the TestRunnerBindings::Install method, we learn that the testRunner API is presented by the window.testRunner and window.layoutTestsController objects, which are synonyms. Reading the TestRunnerBindings::GetObjectTemplateBuilder method tells us what properties are available on the window.testRunner object.

window.testRunner is the preferred way to access the testRunner APIs. window.layoutTestsController is still supported because it is used by 3rd-party tests.

testRunner is the most popular testing API because it is also used indirectly by tests that stick to Web Platform APIs. The testharnessreport.js file in testharness.js is specifically designated to hold glue code that connects testharness.js to the testing environment. Our implementation is in third_party/WebKit/LayoutTests/resources/testharnessreport.js, and uses the testRunner API.

See the components/test_runner/ directory and WebKit's LayoutTests guide for other useful APIs. For example, window.eventSender (components/test_runner/event_sender.h and components/test_runner/event_sender.cc) has methods that simulate input events, such as keyboard and mouse input and drag-and-drop.
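The sketch below shows one way a test might use window.eventSender while still degrading to a manual test when the API is missing. It is illustrative only. The helper name clickOrAskUser is hypothetical, and the method names mouseMoveTo, mouseDown, and mouseUp are assumed from their use in existing layout tests, so check event_sender.h for the authoritative list.

```javascript
'use strict';

// Hypothetical helper: clicks the center of `element` when Blink's
// eventSender is available on `globalObject`. Returns 'automated' or
// 'manual' so that the caller can decide whether to show instructions.
function clickOrAskUser(element, globalObject) {
  const sender = globalObject.eventSender;
  if (!sender) {
    // Outside content_shell there is no eventSender, so the test has to
    // rely on a person performing the click.
    return 'manual';
  }

  const rect = element.getBoundingClientRect();
  sender.mouseMoveTo(rect.left + rect.width / 2, rect.top + rect.height / 2);
  sender.mouseDown();
  sender.mouseUp();
  return 'automated';
}
```

A test page would call clickOrAskUser(button, window) and, when the result is 'manual', display a short instruction such as "Click the button" so that the test remains usable in a regular browser.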

Here is a UML diagram of how the testRunner bindings fit into Chromium.

Text Test Baselines

By default, all the test cases in a file that uses testharness.js are expected to pass. However, in some cases, we prefer to add failing test cases to the repository, so that we can be notified when the failure modes change (e.g., we want to know if a test starts crashing rather than returning incorrect output). In these situations, a test file will be accompanied by a baseline, which is an -expected.txt file that contains the test's expected output.

The baselines are generated automatically when appropriate by run-webkit-tests, which is described here, and by the rebaselining tools.

Text baselines for testharness.js tests should be avoided, as having a text baseline associated with a testharness.js test indicates the presence of a bug. For this reason, CLs that add text baselines must include a crbug.com link for an issue tracking the removal of the text expectations.

When a test that will be upstreamed to WPT covers behavior where Blink does not currently match the specification being tested, a text baseline is necessary. Remember to create an issue tracking the expectation's removal, and to link the issue in the CL description.
Layout tests that cannot be upstreamed to WPT should use JavaScript to document Blink's current behavior, rather than using JavaScript to document desired behavior and a text file to document current behavior.

The js-test.js Legacy Harness

For historical reasons, older tests are written using the js-test harness. This harness is deprecated, and should not be used for new tests.

If you need to understand old tests, the best js-test documentation is its implementation at third_party/WebKit/LayoutTests/resources/js-test.js.

js-test tests lean heavily on the Blink-specific testRunner testing API. In a nutshell, the tests call testRunner.dumpAsText() to signal that the page content should be dumped and compared against a text baseline (an -expected.txt file). As a consequence, js-test tests are always accompanied by text baselines. Asynchronous tests also use testRunner.waitUntilDone() and testRunner.notifyDone() to tell the testing tools when they are complete.
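To help with reading old tests, the call sequence described above can be sketched as follows. This is not the js-test.js implementation, and it is not a pattern for new tests. The helper name startAsyncTextTest and its doWork parameter are hypothetical; only dumpAsText(), waitUntilDone(), and notifyDone() come from the testRunner API named above.

```javascript
'use strict';

// Hypothetical helper. `runner` is window.testRunner, or undefined when the
// page is opened outside content_shell. `doWork` receives a `finish`
// function and calls it when the asynchronous part of the test is over.
function startAsyncTextTest(runner, doWork) {
  if (runner) {
    // Ask for the page content to be dumped as text and compared against
    // the -expected.txt baseline.
    runner.dumpAsText();
    // Keep the test alive past the load event.
    runner.waitUntilDone();
  }

  doWork(() => {
    // Tell the testing tools that the page is ready to be dumped.
    if (runner)
      runner.notifyDone();
  });
}
```

A page might call startAsyncTextTest(window.testRunner, finish => setTimeout(finish, 0)) after writing its PASS or FAIL lines into the document. When the page is opened in a regular browser, the work still runs but nothing is dumped.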

Tests that use an HTTP Server

By default, tests are loaded as if via file: URLs. Some web platform features require tests served via HTTP or HTTPS, for example absolute paths (src=/foo) or features restricted to secure protocols.

HTTP tests are those under LayoutTests/http/tests (or virtual variants). Use a locally running HTTP server (Apache) to run them. Tests are served off of ports 8000 and 8080 for HTTP, and 8443 for HTTPS. If you run the tests using run-webkit-tests, the server will be started automatically. To run the server manually to reproduce or debug a failure:

cd src/third_party/WebKit/Tools/Scripts
./run-blink-httpd

The layout tests will be served from http://127.0.0.1:8000. For example, to run the test http/tests/serviceworker/chromium/service-worker-allowed.html, navigate to http://127.0.0.1:8000/serviceworker/chromium/service-worker-allowed.html. Some tests will behave differently if you go to 127.0.0.1 instead of localhost, so use 127.0.0.1.
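The mapping from a test's path to the URL served by the local server can be sketched as a small function. The helper name httpTestUrl and its secure option are hypothetical. The HTTP origin and the example path come from the text above, while serving the same paths over HTTPS on port 8443 is an assumption based on the port list.

```javascript
'use strict';

// Hypothetical helper. `testPath` is relative to LayoutTests/ and must
// start with 'http/tests/', which is the directory the server serves.
function httpTestUrl(testPath, { secure = false } = {}) {
  const prefix = 'http/tests/';
  if (!testPath.startsWith(prefix))
    throw new Error('Not an HTTP test: ' + testPath);

  // Use 127.0.0.1 rather than localhost, as recommended above.
  const origin = secure ? 'https://127.0.0.1:8443' : 'http://127.0.0.1:8000';
  return origin + '/' + testPath.slice(prefix.length);
}
```

For example, httpTestUrl('http/tests/serviceworker/chromium/service-worker-allowed.html') returns http://127.0.0.1:8000/serviceworker/chromium/service-worker-allowed.html.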

To kill the server, hit any key on the terminal where run-blink-httpd is running, or use taskkill or the Task Manager on Windows, and killall or Activity Monitor on macOS.

The test server sets up an alias to the LayoutTests/resources directory. In HTTP tests, you can access the testing framework at e.g. src="/resources/testharness.js".

TODO: Document wptserve when we are in a position to use it to run layout tests.

Reference Tests (Reftests)

Reference tests, also known as reftests, perform a pixel-by-pixel comparison between the rendered image of a test page and the rendered image of a reference page. Most reference tests pass if the two images match, but there are cases where it is useful to have a test pass when the two images do not match.

Reference tests are more difficult to debug than JavaScript tests, and tend to be slower as well. Therefore, they should only be used for functionality that cannot be covered by JavaScript tests.

New reference tests should follow the WPT reftests guidelines. The most important points are summarized below.

🚧 The test page declares the reference page using a <link rel="match"> or <link rel="mismatch">, depending on whether the test passes when the test image matches or does not match the reference image.
The reference page must not use the feature being tested. Otherwise, the test is meaningless.
The reference page should be as simple as possible, and should not depend on advanced features. Ideally, the reference page should render as intended even on browsers with poor CSS support.
Reference tests should be self-describing.
Reference tests do not include testharness.js.

🚧 Our testing infrastructure was designed for the WebKit reftests that Blink has inherited. The consequences are summarized below.

Each reference page must be in the same directory as its associated test. Given a test page named foo (e.g. foo.html or foo.svg),
- The reference page must be named foo-expected (e.g., foo-expected.html) if the test passes when the two images match.
- The reference page must be named foo-expected-mismatch (e.g., foo-expected-mismatch.svg) if the test passes when the two images do not match.
Multiple references and chained references are not supported.
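The naming rules above are mechanical enough to express as a short function. The sketch below is only an illustration of the convention. The helper name referencePageName and its options are hypothetical, and the function does not check that the reference sits in the same directory as the test.

```javascript
'use strict';

// Hypothetical helper. `testFile` is the test's file name, e.g. 'foo.html'.
// `extension` is the reference page's extension, which may differ from the
// test's extension. It defaults to the test's own extension.
function referencePageName(testFile, { mismatch = false, extension } = {}) {
  const dot = testFile.lastIndexOf('.');
  if (dot <= 0)
    throw new Error('Expected a file name with an extension: ' + testFile);

  const base = testFile.slice(0, dot);
  const ext = extension || testFile.slice(dot + 1);
  const suffix = mismatch ? '-expected-mismatch.' : '-expected.';
  return base + suffix + ext;
}
```

referencePageName('ol-reversed.html') returns ol-reversed-expected.html, and referencePageName('foo.html', { mismatch: true, extension: 'svg' }) returns foo-expected-mismatch.svg.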

The following example demonstrates a reference test for <ol>'s reversed attribute. The example assumes that the test page is named ol-reversed.html.

<!doctype html>
<link rel="match" href="ol-reversed-expected.html">

<ol reversed>
  <li>A</li>
  <li>B</li>
  <li>C</li>
</ol>

The reference page, which must be named ol-reversed-expected.html, is below.

<!doctype html>

<ol>
  <li value="3">A</li>
  <li value="2">B</li>
  <li value="1">C</li>
</ol>

Pixel Tests

testRunner APIs such as window.testRunner.dumpAsTextWithPixelResults() and window.testRunner.dumpDragImage() create an image result that is associated with the test. The image result is compared against an image baseline, which is an -expected.png file associated with the test, and the test passes if the image result is identical to the baseline, according to a pixel-by-pixel comparison. Tests that have image results (and baselines) are called pixel tests.

Pixel tests should still follow the principles laid out above. Pixel tests pose unique challenges to the desire to have self-describing and cross-platform tests. The WPT rendering test guidelines contain useful guidance. The most relevant pieces of advice are below.

Whenever possible, use a green paragraph / page / square to indicate success. If that is not possible, make the test self-describing by including a textual description of the desired (passing) outcome.
Only use the red color or the word FAIL to highlight errors. This does not apply when testing the color red.
🚧 Use the Ahem font to reduce the variance introduced by the platform's text rendering system. This does not apply when testing text, text flow, font selection, font fallback, font features, or other typographic information.

TODO: Document how to opt out of generating a layout tree when generating pixel results.

When using window.testRunner.dumpAsTextWithPixelResults(), the image result will always be 800x600px, because test pages are rendered in an 800x600px viewport. Pixel tests that do not specifically cover scrolling should fit in an 800x600px viewport without creating scrollbars.

The following snippet includes the Ahem font in a layout test.

<style>
body {
  font: 10px Ahem;
}
</style>
<script src="/resources/ahem.js"></script>

Tests outside LayoutTests/http and LayoutTests/external/wpt currently need to use a relative path to /third_party/WebKit/LayoutTests/resources/ahem.js.

Tests that need to paint, raster, or draw a frame of intermediate output

A layout test does not actually draw frames of output until the test exits. Tests that need to generate a painted frame can use window.testRunner.displayAsyncThen, which will run the machinery to put up a frame, then call the passed callback. There is also a library at fast/repaint/resources/text-based-repaint.js to help with writing paint invalidation and repaint tests.
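A minimal sketch of using displayAsyncThen is shown below. The helper name afterFirstFrame is hypothetical, and the setTimeout fallback is an assumption added so that the page still runs in a regular browser. Unlike displayAsyncThen, the fallback does not guarantee that a frame has been painted before the callback runs.

```javascript
'use strict';

// Hypothetical helper. `runner` is window.testRunner, or undefined outside
// content_shell. `callback` runs after a frame has been put up.
function afterFirstFrame(runner, callback) {
  if (runner && typeof runner.displayAsyncThen === 'function') {
    // content_shell: paint a frame, then run the callback.
    runner.displayAsyncThen(callback);
    return;
  }

  // Regular browser (assumption): run the callback asynchronously. No frame
  // is guaranteed to have been painted at this point.
  setTimeout(callback, 0);
}
```

A paint invalidation test might call afterFirstFrame(window.testRunner, () => { target.style.width = '200px'; }), where target is an element defined by the test, so that the style change happens after the initial frame has been painted.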

Layout tree tests

A layout tree test renders a web page and produces up to two results, which are compared against baseline files:

All tests output a textual representation of Blink's layout tree (called the render tree in WebKit's documentation), which is compared against an -expected.txt text baseline.
Some tests also output the image of the rendered page, which is compared against an -expected.png image baseline, using the same method as pixel tests.

Whether you want a pixel test or a layout tree test depends on whether you care about the visual image, the details of how that image was constructed, or both. It is possible for multiple layout trees to produce the same pixel output, so it is important to make it clear in the test which outputs you really care about.

TODO: Document the API used by layout tree tests to opt out of producing image results.

A layout tree test passes if all of its results match their baselines. Like pixel tests, the output of layout tree tests depends on platform-specific details, so layout tree tests often require per-platform baselines. Furthermore, because the tests depend on the layout tree structure, any change to the layout tree requires rebaselining each affected layout tree test and checking whether its results are still correct and whether the test is still meaningful. In practice, many baselines contain layout tree output that is wrong, because people did not want to update existing baselines and tests. This is unfortunate and confusing.

For these reasons, layout tree tests should only be used to cover aspects of the layout code that can only be tested by looking at the layout tree. Any combination of the other test types is preferable to a layout tree test. Layout tree tests are inherited from WebKit, so the repository may have some unfortunate examples of layout tree tests.

The following page is an example of a layout tree test.

<!doctype html>
<style>
body { font: 10px Ahem; }
span::after {
  content: "pass";
  color: green;
}
</style>
<script src="/resources/ahem.js"></script>

<p><span>Pass if a green PASS appears to the right: </span></p>

The most important aspects of the example are that the test page does not include a testing framework, and that it follows the guidelines for pixel tests. The test page produces the text result below.

layer at (0,0) size 800x600
  LayoutView at (0,0) size 800x600
layer at (0,0) size 800x30
  LayoutBlockFlow {HTML} at (0,0) size 800x30
    LayoutBlockFlow {BODY} at (8,10) size 784x10
      LayoutBlockFlow {P} at (0,0) size 784x10
        LayoutInline {SPAN} at (0,0) size 470x10
          LayoutText {#text} at (0,0) size 430x10
            text run at (0,0) width 430: "Pass if a green PASS appears to the right: "
          LayoutInline {<pseudo:after>} at (0,0) size 40x10 [color=#008000]
            LayoutTextFragment (anonymous) at (430,0) size 40x10
              text run at (430,0) width 40: "pass"

Notice that the test result above depends on the size of the <p> text. The test page uses the Ahem font (introduced above), whose main design goal is consistent cross-platform rendering. Had the test used another font, its text baseline would have depended on the fonts installed on the testing computer, and on the platform's font rendering system. Please follow the pixel tests guidelines and write reliable layout tree tests!

WebKit's layout tree is described in a series of posts on WebKit's blog. Some of the concepts there still apply to Blink's layout tree.

Directory Structure

The LayoutTests directory currently lacks a strict, formal structure. The following directories have special meaning:

The http/ directory hosts tests that require an HTTP server (see above).
The resources/ subdirectory in every directory contains binary files, such as media files, and code that is shared by multiple test files.

Some layout tests consist of a minimal HTML page that references a JavaScript file in resources/. Please do not use this pattern for new tests, as it goes against the minimality principle. JavaScript and CSS files should only live in resources/ if they are shared by at least two test files.