"Layout tests" is a bit of a misnomer. The term is part of our WebKit heritage, and we use it to refer to every test that is written as a Web page (HTML, SVG, or XHTML) and lives in third_party/WebKit/LayoutTests/.
Layout tests should be used to accomplish one of the following goals:
There are four broad types of layout tests, listed in the order of preference.
The principles below are adapted from Test the Web Forward's Test Format Guidelines and WebKit's Wiki page on Writing good test cases.
Tests should be concise, without compromising on the principles below. Every element and piece of code on the page should be necessary and relevant to what is being tested. For example, don't build a fully functional signup form if you only need a text field or a button.
Content needed to satisfy the principles below is considered necessary. For example, it is acceptable and desirable to add elements that make the test self-describing (see below), and to add code that makes the test more reliable (see below).
Content that makes test failures easier to debug is considered necessary (to maintain good development speed), and is both acceptable and desirable.
Tests should be as fast as possible, without compromising on the principles below. Blink has several thousand layout tests that are run in parallel, and avoiding unnecessary delays is crucial to keeping our Commit Queue in good shape.
Avoid window.setTimeout, as it wastes time on the testing infrastructure. Instead, use specific event handlers, such as window.onload, to decide when to advance to the next step in a test.
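For example, a test step that must wait for subresources can key off window.onload instead of guessing at a delay (a sketch; runNextStep is a hypothetical helper defined by the test):

```html
<script>
// Fragile: guesses how long loading takes, and wastes time on fast machines.
// window.setTimeout(runNextStep, 100);

// Reliable: advances as soon as the document and its subresources have loaded.
window.onload = () => runNextStep();
</script>
```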
Tests should be reliable and yield consistent results for a given implementation. Flaky tests slow down your fellow developers' debugging efforts and the Commit Queue.
window.setTimeout is again a primary offender here. Aside from wasting time on a fast system, tests that rely on fixed timeouts can fail on systems that are slower than expected.
Follow the guidelines in this PSA on writing reliable layout tests.
Tests should be self-describing, so that a project member can recognize whether a test passes or fails without having to read the specification of the feature being tested. testharness.js makes a test self-describing when used correctly, but tests that degrade to manual tests must be carefully designed to be self-describing.
Tests should require a minimal amount of cognitive effort to read and maintain.
Avoid depending on edge case behavior of features that aren't explicitly covered by the test. For example, except where testing parsing, tests should contain valid markup (no parsing errors).
Tests should provide as much relevant information as possible when failing. testharness.js tests should prefer rich assert_ functions to combining assert_true() with a boolean operator. Using appropriate assert_ functions results in better diagnostic output when the assertion fails.
🚧 Prefer JavaScript's === operator to == so that readers don't have to reason about type conversion.
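Both recommendations can be sketched in plain JavaScript. The assert_true / assert_equals functions below are simplified stand-ins for the real testharness.js assertions, written here only to show why the rich form produces better failure messages:

```javascript
'use strict';

// == applies implicit type conversion; === compares type and value.
const looseResult = (0 == '');    // true: '' is coerced to the number 0
const strictResult = (0 === '');  // false: different types, no coercion

// Simplified stand-ins for the testharness.js assertion functions.
function assert_true(value, description) {
  if (value !== true)
    throw new Error(description + ': expected true, got ' + value);
}
function assert_equals(actual, expected, description) {
  if (actual !== expected)
    throw new Error(description + ': expected ' + expected + ', got ' + actual);
}

// assert_true(actual === expected) can only ever report "expected true";
// assert_equals names both values, which is far more useful in a failure log.
let richMessage = '';
try {
  assert_equals('stale', 'fresh', 'cache state');
} catch (e) {
  richMessage = e.message;
}
console.log(richMessage);
```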
Tests should be as cross-platform as reasonably possible. Avoid assumptions about device type, screen resolution, etc. Unavoidable assumptions should be documented.
When possible, tests should only use Web platform features, as specified in the relevant standards. When the Web platform's APIs are insufficient, tests should prefer to use WPT extended testing APIs, such as wpt_automation, over Blink-specific testing APIs.
🚧 Tests that use testing APIs should feature-test for the presence of those APIs, and gracefully degrade to manual tests (see below) when the testing APIs are not available.
Test pages should use the HTML5 doctype (<!doctype html>) unless they specifically cover quirks mode behavior.
Tests should be written under the assumption that they will be upstreamed to the WPT project. For example, tests should follow the WPT guidelines.
Tests should avoid using features that haven't been shipped by the actively-developed major rendering engines (Blink, WebKit, Gecko, Edge). When unsure, check caniuse.com. By necessity, this recommendation does not apply to the feature targeted by the test.
Tests that use Blink-specific testing APIs should feature-test for the presence of the testing APIs and degrade to manual tests when the testing APIs are not present.
Tests must be self-contained and not depend on external network resources.
Unless used by multiple test files, CSS and JavaScript should be inlined using <style> and <script> tags. Content shared by multiple tests should be placed in a resources/ directory near the tests that share it. See below for using multiple origins in a test.
Test file names should describe what is being tested.
File names should use snake-case, but preserve the case of any embedded API names. For example, prefer document-createElement.html to document-create-element.html.
Tests should prefer modern features in JavaScript and in the Web Platform, provided that they meet the recommendations above for cross-platform tests.
Tests should use strict mode for all JavaScript, except when specifically testing sloppy mode behavior. Strict mode flags deprecated features and helps catch some errors, such as forgetting to declare variables.
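A quick sketch of the kind of error strict mode catches; the misspelled variable name is made up for illustration:

```javascript
'use strict';

let caughtError = null;
try {
  // In sloppy mode this assignment would silently create a global variable;
  // in strict mode it throws a ReferenceError because nothing declares it.
  misspeledCounter = 1;
} catch (e) {
  caughtError = e.name;
}
console.log(caughtError);  // 'ReferenceError'
```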
🚧 JavaScript code should prefer const and let over var, classes over other OOP constructs, and Promises over other mechanisms for structuring asynchronous code.
🚧 Tests should use the UTF-8 character encoding, which should be declared by <meta charset=utf-8>. This does not apply when specifically testing encodings. The <meta> tag must be the first child of the document's <head> element. In documents that do not have an explicit <head>, the <meta> tag must follow the doctype.
When HTML pages do not explicitly declare a character encoding, browsers determine the encoding using an encoding sniffing algorithm that will surprise most modern Web developers. Highlights include a default encoding that depends on the user's locale, and non-standardized browser-specific heuristics.
An exception that would allow some tests to omit the <meta> tag is currently being discussed on blink-dev. If taking that route, please keep in mind that Firefox currently issues a development tools warning for pages without a declared encoding.

Tests should aim to have a coding style that is consistent with Google's JavaScript Style Guide and Google's HTML/CSS Style Guide, with the following exceptions.
Whenever possible, the testing criteria should be expressed in JavaScript. The alternatives, which will be described in future sections, result in slower and less reliable tests.
All new JavaScript tests should be written using the testharness.js testing framework. This framework is used by the tests in the web-platform-tests repository, which is shared with all the other browser vendors, so testharness.js tests are more accessible to browser developers.
As a shared framework, testharness.js enjoys high-quality documentation, such as a tutorial and API documentation. Layout tests should follow the recommendations of the above documents. Furthermore, layout tests should include relevant metadata. The specification URL (in <link rel="help">) is almost always relevant, and is incredibly helpful to a developer who needs to understand the test quickly.
Below is a skeleton for a JavaScript test embedded in an HTML page. Note that, in order to follow the minimality guideline, the test omits the tags <html>, <head>, and <body>, as they can be inferred by the HTML parser.
```html
<!doctype html>
<meta charset="utf-8">
<title>JavaScript: the true literal</title>
<link rel="help" href="https://tc39.github.io/ecma262/#sec-boolean-literals">
<meta name="assert" content="The true literal is equal to itself and immutable">
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
'use strict';

// Synchronous test example.
test(() => {
  const value = true;
  assert_true(value, 'true literal');
  assert_equals(value.toString(), 'true',
                'the string representation of true');
}, 'The literal true in a synchronous test case');

// Asynchronous test example.
async_test(t => {
  const originallyTrue = true;
  setTimeout(t.step_func_done(() => {
    assert_equals(originallyTrue, true);
  }), 0);
}, 'The literal true in a setTimeout callback');

// Promise test example.
promise_test(() => {
  return new Promise((resolve, reject) => {
    resolve(true);
  }).then(value => {
    assert_true(value);
  });
}, 'The literal true used to resolve a Promise');
</script>
```
Some points that are not immediately obvious from the example:
- <meta name="assert"> describes the purpose of the entire file, and is not redundant to <title>. Don't add a <meta name="assert"> when the information in the <title> is sufficient.
- In an assert_ function that compares two values, the first argument is the actual value (produced by the functionality being tested), and the second argument is the expected value (known good, golden). The order is important, because the testing harness relies on it to generate expressive error messages that are relied upon when debugging test failures.
- The assertion description (the string argument to assert_ methods) conveys the way the actual value was obtained.
- The <title> should be sufficient to describe the scenario being tested.
- The async_test wrapper calls its function with a test case argument that is used to signal when the test case is done, and to connect assertion failures to the correct test.
- t.done() must be called after all the test case's assertions have executed.
- Callbacks in asynchronous tests must be wrapped in t.step_func() calls, so that assertion failures and exceptions can be traced back to the correct test case.
- t.step_func_done() is a shortcut that combines t.step_func() with a t.done() call.
- Tests under file:// origins must currently use relative paths to point to /resources/testharness.js and /resources/testharnessreport.js. This is contrary to the WPT guidelines, which call for absolute paths. This limitation does not apply to the tests in LayoutTests/http, which rely on an HTTP server, or to the tests in LayoutTests/imported/wpt, which are imported from the WPT repository.

Some tests simply cannot be expressed using the Web Platform APIs. For example, some tests that require a user to perform a gesture, such as a mouse click, cannot be implemented using Web APIs. The WPT project covers some of these cases via supplemental testing APIs.
Tests that cannot be expressed using the Web Platform APIs or WPT's testing APIs use Blink-specific testing APIs. These APIs are only available in content_shell, and should only be used as a last resort.
A downside of Blink-specific APIs is that they are not as well documented as the Web Platform features. Learning to use a Blink-specific feature requires finding other tests that use it, or reading its source code.
For example, the most popular Blink-specific API is testRunner, which is implemented in components/test_runner/test_runner.h and components/test_runner/test_runner.cpp. By skimming the TestRunnerBindings::Install method, we learn that the testRunner API is presented by the window.testRunner and window.layoutTestsController objects, which are synonyms. Reading the TestRunnerBindings::GetObjectTemplateBuilder method tells us what properties are available on the window.testRunner object.
window.testRunner is the preferred way to access the testRunner APIs. window.layoutTestsController is still supported because it is used by 3rd-party tests.

testRunner is the most popular testing API because it is also used indirectly by tests that stick to Web Platform APIs. The testharnessreport.js file in testharness.js is specifically designated to hold glue code that connects testharness.js to the testing environment. Our implementation is in third_party/WebKit/LayoutTests/resources/testharnessreport.js, and uses the testRunner API.

See the components/test_runner/ directory and WebKit's LayoutTests guide for other useful APIs. For example, window.eventSender (components/test_runner/event_sender.h and components/test_runner/event_sender.cpp) has methods that simulate input events, such as keyboard / mouse input and drag-and-drop.

Here is a UML diagram of how the testRunner bindings fit into Chromium.
🚧 Whenever possible, tests that rely on (WPT's or Blink's) testing APIs should also be usable as manual tests. This makes it easy to debug the test, and to check whether our behavior matches other browsers.
Manual tests should minimize the chance of user error. This implies keeping the manual steps to a minimum, and having simple and clear instructions that describe all the configuration changes and user gestures that match the effect of the Blink-specific APIs used by the test.
Below is an example of a fairly minimal test that uses a Blink-specific API (window.eventSender), and gracefully degrades to a manual test.
```html
<!doctype html>
<meta charset="utf-8">
<title>DOM: Event.isTrusted for UI events</title>
<link rel="help" href="https://dom.spec.whatwg.org/#dom-event-istrusted">
<link rel="help" href="https://dom.spec.whatwg.org/#constructing-events">
<meta name="assert"
      content="Event.isTrusted is true for events generated by user interaction">
<script src="../../resources/testharness.js"></script>
<script src="../../resources/testharnessreport.js"></script>

<p>Please click on the button below.</p>
<button>Click Me!</button>

<script>
'use strict';
setup({ explicit_timeout: true });
promise_test(() => {
  const button = document.querySelector('button');
  return new Promise((resolve, reject) => {
    button.addEventListener('click', (event) => { resolve(event); });
    if (window.eventSender) {
      eventSender.mouseMoveTo(button.offsetLeft, button.offsetTop);
      eventSender.mouseDown();
      eventSender.mouseUp();
    }
  }).then((clickEvent) => {
    assert_true(clickEvent.isTrusted);
  });
}, 'Click generated by user interaction');
</script>
```
The test exhibits the following desirable features:
- The test has a secondary specification link (a second <link rel="help">), because the paragraph that documents the tested feature (referenced by the primary URL) is not very informative on its own.
- The test contains an instruction paragraph (<p>) that tells the tester exactly what to do, and the <button> that needs to be clicked is clearly labeled.
- The test disables the timeout built into testharness.js by calling setup({ explicit_timeout: true });.
- The test feature-checks for the Blink-specific testing APIs that it uses (window.eventSender) before invoking them. The test does not automatically fail when the APIs are not present.

Notice that the test is pretty heavy compared to a minimal JavaScript test that does not rely on testing APIs. Only use testing APIs when the desired testing conditions cannot be set up using Web Platform APIs.
By default, all the test cases in a file that uses testharness.js are expected to pass. However, in some cases, we prefer to add failing test cases to the repository, so that we can be notified when the failure modes change (e.g., we want to know if a test starts crashing rather than returning incorrect output). In these situations, a test file will be accompanied by a baseline, which is an -expected.txt file that contains the test's expected output.
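For illustration, a testharness.js text baseline records a PASS / FAIL line per test case, roughly in this shape (the test names are taken from the skeleton above; the failure message is invented):

```
This is a testharness.js-based test.
PASS The literal true in a synchronous test case
FAIL The literal true in a setTimeout callback assert_equals: expected true but got false
Harness: the test ran to completion.
```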
The baselines are generated automatically when appropriate by run-webkit-tests, which is described here, and by the rebaselining tools.
Text baselines for testharness.js tests should be avoided, as having a text baseline associated with a testharness.js test indicates the presence of a bug. For this reason, CLs that add text baselines must include a crbug.com link for an issue tracking the removal of the text expectations.
Many older tests are written using the js-test harness. This harness is deprecated, and should not be used for new tests. If you need to understand old tests, the best js-test documentation is its implementation at third_party/WebKit/LayoutTests/resources/js-test.js.

js-test tests lean heavily on the Blink-specific testRunner testing API. In a nutshell, the tests call testRunner.dumpAsText() to signal that the page content should be dumped and compared against a text baseline (an -expected.txt file). As a consequence, js-test tests are always accompanied by text baselines. Asynchronous tests also use testRunner.waitUntilDone() and testRunner.notifyDone() to tell the testing tools when they are complete.
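For orientation when reading legacy tests, an asynchronous js-test-style test has roughly the following shape. This sketch uses only the testRunner calls named above, and is illustrative rather than a template for new tests:

```html
<script>
if (window.testRunner) {
  // Compare the page's text content against the -expected.txt baseline.
  testRunner.dumpAsText();
  // Keep the test alive past the load event; we will finish it ourselves.
  testRunner.waitUntilDone();
}
window.onload = () => {
  document.body.appendChild(document.createTextNode('PASS some behavior'));
  if (window.testRunner)
    testRunner.notifyDone();  // tell the testing tools the test is complete
};
</script>
```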
By default, tests are loaded as if via file: URLs. Some web platform features require tests served via HTTP or HTTPS, for example absolute paths (src=/foo) or features restricted to secure protocols.
HTTP tests are those under LayoutTests/http/tests (or virtual variants). Use a locally running HTTP server (Apache) to run them. Tests are served off of ports 8000 and 8080 for HTTP, and 8443 for HTTPS. If you run the tests using run-webkit-tests, the server will be started automatically. To run the server manually to reproduce or debug a failure:
```bash
cd src/third_party/WebKit/Tools/Scripts
run-blink-httpd start
```
The layout tests will be served from http://127.0.0.1:8000. For example, to run the test http/tests/serviceworker/chromium/service-worker-allowed.html, navigate to http://127.0.0.1:8000/serviceworker/chromium/service-worker-allowed.html. Some tests will behave differently if you go to 127.0.0.1 instead of localhost, so use 127.0.0.1.
To kill the server, run run-blink-httpd --server stop, or just use taskkill or the Task Manager on Windows, and killall or Activity Monitor on macOS.
The test server sets up an alias to the LayoutTests/resources directory. In HTTP tests, you can access the testing framework at e.g. src="/resources/testharness.js".
TODO: Document wptserve when we are in a position to use it to run layout tests.
Reference tests, also known as reftests, perform a pixel-by-pixel comparison between the rendered image of a test page and the rendered image of a reference page. Most reference tests pass if the two images match, but there are cases where it is useful to have a test pass when the two images do not match.
Reference tests are more difficult to debug than JavaScript tests, and tend to be slower as well. Therefore, they should only be used for functionality that cannot be covered by JavaScript tests.
New reference tests should follow the WPT reftests guidelines. The most important points are summarized below.
- The test page declares the reference page using <link rel="match"> or <link rel="mismatch">, depending on whether the test passes when the test image matches or does not match the reference image.
- Reference tests do not use testharness.js.

🚧 Our testing infrastructure was designed for the WebKit reftests that Blink has inherited. The consequences are summarized below.
Given a test page named foo (e.g. foo.html or foo.svg), the reference page must be named:

- foo-expected (e.g., foo-expected.html) if the test passes when the two images match, or
- foo-expected-mismatch (e.g., foo-expected-mismatch.svg) if the test passes when the two images do not match.

The following example demonstrates a reference test for <ol>'s reversed attribute. The example assumes that the test page is named ol-reversed.html.
```html
<!doctype html>
<meta charset="utf-8">
<link rel="match" href="ol-reversed-expected.html">

<ol reversed>
  <li>A</li>
  <li>B</li>
  <li>C</li>
</ol>
```
The reference page, which must be named ol-reversed-expected.html, is below.
```html
<!doctype html>
<meta charset="utf-8">

<ol>
  <li value="3">A</li>
  <li value="2">B</li>
  <li value="1">C</li>
</ol>
```
testRunner APIs such as window.testRunner.dumpAsTextWithPixelResults() and window.testRunner.dumpDragImage() create an image result that is associated with the test. The image result is compared against an image baseline, which is an -expected.png file associated with the test, and the test passes if the image result is identical to the baseline, according to a pixel-by-pixel comparison. Tests that have image results (and baselines) are called pixel tests.
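A minimal sketch of how a test opts into producing an image result; the surrounding page content is whatever the test actually renders:

```html
<script>
// Ask content_shell for both a text dump and a pixel dump of this page;
// the pixels are compared against the test's -expected.png baseline.
if (window.testRunner)
  testRunner.dumpAsTextWithPixelResults();
</script>
```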
Pixel tests should still follow the principles laid out above. Pixel tests pose unique challenges to the desire to have self-describing and cross-platform tests. The WPT test style guidelines contain useful guidance. The most relevant pieces of advice are below.

- Use the red color or the word FAIL to highlight errors. This does not apply when testing the color red.

TODO: Document how to opt out of generating a layout tree when generating pixel results.

- When using window.testRunner.dumpAsTextWithPixelResults(), the image result will always be 800x600px, because test pages are rendered in an 800x600px viewport. Pixel tests that do not specifically cover scrolling should fit in an 800x600px viewport without creating scrollbars.

The following snippet includes the Ahem font in a layout test.
```html
<style>
body { font: 10px Ahem; }
</style>
<script src="/resources/ahem.js"></script>
```
Tests outside LayoutTests/http and LayoutTests/imported/wpt currently need to use a relative path to /third_party/WebKit/LayoutTests/resources/ahem.js.

A layout test does not actually draw frames of output until the test exits. Tests that need to generate a painted frame can use window.testRunner.displayAsyncThen, which will run the machinery to put up a frame, then call the passed callback. There is also a library at fast/repaint/resources/text-based-repaint.js to help with writing paint invalidation and repaint tests.
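A sketch of forcing a painted frame with window.testRunner.displayAsyncThen; the waitUntilDone / notifyDone pairing is the same pattern used by other asynchronous testRunner-based tests:

```html
<script>
if (window.testRunner) {
  testRunner.waitUntilDone();
  testRunner.displayAsyncThen(() => {
    // A frame has now been painted; make the invalidating change here,
    // then let the test finish so the final output can be captured.
    testRunner.notifyDone();
  });
}
</script>
```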
A layout tree test renders a web page and produces up to two results, which are compared against baseline files:

- a text rendering of the page's layout tree, which is compared against an -expected.txt text baseline, and
- an image result, which is compared against an -expected.png image baseline, using the same method as pixel tests.

Whether you want a pixel test or a layout tree test depends on whether you care about the visual image, the details of how that image was constructed, or both. It is possible for multiple layout trees to produce the same pixel output, so it is important to make it clear in the test which outputs you really care about.
TODO: Document the API used by layout tree tests to opt out of producing image results.
A layout tree test passes if all of its results match their baselines. Like pixel tests, the output of layout tree tests depends on platform-specific details, so layout tree tests often require per-platform baselines. Furthermore, because the tests depend on the layout tree structure, any change to the layout tree requires rebaselining every layout tree test to check whether the results are still correct and whether each test is still meaningful. In practice, layout tree output is often misstated (i.e., wrong), because people didn't want to update existing baselines and tests. This is really unfortunate and confusing.
For these reasons, layout tree tests should only be used to cover aspects of the layout code that can only be tested by looking at the layout tree. Any combination of the other test types is preferable to a layout tree test. Layout tree tests are inherited from WebKit, so the repository may have some unfortunate examples of layout tree tests.
The following page is an example of a layout tree test.
```html
<!doctype html>
<meta charset="utf-8">
<style>
body { font: 10px Ahem; }
span::after {
  content: "pass";
  color: green;
}
</style>
<script src="/resources/ahem.js"></script>

<p><span>Pass if a green PASS appears to the right: </span></p>
```
The most important aspects of the example are that the test page does not include a testing framework, and that it follows the guidelines for pixel tests. The test page produces the text result below.
```
layer at (0,0) size 800x600
  LayoutView at (0,0) size 800x600
layer at (0,0) size 800x30
  LayoutBlockFlow {HTML} at (0,0) size 800x30
    LayoutBlockFlow {BODY} at (8,10) size 784x10
      LayoutBlockFlow {P} at (0,0) size 784x10
        LayoutInline {SPAN} at (0,0) size 470x10
          LayoutText {#text} at (0,0) size 430x10
            text run at (0,0) width 430: "Pass if a green PASS appears to the right: "
          LayoutInline {<pseudo:after>} at (0,0) size 40x10 [color=#008000]
            LayoutTextFragment (anonymous) at (430,0) size 40x10
              text run at (430,0) width 40: "pass"
```
Notice that the test result above depends on the size of the <p> text. The test page uses the Ahem font (introduced above), whose main design goal is consistent cross-platform rendering. Had the test used another font, its text baseline would have depended on the fonts installed on the testing computer, and on the platform's font rendering system. Please follow the pixel tests guidelines and write reliable layout tree tests!
WebKit's layout tree is described in a series of posts on WebKit's blog. Some of the concepts there still apply to Blink's layout tree.
The LayoutTests directory currently lacks a strict, formal structure. The following directories have special meaning:
- The http/ directory hosts tests that require an HTTP server (see above).
- A resources/ subdirectory in every directory contains binary files, such as media files, and code that is shared by multiple test files.
- Some older tests place supporting JavaScript and CSS in standalone files next to the test page, rather than in resources/. Please do not use this pattern for new tests, as it goes against the minimality principle. JavaScript and CSS files should only live in resources/ if they are shared by at least two test files.