Web Test Baseline Fallback

Read Web Test Expectations and Baselines first if you have not.

Baselines can vary by platform, in which case we need to check in multiple versions of a baseline. At the same time, we would like to avoid storing identical baselines, so a platform is allowed to fall back to another. This document first introduces how platform-specific baselines are structured and how we search for a baseline (the fallback mechanism), and then goes into the details of baseline optimization and rebaselining.

Terminology

  • Root directory: //src/third_party/blink/web_tests is the root directory (of all the web tests and baselines). All relative paths in this document start from this directory.
  • Test name: the name of a test is its relative path from the root directory (e.g. html/dom/foo/bar.html).
  • Baseline name: replacing the extension of a test name with -expected.{txt,png,wav} gives the corresponding baseline name.
  • Virtual tests: tests can have virtual variants. For example, virtual/gpu/html/dom/foo/bar.html is the virtual variant of html/dom/foo/bar.html in the gpu suite. Only the latter file exists on disk, and is called the base of the virtual test. See Web Tests#Testing Runtime Flags for more details.
  • Platform directory: each directory under platform/ is a platform directory that contains baselines (no tests) for that platform. Directory names are in the form of PLATFORM-VERSION (e.g. mac-mac10.12), except for the latest version of a platform which is just PLATFORM (e.g. mac).
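The naming rules above can be sketched in Python, the language blinkpy is written in. The helpers baseline_name and virtual_base are hypothetical and are not blinkpy APIs. virtual_base also assumes that virtual test names always have the form virtual/<suite>/<base test name>.

```python
import posixpath
from typing import Optional

BASELINE_SUFFIXES = ("txt", "png", "wav")


def baseline_name(test_name: str, suffix: str) -> str:
    """Replace the test's extension with -expected.<suffix>."""
    if suffix not in BASELINE_SUFFIXES:
        raise ValueError(f"unexpected baseline suffix: {suffix}")
    stem, _extension = posixpath.splitext(test_name)
    return f"{stem}-expected.{suffix}"


def virtual_base(test_name: str) -> Optional[str]:
    """Return the base test of a virtual test, or None for a non-virtual test.

    Assumes virtual test names look like virtual/<suite>/<base test name>.
    """
    parts = test_name.split("/", 2)
    if len(parts) == 3 and parts[0] == "virtual":
        return parts[2]
    return None


print(baseline_name("html/dom/foo/bar.html", "txt"))  # html/dom/foo/bar-expected.txt
print(virtual_base("virtual/gpu/html/dom/foo/bar.html"))  # html/dom/foo/bar.html
```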

Baseline fallback

Each platform has a pre-configured fallback that is used when a baseline cannot be found in its own platform directory. As a general rule, older versions of an OS fall back to newer versions. In addition, Android falls back to Linux, which in turn falls back to Windows. Eventually, all platforms fall back to the root directory (i.e. the generic baselines that live alongside the tests). The rules are configured by FALLBACK_PATHS in each Port class in //src/third_party/blink/tools/blinkpy/web_tests/port.
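Here is a minimal sketch of how a search path follows from these rules. It uses a hypothetical, heavily simplified FALLBACK table that maps each platform directory to the directory it falls back to. The real FALLBACK_PATHS in blinkpy is structured differently and covers many more platforms. The empty string stands for the root directory.

```python
from typing import Dict, List

ROOT = ""  # the root directory, holding the generic baselines

# Hypothetical, simplified fallback table (not blinkpy's FALLBACK_PATHS):
# each platform directory maps to the directory it falls back to.
FALLBACK: Dict[str, str] = {
    "platform/mac-mac10.12": "platform/mac",  # older macOS -> latest macOS
    "platform/mac": ROOT,
    "platform/android": "platform/linux",
    "platform/linux": "platform/win",
    "platform/win": ROOT,
}


def search_path(platform_dir: str) -> List[str]:
    """Walk from a platform directory up to the root, in search order."""
    path = [platform_dir]
    while path[-1] != ROOT:
        next_dir = FALLBACK[path[-1]]
        if next_dir in path:
            raise ValueError(f"fallback cycle at {next_dir!r}")
        path.append(next_dir)
    return path


print(search_path("platform/android"))
# ['platform/android', 'platform/linux', 'platform/win', '']
```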

All platforms can be organized into a tree based on their fallback relations (we are not considering virtual test suites yet). See the lower half (the non-virtual subtree) of this graph. Walking from a platform to the root gives the search path of that platform. We check each directory on the search path in order and see if “directory + baseline name” points to a file on disk (note that baseline names are relative paths), and stop at the first one found.
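The lookup itself is a first-match scan over the search path. In this sketch, the hypothetical find_baseline takes a set of existing files (paths relative to the root directory) instead of touching the disk, so it runs without a Chromium checkout.

```python
import posixpath
from typing import AbstractSet, Iterable, Optional


def find_baseline(search_path: Iterable[str], baseline_name: str,
                  existing_files: AbstractSet[str]) -> Optional[str]:
    """Return the first "directory + baseline name" that exists, else None."""
    for directory in search_path:
        candidate = posixpath.join(directory, baseline_name)  # "" joins to the root
        if candidate in existing_files:
            return candidate
    return None


files = {
    "platform/win/html/dom/foo/bar-expected.txt",
    "html/dom/foo/bar-expected.txt",
}
path = ["platform/android", "platform/linux", "platform/win", ""]
print(find_baseline(path, "html/dom/foo/bar-expected.txt", files))
# platform/win/html/dom/foo/bar-expected.txt
```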

Virtual test suites

Now we add virtual test suites to the picture, using a test named virtual/gpu/html/dom/foo/bar.html as an example to demonstrate the process. The baseline search process for a virtual test consists of two passes:

  1. Treat the virtual test name as a regular test name and search for the corresponding baseline name using the same search path, which means we are in fact searching in directories like platform/*/virtual/gpu/..., and eventually virtual/gpu/... (a.k.a. the virtual root).
  2. If no baseline can be found so far, we retry with the non-virtual (base) test name html/dom/foo/bar.html and walk the search path again.
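The two passes can be sketched by running the same first-match scan twice: first with the virtual baseline name, then with the base one. As before, the function name find_virtual_baseline and the in-memory set of files are assumptions made for illustration. The example shows that a baseline at the virtual root wins over a platform-specific baseline of the base test, because the whole first pass finishes before the second pass starts.

```python
import posixpath
from typing import AbstractSet, List, Optional


def find_virtual_baseline(search_path: List[str], test_name: str, suffix: str,
                          existing_files: AbstractSet[str]) -> Optional[str]:
    """Two-pass search: virtual test name first, then its base test name."""
    test_names = [test_name]
    parts = test_name.split("/", 2)  # assumes virtual/<suite>/<base>
    if len(parts) == 3 and parts[0] == "virtual":
        test_names.append(parts[2])
    for name in test_names:  # pass 1, then (for virtual tests) pass 2
        stem, _extension = posixpath.splitext(name)
        baseline = f"{stem}-expected.{suffix}"
        for directory in search_path:
            candidate = posixpath.join(directory, baseline)
            if candidate in existing_files:
                return candidate
    return None


files = {
    "platform/linux/html/dom/foo/bar-expected.txt",
    "virtual/gpu/html/dom/foo/bar-expected.txt",  # the virtual root
}
path = ["platform/linux", "platform/win", ""]
print(find_virtual_baseline(path, "virtual/gpu/html/dom/foo/bar.html", "txt", files))
# virtual/gpu/html/dom/foo/bar-expected.txt
```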

The graph visualizes the full picture. Note that the two passes are in fact the same with different test names, so the virtual subtree is a mirror of the non-virtual subtree. The two trees are connected by the virtual root that has different ancestors (fallbacks) depending on which platform we start from; this is the result of the two-pass baseline search.

Note: there are in fact two more places to be searched before everything else: additional directories given via command line arguments and flag-specific baseline directories. They are maintained manually and are not discussed in this document.

Tooling implementation

This section describes how the fallback mechanism affects the implementation of the tooling, namely blink_tool.py. If you are not working on blinkpy, you can stop here.

Optimization

We can remove a baseline if it is identical to its fallback. In the extreme case where all platforms have the same result, a single generic baseline is enough. Here is the algorithm that blink_tool.py optimize-baselines uses to remove the duplication.

Notice from the previous section that the virtual and non-virtual parts are two identically structured subtrees. Trees are easy to work with: we can traverse the tree from the leaves up to the root. Whenever two nodes on a path have identical baselines and no node between them has a baseline, we keep the one closer to the root and delete the one further from it.
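Here is a minimal sketch of this pruning. It assumes a subtree is given as a hypothetical parent map (each node mapped to its fallback, with None for the subtree's root) plus a dict from nodes to baseline contents. A baseline is dropped when the nearest ancestor that has a baseline holds identical contents. Deleting it cannot change any platform's result, because everything that used to resolve to it now resolves to an identical copy.

```python
from typing import Dict, Optional


def optimize(parent: Dict[str, Optional[str]],
             baselines: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of baselines with redundant entries removed."""
    result = dict(baselines)

    def depth(node: str) -> int:
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    # Visit deeper nodes (closer to the leaves) before shallower ones.
    for node in sorted(result, key=depth, reverse=True):
        ancestor = parent[node]
        while ancestor is not None and ancestor not in result:
            ancestor = parent[ancestor]  # skip nodes without baselines
        if ancestor is not None and result[ancestor] == result[node]:
            del result[node]  # keep the copy closer to the root
    return result


parent = {"platform/mac-mac10.12": "platform/mac", "platform/mac": "",
          "platform/linux": "platform/win", "platform/win": "", "": None}
baselines = {"platform/mac-mac10.12": "A", "platform/mac": "A",
             "platform/linux": "B", "platform/win": "C", "": "C"}
print(optimize(parent, baselines))
# {'platform/mac': 'A', 'platform/linux': 'B', '': 'C'}
```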

The virtual root is special because it has multiple parents. However, if we cut the edges between the two subtrees (i.e. make the virtual subtree self-contained), we can apply the same algorithm to each of them. A subtree is self-contained when it never needs to fall back to its ancestors, which can be guaranteed by placing a baseline on its root. If the virtual root already has a baseline, we can simply ignore these edges. Otherwise, we need to make sure that all children of the virtual root have baselines by copying the non-virtual fallbacks to those that do not. We cannot copy the generic baseline to the virtual root instead, because virtual platforms may have different results.
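Here is a sketch of the cut. It assumes the virtual and non-virtual baselines are hypothetical dicts keyed by platform directory, with "" standing for the virtual root and the root respectively, and that each platform's search path is known. The sketch differs from the description above in one deliberate way. Instead of patching only the children of the virtual root, this conservative variant copies its non-virtual fallback into every virtual node whose virtual pass currently finds nothing. This keeps every platform's result unchanged even when a deeper platform has its own non-virtual baseline. The ordinary pruning then removes any copies that turn out to be redundant.

```python
from typing import Dict, List, Optional

ROOT = ""  # the root (non-virtual dict) or the virtual root (virtual dict)


def resolve(path: List[str], baselines: Dict[str, str]) -> Optional[str]:
    """First baseline found along a search path, or None."""
    for node in path:
        if node in baselines:
            return baselines[node]
    return None


def make_self_contained(search_paths: Dict[str, List[str]],
                        virtual: Dict[str, str],
                        base: Dict[str, str]) -> Dict[str, str]:
    """Return virtual baselines that never need the second (base) pass."""
    patched = dict(virtual)
    if ROOT in virtual:
        return patched  # the virtual root already ends every virtual search
    for platform, path in search_paths.items():
        if platform == ROOT:
            continue  # never copy the generic baseline to the virtual root
        # Decide using the original virtual baselines, not earlier copies.
        if resolve(path, virtual) is None:
            fallback = resolve(path, base)
            if fallback is not None:
                patched[platform] = fallback
    return patched


search_paths = {
    "platform/android": ["platform/android", "platform/linux", "platform/win", ROOT],
    "platform/linux": ["platform/linux", "platform/win", ROOT],
    "platform/win": ["platform/win", ROOT],
}
print(make_self_contained(search_paths, {}, {"platform/android": "A", ROOT: "G"}))
# {'platform/android': 'A', 'platform/linux': 'G', 'platform/win': 'G'}
```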

In addition, the optimizer removes redundant all-PASS testharness.js results. Such a baseline is redundant when no other baseline exists later on the search path (including when the all-PASS baseline is at the root), because run_web_tests.py assumes an all-PASS testharness.js result when no baseline can be found for a platform.
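Here is a sketch of this rule, using the same hypothetical parent-map representation. The is_all_pass check below is a deliberately simplified, line-based stand-in that only looks at lines beginning with PASS, FAIL, TIMEOUT, or NOTRUN. It is an assumption, not blinkpy's testharness.js result parser. Nodes are visited from the root down, so a deletion near the root can make a descendant's all-PASS baseline removable too.

```python
from typing import Dict, Iterator, Optional

RESULT_KEYWORDS = {"PASS", "FAIL", "TIMEOUT", "NOTRUN"}


def is_all_pass(text: str) -> bool:
    """Simplified check: at least one result line, and every one is PASS."""
    results = [line.split(" ", 1)[0] for line in text.splitlines()
               if line.split(" ", 1)[0] in RESULT_KEYWORDS]
    return bool(results) and all(r == "PASS" for r in results)


def remove_redundant_all_pass(parent: Dict[str, Optional[str]],
                              baselines: Dict[str, str]) -> Dict[str, str]:
    result = dict(baselines)

    def ancestors(node: str) -> Iterator[str]:
        while parent[node] is not None:
            node = parent[node]
            yield node

    def depth(node: str) -> int:
        return sum(1 for _ in ancestors(node))

    for node in sorted(result, key=depth):  # root first
        if is_all_pass(result[node]) and not any(a in result for a in ancestors(node)):
            del result[node]  # nothing later on the path: all-PASS is assumed
    return result


PASS_TEXT = ("This is a testharness.js-based test.\nPASS a\nPASS b\n"
             "Harness: the test ran to completion.\n")
parent = {"platform/linux": "platform/win", "platform/win": "", "": None}
print(remove_redundant_all_pass(parent, {"platform/linux": PASS_TEXT, "": PASS_TEXT}))
# {}
```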

Rebaseline

The fallback mechanism also affects the rebaseline tool (blink_tool.py rebaseline{-cl}). When asked to rebaseline a test on some platforms, the tool downloads results from the corresponding try bots and puts them into the respective platform directories. This is potentially problematic: because of the fallback mechanism, the new baselines may affect other platforms that are not being rebaselined but fall back to the rebaselined ones.

The solution is to copy the current baselines from the to-be-rebaselined platforms to all the platforms that immediately fall back to them (i.e. down one level in the fallback tree) before downloading new baselines. This is done in a hidden internal command blink_tool.py copy-existing-baselines, which is always executed by blink_tool.py rebaseline.
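Here is a minimal sketch of the copying step, again on a hypothetical parent map. The function name and the skipping rules are assumptions of the sketch, not a description of copy-existing-baselines internals. It skips a direct child that is itself being rebaselined or that already has its own baseline, since neither is affected by the new download.

```python
from typing import Dict, Iterable, Optional


def copy_existing_baselines(parent: Dict[str, Optional[str]],
                            baselines: Dict[str, str],
                            to_rebaseline: Iterable[str]) -> Dict[str, str]:
    """Pin the current result on platforms one level below the rebaselined ones."""
    targets = set(to_rebaseline)
    result = dict(baselines)
    for child, fallback in parent.items():
        if fallback not in targets or child in targets or child in baselines:
            continue  # not a direct child, rebaselined itself, or already pinned
        node = fallback  # the child has no baseline, so it resolves from here
        while node is not None and node not in baselines:
            node = parent[node]
        if node is not None:
            result[child] = baselines[node]
    return result


parent = {"platform/android": "platform/linux", "platform/linux": "platform/win",
          "platform/win": "", "": None}
print(copy_existing_baselines(parent, {"platform/win": "W", "": "G"}, ["platform/win"]))
# {'platform/win': 'W', '': 'G', 'platform/linux': 'W'}
```

Only one level needs copying. In the example, platform/android keeps its old result because it falls back through platform/linux, which now holds its own copy.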

Finally, blink_tool.py rebaseline{-cl} also optimizes the baselines at the end by default.