The JSON Test Results Format is a generic file format we use to record the results of each individual test in a test run (whether the test is run on a bot or run locally).
We use these files on the bots to determine whether a test step had any failing tests (using a separate file means we don't need to parse the test run's output, so that output can be tailored for human readability). We also upload the test results to dashboards like the Flakiness Dashboard.
The test format originated with the Blink layout tests, but has since been adopted by GTest-based tests and Python unittest-based tests, so we've standardized on it for anything related to tracking test flakiness.
Here's a very simple example for one Python test:
```
% python mojo/tools/run_mojo_python_tests.py --write-full-results-to results.json mojom_tests.parse.ast_unittest.ASTTest.testNodeBase
Running Python unit tests under mojo/public/tools/bindings/pylib ...
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
% cat results.json
{
  "tests": {
    "mojom_tests": {
      "parse": {
        "ast_unittest": {
          "ASTTest": {
            "testNodeBase": {
              "expected": "PASS",
              "actual": "PASS",
              "artifacts": {
                "screenshot": ["screenshots/page.png"]
              }
            }
          }
        }
      }
    }
  },
  "interrupted": false,
  "path_delimiter": ".",
  "version": 3,
  "seconds_since_epoch": 1406662283.764424,
  "num_failures_by_type": {
    "FAIL": 0,
    "PASS": 1
  },
  "artifact_types": {
    "screenshot": "image/png"
  }
}
```
As you can see, the format consists of a single top-level dictionary containing a set of metadata fields describing the test run, plus a `tests` key that contains the results of every test that was run, structured as a hierarchical trie to reduce duplication of test suite names (as you can see from the deeply nested Python test name above).
The file is strictly JSON-compliant; in particular, the order in which keys appear within each object is unimportant.
Field Name | Data Type | Description |
---|---|---|
interrupted | boolean | Required. Whether the test run was interrupted and terminated early (e.g., the runner bailed out or the user hit Ctrl-C). If true, not all of the tests in the suite were run, and the results are at best incomplete and possibly totally invalid. |
num_failures_by_type | dict | Required. A summary of the totals of each result type. If a test was run more than once, only the first invocation's result is included in the totals. Each key is one of the result types listed below. A missing result type is the same as being present and set to zero (0). |
path_delimiter | string | Optional, but will become required. The separator string used between components of a test's name; normally “.” for GTest- and Python-based tests and “/” for layout tests. If not present, you should default to “/” for backwards compatibility. |
seconds_since_epoch | float | Required. The start time of the test run expressed as a floating-point offset in seconds from the UNIX epoch. |
tests | dict | Required. The actual trie of test results. Each directory or module component in the test name is a node in the trie, and the leaf contains the dict of per-test fields as described below. |
version | integer | Required. Version of the file format. Current version is 3. |
artifact_types | dict | Optional. Required if any artifacts are present for any tests. MIME Type information for artifacts in this json file. All artifacts with the same name must share the same MIME type. |
artifact_permanent_location | string | Optional. The URI of the root location where the artifacts are stored. If present, any artifact locations are taken to be relative to this location. Currently only the gs:// scheme is supported. |
build_number | string | Optional. If this test run was produced on a bot, this should be the build number of the run, e.g., “1234”. |
builder_name | string | Optional. If this test run was produced on a bot, this should be the builder name of the bot, e.g., “Linux Tests”. |
chromium_revision | string | Optional. The revision of the current Chromium checkout, if relevant, e.g. “356123”. |
has_pretty_patch | bool | Optional, layout test specific, deprecated. Whether the layout tests' output contains PrettyPatch-formatted diffs for test failures. |
has_wdiff | bool | Optional, layout test specific, deprecated. Whether the layout tests' output contains wdiff-formatted diffs for test failures. |
layout_tests_dir | string | Optional, layout test specific. Path to the LayoutTests directory for the test run (used so that we can link to the tests used in the run). |
pixel_tests_enabled | bool | Optional, layout test specific. Whether the layout tests were run with the --pixel-tests flag. |
fixable | integer | Optional, deprecated. The number of tests that were run but were expected to fail. |
num_flaky | integer | Optional, deprecated. The number of tests that were run more than once and produced different results each time. |
num_passes | integer | Optional, deprecated. The number of successful tests; equivalent to num_failures_by_type["PASS"]. |
num_regressions | integer | Optional, deprecated. The number of tests that produced results that were unexpected failures. |
skips | integer | Optional, deprecated. The number of tests that were found but not run (such tests should be listed in the trie with “expected” and “actual” values of SKIP). |
Each leaf of the `tests` trie contains a dict with the results for a particular test name. If a test is run multiple times, the dict contains the results for each invocation in the `actual` field.
Field Name | Data Type | Description |
---|---|---|
actual | string | Required. An ordered space-separated list of the results the test actually produced. FAIL PASS means that a test was run twice, failed the first time, and then passed when it was retried. If a test produces multiple different results, then it was actually flaky during the run. |
expected | string | Required. An unordered space-separated list of the result types expected for the test, e.g. FAIL PASS means that a test is expected to either pass or fail. A test that contains multiple values is expected to be flaky. |
artifacts | dict | Optional. A dictionary describing the test artifacts generated by the test's execution. Each key is the name of an artifact (e.g., screenshot, crash_log) and each value is a list of relative locations for that artifact (e.g., screenshot/page.png, logs/crash.txt), with one entry per invocation of the test. Any ‘/’ characters in the file paths are platform-agnostic; tools should replace them with the appropriate per-platform path separator. If artifact_permanent_location is specified, the locations are relative to that root; otherwise, they are relative to the location of the JSON file containing them. |
bugs | string | Optional. A comma-separated list of URLs to bug database entries associated with each test. |
is_unexpected | bool | Optional. If present and true, the failure was unexpected (a regression). If false (or if the key is not present at all), the failure was expected and will be ignored. |
time | float | Optional. If present, the time it took in seconds to execute the first invocation of the test. |
times | array of floats | Optional. If present, the times in seconds of each invocation of the test. |
has_repaint_overlay | bool | Optional, layout test specific. If present and true, indicates that the test output contains the data needed to draw repaint overlays to help explain the results (only used in layout tests). |
is_missing_audio | bool | Optional, layout test specific. If present and true, the test was supposed to have an audio baseline to compare against, and we didn't find one. |
is_missing_text | bool | Optional, layout test specific. If present and true, the test was supposed to have a text baseline to compare against, and we didn't find one. |
is_missing_video | bool | Optional, layout test specific. If present and true, the test was supposed to have a video baseline to compare against, and we didn't find one. |
is_testharness_test | bool | Optional, layout test specific. If present, indicates that the layout test was written using the W3C's test harness and we don't necessarily have any baselines to compare against. |
reftest_type | string | Optional, layout test specific. If present, one of == or != to indicate that the test is a “reference test” and the results were expected to match the reference or not match the reference, respectively (only used in layout tests). |
Any test may fail in one of several different ways. There are a few generic types of failures, and the layout tests contain a few additional specialized failure types.
Result type | Description |
---|---|
SKIP | The test was not run. |
PASS | The test ran as expected. |
FAIL | The test did not run as expected. |
CRASH | The test runner crashed during the test. |
TIMEOUT | The test hung (did not complete) and was aborted. |
MISSING | Layout test specific. The test completed, but we could not find an expected baseline to compare against. |
LEAK | Layout test specific. Memory leaks were detected during the test execution. |
SLOW | Layout test specific. The test is expected to take longer than normal to run. |
TEXT | Layout test specific, deprecated. The test is expected to produce a text-only failure (the image, if present, will match). Normally you will see FAIL instead. |
AUDIO | Layout test specific, deprecated. The test is expected to produce audio output that doesn't match the expected result. Normally you will see FAIL instead. |
IMAGE | Layout test specific. The test produces image output (and possibly text output as well). The image output doesn't match what we'd expect, but the text output, if present, does. |
IMAGE+TEXT | Layout test specific, deprecated. The test produces image and text output, both of which fail to match what we expect. Normally you will see FAIL instead. |
REBASELINE | Layout test specific. The expected test result is out of date and will be ignored (any result other than a crash or timeout will be considered as passing). This test result should only ever show up on local test runs, not on bots (it is forbidden to check in a TestExpectations file with this expectation). This should never show up as an “actual” result. |
NEEDSREBASELINE | Layout test specific. The expected test result is out of date and will be ignored (as above); the auto-rebaseline-bot will look for tests of this type and automatically update them. This should never show up as an “actual” result. |
NEEDSMANUALREBASELINE | Layout test specific. The expected test result is out of date and will be ignored (as above). This result may be checked in to the TestExpectations file, but the auto-rebaseline-bot will ignore these entries. This should never show up as an “actual” result. |
The layout tests produce two different variants of the above file. The `full_results.json` file matches the above definition and contains every test executed in the run. The `failing_results.json` file contains just the tests that produced unexpected results, so it is a subset of the `full_results.json` data. The `failing_results.json` file is also in JSONP format, so it can be read via a `<script>` tag from an HTML file run from the local filesystem without falling prey to the same-origin restrictions for local files. The `failing_results.json` file is converted to JSONP by wrapping the JSON data in the prefix string `ADD_RESULTS(` and the suffix string `);`, so you can extract the JSON data by stripping off that prefix and suffix, as in the sketch below.