# The JSON Test Results Format
The JSON Test Results Format is a generic file format we use to record the
results of each individual test in a test run (whether the test is run on a
bot, or run locally).
[TOC]
## Introduction
We use these files on the bots to determine whether a test step had any
failing tests (using a separate file means that we don't need to parse the
output of the test run, so that output can remain tailored for human
readability). We also upload the test results to dashboards like the
[Flakiness Dashboard](http://test-results.appspot.com).
The test format originated with the Blink layout tests, but has since been
adopted by GTest-based tests and Python unittest-based tests, so we've
standardized on it for anything related to tracking test flakiness.
### Example
Here's a very simple example for one Python test:
```
% python mojo/tools/run_mojo_python_tests.py --write-full-results-to results.json mojom_tests.parse.ast_unittest.ASTTest.testNodeBase
Running Python unit tests under mojo/public/tools/bindings/pylib ...
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

% cat results.json
{
  "tests": {
    "mojom_tests": {
      "parse": {
        "ast_unittest": {
          "ASTTest": {
            "testNodeBase": {
              "expected": "PASS",
              "actual": "PASS",
              "artifacts": {
                "screenshot": ["screenshots/page.png"]
              }
            }
          }
        }
      }
    }
  },
  "interrupted": false,
  "path_delimiter": ".",
  "version": 3,
  "seconds_since_epoch": 1406662283.764424,
  "num_failures_by_type": {
    "FAIL": 0,
    "PASS": 1
  },
  "artifact_types": {
    "screenshot": "image/png"
  }
}
```

As you can see, the format consists of one top-level dictionary containing a
set of metadata fields describing the test run, plus a single `tests` key that
contains the results of every test run, structured as a hierarchical trie
to reduce duplication of test suite names (as you can see from the deeply
nested Python test name).
The file is strictly JSON-compliant. As part of this, the order in which keys
appear in each object is unimportant.
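
To illustrate how a consumer typically walks the trie, here is a minimal
Python sketch that flattens the `tests` trie back into full test names. The
leaf-detection heuristic (treating any node with an `actual` key as a leaf) is
an assumption of this sketch, not something the format mandates:

```python
import json

def flatten_trie(trie, delimiter, prefix=None):
    """Yield (test_name, leaf_dict) pairs from a results trie.

    Assumption: a node is a leaf iff it contains an 'actual' key.
    """
    for name, node in trie.items():
        full_name = name if prefix is None else prefix + delimiter + name
        if 'actual' in node:
            yield full_name, node
        else:
            yield from flatten_trie(node, delimiter, full_name)

with open('results.json') as f:
    results = json.load(f)

# Per the spec, default to "/" when path_delimiter is absent.
delimiter = results.get('path_delimiter', '/')
for test_name, fields in flatten_trie(results['tests'], delimiter):
    print(test_name, fields['actual'])
```

Run against the example above, this prints
`mojom_tests.parse.ast_unittest.ASTTest.testNodeBase PASS`.
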
## Top-level field names
| Field Name | Data Type | Description |
|------------|-----------|-------------|
| `interrupted` | boolean | **Required.** Whether the test run was interrupted and terminated early (e.g., because the runner bailed out or the user hit Ctrl-C). If true, this indicates that not all of the tests in the suite were run and the results are at best incomplete and possibly totally invalid. |
| `num_failures_by_type` | dict | **Required.** A summary of the totals of each result type. If a test was run more than once, only the first invocation's result is included in the totals. Each key is one of the result types listed below. A missing result type is the same as being present and set to zero (0). |
| `path_delimiter` | string | **Optional, but will become mandatory.** The separator string to use between components of a test's name; normally "." for GTest- and Python-based tests and "/" for layout tests; if not present, you should default to "/" for backwards compatibility. |
| `seconds_since_epoch` | float | **Required.** The start time of the test run expressed as a floating-point offset in seconds from the UNIX epoch. |
| `tests` | dict | **Required.** The actual trie of test results. Each directory or module component in the test name is a node in the trie, and the leaf contains the dict of per-test fields as described below. |
| `version` | integer | **Required.** Version of the file format. Current version is 3. |
| `artifact_types` | dict | **Optional. Required if any artifacts are present for any tests.** MIME Type information for artifacts in this json file. All artifacts with the same name must share the same MIME type. |
| `artifact_permanent_location` | string | **Optional.** The URI of the root location where the artifacts are stored. If present, any artifact locations are taken to be relative to this location. Currently only the `gs://` scheme is supported. |
| `build_number` | string | **Optional.** If this test run was produced on a bot, this should be the build number of the run, e.g., "1234". |
| `builder_name` | string | **Optional.** If this test run was produced on a bot, this should be the builder name of the bot, e.g., "Linux Tests". |
| `chromium_revision` | string | **Optional.** The revision of the current Chromium checkout, if relevant, e.g. "356123". |
| `has_pretty_patch` | bool | **Optional, layout test specific, deprecated.** Whether the layout tests' output contains PrettyPatch-formatted diffs for test failures. |
| `has_wdiff` | bool | **Optional, layout test specific, deprecated.** Whether the layout tests' output contains wdiff-formatted diffs for test failures. |
| `layout_tests_dir` | string | **Optional, layout test specific.** Path to the LayoutTests directory for the test run (used so that we can link to the tests used in the run). |
| `pixel_tests_enabled` | bool | **Optional, layout test specific.** Whether the layout tests were run with the `--pixel-tests` flag. |
| `fixable` | integer | **Optional, deprecated.** The number of tests that were run but were expected to fail. |
| `num_flaky` | integer | **Optional, deprecated.** The number of tests that were run more than once and produced different results each time. |
| `num_passes` | integer | **Optional, deprecated.** The number of successful tests; equivalent to `num_failures_by_type["PASS"]`. |
| `num_regressions` | integer | **Optional, deprecated.** The number of tests that produced results that were unexpected failures. |
| `skips` | integer | **Optional, deprecated.** The number of tests that were found but not run (tests should be listed in the trie with "expected" and "actual" values of `SKIP`). |
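
As a rough illustration of how the required top-level fields are consumed,
here is a sketch of a bot-side health check. The pass/fail policy (treating an
interrupted run or any non-`PASS`, non-`SKIP` total as a problem) is this
sketch's assumption, not part of the format:

```python
import json

def run_has_problems(path):
    """Return True if a results file indicates an unhealthy test run."""
    with open(path) as f:
        results = json.load(f)
    if results['interrupted']:
        return True  # Results are at best incomplete, possibly invalid.
    counts = results['num_failures_by_type']
    # A missing result type is the same as a count of zero, so we only
    # need to look at the keys that are actually present.
    return any(count > 0
               for result_type, count in counts.items()
               if result_type not in ('PASS', 'SKIP'))
```
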
## Per-test fields
Each leaf of the `tests` trie contains a dict with the results for a
particular test name. If a test is run multiple times, the dict contains the
results for each invocation in the `actual` field.

| Field Name | Data Type | Description |
|-------------|-----------|-------------|
| `actual` | string | **Required.** An ordered space-separated list of the results the test actually produced. `FAIL PASS` means that a test was run twice, failed the first time, and then passed when it was retried. If a test produces multiple different results, then it was actually flaky during the run. |
| `expected` | string | **Required.** An unordered space-separated list of the result types expected for the test, e.g. `FAIL PASS` means that a test is expected to either pass or fail. A test that contains multiple values is expected to be flaky. |
| `artifacts` | dict | **Optional.** A dictionary describing test artifacts generated by the execution of the test. The dictionary maps the name of the artifact (`screenshot`, `crash_log`) to a list of relative locations of the artifact (`screenshot/page.png`, `logs/crash.txt`). Any '/' characters in the file paths are meant to be platform agnostic; tools will replace them with the appropriate per-platform path separators. There is one entry in the list per test execution. If `artifact_permanent_location` is specified, then these locations are relative to that path. Otherwise, they are assumed to be relative to the location of the JSON file that contains this data. |
| `bugs` | string | **Optional.** A comma-separated list of URLs to bug database entries associated with each test. |
| `is_unexpected` | bool | **Optional.** If present and true, the failure was unexpected (a regression). If false (or if the key is not present at all), the failure was expected and will be ignored. |
| `time` | float | **Optional.** If present, the time it took in seconds to execute the first invocation of the test. |
| `times` | array of floats | **Optional.** If present, the times in seconds of each invocation of the test. |
| `has_repaint_overlay` | bool | **Optional, layout test specific.** If present and true, indicates that the test output contains the data needed to draw repaint overlays to help explain the results (only used in layout tests). |
| `is_missing_audio` | bool | **Optional, layout test specific.** If present and true, the test was supposed to have an audio baseline to compare against, and we didn't find one. |
| `is_missing_text` | bool | **Optional, layout test specific.** If present and true, the test was supposed to have a text baseline to compare against, and we didn't find one. |
| `is_missing_image` | bool | **Optional, layout test specific.** If present and true, the test was supposed to have an image baseline to compare against and we didn't find one. |
| `is_testharness_test` | bool | **Optional, layout test specific.** If present, indicates that the layout test was written using the W3C's test harness and we don't necessarily have any baselines to compare against. |
| `reftest_type` | string | **Optional, layout test specific.** If present, one of `==` or `!=` to indicate that the test is a "reference test" and the results were expected to match the reference or not match the reference, respectively (only used in layout tests). |
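
Putting the `actual` and `expected` fields together, a consumer might classify
each leaf roughly as follows. The labels (`flaky`, `regression`, `ok`) and the
choice to key the regression check off the first invocation are illustrative
assumptions; the format itself only defines the fields:

```python
def classify_result(leaf):
    """Classify one leaf dict of the 'tests' trie (illustrative policy)."""
    actual = leaf['actual'].split()           # Ordered list of actual results.
    expected = set(leaf['expected'].split())  # Unordered set of expected types.
    if len(set(actual)) > 1:
        return 'flaky'        # Multiple distinct results during the run.
    if actual[0] not in expected:
        return 'regression'   # The result was not one of the expected types.
    return 'ok'
```
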
## Test result types
Any test may fail in one of several different ways. There are a few generic
types of failures, and the layout tests contain a few additional specialized
failure types.

| Result type | Description |
|--------------|-------------|
| `SKIP` | The test was not run. |
| `PASS` | The test ran as expected. |
| `FAIL` | The test did not run as expected. |
| `CRASH` | The test runner crashed during the test. |
| `TIMEOUT` | The test hung (did not complete) and was aborted. |
| `MISSING` | **Layout test specific.** The test completed, but we could not find an expected baseline to compare against. |
| `LEAK` | **Layout test specific.** Memory leaks were detected during the test execution. |
| `SLOW` | **Layout test specific.** The test is expected to take longer than normal to run. |
| `TEXT` | **Layout test specific, deprecated.** The test is expected to produce a text-only failure (the image, if present, will match). Normally you will see `FAIL` instead. |
| `AUDIO` | **Layout test specific, deprecated.** The test is expected to produce audio output that doesn't match the expected result. Normally you will see `FAIL` instead. |
| `IMAGE` | **Layout test specific.** The test produces image (and possibly text) output. The image output doesn't match what we'd expect, but the text output, if present, does. |
| `IMAGE+TEXT` | **Layout test specific, deprecated.** The test produces image and text output, both of which fail to match what we expect. Normally you will see `FAIL` instead. |
| `REBASELINE` | **Layout test specific.** The expected test result is out of date and will be ignored (any result other than a crash or timeout will be considered as passing). This test result should only ever show up on local test runs, not on bots (it is forbidden to check in a TestExpectations file with this expectation). This should never show up as an "actual" result. |
| `NEEDSREBASELINE` | **Layout test specific.** The expected test result is out of date and will be ignored (as above); the auto-rebaseline-bot will look for tests of this type and automatically update them. This should never show up as an "actual" result. |
| `NEEDSMANUALREBASELINE` | **Layout test specific.** The expected test result is out of date and will be ignored (as above). This result may be checked in to the TestExpectations file, but the auto-rebaseline-bot will ignore these entries. This should never show up as an "actual" result. |
## "full_results.json" and "failing_results.json"
The layout tests produce two different variants of the above file. The
`full_results.json` file matches the above definition and contains every test
executed in the run. The `failing_results.json` file contains just the tests
that produced unexpected results, so it is a subset of the `full_results.json`
data. The `failing_results.json` file is also in JSONP format, so it can be
read via a `<script>` tag from an HTML file loaded from the local filesystem
without falling prey to the same-origin restrictions on local files. The
`failing_results.json` file is converted into JSONP by wrapping the JSON data:
it is preceded by the string `ADD_RESULTS(` and followed by the string `);`,
so you can extract the JSON data by stripping off that prefix and suffix.
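
For example, here is a minimal sketch of reading `failing_results.json` by
stripping the JSONP wrapper described above (the file path and function name
are just placeholders):

```python
import json

def load_failing_results(path):
    """Parse a failing_results.json file by removing its JSONP wrapper."""
    with open(path) as f:
        text = f.read().strip()
    prefix, suffix = 'ADD_RESULTS(', ');'
    if text.startswith(prefix) and text.endswith(suffix):
        text = text[len(prefix):-len(suffix)]
    return json.loads(text)
```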