| # The JSON Test Results Format |
| |
The JSON Test Results Format is a generic file format we use to record the
results of each individual test in a test run (whether the test is run on a
bot or locally).
| |
| [TOC] |
| |
| ## Introduction |
| |
We use these files on the bots in order to determine whether a test step had
any failing tests (using a separate file means that we don't need to parse the
output of the test run, and hence that output can be tailored for human
readability). We also upload the test results to dashboards like the
[Flakiness Dashboard](http://test-results.appspot.com).
| |
| The test format originated with the Blink layout tests, but has since been |
| adopted by GTest-based tests and Python unittest-based tests, so we've |
| standardized on it for anything related to tracking test flakiness. |
| |
| ### Example |
| |
| Here's a very simple example for one Python test: |
| |
```
% python mojo/tools/run_mojo_python_tests.py --write-full-results-to results.json mojom_tests.parse.ast_unittest.ASTTest.testNodeBase
Running Python unit tests under mojo/public/tools/bindings/pylib ...
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
% cat results.json
{
  "tests": {
    "mojom_tests": {
      "parse": {
        "ast_unittest": {
          "ASTTest": {
            "testNodeBase": {
              "expected": "PASS",
              "actual": "PASS",
              "artifacts": {
                "screenshot": ["screenshots/page.png"]
              }
            }
          }
        }
      }
    }
  },
  "interrupted": false,
  "path_delimiter": ".",
  "version": 3,
  "seconds_since_epoch": 1406662283.764424,
  "num_failures_by_type": {
    "FAIL": 0,
    "PASS": 1
  },
  "artifact_types": {
    "screenshot": "image/png"
  }
}
```
| |
As you can see, the format consists of a single top-level dictionary containing
a set of metadata fields describing the test run, plus a `tests` key that
contains the results of every test that was run, structured as a hierarchical
trie to reduce duplication of test suite names (as you can see from the deeply
nested Python test name).

The file is strictly JSON-compliant. In particular, the order in which keys
appear in each object is unimportant.
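
For example, a consumer can flatten the trie back into full test names by
joining the node names with the run's `path_delimiter` (described below). Here
is a minimal sketch in Python; the `iterate_tests` helper is illustrative, not
part of any existing tool:

```python
import json

def iterate_tests(trie, delimiter, prefix=None):
    """Recursively yield (full_test_name, result_dict) pairs from the trie.

    A node is treated as a leaf (an individual test's results) once it has
    the required 'actual' and 'expected' keys; any other node is an interior
    component of the test name (directory, module, class, etc.).
    """
    for name, node in trie.items():
        full_name = name if prefix is None else prefix + delimiter + name
        if 'actual' in node and 'expected' in node:
            yield full_name, node
        else:
            yield from iterate_tests(node, delimiter, full_name)

with open('results.json') as f:
    results = json.load(f)

# Default to '/' when path_delimiter is absent, for backwards-compatibility.
delimiter = results.get('path_delimiter', '/')
for test_name, result in iterate_tests(results['tests'], delimiter):
    print(test_name, result['actual'])
```

Running this against the example above prints
`mojom_tests.parse.ast_unittest.ASTTest.testNodeBase PASS`.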
| |
| ## Top-level field names |
| |
| | Field Name | Data Type | Description | |
| |------------|-----------|-------------| |
| `interrupted` | boolean | **Required.** Whether the test run was interrupted and terminated early (e.g., the runner bailing out or the user hitting ctrl-C). If true, this indicates that not all of the tests in the suite were run and the results are at best incomplete and possibly totally invalid. |
| | `num_failures_by_type` | dict | **Required.** A summary of the totals of each result type. If a test was run more than once, only the first invocation's result is included in the totals. Each key is one of the result types listed below. A missing result type is the same as being present and set to zero (0). | |
| `path_delimiter` | string | **Optional, will be mandatory.** The separator string to use between components of a test's name; normally "." for GTest- and Python-based tests and "/" for layout tests. If not present, you should default to "/" for backwards-compatibility. |
| | `seconds_since_epoch` | float | **Required.** The start time of the test run expressed as a floating-point offset in seconds from the UNIX epoch. | |
| | `tests` | dict | **Required.** The actual trie of test results. Each directory or module component in the test name is a node in the trie, and the leaf contains the dict of per-test fields as described below. | |
| | `version` | integer | **Required.** Version of the file format. Current version is 3. | |
| | `artifact_types` | dict | **Optional. Required if any artifacts are present for any tests.** MIME Type information for artifacts in this json file. All artifacts with the same name must share the same MIME type. | |
| `artifact_permanent_location` | string | **Optional.** The URI of the root location where the artifacts are stored. If present, any artifact locations are taken to be relative to this location. Currently only the `gs://` scheme is supported. |
| | `build_number` | string | **Optional.** If this test run was produced on a bot, this should be the build number of the run, e.g., "1234". | |
| | `builder_name` | string | **Optional.** If this test run was produced on a bot, this should be the builder name of the bot, e.g., "Linux Tests". | |
| | `chromium_revision` | string | **Optional.** The revision of the current Chromium checkout, if relevant, e.g. "356123". | |
| `has_pretty_patch` | bool | **Optional, layout test specific, deprecated.** Whether the layout tests' output contains PrettyPatch-formatted diffs for test failures. |
| | `has_wdiff` | bool | **Optional, layout test specific, deprecated.** Whether the layout tests' output contains wdiff-formatted diffs for test failures. | |
| | `layout_tests_dir` | string | **Optional, layout test specific.** Path to the LayoutTests directory for the test run (used so that we can link to the tests used in the run). | |
| `pixel_tests_enabled` | bool | **Optional, layout test specific.** Whether the layout tests were run with the `--pixel-tests` flag. |
| | `fixable` | integer | **Optional, deprecated.** The number of tests that were run but were expected to fail. | |
| | `num_flaky` | integer | **Optional, deprecated.** The number of tests that were run more than once and produced different results each time. | |
| `num_passes` | integer | **Optional, deprecated.** The number of successful tests; equivalent to `num_failures_by_type["PASS"]`. |
| | `num_regressions` | integer | **Optional, deprecated.** The number of tests that produced results that were unexpected failures. | |
| `skips` | integer | **Optional, deprecated.** The number of tests that were found but not run (tests should be listed in the trie with "expected" and "actual" values of `SKIP`). |
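
As an illustration of how these fields fit together, the sketch below (an
assumption about consumer logic, not code from any actual bot) checks the
version, warns on interrupted runs, and totals the failures from
`num_failures_by_type`:

```python
import json

def summarize_run(path):
    """Print a short summary of a test run from its top-level fields."""
    with open(path) as f:
        results = json.load(f)

    if results['version'] != 3:
        raise ValueError('unsupported format version: %r' % results['version'])
    if results['interrupted']:
        print('warning: run was interrupted; results are incomplete')

    # A missing result type is equivalent to zero, so .get() defaults are safe.
    counts = results['num_failures_by_type']
    num_bad = sum(count for result_type, count in counts.items()
                  if result_type not in ('PASS', 'SKIP'))
    print('%d passed, %d skipped, %d did not pass' %
          (counts.get('PASS', 0), counts.get('SKIP', 0), num_bad))

summarize_run('results.json')
```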
| |
| ## Per-test fields |
| |
Each leaf of the `tests` trie holds a dict describing the results of a
particular test. If a test is run multiple times, the dict contains the
results of each invocation in the `actual` field.
| |
| | Field Name | Data Type | Description | |
| |-------------|-----------|-------------| |
| | `actual` | string | **Required.** An ordered space-separated list of the results the test actually produced. `FAIL PASS` means that a test was run twice, failed the first time, and then passed when it was retried. If a test produces multiple different results, then it was actually flaky during the run. | |
| | `expected` | string | **Required.** An unordered space-separated list of the result types expected for the test, e.g. `FAIL PASS` means that a test is expected to either pass or fail. A test that contains multiple values is expected to be flaky. | |
| `artifacts` | dict | **Optional.** A dictionary describing test artifacts generated by the execution of the test. The dictionary maps the name of the artifact (`screenshot`, `crash_log`) to a list of relative locations of the artifact (`screenshot/page.png`, `logs/crash.txt`). Any '/' characters in the file paths are meant to be platform-agnostic; tools will replace them with the appropriate per-platform path separators. There is one entry in the list per test execution. If `artifact_permanent_location` is specified, then these locations are relative to that path. Otherwise, they are assumed to be relative to the location of this JSON file. |
| | `bugs` | string | **Optional.** A comma-separated list of URLs to bug database entries associated with each test. | |
| | `is_unexpected` | bool | **Optional.** If present and true, the failure was unexpected (a regression). If false (or if the key is not present at all), the failure was expected and will be ignored. | |
| | `time` | float | **Optional.** If present, the time it took in seconds to execute the first invocation of the test. | |
| | `times` | array of floats | **Optional.** If present, the times in seconds of each invocation of the test. | |
| | `has_repaint_overlay` | bool | **Optional, layout test specific.** If present and true, indicates that the test output contains the data needed to draw repaint overlays to help explain the results (only used in layout tests). | |
| | `is_missing_audio` | bool | **Optional, layout test specific.** If present and true, the test was supposed to have an audio baseline to compare against, and we didn't find one. | |
| | `is_missing_text` | bool | **Optional, layout test specific.** If present and true, the test was supposed to have a text baseline to compare against, and we didn't find one. | |
| `is_missing_image` | bool | **Optional, layout test specific.** If present and true, the test was supposed to have an image baseline to compare against, and we didn't find one. |
| `is_testharness_test` | bool | **Optional, layout test specific.** If present, indicates that the layout test was written using the W3C's testharness.js and we don't necessarily have any baselines to compare against. |
| | `reftest_type` | string | **Optional, layout test specific.** If present, one of `==` or `!=` to indicate that the test is a "reference test" and the results were expected to match the reference or not match the reference, respectively (only used in layout tests). | |
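
Combining `actual` and `expected`, a tool can classify each result: multiple
distinct values in `actual` indicate a flake, and a final result outside the
`expected` set indicates an unexpected result (compare the `is_unexpected`
field). A rough sketch of that logic follows; the classification labels and
the `classify` helper are illustrative, not from any existing tool:

```python
def classify(result):
    """Classify one per-test result dict as 'flaky', 'unexpected', or 'expected'."""
    actual = result['actual'].split()            # ordered, one entry per invocation
    expected = set(result['expected'].split())   # unordered set of allowed results

    if len(set(actual)) > 1:
        return 'flaky'        # e.g. 'FAIL PASS': failed, then passed on retry
    if actual[-1] not in expected:
        return 'unexpected'   # a regression
    return 'expected'

assert classify({'actual': 'FAIL PASS', 'expected': 'PASS'}) == 'flaky'
assert classify({'actual': 'FAIL FAIL', 'expected': 'PASS'}) == 'unexpected'
assert classify({'actual': 'PASS', 'expected': 'FAIL PASS'}) == 'expected'
```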
| |
| ## Test result types |
| |
| Any test may fail in one of several different ways. There are a few generic |
| types of failures, and the layout tests contain a few additional specialized |
| failure types. |
| |
| | Result type | Description | |
| |--------------|-------------| |
| | `SKIP` | The test was not run. | |
| | `PASS` | The test ran as expected. | |
| | `FAIL` | The test did not run as expected. | |
| | `CRASH` | The test runner crashed during the test. | |
| | `TIMEOUT` | The test hung (did not complete) and was aborted. | |
| `MISSING` | **Layout test specific.** The test completed but we could not find an expected baseline to compare against. |
| | `LEAK` | **Layout test specific.** Memory leaks were detected during the test execution. | |
| | `SLOW` | **Layout test specific.** The test is expected to take longer than normal to run. | |
| | `TEXT` | **Layout test specific, deprecated.** The test is expected to produce a text-only failure (the image, if present, will match). Normally you will see `FAIL` instead. | |
| | `AUDIO` | **Layout test specific, deprecated.** The test is expected to produce audio output that doesn't match the expected result. Normally you will see `FAIL` instead. | |
| `IMAGE` | **Layout test specific.** The test produces image (and possibly text) output. The image output doesn't match what we'd expect, but the text output, if present, does. |
| | `IMAGE+TEXT` | **Layout test specific, deprecated.** The test produces image and text output, both of which fail to match what we expect. Normally you will see `FAIL` instead. | |
| | `REBASELINE` | **Layout test specific.** The expected test result is out of date and will be ignored (any result other than a crash or timeout will be considered as passing). This test result should only ever show up on local test runs, not on bots (it is forbidden to check in a TestExpectations file with this expectation). This should never show up as an "actual" result. | |
| | `NEEDSREBASELINE` | **Layout test specific.** The expected test result is out of date and will be ignored (as above); the auto-rebaseline-bot will look for tests of this type and automatically update them. This should never show up as an "actual" result. | |
| `NEEDSMANUALREBASELINE` | **Layout test specific.** The expected test result is out of date and will be ignored (as above). This result may be checked in to the TestExpectations file, but the auto-rebaseline-bot will ignore these entries. This should never show up as an "actual" result. |
| |
| ## "full_results.json" and "failing_results.json" |
| |
| The layout tests produce two different variants of the above file. The |
| `full_results.json` file matches the above definition and contains every test |
| executed in the run. The `failing_results.json` file contains just the tests |
| that produced unexpected results, so it is a subset of the `full_results.json` |
data. The `failing_results.json` file is also in JSONP format, so it can be
loaded via a `<script>` tag from an HTML file on the local filesystem without
falling prey to the same-origin restrictions for local files. The JSONP
wrapping consists of the JSON data preceded by the string `ADD_RESULTS(` and
followed by the string `);`, so you can extract the JSON data by stripping
off that prefix and suffix.
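
For example, a script can recover the JSON payload by stripping that wrapper
before parsing (a minimal sketch; the `load_jsonp` helper name is
illustrative):

```python
import json

def load_jsonp(path, prefix='ADD_RESULTS(', suffix=');'):
    """Parse a failing_results.json file by stripping its JSONP wrapper."""
    with open(path) as f:
        text = f.read().strip()
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError('%s does not look like ADD_RESULTS(...); JSONP' % path)
    return json.loads(text[len(prefix):-len(suffix)])

failing = load_jsonp('failing_results.json')
print(sorted(failing['num_failures_by_type'].items()))
```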