Flake Portal

Flake portal is the entry point to Chromium flakes and related information.

  • Use the Flakes page to view all on-going flakes ranked by their negative impacts, search for a flaky test, or filter flakes by binary, builder, component, etc.
  • Use the Report page to assess overall flakiness states for the code area represented by a crbug component.
  • Use the Analysis page to look for flake culprits and verify flake fixes.

Flake Portal UI: https://analysis.chromium.org/p/chromium/flake-portal

Flakes

The Flakes page ranks ongoing flakes from CQ and CI in the last 7 days by their negative impact. Open bugs (filed by Findit, by developers or sheriffs, or by other automatic tools) are linked to each flake.

Term Definitions

Flaky tests. Any test that failed nondeterministically is a flaky test. For types of flaky tests, see below.

Flake types. Within Flakes, flake types are defined by the negative impact of a flaky run.

  • CQ false rejection, a test failure that causes a retried build or even causes a CL to be incorrectly rejected by CQ.
  • CQ step level retry, a test failure that causes additional ‘retry with patch’ and/or ‘retry shards with patch’ steps.
  • CQ hidden flake, a passed test that was retried 2+ times by the test runner. The first run and the first retry failed, while a later retry passed. Requiring two failures filters out noise caused by CPU, GPU, or other resource starvation due to parallel test execution by the test runner.
  • CI failed step flake, a test failure that causes a test step failure on a CI waterfall build.
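
The CQ hidden flake rule can be sketched as a small predicate. This is only an illustration of the definition above, not Findit's actual detection query; the function name and the list-of-booleans input are assumptions made for the example.

```python
from typing import Sequence


def is_cq_hidden_flake(attempt_passed: Sequence[bool]) -> bool:
    """Return True if a test's attempts fit the CQ hidden flake definition.

    attempt_passed[0] is the first run, attempt_passed[1] the first retry,
    and so on (hypothetical input format chosen for this sketch).
    """
    # "Retried 2+ times" means at least three attempts in total.
    if len(attempt_passed) < 3:
        return False
    # The first run and the first retry must both have failed ...
    first_two_failed = not attempt_passed[0] and not attempt_passed[1]
    # ... and some later retry must have passed.
    later_retry_passed = any(attempt_passed[2:])
    return first_two_failed and later_retry_passed
```

A test that fails once and passes on its first retry is not counted, which is how single failures from resource starvation are filtered out.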

Group Similar Flaky Tests on UI

The Flakes page groups similar flaky tests to avoid duplication, using the following criteria:

  • gtests with different parameters.

Example of gtests with different parameters

  • webkit_layout_tests with different queries.

Example of webkit layout tests with different queries

Rank Flaky Tests

Detected flakes are ranked by a unified score that reflects their negative impact.

The score for CQ flakes is calculated from the number of impacted CLs and the weight of each flake type. The score for CI flakes is calculated from the number of occurrences and the weight of each flake type, since the concept of a CL does not apply to CI builds. The weights are heuristically chosen numbers that should be proportional to the negative impact of each type.

Score = Sum(CQ flake type weight * impacted CLs) + Sum(CI flake type weight * occurrences)
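
As a rough illustration of the formula, the sketch below computes a score from per-type counts. The weight values and type keys are hypothetical placeholders, since this page does not list the real weights; only the shape of the calculation follows the formula above.

```python
from typing import Mapping

# Hypothetical weights for illustration only; the real values are
# heuristically chosen by Findit and are not given on this page.
CQ_WEIGHTS = {
    "cq_false_rejection": 100,
    "cq_step_level_retry": 10,
    "cq_hidden_flake": 1,
}
CI_WEIGHTS = {"ci_failed_step": 10}


def flake_score(
    impacted_cls: Mapping[str, int],
    ci_occurrences: Mapping[str, int],
    cq_weights: Mapping[str, int] = CQ_WEIGHTS,
    ci_weights: Mapping[str, int] = CI_WEIGHTS,
) -> int:
    """Sum(CQ weight * impacted CLs) + Sum(CI weight * occurrences)."""
    cq_part = sum(cq_weights[t] * n for t, n in impacted_cls.items())
    ci_part = sum(ci_weights[t] * n for t, n in ci_occurrences.items())
    return cq_part + ci_part


# 2 falsely rejected CLs, 5 CLs with hidden flakes, 3 CI occurrences:
# 2 * 100 + 5 * 1 + 3 * 10 = 235 under the placeholder weights.
print(flake_score({"cq_false_rejection": 2, "cq_hidden_flake": 5},
                  {"ci_failed_step": 3}))
```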

Flake Score Example

Search Flaky Tests

  • Search by test name for a specific flaky test. This search has no time limit.

Search Flake Example

  • Tag-based filtering for flaky tests in the past 7 days is also supported.
    • You may search flakes with an arbitrary combination of the supported tags listed below, but at least one “==” filter must be included. The search results only include flakes that:
      • match ALL “==” filters
      • do NOT match ANY “!=” filters

Tag | Example
--- | ---
binary | binary==content_browsertests matches tests in steps with content_browsertests as isolate target.
builder | builder==win7_chromium_rel_ng matches tests that occurred in the builder win7_chromium_rel_ng.
component | component==Blink>Accessibility matches tests whose directory's OWNERS file has Blink>Accessibility as COMPONENT. Tests in sub-components are not included.
directory | directory==base/ matches tests whose test files are in base/ directory.
master | master==tryserver.chromium.android matches tests that occurred in the master tryserver.chromium.android.
parent_component | parent_component==Blink>Accessibility matches tests whose component is Blink>Accessibility or a sub-component of Blink>Accessibility.
source | source==base/hash_unittest.cc matches tests defined in the source file base/hash_unittest.cc.
step | step==content_browsertests (with patch)
suite | In Findit, suite is the smallest group of tests that are defined in the same file or directory, with some special cases. suite==GCMConnectionHandlerImplTest matches gtest GCMConnectionHandlerImplTest.* suite==FullscreenVideoTest matches Java tests .FullscreenVideoTest#test. suite==third_party/blink/web_tests/fast/events matches Blink layout tests fast/events/* suite==webgl_conformance_tests matches Telemetry-based gpu tests gpu_tests.webgl_conformance_integration_test.*
test_type | test_type==content_browsertests matches tests in steps with content_browsertests as their step name if removing suffixes like “(with patch)”.
watchlist | watchlist==accessibility matches tests whose source files match the accessibility watchlist in src/WATCHLISTS.

Filter Flakes Example
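
The filter semantics (match ALL “==” filters, match NO “!=” filter, at least one “==” filter required) can be sketched as follows. This is an assumed model, not the portal's implementation: filters are passed as already-parsed (tag, operator, value) tuples because the query syntax is not specified here, and each flake is modeled as a mapping from tag name to a set of values.

```python
from typing import Iterable, Mapping, Set, Tuple

# (tag, operator, value); operator is "==" or "!=". Hypothetical format.
Filter = Tuple[str, str, str]


def matches(flake_tags: Mapping[str, Set[str]],
            filters: Iterable[Filter]) -> bool:
    """Return True if the flake satisfies every filter."""
    filters = list(filters)
    # At least one "==" filter must be present in the query.
    if not any(op == "==" for _, op, _ in filters):
        raise ValueError('at least one "==" filter is required')
    for tag, op, value in filters:
        has_value = value in flake_tags.get(tag, set())
        if op == "==":
            if not has_value:  # must match ALL "==" filters
                return False
        elif op == "!=":
            if has_value:  # must NOT match ANY "!=" filter
                return False
        else:
            raise ValueError(f"unsupported operator: {op}")
    return True
```

For example, a flake tagged with binary content_browsertests and builder win7_chromium_rel_ng matches the single filter binary==content_browsertests, but is excluded once builder!=win7_chromium_rel_ng is added.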

Workflow

For CQ flakes

  • Detection leverages the existing BigQuery tables for CQ (cq_raw), completed builds (completed_builds), and test results (test_results).
  • Cron jobs that execute SQL queries run once every 30 minutes to detect flaky tests and store the results.
    • The query for CQ hidden flakes is executed every 2 hours.

For CI flakes

When a test step fails on a CI waterfall build, Findit runs a deflake swarming task to distinguish consistent test failures from flakes. The identified flakes then appear on the Flakes page.

After flakes are detected

Occurrences of the same test are aggregated and then handled in the following ways:

  • Manage pre-analysis bugs:
    • Automatically file a bug. Or
    • Automatically comment on an existing bug.
  • Show on the Flakes page.
  • Automatically trigger culprit analysis in Analysis.

Bug Filing Criteria

To avoid noise, a flake bug is filed on Monorail only if all the following requirements are met:

  • At least 3 unreported CQ false rejection or CQ step level retry occurrences impacting different CLs within the past 24 hours.
  • Any bug can be created or updated at most once within any 24-hour window.
  • At most 30 bugs can be created or updated automatically within any 24-hour window.
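
A minimal sketch of these three checks is shown below. The Occurrence record, the type strings, and the function signature are hypothetical names introduced for illustration; only the thresholds (3 distinct CLs, 24 hours, 30 bugs) come from the criteria above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

WINDOW = timedelta(hours=24)
MIN_DISTINCT_CLS = 3
MAX_BUG_UPDATES_PER_WINDOW = 30
REPORTABLE_TYPES = {"cq_false_rejection", "cq_step_level_retry"}


@dataclass(frozen=True)
class Occurrence:  # hypothetical record type for this sketch
    flake_type: str
    cl_id: int
    time: datetime
    reported: bool


def should_file_or_update_bug(
    occurrences: Iterable[Occurrence],
    last_bug_update: Optional[datetime],
    bug_updates_in_window: int,
    now: datetime,
) -> bool:
    cutoff = now - WINDOW
    # Criterion 1: enough unreported, reportable occurrences on distinct CLs.
    distinct_cls = {
        o.cl_id
        for o in occurrences
        if o.flake_type in REPORTABLE_TYPES
        and not o.reported
        and o.time >= cutoff
    }
    if len(distinct_cls) < MIN_DISTINCT_CLS:
        return False
    # Criterion 2: this bug was not already touched in the last 24 hours.
    if last_bug_update is not None and last_bug_update > cutoff:
        return False
    # Criterion 3: stay under the global cap of automatic bug updates.
    return bug_updates_in_window < MAX_BUG_UPDATES_PER_WINDOW
```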

Additionally, flaky tests can be grouped so that each group shares a single bug. Flaky tests are grouped together if:

  • They have the same test-type,
  • They have failed in exactly the same builds in the past 24 hours.
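
The grouping rule amounts to bucketing tests by the pair (test type, set of failed builds). The sketch below assumes a hypothetical input mapping from test name to its test type and the build IDs it failed in during the past 24 hours; it is not Findit's actual data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def group_flaky_tests(
    tests: Mapping[str, Tuple[str, Iterable[int]]],
) -> List[List[str]]:
    """Group tests that share a test type and the exact same failed builds.

    tests maps a test name to (test_type, failed_build_ids).
    """
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for name, (test_type, build_ids) in tests.items():
        # frozenset makes the key independent of build order and duplicates.
        groups[(test_type, frozenset(build_ids))].append(name)
    return sorted(sorted(names) for names in groups.values())
```

Tests that share only some of their failed builds stay in separate groups, because the rule requires exactly the same set of builds.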

Report

The Report page shows the weekly flakiness states of the Chrome Browser project, with a breakdown by crbug component.

The report is generated every Monday at 12:00 AM PST and covers data from the previous week.

It reports the following states:

  • Flaky Tests. Count of tests in a component with flake occurrences in the last week.
  • Flake Bugs. Count of flake bugs linked to the flaky tests.
  • New Bugs. Count of flake bugs linked to the flaky tests that were newly created in the last week.
  • False Rejects. Count of CLs with one or more CQ build retries on a patchset or an equivalent patchset because of any of the flaky tests in the last week. This should be a subset of impacted CLs.
  • Impacted CLs. Count of CLs that have been impacted by any of the flaky tests in the last week.
  • Flake Occurrences. Count of occurrences of all flaky tests in the last week.

The Project view shows aggregated flakiness data and the top components by number of flakes or by negative impact. To look for a particular component, use the search field at the top of the page.

Project Report

Component view shows

  • flake trend over time