This document assumes familiarity with the concepts introduced in how-to-write-metrics.
Metrics produce Histograms, which are serialized into an HTML file named results.html by default. Upon opening this HTML file in Chrome, after all of the Histograms are loaded, the histogram-set-table processes them and displays many names, numbers, and controls.
This complex UI can be overwhelming initially, but understanding how to use it can pay off in deep performance insights.
We'll skip the top two rows of controls for now because they modify what information is displayed in the table, which we need to understand first.
The first column always contains Names. Usually, these are Histogram names, which are the strings that metrics pass to Histogram constructors. However, as we'll see in “The Groupby-picker”, if you group by attributes of Histograms besides their name, like the “story name”, then this first Name column will contain the name of the grouping of Histograms, such as the story name.
The other columns contain Histogram values. One of the other attributes of Histograms is their displayLabel. If these Histograms were produced by Telemetry, then you can pass --results-label to Telemetry's run_benchmark command to set a label for all of the Histograms produced by that benchmark run. Otherwise, displayLabel will default to the benchmark name plus the time that the benchmark was run. There is a separate value column for each distinct displayLabel value, which is displayed in the header cell.
Histograms can be grouped into a tree configured by the groupby-picker, described below. Non-leaf node Histograms are merged from the Histograms that are grouped under them. Histograms that are grouped together are only mergeable if they share the same units and bin boundaries.
If a non-leaf node Histogram is merged from unmergeable Histograms, then the table will display that non-leaf node Histogram as the string “(unmergeable)”.
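The merge rule above can be sketched in a few lines of Python. This is only an illustration, not the code that results.html runs: the Hist class, its fields, and the merge_group helper are hypothetical stand-ins, and merging is reduced to concatenating samples.

```python
from dataclasses import dataclass, field

UNMERGEABLE = "(unmergeable)"


@dataclass
class Hist:
    """Hypothetical stand-in for a Histogram; only the fields needed here."""
    name: str
    unit: str
    bin_boundaries: tuple
    samples: list = field(default_factory=list)


def can_merge(a, b):
    # Mergeable only if both the units and the bin boundaries match.
    return a.unit == b.unit and a.bin_boundaries == b.bin_boundaries


def merge_group(name, hists):
    """Merge the Histograms grouped under one non-leaf node."""
    first = hists[0]
    if not all(can_merge(first, h) for h in hists[1:]):
        return UNMERGEABLE
    merged = Hist(name, first.unit, first.bin_boundaries)
    for h in hists:
        merged.samples.extend(h.samples)
    return merged


durations = [Hist("a_duration", "ms", (0, 10, 100), [3.0, 12.0]),
             Hist("b_duration", "ms", (0, 10, 100), [40.0])]
counts = [Hist("a_count", "count", (0, 10, 100), [2])]

merged = merge_group("all_durations", durations)
mixed = merge_group("everything", durations + counts)
print(merged.samples)  # [3.0, 12.0, 40.0]
print(mixed)           # (unmergeable)
```

The second call mixes units ("ms" and "count"), so the sketch returns the same placeholder string that the table displays.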
Click the up/down pointing triangles in the table header row in order to sort the table by that column. Sorting the Name column sorts alphabetically. Sorting Value columns sorts numerically by Histogram average.
Click a blue line chart icon in the Name column to display the overview line charts in that row. These line charts graph Histogram averages. The x-axes of these charts differ:
The x-axis of the overview line chart in the Name column corresponds to the value columns.
The x-axis of the overview line chart in the Value columns corresponds to the Histograms in the Grouping Tree that were merged to produce the Histogram in that table cell.
By default, table cells contain only averages of the Histograms in those cells.
Click the bar chart icon in table cells to display more information about the Histogram:
Metadata may contain
Metrics can add diagnostics either to samples within Histograms or to Histograms themselves.
Click and drag the mouse around the bar chart in order to display sample diagnostics for the selected Histogram bins.
Sample diagnostics may include Breakdown diagnostics, which indicate relative sizes of component metrics for the selected samples.
When the name of and entries in a Breakdown diagnostic align with a RelatedNameMap diagnostic on the parent Histogram, the table rows become clickable links to the related Histograms. For more about summation metrics, see Diagnostic Metrics.
The groupby-picker sits just above the table and allows you to control how Histograms are grouped into the Grouping Tree.
Grouping keys include
You can enable or disable grouping keys by clicking the checkboxes, and re-order them by clicking the left and right arrows.
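To see why the order of grouping keys matters, here is a minimal sketch of building a Grouping Tree from an ordered list of keys. The dictionaries and the "name" and "story" fields are hypothetical; the real table works on Histogram objects and also merges each group, which this sketch leaves out.

```python
def build_grouping_tree(histograms, grouping_keys):
    """Nest histograms by each enabled grouping key, in the given order."""
    if not grouping_keys:
        return list(histograms)  # leaf level: no more keys to group by
    key, rest = grouping_keys[0], grouping_keys[1:]
    groups = {}
    for hist in histograms:
        groups.setdefault(hist[key], []).append(hist)
    return {value: build_grouping_tree(members, rest)
            for value, members in groups.items()}


histograms = [
    {"name": "load_time", "story": "news", "avg": 120.0},
    {"name": "load_time", "story": "search", "avg": 80.0},
    {"name": "paint_time", "story": "news", "avg": 15.0},
]

by_name_then_story = build_grouping_tree(histograms, ["name", "story"])
by_story_then_name = build_grouping_tree(histograms, ["story", "name"])
print(list(by_name_then_story))  # ['load_time', 'paint_time']
print(list(by_story_then_name))  # ['news', 'search']
```

Re-ordering the keys changes which attribute labels the top-level rows, which mirrors what the left and right arrows do in the picker.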
Finally, the row of controls at the top of the screen contains several different small control elements. Let's start at the left.
Type a regex in the Search box to filter out of the displayed HistogramSet any Histograms whose names do not match the regex.
For example, the v8 metric produces Histograms whose names end with either “duration” or “count”. The “duration” Histograms cannot be merged with the “count” histograms. If you group by any other grouping key besides “name” first, then the table will try to merge some “count” Histograms with some “duration” Histograms. The two different kinds of Histograms have different units, so they cannot be merged, so the table will display “(unmergeable)”. In order to make the table ignore the “count” Histograms so that it can merge all of the “duration” Histograms together, you can filter out the “count” Histograms by entering “duration” in the Search box.
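The filtering in this example can be approximated with Python's re module. The Histogram names below are made up for illustration, and the sketch assumes the Search box behaves like an unanchored regex search, as the “duration” example suggests.

```python
import re


def filter_by_name(names, pattern):
    """Keep only the names that the regex matches somewhere."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# Hypothetical Histogram names in the style described above.
names = [
    "v8_gc_duration",
    "v8_gc_count",
    "v8_compile_duration",
    "v8_compile_count",
]

print(filter_by_name(names, "duration"))
# ['v8_gc_duration', 'v8_compile_duration']
print(filter_by_name(names, "gc_(duration|count)$"))
# ['v8_gc_duration', 'v8_gc_count']
```

With only the “duration” names left, every remaining Histogram has the same unit, so grouping by another key no longer forces a merge across units.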
Toggle the Overview Line Charts for all visible rows at once by clicking the blue line chart button to the right of the Search box.
The Reference Column Selector is displayed when there are multiple Value columns in order to allow you to select one. When one Value column is selected as a reference, then all of the other Value columns will update to display the delta between that column and the reference Value column.
For example, if there are two Value columns “A” and “B”, and you select “A” in the Reference Column Selector, then the values displayed in the “A” Value column will remain the same, but the values displayed in the “B” Value column will change to display the differences between the values in the “B” column and the corresponding values in the “A” column.
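The “A” and “B” example can be written out as a short sketch. The column contents are invented numbers, and apply_reference is a hypothetical helper that only handles averages.

```python
def apply_reference(columns, reference):
    """Return what each column would display once a reference is selected."""
    ref = columns[reference]
    shown = {}
    for label, values in columns.items():
        if label == reference:
            # The reference column keeps its own values.
            shown[label] = dict(values)
        else:
            # Other columns show their difference from the reference.
            shown[label] = {name: values[name] - ref[name]
                            for name in values if name in ref}
    return shown


columns = {
    "A": {"load_time": 100.0, "paint_time": 20.0},
    "B": {"load_time": 110.0, "paint_time": 18.0},
}

shown = apply_reference(columns, "A")
print(shown["A"])  # {'load_time': 100.0, 'paint_time': 20.0}
print(shown["B"])  # {'load_time': 10.0, 'paint_time': -2.0}
```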
When you open a Histogram in a non-reference Value column, in addition to the statistics produced by the Histogram such as “avg”, “stddev”, “min”, “max”, etc., the Histogram statistic table will also display statistics about the delta between that Histogram and the corresponding reference Histogram. Delta statistics include
In order to decide whether a regression is significant enough that you should avoid committing a change, you need to know how to interpret the delta statistics. To begin, there are actually two different kinds of “significance”, and you need to evaluate both independently.
Basically, low p-value = more confidence that two Histograms are statistically significantly different.
For more detail, the p-value is the probability, under the null hypothesis, of obtaining results at least as extreme as your actual results. The null hypothesis in this case states that
If the p-value is low enough, this is sufficient statistical evidence for us to reject the null hypothesis. If the p-value is high, then we fail to reject the null hypothesis since there is not enough statistical evidence to provide confidence that the samples are from different populations. In context, rejecting the null hypothesis means concluding that the two Histograms were taken from different populations: a pre-regression state and a post-regression state.
“Low” is subjective. Results.html defaults to “less than α=0.01”, but you could argue that α should be as high as 0.1 or as low as 0.001 for your metric.
In order to make it easy for you to quickly visually find statistically significantly different results, an emoji is displayed in cells in non-reference columns when a reference column is selected.
For more detail on how p-values are computed, see the Mann-Whitney U test.
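For intuition, here is a minimal sketch of a Mann-Whitney U test using only the Python standard library. It is a textbook version that assumes the normal approximation with no tie or continuity correction, so its p-values may differ from the ones results.html reports, particularly for small samples. The sample values are invented.

```python
import math


def mann_whitney_u(xs, ys):
    """Return (U, two-sided p-value) using the normal approximation."""
    n1, n2 = len(xs), len(ys)
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])

    # Assign midranks so that tied values share the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    # Rank sum of the first sample, then the smaller of the two U values.
    r1 = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    mean = n1 * n2 / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


before = [10.1, 10.3, 9.9, 10.0, 10.2, 10.4, 9.8, 10.5]
after = [11.0, 11.2, 10.9, 11.1, 11.3, 10.8, 11.4, 11.5]

u, p = mann_whitney_u(before, after)
print(u)         # 0.0, because every "after" sample exceeds every "before" sample
print(p < 0.01)  # True
```

Because the test works on ranks rather than raw values, it does not assume that the samples are normally distributed, which suits skewed timing data.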
It is possible for a regression to be statistically significant, but small enough that you shouldn't worry about it. When sample distributions have very low deviation, then even tiny changes can be detected statistically.
Results.html cannot yet automatically determine whether the magnitude of a regression or improvement is significant.
There are currently a few options for manual ways to decide whether the magnitude of a regression is significant:
This slider allows you to adjust the threshold of statistical significance when comparing results in one column against the reference column. Raising the threshold causes more changes to be considered significant, so more cells display a green grinning or red frowning emoji. Lowering the threshold causes fewer changes to be considered significant, so more cells display a neutral emoji.
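The effect of the threshold can be illustrated with a small sketch. The classify function, its labels, and the smaller_is_better flag are assumptions made for this example; the real table displays emoji and determines what counts as an improvement on its own.

```python
def classify(p_value, delta_avg, alpha=0.01, smaller_is_better=True):
    """Label a non-reference cell given its p-value and change in average."""
    # Not significant at this threshold (or no change at all): neutral.
    if p_value >= alpha or delta_avg == 0:
        return "neutral"
    improved = delta_avg < 0 if smaller_is_better else delta_avg > 0
    return "improvement" if improved else "regression"


# The same comparison under two thresholds.
print(classify(0.03, 5.0, alpha=0.01))  # neutral
print(classify(0.03, 5.0, alpha=0.05))  # regression
```

Raising α from 0.01 to 0.05 turns the same p-value of 0.03 from a neutral result into a significant one.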
This selector controls which statistics are displayed in the table. You can select from any of the statistics that any Histogram was configured to produce by the metric that produced it.
When a reference column is selected, you can also select from any of several delta statistics: absΔavg, %Δavg, z-score, absΔstd, %Δstd, p-value, U. These delta statistics will only be displayed in non-reference columns; avg or std will be displayed in the reference column.
Among these statistics, percentiles, interpercentile ranges and confidence intervals are listed as pct_XXX, ipr_XXX_YYY and CI_XXX_[lower/upper]. The value of XXX ranges from 000 to 100, which covers the range 0.00 to 1.00.
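As a reading aid, the naming scheme can be decoded with a few regular expressions. This sketch follows the description above, dividing the three digits by 100; parse_statistic_name and its return values are hypothetical and not part of results.html.

```python
import re

_PCT = re.compile(r"^pct_(\d{3})$")
_IPR = re.compile(r"^ipr_(\d{3})_(\d{3})$")
_CI = re.compile(r"^CI_(\d{3})_(lower|upper)$")


def _fraction(digits):
    """Convert a three-digit field such as '090' into a fraction such as 0.9."""
    value = int(digits) / 100
    if not 0.0 <= value <= 1.0:
        raise ValueError("expected 000 to 100, got " + digits)
    return value


def parse_statistic_name(name):
    """Decode a statistic name into a (kind, ...) tuple."""
    match = _PCT.match(name)
    if match:
        return ("percentile", _fraction(match.group(1)))
    match = _IPR.match(name)
    if match:
        return ("interpercentile_range",
                _fraction(match.group(1)), _fraction(match.group(2)))
    match = _CI.match(name)
    if match:
        return ("confidence_interval",
                _fraction(match.group(1)), match.group(2))
    return ("other", name)


print(parse_statistic_name("pct_090"))       # ('percentile', 0.9)
print(parse_statistic_name("ipr_025_075"))   # ('interpercentile_range', 0.25, 0.75)
print(parse_statistic_name("CI_095_lower"))  # ('confidence_interval', 0.95, 'lower')
print(parse_statistic_name("avg"))           # ('other', 'avg')
```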
Clicking this button downloads a CSV file that can be imported into a spreadsheet app such as Google Sheets.
After the header row, there will be one row for each of the leaf Histograms in the Grouping Tree. For example, if you group only by Histogram name, then there will be as many value rows as there are distinct Histogram names in the displayed HistogramSet.
The columns will contain
By default, this checkbox is unchecked so that the table only displays Histograms that are source nodes in the metric graphical model. Check this checkbox in order to show all Histograms, including those that are not source nodes in the metric graphical model.