Contacting troopers

This page can be found at:

Have an issue with a piece of build infrastructure? Our troopers are here to help.

Oncall hours: we have three oncall sites, each providing coverage during its own office hours:

  • APAC covers 0100 - 0700 UTC
  • EMEA covers 0900 - 1700 UTC
  • MTV covers 1800 - 0100 UTC (1000 - 1800 MTV)

APAC and EMEA sites primarily respond to P0s for critical infrastructure and are pager-driven. If you have created a P0 issue and don't see a response from the EMEA/APAC trooper within the first 30 minutes, please ping them to make sure they are aware of the issue.
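The coverage windows listed above can be sketched as a small lookup. This is only an illustration: the function and constant names are hypothetical, the windows are copied from the list, and weekends and holidays are not modeled. Note that the MTV window wraps past midnight UTC, and that some hours (for example 0700 - 0900 UTC) are not covered by any site.

```python
from datetime import datetime, time, timezone

# Coverage windows in UTC, copied from the list above: (site, start, end).
# A window whose end is earlier than its start wraps past midnight.
COVERAGE_UTC = [
    ("APAC", time(1, 0), time(7, 0)),
    ("EMEA", time(9, 0), time(17, 0)),
    ("MTV", time(18, 0), time(1, 0)),
]


def site_on_call(now_utc):
    """Return the site covering `now_utc` (a timezone-aware datetime),
    or None if the time falls in a gap between windows."""
    t = now_utc.astimezone(timezone.utc).time()
    for site, start, end in COVERAGE_UTC:
        if start <= end:
            covered = start <= t < end
        else:  # window wraps past midnight UTC
            covered = t >= start or t < end
        if covered:
            return site
    return None
```

For example, `site_on_call(datetime.now(timezone.utc))` returns the site expected to be covering right now, or `None` during a gap, in which case support is best-effort.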

The primary way to contact a trooper is by filing a bug using the templates and priorities established below. If you need to find the current trooper, check, or vi/chrome_infra (internal link).

If you know your issue is with the physical hardware, or otherwise should be handled by the Systems team, please follow their Rules of Engagement.

Bug Templates

For fastest response, please use the provided templates:

Also make sure to include the machine name (e.g. build11-m1) as well as the waterfall name (e.g. Builder: Win).

Priority Levels

Priorities are set using the Pri-N label. Use the following as your guideline:

  • Pri-0: Immediate attention desired. The trooper will stop everything they are doing and investigate.
    • Examples: CQ no longer committing changes, master offline.
  • Pri-1: Resolution desired within an hour or two.
    • Examples: disk full on device, device offline, sheriff-o-matic data stale.
  • Pri-2: Should be handled today.
    • Examples: Master restart requests, tryserver restart requests.
  • Pri-3: Non-urgent. If the trooper cannot get to this today due to other incidents, it is ok to wait.
    • Examples: Large change that will need trooper assistance, e.g. “I'd like to land this gigantic change that may break the world.”

Life of a Request

Status is tracked using the Status field, with the ‘owner’ field unset. The trooper queue relies on the ‘owner’ field being unset to track issues properly; troopers set the ‘owner’ field themselves for particularly long-running issues. Please do not assign issues to the trooper directly, as doing so may actually increase the time taken to respond to an issue.

  • Untriaged: Your issue will show up in the queue to the trooper as untriaged. Once they acknowledge the bug, the status will change.
  • Available: The trooper has acknowledged the issue. Unless it is Pri-0, this means they have not started working on it yet.
  • Assigned:
    • The trooper has triaged the issue, determined that there is a suitable owner, and assigned it to them.
    • If that owner is YOU, this indicates that the trooper needs more information from you in order to proceed. Please provide the information, and then unset ‘owner’ so the issue shows up in the queue again.
  • Started: Your issue is being handled, either by the trooper or by another owner.
  • Fixed: The trooper believes the issue is resolved and no further action is required on their part.
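The interplay between the Status and ‘owner’ fields described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual queue query: the `Issue` class and all function names are hypothetical, and the rule "open and no owner means it is in the trooper queue" is inferred from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Statuses named in the list above. Treating "Fixed" as closed is an assumption.
OPEN_STATUSES = {"Untriaged", "Available", "Assigned", "Started"}


@dataclass
class Issue:
    """Hypothetical, minimal stand-in for a tracker issue."""
    status: str
    owner: Optional[str] = None


def in_trooper_queue(issue: Issue) -> bool:
    """Assumed queue rule: the issue is open and has no owner."""
    return issue.status in OPEN_STATUSES and issue.owner is None


def waiting_on_reporter(issue: Issue, reporter: str) -> bool:
    """True when the issue was assigned back to the person who filed it."""
    return issue.status == "Assigned" and issue.owner == reporter


def hand_back_to_trooper(issue: Issue) -> Issue:
    """After supplying the requested information, unset the owner."""
    return Issue(status=issue.status, owner=None)
```

Under this sketch, an issue assigned back to its reporter drops out of the queue until the reporter clears ‘owner’, which is why unsetting the field after replying matters.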

Master Restarts

Trooper-assisted Restart

Please file a bug using the Master restart requests bug template. This is the preferred method if you are not a Googler, or not a committer in the infradata/master-manager repo. It is also the preferred method for large masters like chromium.* and tryserver.chromium.*, to avoid duplicate restart requests and unintended downtime during peak hours.

Self-service (Googlers only)

Master restarts are handled by master manager and only require running a single command that mails a CL to schedule the restart.

With depot_tools in your path, run:

# Get an auth token for your account if you don't already have one.
depot-tools-auth login

# Restart master in 15 minutes.
cit restart -r <current trooper> [-b <bug number>]

Note: if you're not in the committers list, the CQ will try it first and you'll have to ping the trooper to get an lgtm. The master will be restarted at the requested time, or once the CL lands, whichever comes later.
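The "whichever comes later" rule amounts to taking the later of two timestamps. A minimal sketch, with a hypothetical function name and made-up example times (this is not part of master manager itself):

```python
from datetime import datetime, timedelta, timezone


def effective_restart_time(requested: datetime, cl_landed: datetime) -> datetime:
    """The restart happens at the later of the requested time and the
    time the scheduling CL lands."""
    return max(requested, cl_landed)


# Example: restart requested 15 minutes after the CL was mailed.
mailed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
requested = mailed + timedelta(minutes=15)

# CL lands quickly: the restart waits for the requested time.
on_schedule = effective_restart_time(requested, mailed + timedelta(minutes=5))

# CL is held up waiting for an lgtm: the restart waits for the CL.
delayed = effective_restart_time(requested, mailed + timedelta(minutes=40))
```

In practice this means a restart requested by a non-committer can slip past its requested time if the lgtm is slow to arrive.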

If you're having trouble you can file a bug with the trooper using the Master restart requests bug template.

Service Hours

Troopers provide full-time coverage, with the expected response times outlined above, during the PST work day. Support during EMEA work hours is limited to P0 only. At other times, support is provided on a best-effort basis.

More Information

View the current trooper queue.

Common Non-Trooper Requests: