blob: 16b205d7a76cb163ee2ccff52d749d1bca433a30 [file] [log] [blame]
// Copyright 2019 The LUCI Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Package luciexe documents the "LUCI Executable" protocol, and contains
// constants which are part of this protocol.
//
// # Summary
//
// A LUCI Executable ("luciexe") is a binary which implements a protocol to:
//
// - Pass the initial state of the 'build' from a parent process to the luciexe.
// - Understand the build's local system contracts (like the location of cached data).
// - Asynchronously update the state of the build as it runs.
//
// This protocol is recursive; A luciexe can run another luciexe such that the
// child's updates are reflected on the parent's output.
//
// The protocol has 3 parts:
//
// - Host Application - This sits at the top of the luciexe process
// hierarchy and sets up singleton environmental requirements for the whole
// tree.
// - Invocation of a luciexe binary - This invocation process occurs both
// for the topmost luciexe, as well as all internal invocations of other
// luciexe's within the process hierarchy.
// - The luciexe binary - The binary has a couple of responsibilities to be
// compatible with this protocol. Once the binary has fulfilled it's
// responsibilities it's free to do what it wants (i.e. actually do its
// task).
//
// In general, we strive where possible to minimize the complexity of the
// luciexe binary. This is because we expect to have a small number of 'host
// application' implementations, and a relatively large number of 'luciexe'
// implementations.
//
// # The Host Application
//
// At the root of every tree of luciexe invocations there is a 'host'
// application which sets up an manages all environmental singletons (like the
// Logdog 'butler' service, the LUCI ambient authentication service, etc.). This
// Host Application is responsible for intercepting and merging all
// 'build.proto' streams emitted within this tree of luciexes (see "Recursive
// Invocation"). The Host Application may choose what happens to these
// intercepted build.proto messages.
//
// The Host Application MUST:
// - Run a logdog butler service and expose all relevant LOGDOG_* environment
// variables such that the following client libraries can stream log data:
// - Golang: go.chromium.org/luci/logdog/client/butlerlib/bootstrap
// - Python: infra_libs.logdog.bootstrap
// - Hook the butler to intercept and merge build.proto streams into a single
// build.proto (zlib-compressed) stream.
// - Set up a local LUCI ambient authentication service which luciexe's can
// use to mint auth tokens.
// - Prepare an empty directory which will house tempdirs and workdirs for
// all luciexe invocations. The Host Application MAY clean this directory
// up, but it may be useful to leak it for debugging. It's permissible for
// the Host Application to defer this cleanup to an external process (e.g.
// buildbucket's agent may defer this to swarming).
//
// The Host Application MAY hook additional streams for debugging/logging; it is
// frequently convenient to hook the stderr/stdout streams from the top level
// luciexe and tee them to the Host Application's stdout/stderr.
//
// For example: the `go.chromium.org/luci/buildbucket/cmd/agent` binary forwards
// these merged build.proto messages to the Buildbucket service, and also
// uploads all streams to them to the Logdog cloud service. Other host
// implementations may instead choose to write all streams to disk, send them to
// /dev/null or render them as html. However, from the point of view of the
// luciexe that they run, this is transparent.
//
// Host Applications MAY implement 'backpressure' on the luciexe binaries by
// throttling the rate at which the Logdog butler accepts data on its various
// streams. However, doing this could introduce timing issues in the luciexe
// binaries as they try to run, so this should be done thoughtfully.
//
// If a Host Application detects a protocol violation from a luciexe within its
// purview, it SHOULD report the violation (in a manner of its choosing) and
// MUST consider the entire Build status to be INFRA_FAILURE. In addition the
// Host Application SHOULD attempt to kill (via process group SIGTERM/SIGKILL on
// *nix, and CTRL+BREAK/Terminate on windows) the luciexe hierarchy. The host
// application MAY provide a window of time between the initial "please stop"
// signal and the "you die now" signal, but this comes with the usual caveats of
// cleanups and deadlines (notably: best-effort clean up is just that:
// best-effort. It cannot be relied on to run to completion (or to run
// completely)).
//
// # Invocation
//
// When invoking a luciexe, the parent process has a couple responsibilities. It
// must:
//
// - Set $TEMPDIR, $TMPDIR, $TEMP, $TMP and $MAC_CHROMIUM_TMPDIR to all point
// to the same, empty directory.
// This directory MUST be located on the same file system as CWD.
// This directory MUST NOT be the same as CWD.
//
// - Set $LUCI_CONTEXT["luciexe"]["cache_dir"] to a cache dir which makes sense
// for the luciexe.
// The cache dir MAY persist/be shared between luciexe invocations.
// The cache dir MAY NOT be on the same filesystem as CWD.
//
// - Set the $LOGDOG_NAMESPACE to a prefix which namespaces all logdog streams
// generated from the luciexe.
//
// - Set the Status of initial buildbucket.v2.Build message to `STARTED`.
//
// - Set the CreateTime and StartTime of initial buildbucket.v2.Build message.
//
// - Clear following fields in the initial buildbucket.v2.Build message.
//
// - EndTime
//
// - Output
//
// - StatusDetails
//
// - Steps
//
// - SummaryMarkdown
//
// - UpdateTime
//
// The CWD is up to your application. Some contexts (like Buildbucket) will
// guarantee an empty CWD, but others (like recursive invocation) may explicitly
// share CWD between multiple luciexe's.
//
// The tempdir and workdir paths SHOULD NOT be cleaned up by the invoking
// process. Instead, the invoking process should defer to the Host Application
// to provide this cleanup, since the Host Application may be configured to leak
// these for debugging purposes.
//
// The invoker MUST attach the stdout/stderr to the logdog butler as text
// streams. These MUST be located at `$LOGDOG_NAMESPACE/std{out,err}`.
// Typical luciexe implementations will use these for debug logging and output,
// but are not required to do so.
//
// The invoker MUST write a binary-encoded buildbucket.v2.Build to the stdin of
// the luciexe which contains all the input parameters that the luciexe needs to
// know to run successfully.
//
// # The luciexe binary
//
// Once running, the luciexe MUST read a binary-encoded buildbucket.v2.Build
// message from stdin until EOF.
//
// As per the invoker's responsibility, the luciexe binary MAY expect the
// status of the initial Build message to be `STARTED` and CreateTime and
// StartTime are populated. It MAY also expect following fields to be empty.
// It MUST NOT assume other fields in the Build message are set. However, the
// Host Application or invoker MAY fill in other fields they think are useful.
//
// EndTime
// Output
// StatusDetails
// Steps
// SummaryMarkdown
// Tags
// UpdateTime
//
// As per the Host Application's responsibilities, the luciexe binary MAY expect
// the "luciexe" and "local_auth" sections of LUCI_CONTEXT to be filled. Other
// sections of LUCI_CONTEXT MAY also be filled. See the LUCI_CONTEXT docs:
// https://chromium.googlesource.com/infra/luci/luci-py/+/HEAD/client/LUCI_CONTEXT.md
//
// !!NOTE!! The paths supplied to the luciexe MUST NOT be considered stable
// across invocations. Do not hard-code these, and try not to rely on their
// consistency (e.g. for build reproducibility).
//
// # The luciexe binary - Updating the Build state
//
// A luciexe MAY update the Build state by writing to a "build.proto" Logdog
// stream named "$LOGDOG_NAMESPACE/build.proto". A "build.proto" Logdog stream
// is defined as:
//
// Content-Type: "application/luci+proto; message=buildbucket.v2.Build"
// Type: Datagram
//
// Additionally, a build.proto stream MAY append "; encoding=zlib" to the
// Content-Type (and compress each message accordingly). This is useful for when
// you potentially have very large builds.
//
// Each datagram MUST be a valid binary-encoded buildbucket.v2.Build message.
// The state of the build is defined as the last Build message sent on this
// stream. There's no implicit accumulation between sent Build messages.
//
// The Step.Log.Url field in the emitted Build messages can be either absolute
// (has a valid url scheme) or relative. If an absolute `Url` is provided, the
// luciexe is also responsible for supplying corresponding `ViewUrl` and the
// host application won't validate or make any adjustments to those urls. This
// allows luciexe to include logs from other sources into this build. If a
// relative `Url` is provided, the host application will namespace the `Url`
// with $LOGDOG_NAMESPACE of the build.proto stream. For example, if the host
// application is parsing a Build.proto in a stream named
// "logdog://host/project/prefix/+/something/build.proto", then a Log with a Url
// of "hello/world/stdout" will be transformed into:
//
// Url: logdog://host/project/prefix/+/something/hello/world/stdout
// ViewUrl: <implementation defined>
//
// The `ViewUrl` field in this case SHOULD be left empty, and will be filled in
// by the host application running the luciexe (if supplied it will be
// overwritten).
//
// The following Build fields will be read from the luciexe-controlled
// build.proto stream:
//
// EndTime
// Output
// Status
// StatusDetails
// Steps
// SummaryMarkdown
// Tags
// UpdateTime
//
// # The luciexe binary - Reporting final status
//
// A luciexe MUST report its success/failure by sending a Build message with
// a terminal `status` value before exiting. If the luciexe exits before sending
// a Build message with a terminal Status, the invoking application MUST
// interpret this as an INFRA_FAILURE status. The exit code of the luciexe
// SHOULD be ignored, except for advisory (logging/reporting) purposes. The host
// application MUST detect this case and fill in a final status of
// INFRA_FAILURE, but MUST NOT terminate the process hierarchy in this case.
//
// # Recursive Invocation
//
// To support recursive invocation, a luciexe MUST accept the flag:
//
// --output=path/to/file.{pb,json,textpb}
//
// The value of this flag MUST be an absolute path to a non-existent file in
// an existing directory. The extension of the file dictates the data format
// (binary, json or text protobuf). The luciexe MUST write it's final Build
// message to this file in the correct format. If `--output` is specified,
// but no Build message (or an invalid/improperly formatted Build message)
// is written, the caller MUST interpret this as an INFRA_FAILURE status.
//
// NOTE: JSON outputs SHOULD be written with the original proto field names,
// not the lowerCamelCase names; downstream users may not be using jsonpb
// unmarshallers to interpret the JSON data.
//
// This may need to be revised in a subsequent version of this API
// specification.
//
// LUCI Executables MAY invoke other LUCI Executables as sub-steps and have the
// Steps from the child luciexe show in the parent's Build updates. This is one
// of the responsibilities of the Host Application.
//
// The parent can achieve this by recording a Step S (with no children), and
// a Step.MergeBuild.from_logdog_stream which points to a "build.proto" stream
// (see "Updating the Build State"). This is called a "Merge Step", and is
// a directive for the host to merge Build messages from in the
// from_logdog_stream log here. Note that this stream name should follow the
// same provisions specified for "Step.Log.Url". It's the invoker's
// responsibility to populate $LOGDOG_NAMESPACE with the new, full, namespace.
// Failure to do this will result in missing logs/step data.
//
// (SIDENOTE: There's an internal proposal to improve the Logdog "butler" API so
// that application code only needs to handle the relative namespaces as well,
// which would make this much less confusing).
//
// The Host Application MUST append all steps from the child build.proto
// stream to the parent build as substeps of step S and copy the following
// fields of the child Build to the equivalent fields of step S *only if*
// step S has *non-final* status. It is the caller's responsibility to populate
// rest of the fields of step S if the caller explicitly marks the step
// status as final.
//
// SummaryMarkdown
// Status
// EndTime
// Output.Logs (appended)
//
// This rule applies recursively, i.e. the child build MAY have Merge Step(s).
//
// Each luciexe's step names should be emitted as relative names. e.g. say
// a build runs a sub-luciexe with the name "a|b". This sub-luciexe then runs
// a step "x|y|z". The top level build.proto stream will show the step
// "a|b|x|y|z".
//
// The graph of datagram streams MUST be a tree. This follows from the
// namespacing rules of the Log.Url fields; Since Log.Url fields are relative to
// their build's namespace, it's only possible to have a merge step point
// further 'down' the tree, making it impossible to create a cycle.
//
// # Recursive Invocation - Legacy ChromeOS style
//
// Note that there's an option in MergeBuild called 'legacy_global_namespace'.
// This mode is NOT RECOMMENDED, and was only added to support legacy ChromeOS
// builders which relied on this functionality prior to luciexe.
//
// When this option is turned on, output properties on the build stream will be
// merged into the top-level build containing the Merge Step, and it will also
// de-namespace the step names (i.e. a MergeStep called "Parent" whose
// build.proto stream contains a name "Step" would normally show on the build as
// "Parent|Step", will instead simply show as "Step" in this mode. This also
// means that these merge step processes must make sure not to emit a step which
// duplicates the name of one emitted in the parent build (otherwise the overall
// build will be invalid)).
//
// Using this functionality has the downside that properties emitted by the
// parent build WILL BE OVERWRITTEN by the merged build stream, if written to
// the same top-level property keys. In ChromeOS's case this is "OK" because the
// legacy recipes do a short bootstrap followed by the main payload which does
// all the actual stuff.
//
// There's a version of this feature which would be more-supportable, which
// would be to introduce namespaced outputs so that the top-level build could
// reflect the outputs from a given step (i.e. allow Step to actually have its
// own Output message). If you feel like you need this functionality, please
// contact ChOps Foundation team to discuss, rather than trying to use the
// legacy_global_namespace option.
//
// # Related Libraries
//
// For implementation-level details, please refer to the following:
//
// - "go.chromium.org/luci/luciexe" - low-level protocol details (this module).
// - "go.chromium.org/luci/luciexe/host" - the Host Application library.
// - "go.chromium.org/luci/luciexe/invoke" - luciexe invocation.
// - "go.chromium.org/luci/luciexe/exe" - luciexe binary helper library.
//
// # Other Client Implementations
//
// Python Recipes (https://chromium.googlesource.com/infra/luci/recipes-py)
// implement the LUCI Executable protocol using the "luciexe" subcommand.
//
// TODO(iannucci): Implement a luciexe binary helper in `infra_libs` analogous
// to go.chromium.org/luci/luciexe/exe and implement Recipes' support in terms
// of this.
//
// # LUCI Executables on Buildbucket
//
// Buildbucket accepts LUCI Executables as CIPD packages containing the
// luciexe to run with the fixed name of "luciexe".
//
// On Windows, this may be named "luciexe.exe" or "luciexe.bat". Buildbucket
// will prefer the first of these which it finds.
//
// Note that the `recipes.py bundle` command generates a "luciexe" wrapper
// script for compatibility with Buildbucket.
package luciexe