blob: 841ee2e1edb791ceab45b1a04930d0950762d86a [file] [log] [blame] [view]
# Advanced Topics
[TOC]
## What do the different stages mean in the target paths?
You might have noticed paths like
`@portage//internal/packages/stage2/host/portage-stable/sys-apps/attr:2.5.1`.
TL;DR, stageN in the package path means the package was built using the stageN
SDK.
The bazel host tools build architecture was inspired by the
[Gentoo Portage bootstrapping processes]. That is, we start with a "bootstrap
tarball/SDK", or what we call the stage1 tarball/SDK. This stage1 SDK is
expected to have all the host tools (i.e., portage, clang, make, etc)
required to build the [virtual/target-sdk-implicit-system] package and its
run-time dependencies. We don't concern ourselves with the specific versions
and build-time configuration of the packages contained in the stage1 SDK. We
only need the tools to be recent enough to perform a "cross-root" compile of the
[virtual/target-sdk-implicit-system] package and its dependencies.
*** note
A cross-root compilation is defined as `CBUILD=CHOST && BROOT=/ &&
ROOT=/build/host` (e.g., `CHOST=x86_64-pc-linux-gnu`). In other words, we use
the native compiler (instead of a cross-compiler) to build a brand new sysroot
in a different directory.
***
A cross-root build allows us to bootstrap the system from scratch. That means we
don’t build or link against any of the headers and libraries installed in the
`BROOT`. By starting from scratch, we can choose which packages to build,
their versions, and their build-time configuration. We call these packages built
using the stage1 SDK the "stage1 packages".
*** note
Since the stage1 SDK is the root node of all packages, we want to avoid updating
it unnecessarily to avoid cache busting all the packages.
***
Once all the "stage1 packages" (or "implicit system packages") have been built,
we take that newly created sysroot (i.e., `/build/host`) and generate the
"stage2 tarball/SDK". This bootstrap flow has two big advantages:
1. By cross-root compiling instead of trying to update the stage1 SDK in place,
we avoid performing any analysis on the packages it contains, and computing
the install, uninstall, and rebuild actions required to "upgrade" the stage1
SDK in place. This reduces a great deal of complexity.
2. Changes to any of the implicit system packages are immediately taken into
account. There is no separate out-of-band processes required. This means we
can catch build breakages before a CL lands. For example, if we wanted to
upgrade `bash` or `portage`, we could just rev-bump the ebuild, and
everything would get rebuilt automatically using the new version.
*** note
This flow essentially replaces the `update_chroot` step used by the portage
flow.
***
Now that we have a newly minted stage2 SDK, we can start building the "stage2
host packages". We no longer need to cross-root build these host tools, but can
instead perform a regular native-root build (i.e., `ROOT=/`) since we now know
the versions of the headers and libraries that are installed. When performing a
native-root build, there is basically no difference between a `DEPEND` and
`BDEPEND` dependency.
The next step is building the cross compilers for the target board. Under
portage, this is normally done using the [crossdev] tool. With bazel we just
build the cross compilers like any other package. The ebuilds have been modified
to properly declare their dependencies, so everything just works.
Once the cross compilers are built, we can start building the target board
packages. We first start with the [primordial packages] (i.e., `glibc`,
`libcxx`, etc). You can think of these as implicit system dependencies for the
target board. From there we can continue building the specified target package.
Since we are cross-compiling the target board's packages, we depend on the
packages to declare proper `BDEPEND`s so we can inject the proper host tools.
*** note
If you encounter an ebuild with `EAPI < 7` (which doesn't support
`BDEPEND`), please upgrade it and declare the proper `BDEPEND`s. For these older
ebuilds, we need to treat the `DEPEND` line as a `BDEPEND` line. This results in
building extra host tools that might not be necessary. To limit the impact of
these extra dependencies, we maintain a [list] of `DEPEND`s that we consider
valid `BDEPEND`s.
***
Conceptually you can think of every package getting a custom built SDK that
contains only the things it specifies as dependencies. We create these
per-package ephemeral SDKs as an overlayfs filesystem for efficiency, layering
all declared `BDEPEND`s, `DEPEND`s, and `RDEPEND`s atop an implicit system
layer, and executing the package's ebuild actions within that ephemeral SDK.
In summary, this is what the structure looks like:
* //bazel/portage/sdk:stage1 <-- The downloaded bootstrap/stage1 SDK.
* @portage//internal/ <- Alchemist implementation details.
* sdk/ <-- Directory containing all the SDK targets.
* stage1/target/host <-- `stage1` SDK with the `/build/host` sysroot
containing the [primordial packages].
* stage2 <-- The stage2 SDK/tarball containing the freshly built and
up-to-date `stage1/target/host` [virtual/target-sdk-implicit-system]
packages.
* target/board <-- `stage2` SDK with the `/build/board` sysroot
containing the [primordial packages].
* stage3:bootstrap <-- It is built using the `stage2/host`
[virtual/target-sdk-implicit-system] packages, and all their
transitive `BDEPEND`s. This tarball can then be used as a
stage1 tarball whenever we need a new one.
* stage4 <-- This is only used to verify that the `stage3:bootstrap`
SDK can build the implicit system. It is built using the packages
from `stage3/target/host`.
* packages/ -- Directory containing all the package targets.
* stage1/target/host/`${OVERLAY}`/`${CATEGORY}`/`${PACKAGE}` <-- The
cross-root compiled host packages built using the
`stage1/target/host` SDK. These are what go into making the `stage2`
SDK.
* stage2/ <-- Directory containing all the stage2 packages.
* host/`${OVERLAY}`/`${CATEGORY}`/`${PACKAGE}` <-- The native-root
compiled host packages built using the `stage2` SDK. These will
also be used to generate the `stage3:bootstrap` SDK in the
future.
* target/board/`${OVERLAY}`/`${CATEGORY}`/`${PACKAGE}` <-- The
cross-compiled target board packages built using the
`stage2/target/board` SDK.
* stage3/target/host/`${OVERLAY}`/`${CATEGORY}`/`${PACKAGE}` <-- The
cross-root compiled host packages built using the `stage3:bootstrap`
SDK. This will be used to confirm that the `stage3:bootstrap` SDK
can be successfully used as a "stage1 SDK/tarball".
As you can see, it's turtles (SDKs?) all the way down.
[Gentoo Portage bootstrapping processes]: https://web.archive.org/web/20171119211822/https://wiki.gentoo.org/wiki/Sakaki%27s_EFI_Install_Guide/Building_the_Gentoo_Base_System_Minus_Kernel#Bootstrapping_the_Base_System_.28Optional_but_Recommended.29
[virtual/target-sdk-implicit-system]: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/refs/heads/main/virtual/target-sdk-implicit-system/target-sdk-implicit-system-9999.ebuild
[crossdev]: https://wiki.gentoo.org/wiki/Cross_build_environment
[primordial packages]: https://chromium.googlesource.com/chromiumos/bazel/+/refs/heads/main/portage/bin/alchemist/src/bin/alchemist/generate_repo/common.rs#21
[list]: https://chromium.googlesource.com/chromiumos/bazel/+/refs/heads/main/portage/bin/alchemist/src/analyze/dependency/direct/hacks.rs#14
## Interface Libraries
When bazel executes an action, it hashes all the inputs and uses the resulting
hashes to compute a cache key. If any of those inputs change, then it causes a
cache bust, and the action needs to be re-run. This can have dire consequences
when making changes to packages lower in the dep graph as it causes a lot of
rebuilds.
The best way to fix this is to prune any unnecessary dependencies. We can do
this in one of two ways:
* Prune the whole package. This is ideal since it flattens the dependency
graph.
* Be clever and prune the files installed by a package's dependencies to
reduce the likelihood of cache busting. This is what we will be calling
"Interface Libraries" or "Interface Layers".
Gentoo Portage was originally built with the assumption that everything is
dynamically linked, and for the most part this remains true. When an executable
or library is linked, the linker verifies that all the necessary symbols are
provided by the specified libraries. In the case of shared libraries, the linker
doesn't actually need any of the data/code contained in the library, only the
exported symbols. This means that replacing the `.so` files with stub files that
only contain the exported symbols (the interface) allows us to avoid rebuilding
reverse dependencies when the library is modified, as long as the interface
remains consistent. The [llvm-ifs] tool does just that.
Take the following example:
```
┌──────────────────┐
│ │
│ crash-reporter │
│ │
└─────────┬────────┘
│DEPEND + RDEPEND
┌────────▼────────┐
│ │ /usr/lib64/libsegmentation.so
│ libsegmentation │ /usr/sbin/feature_check
│ │
└───────┬─────────┘
│DEPEND + RDEPEND
┌───▼───┐ /usr/sbin/vpd
│ vpd │ /usr/lib64/libvpd.so
└───┬───┘
│DEPEND + RDEPEND
┌──────▼─────┐
│ │ /usr/lib64/libflashrom.so.1.0.0
│ flashrom │ /usr/lib64/libflashrom.a
│ │ /sbin/flashrom
└────────────┘
```
When building `crash-reporter`, we need to install `libsegmentation`, `vpd`, and
`flashrom` into the ephemeral chroot. If `flashrom`'s' source changes, we need
to rebuild `vpd`, `libsegmentation`, and `crash-reporter`. This is both
inefficient, and unnecessary. If, when building `crash-reporter` we replace
`libsegmentation.so`, `libvpd.so`, and `libflashrom.so.1.0.0` with a stub, we
can insulate `crash-reporter` from changes to the library code.
We can take this a step further. ChromeOS leverages cross-compilation for board
packages (CBUILD != CHOST). The CHOST binaries aren't executable on the CBUILD
machine. If the binaries can't be executed, we can make the assumption that they
aren't used. This allows us to also strip out `/sbin/flashrom`, `/usr/sbin/vpd`,
and `/usr/sbin/feature_check`. This further insulates `crash-reporter` from any
code changes that occur in those binaries.
Additionally, we strip out any split-debug symbols and
`/usr/share/{man,doc,info}`. See [create_interface_layer] for the details.
While the above assumptions work most of the time, some packages only produce
static libraries or need to bundle the contents of other packages. For these
cases, we provide metadata annotations allowing a package to control interface
layer generation or consumption. See the metadata annotations below.
[llvm-ifs]: https://llvm.org/docs/CommandGuide/llvm-ifs.html
[create_interface_layer]: https://chromium.googlesource.com/chromiumos/bazel/+/HEAD/portage/bin/create_interface_layer/src/main.rs
## Declaring Bazel-specific ebuild/eclass metadata
Our Portage-to-Bazel translator (aka Alchemist) evaluates ebuilds and eclasses
to extract package metadata, such as package dependencies
(`DEPEND`/`RDEPEND`/etc.) and source dependencies (`CROS_WORKON_*`), and
translates them into `BUILD.bazel` files. While the standard metadata are enough
to build most packages, some packages benefit from additional metadata to build
efficiently or correctly under Bazel.
You can provide Bazel-specific metadata for certain ebuilds/eclasses by placing
TOML files in certain places:
* **ebuild**: Placed in the same directory as an ebuild, and named `$PN.toml`
(e.g. `cryptohome.toml` for `chromeos-base/cryptohome`).
* **eclass**: Placed in the same directory as an eclass, and named
`$ECLASS.toml` (e.g. `cros-workon.toml` for `cros-workon.eclass`).
When analyzing an ebuild file, Alchemist collects Bazel-specific metadata from
the TOML file associated with the ebuild, and those associated with eclasses
directly or indirectly inherited by the ebuild.
Below is the format of the TOML file:
```toml
# Copyright 2024 The ChromiumOS Authors
# Use of this source code is governed by a BSD-style license that can be
# found in the LICENSE file.
# This is the only top-level table allowed in the TOML file containing
# Bazel-specific metadata.
[bazel]
# Extra source code needed to build the package.
#
# First-party ebuilds should usually define CROS_WORKON_* variables and inherit
# cros-workon.eclass to declare source code dependencies. However, it's
# sometimes the case that a package build needs to depend on extra source code
# that are not declared in the ebuild's CROS_WORKON_*, e.g. some common build
# scripts. This is especially the case with eclasses because manipulating
# CROS_WORKON_* correctly in eclasses is not straightforward. Under the
# Portage-orchestrated build system, accessing those extra files is as easy as
# just hard-coding /mnt/host/source/..., but it's an error under the
# Bazel-orchestrated build system where source code dependencies are strictly
# managed.
#
# This metadata allows ebuilds and eclasses to define extra source code
# dependencies. Each element must be a label of a Bazel target defined with
# extra_sources rule from //bazel/portage/build_defs:extra_sources.bzl. The rule
# defines a set of files to be used as extra sources.
#
# When multiple TOML files set this metadata for a package, values are simply
# merged.
extra_sources = ["//platform2/common-mk:sources"]
# The package supports dynamically linking against interface only shared objects.
#
# When enabled (the default) this will result in all build-time dependencies of
# the package having their shared objects (.so) stripped of all code. All static
# libraries (.a) and executables (/bin, /usr/bin, etc) will also be omitted. By
# pruning the dependencies, the package will not have to rebuild unless the
# interface of the dependencies change.
#
# You must set this to `false` if your package performs any kind of static
# linking, otherwise the required files won't be present.
#
# Format: You can specify either `true`, `false`, or a shell expression. The
# shell expression is used to test USE flags. i.e., `use static` or
# `use !foo && use bar`.
#
# This value can also be declared on an `eclass` and it will apply to all
# packages that inherit from it.
#
# If multiple declarations are found, they are all ANDed together.
supports_interface_libraries = false
# or
supports_interface_libraries = "use !static"
# The package should generate interface libraries that can be used by the
# reverse dependencies of this package.
#
# When enabled, you will allow the reverse dependencies of this package to build
# using interface libraries (.so files that have had their code stripped). All
# binaries and static libraries are also pruned and not accessible to the
# reverse dependencies while building. This has the advantage of only cache
# busting the reverse dependencies when a library interface changes.
#
# You must set this to `false` if your package only generates static libraries,
# or switch your package to produce .so files instead. If your package is a
# hybrid and produces both .so and .a files, or .a and executables, you might
# consider using `interface_library_allowlist` to still get some benefits of
# interface libraries.
#
# Format: You can specify either `true`, `false`, or a shell expression. The
# shell expression is used to test USE flags. i.e., `use static` or
# `use !foo && use bar`.
#
# This value can also be declared on an `eclass` and it will propagate to all
# packages that inherit from it.
#
# If multiple declarations are found they are all ANDed together.
generate_interface_libraries = false
# Files generated by the package that should be copied without modifications
# into the interface layer for the package.
#
# Some packages produce small static libraries that need to be statically linked
# into the main executable. This is in addition to the `.so` that contains the
# bulk of the code. By default we strip all `.a` files when generating the
# interface layer from a package.
#
# For example, dev-libs/protobuf produces the following libraries:
# * /usr/lib64/libprotobuf.so
# * /usr/lib64/libprotobuf-lite.so
# * /usr/lib64/libutf8_validity.a
#
# `libutf8_validity` must be statically linked into all executables that
# dynamically link against `libprotobuf`.
#
interface_library_allowlist = [
"/usr/lib64/libutf8_validity.a",
]
```
## Bazel Build Event Services
Bazel supports uploading and persisting build/test events and top level outputs
(e.g. what was built, invocation command, hostname, performance metrics) to a
backend. These build events can then be visualized and accessed over a shareable
URL. These standardized backends are known as Build Event Services (BES), and the
events are defined in
[build_event_stream.proto](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/buildeventstream/proto/build_event_stream.proto).
Currently, BES uploads are enabled by default for all CI and local builds (for
Googlers). The URL is printed at the start and end of every invocation. For
example:
``` sh
$ BOARD=amd64-generic bazel test //bazel/rust/examples/...
(09:18:58) INFO: Invocation ID: 2dbec8dc-8dfe-4263-b0db-399a029b7dc7
(09:18:58) INFO: Streaming build results to: http://sponge2/2dbec8dc-8dfe-4263-b0db-399a029b7dc7
...
(09:19:13) INFO: Elapsed time: 16.542s, Critical Path: 2.96s
(09:19:13) INFO: 6 processes: 3 remote cache hit, 7 linux-sandbox.
Executed 5 out of 5 tests: 5 tests pass.
(09:19:13) INFO: Streaming build results to: http://sponge2/2dbec8dc-8dfe-4263-b0db-399a029b7dc7
```
The flags related to BES uploads are grouped behind the `--config=bes` flag,
defined in the common bazelrc file.
## Injecting prebuilt binary packages
In the case your work is blocked by some package build failures, you can
workaround them by injecting prebuilt binary packages via command line flags.
For every `ebuild` target under `@portage//internal/packages/...`, an associated
string flag target is defined. You can set a `gs://` URL of a prebuilt binary
package to inject it.
For example, to inject a prebuilt binary packages for `chromeos-chrome`, you can
set this option:
```
--@portage//internal/packages/stage1/target/board/chromiumos/chromeos-base/chromeos-chrome:114.0.5715.0_rc-r2_prebuilt=gs://chromeos-prebuilt/board/amd64-generic/postsubmit-R114-15427.0.0-49533-8783437624917045025/packages/chromeos-base/chromeos-chrome-114.0.5715.0_rc-r2.tbz2
```
*** note
You can run [generate_chrome_prebuilt_config.py] to generate the prebuilt config
for the current version of chromeos-chrome.
```sh
% BOARD=amd64-generic portage/tools/generate_chrome_prebuilt_config.py
```
***
When performing changes to `eclasses`, `build_packages`, `chromite` or other
things that cache bust large parts of the graph, it might be beneficial to pin
the binary packages for already built packages so you don't need to rebuild
them when iterating on your changes. You can use the [generate-stage2-prebuilts]
script to do this:
```sh
$ BOARD=amd64-generic ./bazel/portage/tools/generate-stage2-prebuilts
```
This will scan your `bazel-bin` directory for any existing binpkgs and copy them
to `~/.cache/binpkgs`. It will then generate a `prebuilts.bazelrc` that contains
various `--config` options. The `prebuilts.bazelrc` is invalid after you
`repo sync` since it contains package version numbers. Just re-run the script
after a `repo sync` to regenerate the `prebuilts.bazelrc` and it will pin the
packages with versions that still exist in your `bazel-bin`.
Running a build with pinned packages:
```sh
$ BOARD=amd64-generic bazel build --config=prebuilts/stage2-board-sdk @portage//target/sys-apps/attr
```
Googlers also have the option of using the artifacts generated by the snapshot
builders. The output gs bucket will contain a `prebuilts.bazelrc` that specifies
all the `CAS` locations for each package that was built by the builder. You can
download this and place it into your workspace root. You can also use the
following tool to fetch the `prebuilts.bazelrc` for your currently synced
manifest hash.
```sh
$ ./bazel/portage/tools/fetch-prebuilts --board "$BOARD"
```
[generate_chrome_prebuilt_config.py]: ../portage/tools/generate_chrome_prebuilt_config.py
[generate-stage2-prebuilts]: ../portage/tools/generate-stage2-prebuilts
## Extracting binary packages
In case you need to extract the contents of a binary package so you can easily
inspect it, you can use the `xpak split` CLI.
```sh
bazel run //bazel/portage/bin/xpak:xpak -- split --extract libffi-3.1-r8.tbz2 libusb-0-r2.tbz2
```