The Linux sandbox provides an API for restricting the capabilities of a process. The overall design philosophy of the sandbox is documented elsewhere; this document explains how it works on Linux.
There are several different sandboxing mechanisms available on Linux:
Chromium chooses which mechanisms to use based on which kernel features are available. We also generally use multiple layers of sandboxing, to achieve both confinement for the process and reduction of the exposed kernel attack surface. Of these mechanisms, Chrome uses:
And we used to use, but no longer use:
The setuid(2) sandbox takes advantage of the fact that privileged processes on Linux are allowed to create new namespaces (see namespaces(7)) and sandboxes the renderer by creating empty namespaces for it at launch time. It relies on a setuid binary, usually installed at /opt/google/chrome/chrome-sandbox
, which:
CLONE_FS
) and having that helper chroot() it, which leaves the process in an empty, readonly root directory.prctl(2)
, which prevents any process without CAP_SYS_PTRACE
from tracing it. In theory this would keep renderers from debugging each other, but in practice they are isolated from each other by PID namespaces anyway.In general, the setuid sandbox makes an effort to apply all these mitigations, but support for them varies between kernel versions, so the strength of the setuid sandbox is variable, with newer kernels providing better security.
The setuid sandbox is implemented in suid/.
If you need to disable it, you can use --disable-setuid-sandbox
. You should also see docs/linux/suid_sandbox_development.md for advice on developing the setuid sandbox itself.
On modern Linuxes, we use the filter mode of seccomp(2), which allows us to supply a program (written in a domain-specific language called “BPF”, see bpf(2) and bpfc(1)) which is evaluated every time the sandboxed process makes a syscall to figure out whether the syscall should be allowed. The seccomp filters are compiled and applied “early” in the syscall process, so this both constrains what the process can do and reduces attack surface of the kernel.
The seccomp sandbox is implemented in seccomp-bpf/, and our tools for working with the BPF DSL are in bpf_dsl/. The actual baseline policies we use are in seccomp-bpf-helpers/.
Since the seccomp sandbox has a filter that is applied to all syscalls being made, to use it you must have an exhaustive list of syscalls that could be made by the code being sandboxed - both code you did write and code you didn't write. Generating that list of syscalls can be difficult and so it is helpful to have very good test coverage which runs under the sandbox to ensure you are exercising any code paths that could lead to syscalls.
We currently don‘t use Landlock, but we’d like to: 345514921.