In ChromeOS, OS-level functionality (such as configuring network interfaces) is implemented by a collection of system services and provided to Chrome over D-Bus. These system services have greater system and hardware access than the Chrome browser.
Separating functionality like this aims to prevent malicious websites from gaining access to OS-level functionality. If Chrome were able to directly control network interfaces, then a compromise in Chrome would give an attacker almost full control over the system. For example, by having a separate network manager, we can reduce the functionality exposed to an attacker to just querying interfaces and performing pre-determined actions on them.
ChromeOS uses a few different mechanisms to isolate system services from Chrome and from each other. We use a helper program called Minijail (executable minijail0). In most cases, Minijail is used in the service's init script. In other cases, the Minijail library is used if a service wants to apply restrictions to the programs that it launches, or to itself.
These different sandboxing mechanisms are described in the ChromeOS sandboxing talk (internal only).
The forbidden intersection is:
* CAP_SYS_ADMIN, and

You must avoid the forbidden intersection by having at least one of, preferably more than one of, and ideally all of:
* CAP_SYS_ADMIN
* User IDs

You don't normally need both Seccomp and SELinux, but very security-sensitive workloads can require both.
Just remember that code has bugs, and these bugs can be used to take control of the code. An attacker can then do anything the original code was allowed to do. Therefore, code should only be given the absolute minimum level of privilege needed to perform its function.
Aim to keep your code lean, and your privileges low. Don't run your service as root. If you need to use third-party code that you didn't write, you should definitely not run it as root.
Use the libraries provided by the system/SDK. In ChromeOS, libchrome and libbrillo (née libchromeos) offer a lot of functionality to avoid reinventing the wheel, poorly. Don't reinvent IPC; use D-Bus or Mojo. Don't open listening sockets; connect to the required service.
Don't (ab)use shell scripts. Shell script logic is harder to reason about and shell command-injection bugs are easy to miss. If you need functionality separated from your main service, use programs written in a primary programming language like C++ or Rust, not shell scripts. Moreover, when you execute them, consider further restricting their privileges.
exec minijail0 -u <user> -g <group> /full/path/to/binary
The first sandboxing mechanism is user IDs (UIDs). We try to run each service as its own UID, different from the root user, which allows us to restrict what files and directories the service can access, and also removes a big chunk of system functionality that's only available to root. For example, see the permission_broker service's /etc/init/permission_broker.conf file:
    start on starting system-services
    stop on stopping system-services
    respawn

    # Run as 'devbroker' user.
    exec minijail0 -u devbroker -c 'cap_chown,cap_fowner+eip' -- \
        /usr/bin/permission_broker
Minijail's -u argument forces the target program (in this case permission_broker) to be executed as the devbroker user, instead of root. This is equivalent to running sudo -u devbroker.
The user (devbroker in this case) first needs to be added to the build system database (example for a different user).
Next, the user needs to be installed on the system (example, again for a different user).
See the ChromeOS user accounts README for more details.
There's a test in the CQ that keeps track of the users present on the system that request additional access (e.g. listing more than one user in a group). If your user does that, the test baseline has to be updated in another CL at the same time the accounts are added (example). If you're unsure whether you need this, the CQ will reject your CL when the test fails, so if the tests pass, you should be good to go!
You can use Cq-Depend to land the CLs together (see How do I specify the dependencies of a change?).
The forbidden intersection notwithstanding, if a service accesses user data by having its UID in the chronos-access group, it must run with an enforcing SELinux domain. These services are accessing data owned by the chronos user, which could be malicious. A compromised Chrome browser could modify or corrupt chronos-owned data and attempt to escalate privileges by exploiting or confusing a service accessing this data.
SELinux is useful because it provides finer-grained control over what the service is allowed to do. Exploitation in these cases doesn't happen via memory corruption. Instead, the attacker will set up user data to confuse the service and trick it into performing valid operations on the wrong filesystem objects. This is usually referred to as a confused deputy attack. For example, a service might be tricked into mounting what it believes is a USB drive. In reality, however, it ends up mounting a virtual image on top of an existing file or directory, bypassing our write-XOR-execute restrictions.
Seccomp is not a great option to prevent this type of attack because it's not granular enough. In the previous example, since the service is allowed to perform mounts, the mount(2) system call has to be allowed. Seccomp does not have the ability to filter path arguments to system calls, so it would be impossible to restrict path arguments to the mount call. SELinux, on the other hand, allows us to restrict a service to only perform operations (like mount calls) on specific paths in the filesystem.
Some programs, however, require some of the system access usually granted only to the root user. We use Linux capabilities for this. Capabilities allow us to grant a specific subset of root's privileges to an otherwise unprivileged process. The link above has the full list of capabilities that can be granted to a process. Some of them are equivalent to root, so we avoid granting those. In general, most processes need capabilities to configure network interfaces, access raw sockets, or perform specific file operations. Capabilities are passed to Minijail using the -c switch. permission_broker, for example, needs capabilities to be able to chown(2) device nodes.
From permission_broker.conf:
    start on starting system-services
    stop on stopping system-services
    respawn

    # Run as <devbroker> user.
    # Grant CAP_CHOWN and CAP_FOWNER.
    exec minijail0 -u devbroker -c 'cap_chown,cap_fowner+eip' -- \
        /usr/bin/permission_broker
Capabilities are expressed using the format that cap_from_text(3) accepts.
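As a rough sketch of how the UID and capability settings above combine, the following Python helper assembles a minijail0 command line. The helper name build_minijail_argv and its parameters are hypothetical and not part of Minijail; only the -u, -g, and -c flags and the capability string format come from the examples above. It also hard-codes the +eip suffix used in those examples, which is an assumption rather than a requirement.

```python
from typing import List, Optional, Sequence


def build_minijail_argv(
    binary: str,
    user: str,
    group: Optional[str] = None,
    caps: Sequence[str] = (),
) -> List[str]:
    """Assemble a minijail0 argv (hypothetical helper, illustration only)."""
    if user == "root":
        # The guidance above is to give every service its own non-root UID.
        raise ValueError("services should not run as root")
    argv = ["minijail0", "-u", user]
    if group is not None:
        argv += ["-g", group]
    if caps:
        # cap_from_text(3) style string, e.g. 'cap_chown,cap_fowner+eip'.
        argv += ["-c", ",".join(caps) + "+eip"]
    # '--' separates Minijail's own options from the target program.
    argv += ["--", binary]
    return argv


if __name__ == "__main__":
    print(" ".join(build_minijail_argv(
        "/usr/bin/permission_broker", "devbroker",
        caps=["cap_chown", "cap_fowner"])))
```

Building the argument list in one place makes it easier to review which privileges a launched program keeps.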
Many resources in the Linux world can be isolated now such that a process has its own view of things. For example, it has its own list of mount points, and any changes it makes (unmounting, mounting more devices, etc...) are only visible to it. This helps keep a broken process from messing up the settings of other processes.
For more in-depth details, see the namespaces overview.
In ChromiumOS, we like to see every process/daemon run under as many unique namespaces as possible. Many are easy to enable and reason about: if you don't use a particular resource, then isolating it is straightforward. If you do rely on it though, it can take more effort.
Here's a quick overview. Use the command line option if the description below matches your service (or if you don't know what functionality it's talking about -- most likely you aren't using it!).
* --profile=minimalistic-mountns: This is a good first default that enables mount and process namespaces. This only mounts /proc and creates a few basic device nodes in /dev. If you need more things mounted, you can use the -b (bind-mount) and -k (regular mount) flags.
* --uts: Just always turn this on. It makes changes to the host / domain name not affect the rest of the system.
* -e: If your process doesn't need network access. This also isolates netlink and UNIX abstract sockets. Note: D-Bus and syslog use named UNIX sockets, so they will still be usable (as long as you bind-mounted them).
* -l: If your process doesn't use SysV shared memory or IPC.
* -p: If your process doesn't interact with other process PIDs (other than child processes).
* -N: If your process doesn't need to modify common control groups settings.

Almost all processes do not need to run in the init namespace. Please contact chromeos-security@ for a consultation if you believe that your process needs to run in the init mount or PID namespace.
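The flag list above can be read as a small decision table. The sketch below encodes it in Python; the function name namespace_flags and its keyword arguments are hypothetical, and the defaults assume the service uses none of the listed resources, matching the advice that unknown functionality is most likely unused.

```python
from typing import List


def namespace_flags(
    needs_network: bool = False,
    uses_sysv_ipc: bool = False,
    interacts_with_other_pids: bool = False,
    modifies_cgroups: bool = False,
) -> List[str]:
    """Pick Minijail namespace flags (hypothetical helper, illustration only)."""
    # Mount/process namespace profile and UTS isolation are always enabled.
    flags = ["--profile=minimalistic-mountns", "--uts"]
    if not needs_network:
        flags.append("-e")  # network namespace
    if not uses_sysv_ipc:
        flags.append("-l")  # IPC namespace
    if not interacts_with_other_pids:
        flags.append("-p")  # PID namespace
    if not modifies_cgroups:
        flags.append("-N")  # cgroup namespace
    return flags


if __name__ == "__main__":
    # A service that only needs the network keeps every other isolation.
    print(" ".join(namespace_flags(needs_network=True)))
```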
When using many namespaces to isolate a service, there are some resources that the service still reasonably should be able to access.
If bind-mounting on top of /run, you need to mount a tmpfs on /run:
    -k 'none,/run,tmpfs,MS_NODEV|MS_NOEXEC|MS_NOSUID,mode=755,size=10M'
If bind-mounting on top of /sys, you need to mount a tmpfs on /sys:
    -k 'none,/sys,sysfs,MS_NODEV|MS_NOEXEC|MS_NOSUID,mode=755,size=10M'
* syslog: if you use -d to mount a minimal /dev, you can pass access to the syslog daemon by using -b /dev/log. If your process mounts all of /dev, you need to use -b /run/systemd/journal since /dev/log is a symlink to /run/systemd/journal/dev-log. In either case, you do not need to specify the writable flag to -b for this to work. This will work across all namespaces (including -e network).
* D-Bus: use -b /run/dbus. You do not need to specify the writable flag to -b for this to work. This will work across all namespaces (including -e network and -p PID).
* DNS resolution: use -b /run/dns-proxy, as that daemon maintains the /etc/resolv.conf file.

Note: Before utilizing these namespaces, please consult with the chromeos-security@ team to make sure it's used correctly.
ChromeOS Guest sessions run in an isolated mount namespace bound to the path /run/namespaces/mnt_chrome. This means Chrome runs in the non-init mount namespace at that path, and Cryptohome mounts user profile directories and daemon stores in this namespace. Regular sessions, on the other hand, don't have this session isolation yet.
Any process that needs to access user data during a Guest session must run in the Chrome mount namespace. For regular sessions, however, the namespace isolation is not active. The namespace exists for all sessions, but regular user sessions set up the user home directories in the root mount namespace to handle data propagation between ARCVM, the Linux VM, and other parts of the system. Therefore, processes must not enter the mount namespace during a regular user session.
Most processes that access user data utilize daemon stores, which are already mounted in the Chrome mount namespace. However, if a new process needs to access user data from user cryptohome by explicitly entering the Chrome mount namespace by calling setns(2) or nsenter(1), it can do so by querying the state of the session isolation from the browser process. Here is an example CL for this approach.
When creating a new mount or entering a new mount namespace, an important consideration is the mount propagation mode of the mount. Some background on mount types can be found in Linux kernel mount documentation, but a brief summary is:
* Shared (MS_SHARED): allows mount/unmount events to flow in both directions between namespaces
* Mounts-flow-in (MS_SLAVE): only allows mount events to flow in from the parent namespace
* Private (MS_PRIVATE): allows no flow of mount events
* Unbindable (MS_UNBINDABLE): same as private, and you also cannot bind-mount to this mount point

For how to change the mount propagation mode when entering a new mount namespace, see the section for -K[mode] in minijail0(1).
Mounts need to be shared if and only if mount/unmount events need to flow between namespaces in both directions. If mounts only need to flow from one namespace to the other, then they must be shared on the parent namespace but can be mounts-flow-in on the child namespace. Making mounts shared increases the possibilities for interaction between processes that could undermine security of the system and user namespace separation.
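To check which propagation mode a mount actually has, one option is to read the optional fields of /proc/self/mountinfo, where the kernel reports shared:N for shared mounts, master:N for mounts-flow-in mounts, and unbindable for unbindable mounts. The Python sketch below classifies a single mountinfo line; the function name propagation_modes is hypothetical, and it assumes the line follows the layout documented in proc(5).

```python
from typing import Set


def propagation_modes(mountinfo_line: str) -> Set[str]:
    """Classify one /proc/<pid>/mountinfo line by propagation mode."""
    fields = mountinfo_line.split()
    # Fields 1-6 are fixed; optional fields follow, ended by a lone '-'.
    separator = fields.index("-", 6)
    modes = set()
    for tag in fields[6:separator]:
        name = tag.split(":", 1)[0]
        if name == "shared":
            modes.add("MS_SHARED")
        elif name == "master":
            modes.add("MS_SLAVE")
        elif name == "unbindable":
            modes.add("MS_UNBINDABLE")
    # No propagation tags at all means the mount is private.
    return modes or {"MS_PRIVATE"}


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as mountinfo:
        for line in mountinfo:
            mount_point = line.split()[4]
            print(mount_point, sorted(propagation_modes(line)))
```

A mount can carry both shared and master tags, in which case it receives events from its parent and also shares them with its own peers.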
Removing access to the filesystem and to root-only functionality is not enough to completely isolate a system service. A service running as its own UID and with no capabilities has access to a big chunk of the kernel API. The kernel therefore exposes a huge attack surface to non-root processes, and we would like to restrict what kernel functionality is available for sandboxed processes.
The mechanism we use is called Seccomp-BPF. Minijail can take a policy file that describes what syscalls will be allowed, what syscalls will be denied, and what syscalls will only be allowed with specific arguments. The full description of the policy file language can be found in the syscall_filter.c source.
Abridged policy for mtpd on amd64 platforms:
    # Copyright 2012 The ChromiumOS Authors
    # Use of this source code is governed by a BSD-style license that can be
    # found in the LICENSE file.

    read: 1
    ioctl: 1
    write: 1
    timerfd_settime: 1
    open: 1
    poll: 1
    close: 1
    # Don't allow mmap with both PROT_WRITE and PROT_EXEC.
    mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
    mremap: 1
    munmap: 1
    # Don't allow mprotect with PROT_EXEC.
    mprotect: arg2 in ~PROT_EXEC
    lseek: 1
    # Allow socket(domain==PF_LOCAL) or socket(domain==PF_NETLINK)
    socket: arg0 == 0x1 || arg0 == 0x10
    # Allow PR_SET_NAME from libchrome's base::PlatformThread::SetName()
    prctl: arg0 == 0xf
Any syscall not explicitly mentioned, when called, results in the process being killed. The policy file can also tell the kernel to fail the system call (returning -1 and setting errno) without killing the process:
    # execve: return EPERM
    execve: return 1
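To make the mmap and mprotect argument filters in the abridged mtpd policy above more concrete, here is a small Python model of how they evaluate. It assumes that "arg in mask" passes when every bit set in the argument is also set in the mask, and it uses the Linux values of the PROT_* constants. The function names are hypothetical; this is a model of the filter logic, not Minijail code.

```python
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4


def arg_in(arg: int, mask: int) -> bool:
    """Model of 'arg in mask': no bit outside the mask may be set."""
    return (arg & ~mask) == 0


def mmap_allowed(prot: int) -> bool:
    # mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
    return arg_in(prot, ~PROT_EXEC) or arg_in(prot, ~PROT_WRITE)


def mprotect_allowed(prot: int) -> bool:
    # mprotect: arg2 in ~PROT_EXEC
    return arg_in(prot, ~PROT_EXEC)


if __name__ == "__main__":
    for prot in (PROT_READ | PROT_WRITE,
                 PROT_READ | PROT_EXEC,
                 PROT_READ | PROT_WRITE | PROT_EXEC):
        print(hex(prot), mmap_allowed(prot), mprotect_allowed(prot))
```

Under this model, a mapping may be writable or executable but never both, and mprotect can never add PROT_EXEC.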
mmap and mprotect both have argument filters to prevent writable executable memory, since that makes certain classes of attacks much easier. In most cases mprotect does not need PROT_EXEC, but you might have to use arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE just like mmap in cases where child processes are executed and need to dynamically link shared libraries or the code implements a JIT compiler.
On kernels 4.14 and above we can use the new SECCOMP_RET_LOG return value to make policy generation easier. On these kernels, the -L Minijail option will use SECCOMP_RET_LOG as the return value for blocked syscalls: those not listed in the policy or whose arguments don't match the policy. Instead of killing the process on a blocked syscall, the kernel will log the otherwise blocked syscall but will effectively allow it.
The advantage of this mechanism versus what we have available in pre-4.14 kernels is that instead of having to add syscalls to the policy one by one, you can run the process with -L, get a list of all the syscalls not included in the policy, review them, and automatically generate or augment a policy, all in one step.
The -L flag requires Minijail to be built with USE=cros-debug. Generally, this means that it will not work out of the box in prebuilt (e.g. Goldeneye, CPFE, etc.) OS images.
Our recommended way of using this functionality is to start with an empty policy, which will cause all syscalls to be logged but allowed. The resulting audit logs (at /var/log/audit/audit.log*) can then be parsed with the generate_seccomp_policy.py script to automatically generate a policy. There's a bit of extra setup required and some associated caveats. Please see the detailed instructions in the generate_seccomp_policy.py README section on using Linux audit logs to generate policy.
This mechanism can also be combined with the strace-based mechanism below: run the process to be sandboxed under strace, generate a base policy using the policy generation script, and then refine it using -L.
This is the old and familiar way of generating policies by inspecting syscalls using strace. It does not have any kernel version dependencies and also does not require a Minijail build with USE=cros-debug. Similar to audit logs above, the generate_seccomp_policy.py script can accept strace logs from an unsandboxed process to generate a policy.
When collecting strace logs for arm64, make sure you're running it in arm64 userland, as most devices that support arm64 kernels run 32-bit arm userland by default. The image running on the device should be built for 64-bit arm userland. For example, a kevin device can run the image built for 32-bit arm userland with the --board=kevin flag, or the image built for 64-bit arm userland with the --board=kevin64 flag. You can run file -L /bin/sh to check which environment you're running on.
    strace -f -o strace.log <program>
When sandboxing a dynamically-linked executable, Minijail will default to using LD_PRELOAD to install the seccomp filter. This will install the filter after glibc initialization, so remove the syscalls related to glibc initialization to obtain a smaller filter (and a tighter sandbox). Those are normally everything up to and including the following:
    rt_sigaction(SIGRTMIN, {<sa_handler>, [], SA_RESTORER|SA_SIGINFO, <sa_restorer>}, NULL, 8) = 0
    rt_sigaction(SIGRT_1, {<sa_handler>, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, <sa_restorer>}, NULL, 8) = 0
    rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
    getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
    brk(NULL) = <addr>
    brk(<addr>) = <addr>
If you want to collect strace logs for an existing service and you already have a test device set up, the following steps can help, especially if you are extending an existing policy:
1. Remount the root filesystem read-write: mount -o remount,rw /
2. Edit the service's init script in /etc/init to add strace -f -o /tmp/strace.log before Minijail is invoked. Note that this will include all the minijail0 syscalls as well, but you can exclude them later.
3. Run initctl reload-configuration and restart the service.
4. Collect /tmp/strace.log.
5. Generate the policy: ~/chromiumos/src/platform/minijail/tools/generate_seccomp_policy.py strace.log > $PROGRAM_NAME.policy

Test the policy:
1. Run minijail0 -S seccomp.policy -L <cmd>
2. Check /var/log/audit/audit.log (as discussed above).
3. Check /var/log/messages.

You should ensure that your service executes as many of its code paths as possible when running under strace or auditing, in particular error paths. In some cases it may be easier to add syscalls manually (for instance, abort) rather than forcing execution of those paths.
When you're collecting strace logs to create a Seccomp policy, make sure you're in the targeted userland. You can check the environment by running the file -L /bin/sh command.
When not using the -n Minijail flag, privilege-dropping syscalls happen after the filter is installed, and they won't show up in normal program execution because they're called by Minijail itself. Because the audit log method uses Minijail, this is only applicable to the strace method. The following syscalls need to be added to the policy:
* setgroups(2), setresgid(2), and setresuid(2) for dropping root.
* capget(2), capset(2), and prctl(2) for dropping capabilities.

The -n flag (no_new_privs) prevents the sandboxed process from obtaining new privileges and is therefore a good addition for sandboxing.
Sometimes Minijail will fail to compile the seccomp filter with an error similar to:
    WARNING minijail0[32315]: libminijail[32315]: trailing garbage after constant: 'LOOP_GET_STATUS64'
    WARNING minijail0[32315]: libminijail[32315]: compile_atom: /usr/share/policy/e2fsck-seccomp.policy(13): invalid constant 'LOOP_GET_STATUS64'
    WARNING minijail0[32315]: libminijail[32315]: could not allocate filter block
    WARNING minijail0[32315]: libminijail[32315]: compile_filter: compile_file() failed
    ERR minijail0[32315]: libminijail[32315]: failed to compile seccomp filter BPF program in '/usr/share/policy/e2fsck-seccomp.policy'
This means that one of the constant parameters provided to a syscall could not be resolved, e.g. ioctl: arg1 == LOOP_GET_STATUS64.
To fix this, look up the hex value of the constant and substitute it for the constant, e.g. ioctl: arg1 == 0x4C05.
Minijail resolves these constants based on headers that are in gen_constants-inl.h.
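The substitution described above can be scripted when several policy lines are affected. The Python sketch below replaces symbolic constants with hex literals using a caller-supplied table; the function name substitute_constants is hypothetical, and the only value shown is the LOOP_GET_STATUS64 one from the example above. You still have to look up each value yourself.

```python
import re
from typing import Mapping


def substitute_constants(policy_line: str, values: Mapping[str, int]) -> str:
    """Replace known symbolic constants in a policy line with hex literals."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(0)
        # Leave constants that are not in the table untouched.
        return hex(values[name]) if name in values else name

    # Upper-case identifiers only; hex literals such as 0x4C05 never match
    # because they start with a digit.
    return re.sub(r"\b[A-Z][A-Z0-9_]*\b", replace, policy_line)


if __name__ == "__main__":
    table = {"LOOP_GET_STATUS64": 0x4C05}
    print(substitute_constants("ioctl: arg1 == LOOP_GET_STATUS64", table))
```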
Some syscalls may be triggered more deterministically on VMs than in tests on real hardware (e.g. clock_gettime and gettimeofday). Developers should be on the lookout for such failures and the Seccomp policies should be adjusted accordingly.
If a process violates its seccomp policy, it'll be terminated with SIGSYS (bad system call) and you'll see a message like this in /var/log/messages:
WARNING minijail0[1415]: libminijail[1415]: child process 1417 had a policy violation (/usr/share/policy/foo.policy)
There are a couple of ways to find out what syscall caused the violation:
If you can deploy a debug build of Minijail (the chromeos-base/minijail package built with USE="cros-debug") to the device, add -L to the Minijail command. Information on failing syscalls will be logged to /var/log/audit/audit.log. (Kernel version 4.14 or above is required for this to work.) Note that -L also stops policy violations from crashing the process, so execution will succeed.
Many other things are logged to audit.log, so to locate the messages related to your binary, look for messages starting with type=SECCOMP. For example, here's a violation message caused by the true command making syscall 157:
type=SECCOMP msg=audit(1641860367.246:224): auid=0 uid=0 gid=0 ses=2 subj=u:r:minijail:s0 pid=13686 comm="true" exe="/usr/bin/coreutils" sig=0 arch=c000003e syscall=157 compat=0 ip=0x7a28525bde35 code=0x7ffc0000
(Note that longer command names may be truncated in the comm= field.)
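When audit.log is busy, a few lines of Python can pull the interesting fields out of the type=SECCOMP records. The sketch below parses one record into a dictionary of key=value fields; the function name parse_seccomp_record is hypothetical, and the parser assumes the record layout shown in the example above.

```python
import re
from typing import Dict, Optional

# key=value pairs; values are either double-quoted or run to the next space.
_FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_seccomp_record(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a type=SECCOMP audit record, or None otherwise."""
    if not line.startswith("type=SECCOMP"):
        return None
    return {key: value.strip('"') for key, value in _FIELD_RE.findall(line)}


if __name__ == "__main__":
    record = (
        "type=SECCOMP msg=audit(1641860367.246:224): auid=0 uid=0 gid=0 "
        'ses=2 subj=u:r:minijail:s0 pid=13686 comm="true" '
        'exe="/usr/bin/coreutils" sig=0 arch=c000003e syscall=157 '
        "compat=0 ip=0x7a28525bde35 code=0x7ffc0000"
    )
    fields = parse_seccomp_record(record)
    print(fields["comm"], fields["exe"], fields["syscall"])
```

The syscall field is a number, so it still has to be mapped to a name for the architecture given in the arch field.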
If you have a core dump or minidump (such as the .core file found in /var/spool/crash/, or the minidump attached to a crash report), you can open it in a debugger and dump the register values (e.g. with info registers in gdb or register read in lldb). Check the syscall calling conventions for your architecture to determine which registers contain the syscall number and arguments. (Note: 64-bit ARM systems (arm64, aarch64) run userspace programs in 32-bit mode, so you'll want to follow the arm calling conventions in that case.)
Once you have the syscall number, find out the name of the syscall by looking it up in our online syscalls table or in minijail0 -H (run on the same architecture as the program you're debugging). You can then add the syscall to your policy.
If you do not want to allow an entire syscall, you can allow only certain parameters, e.g. ioctl: arg1 == FDGETPRM. You can find the values of these parameters from a core dump or minidump using a debugger, as described above.
The policy file needs to be installed in the system, so we need to add it to the ebuild file. For example:
    # Install seccomp policy file.
    insinto /usr/share/policy
    use seccomp && newins "mtpd-seccomp-${ARCH}.policy" mtpd-seccomp.policy
And finally, the policy file has to be passed to Minijail, using the -S option. Again, using mtpd as an example:
    # use minijail (drop root, set no_new_privs, set seccomp filter).
    # Mount /proc, /sys, /dev, /run/udev so that USB devices can be
    # discovered. Also mount /run/dbus to communicate with D-Bus.
    exec minijail0 -i -I -p -l -r -v -t -u mtp -g mtp -G \
      -P /mnt/empty -b / -b /proc -b /sys -b /dev \
      -k tmpfs,/run,tmpfs,0xe -b /run/dbus -b /run/udev \
      -n -S /usr/share/policy/mtpd-seccomp.policy -- \
      /usr/sbin/mtpd -minloglevel="${MTPD_MINLOGLEVEL}"
Some daemons store user data on the user's cryptohome under /home/.shadow/<user_hash>/mount/root/<daemon_name> or equivalently /home/root/<user_hash>/<daemon_name>. For instance, Session Manager stores user policy under /home/root/<user_hash>/session_manager/policy. This is useful if the data should be protected from other users since the user's cryptohome is only mounted (and therefore decrypted) when the user logs in. If the user is not logged in, it is encrypted with the user's password.
However, if a daemon is already running inside a mount namespace (minijail0 -v ...) when the user's cryptohome is mounted, it does not see the mount since mount events do not propagate into mount namespaces by default. This propagation can be achieved, though, by making the parent mount a shared mount and the corresponding mount inside the namespace a shared or MS_SLAVE mount. See shared subtrees.
To set up a cryptohome daemon store folder that propagates into your daemon's mount namespace, add this code to the src_install section of your daemon's ebuild:
    local daemon_store="/etc/daemon-store/<daemon_name>"
    dodir "${daemon_store}"
    fperms 0700 "${daemon_store}"
    fowners <daemon_user>:<daemon_group> "${daemon_store}"
This directory is never used directly. It merely serves as a secure template for the chromeos_startup script, which picks it up and creates /run/daemon-store/<daemon_name> as a shared mount.
Next, move the user/group setup to pkg_setup() since pkg_preinst(), where this is usually done, runs after src_install():
    pkg_setup() {
      # Has to be done in pkg_setup() instead of pkg_preinst() since
      # src_install() needs <daemon_user> and <daemon_group>.
      enewuser <daemon_user>
      enewgroup <daemon_group>
      cros-workon_pkg_setup
    }
In your daemon's init script, mount the daemon store folder as MS_SLAVE in your mount namespace. Be sure not to mount all of /run. Make sure to mount with the MS_REC flag to propagate any already-mounted cryptohome bind mounts into the mount namespace.
    minijail0 -v -Kslave \
      -k 'tmpfs,/run,tmpfs,MS_NOSUID|MS_NODEV|MS_NOEXEC' \
      -k '/run/daemon-store/<daemon_name>,/run/daemon-store/<daemon_name>,none,MS_BIND|MS_REC' \
      ...
During sign-in, when the user's cryptohome is mounted, Cryptohome creates /home/.shadow/<user_hash>/mount/root/<daemon_name>, bind-mounts it to /run/daemon-store/<daemon_name>/<user_hash>, and copies ownership and mode from /etc/daemon-store/<daemon_name> to the bind target. Since /run/daemon-store/<daemon_name> is a shared mount outside of the mount namespace and an MS_SLAVE mount inside, the mount event propagates into the daemon.
Your daemon can now use /run/daemon-store/<daemon_name>/<user_hash> to store user data once the user's cryptohome is mounted. Note that even though /run/daemon-store is on a tmpfs, your data is actually stored on disk and not lost on reboot.
Be sure not to write to the folder before the cryptohome is mounted. Consider listening to Session Manager's SessionStateChanged signal or similar to detect mount events. Note that /run/daemon-store/<daemon_name>/<user_hash> might exist even though cryptohome is not mounted, so testing existence is not enough (it only works the first time).
The <user_hash> can be retrieved with cryptohome's GetSanitizedUsername D-Bus method.
The following diagram illustrates the mount event propagation:
Landlock is a Linux Security Module that helps manage filesystem access, notably even for unprivileged processes. Minijail supports options to help manage a Landlock policy.
Policies consist of an allowlist of paths, and the specific permissions for a given path. Minijail includes the following flags to help set up a policy:
* --fs-default-paths: This is recommended for most services, and provides access to basic system resources, such as the ability to execute shared object libraries in /lib64.
* --fs-path-ro: Provides read-only access for a path.
* --fs-path-rx: Provides read and execute access for a path.
* --fs-path-rw: Provides read and basic write access for a path.
* --fs-path-advanced-rw: Provides read and advanced write access for a path. In most cases, basic write access is sufficient, but if you need special capabilities such as creating symlinks, use this option. The full set of additional capabilities is:
  * LANDLOCK_ACCESS_FS_MAKE_CHAR
  * LANDLOCK_ACCESS_FS_MAKE_DIR
  * LANDLOCK_ACCESS_FS_MAKE_REG
  * LANDLOCK_ACCESS_FS_MAKE_SOCK
  * LANDLOCK_ACCESS_FS_MAKE_FIFO
  * LANDLOCK_ACCESS_FS_MAKE_BLOCK
  * LANDLOCK_ACCESS_FS_MAKE_SYM
The objective of Landlock is reducing process interactions via the filesystem. As such, creating an overly broad policy that includes RW access to all of /run and /var would substantially diminish the security benefits of Landlock.
Instead, allow a minimal set of paths that you need. For example, if you need to access D-Bus, consider allowing /run/dbus.
Policies are based on the state of the filesystem when a policy is applied, rather than a string comparison against path names. Internally, Landlock looks at the inodes that exist when the sandbox is entered, so if you need to create new files or directories you’ll want to specify a Landlock policy that includes RW access one directory level above.
Landlock cannot be used if the sandboxed process needs to modify its filesystem topology, specifically via mount(2) or pivot_root(2). For additional background, see the official Landlock documentation.
Below is an example Landlock config file, for a process that needs access to D-Bus and needs to write to a file in /var/lib/example_daemon. If the config file is named example_daemon.conf, you can pass it to Minijail using --config=example_daemon.conf.
    % minijail-config-file v0

    # Filesystem access rules.
    fs-default-paths
    fs-path-rw = /run/dbus
    fs-path-rw = /var/lib/example_daemon

    # Other Minijail options....
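If several services need similar Landlock rules, the config file above could be generated rather than written by hand. The Python sketch below emits the same format and rejects the overly broad read-write paths warned about earlier. The function name landlock_config and the set of rejected paths are hypothetical choices for illustration; only the config keys shown in the example above are used.

```python
from typing import Iterable

# Paths considered too broad for read-write access in this sketch.
_TOO_BROAD_RW = {"/", "/run", "/var"}


def landlock_config(
    ro: Iterable[str] = (),
    rx: Iterable[str] = (),
    rw: Iterable[str] = (),
    default_paths: bool = True,
) -> str:
    """Render a Minijail config file with Landlock filesystem rules."""
    lines = ["% minijail-config-file v0", ""]
    if default_paths:
        lines.append("fs-default-paths")
    for key, paths in (("fs-path-ro", ro), ("fs-path-rx", rx), ("fs-path-rw", rw)):
        for path in paths:
            if key == "fs-path-rw" and path.rstrip("/") in {p.rstrip("/") for p in _TOO_BROAD_RW}:
                raise ValueError(f"read-write access to {path} is too broad")
            lines.append(f"{key} = {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(landlock_config(rw=["/run/dbus", "/var/lib/example_daemon"]), end="")
```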
The Minijail wrappers are currently deprecated. They were designed to allow mocking of individual Minijail configuration settings but we concluded that this was the wrong level to mock Minijail. The mocks were fragile and wordy. A better way to mock Minijail is to just abstract away the entire sandboxed process execution. An example of this can be found in the SandboxedProcess class in debugd.