| Bugfixes: |
| |
| * Many manager configuration settings that are only applicable to user |
| manager or system manager can always be set. It would be better to reject |
| them when parsing config. |
| |
| * Jun 01 09:43:02 krowka systemd[1]: Unit user@1000.service has alias user@.service. |
| Jun 01 09:43:02 krowka systemd[1]: Unit user@6.service has alias user@.service. |
| Jun 01 09:43:02 krowka systemd[1]: Unit user-runtime-dir@6.service has alias user-runtime-dir@.service. |
| |
| External: |
| |
| * Fedora: add an rpmlint check that verifies that all unit files in the RPM are listed in %systemd_post macros. |
| |
| * dbus: |
| - natively watch for dbus-*.service symlinks (PENDING) |
| - teach dbus to activate all services it finds in /etc/systemd/services/org-*.service |
| |
| * Fedora: suggest auto-restart on failure, but not on success and not on coredump. Also, ask people to think about changing the start limit logic, and point them to RestartPreventExitStatus= and SuccessExitStatus= |
| |
| * neither pkexec nor sudo initialize environ[] from the PAM environment? |
| |
| * Fedora: update policy to declare access mode and ownership of unit files to root:root 0644, and add an rpmlint check for it |
| |
| * register catalog database signature as file magic |
| |
| * zsh shell completion: |
| - <command> <verb> -<TAB> should complete options, but currently does not |
| - systemctl add-wants,add-requires |
| - systemctl reboot --boot-loader-entry= |
| |
| * systemctl status should know about 'systemd-analyze calendar ... --iterations=' |
| * If timer has just OnInactiveSec=..., it should fire after a specified time |
| after being started. |
| |
| * write blog stories about: |
| - hwdb: what belongs into it, lsusb |
| - enabling dbus services |
| - how to make changes to sysctl and sysfs attributes |
| - remote access |
| - how to pass throw-away units to systemd, or dynamically change properties of existing units |
| - auto-restart |
| - how to develop against journal browsing APIs |
| - the journal HTTP iface |
| - non-cgroup resource management |
| - dynamic resource management with cgroups |
| - refreshed, longer mission statement |
| - calendar time events |
| - init=/bin/sh vs. "emergency" mode, vs. "rescue" mode, vs. "multi-user" mode, vs. "graphical" mode, and the debug shell |
| - how to create your own target |
| - instantiated apache, dovecot and so on |
| - hooking a script into various stages of shutdown/early boot |
| |
| Regularly: |
| |
| * look for close() vs. close_nointr() vs. close_nointr_nofail() |
| |
| * check for strerror(r) instead of strerror(-r) |
| |
| * pahole |
| |
| * set_put(), hashmap_put() return values check. i.e. == 0 does not free()! |
| |
| * use secure_getenv() instead of getenv() where appropriate |
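| |
| A small sketch of what such a conversion might look like, assuming glibc |
| (secure_getenv() is a GNU extension and needs _GNU_SOURCE). The wrapper name |
| and the fallback parameter are hypothetical. |
| |
```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: looks up an environment variable, but treats it as
 * unset when the process runs in "secure execution" mode (for example
 * setuid/setgid binaries), where the environment may be attacker-controlled.
 * In that mode secure_getenv() returns NULL, so the fallback is used. */
static const char *getenv_or_fallback(const char *name, const char *fallback) {
        assert(name);

        const char *v = secure_getenv(name);
        return v ? v : fallback;
}
```
| |
| In a regular, unprivileged process this behaves exactly like getenv() with a |
| default value; the difference only shows up in secure execution mode. |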
| |
| * link up selected blog stories from man pages and unit files Documentation= fields |
| |
| Janitorial Clean-ups: |
| |
| * rework mount.c and swap.c to follow proper state enumeration/deserialization |
| semantics, like we do for device.c now |
| |
| * get rid of prefix_roota() and similar, only use chase() and related |
| calls instead. |
| |
| * get rid of basename() and replace by path_extract_filename() |
| |
| * Replace our fstype_is_network() with a call to libmount's mnt_fstype_is_netfs()? |
| Having two lists is not nice, but maybe it's now worth making a dependency on |
| libmount for something so trivial. |
| |
| * drop set_free_free() and switch things over from string_hash_ops to |
| string_hash_ops_free everywhere, so that destruction is implicit rather than |
| explicit. Similar, for other special hashmap/set/ordered_hashmap destructors. |
| |
| * generators sometimes apply C escaping and sometimes specifier escaping to |
| paths and similar strings they write out. Sometimes both. We should clean |
| this up, and should probably always apply both, i.e. introduce |
| unit_file_escape() or so, which applies both. |
| |
| * xopenat() should pin the parent dir of the inode it creates before doing its |
| thing, so that it can create, open, label somewhat atomically. |
| |
| Deprecations and removals: |
| |
| * Remove any support for booting without /usr pre-mounted in the initrd entirely. |
| Update INITRD_INTERFACE.md accordingly. |
| |
| * remove cgroups v1 support EOY 2023. As per |
| https://lists.freedesktop.org/archives/systemd-devel/2022-July/048120.html |
| and then rework cgroupsv2 support around fds, i.e. keep one fd per active |
| unit around, and always operate on that, instead of cgroup fs paths. |
| |
| * drop support for kernels that lack ambient capabilities support (i.e. make |
| 4.3 new baseline). Then drop support for "!!" modifier for ExecStart= which |
| is only supported for such old kernels. |
| |
| * drop support for kernels lacking memfd_create() (i.e. make 3.17 new |
| baseline), then drop all pipe() based fallbacks. |
| |
| * drop support for getrandom()-less kernels. (GRND_INSECURE means once kernel |
| 5.6 becomes our baseline). See |
| https://github.com/systemd/systemd/pull/24101#issuecomment-1193966468 for |
| details. Maybe before that: add taint-flags/warn about kernels that lack |
| getrandom()/environments where it is blocked. |
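| |
| For illustration, a sketch of the kind of fallback that could be dropped once |
| GRND_INSECURE can be relied upon. The helper name is hypothetical, and the |
| fallback to GRND_NONBLOCK on EINVAL is an assumption about how older kernels |
| would be handled, not a description of the existing code. |
| |
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004 /* value from the Linux UAPI headers, kernel 5.6+ */
#endif

/* Hypothetical helper: fills buf with n random bytes that need not be
 * cryptographically strong, without blocking for entropy.
 * Returns 0 on success or a negative errno value. */
static int random_bytes_best_effort(void *buf, size_t n) {
        uint8_t *p = buf;
        unsigned flags = GRND_INSECURE;

        assert(buf || n == 0);

        while (n > 0) {
                ssize_t l = getrandom(p, n, flags);
                if (l < 0) {
                        if (errno == EINTR)
                                continue;
                        if (errno == EINVAL && flags == GRND_INSECURE) {
                                /* Kernel does not know the flag: retry without it. */
                                flags = GRND_NONBLOCK;
                                continue;
                        }
                        return -errno;
                }

                p += l;
                n -= (size_t) l;
        }

        return 0;
}
```
| |
| With a 5.6 baseline the EINVAL branch becomes dead code and can go away. |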
| |
| * drop support for LOOP_CONFIGURE-less loopback block devices, once kernel |
| baseline is 5.8. |
| |
| * drop fd_is_mount_point() fallback mess once we can rely on |
| STATX_ATTR_MOUNT_ROOT to exist i.e. kernel baseline 5.8 |
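| |
| A sketch of what a check without any fallback might look like, assuming |
| glibc's statx() wrapper. The function name is hypothetical; the real |
| fd_is_mount_point() has a different signature and more cases to cover. |
| |
```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

#ifndef STATX_ATTR_MOUNT_ROOT
#define STATX_ATTR_MOUNT_ROOT 0x2000 /* value from the Linux UAPI headers, kernel 5.8+ */
#endif

/* Hypothetical helper: returns 1 if path is the root of a mount, 0 if it is
 * not, -EOPNOTSUPP if the kernel does not report the attribute, or another
 * negative errno value on failure. */
static int path_is_mount_root(const char *path) {
        struct statx sx;

        assert(path);

        if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT, 0, &sx) < 0)
                return -errno;

        /* stx_attributes_mask says which attribute bits are actually supported;
         * without this check a missing bit would be misread as "not a mount". */
        if (!(sx.stx_attributes_mask & STATX_ATTR_MOUNT_ROOT))
                return -EOPNOTSUPP;

        return !!(sx.stx_attributes & STATX_ATTR_MOUNT_ROOT);
}
```
| |
| Once the baseline guarantees the attribute, the -EOPNOTSUPP case is the only |
| place where fallback logic would otherwise have to hook in. |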
| |
| * Remove /dev/mem ACPI FPDT parsing when /sys/firmware/acpi/fpdt is ubiquitous. |
| That requires distros to enable CONFIG_ACPI_FPDT, and have kernels v5.12 for |
| x86 and v6.2 for arm. |
| |
| * Once baseline is 4.13, remove support for INTERFACE_OLD= checks in "udevadm |
| trigger"'s waiting logic, since we can then rely on uuid-tagged uevents |
| |
| Features: |
| |
| * our logging tools should look for $DEBUG_INVOCATION and consider it |
| equivalent to $SYSTEMD_LOG_LEVEL=debug |
| |
| * Teach systemd-ssh-generator to generate a /run/issue.d/ drop-in telling |
| users how to connect to the system via AF_VSOCK, as per: |
| https://github.com/systemd/systemd/issues/35071#issuecomment-2462803142 |
| |
| * maybe introduce an OSC sequence that signals when we ask for a password, so |
| that terminal emulators can maybe connect a password manager or so, and |
| highlight things specially. |
| |
| * Port pidref_namespace_open() to use PIDFD_GET_MNT_NAMESPACE and related |
| ioctls to get nsfds directly from pidfds. |
| |
| * start using STATX_SUBVOL in btrfs_is_subvol(). Also, make use of it |
| generically, so that image discovery recognizes bcachefs subvols too. |
| |
| * format-table: introduce new cell type for strings with ansi sequences in |
| them. display them in regular output mode (via strip_tab_ansi()), but |
| suppress them in json mode. |
| |
| * machined: when registering a machine, also take a relative cgroup path, |
| relative to the machine's unit. This is useful when registering unpriv |
| machines, as they might sit down the cgroup tree, below a cgroup delegation |
| boundary. Then, install an inotify watch on that cgroup to track when the |
| machine's local cgroup goes down. |
| |
| * resolved: report ttl in resolution replies if we know it. This data is useful |
| for tools such as wireguard which want to periodically re-resolve DNS names, |
| and might want to use the TTL as a hint for that. |
| |
| * journald: beef up ClientContext logic to store pidfd_id of peer, to validate |
| we really use the right cache entry |
| |
| * journald: log client's pidfd id as a new automatic field _PIDFDID= or so. |
| |
| * journald: split up ClientContext cache in two: one cache keyed by pid/pidfdid |
| with process information, and another one keyed by cgroup path/cgroupid with |
| cgroup information. This way a service consisting of many logging |
| processes can benefit from the cgroup caching. |
| |
| * system lsmbpf policy that prohibits creating files owned by "nobody" |
| system-wide |
| |
| * system lsmbpf policy that prohibits creating or opening device nodes outside |
| of devtmpfs/tmpfs, except if they are the pseudo-devices /dev/null, |
| /dev/zero, /dev/urandom and so on. |
| |
| * system lsmbpf policy that enforces that block device backed mounts may only |
| be established on top of dm-crypt or dm-verity devices, or an allowlist of |
| file systems (which should probably include vfat, for compat with the ESP) |
| |
| * $LISTEN_PID, $MAINPID and $SYSTEMD_EXECPID env vars that the service manager |
| sets should be augmented with $LISTEN_PIDFDID, $MAINPIDFDID and |
| $SYSTEMD_EXECPIDFD (and similar for other env vars we might send). |
| |
| * port copy.c over to use LabelOps for all labelling. |
| |
| * port remaining getmntent() users over to libmount. There are subtle |
| differences in the parsers (see #25371 for example), and it hence makes sense |
| if we stick to one set of parsers on this, not mix both. |
| |
| * run0 and run0 --user=root have different effect on tty ownership? |
| |
| * get rid of compat with libidn.so.11 (retain only for libidn.so.12) |
| |
| * get rid of compat with libbpf.so.0 (retain only for libbpf.so.1) |
| |
| * define a generic "report" varlink interface, which services can implement to |
| provide health/statistics data about themselves. then define a dir somewhere |
| in /run/ where components can bind such sockets. Then make journald, logind, |
| and pid1 itself implement this and expose various stats on things there. Then |
| issue parallel calls to these interfaces from the systemd-report tool, |
| combine into one json document, and include measurement logs and tpm |
| quote. tpm quote should protect the json doc via the nonce field |
| stuff. Allow shipping this off elsewhere for analysis. |
| |
| * The bind(AF_UNSPEC) construct (for resetting sockets to their initial state) |
| should be blocked in many cases because it punches holes in many sandboxes. |
| |
| * find a nice way to opt-in into auto-masking SIGCHLD on first |
| sd_event_add_child(), and then get rid of many more explicit sigprocmask() |
| calls. |
| |
| * introduce new structure Tpm2CombinedPolicy, that combines the various TPM2 |
| policy bits into one structure, i.e. public key info, pcr masks, pcrlock |
| stuff, pin and so on. Then pass that around in tpm2_seal() and tpm2_unseal(). |
| |
| * look at nsresourced, mountfsd, homed, importd, and try to come up with a way |
| how the forked off worker processes can be moved into transient services with |
| sandboxing, without breaking notify socket stuff and so on. |
| |
| * replace all \x1b, \x1B, \033 C string escape sequences in our codebase with a |
| more readable \e. It's a GNU extension, but a ton more readable than the |
| others, and most importantly it doesn't result in confusing errors if you |
| suffix the escape sequence with one more decimal digit, because compilers |
| think you might actually specify a value outside the 8bit range with that. |
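| |
| To illustrate the pitfall, a small sketch using the "ESC 7" (save cursor) |
| sequence as an example. The macro and function names are made up. \e is |
| accepted by GCC and Clang but is not standard C. |
| |
```c
#include <assert.h>
#include <string.h>

/* With \x the literal has to be split in two, because "\x1b7" would be parsed
 * as one hex escape with the value 0x1b7, which does not fit into a char. */
#define SAVE_CURSOR_HEX   "\x1b" "7"

/* Octal escapes take at most three digits, so this one happens to work. */
#define SAVE_CURSOR_OCTAL "\0337"

/* \e always consumes exactly two source characters, so nothing can go wrong. */
#define SAVE_CURSOR_GNU   "\e7"

/* Returns 1 if all three spellings produce the same two bytes. */
static int save_cursor_spellings_agree(void) {
        return strcmp(SAVE_CURSOR_HEX, SAVE_CURSOR_GNU) == 0 &&
               strcmp(SAVE_CURSOR_OCTAL, SAVE_CURSOR_GNU) == 0;
}
```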
| |
| * homed: allow login via username + realm on getty/login prompt. Then rewrite |
| the user name in the PAM stack |
| |
| * homed/userdb: add "aliases" field to user record, which can alternatively be |
| used for logging in. Rewrite user name in the PAM stack once acquired. |
| |
| * confext/sysext: instead of mounting the overlayfs directly on /etc/ + /usr/, |
| insert an intermediary bind mount on itself there. This has the benefit that |
| services where mount propagation from the root fs is off can still have |
| confext/sysext propagated in. |
| |
| * generic interface for varlink for setting log level and stuff that all our daemons can implement |
| |
| * maybe teach repart.d/ dropins a new setting MakeMountNodes= or so, which is |
| just like MakeDirectories=, but uses an access mode of 0000 and sets the +i |
| chattr bit. This is useful as protection against early uses of /var/ or /tmp/ |
| before their contents are mounted. |
| |
| * go through all uses of table_new() in our codebase, and make sure we support |
| all three of: |
| 1. --no-legend properly |
| 2. --json= properly |
| 3. --no-pager properly |
| |
| * go through all --help texts in our codebases, and make sure: |
| 1. the one sentence description of the tool is highlighted via ANSI how we |
| usually do it |
| 2. If more than one or two commands are supported (as opposed to switches), |
| separate commands + switches from each other, using underlined --help sections. |
| 3. If there are many switches, consider adding additional --help sections. |
| |
| * go through our codebase, and convert "vertical tables" (i.e. things such as |
| "systemctl status") to use table_new_vertical() for output |
| |
| * pcrlock: add support for multi-profile UKIs |
| |
| * logind: when logging in use new tmpfs quota support to configure quota on |
| /tmp/ + /dev/shm/. But do so only in case of tmpfs, because otherwise quota |
| is persistent and any persistent settings mean we don't have to reapply them. |
| |
| * initrd: when transitioning from initrd to host, validate that |
| /lib/modules/`uname -r` exists, refuse otherwise |
| |
| * signed bpf loading: to address need for signature verification for bpf |
| programs when they are loaded, and given the bpf folks don't think this is |
| realistic in kernel space, maybe add small daemon that facilitates this |
| loading on request of clients, validates signatures and then loads the |
| programs. This daemon should be the only daemon with privs to load BPF on |
| the system. It might be a good idea to run this daemon already in the initrd, |
| and leave it around during the initrd transition, to continue serving requests. |
| Should then live in its own fs namespace that inherits from the initrd's |
| fs tree, not from the host, to isolate it properly. Should set |
| PR_SET_DUMPABLE so that it cannot be ptraced from the host. Should have |
| CAP_BPF as only service around. |
| |
| * add a mechanism so that we can drop capabilities from pid1 *before* |
| transitioning from initrd to host. i.e. before we transition into the |
| slightly lower trust domain that is the host system we might want to get rid |
| of some caps. Example: CAP_BPF in the signed bpf loading logic above. (We |
| already have CapabilityBoundingSet= in system.conf, but that is enforced when |
| pid 1 initializes, rather than when it transitions to the next.) |
| |
| * maybe add a new standard slice where processes that are started in the initrd |
| and stick around for the whole system runtime (i.e. root fs storage daemons, |
| the bpf loader daemon discussed above, and such) are placed. maybe |
| protected.slice or so? Then write docs that suggest that services like this |
| set Slice=protected.slice, RefuseManualStart=yes, RefuseManualStop=yes and a |
| couple of other things. |
| |
| * add feature to xopenat() that implements O_REGULAR in userspace: i.e. let's |
| open the inode via O_PATH first, then validate its type, and then convert to |
| proper fd via fd_reopen() |
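| |
| The three steps could be sketched roughly as follows. This is a standalone |
| illustration, not the xopenat() API: the function name, the error codes for |
| non-regular inodes, and reopening through /proc/self/fd/ (which requires |
| procfs to be mounted) are all assumptions made for the example. |
| |
```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: opens path only if it refers to a regular file.
 * flags are the flags for the final fd (e.g. O_RDONLY|O_CLOEXEC); they must
 * not contain O_NOFOLLOW, since the reopen goes through a procfs symlink.
 * Returns an fd on success or a negative errno value. */
static int open_regular(const char *path, int flags) {
        char procfs_path[64];
        struct stat st;
        int path_fd, fd, r;

        assert(path);

        /* Step 1: pin the inode via O_PATH, which does not really open it. */
        path_fd = open(path, O_PATH | O_CLOEXEC);
        if (path_fd < 0)
                return -errno;

        /* Step 2: validate the type of the pinned inode. */
        if (fstat(path_fd, &st) < 0) {
                r = -errno;
                goto finish;
        }
        if (S_ISDIR(st.st_mode)) {
                r = -EISDIR;
                goto finish;
        }
        if (!S_ISREG(st.st_mode)) {
                r = -EBADFD;
                goto finish;
        }

        /* Step 3: convert the O_PATH fd into a proper fd for the same inode. */
        snprintf(procfs_path, sizeof procfs_path, "/proc/self/fd/%d", path_fd);
        fd = open(procfs_path, flags);
        r = fd < 0 ? -errno : fd;

finish:
        close(path_fd);
        return r;
}
```
| |
| Because the type check happens on the pinned inode rather than on the path, |
| swapping the path for a FIFO or device node between the check and the final |
| open does not affect which inode gets opened. |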
| |
| * rough proposed implementation design for remote attestation infra: add a tool |
| that generates a quote of local PCRs and NvPCRs, along with synchronous log |
| snapshot. use "audit session" logic for that, so that we get read-outs and |
| signature in one step. Then turn this into a JSON object. Use the "TCG TSS 2.0 |
| JSON Data Types and Policy Language" format to encode the signature. And CEL |
| for the measurement log. |
| |
| * creds: add a new cred format that reuses the JSON structures we use in the |
| LUKS header, so that we get the various newer policies for free. |
| |
| * drop PCR 7 from default PCR mask in credentials and LUKS2 enrollments |
| |
| * systemd-analyze: port "pcrs" verb to talk directly to TPM device, instead of |
| using sysfs interface (well, or maybe not, as that would require privileges?) |
| |
| * pcrextend/tpm2-util: add a concept of "rotation" to event log. i.e. allow |
| trailing parts of the logs if time or disk space limit is hit. Protect the |
| boot-time measurements however (i.e. up to some point where things are |
| settled), since we need those for pcrlock measurements and similar. When |
| deleting entries for rotation, place an event that declares how many items |
| have been dropped, and what the hash was before and after that. |
| |
| * measure information about all DDIs as we activate them to an NvPCR. We |
| probably should measure the dm-verity root hash from the kernel side, but |
| DDI meta info from userspace. |
| |
| * rework tpm2_parse_pcr_argument_to_mask() to refuse literal hash value |
| specifications. They are currently parsed but ignored. We should refuse them |
| however, to not confuse people. |
| |
| * use name_to_handle_at() with AT_HANDLE_FID instead of .st_ino (inode |
| number) for identifying inodes, for example in copy.c when finding hard |
| links, or loop-util.c for tracking backing files, and other places. |
| |
| * cryptenroll/cryptsetup/homed: add unlock mechanism that combines tpm2 and |
| fido2, as well as tpm2 + ssh-agent, inspired by ChromeOS' logic: encrypt the |
| volume key with the TPM, with a policy that insists that a nonce is signed by |
| the fido2 device's key or ssh-agent key. Thus, at unlock/login time the TPM |
| generates a nonce, which is sent as a challenge to the fido2/ssh-agent, which |
| returns a signature which is handed to the tpm, which then reveals the volume |
| key to the PC. |
| |
| * cryptenroll/cryptsetup/homed: similar to this, implement TOTP backed by TPM. |
| |
| * expose the handoff timestamp fully via the D-Bus properties that contain |
| ExecStatus information |
| |
| * properly serialize the ExecStatus data from all ExecCommand objects |
| associated with services, sockets, mounts and swaps. Currently, the data is |
| flushed out on reload, which is quite a limitation. |
| |
| * Clean up "reboot argument" handling, i.e. set it through some IPC service |
| instead of directly via /run/, so that it can be sensibly set remotely. |
| |
| * userdb: add concept for user "aliases", to cover for cases where you can log |
| in under the name lennart@somenetworkfsserver, and it would automatically |
| generate a local user, and from then on both names can be used to allow |
| logins into the same account. |
| |
| * systemd-tpm2-support: add some logic that detects if the system is in DA |
| lockout mode, and queries the user for TPM recovery PIN then. |
| |
| * systemd-repart should probably enable btrfs' "temp_fsid" feature for all file |
| systems it creates, as we have no interest in RAID for repart, and it should |
| make sure that we can mount them trivially everywhere. |
| |
| * systemd-nspawn should get the same SSH key support that vmspawn now has. |
| |
| * move documentation about our common env vars (SYSTEMD_LOG_LEVEL, |
| SYSTEMD_PAGER, …) into a man page of its own, and just link it from our |
| various man pages that so far embed the whole list again and again, in an |
| attempt to reduce clutter and noise a bit. |
| |
| * vmspawn: switch default swtpm PCR bank to SHA384-only (away from SHA256), at |
| least on 64bit archs, simply because SHA384 typically has double the hashing |
| speed of SHA256 on 64bit archs (since it is based on 64bit words, unlike |
| SHA256 which uses 32bit words). |
| |
| * In vmspawn/nspawn/machined wait for X_SYSTEMD_UNIT_ACTIVE=ssh-active.target |
| and X_SYSTEMD_SIGNALS_LEVEL=2 as indication whether/when SSH and the POSIX |
| signals are available. Similar for D-Bus (but just use sockets.target for |
| that). Report as property for the machine. |
| |
| * teach nspawn/machined a new bus call/verb that gets you a |
| shell in containers that have no sensible pid1, via joining the container, |
| and invoking a shell directly. Then provide another new bus call/verb that is |
| somewhat automatic: if we detect that pid1 is running and fully booted up we |
| provide a proper login shell, otherwise just a joined shell. Then expose that |
| as primary way into the container. |
| |
| * make vmspawn/nspawn/importd/machined a bit more usable in a WSL-like |
| fashion. i.e. teach unpriv systemd-vmspawn/systemd-nspawn a reasonable |
| --bind-user= behaviour that mounts the calling user through into the |
| machine. Then, ship importd with a small database of well known distro images |
| along with their pinned signature keys. Then add some minimal glue that binds |
| this together: downloads a suitable image if not done so yet, starts it in |
| the bg via vmspawn/nspawn if not done so yet and then requests a shell inside |
| it for the invoking user. |
| |
| * importd/…: define per-user dirs for container/VM images too. |
| |
| * add a new specifier to unit files that figures out the DDI the unit file is |
| from, tracing through overlayfs, DM, loopback block device. |
| |
| * importd/importctl |
| - port tar handling to libarchive |
| - complete varlink interface |
| - download images into .v/ dirs |
| |
| * in os-release define a field that can be initialized at build time from |
| SOURCE_DATE_EPOCH (maybe even under that name?). Would then be used to |
| initialize the timestamp logic of ConditionNeedsUpdate=. |
| |
| * nspawn/vmspawn/pid1: add ability to easily insert fully booted VMs/FOSC into |
| shell pipelines, i.e. add easy to use switch that turns off console status |
| output, and generates the right credentials for systemd-run-generator so that |
| a program is invoked, and its output captured, with correct EOF handling and |
| exit code propagation |
| |
| * new systemd-analyze "join" verb or so, for debugging services. Would be |
| nsenter on steroids, i.e. invoke a shell or command line in an environment as |
| close as we can make it for the MainPID of a service. Should be built around |
| pidfd, so that we can reasonably robustly do this. Would only cover the |
| execution environment like namespaces, but not the privilege settings. |
| |
| * Introduce a CGroupRef structure, inspired by PidRef. Should contain cgroup |
| path, cgroup id, and cgroup fd. Use it to continuously pin all v2 cgroups via |
| a cgroup_ref field in the CGroupRuntime structure. Eventually switch things |
| over to do all cgroupfs access only via that structure's fd. |
| |
| * Get rid of the symlinks in /run/systemd/units/* and exclusively use cgroupfs |
| xattrs to convey info about invocation ids, logging settings and so on. |
| Support for cgroupfs xattrs in the "trusted." namespace was added in Linux |
| 3.7, i.e. in a kernel older than any we still pretend to support. |
| |
| * rewrite bpf-devices in libbpf/C code, rather than home-grown BPF assembly, to |
| match bpf-restrict-fs, bpf-restrict-ifaces, bpf-socket-bind |
| |
| * ditto: rewrite bpf-firewall in libbpf/C code |
| |
| * credentials: if we ever acquire a secure way to derive cgroup id of socket |
| peers (i.e. SO_PEERCGROUPID), then extend the "scoped" credential logic to |
| allow cgroup-scoped (i.e. app or service scoped) credentials. Then, as next |
| step use this to implement per-app/per-service encrypted directories, where |
| we set up fscrypt on the StateDirectory= with a randomized key which is |
| stored as xattr on the directory, encrypted as a credential. |
| |
| * credentials: optionally include a per-user secret in scoped user-credential |
| encryption keys. should come from homed in some way, derived from the luks |
| volume key or fscrypt directory key. |
| |
| * credentials: add a flag to the scoped credentials that if set require PK |
| reauthentication when unlocking a secret. |
| |
| * teach systemd --user to properly load credentials off disk, with |
| /etc/credstore equivalent and similar. Make sure that $CREDENTIALS_DIRECTORY= |
| actually works too when run with user privs. |
| |
| * extend the smbios11 logic for passing credentials so that instead of passing |
| the credential data literally it can also just reference an AF_VSOCK CID/port |
| to read them from. This way the data doesn't remain in the SMBIOS blob during |
| runtime, but only in the credentials fs. |
| |
| * machined: optionally track nspawn unix-export/ runtime for each machine, and |
| then update systemd-ssh-proxy so that it can connect to that. |
| |
| * add a new ExecStart= flag that inserts the configured user's shell as first |
| word in the command line. (maybe use character '.'). Usecase: tool such as |
| run0 can use that to spawn the target user's default shell. |
| |
| * introduce mntid_t, and make it 64bit, as apparently the kernel switched to |
| 64bit mount ids |
| |
| * mountfsd/nsresourced |
| - userdb: maybe allow callers to map one uid to their own uid |
| - bpflsm: allow writes if resulting UID on disk would be userns' owner UID |
| - make encrypted DDIs work (password…) |
| - add API for creating a new file system from scratch (together with some |
| dm-integrity/HMAC key). Should probably work using systemd-repart (access |
| via varlink). |
| - add api to make an existing file "trusted" via dm-integrity/HMAC key |
| - port: portabled |
| - port: tmpfiles, sysusers and similar |
| - let's see if we can make runtime bind mounts into unpriv nspawn work |
| |
| * add a kernel cmdline switch (and cred?) for marking a system to be |
| "headless", in which case we never open /dev/console for reading, only for |
| writing. This would then mean: systemd-firstboot would process creds but not |
| ask interactively, getty would not be started and so on. |
| |
| * cryptsetup: new crypttab option to auto-grow a luks device to its backing |
| partition size. new crypttab option to reencrypt a luks device with a new |
| volume key. |
| |
| * we probably should have some infrastructure to acquire sysexts with |
| drivers/firmware for local hardware automatically. Idea: reuse the modalias |
| logic of the kernel for this: make the main OS image install a hwdb file |
| that matches against local modalias strings, and adds properties to relevant |
| devices listing names of sysexts needed to support the hw. Then provide some |
| tool that goes through all devices and tries to acquire/download the |
| specified images. |
| |
| * repart + cryptsetup: support file systems that are encrypted and use verity |
| on top. Usecase: confexts that shall be signed by the admin but also be |
| confidential. Then, add a new --make-ddi=confext-encrypted for this. |
| |
| * tmpfiles: add new line type for moving files from some source dir to some |
| target dir. then use that to move sysexts/confexts and stuff from initrd |
| tmpfs to /run/, so that host can pick things up. |
| |
| * tiny varlink service that takes a fd passed in and serves it via http. Then |
| make use of that in networkd, and expose some EFI binary of choice for |
| DHCP/HTTP based EFI boot. |
| |
| * bootctl: add reboot-to-disk which takes a block device name, and |
| automatically sets things up so that system reboots into that device next. |
| |
| * maybe: in PID1, when we detect we run in an initrd, make superblock read-only |
| early on, but provide opt-out via kernel cmdline. |
| |
| * systemd-pcrextend: |
| - support measuring to nvindex with PCR update semantics ("fake PCRs") |
| - add api for "allocating" such an nvindex |
| - once we have that start measuring every sysext we apply, every confext, |
| every RootImage= we apply, every nspawn and so on. All in separate fake |
| PCRs. |
| |
| * vmspawn: |
| - run in scope unit when invoked from command line, and machined registration is off |
| - sd_notify support |
| - --ephemeral support |
| - --read-only support |
| - automatically suspend/resume the VM if the host suspends. Use logind |
| suspend inhibitor to implement this. request clean suspend by generating |
| suspend key presses. |
| - support for "real" networking via "-n" and --network-bridge= |
| - translate SIGTERM to clean ACPI shutdown event |
| |
| * systemd-pcrmachine should probably also measure the SMBIOS system UUID. |
| |
| * sd-boot: allow synthesizing additional type1 entries via SMBIOS vendor strings |
| |
| * storagetm: |
| - add USB mass storage device logic, so that all local disks are also exposed |
| as mass storage devices on systems that have a USB controller that can |
| operate in device mode |
| - add NVMe authentication |
| |
| * add support for activating nvme-oF devices at boot automatically via kernel |
| cmdline, and maybe even support a syntax such as |
| root=nvme:<trtype>:<traddr>:<trsvcid>:<nqn>:<partition> to boot directly from |
| nvme-oF |
| |
| * pcrlock: |
| - add kernel-install plugin that automatically creates UKI .pcrlock file when |
| UKI is installed, and removes it when it is removed again |
| - automatically install PE measurement of sd-boot on "bootctl install" |
| - pre-calc sysext + kernel cmdline measurements |
| - pre-calc cryptsetup root key measurement |
| - maybe make systemd-repart generate .pcrlock for old and new GPT header in |
| /run? |
| - Add support for more than 8 branches per PCR OR |
| - add "systemd-pcrlock lock-kernel-current" or so which synthesizes .pcrlock |
| policy from currently booted kernel/event log, to close gap for first boot |
| for pre-built images |
| |
| * in sd-boot and sd-stub measure the SMBIOS vendor strings to some PCR (at |
| least some subset of them that look like systemd stuff), because apparently |
| some firmware does not, but systemd honours it. avoid duplicate measurement |
| by sd-boot and sd-stub by adding LoaderFeatures/StubFeatures flag for this, |
| so that sd-stub can avoid it if sd-boot already did it. |
| |
| * cryptsetup: a mechanism that allows signing a volume key with some key that |
| has to be present in the kernel keyring, or similar, to ensure that confext |
| DDIs can be encrypted against the local SRK but signed with the admin's key |
| and thus can be authenticated locally before they are decrypted. |
| |
| * image policy should be extended to allow dictating *how* a disk is unlocked, |
| i.e. root=encrypted-tpm2+encrypted-fido2 would mean "root fs must be |
| encrypted and unlocked via fido2 or tpm2, but not otherwise" |
| |
| * systemd-repart: add support for formatting dm-crypt + dm-integrity file |
| systems. |
| |
| * homed: use systemd-storagetm to expose home dirs via nvme-tcp. Then, |
| teach homed/pam_systemd_homed with a user name such as |
| lennart%nvme_tcp_192.168.100.77_8787 to log in from any linux host with the |
| same home dir. Similar maybe for nbd, iscsi? this should then first ask for |
| the local root pw, to authenticate that logging in like this is ok, and would |
| then be followed by another password prompt asking for the user's own |
| password. Also, do something similar for CIFS: if you log in via |
| lennart%cifs-someserver_someshare, then set up the homed dir for it |
| automatically. The PAM module should update the user name used for login to |
| the short version once it set up the user. Some care should be taken, so that |
| the long version can still be resolved via NSS afterwards, to deal with |
| PAM clients that do not support PAM sessions where PAM_USER changes half-way. |
| |
| * redefine /var/lib/extensions/ as the dir one can place all three of sysext, |
| confext as well as multi-modal DDIs that qualify as both. Then introduce |
| /var/lib/sysexts/ which can be used to place only DDIs that shall be used as |
| sysext |
| |
| * Varlinkification of the following command line tools, to open them up to |
| other programs via IPC: |
| - bootctl |
| - journalctl (allowing journal read access via IPC) |
| - coredumpctl |
| - systemd-bless-boot |
| - systemd-measure |
| - systemd-cryptenroll (to allow UIs to enroll FIDO2 keys and such) |
| - systemd-dissect |
| - systemd-sysupdate |
| - systemd-analyze |
| - kernel-install |
| - systemd-mount (with PK so that desktop environments could use it to mount disks) |
| |
| * enumerate virtiofs devices during boot-up in a generator, and synthesize |
| mounts for rootfs, /usr/, /home/, /srv/ and some others from it, depending on |
| the "tag". (waits for: https://gitlab.com/virtio-fs/virtiofsd/-/issues/128) |
| |
| * automatically mount one virtiofs during early boot phase to /run/host/, |
| similar to how we do that for nspawn, based on some clear tag. |
| |
| * add some service that makes an atomic snapshot of PCR state and event log up |
| to that point available, possibly even with quote by the TPM. |
| |
| * encode type1 entries in some UKI section to add additional entries to the |
| menu. |
| |
| * Add ACL-based access management to .socket units. i.e. add AllowPeerUser= + |
| AllowPeerGroup= that installs additional user/group ACL entries on AF_UNIX |
| sockets. |
| |
| * systemd-tpm2-setup should probably have a factory reset logic, i.e. when some |
| kernel command line option is set we reset the TPM (equivalent of tpm2_clear |
| -c owner? or rather echo 5 >/sys/class/tpm/tpm0/ppi/request?). |
| |
| * systemd-tpm2-setup should support a mode where we refuse booting if the SRK |
| changed. (Must be opt-in, to not break systems which are supposed to be |
| migratable between PCs) |
| |
| * when systemd-sysext learns mutable /usr/ (and systemd-confext mutable /etc/) |
| then allow them to store the result in a .v/ versioned subdir, for some basic |
| snapshot logic |
| |
| * add a new PE binary section ".mokkeys" or so which sd-stub will insert into |
| Mok keyring, by overriding/extending whatever shim sets in the EFI |
| var. Benefit: we can extend the kernel module keyring at ukify time, |
| i.e. without recompiling the kernel, taking an upstream OS' kernel and adding |
| a local key to it. |
| |
| * PidRef conversion work: |
| - cg_pid_get_xyz() |
| - pid_from_same_root_fs() |
| - get_ctty_devnr() |
| - actually wait for POLLIN on pidref's pidfd in service logic |
| - openpt_allocate_in_namespace() |
| - unit_attach_pid_to_cgroup_via_bus() |
| - cg_attach() – requires new kernel feature |
| - journald's process cache |
| |
| * ddi must be listed as block device fstype |
| |
| * measure some string via pcrphase whenever we end up booting into emergency |
| mode. |
| |
| * similar, measure some string via pcrphase whenever we resume from hibernate |
| |
| * homed: add a basic form of secrets management to homed, that stores |
| secrets in $HOME somewhere, and is protected by the account's own authentication |
| mechanisms. Should implement something PKCS#11-like that can be used to |
| implement emulated FIDO2 in unpriv userspace on top (which should happen |
| outside of homed), emulated PKCS11, and libsecrets support. Operate with a |
| 2nd key derived from volume key of the user, with which to wrap all |
| keys. maintain keys in kernel keyring if possible. |
| |
| * use sd-event ratelimit feature optionally for journal stream clients that log |
| too much |
| |
| * systemd-mount should only consider modern file systems when mounting, similar |
| to systemd-dissect |
| |
| * add another PE section ".fname" or so that encodes the intended filename for |
| PE file, and validate that when loading add-ons and similar before using |
| it. This is particularly relevant when we load multiple add-ons and want to |
| sort them to apply them in a defined order. The order should not be under |
| control of the attacker. |
| |
| * also include packaging metadata (á la |
| https://systemd.io/ELF_PACKAGE_METADATA/) in our UEFI PE binaries, using the |
| same JSON format. |
| |
| * make "bootctl install" + "bootctl update" useful for installing shim too. For |
| that introduce new dir /usr/lib/systemd/efi/extra/ which we copy mostly 1:1 |
| into the ESP at install time. Then make the logic smart enough so that we |
| don't overwrite bootx64.efi with our own if the extra tree already contains |
| one. Also, follow symlinks when copying, so that shim rpm can symlink their |
| stuff into our dir (which is safe since the target ESP is generally VFAT and |
| thus does not have symlinks anyway). Later, teach the update logic to look at |
| the ELF package metadata (which we also should include in all PE files, see |
| above) for version info in all *.EFI files, and use it to only update if |
| newer. |
| |
| * in sd-stub: optionally add support for a new PE section .keyring or so that |
| contains additional certificates to include in the Mok keyring, extending |
| what shim might have placed there. why? let's say I use "ukify" to build + |
| sign my own fedora-based UKIs, and only enroll my personal lennart key via |
| shim. Then, I want to include the fedora keyring in it, so that kmods work. |
| But I might not want to enroll the fedora key in shim, because this would |
| also mean that the key would be in effect whenever I boot an archlinux UKI |
| built the same way, signed with the same lennart key. |
| |
| * resolved: take possession of some IPv6 ULA address (let's say |
| fd00:5353:5353:5353:5353:5353:5353:5353), and listen on port 53 on it for the |
| local stubs, so that we can make the stub available via ipv6 too. |
| |
| * Maybe add SwitchRootEx() as new bus call that takes env vars to set for new |
| PID 1 as argument. When adding SwitchRootEx() we should maybe also add a |
| flags param that allows disabling and enabling whether serialization is |
| requested during switch root. |
| |
| * introduce a .acpitable section for early ACPI table override |
| |
| * add proper .osrel matching for PE addons. i.e. refuse applying an addon |
| intended for a different OS. Take inspiration from how confext/sysext are |
| matched against OS. |
| |
| * figure out what to do about credentials sealed to PCRs in kexec + soft-reboot |
| scenarios. Maybe insist sealing is done additionally against some keypair in |
| the TPM to which access is updated on each boot, for the next, or so? |
| |
| * logind: when logging in, always take an fd to the home dir, to keep the dir |
| busy, so that autofs release can never happen. (this is generally a good |
| idea, and specifically works around the fact that autofs ignores busy by mount |
| namespaces) |
| |
| * mount most file systems with a restrictive uidmap. e.g. mount /usr/ with a |
| uidmap that blocks out anything outside 0…1000 (i.e. system users) and similar. |
| |
| * mount the root fs with MS_NOSUID by default, and then mount /usr/ without |
| both so that suid executables can only be placed there. Do this already in |
| the initrd. If /usr/ is not split out create a bind mount automatically. |
| |
| * fix our various hwdb lookup keys to end with ":" again. The original idea was |
| that hwdb patterns can match arbitrary fields with expressions like |
| "*:foobar:*", to wildcard match both the start and the end of the string. |
| This only works safely for later extensions of the string if the strings |
| always end in a colon. This requires updating our udev rules, as well as |
| checking if the various hwdb files are fine with that. |
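| |
| A minimal sketch of why the trailing colon matters, using Python's fnmatch |
| as a stand-in for the glob matching done on hwdb keys. The keys and the |
| pattern below are made up for illustration and are not real modalias |
| strings. |

```python
from fnmatch import fnmatchcase

# Hypothetical pattern, in the "*:foobar:*" style described above.
PATTERN = "*:foobar:*"


def matches(key: str) -> bool:
    """Glob-match a lookup key against PATTERN (case-sensitive)."""
    return fnmatchcase(key, PATTERN)


# "foobar" is the last field. Without a trailing colon the pattern misses it.
print(matches("bus:vendor:foobar"))         # False
print(matches("bus:vendor:foobar:"))        # True
# A later extension appends another field; the pattern keeps matching.
print(matches("bus:vendor:foobar:extra:"))  # True
```

| With colon-terminated keys the same pattern matches a field regardless of |
| whether it is currently the last one or gets followed by new fields later. |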
| |
| * mount /tmp/ and /var/tmp with a uidmap applied that blocks out "nobody" user |
| among other things such as dynamic uid ranges for containers and so on. That |
| way no one can create files there with these uids and we enforce they are only |
| used transiently, never persistently. |
| |
| * rework loopback support in fstab: when "loop" option is used, then |
| instantiate a new systemd-loop@.service for the source path, set the |
| lo_file_name field for it to something recognizable derived from the fstab |
| line, and then generate a mount unit for it using a udev generated symlink |
| based on lo_file_name. |
| |
| * teach systemd-nspawn the boot assessment logic: hook up vpick's try counters |
| with success notifications from nspawn payloads. When this is enabled, |
| automatically support reverting back to older OS version images if newer ones |
| fail to boot. |
| |
| * implement new "systemd-fsrebind" tool that works like gpt-auto-generator but |
| looks at a root dir and then applies vpick on various dirs/images to pick a |
| root tree, a /usr/ tree, a /home/, a /srv/, a /var/ tree and so on. Dirs |
| could also be btrfs subvols (combine with btrfs auto-snapshot approach for |
| creating versions like these automatically). |
| |
| * remove tomoyo support, it's obsolete and unmaintained apparently |
| |
| * In .socket units, add ConnectStream=, ConnectDatagram=, |
| ConnectSequentialPacket= that create a socket, and then *connect to* rather than |
| listen on some socket. Then, add a new setting WriteData= that takes some |
| base64 data that systemd will write into the socket early on. This can then |
| be used to create connections to arbitrary services and issue requests into |
| them, as long as the data is static. This can then be combined with the |
| aforementioned journald subscription varlink service, to enable |
| activation-by-message id and similar. |
| |
| * .service with invalid Sockets= starts successfully. |
| |
| * landlock: lock down RuntimeDirectory= via landlock, so that services lose |
| ability to write anywhere else below /run/. Similar for |
| StateDirectory=. Benefit would be clear delegation via unit files: services |
| get the directories they get, and nothing else even if they wanted to. |
| |
| * landlock: for unprivileged systemd (i.e. systemd --user), use landlock to |
| implement ProtectSystem=, ProtectHome= and so on. Landlock does not require |
| privs, and we can implement pretty similar behaviour. Also, maybe add a mode |
| where ProtectSystem= combined with an explicit PrivateMounts=no could request |
| similar behaviour for system services, too. |
| |
| * Add systemd-mount@.service which is instantiated for a block device and |
| invokes systemd-mount and exits. This is then useful to use in |
| ENV{SYSTEMD_WANTS} in udev rules, and a bit prettier than using RUN+= |
| |
| * udevd: extend memory pressure logic: also kill any idle worker processes |
| |
| * udevadm: to make symlink querying with udevadm nicer: |
| - do not enable the pager for queries like 'udevadm info -q symlink -r' |
| - add mode with newlines instead of spaces (for grep)? |
| |
| * SIGRTMIN+18 and memory pressure handling should still be added to: hostnamed, |
| localed, oomd, timedated. |
| |
| * repart/gpt-auto/DDIs: maybe introduce a concept of "extension" partitions, |
| that have a new type uuid and can "extend" earlier partitions, to work around |
| the fact that systemd-repart can only grow the last partition defined. During |
| activation we'd simply set up a dm-linear mapping to merge them again. A |
| partition that is to be extended would just set a bit in the partition flags |
| field to indicate that there's another extension partition to look for. The |
| identifying UUID of the extension partition would be hashed in counter mode |
| from the uuid of the original partition it extends. Inspiration for this is |
| the "dynamic partitions" concept of new Android. This would be a minimalistic |
| concept of a volume manager, with the extents it manages being exposed as GPT |
| partitions. If a partition is extended multiple times they should probably |
| grow exponentially in size to ensure O(log(n)) time for finding them on |
| access. |
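| |
| A rough sketch of the two ideas above: deriving the extension partition |
| UUID from the original one "in counter mode", and doubling the size of each |
| further extension. The HMAC-SHA256 construction, the counter encoding and |
| both function names are assumptions made for illustration only. |

```python
import hashlib
import hmac
import uuid


def derive_extension_uuid(base: uuid.UUID, counter: int) -> uuid.UUID:
    """Derive the UUID of the counter-th extension of partition `base`.

    Assumed construction: HMAC-SHA256 keyed with the base UUID over the
    counter, truncated to 128 bits and tagged as a version 4 UUID.
    """
    digest = hmac.new(base.bytes, counter.to_bytes(8, "little"),
                      hashlib.sha256).digest()
    return uuid.UUID(bytes=digest[:16], version=4)


def extensions_needed(first_size: int, extra_bytes: int) -> int:
    """Count the extensions needed to add `extra_bytes` if sizes double."""
    if first_size <= 0:
        raise ValueError("first_size must be positive")
    count, total, size = 0, 0, first_size
    while total < extra_bytes:
        total += size
        size *= 2
        count += 1
    return count


base = uuid.UUID("0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0")  # made-up UUID
print(derive_extension_uuid(base, 1))
print(extensions_needed(1, 1000))
```

| Because the sizes double, the number of extension partitions to look up |
| grows only logarithmically with the total amount of space added. |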
| |
| * Make nspawn a frontend for systemd-executor, so that we have two ways into |
| the executor: via unit files/dbus/varlink through PID1 and via cmdline/OCI |
| through nspawn. |
| |
| * sd-stub: detect if we are running with uefi console output on serial, and if so |
| automatically add console= to kernel cmdline matching the same port. |
| |
| * add a utility that can be used with the kernel's |
| CONFIG_STATIC_USERMODEHELPER_PATH and then handles them within pid1 so that |
| security, resource management and cgroup settings can be enforced properly |
| for all umh processes. |
| |
| * homed: when resizing an fs don't sync identity beforehand, as there might simply |
| not be enough disk space for that. try to be defensive and sync only after |
| resize. |
| |
| * homed: if for some reason the partition ended up being much smaller than |
| whole disk, recover from that, and grow it again. |
| |
| * timesyncd: when saving/restoring clock try to take boot time into account. |
| Specifically, along with the saved clock, store the current boot ID. When |
| starting, check if the boot id matches. If so, don't do anything (we are on |
| the same boot and clock just kept running anyway). If not, then read |
| CLOCK_BOOTTIME (which started at boot), and add it to the saved clock |
| timestamp, to compensate for the time we spent booting. If EFI timestamps are |
| available, also include that in the calculation. With this we'll then only |
| miss the time spent during shutdown after timesync stopped and before the |
| system actually reset. |
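| |
| The restore logic described above could look roughly like this. The |
| function and parameter names are hypothetical, and how the saved clock and |
| boot ID are stored is left open. |

```python
import time
from typing import Optional


def restored_clock_usec(saved_clock_usec: int, saved_boot_id: str,
                        current_boot_id: str, boottime_usec: int,
                        efi_usec: int = 0) -> Optional[int]:
    """Return the clock value to restore, or None if nothing should be done.

    efi_usec is the (optional) time spent in firmware and boot loader, as
    reported by EFI timestamps where available.
    """
    if saved_boot_id == current_boot_id:
        # Same boot: the clock simply kept running, leave it alone.
        return None
    # New boot: compensate for the time that passed since this boot began.
    return saved_clock_usec + boottime_usec + efi_usec


def current_boot_state() -> tuple:
    """Read the boot ID and CLOCK_BOOTTIME in microseconds (Linux only)."""
    with open("/proc/sys/kernel/random/boot_id") as f:
        boot_id = f.read().strip()
    return boot_id, time.clock_gettime_ns(time.CLOCK_BOOTTIME) // 1000
```

| As noted above, the time between timesyncd stopping and the actual reset |
| of the previous boot is still not accounted for. |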
| |
| * systemd-stub: maybe store a "boot counter" in the ESP, and pass it down to |
| userspace to allow ordering boots (for example in journalctl). The counter |
| would be monotonically increased on every boot. |
| |
| * pam_systemd_home: add module parameter to control whether to accept |
| only password or only pkcs11/fido2 auth, and then use this to hook nicely |
| into two of the three PAM stacks gdm provides. |
| See discussion at https://github.com/authselect/authselect/pull/311 |
| |
| * sd-boot: make boot loader spec type #1 accept http urls in "linux" |
| lines. Then, do the uefi http dance to download kernels and boot them. This |
| is then useful for network boot, by embedding a cpio with type #1 snippets |
| in sd-boot, which reference remote kernels. |
| |
| * maybe prohibit setuid() to the nobody user, to lock things down, via seccomp. |
| nobody is not a user any code should run under, ever, as that user would |
| possibly get a lot of access to resources it really shouldn't be getting |
| access to due to the userns + nfs semantics of the user. Alternatively: use |
| the seccomp log action, and allow it. |
| |
| * sd-boot: add a new PE section .bls or so that carries a cpio with additional |
| boot loader entries (both type1 and type2). Then when initializing, find this |
| section, iterate through it and populate menu with it. cpio is simple enough |
| to make a parser for this reasonably robust. use same path structures as in |
| the ESP. Similarly, add one for signature key drop-ins. |
| |
| * sd-boot: also allow passing in the cpio as in the previous item via SMBIOS |
| |
| * add a new EFI tool "sd-fetch" or so. It looks in a PE section ".url" for a |
| URL, then downloads the file from it using UEFI HTTP APIs, and executes it. |
| Use case: provide a minimal ESP with sd-boot and a couple of these sd-fetch |
| binaries in place of UKIs, and download them on-the-fly. |
| |
| * maybe: systemd-loop-generator that sets up loopback devices if requested via kernel |
| cmdline. use case: include encrypted/verity root fs in UKI. |
| |
| * systemd-gpt-auto-generator: add kernel cmdline option to override block |
| device to dissect. also support dissecting a regular file. use case: include |
| encrypted/verity root fs in UKI. |
| |
| * sd-stub: add ".bootcfg" section for kernel bootconfig data (as per |
| https://docs.kernel.org/admin-guide/bootconfig.html) |
| |
| * tpm2: add (optional) support for generating a local signing key from PCR 15 |
| state. use private key part to sign PCR 7+14 policies. stash signatures for |
| expected PCR7+14 policies in EFI var. use public key part in disk encryption. |
| generate new sigs whenever db/dbx/mok/mokx gets updated. that way we can |
| securely bind against SecureBoot/shim state, without having to re-enroll |
| everything on each update (but we still have to generate one sig on each |
| update, but that should be robust/idempotent). needs rollback protection, as |
| usual. |
| |
| * Lennart: big blog story about DDIs |
| |
| * Lennart: big blog story about building initrds |
| |
| * Lennart: big blog story about "why systemd-boot" |
| |
| * bpf: see if we can use BPF to solve the syslog message cgroup source problem: |
| one idea would be to patch source sockaddr of all AF_UNIX/SOCK_DGRAM to |
| implicitly contain the source cgroup id. Another idea would be to patch |
| sendto()/connect()/sendmsg() sockaddr on-the-fly to use a different target |
| sockaddr. |
| |
| * bpf: see if we can address opportunistic inode sharing of immutable fs images |
| with BPF. i.e. if bpf gives us power to hook into openat() and return a |
| different inode than the one requested, which however has the same contents, |
| then we can use that to implement opportunistic inode sharing among DDIs: |
| make all DDIs ship xattr on all reg files with a SHA256 hash. Then, also |
| dictate that DDIs should come with a top-level subdir where all reg files are |
| linked into by their SHA256 sum. Then, whenever an inode is opened with the |
| xattr set, check bpf table to find dirs with hashes for other prior DDIs and |
| try to use inode from there. |
| |
| * extend the verity signature partition to permit multiple signatures for the |
| same root hash, so that people can sign a single image with multiple keys. |
| |
| * consider adding a new partition type, just for /opt/ for usage in system |
| extensions |
| |
| * gpt-auto-discovery: also use the pkcs7 signature stuff, and pass signature to |
| kernel. So far we only did this for the various --image= switches, but not |
| for the root fs or /usr/. |
| |
| * dissection policy should enforce that unlocking can only take place by |
| certain means, i.e. only via pw, only via tpm2, or only via fido, or a |
| combination thereof. |
| |
| * make the systemd-repart "seed" value provisionable via credentials, so that |
| confidential computing environments can set it and deterministically |
| enforce the uuids for partitions created, so that they can calculate PCR 15 |
| ahead of time. |
| |
| * systemd-repart: also derive the volume key from the seed value, for the |
| aforementioned purpose. |
| |
| * in the initrd: derive the default machine ID to pass to the host PID 1 via |
| $machine_id from the same seed credential. |
| |
| * Add systemd-sysupdate-initrd.service or so that runs systemd-sysupdate in the |
| initrd to bootstrap the initrd to populate the initial partitions. Some things |
| to figure out: |
| - Should it run on firstboot or on every boot? |
| - If run on every boot, should it use the sysupdate config from the host on |
| subsequent boots? |
| |
| * revisit default PCR bindings in cryptenroll and systemd-creds. Currently they |
| use PCR 7 which should contain secureboot state db/dbx. Which sounded like a |
| safe bet, given that it should change only on policy changes, and not |
| software updates. But that's wrong. Recent fwupd (rightfully) contains code |
| for updating the dbx denylist. This means even without any active policy |
| change PCR 7 might change. Hence, better idea might be in systemd-creds to |
| default to PCR 15 at least if sd-stub is used (i.e. bind to system identity), |
| and in cryptsetup simply the empty list? Also, PCR 14 almost certainly should |
| be included as much as PCR 7 (as it contains shim's policy, which is |
| certainly as relevant as PCR 7 on many systems) |
| |
| * To mimic the new tpm2-measure-pcr= crypttab option add the same to veritytab |
| (measuring the root hash) and integritytab (measuring the HMAC key if one is |
| used) |
| |
| * We should start measuring all services, containers, and system extensions we |
| activate. probably into PCR 13. i.e. add --tpm2-measure-pcr= or so to |
| systemd-nspawn, and MeasurePCR= to unit files. Should contain a measurement |
| of the activated configuration and the image that is being activated (in case |
| verity is used, hash of the root hash). |
| |
| * bootspec: permit graceful "update" from type #2 to type #1. If both a type #1 |
| and a type #2 entry exist under otherwise the exact same name, then use the |
| type #1 entry, and ignore the type #2 entry. This way, people can "upgrade" |
| from the UKI with all parameters baked in to a Type #1 .conf file with manual |
| parametrization, if needed. This matches our usual rule that admin config |
| should win over vendor defaults. |
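| |
| A small sketch of the proposed precedence rule, assuming entries are |
| identified by file name, with ".conf" for type #1 and ".efi" for type #2, |
| and that "the same name" means the name without that suffix. The function |
| name is made up for illustration. |

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable


def pick_entries(filenames: Iterable[str]) -> Dict[str, str]:
    """Map each entry name to the file that should be used for it.

    A type #1 entry (.conf) always wins over a type #2 entry (.efi) of the
    same name, independent of the order in which the files are found.
    """
    chosen: Dict[str, str] = {}
    for filename in filenames:
        path = PurePosixPath(filename)
        suffix = path.suffix.lower()
        if suffix not in (".conf", ".efi"):
            continue  # neither a type #1 nor a type #2 entry
        if suffix == ".conf" or path.stem not in chosen:
            chosen[path.stem] = filename
    return chosen


print(pick_entries(["fedora-6.8.0.efi", "fedora-6.8.0.conf", "arch.efi"]))
```

| Here the UKI "fedora-6.8.0.efi" is ignored in favour of the admin-provided |
| "fedora-6.8.0.conf", while "arch.efi" is used as is. |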
| |
| * write a "search path" spec, that documents the prefixes to search in |
| (i.e. the usual /etc/, /run/, /usr/lib/ dance, potentially /usr/etc/), how to |
| sort found entries, how masking works and overriding. |
| |
| * automatic boot assessment: add one more default success check that just waits |
| for a bit after boot, and blesses the boot if the system stayed up that long. |
| |
| * systemd-repart: add support for generating ISO9660 images |
| |
| * systemd-repart: in addition to the existing "factory reset" mode (which |
| simply empties existing partitions marked for that). add a mode where |
| partitions marked for it are entirely removed. Use case: remove secondary OS |
| copy, and redundant partitions entirely, and recreate them anew. |
| |
| * systemd-boot: maybe add support for collapsing menu entries of the same OS |
| into one item that can be opened (like in a "tree view" UI element) or |
| collapsed. If only a single OS is installed, disable this mode, but if |
| multiple OSes are installed might make sense to default to it, so that user |
| is not immediately bombarded with a multitude of Linux kernel versions but |
| only one for each OS. |
| |
| * systemd-repart: if the GPT *disk* UUID (i.e. the one global for the entire |
| disk) is set to all FFFFF then use this as trigger for factory reset, in |
| addition to the existing mechanisms via EFI variables and kernel command |
| line. Benefit: works also on non-EFI systems, and can be requested on one |
| boot, for the next. |
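| |
| The check itself would be tiny. A sketch, assuming "all FFFFF" means all |
| 128 bits of the disk UUID are set, and with a made-up function name: |

```python
import uuid

# Assumed trigger value: a disk UUID with every bit set.
ALL_FF = uuid.UUID(int=(1 << 128) - 1)


def factory_reset_requested(disk_uuid: str) -> bool:
    """Return True if the GPT disk UUID is the all-FF trigger value."""
    return uuid.UUID(disk_uuid) == ALL_FF


print(factory_reset_requested("ffffffff-ffff-ffff-ffff-ffffffffffff"))
print(factory_reset_requested("0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0"))
```

| Presumably the disk UUID would have to be set to a regular value again |
| once the reset has been carried out, so that it does not trigger twice. |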
| |
| * systemd-sysupdate: make transport pluggable, so people can plug casync or |
| similar behind it, instead of http. |
| |
| * systemd-tmpfiles: add concept for conditionalizing lines on factory reset |
| boot, or on first boot. |
| |
| * we probably need .pcrpkeyrd or so as additional PE section in UKIs, |
| which contains a separate public key for PCR values that only apply in the |
| initrd, i.e. in the boot phase "enter-initrd". Then, consumers in userspace |
| can easily bind resources to just the initrd. Similar, maybe one more for |
| "enter-initrd:leave-initrd" for resources that shall be accessible only |
| before unprivileged user code is allowed. (we only need this for .pcrpkey, |
| not for .pcrsig, since the latter is a list of signatures anyway). With that, |
| when you enroll a LUKS volume or similar, pick either the .pcrpkey (for |
| coverage through all phases of the boot, but excluding shutdown), the |
| .pcrpkeyrd (for coverage in the initrd only) or .pcrpkeybt (for coverage |
| until users are allowed to log in). |
| |
| * Once the root fs LUKS volume key is measured into PCR 15, default to binding |
| credentials to PCR 15 in "systemd-creds" |
| |
| * add support for asymmetric LUKS2 TPM based encryption. i.e. allow preparing |
| an encrypted image on some host given a public key belonging to a specific |
| other host, so that only hosts possessing the private key in the TPM2 chip |
| can decrypt the volume key and activate the volume. Use case: systemd-confext |
| for a central orchestrator to generate confext images securely that can only |
| be activated on one specific host (which can be used for installing a bunch |
| of creds in /etc/credstore/ for example). Extending on this: allow binding |
| LUKS2 TPM based encryption also to the TPM2 internal clock. Net result: |
| prepare a confext image that can only be activated on a specific host that |
| runs a specific software in a specific time window. confext would be |
| automatically invalidated outside of it. |
| |
| * maybe add a "systemd-report" tool, that generates a TPM2-backed "report" of |
| current system state, i.e. a combination of PCR information, local system |
| time and TPM clock, running services, recent high-priority log |
| messages/coredumps, system load/PSI, signed by the local TPM chip, to form an |
| enhanced remote attestation quote. Use case: a simple orchestrator could use |
| this: have the report tool upload these reports every 3min somewhere. Then |
| have the orchestrator collect these reports centrally over a 3min time |
| window, and use them to determine which node should now start/stop what, |
| and generate a small confext for each node, that uses Uphold= to pin services |
| on each node. The confext would be encrypted using the asymmetric encryption |
| proposed above, so that it can only be activated on the specific host, if the |
| software is in a good state, and within a specific time frame. Then run a |
| loop on each node that sends report to orchestrator and then sysupdate to |
| update confext. Orchestrator would be stateless, i.e. operate on desired |
| config and collected reports in the last 3min time window only, and thus can |
| be trivially scaled up since all instances of the orchestrator should come to |
| the same conclusions given the same inputs of reports/desired workload info. |
| Could also be used to deliver WireGuard secrets and such to clients, thus |
| permitting zero-trust networking: secrets are rolled over via confext updates, |
| and via the time window TPM logic invalidated if node doesn't keep itself |
| updated, or becomes corrupted in some way. |
| |
| * in the initrd, once the rootfs encryption key has been measured to PCR 15, |
| derive default machine ID to use from it, and pass it to host PID 1. |
| |
| * sd-boot: for each installed OS, grey out older entries (i.e. all but the |
| newest), to indicate they are obsolete |
| |
| * automatically propagate LUKS password credential into cryptsetup from host |
| (i.e. SMBIOS type #11, …), so that one can unlock LUKS via VM hypervisor |
| supplied password. |
| |
| * add ability to path_is_valid() to classify paths that refer to a dir from |
| those which may refer to anything, and use that in various places to filter |
| early. i.e. stuff ending in "/", "/." and "/.." definitely refers to a |
| directory, and paths ending that way can be refused early in many contexts. |
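| |
| The classification could be as simple as the following sketch. The function |
| name is hypothetical; it only covers the purely syntactic check and treats |
| a bare "." or ".." as referring to a directory as well. |

```python
def path_refers_to_dir(path: str) -> bool:
    """Return True if the path can only refer to a directory.

    This is decided from the string alone: a trailing slash, or a final
    component of "." or "..", can never name a regular file.
    """
    if not path:
        return False
    if path.endswith("/"):
        return True
    last = path.rsplit("/", 1)[-1]
    return last in (".", "..")


for p in ("/usr/lib/", "/usr/lib/.", "/usr/lib/..", "/usr/lib", "foo/..."):
    print(p, path_refers_to_dir(p))
```

| Callers that expect a regular file could then refuse such paths before |
| touching the file system at all. |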
| |
| * systemd-measure: add --pcrpkey-auto as an alternative to --pcrpkey=, where it |
| would just use the same public key specified with --public-key= (or the one |
| automatically derived from --private-key=). |
| |
| * Add "purpose" flag to partition flags in discoverable partition spec that |
| indicates if partition is intended for sysext, for portable service, for |
| booting and so on. Then, when dissecting DDI allow specifying a purpose to |
| use as additional search condition. Use case: images that combine a sysext |
| partition with a portable service partition in one. |
| |
| * On boot, auto-generate an asymmetric key pair from the TPM, |
| and use it for validating DDIs and credentials. Maybe upload it to the kernel |
| keyring, so that the kernel does this validation for us for verity and kernel |
| modules |
| |
| * lock down acceptable encrypted credentials at boot, via simple allowlist, |
| maybe on kernel command line: |
| systemd.import_encrypted_creds=foobar.waldo,tmpfiles.extra to protect locked |
| down kernels from credentials generated on the host with a weak kernel |
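| |
| A sketch of how such an allowlist could be parsed and applied. The option |
| name is the one proposed above and does not exist yet. Treating a missing |
| option as "no restriction" and the function names are assumptions made for |
| illustration. |

```python
from typing import FrozenSet, Optional

# Proposed (not yet existing) kernel command line option.
OPTION = "systemd.import_encrypted_creds="


def allowed_credentials(cmdline: str) -> Optional[FrozenSet[str]]:
    """Return the allowlist from the command line, or None if not set."""
    allowed = None
    for word in cmdline.split():
        if word.startswith(OPTION):
            names = word[len(OPTION):].split(",")
            allowed = frozenset(name for name in names if name)
    return allowed


def may_import(name: str, allowlist: Optional[FrozenSet[str]]) -> bool:
    """Decide whether the encrypted credential `name` may be imported."""
    return allowlist is None or name in allowlist


cmdline = "root=/dev/vda2 systemd.import_encrypted_creds=foobar.waldo,tmpfiles.extra quiet"
allowlist = allowed_credentials(cmdline)
print(may_import("tmpfiles.extra", allowlist))
print(may_import("ssh.authorized_keys.root", allowlist))
```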
| |
| * Merge systemd-creds options --uid= (which accepts user names) and --user. |
| |
| * Add support for extra verity configuration options to systemd-repart (FEC, |
| hash type, etc) |
| |
| * chase(): take inspiration from path_extract_filename() and return |
| O_DIRECTORY if input path contains trailing slash. |
| |
| * chase(): refuse resolution if trailing slash is specified on input, |
| but final node is not a directory |
| |
| * document in boot loader spec that symlinks in XBOOTLDR/ESP are not OK even if |
| non-VFAT fs is used. |
| |
| * measure credentials picked up from SMBIOS to some suitable PCR |
| |
| * measure GPT and LUKS headers somewhere when we use them (i.e. in |
| systemd-gpt-auto-generator/systemd-repart and in systemd-cryptsetup?) |
| |
| * pick up creds from EFI vars |
| |
| * Add and pickup tpm2 metadata for creds structure. |
| |
| * sd-boot: we probably should include all BootXY EFI variable defined boot |
| entries in our menu, and then suppress ourselves. Benefit: instant |
| compatibility with all other OSes which register things there, in particular |
| on other disks. Always boot into them via NextBoot EFI variable, to not |
| affect PCR values. |
| |
| * systemd-measure tool: |
| - pre-calculate PCR 12 (command line) + PCR 13 (sysext) the same way we can precalculate PCR 11 |
| |
| * in sd-boot: load EFI drivers from a new PE section. That way, one can have a |
| "supercharged" sd-boot binary, that could carry ext4 drivers built-in. |
| |
| * sd-device: add an API for acquiring list of child devices, given a device |
| object (i.e. all child dirents that are dirs or symlinks to dirs) |
| |
| * sd-device: maybe pin the sysfs dir with an fd, during the entire runtime of |
| an sd_device, then always work based on that. |
| |
| * maybe add new flags to gpt partition tables for rootfs and usrfs indicating |
| purpose, i.e. whether something is supposed to be bootable in a VM, on |
| baremetal, on an nspawn-style container, if it is a portable service image, |
| or a sysext for initrd, for host os, or for portable container. Then hook |
| portabled/… up to udev to watch block devices coming up with the flags set, and |
| use it. |
| |
| * sd-boot should look for information what to boot in SMBIOS, too, so that VM |
| managers can tell sd-boot what to boot into and suchlike |
| |
| * add "systemd-sysext identify" verb, that you can point at any file in /usr/ |
| and that determines from which overlayfs layer it originates, which image, and with |
| what it was signed. |
| |
| * systemd-creds: extend encryption logic to support asymmetric |
| encryption/authentication. Idea: add new verb "systemd-creds public-key" |
| which generates a priv/pub key pair on the TPM2 and stores the priv key |
| locally in /var. It then outputs a certificate for the pub part to stdout. |
| This can then be copied/taken elsewhere, and can be used for encrypting creds |
| that only the host on its specific hw can decrypt. Then, support a drop-in |
| dir with certificates that can be used to authenticate credentials. Flow of |
| operations is then this: build image with owner certificate, then after |
| boot up issue "systemd-creds public-key" to acquire pubkey of the machine. |
| Then, when passing data to the machine, sign with privkey belonging to one of |
| the dropped in certs and encrypted with machine pubkey, and pass to machine. |
| Machine is then able to authenticate you, and confidentiality is guaranteed. |
| |
| * building on top of the above, the pub/priv key pair generated on the TPM2 |
| should probably also be one you can use to get a remote attestation quote. |
| |
| * Process credentials in: |
| • crypttab-generator: allow defining additional crypttab-like volumes via |
| credentials (similar: verity-generator, integrity-generator). Use |
| fstab-generator logic as inspiration. |
| • run-generator: allow defining additional commands to run via a credential |
| • resolved: allow defining additional /etc/hosts entries via a credential (it |
| might make sense to then synthesize a new combined /etc/hosts file in /run |
| and bind mount it on /etc/hosts for other clients that want to read it). |
| • repart: allow defining additional partitions via credential |
| • timesyncd: pick NTP server info from credential |
| • portabled: read a credential "portable.extra" or so, that takes a list of |
| file system paths to enable on start. |
| • make systemd-fstab-generator look for a system credential encoding root= or |
| usr= |
| • in gpt-auto-generator: check partition uuids against such uuids supplied via |
| sd-stub credentials. That way, we can support parallel OS installations with |
| pre-built kernels. |
| |
| * define a JSON format for units, separating out unit definitions from unit |
| runtime state. Then, expose it: |
| |
| 1. Add Describe() method to Unit D-Bus object that returns a JSON object |
| about the unit. |
| 2. Expose this natively via Varlink, in similar style |
| 3. Use it when invoking binaries (i.e. make PID 1 fork off systemd-executor |
| binary which reads the JSON definition and runs it), to address the cow |
| trap issue and the fact that NSS is actually forbidden in |
| forked-but-not-exec'ed children |
| 4. Add varlink API to run transient units based on provided JSON definitions |
| |
| * Add SUPPORT_END_URL= field to os-release with more *actionable* information |
| what to do if support ended |
| |
| * pam_systemd: on interactive logins, maybe show SUPPORT_END information at |
| login time, à la motd |
| |
| * sd-boot: instead of unconditionally deriving the ESP to search boot loader |
| spec entries in from the paths of sd-boot binary, let's optionally allow it |
| to be configured on sd-boot cmdline + efi var. Use case: embed sd-boot in the |
| UEFI firmware (for example, ovmf supports that via qemu cmdline option), and |
| use it to load stuff from the ESP. |
| |
| * mount /var/ from initrd, so that we can apply sysext and stuff before the |
| initrd transition. Specifically: |
| 1. There should be a var= kernel cmdline option, matching root= and usr= |
| 2. systemd-gpt-auto-generator should auto-mount /var if it finds it on disk |
| 3. mount.x-initrd mount option in fstab should be implied for /var |
| |
| * make persistent restarts easier by adding a new setting OpenPersistentFile= |
| or so, which allows opening one or more files that are "persistent" across |
| service restarts, hot reboot, cold reboots (depending on configuration): the |
| files are created empty on first invocation, and on subsequent invocations |
| the files are reopened. The files would be backed by tmpfs, pmem or /var |
| depending on desired level of persistency. |
| |
| * sd-event: add ability to "chain" event sources. Specifically, add a call |
| sd_event_source_chain(x, y), which will automatically enable event source y |
| in oneshot mode once x is triggered. Use case: in src/core/mount.c implement |
| the /proc/self/mountinfo rescan on SIGCHLD with this: whenever a SIGCHLD is |
| seen, trigger the rescan defer event source automatically, and allow it to be |
| dispatched *before* the SIGCHLD is handled (based on priorities). Benefit: |
| dispatch order is strictly controlled by priorities again. (next step: chain |
| event sources to the ratelimit being over) |
| |
| * if we fork off a service with StandardOutput=journal, and it forks off a |
| subprocess that quickly dies, we might not be able to identify the cgroup it |
| comes from, but we can still derive that from the stdin socket its output |
| came from. We apparently don't do that right now. |
| |
| * add ability to set hostname with suffix derived from machine id at boot |
| |
| * add PR_SET_DUMPABLE service setting |
| |
| * homed/userdb: maybe define a "companion" dir for home directories where apps |
| can safely put privileged stuff in. Would not be writable by the user, but |
| still conceptually belong to the user. Would be included in user's quota if |
| possible, even if files are not owned by UID of user. Use case: container |
| images that are owned by arbitrary UIDs, and are owned/managed by the users, but |
| are not directly belonging to the user's UID. Goal: we shouldn't place more |
| privileged dirs inside of unprivileged dirs, and thus containers really |
| should not be placed inside of traditional UNIX home dirs (which are owned by |
| users themselves) but somewhere else, that is separate, but still close |
| by. Inform user code about path to this companion dir via env var, so that |
| container managers find it. the ~/.identity file is also a candidate for a |
| file to move there, since it is managed by privileged code (i.e. homed) and |
| not unprivileged code. |
| |
| * maybe add support for binding and connecting AF_UNIX sockets in the file |
| system outside of the 108ch limit. When connecting, open O_PATH fd to socket |
| inode first, then connect to /proc/self/fd/XYZ. When binding, create symlink |
| to target dir in /tmp, and bind through it. |
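
A minimal sketch of the two tricks described above, assuming Linux with /proc mounted. The helper names `connect_long()` and `bind_long()` are hypothetical and not existing systemd APIs. Connecting goes through an O_PATH fd on the socket inode; binding goes through a short temporary symlink to the target directory.

```python
import os
import socket
import tempfile


def connect_long(sock, path):
    """Connect to an AF_UNIX socket whose path may not fit into sun_path."""
    # Pin the socket inode with O_PATH, then connect via the short /proc path.
    fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
    try:
        sock.connect(f"/proc/self/fd/{fd}")
    finally:
        os.close(fd)


def bind_long(sock, path):
    """Bind an AF_UNIX socket at a long path via a short symlink in /tmp."""
    target_dir, name = os.path.split(os.path.abspath(path))
    with tempfile.TemporaryDirectory(dir="/tmp") as tmp:
        link = os.path.join(tmp, "d")
        os.symlink(target_dir, link)
        # The kernel resolves the symlink; the socket inode lands in target_dir.
        sock.bind(os.path.join(link, name))
    # The symlink is gone now, so getsockname() reports a stale path.
```

Note that the address reported by getsockname() after `bind_long()` refers to the removed symlink, so callers that need the real path have to remember it themselves.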
| |
| * add a proper concept of a "developer" mode, i.e. where cryptographic |
| protections of the root OS are weakened after interactive confirmation, to |
| allow hackers to run their own stuff. idea: allow entering developer mode |
| only via explicit choice in boot menu: i.e. add explicit boot menu item for |
| it. When developer mode is entered, generate a key pair in the TPM2, and add |
| the public part of it automatically to keychain of valid code signature keys |
| on subsequent boots. Then provide a tool to sign code with the key in the |
| TPM2. Ensure that boot menu item is the only way to enter developer mode, by |
| binding it to locality/PCRs so that keys cannot be generated otherwise. |
| |
| * services: add support for cryptographically unlocking per-service directories |
| via TPM2. Specifically, for StateDirectory= (and related dirs) use fscrypt to |
| set up the directory so that it can only be accessed if host and app are in |
| order. |
| |
| * update HACKING.md to suggest developing systemd with the ideas from: |
| https://0pointer.net/blog/testing-my-system-code-in-usr-without-modifying-usr.html |
| https://0pointer.net/blog/running-an-container-off-the-host-usr.html |
| |
| * sd-event: combat wd reuse in inotify code: keep a set of removed watch |
| descriptors, and clear this set piecemeal when we see the IN_IGNORED event |
| for it, or when read() returns EAGAIN or on IN_Q_OVERFLOW. Then, whenever we |
| see an inotify wd event check against this set, and if it is contained ignore |
| the event. (to be fully correct this would have to count the occurrences, in |
| case the same wd is reused multiple times before we start processing |
| IN_IGNORED again) |
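
The bookkeeping described above can be sketched as a counting set. This is an illustrative model only: the class name `RemovedWatchSet` and its methods are hypothetical, and the real code would live in sd-event's C inotify handling. It counts removals per wd, as the parenthetical note suggests.

```python
from collections import Counter


class RemovedWatchSet:
    """Tracks watch descriptors we removed but whose IN_IGNORED is pending."""

    def __init__(self):
        self._removed = Counter()

    def mark_removed(self, wd):
        # Call this right after inotify_rm_watch(wd).
        self._removed[wd] += 1

    def should_ignore(self, wd, is_ignored_event):
        """Return True if an event for wd belongs to an already removed watch."""
        if self._removed.get(wd, 0) == 0:
            return False
        if is_ignored_event:
            # IN_IGNORED closes one removal; later events are for a new watch.
            self._removed[wd] -= 1
            if self._removed[wd] == 0:
                del self._removed[wd]
        return True

    def flush(self):
        # Call when read() returns EAGAIN or on IN_Q_OVERFLOW.
        self._removed.clear()
```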
| |
| * for vendor-built signed initrds: |
| - kernel-install should be able to install encrypted creds automatically for |
| machine id, root pw, rootfs uuid, resume partition uuid, and place next to |
| EFI kernel, for sd-stub to pick them up. These creds should be locked to |
| the TPM, and bound to the right PCR the kernel is measured to. |
| - kernel-install should be able to pick up initrd sysexts automatically and |
| place them next to EFI kernel, for sd-stub to pick them up. |
| - systemd-fstab-generator should look for rootfs device to mount in creds |
| - systemd-resume-generator should look for resume partition uuid in creds |
| - sd-stub: automatically pick up microcode from ESP (/loader/microcode/*) |
| and synthesize initrd from it, and measure it. Signing is not necessary, as |
| microcode does that on its own. Pass as first initrd to kernel. |
| |
| * Maybe extend the service protocol to support handling of some specific SIGRT |
| signal for setting service log level, that carries the level via the |
| sigqueue() data parameter. Enable this via unit file setting. |
| |
| * sd_notify/vsock: maybe support binding to AF_VSOCK in Type=notify services, |
| then passing $NOTIFY_SOCKET and $NOTIFY_GUESTCID with PID1's cid (typically |
| fixed to "2", i.e. the official host cid) and the expected guest cid, for the |
| two sides of the channel. The latter env var could then be used in an |
| appropriate qemu cmdline. That way qemu payloads could talk sd_notify() |
| directly to host service manager. |
| |
| * sd-device should return the devnum type (i.e. 'b' or 'c') via some API for an |
| sd_device object, so that data passed into sd_device_new_from_devnum() can |
| also be queried. |
| |
| * sd-event: optionally, if per-event source rate limit is hit, downgrade |
| priority, but leave enabled, and once ratelimit window is over, upgrade |
| priority again. That way we can combat event source starvation without |
| stopping processing events from one source entirely. |
| |
| * sd-event: similar to existing inotify support add fanotify support (given |
| that apparently new features in this area are only going to be added to the |
| latter). |
| |
| * sd-event: add 1st class event source for clock changes |
| |
| * sd-event: add 1st class event source for timezone changes |
| |
| * support uefi/http boots with sd-boot: instead of looking for dropin files in |
| /loader/entries/ dir, look for a file /loader/entries/SHA256SUMS and use that |
| as directory manifest. The file would be a standard directory listing as |
| generated by GNU sha256sum. |
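
A rough sketch of reading such a manifest, assuming the usual GNU coreutils checksum line format (64 hex digits, a space, a space or "*" mode marker, then the file name). The function name `parse_sha256sums()` is hypothetical, and rejecting names that contain "/" is an extra assumption so that the manifest can only name files in the entries directory itself.

```python
import re

# "<sha256 hex> <mode marker><name>", where the marker is " " (text) or "*" (binary)
_LINE = re.compile(r"^([0-9a-f]{64}) [ *](.+)$")


def parse_sha256sums(text):
    """Map file name -> hex digest for every well-formed manifest line."""
    entries = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # blank, escaped or otherwise unrecognized line
        digest, name = m.groups()
        if "/" in name:
            continue  # assumption: only plain names in the directory itself
        entries[name] = digest
    return entries
```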
| |
| * sd-boot: maybe add support for embedding the various auxiliary resources we |
| look for right in the sd-boot binary. i.e. take inspiration from sd-stub |
| logic: allow combining sd-boot via ukify with kernels to enumerate, .conf |
| files, drivers, keys to enroll and so on. Then, add whatever we find that way |
| to the menu. Use case: allow building a single PE image you can boot into via |
| UEFI HTTP boot. |
| |
| * maybe add a new UEFI stub binary "sd-http". It works similar to sd-stub, but |
| all it does is download a file from a http server, and execute it, after |
| optionally checking its hash sum. idea would be: combine this "sd-http" stub |
| binary with some minimal info about a URL + hash sum, plus .osrel data, and |
| drop it into the unified kernel dir in the ESP. And bam you have something |
| that is tiny, feels a lot like a unified kernel, but all it does is chainload |
| the real kernel. benefit: downloading these stubs would be tiny and quick, |
| hence cheap for enumeration. |
| |
| * sysext: measure all activated sysext into a TPM PCR |
| |
| * systemd-dissect: show available versions inside of a disk image, i.e. if |
| multiple versions are around of the same resource, show which ones. (in other |
| words: show partition labels). |
| |
| * systemd-dissect: add --cat switch for dumping files such as /etc/os-release |
| |
| * per-service sandboxing option: ProtectIds=. If used, will overmount |
| /etc/machine-id and /proc/sys/kernel/random/boot_id with synthetic files, to |
| make it harder for the service to identify the host. Depending on the user |
| setting it should be fully randomized at invocation time, or a hash of the |
| real thing, keyed by the unit name or so. Of course, there are other ways to |
| get these IDs (e.g. journal) or similar ids (e.g. MAC addresses, DMI ids, CPU |
| ids), so this knob would only be useful in combination with other lockdown |
| options. Particularly useful for portable services, and anything else that |
| uses RootDirectory= or RootImage=. (Might also over-mount |
| /sys/class/dmi/id/*{uuid,serial} with /dev/null). |
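
The two modes mentioned above (fully random, or a hash of the real ID keyed by the unit name) could look roughly like this. This is a sketch under assumptions: the function names are hypothetical, and HMAC-SHA256 with the real ID as key and the unit name as message is one possible reading of "a hash of the real thing, keyed by the unit name". The result is stamped as a v4 UUID so that it looks like an ordinary 128-bit ID.

```python
import hashlib
import hmac
import uuid


def synthetic_machine_id(real_id_hex, unit_name):
    """Derive a stable per-unit ID that does not reveal the real machine ID."""
    key = bytes.fromhex(real_id_hex)
    digest = hmac.new(key, unit_name.encode(), hashlib.sha256).digest()
    b = bytearray(digest[:16])
    b[6] = (b[6] & 0x0F) | 0x40  # UUID version 4
    b[8] = (b[8] & 0x3F) | 0x80  # RFC 4122 variant
    return b.hex()


def random_machine_id():
    """Fully randomized ID, regenerated at each invocation."""
    return uuid.uuid4().hex
```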
| |
| * doc: prep a document explaining resolved's internal objects, i.e. Query |
| vs. Question vs. Transaction vs. Stream and so on. |
| |
| * doc: prep a document explaining PID 1's internal logic, i.e. transactions, |
| jobs, units |
| |
| * automatically ignore threaded cgroups in cg_xyz(). |
| |
| * add linker script that implicitly adds symbol for build ID and new coredump |
| json package metadata, and use that when logging |
| |
| * Enable RestrictFileSystems= for all our long-running services (similar: |
| RestrictNetworkInterfaces=) |
| |
| * Add systemd-analyze security checks for RestrictFileSystems= and |
| RestrictNetworkInterfaces= |
| |
| * cryptsetup/homed: implement TOTP authentication backed by TPM2 and its |
| internal clock. |
| |
| * man: rework os-release(5), and clearly separate our extension-release.d/ and |
| initrd-release parts, i.e. list explicitly which fields are about what. |
| |
| * sysext: before applying a sysext, do a superficial validation run so that |
| things are not rearranged too wildly. I.e. protect against accidental fuckups, |
| such as masking out /usr/lib/ or so. We should probably refuse if existing |
| inodes are replaced by other types of inodes or so. |
| |
| * userdb: when synthesizing NSS records, pick "best" password from defined |
| passwords, not just the first. i.e. if there are multiple defined, prefer |
| unlocked over locked and prefer non-empty over empty. |
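
A small sketch of the selection rule, assuming the usual shadow convention that a hashed password starting with "!" or "*" is locked and that an empty string means no password is required. The function name `pick_best_password()` is hypothetical, and ranking "unlocked" above "non-empty" when the two conflict is an assumption, since the item does not say which criterion wins.

```python
def pick_best_password(hashed_passwords):
    """Return the preferred entry from a list of hashed passwords, or None."""

    def rank(pw):
        locked = pw.startswith(("!", "*"))
        empty = pw == ""
        # Tuples sort False before True: unlocked first, then non-empty.
        return (locked, empty)

    if not hashed_passwords:
        return None
    # min() returns the first of equally ranked entries, keeping list order.
    return min(hashed_passwords, key=rank)
```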
| |
| * homed: if the homed shell fallback thing has access to an SSH agent, try to |
| use it to unlock home dir (if ssh-agent forwarding is enabled). We |
| could implement SSH unlocking of a homedir with that: when enrolling a new |
| ssh pubkey in a user record we'd ask the ssh-agent to sign some random value |
| with the privkey, then use that as luks key to unlock the home dir. Will not |
| work for ECDSA keys since their signatures contain a random component, but |
| will work for RSA and Ed25519 keys. |
| |
| * add tiny service that decrypts encrypted user records passed via initrd |
| credential logic and drops them into /run where nss-systemd can pick them up, |
| similar to /run/host/userdb/. Use case: drop a root user JSON record there, |
| and use it in the initrd to log in as root with locally selected password, |
| for debugging purposes. Other use case: boot into qemu with regular user |
| mounted from host. maybe put this in systemd-user-sessions.service? |
| |
| * drop dependency on libcap, replace by direct syscalls based on |
| CapabilityQuintet we already have. (This likely allows us to drop libcap |
| dep in the base OS image) |
| |
| * userdbd: implement an additional varlink service socket that provides the |
| host user db in restricted form, then allow this to be bind mounted into |
| sandboxed environments that want the host database in minimal form. All |
| records would be stripped of all meta info, except the basic UID/name |
| info. Then use this in portabled environments that do not use PrivateUsers=1. |
| |
| * portabled: when extracting unit files and copying to system.attached, if a |
| .p7s is available in the image, use it to protect the system.attached copy |
| with fs-verity, so that it cannot be tampered with |
| |
| * /etc/veritytab: allow that the roothash column can be specified as fs path |
| including a path to an AF_UNIX path, similar to how we do things with the |
| keys of /etc/crypttab. That way people can store/provide the roothash |
| externally and provide to us on demand only. |
| |
| * we probably should extend the root verity hash of the root fs into some PCR |
| on boot. (i.e. maybe add a veritytab option tpm2-measure=12 or so to measure |
| it into PCR 12); Similar: we probably should extend the LUKS volume key of |
| the root fs into some PCR on boot. (i.e. maybe add a crypttab option |
| tpm2-measure=15 or so to measure it into PCR 15); once both are in place |
| update gpt-auto-discovery to generate these by default for the partitions it |
| discovers. Static vendor stuff should probably end up in PCR 12 (i.e. the |
| verity hash), with local keys in PCR 15 (i.e. the encryption volume |
| key). That way, we nicely distinguish resources supplied by the OS vendor |
| (i.e. sysext, root verity) from those inherently local (i.e. encryption key), |
| which is useful if they shall be signed separately. |
| |
| * in uefi stub: query firmware regarding which PCR banks are being used, store |
| that in EFI var. then use this when enrolling TPM2 in cryptsetup to verify |
| that the selected PCRs actually are used by firmware. |
| |
| * rework recursive read-only remount to use new mount API |
| |
| * PAM: pick up authentication token from credentials |
| |
| * when mounting disk images: if IMAGE_ID/IMAGE_VERSION is set in os-release |
| data in the image, make sure the image filename actually matches this, so |
| that images cannot be misused. |
| |
| * New udev block device symlink names: |
| /dev/disk/by-parttypelabel/<pttype>-<ptlabel>. Use case: if pt label is used |
| as partition image version string, this is a safe way to reference a specific |
| version of a specific partition type, in particular where related partitions |
| are processed (e.g. verity + rootfs both named "LennartOS_0.7"). |
| |
| * sysupdate: |
| - add fuzzing to the pattern parser |
| - support casync as download mechanism |
| - "systemd-sysupdate update --all" support, that iterates through all components |
| defined on the host, plus all images installed into /var/lib/machines/, |
| /var/lib/portable/ and so on. |
| - Allow invocation with a single transfer definition, i.e. with |
| --definitions= pointing to a file rather than a dir. |
| - add ability to disable implicit decompression of downloaded artifacts, |
| i.e. a Compress=no option in the transfer definitions |
| |
| * in sd-id128: also parse UUIDs in RFC4122 URN syntax (i.e. chop off urn:uuid: prefix) |
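
For illustration, the parsing rule could be sketched like this. The function name `parse_id128()` is hypothetical, and the real parser is C code in sd-id128. It accepts the plain 32-digit form, the dashed UUID form, and either of them behind a case-insensitive "urn:uuid:" prefix.

```python
import string

_HEX = set(string.hexdigits)


def parse_id128(s):
    """Parse a 128-bit ID given as plain hex, dashed UUID or RFC 4122 URN."""
    prefix = "urn:uuid:"
    if s[:len(prefix)].lower() == prefix:
        s = s[len(prefix):]  # chop off the URN prefix
    if len(s) == 36 and all(s[i] == "-" for i in (8, 13, 18, 23)):
        s = s.replace("-", "")
    if len(s) != 32 or not all(c in _HEX for c in s):
        raise ValueError("not a valid 128-bit ID")
    return bytes.fromhex(s)
```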
| |
| * systemd-sysext: optionally, run it in initrd already, before transitioning |
| into host, to open up possibility for services shipped like that. |
| |
| * introduce /dev/disk/root/* symlinks that allow referencing partitions on the |
| disk the rootfs is on in a reasonably secure way. (or maybe: add |
| /dev/gpt-auto-{home,srv,boot,…} similar in style to /dev/gpt-auto-root as we |
| already have it). |
| |
| * whenever we receive fds via SCM_RIGHTS make sure none got dropped due to the |
| reception limit the kernel silently enforces. |
| |
| * Add service unit setting ConnectStream= which takes IP addresses and connects to them. |
| |
| * Similar, Load= which takes literal data in text or base64 format, and puts it |
| into a memfd, and passes that. This enables some fun stuff, such as embedding |
| bash scripts in unit files, by combining Load= with ExecStart=/bin/bash |
| /proc/self/fd/3 |
| |
| * add a ConnectSocket= setting to service unit files, that may reference a |
| socket unit, and which will connect to the socket defined therein, and pass |
| the resulting fd to the service program via socket activation proto. |
| |
| * Add a concept of ListenStream=anonymous to socket units: listen on a socket |
| that is deleted in the fs. Use case would be with ConnectSocket= above. |
| |
| * importd: support image signature verification with PKCS#7 + OpenBSD signify |
| logic, as alternative to crummy gpg |
| |
| * add "systemd-analyze debug" + AttachDebugger= in unit files: The former |
| specifies a command to execute; the latter specifies that an already running |
| "systemd-analyze debug" instance shall be contacted and execution paused |
| until it gives an OK. That way, tools like gdb or strace can safely be |
| invoked on processes forked off PID 1. |
| |
| * expose MS_NOSYMFOLLOW in various places |
| |
| * credentials system: |
| - acquire from EFI variable? |
| - acquire via ask-password? |
| - acquire creds via keyring? |
| - pass creds via keyring? |
| - pass creds via memfd? |
| - acquire + decrypt creds from pkcs11? |
| - make PAMName= acquire pw via creds logic |
| - make macsec code in networkd read key via creds logic (copy logic from |
| wireguard) |
| - make gatewayd/remote read key via creds logic |
| - add sd_notify() command for flushing out creds not needed anymore |
| |
| * TPM2: auto-reenroll in cryptsetup, as fallback for hosed firmware upgrades |
| and such |
| |
| * introduce a new group to own TPM devices |
| |
| * cryptsetup: add option for automatically removing empty password slot on boot |
| |
| * cryptsetup: optionally, when run during boot-up and password is never |
| entered, and we are on battery power (or so), power off machine again |
| |
| * cryptsetup: when waiting for FIDO2/PKCS#11 token, tell plymouth that, and |
| allow plymouth to abort the waiting and enter pw instead |
| |
| * make cryptsetup lower --iter-time |
| |
| * cryptsetup: allow encoding key directly in /etc/crypttab, maybe with a |
| "base64:" prefix. Useful in particular for pkcs11 mode. |
| |
| * cryptsetup: reimplement the mkswap/mke2fs in cryptsetup-generator to use |
| systemd-makefs.service instead. |
| |
| * cryptsetup: |
| - cryptsetup-generator: allow specification of passwords in crypttab itself |
| - support rd.luks.allow-discards= kernel cmdline params in cryptsetup generator |
| |
| * systemd-analyze netif that explains predictable interface names (or networkctl) |
| |
| * systemd-analyze inspect-elf should show other notes too, at least build-id. |
| |
| * Figure out naming of verbs in systemd-analyze: we have (singular) capability, |
| exit-status, but (plural) filesystems, architectures. |
| |
| * Add service setting to run a service within the specified VRF. i.e. do the |
| equivalent of "ip vrf exec". |
| |
| * special case some calls of chase() to use openat2() internally, so |
| that the kernel does what we otherwise do. |
| |
| * add a new flag to chase() that stops chasing once the first missing |
| component is found and then allows the caller to create the rest. |
| |
| * make use of new glibc 2.32 APIs sigabbrev_np() and strerrorname_np(). |
| |
| * if /usr/bin/swapoff fails due to OOM, log a friendly explanatory message about it |
| |
| * pid1: also remove PID files of a service when the service starts, not just |
| when it exits |
| |
| * make us use dynamically fewer deps for containers in general purpose distros: |
| o turn into dlopen() deps: |
| - libblkid (only in RootImage= handling in PID 1, but not elsewhere) |
| - libpam (only when called from PID 1) |
| |
| * seccomp: maybe use seccomp_merge() to merge our filters per-arch if we can. |
| Apparently kernel performance is much better with fewer larger seccomp |
| filters than with more smaller seccomp filters. |
| |
| * systemd-path: Add "private" runtime/state/cache dir enum, mapping to |
| $RUNTIME_DIRECTORY, $STATE_DIRECTORY and such |
| |
| * seccomp: by default mask x32 ABI system wide on x86-64. it's on its way out |
| |
| * seccomp: don't install filters for ABIs that are masked anyway for the |
| specific service |
| |
| * busctl: maybe expose a verb "ping" for pinging a dbus service to see if it |
| exists and responds. |
| |
| * socket units: allow creating a udev monitor socket with ListenDevices= or so, |
| with matches, then activate app through that passing socket over |
| |
| * unify on openssl: |
| - kill gnutls support in resolved |
| - figure out what to do about libmicrohttpd, which has a hard dependency on |
| gnutls |
| - port fsprg over to a dlopen lib, then switch it to openssl |
| |
| * add growvol and makevol options for /etc/crypttab, similar to |
| x-systemd.growfs and x-systemd.makefs. |
| |
| * userdb: allow username prefix searches in varlink API, allow realname and |
| realname substr searches in varlink API |
| |
| * userdb: allow uid/gid range checks |
| |
| * userdb: allow existence checks |
| |
| * pid1: activation by journal search expression |
| |
| * when switching root from initrd to host, set the machine_id env var so that |
| if the host has no machine ID set yet we continue to use the random one the |
| initrd had set. |
| |
| * sd-event: add native support for P_ALL waitid() watching, then move PID 1 to |
| it for reaping assigned but unknown children. This needs some special care |
| to operate somewhat sensibly in light of priorities: P_ALL will return |
| arbitrary processes, regardless of the priority we want to watch them with, |
| hence on each event loop iteration check all processes which we shall watch |
| with higher prio explicitly, and then watch the entire rest with P_ALL. |
| |
| * tweak sd-event's child watching: keep a prioq of children to watch and use |
| waitid() only on the children with the highest priority until one is waitable |
| and ignore all lower-prio ones from that point on |
| |
| * maybe introduce xattrs that can be set on the root dir of the root fs |
| partition that declare the volatility mode to use the image in. Previously I |
| thought about marking this via GPT partition flags but that's not ideal since |
| that's outside of the LUKS encryption/verity verification, and we probably |
| shouldn't operate in a volatile mode unless we got told so from a trusted |
| source. |
| |
| * coredump: maybe when coredumping read a new xattr from /proc/$PID/exe that |
| may be used to mark a whole binary as non-coredumpable. Would fix: |
| https://bugs.freedesktop.org/show_bug.cgi?id=69447 |
| |
| * teach parse_timestamp() timezones like the calendar spec already knows it |
| |
| * We should probably replace /etc/rc.d/README with a symlink to doc |
| content. After all it is constant vendor data. |
| |
| * maybe add kernel cmdline params: to force random seed crediting |
| |
| * let's not GC a unit while its ratelimits are still pending |
| |
| * when killing due to service watchdog timeout maybe detect whether target |
| process is under ptracing and then log loudly and continue instead. |
| |
| * make rfkill uaccess controllable by default, i.e. steal rule from |
| gnome-bluetooth and friends |
| |
| * make MAINPID= message reception checks even stricter: if service uses User=, |
| then check sending UID and ignore message if it doesn't match the user or |
| root. |
| |
| * maybe trigger a uevent "change" on a device if "systemctl reload xyz.device" |
| is issued. |
| |
| * when importing an fs tree with machined, optionally apply userns-rec-chown |
| |
| * when importing an fs tree with machined, complain if image is not an OS |
| |
| * Maybe introduce a helper safe_exec() or so, which is to execve() what |
| safe_fork() is to fork(). And then make it revert the RLIMIT_NOFILE soft limit |
| to 1K implicitly, unless explicitly opted-out. |
| |
| * rework seccomp/nnp logic so that even if User= is used in combination with |
| a seccomp option we don't have to set NNP. For that, change uid first while |
| keeping CAP_SYS_ADMIN, then apply seccomp, then drop cap. |
| |
| * when no locale is configured, default to UEFI's PlatformLang variable |
| |
| * add a new syscall group "@esoteric" for more esoteric stuff such as bpf() and |
| userfaultfd() and make systemd-analyze check for it. |
| |
| * paranoia: whenever we process passwords, call mlock() on the memory |
| first. i.e. look for all places we use free_and_erasep() and |
| augment them with mlock(). Also use MADV_DONTDUMP. |
| Alternatively (preferably?) use memfd_secret(). |
| |
| * Move RestrictAddressFamilies= to the new cgroup create socket |
| |
| * optionally: turn on cgroup delegation for per-session scope units |
| |
| * sd-boot: optionally, show boot menu when previous default boot item has |
| non-zero "tries done" count |
| |
| * augment CODE_FILE=, CODE_LINE= with something like CODE_BASE= or so which |
| contains some identifier for the project, which allows us to include |
| clickable links to source files generating these log messages. The identifier |
| could be some abbreviated URL prefix or so (taking inspiration from Go |
| imports). For example, for systemd we could use |
| CODE_BASE=github.com/systemd/systemd/blob/98b0b1123cc or so which is |
| sufficient to build a link by prefixing "http://" and suffixing the |
| CODE_FILE. |
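
The link construction described above is simple enough to sketch. The function name `source_link()` is hypothetical, and the optional "#L<line>" anchor built from CODE_LINE= is an extra assumption (it matches how GitHub addresses lines, but the item itself only mentions prefix and suffix).

```python
def source_link(code_base, code_file, code_line=None):
    """Build a clickable URL from CODE_BASE=, CODE_FILE= and optionally CODE_LINE=."""
    url = "http://" + code_base.rstrip("/") + "/" + code_file.lstrip("/")
    if code_line is not None:
        url += f"#L{code_line}"  # assumption: GitHub-style line anchor
    return url
```

For example, CODE_BASE=github.com/systemd/systemd/blob/98b0b1123cc combined with a CODE_FILE= of src/core/main.c would be joined with a single "/" between the two parts.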
| |
| * Augment MESSAGE_ID with MESSAGE_BASE, in a similar fashion so that we can |
| make clickable links from log messages carrying a MESSAGE_ID, that lead to |
| some explanatory text online. |
| |
| * maybe extend .path units to expose fanotify() per-mount change events |
| |
| * hibernate/s2h: check if swap is on weird storage and refuse if so |
| |
| * cgroups: use inotify to get notified when somebody else modifies cgroups |
| owned by us, then log a friendly warning. |
| |
| * beef up log.c with support for stripping ANSI sequences from strings, so that |
| it is OK to include them in log strings. This would be particularly useful so |
| that our log messages could contain clickable links for example for unit |
| files and suchlike we operate on. |
| |
| * add support for "portablectl attach http://foobar.com/waaa.raw" (i.e. importd integration) |
| |
| * sync dynamic uids/gids between host+portable service (i.e. if DynamicUser=1 is set for a service, make sure that the |
| selected user is resolvable in the service even if it ships its own /etc/passwd) |
| |
| * Fix DECIMAL_STR_MAX or DECIMAL_STR_WIDTH. One includes a trailing NUL, the |
| other doesn't. What a disaster. Probably best to exclude it. |
| |
| * Check that users of inotify's IN_DELETE_SELF flag are using it properly, as |
| usually IN_ATTRIB is the right way to watch deleted files, as the former only |
| fires when a file is actually removed from disk, i.e. the link count drops to |
| zero and is not open anymore, while the latter happens when a file is |
| unlinked from any dir. |
| |
| * systemctl, machinectl, loginctl: port "status" commands over to |
| format-table.c's vertical output logic. |
| |
| * pid1: lock image configured with RootDirectory=/RootImage= using the usual nspawn semantics while the unit is up |
| |
| * add --vacuum-xyz options to coredumpctl, matching those journalctl already has. |
| |
| * add CopyFile= or so as unit file setting that may be used to copy files or |
| directory trees from the host to the service's RootImage= and RootDirectory= |
| environment. Which we can use for /etc/machine-id and in particular |
| /etc/resolv.conf. Should be smart and do something useful on read-only |
| images, for example fall back to read-only bind mounting the file instead. |
| |
| * bypass SIGTERM state in unit files if KillSignal is SIGKILL |
| |
| * add proper dbus APIs for the various sd_notify() commands, such as MAINPID=1 |
| and so on, which would mean we could report errors and such. |
| |
| * introduce DefaultSlice= or so in system.conf that allows changing where we |
| place our units by default, i.e. change system.slice to something |
| else. Similar, ManagerSlice= should exist so that PID1's own scope unit could |
| be moved somewhere else too. Finally machined and logind should get similar |
| options so that it is possible to move user session scopes and machines to a |
| different slice too by default. Use case: people who want to put resources on |
| the entire system, with the exception of one specific service. See: |
| https://lists.freedesktop.org/archives/systemd-devel/2018-February/040369.html |
| |
| * calendarspec: add support for week numbers and day numbers within a |
| year. This would allow us to define "bi-weekly" triggers safely. |
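
To illustrate why week numbers help here: with ISO week numbers a "bi-weekly" trigger becomes a parity check. This is a sketch only; the function name `is_biweekly_trigger()` and its parameters are hypothetical and say nothing about what the calendar spec syntax would look like.

```python
from datetime import date


def is_biweekly_trigger(day, weekday=0, parity=0):
    """True if day falls on the given weekday (0 = Monday) of a matching ISO week."""
    _iso_year, iso_week, iso_weekday = day.isocalendar()
    return iso_weekday - 1 == weekday and iso_week % 2 == parity
```

One caveat: in ISO years with 53 weeks, week 53 is followed by week 1, so a pure parity check yields two consecutive odd weeks at that year boundary. A real implementation would have to decide how to handle that.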
| |
| * sd-bus: add vtable flag, that may be used to request client creds implicitly |
| and asynchronously before dispatching the operation |
| |
| * sd-bus: parse addresses given in sd_bus_set_addresses immediately and not |
| only when used. Add unit tests. |
| |
| * make use of ethtool veth peer info in machined, for automatically finding out |
| host-side interface pointing to the container. |
| |
| * add some special mode to LogsDirectory=/StateDirectory=… that allows |
| declaring these directories without necessarily pulling in deps for them, or |
| creating them when starting up. That way, we could declare that |
| systemd-journald writes to /var/log/journal, which could be useful when we |
| are doing disk usage calculations and so on. |
| |
| * deprecate RootDirectoryStartOnly= in favour of a new ExecStart= prefix char |
| |
| * support projid-based quota in machinectl for containers |
| |
| * add a way to lock down cgroup migration: a boolean, which when set for a unit |
| makes sure the processes in it can never migrate out of it |
| |
| * blog about fd store and restartable services |
| |
| * document Environment=SYSTEMD_LOG_LEVEL=debug drop-in in debugging document |
| |
| * rework ExecOutput and ExecInput enums so that EXEC_OUTPUT_NULL loses its |
| magic meaning and is no longer upgraded to something else if set explicitly. |
| |
| * in the long run: permit a system with /etc/machine-id linked to /dev/null, to |
| make it lose its identity, i.e. be anonymous. For this we'd have to patch |
| through the whole tree to make all code deal with the case where no machine |
| ID is available. |
| |
| * optionally, collect cgroup resource data, and store it in per-unit RRD files, |
| suitable for processing with rrdtool. Add bus API to access this data, and |
| possibly implement a CPULoad property based on it. |
| |
| * beef up pam_systemd to take unit file settings such as cgroups properties as |
| parameters |
| |
| * maybe hook up xfs/ext4 quotactl() with services? i.e. automatically manage |
| the quota of the user indicated in User= via unit file settings, like the |
| other resource management concepts. Would mix nicely with DynamicUser=1. Or |
| alternatively, do this with projids, so that we can also cover services |
| running as root. Quota should probably cover all the special dirs such as |
| StateDirectory=, LogsDirectory=, CacheDirectory=, as well as RootDirectory= if it |
| is set, plus the whole disk space of any image configured with RootImage=. |
| |
| * In DynamicUser= mode: before selecting a UID, use disk quota APIs on relevant |
| disks to see if the UID is already in use. |
| |
| * Add AddUser= setting to unit files, similar to DynamicUser=1 which however |
| creates a static, persistent user rather than a dynamic, transient user. We |
| can leverage code from sysusers.d for this. |
| |
| * add some optional flag to ReadWritePaths= and friends, that has the effect |
| that we create the dir in question when the service is started. Example: |
| |
| ReadWritePaths=:/var/lib/foobar |
| |
| * Add ExecMonitor= setting. May be used multiple times. Forks off a process in |
| the service cgroup, which is supposed to monitor the service, and when it |
| exits the service is considered failed by its monitor. |
| |
| * track the per-service PAM process properly (i.e. as an additional control |
| process), so that it may be queried on the bus and everything. |
| |
| * add a new "debug" job mode, that is propagated to unit_start() and for |
| services results in two things: we raise SIGSTOP right before invoking |
| execve() and turn off watchdog support. Then, use that to implement |
| "systemd-gdb" for attaching to the start-up of any system service in its |
| natural habitat. |
| |
| * gpt-auto logic: support encrypted swap, add kernel cmdline option to force |
| it, and honour a gpt bit about it, plus maybe a configuration file |
| |
| * add a percentage syntax for TimeoutStopSec=, e.g. TimeoutStopSec=150%, and |
| then use that for the setting used in user@.service. It should be understood |
| relative to the configured default value. |
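
A minimal sketch of how the proposed percentage syntax could resolve against the configured default. This is illustrative Python, not systemd code: `resolve_timeout()` is a hypothetical helper, and treating every non-percentage value as a plain number of seconds is a simplifying assumption.

```python
def resolve_timeout(value: str, default_sec: float) -> float:
    """Resolve a hypothetical TimeoutStopSec= value to seconds.

    A value such as "150%" is interpreted relative to default_sec.
    Anything else is treated as a plain number of seconds (real unit
    files accept richer time spans, which this sketch ignores).
    """
    value = value.strip()
    if value.endswith("%"):
        percent = float(value[:-1])
        if percent < 0:
            raise ValueError("percentage must not be negative")
        return default_sec * percent / 100.0
    return float(value)
```

With a default of 90 seconds, "150%" would resolve to 135 seconds under this reading.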
| |
| * enable LimitMEMLOCK= to take a percentage value relative to physical memory |
| |
| * Permit masking specific netlink APIs with RestrictAddressFamilies= |
| |
| * define gpt header bits to select volatility mode |
| |
| * ProtectClock= (drops CAP_SYS_TIME, adds seccomp filters for settimeofday, adjtimex), sets DeviceAllow= for /dev/rtc |
| |
| * ProtectTracing= (drops CAP_SYS_PTRACE, blocks ptrace syscall, makes /sys/kernel/tracing go away) |
| |
| * ProtectMount= (drop mount/umount/pivot_root from seccomp, disallow fuse via DeviceAllow, imply MountFlags=slave) |
| |
| * ProtectKeyRing= to take keyring calls away |
| |
| * RemoveKeyRing= to remove all keyring entries of the specified user |
| |
| * ProtectReboot= that masks reboot() and kexec_load() syscalls, prohibits kill |
| on PID 1 with the relevant signals, and makes relevant files in /sys and |
| /proc (such as the sysrq stuff) unavailable |
| |
| * Support ReadWritePaths/ReadOnlyPaths/InaccessiblePaths in systemd --user instances |
| via the new unprivileged Landlock LSM (https://landlock.io) |
| |
| * make sure the ratelimit object can deal with USEC_INFINITY as a way to turn things off |
| |
| * in nss-systemd, if we run inside of RootDirectory= with PrivateUsers= set, |
| find a way to map the User=/Group= of the service to the right name. This way |
| a user/group for a service only has to exist on the host for the right |
| mapping to work. |
| |
| * add bus API for creating unit files in /etc, reusing the code for transient units |
| |
| * add bus API to remove unit files from /etc |
| |
| * add bus API to retrieve current unit file contents (i.e. implement "systemctl cat" on the bus only) |
| |
| * rework fopen_temporary() to make use of open_tmpfile_linkable() (problem: the |
| kernel doesn't support linkat() that replaces existing files, currently) |
| |
| * transient units: don't bother with actually setting unit properties, we |
| reload the unit file anyway |
| |
| * optionally, also require WATCHDOG=1 notifications during service start-up and shutdown |
| |
| * cache sd_event_now() result from before the first iteration... |
| |
| * PID1: find a way how we can reload unit file configuration for |
| specific units only, without reloading the whole of systemd |
| |
| * add an explicit parser for LimitRTPRIO= that verifies |
| the specified range and generates sane error messages for incorrect |
| specifications. |
| |
| * when we detect that there are waiting jobs but no running jobs, do something |
| |
| * PID 1 should send out sd_notify("WATCHDOG=1") messages (for usage in the --user mode, and when run via nspawn) |
| |
| * there's probably something wrong with having user mounts below /sys, |
| as we have for debugfs. For example, src/core/mount.c generally treats mounts |
| prefixed with /sys specially. |
| https://lists.freedesktop.org/archives/systemd-devel/2015-June/032962.html |
| |
| * fstab-generator: default to tmpfs-as-root if only usr= is specified on the kernel cmdline |
| |
| * docs: bring https://systemd.io/MY_SERVICE_CANT_GET_REALTIME up to date |
| |
| * add a job mode that will fail if a transaction would mean stopping |
| running units. Use this in timedated to manage the NTP service |
| state. |
| https://lists.freedesktop.org/archives/systemd-devel/2015-April/030229.html |
| |
| * The udev blkid built-in should expose a property that reflects |
| whether media was sensed in USB CF/SD card readers. This should then |
| be used to control SYSTEMD_READY=1/0 so that USB card readers aren't |
| picked up by systemd unless they contain a medium. This would mirror |
| the behaviour we already have for CD drives. |
| |
| * hostnamectl: show root image uuid |
| |
| * Find a solution for SMACK capabilities stuff: |
| https://lists.freedesktop.org/archives/systemd-devel/2014-December/026188.html |
| |
| * synchronize console access with BSD locks: |
| https://lists.freedesktop.org/archives/systemd-devel/2014-October/024582.html |
| |
| * as soon as we have sender timestamps, revisit coalescing multiple parallel daemon reloads: |
| https://lists.freedesktop.org/archives/systemd-devel/2014-December/025862.html |
| |
| * figure out when we can use the coarse timers |
| |
| * maybe allow timer units with an empty Unit= setting, so that they |
| can be used for resuming the system but nothing else. |
| |
| * what to do about udev db binary stability for apps? (raw access is not an option) |
| |
| * exponential backoff in timesyncd when we cannot reach a server |
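
A minimal sketch of the kind of exponential backoff this item refers to. It is illustrative Python only; `backoff_intervals()`, the doubling factor, and the cap are assumptions and do not describe timesyncd's actual polling logic.

```python
def backoff_intervals(base_sec: float, max_sec: float, attempts: int) -> list:
    """Return the wait before each retry, doubling each time up to a cap."""
    intervals = []
    delay = base_sec
    for _ in range(attempts):
        # Never wait longer than max_sec between two attempts.
        intervals.append(min(delay, max_sec))
        delay *= 2
    return intervals
```

A cap keeps an unreachable server from pushing the retry interval out indefinitely.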
| |
| * timesyncd: add ugly bus calls to set NTP servers per-interface, for usage by NM |
| |
| * add systemd.abort_on_kill or some other such flag to send SIGABRT instead of SIGKILL |
| (throughout the codebase, not only PID1) |
| |
| * drop nss-myhostname in favour of nss-resolve? |
| |
| * resolved: |
| - mDNS/DNS-SD |
| - service registration |
| - service/domain/types browsing |
| - avahi compat |
| - DNS-SD service registration from socket units |
| - resolved should optionally register additional per-interface LLMNR |
| names, so that for the container case we can establish the same name |
| (maybe "host") for referencing the server, everywhere. |
| - allow clients to request DNSSEC for a single lookup even if DNSSEC is off (?) |
| - hook up resolved with machined-based address resolution |
| |
| * refcounting in sd-resolve is borked |
| |
| * add new gpt type for btrfs volumes |
| |
| * generator that automatically discovers btrfs subvolumes, identifies their purpose based on some xattr on them. |
| |
| * a way for container managers to turn off getty starting via $container_headless= or so... |
| |
| * figure out a nice way how we can let the admin know what child/sibling unit causes cgroup membership for a specific unit |
| |
| * For timer units: add some mechanisms so that timer units that trigger immediately on boot do not have the services |
| they run added to the initial transaction and thus confuse Type=idle. |
| |
| * add bus api to query unit file's X fields. |
| |
| * gpt-auto-generator: |
| - Define new partition type for encrypted swap? Support probed LUKS for encrypted swap? |
| - Make /home automount rather than mount? |
| |
| * add generator that pulls in systemd-networkd in containers when |
| CAP_NET_ADMIN is set and more than the loopback device is defined, even |
| when it is otherwise off |
| |
| * MessageQueueMessageSize= (and suchlike) should use parse_iec_size(). |
| |
| * implement Distribute= in socket units to allow running multiple |
| service instances processing the listening socket, and open this up |
| for ReusePort= |
| |
| * cgroups: |
| - implement per-slice CPUFairScheduling=1 switch |
| - introduce high-level settings for RT budget, swappiness |
| - how to reset dynamically changed unit cgroup attributes sanely? |
| - when reloading configuration, apply new cgroup configuration |
| - when recursively showing the cgroup hierarchy, optionally also show |
| the hierarchies of child processes |
| - add settings for cgroup.max.descendants and cgroup.max.depth, |
| maybe use them for user@.service |
| |
| * transient units: |
| - add field to transient units that indicates whether systemd or somebody else saves/restores its settings, for integration with libvirt |
| |
| * libsystemd-journal, libsystemd-login, libudev: add calls to easily attach these objects to sd-event event loops |
| |
| * be more careful what we export on the bus as (usec_t) 0 and (usec_t) -1 |
| |
| * rfkill,backlight: we probably should run the load tools inside of the udev rules so that the state is properly initialized by the time other software sees it |
| |
| * If we try to find a unit via a dangling symlink, generate a clean |
| error. Currently, we just ignore it and read the unit from the search |
| path anyway. |
| |
| * refuse boot if /usr/lib/os-release is missing or /etc/machine-id cannot be set up |
| |
| * man: the documentation of Restart= currently is very misleading and suggests the tools from ExecStartPre= might get restarted. |
| |
| * There's currently no way to cancel fsck (used to be possible via C-c or c on the console) |
| |
| * add option to sockets to avoid activation. Instead just drop packets/connections, see http://cyberelk.net/tim/2012/02/15/portreserve-systemd-solution/ |
| |
| * make sure systemd-ask-password-wall does not shutdown systemd-ask-password-console too early |
| |
| * verify that the AF_UNIX sockets of a service in the fs still exist |
| when we start a service in order to avoid confusion when a user |
| assumes starting a service is enough to make it accessible |
| |
| * Make it possible to set the keymap independently from the font on |
| the kernel cmdline. Right now setting one resets also the other. |
| |
| * add a dbus call to generate a target from the current state |
| |
| * investigate whether the gnome pty helper should be moved into systemd, to provide cgroup support. |
| |
| * dot output for --test showing the 'initial transaction' |
| |
| * be able to specify a forced restart of service A, which service B depends on, in case B |
| needs to be auto-respawned? |
| |
| * pid1: |
| - When logging about multiple units (stopping BoundTo units, conflicts, etc.), |
| log both units as UNIT=, so that journalctl -u triggers on both. |
| - generate better errors when people try to set transient properties |
| that are not supported... |
| https://lists.freedesktop.org/archives/systemd-devel/2015-February/028076.html |
| - recreate systemd's D-Bus private socket file on SIGUSR2 |
| - when we automatically restart a service, ensure we restart its rdeps, too. |
| - hide PAM options in fragment parser when compile time disabled |
| - Support --test based on current system state |
| - If we show an error about a unit (such as not showing up) and it has no Description string, then show a description string generated from the reverse of unit_name_mangle(). |
| - after deserializing sockets in socket.c we should reapply sockopts and things |
| - drop PID 1 reloading, only do reexecing (difficult: Reload() |
| currently is properly synchronous, Reexec() is weird, because we |
| cannot delay the response properly until we are back, so instead of |
| being properly synchronous we just keep open the fd and close it |
| when done. That means clients do not get a successful method reply, |
| but rather a disconnect on success.) |
| - when breaking cycles drop sysv services first, then services from /run, then from /etc, then from /usr |
| - when a bus name of a service disappears from the bus make sure to queue further activation requests |
| - maybe introduce CoreScheduling=yes/no to optionally set a PR_SCHED_CORE cookie, so that all |
| processes in a service's cgroup share the same cookie and are guaranteed not to share SMT cores |
| with other units https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/hw-vuln/core-scheduling.rst |
| - ExtensionImages= deduplication for services is currently only applied to disk images without GPT envelope. |
| This should be extended to work with proper DDIs too, as well as directory confext/sysext. Moreover, |
| system-wide confext/sysext should support this too. |
| - Pin the mount namespace via FD by sending it back from sd-exec to the manager, and use it |
| for live mounting, instead of doing it via PID |
| |
| * unit files: |
| - allow port=0 in .socket units |
| - maybe introduce ExecRestartPre= |
| - implement Register= switch in .socket units to enable registration |
| in Avahi, RPC and other socket registration services. |
| - allow Type=simple with PIDFile= |
| https://bugzilla.redhat.com/show_bug.cgi?id=723942 |
| - allow writing multiple conditions in unit files on one line |
| - introduce Type=pid-file |
| - add a concept of RemainAfterExit= to scope units |
| - Allow multiple ExecStart= for all Type= settings, so that we can cover rescue.service nicely |
| - add verification of [Install] section to systemd-analyze verify |
| |
| * timer units: |
| - timer units should get the ability to trigger when DST changes |
| - Modulate timer frequency based on battery state |
| |
| * add libsystemd-password or so to query passwords during boot using the password agent logic |
| |
| * clean up date formatting and parsing so that all absolute/relative timestamps we format can also be parsed |
| |
| * on shutdown: move utmp, wall, audit logic all into PID 1 (or logind?), get rid of systemd-update-utmp-runlevel |
| |
| * make repeated ctrl-alt-del presses print a dump |
| |
| * currently x-systemd.timeout is lost in the initrd, since crypttab is copied into dracut, but fstab is not |
| |
| * add a pam module that on password changes updates any LUKS slot where the password matches |
| |
| * test/: |
| - add unit tests for config_parse_device_allow() |
| |
| * seems that when we follow symlinks to units we prefer the symlink |
| destination path over /etc and /usr. We should not do that. Instead |
| /etc should always override /run+/usr and also any symlink |
| destination. |
| |
| * when isolating, try to figure out a way how we implicitly can order |
| all units we stop before the isolating unit... |
| |
| * teach ConditionKernelCommandLine= globs or regexes (in order to match foobar={no,0,off}) |
| |
| * Make ConditionDirectoryNotEmpty= handle non-absolute paths as a search path, or add |
| ConditionConfigSearchPathNotEmpty= or different syntax? See the discussion starting at |
| https://github.com/systemd/systemd/pull/15109#issuecomment-607740136. |
| |
| * BootLoaderSpec: Define a way how an installer can figure out whether a BLS |
| compliant boot loader is installed. |
| |
| * think about requeuing jobs when daemon-reload is issued? use case: |
| the initrd issues a reload after fstab from the host is accessible |
| and we might want to requeue the mounts local-fs acquired through |
| that automatically. |
| |
| * systemd-inhibit: make taking delay locks useful: support sending SIGINT or SIGTERM on PrepareForSleep() |
| |
| * remove any syslog support from log.c — we probably cannot do this before split-off udev is gone for good |
| |
| * shutdown logging: store to EFI var, and store to USB stick? |
| |
| * merge unit_kill_common() and unit_kill_context() |
| |
| * add a dependency on standard-conf.xml and other included files to man pages |
| |
| * MountFlags=shared acts as MountFlags=slave right now. |
| |
| * properly handle loop back mounts via fstab, especially with regard to fsck/passno |
| |
| * initialize the hostname from the fs label of /, if /etc/hostname does not exist? |
| |
| * sd-bus: |
| - EBADSLT handling |
| - GetAllProperties() on a non-existing object does not result in a failure currently |
| - port to sd-resolve for connecting to TCP dbus servers |
| - see if we can introduce a new sd_bus_get_owner_machine_id() call to retrieve the machine ID of the machine of the bus itself |
| - see if we can drop more message validation on the sending side |
| - add API to clone sd_bus_message objects |
| - longer term: priority inheritance |
| - dbus spec updates: |
| - NameLost/NameAcquired obsolete |
| - path escaping |
| - update systemd.special(7) to mention that dbus.socket is only about the compatibility socket now |
| |
| * sd-event |
| - allow multiple signal handlers per signal? |
| - document chaining of signal handler for SIGCHLD and child handlers |
| - define more intervals within which we will shift wakeups around: 1h, 6h, 24h, ... |
| - maybe support io_uring as backend, so that we allow hooking read and write |
| operations instead of IO ready events into event loops. See considerations |
| here: |
| http://blog.vmsplice.net/2020/07/rethinking-event-loop-integration-for.html |
| |
| * dbus: when a unit failed to load (i.e. is in UNIT_ERROR state), we |
| should be able to safely try another attempt when the bus call LoadUnit() is invoked. |
| |
| * document org.freedesktop.MemoryAllocation1 |
| |
| * maybe do not install getty@tty1.service symlink in /etc but in /usr? |
| |
| * print a nicer explanation if people use variable/specifier expansion in ExecStart= for the first word |
| |
| * mount: turn dependency information from /proc/self/mountinfo into dependency information between systemd units. |
| |
| * EFI: |
| - honor language efi variables for default language selection (if there are any?) |
| - honor timezone efi variables for default timezone selection (if there are any?) |
| |
| * bootctl: |
| - recognize the case when not booted on EFI |
| - show whether UEFI audit mode is available |
| - teach it to prepare an ESP wholesale, i.e. with mkfs.vfat invocation |
| - teach it to copy in unified kernel images and maybe type #1 boot loader spec entries from host |
| |
| * logind: |
| - logind: optionally, ignore idle-hint logic for autosuspend, block suspend as long as a session is around |
| - logind: wakelock/opportunistic suspend support |
| - Add pretty name for seats in logind |
| - logind: allow showing logout dialog from system? |
| - add Suspend() bus calls which take timestamps to fix double suspend issues when somebody hits suspend and closes laptop quickly. |
| - if pam_systemd is invoked by su from a process that is outside of |
| any session we should probably just become a NOP, since that's |
| usually not a real user session but just some system code that just |
| needs setuid(). |
| - logind: make the Suspend()/Hibernate() bus calls wait for |
| the job to be completed before returning, so that clients can wait |
| for "systemctl suspend" to finish to know when the suspending is |
| complete. |
| - logind: when the power button is pressed briefly, just pop up a |
| logout dialog. If it is pressed for 1s, do the usual |
| shutdown. Macs are the inspiration here. |
| - expose "Locked" property on logind session objects |
| - maybe allow configuration of the StopTimeout for session scopes |
| - rename session scope so that it includes the UID. That way |
| the session scope can be arranged freely in slices and we don't have |
| to make assumptions about their slice anymore. |
| - follow PropertiesChanged state more closely, to deal with quick logouts and |
| relogins |
| - (optionally?) spawn seat-manager@$SEAT.service whenever a seat shows up that has CanGraphical set |
| |
| * move multiseat vid/pid matches from logind udev rule to hwdb |
| |
| * delay activation of logind until somebody logs in, or when /dev/tty0 pulls it |
| in or lingering is on (so that containers don't bother with it until PAM is used). also exit-on-idle |
| |
| * journal: |
| - consider introducing implicit _TTY= + _PPID= + _EUID= + _EGID= + _FSUID= + _FSGID= fields |
| - journald: also get thread ID from client, plus thread name |
| - journal: when waiting for journal additions in the client always sleep at least 1s or so, in order to minimize wakeups |
| - add API to close/reopen/get fd for journal client fd in libsystemd-journal. |
| - fall back to /dev/log based logging in libsystemd-journal, if we cannot log natively? |
| - declare the local journal protocol stable in the wiki interface chart |
| - sd-journal: speed up sd_journal_get_data() with transparent hash table in bg |
| - journald: when dropping msgs due to ratelimit make sure to write |
| "dropped %u messages" not only when we are about to print the next |
| message that works, but already after a short timeout |
| - check if we can make journalctl by default use --follow mode inside of less if called without args? |
| - maybe add API to send pairs of iovecs via sd_journal_send |
| - journal: add a setgid "systemd-journal" utility to invoke from libsystemd-journal, which passes fds via STDOUT and does PK access |
| - journalctl: support negative filtering, i.e. FOOBAR!="waldo", |
| and !FOOBAR for events without FOOBAR. |
| - journal: store timestamp of journal_file_set_offline() in the header, |
| so it is possible to display when the file was last synced. |
| - journal-send.c, log.c: when the log socket is clogged, and we drop, count this and write a message about this when it gets unclogged again. |
| - journal: find a way to allow dropping history early, based on priority, other rules |
| - journal: When used on NFS, check payload hashes |
| - journald: add kernel cmdline option to disable ratelimiting for debug purposes |
| - refuse taking lower-case variable names in sd_journal_send() and friends. |
| - journald: we currently rotate only after MaxUse+MaxFilesize has been reached. |
| - journal: deal nicely with byte-by-byte copied files, especially with regard to the header |
| - journal: sanely deal with entries which are larger than the individual file size, but where the components would fit |
| - Replace utmp, wtmp, btmp, and lastlog completely with journal |
| - journalctl: instead of --after-cursor= maybe have a --cursor=XYZ+1 syntax? |
| - when a kernel driver logs in a tight loop, we should ratelimit that too. |
| - journald: optionally, log debug messages to /run but everything else to /var |
| - journald: when we drop syslog messages because the syslog socket is |
| full, make sure to write how many messages are lost as first thing |
| to syslog when it works again. |
| - journald: allow per-priority and per-service retention times when rotating/vacuuming |
| - journald: make use of uid-range.h to manage uid ranges to split |
| journals in. |
| - journalctl: add the ability to look for the most recent process of a binary. |
| journalctl /usr/bin/X11 --invocation=-1 |
| - systemctl: change 'status' to show logs for the last invocation, not a fixed |
| number of lines |
| - systemctl: expand --wait to show logs for the invocation with a new switch |
| - improve journalctl performance by loading journal files |
| lazily. Encode just enough information in the file name, so that we |
| do not have to open it to know that it is not interesting for us, for |
| the most common operations. |
| - man: document that corrupted journal files are nothing to act on |
| - rework journald sigbus stuff to use mutex |
| - Set RLIMIT_NPROC for systemd-journal-xyz, and all other of our |
| services that run under their own user ids, and use User= (but only |
| in a world where userns is ubiquitous since otherwise we cannot |
| invoke those daemons on the host AND in a container anymore). Also, |
| if LimitNPROC= is used without User= we should warn and refuse |
| operation. |
| - journalctl --verify: don't show files that are currently being |
| written to as FAIL, but instead show that they are being written to. |
| - add journalctl -H that talks via ssh to a remote peer and passes through |
| binary logs data |
| - add a version of --merge which also merges /var/log/journal/remote |
| - journalctl: -m should access container journals directly by enumerating |
| them via machined, and also watch containers coming and going. |
| Benefit: nspawn --ephemeral would start working nicely with the journal. |
| - assign MESSAGE_ID to log messages about failed services |
| - check if loop in decompress_blob_xz() is necessary |
| |
| * journald: support RFC3164 fully for the incoming syslog transport, see |
| https://github.com/systemd/systemd/issues/19251#issuecomment-816601955 |
| |
| * Hook up journald's FSS logic with TPM2: seal the verification disk by |
| time-based policy, so that the verification key can remain on host and be |
| validated via TPM. |
| |
| * rework journalctl -M to be based on a machined method that generates a mount |
| fd of the relevant journal dirs in the container with uidmapping applied to |
| allow the host to read it, while making everything read-only. |
| |
| * journald: add varlink service that allows subscribing to certain log events, |
| for example matching by message ID or log level, and returns a list of journal |
| cursors as they happen. |
| |
| * journald: also collect CLOCK_BOOTTIME timestamps per log entry. Then, derive |
| "corrected" CLOCK_REALTIME information on display from that and the timestamp |
| info of the newest entry of the specific boot (as identified by the boot |
| ID). This way, if a system comes up without a valid clock but acquires a |
| better clock later, we can "fix" older entry timestamps on display, by |
| calculating backwards. We cannot use CLOCK_MONOTONIC for this, since it does |
| not account for suspend phases. This would then also enable us to correct the |
| kmsg timestamping we consume (where we erroneously assume the clock was in |
| CLOCK_MONOTONIC, but it actually is CLOCK_BOOTTIME as per kernel). |
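
The backwards calculation described above can be sketched as follows. This is illustrative Python under stated assumptions: `corrected_realtime_usec()` is a hypothetical helper, all values are in microseconds, and the reference entry is taken to be the newest entry of the same boot with a trustworthy wallclock.

```python
def corrected_realtime_usec(entry_boottime: int,
                            ref_boottime: int,
                            ref_realtime: int) -> int:
    """Derive a corrected wallclock timestamp for an older log entry.

    The time elapsed between the entry and the reference entry is measured
    on the boottime clock (which keeps counting across suspend) and then
    subtracted from the reference entry's wallclock timestamp.
    """
    elapsed = ref_boottime - entry_boottime
    return ref_realtime - elapsed
```

An entry logged 60 seconds (by boottime) before the reference entry would thus be displayed 60 seconds before the reference wallclock time, regardless of what the entry's own realtime field said.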
| |
| * in journald, write out a recognizable log record whenever the system clock is |
| changed ("stepped"), and in timesyncd whenever we acquire an NTP fix |
| ("slewing"). Then, in journalctl for each boot time we come across, find |
| these records, and use the structured info they include to display |
| "corrected" wallclock time, as calculated from the monotonic timestamp in the |
| log record, adjusted by the delta declared in the structured log record. |
| |
| * in journald: whenever we start a new journal file because the boot ID |
| changed, let's generate a recognizable log record containing info about old |
| and new ID. Then, when displaying log stream in journalctl look for these |
| records, to be able to order them. |
| |
| * journald: generate recognizable log events whenever we shutdown journald |
| cleanly, and when we migrate run → var. This way tools can verify that a |
| previous boot terminated cleanly, because either of these two messages must |
| be safely written to disk, then. |
| |
| * hook up journald with TPMs? measure new journal records to the TPM in regular |
| intervals, validate the journal against current TPM state with that. (taking |
| inspiration from IMA log) |
| |
| * sd-journal puts a limit on parallel journal files to view at once. journald |
| should probably honour that same limit (JOURNAL_FILES_MAX) when vacuuming to |
| ensure we never generate more files than we can actually view. |
| |
| * bsod: maybe use graphical mode. Use DRM APIs directly, see |
| https://github.com/dvdhrm/docs/blob/master/drm-howto/modeset.c for an example |
| for doing that. |
| |
| * maybe implicitly attach monotonic+realtime timestamps to outgoing messages in |
| log.c and sd-journal-send |
| |
| * journalctl/timesyncd: whenever timesyncd acquires a synchronization from NTP, |
| create a structured log entry that contains boot ID, monotonic clock and |
| realtime clock (I mean, this requires no special work, as these three fields |
| are implicit). Then in journalctl when attempting to display the realtime |
| timestamp of a log entry, first search for the closest later log entry |
| of this kind that has a matching boot id, and convert the monotonic clock |
| timestamp of the entry to the realtime clock using this info. This way we can |
| retroactively correct the wallclock timestamps, in particular for systems |
| without RTC, i.e. where initially wallclock timestamps carry rubbish, until |
| an NTP sync is acquired. |
| |
| * introduce per-unit (i.e. per-slice, per-service) journal log size limits. |
| |
| * journald: do journal file writing out-of-process, with one writer process per |
| client UID, so that synthetic hash table collisions can slow down a specific |
| user's journal stream but not the others. |
| |
| * tweak journald context caching. In addition to caching per-process attributes |
| keyed by PID, cache per-cgroup attributes (i.e. the various xattrs we read) |
| keyed by cgroup path, and guarded by ctime changes. This should provide us |
| with a nice speed-up on services that have many processes running in the same |
| cgroup. |
| |
| * maybe add call sd_journal_set_block_timeout() or so to set SO_SNDTIMEO for |
| the sd-journal logging socket, and, if the timeout is set to 0, sets |
| O_NONBLOCK on it. That way people can control if and when to block for |
| logging. |
| |
| * journalctl: make sure -f ends when the container indicated by -M terminates |
| |
| * journald: sigbus API via a signal-handler safe function that people may call |
| from the SIGBUS handler |
| |
| * add a test if all entries in the catalog are properly formatted. |
| (Adding dashes in a catalog entry currently results in the catalog entry |
| being silently skipped. journalctl --update-catalog must warn about this, |
| and we should also have a unit test to check that all our messages are OK.) |
| |
| * build short web pages out of each catalog entry, build them along with man |
| pages, and include hyperlinks to them in the journal output |
| |
| * homed: |
| - when user tries to log into record signed by unrecognized key, automatically add key to our chain after polkit auth |
| - rollback when resize fails mid-operation |
| - GNOME's side for forget key on suspend (requires rework so that lock screen runs outside of uid) |
| - update LUKS password on login if we find there's a password that unlocks the JSON record but not the LUKS device. |
| - create on activate? |
| - properties: icon url?, administrator bool (which translates to 'wheel' membership)?, address?, telephone?, vcard?, samba stuff?, parental controls? |
| - communicate clearly when usb stick is safe to remove. probably involves |
| beefing up logind to make pam session close hook synchronous and wait until |
| systemd --user is shut down. |
| - logind: maybe keep a "busy fd" as long as there's a non-released session around or the user@.service |
| - maybe make automatic, read-only, time-based reflink-copies of LUKS disk |
| images (and btrfs snapshots of subvolumes) (think: time machine) |
| - distinguish destroy / remove (i.e. currently we can unregister a user, unregister+remove their home directory, but not just remove their home directory) |
| - in systemd's PAMName= logic: query passwords with ssh-askpassword, so that we can make "loginctl set-linger" mode work |
| - fingerprint authentication, pattern authentication, … |
| - make sure "classic" user records can also be managed by homed |
| - make size of $XDG_RUNTIME_DIR configurable in user record |
| - move acct mgmt stuff from pam_systemd_home to pam_systemd? |
| - when "homectl --pkcs11-token-uri=" is used, synthesize ssh-authorized-keys records for all keys we have private keys on the stick for |
| - make slice for users configurable (requires logind rework) |
| - logind: populate auto-login list bus property from PKCS#11 token |
| - when determining state of a LUKS home directory, check DM suspended sysfs file |
| - when homed is in use, maybe start the user session manager in a mount namespace with MS_SLAVE, |
| so that mounts propagate down but not up - eg, user A setting up a backup volume |
| doesn't mean user B sees it |
| - use credentials logic/TPM2 logic to store homed signing key |
| - permit multiple user record signing keys to be used locally, and pick |
| the right one for signing records automatically depending on a pre-existing |
| signature |
| - add a way to "adopt" a home directory, i.e. strip foreign signatures |
| and insert a local signature instead. |
| - as an extension to the directory+subvolume backend: if located on |
| especially marked fs, then sync down password into LUKS header of that fs, |
| and always verify passwords against it too. Bootstrapping is a problem |
| though: if no one is logged in (or no other user even exists yet), how do you |
| unlock the volume in order to create the first user and add the first pw. |
| - support new FS_IOC_ADD_ENCRYPTION_KEY ioctl for setting up fscrypt |
| - maybe pre-create ~/.cache as subvol so that it can have separate quota |
| easily? |
| - store PKCS#11 + FIDO2 token info in LUKS2 header, compatible with |
| systemd-cryptsetup, so that it can unlock homed volumes |
| - maybe make all *.home files owned by `systemd-home` user or so, so that we |
| can easily set overall quota for all users |
| - on login, if we can't fallocate initially, but rebalance is on, then allow |
| login in discard mode, then immediately rebalance, then turn off discard |
| - add "homectl unbind" command to remove local user record of an inactive |
| home dir |
| |
| * add a new switch --auto-definitions=yes/no or so to systemd-repart. If |
| specified, synthesize a definition automatically if we can: enlarge last |
| partition on disk, but only if it is marked for growing and not read-only. |
| |
| * systemd-repart: read LUKS encryption key from $CREDENTIALS_DIRECTORY |
| |
| * systemd-repart: add a switch to factory reset the partition table without |
| immediately applying the new configuration again. i.e. --factory-reset=leave |
| or so. (this is useful to factory reset an image, then putting it into |
| another machine, ensuring that luks key is generated on new machine, not old) |
| |
| * systemd-repart: support setting up dm-integrity with HMAC |
| |
| * systemd-repart: maybe remove half-initialized image on failure. It fails |
| if the output file exists, so a repeated invocation will usually fail if |
| something goes wrong on the way. |
| |
| * systemd-repart: by default generate minimized partition tables (i.e. tables |
| that only cover the space actually used, excluding any free space at the |
| end), in order to maximize dd'ability. Requires libfdisk work, see |
| https://github.com/karelzak/util-linux/issues/907 |
| |
| * systemd-repart: MBR partition table support. Care needs to be taken regarding |
| Type=, so that partition definitions can sanely apply to both the GPT and the |
| MBR case. Idea: accept syntax "Type=gpt:home mbr:0x83" for setting the types |
| for the two partition types explicitly. And provide an internal mapping so |
| that "Type=linux-generic" maps to the right types for both partition tables |
| automatically. |
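
A rough sketch of how the proposed "Type=gpt:home mbr:0x83" syntax could be parsed, in Python for brevity. Nothing here exists in systemd-repart: the function name and the rule that a bare name applies to both tables are assumptions for illustration only.

```python
def parse_type(value):
    """Parse a hypothetical Type= value into per-table partition types.

    Accepts a bare name ("linux-generic"), which is recorded for both
    tables, or space-separated "table:type" pairs ("gpt:home mbr:0x83").
    """
    result = {}
    for token in value.split():
        table, sep, ptype = token.partition(":")
        if not sep:
            # Bare name: assumed to apply to both tables; mapping it to a
            # concrete GPT UUID / MBR type byte would happen elsewhere.
            result.setdefault("gpt", token)
            result.setdefault("mbr", token)
            continue
        if table not in ("gpt", "mbr"):
            raise ValueError(f"unknown partition table prefix: {table!r}")
        if not ptype:
            raise ValueError(f"missing type after {table!r}")
        result[table] = ptype
    if not result:
        raise ValueError("empty Type= value")
    return result
```

Explicit "gpt:" or "mbr:" entries override a bare name given earlier in the same value, so "linux-generic mbr:0x83" would keep the generic GPT type but pin the MBR one.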
| |
| * systemd-repart: allow sizing partitions as factor of available RAM, so that |
| we can reasonably size swap partitions for hibernation. |
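
To illustrate the RAM-relative sizing idea: a minimal Python sketch that derives a partition size from the MemTotal line of /proc/meminfo. The function names, the factor parameter and the 4096-byte rounding are assumptions, not an existing systemd-repart option.

```python
def mem_total_bytes(meminfo_text):
    """Extract MemTotal from /proc/meminfo contents, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like "MemTotal:       16303420 kB" (kB = 1024 bytes)
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def size_for_ram_factor(meminfo_text, factor=1.0, align=4096):
    """Return factor * RAM, rounded up to a multiple of 'align' bytes."""
    size = int(mem_total_bytes(meminfo_text) * factor)
    return -(-size // align) * align  # ceiling division, then scale back
```

On a live system the text would come from reading /proc/meminfo; passing it in as a string keeps the sketch testable.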
| |
| * systemd-repart: allow boolean option that ensures that if existing partition |
| doesn't fit within the configured size bounds the whole command fails. This |
| is useful to implement ESP vs. XBOOTLDR schemes in installers: have one set |
| of repart files for the case where ESP is large enough and one where it isn't |
| and XBOOTLDR is added in instead. Then apply the former first, and if it |
| fails to apply use the latter. |
| |
| * systemd-repart: add per-partition option to never reuse existing partition |
| and always create anew even if matching partition already exists. |
| |
| * systemd-repart: add per-partition option to fail if partition already exists, |
| i.e. is not newly added. Similarly, add option to fail if partition does not exist yet. |
| |
| * systemd-repart: allow disabling growing of specific partitions, or making |
| them (think ESP: we don't ever want to grow it, since we cannot resize vfat) |
| Also add option to disable operation via kernel command line. |
| |
| * systemd-repart: make it a static checker during early boot for existence and |
| absence of other partitions for trusted boot environments |
| |
| * systemd-repart: add support for SD_GPT_FLAG_GROWFS also on real systems, i.e. |
| generate some unit to actually enlarge the fs after growing the partition |
| during boot. |
| |
| * systemd-repart: do not print "Successfully resized …" when no change was done. |
| |
| * document: |
| - document that deps in [Unit] sections ignore Alias= fields in |
| [Install] sections of other units, unless those units are disabled |
| - man: clarify that time-sync.target is not only sysv compat but also useful otherwise. Same for similar targets |
| - document that service reload may be implemented as service reexec |
| - add a man page containing packaging guidelines and recommending usage of things like Documentation=, PrivateTmp=, PrivateNetwork= and ReadOnlyDirectories=/etc /usr. |
| - document systemd-journal-flush.service properly |
| - documentation: recommend to connect the timer units of a service to the service via Also= in [Install] |
| - man: document the very specific env the shutdown drop-in tools live in |
| - man: add more examples to man pages, |
| - in particular an example how to do the equivalent of switching runlevels |
| - man: maybe sort directives in man pages, and take sections from --help and apply them to man too |
| - document root=gpt-auto properly |
| |
| * systemctl: |
| - add systemctl switch to dump transaction without executing it |
| - Add a verbose mode to "systemctl start" and friends that explains what is being done or not done |
| - print nice message from systemctl --failed if there are no entries shown, and hook that into ExecStartPre of rescue.service/emergency.service |
| - add new command to systemctl: "systemctl system-reexec" which reexecs as many daemons as virtually possible |
| - systemctl enable: fail if target to alias into does not exist? maybe show how many units are enabled afterwards? |
| - systemctl: "Journal has been rotated since unit was started." message is misleading |
| |
| * introduce an option (or replacement) for "systemctl show" that outputs all |
| properties as JSON, similar to busctl's new JSON output. In contrast to that |
| it should skip the variant type string though. |
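
As a stop-gap illustration of the output shape meant here, a small Python sketch that turns the KEY=VALUE lines printed by "systemctl show" into a flat JSON object without any type strings. The function name is made up, and all values stay strings because the textual output carries no type information.

```python
import json


def show_to_json(show_output):
    """Convert 'systemctl show' KEY=VALUE lines into a JSON object string."""
    props = {}
    for line in show_output.splitlines():
        # Split on the first '=' only; values may contain further '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return json.dumps(props, sort_keys=True)
```

A native implementation could emit proper numbers and booleans, since the manager knows the D-Bus type of each property.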
| |
| * Add a "systemctl list-units --by-slice" mode or so, which rearranges the |
| output of "systemctl list-units" slightly by showing the tree structure of |
| the slices, and the units attached to them. |
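
A sketch of the tree arrangement, in Python. It relies on the documented slice naming rule (foo-bar.slice sits below foo.slice, which sits below -.slice); the function names and the two-space indentation are assumptions about how such a mode might look.

```python
def slice_parent(name):
    """Return the parent slice of a slice name, or None for the root slice."""
    if name == "-.slice":
        return None
    stem = name[: -len(".slice")]
    if "-" not in stem:
        return "-.slice"
    return stem.rsplit("-", 1)[0] + ".slice"


def render_by_slice(unit_to_slice):
    """Return indented lines showing slices and the units attached to them."""
    children = {}
    for unit, slc in unit_to_slice.items():
        children.setdefault(slc, []).append(unit)
        # Make sure every ancestor slice is linked up to the root.
        while slc is not None:
            parent = slice_parent(slc)
            if parent is not None:
                kids = children.setdefault(parent, [])
                if slc not in kids:
                    kids.append(slc)
            slc = parent

    lines = []

    def walk(node, depth):
        lines.append("  " * depth + node)
        for child in sorted(children.get(node, [])):
            walk(child, depth + 1)

    walk("-.slice", 0)
    return lines
```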
| |
| * add "systemctl wait" or so, which does what "systemd-run --wait" does, but |
| for all units. It should be both a way to pin units into memory as well as a |
| way to retrieve their exit data. |
| |
| * show whether a service has out-of-date configuration in "systemctl status" by |
| using mtime data of ConfigurationDirectory=. |
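
One way the mtime comparison could work, sketched in Python. The function name is hypothetical, and the start time is assumed to be supplied by the caller as a UNIX timestamp (for example derived from the unit's activation timestamp).

```python
import os


def config_newer_than(config_dir, started_at):
    """Return True if anything under config_dir changed after started_at.

    started_at is a UNIX timestamp in seconds (assumed input).
    """
    newest = os.stat(config_dir).st_mtime
    for root, dirs, files in os.walk(config_dir):
        for name in dirs + files:
            # lstat: look at the entry itself, do not follow symlinks
            newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
    return newest > started_at
```

Note that mtime only catches content and directory-entry changes; a pure ownership or mode change would need ctime instead.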
| |
| * "systemctl preset-all" should probably order the unit files it |
| operates on lexicographically before starting to work, in order to |
| ensure deterministic behaviour if two unit files conflict (like DMs |
| do, for example) |
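
Why ordering matters can be shown with a toy Python model. The "first unit to claim an alias wins" policy below is an assumption made purely for illustration; the point is only that sorting first makes the outcome independent of directory enumeration order.

```python
def resolve_alias_conflicts(unit_aliases):
    """Map each alias to the unit that gets it, processing units in sorted order.

    unit_aliases: dict of unit name -> list of aliases it wants to install.
    """
    winners = {}
    for unit in sorted(unit_aliases):  # lexicographic, hence deterministic
        for alias in unit_aliases[unit]:
            # Assumed policy: keep the first claimant, ignore later ones.
            winners.setdefault(alias, unit)
    return winners
```

With two display managers both claiming display-manager.service, the result is the same regardless of the order in which the unit files were read.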
| |
| * add "systemctl start -v foobar.service" that shows logs of a service |
| while the start command runs. This is non-trivial to do without |
| races though, since we should flush out all journal messages before |
| returning from the "systemctl start". |
| |
| * systemctl: if some operation fails, show log output? |
| |
| * Add a new verb "systemctl top" |
| |
| * unit install: |
| - "systemctl mask" should find all names by which a unit is accessible |
| (i.e. by scanning for symlinks to it) and link them all to /dev/null |
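
The symlink scan could look roughly like this Python sketch. The function name and the idea of passing the search directories explicitly are assumptions; it only collects names and leaves the actual linking to /dev/null to the caller.

```python
import os


def names_for_unit(unit_path, search_dirs):
    """Return all names under which unit_path is reachable via symlinks."""
    target = os.path.realpath(unit_path)
    names = {os.path.basename(unit_path)}
    for directory in search_dirs:
        try:
            entries = os.listdir(directory)
        except FileNotFoundError:
            continue  # missing search directories are simply skipped
        for entry in entries:
            path = os.path.join(directory, entry)
            if os.path.islink(path) and os.path.realpath(path) == target:
                names.add(entry)
    return sorted(names)
```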
| |
| * nspawn: |
| - emulate /dev/kmsg using CUSE and turn off the syslog syscall |
| with seccomp. That should provide us with a useful log buffer that |
| systemd can log to during early boot, and disconnect container logs |
| from the kernel's logs. |
| - as soon as networkd has a bus interface, hook up --network-interface=, |
| --network-bridge= with networkd, to trigger netdev creation should an |
| interface be missing |
| - a nice way to boot up without machine id set, so that it is set at boot |
| automatically for supporting --ephemeral. Maybe hash the host machine id |
| together with the machine name to generate the machine id for the container |
| - fix logic to always print a final newline on output. |
| https://github.com/systemd/systemd/pull/272#issuecomment-113153176 |
| - should optionally support receiving WATCHDOG=1 messages from its payload |
| PID 1... |
| - optionally automatically add FORWARD rules to iptables whenever nspawn is |
| running, remove them when shut down. |
| - add support for sysext extensions, too. i.e. a new --extension= switch that |
| takes one or more arguments, and applies the extensions already during |
| startup. |
| - when main nspawn supervisor process gets suspended due to SIGSTOP/SIGTTOU |
| or so, freeze the payload too. |
| - support time namespaces |
| - on cgroupsv1 issue cgroup empty handler process based on host events, so |
| that we make cgroup agent logic safe |
| - add API to invoke binary in container, then use that as fallback in |
| "machinectl shell" |
| - make nspawn suitable for shell pipelines: instead of triggering a hangup |
| when input is finished, send ^D, which synthesizes an EOF. Then wait for |
| hangup or ^D before passing on the EOF. |
| - greater control over selinux label? |
| - support that /proc, /sys/, /dev are pre-mounted |
| - maybe allow TPM passthrough, backed by swtpm, and measure --image= hash |
| into its PCR 11, so that nspawn instances can be TPM enabled, and partake |
| in measurements/remote attestation and such. swtpm would run outside of |
| control of container, and ideally would itself bind its encryption keys to |
| host TPM. |
| - make boot assessment do something sensible in a container. i.e send an |
| sd_notify() from payload to container manager once boot-up is completed |
| successfully, and use that in nspawn for dealing with boot counting, |
| implemented in the partition table labels and directory names. |
| - optionally set up nftables/iptables routes that forward UDP/TCP traffic on |
| port 53 to resolved stub 127.0.0.54 |
| - maybe optionally insert .nspawn file as GPT partition into images, so that |
| such container images are entirely stand-alone and can be updated as one. |
| - The subreaper logic we currently have seems overly complex. We should |
| investigate whether creating the inner child with CLONE_PARENT isn't better. |
| - Reduce the number of sockets that are currently in use and just rely on one |
| or two sockets. |
| - Support running nspawn as an unprivileged user. |
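
For the "hash the host machine id together with the machine name" idea above, one possible construction, sketched in Python. The use of HMAC-SHA256 keyed with the host ID and the function name are assumptions, not what nspawn does today; the result is shaped like a v4 UUID so that it looks like a regular machine ID.

```python
import hashlib
import hmac


def container_machine_id(host_machine_id, machine_name):
    """Derive a stable 128-bit container ID from host ID and machine name.

    host_machine_id: 32 lowercase hex characters, as in /etc/machine-id.
    """
    key = bytes.fromhex(host_machine_id)
    digest = hmac.new(key, machine_name.encode("utf-8"), hashlib.sha256).digest()
    raw = bytearray(digest[:16])
    # Set the UUID version (4) and variant bits, like a random machine ID.
    raw[6] = (raw[6] & 0x0F) | 0x40
    raw[8] = (raw[8] & 0x3F) | 0x80
    return raw.hex()
```

Keying the hash with the host ID keeps the host's own ID from leaking into the container while still giving each named machine a stable ID across ephemeral boots.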
| |
| * machined: |
| - add an API so that libvirt-lxc can inform us about network interfaces being |
| removed or added to an existing machine |
| - "machinectl migrate" or similar to copy a container from or to a |
| different host, via ssh |
| - introduce systemd-nspawn-ephemeral@.service, and hook it into |
| "machinectl start" with a new --ephemeral switch |
| - "machinectl status" should also show internal logs of the container in |
| question |
| - "machinectl history" |
| - "machinectl diff" |
| - "machinectl commit" that takes a writable snapshot of a tree, invokes a |
| shell in it, and marks it read-only after use |
| |
| * udev: |
| - move to LGPL |
| - kill scsi_id |
| - add trigger --subsystem-match=usb/usb_device device |
| - reimport udev db after MOVE events for devices without dev_t |
| - re-enable ProtectClock= once only cgroupsv2 is supported. |
| See f562abe2963bad241d34e0b308e48cf114672c84. |
| |
| * coredump: |
| - save coredump in Windows/Mozilla minidump format |
| - when truncating coredumps, also log the full size that the process had, and make a metadata field so we can report truncated coredumps |
| - add examples for other distros in ELF_PACKAGE_METADATA |
| |
| * support crash reporting operation modes (https://live.gnome.org/GnomeOS/Design/Whiteboards/ProblemReporting) |
| |
| * tmpfiles: |
| - allow time-based cleanup in r and R too |
| - instead of ignoring unknown fields, reject them. |
| - creating new directories/subvolumes/fifos/device nodes |
| should not follow symlinks. None of the other adjustment or creation |
| calls follow symlinks. |
| - teach tmpfiles.d q/Q logic something sensible in the context of XFS/ext4 |
| project quota |
| - teach tmpfiles.d m/M to move / atomic move + symlink old -> new |
| - add new line type for setting btrfs subvolume attributes (i.e. rw/ro) |
| - tmpfiles: add new line type for setting fcaps |
| - add -n as shortcut for --dry-run in tmpfiles & sysusers & possibly other places |
| |
| * udev-link-config: |
| - Make sure ID_PATH is always exported and complete for |
| network devices where possible, so we can safely rely |
| on Path= matching |
| |
| * sd-rtnl: |
| - add support for more attribute types |
| - inbuilt piping support (essentially degenerate async)? see loopback-setup.c and other places |
| |
| * networkd: |
| - add more keys to [Route] and [Address] sections |
| - add support for more DHCPv4 options (and, longer term, other kinds of dynamic config) |
| - add reduced [Link] support to .network files |
| - properly handle routerless dhcp leases |
| - work with non-Ethernet devices |
| - dhcp: do we allow configuring dhcp routes on interfaces that are not the one we got the dhcp info from? |
| - the DHCP lease data (such as NTP/DNS) is still made available when |
| a carrier is lost on a link. It should be removed instantly. |
| - expose in the API the following bits: |
| - option 15, domain name |
| - option 12, hostname and/or option 81, fqdn |
| - option 123, 144, geolocation |
| - option 252, configure http proxy (PAC/wpad) |
| - provide a way to define a per-network interface default metric value |
| for all routes to it. possibly a second default for DHCP routes. |
| - allow Name= to be specified repeatedly in the [Match] section. Maybe also |
| support Name=foo*|bar*|baz ? |
| - whenever uplink info changes, make DHCP server send out FORCERENEW |
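
The repeated Name= / "foo*|bar*|baz" matching mentioned above could behave like this Python sketch. The "|" alternation syntax is only the proposal from the list item, and the function name is made up; matching uses ordinary shell-style globs.

```python
from fnmatch import fnmatchcase


def name_matches(ifname, name_values):
    """Return True if ifname matches any glob in any Name= value.

    name_values: one string per Name= line, each optionally holding
    several '|'-separated globs (hypothetical syntax).
    """
    for value in name_values:
        for glob in value.split("|"):
            if glob and fnmatchcase(ifname, glob):
                return True
    return False
```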
| |
| * in networkd, when matching device types, fix up DEVTYPE rubbish the kernel passes to us |
| |
| * Figure out how to do unit tests of networkd's state serialization |
| |
| * dhcp: |
| - figure out how much we can increase Maximum Message Size |
| |
| * dhcp6: |
| - add functions to set previously stored IPv6 addresses on startup and get |
| them at shutdown; store them in client->ia_na |
| - write more test cases |
| - implement reconfigure support, see 5.3., 15.11. and 22.20. |
| - implement support for temporary addresses (IA_TA) |
| - implement dhcpv6 authentication |
| - investigate the usefulness of Confirm messages; i.e. are there any |
| situations where the link changes without any loss in carrier detection |
| or interface down |
| - some servers don't do rapid commit without a filled in IA_NA, verify |
| this behavior |
| - RouteTable= ? |
| |
| * shared/wall: Once more programs are taught to prefer sd-login over utmp, |
| switch the default wall implementation to wall_logind |
| (https://github.com/systemd/systemd/pull/29051#issuecomment-1704917074) |