Chromium OS Embedded Controller runtime

Design principles

Never do at runtime what you can do at compile time. The goal is to save flash space and computation: keep configuration at compile time until you really need to switch at runtime.
Real-time: guarantee low latency (e.g. < 20 us), no interrupt disabling ... bounded code in interrupt handlers.
Keep it simple: design for the subset of microcontrollers we use, targeting 32-bit single-core CPUs for small systems: 4 kB to 64 kB of data RAM, possibly executing in place from flash.

Execution contexts

This is a pre-emptible runtime with static tasks. It has only 2 possible execution contexts:

the regular tasks
the interrupt handlers

The initial startup is an exception as described in the dedicated paragraph.

tasks

The tasks are statically defined at compile-time. They are described for each board in the board/$board/ec.tasklist file.

They also have a static fixed priority implicitly defined at compile-time by their order in the ec.tasklist file (the top-most one being the lowest priority aka task 1). As a consequence, two different tasks cannot have the same priority.

In order to store its context, each task has its own stack whose (small) size is defined at compile-time in the ec.tasklist file.
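
As an illustration of how a single compile-time list can yield task IDs, priorities and stack sizes, here is a minimal sketch using an X-macro. It is not the actual ec.tasklist syntax: the SKETCH_ names, the task names, the stack sizes and the idle entry at ID 0 are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical task list, lowest priority first (as in ec.tasklist). */
#define SKETCH_TASK_LIST(X)      \
	X(HOOKS, hook_entry, 1024)   \
	X(CHARGER, charger_entry, 512) \
	X(HOSTCMD, hostcmd_entry, 768)

/* Task IDs double as priorities: a larger ID means a higher priority. */
#define SKETCH_X_ID(name, entry, stack) SKETCH_TASK_ID_##name,
enum sketch_task_id {
	SKETCH_TASK_ID_IDLE, /* assumed task 0, so that HOOKS is task 1 */
	SKETCH_TASK_LIST(SKETCH_X_ID)
	SKETCH_TASK_COUNT
};

/* Stack sizes are fixed at compile time as well (values are made up). */
#define SKETCH_X_STACK(name, entry, stack) [SKETCH_TASK_ID_##name] = (stack),
static const size_t sketch_stack_size[SKETCH_TASK_COUNT] = {
	[SKETCH_TASK_ID_IDLE] = 256,
	SKETCH_TASK_LIST(SKETCH_X_STACK)
};
```

Because the IDs come from the position in the list, two tasks can never end up with the same priority, and re-ordering the list is all it takes to change priorities.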

A task can normally be preempted at any time by either interrupts or higher priority tasks. See the preemption section for details and the locking section for the few cases where you need to avoid it.

interrupts

The hardware interrupt requests are connected to the interrupt handling C routines declared by the DECLARE_IRQ macros, through some chip/core specific mechanisms (e.g. depending on whether we have a vectored interrupt controller, slave interrupt controllers...).

The interrupts can be nested (i.e. interrupted by a higher priority interrupt). All the interrupt vectors are assigned a priority as defined in their DECLARE_IRQ macro. The number of available priority levels is architecture-specific (e.g. 4 on Cortex-M0, 8 on Cortex-M3/M4) and several interrupt handlers can have the same priority. An interrupt handler can only be interrupted by a handler having a priority strictly greater than its own.

In most cases, the exceptions (e.g. data/prefetch aborts, software interrupt) can be seen as interrupts with a priority strictly greater than all IRQ vectors. So they can interrupt any IRQ handler using the same nesting mechanism. All fatal exceptions should ultimately lead to a reboot.

Events

Each task has a pending events bitmap[1] implemented as a 32-bit word. Several events are pre-defined for all tasks, and the most significant bits of the bitmap are reserved for them: the timer pending event on bit 31 (see the corresponding section), the event to kick the waiters on a mutex on bit 30, and the requested task wake on bit 29, along with a few hardware-specific events. The 19 least significant bits are available for task-specific meanings.

Those event bits are used in the inter-task communication and scheduling mechanisms: other tasks and interrupt handlers can atomically set them to request specific actions from the task. Therefore, the presence of pending events in a task bitmap has an impact on its scheduling, as described in the scheduling section. These requests are made using the task_set_event() and task_wake() primitives.

The two typical use-cases are:

a task sends a message to another task (simply using some common memory structures, see the single address space section) and wants it to process it now.
a hardware IRQ occurred and we need to do some long processing to respond to it (e.g. an I2C transaction). The associated interrupt handler cannot do it (for latency reasons), so it will raise an event to ask a task to do it.

The task code chooses to consume them (or a subset of them) while it is running, through the task_wait_event() and task_wait_event_mask() primitives.
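
The producer/consumer pattern on the event bitmap can be sketched as follows. This is a simplified, non-blocking model using C11 atomics, not the runtime's implementation: the real task_wait_event() puts the task to sleep, and every SKETCH_ name below is hypothetical. Only the bit positions 31, 30 and 29 come from the text above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Reserved bits as described above; the macro names are illustrative. */
#define SKETCH_EVENT_TIMER (UINT32_C(1) << 31)
#define SKETCH_EVENT_MUTEX (UINT32_C(1) << 30)
#define SKETCH_EVENT_WAKE (UINT32_C(1) << 29)
/* Hypothetical task-specific event in the low bits */
#define SKETCH_EVENT_I2C_DONE (UINT32_C(1) << 0)

/* Pending events of a single task */
static _Atomic uint32_t sketch_pending;

/* Producer side (another task or an interrupt handler): post events. */
static void sketch_post_events(uint32_t events)
{
	atomic_fetch_or(&sketch_pending, events);
}

/*
 * Consumer side: atomically take the pending events selected by mask and
 * leave the others pending. Returns the events taken (0 if none).
 */
static uint32_t sketch_take_events(uint32_t mask)
{
	return atomic_fetch_and(&sketch_pending, ~mask) & mask;
}
```

Consuming only a subset leaves the remaining bits set, so an event posted while the task was busy with something else is not lost.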

Scheduling and preemption

The system has a global bitmap[1] called tasks_ready containing one bit per task and indicating whether or not it is ready to run (i.e. wants/needs to be scheduled). A task's ready bit can only be cleared when the task itself calls one of the functions explicitly triggering a re-scheduling (e.g. task_wait_event() or task_set_event()) and it has no pending event. The ready bit is set by any task or interrupt handler setting an event bit for the task (i.e. calling task_set_event()).

The scheduling is based on (and only on) the tasks_ready bitmap (which is derived from the event bitmaps of all the tasks as explained above).

Then, the scheduling policy to find which task should run is simply to find the most significant bit set in the tasks_ready bitmap and schedule the corresponding task.
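
The policy fits in a few lines of C. The sketch below uses a portable loop and a hypothetical function name; a real port would likely use a count-leading-zeros instruction where the core provides one.

```c
#include <stdint.h>

/*
 * Return the ID of the highest-priority ready task, i.e. the index of the
 * most significant bit set in tasks_ready, or -1 if no task is ready.
 */
static int sketch_next_task(uint32_t tasks_ready)
{
	int id = -1;

	while (tasks_ready) {
		tasks_ready >>= 1;
		id++;
	}
	return id;
}
```

For example, with tasks 1 and 3 ready (bitmap 0xA), the function selects task 3.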

Important note: the re-scheduling happens only when we are exiting the interrupt context. It is done in a non-preemptible context (likely with the highest priority). Indeed, a re-scheduling is actually needed only when the highest-priority ready task has changed. There are 3 distinct cases where this can happen:

an interrupt handler sets a new event for a task. In this case, task_set_event() will detect that it is executed in interrupt context and record in the need_resched_or_profiling variable that it might need to re-schedule at interrupt return. When the current interrupt is about to return, it will see this bit and take the slow path, making a new scheduling decision and possibly a context switch, instead of the fast path returning to the interrupted task.
a task sets an event on another task. The runtime will trigger a software interrupt to force a re-scheduling at its exit.
the running task voluntarily relinquishes its current execution rights by calling task_wait_event() or a similar function. This will trigger the software interrupt similarly to the previous case.

On the re-scheduling path, if the highest-priority ready task does not match the currently running one, the runtime performs a context switch by saving all the processor registers on the current task stack, switching the stack pointer to the newly scheduled task, and restoring the registers from the context previously saved there.

hooks and deferred function

The lowest priority task (ie Task 1, aka TASK_ID_HOOKS) is reserved to execute repetitive actions and future actions deferred in time without blocking the current task or creating a dedicated task (whose stack memory allocation would be wasting precious RAM).

The HOOKS task has a list of deferred functions and their next deadline. Every time it is woken up, it runs through the list and calls the ones whose deadline has expired. Before going back to sleep, it arms a timer to the closest deadline. The deferred functions can be created using the DECLARE_DEFERRED() macro. Similarly, the HOOK_SECOND and HOOK_TICK hooks are called periodically by the HOOKS task loop (the tick duration is platform-defined and shorter than one second).
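
One pass of such a loop over the deferred list might look like the sketch below. The structure layout and all SKETCH_/sketch_ names are assumptions for illustration, not the data structures generated by the real macro; the counting routine only exists to give the list something to call.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NO_DEADLINE UINT64_MAX

struct sketch_deferred {
	void (*routine)(void);
	uint64_t deadline_us; /* SKETCH_NO_DEADLINE when not scheduled */
};

/* Example deferred routine: just counts how often it was called. */
static int sketch_calls;
static void sketch_count_call(void)
{
	sketch_calls++;
}

/*
 * Call every routine whose deadline has expired, then return the closest
 * remaining deadline (SKETCH_NO_DEADLINE if none) so that the caller can
 * arm its timer before going back to sleep.
 */
static uint64_t sketch_run_deferred(struct sketch_deferred *list, size_t count,
				    uint64_t now_us)
{
	uint64_t next = SKETCH_NO_DEADLINE;
	size_t i;

	for (i = 0; i < count; i++) {
		if (list[i].deadline_us <= now_us) {
			/* Disarm first so the routine may re-schedule itself */
			list[i].deadline_us = SKETCH_NO_DEADLINE;
			list[i].routine();
		}
	}
	for (i = 0; i < count; i++)
		if (list[i].deadline_us < next)
			next = list[i].deadline_us;
	return next;
}
```

The second loop runs after all the calls so that a routine which re-schedules itself is taken into account when choosing the next wake-up time.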

Note: be especially careful about priority inversions when accessing resources protected by a mutex (e.g. a shared I2C controller) in a deferred function. Indeed, being the lowest priority task, the HOOKS task might be de-scheduled for a long time while holding the mutex and starve higher priority tasks trying to access the resource, given there is no priority boosting implemented for this case. Also be careful about long delays (> x 100us) in hook or deferred function handlers, since those will starve other hooks of execution time. It is better to implement a state machine where you set up a subsequent call to a deferred function than to have a long delay in your handler.

watchdog

The system is always protected against misbehaving tasks and interrupt handlers by a hardware watchdog rebooting the CPU when it is not attended.

The watchdog is petted in the HOOKS task, typically by declaring a HOOK_TICK handler doing it at regular intervals. Given this is the lowest priority task, this guarantees that all tasks are getting some run time during the watchdog period.

Note: that's also why one should not sprinkle one's code with watchdog_reload() calls to paper over long-running routine issues.

To help debug bad sequences triggering watchdog reboots, most platforms implement a warning mechanism defined under CONFIG_WATCHDOG_HELP. It's a timer firing in the middle of the watchdog period if the watchdog hasn't been petted by then, and dumping the current state of the execution on the console, mainly to help find a stuck task or handler. Normal execution resumes after this alert.

Startup

The startup sequence goes through the following steps:

the assembly entry routine clears the .bss (uninitialized data), copies the initialized data (and optionally the code if we are not executing from flash), and sets a stack pointer.
we can jump to the main() C routine at this point.
then we go through the hardware pre-init (before we have all the clocks to run the peripherals normally) and init routines, in this rough order: memory protection if any, GPIOs in their default state, prepare the interrupt controller, set the clocks, then timers, enable interrupts, init the debug UART and the watchdog.
finally, we start the tasks.

For the tasks startup, initially only the HOOKS task is marked as ready, so it is the first to start and can call all the HOOK_INIT handlers performing initializations before actually executing any real task code. Then all tasks are marked as ready, and the highest priority one is given the control.

During the whole startup sequence, until control is given to the first task, we are using a special stack called the "system stack", which will later be re-used as the interrupt and exception stack.

To prepare the first context switch, the code in task_pre_init() stuffs all the task stacks with a fake saved context whose program counter contains the task start address and whose stack pointer points to its reserved stack space.
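
The idea of a fake saved context can be sketched in portable C as below. The frame layout (16 words, program counter in the last one), the stack size and all names are assumptions made for the example; the real layout is dictated by each core's exception frame and context-switch code.

```c
#include <stdint.h>

#define SKETCH_STACK_WORDS 64
#define SKETCH_CONTEXT_WORDS 16 /* assumed: 16 saved registers, PC last */

struct sketch_task {
	uintptr_t *sp;                       /* saved stack pointer */
	uintptr_t stack[SKETCH_STACK_WORDS]; /* reserved stack space */
};

/* Placeholder task entry point used by the example */
static void sketch_example_entry(void)
{
}

/*
 * Build a fake saved context at the top of the task stack so that the
 * first "restore" of this task starts executing at its entry point.
 */
static void sketch_init_context(struct sketch_task *t, void (*entry)(void))
{
	/* Assumed full-descending stack: it grows down from the end */
	uintptr_t *top = t->stack + SKETCH_STACK_WORDS;
	uintptr_t *frame = top - SKETCH_CONTEXT_WORDS;
	int i;

	for (i = 0; i < SKETCH_CONTEXT_WORDS; i++)
		frame[i] = 0;
	/* Function-pointer-to-integer cast: fine on the usual flat targets */
	frame[SKETCH_CONTEXT_WORDS - 1] = (uintptr_t)entry;
	t->sp = frame;
}
```

With every stack prepared this way, the scheduler needs no special case for a task that has never run: it restores the fake context exactly as it would restore a real one.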

locking and atomicity

The two main concurrency primitives are lightweight atomic variables and heavier mutexes.

The atomic variables are 32-bit integers (which can usually be loaded/stored atomically on the architectures we are supporting). The atomic.h headers include primitives to atomically perform various bit and arithmetic operations, using either load-linked/store-conditional, load-exclusive/store-exclusive or simpler mechanisms, depending on what is available.
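
The retry pattern behind a load-exclusive/store-exclusive primitive can be approximated on a host with a C11 compare-exchange loop. This is only a stand-in to show the shape of the operation: the function name is hypothetical and this is not the atomic.h API.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Atomically clear `bits` in *addr and return the previous value.
 * The loop mirrors a load-exclusive/store-exclusive sequence: if another
 * context modified the word in between, reload and try again.
 */
static uint32_t sketch_atomic_clear_bits(_Atomic uint32_t *addr, uint32_t bits)
{
	uint32_t old = atomic_load(addr);

	while (!atomic_compare_exchange_weak(addr, &old, old & ~bits))
		; /* `old` now holds the fresh value; retry with it */
	return old;
}
```

No interrupt masking is involved: a preempting handler simply causes one more iteration of the loop.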

The mutexes are actually statically allocated binary semaphores. In case of contention, they will make the waiting task sleep (removing its ready bit) and use the event mechanism to wake up the other waiters on unlocking.

Note: the mutexes do NOT implement any priority boosting, so they offer no protection against the priority inversion phenomenon.

Given the runtime is running on a single-core CPU, spinlocks would be equivalent to masking interrupts with interrupt_disable(), but this is strongly discouraged to avoid harming the real-time characteristics of the runtime.

Time

time keeping

In the runtime, the time is accounted everywhere using a 64-bit microsecond count since the microcontroller cold boot.

Note: The runtime has no notion of wall-time/date, even though a few platforms have an RTC inside the microcontroller.

These microsecond timestamps are implemented in the code using the timestamp_t type and the current timestamp is returned by the get_time() function.

The time-keeping is preferably implemented using a 32-bit hardware free-running counter at 1 MHz plus a 32-bit word in memory keeping track of the high word of the 64-bit absolute time. This word is incremented by the 32-bit timer rollover interrupt.

Note: as a consequence of this implementation, when the 64-bit timestamp is read in interrupt context in a handler having a higher priority than the timer IRQ (which is somewhat rare), the high 32-bit word might be incoherent (off by one).
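
A minimal model of this scheme is sketched below, with plain variables standing in for the hardware counter and the rollover interrupt. All names are hypothetical and the re-read loop is one common way to get a consistent pair, not necessarily what each chip port does.

```c
#include <stdint.h>

/* Stand-ins for the hardware: the 1 MHz free-running counter... */
static volatile uint32_t sketch_hw_counter;
/* ...and the high word maintained in memory by the rollover interrupt */
static volatile uint32_t sketch_high_word;

/* What the rollover interrupt handler does on each counter wrap */
static void sketch_rollover_irq(void)
{
	sketch_high_word++;
}

/* Return the 64-bit microsecond count since boot. */
static uint64_t sketch_get_time_us(void)
{
	uint32_t hi, lo;

	do {
		hi = sketch_high_word;
		lo = sketch_hw_counter;
		/* If the rollover IRQ ran in between, the pair is stale */
	} while (hi != sketch_high_word);

	return ((uint64_t)hi << 32) | lo;
}
```

The re-read only helps when the rollover interrupt is able to preempt the reader. In a handler with a higher priority than the timer IRQ, the high word cannot be updated while the counter wraps, which is the off-by-one case described in the note above.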

timer event

The runtime offers one (and only one) timer per task. All the task timers are multiplexed on a single hardware timer (which can be just a match interrupt on the free-running counter mentioned in the previous paragraph). Every time a timer is armed or expires, the runtime finds the task timer having the closest deadline and programs it in the hardware to get an interrupt. At the same time, it sets the TASK_EVENT_TIMER event in all tasks whose timer deadline has expired. The next deadline is computed in interrupt context.
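
The multiplexing step can be sketched as a single pass over the per-task timers. The structure, the SKETCH_ names and the use of UINT64_MAX for a disarmed timer are assumptions for illustration; only bit 31 for the timer event comes from the events section.

```c
#include <stdint.h>

#define SKETCH_TIMER_EVENT_BIT (UINT32_C(1) << 31)
#define SKETCH_TIMER_OFF UINT64_MAX

struct sketch_task_timer {
	uint64_t deadline_us; /* SKETCH_TIMER_OFF when not armed */
	uint32_t events;      /* pending events of the owning task */
};

/*
 * Post the timer event to every task whose deadline has expired and
 * return the closest remaining deadline, i.e. the value to program into
 * the single hardware timer (SKETCH_TIMER_OFF if no timer is armed).
 */
static uint64_t sketch_process_timers(struct sketch_task_timer *timers,
				      int count, uint64_t now_us)
{
	uint64_t next = SKETCH_TIMER_OFF;
	int id;

	for (id = 0; id < count; id++) {
		if (timers[id].deadline_us == SKETCH_TIMER_OFF)
			continue;
		if (timers[id].deadline_us <= now_us) {
			timers[id].deadline_us = SKETCH_TIMER_OFF;
			timers[id].events |= SKETCH_TIMER_EVENT_BIT;
		} else if (timers[id].deadline_us < next) {
			next = timers[id].deadline_us;
		}
	}
	return next;
}
```

Since there is at most one timer per task, the cost of this pass is bounded by the number of tasks, which keeps the work done in interrupt context predictable.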

Note: each task has a single timer, which is also used to wake up the task when task_wait_event() is called with a timeout. One therefore needs to be careful when using the timer_arm() function directly: if this timer is still running at the next task_wait_event() call, that call will fail due to the lack of an available timer.

Memory

Single address space

There is no memory isolation between tasks (i.e. they all live in the same address space). Some architectures implement memory protection mechanisms, albeit only to differentiate executable areas (e.g. .code) from writable areas (e.g. .bss or .data), as there is a single privilege level for all execution contexts.

As all the memory is implicitly shared between the tasks, inter-task communication can be done by simply writing the data structures in memory and using events to wake the other task (provided the concurrent accesses to those structures have been properly thought through).

heap

Data structures should be statically allocated at compile time.

Note: there is no dynamic allocator available (e.g. malloc()), not because it is impossible to create one but to avoid the negative side effects of having one, i.e. poor/unpredictable real-time behavior and possible leaks leading to a long tail of failures.

TODO: talk about shared memory
TODO: where/how we store panic memory and sysjump parameters.

stacks

Each task has its own stack. In addition, there is a system stack used for startup and interrupts/exceptions.

Note 1: Each task stack is relatively small (e.g. 512 bytes), so one needs to be careful about stack usage when implementing features.

Note 2: At the same time, the total size of RAM used by stacks is a big chunk of the total RAM consumption, so their sizes need to be carefully tuned (please refer to the debugging paragraph for additional input on this topic).

Firmware code organization and multiple copies

TODO: Details the classical RO / RW partitions and how we sysjump.

power management

TODO: talk about the idle task + WFI (note: interrupts are disabled!)
TODO: more about low power idle and the sleep-disable bitmap
TODO: adjusting the microsecond timer at wake-up

debugging

TODO: our main tool: serial console ... (but non-blocking / discard overflow, cflush DO/DONT)
TODO: else JTAG stop and go: careful with watchdog and timer
TODO: panics and software panics
TODO: stack size tuning and canarying
TODO: Address the rest of the comments from https://crrev.com/c/445941

[1]: bitmap: array of bits.