Contributors: awhalley, creis, dcheng, jschuh, jyasskin, lukasza, mkwst, nasko, palmer, tsepez. Patches and corrections welcome!
Last Updated: 27 April 2021
In light of Spectre/Meltdown, we needed to re-think our threat model and defenses for Chrome renderer processes. Spectre is a new class of hardware side-channel attack that affects (among many other targets) web browsers. This document describes the impact of these side-channel attacks and our approach to mitigating them.
The upshot of the latest developments is that the folks working on this from the V8 side are increasingly convinced that there is no viable alternative to Site Isolation as a systematic mitigation to SSCAs [speculative side-channel attacks]. In this new mental model, we have to assume that user code can reliably gain access to all data within a renderer process through speculation. This means that we definitely need some sort of ‘privileged/PII data isolation’ guarantees as well, for example ensuring that password and credit card info are not speculatively loaded into a renderer process without user consent. — Daniel Clifford, in private email
In fact, any software that both (a) runs (native or interpreted) code from more than one source; and (b) attempts to create a security boundary inside a single address space, is potentially affected. For example, software that processes document formats with scripting capabilities, and which loads multiple documents from different sources into the same process, may need to take defensive measures similar to those described here.
The implications of this are far-reaching:
Additionally, attackers may develop ways to read memory from other userland processes (e.g. a renderer reading the browser’s memory). We do not include those attacks in our threat model. The hardware, microcode, and OS must re-establish the process boundary and the userland/kernel boundary. If the underlying platform does not enforce those boundaries, there’s nothing an application (like a web browser) can do.
Chrome’s GPU process handles data from all origins in a single process. It is not currently practical to isolate different sites or origins into their own GPU processes. (At a minimum, there are time and space efficiency concerns; we are still trying to get Site Isolation shipped and are actively resolving issues there.)
However, WebGL exposed high-resolution clocks that are useful for exploiting Spectre. It was possible to temporarily remove some of them, and to coarsen another, with minimal breakage of web compatibility, and so that has been done. However, we expect to reinstate the clocks on platforms where Site Isolation is on by default. (See Attenuating Clocks, below.)
We do not currently believe that, short of full code execution, an attacker can control speculative execution inside the GPU process to the extent necessary to exploit Spectre-like vulnerabilities. As always, evidence to the contrary is welcome!
It is generally safest to assume that an arbitrary read-write primitive in the renderer process will be available to the attacker. The richness of the attack/API surface available in a rendering engine makes this plausible. However, this capability is not a freebie the way Spectre is: the attacker must actually find one or more bugs that enable the RW primitive.
Site Isolation (SI) gets us closer to a place where origins face in-process attacks only from other origins in their SiteInstance, and not from any arbitrary origin. (Origins that include script from hostile origins will still be vulnerable, of course.) However, there may be hostile origins in the same process.
Strict origin isolation is not yet being worked on; we must first ship SI on by default. It is an open question whether strict origin isolation will turn out to be feasible.
These are presented in no particular order, with the exception that Site Isolation is currently the best and most direct solution.
The first-order solution is to simply get cross-origin data out of the Spectre attacker’s address space. Site Isolation (SI) more closely aligns the web security model (the same-origin policy) with the underlying platform’s security model (separate address spaces and privilege reduction).
SI still has some bugs that need to be ironed out before we can turn it on by default, both on Desktop and on Android. As of May 2018, we believe we can turn it on by default on Desktop (but not yet on Android) in M67 or M68.
On iOS, where Chrome is a WKWebView embedder, we must rely on the mitigations that Apple is developing.
All major browsers are working on some form of site isolation, and we are collaborating publicly on a way for sites to opt in to isolation, to potentially make implementing and deploying site isolation easier. (Chrome Desktop’s Site Isolation will be on by default, regardless, in the M67 – M68 timeframe.)
Site Isolation depends on cross-origin read blocking (CORB; formerly known as cross-site document blocking or XSDB) to prevent a malicious website from pulling in sensitive cross-origin data. Otherwise, an attacker could use markup like <img src="http://example.com/secret.json"> to get cross-origin data within reach of Spectre or other OOB-read exploits.
As of M65, CORB protects:
Content-Type header. We recommend using
Today, CORB doesn’t protect:
Site operators should read and follow, where applicable, our guidance for maximizing CORB and other defensive features. (There is an open bug to add a CORB evaluator to Lighthouse.)
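To make the role of the Content-Type header more concrete, here is a heavily simplified sketch of the kind of decision CORB makes for a response. It is not Chrome's implementation. The function name, the enum, and the short list of protected MIME types are assumptions chosen for illustration, and the real algorithm handles many more cases (for example, sniffing the response body when nosniff is absent).

```cpp
#include <cassert>
#include <set>
#include <string>

enum class CorbSketchDecision { kAllow, kBlock, kSniffFirst };

// Hypothetical helper for illustration only. `mime_type` is assumed to be
// lowercase with parameters such as "; charset=utf-8" already stripped.
CorbSketchDecision SketchCorbDecision(bool is_cross_origin,
                                      bool allowed_by_cors,
                                      const std::string& mime_type,
                                      bool has_nosniff) {
  // Check the stated assumption about parameters in debug builds.
  assert(mime_type.find(';') == std::string::npos);

  // Assumed, incomplete list of HTML, XML, and JSON types.
  static const std::set<std::string> kProtectedTypes = {
      "text/html", "text/xml", "application/xml", "application/json",
      "text/json"};

  // Same-origin responses and responses permitted by CORS are not blocked.
  if (!is_cross_origin || allowed_by_cors) {
    return CorbSketchDecision::kAllow;
  }
  // Other types (and a missing Content-Type) are not protected here.
  if (kProtectedTypes.count(mime_type) == 0) {
    return CorbSketchDecision::kAllow;
  }
  // With nosniff, the declared type is trusted and the body is withheld.
  // Without it, a real implementation would confirm the type by sniffing.
  return has_nosniff ? CorbSketchDecision::kBlock
                     : CorbSketchDecision::kSniffFirst;
}
```

Under these assumptions, a cross-origin application/json response served with nosniff is withheld from the renderer, while the same response without a Content-Type header is let through. That is why serving sensitive resources with an accurate Content-Type matters.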
A site is defined as the effective TLD + 1 DNS label (“eTLD+1”) and the URL scheme. This is a broader category than the origin, which is the scheme, entire hostname, and port number. All of these origins belong to the same site:
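As a rough illustration of this definition, the sketch below maps an origin to its site by keeping the scheme and the eTLD+1 and dropping the port and any further subdomains. The Origin struct, the function names, and the three-entry suffix set are hypothetical stand-ins. A real implementation would consult the full Public Suffix List.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical, tiny stand-in for the Public Suffix List.
const std::set<std::string> kPublicSuffixes = {"com", "org", "co.uk"};

struct Origin {
  std::string scheme;
  std::string host;
  int port;
};

// Returns the eTLD+1 of `host`: the longest known public suffix plus one
// more DNS label to its left. Falls back to the whole host if no suffix
// in the (assumed) list matches.
std::string RegistrableDomain(const std::string& host) {
  std::size_t label_start = 0;
  std::size_t dot = host.find('.');
  while (dot != std::string::npos) {
    // Scanning left to right means the first match is the longest suffix.
    if (kPublicSuffixes.count(host.substr(dot + 1)) > 0) {
      return host.substr(label_start);
    }
    label_start = dot + 1;
    dot = host.find('.', dot + 1);
  }
  return host;
}

// The site keeps the scheme and eTLD+1; the port is deliberately ignored.
std::string SiteForOrigin(const Origin& origin) {
  assert(!origin.host.empty());
  return origin.scheme + "://" + RegistrableDomain(origin.host);
}
```

With this sketch, the example origins https://www.example.com:443 and https://foo.example.com:8443 both map to the site https://example.com, while http://example.com maps to a different site because the scheme differs.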
Therefore, even once we have shipped SI on all platforms and have shaken out all the bugs, renderers will still not be perfect compartments for origins. So we will still need to take a multi-faceted approach to UXSS, memory corruption, and OOB-read attacks like Spectre.
Note that we are looking into the possibility of disabling assignments to document.domain (via origin-wide application of Feature Policy or the like). This would open the possibility that we could isolate at the origin level.
With SI, Chrome tends to spawn more renderer processes, which tends to lead to greater overall memory usage (conservative estimates seem to be about 10%). On many Android devices, it is more than 10%, and this additional cost can be prohibitive. However, each renderer is smaller and shorter-lived under Site Isolation.
Chrome uses different PPAPI processes per origin, for secure origins. (We tracked this as Issue 809614.)
Click To Play greatly reduces the risk that Flash-borne Spectre (and other) exploits will be effective at scale. Even so, we might want to consider teaching CORB about the Flash flavour of CORS.
WebViews run in their own process as of Android O, so the hosting application gets protection from malicious web content. However, all origins are run in the same
Before copying sensitive data into a renderer process, we should somehow get the person’s affirmative knowledge and consent. This has implications for all types of form auto-filling: normal form data, passwords, payment instruments, and any others. It seems like we are currently in a pretty good place on that front, with one exception: usernames and passwords get auto-filled into the shadow DOM, and then revealed to the real DOM on a (potentially forged?) user gesture. These credentials are origin-bound, however.
The Credential Management API still poses a risk, exposing usernames/passwords without a gesture for the subset of users who've accepted the auto-sign-in mechanism.
What should count as a secure gesture is a gesture on relevant, well-labeled browser chrome, handled in the browser process. Tracking the gesture in the renderer, where it can be forged by web content that compromises the renderer, does not suffice.
We must enable a good user experience with autofill, payments, and passwords, while also not ending up with a browser that leaks these super-important classes of data. (A good password management experience is itself a key security goal, after all.)
Of the known attacks, we believe it’s currently only feasible to try to mitigate variant 1 with code changes in C++. We will need the toolchain and/or platform support to mitigate other types of speculation attacks. We could experiment with inserting LFENCE instructions or using Retpoline before calling into Blink.
We don’t consider this approach to be a true solution; it’s only a mitigation. We think we can eliminate many of the most obvious gadgets and can buy some time for better defense mechanisms to be developed and deployed (primarily, Site Isolation).
It is very likely impossible to eliminate all gadgets. As with return-oriented programming, a large body of object code (like a Chrome renderer) is likely to contain so many gadgets that the attacker has a good probability of crafting a working exploit. At some point, we may decide that we can’t stay ahead of attack research, and will stop trying to eliminate gadgets.
Additionally, the mitigations typically come with a performance cost, and we may ultimately roll some or all of them back. Some potential mitigations are so expensive that it is impractical to deploy them.
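As an illustration of the kind of variant 1 code change discussed in this section, the sketch below applies index masking to a bounds-checked array read. The function name and signature are hypothetical, and the approach assumes the table size is a power of two. It is not taken from Chrome's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads table[untrusted_index], or returns 0 when the
// index is out of bounds. `size_pow2` must be a nonzero power of two.
std::uint8_t LoadWithIndexMasking(const std::uint8_t* table,
                                  std::size_t size_pow2,
                                  std::size_t untrusted_index) {
  assert(size_pow2 != 0 && (size_pow2 & (size_pow2 - 1)) == 0);

  // Architectural bounds check. The CPU may mispredict this branch and
  // speculatively run the load below with an out-of-bounds index.
  if (untrusted_index >= size_pow2) {
    return 0;
  }
  // The mask is a data dependency rather than a predicted branch, so the
  // address used by the load stays inside the table even on a mispredicted
  // path.
  const std::size_t safe_index = untrusted_index & (size_pow2 - 1);
  return table[safe_index];
}
```

One caveat with writing this by hand in C++: an optimizing compiler can prove that the mask is redundant after the bounds check and remove it. This is part of why toolchain support (for example, Clang's speculative load hardening) is needed for a dependable mitigation, and why the masking cost adds up when applied broadly.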
Exploiting Spectre requires a clock. We don’t believe it’s possible to eliminate, coarsen, or jitter all explicit and implicit clocks in the Open Web Platform (OWP) in a way that is sufficient to fully resolve Spectre. (Merely enumerating all the clocks is difficult.) Surprisingly coarse clocks are still useful for exploitation.
While it sometimes makes sense to deprecate, remove, coarsen, or jitter clocks, we don’t expect that we can get much long-term defensive value from doing so, for several reasons:
In particular, clock jitter is of extremely limited utility when defending against side channel attacks.
Many useful and legitimate web applications need access to high-precision clocks, and we want the OWP to be able to support them.
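To illustrate why coarsening alone offers limited protection, the following simulation sketches a known approach sometimes called clock-edge interpolation. The attacker first spins until the coarse clock ticks, then runs the operation being timed, then counts how many cheap unit steps fit before the next tick. Everything here is a toy model: SimulatedClock, the tick units, and the function names are assumptions, and the timed duration is assumed to be shorter than the clock's resolution.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a machine with a coarsened clock. `now` is the true time in
// arbitrary ticks; Read() only reveals it rounded down to `resolution`.
struct SimulatedClock {
  std::uint64_t now = 0;
  std::uint64_t resolution = 1000;

  std::uint64_t Read() const { return now / resolution * resolution; }
  void Advance(std::uint64_t ticks) { now += ticks; }
};

// Counts how many 1-tick steps elapse before the coarse clock changes.
std::uint64_t StepsUntilNextEdge(SimulatedClock& clock) {
  assert(clock.resolution > 0);
  const std::uint64_t start = clock.Read();
  std::uint64_t steps = 0;
  while (clock.Read() == start) {
    clock.Advance(1);  // Stands in for a cheap operation of known cost.
    ++steps;
  }
  return steps;
}

// Recovers `secret_duration` (assumed < resolution) using only Read().
std::uint64_t EstimateDuration(SimulatedClock& clock,
                               std::uint64_t secret_duration) {
  StepsUntilNextEdge(clock);       // Align to a clock edge.
  clock.Advance(secret_duration);  // The operation being timed.
  const std::uint64_t remaining = StepsUntilNextEdge(clock);
  return clock.resolution - remaining;
}
```

In this idealized model the attacker recovers the duration exactly even though the clock only reports multiples of 1000 ticks. On real hardware the measurement is noisy, but repeating it and averaging can recover much of the lost precision.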
We want to support a powerful web, though we recognize that some kinds of APIs a powerful web requires are more likely than others to facilitate exploitation, either because they can be used as very high-resolution timers (SharedArrayBuffer), or because they provide powerful introspection capabilities (performance.measureMemory). We can mitigate the risks these APIs pose by exposing them only in contexts that have opted into a set of restrictions that limits access to cross-origin data.
Cross-Origin-Opener-Policy (COOP) and Cross-Origin-Embedder-Policy (COEP) seem promising. Together, these mechanisms change web-facing behavior to enable origin-level process isolation, and ensure that cross-origin subresources will flow into a given process only if they opt into that usage. These properties give us a higher degree of confidence that otherwise dangerous APIs can be exposed safely, as any attack they enable would only gain access to same-origin data, or data that explicitly asserted that it accepted the risk of exposure.
Both COOP and COEP are enabled as of M83, and we plan to require both before enabling APIs like SharedArrayBuffer. Other browsers seem likely to do the same.
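As a small sketch of what requiring both headers could look like, the function below checks a document's response headers for the header values that, at the time of writing, opt a page into this mode: Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp. The Headers alias and the function name are hypothetical, and header names are assumed to be lowercased with values already trimmed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical representation: lowercase header name -> trimmed value.
using Headers = std::map<std::string, std::string>;

// Returns true only when both opt-ins are present with the strict values.
bool SketchIsCrossOriginIsolated(const Headers& headers) {
  // Check the stated assumption (lowercase names) in debug builds.
  for (const auto& entry : headers) {
    for (char c : entry.first) {
      assert(c < 'A' || c > 'Z');
    }
  }
  const auto coop = headers.find("cross-origin-opener-policy");
  const auto coep = headers.find("cross-origin-embedder-policy");
  return coop != headers.end() && coop->second == "same-origin" &&
         coep != headers.end() && coep->second == "require-corp";
}
```

A page that sends only one of the two headers, or a weaker COOP value such as same-origin-allow-popups, would not qualify under this sketch, and so would not be given APIs like SharedArrayBuffer.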
Note: This section explores ideas but we are not currently planning on implementing anything along these lines.
Looking beyond developer opt-ins such as COOP and COEP, we might be able to find other ways of limiting the scope of APIs to reduce their risk. For example, a third-party iframe that is trying to exploit Spectre is very different from a WebAssembly game, in the top-level frame, that the person is actively playing (and issuing many gestures to). We could programmatically detect engagement and establish policies for when certain APIs and features will be available to web content. (See e.g. Feature Policy.)
Engagement could be defined in a variety of complementary ways:
Additionally, we have considered the possibility of prompting the user for permission to run certain exploit-enabling APIs, although there are problems: warning fatigue, and the difficulty of communicating something accurate yet comprehensible to people.
For the reasons above, we now assume any active code can read any data in the same address space. The plan going forward must be to keep sensitive cross-origin data out of address spaces that run untrustworthy code, rather than relying on in-process checks.