Elaboration of the blob storage system in Chrome.
Please see the FileAPI Spec for the full specification for Blobs, or Mozilla's Blob documentation for a description of how Blobs are used in the Web Platform in general. For the purposes of this document, the important aspects of blobs are:
In Chrome, after blob creation the actual blob ‘data’ gets transported to and lives in the browser process. The renderer just holds a reference - a mojom BlobPtr (and for now a string UUID) - to the blob, which it can use to read the blob or pass it to other processes.
If the in-memory space for blobs is getting full, or a new blob is too large to be in-memory, then the blob system uses the disk. This can either be paging old blobs to disk, or saving the new too-large blob straight to disk.
Blob reading goes through the mojom Blob interface, where the renderer or browser calls the
ReadRange methods to read the blob through a data pipe. This is implemented in the browser process in the
General Chrome terminology:
Blob system terminology:
We calculate the storage limits here.
In-Memory Storage Limit
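total_physical_memory / 5 (matches the Cast, CrOS, and Desktop 32 rows of the table below)
total_physical_memory / 100 (matches the Android rows)
Disk Storage Limit
disk_size / 2 (matches the CrOS row)
6 * disk_size / 100 (matches the Android rows)
disk_size / 10 (matches the Desktop rows)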
Note: Chrome OS's disk is part of the user partition, which is separate from the system partition.
Minimum Disk Availability
We cap the disk limit to accommodate a minimum disk availability. The equation we use is:
min_disk_availability = in_memory_limit * 2
| Device | RAM | In-Memory Limit | Disk | Disk Limit | Min Disk Availability |
|---|---|---|---|---|---|
| Cast | 512 MB | 102 MB | 0 | 0 | 0 |
| Android Minimal | 512 MB | 5 MB | 8 GB | 491 MB | 10 MB |
| Android Fat | 2 GB | 20 MB | 32 GB | 1.9 GB | 40 MB |
| CrOS | 2 GB | 409 MB | 8 GB | 4 GB | 0.8 GB |
| Desktop 32 | 3 GB | 614 MB | 500 GB | 50 GB | 1.2 GB |
| Desktop 64 | 4 GB | 2 GB | 500 GB | 50 GB | 4 GB |
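The arithmetic behind the table can be reproduced with a minimal sketch. The `Platform` buckets, the `BlobLimits` struct, and the `ComputeBlobLimits` function are hypothetical names introduced for illustration, and the mapping of formulas to platforms is inferred from the table rows rather than taken from the Chrome source. The flat 2 GB value for 64-bit desktop is likewise read off the table. The Cast row (no disk) is not modelled.

```cpp
#include <cstdint>

// Hypothetical platform buckets, inferred from the rows of the table above.
enum class Platform { kAndroid, kChromeOS, kDesktop32, kDesktop64 };

// All sizes are in bytes.
struct BlobLimits {
  std::uint64_t in_memory_limit;
  std::uint64_t disk_limit;
  std::uint64_t min_disk_availability;
};

constexpr std::uint64_t kMB = 1024ull * 1024ull;
constexpr std::uint64_t kGB = 1024ull * kMB;

BlobLimits ComputeBlobLimits(Platform platform,
                             std::uint64_t total_physical_memory,
                             std::uint64_t disk_size) {
  BlobLimits limits{};
  switch (platform) {
    case Platform::kAndroid:
      limits.in_memory_limit = total_physical_memory / 100;
      limits.disk_limit = 6 * disk_size / 100;
      break;
    case Platform::kChromeOS:
      limits.in_memory_limit = total_physical_memory / 5;
      limits.disk_limit = disk_size / 2;
      break;
    case Platform::kDesktop32:
      limits.in_memory_limit = total_physical_memory / 5;
      limits.disk_limit = disk_size / 10;
      break;
    case Platform::kDesktop64:
      // Assumption: a flat 2 GB, read off the "Desktop 64" table row.
      limits.in_memory_limit = 2 * kGB;
      limits.disk_limit = disk_size / 10;
      break;
  }
  // min_disk_availability = in_memory_limit * 2, as stated above.
  limits.min_disk_availability = limits.in_memory_limit * 2;
  return limits;
}
```

For example, calling `ComputeBlobLimits(Platform::kChromeOS, 2 * kGB, 8 * kGB)` gives a disk limit of 4 GB, in line with the CrOS row.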
Similarly, if a URL is created for a blob, this will keep the blob data around until the URL is revoked (and the blob object is dereferenced). However, the URL is automatically revoked when the browser context that created it is destroyed.
The primary API to interact with the blob system is its mojo interface. This is how the renderer process interacts with the blob system to create and transport blobs, and also how other subsystems in the browser process interact with it, for example to read blobs they have received.
New blobs are created through the BlobRegistry mojo interface. In blink you can get a reference to this interface via
blink::BlobDataHandle::GetBlobRegistry(). This interface has two methods to create a new blob. The
Register method takes a blob description in the form of an array of
DataElements, while the
RegisterFromStream method creates a blob by reading data from a mojo
Register will call its callback as soon as possible after the request has been received, at which point the UUID is valid and known to the blob system. It will then asynchronously request the data and actually create the blob. On the other hand, the
RegisterFromStream method won't call its callback until all the data for the blob has been received and the blob has been entirely completed.
To read the data for a blob, the
Blob mojom interface provides
ReadSideData methods. These methods will wait until the blob has finished building before they start reading data, and if for whatever reason the blob failed to build or reading data failed, will report back an error through the (optional)
Within blink creating blobs is done through the
BlobDataHandle classes. The
BlobData class can be seen as a builder for an array of mojom
DataElements. While doing so it also tries to consolidate all adjacent memory blob items into one. This is done since blobs are often constructed with arrays with single bytes. The implementation tries to avoid doing any copying or allocating of new memory buffers. Instead it facilitates the transformation between the ‘consolidated’ blob items and the underlying bytes items. This way we don't waste any memory.
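As an illustration of the consolidation idea, here is a minimal sketch that merges runs of adjacent in-memory items without copying their bytes. `ByteSpan`, `Item`, `Consolidate`, and `ByteLength` are hypothetical names invented for this example; they are not the blink classes, and the real implementation may differ in how it tracks the underlying buffers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view of some bytes; not a real blink type.
struct ByteSpan {
  const char* data;
  std::size_t size;
};

// Hypothetical, simplified stand-in for a blob item.
struct Item {
  enum class Kind { kBytes, kFile } kind;
  std::vector<ByteSpan> chunks;  // Used by kBytes items only.
  std::string path;              // Used by kFile items only.
};

// Merges each run of adjacent kBytes items into a single item. Only the
// views are moved around; the underlying bytes are never copied.
std::vector<Item> Consolidate(const std::vector<Item>& items) {
  std::vector<Item> out;
  for (const Item& item : items) {
    const bool can_merge = item.kind == Item::Kind::kBytes && !out.empty() &&
                           out.back().kind == Item::Kind::kBytes;
    if (can_merge) {
      out.back().chunks.insert(out.back().chunks.end(), item.chunks.begin(),
                               item.chunks.end());
    } else {
      out.push_back(item);
    }
  }
  return out;
}

// Total byte length of a (possibly consolidated) bytes item.
std::size_t ByteLength(const Item& item) {
  std::size_t total = 0;
  for (const ByteSpan& chunk : item.chunks) total += chunk.size;
  return total;
}
```

In this sketch a blob built from many single-byte arrays ends up as one bytes item holding many small views, so the element array stays short while the original buffers remain untouched.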
After the blob has been ‘consolidated’ and its data has been assembled in a
BlobData object, it is passed to the
blink::BlobDataHandle constructor. This then passes the consolidated data to the mojo
DataElementByte elements in the blob description will have an associated
BytesProvider, as implemented by the
blink::BlobBytesProvider class. This class is owned by the mojo message pipe it is bound to, and is what the browser uses to request data for the blob when quota for it becomes available. Depending on the transport strategy chosen by the browser one of the
Request* methods on this interface will be called (or if the blob goes out of scope before the data has been requested, the
BytesProvider pipe is simply dropped, destroying the
BlobBytesProvider instance and the data it owned).
BlobBytesProvider instances also try to keep the renderer alive while blobs are being sent, because if the renderer is closed we would lose any pending blob data. They do this by calling
In blink, in addition to going through the mojo
Blob interface as exposed
through blink::Blob::GetBlobDataHandle, you can also use
FileReaderLoader as an abstraction around the mojo interface. This class can, for example, convert the resulting bytes to an
ArrayBuffer, and generally just wraps the mojo
DataPipe functionality in an easier to use interface.
Even in the browser process, it is generally preferable to go through the mojo
Blob interface to interact with blobs. This results in a cleaner separation between the blob system and the rest of Chrome. However, in some cases it may still be necessary to interact directly with the internals of the blob system, so for now this remains possible.
But keep in mind that everything in this section is really for legacy code only. New code should strongly prefer to use the mojo interfaces described above.
Blob interaction in C++ should go through the
BlobStorageContext. Blobs are built using a
BlobDataBuilder to populate the data and then calling
::BuildBlob. This returns a
BlobDataHandle, which manages reading, lifetime, and metadata access for the new blob.
If you have known data that is not available yet, you can still create the blob reference. To facilitate this construction, see the documentation on the
BlobDataBuilder::AppendFuture* and ::Populate* methods on the builder, and the callback usage on
BlobStorageContext::NotifyTransportComplete.
All blob information should come from the
BlobDataHandle returned on construction. This handle is cheap to copy. Once all instances of handles for a blob are destructed, the blob is destroyed.
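The lifetime rule above (cheap copies, with the blob destroyed once the last handle goes away) behaves like shared ownership. The following is a minimal sketch of that behaviour using `std::shared_ptr`; `BlobEntry` and `Handle` are hypothetical stand-ins, and nothing here is meant to describe how the real `BlobDataHandle` is implemented.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the blob's stored state. It bumps an external
// counter so that creation and destruction can be observed.
struct BlobEntry {
  BlobEntry(std::string uuid, int* live_blobs)
      : uuid(std::move(uuid)), live_blobs(live_blobs) {
    ++*live_blobs;
  }
  ~BlobEntry() { --*live_blobs; }

  std::string uuid;
  int* live_blobs;
};

// Hypothetical handle: copying it only copies a shared_ptr, and the entry is
// destroyed when the last copy is destructed.
class Handle {
 public:
  Handle(std::string uuid, int* live_blobs)
      : entry_(std::make_shared<BlobEntry>(std::move(uuid), live_blobs)) {}

  const std::string& uuid() const { return entry_->uuid; }

 private:
  std::shared_ptr<BlobEntry> entry_;
};
```

With this model, copying a `Handle` never duplicates blob data, and the counter drops back to zero only after every copy has gone out of scope.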
BlobDataHandle::RunOnConstructionComplete will notify you when the blob is constructed or broken (construction failed due to not enough space, filesystem error, etc).
The BlobReader class is for reading blobs, and is accessible from the
BlobDataHandle at any time.
The browser side is a little more complicated than the renderer side. We are thinking about:
We follow this general flow for constructing a blob on the browser side:
BlobUnderConstruction instance to start asking for blob data given the earlier decision of strategy.
BlobTransportStrategy populates the browser-side blob data item.
Note: The transportation sections (steps 1, 2, 3) of this process are described (without accounting for blob dependencies) with diagrams and details in this presentation.
BlobRegistryImpl) is in charge of the actual construction of a blob and manages the transportation of the data from the renderer to the browser. When the initial description of the blob is sent to the browser, the BlobUnderConstruction asks the BlobMemoryController which strategy (IPC, Shared Memory, or File) it should use to transport the file. Based on this strategy it creates a
BlobTransportStrategy instance. That instance will then translate the memory items sent from the renderer into a browser representation to facilitate the transportation. See this slide, which illustrates how the browser might segment or split up the renderer's memory into transportable chunks.
Once the transport host decides its strategy, it will create its own transport state for the blob, including a
BlobDataBuilder using the transport's data segment representation. Then it will tell the
BlobStorageContext that it is ready to build the blob.
When the BlobStorageContext tells the transport host that it is ready to transport the blob data, the
BlobTransportStrategy requests all of the data from the renderer, populates the data in the
BlobDataBuilder, and then signals the storage context that it is done.
BlobStorageContext is the hub of the blob storage system. It is responsible for creating & managing all the state of constructing blobs, as well as all blob handle generation and general blob status access.
When a BlobDataBuilder is given to the context, it will do the following:
BlobMemoryController for file or memory quota for the transportation if necessary.
BlobMemoryController for memory quota for any copies necessary for blob slicing.
When all of the following conditions are met:
BlobRegistry tells us it has transported all the data (or we don't need to transport data),
BlobMemoryManager approves our memory quota for slice copies (or we don't need slice copies), and
The blob can finish constructing, where any pending blob slice copies are performed, and we set the status of the blob.
The BlobStatus tracks the construction procedure (specifically the transport process), and the copy memory quota and dependent blob process is encompassed in
Once a blob is finished constructing, the status is set to
DONE or any of the
During construction, slices are created for dependent blobs using the given offset and size of the reference. Each slice consists of the relevant blob items, plus metadata about possible copies from either end. If a blob item can be used in its entirety by the new blob, the item is simply shared between the blobs. But if only a ‘slice’ of the first or last item is needed, then BlobDataBuilder will create a new bytes item for the new blob, and store the necessary copy data for later.
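The slicing rule can be sketched as follows. This is a simplified model under stated assumptions: every item is treated as a run of bytes with a known size, and `SourceItem`, `SlicedItem`, and `Slice` are hypothetical names that do not exist in the blob system. It only shows which items would be shared whole and which would need a new, partially copied item.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of an item in the referenced blob.
struct SourceItem {
  int id;
  std::size_t size;
};

// Hypothetical description of one item in the resulting slice.
struct SlicedItem {
  int source_id;
  bool shared;         // true: the whole source item is reused as-is.
  std::size_t offset;  // Offset into the source item (0 when shared).
  std::size_t size;    // Number of bytes taken from the source item.
};

// Returns the items covering [offset, offset + size) of the source blob.
// Items fully inside the range are shared; a partially covered first or last
// item is marked as needing a new item with copied bytes.
std::vector<SlicedItem> Slice(const std::vector<SourceItem>& items,
                              std::size_t offset, std::size_t size) {
  std::vector<SlicedItem> out;
  const std::size_t end = offset + size;
  std::size_t item_start = 0;
  for (const SourceItem& item : items) {
    const std::size_t item_end = item_start + item.size;
    const std::size_t from = std::max(item_start, offset);
    const std::size_t to = std::min(item_end, end);
    if (from < to) {
      const bool whole = (from == item_start && to == item_end);
      out.push_back({item.id, whole, from - item_start, to - from});
    }
    item_start = item_end;
  }
  return out;
}
```

Slicing bytes 5 to 25 out of three 10-byte items, for instance, yields a partial first item, a shared middle item, and a partial last item, so only the two ends would need copy data.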
While a blob is built in
BlobDataBuilder a ‘flat’ representation of the new blob is created, replacing all blob references with the actual elements those blobs are made up of, possibly slicing them in the process. It also stores any copy data from the slices.
BlobMemoryController is responsible for: