# High-level overview of Save-Page-As code
This document describes code under `//content/browser/download`,
restricting the scope to code that handles Save-Page-As functionality
(i.e. leaving out other download-related code).
This document focuses on a high-level overview and on aspects of the code that
span multiple compilation units (on the assumption that individual compilation
units are described by their code comments or by their code structure).
## Classes overview
* SavePackage class
    * coordinates overall save-page-as request
    * created and owned by `WebContents`
      (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
    * UI-thread object
* SaveFileCreateInfo::SaveFileSource enum
    * classifies `SaveItem` and `SaveFile` processing into 2 flavours:
        * `SAVE_FILE_FROM_NET` (see `SaveFileResourceHandler`)
        * `SAVE_FILE_FROM_DOM` (see "Complete HTML" section below)
* SaveItem class
    * tracks saving a single file
    * created and owned by `SavePackage`
    * UI-thread object
* SaveFileManager class
    * coordinates between the download sequence and the UI thread
        * Gets requests from `SavePackage` and communicates results back to
          `SavePackage` on the UI thread.
        * Shepherds data (received from the network OR from DOM) into
          the download sequence - via `SaveFileManager::UpdateSaveProgress`
    * created and owned by `BrowserMainLoop`
      (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
    * The global instance can be retrieved by the `Get` method.
* SaveFile class
    * tracks saving a single file
    * created and owned by `SaveFileManager`
    * download sequence object
* SaveFileCreateInfo POD struct
    * short-lived object holding data passed to callbacks handling start of
      saving a file.
* MHTMLGenerationManager class
    * singleton that manages progress of jobs responsible for saving individual
      MHTML files (represented by `MHTMLGenerationManager::Job`).
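
The split between UI-thread objects and download-sequence objects can be
pictured with a small toy model. This is only a sketch under stated
assumptions, not the Chromium implementation: `ToySequence`,
`ToySaveFileManager`, `ToySaveFile`, `StartSave`, `FinishSave` and
`BytesForTesting` are hypothetical names invented for illustration, and the
two "threads" are simulated with plain task queues that the caller drains by
hand. Only the method name `UpdateSaveProgress` is borrowed from the list
above; its signature here is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a thread or sequence: posted tasks run only when
// the owner drains the queue.
class ToySequence {
 public:
  void Post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Mirrors the two flavours named in this document.
enum class ToySaveFileSource { kFromNet, kFromDom };

// Download-sequence side: accumulates the bytes of one file.
struct ToySaveFile {
  ToySaveFileSource source;
  std::string bytes;
};

// Sits between the two sequences (toy model only).
class ToySaveFileManager {
 public:
  using DoneCallback = std::function<void(int item_id, std::size_t bytes)>;

  ToySaveFileManager(ToySequence* ui, ToySequence* download)
      : ui_(ui), download_(download) {
    assert(ui_ != nullptr && download_ != nullptr);
  }

  // Called on the UI side; the file record is created on the download side.
  void StartSave(int item_id, ToySaveFileSource source) {
    download_->Post([this, item_id, source] {
      files_[item_id] = ToySaveFile{source, std::string()};
    });
  }

  // Data from the network or from the DOM hops to the download sequence.
  void UpdateSaveProgress(int item_id, std::string data) {
    download_->Post([this, item_id, data = std::move(data)] {
      files_.at(item_id).bytes += data;
    });
  }

  // The result is computed on the download side and reported on the UI side.
  void FinishSave(int item_id, DoneCallback done) {
    download_->Post([this, item_id, done = std::move(done)] {
      const std::size_t total = files_.at(item_id).bytes.size();
      ui_->Post([done, item_id, total] { done(item_id, total); });
    });
  }

  const std::string& BytesForTesting(int item_id) const {
    return files_.at(item_id).bytes;
  }

 private:
  ToySequence* ui_;
  ToySequence* download_;
  std::map<int, ToySaveFile> files_;
};
```

In this model nothing is written until the download queue is drained, and the
completion callback does not run until the UI queue is drained afterwards,
which is the hand-off pattern the class list describes.
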
## Overview of the processing flow
The Save-Page-As flow starts with `WebContents::OnSavePage`.
The flow differs depending on the save format chosen by the user
(each flow is described in a separate section below).
### Complete HTML
Very high-level flow of saving a page as "Complete HTML":
* Step 1: `SavePackage` asks all frames for "savable resources"
and creates a `SaveItem` for each file that needs to be saved
* Step 2: `SavePackage` first processes the `SAVE_FILE_FROM_NET`
`SaveItem`s and asks `SaveFileManager` to save
them.
* Step 3: `SavePackage` handles the remaining `SAVE_FILE_FROM_DOM` `SaveItem`s
and asks each frame to serialize its DOM/HTML (`SavePackage` gives each
frame a map of the local paths that the frame needs to reference).
Responses from frames get forwarded to `SaveFileManager`
to be written to disk.
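
The ordering in steps 2 and 3, and the per-frame map from step 3, can be
sketched as follows. This is an illustrative model, not the `SavePackage`
code: `ToySaveItem`, `OrderForProcessing` and `BuildLocalPathMapForFrame` are
hypothetical names, and the assumption that the map is keyed by the original
URL is made here only for the sake of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Mirrors the two flavours named in this document.
enum class ToySaveFileSource { kFromNet, kFromDom };

// Hypothetical, simplified view of one file to be saved.
struct ToySaveItem {
  std::string url;
  std::string local_path;
  ToySaveFileSource source;
};

// Steps 2 and 3: network-sourced items are handled before DOM-sourced ones.
// The relative order inside each group is preserved.
std::vector<ToySaveItem> OrderForProcessing(std::vector<ToySaveItem> items) {
  std::stable_partition(items.begin(), items.end(),
                        [](const ToySaveItem& item) {
                          return item.source == ToySaveFileSource::kFromNet;
                        });
  return items;
}

// Step 3: build the map handed to one frame, limited to the URLs that the
// frame references (assumed here to map URL -> local path).
std::map<std::string, std::string> BuildLocalPathMapForFrame(
    const std::vector<ToySaveItem>& items,
    const std::vector<std::string>& urls_referenced_by_frame) {
  std::map<std::string, std::string> url_to_local_path;
  for (const ToySaveItem& item : items) {
    assert(!item.local_path.empty());
    const bool referenced =
        std::find(urls_referenced_by_frame.begin(),
                  urls_referenced_by_frame.end(),
                  item.url) != urls_referenced_by_frame.end();
    if (referenced) {
      url_to_local_path[item.url] = item.local_path;
    }
  }
  return url_to_local_path;
}
```

A frame that references a URL with no matching item simply gets no entry for
it in this model; how the real serializer treats such links is not covered
here.
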
### MHTML
Very high-level flow of saving a page as MHTML:
* Step 1: `WebContents::GenerateMHTML` is called by `SavePackage` (for the
Save-Page-As UI), by Extensions (via the `chrome.pageCapture` extensions
API), or by an embedder of `WebContents` (since this is a public API of
`//content`).
* Step 2: `MHTMLGenerationManager` creates a new instance of
`MHTMLGenerationManager::Job` that coordinates generation of
the MHTML file by sequentially (one-at-a-time) asking each
frame to write its portion of MHTML to a file handle. Other
classes (i.e. `SavePackage` and/or `SaveFileManager`) are not
used at this step at all.
* Step 3: When done, `MHTMLGenerationManager` destroys the
`MHTMLGenerationManager::Job` instance and calls a completion
callback, which in the case of Save-Page-As ends up in
`SavePackage::OnMHTMLGenerated`.
Note: the MHTML format is disabled by default in the Save-Page-As UI on
Windows, macOS and Linux (it is the default on Chrome OS), but for testing this
can easily be changed using the `--save-page-as-mhtml` command-line switch.
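
The one-frame-at-a-time coordination from step 2 of the MHTML flow can be
sketched with a toy job. This is a minimal model under stated assumptions, not
the `MHTMLGenerationManager::Job` implementation: `ToyFrame` and `ToyMhtmlJob`
are hypothetical names, the "file handle" is a plain `std::string`, and the
job invokes the completion callback itself instead of being destroyed by a
manager first.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical frame: appends its part to `file`, then calls `done` (possibly
// later) to signal that the next frame may start.
using ToyFrame =
    std::function<void(std::string* file, std::function<void()> done)>;

class ToyMhtmlJob {
 public:
  using Completion = std::function<void(std::size_t file_size)>;

  ToyMhtmlJob(std::vector<ToyFrame> frames,
              std::string* file,
              Completion completion)
      : frames_(std::move(frames)),
        file_(file),
        completion_(std::move(completion)) {
    assert(file_ != nullptr);
  }

  void Start() { AskNextFrame(); }

 private:
  // Asks exactly one frame to write; the next frame is asked only after the
  // current one has reported that it is done.
  void AskNextFrame() {
    if (next_ == frames_.size()) {
      completion_(file_->size());
      return;
    }
    const ToyFrame& frame = frames_[next_++];
    frame(file_, [this] { AskNextFrame(); });
  }

  std::vector<ToyFrame> frames_;
  std::string* file_;
  Completion completion_;
  std::size_t next_ = 0;
};
```

Because each frame receives its own `done` callback, a frame that answers
asynchronously holds up all later frames, so the parts land in the file in
frame order.
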
### HTML Only
Very high-level flow of saving a page as "HTML Only":
* `SavePackage` creates only a single `SaveItem` (always `SAVE_FILE_FROM_NET`)
and asks `SaveFileManager` to process it
(as in the handling of individual `SaveItem`s in the Complete HTML flow above).
## Other relevant code
Pointers to related code outside of `//content/browser/download`:
* End-to-end tests:
    * `//chrome/browser/download/save_page_browsertest.cc`
    * `//chrome/test/data/save_page/...`
* Other tests:
    * `//content/browser/download/*test*.cc`
    * `//content/renderer/dom_serializer_browsertest.cc` - single process... :-/
* Elsewhere in `//content`:
    * `//content/renderer/savable_resources...`
* Blink:
    * `//third_party/blink/public/web/web_frame_serializer...`
    * `//third_party/blink/renderer/core/frame/web_frame_serializer_impl...`
      (used for Complete HTML today; should use `FrameSerializer` instead in
      the long-term - see https://crbug.com/328354).
    * `//third_party/blink/renderer/core/frame/frame_serializer...`
      (used for MHTML today)
    * `//third_party/blink/renderer/platform/mhtml/mhtml_archive...`