High-level overview of Save-Page-As code

This document describes code under //content/browser/downloads restricting the scope only to code handling Save-Page-As functionality (i.e. leaving out other downloads-related code). This document focuses on high-level overview and aspects of the code that span multiple compilation units (hoping that individual compilation units are described by their code comments or by their code structure).

Classes overview

  • SavePackage class

    • coordinates overall save-page-as request
    • created and owned by WebContents (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
    • UI-thread object
  • SaveFileCreateInfo::SaveFileSource enum

    • classifies SaveItem and SaveFile processing into 2 flavours:
      • SAVE_FILE_FROM_NET (see SaveFileResourceHandler)
      • SAVE_FILE_FROM_DOM (see “Complete HTML” section below)
  • SaveItem class

    • tracks saving a single file
    • created and owned by SavePackage
    • UI-thread object
  • SaveFileManager class

    • coordinates between the download sequence and the UI thread
      • Gets requests from SavePackage and communicates results back to SavePackage on the UI thread.
      • Shephards data (received from the network OR from DOM) into the download sequence - via SaveFileManager::UpdateSaveProgress
    • created and owned by BrowserMainLoop (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
    • The global instance can be retrieved by the Get method.
  • SaveFile class

    • tracks saving a single file
    • created and owned by SaveFileManager
    • download sequence object
  • SaveFileResourceHandler class

    • tracks network downloads + forwards their status into SaveFileManager (onto download sequence)
    • created by ResourceDispatcherHostImpl::BeginSaveFile
    • IO-thread object
  • SaveFileCreateInfo POD struct

    • short-lived object holding data passed to callbacks handling start of saving a file.
  • MHTMLGenerationManager class

    • singleton that manages progress of jobs responsible for saving individual MHTML files (represented by MHTMLGenerationManager::Job).

Overview of the processing flow

Save-Page-As flow starts with WebContents::OnSavePage. The flow is different depending on the save format chosen by the user (each flow is described in a separate section below).

Complete HTML

Very high-level flow of saving a page as “Complete HTML”:

  • Step 1: SavePackage asks all frames for “savable resources” and creates SaveItem for each of files that need to be saved

  • Step 2: SavePackage first processes SAVE_FILE_FROM_NET SaveItems and asks SaveFileManager to save them.

  • Step 3: SavePackage handles remaining SAVE_FILE_FROM_DOM SaveItems and asks each frame to serialize its DOM/HTML (each frame gets from SavePackage a map covering local paths that need to be referenced by the frame). Responses from frames get forwarded to SaveFileManager to be written to disk.

MHTML

Very high-level flow of saving a page as MHTML:

  • Step 1: WebContents::GenerateMHTML is called by either SavePackage (for Save-Page-As UI) or Extensions (via chrome.pageCapture extensions API) or by an embedder of WebContents (since this is public API of //content).

  • Step 2: MHTMLGenerationManager creates a new instance of MHTMLGenerationManager::Job that coordinates generation of the MHTML file by sequentially (one-at-a-time) asking each frame to write its portion of MHTML to a file handle. Other classes (i.e. SavePackage and/or SaveFileManager) are not used at this step at all.

  • Step 3: When done MHTMLGenerationManager destroys MHTMLGenerationManager::Job instance and calls a completion callback which in case of Save-Page-As will end up in SavePackage::OnMHTMLGenerated.

Note: MHTML format is by default disabled in Save-Page-As UI on Windows, MacOS and Linux (it is the default on Chrome OS), but for testing this can be easily changed using --save-page-as-mhtml command line switch.

HTML Only

Very high-level flow of saving a page as “HTML Only”:

  • SavePackage creates only a single SaveItem (always SAVE_FILE_FROM_NET) and asks SaveFileManager to process it (as in the Complete HTML individual SaveItem handling above.).

Other relevant code

Pointers to related code outside of //content/browser/download:

  • End-to-end tests:

    • //chrome/browser/downloads/save_page_browsertest.cc
    • //chrome/test/data/save_page/...
  • Other tests:

    • //content/browser/downloads/*test*.cc
    • //content/renderer/dom_serializer_browsertest.cc - single process... :-/
  • Elsewhere in //content:

    • //content/renderer/savable_resources...
  • Blink:

    • //third_party/blink/public/web/web_frame_serializer...
    • //third_party/blink/renderere/core/frame/web_frame_serializer_impl... (used for Complete HTML today; should use FrameSerializer instead in the long-term - see https://crbug.com/328354).
    • //third_party/blink/renderer/core/frame/frame_serializer... (used for MHTML today)
    • //third_party/blink/renderer/platform/mhtml/mhtml_archive...