tree: 515fb580735cb1df34fe9dc4c36aa2655c18d372 [path history] [tgz]
  4. disjoint_range_lock_manager.h
  7. leveldb_scope.h
  8. leveldb_scope_deletion_mode.h
  11. leveldb_scopes.h
  13. leveldb_scopes_coding.h
  16. leveldb_scopes_factory.h
  18. leveldb_scopes_tasks.h
  21. leveldb_scopes_test_utils.h
  24. leveldb_state.h
  26. scope_lock.h
  28. scope_lock_range.h
  30. scopes_lock_manager.h
  32. scopes_metadata.proto

LevelDB Scopes Implementation Design

This document is the implementation plan and design for LevelDB Scopes. It serves to both document the current state and the future plans for implementing LevelDB Scopes.

See LevelDB Scopes general design doc here.

Status / Current State

  • Lock manager - Done
  • Scopes - Done
  • Blob / large value support - Future

Things not covered in this README:

  • How iterators & scopes interact to ensure the iterator is operating on fresh data



A scope encompasses a group of changes that can be reverted. It is basically synonymous with a transaction, and would be used for readwrite and versionchange transactions in LevelDB. The scope has a defined list of key ranges where the changes can occur. It operates by keeping an undo task log, which is either discarded on commit, or replayed on revert. It also keeps a cleanup task log for specialty operations to happen during log cleanup.


A task is something that is executed after a scope has been committed. There are two types of tasks (undo/revert tasks and cleanup tasks), and they are stored in the separate logs.

Undo Task

Undo tasks are used to revert a scope (that is, undo the changes that were written by the scope). An undo task is a single operation that is used to undo one or more of these changes. See the LevelDBScopesUndoTask in scopes_metadata.proto. Undo tasks are only executed when a scope is reverted.

Cleanup Tasks

Cleanup tasks are optionally executed when a scopes is cleaned up. They consist of deferred deletions (range deletions that the user doesn't need to happen right away).


There are two task logs in a scope, the undo task log and the cleanup task log. They each have a unique key prefix so they can be iterated easily.

Scope Number (scope#)

Each scope has an identifier unique to the backing store that is auto-incrementing. During recovery, it is set to the minimum unused scope (see more in the recovery section).

Sequence Number (seq#)

Every undo log entry has a unique sequence number within the scope. These should start at {max int} (using Fixed64) and decrement.

Commit Point

This signifies that a scope has been committed. The commit point for a scope is the absence of locks in the scope's metadata.

Key Range

Represents a range of LevelDB keys. Every operation has a key or a key range associated with it. Sometimes key ranges are expected to be re-used or modified again by subsequent operations, and sometimes key ranges are known to be never used again.


To prevent reading uncommitted data, IndexedDB ‘locks’ objects stores when there are (readwrite) transactions operating on them.

New LevelDB Table Data

PurposeKeyFormatValue (protobufs)
Scope Metadatascope-{scope#}prefix1{scope#}LevelDBScopesScopeMetadata
Undo log operationslog-undo-{scope#}-{seq#}prefix20{scope#}{seq#}LevelDBScopesUndoTask
Cleanup log operationslog-cleanup-{scope#}-{seq#}prefix21{scope#}{seq#}LevelDBScopesCleanupTask

Key Format

  • Allow the ‘user’ of the scopes system to choose the key prefix (prefix).
  • Scope # is a varint
  • Sequence # is a big-endian Fixed64 (to support bytewise sorting)

See leveldb_scopes_coding.h for the key encoding implementation.

Value Format

All values are protobufs, see scopes_metadata.proto.

Operation Flows

Acquiring Locks

IndexedDB Sequence

  • Input - lock ranges & types. The lock ranges should correspond to the key ranges that will be read from or written to. The type signifies if the lock range should be requested as exclusive or shared.
  • If any of the key ranges are currently locked, then wait until they are all free. This is the IDB transaction sequencing logic.
  • Output - a list of locks, one per requested lock range.

Creating & Using a Scope

IndexedDB Sequence

  • Input - a list of locks for the scope. See Acquiring Locks above
  • Create the scope
    • Use the next available scope # (and increment the next scope #)
  • Enable operations on the scope
  • While the total size of changes is less than X Mb, buffer them in a write batch.
    • If the scope is committed before reaching X Mb, then just commit the scope without generating an undo log.
  • If the size of the buffer write batch is > X Mb, or the user needs to ‘read’ in the range that was just written to, then the changes must be written to disk.
    • For every operation, the scope must read the database and append the undo operation to the undo task log.
  • Deferred operations are written do the cleanup task log.
  • Output - a Scope

Committing a Scope

IndexedDB Sequence

  • Input - a Scope
  • The scope is marked as ‘committed’ by writing the LevelDBScopesScopeMetadata (at scope-{scope#}) to remove the lock key ranges. This change is flushed to disk.
  • The Cleanup Sequence is signalled for cleaning up the committed scope #.
  • Output - Scope is committed, and lock is released.

Reverting a Scope

Revert Sequence

  • Input - revert a given scope number.
  • Opens a cursor to log-undo-{scope#}-0
    • Cursor should be at the first undo entry
    • If the scope‘s commit point exists (in the scope’s metadata, if the locks are empty) then error - reverting a committed scope is an invalid state in this design
  • Iterate through undo tasks, committing operations.
  • Update the Scope's LevelDBScopesScopeMetadata entry (at scope-{scope#}) by cleaning the locks and setting ignore_cleanup_tasks = true, and flush this change to disk.
  • The Cleanup Sequence is signalled for cleaning up the reverted scope #.
  • Output - Scope was successfully reverted, and lock released.

Startup & Recovery

IndexedDB Sequence

  • Input - the database
  • Reads metadata (fail for unknown version)
  • Opens a cursor to scopes- & iterates to read in all scopes
    • If the scope metadata (LevelDBScopesScopeMetadata at scope-{scope#}) has locks, then those are considered locked
    • The maximum scope # is found and used to determine the next scope number (used in scope creation)
  • Requests locks from the lock system
  • Signals IndexedDB that it can start accepting operations (IndexedDB can now start running)
  • For every LevelDBScopesScopeMetadata that has no locks
  • For every LevelDBScopesScopeMetadata that has locks
    • Kick off a Reverting a Scope task. This state indicates a shutdown while a revert was in progress.
  • Output - nothing, done

Undo Log Cleanup

Cleanup & Revert Sequence

  • Input - scope #
  • Open the scope-{scope#} metadata
    • If the commit point is not there (if the locks are not empty), then error.
  • If the ‘ignore_cleanup_tasks’ value is false, then
    • Iterate through the log-cleanup-{scope#}- cleanup tasks and execute them (range deletes).
  • Delete both the undo tasks log and the cleanup task log
  • Delete the scope-{scope#} metadata
  • Output - nothing

Lock Manager

The scopes feature needs a fairly generic lock manager. This lock manager needs two levels, because versionchange transactions need to lock a whole database, where other transactions will lock smaller ranges within the level one range.

API Methods


Accepts a list of lock request and a callback which is given a moveable-only lock object, which releases its lock on destruction. Each lock request consists of a lock type (shared or exclusive), a level number, and a key range. The levels are totally independent from the perspective of the lock manager.

To keep the implementation simple, all required locks for a given scope or operation need to be acquired all at once.

Internal Data Structure

The lock manager basically holds ranges, and needs to know if a new range intersects any old ranges. A good data structure for this is an Interval Tree.

Undo Operations

Undo operations are generated when the undo tasks are required. Note that for under a certain scope ‘size’ (where the buffer write batch is small enough), no undo operations are generated.


  • Put(key, value)
  • Delete(key)
  • DeleteRange(key_range)

See LevelDBScopesUndoTask in the scopes_metadata.proto


Normal Cases

Note - all cases where “old_value” is used requires reading the current value from the database.

Put(key, value)

  • Delete(key) if an old value doesn't exist,
  • Put(key, old_value) if an old value does exist, or
  • Nothing if the old value and new value matche

Add(key, value)

  • Delete(key), trusting the client that there wasn't a value here before.


  • Put(key, old_value) if the old_value exists, or
  • Nothing if no old_value exists.


  • Put(key, old_value) for every entry in that key range. This requires iterating the database using the key_range to find all entries.

Special Case - Empty Unique Key Range

Creation - key range is empty initially

If the values being created are in a key range that is initially empty (we trust the API caller here - there is no verification), and if the key range will never be reused if it is reverted, then the undo log can consist of a single:


Examples of this are creating new databases or indexes in a versionchange transaction. The scopes system can check to make sure it's range is empty before doing operations in debug builds.

Deletion - key range will never be used again.

This is done by creating a cleanup task (see LevelDBScopesCleanupTask in scopes_metadata.proto). When the scope is cleaned up, these operations are executed. This allows a user to defer the deletion to a later time and a different thread.