tree: 30dceb881df5dfdb7296aa2f130acc7486d0eb86 [path history] [tgz]
  1. README.md
  2. context.go
  3. datastore.go
  4. datastore_data.go
  5. datastore_index.go
  6. datastore_index_selection.go
  7. datastore_index_test.go
  8. datastore_query.go
  9. datastore_query_execution.go
  10. datastore_query_execution_test.go
  11. datastore_query_test.go
  12. datastore_test.go
  13. doc.go
  14. error_markers.go
  15. info.go
  16. info_test.go
  17. mail.go
  18. mail_static_lists.go
  19. mail_test.go
  20. memcache.go
  21. memcache_test.go
  22. memstore.go
  23. memstore_iter.go
  24. memstore_iter_test.go
  25. memstore_test.go
  26. memstore_tracing_utils.go
  27. module.go
  28. module_test.go
  29. race_test.go
  30. taskqueue.go
  31. taskqueue_data.go
  32. taskqueue_test.go
  33. testing_utils_test.go
  34. transaction.go
  35. user.go
  36. user_test.go
impl/memory/README.md

Datastore implementation internals

This document contains internal implementation details for this in-memory version of datastore. It‘s mostly helpful to understand what’s going on in this implementation, but it also can reveal some insight into how the real appengine datastore works (though note that the specific encodings are different).

Additionally note that this implementation cheats by moving some of the Key bytes into the table (collection) names (like the namespace, property name for the builtin indexes, etc.). The real implementation contains these bytes in the table row keys, I think.

Internal datastore key/value collection schema

The datastore implementation here uses several different tables (‘collections’) to manage state for the data. The schema for these tables is enumerated below to make the code a bit easier to reason about.

All datastore user objects (Keys, Properties, PropertyMaps, etc.) are serialized using go.chromium.org/gae/service/datastore/serialize, which in turn uses the primitives available in go.chromium.org/luci/common/cmpbin. The encodings are important to understanding why the schemas below sort correctly when compared only using bytes.Compare (aka memcmp). This doc will assume that you're familiar with those encodings, but will point out where we diverge from the stock encodings.

All encoded Property values used in memory store Keys (i.e. index rows) are serialized using the settings serialize.WithoutContext, and datastore.ShouldIndex.

Primary table

The primary table maps datastore keys to entities.

  • Name: "ents:" + namespace
  • Key: serialized datastore.Property containing the entity's datastore.Key
  • Value: serialized datastore.PropertyMap

This table also encodes values for the following special keys:

  • Every entity root (e.g. a Key with nil Parent()) with key K has:
    • Key("__entity_group__", 1, K) -> {"__version__": PTInt} A child entity with the kind __entity_group__ and an id of 1. The value has a single property __version__, which contains the version number of this entity group. This is used to detect transaction conflicts.
    • Key("__entity_group_ids__", 1, K) -> {"__version__": PTInt} A child entity with the kind __entity_group__ and an id of 1. The value has a single property __version__, which contains the last automatically allocated entity ID for entities within this entity group.
  • A root entity with the key Key("__entity_group_ids__",1) which contains the same __version__ property, and indicates the last automatically allocated entity ID for root entities.

Compound Index table

The next table keeps track of all the user-added ‘compound’ index descriptions (not the content for the indexes). There is a row in this table for each compound index that the user adds by calling ds.Raw().Testable().AddIndexes.

  • Name: "idx"
  • Key: normalized, serialized datastore.IndexDefinition with the SortBy slice in reverse order (i.e. datastore.IndexDefinition.PrepForIdxTable()).
  • Value: empty

The Key format here requires some special attention. Say you started with a compound IndexDefinition of:

IndexDefinition{
  Kind: "Foo",
  Ancestor: true,
  SortBy: []IndexColumn{
    {Property: "Something", Direction: DESCENDING},
    {Property: "Else", Direction: ASCENDING},
    {Property: "Cool", Direction: ASCENDING},
  }
}

After prepping it for the table, it would be equivalent to:

IndexDefinition{
  Kind: "Foo",
  Ancestor: true,
  SortBy: []IndexColumn{
    {Property: "__key__", Direction: ASCENDING},
    {Property: "Cool", Direction: ASCENDING},
    {Property: "Else", Direction: ASCENDING},
    {Property: "Something", Direction: DESCENDING},
  }
}

The reason for doing this will be covered in the Query Planning section, but it boils down to allowing the query planner to use this encoded table to intelligently scan for potentially useful compound indexes.

Index Tables

Every index (both builtin and compound) has one index table per namespace, which contains as rows every entry in the index, one per row.

  • Name: "idx:" + namespace + IndexDefinition.PrepForIdxTable()
  • Key: concatenated datastore.Property values, one per SortBy column in the IndexDefinition (the non-PrepForIdxTable version). If the SortBy column is DESCENDING, the serialized Property is inverted (e.g. XOR 0xFF).
  • Value: empty

If the IndexDefinition has Ancestor: true, then the very first column of the Key contains the partial Key for the entity. So if an entity has the datastore key /Some,1/Thing,2/Else,3, it would have the values /Some,1, /Some,1/Thing,2, and /Some,1/Thing,2/Else,3 as value in the ancestor column of indexes that it matches.

Builtin (automatic) indexes

The following indexes are automatically created for some entity with a key /Kind,*, for every property (with ShouldIndex values) named “Foo”:

IndexDefinition{ Kind: "Kind", Ancestor: false, SortBy: []IndexColumn{
  {Property: "__key__", Direction: ASCENDING},
}}
IndexDefinition{ Kind: "Kind", Ancestor: false, SortBy: []IndexColumn{
  {Property: "Foo", Direction: ASCENDING},
  {Property: "__key__", Direction: ASCENDING},
}}
IndexDefinition{ Kind: "Kind", Ancestor: false, SortBy: []IndexColumn{
  {Property: "Foo", Direction: DESCENDING},
  {Property: "__key__", Direction: ASCENDING},
}}

Index updates

(Note that this is a LARGE departure from how the production appengine datastore does this. This model only works because the implementation is not distributed, and not journaled. The real datastore does index updates in parallel and is generally pretty fancy compared to this).

Index updates are pretty straightforward. On a mutation to the primary entity table, we take the old entity value (remember that entity values are PropertyMaps), the new property value, create index entries for both, merge them, and apply the deltas to the affected index tables (i.e. entries that exist in the old entities, but not the new ones, are deleted. Entries that exist in the new entities, but not the old ones, are added).

Index generation works (given an slice of indexes []Idxs) by:

  • serializing all ShouldIndex Properties in the PropertyMap to get a map[name][]serializedProperty.
  • for each index idx
    • if idx's columns contain properties that are not in the map, skip idx
    • make a [][]serializedProperty, where each serializedProperty slice corresponds with the IndexColumn of idx.SortBy
      • duplicated values for multi-valued properties are skipped.
    • generate a []byte row which is the contatenation of one value from each []serializedProperty, permuting through all combinations. If the SortBy column is DESCENDING, make sure to invert (XOR 0xFF) the serializedProperty value!.
    • add that generated []byte row to the index's corresponding table.

Note that we choose to serialize all permutations of the saved entity. This is so that we can use repeated-column indexes to fill queries which use a subset of the columns. E.g. if we have the index duck,duck,duck,goose, we can theoretically use it to fill a query for duck=1,duck=2,goose>"canadian", by pasting 1 or 2 as the value for the 3rd duck column. This simplifies index selection at the expense of larger indexes. However, it means that if you have the entity:

duck = 1, 2, 3, 4
goose = "færøske"

It generates the following index entries:

duck=1,duck=1,duck=1,goose="færøske"
duck=1,duck=1,duck=2,goose="færøske"
duck=1,duck=1,duck=3,goose="færøske"
duck=1,duck=1,duck=4,goose="færøske"
duck=1,duck=2,duck=1,goose="færøske"
duck=1,duck=2,duck=2,goose="færøske"
duck=1,duck=2,duck=3,goose="færøske"
duck=1,duck=2,duck=4,goose="færøske"
duck=1,duck=3,duck=1,goose="færøske"
... a lot ...
duck=4,duck=4,duck=4,goose="færøske"

This is a very large number of index rows (i.e. an ‘exploding index’)!

An alternate design would be to only generate unique permutations of elements where the index has repeated columns of a single property. This only makes sense because it's illegal to have an equality and an inequality on the same property, under the current constraints of appengine (though not completely ridiculous in general, if inequality constraints meant the same thing as equality constraints. However it would lead to a multi-dimensional query, which can be quite slow and is very difficult to scale without application knowledge). If we do this, it also means that we need to SORT the equality filter values when generating the prefix (so that the least-valued equality constraint is first). If we did this, then the generated index rows for the above entity would be:

duck=1,duck=2,duck=3,goose="færøske"
duck=1,duck=2,duck=4,goose="færøske"
duck=1,duck=3,duck=4,goose="færøske"
duck=2,duck=3,duck=4,goose="færøske"

Which be a LOT more compact. It may be worth implementing this restriction later, simply for the memory savings when indexing multi-valued properties.

If this technique is used, there's also room to unambiguously index entities with repeated equivalent values. E.g. if duck=1,1,2,3,4 , then you could see a row in the index like:

duck=1,duck=1,duck=2,goose="færøske"

Which would allow you to query for “an entity which has duck values equal to 1, 1 and 2”. Currently such a query is not possible to execute (it would be equivalent to “an entity which has duck values equal to 1 and 2”).

Query planning

Now that we have all of our data tabulated, let's plan some queries. The high-level algorithm works like this:

  • Generate a suffix format from the user's query which looks like:
    • orders (including the inequality as the first order, if any)
    • projected fields which aren't explicitly referenced in the orders (we assume ASCENDING order for them), in the order that they were projected.
    • __key__ (implied ascending, unless the query‘s last sort order is for __key__, in which case it’s whatever order the user specified)
  • Reverse the order of this suffix format, and serialize it into an IndexDefinition, along with the query's Kind and Ancestor values. This does what PrepForIdxTable did when we added the Index in the first place.
  • Use this serialized reversed index to find compound indexes which might match by looking up rows in the “idx” table which begin with this serialized reversed index.
  • Generate every builtin index for the inequality + equality filter properties, and see if they match too.

An index is a potential match if its suffix exactly matches the suffix format, and it contains only sort orders which appear in the query (e.g. the index contains a column which doesn't appear as an equality or inequlity filter).

The index search continues until:

  • We find at least one matching index; AND
  • The combination of all matching indexes accounts for every equality filter at least once.

If we fail to find sufficient indexes to fulfill the query, we generate an index description that could be sufficient by concatenating all missing equality filters, in ascending order, followed by concatenating the suffix format that we generated for this query. We then suggest this new index to the user for them to add by returing an error containing the generated IndexDefinition. Note that this index is not REQUIRED for the user to add; they could choose to add bits and pieces of it, extend existing indexes in order to cover the missing columns, invert the direction of some of the equality columns, etc.

Recall that equality filters are expressed as map[propName][]serializedProperty. We'll refer to this mapping as the ‘constraint’ mapping below.

To actually come up with the final index selection, we sort all the matching indexes from greatest number of columns to least. We add the 0th index (the one with the greatest number of columns) unconditionally. We then keep adding indexes which contain one or more of the remaining constraints, until we have no more constraints to satisfy.

Adding an index entails determining which columns in that index correspond to equality columns, and which ones correspond to inequality/order/projection columns. Recall that the inequality/order/projection columns are all the same for all of the potential indices (i.e. they all share the same suffix format). We can use this to just iterate over the index‘s SortBy columns which we’ll use for equality filters. For each equality column, we remove a corresponding value from the constraints map. In the event that we run out of constraints for a given column, we simply pick an arbitrary value from the original equality filter mapping and use that. This is valid to do because, after all, they're equality filters.

Note that for compound indexes, the ancestor key counts as an equality filter, and if the compound index has Ancestor: true, then we implicitly put the ancestor as if it were the first SortBy column. For satisfying Ancestor queries with built-in indexes, see the next section.

Once we've got our list of constraints for this index, we concatenate them all together to get the prefix for this index. When iterating over this index, we only ever want to see index rows whose prefix exactly matches this. Unlike the suffix formt, the prefix is per-index (remember that ALL indexes in the query must have the same suffix format).

Finally, we set the ‘start’ and ‘end’ values for all chosen indexes to either the Start and End cursors, or the Greater-Than and Less-Than values for the inequality. The Cursors contain values for every suffix column, and the inequality only contains a value for the first suffix column. If both cursors and an inequality are specified, we take the smaller set of both (the combination which will return the fewest rows).

That's about it for index selection! See Query Execution for how we actually use the selected indexes to run a query.

Ancestor queries and Built-in indexes

You may have noticed that the built-in indexes can be used for Ancestor queries with equality filters, but they don't start with the magic Ancestor column!

There‘s a trick that you can do if the suffix format for the query is just __key__ though (e.g. the query only contains equality filters, and/or an inequality filter on __key__). You can serialize the datastore key that you’re planning to use for the Ancestor query, then chop off the termintating null byte from the encoding, and then use this as additional prefix bytes for this index. So if the builtin for the “Val” property has the column format of:

{Property: "Val"}, {Property: "__key__"}

And your query holds Val as an equality filter, you can serialize the ancestor key (say /Kind,1), and add those bytes to the prefix. So if you had an index row:

PTInt ++ 100 ++ PTKey ++ "Kind" ++ 1 ++ CONTINUE ++ "Child" ++ 2 ++ STOP

(where CONTINUE is the byte 0x01, and STOP is 0x00), you can form a prefix for the query Query("Kind").Ancestor(Key(Kind, 1)).Filter("Val =", 100) as:

PTInt ++ 100 ++ PTKey ++ "Kind" ++ 1

Omitting the STOP which normally terminates the Key encoding. Using this prefix will only return index rows which are /Kind,1 or its children.

“That's cool! Why not use this trick for compound indexes?”, I hear you ask :) Remember that this trick only works if the prefix before the __key__ is entirely composed of equality filters. Also recall that if you ONLY use equality filters and Ancestor (and possibly an inequality on __key__), then you can always satisfy the query from the built-in indexes! While you technically could do it with a compound index, there‘s not really a point to doing so. To remain faithful to the production datastore implementation, we don’t implement this trick for anything other than the built-in indexes.

Cursor format

Cursors work by containing values for each of the columns in the suffix, in the order and Direction specified by the suffix. In fact, cursors are just encoded versions of the []IndexColumn used for the ‘suffix format’, followed by the raw bytes of the suffix for that particular row (incremented by 1 bit).

This means that technically you can port cursors between any queries which share precisely the same suffix format, regardless of other query options, even if the index planner ends up choosing different indexes to use from the first query to the second. No state is maintained in the service implementation for cursors.

I suspect that this is a more liberal version of cursors than how the production appengine implements them, but I haven't verified one way or the other.

Query execution

Last but not least, we need to actually execute the query. After figuring out which indexes to use with what prefixes and start/end values, we essentially have a list of index subsets, all sorted the same way. To pull the values out, we start by iterating the first index in the list, grabbing its suffix value, and trying to iterate from that suffix in the second, third, fourth, etc index.

If any index iterates past that suffix, we start back at the 0th index with that suffix, and continue to try to find a matching row. Doing this will end up skipping large portions of all of the indexes in the list. This is the algorithm known as “zigzag merge join”, and you can find talks on it from some of the appengine folks. It has very good algorithmic running time and tends to scale with the number of full matches, rather than the size of all of the indexes involved.

A hit occurs when all of the iterators have precisely the same suffix. This hit suffix is then decoded using the suffix format information. The very last column of the suffix will always be the datastore key. The suffix is then used to call back to the user, according to the query type:

  • keys-only queries just directly return the Key
  • projection queries return the projected fields from the decoded suffix. Remember how we added all the projections after the orders? This is why. The projected values are pulled directly from the index, instead of going to the main entity table.
  • normal queries pull the decoded Key from the “ents” table, and return that entity to the user.