commit | 2e030562385e8327ee7d59c1c9754f9820e741c8 | [log] [tgz] |
---|---|---|
author | Steve Yen <steve.yen@gmail.com> | Mon Jan 07 01:35:56 2013 |
committer | Steve Yen <steve.yen@gmail.com> | Mon Jan 07 01:35:56 2013 |
tree | 8adeb784b7ebb253fc1a97220b1010ef8cced3a4 | |
parent | f951e28c37f93053603d31598f8e7e9b026f8ee4 [diff] |
Mention SQLite VFS-like feature in README.
gkvlite is a simple, ordered, key-value persistence library for Go
gkvlite is a library that provides a simple key-value persistence implementation, inspired by SQLite and CouchDB/Couchstore.
gkvlite has the following features...
gkvlite is single-threaded. Users are encouraged to use Go channels or their own locking to serialize access to a Store.
Open source - MIT licensed.
import ( "os" "github.com/steveyen/gkvlite" ) f, err := os.Create("/tmp/test.gkvlite") s, err := gkvlite.NewStore(f) c := s.SetCollection("cars", nil) // You can also retrieve the collection, where c == cc. cc := s.GetCollection("cars") // Insert values. c.Set([]byte("tesla"), []byte("$$$")) c.Set([]byte("mercedes"), []byte("$$")) c.Set([]byte("bmw"), []byte("$")) // Replace values. c.Set([]byte("tesla"), []byte("$$$$")) // Retrieve values. mercedesPrice, err := c.Get([]byte("mercedes")) // One of the most priceless cars is not in the collection. thisIsNil, err := c.Get([]byte("the-apollo-15-moon-buggy")) // Iterate through items. c.VisitItemsAscend([]byte("ford"), func(i *gkvlite.Item) bool { // This visitor callback will be invoked with every item // with key "ford" and onwards, in key-sorted order. // So: "mercedes", "tesla" are visited, in that ascending order, // but not "bmw". // If we want to stop visiting, return false; // otherwise return true to keep visiting. return true }) // Let's get a snapshot. snap := s.Snapshot() snapCars := snap.GetCollection("cars") // The snapshot won't see modifications against the original Store. c.Delete([]byte("mercedes")) mercedesIsNil, err := c.Get([]byte("mercedes")) mercedesPriceFromSnapshot, err := snapCars.Get([]bytes("mercedes")) // Persist all the changes to disk. s.Flush() f.Sync() // Some applications may also want to fsync the underlying file. // Now, other file readers can see the data, too. f2, err := os.Open("/tmp/test.gkvlite") s2, err := gkvlite.NewStore(f2) c2 := s.GetCollection("cars") bmwPrice := c2.Get([]byte("bmw"))
Because all collections are persisted atomically when you flush a store to disk, you can implement consistent secondary indexes by maintaining additional collections per store. For example, a “users” collection can hold a JSON document per user, keyed by userId. Another “userEmails” collection can be used like a secondary index, keyed by “emailAddress:userId”, with empty values (e.g., []byte{}).
The fundamental data structure is an immutable treap (tree + heap). When used with random heap item priorities, treaps have probabilistic balanced tree behavior with the usual O(log N) performance bounds expected of balanced binary trees.
The persistence design is append-only, using ideas from Apache CouchDB and Couchstore / Couchbase, providing a simple approach to reaching ACID properties in the face of process or machine crashes. On re-opening a file, the implementation scans the file backwards looking for the last good root record and logically “truncates” the file at that point. New mutations are appended from that last good root location. This follows the MVCC (multi-version concurrency control) and “the log is the database” approach of CouchDB / Couchstore / Couchbase.
TRADEOFF: the append-only persistence design means file sizes will grow until there's a compaction. To get a compacted file, use CopyTo() with a high “flushEvery” argument.
TRADEOFF: the current simple design means you can't store the bytes of a gkvlite database file as a value inside of another gkvlite database.
The immutable, copy-on-write treap plus the append-only persistence design allows for fast and efficient MVCC snapshots.
TRADEOFF: the immutable, copy-on-write design means more memory garbage may be created than other designs, meaning more work for the garbage collector (GC).
TODO: Performance: consider splitting item storage from node storage, so we're not mixing metadata and data in same qcache pages. Need to measure how much win this could be in cases like compaction. Tradeoff as this could mean no more single file simplicity.
TODO: Allow snapshots to be concurrent, accessible by separate goroutines.
TODO: Allow mutability for less garbage, perhaps switching to immutable only when there are in-use snapshots. This probably won't be a win if there are always active snapshots.
TODO: Keep stats on misses, disk fetches & writes, tree depth, etc.
TODO: Provide O(1) collection copying.
TODO: Provide O(log N) collection spliting.
TODO: Provide O(1) MidItem() or TopItem() implementation, so that users can split collections at decent points.
TODO: Provide item priority shifting during CopyTo().
See more TODO's throughout codebase / grep.