puffin: Cache puff stream in puffpatch

Currently without caching the puff stream, the OS updates can be very slow. For
example, a back of envelope calculation can show that if we don't cache the puff
stream, we may need to read and puff the entire deflate stream (about 500 MB)
160 times which can be very slow. And examination shows that updates can take
more than one hour to finish on a veyron_minnie. It may not be reasonable to
keep memory usage low but keep the CPU usage high for extended periods of
time. The other rationale behind caching the puff stream is that if we only use
bsdiff, then we have to keep large patches in memory when applying the
patch. But since puffin reduced the patch size by a large factor (so to keep a
much smaller patch in memory), then it won't be as bad to cache some memory for
faster patching. It is basically a trade-off between CPU time and memory. We may
end up using the same amount of memory (which in practice is not going to
happen, we will use much less), but the payload patch size is much smaller.

This patch modifies PuffinStream to cache only the Puff Buffers up to a maximum
passed total memory size as a parameter to puffpatch. This acts as an LRU, the
least recently used puff buffers are ejected when the cache is full.

BUG=chromium:775072
TEST=unittests passed; brillo_update_payload verfiy passes; cros flash with delta updates passes;

Change-Id: I2a390eb35eb5449577cb5e2b1e857bcdca093f68
Reviewed-on: https://chromium-review.googlesource.com/722452
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: Ben Chan <benchan@chromium.org>
6 files changed
tree: bcfec7362dd7fe97822d285854cd9957430b54b8
  1. src/
  2. libpuffdiff.pc
  3. libpuffpatch.pc
  4. LICENSE
  5. Makefile
  6. OWNERS
  7. PRESUBMIT.cfg
  8. puffin.gyp
  9. README.md
README.md

Puffin

Source code for Puffin: A utility for deterministic DEFLATE recompression.

TODO(ahassani): Describe the directory structure and how-tos.

Glossary

  • Alphabet A value that occurs in the input stream. It can be either a literal:[0..255], and end of block sign [256], a length[257..285], or a distance [0..29].

  • Huffman code A variable length code representing the Huffman encoded of an alphabet. Huffman codes can be created uniquely using Huffman code length array.

  • Huffman code array An array which an array index identifies a Huffman code and the array element in that index represents the corresponding alphabet. Throughout the code, Huffman code arrays are identified by vectors with postfix hcodes_.

  • Huffman reverse code array An array which an array index identifies an alphabet and the array element in that index contains the Huffman code of the alphabet. Throughout the code, The Huffman reverse code arrays are identified by vectors with postfix rcodes_.

  • Huffman code length The number of bits in a Huffman code.

  • Huffman code length array An array of Huffman code lengths with the array index as the alphabet. Throughout the code, Huffman code length arrays are identified by vectors with postfix lens_.