Add a tracing framework (really just logging).

This isn't a performance tracing framework (unlike the old ruy tracing).
This is about understanding what happens inside a ruy::Mul with a view
toward documenting how ruy works.

Added a 'parametrized_example' to help play with this tracing on any
flavor of ruy::Mul call. This also serves as a more elaborate example
of how to call ruy::Mul, and as a single binary containing several
different instantiations of the ruy::Mul template, which is useful
for measuring binary size and showing a breakdown of ruy symbols in
a document.
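For reference, the simplest flavor of such a ruy::Mul call (plain float,
no quantization) has roughly the following shape, modeled on ruy's
example code; this is a sketch that needs the ruy headers and library to
build:

```cpp
#include "ruy/ruy.h"

int main() {
  ruy::Context context;

  // 2x2 matrices as plain arrays; ruy::Matrix adopts pointers, it does
  // not copy the data.
  const float lhs_data[] = {1, 2, 3, 4};
  const float rhs_data[] = {1, 2, 3, 4};
  float dst_data[4];

  ruy::Matrix<float> lhs;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kRowMajor, lhs.mutable_layout());
  lhs.set_data(lhs_data);
  ruy::Matrix<float> rhs;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, rhs.mutable_layout());
  rhs.set_data(rhs_data);
  ruy::Matrix<float> dst;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, dst.mutable_layout());
  dst.set_data(dst_data);

  // A default-constructed MulParams means an unquantized float multiply;
  // other flavors (quantized, clamped, etc.) are expressed through it.
  ruy::MulParams<float, float> mul_params;
  ruy::Mul(lhs, rhs, mul_params, &context, &dst);
  return 0;
}
```

Each distinct combination of element types and MulParams instantiates the
ruy::Mul template separately, which is what makes a multi-flavor binary
useful for binary-size measurements.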

A few code changes beyond tracing slipped in:
 - Improved logic in determining the traversal order in MakeBlockMap:
   In rectangular cases, since we first do the top-level rectangularness
   subdivision with linear traversal anyway, the traversal order only
   applies within each subdivision past that, so it should be based
   on sizes already divided by rectangularness. In practice this nudges
   1000x400x2000 from kFractalHilbert to kFractalU on Pixel4, without
   making an observable perf difference in that case.
 - Removed the old RUY_BLOCK_MAP_DEBUG logging code: superseded.
   Kept only a minimal hook to force a block_size_log2 choice.
 - Wrote new comments on BlockMap internals.
 - Fixed Ctx::set_runtime_enabled_paths to behave as documented:
   passing Path::kNone reverts to the default behavior (auto detect).
 - Exposed Context::set_runtime_enabled_paths.
 - Renamed UseSimpleLoop -> GetUseSimpleLoop (easier to read trace).

PiperOrigin-RevId: 352695092
19 files changed
tree: 192a231991a43336b9049b38dd651b35d4157935
README.md

The ruy matrix multiplication library

This is not an officially supported Google product.

ruy is a matrix multiplication library. Its focus is to cover the matrix multiplication needs of neural network inference engines. Its initial user has been TensorFlow Lite, where it is used by default on the ARM CPU architecture.

ruy supports both floating-point and 8-bit-integer-quantized matrices.

Efficiency

ruy is designed to achieve high performance not just on very large sizes, as is the focus of many established libraries, but on the actual sizes and shapes of matrices that matter most in current TensorFlow Lite applications. This often means quite small sizes, e.g. 100x100 or even 50x50, and all sorts of rectangular shapes. It's not as fast as completely specialized code for each shape, but it aims to offer a good compromise of speed across all shapes and a small binary size.

Documentation

Some documentation will eventually be available in the doc/ directory; see doc/README.md.