| # Code generation in oxidize |
| |
| This document is an attempt to describe in reasonable detail the general |
| architecture of the [`read-fonts`][] and [`write-fonts`][] crates, focusing |
| specifically on parts that are auto-generated. |
| |
| > ***note***: |
| > |
> at various points in this document I will make use of blockquotes (like this
> one) to highlight particular aspects of the design that may be interesting,
> confusing, or require refinement.
| |
| ## contents |
| |
| - [overview](#overview) |
| - [`read-fonts`](#read-fonts) |
| - [the code we don't generate](#what-we-dont-generate) |
| - [scalars and `BigEndian<T>`](#scalars-detour) |
| - [`FontData`](#font-data) |
| - [tables and records](#tables-and-records) |
| - [tables](#read-tables) |
| - [`FontRead` and `FontReadWithArgs`](#font-read-args) |
| - [versioned tables](#versioned-tables) |
| - [multi-format tables](#multi-format-tables) |
| - [getters](#table-getters) |
| - [offset getters](#offset-getters) |
| - [offset data](#offset-data) |
| - [records](#records) |
| - [zerocopy](#zerocopy) |
| - [copy-on-read](#copy-on-read) |
| - [offsets in records](#offsets-in-records) |
| - [arrays](#arrays) |
| - [flags and enums](#flags-and-enums) |
| - [traversal](#traversal) |
| - [`write-fonts`](#write-fonts) |
| - [tables and records](#write-tables-records) |
| - [fields and `#[compile(..)]`](#table-fields) |
| - [offsets](#write-offsets) |
| - [parsing and `FromTableRef`](#write-parsing) |
| - [validation](#validation) |
| - [compilation and `FontWrite`](#compilation) |
| |
| ## <a id="overview"></a> overview |
| |
| These two crates can be thought of as siblings, and they both follow the same |
| basic high-level design pattern: they contain a set of generated types, mapping |
| *as closely as possible* to the types in the [OpenType spec][opentype], |
| alongside hand-written code that uses and is used by those types. |
| |
| The [`read-fonts`][] crate is focused on efficient read access and parsing, and |
| the [`write-fonts`][] crate is focused on compilation. The two crates contain a |
| parallel `tables` module, with a nearly identical set of type definitions: for |
| instance, [both crates][read-name-record] [contain a][write-name-record] `tables::name::NameRecord` type. |
| |
| We will examine each of these crates separately. |
| |
| ## <a id="read-fonts"></a> `read-fonts` |
| |
| ### <a id="what-we-dont-generate"></a> The code we *don't* generate |
| |
| Although this writeup is focused specifically on the code we generate, that code |
| is closely entwined with code that we hand-write. This is a general pattern: we |
| manually implement some set of types and traits, which are then used in our |
| generated code. |
| |
| All of the types which are used in codegen are reexported in the |
| [`codegen_prelude`][read-prelude] module; this is glob imported at the top of |
| every generated file. |
| |
| We will describe various of these manually implemented types as we encounter |
| them throughout this document, but before we get started it is worth touching on |
| two cases: `FontData` and scalars / `BigEndian<T>`. |
| |
| #### <a id="scalars-detour"></a> Scalars and `BigEndian<T>` |
| |
| Before we dive into the specifics of the tables and records in `read-fonts`, I |
want to talk briefly about how we represent and handle the [basic data types][ot-data-types]
| of which records and tables are composed. |
| |
| In the font file, these values are all represented in [big-endian][endianness] |
| byte order. When we access them, we will need to convert them to the native |
| endianness of the host platform. We also need to have some set of types which |
| exactly match the memory layout (including byte ordering) of the underlying font |
file; this is necessary for us to take advantage of zerocopy semantics (see the
| [zerocopy section](#zerocopy) below.) |
| |
| In addition to endianness, it is also sometimes the case that types will be |
represented by a different number of bytes in the raw file than when we are
| manipulating them natively; for instance `Offset24` is represented as three |
| bytes on disk, but represented as a `u32` in native code. |
| |
| This leads us to a situation where we require two distinct types for each |
| scalar: a native type that we will use in our program logic, and a |
| 'raw' type that will represent the bytes in the font file (as well as some |
| mechanism to convert between them.) |
| |
| There are various ways we could express this in Rust. The most straightforward |
| would be to just have two parallel sets of types: for instance alongside the |
| `F2Dot14` type, we might have `RawF2Dot14`, or `F2Dot14Be`. Another option might |
| be to have types that are generic over byte-order, such that you end up with |
| types like `U16<BE>` and `U16<LE>`. |
| |
| I have taken a slightly different approach, which tries to be more ergonomic and |
| intuitive to the user, at the cost of having a slightly more complicated |
| implementation. |
| |
| ##### `BigEndian<T>` and `Scalar` |
| |
Our design has two basic components: a trait, `Scalar`, and a type,
`BigEndian<T>`, which look like this:
| |
| ```rust |
| /// A trait for font scalars. |
| pub trait Scalar { |
| /// The raw byte representation of this type. |
| type Raw: Copy + AsRef<[u8]>; |
| |
| /// Create an instance of this type from raw big-endian bytes |
| fn from_raw(raw: Self::Raw) -> Self; |
| /// Encode this type as raw big-endian bytes |
| fn to_raw(self) -> Self::Raw; |
| } |
| |
| /// A wrapper around raw big-endian bytes for some type. |
| #[derive(Clone, Copy, PartialEq, Eq)] |
| #[repr(transparent)] |
| pub struct BigEndian<T: Scalar>(T::Raw); |
| ``` |
| |
The `Scalar` trait handles conversion of a type to and from its raw representation
(a fixed-size byte array), and the `BigEndian` type is a way of representing some
fixed number of bytes and associating them with a concrete type; it has `get`
and `set` methods which read or write the underlying bytes, relying on the
`from_raw` and `to_raw` methods on `Scalar`.
| |
| This is a compromise. The `Raw` associated type is expected to always be a |
| fixed-size byte array; say `[u8; 2]` for a `u16`, or `[u8; 3]` for an `Offset24`. |
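
To make this concrete, here is a sketch of what an implementation of `Scalar`
for a simple type like `u16` amounts to (the real impls may differ in detail):

```rust
// A minimal sketch: the raw form of a u16 is its two big-endian bytes.
impl Scalar for u16 {
    type Raw = [u8; 2];

    fn from_raw(raw: [u8; 2]) -> Self {
        u16::from_be_bytes(raw)
    }

    fn to_raw(self) -> [u8; 2] {
        self.to_be_bytes()
    }
}
```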
| |
Ideally, the `Scalar` trait would look something like this:
| |
| ```rust |
| trait Scalar { |
| const RAW_SIZE: usize; |
| fn from_raw(bytes: [u8; Self::RAW_SIZE]) -> Self; |
| fn to_raw(self) -> [u8; Self::RAW_SIZE]; |
| } |
| ``` |
| |
| But this is not currently something we can express with Rust's generics, |
although [it should become possible eventually][generic-const-exprs].
| |
| In any case: what this lets us do is avoid having two separate sets of types for |
| the 'raw' and 'native' cases; we have a single wrapper type that we use anytime |
| we want to indicate that a type is in its raw form. This has the additional |
| advantage that we can define new types in our generated code that implement |
| `Scalar`, and then those types can automatically work with `BigEndian`; this is |
| useful for things like custom enums and flags that are defined at various points |
| in the spec. |
| |
| ##### `FixedSize` |
| |
In addition to `Scalar` and `BigEndian`, we also have a [`FixedSize`][] trait, which is
| implemented for all scalar types (and later, for structs consisting only of |
| scalar types). This trait consists of a single associated constant: |
| |
| ```rust |
| /// A trait for types that have a known, constant size. |
| pub trait FixedSize: Sized { |
| /// The raw (encoded) size of this type, in bytes. |
| const RAW_BYTE_LEN: usize; |
| } |
| ``` |
| |
This is implemented both for all the scalar types and for their
`BigEndian` equivalents; in both cases, the value of `RAW_BYTE_LEN` is the
| size of the raw (big-endian) representation. |
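
For illustration, the kinds of impls this implies look roughly like the
following (a simplified sketch, not the actual source):

```rust
// Simplified sketch: the raw length of a u16 is two bytes, and its
// BigEndian wrapper occupies those same two bytes.
impl FixedSize for u16 {
    const RAW_BYTE_LEN: usize = 2;
}

impl FixedSize for BigEndian<u16> {
    const RAW_BYTE_LEN: usize = u16::RAW_BYTE_LEN;
}
```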
| |
| #### <a id="font-data"></a> `FontData` |
| |
| The [`FontData`][] struct is at the core of all of our font reading code. It |
| represents a pointer to raw bytes, augmented with a bunch of methods for safely |
| reading scalar values from that raw data. |
| |
| It looks approximately like this: |
| |
| ```rust |
| pub struct FontData<'a>(&'a [u8]); |
| ``` |
| |
And can be thought of as a specialized interface on top of a Rust byte slice.
This type is used extensively in the API, and will show up frequently in
| subsequent code snippets. |
| |
| ### <a id="tables-and-records"></a> tables and records |
| |
| In the [`read-fonts`][] crate, we make a distinction between *table* objects and |
| *record* objects, and we generate different code for each. |
| |
| The distinction between a *table* and a *record* is blurry, but the |
| specification offers two "general criteria": |
| |
| > - Tables are referenced by offsets. If a table contains an offset to a |
| > sub-structure, the offset is normally from the start of that table. |
| > - Records occur sequentially within a parent structure, either within a |
| > sequence of table fields or within an array of records of a given type. If a |
| > record contains an offset to a sub-structure, that structure is logically a |
| > subtable of the record’s parent table and the offset is normally from the start |
| > of the parent table. |
| > |
| > ([The OpenType font file][otff]) |
| |
| ### <a id="read-tables"></a> tables |
| |
| Conceptually, a table object is additional type information laid over a |
`FontData` object (a wrapper around a Rust byte slice (`&[u8]`), essentially
a pointer plus a length). It provides typed access to that table's fields.
| |
In its simplest form, this might look like:
| |
| ```rust |
| pub struct MyTable<'a>(FontData<'a>); |
| |
| impl MyTable<'_> { |
| /// Read the table's first field |
| pub fn format(&self) -> u16 { |
| self.0.read_at(0) |
| } |
| } |
| ``` |
| |
| In practice, what we generate is slightly different: instead of |
| generating a struct for the table itself (and wrapping the data directly) |
| we generate a 'marker' struct, which defines the type of the table, and then we |
| combine it with the data via a `TableRef` struct. |
| |
| The `TableRef` struct looks like this: |
| |
| ```rust |
| /// Typed access to raw table data. |
| pub struct TableRef<'a, T> { |
| shape: T, |
| data: FontData<'a>, |
| } |
| ``` |
| |
| And the definition of the table above, using a marker type, would look something |
| like: |
| |
| ```rust |
| /// A marker type |
| pub struct MyTableMarker; |
| |
| /// Instead of generating a struct for each table, we define a type alias |
| pub type MyTable<'a> = TableRef<'a, MyTableMarker>; |
| |
| impl MyTableMarker { |
| fn format_byte_range(&self) -> Range<usize> { |
| 0..u16::RAW_BYTE_LEN |
| } |
| } |
| |
| impl MyTable<'_> { |
| fn format(&self) -> u16 { |
| let range = self.shape.format_byte_range(); |
| self.data.read_at(range.start) |
| } |
| } |
| ``` |
| |
To the user these two APIs are equivalent (you have a type `MyTable`, on which
you can call methods to read fields) but the 'marker' pattern potentially allows
us to do some fancy things in the future (involving various cases where we
| want to store a type separate from a lifetime). |
| |
| > ***note:*** |
| > |
> there are also downsides to the marker pattern; in particular, currently
> the code we generate will only compile if it is part of the `read-fonts` crate
> itself. This isn't a major limitation, except that it makes certain kinds of
> testing harder to do, since we can't do fancy things like generate code that is
> treated as a separate compilation unit, e.g. for use with the [`trybuild`][]
> crate.
| |
| #### <a id="font-read-args"></a> `FontRead` & `FontReadWithArgs` |
| |
| After generating the type definitions, the next thing we generate is an |
| implementation of one of [`FontRead`][] or [`FontReadWithArgs`][]. The |
| `FontRead` trait is used if a table is self-describing: that is, if the data in |
| the table can be fully interpreted without any external information. In some |
| cases, however, this is not possible. A simple example is the [`loca` table][loca-spec]: |
| the data for this table cannot be interpreted correctly without knowing the |
| number of glyphs in the font (stored in the `maxp` table) as well as whether the |
| format is long or short, which is stored in the `head` table. |
| |
| > ***note***: |
| > |
> The `FontRead` trait is similar to the 'sanitize' methods in HarfBuzz: that is to
> say that it does not parse the data, but only ensures that it is well-formed.
> Unlike 'sanitize', however, `FontRead` is not recursive (it does not chase
> offsets) and it does not in any way modify the structure; it merely returns an
| > error if the structure is malformed. |
| > |
| > We will likely want to change the name of this method at some point, to |
| > clarify the fact that it is not exactly *reading*. |
| |
| In either case, the generated table code is very similar. |
| |
| For the purpose of illustration, let's imagine we have a table that looks like |
| this: |
| |
| ```rust |
| table Foob { |
| #[version] |
| version: BigEndian<u16>, |
| some_val: BigEndian<u32>, |
| other_val: BigEndian<u32>, |
| flags_count: BigEndian<u16>, |
| #[count($flags_count)] |
| flags: [BigEndian<u16>], |
| #[since_version(1)] |
| versioned_value: BigEndian<u32>, |
| } |
| ``` |
| |
| This generates the following code: |
| |
| ```rust |
| impl<'a> FontRead<'a> for Foob<'a> { |
| fn read(data: FontData<'a>) -> Result<Self, ReadError> { |
| let mut cursor = data.cursor(); |
| let version: u16 = cursor.read()?; |
| cursor.advance::<u32>(); // some_val |
| cursor.advance::<u32>(); // other_val |
| let flags_count: u16 = cursor.read()?; |
| let flags_byte_len = flags_count as usize * u16::RAW_BYTE_LEN; |
| cursor.advance_by(flags_byte_len); // flags |
| let versioned_value_byte_start = version |
| .compatible(1) |
| .then(|| cursor.position()) |
| .transpose()?; |
| version.compatible(1).then(|| cursor.advance::<u32>()); |
| cursor.finish(FoobMarker { |
| flags_byte_len, |
| versioned_value_byte_start, |
| }) |
| } |
| } |
| ``` |
| |
| Let's walk through this. Firstly, the whole process is based around a 'cursor' |
| type, which is simply a way of advancing through the input data on a |
| field-by-field basis. Where we need to know the value of a field in order to |
| validate subsequent fields, we read that field into a local variable. |
| Additionally, values that we have to compute based on other fields are currently |
| cached in the marker struct, although this is an implementation detail and may |
| change. Let's walk through this code, field by field: |
| |
| - **version**: as this is marked with the `#[version]` attribute, we read the |
| value into a local variable, since we will need to know the version when |
| reading any versioned fields. |
| - **some_val**: this is a simple value, and we do not need to know what it is, |
| only that it exists. We `advance` the cursor by the appropriate number of |
| bytes. |
| - **other_val**: ditto. The compiler will be able to combine these two |
| `advances` into a single operation. |
| - **flags_count**: This value is referenced in the `#[count]` attribute on the |
| following field, and so we bind it to a local variable. |
| - **flags**: the `#[count]` attribute indicates that the length of this array is |
| stored in the `flags_count` field. We determine the array length by |
| multiplying that value by the size of the array member, and we advance the |
| cursor by that number of bytes. |
| - **versioned_value**: this field is only available if the `version` field is >= |
| to `1` (this is specified via the `#[since_version]` attribute). We record the |
| current cursor position (as an `Option`, which will be `Some` only if the |
| version is compatible) and then we advance the cursor by the size of the |
| field's type. |
| |
| Finally, having finished with each field, we call the `finish` method on the |
| cursor: this performs a final bounds check, and instantiates the table with the |
| provided marker. |
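
For reference, the marker struct instantiated by that `finish` call would look
something like the following (a sketch; the exact shape is an implementation
detail and may change):

```rust
// Sketch of the generated marker: it caches the computed array length
// and the (optional) start position of the version-dependent field.
pub struct FoobMarker {
    flags_byte_len: usize,
    versioned_value_byte_start: Option<usize>,
}
```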
| |
| > ***note***: |
| > |
| > The `FontRead` trait is currently doing a bit of a double duty: in the case of |
| > tables, it is expected to perform a very minimal validation (essentially just |
| > bounds checking) but in the case of records it serves as an actual parse |
| > function, returning a concrete instance of the type. It is possible that these |
| > two roles should be separated? |
| |
| #### <a id="versioned-tables"></a> versioned tables |
| |
As hinted at above, for tables that are versioned (which have a version field,
and which have more than one known version value) we do not generate a distinct
table per version; instead we generate a single table. For fields that are
| available on all versions of a table, we generate getters as usual. For fields |
| that are only available on certain versions, we generate getters that return an |
| `Option` type, which will be `Some` in the case where that field is present for |
| the current version. |
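
Using the `Foob` table from earlier as an example, the generated getters would
look roughly like this (a sketch of the signatures only):

```rust
impl<'a> Foob<'a> {
    /// Present in all versions: returns the value directly.
    pub fn some_val(&self) -> u32 { .. }
    /// Only present since version 1: returns `None` on earlier versions.
    pub fn versioned_value(&self) -> Option<u32> { .. }
}
```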
| |
| > ***note***: |
| > |
| > The way we determine availability is crude: it is based on the |
| > [`Compatible`][] trait, which is implemented for the various types which are |
| > used to represent versions. For types that represent their version as a |
| > (major, minor) pair, we consider a version to be compatible with another version |
| > if it has the same major number and a greater-than-or-equal minor number. For |
| > versions that are a single value, we consider them compatible if they are |
| > greater-than-or-equal. If this ends up being inadequate, we can revisit it. |
| |
| #### <a id="multi-format-tables"></a> multi-format tables |
| |
| Some tables have multiple possible 'formats'. The various formats of a table |
| will all share an initial 'format' field (generally a `u16`) which identifies |
| the format, but the rest of their fields may differ. |
| |
| For tables like this, we generate an enum that contains a variant for each of |
| the possible formats. For this to work, each different table format |
must declare its format field in the input file:
| |
| ```rust |
| table MyTableFormat1 { |
| #[format = 1] |
| table_format: BigEndian<u16>, |
| my_val: BigEndian<u16>, |
| } |
| ``` |
| |
| The `#[format = 1]` attribute on the field of `MyTableFormat1` is an important |
| detail, here. This causes us to implement a private trait, `Format`, like this: |
| |
| ```rust |
| impl Format<u16> for MyTableFormat1 { |
| const FORMAT: u16 = 1; |
| } |
| ``` |
| |
| You then also declare that you want to create an enum, providing an explicit |
| format, and listing which tables should be included: |
| |
| ```rust |
| format u16[@N] MyTable { |
| Format1(MyTableFormat1), |
| Format2(MyTableFormat2), |
| } |
| ``` |
| |
| the 'format' keyword is followed by the type that represents the format, and |
| optionally a position at which to read it (indicated by the '@' token, followed |
| by an unsigned integer literal.) In the vast majority of cases this can be |
| omitted, and the format will be read from the first position in the table. |
| |
| We will then generate an enum, as well as a `FontRead` implementation: this |
| implementation will read the format off of the front of the input data, and then |
| instantiate the appropriate variant based on that value. The generated |
| implementation looks like this: |
| |
| ```rust |
| impl<'a> FontRead<'a> for MyTable<'a> { |
| fn read(data: FontData<'a>) -> Result<Self, ReadError> { |
| let format: u16 = data.read_at(0)?; |
| match format { |
| MyTableFormat1::FORMAT => Ok(Self::Format1(FontRead::read(data)?)), |
| MyTableFormat2::FORMAT => Ok(Self::Format2(FontRead::read(data)?)), |
| other => Err(ReadError::InvalidFormat(other.into())), |
| } |
| } |
| } |
| ``` |
| |
| This trait-based approach has a few nice properties: we ensure that |
| we don't accidentally have formats declared with different types, and we also |
ensure that if we accidentally provide the same format value for two different
| tables, we will at least see a compiler warning. |
| |
| |
| #### <a id="table-getters"></a> getters |
| |
| For each field in the table, we generate a getter method. The exact behaviour of |
| this method depends on the type of the field. If the field is a *scalar* (that |
| is, if it is a single raw value, such as an offset, a `u16`, or a [`Tag`][]) |
| then this getter reads the raw bytes, and then returns a value of the |
| appropriate type, handling big-endian conversion. If it is an array, then the |
| getter returns an array type that wraps the underlying bytes, which will be read |
| lazily on access. |
| |
Alongside the getters we also generate, for each field, a method on the marker
struct that returns that field's start and end positions. These are defined in
terms of one another: the end position of field `N` is the start of field `N+1`.
These ranges are computed in a process that echoes how the table is validated,
building up the offsets as we advance through the fields. This means we avoid
calculating each field's offset from the start of the table by hand, which
should lead to more auditable code.
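
As a sketch of how these chained range methods look, using the `MyTableMarker`
from earlier plus a hypothetical second field `my_val` (the real generated code
differs in detail):

```rust
impl MyTableMarker {
    fn format_byte_range(&self) -> Range<usize> {
        0..u16::RAW_BYTE_LEN
    }

    fn my_val_byte_range(&self) -> Range<usize> {
        // each field's range starts where the previous one ends
        let start = self.format_byte_range().end;
        start..start + u16::RAW_BYTE_LEN
    }
}
```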
| |
| #### <a id="offset-getters"></a> offset getters |
| |
| For fields that are either offsets or arrays of offsets, we generate *two* |
| getters: a raw getter that returns the raw offset, and an 'offset getter' that |
| resolves the offset into the concrete type that is referenced. If the field is |
| an array of offsets, this returns an *iterator* of resolved offsets. (This is a |
| detail that I would like to change in the future, replacing it with some sort of |
| lazy array-like type.) |
| |
| For instance, if we have a table which contains the following: |
| |
| ```rust |
| table CoverageContainer { |
| coverage_offset: BigEndian<Offset16<CoverageTable>>, |
| class_count: BigEndian<u16>, |
| #[count($class_count)] |
| class_def_offsets: [BigEndian<Offset16<ClassDef>>], |
| } |
| ``` |
| |
| we will generate the following methods: |
| |
| ```rust |
impl<'a> CoverageContainer<'a> {
    pub fn coverage_offset(&self) -> Offset16 { .. }
    pub fn coverage(&self) -> Result<CoverageTable<'a>, ReadError> { .. }
    pub fn class_def_offsets(&self) -> &[BigEndian<Offset16>] { .. }
    pub fn class_defs(&self) ->
        impl Iterator<Item = Result<ClassDef<'a>, ReadError>> + 'a { .. }
}
| ``` |
| |
##### custom offset getters and `#[read_offset_with]`
| |
| Every offset field requires an offset getter, but the getters generated by |
| default only work with types that implement `FontRead`. For types that require |
args, you can use the `#[read_offset_with($arg1, $arg2)]` attribute to indicate
| that this offset needs to be resolved with `FontReadWithArgs`, which will be |
| passed the arguments specified; these can be either the names of fields on the |
| containing table, or the name of arguments passed into this table through its |
| *own* `FontReadWithArgs` impl. |
| |
| In special cases, you can also manually implement this getter by using the |
| `#[offset_getter(method)]` attribute, where `method` will be a method you |
| implement on the type that handles resolving the offset via whatever custom |
| logic is required. |
| |
| ##### <a id="offset-data"></a> offset data |
| |
| How do we keep track of the data from which an offset is resolved? A happy |
| byproduct of how we represent tables makes this generally trivial: because a |
| table is just a wrapper around a chunk of bytes, and since most offsets are |
resolved relative to the start of the containing table, we can resolve offsets
directly from our inner data.
| |
In tricky cases, where offsets are not relative to the start of the table,
there is a custom `#[offset_data]` attribute, where the user can specify a
| method that should be called to get the data against which a given offset should |
| be resolved. |
| |
| ### <a id="records"></a> records |
| |
| Records are components of tables. With a few exceptions, they almost always |
| exist in arrays; that is, a table will contain an array with some number of |
| records. |
| |
| When generating code for records, we can take one of two paths. If the record |
| has a fixed size, which is known at compile time, we generate a "zerocopy" |
| struct; and if not, we generate a "copy on read" struct. I will describe these |
| separately. |
| |
| #### <a id="zerocopy"></a> zerocopy |
| |
| When a record has a known, constant size, we declare a struct which has fields |
| which exactly match the raw memory layout of the record. |
| |
As an example, the root *TableDirectory* of an OpenType font contains an array
of *TableRecord* entries, defined like this:
| |
| | Type | Name | Description | |
| | ---------- | -------- | ----------------------------------- | |
| | `Tag` | tableTag | Table identifier. | |
| | `uint32` | checksum | Checksum for this table. | |
| | `Offset32` | offset | Offset from beginning of font file. | |
| | `uint32` | length | Length of this table. | |
| |
| For this type, we generate the following struct: |
| |
| ```rust |
| #[repr(C)] |
| #[repr(packed)] |
| pub struct TableRecord { |
| /// Table identifier. |
| pub tag: BigEndian<Tag>, |
| /// Checksum for the table. |
| pub checksum: BigEndian<u32>, |
| /// Offset from the beginning of the font data. |
| pub offset: BigEndian<Offset32>, |
| /// Length of the table. |
| pub length: BigEndian<u32>, |
| } |
| |
| impl FixedSize for TableRecord { |
| const RAW_BYTE_LEN: usize = Tag::RAW_BYTE_LEN |
| + u32::RAW_BYTE_LEN |
| + Offset32::RAW_BYTE_LEN |
| + u32::RAW_BYTE_LEN; |
| } |
| ``` |
| Some things to note: |
| |
- The `repr` attribute specifies the layout and alignment of the struct.
| `#[repr(packed)]` means that the generated struct has no internal padding, |
| and that the alignment is `1`. (`#[repr(C)]` is required in order to use |
| `#[repr(packed)]`, and it basically means "opt me out of the default |
| representation"). |
| - All of the fields are `BigEndian<_>` types. This means that their internal |
| representation is raw, big-endian bytes. |
| - The `FixedSize` trait acts as a marker, to ensure that this type's fields |
| are themselves all also `FixedSize`. |
| |
| Taken altogether, we get a struct that can be 'cast' from any slice of bytes |
| of the appropriate length. More specifically, this works for arrays: we can take |
| a slice of bytes, ensure that its length is a multiple of `T::RAW_BYTE_LEN`, |
| and then convert that to a Rust slice of the appropriate type. |
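
The mechanics of that cast are hand-written in `read-fonts`, but the idea is
roughly the following (an illustrative sketch, not the actual implementation):

```rust
// Illustrative only: view a byte slice as a slice of TableRecord.
// This relies on TableRecord being #[repr(C, packed)], with alignment 1
// and containing only BigEndian (raw byte array) fields.
fn table_records(bytes: &[u8]) -> Option<&[TableRecord]> {
    if bytes.len() % TableRecord::RAW_BYTE_LEN != 0 {
        return None;
    }
    let len = bytes.len() / TableRecord::RAW_BYTE_LEN;
    Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast(), len) })
}
```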
| |
| #### <a id="copy-on-read"></a> copy-on-read |
| |
| In certain cases, there are records which do not have a size known at compile |
| time. This happens frequently in the GPOS table. An example is the |
| [`PairValueRecord`][] type: this contains two `ValueRecord` fields, and the size |
| (in bytes) of each of these fields depends on a `ValueFormat` that is stored in |
| the parent table. |
| |
| As such, we cannot know the size of `PairValueRecord` at compile time, which |
| means we cannot cast it directly from bytes. Instead, we generate a 'normal' |
| struct, as well as an implementation of `FontReadWithArgs` (discussed in the |
| table section.) This looks like, |
| |
| ```rust |
| pub struct PairValueRecord { |
| /// Glyph ID of second glyph in the pair |
| pub second_glyph: BigEndian<GlyphId>, |
| /// Positioning data for the first glyph in the pair. |
| pub value_record1: ValueRecord, |
| /// Positioning data for the second glyph in the pair. |
| pub value_record2: ValueRecord, |
| } |
| |
| impl<'a> FontReadWithArgs<'a> for PairValueRecord { |
| fn read_with_args( |
| data: FontData<'a>, |
| args: &(ValueFormat, ValueFormat), |
| ) -> Result<Self, ReadError> { |
| let mut cursor = data.cursor(); |
| let (value_format1, value_format2) = *args; |
| Ok(Self { |
| second_glyph: cursor.read()?, |
| value_record1: cursor.read_with_args(&value_format1)?, |
| value_record2: cursor.read_with_args(&value_format2)?, |
| }) |
| } |
| } |
| ``` |
| |
| Here, in our 'read' impl, we are actually instantiating an instance of our type, |
| copying the bytes as needed. |
| |
In addition, we also generate an implementation of the `ComputeSize` trait; this
is the analogue of the `FixedSize` trait for types whose size can only be
computed at runtime from some set of arguments.
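
As a rough sketch of the idea (the actual trait in `read-fonts` is shaped
somewhat differently; see the source for the real definition):

```rust
/// Sketch only: a runtime analogue of `FixedSize`, computing a size
/// from the same arguments used by `FontReadWithArgs`.
pub trait ComputeSize<Args> {
    fn compute_size(args: &Args) -> usize;
}
```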
| |
| #### <a id="offsets-in-records"></a> offsets in records |
| |
| Records, like tables, can contain offsets. Unlike tables, records do not have |
| access to the raw data against which those offsets should be resolved. For the |
purpose of consistency across our generated code, however, it *is* important
| that we have a consistent way of resolving offsets contained in records, and we |
| do: you have to pass it in. |
| |
| Where an offset getter on a table might look like, |
| |
| ```rust |
| fn coverage(&self) -> Result<CoverageTable<'a>, ReadError>; |
| ``` |
| |
| The equivalent getter on a record looks like, |
| |
| ```rust |
| fn coverage(&self, data: FontData<'a>) -> Result<CoverageTable<'a>, ReadError>; |
| ``` |
| |
| This... honestly, this is not great ergonomics. It is, however, simple, and is |
| relied on by codegen in various places, and when we're generating code we aren't |
| too bothered by how ergonomic it is. We might want to revisit this at some |
| point; one simple improvement would be to have the caller pass in the parent |
| table, but I'm not sure how this would work in cases where a type might be |
| referenced by multiple parents. Another option would be to have some kind of |
| fancy `RecordData` struct that would be a thin wrapper around a record plus the |
| parent data, and which would implement the record getters, but deref to the |
| record otherwise.... I'm really not sure. |
| |
| ### <a id="arrays"></a> arrays |
| |
| The code we generate to represent an array varies based on what we know about |
| the size and contents of the array: |
| |
| - if the contents of an array have a fixed uniform size, known at compile time, then we |
| represent the array as a rust slice: `&[T]`. This is true for all scalars |
| (including offsets) as well as records that are composed of a fixed number of |
| scalars. |
| - if the contents of an array have a uniform size, but the size can only be |
| determined at runtime, we represent the array using the [`ComputedArray`][] type. |
| This requires the inner type to implement [`FontReadWithArgs`][], and the |
| array itself wraps the raw bytes and instantiates its elements lazily as they |
| are accessed. As an example, the length of a `ValueRecord` depends on the |
| specific associated `ValueFormat`. |
| ```rust |
| table SinglePosFormat2 { |
| // some fields omitted |
| value_format: BigEndian<ValueFormat>, |
| value_count: BigEndian<u16>, |
| #[count($value_count)] |
| #[read_with($value_format)] |
| value_records: ComputedArray<ValueRecord>, |
| } |
| ``` |
- finally, if an array contains elements of non-uniform sizes, we use the
  [`VarLenArray`][] type. This array does not allow for random access, and it
  requires the inner type to implement the [`VarSize`][] trait, via which it
  indicates the type of its leading length field. An example of this pattern is
  the array of Pascal-style strings in the ['post' table][pstring]: the first
  byte of each string encodes its length, and so we represent them in a
  `VarLenArray`:
| |
| ```rust |
| table Post { |
| // some fields omitted |
| #[count(..)] |
| #[since_version(2.0)] |
| string_data: VarLenArray<PString<'a>>, |
| } |
| ``` |
| |
| ### <a id="flags-and-enums"></a> flags and enums |
| |
| On top of tables and records, we also generate code for various defined flags |
| and enums. In the case of flags, we generate implementations based on the |
[`bitflags`][] crate, and in the case of enums, we generate a Rust enum.
| These code paths are not currently very heavily used. |
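
For illustration, a flags type of this kind is roughly equivalent to one
declared with the [`bitflags`][] macro; the flag names here are purely
illustrative:

```rust
bitflags::bitflags! {
    pub struct ExampleFlags: u16 {
        const FIRST = 0x0001;
        const SECOND = 0x0002;
    }
}
```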
| |
| ### <a id="traversal"></a> traversal |
| |
| There is one last piece of code that we generate in `read-fonts`, and that is |
| our 'traversal' code. |
| |
| This is experimental and likely subject to significant change, but the general |
| idea is that it is a mechanism for recursively traversing a graph of |
| tables, without needing to worry about the specific type of any *particular* table. It |
| does this by using [trait objects][trait-objects], which allow us to refer to |
| multiple distinct types in terms of a trait that they implement. The core of this is the |
| [`SomeTable`][] trait, which is implemented for each table; through this, we can |
get the name of a table, as well as iterate through that table's fields.
| |
| For each field, the table returns the name of the field (as a string) along with |
| some *value*; the set of possible values is covered by the [`FieldType`][] |
| enum. Importantly, the table resolves any contained offsets, and returns the |
| referenced tables as `SomeTable` trait objects as well, which can then also be |
| traversed recursively. |
| |
| We do not currently make very heavy use of this mechanism, but it *is* the basis |
| for the generated implementations of the `Debug` trait, and it is used in the |
| [otexplorer][] sample project. |
| |
| ## <a id="write-fonts"></a> `write-fonts` |
| |
| The `write-fonts` crate is significantly simpler than the `read-fonts` crate |
| (currently less than half the total lines of generated code) and because it does |
| not have to deal with the specifics of the memory layout or worry about avoiding |
| allocation, the generated code is generally more straightforward. |
| |
| ### <a id="write-tables-records"></a> tables and records |
| |
| Unlike in `read-fonts`, which generates significantly different code for tables |
| and records (as well as very different code based on whether a record is |
| zerocopy or not) the `write-fonts` crate treats all tables and records as basic |
| Rust structs. |
| |
| As in `read-fonts` we generate enums for tables that have multiple formats, and |
| likewise we generate a single struct for tables that have versioned fields, with |
| version-dependent fields represented as `Option` types. |
| |
| > ***note***: |
| > |
| > This pattern is a bit more annoying in write-fonts, and we may want to revisit |
| > it at some point, or at least improve the API with some sort of builder |
| > pattern. |
| |
| #### <a id="table-fields"></a> fields and `#[compile(..)]` |
| |
| Where the types in `read-fonts` generally contain the exact fields described in |
the spec, this does not always make sense for the `write-fonts` types. A simple
| example is fields that contain the count of an array. This is useful in |
| `read-fonts`, but in `write-fonts` it is redundant, since we can determine the |
| count from the array itself. The same is true of things like the `format` field, |
| which we can determine from the type of the table, as well as version numbers, |
| which we can choose based on the fields present on the table. |
| |
| In these cases, the `#[compile(..)]` attribute can be used to provide a computed |
| value to be written in the place of this field. The provided value can be a |
| literal or an expression that evaluates to a value of the field's type. |
| |
| If a field has a `#[compile(..)]` attribute, then that field will be omitted in |
| the generated struct. |
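
As a sketch, the `write-fonts` struct for the hypothetical `Foob` table from
the `read-fonts` examples above might look like this, assuming `flags_count`
is given a `#[compile(..)]` value (the version field might also be computed;
this is illustrative only):

```rust
pub struct Foob {
    pub version: u16,
    pub some_val: u32,
    pub other_val: u32,
    pub flags: Vec<u16>,
    pub versioned_value: Option<u32>,
}
```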
| |
| #### <a id="write-offsets"></a> offsets |
| |
| Fields that are of the various offset types in the spec are represented in |
| `write-fonts` as [`OffsetMarker`] types. These are a wrapper around an |
| `Option<T>` where `T` is the type of the referenced subtable; they also have a |
| const generic param `N` that represents the width of the offset, in bytes. |
| |
During compilation (see the section on [`FontWrite`](#compilation), below) we use
| these markers to record the position of offsets in a table, and to associate |
| those locations with specific subtables. |
| |
| #### <a id="write-parsing"></a> parsing and [`FromTableRef`][] |
| |
There is generally a 1:1 relationship between the generated types in `read-fonts` and
| `write-fonts`, and you can convert a type in `read-fonts` to a corresponding |
| type in `write-fonts` (assuming the default "parsing" feature is enabled) via |
| the [`FromObjRef`][] and [`FromTableRef`][] traits. These are modeled on the |
| [`From` trait][from-trait] in the Rust prelude, down to having a pair of |
| companion `IntoOwnedObj` and `IntoOwnedTable` traits with blanket impls. |
| |
| The basic idea behind this approach is that we do not generate separate parsing |
| code for the types in `write-fonts`; we leave the parsing up to the types in `read-fonts`, |
| and then we just handle conversion from these to the write types. |
| |
| The more general of these two traits is [`FromObjRef`][], which is implemented |
| for every table and record. It has one method, `from_obj_ref`, which takes some |
| type from `read-fonts`, as well as `FontData` that is used to resolve any |
| offsets. If the type is a table, it can ignore the provided data, since it |
| already has a reference to the data it will use to resolve any contained |
offsets, but if it is a record then it must use the input data in order to
| recursively convert any contained offsets. |
| |
In their `FromObjRef` implementation, tables pass their own data down to
| any contained records as required. |
| |
| The `FromTableRef` trait is simply a marker; it indicates that a given object |
| does not require any external data. |
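
For reference, the shape of [`FromObjRef`][] is roughly as follows (a sketch;
see the linked docs for the exact signature):

```rust
pub trait FromObjRef<T: ?Sized>: Sized {
    /// Convert from the corresponding read-fonts type, using `offset_data`
    /// to resolve any offsets contained in records.
    fn from_obj_ref(from: &T, offset_data: FontData<'_>) -> Self;
}
```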
| |
| In any case, all of these traits are largely implementation details, and you |
will rarely need to interact with them directly, because if a type implements
`FromTableRef`, then we *also* generate an implementation of the `FontRead`
| trait from `read-fonts`. This means that all of the self-describing tables in |
| `write-fonts` can be instantiated directly from raw bytes in a font file. |
| |
| #### <a id="validation"></a> Validation |
| |
| One detail of `FromObjRef` and family is that these traits are *infallible*; |
| that is, if we can parse a table at all, we will always successfully convert it |
| to its owned equivalent, even if it contains unexpected null offsets, or has |
| subtables which cannot be read. This means that you can read and modify a table |
| that is malformed. |
| |
| We do not want to *write* tables that are malformed, however, and we also want |
| an opportunity to enforce various other constraints that are expressed in the |
| spec, and for this we have the [`Validate`][] trait. An implementation of this |
| trait is generated for all tables, and we automatically verify a number of |
| conditions: for instance that offsets which should not be null contain a value, |
| or that the number of items in a table does not overflow the integer type that |
| stores that table's length. Additional validation can be performed on a |
| per-field basis by providing a method name to the `#[validate(..)]` attribute; |
| this should be an instance method (having a `&self` param) and should also |
accept an additional 'ctx' argument, of type [`&mut ValidationCtx`][validation-ctx], which is used
| to report errors. |
| |
| ### <a id="compilation"></a> compilation and [`FontWrite`][] |
| |
| Finally, for each type we generate an implementation of the [`FontWrite`][] trait, |
| which looks like: |
| |
| ```rust |
| pub trait FontWrite { |
| fn write_into(&self, writer: &mut TableWriter); |
| } |
| ``` |
| |
The `TableWriter` struct has two jobs: it records the raw bytes representing the
data in this table or record, and it records the position of offsets and the
entities they point to.
| |
The implementation of this type is all hand-written, and out of scope for
this document, but the implementations of `FontWrite` that we generate are
straightforward: we walk the struct's fields in order (computing a value if the
| field has a `#[compile(..)]` attribute) and recursively call `write_into` on |
| them. This recurses until it reaches either an `OffsetMarker` or a scalar type; |
| in the first case we record the position and size of the offset in the current |
| table, and then recursively write out the referenced object; and in the latter |
| case we record the big-endian bytes themselves. |
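
As a sketch, the kind of impl we generate for a simple record with a handful of
scalar fields (using the `TableRecord` from the zerocopy example above purely
for illustration; the real generated code may differ in detail) looks like:

```rust
// Sketch of a generated impl: each field is written out in declaration
// order, recursing via FontWrite.
impl FontWrite for TableRecord {
    fn write_into(&self, writer: &mut TableWriter) {
        self.tag.write_into(writer);
        self.checksum.write_into(writer);
        self.offset.write_into(writer);
        self.length.write_into(writer);
    }
}
```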
| |
| |
| ## fin |
| |
| This document represents a best effort at capturing the most important details |
| of the code we generate, as of October 2022. It is likely that things will |
| change over time, and I will endeavour to keep this document up to date. If |
| anything is unclear or incorrect, please open an issue and I will try to |
| clarify. |
| |
| |
| |
| |
| [`read-fonts`]: https://docs.rs/read-fonts/ |
| [`write-fonts`]: https://docs.rs/write-fonts/ |
| [opentype]: https://learn.microsoft.com/en-us/typography/opentype/spec/ |
| [read-name-record]: https://docs.rs/read-fonts/latest/read_fonts/tables/name/struct.NameRecord.html |
| [write-name-record]: https://docs.rs/write-fonts/latest/write_fonts/tables/name/struct.NameRecord.html |
| [`trybuild`]: https://docs.rs/trybuild/latest/trybuild/ |
| [`FontRead`]: https://docs.rs/read-fonts/latest/read_fonts/trait.FontRead.html |
| [`FontReadWithArgs`]: https://docs.rs/read-fonts/latest/read_fonts/trait.FontReadWithArgs.html |
| [loca-spec]: https://learn.microsoft.com/en-us/typography/opentype/spec/loca |
| [`Tag`]: https://learn.microsoft.com/en-us/typography/opentype/spec/ttoreg |
| [otff]: https://learn.microsoft.com/en-us/typography/opentype/spec/otff |
| [`PairValueRecord`]: https://learn.microsoft.com/en-us/typography/opentype/spec/gpos#pairValueRec |
| [`bitflags`]: https://docs.rs/bitflags/latest/bitflags/ |
| [ot-data-types]: https://learn.microsoft.com/en-us/typography/opentype/spec/otff#data-types |
| [endianness]: https://en.wikipedia.org/wiki/Endianness |
| [`Compatible`]: https://docs.rs/font-types/latest/font_types/trait.Compatible.html |
| [trait-objects]: http://doc.rust-lang.org/1.64.0/book/ch17-02-trait-objects.html |
| [`SomeTable`]: https://docs.rs/read-fonts/latest/read_fonts/traversal/trait.SomeTable.html |
| [`FieldType`]: https://docs.rs/read-fonts/latest/read_fonts/traversal/enum.FieldType.html |
| [otexplorer]: https://github.com/cmyr/fontations/tree/main/otexplorer |
| [`OffsetMarker`]: https://docs.rs/write-fonts/latest/write_fonts/struct.OffsetMarker.html |
| [`FromObjRef`]: https://docs.rs/write-fonts/latest/write_fonts/from_obj/trait.FromObjRef.html |
| [`FromTableRef`]: https://docs.rs/write-fonts/latest/write_fonts/from_obj/trait.FromTableRef.html |
| [from-trait]: http://doc.rust-lang.org/1.64.0/std/convert/trait.From.html |
| [`Validate`]: https://docs.rs/write-fonts/latest/write_fonts/validate/trait.Validate.html |
| [validation-ctx]: https://docs.rs/write-fonts/latest/write_fonts/validate/struct.ValidationCtx.html |
| [`FontWrite`]: https://docs.rs/write-fonts/latest/write_fonts/trait.FontWrite.html |
| [`FixedSize`]: https://docs.rs/font-types/latest/font_types/trait.FixedSize.html |
| [generic-const-exprs]: https://github.com/rust-lang/rust/issues/60551#issuecomment-917511891 |
| [read-prelude]: https://github.com/cmyr/fontations/blob/main/read-fonts/src/lib.rs#L42 |
| [`FontData`]: https://docs.rs/read-fonts/latest/read_fonts/struct.FontData.html |
| [`ComputedArray`]: https://docs.rs/read-fonts/latest/read_fonts/array/struct.ComputedArray.html |
| [`VarLenArray`]: https://docs.rs/read-fonts/latest/read_fonts/array/struct.VarLenArray.html |
| [`VarSize`]: https://docs.rs/read-fonts/latest/read_fonts/trait.VarSize.html |
| [pstring]: https://learn.microsoft.com/en-us/typography/opentype/spec/post#version-20 |
| |