This crate contains utilities used to generate code for parsing and compiling various font tables. For an in-depth overview of what code we generate and how it works, see the codegen-tour document.
The basics:
resources/codegen_inputs.# Rebuild all the things (normal use case) $ cargo run --bin=codegen resources/codegen_plan.toml # Process a single file $ cargo run --bin=codegen file $mode $input $ cargo run --bin=codegen file parse resources/codegen_inputs/cmap.rs > read-fonts/generated/generated_cmap.rswhere
$input is the path to an input file, and $mode is one of ‘parse’ or ‘compile’, and which will generate the code corresponding to the read-fonts or write-fonts crate, respectively. This will print the output to stdout; you can redirect it elsewhere as desired.resources/codegen_plan.toml.$crate/generated/generated_$name.rs (where $crate is one of read-fonts or write-fonts.)include! macro into a corresponding module, generally in $crate/src/tables/$name.rs.resources/codegen_inputs. The name of this file is not important, but in general it should be the name of the corresponding table in the spec. Each top-level table (table with a tag) gets its own file. To assist with creating this file, you may use the preprocessor; see below.resources/codegen_plan.toml to generate an output in read-fonts/generated.read-fonts crate. In general this means adding a new file in read-fonts/src/tables, and adding an entry in read-fonts/src/tables.rs. The module should include! the generated file.$ cargo run --bin=codegen resources/codegen_plan.toml, and run cargo check to see if there are any errors.read-fonts/src/table_provider.rs to provide a getter for your table.otexplorer to add support for your table. Run the otexplorer tool, and ensure it is producing reasonable output.write-fonts crate.It is possible that in adding a table you will need to modify the codegen code itself, for instance to add a new attribute.
This can be a fiddily process. In general, the workflow is something like this:
codegen_inputs/test.rs to include an input matching the input you are trying to support.$ cargo run --bin=codegen resources/test_plan.toml && cargo test to see if the generated code compiles, and inspect to see that it is working as intended.To speed up writing of the codegen inputs, there is a preprocessor, which takes a simple text input and does basic reformatting into the expected input format.
The text in the preprocessor inputs (which live in resources/raw_tables) is copied directly from the Microsoft OpenType® docs; it is then augmented with links to the original documentation, and a few basic annotations to indicate the type of the object (record/table/flags/enums)
Inputs to the preprocessor look like this:
/// an optional comment for each top-level item @table Gpos1_0 uint16 majorVersion Major version of the GPOS table, = 1 uint16 minorVersion Minor version of the GPOS table, = 0 Offset16 scriptListOffset Offset to ScriptList table, from beginning of GPOS table Offset16 featureListOffset Offset to FeatureList table, from beginning of GPOS table Offset16 lookupListOffset Offset to LookupList table, from beginning of GPOS table /// Part of [Name1] @record LangTagRecord uint16 length Language-tag string length (in bytes) Offset16 langTagOffset Language-tag string offset from start of storage area (in bytes). /// [Axis value table flags](https://docs.microsoft.com/en-us/typography/opentype/spec/stat#flags). @flags(u16) AxisValueTableFlags 0x0001 OLDER_SIBLING_FONT_ATTRIBUTE If set, this axis value table provides axis value information 0x0002 ELIDABLE_AXIS_VALUE_NAME If set, do something else @enum(u16) GlyphClassDef 1 Base Base glyph (single character, spacing glyph) 2 Ligature Ligature glyph (multiple character, spacing glyph) 3 Mark Mark glyph (non-spacing combining glyph) 4 Component Component glyph (part of single character, spacing glyph)
@OBJECT_TYPE.run this like,
$ cargo run --bin preprocessor resources/raw_tables/my_table.txt > resources/codegen_inputs/my_table.rs
The codegen tool reads in a file in rust-like syntax, and generates the final rust source.
To run the tool on a single input:
# cargo run --bin=codegen resources/codegen_inputs/my_table.rs
This will write the generated source to stdout; you can redirect it as desired.
Codegen inputs can be annotated with various table and field attributes that inform how the code is generated. These use the same syntax as proc-macro attributes.
The following annotations are supported on top-level objects:
#[skip_font_write]: if present, we will not generate a FontWrite implementation for this type. This is useful if a type needs some manual processing before it can be compiled.#[skip_from_obj]: if present, we will not generate a FromObjRef implementation for this type.#[read_args(name: type,+)] if present, this type will be given an implementation of FontReadWithArgs, expecting the provided arguments. The provided names will be available to other attributes on this type, as if they were fields on the type itself.#[generic_offset(T)] Indicate that this type contains an offset with a generic target, for which we will add a PhantomData field. This is is used for common tables that contain offsets which point to different concrete types depending on the containing table, such as the Layout subtable shared between GPOS and GSUB.#[write_fonts_only] Indicate that this table should only be generated for write-fonts (i.e. should be ignored in read-fonts).#[validate(method)] Provide a method to perform additional pre-compilation validation for this type. The method must be manually implemented on the type, with the signature fn(&self, &mut ValidationCtx).#[nullable]: only allowed on offsets or arrays of offsets, and indicates that this field is allowed to be null. This changes the behaviour of getters, as well as validation and compilation code.#[since_version(version)]: indicates that a field only exists in a given version of the table. The version may be either a single integer literal (#[since_version(1)]), or a major.minor pair (#[since_version(1.1)]).#[if_flag($field, Flags::SOME_FLAG)]: indicates that a given field is only present if a particular flag is set on the named field. The field is expected to be a bitset with a contains method.#[if_cond($field, Flags::SOME_FLAG_A, Flags::SOME_FLAG_B, ...)]: indicates that a given field is only present if at least one of the listed flags is set on the named field. The field is expected to be a bitset with a contains method.#[skip_getter]: if present, we will not generate a getter for this field. Used on things like padding fields.#[offset_getter(method name)]: only allowed on offsets or arrays of offsets. If present, we will not generate a method that resolves this offset, but will instead expect that one will be implemented manually, and will have the provided name.#[offset_data(method name)]: only on offset fields. If present, the provided ‘method name’ must be implemented, and must return FontData that will be used to resolve this offset. Used in places where offsets are not resolved from the base of the containing table. Uncommon.#[offset_adjustment(expr)]: related to the above, but for encoding: the provided expression must evaluate to a u32, which will be subtracted from the computed offset during compilation.#[version]: May only be supplied for one field. If present, this field is treated as the ‘version’, used when determining the availability of versioned fields.#[format = x]: Indicates that this field is the format field of a multi-format table, and that it has the provided format value.#[count(arg)] and #[count(fn_name(arg, +))]: This annotation has two forms. The simple form accepts a single argument, which can be either the token .. (meaning all remaining data, and only valid on the last field in a table), the name of a field (preceded by the $ token) or a literal integer. The less-simple form begins with a function identifier, and then one or more arguments, comma separated. Currently accepted function identifiers are ‘add’, ‘subtract’, ‘add_multiply’, ‘multiply_add’, ‘half’, ‘map_delta_size’, and ‘delta_value_count’.#[compile(arg)]: If present, this field will not be included in the compile type. The value may be either the literal ‘skip’, or an expression that evaluates to the field's type: the skip case is only expected in cases where there is a manual FontWrite impl, and the field does not make sense on the compile type.#[compile_with(method_name)]: Specify custom compilation behaviour. This attribute lets you name a method that will be called to get some type that will be used to compile this field. This may be any type that implements the FontWrite trait; this can be used in cases where the logic to compile a given type requires some custom implementation.#[compile_type(type)]: specify an alternate type to be used in the struct generated for this type.#[default(expr)]: specify a value that will be used in the implementation of Default for the containing type. Unlike with #[compile], this value is set when the type is created, and can be manually modified by the user.#[read_with(args,+)]: specify that this field's type needs to be read with FontReadWithArgs, and passed the provided args. Args is a comma separated list of fields or input args to the type.#[read_offset_with(args,+)]: on offsets or arrays of offsets, indicates that the type referenced by this offset needs to be passed the provided args when it is read.#[validate(arg)]: arg is either the literal ‘skip’ or the name of a method. If the name of a method, that method will be called during validation, and can implement custom validation logic.#[traverse_with(method name)]: uncommon/hacky: provides a method name that will be called in traversal code to get the FieldType for this field. To skip traversing this field, you can use the ‘skip’ keyword (#[traverse_with(skip)]).#[to_owned(expr)]: uncommon/hacky: provide an expression that will be used in FromObjRef to convert the parse type to the compile type.There is also the concept of a ‘codegen plan’, which is a simple toml file describing a number of different operations to be run in parallel. This is intended to be the general mechanism by which codegen is run.
See ../resources/codegen_plan.toml for an example.