| # codegen |
| |
| This crate contains utilities used to generate code for parsing and |
| compiling various font tables. For an in-depth overview of what code we generate |
| and how it works, see the [codegen-tour][] document. |
| |
| The basics: |
| - Inputs live in `resources/codegen_inputs`. |
| - To run the code generator: |
| ```sh |
| # Rebuild all the things (normal use case) |
| $ cargo run --bin=codegen resources/codegen_plan.toml |
| |
| # Process a single file |
| $ cargo run --bin=codegen file $mode $input |
| $ cargo run --bin=codegen file parse resources/codegen_inputs/cmap.rs > read-fonts/generated/generated_cmap.rs |
| ``` |
| where `$input` is the path to an input file, and `$mode` is one of 'parse' or |
| 'compile', and which will generate the code corresponding to the `read-fonts` |
| or `write-fonts` crate, respectively. This will print the output to `stdout`; |
| you can redirect it elsewhere as desired. |
| - But inputs are more commonly run through a 'codegen plan', which describes the |
| inputs and their destinations. The default plan lives in `resources/codegen_plan.toml`. |
| - outputs are written into `$crate/generated/generated_$name.rs` (where `$crate` is one of |
| `read-fonts` or `write-fonts`.) |
| - these output files (which are not in the module tree) are included with the |
| [`include!`][] macro into a corresponding module, generally in |
| `$crate/src/tables/$name.rs`. |
| |
| |
| ## Adding a new table |
| |
| - Create a new codegen input file in `resources/codegen_inputs`. The name of |
| this file is not important, but in general it should be the name of the |
| corresponding table in the spec. Each top-level table (table with a tag) gets |
| its own file. To assist with creating this file, you may use the |
| [preprocessor](#preprocessor); see below. |
| - Add a task in `resources/codegen_plan.toml` to generate an output in |
| `read-fonts/generated`. |
| - Add a module corresponding to the new table to the `read-fonts` crate. In |
| general this means adding a new file in `read-fonts/src/tables`, and adding an |
| entry in `read-fonts/src/tables.rs`. The module should `include!` the |
| generated file. |
| - Run the codegen tool, with |
| `$ cargo run --bin=codegen resources/codegen_plan.toml`, and run `cargo check` |
| to see if there are any errors. |
| - If there are any errors, add [attributes](#annotations) as to your table |
| as appropriate. Look at other tables for examples. |
| - Update `read-fonts/src/table_provider.rs` to provide a getter for your table. |
| - Update `otexplorer` to add support for your table. Run the `otexplorer` tool, |
| and ensure it is producing reasonable output. |
| - Repeat this process for the `write-fonts` crate. |
| |
| |
| ## Modifying the codegen code |
| |
| It is possible that in adding a table you will need to modify the codegen code |
| itself, for instance to add a new attribute. |
| |
| This can be a fiddily process. In general, the workflow is something like this: |
| |
| - Update `codegen_inputs/test.rs` to include an input matching the input you are |
| trying to support. |
| - Make a modification to the codegen code. |
| - Run `$ cargo run --bin=codegen resources/test_plan.toml && cargo test` to see |
| if the generated code compiles, and inspect to see that it is working as |
| intended. |
| - repeat the edit/test cycle until you are satisfied. |
| |
| ## preprocessor |
| |
| To speed up writing of the codegen inputs, there is a *preprocessor*, which |
| takes a simple text input and does basic reformatting into the expected input |
| format. |
| |
| The text in the preprocessor inputs (which live in `resources/raw_tables`) is |
| copied directly from the [Microsoft OpenType® docs][opentype]; it is then |
| augmented with links to the original documentation, and a few basic annotations |
| to indicate the type of the object (record/table/flags/enums) |
| |
| Inputs to the preprocessor look like this: |
| |
| ``` |
| /// an optional comment for each top-level item |
| @table Gpos1_0 |
| uint16 majorVersion Major version of the GPOS table, = 1 |
| uint16 minorVersion Minor version of the GPOS table, = 0 |
| Offset16 scriptListOffset Offset to ScriptList table, from beginning of GPOS table |
| Offset16 featureListOffset Offset to FeatureList table, from beginning of GPOS table |
| Offset16 lookupListOffset Offset to LookupList table, from beginning of GPOS table |
| |
| /// Part of [Name1] |
| @record LangTagRecord |
| uint16 length Language-tag string length (in bytes) |
| Offset16 langTagOffset Language-tag string offset from start of storage area (in bytes). |
| |
| /// [Axis value table flags](https://docs.microsoft.com/en-us/typography/opentype/spec/stat#flags). |
| @flags(u16) AxisValueTableFlags |
| 0x0001 OLDER_SIBLING_FONT_ATTRIBUTE If set, this axis value table provides axis value information |
| 0x0002 ELIDABLE_AXIS_VALUE_NAME If set, do something else |
| |
| @enum(u16) GlyphClassDef |
| 1 Base Base glyph (single character, spacing glyph) |
| 2 Ligature Ligature glyph (multiple character, spacing glyph) |
| 3 Mark Mark glyph (non-spacing combining glyph) |
| 4 Component Component glyph (part of single character, spacing glyph) |
| ``` |
| |
| - all objects are separated by a newline, and begin with `@OBJECT_TYPE`. |
| - record & table are currently interchangeable, but this may change, and you |
| should follow the spec. |
| - enum & flags require an explicit format |
| - this does not handle lifetimes, which will need to be added manually |
| - it also does not add annotations, which are necessary in any non-trivial case. |
| - you will generally need to do some cleanup. |
| |
| run this like, |
| |
| ```sh |
| $ cargo run --bin preprocessor resources/raw_tables/my_table.txt > resources/codegen_inputs/my_table.rs |
| ``` |
| |
| ## codegen |
| |
| The codegen tool reads in a file in rust-like syntax, and generates the final |
| rust source. |
| |
| To run the tool on a single input: |
| |
| ```sh |
| # cargo run --bin=codegen resources/codegen_inputs/my_table.rs |
| ``` |
| |
| This will write the generated source to stdout; you can redirect it as desired. |
| |
| ### annotations |
| |
| Codegen inputs can be annotated with various table and field attributes that |
| inform how the code is generated. These use the same syntax as proc-macro |
| attributes. |
| |
| #### table attributes |
| |
| The following annotations are supported on top-level objects: |
| |
| - `#[skip_font_write]`: if present, we will not generate a `FontWrite` |
| implementation for this type. This is useful if a type needs some manual |
| processing before it can be compiled. |
| - `#[skip_from_obj]`: if present, we will not generate a `FromObjRef` |
| implementation for this type. |
| - `#[read_args(name: type,+)]` if present, this type will be given an |
| implementation of `FontReadWithArgs`, expecting the provided arguments. The |
| provided names will be available to other attributes on this type, as if they |
| were fields on the type itself. |
| - `#[generic_offset(T)]` Indicate that this type contains an offset with a generic |
| target, for which we will add a `PhantomData` field. This is is used for |
| common tables that contain offsets which point to different concrete types |
| depending on the containing table, such as the `Layout` subtable shared |
| between GPOS and GSUB. |
| - `#[write_fonts_only]` Indicate that this table should only be generated for |
| `write-fonts` (i.e. should be ignored in `read-fonts`). |
| - `#[validate(method)]` Provide a method to perform additional pre-compilation |
| validation for this type. The method must be manually implemented on the type, |
| with the signature `fn(&self, &mut ValidationCtx)`. |
| |
| #### field attributes |
| - `#[nullable]`: only allowed on offsets or arrays of offsets, and indicates |
| that this field is allowed to be null. This changes the behaviour of getters, |
| as well as validation and compilation code. |
| - `#[since_version(version)]`: indicates that a field only exists in a given version |
| of the table. The `version` may be either a single integer literal |
| (`#[since_version(1)]`), or a major.minor pair (`#[since_version(1.1)]`). |
| - `#[if_flag($field, Flags::SOME_FLAG)]`: indicates that a given field is only |
| present if a particular flag is set on the named field. The field is expected |
| to be a bitset with a `contains` method. |
| - `#[if_cond($field, Flags::SOME_FLAG_A, Flags::SOME_FLAG_B, ...)]`: indicates that a |
| given field is only present if at least one of the listed flags is set on the named |
| field. The field is expected to be a bitset with a `contains` method. |
| - `#[skip_getter]`: if present, we will not generate a getter for this field. |
| Used on things like padding fields. |
| - `#[offset_getter(method name)]`: only allowed on offsets or arrays of offsets. |
| If present, we will not generate a method that resolves this offset, but will |
| instead expect that one will be implemented manually, and will have the |
| provided name. |
| - `#[offset_data(method name)]`: only on offset fields. If present, the provided |
| 'method name' must be implemented, and must return `FontData` that will be |
| used to resolve this offset. Used in places where offsets are not resolved |
| from the base of the containing table. Uncommon. |
| - `#[offset_adjustment(expr)]`: related to the above, but for encoding: the |
| provided expression must evaluate to a `u32`, which will be subtracted from |
| the computed offset during compilation. |
| - `#[version]`: May only be supplied for one field. If present, this field is |
| treated as the 'version', used when determining the availability of versioned |
| fields. |
| - `#[format = x]`: Indicates that this field is the format field of a |
| multi-format table, and that it has the provided format value. |
| - `#[count(arg)]` and `#[count(fn_name(arg, +))]`: This annotation has two |
| forms. The simple form accepts a single argument, which can be either |
| the token `..` (meaning all remaining data, and only valid on the last field |
| in a table), the name of a field (preceded by the `$` token) or a literal |
| integer. The less-simple form begins with a function identifier, and then one |
| or more arguments, comma separated. Currently accepted function identifiers |
| are 'add', 'subtract', 'add_multiply', 'multiply_add', 'half', 'map_delta_size', |
| and 'delta_value_count'. |
| - `#[compile(arg)]`: If present, this field will not be included in the compile |
| type. The value may be either the literal 'skip', or an expression that |
| evaluates to the field's type: the skip case is only expected in cases where |
| there is a manual `FontWrite` impl, and the field does not make sense on the |
| compile type. |
| - `#[compile_with(method_name)]`: Specify custom compilation behaviour. This |
| attribute lets you name a method that will be called to get some type that |
| will be used to compile this field. This may be any type that implements the |
| `FontWrite` trait; this can be used in cases where the logic to compile a |
| given type requires some custom implementation. |
| - `#[compile_type(type)]`: specify an alternate type to be used in the struct |
| generated for this type. |
| - `#[default(expr)]`: specify a value that will be used in the implementation of |
| `Default` for the containing type. Unlike with `#[compile]`, this value is set |
| when the type is created, and can be manually modified by the user. |
| - `#[read_with(args,+)]`: specify that this field's type needs to be read with |
| `FontReadWithArgs`, and passed the provided args. Args is a comma separated |
| list of fields or input args to the type. |
| - `#[read_offset_with(args,+)]`: on offsets or arrays of offsets, indicates that |
| the type referenced by this offset needs to be passed the provided args when |
| it is read. |
| - `#[validate(arg)]`: arg is either the literal 'skip' or the name of a method. |
| If the name of a method, that method will be called during validation, and can |
| implement custom validation logic. |
| - `#[traverse_with(method name)]`: uncommon/hacky: provides a method name that |
| will be called in traversal code to get the `FieldType` for this field. |
| To skip traversing this field, you can use the 'skip' keyword |
| (`#[traverse_with(skip)]`). |
| - `#[to_owned(expr)]`: uncommon/hacky: provide an expression that will be used |
| in `FromObjRef` to convert the parse type to the compile type. |
| |
| |
| ### codegen plans |
| |
| There is also the concept of a 'codegen plan', which is a simple toml file |
| describing a number of different operations to be run in parallel. This is |
| intended to be the general mechanism by which codegen is run. |
| |
| See `../resources/codegen_plan.toml` for an example. |
| |
| [opentype]: https://docs.microsoft.com/en-us/typography/opentype/ |
| [`include!`]: http://doc.rust-lang.org/1.64.0/std/macro.include.html |
| [codegen-tour]: ../docs/codegen-tour.md |
| |