# Tooling to generate interpreters

Documentation for the instruction definitions in `Python/bytecodes.c`
("the DSL") is [here](interpreter_definition.md).

What's currently here:

- `analyzer.py`: code for converting the AST generated by `Parser`
  into a higher-level structure that is easier to work with
- `lexer.py`: lexer for C, originally written by Mark Shannon
- `plexer.py`: OO interface on top of `lexer.py`; main class: `PLexer`
- `parsing.py`: parser for the instruction definition DSL; main class: `Parser`
- `parser.py`: helper for interactions with `parsing.py`
- `tierN_generator.py`: a couple of driver scripts that read `Python/bytecodes.c` and
  write `Python/generated_cases.c.h` (and several other files)
- `optimizer_generator.py`: reads `Python/bytecodes.c` and
  `Python/optimizer_bytecodes.c` and writes
  `Python/optimizer_cases.c.h`
- `stack.py`: code to handle generalized stack effects
- `cwriter.py`: code which understands tokens and how to format C code;
  main class: `CWriter`
- `generators_common.py`: helpers for generators
- `opcode_id_generator.py`: generates a list of opcodes and writes them to
  `Include/opcode_ids.h`
- `opcode_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_opcode_metadata.h`
- `py_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Lib/_opcode_metadata.py`
- `target_generator.py`: generates targets for computed goto dispatch and
  writes them to `Python/opcode_targets.h`
- `uop_id_generator.py`: generates a list of uop IDs and writes them to
  `Include/internal/pycore_uop_ids.h`
- `uop_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_uop_metadata.h`

Note that there is some dummy C code at the top and bottom of
`Python/bytecodes.c` to fool text editors like VS Code into believing
this is valid C code.

## A bit about the parser

The parser class uses a pretty standard recursive descent scheme,
but with unlimited backtracking.
The `PLexer` class tokenizes the entire input before parsing starts.
We do not run the C preprocessor.
Each parsing method returns either an AST node (a `Node` instance)
or `None`, or raises `SyntaxError` (showing the error in the C source).

Most parsing methods are decorated with `@contextual`, which automatically
resets the tokenizer input position when `None` is returned.
When a parsing method returns `None`, a different parsing method may still
return a valid AST after backtracking.
A `SyntaxError`, by contrast, is unrecoverable.
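
The backtracking scheme described above can be illustrated with a small, self-contained sketch. This is not the code in `parsing.py` or `plexer.py`: the names `TokenCursor`, `MiniParser`, `TOKEN_RE`, and the toy grammar are hypothetical, and the `contextual` decorator here only mirrors the described behavior (rewind the input position when `None` is returned). The real implementation may differ in its details.

```python
import re
from dataclasses import dataclass
from functools import wraps

# Hypothetical token pattern: identifiers, integers, or any single
# non-space character. The real lexer handles full C tokens.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


class TokenCursor:
    """Tokenizes the whole input up front and tracks a position in it."""

    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        if tok is not None:
            self.pos += 1
        return tok

    def expect(self, text):
        """Consume and return the next token only if it equals `text`."""
        if self.peek() == text:
            return self.next()
        return None


def contextual(func):
    """Rewind the token position whenever the wrapped method returns None."""

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        start = self.pos
        result = func(self, *args, **kwargs)
        if result is None:
            self.pos = start  # backtrack: undo any tokens consumed
        return result

    return wrapper


@dataclass
class Node:
    kind: str
    name: str


class MiniParser(TokenCursor):
    @contextual
    def call(self):
        # Toy rule: NAME "(" ")"
        name = self.next()
        if name and name.isidentifier() and self.expect("(") and self.expect(")"):
            return Node("call", name)
        return None

    @contextual
    def variable(self):
        # Toy rule: NAME
        name = self.next()
        if name and name.isidentifier():
            return Node("variable", name)
        return None

    def expression(self):
        # Try each alternative in turn; a None result has already rewound
        # the input, so the next alternative starts from the same token.
        node = self.call() or self.variable()
        if node is None:
            # No alternative matched: this error is not recoverable.
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node
```

In this sketch, parsing `foo` first tries `call()`, which consumes `foo`, fails to find `(`, and returns `None`; the decorator rewinds the position so that `variable()` sees `foo` again and succeeds. Parsing `42` fails both alternatives and raises `SyntaxError`.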

Neither the lexer nor the parser is complete or fully correct.
Most known issues are tersely indicated by `# TODO:` comments.
We plan to fix issues as they become relevant.