
 # Tooling to generate interpreters

 Documentation for the instruction definitions in `Python/bytecodes.c`
 ("the DSL") is [here](interpreter_definition.md).

 What's currently here:

 - `analyzer.py`: code for converting the `AST` generated by `Parser`
   into a higher-level structure that is easier to work with
 - `lexer.py`: lexer for C, originally written by Mark Shannon
 - `plexer.py`: OO interface on top of lexer.py; main class: `PLexer`
 - `parsing.py`: Parser for instruction definition DSL; main class: `Parser`
 - `parser.py`: helper for interactions with `parsing.py`
 - `tierN_generator.py`: a couple of driver scripts to read `Python/bytecodes.c` and
   write `Python/generated_cases.c.h` (and several other files)
 - `optimizer_generator.py`: reads `Python/bytecodes.c` and
   `Python/optimizer_bytecodes.c` and writes
   `Python/optimizer_cases.c.h`
 - `stack.py`: code to handle generalized stack effects
 - `cwriter.py`: code which understands tokens and how to format C code;
   main class: `CWriter`
 - `generators_common.py`: helpers for generators
 - `opcode_id_generator.py`: generates a list of opcodes and writes them to
   `Include/opcode_ids.h`
 - `opcode_metadata_generator.py`: reads the instruction definitions and
   writes the metadata to `Include/internal/pycore_opcode_metadata.h`
 - `py_metadata_generator.py`: reads the instruction definitions and
   writes the metadata to `Lib/_opcode_metadata.py`
 - `target_generator.py`: generates targets for computed goto dispatch and
   writes them to `Python/opcode_targets.h`
 - `uop_id_generator.py`: generates a list of uop IDs and writes them to
   `Include/internal/pycore_uop_ids.h`
 - `uop_metadata_generator.py`: reads the instruction definitions and
   writes the metadata to `Include/internal/pycore_uop_metadata.h`

 Note that there is some dummy C code at the top and bottom of
 `Python/bytecodes.c`
 to fool text editors like VS Code into believing this is valid C code.

 ## A bit about the parser

 The parser class uses a pretty standard recursive descent scheme,
 but with unlimited backtracking.
 The `PLexer` class tokenizes the entire input before parsing starts.
 We do not run the C preprocessor.
 Each parsing method returns either an AST node (a `Node` instance)
 or `None`, or raises `SyntaxError` (showing the error in the C source).
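
As a rough illustration of this contract, here is a minimal sketch in which the whole input is tokenized up front and a parsing method has three possible outcomes: a node, `None`, or `SyntaxError`. The names `Token`, `tokenize`, `TinyParser`, and `parse_call`, and the toy grammar, are assumptions made for this example; only `Node` echoes a name used above, and it is redefined here as a stand-in rather than taken from `parsing.py`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    kind: str
    text: str
    line: int


@dataclass
class Node:
    name: str


# Toy token grammar: identifiers, integers, and a few single-character operators.
TOKEN_RE = re.compile(
    r"(?P<IDENT>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<OP>[(){};,=*+\-])"
)


def tokenize(src: str) -> List[Token]:
    """Tokenize the entire input before any parsing happens."""
    tokens = []
    pos, line = 0, 1
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            if ch == "\n":
                line += 1
            pos += 1
            continue
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"line {line}: unexpected character {ch!r}")
        tokens.append(Token(m.lastgroup, m.group(), line))
        pos = m.end()
    return tokens


class TinyParser:
    def __init__(self, src: str):
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self, offset: int = 0) -> Optional[Token]:
        index = self.pos + offset
        return self.tokens[index] if index < len(self.tokens) else None

    def parse_call(self) -> Optional[Node]:
        """Parse IDENT '(' ')'."""
        name, opener = self.peek(0), self.peek(1)
        if name is None or name.kind != "IDENT":
            return None  # not this construct; the caller may try another rule
        if opener is None or opener.text != "(":
            return None
        closer = self.peek(2)
        if closer is None or closer.text != ")":
            # Committed to a call after '(' so a missing ')' is a hard error.
            raise SyntaxError(f"line {(closer or opener).line}: expected ')'")
        self.pos += 3  # consume the three tokens only on success
        return Node(name.text)
```

In this sketch, `None` means "this rule does not apply here", while `SyntaxError` means the input is malformed and carries the line number of the offending token.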

 Most parsing methods are decorated with `@contextual`, which automatically
 resets the tokenizer input position when `None` is returned.
 A `None` result therefore means only that this method did not match:
 after backtracking, a different parsing method may still return a valid AST.
 A raised `SyntaxError`, by contrast, is irrecoverable.
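
The backtracking behavior can be sketched with a small decorator. This is an assumption-laden illustration of the idea, not the actual `@contextual` from the tooling: the class `MiniParser`, its methods `pair`, `name`, and `item`, and the use of plain strings as tokens are all hypothetical.

```python
import functools


def contextual(func):
    """Restore the saved input position whenever the wrapped method returns None."""
    @functools.wraps(func)
    def wrapper(self):
        start = self.pos
        result = func(self)
        if result is None:
            self.pos = start  # backtrack so another rule can try from here
        return result
    return wrapper


class MiniParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)  # the whole input is already tokenized
        self.pos = 0

    def next(self):
        """Consume and return the next token, or None at end of input."""
        if self.pos < len(self.tokens):
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None

    @contextual
    def pair(self):
        """NAME '=' NAME"""
        left = self.next()
        if left is None or not left.isidentifier():
            return None
        if self.next() != "=":
            return None  # tokens consumed so far are given back by @contextual
        right = self.next()
        if right is None or not right.isidentifier():
            # A SyntaxError propagates; no position reset, no alternative tried.
            raise SyntaxError("expected a name after '='")
        return ("pair", left, right)

    @contextual
    def name(self):
        """NAME"""
        tok = self.next()
        if tok is not None and tok.isidentifier():
            return ("name", tok)
        return None

    def item(self):
        # Try alternatives in order; each failed one leaves pos untouched.
        return self.pair() or self.name()
```

For the input `["x", ";"]`, `pair` consumes `x` and `;` before failing, the decorator rewinds to position 0, and `name` then succeeds on `x`.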

 Neither the lexer nor the parser is complete or fully correct.
 Most known issues are tersely indicated by `# TODO:` comments.
 We plan to fix issues as they become relevant.