InternalDocs/jit.md - external/github.com/python/cpython - Git at Google

 # The JIT

 The [adaptive interpreter](interpreter.md) consists of a main loop that
 executes the bytecode instructions generated by the
 [bytecode compiler](compiler.md) and their
 [specializations](interpreter.md#Specialization). Runtime optimization in
 this interpreter can only be done for one instruction at a time. The JIT
 is based on a mechanism to replace an entire sequence of bytecode instructions,
 and this enables optimizations that span multiple instructions.

 Historically, the adaptive interpreter was referred to as `tier 1` and
 the JIT as `tier 2`. You will see remnants of this in the code.

 ## The Optimizer and Executors

 The program begins running on the adaptive interpreter, until a `JUMP_BACKWARD`
 instruction determines that it is "hot" because the counter in its
 [inline cache](interpreter.md#inline-cache-entries) indicates that it
 executed more than some threshold number of times (see
 [`backoff_counter_triggers`](../Include/internal/pycore_backoff.h)).
 It then calls the function `_PyOptimizer_Optimize()` in
 [`Python/optimizer.c`](../Python/optimizer.c), passing it the current
 [frame](frames.md) and instruction pointer. `_PyOptimizer_Optimize()`
 constructs an object of type
 [`_PyExecutorObject`](../Include/internal/pycore_optimizer.h) which implements
 an optimized version of the instruction trace beginning at this jump.

 The optimizer determines where the trace ends, and the executor is set up
 to either return to the adaptive interpreter and resume execution, or
 transfer control to another executor (see `_PyExitData` in
 Include/internal/pycore_optimizer.h).

 The executor is stored on the [`code object`](code_objects.md) of the frame,
 in the `co_executors` field which is an array of executors. The start
 instruction of the trace (the `JUMP_BACKWARD`) is replaced by an
 `ENTER_EXECUTOR` instruction whose `oparg` is equal to the index of the
 executor in `co_executors`.

 ## The micro-op optimizer

 The micro-op (abbreviated `uop` to approximate `μop`) optimizer is defined in
 [`Python/optimizer.c`](../Python/optimizer.c) as `_PyOptimizer_Optimize`.
 It translates an instruction trace into a sequence of micro-ops by replacing
 each bytecode by an equivalent sequence of micro-ops (see
 `_PyOpcode_macro_expansion` in
 [pycore_opcode_metadata.h](../Include/internal/pycore_opcode_metadata.h)
 which is generated from [`Python/bytecodes.c`](../Python/bytecodes.c)).
 The micro-op sequence is then optimized by
 `_Py_uop_analyze_and_optimize` in
 [`Python/optimizer_analysis.c`](../Python/optimizer_analysis.c)
 and an instance of `_PyUOpExecutor_Type` is created to contain it.

 ## The JIT interpreter

 After a `JUMP_BACKWARD` instruction invokes the uop optimizer to create a uop
 executor, it transfers control to this executor via the `TIER1_TO_TIER2` macro.

 CPython implements two executors. Here we describe the JIT interpreter,
 which is the simpler of them and is therefore useful for debugging and analyzing
 the uops generation and optimization stages. To run it, we configure the
 JIT to run on its interpreter (i.e., python is configured with
 [`--enable-experimental-jit=interpreter`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit)).

 When invoked, the executor jumps to the `tier2_dispatch:` label in
 [`Python/ceval.c`](../Python/ceval.c), where there is a loop that
 executes the micro-ops. The body of this loop is a switch statement over
 the uops IDs, resembling the one used in the adaptive interpreter.

 The switch implementing the uops is in [`Python/executor_cases.c.h`](../Python/executor_cases.c.h),
 which is generated by the build script
 [`Tools/cases_generator/tier2_generator.py`](../Tools/cases_generator/tier2_generator.py)
 from the bytecode definitions in
 [`Python/bytecodes.c`](../Python/bytecodes.c).

 When an `_EXIT_TRACE` or `_DEOPT` uop is reached, the uop interpreter exits
 and execution returns to the adaptive interpreter.

 ## Invalidating Executors

 In addition to being stored on the code object, each executor is also
 inserted into a list of all executors, which is stored in the interpreter
 state's `executor_list_head` field. This list is used when it is necessary
 to invalidate executors because values they used in their construction may
 have changed.

 ## The JIT

 When the full jit is enabled (python was configured with
 [`--enable-experimental-jit`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit),
 the uop executor's `jit_code` field is populated with a pointer to a compiled
 C function that implements the executor logic. This function's signature is
 defined by `jit_func` in [`pycore_jit.h`](../Include/internal/pycore_jit.h).
 When the executor is invoked by `ENTER_EXECUTOR`, instead of jumping to
 the uop interpreter at `tier2_dispatch`, the executor runs the function
 that `jit_code` points to. This function returns the instruction pointer
 of the next Tier 1 instruction that needs to execute.

 The generation of the jitted functions uses the copy-and-patch technique
 which is described in
 [Haoran Xu's article](https://sillycross.github.io/2023/05/12/2023-05-12/).
 At its core are statically generated `stencils` for the implementation
 of the micro ops, which are completed with runtime information while
 the jitted code is constructed for an executor by
 [`_PyJIT_Compile`](../Python/jit.c).

 The stencils are generated at build time under the Makefile target `regen-jit`
 by the scripts in [`/Tools/jit`](/Tools/jit). This script reads
 [`Python/executor_cases.c.h`](../Python/executor_cases.c.h) (which is
 generated from [`Python/bytecodes.c`](../Python/bytecodes.c)). For
 each opcode, it constructs a `.c` file that contains a function for
 implementing this opcode, with some runtime information injected.
 This is done by replacing `CASE` by the bytecode definition in the
 template file [`Tools/jit/template.c`](../Tools/jit/template.c).

 Each of the `.c` files is compiled by LLVM, to produce an object file
 that contains a function that executes the opcode. These compiled
 functions are used to generate the file
 [`jit_stencils.h`](../jit_stencils.h), which contains the functions
 that the JIT can use to emit code for each of the bytecodes.

 For Python maintainers this means that changes to the bytecodes and
 their implementations do not require changes related to the stencils,
 because everything is automatically generated from
 [`Python/bytecodes.c`](../Python/bytecodes.c) at build time.

 See Also:

 * [Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode](https://arxiv.org/abs/2011.13127)

 * [PyCon 2024: Building a JIT compiler for CPython](https://www.youtube.com/watch?v=kMO3Ju0QCDo)
	# The JIT

	The [adaptive interpreter](interpreter.md) consists of a main loop that
	executes the bytecode instructions generated by the
	[bytecode compiler](compiler.md) and their
	[specializations](interpreter.md#Specialization). Runtime optimization in
	this interpreter can only be done for one instruction at a time. The JIT
	is based on a mechanism to replace an entire sequence of bytecode instructions,
	and this enables optimizations that span multiple instructions.

	Historically, the adaptive interpreter was referred to as `tier 1` and
	the JIT as `tier 2`. You will see remnants of this in the code.

	## The Optimizer and Executors

	The program begins running on the adaptive interpreter, until a `JUMP_BACKWARD`
	instruction determines that it is "hot" because the counter in its
	[inline cache](interpreter.md#inline-cache-entries) indicates that it
	executed more than some threshold number of times (see
	[`backoff_counter_triggers`](../Include/internal/pycore_backoff.h)).
	It then calls the function `_PyOptimizer_Optimize()` in
	[`Python/optimizer.c`](../Python/optimizer.c), passing it the current
	[frame](frames.md) and instruction pointer. `_PyOptimizer_Optimize()`
	constructs an object of type
	[`_PyExecutorObject`](../Include/internal/pycore_optimizer.h) which implements
	an optimized version of the instruction trace beginning at this jump.

	The optimizer determines where the trace ends, and the executor is set up
	to either return to the adaptive interpreter and resume execution, or
	transfer control to another executor (see `_PyExitData` in
	Include/internal/pycore_optimizer.h).

	The executor is stored on the [`code object`](code_objects.md) of the frame,
	in the `co_executors` field which is an array of executors. The start
	instruction of the trace (the `JUMP_BACKWARD`) is replaced by an
	`ENTER_EXECUTOR` instruction whose `oparg` is equal to the index of the
	executor in `co_executors`.

	## The micro-op optimizer

	The micro-op (abbreviated `uop` to approximate `μop`) optimizer is defined in
	[`Python/optimizer.c`](../Python/optimizer.c) as `_PyOptimizer_Optimize`.
	It translates an instruction trace into a sequence of micro-ops by replacing
	each bytecode by an equivalent sequence of micro-ops (see
	`_PyOpcode_macro_expansion` in
	[pycore_opcode_metadata.h](../Include/internal/pycore_opcode_metadata.h)
	which is generated from [`Python/bytecodes.c`](../Python/bytecodes.c)).
	The micro-op sequence is then optimized by
	`_Py_uop_analyze_and_optimize` in
	[`Python/optimizer_analysis.c`](../Python/optimizer_analysis.c)
	and an instance of `_PyUOpExecutor_Type` is created to contain it.

	## The JIT interpreter

	After a `JUMP_BACKWARD` instruction invokes the uop optimizer to create a uop
	executor, it transfers control to this executor via the `TIER1_TO_TIER2` macro.

	CPython implements two executors. Here we describe the JIT interpreter,
	which is the simpler of them and is therefore useful for debugging and analyzing
	the uops generation and optimization stages. To run it, we configure the
	JIT to run on its interpreter (i.e., python is configured with
	[`--enable-experimental-jit=interpreter`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit)).

	When invoked, the executor jumps to the `tier2_dispatch:` label in
	[`Python/ceval.c`](../Python/ceval.c), where there is a loop that
	executes the micro-ops. The body of this loop is a switch statement over
	the uops IDs, resembling the one used in the adaptive interpreter.

	The switch implementing the uops is in [`Python/executor_cases.c.h`](../Python/executor_cases.c.h),
	which is generated by the build script
	[`Tools/cases_generator/tier2_generator.py`](../Tools/cases_generator/tier2_generator.py)
	from the bytecode definitions in
	[`Python/bytecodes.c`](../Python/bytecodes.c).

	When an `_EXIT_TRACE` or `_DEOPT` uop is reached, the uop interpreter exits
	and execution returns to the adaptive interpreter.

	## Invalidating Executors

	In addition to being stored on the code object, each executor is also
	inserted into a list of all executors, which is stored in the interpreter
	state's `executor_list_head` field. This list is used when it is necessary
	to invalidate executors because values they used in their construction may
	have changed.

	## The JIT

	When the full jit is enabled (python was configured with
	[`--enable-experimental-jit`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit),
	the uop executor's `jit_code` field is populated with a pointer to a compiled
	C function that implements the executor logic. This function's signature is
	defined by `jit_func` in [`pycore_jit.h`](../Include/internal/pycore_jit.h).
	When the executor is invoked by `ENTER_EXECUTOR`, instead of jumping to
	the uop interpreter at `tier2_dispatch`, the executor runs the function
	that `jit_code` points to. This function returns the instruction pointer
	of the next Tier 1 instruction that needs to execute.

	The generation of the jitted functions uses the copy-and-patch technique
	which is described in
	[Haoran Xu's article](https://sillycross.github.io/2023/05/12/2023-05-12/).
	At its core are statically generated `stencils` for the implementation
	of the micro ops, which are completed with runtime information while
	the jitted code is constructed for an executor by
	[`_PyJIT_Compile`](../Python/jit.c).

	The stencils are generated at build time under the Makefile target `regen-jit`
	by the scripts in [`/Tools/jit`](/Tools/jit). This script reads
	[`Python/executor_cases.c.h`](../Python/executor_cases.c.h) (which is
	generated from [`Python/bytecodes.c`](../Python/bytecodes.c)). For
	each opcode, it constructs a `.c` file that contains a function for
	implementing this opcode, with some runtime information injected.
	This is done by replacing `CASE` by the bytecode definition in the
	template file [`Tools/jit/template.c`](../Tools/jit/template.c).

	Each of the `.c` files is compiled by LLVM, to produce an object file
	that contains a function that executes the opcode. These compiled
	functions are used to generate the file
	[`jit_stencils.h`](../jit_stencils.h), which contains the functions
	that the JIT can use to emit code for each of the bytecodes.

	For Python maintainers this means that changes to the bytecodes and
	their implementations do not require changes related to the stencils,
	because everything is automatically generated from
	[`Python/bytecodes.c`](../Python/bytecodes.c) at build time.

	See Also:

	* [Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode](https://arxiv.org/abs/2011.13127)

	* [PyCon 2024: Building a JIT compiler for CPython](https://www.youtube.com/watch?v=kMO3Ju0QCDo)