| # The JIT |
| |
| The [adaptive interpreter](interpreter.md) consists of a main loop that |
| executes the bytecode instructions generated by the |
| [bytecode compiler](compiler.md) and their |
[specializations](interpreter.md#specialization). Runtime optimization in
this interpreter can only be done for one instruction at a time. The JIT
is based on a mechanism that replaces an entire sequence of bytecode
instructions, which enables optimizations that span multiple instructions.
| |
| Historically, the adaptive interpreter was referred to as `tier 1` and |
| the JIT as `tier 2`. You will see remnants of this in the code. |
| |
| ## The Optimizer and Executors |
| |
| The program begins running on the adaptive interpreter, until a `JUMP_BACKWARD` |
| instruction determines that it is "hot" because the counter in its |
| [inline cache](interpreter.md#inline-cache-entries) indicates that it |
has executed more than some threshold number of times (see
| [`backoff_counter_triggers`](../Include/internal/pycore_backoff.h)). |
| It then calls the function `_PyOptimizer_Optimize()` in |
| [`Python/optimizer.c`](../Python/optimizer.c), passing it the current |
| [frame](frames.md) and instruction pointer. `_PyOptimizer_Optimize()` |
| constructs an object of type |
| [`_PyExecutorObject`](../Include/internal/pycore_optimizer.h) which implements |
| an optimized version of the instruction trace beginning at this jump. |
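
How a backward jump becomes "hot" is governed by an exponential backoff
counter. The exact counter format and thresholds live in
[`pycore_backoff.h`](../Include/internal/pycore_backoff.h); the self-contained
C sketch below only models the general pattern, and every name in it is
invented for the example.

```c
/* Illustrative model of a backoff counter guarding a backward jump.
 * The real layout and thresholds live in Include/internal/pycore_backoff.h;
 * all names here are made up for the example. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    unsigned int value;    /* decremented on each execution of the jump */
    unsigned int backoff;  /* how long to wait before the next attempt */
} backoff_counter;

static bool counter_triggers(backoff_counter *c) {
    if (c->value > 0) {
        c->value--;
        return false;
    }
    return true;
}

static void counter_reset(backoff_counter *c) {
    /* After a failed attempt, wait exponentially longer before trying again. */
    c->backoff *= 2;
    c->value = c->backoff;
}

int main(void) {
    backoff_counter c = { .value = 16, .backoff = 16 };
    for (int iteration = 0; iteration < 200; iteration++) {
        if (counter_triggers(&c)) {
            printf("iteration %d: the jump is hot, try to build an executor\n",
                   iteration);
            counter_reset(&c);  /* roughly what happens when optimization fails */
        }
    }
    return 0;
}
```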
| |
| The optimizer determines where the trace ends, and the executor is set up |
| to either return to the adaptive interpreter and resume execution, or |
| transfer control to another executor (see `_PyExitData` in |
[`Include/internal/pycore_optimizer.h`](../Include/internal/pycore_optimizer.h)).
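
The real exit record is `_PyExitData` in
[`Include/internal/pycore_optimizer.h`](../Include/internal/pycore_optimizer.h).
The self-contained sketch below only models the idea that each side exit
knows where tier 1 should resume and may later be wired to another executor;
its field and type names are invented, not the actual struct.

```c
/* Illustrative model of a trace side exit.  Field and type names are
 * invented for the example; see _PyExitData in pycore_optimizer.h for
 * the real definition. */
#include <stdint.h>
#include <stdio.h>

struct executor;                 /* stands in for _PyExecutorObject */

typedef struct {
    uint32_t target;             /* bytecode offset at which tier 1 resumes */
    struct executor *executor;   /* chained executor, or NULL if none yet */
} exit_data;

/* Decide what happens when a trace leaves through this exit. */
static void take_exit(const exit_data *e) {
    if (e->executor != NULL) {
        printf("transfer control to the chained executor\n");
    }
    else {
        printf("resume the adaptive interpreter at offset %u\n",
               (unsigned int)e->target);
    }
}

int main(void) {
    exit_data cold_exit = { .target = 42, .executor = NULL };
    take_exit(&cold_exit);
    return 0;
}
```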
| |
The executor is stored on the [`code object`](code_objects.md) of the frame,
in the `co_executors` field, which is an array of executors. The start
| instruction of the trace (the `JUMP_BACKWARD`) is replaced by an |
| `ENTER_EXECUTOR` instruction whose `oparg` is equal to the index of the |
| executor in `co_executors`. |
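
The following self-contained sketch models that lookup with invented type
names: the executor array hangs off the code object, and `ENTER_EXECUTOR`'s
oparg is simply an index into it (the real layout is `co_executors` and
`_PyExecutorObject` in the CPython sources).

```c
/* Illustrative model of how ENTER_EXECUTOR finds its executor: the
 * instruction's oparg indexes into an array stored on the code object.
 * All names are invented for the example. */
#include <stdio.h>

typedef struct {
    const char *name;            /* stands in for the executor's trace */
} executor;

typedef struct {
    int size;
    executor *executors[8];      /* stands in for the co_executors array */
} executor_array;

typedef struct {
    executor_array *co_executors;
} code_object;

static executor *lookup_executor(code_object *code, int oparg) {
    /* ENTER_EXECUTOR's oparg is simply an index into the array. */
    return code->co_executors->executors[oparg];
}

int main(void) {
    executor hot_loop = { .name = "trace starting at the hot JUMP_BACKWARD" };
    executor_array arr = { .size = 1, .executors = { &hot_loop } };
    code_object code = { .co_executors = &arr };

    printf("running %s\n", lookup_executor(&code, 0)->name);
    return 0;
}
```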
| |
| ## The micro-op optimizer |
| |
The micro-op optimizer (abbreviated `uop` to approximate `μop`) is defined in
[`Python/optimizer.c`](../Python/optimizer.c) as `_PyOptimizer_Optimize()`.
It translates an instruction trace into a sequence of micro-ops by replacing
each bytecode with an equivalent sequence of micro-ops (see
`_PyOpcode_macro_expansion` in
[`pycore_opcode_metadata.h`](../Include/internal/pycore_opcode_metadata.h),
which is generated from [`Python/bytecodes.c`](../Python/bytecodes.c)).
| The micro-op sequence is then optimized by |
| `_Py_uop_analyze_and_optimize` in |
| [`Python/optimizer_analysis.c`](../Python/optimizer_analysis.c) |
| and an instance of `_PyUOpExecutor_Type` is created to contain it. |
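
To make the translation step concrete, here is a self-contained model of it:
each bytecode in the trace is looked up in a per-opcode expansion table and
its micro-ops are appended to a buffer. All opcode and uop names are invented;
the real table is `_PyOpcode_macro_expansion`.

```c
/* Illustrative model of translating a bytecode trace into micro-ops by
 * looking each bytecode up in an expansion table.  Opcode and uop names
 * are invented; the real table is _PyOpcode_macro_expansion in
 * pycore_opcode_metadata.h. */
#include <stddef.h>
#include <stdio.h>

enum bytecode { LOAD_X, ADD, STORE_X };
enum uop      { U_LOAD, U_GUARD_TYPE, U_ADD, U_STORE, U_END };

static const enum uop expansions[][3] = {
    [LOAD_X]  = { U_LOAD, U_END },
    [ADD]     = { U_GUARD_TYPE, U_ADD, U_END },
    [STORE_X] = { U_STORE, U_END },
};

int main(void) {
    enum bytecode trace[] = { LOAD_X, ADD, STORE_X };
    enum uop buffer[32];
    int n = 0;

    /* Replace each bytecode in the trace by its micro-op expansion. */
    for (size_t i = 0; i < sizeof(trace) / sizeof(trace[0]); i++) {
        for (int j = 0; expansions[trace[i]][j] != U_END; j++) {
            buffer[n++] = expansions[trace[i]][j];
        }
    }
    /* The resulting uop buffer is what gets optimized and stored in the
     * executor. */
    printf("emitted %d micro-ops, last id %d\n", n, (int)buffer[n - 1]);
    return 0;
}
```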
| |
| ## The JIT interpreter |
| |
After a `JUMP_BACKWARD` instruction invokes the uop optimizer to create a uop
executor, it transfers control to this executor via the `GOTO_TIER_TWO` macro.
| |
CPython implements two executors. Here we describe the JIT interpreter,
which is the simpler of them and is therefore useful for debugging and for
analyzing the uop generation and optimization stages. To run it, we configure
the JIT to run on its interpreter (i.e., Python is configured with
[`--enable-experimental-jit=interpreter`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit)).
| |
| When invoked, the executor jumps to the `tier2_dispatch:` label in |
| [`Python/ceval.c`](../Python/ceval.c), where there is a loop that |
| executes the micro-ops. The body of this loop is a switch statement over |
the uop IDs, resembling the one used in the adaptive interpreter.
| |
| The switch implementing the uops is in [`Python/executor_cases.c.h`](../Python/executor_cases.c.h), |
| which is generated by the build script |
| [`Tools/cases_generator/tier2_generator.py`](../Tools/cases_generator/tier2_generator.py) |
| from the bytecode definitions in |
| [`Python/bytecodes.c`](../Python/bytecodes.c). |
| |
| When an `_EXIT_TRACE` or `_DEOPT` uop is reached, the uop interpreter exits |
| and execution returns to the adaptive interpreter. |
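
The self-contained sketch below models the shape of that loop on a toy stack
machine: micro-ops are read from a trace buffer and dispatched through a
switch until an exit uop is reached. All uop names are invented; the real
cases are generated into `executor_cases.c.h`.

```c
/* Illustrative model of the tier-2 dispatch loop: execute micro-ops from a
 * trace buffer until an exit uop is reached.  The uop names and the tiny
 * stack machine are invented for the example. */
#include <stdio.h>

enum uop { U_LOAD_CONST_1, U_ADD, U_PRINT, U_EXIT_TRACE };

int main(void) {
    /* A tiny "trace buffer": push 1, push 1, add, print, then exit. */
    enum uop trace[] = { U_LOAD_CONST_1, U_LOAD_CONST_1, U_ADD, U_PRINT,
                         U_EXIT_TRACE };
    int stack[16];
    int sp = 0;

    for (const enum uop *next_uop = trace; ; next_uop++) {
        switch (*next_uop) {
            case U_LOAD_CONST_1:
                stack[sp++] = 1;
                break;
            case U_ADD:
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case U_PRINT:
                printf("top of stack: %d\n", stack[sp - 1]);
                break;
            case U_EXIT_TRACE:
                /* In CPython this is where control goes back to tier 1
                 * (a deopt uop leaves the loop in a similar way). */
                return 0;
        }
    }
}
```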
| |
| ## Invalidating Executors |
| |
| In addition to being stored on the code object, each executor is also |
| inserted into a list of all executors, which is stored in the interpreter |
state's `executor_list_head` field. This list is used when executors must be
invalidated because values that were used in their construction may have
changed.
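
A self-contained sketch of the pattern, with invented names: every executor
is linked into one interpreter-wide list, and invalidating them is a single
walk over that list.

```c
/* Illustrative model of keeping every executor on a per-interpreter list so
 * they can all be invalidated when a value they depend on changes.  Names
 * are invented; see executor_list_head and the invalidation code in
 * Python/optimizer.c for the real mechanism. */
#include <stdbool.h>
#include <stdio.h>

typedef struct executor {
    bool valid;                  /* cleared when the executor must not run */
    struct executor *next;       /* link in the interpreter-wide list */
} executor;

static executor *executor_list_head = NULL;

static void register_executor(executor *e) {
    e->valid = true;
    e->next = executor_list_head;
    executor_list_head = e;
}

static void invalidate_all_executors(void) {
    for (executor *e = executor_list_head; e != NULL; e = e->next) {
        e->valid = false;        /* an invalid executor is no longer run */
    }
}

int main(void) {
    executor a, b;
    register_executor(&a);
    register_executor(&b);
    invalidate_all_executors();
    printf("a valid: %d, b valid: %d\n", a.valid, b.valid);
    return 0;
}
```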
| |
| ## The JIT |
| |
When the full JIT is enabled (i.e., Python was configured with
[`--enable-experimental-jit`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit)),
the uop executor's `jit_code` field is populated with a pointer to a compiled
| C function that implements the executor logic. This function's signature is |
| defined by `jit_func` in [`pycore_jit.h`](../Include/internal/pycore_jit.h). |
| When the executor is invoked by `ENTER_EXECUTOR`, instead of jumping to |
| the uop interpreter at `tier2_dispatch`, the executor runs the function |
| that `jit_code` points to. This function returns the instruction pointer |
| of the next Tier 1 instruction that needs to execute. |
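
The following self-contained sketch models that dispatch with an invented
signature (the real one is `jit_func` in
[`pycore_jit.h`](../Include/internal/pycore_jit.h)): the executor carries a
function pointer, and calling it yields the tier 1 instruction at which
execution resumes.

```c
/* Illustrative model of dispatching through the executor's jit_code
 * pointer.  The types and signature are invented for the example; the
 * real signature is jit_func in Include/internal/pycore_jit.h. */
#include <stddef.h>
#include <stdio.h>

typedef unsigned short instruction;          /* stands in for _Py_CODEUNIT */
typedef instruction *(*jitted_func)(instruction *next_instr);

typedef struct {
    jitted_func jit_code;    /* NULL when only the uop interpreter is used */
} executor;

/* A "compiled" trace: does its work, then reports where tier 1 resumes. */
static instruction *compiled_trace(instruction *next_instr) {
    return next_instr + 3;   /* pretend the trace covered three instructions */
}

int main(void) {
    instruction bytecode[16] = {0};
    executor e = { .jit_code = compiled_trace };

    instruction *resume;
    if (e.jit_code != NULL) {
        resume = e.jit_code(bytecode);       /* run the jitted function */
    }
    else {
        resume = bytecode;                   /* fall back to the uop interpreter */
    }
    printf("resume tier 1 at instruction offset %td\n", resume - bytecode);
    return 0;
}
```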
| |
| The generation of the jitted functions uses the copy-and-patch technique |
| which is described in |
| [Haoran Xu's article](https://sillycross.github.io/2023/05/12/2023-05-12/). |
At its core are statically generated `stencils` for the implementations of
the micro-ops, which are completed with runtime information while the jitted
code for an executor is constructed by
[`_PyJIT_Compile`](../Python/jit.c).
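
To make the idea of stencils with holes concrete, here is a self-contained
model of the copy-and-patch step: a pre-built template is copied, and
placeholder slots recorded at known offsets are filled in with runtime
values. Real stencils are machine code emitted into executable memory; this
sketch patches a plain data buffer so that it stays portable, and every name
in it is invented.

```c
/* Illustrative model of copy-and-patch: copy a pre-built "stencil" and fill
 * its holes with runtime values.  Real stencils are machine code produced at
 * build time and copied into executable memory by _PyJIT_Compile; here the
 * stencil is plain data so the example stays portable. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A stencil: a fixed body plus the offsets of the holes to patch. */
static const uint8_t stencil_body[]  = {
    0xAA, 0xAA, 0x00, 0x00, 0x00, 0x00, 0xBB,   /* the 0x00s are a 4-byte hole */
};
static const size_t  stencil_holes[] = { 2 };   /* offset of each hole */

static void emit(uint8_t *out, uint32_t runtime_value) {
    memcpy(out, stencil_body, sizeof(stencil_body));              /* copy  */
    for (size_t i = 0; i < sizeof(stencil_holes) / sizeof(stencil_holes[0]); i++) {
        memcpy(out + stencil_holes[i], &runtime_value,            /* patch */
               sizeof(runtime_value));
    }
}

int main(void) {
    uint8_t code[sizeof(stencil_body)];
    emit(code, 0x12345678);            /* e.g. a constant, oparg, or pointer */
    for (size_t i = 0; i < sizeof(code); i++) {
        printf("%02X ", (unsigned int)code[i]);
    }
    printf("\n");
    return 0;
}
```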
| |
The stencils are generated at build time under the Makefile target `regen-jit`
by the scripts in [`Tools/jit`](../Tools/jit). These scripts read
[`Python/executor_cases.c.h`](../Python/executor_cases.c.h) (which is
generated from [`Python/bytecodes.c`](../Python/bytecodes.c)). For each
opcode, they construct a `.c` file that contains a function implementing
this opcode, with some runtime information injected. This is done by
replacing `CASE` in the template file
[`Tools/jit/template.c`](../Tools/jit/template.c) with the bytecode definition.
| |
Each of the `.c` files is compiled by LLVM to produce an object file
containing a function that executes the opcode. These compiled
| functions are used to generate the file |
| [`jit_stencils.h`](../jit_stencils.h), which contains the functions |
| that the JIT can use to emit code for each of the bytecodes. |
| |
| For Python maintainers this means that changes to the bytecodes and |
| their implementations do not require changes related to the stencils, |
| because everything is automatically generated from |
| [`Python/bytecodes.c`](../Python/bytecodes.c) at build time. |
| |
| See Also: |
| |
| * [Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode](https://arxiv.org/abs/2011.13127) |
| |
| * [PyCon 2024: Building a JIT compiler for CPython](https://www.youtube.com/watch?v=kMO3Ju0QCDo) |