Each Python object starts with two fields, ob_refcnt and ob_type, which form the header common to all Python objects, for all versions, and hold the reference count and class of the object, respectively.
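As a rough illustration of the common header, the following sketch declares a struct with the two fields in order. The names SketchType, SketchObject and sketch_init are hypothetical and exist only for this example; the real type object carries many more fields than the stand-in used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a class; not the real type object. */
typedef struct sketch_type {
    const char *name;
} SketchType;

/* Sketch of the common header: reference count first, then the class. */
typedef struct {
    ptrdiff_t   ob_refcnt;  /* reference count of the object */
    SketchType *ob_type;    /* class of the object */
} SketchObject;

/* Initialise a header for a freshly created object (illustrative only). */
static void sketch_init(SketchObject *op, SketchType *tp)
{
    assert(op != NULL && tp != NULL);
    op->ob_refcnt = 1;  /* the creator holds the first reference */
    op->ob_type = tp;
}
```

Because every object begins with these two fields, code that only knows it has "some object" can still read the reference count and class through a pointer to this header.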
Since the introduction of the cycle GC, there has also been a pre-header. Before 3.11, this pre-header was two words in size. It should be considered opaque to all code except the cycle GC.
In 3.11 the pre-header was extended to include pointers to the VM-managed __dict__. The reason for moving the __dict__ to the pre-header is that it allows faster access, as it is at a fixed offset, and it also allows an object's dictionary to be created lazily, only when the __dict__ attribute is specifically asked for.
In 3.11 the non-GC part of the pre-header consists of two pointers, dict and values.
The values pointer refers to the PyDictValues array, which holds the values of the object's attributes. Should the dictionary be needed, then values is set to NULL and the dict field points to the dictionary.
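The switch from the values array to a physical dictionary in 3.11 can be sketched as follows. This is a simplified model, not the interpreter's code: SketchDict, SketchValues, SketchPreHeader311 and sketch_materialize_dict are hypothetical names, the field order is arbitrary, and the step that copies the values into the new dictionary is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int placeholder; } SketchDict;   /* hypothetical dictionary */
typedef struct { void *slots[4]; } SketchValues;  /* hypothetical values array */

/* Non-GC part of the 3.11 pre-header, as described above (order assumed). */
typedef struct {
    SketchDict   *dict;    /* NULL until a dictionary is needed */
    SketchValues *values;  /* attribute values; NULL once dict is in use */
} SketchPreHeader311;

/* Switch the object from the values array to a physical dictionary. */
static void sketch_materialize_dict(SketchPreHeader311 *pre, SketchDict *d)
{
    assert(pre->dict == NULL && pre->values != NULL);
    /* A real implementation would first move the values into d. */
    pre->values = NULL;
    pre->dict = d;
}
```

After the call, exactly one of the two pointers is non-NULL, which is what makes it possible to merge them into a single word later.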
In 3.12 the pointer to the list of weak references is added to the pre-header. In order to make space for it, the dict and values pointers are combined into a single tagged pointer, dict_or_values.
If the object has no physical dictionary, then dict_or_values has its low bit set to one and points to the values array. If the object has a physical dictionary, then dict_or_values has its low bit set to zero and points to the dictionary.
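The tagging scheme can be illustrated with a small sketch that stores either pointer in one word and uses the low bit to tell them apart. The type and function names below (TaggedDict, TaggedValues, SketchPreHeader312 and the sketch_ helpers) are hypothetical, and the sketch assumes that both kinds of pointer are at least 2-byte aligned so that the low bit is free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int placeholder; } TaggedDict;   /* hypothetical dictionary */
typedef struct { void *slots[4]; } TaggedValues;  /* hypothetical values array */

/* One word holding either a tagged values pointer or a plain dict pointer. */
typedef struct {
    uintptr_t dict_or_values;
} SketchPreHeader312;

/* Store a values pointer: low bit set to one. */
static void sketch_set_values(SketchPreHeader312 *pre, TaggedValues *v)
{
    assert(((uintptr_t)v & 1) == 0);  /* alignment leaves the low bit free */
    pre->dict_or_values = (uintptr_t)v | 1;
}

/* Store a dictionary pointer: untagged, low bit zero. */
static void sketch_set_dict(SketchPreHeader312 *pre, TaggedDict *d)
{
    assert(((uintptr_t)d & 1) == 0);
    pre->dict_or_values = (uintptr_t)d;
}

/* True if the word currently refers to the values array. */
static int sketch_has_values(const SketchPreHeader312 *pre)
{
    return (pre->dict_or_values & 1) != 0;
}

/* Strip the tag bit to recover the values pointer. */
static TaggedValues *sketch_get_values(const SketchPreHeader312 *pre)
{
    assert(sketch_has_values(pre));
    return (TaggedValues *)(pre->dict_or_values & ~(uintptr_t)1);
}

/* The dictionary pointer needs no untagging. */
static TaggedDict *sketch_get_dict(const SketchPreHeader312 *pre)
{
    assert(!sketch_has_values(pre));
    return (TaggedDict *)pre->dict_or_values;
}
```

Note that in this sketch the dictionary pointer is stored unmodified, so code that reads the word directly as a pointer sees either a valid dictionary pointer or a value with the low bit set.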
The untagged form is chosen for the dictionary pointer, rather than the values pointer, to enable the (legacy) C-API function _PyObject_GetDictPtr(PyObject *obj) to work.
For a “normal” Python object, that is, one that doesn't inherit from a builtin class or have slots, the header and pre-header form the entire object.
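To make the relationship between the two parts concrete, here is a minimal sketch of allocating such an object as one block, with the pre-header placed immediately before the header and the object pointer referring to the header. The names LayoutPreHeader, LayoutHeader and the layout_ helpers are hypothetical, and the pre-header size of four words is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Opaque pre-header; four words is an illustrative size, not a guarantee. */
typedef struct { void *words[4]; } LayoutPreHeader;

/* Common header: reference count and class. */
typedef struct {
    ptrdiff_t ob_refcnt;
    void     *ob_type;
} LayoutHeader;

/* Allocate pre-header and header together; return a pointer to the header. */
static LayoutHeader *layout_alloc(void *type)
{
    char *block = malloc(sizeof(LayoutPreHeader) + sizeof(LayoutHeader));
    if (block == NULL) {
        return NULL;
    }
    memset(block, 0, sizeof(LayoutPreHeader));  /* empty pre-header */
    LayoutHeader *op = (LayoutHeader *)(block + sizeof(LayoutPreHeader));
    op->ob_refcnt = 1;
    op->ob_type = type;
    return op;
}

/* Recover the pre-header, which sits at a fixed negative offset. */
static LayoutPreHeader *layout_pre_header(LayoutHeader *op)
{
    return (LayoutPreHeader *)((char *)op - sizeof(LayoutPreHeader));
}

/* Free the whole block, starting from the pre-header. */
static void layout_free(LayoutHeader *op)
{
    free(layout_pre_header(op));
}
```

Since the pre-header has a fixed size, everything in it can be reached from the object pointer with a constant offset, which is the property the fixed-offset __dict__ access relies on.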
There are several advantages to this layout:
It allows lazy __dict__s, as described above.
The full layout of an object with an opaque part defined by a C extension and __slots__ looks like this: