| # Blink-V8 bindings generator (bind_gen package) |
| |
| [TOC] |
| |
| ## What's bind_gen? |
| |
| Python package |
| [`bind_gen`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/) |
| is the core part of the Blink-V8 bindings code generator. |
| [`generate_bindings.py`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/generate_bindings.py) |
| is the driver script, which takes a Web IDL database (`web_idl_database.pickle` |
| generated by |
| [`web_idl_database`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/BUILD.gn?q=content:%5C%22web_idl_database%5C%22&ss=chromium) |
| GN target) as an input and produces a set of C++ source files of Blink-V8 |
| bindings (v8_\*.h, v8_\*.cc). |
| |
| ## Design and code structure |
| |
| The bindings code generator is implemented as a builder of `CodeNode` trees; |
| `CodeNode` is its fundamental building block. The following subsections describe |
| what `CodeNode` is and how the code generator builds a tree of `CodeNode`s. |
| |
| ### [`CodeNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ECodeNode$&ss=chromium) |
| |
| The code generator produces C++ source files (text files), but the content of |
| each file is represented neither as a single giant string nor as a list of |
| strings. Instead, the content of each file is represented as a CodeNode tree. |
| |
| `CodeNode` is a fundamental building block that represents a text fragment in |
| the tree structure. A text file is represented as a tree of CodeNodes, each of |
| which represents a corresponding text fragment. The code generator is the |
| CodeNode tree builder. |
| |
| Here is a simple example that builds a CodeNode tree. |
| ```python |
| # SequenceNode and TextNode are subclasses of CodeNode. |
| |
| def make_prologue(): |
|     return SequenceNode([ |
|         TextNode("// Prologue"), |
|         TextNode("SetUp();"), |
|     ]) |
| |
| def make_epilogue(): |
|     return SequenceNode([ |
|         TextNode("// Epilogue"), |
|         TextNode("CleanUp();"), |
|     ]) |
| |
| def main(): |
|     root_node = SequenceNode([ |
|         make_prologue(), |
|         TextNode("LOG(INFO) << \"hello, world\";"), |
|         make_epilogue(), |
|     ]) |
| ``` |
| The `root_node` above represents the following text. |
| |
| ```c++ |
| // Prologue |
| SetUp(); |
| LOG(INFO) << "hello, world"; |
| // Epilogue |
| CleanUp(); |
| ``` |
| |
| The basic features of CodeNode are implemented in |
| [code_node.py](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py). |
| Just for convenience, CodeNode subclasses corresponding to C++ constructs are |
| provided in |
| [code_node_cxx.py](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node_cxx.py). |
| |
| `CodeNode` has an object-oriented design and has internal states (not only the |
| parent / child nodes but also more states to support advanced features). |
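
To make the tree idea concrete, here is a minimal, self-contained sketch of how a tree of text fragments can be rendered into text. `ToyNode`, `ToyTextNode` and `ToySequenceNode` are hypothetical stand-ins written for this illustration only. They are not the real classes in `code_node.py`, which carry more internal state and features.

```python
class ToyNode:
    """Hypothetical base class: a node that can render itself to text."""

    def render(self):
        raise NotImplementedError


class ToyTextNode(ToyNode):
    """Leaf node holding one text fragment."""

    def __init__(self, text):
        self.text = text

    def render(self):
        return self.text


class ToySequenceNode(ToyNode):
    """Inner node whose text is its children's text joined by newlines."""

    def __init__(self, children=()):
        self.children = list(children)

    def append(self, node):
        self.children.append(node)

    def render(self):
        return "\n".join(child.render() for child in self.children)


def make_prologue():
    return ToySequenceNode([
        ToyTextNode("// Prologue"),
        ToyTextNode("SetUp();"),
    ])


root_node = ToySequenceNode([
    make_prologue(),
    ToyTextNode('LOG(INFO) << "hello, world";'),
])
rendered = root_node.render()
```

Because each subtree renders itself, a builder function such as `make_prologue` can be reused or rearranged without any string concatenation in the caller.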
| |
| ### CodeNode tree builders |
| |
| The bindings code generator consists of multiple sub code generators. For |
| example, `interface.py` is a sub code generator of Web IDL interface and |
| `enumeration.py` is a sub code generator of Web IDL enumeration. Each Web IDL |
| definition has its own sub code generator. |
| |
| This sub section describes how a sub code generator builds a CodeNode tree and |
| produces C++ source files by looking at |
| [`enumeration.py`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/enumeration.py) |
| as an example. The example code snippet below is simplified for explanation. |
| |
| ```python |
| def generate_enumerations(task_queue): |
|     for enumeration in web_idl_database.enumerations: |
|         task_queue.post_task(generate_enumeration, enumeration.identifier) |
| ``` |
| |
| [`generate_enumerations`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/enumeration.py?q=function:%5Egenerate_enumerations$&ss=chromium) |
| is the entry point to this sub code generator. `task_queue` is used to allow |
| parallel processing. `generate_enumeration` (singular form) actually produces |
| a pair of C++ source files (\*.h and \*.cc). |
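
The post-then-run pattern can be sketched as follows. This is only an illustration under stated assumptions: `ToyTaskQueue`, `generate_toy_enumeration` and the `v8_<identifier>` file names are hypothetical, and the real task queue in bind_gen may work differently (for example, by using processes rather than threads).

```python
from concurrent.futures import ThreadPoolExecutor


class ToyTaskQueue:
    """Hypothetical stand-in: collects tasks, then runs them on a pool."""

    def __init__(self):
        self._tasks = []

    def post_task(self, func, *args):
        # Only record the task; nothing runs until run() is called.
        self._tasks.append((func, args))

    def run(self, max_workers=4):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(func, *args) for func, args in self._tasks]
            # Collect results in posting order, re-raising any task error.
            return [future.result() for future in futures]


def generate_toy_enumeration(identifier):
    # A real task would build CodeNode trees and write files here.
    return (f"v8_{identifier}.h", f"v8_{identifier}.cc")


task_queue = ToyTaskQueue()
for identifier in ["color", "shape"]:
    task_queue.post_task(generate_toy_enumeration, identifier)
results = task_queue.run()
```

Posting tasks keyed by an identifier, rather than running them inline, keeps each task independent of the others, which is what makes running them concurrently safe.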
| |
| ```python |
| def generate_enumeration(enumeration_identifier): |
|     # Filepaths |
|     header_path = path_manager.api_path(ext="h") |
|     source_path = path_manager.api_path(ext="cc") |
| |
|     # Root nodes |
|     header_node = ListNode(tail="\n") |
|     source_node = ListNode(tail="\n") |
| |
|     # ... fill the contents of `header_node` and `source_node` ... |
| |
|     # Write down to the files. |
|     write_code_node_to_file(header_node, path_manager.gen_path_to(header_path)) |
|     write_code_node_to_file(source_node, path_manager.gen_path_to(source_path)) |
| ``` |
| |
| The main task of |
| [`generate_enumeration`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/enumeration.py?q=function:%5Egenerate_enumeration$&ss=chromium) |
| is to build CodeNode trees and write them to files. A key point here is to |
| build two trees in parallel: one for \*.h and the other for \*.cc. We can add a |
| function declaration to the header file while adding the corresponding function |
| definition to the source file. The following code snippet is an example of |
| adding constructors to the header file and the source file. |
| |
| ```python |
| # Namespaces |
| header_blink_ns = CxxNamespaceNode(name_style.namespace("blink")) |
| source_blink_ns = CxxNamespaceNode(name_style.namespace("blink")) |
| # {header,source}_blink_ns are added to {header,source}_node (the root |
| # nodes) respectively. |
| |
| # Class definition |
| class_def = CxxClassDefNode(cg_context.class_name, |
|                             base_class_names=["bindings::EnumerationBase"], |
|                             final=True, |
|                             export=component_export( |
|                                 api_component, for_testing)) |
| |
| ctor_decls, ctor_defs = make_constructors(cg_context) |
| |
| # Define the class in 'blink' namespace. |
| header_blink_ns.body.append(class_def) |
| |
| # Add constructors to public: section of the class. |
| class_def.public_section.append(ctor_decls) |
| # Add constructors (function definitions) into 'blink' namespace in the |
| # source file. |
| source_blink_ns.body.append(ctor_defs) |
| ``` |
| |
| In the above code snippet, |
| [`make_constructors`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/enumeration.py?q=function:%5Emake_constructors$&ss=chromium) |
| creates and returns a CodeNode tree for the header file and another CodeNode |
| tree for the source file. In most cases, functions named `make_xxx` create |
| and return a pair of CodeNode trees. These functions are subtree builders |
| of the CodeNode trees. |
| |
| These subtree builders are written in a functional programming style |
| (whereas CodeNodes themselves are written in an object-oriented style). |
| They create a pair of new CodeNode trees at every function call (the returned |
| code node instances differ per call, so their internal states are separate), |
| but the contents are determined solely by the input arguments. This property |
| is very important when we use closures in advanced use cases. |
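
The property described above (fresh instances per call, contents fixed by the arguments) can be illustrated with a small sketch. `ToyListNode` and `make_toy_constructors` are hypothetical names invented for this example, and the C++ text they emit is a placeholder rather than what `make_constructors` actually generates.

```python
class ToyListNode:
    """Hypothetical mutable node holding a list of text lines."""

    def __init__(self, lines=()):
        self.lines = list(lines)

    def append(self, line):
        self.lines.append(line)

    def render(self):
        return "\n".join(self.lines)


def make_toy_constructors(class_name):
    """Return a (declarations, definitions) pair of brand-new nodes.

    The contents depend only on `class_name`; no state is shared
    between calls.
    """
    decls = ToyListNode([f"{class_name}();"])
    defs = ToyListNode([f"{class_name}::{class_name}() = default;"])
    return decls, defs


first_decls, first_defs = make_toy_constructors("V8Foo")
second_decls, second_defs = make_toy_constructors("V8Foo")

# Mutating one returned tree does not affect the other call's tree.
first_decls.append("// extra note added by one caller")
```

Since the two calls share nothing, a caller is free to attach either result to a parent tree and keep modifying it without surprising another caller.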
| |
| This covers the typical code structure of the sub code generators. |
| `enumeration.py` consists of several `make_xxx` functions (subtree builders) plus |
| `generate_enumeration` (the top-level tree builder and file writer). |
| |
| ### Advanced: Two-step code generation and declarative style |
| |
| #### Typical problems of (simple) code generation |
| |
| Bindings code generation has the following typical problems. Suppose we have |
| the following simple code generator. |
| ```python |
| # Example of simple code generation |
| |
| def make_foo(): |
|     return SequenceNode([ |
|         TextNode("HeavyResource* res = HeavyFunc();"), |
|         TextNode("Foo(res);"), |
|     ]) |
| |
| def make_bar(): |
|     return SequenceNode([ |
|         TextNode("HeavyResource* res = HeavyFunc();"), |
|         TextNode("Bar(res);"), |
|     ]) |
| |
| def main(): |
|     root_node = SequenceNode([ |
|         make_foo(), |
|         make_bar(), |
|     ]) |
| ``` |
| This produces the following C++ code, where we have two major problems. The |
| first problem is a symbol conflict: `res` is defined twice. Even if we gave |
| different names like `res1` and `res2`, we have the second problem: the |
| produced code calls `HeavyFunc` twice, which is not efficient. |
| ```c++ |
| // Output of simple code generation example |
| HeavyResource* res = HeavyFunc(); |
| Foo(res); |
| HeavyResource* res = HeavyFunc(); |
| Bar(res); |
| ``` |
| Ideally we'd like to have the following code, without introducing tight coupling |
| between `make_foo` and `make_bar`. |
| ```c++ |
| // Ideal generated code |
| HeavyResource* res = HeavyFunc(); |
| Foo(res); |
| Bar(res); |
| ``` |
| |
| #### Two-step code generation as a solution |
| |
| In order to resolve the above problems, the bindings code generator supports |
| two-step code generation. This approach resembles declarative programming. |
| ```python |
| # Example of two-step code generation |
| |
| def bind_vars(code_node): |
|     local_vars = [ |
|         SymbolNode("heavy_resource", |
|                    "HeavyResource* ${heavy_resource} = HeavyFunc(${address}, ${phone_number});"), |
|         SymbolNode("address", |
|                    "String ${address} = GetAddress();"), |
|         SymbolNode("phone_number", |
|                    "String ${phone_number} = GetPhoneNumber();"), |
|     ] |
|     for symbol_node in local_vars: |
|         code_node.register_code_symbol(symbol_node) |
| |
| def make_foo(): |
|     return SequenceNode([ |
|         TextNode("Foo(${heavy_resource});"), |
|     ]) |
| |
| def make_bar(): |
|     return SequenceNode([ |
|         TextNode("Bar(${heavy_resource});"), |
|     ]) |
| |
| def main(): |
|     root_node = SymbolScopeNode() |
|     bind_vars(root_node) |
|     root_node.extend([ |
|         make_foo(), |
|         make_bar(), |
|     ]) |
| ``` |
| The above code generator has two kinds of code generation. One kind is |
| `make_foo` and `make_bar`, which are almost the same as before except for the |
| use of a template variable (`${heavy_resource}`). The other kind is `bind_vars`, |
| which provides a catalogue of symbol definitions. Using the catalogue keeps the |
| definitions of `make_foo` and `make_bar` simple. This code generator produces |
| the following C++ code without producing duplicated function calls. |
| ```c++ |
| // Output of two-step code generation example |
| String address = GetAddress(); |
| String phone_number = GetPhoneNumber(); |
| HeavyResource* heavy_resource = HeavyFunc(address, phone_number); |
| Foo(heavy_resource); |
| Bar(heavy_resource); |
| ``` |
| The mechanism of two-step code generation is simple. |
| `SymbolNode(name, definition)` consists of a symbol name and a code fragment that |
| defines the symbol. When a symbol name is referenced as `${symbol_name}`, it's |
| simply replaced with `symbol_name`, and the reference also triggers insertion of |
| the symbol definition into a surrounding `SequenceNode`. This step happens |
| recursively, so not only `heavy_resource`'s definition but also the definitions |
| of `address` and `phone_number` are inserted. |
| |
| With the two-step code generation, it's possible (and expected) to write code |
| generators in the declarative programming style, which works better in general |
| than the imperative programming style. |
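
The reference-triggers-definition mechanism can be sketched in a few lines of plain Python. This is a simplified model, not the bind_gen implementation: `render_with_symbols` is a hypothetical helper, and it always places definitions at the top, whereas the real generator chooses insertion positions heuristically.

```python
import re

# Matches template references such as ${heavy_resource}.
_REF = re.compile(r"\$\{(\w+)\}")


def render_with_symbols(statements, definitions):
    """Expand ${name} references and emit each needed definition once."""
    emitted = []     # definition lines, dependencies first
    defined = set()  # symbols whose definition is already scheduled

    def define(name):
        if name in defined:
            return
        defined.add(name)
        template = definitions[name]
        # Define the symbols this definition depends on before itself.
        for dep in _REF.findall(template):
            if dep != name:
                define(dep)
        emitted.append(_REF.sub(r"\1", template))

    body = []
    for statement in statements:
        for name in _REF.findall(statement):
            define(name)
        body.append(_REF.sub(r"\1", statement))
    return emitted + body


definitions = {
    "heavy_resource":
        "HeavyResource* ${heavy_resource} = HeavyFunc(${address}, ${phone_number});",
    "address": "String ${address} = GetAddress();",
    "phone_number": "String ${phone_number} = GetPhoneNumber();",
    "unused": "int ${unused} = Unused();",
}
lines = render_with_symbols(
    ["Foo(${heavy_resource});", "Bar(${heavy_resource});"], definitions)
```

Note that `unused` is registered but never referenced, so its definition does not appear in `lines`, and `HeavyFunc` is called once even though two statements reference `heavy_resource`.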
| |
| #### Important subclasses of CodeNode for two-step code generation |
| |
| - [`SymbolNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ESymbolNode$&ss=chromium) |
| |
| SymbolNode consists of a symbol name and its definition. You can reference a |
| symbol as `${symbol_name}` in TextNode and FormatNode. It's fine to never |
| reference a symbol: its definition is automatically inserted only when the |
| symbol is referenced. |
| |
| For simple use cases, a SymbolNode can be constructed from a pair of a symbol |
| name and a plain text (which can contain references in the form of `${...}`) as |
| the definition. |
| ```python |
| # Example of simple use cases |
| addr_symbol = SymbolNode("address", |
| "void* ${address} = ${base} + ${offset};") |
| ``` |
| For more complicated use cases, SymbolNode's definition can be a callable that |
| returns a SymbolDefinitionNode instead. This is useful when the definition has |
| a complex structure of code node tree, since a plain text definition cannot |
| represent a code node tree structure. |
| |
| ```python |
| # Example of complicated use cases |
| def create_address(symbol_node): |
|     node = SymbolDefinitionNode(symbol_node) |
|     node.extend([ |
|         TextNode("void* ${address} = ${base} + ${offset};"), |
|         CxxUnlikelyIfNode( |
|             cond="!${address}", |
|             attribute=None, |
|             body=[ |
|                 TextNode("${exception_state}.ThrowRangeError(\"...\");"), |
|                 TextNode("return;"), |
|             ]), |
|     ]) |
|     return node |
| |
| addr_symbol = SymbolNode("address", |
|                          definition_constructor=create_address) |
| ``` |
| where CxxUnlikelyIfNode represents a C++ if statement with an unlikely condition |
| (defined in code_node_cxx.py). This definition is better than a plain text |
| definition because it inserts the definition of ${exception_state} at the best |
| position depending on how likely ${exception_state} is to actually be used. |
| ```c++ |
| // Output of the example of complicated use cases |
| void* base = ...; // ${base}'s definition is automatically inserted. |
| void* offset = ...; // ${offset}'s definition is automatically inserted. |
| // ${exception_state}'s definition may be inserted here if it's used often or |
| // outside of the following if statement. |
| // ExceptionState exception_state(...); |
| void* address = base + offset; |
| if (!address) { |
|   // ${exception_state}'s definition may be inserted here if it's used neither |
|   // often nor outside of this if statement. |
|   ExceptionState exception_state(...); |
|   exception_state.ThrowRangeError("..."); |
|   return; |
| } |
| ``` |
| |
| - [`SymbolDefinitionNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ESymbolDefinitionNode$&ss=chromium) |
| |
| SymbolDefinitionNode represents the code fragment that defines a symbol. |
| The code generator automatically inserts symbol definitions at the best |
| positions heuristically. |
| However, it's hard to determine the best position in a single pass, so |
| the code generator iterates symbol definition insertions/relocations until it |
| finds the heuristically best positions. |
| SymbolDefinitionNode is used to identify a subtree |
| of code nodes that defines its symbol (i.e. used to distinguish automatically |
| inserted code nodes from the original code node tree). |
| |
| - [`SequenceNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ESequenceNode$&ss=chromium) |
| |
| SequenceNode represents not only a list of CodeNodes but also insertion points |
| of SymbolDefinitionNode. SymbolDefinitionNodes will be inserted between |
| elements within a SequenceNode. |
| |
| - [`ListNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5EListNode$&ss=chromium) |
| |
| Unlike SequenceNode, ListNode represents just a list of CodeNodes and does |
| not support automatic insertion of symbol definitions, i.e. ListNode is |
| indivisible. SequenceNode should be used when your code nodes represent a |
| series of C++ statements; otherwise ListNode is preferred over SequenceNode so |
| that nothing will be inserted in between. See the following example. |
| ```python |
| # Example of SequenceNode vs ListNode |
| int_array = ListNode([ |
|     TextNode("int int_array[] = {"), |
|     ListNode([ |
|         TextNode("${foo}"), |
|         TextNode("${bar}"), |
|     ], separator=","), |
|     TextNode("};"), |
| ]) |
| |
| node = SequenceNode([ |
|     int_array, |
|     TextNode("PrintIntArray(int_array);"), |
| ]) |
| ``` |
| This example produces the following C++ code. Since symbol definitions are |
| inserted only between elements of SequenceNode, ${foo} and ${bar}'s definitions |
| won't be inserted within `int_array`'s definition. |
| ```c++ |
| // Output of SequenceNode vs ListNode example |
| int foo = ...; // ${foo}'s definition is automatically inserted here. |
| int bar = ...; // ${bar}'s definition is automatically inserted here. |
| int int_array[] = { |
|   // ${foo}'s definition is _not_ inserted here. |
|   foo, |
|   // ${bar}'s definition is _not_ inserted here. |
|   bar |
| }; |
| PrintIntArray(int_array); |
| ``` |
| |
| - [`SymbolScopeNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ESymbolScopeNode$&ss=chromium) |
| |
| You can register SymbolNodes only into a SymbolScopeNode. Registered symbols |
| are effective only inside the SymbolScopeNode. This behavior reflects that |
| C++ variables are effective only inside the closest containing C++ block |
| (`{...}`). |
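
The scoping rule can be pictured with a small model. `ToyScope` is a hypothetical class written for this illustration, and it assumes that lookup mirrors C++ block scoping (an inner scope also sees symbols registered in the scopes that enclose it). It is not the `SymbolScopeNode` API.

```python
class ToyScope:
    """Hypothetical scope: symbols are visible here and in nested scopes."""

    def __init__(self, parent=None):
        self.parent = parent
        self._symbols = {}

    def register(self, name, definition):
        self._symbols[name] = definition

    def lookup(self, name):
        # Walk outward from this scope, like name lookup in C++ blocks.
        scope = self
        while scope is not None:
            if name in scope._symbols:
                return scope._symbols[name]
            scope = scope.parent
        return None


function_scope = ToyScope()
if_block_scope = ToyScope(parent=function_scope)

function_scope.register("isolate", "v8::Isolate* isolate = GetIsolate();")
if_block_scope.register("tmp", "int tmp = 0;")

seen_from_inner = if_block_scope.lookup("isolate")
seen_from_outer = function_scope.lookup("tmp")
```

Here `seen_from_inner` is the registered definition while `seen_from_outer` is `None`, matching the rule that a symbol is effective only inside the scope where it was registered.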
| |
| ## Tips for debugging and code reading |
| |
| The driver script |
| [`generate_bindings.py`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/generate_bindings.py) |
| supports two useful command line flags: |
| `--format_generated_files` and `--enable_code_generation_tracing`. |
| |
| `--format_generated_files` runs clang-format for the generated files so that |
| they are easy for developers to read. |
| |
| `--enable_code_generation_tracing` outputs code comments (e.g. |
| `/* make_wrapper_type_info:6304 */`) in addition to the regular output in order |
| to clarify which line of the code generator code generated which line of |
| generated code. |
| This is useful to understand the correspondence between the code generator and |
| generated code. |
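
As a rough idea of how such `function:line` comments can be produced in Python, the sketch below reads the caller's frame. It is an assumption-laden illustration: `toy_trace_comment` is a hypothetical helper, the generated C++ text is a placeholder, and the actual tracing code in bind_gen may be implemented differently.

```python
import inspect


def toy_trace_comment():
    """Return a C-style comment naming the caller's function and line."""
    caller = inspect.currentframe().f_back
    return f"/* {caller.f_code.co_name}:{caller.f_lineno} */"


def make_wrapper_type_info():
    # The comment records where in the generator this text was produced.
    return f"{toy_trace_comment()} const WrapperTypeInfo info = {{}};"


traced_line = make_wrapper_type_info()
```

Searching the generator for the function name and line number in the comment leads straight to the Python statement that emitted the C++ text.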
| |
| When the tracing comments show functions which are too common and uninteresting |
| to you (e.g. `make_blink_to_v8_value`), you can exclude such functions |
| on a module-by-module basis by using |
| [`CodeGenTracing.add_modules_to_be_ignored`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/package_initializer.py?q=CodeGenTracing%5C.add_modules_to_be_ignored&ss=chromium). |
| |
| Here is an example command line to run the script with the options |
| (working fine as of May 2024). |
| ```shell |
| # Run generate_bindings.py with --format_generated_files and |
| # --enable_code_generation_tracing. |
| # |
| # web_idl_database.pickle must have already been generated and updated. |
| # Or, run 'autoninja -C out/Default web_idl_database' in advance. |
| |
| $ cd out/Default |
| $ python3 ../../third_party/blink/renderer/bindings/scripts/generate_bindings.py \ |
| async_iterator callback_function callback_interface dictionary enumeration interface namespace observable_array sync_iterator typedef union \ |
| --web_idl_database gen/third_party/blink/renderer/bindings/web_idl_database.pickle \ |
| --root_src_dir=../.. \ |
| --root_gen_dir=gen \ |
| --output_reldir=core=third_party/blink/renderer/bindings/core/v8/ \ |
| --output_reldir=modules=third_party/blink/renderer/bindings/modules/v8/ \ |
| --output_reldir=extensions_chromeos=third_party/blink/renderer/bindings/extensions_chromeos/v8/ \ |
| --format_generated_files \ |
| --enable_code_generation_tracing |
| ``` |