This document is intended as an introduction to writing CodeStubAssembler builtins, and is targeted towards V8 developers.
In V8, builtins can be seen as chunks of code that are executable by the VM at runtime. A common use case is to implement the functions of builtin objects (such as RegExp or Promise), but builtins can also be used to provide other internal functionality (e.g. as part of the IC system).
V8’s builtins can be implemented using a number of different methods, each with different trade-offs: platform-dependent assembly language, C++, JavaScript, and the CodeStubAssembler.
The remainder of this document focuses on the last of these and gives a brief tutorial for developing a simple CodeStubAssembler (CSA) builtin exposed to JavaScript.
V8’s CodeStubAssembler is a custom, platform-agnostic assembler that provides low-level primitives as a thin abstraction over assembly, but also offers an extensive library of higher-level functionality.
    // Low-level:
    // Loads the pointer-sized data at addr into value.
    Node* addr = /* ... */;
    Node* value = Load(MachineType::IntPtr(), addr);

    // And high-level:
    // Performs the JS operation ToString(object).
    // ToString semantics are specified at https://tc39.github.io/ecma262/#sec-tostring.
    Node* object = /* ... */;
    Node* string = ToString(context, object);
CSA builtins run through part of the TurboFan compilation pipeline (including block scheduling and register allocation, but notably not through optimization passes) which then emits the final executable code.
In this section, we will write a simple CSA builtin that takes a single argument and returns whether it represents the number 42. The builtin will be exposed to JS by installing it on the Math object (because we can).
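This example demonstrates how to declare a CSA builtin with JS linkage, how to implement simple logic in CSA (Smi and heap number handling, variables, labels, and branches), and how to install the builtin on the Math object.
In case you’d like to follow along locally, the following code is based off revision 7a8d20a7.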
Builtins are declared in the BUILTIN_LIST_BASE macro in src/builtins/builtins-definitions.h. To create a new CSA builtin with JS linkage and one parameter named X:
    #define BUILTIN_LIST_BASE(CPP, API, TFJ, TFC, TFS, TFH, ASM, DBG) \
      // [... snip ...]
      TFJ(MathIs42, 1, kX) \
      // [... snip ...]
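BUILTIN_LIST_BASE receives other macros as its arguments, a technique often called an X-macro: the list is written once and expanded several times with different per-kind macros. The following standalone sketch illustrates the pattern only. DEMO_BUILTIN_LIST, DemoStub, DemoBuiltin, and Describe are hypothetical names invented for this illustration and are not part of V8.

```cpp
#include <cassert>
#include <string>

// Hypothetical builtin list (not V8's). Each entry is wrapped in the macro
// for its kind; the caller decides what TFJ and TFS expand to.
#define DEMO_BUILTIN_LIST(TFJ, TFS) \
  TFS(DemoStub, kX)                 \
  TFJ(MathIs42, 1, kX)

// Expansion 1: one enumerator per entry, regardless of kind.
#define DEMO_ENUM_TFJ(Name, argc, ...) k##Name,
#define DEMO_ENUM_TFS(Name, ...) k##Name,
enum DemoBuiltin {
  DEMO_BUILTIN_LIST(DEMO_ENUM_TFJ, DEMO_ENUM_TFS) kDemoBuiltinCount
};

// Expansion 2: the name of each entry as a string.
#define DEMO_NAME(Name, ...) #Name,
const char* const kDemoNames[] = {DEMO_BUILTIN_LIST(DEMO_NAME, DEMO_NAME)};

// Expansion 3: the kind of each entry, chosen by which macro wraps it.
#define DEMO_KIND_TFJ(Name, argc, ...) "TFJ",
#define DEMO_KIND_TFS(Name, ...) "TFS",
const char* const kDemoKinds[] = {
    DEMO_BUILTIN_LIST(DEMO_KIND_TFJ, DEMO_KIND_TFS)};

// Looks up the generated tables for a given enumerator.
std::string Describe(DemoBuiltin id) {
  assert(id < kDemoBuiltinCount);
  return std::string(kDemoNames[id]) + " (" + kDemoKinds[id] + ")";
}
```

Because the enum and both tables are generated from the same list, adding one line to the list keeps all three in sync. In this sketch the enumerators also follow the order of the list.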
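Note that BUILTIN_LIST_BASE takes several different macros that denote different builtin kinds (see inline documentation for more details). CSA builtins specifically are split into TFJ (JavaScript linkage), TFS (stub linkage), TFC (stub linkage builtins requiring a custom interface descriptor), and TFH (specialized stub linkage builtins used for IC handlers).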
Builtin definitions are located in src/builtins/builtins-*-gen.cc files, roughly organized by topic. Since we will be writing a Math builtin, we’ll put our definition into src/builtins/builtins-math-gen.cc.
    // TF_BUILTIN is a convenience macro that creates a new subclass of the given
    // assembler behind the scenes.
    TF_BUILTIN(MathIs42, MathBuiltinsAssembler) {
      // Load the current function context (an implicit argument for every stub)
      // and the X argument. Note that we can refer to parameters by the names
      // defined in the builtin declaration.
      Node* const context = Parameter(Descriptor::kContext);
      Node* const x = Parameter(Descriptor::kX);

      // At this point, x can be basically anything - a Smi, a HeapNumber,
      // undefined, or any other arbitrary JS object. Let’s call the ToNumber
      // builtin to convert x to a number we can use.
      // CallBuiltin can be used to conveniently call any CSA builtin.
      Node* const number = CallBuiltin(Builtins::kToNumber, context, x);

      // Create a CSA variable to store the resulting value. The type of the
      // variable is kTagged since we will only be storing tagged pointers in it.
      VARIABLE(var_result, MachineRepresentation::kTagged);

      // We need to define a couple of labels which will be used as jump targets.
      Label if_issmi(this), if_isheapnumber(this), out(this);

      // ToNumber always returns a number. We need to distinguish between Smis
      // and heap numbers - here, we check whether number is a Smi and conditionally
      // jump to the corresponding labels.
      Branch(TaggedIsSmi(number), &if_issmi, &if_isheapnumber);

      // Binding a label begins generating code for it.
      BIND(&if_issmi);
      {
        // SelectBooleanConstant returns the JS true/false values depending on
        // whether the passed condition is true/false. The result is bound to our
        // var_result variable, and we then unconditionally jump to the out label.
        var_result.Bind(SelectBooleanConstant(SmiEqual(number, SmiConstant(42))));
        Goto(&out);
      }

      BIND(&if_isheapnumber);
      {
        // ToNumber can only return either a Smi or a heap number. Just to make sure
        // we add an assertion here that verifies number is actually a heap number.
        CSA_ASSERT(this, IsHeapNumber(number));
        // Heap numbers wrap a floating point value. We need to explicitly extract
        // this value, perform a floating point comparison, and again bind
        // var_result based on the outcome.
        Node* const value = LoadHeapNumberValue(number);
        Node* const is_42 = Float64Equal(value, Float64Constant(42));
        var_result.Bind(SelectBooleanConstant(is_42));
        Goto(&out);
      }

      BIND(&out);
      {
        Node* const result = var_result.value();
        CSA_ASSERT(this, IsBoolean(result));
        Return(result);
      }
    }
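For readers who find the label-based control flow hard to follow, here is a rough plain C++ model of the same logic. It is an illustration only, not V8 code: the Number type and the Is42Model function are hypothetical, with an int32_t standing in for a Smi and a double standing in for the value wrapped by a heap number.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Stand-in for the result of ToNumber: int32_t models a Smi, double models
// the payload of a heap number. This is not how V8 represents values.
using Number = std::variant<std::int32_t, double>;

bool Is42Model(const Number& number) {
  bool result;  // Plays the role of var_result.
  if (std::holds_alternative<std::int32_t>(number)) {
    // Corresponds to if_issmi: an integer comparison against 42.
    result = std::get<std::int32_t>(number) == 42;
  } else {
    // Corresponds to if_isheapnumber: check the only remaining case, then
    // compare as floating point values.
    assert(std::holds_alternative<double>(number));
    result = std::get<double>(number) == 42.0;
  }
  // Corresponds to out: both paths join here and return the result.
  return result;
}
```

The main difference is that CSA code describes a graph that is compiled later, so the branches are expressed with labels and Goto rather than with C++ control flow. A C++ `if` inside a TF_BUILTIN body would run while the builtin is being generated, not when it is called.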
Builtin objects such as Math are set up mostly in src/bootstrapper.cc (with some setup occurring in .js files). Attaching our new builtin is simple:
    // Existing code to set up Math, included here for clarity.
    Handle<JSObject> math = factory->NewJSObject(cons, TENURED);
    JSObject::AddProperty(global, name, math, DONT_ENUM);
    // [... snip ...]
    SimpleInstallFunction(math, "is42", Builtins::kMathIs42, 1, true);
Now that is42 is attached to Math, it can be called from JS:
    $ out/debug/d8
    d8> Math.is42(42)
    true
    d8> Math.is42("42.0")
    true
    d8> Math.is42(true)
    false
    d8> Math.is42({ valueOf: () => 42 })
    true
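These results follow from the ToNumber conversion that runs before the comparison: the string "42.0" converts to 42, while true converts to 1. The sketch below is a deliberately simplified C++ model of that conversion for booleans, numbers, and strings. The names Value, ToNumberSketch, and NumberIs42Sketch are hypothetical, objects with valueOf are not modeled, and string parsing via std::strtod differs from the JS rules in edge cases.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <string>
#include <variant>

// A tiny stand-in for a JS value: boolean, number, or string only.
using Value = std::variant<bool, double, std::string>;

// Simplified model of ToNumber. Real JS string conversion has more rules
// (whitespace trimming, "Infinity", and so on) that are not modeled here.
double ToNumberSketch(const Value& v) {
  if (const bool* b = std::get_if<bool>(&v)) return *b ? 1.0 : 0.0;
  if (const double* d = std::get_if<double>(&v)) return *d;
  assert(std::holds_alternative<std::string>(v));
  const std::string& s = std::get<std::string>(v);
  if (s.empty()) return 0.0;  // JS converts the empty string to 0.
  char* end = nullptr;
  const double parsed = std::strtod(s.c_str(), &end);
  // Treat the string as numeric only if every character was consumed.
  const bool consumed_all = end == s.c_str() + s.size();
  return consumed_all ? parsed : std::nan("");
}

// Mirrors the builtin: convert first, then compare against 42.
bool NumberIs42Sketch(const Value& v) { return ToNumberSketch(v) == 42.0; }
```

Strings are passed as std::string explicitly (for example `Value{std::string("42.0")}`), because with some standard library versions a string literal would otherwise select the bool alternative of the variant.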
CSA builtins can also be created with stub linkage (instead of JS linkage as we used above in MathIs42). Such builtins can be useful to extract commonly-used code into a separate code object that can be used by multiple callers, while the code is only produced once. Let’s extract the code that handles heap numbers into a separate builtin called MathIsHeapNumber42, and call it from MathIs42.
Defining and using TFS stubs is easy; declarations are again placed in src/builtins/builtins-definitions.h:
    #define BUILTIN_LIST_BASE(CPP, API, TFJ, TFC, TFS, TFH, ASM, DBG) \
      // [... snip ...]
      TFS(MathIsHeapNumber42, kX) \
      TFJ(MathIs42, 1, kX) \
      // [... snip ...]
Note that currently, order within BUILTIN_LIST_BASE does matter. Since MathIs42 calls MathIsHeapNumber42, the former needs to be listed after the latter (this requirement should be lifted at some point).
The definition is also straightforward. In src/builtins/builtins-math-gen.cc:
    // Defining a TFS builtin works exactly the same way as TFJ builtins.
    TF_BUILTIN(MathIsHeapNumber42, MathBuiltinsAssembler) {
      Node* const x = Parameter(Descriptor::kX);
      CSA_ASSERT(this, IsHeapNumber(x));
      Node* const value = LoadHeapNumberValue(x);
      Node* const is_42 = Float64Equal(value, Float64Constant(42));
      Return(SelectBooleanConstant(is_42));
    }
Finally, let’s call our new builtin from MathIs42:
    TF_BUILTIN(MathIs42, MathBuiltinsAssembler) {
      // [... snip ...]
      BIND(&if_isheapnumber);
      {
        // Instead of handling heap numbers inline, we now call into our new TFS stub.
        var_result.Bind(CallBuiltin(Builtins::kMathIsHeapNumber42, context, number));
        Goto(&out);
      }
      // [... snip ...]
    }
Why should you care about TFS builtins at all? Why not leave the code inline (or extracted into a helper method for better readability)?
An important reason is code space: builtins are generated at compile-time and included in the V8 snapshot, thus unconditionally taking up (significant) space in every created isolate. Extracting large chunks of commonly used code to TFS builtins can quickly lead to space savings in the 10s to 100s of KBs.