
 ===============================
 ORC Design and Implementation
 ===============================

 Introduction
 ============

 This document aims to provide a high-level overview of the design and
 implementation of the ORC JIT APIs. Except where otherwise stated, all
 discussion applies to the design of the APIs as of LLVM version 9 (ORCv2).

 .. contents::
    :local:

 Use-cases
 =========

 ORC provides a modular API for building JIT compilers. There are a range
 of use cases for such an API:

 1. The LLVM tutorials use a simple ORC-based JIT class to execute expressions
 compiled from a toy language: Kaleidoscope.

 2. The LLVM debugger, LLDB, uses a cross-compiling JIT for expression
 evaluation. In this use case, cross compilation allows expressions compiled
 in the debugger process to be executed on the debug target process, which may
 be on a different device/architecture.

 3. High-performance JITs (e.g. JVMs, Julia) that want to make use of LLVM's
 optimizations within an existing JIT infrastructure.

 4. Interpreters and REPLs, e.g. Cling (C++) and the Swift interpreter.

 By adoping a modular, library-based design we aim to make ORC useful in as many
 of these contexts as possible.

 Features
 ========

 ORC provides the following features:

 - *JIT-linking* links relocatable object files (COFF, ELF, MachO) [1]_ into a
   target process at runtime. The target process may be the same process that
   contains the JIT session object and jit-linker, or may be another process
   (even one running on a different machine or architecture) that communicates
   with the JIT via RPC.

 - *LLVM IR compilation*, which is provided by off-the-shelf components
   (IRCompileLayer, SimpleCompiler, ConcurrentIRCompiler) that make it easy to
   add LLVM IR to a JIT'd process.

 - *Eager and lazy compilation*. By default, ORC will compile symbols as soon as
   they are looked up in the JIT session object (``ExecutionSession``). Compiling
   eagerly by default makes it easy to use ORC as a simple in-memory compiler for
   an existing JIT. ORC also provides a simple mechanism, lazy-reexports, for
   deferring compilation until first call.

 - *Support for custom compilers and program representations*. Clients can supply
   custom compilers for each symbol that they define in their JIT session. ORC
   will run the user-supplied compiler when a definition of a symbol is
   needed. ORC is fully language agnostic: LLVM IR is not treated
   specially, and is supported via the same wrapper mechanism (the
   ``MaterializationUnit`` class) that is used for custom compilers.

 - *Concurrent JIT'd code* and *concurrent compilation*. JIT'd code may spawn
   multiple threads, and may re-enter the JIT (e.g. for lazy compilation)
   concurrently from multiple threads. The ORC APIs also support running multiple
   compilers concurrently, and provide off-the-shelf infrastructure to track
   dependencies on running compiles (e.g. to ensure that we never call into code
   until it is safe to do so, even if that involves waiting on multiple
   compiles).

 - *Orthogonality* and *composability*: Each of the features above can be used (or
   not) independently. It is possible to put ORC components together to make a
   non-lazy, in-process, single-threaded JIT or a lazy, out-of-process,
   concurrent JIT, or anything in between.

 LLJIT and LLLazyJIT
 ===================

 ORC provides two basic JIT classes off-the-shelf. These are useful both as
 examples of how to assemble ORC components to make a JIT, and as replacements
 for earlier LLVM JIT APIs (e.g. MCJIT).

 The LLJIT class uses an IRCompileLayer and RTDyldObjectLinkingLayer to support
 compilation of LLVM IR and linking of relocatable object files. All operations
 are performed eagerly on symbol lookup (i.e. a symbol's definition is compiled
 as soon as you attempt to look up its address). LLJIT is a suitable replacement
 for MCJIT in most cases (note: some more advanced features, e.g.
 JITEventListeners, are not supported yet).

 The LLLazyJIT extends LLJIT and adds a CompileOnDemandLayer to enable lazy
 compilation of LLVM IR. When an LLVM IR module is added via the addLazyIRModule
 method, function bodies in that module will not be compiled until they are first
 called. LLLazyJIT aims to provide a replacement for LLVM's original (pre-MCJIT)
 JIT API.

 LLJIT and LLLazyJIT instances can be created using their respective builder
 classes: LLJITBuilder and LLLazyJITBuilder. For example, assuming you have a
 module ``M`` loaded on a ThreadSafeContext ``Ctx``:

 .. code-block:: c++

   // Try to detect the host arch and construct an LLJIT instance.
   auto JIT = LLJITBuilder().create();

   // If we could not construct an instance, return an error.
   if (!JIT)
     return JIT.takeError();

   // Add the module.
   if (auto Err = JIT->addIRModule(ThreadSafeModule(std::move(M), Ctx)))
     return Err;

   // Look up the JIT'd code entry point.
   auto EntrySym = JIT->lookup("entry");
   if (!EntrySym)
     return EntrySym.takeError();

   auto *Entry = (void(*)())EntrySym->getAddress();

   Entry();

 The builder classes provide a number of configuration options that can be
 specified before the JIT instance is constructed. For example:

 .. code-block:: c++

   // Build an LLLazyJIT instance that uses four worker threads for compilation,
   // and jumps to a specific error handler (rather than null) on lazy compile
   // failures.

   void handleLazyCompileFailure() {
     // JIT'd code will jump here if lazy compilation fails, giving us an
     // opportunity to exit or throw an exception into JIT'd code.
     throw JITFailed();
   }

   auto JIT = LLLazyJITBuilder()
                .setNumCompileThreads(4)
                .setLazyCompileFailureAddr(
                    toJITTargetAddress(&handleLazyCompileFailure))
                .create();

   // ...

 Design Overview
 ===============

 ORC's JIT'd program model aims to emulate the linking and symbol resolution
 rules used by the static and dynamic linkers. This allows ORC to JIT
 arbitrary LLVM IR, including IR produced by an ordinary static compiler (e.g.
 clang) that uses constructs like symbol linkage and visibility, and weak and
 common symbol definitions.

 To see how this works, imagine a program ``myapp`` which links against a pair
 of dynamic libraries: ``libA`` and ``libB``. On the command line, building this
 system might look like:

 .. code-block:: bash

   $ clang++ -shared -o libA.dylib a1.cpp a2.cpp
   $ clang++ -shared -o libB.dylib b1.cpp b2.cpp
   $ clang++ -o myapp main.cpp -L. -lA -lB
   $ ./myapp

 In ORC, this would translate into API calls on a "CXXCompilingLayer" (with error
 checking omitted for brevity) as:

 .. code-block:: c++

   ExecutionSession ES;
   RTDyldObjectLinkingLayer ObjLinkingLayer(
       ES, []() { return llvm::make_unique<SectionMemoryManager>(); });
   CXXCompilingLayer CXXLayer(ES, ObjLinkingLayer);

   // Create JITDylib "A" and add code to it using the CXX layer.
   auto &LibA = ES.createJITDylib("A");
   CXXLayer.add(LibA, MemoryBuffer::getFile("a1.cpp"));
   CXXLayer.add(LibA, MemoryBuffer::getFile("a2.cpp"));

   // Create JITDylib "B" and add code to it using the CXX layer.
   auto &LibB = ES.createJITDylib("B");
   CXXLayer.add(LibB, MemoryBuffer::getFile("b1.cpp"));
   CXXLayer.add(LibB, MemoryBuffer::getFile("b2.cpp"));

   // Specify the search order for the main JITDylib. This is equivalent to a
   // "links against" relationship in a command-line link.
   ES.getMainJITDylib().setSearchOrder({{&LibA, false}, {&LibB, false}});
   CXXLayer.add(ES.getMainJITDylib(), MemoryBuffer::getFile("main.cpp"));

   // Look up the JIT'd main, cast it to a function pointer, then call it.
   auto MainSym = ExitOnErr(ES.lookup({&ES.getMainJITDylib()}, "main"));
   auto *Main = (int(*)(int, char*[]))MainSym.getAddress();

   int Result = Main(...);


 This example tells us nothing about *how* or *when* compilation will happen.
 That will depend on the implementation of the hypothetical CXXCompilingLayer,
 but the linking rules will be the same regardless. For example, if a1.cpp and
 a2.cpp both define a function "foo" the API should generate a duplicate
 definition error. On the other hand, if a1.cpp and b1.cpp both define "foo"
 there is no error (different dynamic libraries may define the same symbol). If
 main.cpp refers to "foo", it should bind to the definition in LibA rather than
 the one in LibB, since main.cpp is part of the "main" dylib, and the main dylib
 links against LibA before LibB.
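
 These linking rules can be illustrated with a small toy model in plain C++.
 It is not the ORC API: ``ToyDylib``, ``define`` and ``resolve`` are
 hypothetical names invented for this sketch, and searching the dylib's own
 table before its search order is an assumption of the sketch.

 .. code-block:: c++

   #include <map>
   #include <string>
   #include <vector>

   // Hypothetical stand-in for a JITDylib: a named symbol table plus the
   // list of other dylibs it "links against", in search order.
   struct ToyDylib {
     std::string Name;
     std::map<std::string, std::string> Symbols; // symbol name -> defining file
     std::vector<const ToyDylib *> SearchOrder;

     // Returns false (a duplicate definition error) if Sym is already
     // defined in this dylib.
     bool define(const std::string &Sym, const std::string &File) {
       return Symbols.emplace(Sym, File).second;
     }
   };

   // Resolves Sym as seen from D: D's own table first (an assumption of this
   // sketch), then each dylib in D's search order. Returns "dylib:file", or
   // an empty string if no definition is found.
   std::string resolve(const ToyDylib &D, const std::string &Sym) {
     std::vector<const ToyDylib *> Order(1, &D);
     Order.insert(Order.end(), D.SearchOrder.begin(), D.SearchOrder.end());
     for (const ToyDylib *JD : Order) {
       auto I = JD->Symbols.find(Sym);
       if (I != JD->Symbols.end())
         return JD->Name + ":" + I->second;
     }
     return std::string();
   }

 In this model, defining "foo" twice in the same ``ToyDylib`` makes the second
 ``define`` call fail, defining it once in each of two dylibs succeeds, and a
 lookup from a dylib whose search order lists A before B binds to A's
 definition.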

 Many JIT clients will have no need for this strict adherence to the usual
 ahead-of-time linking rules and should be able to get by just fine by putting
 all of their code in a single JITDylib. However, clients who want to JIT code
 for languages/projects that traditionally rely on ahead-of-time linking (e.g.
 C++) will find that this feature makes life much easier.

 Symbol lookup in ORC serves two other important functions, beyond basic lookup:
 (1) It triggers compilation of the symbol(s) searched for, and (2) it provides
 the synchronization mechanism for concurrent compilation. The pseudo-code for
 the lookup process is:

 .. code-block:: none

   construct a query object from a query set and query handler
   lock the session
   lodge query against requested symbols, collect required materializers (if any)
   unlock the session
   dispatch materializers (if any)

 In this context a materializer is something that provides a working definition
 of a symbol upon request. Generally materializers wrap compilers, but they may
 also wrap a linker directly (if the program representation backing the
 definitions is an object file), or even just a class that writes bits directly
 into memory (if the definitions are stubs). Materialization is the blanket term
 for any actions (compiling, linking, splatting bits, registering with runtimes,
 etc.) that are required to generate a symbol definition that is safe to call or
 access.

 As each materializer completes its work it notifies the JITDylib, which in turn
 notifies any query objects that are waiting on the newly materialized
 definitions. Each query object maintains a count of the number of symbols that
 it is still waiting on, and once this count reaches zero the query object calls
 the query handler with a *SymbolMap* (a map of symbol names to addresses)
 describing the result. If any symbol fails to materialize, the query immediately
 calls the query handler with an error.

 The collected materialization units are sent to the ExecutionSession to be
 dispatched, and the dispatch behavior can be set by the client. By default each
 materializer is run on the calling thread. Clients are free to create new
 threads to run materializers, or to send the work to a work queue for a thread
 pool (this is what LLJIT/LLLazyJIT do).
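
 The bookkeeping performed by a query object (tracking the symbols it is still
 waiting on, then calling the handler exactly once) can be sketched in plain
 C++. This is a toy model rather than ORC's implementation: ``ToyQuery``,
 ``ToySymbolMap``, ``notifyResolved`` and ``notifyFailed`` are hypothetical
 names, and the sketch assumes the query set is non-empty.

 .. code-block:: c++

   #include <cstdint>
   #include <functional>
   #include <map>
   #include <mutex>
   #include <set>
   #include <string>
   #include <utility>

   // Hypothetical stand-in for a map of symbol names to addresses.
   using ToySymbolMap = std::map<std::string, std::uint64_t>;

   class ToyQuery {
   public:
     // Ok is false if any symbol failed to materialize.
     using Handler = std::function<void(bool Ok, const ToySymbolMap &)>;

     ToyQuery(const std::set<std::string> &Names, Handler H)
         : Pending(Names), OnComplete(std::move(H)) {}

     // Called as each definition becomes safe to use. The handler runs once,
     // when the last outstanding symbol is resolved.
     void notifyResolved(const std::string &Name, std::uint64_t Addr) {
       Handler H;
       {
         std::lock_guard<std::mutex> Lock(M);
         if (Done || Pending.erase(Name) == 0)
           return; // already finished, or not a symbol we are waiting on
         Result[Name] = Addr;
         if (!Pending.empty())
           return; // still waiting on other symbols
         Done = true;
         H = std::move(OnComplete);
       }
       H(true, Result); // run outside the lock; Result is no longer modified
     }

     // Called if any symbol fails: the handler is told immediately.
     void notifyFailed() {
       Handler H;
       {
         std::lock_guard<std::mutex> Lock(M);
         if (Done)
           return;
         Done = true;
         H = std::move(OnComplete);
       }
       H(false, ToySymbolMap());
     }

   private:
     std::mutex M;
     std::set<std::string> Pending; // symbols still being waited on
     ToySymbolMap Result;
     Handler OnComplete;
     bool Done = false;
   };

 Materializers running on different threads could each call
 ``notifyResolved`` for the symbols they produce; whichever call resolves the
 last outstanding symbol is the one that runs the handler.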

 Top Level APIs
 ==============

 Many of ORC's top-level APIs are visible in the example above:

 - *ExecutionSession* represents the JIT'd program and provides context for the
   JIT: It contains the JITDylibs, error reporting mechanisms, and dispatches the
   materializers.

 - *JITDylibs* provide the symbol tables.

 - *Layers* (ObjLinkingLayer and CXXLayer) are wrappers around compilers and
   allow clients to add uncompiled program representations supported by those
   compilers to JITDylibs.

 Several other important APIs are used implicitly. JIT clients need not be aware
 of them, but Layer authors will use them:

 - *MaterializationUnit* - When XXXLayer::add is invoked it wraps the given
   program representation (in this example, C++ source) in a MaterializationUnit,
   which is then stored in the JITDylib. MaterializationUnits are responsible for
   describing the definitions they provide, and for unwrapping the program
   representation and passing it back to the layer when compilation is required
   (this ownership shuffle makes writing thread-safe layers easier, since the
   ownership of the program representation will be passed back on the stack,
   rather than having to be fished out of a Layer member, which would require
   synchronization).

 - *MaterializationResponsibility* - When a MaterializationUnit hands a program
   representation back to the layer it comes with an associated
   MaterializationResponsibility object. This object tracks the definitions
   that must be materialized and provides a way to notify the JITDylib once they
   are either successfully materialized or a failure occurs.
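
 The ownership shuffle described for MaterializationUnit can be sketched with
 ``std::unique_ptr`` in plain C++. This is a toy model, not the ORC classes:
 ``ToyUnit``, ``ToyLayer``, ``add``, ``emit`` and ``materialize`` are
 hypothetical names, and the unit is returned to the caller here rather than
 stored in a JITDylib.

 .. code-block:: c++

   #include <memory>
   #include <string>
   #include <utility>
   #include <vector>

   class ToyLayer;

   // Hypothetical unit: describes the symbol it provides and owns the
   // program representation (here, source text) until it is materialized.
   class ToyUnit {
   public:
     ToyUnit(ToyLayer &L, std::string Sym, std::unique_ptr<std::string> Src)
         : Layer(L), Symbol(std::move(Sym)), Source(std::move(Src)) {}

     const std::string &symbol() const { return Symbol; }
     bool ownsSource() const { return Source != nullptr; }

     // Hands ownership of the source back to the layer for compilation.
     void materialize();

   private:
     ToyLayer &Layer;
     std::string Symbol;
     std::unique_ptr<std::string> Source;
   };

   // Hypothetical layer: wraps source in a unit on add() and "compiles" it in
   // emit(). The layer keeps no per-source state between the two calls.
   class ToyLayer {
   public:
     std::unique_ptr<ToyUnit> add(std::string Sym, std::string SourceText) {
       std::unique_ptr<std::string> Src(new std::string(std::move(SourceText)));
       return std::unique_ptr<ToyUnit>(
           new ToyUnit(*this, std::move(Sym), std::move(Src)));
     }

     // The source arrives as an argument, so it does not have to be looked
     // up in a shared member of the layer.
     void emit(const std::string &Sym, std::unique_ptr<std::string> Src) {
       Emitted.push_back(Sym + " <- " + *Src);
     }

     std::vector<std::string> Emitted; // record of "compiled" definitions
   };

   inline void ToyUnit::materialize() { Layer.emit(Symbol, std::move(Source)); }

 Because the source travels with the unit and is passed back by value, two
 units can be materialized on different threads without the layer having to
 guard a shared table of pending sources.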

 Handy utilities
 ===============

 TBD: absolute symbols, aliases, off-the-shelf layers.

 Laziness
 ========

 Laziness in ORC is provided by a utility called "lazy-reexports". The aim of
 this utility is to re-use the synchronization provided by the symbol lookup
 mechanism to make it safe to lazily compile functions, even if calls to the
 stub occur simultaneously on multiple threads of JIT'd code. It does this by
 reducing lazy compilation to symbol lookup: The lazy stub performs a lookup of
 its underlying definition on first call, updating the function body pointer
 once the definition is available. If additional calls arrive on other threads
 while compilation is ongoing they will be safely blocked by the normal lookup
 synchronization guarantee (no result until the result is safe) and can also
 proceed as soon as compilation completes.
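
 The idea of a stub that resolves its body on first call can be sketched in
 plain C++. This is a toy model only: ``ToyLazyStub`` and ``ToyFn`` are
 hypothetical names, and a mutex with a double-checked atomic pointer stands in
 for the synchronization that ORC obtains from symbol lookup.

 .. code-block:: c++

   #include <atomic>
   #include <functional>
   #include <mutex>
   #include <utility>

   using ToyFn = int (*)(int);

   // Hypothetical lazy stub: the first call performs a "lookup" (which stands
   // in for compiling the function body), then caches the resulting pointer.
   class ToyLazyStub {
   public:
     explicit ToyLazyStub(std::function<ToyFn()> L) : Lookup(std::move(L)) {}

     int call(int X) {
       ToyFn F = Body.load(std::memory_order_acquire);
       if (!F) {
         // Callers arriving while the lookup is in progress block here and
         // proceed once the body pointer has been published.
         std::lock_guard<std::mutex> Lock(M);
         F = Body.load(std::memory_order_relaxed);
         if (!F) {
           F = Lookup();
           Body.store(F, std::memory_order_release);
         }
       }
       return F(X);
     }

   private:
     std::function<ToyFn()> Lookup;
     std::atomic<ToyFn> Body{nullptr};
     std::mutex M;
   };

 However many times (and from however many threads) ``call`` is invoked, the
 lookup runs once; later calls take the fast path through the cached pointer.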

 TBD: Usage example.

 Supporting Custom Compilers
 ===========================

 TBD.

 Low Level (MCJIT style) Use
 ===========================

 TBD.

 Future Features
 ===============

 TBD: Speculative compilation. Object Caches.

 .. [1] Formats/architectures vary in terms of supported features. MachO and
        ELF tend to have better support than COFF. Patches very welcome!