Contextual and context-free microcode

September 2, 2019

A key part of Catnip’s architecture is the use of internal microcode. As an intermediate step prior to execution, IL instructions are converted into this microcode, which is then what actually gets executed by the interpreter (or subsequently converted into native code).

The primary reason for this is that IL instructions are heavily contextual - for example, the ADD instruction may be adding integers or floating-point values of either 32-bit or 64-bit width. Obviously enough, in order to execute the instruction it’s necessary to know specifically what it is trying to do. In the case of ADD, the correct operation can be chosen based on the types on the evaluation stack at that point - which in turn can be determined by examining the prior instructions in the method.
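
To give a rough idea of what that determination involves, here’s a sketch of the kind of conversion-time type tracking required - this is purely illustrative rather than Catnip’s actual code, and of the opcode names below only ADD_I32_I32 actually appears later in this post:

using System;
using System.Collections.Generic;

enum StackType { I32, I64, F32, F64 }

static class AddResolver
{
    // Pops the two operand types an ADD would see, checks that they agree,
    // pushes the result type and names the concrete microcode op to emit.
    // (Real IL permits a few mixed cases, e.g. int32 + native int; this
    // sketch keeps things strict for brevity.)
    public static string ResolveAdd(Stack<StackType> evalStack)
    {
        StackType b = evalStack.Pop();
        StackType a = evalStack.Pop();
        if (a != b)
            throw new InvalidOperationException("ADD operand types differ: " + a + " + " + b);
        evalStack.Push(a); // the result remains on the stack for later instructions

        switch (a)
        {
            case StackType.I32: return "ADD_I32_I32";
            case StackType.I64: return "ADD_I64_I64";
            case StackType.F32: return "ADD_F32_F32";
            default:            return "ADD_F64_F64";
        }
    }
}

However, whilst type information from the evaluation stack is the most common source of context for instructions, it is by no means the only one.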

For example, an LDFLD instruction (which loads a value from a field) contains a token that references the field in question. To actually execute this it is necessary to know the type of the field as well as the offset at which it appears in the object, which is calculated by the runtime when it lays out the object. Even if this has already been determined, looking up the required value can be very time-consuming (especially in the case where the type being referenced is in a different assembly, in which case a name lookup may be required to find it).
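
To give a feel for the shape of that lookup, here’s a hypothetical sketch of resolving a field token once at conversion time and caching the result - the types and the slow path are invented for illustration, not taken from Catnip:

using System;
using System.Collections.Generic;

struct ResolvedField
{
    public int Offset; // byte offset of the field within the object
    public int Size;   // field size in bytes
}

class FieldTokenCache
{
    readonly Dictionary<int, ResolvedField> _cache = new Dictionary<int, ResolvedField>();
    readonly Func<int, ResolvedField> _slowResolve; // metadata walk, possibly cross-assembly

    public FieldTokenCache(Func<int, ResolvedField> slowResolve)
    {
        _slowResolve = slowResolve;
    }

    public ResolvedField Resolve(int fieldToken)
    {
        ResolvedField field;
        if (!_cache.TryGetValue(fieldToken, out field))
        {
            field = _slowResolve(fieldToken); // pay the expensive lookup once...
            _cache[fieldToken] = field;       // ...then reuse the result from here on
        }
        return field;
    }
}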

IL is structured this way for a number of reasons - inferring the types of operands from the evaluation stack both reduces the size of the bytecode and makes ensuring type safety much simpler, as it is much less likely that the operand type expected by an instruction differs from the type of data actually provided to it. Referencing fields and types via tokens means that IL can be generated without needing to know the specifics of type layouts, which in turn allows assemblies to reference each other without strongly tying their versions together (so a change to structure layout in one assembly does not require any changes to other assemblies).

Essentially, the IL representation of instructions is very well-optimised for almost everything except execution, which is not entirely surprising when you consider that the intended usage is to generate native code via JIT compilation. However, this makes life pretty awkward for interpreters, and hence Catnip takes the approach of converting IL into a context-free internal microcode that is much faster to execute.

What “context-free” means in the case of Catnip is that the internal microcode instructions are fully self-contained, in the same way that instructions on a real processor generally are. Each instruction can be executed without needing to refer to any other instructions in the method, or (for the most part) external information.
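
In code terms, “fully self-contained” means something like the following hypothetical layout (not Catnip’s actual structures) - every operand the interpreter will need is baked directly into the decoded instruction:

enum Op { LDC_32, LDLOC_32, LDFLD_64, ADD_I32_I32, ADD_I64_I64, CHECKSTATICDATA, LDSFLD_U1 /* ... */ }

struct MicroOp
{
    public Op Opcode;
    public int Operand0; // e.g. a constant value, a stack slot or a field offset
    public int Operand1; // e.g. a second operand such as a runtime type index
}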

For example, here’s a simple method from one of the Catnip test programs:

public static void DebugWriteLine(string msg)
{
    Catnip.MinCorLibUtils.DebugWriteLine(msg);
    if (!RunningUnderCatnip)
    {
        Console.WriteLine(msg);
    }
}

When converted, it looks like this (the indented lines with instruction names in caps are Catnip microcode; the unindented lines are the original IL):

    STARTSTACKFRAME 0 (0x00000000)
ldarg.0 
    LDLOC_32 -16 (0xfffffff0)
call "void Catnip.MinCorLibUtils.DebugWriteLine(string)"
    CALL Method = void Catnip.MinCorLibUtils.DebugWriteLine(string), 
        OurMethod = void TestApp.Utils.DebugWriteLine(string),
        ExceptionHandlerOffset = -1 (null target)
call "bool TestApp.Utils.get_RunningUnderCatnip()"
    CALL Method = bool TestApp.Utils.get_RunningUnderCatnip(),
        OurMethod = void TestApp.Utils.DebugWriteLine(string),
        ExceptionHandlerOffset = -1 (null target)
brtrue.s 0x0013
    BRNZ_U4 8 (0x025CB7D4)
ldarg.0 
    LDLOC_32 -16 (0xfffffff0)
call "void System.Console.WriteLine(string)"
    CHECKSTATICDATA System.Console
    CALL Method = void System.Console.WriteLine(string),
        OurMethod = void TestApp.Utils.DebugWriteLine(string),
        ExceptionHandlerOffset = -1 (null target)
ret 
    CHECKSTACKFRAMESIZE MinExpectedSize = 0 (0x00000000), 
        MaxExpectedSize = 0 (0x00000000)
    ENDSTACKFRAME
    RET NumArgumentBytes = 4 (0x00000004),
        NumReturnBytes = 0 (0x00000000),
        LocalStorageSize = 0 (0x00000000)

By pre-calculating as much as possible (for example, the offsets of fields within objects), the interpreter only needs to perform the operations that involve dynamic data - in other words, the actual values the instruction is operating on. It’s possible therefore to view this entire exercise as one in caching for optimisation purposes - performing a pre-process using the known static data (the IL code, structure layouts and the like) with the aim of eliminating as many calculations as possible.

So, as a simple example, the following IL:

ldc.i4 1
ldc.i4 2
add

…gets converted to the following Catnip microcode:

LDC_32 1
LDC_32 2
ADD_I32_I32

For the purposes of this example we’ll ignore the fact that the entire calculation could be precalculated.

Essentially, the two constant loads remain more or less unchanged (aside from the small semantic difference that IL declares the full type of the value - a 32-bit integer in this case - whilst Catnip only cares about the size and not the signed-ness), but the add instruction has been changed to an explicit “32 bit signed integer + 32 bit signed integer” ADD_I32_I32.

It can be seen in this example how some of the context has migrated to where it is needed - the load instructions don’t care about signed-ness as they are simply moving bytes around, but the ADD instruction does and thus has gained that information.
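
Reusing the hypothetical MicroOp layout from earlier, the interpreter’s inner loop for this sequence becomes almost trivial - each handler just moves bytes around, and ADD applies the one operation that was decided at conversion time:

static void Run(MicroOp[] program)
{
    int sp = 0;
    int[] stack = new int[64]; // the evaluation stack, now holding values rather than types

    foreach (MicroOp op in program)
    {
        switch (op.Opcode)
        {
            case Op.LDC_32:
                stack[sp++] = op.Operand0; // push the baked-in constant
                break;
            case Op.ADD_I32_I32:
                sp--;
                stack[sp - 1] = stack[sp - 1] + stack[sp]; // pop two ints, push their sum
                break;
        }
    }
}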

A more complex example might be a load from a field, such as this:

ldfld System.TimeSpan._ticks

…which produces the following Catnip microcode:

LDFLD_64 0

As described above, what has happened here is that the field reference token was dereferenced to find the field itself, and then the concrete offset of that field in the object calculated (which in this instance happened to be 0). This, along with information about the field type (a 64-bit integer), allows Catnip to generate a load instruction with a specific offset and size.
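
Continuing the hypothetical interpreter sketch, the handler for this load is a single indexed read, because both the offset and the size were fixed at conversion time. PopObjectRef and PushInt64 are assumed stack helpers, and Marshal.ReadInt64 (from System.Runtime.InteropServices) stands in for whatever raw memory read the real interpreter performs:

case Op.LDFLD_64:
{
    IntPtr obj = PopObjectRef();                      // object reference off the stack
    long value = Marshal.ReadInt64(obj, op.Operand0); // 8-byte read at the baked-in offset
    PushInt64(value);
    break;
}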

Whilst the examples above have shown a 1:1 mapping between IL instructions and Catnip microcode, this isn’t always the case - for example, loading a value from a static field looks like this:

ldsfld System.Diagnostics.Stopwatch.IsHighResolution

In Catnip microcode:

CHECKSTATICDATA System.Diagnostics.Stopwatch
LDSFLD_U1 RuntimeType = System.Diagnostics.Stopwatch, Offset = 8

What has happened here is that Catnip has decomposed the IL into two instructions - the first checks to make sure that the static data for the class being accessed has been correctly initialised (as the CLR guarantees that static data will always be initialised prior to access), whilst the second actually performs the load.

In this instance, the instruction also gets a pointer to the runtime type information for the class (which the disassembler has converted to a name here) - this is required to find the base address of the static data area.

This division of responsibilities is useful for two reasons - firstly, it allows the initialisation check to be eliminated in cases where the code generator can prove that the field must already have been initialised prior to this. Secondly, the load instruction is a more generic “load data from this offset from a class static data region” which can potentially be reused in other scenarios.
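
Sketched as interpreter handlers (again hypothetical - staticData, RunClassConstructor and PushInt32 are assumed helpers, with Operand0 holding a runtime type index and Operand1 the field offset), the division looks something like this:

case Op.CHECKSTATICDATA:
    if (!staticData[op.Operand0].Initialised) // has the class constructor run yet?
        RunClassConstructor(op.Operand0);     // run it now - note that it may throw!
    break;

case Op.LDSFLD_U1:
    // Generic "load a byte at this offset within a class's static data region".
    PushInt32(staticData[op.Operand0].Bytes[op.Operand1]);
    break;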

Decomposing instructions like this can have unexpected (and unwanted!) side-effects, though, especially when exceptions are involved, as they can cause control to leave from “half-way” through a decomposed instruction - consider the case where a static class constructor gets executed as part of the initialisation check and then throws an exception, as in the example below.
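
Reading the field below compiles to a single ldsfld, but in Catnip microcode it becomes a CHECKSTATICDATA followed by a load, and the exception from the class constructor surfaces between the two:

static class Flaky
{
    public static readonly bool Ready;

    static Flaky()
    {
        // Runs as part of the initialisation check; throwing here means control
        // leaves "half-way" through what was a single ldsfld in the original IL.
        throw new InvalidOperationException("static initialisation failed");
    }
}

// Elsewhere: bool r = Flaky.Ready; - one IL instruction, two microcode
// instructions, with the exception surfacing between them.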

Instruction reuse allows for some interesting short-cuts - for example, Catnip does not have any explicit instructions for manipulating unmanaged pointers, but instead simply emits 32 / 64-bit integer operations instead as appropriate for the target architecture (managed pointers do have specialised instructions, as it is necessary to insert write barriers for the garbage collector).
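
As a tiny sketch of that conversion-time choice (hypothetical, and assuming an ADD_I64_I64 opcode alongside the ADD_I32_I32 seen earlier):

// Unmanaged pointer arithmetic simply reuses the integer add opcode
// sized for the target architecture - no pointer-specific opcode needed.
static Op UnmanagedPointerAddOp(bool target64Bit)
{
    return target64Bit ? Op.ADD_I64_I64 : Op.ADD_I32_I32;
}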

It is, of course, also true that decomposing instructions adds some additional interpreter overhead due to needing to fetch/dispatch extra instructions - at present Catnip just lives with this (relatively small) performance hit, but in the future my intention is to add a mechanism to generate compound instructions to efficiently execute commonly-used microcode sequences.

So that’s a brief look at how MSIL instructions make use of contextual information as a way of optimising for code size and complexity, and the approach Catnip takes to convert that representation into one suited for efficient execution.