JHCR Bytecode Introduction

In this post we’re going to look at everything related to the bytecode. The bytecode was custom-designed to run reasonably fast in our Jass context, so that it is actually usable and a net positive for map development.

One key insight is that the interpreter works on a single global hashtable. We use the same technique as many other table libraries: a unique integer serves as the first index into the table and the key we are given serves as the second. Another great feature of hashtables that we use extensively is that any integer can be used as either index. We will see later how that is useful.
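To make the two-level indexing concrete, here is a minimal Haskell sketch that models the single global hashtable as a map keyed by a pair of integers. The names `Table`, `save` and `load` are hypothetical and only mirror the idea; the real runtime uses the Jass hashtable natives instead. The sketch assumes that a missing slot reads as 0.

```haskell
import qualified Data.Map.Strict as Map

-- One global table addressed by (table id, key), standing in for the
-- single Jass hashtable. All names here are illustrative assumptions.
type Table = Map.Map (Int, Int) Int

-- Store a value under a table id (first index) and a key (second index).
save :: Int -> Int -> Int -> Table -> Table
save tableId key value = Map.insert (tableId, key) value

-- Load a value; in this sketch a missing slot reads as 0.
load :: Int -> Int -> Table -> Int
load tableId key = Map.findWithDefault 0 (tableId, key)

-- Two logical tables sharing one store; note the negative key.
example :: Table
example = save 2 (-1) 7 (save 1 0 42 Map.empty)
```

Here `load 1 0 example` reads back the value stored in table 1, while table 2 keeps its own entry under key -1 without any clash.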

Now let’s first look at the two definitions of our bytecode. We have one in Haskell which we compile to and one corresponding file in Jass. You can find them here and here.

data Instruction
    -- three regs
    = Lt                Type Register Register Register
    | Le                Type Register Register Register
    | Gt                Type Register Register Register
    | Ge                Type Register Register Register
    | Eq                Type Register Register Register
    | Neq               Type Register Register Register
    | Add               Type Register Register Register
    | Sub               Type Register Register Register
    | Mul               Type Register Register Register
    | Div               Type Register Register Register
    
    -- Mod only works on integers but for ease we still include the Type
    | Mod               Type Register Register Register

    | SetGlobalArray    Type Register Register Register
    | GetGlobalArray    Type Register Register Register
    | SetLocalArray     Type Register Register Register
    | GetLocalArray     Type Register Register Register
    
    -- two regs
    | Negate    Type Register Register
    | Set       Type Register Register
    | SetGlobal Type Register Register
    | GetGlobal Type Register Register
    | Bind      Type Register Register

    -- special
    | Literal Type Register (Hot.Ast Var Expr) -- encoded as: lit ty reg len string
    | Call Register Label Name
    | Convert Type Register Type Register

    -- one label
    | Label Label
    | Jmp Label
    | Function Label Name
    
    | JmpT Label Register
    
    | Not Register Register

    | Ret Type
    deriving (Show)

A quick glance shows that the instructions use four different data types: Type, Register, Label and Name.

Type

Type is just a plain old Jass type like handle, integer, widget etc. There are two reasons we have a typed bytecode. The first is, once again, that we store everything in hashtables, and the hashtable API requires us to call the native that matches the type. The second is that it avoids redundancy in our instruction set (there is only one Add instruction, and so on).
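The idea of one typed Add instruction can be sketched as follows. This is a simplified illustration, not the JHCR implementation: `Type` is cut down to two cases, and `Value` and `add` are hypothetical names introduced only for this example.

```haskell
-- A cut-down stand-in for the Jass types carried by each instruction.
data Type = TInteger | TReal deriving (Show, Eq)

-- Hypothetical runtime values, one constructor per supported type.
data Value = IntV Int | RealV Double deriving (Show, Eq)

-- A single Add covers every numeric type; the Type tag selects the
-- arithmetic, much like it would select LoadInteger or LoadReal in Jass.
add :: Type -> Value -> Value -> Maybe Value
add TInteger (IntV a) (IntV b) = Just (IntV (a + b))
add TReal (RealV a) (RealV b) = Just (RealV (a + b))
add _ _ _ = Nothing -- the tag and the operands disagree
```

Without the type tag we would need one instruction per type (AddInteger, AddReal, ...), which is the redundancy the typed bytecode avoids.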

Register

The instruction set used in JHCR is register-based, which follows from our use of hashtables: we effectively have unlimited registers. This especially helps in the code-generation phase, since we don’t have to care about register allocation (an NP-hard problem). It has some further benefits, which I might talk about in a later post.

But even though we have more or less unlimited registers we still have to follow some rules, which result in the calling convention used in JHCR. To explore this let’s have a look at a small example.

Let’s look at the add function first. From this function we can see that the parameters of a function are stored in registers 1 to n, where n is the number of parameters the function takes. We can also see that the return value is stored in register 0. This is our calling convention.

Now let’s look at the main function. From it we can see a few more things. First, we have to use the Bind instruction to transfer values from the caller’s local registers to the registers of the to-be-called function. So unlike the registers found in your CPU, ours are scoped per function. Second, temporary values are assigned to “negative” registers -1, -2, etc. As I said before, this makes compiling a bit easier, and it doesn’t cost us much since we can use the “endless” hashtables. Local variables also use positive registers: they occupy registers n+1 .. n+m, where m is the number of local variables declared inside the function. Finally, we can see how the Call instruction is used, since the return value is copied directly into the register of our local variable.
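The register layout described above can be summarised in a few lines of Haskell. This is only a sketch of the numbering scheme; `Slot` and `registerOf` are hypothetical names and not part of the JHCR compiler.

```haskell
-- The kinds of values a function keeps in registers (illustrative only).
data Slot
  = ReturnValue -- the function's result
  | Param Int   -- i-th parameter, counted from 1
  | Local Int   -- j-th local variable, counted from 1
  | Temp Int    -- k-th temporary, counted from 1
  deriving (Show, Eq)

-- Map a slot to its register number, given the number of parameters n.
registerOf :: Int -> Slot -> Int
registerOf _ ReturnValue = 0
registerOf _ (Param i) = i             -- parameters use 1 .. n
registerOf nParams (Local j) = nParams + j -- locals use n+1 .. n+m
registerOf _ (Temp k) = negate k       -- temporaries use -1, -2, ...
```

For a function with two parameters and one local variable, the parameters land in registers 1 and 2, the local in register 3, and the first temporary in register -1.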

Label

To achieve loops, conditions and all that good stuff we use labels and jump instructions. There aren’t many of them: Label, Jmp and JmpT. The Label instruction simply marks a point in the list of instructions under that “label”. Note that labels are static, so you can’t create dynamic labels to jump to. When executing, the interpreter simply ignores any Label instruction. The Jmp instruction, on the other hand, jumps to the corresponding Label instruction. The following snippet is an infinite loop.
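As a hedged illustration of how static labels and Jmp interact, the sketch below resolves every label to a position in a pre-pass and then executes a tiny program with a step budget. The cut-down `Instr` type, the `Nop` instruction, `labelTable` and `run` are assumptions made for this example and are not taken from the JHCR sources.

```haskell
import qualified Data.Map.Strict as Map

-- A cut-down instruction type; Nop is a hypothetical placeholder body.
data Instr = Label Int | Jmp Int | Nop deriving (Show, Eq)

-- Labels are static, so they can be resolved to positions up front.
labelTable :: [Instr] -> Map.Map Int Int
labelTable prog = Map.fromList [(l, i) | (i, Label l) <- zip [0 ..] prog]

-- Run for at most `fuel` steps. Nothing means the budget ran out while
-- the program was still running; Just n means it ended after n steps.
run :: Int -> [Instr] -> Maybe Int
run fuel prog = go fuel 0 0
  where
    labels = labelTable prog
    go f pc steps
      | pc >= length prog = Just steps
      | f <= 0 = Nothing
      | otherwise = case prog !! pc of
          Jmp l -> case Map.lookup l labels of
            Just target -> go (f - 1) target (steps + 1)
            Nothing -> Just steps -- unknown label: stop (sketch only)
          _ -> go (f - 1) (pc + 1) (steps + 1) -- Label and Nop do nothing

-- Jumping back to a label placed earlier never terminates.
infiniteLoop :: [Instr]
infiniteLoop = [Label 1, Nop, Jmp 1]
```

Running `infiniteLoop` exhausts any step budget, whereas the same program without the trailing Jmp simply falls off the end.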

Finally, the JmpT instruction jumps to the label if and only if the value stored in the register is true. For an example, look at the code below. That code is an if-statement compiled to our bytecode, where the condition is stored in register -1.
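One possible encoding of such an if-statement, together with a tiny interpreter for it, might look like the sketch below. The exact sequence the JHCR compiler emits may differ; the `SetTrue` instruction, the boolean-only registers and the helper names are assumptions made purely for illustration.

```haskell
import qualified Data.Map.Strict as Map

data Instr
  = Label Int
  | Jmp Int
  | JmpT Int Int -- label, then the register holding the condition
  | SetTrue Int  -- hypothetical stand-in for the body of the if
  deriving (Show, Eq)

-- Registers hold only booleans in this sketch.
type Regs = Map.Map Int Bool

run :: [Instr] -> Regs -> Regs
run prog = go 0
  where
    labels = Map.fromList [(l, i) | (i, Label l) <- zip [0 ..] prog]
    -- An unknown label jumps past the end, which ends the program.
    target l = Map.findWithDefault (length prog) l labels
    go pc regs
      | pc >= length prog = regs
      | otherwise = case prog !! pc of
          Label _ -> go (pc + 1) regs
          Jmp l -> go (target l) regs
          JmpT l r
            | Map.findWithDefault False r regs -> go (target l) regs
            | otherwise -> go (pc + 1) regs
          SetTrue r -> go (pc + 1) (Map.insert r True regs)

-- if <register -1> then <set register 1> endif
ifProgram :: [Instr]
ifProgram =
  [ JmpT 1 (-1) -- condition true: go to the body
  , Jmp 2       -- condition false: skip the body
  , Label 1
  , SetTrue 1   -- body of the if
  , Label 2
  ]

-- Did the body run for the given condition?
runIf :: Bool -> Bool
runIf cond = Map.findWithDefault False 1 (run ifProgram (Map.singleton (-1) cond))
```

With the condition set to true the body runs and register 1 ends up true; with false the Jmp skips straight to the end label.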

Name

The Name datatype is mostly ignored because we assign unique ids to everything we can. There is only one case where we actually have to use an actual name. Can you guess?

If we reload the script with a totally new function that wasn’t seen before, that new function of course also gets an id, but to link the name to the new id in our already running map we have to transfer both. And we need that mapping only for ExecuteFunc (I think).
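The name-to-id link can be pictured as a small lookup table that the running map extends whenever a new function arrives. The sketch below is a simplified model; `NameTable`, `defineFunction` and `resolve` are hypothetical names, not the actual JHCR runtime API.

```haskell
import qualified Data.Map.Strict as Map

-- Maps a function name to the id assigned by the compiler.
type NameTable = Map.Map String Int

-- Record a newly transferred function: both its id and its name.
defineFunction :: Int -> String -> NameTable -> NameTable
defineFunction functionId name = Map.insert name functionId

-- Look a function up by name, as a call by name would need to.
resolve :: String -> NameTable -> Maybe Int
resolve = Map.lookup
```

A function that was never transferred stays unresolved, which is why both the id and the name have to be sent for functions that are new after a reload.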

fin

Now that we have looked at the datatype representing the bytecode, we can guess how many of those instructions actually work. But as you might have guessed, this is still only a fairly high-level description of what JHCR is doing, so I hope I can shed some light on the nitty-gritty details in a later post.