JHCR Bytecode Introduction

lep

In this post we’re going to look at everything related to the bytecode. The bytecode was custom-designed to run reasonably fast in our Jass context, so that it would actually be usable and a net positive for map development.

One key insight is that the interpreter works on only one global hashtable. We use the same technique as many other table libraries: a unique integer serves as the first index into the table and our provided key as the second. Another great feature of hashtables that we use extensively is that any integer, including a negative one, can be used as either index. We will see later how that is useful.
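To make the two-level indexing concrete, here is a minimal Python sketch that models the single global hashtable as a dictionary keyed by a (scope, key) pair. The names `table`, `new_scope`, `save` and `load` are made up for this illustration and are not part of JHCR; the real interpreter goes through the Jass hashtable natives instead.

```python
# Model of one global hashtable with two integer indices (hypothetical names).
table = {}
_next_scope = 0

def new_scope():
    """Hand out a fresh unique integer to use as the first index."""
    global _next_scope
    _next_scope += 1
    return _next_scope

def save(scope, key, value):
    table[(scope, key)] = value

def load(scope, key):
    return table[(scope, key)]

a = new_scope()
b = new_scope()
save(a, 1, "first")
save(b, 1, "second")   # same key, different scope: no clash
save(a, -1, "temp")    # negative keys are just as valid as positive ones

print(load(a, 1), load(b, 1), load(a, -1))
```

Every user of the table only needs its own unique first index, which is what lets a single hashtable back many independent "tables".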

Now let’s first look at the two definitions of our bytecode. We have one in Haskell, which the compiler targets, and a corresponding one in Jass. You can find them here and here.

data Instruction
    -- three regs
    = Lt                Type Register Register Register
    | Le                Type Register Register Register
    | Gt                Type Register Register Register
    | Ge                Type Register Register Register
    | Eq                Type Register Register Register
    | Neq               Type Register Register Register
    | Add               Type Register Register Register
    | Sub               Type Register Register Register
    | Mul               Type Register Register Register
    | Div               Type Register Register Register
    
    -- Mod only works on integers but for ease we still include the Type
    | Mod               Type Register Register Register

    | SetGlobalArray    Type Register Register Register
    | GetGlobalArray    Type Register Register Register
    | SetLocalArray     Type Register Register Register
    | GetLocalArray     Type Register Register Register
    
    -- two regs
    | Negate    Type Register Register
    | Set       Type Register Register
    | SetGlobal Type Register Register
    | GetGlobal Type Register Register
    | Bind      Type Register Register

    -- special
    | Literal Type Register (Hot.Ast Var Expr) -- encoded as: lit ty reg len string
    | Call Register Label Name
    | Convert Type Register Type Register

    -- one label
    | Label Label
    | Jmp Label
    | Function Label Name
    
    | JmpT Label Register
    
    | Not Register Register

    | Ret Type
    deriving (Show)

A quick glance shows that the instructions use four different data types: Type, Register, Label, and Name.

Let’s look at them one by one.

Type

Type is just a plain old Jass type like handle, integer, widget etc. There are two reasons we have a typed bytecode. The first is, once again, that we use hashtables to store everything, and the hashtable API has a separate native for each kind of value (SaveInteger, SaveReal, and so on), so the interpreter has to know which one to call. The second is that it avoids redundancy in our instruction set (only one Add instruction, etc.).
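As a rough illustration of the second point, the following Python sketch uses a single add handler whose Type operand selects the storage. The `stores` dictionary and `exec_add` are assumptions made for this example; each inner dictionary merely stands in for one family of typed hashtable natives.

```python
# One "add" handler for all types; the Type operand picks the storage.
# Each inner dict stands in for one family of typed hashtable natives
# (an assumption made for this sketch, not JHCR's actual layout).
stores = {"integer": {}, "real": {}}

def exec_add(ty, dst, lhs, rhs):
    regs = stores[ty]              # choose the typed accessors
    regs[dst] = regs[lhs] + regs[rhs]

stores["integer"][1] = 2
stores["integer"][2] = 3
exec_add("integer", 0, 1, 2)

stores["real"][1] = 0.5
stores["real"][2] = 0.25
exec_add("real", 0, 1, 2)

print(stores["integer"][0], stores["real"][0])
```

Without the Type operand, an untyped design would presumably need a separate instruction per type, such as one add for integers and one for reals.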

Register

The instruction set used in JHCR is register-based, which follows from our use of hashtables, since they give us effectively unlimited registers. This especially helps in the code-generation phase since we don’t have to care about register allocation (an NP-hard problem). It has some more benefits, which I might talk about in a later post.

But even though we have more or less unlimited registers we still have to follow some rules, which result in the calling convention used in JHCR. To explore this let’s have a look at a small example.

native print_integer takes integer x returns nothing

function add takes integer a, integer b returns integer
    return a+b
endfunction

function main takes nothing returns nothing
    local integer r = add(1, 3)
    call print_integer(r)
endfunction

This snippet could be compiled to this bytecode:

fun 123 add
add integer 0 1 2
ret integer

fun 234 main
lit integer -1 1
lit integer -2 3
bind integer 1 -1
bind integer 2 -2
call 1 123 add
bind integer 1 1
call -3 567 print_integer
ret nothing

Let’s look at the add function first. It shows that the parameters of a function are stored in registers 1 to n, where n is the number of parameters the function takes, and that the return value is stored in register 0. This is our calling convention.

Now let’s look at the main function, which shows a few more things. First, we have to use the bind instruction to transfer values from our own “local” registers to the registers of the to-be-called function. So unlike the registers found in your CPU, ours are scoped per function. Second, temporary values are assigned to “negative” registers -1, -2, etc. As I said before, this makes compiling a bit easier, and it doesn’t cost us much since we can use the “endless” hashtables. Third, local variables use positive registers as well: they occupy registers n+1 .. n+m, where m is the number of local variables declared inside that function. Since main takes no parameters, its only local r lives in register 1. Finally, we can see how the call instruction is used, since the return value is copied directly into the register of our local variable:

call 1 123 add
     ^  ^   ^
     |  |   |
     |  |   +- The name of the function is omitted in the final bytecode but
     |  |   `- it's useful for debugging.
     |  |
     |  `- Just the internal id of the function to be called
     |
     +- The interpreter takes whatever is stored in the called function's
     +- register 0 and puts it into this function's register 1, which is
     +- the local variable r. Technically the transfer of data actually
     +- happens when interpreting the ret instruction since we lack the
     `- type information in the call instruction.
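The calling convention can be tried out with a toy interpreter. The Python sketch below runs the example bytecode from above; `PROGRAM`, `NATIVES`, `printed` and `run_function` are hypothetical names, and instructions are modelled as plain tuples rather than JHCR's real encoding. As a simplification it copies the return value while handling call instead of ret, and it ignores the Type operand.

```python
# Toy interpreter for the example above (a sketch, not JHCR's real code).
PROGRAM = {
    123: [("add", "integer", 0, 1, 2),
          ("ret", "integer")],
    234: [("lit", "integer", -1, 1),
          ("lit", "integer", -2, 3),
          ("bind", "integer", 1, -1),
          ("bind", "integer", 2, -2),
          ("call", 1, 123),
          ("bind", "integer", 1, 1),
          ("call", -3, 567),
          ("ret", "nothing")],
}

printed = []                       # stands in for the game's output
NATIVES = {567: lambda regs: printed.append(regs[1])}  # print_integer

def run_function(fn_id, regs):
    """Run one function with its own register scope; return register 0."""
    if fn_id in NATIVES:
        return NATIVES[fn_id](regs)
    callee_regs = {}               # scope of the next function to be called
    for ins in PROGRAM[fn_id]:
        op = ins[0]
        if op == "lit":
            _, _ty, reg, value = ins
            regs[reg] = value
        elif op == "add":
            _, _ty, dst, lhs, rhs = ins
            regs[dst] = regs[lhs] + regs[rhs]
        elif op == "bind":
            _, _ty, target, source = ins
            callee_regs[target] = regs[source]   # caller register -> callee parameter
        elif op == "call":
            _, dst, callee = ins
            regs[dst] = run_function(callee, callee_regs)
            callee_regs = {}       # fresh scope for the following call
        elif op == "ret":
            return regs.get(0)     # the return value lives in register 0

run_function(234, {})
print(printed)
```

Each invocation gets its own dictionary of registers, which mirrors the per-function scoping described above: register 1 in main (the local r) and register 1 in add (the parameter a) never collide.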

Question to the reader: how do you do local arrays with this setup?

Label

To achieve loops, conditions and all that good stuff we use labels and jump instructions. There aren’t many of them: Label, Jmp, and JmpT. The Label instruction just marks a position in the list of instructions under that “label”. Do note that labels are static, so you can’t create labels to jump to dynamically. When the interpreter comes across a label instruction while executing, it simply ignores it. The Jmp instruction, on the other hand, jumps to the corresponding label instruction. The following snippet is an infinite loop.

label 3
jmp 3

Finally, the JmpT instruction jumps to the label if and only if the value stored in the given register is true. For an example, look at the code below: an if-statement compiled to our bytecode, where the condition is stored in register -1. Because JmpT only jumps on true, the else branch comes first and the if branch sits behind label 2.

jmpt 2 -1
<else branch>
jmp 3
label 2
<if branch>
label 3
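To see the control flow in action, here is a small Python sketch that executes the if-statement above. It assumes that label positions can be resolved once before running, which is possible because labels are static. The `mark` pseudo-instruction is invented for this example as a stand-in for the branch bodies, and `run_code` is a hypothetical name.

```python
def run_code(code, regs):
    """Execute a list of instructions; return which branch bodies ran."""
    # Labels are static, so their positions can be resolved once up front.
    labels = {ins[1]: pc for pc, ins in enumerate(code) if ins[0] == "label"}
    trace = []
    pc = 0
    while pc < len(code):
        ins = code[pc]
        op = ins[0]
        if op == "jmp":
            pc = labels[ins[1]]
            continue
        if op == "jmpt" and regs[ins[2]]:
            pc = labels[ins[1]]
            continue
        if op == "mark":           # placeholder for a branch body
            trace.append(ins[1])
        # "label" and a not-taken "jmpt" fall through to the next instruction
        pc += 1
    return trace

IF_STATEMENT = [
    ("jmpt", 2, -1),
    ("mark", "else branch"),
    ("jmp", 3),
    ("label", 2),
    ("mark", "if branch"),
    ("label", 3),
]

print(run_code(IF_STATEMENT, {-1: True}))
print(run_code(IF_STATEMENT, {-1: False}))
```

Feeding the two-instruction loop from earlier into `run_code` would never terminate, as expected.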

Name

The Name datatype is mostly ignored because we assign unique ids to everything we can. There is only one case where we actually have to use an actual name. Can you guess?


If we reload the script with a totally new function that wasn’t seen before, that new function of course also gets an id, but to link the name to the new id in our already running map we have to transfer both. And we need that mapping for ExecuteFunc only (I think).

In the code to load new bytecode we handle that.
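A minimal sketch of such a mapping, written in Python with made-up names (`name_to_id`, `load_function`, `execute_func`) and a made-up id for the new function, might look like this. It only illustrates the idea of registering a name together with its id on reload and is not how JHCR actually implements it.

```python
# Hypothetical name -> id table kept by the running map.
name_to_id = {"add": 123, "main": 234}

def load_function(fn_id, name):
    """Register a function from freshly loaded bytecode (fun <id> <name>)."""
    name_to_id[name] = fn_id

def execute_func(name):
    """Resolve a name to an id, as an ExecuteFunc-style call by name would need."""
    return name_to_id[name]

load_function(890, "brand_new")    # a function first seen in a reload
print(execute_func("brand_new"))
```

Ordinary calls never need this table since the call instruction already carries the id.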

fin

Now that we have looked at the datatype representing the bytecode, we can guess how many of those instructions actually work. But as you might have guessed, this is still only a fairly high-level description of what JHCR is doing, so I hope I can shed some light on the nitty-gritty details in a later post.
