In this post we’re going to look at everything related to the bytecode. The bytecode was custom-designed to run reasonably fast in our Jass context, so that it would actually be usable and be a net positive for map development.
One key insight is that the interpreter works on only one global hashtable. We use the same technique as many other table libraries: use a unique integer as the first index into the hashtable and our provided key as the second. Another great feature of hashtables that we use extensively is that any integer, negative ones included, can be used as either index. We will later see how that is useful.
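To make the two-key scheme concrete, here is a toy Haskell model of a single global hashtable. It assumes a plain Data.Map keyed by (parentKey, childKey) is a fair stand-in for Jass’ SaveInteger/LoadInteger; the table ids 1 and 2 are made up for illustration and this is not JHCR’s code.

```haskell
import qualified Data.Map.Strict as M

-- Toy model of one global hashtable (a sketch, not JHCR's implementation):
-- every entry is addressed by two integer keys.
type Hashtable = M.Map (Int, Int) Int

-- Store an integer under (parentKey, childKey), like Jass' SaveInteger.
save :: Int -> Int -> Int -> Hashtable -> Hashtable
save parent child v = M.insert (parent, child) v

-- Load an integer; missing entries read as 0, like Jass' LoadInteger.
load :: Int -> Int -> Hashtable -> Int
load parent child = M.findWithDefault 0 (parent, child)

main :: IO ()
main = do
  -- Hypothetical ids: each logical table gets its own unique parent key.
  let tableA = 1
      tableB = 2
      ht = save tableA (-1) 10        -- negative child keys are fine
         . save tableA 1000000 20     -- so are sparse ones
         . save tableB (-1) 99        -- same child key, different table
         $ M.empty
  -- prints (10,20,99,0)
  print (load tableA (-1) ht, load tableA 1000000 ht, load tableB (-1) ht, load tableB 5 ht)
```

Because the parent key separates tables, two tables can use the same child key (here -1) without clashing.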
Now let’s first look at the two definitions of our bytecode: one in Haskell, which our compiler targets, and a corresponding file in Jass. You can find them here and here.
data Instruction
    -- three regs
    = Lt Type Register Register Register
    | Le Type Register Register Register
    | Gt Type Register Register Register
    | Ge Type Register Register Register
    | Eq Type Register Register Register
    | Neq Type Register Register Register
    | Add Type Register Register Register
    | Sub Type Register Register Register
    | Mul Type Register Register Register
    | Div Type Register Register Register
    -- Mod only works on integers but for ease we still include the Type
    | Mod Type Register Register Register
    | SetGlobalArray Type Register Register Register
    | GetGlobalArray Type Register Register Register
    | SetLocalArray Type Register Register Register
    | GetLocalArray Type Register Register Register
    -- two regs
    | Negate Type Register Register
    | Set Type Register Register
    | SetGlobal Type Register Register
    | GetGlobal Type Register Register
    | Bind Type Register Register
    -- special
    | Literal Type Register (Hot.Ast Var Expr) -- encoded as: lit ty reg len string
    | Call Register Label Name
    | Convert Type Register Type Register
    -- one label
    | Label Label
    | Jmp Label
    | Function Label Name
    | JmpT Label Register
    | Not Register Register
    | Ret Type
    deriving (Show)
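The comment on Literal says a literal is encoded as lit ty reg len string. A plausible benefit of the length prefix is that a decoder can read exactly len characters, so the literal’s text may itself contain spaces. The sketch below assumes space-separated text fields, as in the listings in this post; encodeLit and decodeLit are hypothetical helpers, not JHCR’s actual loader.

```haskell
import Text.Read (readMaybe)

-- Hypothetical encoder: "lit <ty> <reg> <len> <text>".
encodeLit :: String -> Int -> String -> String
encodeLit ty reg s = unwords ["lit", ty, show reg, show (length s), s]

-- Hypothetical decoder: parse the four header fields, then take exactly
-- <len> characters, so spaces inside the literal survive.
decodeLit :: String -> Maybe (String, Int, String)
decodeLit line = do
  ("lit", r0) <- token line
  (ty, r1)    <- token r0
  (regS, r2)  <- token r1
  (lenS, r3)  <- token r2
  reg <- readMaybe regS
  n   <- readMaybe lenS
  let payload = drop 1 r3          -- skip the single separating space
  if length payload == n then Just (ty, reg, take n payload) else Nothing
  where
    -- Split off one space-delimited field (leading spaces are skipped).
    token s = case break (== ' ') (dropWhile (== ' ') s) of
      ("", _)   -> Nothing
      (t, rest) -> Just (t, rest)

main :: IO ()
main = do
  let encoded = encodeLit "string" (-1) "hello world"
  putStrLn encoded             -- lit string -1 11 hello world
  print (decodeLit encoded)
```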
At a quick glance we can see that the instructions use four different data types: Type, Register, Label and Name.
Let’s look at them one by one.
Type is just a plain old Jass type like handle, integer, widget etc. There are two reasons we have a typed bytecode. The first one is, once again, that we use hashtables to store everything, and the hashtable API makes us use the correct native for each type (SaveInteger, SaveReal and so on). The second reason is that it avoids redundancy in our instruction set (only one Add instruction, etc.).
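Both reasons can be illustrated with a small sketch. It assumes a hypothetical two-type subset (TInteger, TReal) and one Data.Map per type standing in for the type-specific hashtable natives; a single execAdd handler covers both types, and the Type operand decides which storage it reads and writes.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical subset of Jass types.
data Type = TInteger | TReal deriving (Show, Eq)

-- One map per type, mirroring how the hashtable API has separate
-- natives per type (SaveInteger/LoadInteger, SaveReal/LoadReal, ...).
data Store = Store
  { ints  :: M.Map Int Int
  , reals :: M.Map Int Double
  } deriving Show

emptyStore :: Store
emptyStore = Store M.empty M.empty

-- One Add for every numeric type: "add ty dst a b" means dst := a + b,
-- using the storage selected by ty. Missing registers read as 0.
execAdd :: Type -> Int -> Int -> Int -> Store -> Store
execAdd TInteger dst a b s =
  s { ints = M.insert dst (get a + get b) (ints s) }
  where get r = M.findWithDefault 0 r (ints s)
execAdd TReal dst a b s =
  s { reals = M.insert dst (get a + get b) (reals s) }
  where get r = M.findWithDefault 0 r (reals s)

main :: IO ()
main = do
  let s0 = emptyStore { ints  = M.fromList [(1, 1), (2, 3)]
                      , reals = M.fromList [(1, 0.5), (2, 0.25)] }
      s1 = execAdd TInteger 0 1 2 (execAdd TReal 0 1 2 s0)
  -- prints (Just 4,Just 0.75)
  print (M.lookup 0 (ints s1), M.lookup 0 (reals s1))
```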
The instruction set used in JHCR is register-based, which follows from our use of hashtables: since any integer can serve as a key, we effectively have unlimited registers. This especially helps in the code-generation phase, since we don’t have to care about register allocation (an NP-hard problem in general). It has a few more benefits, which I might talk about in a later post.
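Here is a minimal sketch of why unlimited registers remove the need for register allocation. It assumes a hypothetical integer-only expression type (Expr) and instruction type (Instr), and it is not JHCR’s actual code generator: every intermediate value simply gets the next fresh negative register, and registers are never reused.

```haskell
-- Hypothetical tiny expression language.
data Expr = Lit Int | Var Int | AddE Expr Expr | MulE Expr Expr

data Instr
  = ILit Int Int          -- lit integer dst value
  | IAdd Int Int Int      -- add integer dst a b
  | IMul Int Int Int      -- mul integer dst a b
  deriving (Show, Eq)

-- compile n e = (instructions, result register, next counter).
-- Temporaries are -n, -(n+1), ...: one fresh register per intermediate
-- value, never reused, so no allocation step is needed.
compile :: Int -> Expr -> ([Instr], Int, Int)
compile n (Var r)    = ([], r, n)        -- variables already live in a register
compile n (Lit v)    = ([ILit (-n) v], -n, n + 1)
compile n (AddE a b) = binop IAdd n a b
compile n (MulE a b) = binop IMul n a b

binop :: (Int -> Int -> Int -> Instr) -> Int -> Expr -> Expr -> ([Instr], Int, Int)
binop mk n a b =
  let (ca, ra, n1) = compile n a
      (cb, rb, n2) = compile n1 b
      dst = -n2
  in (ca ++ cb ++ [mk dst ra rb], dst, n2 + 1)

main :: IO ()
main = do
  -- (x + 1) * 3, where the local x lives in register 1
  let (code, res, _) = compile 1 (MulE (AddE (Var 1) (Lit 1)) (Lit 3))
  mapM_ print code
  print res
```

Compiling the argument list of add(1, 3) this way yields lit instructions into registers -1 and -2, matching the temporaries in the listing for main further down.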
But even though we have more or less unlimited registers, we still have to follow some rules, and these rules make up the calling convention used in JHCR. To explore this, let’s have a look at a small example.
native print_integer takes integer x returns nothing

function add takes integer a, integer b returns integer
    return a+b
endfunction

function main takes nothing returns nothing
    local integer r = add(1, 3)
    call print_integer(r)
endfunction
This snippet could be compiled to this bytecode:
fun 123 add
add integer 0 1 2
ret integer
fun 234 main
lit integer -1 1
lit integer -2 3
bind integer 1 -1
bind integer 2 -2
call 1 123 add
bind integer 1 1
call -3 567 print_integer
ret nothing
Let’s look at the add function first. From this function we can see that the parameters to a function are stored in registers 1 to n, where n is the number of parameters the function takes. We can also see that the return value is stored in register 0. And this is our calling convention. Now let’s look at the main function, which shows a few more things. First of all, we have to use the bind instruction to transfer values from our “local” registers to the registers of the to-be-called function. So unlike the registers found in your CPU, ours are scoped to each function. The second thing we can notice is that temporary values are assigned to “negative” registers -1, -2, etc. As I said before, this makes compiling just a bit easier, and it doesn’t cost us much since we can use the “endless” hashtables. Local variables, on the other hand, use positive registers: they use registers n+1 .. n+m, where m is the number of local variables declared inside that function. In main, n = 0 and m = 1, so the local r lives in register 1. Finally, we can see how the call instruction is used, since the return value is copied directly into the register of our local variable:
call 1 123 add
     ^ ^   ^
     | |   |
     | |   +- The name of the function is omitted in the final bytecode,
     | |   `- but it's useful for debugging.
     | |
     | `- Just the internal id of the function to be called.
     |
     +- The interpreter takes whatever is stored in the called function's
     +- register 0 and puts it into this function's register 1, which is
     +- the local variable r. Technically, the transfer of data happens
     +- when interpreting the ret instruction, since we lack the type
     `- information in the call instruction.
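A minimal sketch of these call semantics, restricted to integers and written against a hypothetical cut-down Instr type: each invocation gets its own register map, bind fills the callee’s parameter registers, and the callee’s register 0 ends up in the caller’s destination register. Unlike JHCR, which performs the transfer when interpreting ret because call carries no Type, this sketch does it at the call site; that only works here because everything is an integer. Function names stand in for ids, and main returns r instead of calling print_integer.

```haskell
import qualified Data.Map.Strict as M

-- One function invocation's register file: register number -> value.
type Regs = M.Map Int Int

-- Hypothetical integer-only subset of the instruction set.
data Instr
  = Lit Int Int        -- lit integer dst value
  | AddI Int Int Int   -- add integer dst a b
  | Set Int Int        -- set integer dst src
  | Bind Int Int       -- bind integer calleeReg callerReg
  | Call Int String    -- call dst fn (the name stands in for the function id)
  | Ret                -- ret integer: the result is in register 0
  deriving Show

type Program = M.Map String [Instr]

-- Run function `name` with its parameter registers 1..n already filled.
runFn :: Program -> String -> Regs -> Int
runFn prog name args = go (prog M.! name) args M.empty
  where
    reg r regs = M.findWithDefault 0 r regs
    -- regs: this invocation's own registers
    -- pending: the callee's registers, filled by bind before a call
    go [] regs _ = reg 0 regs
    go (ins : rest) regs pending = case ins of
      Lit dst v    -> go rest (M.insert dst v regs) pending
      AddI dst a b -> go rest (M.insert dst (reg a regs + reg b regs) regs) pending
      Set dst src  -> go rest (M.insert dst (reg src regs) regs) pending
      Bind to from -> go rest regs (M.insert to (reg from regs) pending)
      Call dst fn  ->
        -- The callee's register 0 lands in our register dst.
        go rest (M.insert dst (runFn prog fn pending) regs) M.empty
      Ret          -> reg 0 regs

-- The add/main example, except that main returns r instead of printing it.
example :: Program
example = M.fromList
  [ ("add",  [AddI 0 1 2, Ret])
  , ("main", [ Lit (-1) 1, Lit (-2) 3
             , Bind 1 (-1), Bind 2 (-2)
             , Call 1 "add"          -- r (register 1) := add(1, 3)
             , Set 0 1, Ret ])
  ]

main :: IO ()
main = print (runFn example "main" M.empty)   -- prints 4
```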
Question to the reader: how would you implement local arrays with this setup?
To implement loops, conditionals and all that good stuff, we use labels and jump instructions. There aren’t many of them: Label, Jmp and JmpT. The Label instruction simply marks a position in the list of instructions with that label. Note that labels are static, so you can’t create labels dynamically to jump to. At run time the interpreter simply skips over any label instruction. The Jmp instruction, on the other hand, jumps to the corresponding label instruction. The following snippet is an infinite loop.
label 3
jmp 3
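Because labels are static, an interpreter can resolve every label to its position once, before execution, and afterwards treat Label as a no-op. The sketch below assumes a hypothetical three-constructor Instr type and adds a step budget (fuel) so that the infinite loop above terminates in the demo; it is an illustration, not JHCR’s Jass interpreter.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical cut-down instruction type.
data Instr = Label Int | Jmp Int | Other String
  deriving (Show, Eq)

-- Map every label to its instruction index, computed once up front.
labelTable :: [Instr] -> M.Map Int Int
labelTable code = M.fromList [ (l, pc) | (pc, Label l) <- zip [0 ..] code ]

-- Run for at most `fuel` steps and return the visited program counters.
visitedPcs :: Int -> [Instr] -> [Int]
visitedPcs fuel code = go fuel 0
  where
    labels = labelTable code
    go :: Int -> Int -> [Int]
    go 0 _ = []
    go n pc
      | pc >= length code = []
      | otherwise = pc : case code !! pc of
          Jmp l -> go (n - 1) (labels M.! l)   -- jump to the label's index
          _     -> go (n - 1) (pc + 1)         -- Label (and anything else): fall through

main :: IO ()
main = print (visitedPcs 5 [Label 3, Jmp 3])   -- prints [0,1,0,1,0]
```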
Finally, the JmpT instruction jumps to the label if and only if the value stored in the given register is true. The code below is an example: an if-statement compiled to our bytecode, where the condition is stored in register -1.
jmpt 2 -1
<else branch>
jmp 3
label 2
<if branch>
label 3
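The compiler side of this pattern can be sketched as a function that, given the next free label number, emits exactly the layout above: jmpt to the if branch, the else branch, an unconditional jump past the if branch, then the two labels. Instr, Stmt (a stand-in for already compiled statements), compileIf and runWith are hypothetical names for this illustration; runWith executes the result with the condition register fixed to a given Bool.

```haskell
-- Hypothetical cut-down instruction type.
data Instr
  = Label Int
  | Jmp Int
  | JmpT Int Int    -- jmpt label condReg
  | Stmt String     -- stand-in for an already compiled statement
  deriving (Show, Eq)

-- compileIf next cond thenCode elseCode = (code, next free label)
compileIf :: Int -> Int -> [Instr] -> [Instr] -> ([Instr], Int)
compileIf next cond thenCode elseCode =
  ( [JmpT lThen cond] ++ elseCode ++ [Jmp lEnd, Label lThen]
      ++ thenCode ++ [Label lEnd]
  , next + 2 )
  where
    lThen = next
    lEnd  = next + 1

-- Execute, treating the condition register as the constant condVal,
-- and collect the statements that ran.
runWith :: Bool -> [Instr] -> [String]
runWith condVal code = go 0
  where
    pos l = length (takeWhile (/= Label l) code)
    go pc
      | pc >= length code = []
      | otherwise = case code !! pc of
          Stmt s             -> s : go (pc + 1)
          Jmp l              -> go (pos l)
          JmpT l _ | condVal -> go (pos l)
          _                  -> go (pc + 1)

main :: IO ()
main = do
  let (code, _) = compileIf 2 (-1) [Stmt "if branch"] [Stmt "else branch"]
  mapM_ print code
  -- prints (["if branch"],["else branch"])
  print (runWith True code, runWith False code)
```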
The Name data type is mostly ignored, because we assign unique ids to everything we can. There is only one case where we have to use an actual name. Can you guess which?
If we reload the script with a totally new function that wasn’t seen before, that new function of course also gets an id, but to link the name to the new id in our already running map we have to transfer both. As far as I can tell, we need that mapping only for ExecuteFunc, which looks functions up by their name as a string. The code that loads new bytecode handles this.
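A minimal sketch of that name-to-id table, assuming it can be modeled as a Data.Map from names to ids into which a reload merges new entries. reload, executeFuncId, the function name onTimer and the id 890 are all hypothetical. When a name is already present, this sketch lets the newly shipped id win, which is an assumption rather than documented JHCR behavior.

```haskell
import qualified Data.Map.Strict as M

-- Names of functions known to the running map, mapped to their ids.
type NameTable = M.Map String Int

-- Merge the (name, id) pairs shipped with new bytecode.
-- M.union is left-biased, so shipped entries override existing ones.
reload :: [(String, Int)] -> NameTable -> NameTable
reload newFns table = M.union (M.fromList newFns) table

-- ExecuteFunc only receives a string, so it must go through the table.
executeFuncId :: NameTable -> String -> Maybe Int
executeFuncId table name = M.lookup name table

main :: IO ()
main = do
  let before = M.fromList [("main", 234), ("add", 123)]
      after  = reload [("onTimer", 890)] before
  -- prints (Nothing,Just 890,Just 123)
  print (executeFuncId before "onTimer", executeFuncId after "onTimer", executeFuncId after "add")
```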
Now that we’ve looked at the data type representing the bytecode, we can already guess how most of those instructions work. But this is still only a fairly high-level description of what JHCR is doing, so I hope I can shed some light on the nitty-gritty details in a later post.
lep . blog