
vm
Virtual Machine experiment
This is an experiment in building, comparing, and optimizing a miniature assembler and VM with the following (sort of, but less and less, minimalistic) instructions:
Instructions
Immediate operand instructions:
LoadI, AddI, SubI, MulI, DivI, ModI, ShiftI, AndI (though they can also load the relative address of a label as value)
Relative address based instructions:
LoadR, AddR, SubR, MulR, DivR, StoreR, JNZ (jump if not equal to 0), JNEG (jump if negative), JPOS (jump if positive or 0), JumpR (unconditional jump), IncrR i addr (increments, or decrements if i is negative, the value at addr by i and loads the result in the accumulator)
Stack-oriented instructions let the VM manage simple call frames:
Call pushes the return address, and Ret unwinds the stack (optionally dropping extra entries).
Push/Pop move the accumulator to and from the stack while reserving or discarding extra slots.
LoadS, StoreS, AddS, SubS, MulS, DivS, and IncrS read and write relative to the current stack pointer so stack-resident variables can be manipulated without touching memory directly, and SysS mirrors Sys but uses a stack index operand for its first argument.
IdivS divides the stack location by the accumulator and keeps the remainder in A.
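The IdivS behavior described above is what a digit-extraction loop needs: one instruction yields both the next quotient and the current digit. Here is a minimal Go sketch of that idea, not the VM's actual handler. The names `idivS` and `digitsOf` are hypothetical, the slot is assumed to receive the quotient, and only non-negative values are handled.

```go
package main

import "fmt"

// idivS mimics the described IdivS semantics on a single slot.
// ASSUMPTION: the slot receives the quotient; the returned value stands
// for the new accumulator and holds the remainder.
func idivS(slot *int64, a int64) int64 {
	rem := *slot % a
	*slot /= a
	return rem
}

// digitsOf converts a non-negative n to decimal by repeated idivS by 10.
// Digits come out least significant first, so they are reversed at the end.
func digitsOf(n int64) string {
	var rev []byte
	for {
		d := idivS(&n, 10)
		rev = append(rev, byte('0'+d))
		if n == 0 {
			break
		}
	}
	for i, j := 0, len(rev)-1; i < j; i, j = i+1, j-1 {
		rev[i], rev[j] = rev[j], rev[i]
	}
	return string(rev)
}

func main() {
	fmt.Println(digitsOf(1234))
}
```

programs/itoa.asm is the place to look for how the real program combines this with StoreSB.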
StoreSB stores a single byte from the accumulator into a stack-resident buffer: the first operand specifies the base stack offset of the target word span, while the second operand names a stack slot containing the byte offset (which can be more than 8). The handler computes the word/bit position and patches the selected byte in place. It is handy for building packed str8 buffers on the stack (see programs/itoa.asm).
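The word/bit computation that StoreSB performs can be sketched in Go as follows. This is an illustration only: `storeByte` is a hypothetical helper, byte 0 is assumed to be the least significant byte of the first word, and the resolution of stack offsets done by the real handler is left out.

```go
package main

import "fmt"

// storeByte patches the byte at byteOffset inside a span of 64-bit words.
// ASSUMPTION: byte i lives at bits 8*(i%8) of words[i/8], that is,
// little-endian order within each word.
func storeByte(words []uint64, byteOffset int, b byte) {
	word := byteOffset / 8           // which 64-bit word holds the byte
	shift := uint(byteOffset%8) * 8  // bit position of the byte in that word
	words[word] &^= uint64(0xFF) << shift // clear the old byte
	words[word] |= uint64(b) << shift     // write the new one
}

func main() {
	buf := make([]uint64, 2)
	storeByte(buf, 0, 3)   // e.g. a str8 length byte
	storeByte(buf, 1, 'a') // first data byte
	storeByte(buf, 9, 'z') // an offset past 8 lands in the second word
	fmt.Printf("%#x %#x\n", buf[0], buf[1])
}
```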
- String quoting uses the Go rules (i.e. "double-quotes" with \ sequences, single 'x' for 1 character, or backticks for verbatim)
- str8: 1 byte for the size, followed by the data (so strings of 7 bytes or less fit in 1 word, longer ones are chunked into 8-byte words)
- Data can also be just bytes packed into 64-bit words (see ReadN/WriteN below for instance)
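To make the str8 layout concrete, here is a minimal Go sketch that packs a string into 64-bit words: one length byte, then the data, 8 bytes per word. `packStr8` is a hypothetical name, and the byte order within a word (little-endian here) is an assumption rather than something the text above states.

```go
package main

import "fmt"

// packStr8 packs s as a length byte followed by its data, 8 bytes per word.
// ASSUMPTION: byte i of that stream is stored at bits 8*(i%8) of word i/8.
func packStr8(s string) ([]uint64, error) {
	if len(s) > 255 {
		return nil, fmt.Errorf("str8 holds at most 255 bytes, got %d", len(s))
	}
	stream := append([]byte{byte(len(s))}, s...)
	words := make([]uint64, (len(stream)+7)/8)
	for i, b := range stream {
		words[i/8] |= uint64(b) << (8 * (i % 8))
	}
	return words, nil
}

func main() {
	for _, s := range []string{"hi", "1234567", "12345678"} {
		words, err := packStr8(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %d word(s)\n", s, len(words))
	}
}
```

A 7-byte string plus its length byte fills exactly one word, while an 8-byte string spills into a second word.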
Syscall
Sys takes an 8-bit call id (lowest byte of the operand) and uses the remaining 48 bits as the (first) argument to the syscall.
Exit (1) with value from arg
Read8 (2) reads from fd (0 == stdin) up to A bytes into param address/stack buffer str8 format (so max 255 bytes).
Write8 (3) writes a str8 to fd (1 == stdout). In the SysS variant the accumulator is a byte offset from the passed stack offset. In the Sys one A is ignored unless the parameter is 0 in which case A is the address to use for the location of the str8 (see an example in echo.asm).
ReadN (4) reads from fd up to A bytes into param address/stack buffer (no limit outside of underlying read syscall and memory as this returns the length and does not write str8 len byte first).
WriteN (5) writes A bytes to fd from memory pointed to by the operand.
Sleep (6) argument in milliseconds
Open (7) opens a file and returns fd in A. Takes flags (low 8 bits of param) and path address (high 48 bits). If Sys variant with param=0, uses A as path address. Flags are platform-specific (e.g., O_RDONLY=0).
Close (8) closes a file descriptor. Fd in A. Returns 0 on success, -1 on error.
ReadF (9) reads A bytes from fd (low 8 bits of param) to address (high 48 bits). Similar to ReadN but fd comes from param instead of being fixed.
WriteF (10) writes A bytes to fd (low 8 bits of param) from address (high 48 bits). Similar to WriteN but fd comes from param instead of being fixed.
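The Sys operand layout (call id in the lowest byte, 48-bit argument above it) can be sketched in Go like this. `encodeSys` and `decodeSys` are hypothetical helpers. Treating the argument as unsigned is an assumption, and the placement of this operand inside the full instruction word is not covered.

```go
package main

import "fmt"

const (
	callIDBits = 8
	argBits    = 48
	argMask    = (uint64(1) << argBits) - 1
)

// encodeSys puts the call id in the lowest byte and the argument above it.
// ASSUMPTION: the argument is unsigned and truncated to 48 bits.
func encodeSys(callID uint8, arg uint64) uint64 {
	return (arg&argMask)<<callIDBits | uint64(callID)
}

// decodeSys splits an operand back into call id and argument.
func decodeSys(operand uint64) (uint8, uint64) {
	return uint8(operand), (operand >> callIDBits) & argMask
}

func main() {
	op := encodeSys(6, 250) // Sleep (6) with an argument of 250 milliseconds
	id, arg := decodeSys(op)
	fmt.Println(id, arg)
}
```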
Assembler virtual instructions
The assembler uses space-separated arguments and allows basic expressions (e.g. foo<<4+3) (see the Operator constants for the list). Note that there are limitations: for instance -2*3 works fine but 2*-3 does not, i.e. for negative operands, put the sign at the front.
label: (e.g. foo: on its own line) defines a label for relative addressing instructions.
.const defines a constant that can be used in instructions later (must be defined before use unlike labels)
data for a 64 bit word
str8 for string (with the double or backtick quotes)
- on a line preceding an instruction: label + : defines a label for the *R instructions (relative address calculation). A label starts with a letter.
.space for multiple 0 initialized 64 bit words
Var v1 v2 ... virtual instruction that generates a Push instruction with the number of identifiers provided and defines labels for said variables starting at 0 (which will start with the value of the accumulator while the rest will start 0 initialized).
Param p1 p2 ... virtual instruction that generates stack labels for p1, p2 as offset from before the return PC (ie parameters pushed (via Var or Push) by the caller before calling Call)
Return virtual instruction that generates a Ret n where n is such that the Var push is undone.
Program Initialization
When a VM program starts, the host initializes the stack with command-line arguments:
- Argument addresses are pushed onto the stack in reverse order (so argv[0] is deepest, argv[N-1] is highest)
- The argument count (argc) is pushed on top
- The stack pointer points at argc
To access arguments, a program typically:
POP 0 ; Pop argc into accumulator
StoreR argc ; Store it for later use
; Now pop each argument address and process it
See echo.asm for a complete example that iterates through all arguments and prints them one per line (and make echo-test as an example to run).
Benchmarks
Compares Go, TinyGo, and C based VMs (and a plain C loop for reference).
Usage and more
See Makefile / run make
Installation
Binary releases of the Go version are also available in releases/ or via
go install grol.io/vm@latest
(and homebrew and docker as well)