assembler

Version: v0.0.0-...-8bb7ff2
Published: Dec 2, 2020 License: GPL-2.0


Assembler

This repository contains a VERY BASIC x86-64 assembler, which is capable of reading assembly-language input and generating a statically linked ELF binary as output.

It is more a proof-of-concept than a useful assembler, but I hope to take it to the state where it can compile the kind of x86-64 assembly I produce in some of my other projects.

Currently the assembler will generate a binary which looks like this:

$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV)
       statically linked, no section header
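
To make the shape of such a file concrete, here is a minimal Go sketch that wraps raw machine code in an ELF64 header followed by a single PT_LOAD program header, with no section headers. It is not the code from this repository: the buildELF name, the 0x400000 load address, and the one-segment layout are assumptions made for illustration, and the real assembler may lay things out differently (the gdb session later in this README shows an entry point of 0x4000b0).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const (
	baseAddr = 0x400000 // assumed load address, conventional for x86-64 executables
	ehdrSize = 64       // size of an ELF64 file header
	phdrSize = 56       // size of an ELF64 program header
)

// buildELF (hypothetical helper) prefixes code with an ELF64 header and one
// PT_LOAD program header that maps the whole file as readable and executable.
func buildELF(code []byte) []byte {
	var buf bytes.Buffer
	le := binary.LittleEndian
	entry := uint64(baseAddr + ehdrSize + phdrSize) // code starts right after the headers
	total := uint64(ehdrSize + phdrSize + len(code))

	// ELF file header.
	buf.Write([]byte{0x7f, 'E', 'L', 'F', 2, 1, 1, 0}) // 64-bit, LSB, version 1, SYSV
	buf.Write(make([]byte, 8))                         // e_ident padding
	binary.Write(&buf, le, uint16(2))                  // e_type: ET_EXEC
	binary.Write(&buf, le, uint16(0x3e))               // e_machine: EM_X86_64
	binary.Write(&buf, le, uint32(1))                  // e_version
	binary.Write(&buf, le, entry)                      // e_entry
	binary.Write(&buf, le, uint64(ehdrSize))           // e_phoff: program headers follow
	binary.Write(&buf, le, uint64(0))                  // e_shoff: no section headers
	binary.Write(&buf, le, uint32(0))                  // e_flags
	binary.Write(&buf, le, uint16(ehdrSize))           // e_ehsize
	binary.Write(&buf, le, uint16(phdrSize))           // e_phentsize
	binary.Write(&buf, le, uint16(1))                  // e_phnum
	binary.Write(&buf, le, uint16(0))                  // e_shentsize
	binary.Write(&buf, le, uint16(0))                  // e_shnum
	binary.Write(&buf, le, uint16(0))                  // e_shstrndx

	// Program header: load the entire file at baseAddr.
	binary.Write(&buf, le, uint32(1))        // p_type: PT_LOAD
	binary.Write(&buf, le, uint32(5))        // p_flags: PF_R | PF_X
	binary.Write(&buf, le, uint64(0))        // p_offset
	binary.Write(&buf, le, uint64(baseAddr)) // p_vaddr
	binary.Write(&buf, le, uint64(baseAddr)) // p_paddr
	binary.Write(&buf, le, total)            // p_filesz
	binary.Write(&buf, le, total)            // p_memsz
	binary.Write(&buf, le, uint64(0x1000))   // p_align

	buf.Write(code)
	return buf.Bytes()
}

func main() {
	// xor rax, rax; inc rax; mov rbx, 9; int 0x80
	code := []byte{
		0x48, 0x31, 0xc0,
		0x48, 0xff, 0xc0,
		0x48, 0xc7, 0xc3, 0x09, 0x00, 0x00, 0x00,
		0xcd, 0x80,
	}
	img := buildELF(code)
	fmt.Printf("%d bytes, entry at %#x\n", len(img), baseAddr+ehdrSize+phdrSize)
}
```

Writing the returned bytes to a file with executable permissions should give something that file(1) describes much like the output above, though that has not been checked against this repository's own output.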

Why? I've written a couple of toy projects that generate assembly language programs, then pass them through an assembler.

The code in this repository was born out of the process of experimenting with generating an ELF binary directly. A necessary learning-process.

Limitations

We don't support anywhere near the complete instruction-set which an assembly language programmer would expect. Currently we support only things like this:

  • add $REG, $REG + add $REG, $NUMBER
    • Add a number, or the contents of another register, to a register.
  • call $LABEL
  • dec $REG
    • Decrement the contents of the specified register.
    • We also support indirection, so the following work:
      • dec byte ptr [$REG]
      • dec word ptr [$REG]
      • dec dword ptr [$REG]
      • dec qword ptr [$REG]
  • inc $REG
    • Increment the contents of the specified register.
    • We also support indirection, so the following work:
      • inc byte ptr [$REG]
      • inc word ptr [$REG]
      • inc dword ptr [$REG]
      • inc qword ptr [$REG]
  • jmp $LABEL, je $LABEL, jne $LABEL
    • We support jump instructions, but only with short (signed 8-bit) displacements, i.e. -128/+127 bytes.
    • See jmp.asm for a simple example.
  • mov $REG, $NUMBER + mov $REG, $REG
    • Move a number, or the contents of another register, into the specified register.
  • nop
    • Do nothing.
  • push $NUMBER, or push $IDENTIFIER
  • ret
    • Return from call.
    • NOTE: We don't actually support making calls, though that can be emulated via push - see jmp.asm for an example.
  • sub $REG, $REG + sub $REG, $NUMBER
    • Subtract a number, or the contents of another register, from a register.
  • xor $REG, $REG
    • Set the given register to be zero.
  • int $NUM
    • Call the kernel.
  • Processor (flag) control instructions:
    • clc, cld, cli, cmc, stc, std, and sti.
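
The short-displacement restriction on jumps follows from the instruction encoding: a short jump is one opcode byte plus one signed byte, counted from the end of the jump instruction. The Go sketch below illustrates that arithmetic. The encodeShortJump function and its byte-offset parameters are hypothetical and are not taken from this repository; the opcodes (0xEB, 0x74, 0x75) are the standard x86 short forms.

```go
package main

import (
	"errors"
	"fmt"
)

// Standard x86 opcodes for the short (rel8) forms of each jump.
var shortJumpOpcodes = map[string]byte{
	"jmp": 0xEB,
	"je":  0x74,
	"jne": 0x75,
}

// encodeShortJump (hypothetical helper) encodes a jump located at byte offset
// from which targets byte offset to. The displacement is relative to the
// address of the NEXT instruction, which is from+2 for a two-byte jump.
func encodeShortJump(mnemonic string, from, to int) ([]byte, error) {
	op, ok := shortJumpOpcodes[mnemonic]
	if !ok {
		return nil, fmt.Errorf("unsupported jump %q", mnemonic)
	}
	disp := to - (from + 2)
	if disp < -128 || disp > 127 {
		return nil, errors.New("jump target out of range for a short jump")
	}
	return []byte{op, byte(int8(disp))}, nil
}

func main() {
	// A jump to itself: the displacement is -2, encoded as 0xFE.
	b, err := encodeShortJump("jmp", 0, 0)
	fmt.Printf("% x %v\n", b, err)

	// Too far away for a single signed byte.
	_, err = encodeShortJump("je", 0, 200)
	fmt.Println(err)
}
```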

Note that we only really support the 64-bit registers listed below, which means rax is supported but eax, ax, ah, and al are specifically not supported:

  • rax
  • rcx
  • rdx
  • rbx
  • rsp
  • rbp
  • rsi
  • rdi

There is some support for the extended registers r8-r15, but this varies on a per-instruction basis and should not be relied upon.
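
The eight registers above are listed in the same order as their hardware register numbers (rax is 0, rdi is 7), which is what makes register-to-register instructions simple to encode. As a rough illustration, the Go sketch below builds the REX.W prefix, opcode, and ModRM byte for a few of the two-register forms. The encodeRegReg helper and its tables are hypothetical, not this repository's code; the extended registers r8-r15 need extra REX bits and are deliberately left out.

```go
package main

import "fmt"

// Hardware register numbers for the eight supported 64-bit registers.
var regNum = map[string]byte{
	"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
	"rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
}

// Opcodes for the "op r/m64, r64" forms (destination in r/m, source in reg).
var regRegOpcode = map[string]byte{
	"add": 0x01,
	"sub": 0x29,
	"xor": 0x31,
	"mov": 0x89,
}

// encodeRegReg (hypothetical helper) encodes "<op> dst, src" in Intel operand
// order as: REX.W prefix, opcode, ModRM with mod=11 (register-direct).
func encodeRegReg(op, dst, src string) ([]byte, error) {
	opcode, ok := regRegOpcode[op]
	if !ok {
		return nil, fmt.Errorf("unsupported instruction %q", op)
	}
	d, ok := regNum[dst]
	if !ok {
		return nil, fmt.Errorf("unsupported register %q", dst)
	}
	s, ok := regNum[src]
	if !ok {
		return nil, fmt.Errorf("unsupported register %q", src)
	}
	modrm := 0xC0 | s<<3 | d
	return []byte{0x48, opcode, modrm}, nil
}

func main() {
	for _, in := range [][3]string{
		{"xor", "rax", "rax"},
		{"add", "rbx", "rcx"},
		{"mov", "rdi", "rsi"},
	} {
		b, err := encodeRegReg(in[0], in[1], in[2])
		fmt.Printf("%s %s, %s -> % x (%v)\n", in[0], in[1], in[2], b, err)
	}
}
```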

There is support for storing fixed-data within our program, and locating that. See hello.asm for an example of that.

We also have some other (obvious) limitations:

  • There is notably no support for comparison instructions, and jump instructions are limited to the short displacements described above.
    • Unconditional jumps can also be emulated via "push" and "ret"; see jmp.asm for an example of that.
  • The entry-point is always at the beginning of the source.
  • You can only reference data AFTER it has been declared.
    • These are added to the data section of the generated binary, but must be defined first.
    • See hello.asm for an example of that.

Installation

If you have this repository cloned locally you can build the assembler like so:

cd cmd/assembler
go build .
go install .

If you wish to fetch and install via your existing toolchain:

go get -u github.com/skx/assembler/cmd/assembler

You can repeat for the other commands if you wish:

go get -u github.com/skx/assembler/cmd/lexer
go get -u github.com/skx/assembler/cmd/parser

Of course these binary-names are very generic, so perhaps better to work locally!

Example Usage

Build the assembler:

 $ cd cmd/assembler
 $ go build .

Compile the sample program, and execute it showing the return-code:

 $ cmd/assembler/assembler test.asm && ./a.out ; echo $?
 9

Or run the hello.asm example:

 $ cmd/assembler/assembler hello.asm && ./a.out
 Hello, world
 Goodbye, world

You'll note that the \n character was correctly expanded into a newline.
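
How the assembler performs that expansion is not described here. One plausible approach, sketched in Go below, is to walk the string literal and translate each backslash sequence as it is met. The expandEscapes name and the exact set of escapes handled are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// expandEscapes (hypothetical helper) converts backslash escapes found in a
// string literal into the bytes they stand for. Unknown escapes are kept
// verbatim so that nothing is silently dropped.
func expandEscapes(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' || i+1 >= len(s) {
			b.WriteByte(s[i])
			continue
		}
		i++ // look at the character after the backslash
		switch s[i] {
		case 'n':
			b.WriteByte('\n')
		case 't':
			b.WriteByte('\t')
		case '\\':
			b.WriteByte('\\')
		case '"':
			b.WriteByte('"')
		default:
			b.WriteByte('\\')
			b.WriteByte(s[i])
		}
	}
	return b.String()
}

func main() {
	// The raw literal contains a backslash followed by 'n', not a newline.
	fmt.Print(expandEscapes(`Hello, world\n`))
	fmt.Print(expandEscapes(`Goodbye, world\n`))
}
```

Scanning left to right, rather than doing a blanket search-and-replace, keeps an escaped backslash followed by "n" from being mistaken for a newline.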

Internals

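The core of our code consists of a small number of simple packages:

  • compiler
    • Reads the user-program and generates the binary result.
  • instructions
    • Definitions of the instructions that we understand.
  • lexer
    • Our lexer.
  • parser
    • Consumes tokens from the lexer, and generates the AST which is then walked to generate binary code.
  • token
    • Identifiers for the various things we find in our source-scripts.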

In addition to the package modules we also have a couple of binaries:

  • cmd/lexer
    • Show the output of lexing a program.
    • This is useful for debugging and development purposes; it isn't expected to be useful to end-users.
  • cmd/parser
    • Show the output of parsing a program.
    • This is useful for debugging and development purposes; it isn't expected to be useful to end-users.
  • cmd/assembler
    • Assemble a program, producing an executable binary.

These commands, located beneath cmd, each operate the same way: they take a single argument, which is a file containing assembly-language instructions.

For example here is how you'd build and test the parser:

$ cd cmd/parser
$ go build .
$ ./parser ../../test.asm
&{{INSTRUCTION xor} [{REGISTER rax} {REGISTER rax}]}
&{{INSTRUCTION inc} [{REGISTER rax}]}
&{{INSTRUCTION mov} [{REGISTER rbx} {NUMBER 0x0000}]}
&{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0007}]}
&{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}
&{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0002}]}
&{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}
&{{INSTRUCTION int} [{NUMBER 0x80}]}
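
Reading the parsed instructions above also explains the exit code of 9 seen earlier: rax ends up as 1, rbx as 7 + 2, and int 0x80 with rax set to 1 is the Linux exit call, which takes its status from rbx. The Go sketch below replays those instructions in a toy interpreter to show the arithmetic. It is an illustration only, not part of this repository, and the instr type and run function are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// instr is a hypothetical, simplified stand-in for a parsed instruction.
type instr struct {
	op   string
	args []string
}

// testProgram mirrors the parser output shown above.
var testProgram = []instr{
	{"xor", []string{"rax", "rax"}},
	{"inc", []string{"rax"}},
	{"mov", []string{"rbx", "0x0000"}},
	{"mov", []string{"rcx", "0x0007"}},
	{"add", []string{"rbx", "rcx"}},
	{"mov", []string{"rcx", "0x0002"}},
	{"add", []string{"rbx", "rcx"}},
	{"int", []string{"0x80"}},
}

// run interprets the handful of instructions used by testProgram and returns
// the exit status passed to the exit call (int 0x80 with rax == 1).
func run(prog []instr) (int64, error) {
	regs := map[string]int64{
		"rax": 0, "rcx": 0, "rdx": 0, "rbx": 0,
		"rsp": 0, "rbp": 0, "rsi": 0, "rdi": 0,
	}
	// value resolves an operand: a register's contents, or a numeric literal.
	value := func(arg string) (int64, error) {
		if v, ok := regs[arg]; ok {
			return v, nil
		}
		return strconv.ParseInt(arg, 0, 64) // base 0 accepts the 0x prefix
	}
	for _, in := range prog {
		if len(in.args) == 0 {
			return 0, fmt.Errorf("%s: missing operand", in.op)
		}
		dst := in.args[0]
		var src int64
		if len(in.args) > 1 {
			v, err := value(in.args[1])
			if err != nil {
				return 0, err
			}
			src = v
		}
		switch in.op {
		case "xor":
			regs[dst] ^= src
		case "inc":
			regs[dst]++
		case "mov":
			regs[dst] = src
		case "add":
			regs[dst] += src
		case "int":
			n, err := value(dst)
			if err != nil {
				return 0, err
			}
			if n == 0x80 && regs["rax"] == 1 {
				return regs["rbx"] & 0xff, nil // exit status is one byte
			}
			return 0, fmt.Errorf("unsupported interrupt or system call")
		default:
			return 0, fmt.Errorf("unsupported instruction %q", in.op)
		}
	}
	return 0, fmt.Errorf("program ended without exiting")
}

func main() {
	code, err := run(testProgram)
	fmt.Println(code, err)
}
```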

Adding New Instructions

This is how you might add a new instruction to the assembler; for example, you might add jmp 0x00000 or some similar instruction:

  • Add a new entry for the instruction in instructions/instructions.go
    • i.e. Update the InstructionLengths map to add the instruction.
    • This will be used by both the tokenization process, and the parser.
  • Generate the appropriate output in compiler/compiler.go, inside the function compileInstruction.
    • i.e. Emit the binary-code for the instruction.
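
The two steps above might look roughly like the following Go sketch, which adds a hypothetical hlt instruction. The names InstructionLengths and compileInstruction come from the text, but their actual shapes in this repository are not shown here; this sketch assumes the map records the number of operands each instruction takes, and the singleByte table is invented purely for illustration.

```go
package main

import "fmt"

// InstructionLengths is ASSUMED here to map a mnemonic to the number of
// operands it expects; the real map in instructions/instructions.go may differ.
var InstructionLengths = map[string]int{
	"nop": 0,
	"ret": 0,
	"clc": 0,
	"stc": 0,
	"hlt": 0, // step 1: register the new instruction
}

// singleByte is a hypothetical table of one-byte encodings.
var singleByte = map[string]byte{
	"nop": 0x90,
	"ret": 0xC3,
	"clc": 0xF8,
	"stc": 0xF9,
	"hlt": 0xF4, // step 2: emit the binary code for the new instruction
}

// compileInstruction is a simplified stand-in for the function of the same
// name: it validates the operand count, then returns the encoded bytes.
func compileInstruction(name string, operands []string) ([]byte, error) {
	want, ok := InstructionLengths[name]
	if !ok {
		return nil, fmt.Errorf("unknown instruction %q", name)
	}
	if len(operands) != want {
		return nil, fmt.Errorf("%s expects %d operand(s), got %d", name, want, len(operands))
	}
	if b, ok := singleByte[name]; ok {
		return []byte{b}, nil
	}
	return nil, fmt.Errorf("no encoder for %q", name)
}

func main() {
	b, err := compileInstruction("hlt", nil)
	fmt.Printf("% x %v\n", b, err)
}
```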

Debugging Generated Binaries

Launch the binary under gdb:

$ gdb ./a.out

Start it:

(gdb) starti
Starting program: /home/skx/Repos/github.com/skx/assembler/a.out

Program stopped.
0x00000000004000b0 in ?? ()

Disassemble:

(gdb) x/5i $pc

Or show string-contents at an address:

(gdb) x/s 0x400000

Bugs?

Feel free to report them. As this is more a proof of concept than a robust tool, bugs are to be expected.

Specifically we're missing support for many instructions, but I hope the code generated for those that are present is correct.

Steve

Directories

Path            Synopsis
cmd/assembler   command
cmd/lexer       command
cmd/parser      command
compiler        Package compiler is the package which is actually responsible for reading the user-program and generating the binary result.
instructions    Package instructions contains the common instruction-definitions for the instructions that we understand.
lexer           Package lexer contains our lexer.
parser          Package parser consumes tokens from the lexer, and generates the AST which is then walked to generate binary code.
token           Package token contains identifiers for the various things we find in our source-scripts.
