xeddata

package
v0.3.0
Published: Feb 8, 2023 License: BSD-3-Clause Imports: 12 Imported by: 1

Documentation

Overview

Package xeddata provides utilities to work with XED datafiles.

Main features:

  • Fundamental XED enumerations (CPU modes, operand sizes, ...)
  • XED objects and their components
  • XED datafiles reader (see below)
  • Utility functions like ExpandStates

The set of supported file formats is the minimum required to generate x86.csv from XED tables:

  • states - simple macro substitutions used in patterns
  • widths - mappings from width names to their size
  • element-types - XED xtype information
  • objects - XED objects that constitute "the tables"

Collectively, those files are called "datafiles".

Terminology is borrowed from XED itself; where appropriate, x86csv names are provided as an alternative.

"$XED/foo/bar.txt" notation is used to specify a path to the "foo/bar.txt" file under the local XED source repository folder.

The default usage scheme:

  1. Open "XED database" to load required metadata.
  2. Read XED file with objects definitions.
  3. Operate on XED objects.

See example_test.go for complete examples.

Intel XED must be built before its datafiles can be used, because this package expects the "all" versions of datafiles, which are the final, concatenated versions. If "$XED/obj/dgen/" does not contain the relevant files, then either this documentation is stale or your XED is not built.

For examples of "XED objects", see "testdata/xed_objects.txt".

Intel XED https://github.com/intelxed/xed provides all the documentation needed to understand datafiles. The "$XED/misc/engineering-notes.txt" file is particularly useful. For convenience, the most important notes are spread across package comments.

Tested with XED 088c48a2efa447872945168272bcd7005a7ddd91.

Variables

var PatternAliases = map[string]string{
	"VEX":     "VEXVALID=1",
	"EVEX":    "VEXVALID=2",
	"XOP":     "VEXVALID=3",
	"MemOnly": "MOD!=3",
	"RegOnly": "MOD=3",
}

PatternAliases is an extendable map of pattern key aliases. It maps a human-readable key to a XED property.

Used in PatternSet.Is.

Functions

func ExpandStates

func ExpandStates(db *Database, s string) string

ExpandStates returns a copy of s where all state macros are expanded. This requires db "states" to be loaded.

Example

This example shows how to use ExpandStates and its effects.

package main

import (
	"fmt"
	"log"
	"strings"

	"golang.org/x/arch/x86/xeddata"
)

func main() {
	const xedPath = "testdata/xedpath"

	input := strings.NewReader(`
{
ICLASS: VEXADD
CPL: 3
CATEGORY: ?
EXTENSION: ?
ATTRIBUTES: AT_A AT_B

PATTERN: _M_VV_TRUE 0x58  _M_VEX_P_66 _M_VLEN_128 _M_MAP_0F MOD[mm] MOD!=3 REG[rrr] RM[nnn] MODRM()
OPERANDS: REG0=XMM_R():w:width_dq:fword64 REG1=XMM_N():r:width_dq:fword64 MEM0:r:width_dq:fword64

PATTERN: _M_VV_TRUE 0x58  _M_VEX_P_66 _M_VLEN_128 _M_MAP_0F MOD[0b11] MOD=3 REG[rrr] RM[nnn]
OPERANDS: REG0=XMM_R():w:width_dq:fword64 REG1=XMM_N():r:width_dq:fword64 REG2=XMM_B():r:width_dq:fword64

PATTERN: _M_VV_TRUE 0x58  _M_VEX_P_66 _M_VLEN_256 _M_MAP_0F MOD[mm] MOD!=3 REG[rrr] RM[nnn] MODRM()
OPERANDS: REG0=YMM_R():w:qq:fword64 REG1=YMM_N():r:qq:fword64 MEM0:r:qq:fword64

PATTERN: _M_VV_TRUE 0x58  _M_VEX_P_66 _M_VLEN_256 _M_MAP_0F MOD[0b11] MOD=3 REG[rrr] RM[nnn]
OPERANDS: REG0=YMM_R():w:qq:fword64 REG1=YMM_N():r:qq:fword64 REG2=YMM_B():r:qq:fword64
}`)

	objects, err := xeddata.NewReader(input).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	db, err := xeddata.NewDatabase(xedPath)
	if err != nil {
		log.Fatal(err)
	}

	for _, o := range objects {
		for _, inst := range o.Insts {
			fmt.Printf("old: %q\n", inst.Pattern)
			fmt.Printf("new: %q\n", xeddata.ExpandStates(db, inst.Pattern))
		}
	}

}
Output:

old: "_M_VV_TRUE 0x58  _M_VEX_P_66 _M_VLEN_128 _M_MAP_0F MOD[mm] MOD!=3 REG[rrr] RM[nnn] MODRM()"
new: "VEXVALID=1 0x58 VEX_PREFIX=1 VL=0 MAP=1 MOD[mm] MOD!=3 REG[rrr] RM[nnn] MODRM()"
old: "_M_VV_TRUE 0x58  _M_VEX_P_66 _M_VLEN_128 _M_MAP_0F MOD[0b11] MOD=3 REG[rrr] RM[nnn]"
new: "VEXVALID=1 0x58 VEX_PREFIX=1 VL=0 MAP=1 MOD[0b11] MOD=3 REG[rrr] RM[nnn]"
old: "_M_VV_TRUE 0x58  _M_VEX_P_66 _M_VLEN_256 _M_MAP_0F MOD[mm] MOD!=3 REG[rrr] RM[nnn] MODRM()"
new: "VEXVALID=1 0x58 VEX_PREFIX=1 VL=1 MAP=1 MOD[mm] MOD!=3 REG[rrr] RM[nnn] MODRM()"
old: "_M_VV_TRUE 0x58  _M_VEX_P_66 _M_VLEN_256 _M_MAP_0F MOD[0b11] MOD=3 REG[rrr] RM[nnn]"
new: "VEXVALID=1 0x58 VEX_PREFIX=1 VL=1 MAP=1 MOD[0b11] MOD=3 REG[rrr] RM[nnn]"

func WalkInsts

func WalkInsts(xedPath string, visit func(*Inst)) error

WalkInsts calls the visit function for each XED instruction found in $xedPath/all-dec-instructions.txt.

Types

type AddressSizeMode

type AddressSizeMode int

AddressSizeMode describes address size mode (67H prefix).

const (
	AddrSize16 AddressSizeMode = iota
	AddrSize32
	AddrSize64
)

Possible address size modes. XED calls it ASZ.

func (AddressSizeMode) String

func (asz AddressSizeMode) String() string

String returns asz bit size string. Panics on illegal enumerations.
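
The "panics on illegal enumerations" contract can be pictured with a small stand-in type. This is only a sketch: the addrSize type and the exact strings it returns ("16", "32", "64") are assumptions made for illustration, not the package's source.

```go
package main

import "fmt"

// addrSize is a hypothetical stand-in for AddressSizeMode.
type addrSize int

const (
	addrSize16 addrSize = iota
	addrSize32
	addrSize64
)

// String returns the bit size as a decimal string and panics for values
// outside the declared enumeration. The returned strings are assumed.
func (a addrSize) String() string {
	switch a {
	case addrSize16:
		return "16"
	case addrSize32:
		return "32"
	case addrSize64:
		return "64"
	}
	panic(fmt.Sprintf("illegal address size mode: %d", int(a)))
}

func main() {
	fmt.Println(addrSize32) // prints "32" via the Stringer interface
}
```

A panic is a reasonable choice here because an out-of-range mode can only come from a programming error, not from input data.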

type CPUMode

type CPUMode int

CPUMode describes availability in certain CPU mode.

const (
	Mode16 CPUMode = iota
	Mode32
	Mode64
)

Possible CPU modes. XED calls it MODE.

type Database

type Database struct {
	// contains filtered or unexported fields
}

Database holds information that is required to properly handle XED datafiles.

func NewDatabase

func NewDatabase(xedPath string) (*Database, error)

NewDatabase returns a Database that loads everything it can find in xedPath. A missing lookup file is not an error, but an error while parsing a file that was found is.

Lookup:

"$xedPath/all-state.txt" => db.LoadStates()
"$xedPath/all-widths.txt" => db.LoadWidths()
"$xedPath/all-element-types.txt" => db.LoadXtypes()

$xedPath is the interpolated value of function argument.

The call NewDatabase("") is valid and returns an empty database. Load methods can be used to read lookup files one by one.

func (*Database) LoadStates

func (db *Database) LoadStates(r io.Reader) error

LoadStates reads XED states definitions from r and updates db. "states" are simple macro substitutions without parameters. See "$XED/obj/dgen/all-state.txt".
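
Parameterless macro substitution can be sketched as a token-by-token map lookup. The expand helper below is hypothetical, and the macro table is inferred from the ExpandStates example output in this document rather than read from a real all-state.txt; the package's own implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// expand replaces every whitespace-separated token of s that names a
// state macro with its replacement text. Tokens are re-joined with a
// single space, so runs of spaces in s collapse.
func expand(states map[string]string, s string) string {
	fields := strings.Fields(s)
	for i, f := range fields {
		if repl, ok := states[f]; ok {
			fields[i] = repl
		}
	}
	return strings.Join(fields, " ")
}

func main() {
	// Assumed macro table, inferred from the example output.
	states := map[string]string{
		"_M_VV_TRUE":  "VEXVALID=1",
		"_M_VEX_P_66": "VEX_PREFIX=1",
		"_M_VLEN_128": "VL=0",
		"_M_MAP_0F":   "MAP=1",
	}
	in := "_M_VV_TRUE 0x58  _M_VEX_P_66 _M_VLEN_128 _M_MAP_0F MOD[0b11] MOD=3"
	fmt.Println(expand(states, in))
	// prints: VEXVALID=1 0x58 VEX_PREFIX=1 VL=0 MAP=1 MOD[0b11] MOD=3
}
```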

func (*Database) LoadWidths

func (db *Database) LoadWidths(r io.Reader) error

LoadWidths reads XED widths definitions from r and updates db. "widths" are 16/32/64 bit mode type sizes. See "$XED/obj/dgen/all-widths.txt".

func (*Database) LoadXtypes

func (db *Database) LoadXtypes(r io.Reader) error

LoadXtypes reads XED xtypes definitions from r and updates db. "xtypes" are low-level XED type names. See "$XED/obj/dgen/all-element-types.txt". See "$XED/obj/dgen/all-element-type-base.txt".

func (*Database) WidthSize

func (db *Database) WidthSize(width string, m OperandSizeMode) string

WidthSize translates a width string to a size string using the desired OperandSizeMode m. For some widths, the output is the same for any valid value of m.
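
A width table indexed by operand size mode can be pictured as follows. This is a sketch under stated assumptions: sizeMode, widthTable and widthSize are hypothetical names, and the two rows are illustrative values rather than data loaded from all-widths.txt.

```go
package main

import "fmt"

// sizeMode is a hypothetical stand-in for OperandSizeMode.
type sizeMode int

const (
	size16 sizeMode = iota
	size32
	size64
)

// widthTable maps a width name to its size in bytes for the 16-, 32-
// and 64-bit operand size modes. Both rows are illustrative.
var widthTable = map[string][3]string{
	"v":  {"2", "4", "8"},    // size depends on the mode
	"dq": {"16", "16", "16"}, // same size in every mode
}

// widthSize returns the size for width in mode m, or "" if width is unknown.
func widthSize(width string, m sizeMode) string {
	return widthTable[width][m]
}

func main() {
	fmt.Printf("%s/%s bytes\n", widthSize("v", size32), widthSize("v", size64)) // 4/8 bytes
	fmt.Println(widthSize("dq", size16) == widthSize("dq", size64))             // true
}
```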

type Inst

type Inst struct {
	// Object that contains properties that are shared with multiple
	// Inst objects.
	*Object

	// Index is the position inside XED object.
	// Object.Insts[Index] returns this inst.
	Index int

	// Pattern is the sequence of bits and nonterminals used to
	// decode/encode an instruction.
	// Example: "0x0F 0x28 no_refining_prefix MOD[0b11] MOD=3 REG[rrr] RM[nnn]".
	Pattern string

	// Operands are instruction arguments, typically registers,
	// memory operands and pseudo-resources. Separated by space.
	// Example: "MEM0:rcw:b REG0=GPR8_R():r REG1=XED_REG_AL:rcw:SUPP".
	Operands string

	// Iform is a name for the pattern that starts with the
	// iclass and bakes in the operands. If omitted, XED
	// tries to generate one. We often add custom suffixes
	// to these to disambiguate certain combinations.
	// Example: "MOVAPS_XMMps_XMMps_0F28".
	//
	// Optional.
	Iform string
}

Inst represents a single instruction template.

Some templates contain an expandable (macro) pattern and operands, which means that the template expresses more than one real instruction.

func (*Inst) String

func (inst *Inst) String() string

String returns pretty-printed inst representation.

Outputs valid JSON string. This property is not guaranteed to be preserved.

type Object

type Object struct {
	// Iclass is instruction class name (opcode).
	// Iclass alone is not enough to uniquely identify machine instructions.
	// Example: "PSRLW".
	Iclass string

	// Disasm is substituted name when a simple conversion
	// from iclass is inappropriate.
	// Never combined with DisasmIntel or DisasmATTSV.
	// Example: "syscall".
	//
	// Optional.
	Disasm string

	// DisasmIntel is like Disasm, but with Intel syntax.
	// If present, usually comes with DisasmATTSV.
	// Example: "jmp far".
	//
	// Optional.
	DisasmIntel string

	// DisasmATTSV is like Disasm, but with AT&T/SysV syntax.
	// If present, usually comes with DisasmIntel.
	// Example: "ljmp".
	//
	// Optional.
	DisasmATTSV string

	// Attributes describes name set for bits in the binary attributes field.
	// Example: "NOP X87_CONTROL NOTSX".
	//
	// Optional. If not present, zero attribute set is implied.
	Attributes string

	// Uname is unique name used for deleting / replacing instructions.
	//
	// Optional. Provided for completeness, mostly useful for XED internal usage.
	Uname string

	// CPL is instruction current privilege level restriction.
	// Can have value of "0" or "3".
	CPL string

	// Category is an ad-hoc categorization of instructions.
	// Example: "SEMAPHORE".
	Category string

	// Extension is an ad-hoc grouping of instructions.
	// If no ISASet is specified, this is used instead.
	// Example: "3DNOW"
	Extension string

	// Exceptions is an exception set name.
	// Example: "SSE_TYPE_7".
	//
	// Optional. Empty exception category generally means that
	// instruction generates no exceptions.
	Exceptions string

	// ISASet is a name for the group of instructions that
	// introduced this feature.
	// Example: "I286PROTECTED".
	//
	// Older objects only defined Extension field.
	// Newer objects may contain both Extension and ISASet fields.
	// For some objects Extension==ISASet.
	// Both fields are required to do precise CPUID-like decisions.
	//
	// Optional.
	ISASet string

	// Flags describes read/written flag bit values.
	// Example: "MUST [ of-u sf-u af-u pf-u cf-mod ]".
	//
	// Optional. If not present, flags are neither read nor written.
	Flags string

	// A hopefully useful comment.
	//
	// Optional.
	Comment string

	// The object revision.
	//
	// Optional.
	Version string

	// RealOpcode marks unstable (not in SDM yet) instructions with "N".
	// Normally, always "Y" or not present at all.
	//
	// Optional.
	RealOpcode string

	// Insts are concrete instruction templates that are derived from containing Object.
	// Inst contains fields PATTERN, OPERANDS, IFORM in enc/dec instruction.
	Insts []*Inst
}

An Object is a single "dec/enc-instruction" XED object from datafiles.

Field names and their comments are borrowed from Intel XED engineering notes (see "$XED/misc/engineering-notes.txt").

Field values are always trimmed (i.e. no leading/trailing whitespace).

Missing optional members are expressed with an empty string.

An Object contains multiple Inst elements, each representing a concrete instruction with an encoding pattern and operands description.

func (*Object) HasAttribute

func (o *Object) HasAttribute(name string) bool

HasAttribute reports whether o has an attribute with the specified name. Note that the check is done at the "word" level, so substrings of attribute names will not match.
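
The difference between a word-level check and a substring check can be sketched like this. The hasAttribute helper is hypothetical and only illustrates the matching rule on the space-separated Attributes string.

```go
package main

import (
	"fmt"
	"strings"
)

// hasAttribute reports whether name appears as a whole
// whitespace-separated word in attrs.
func hasAttribute(attrs, name string) bool {
	for _, a := range strings.Fields(attrs) {
		if a == name {
			return true
		}
	}
	return false
}

func main() {
	attrs := "NOP X87_CONTROL NOTSX"
	fmt.Println(hasAttribute(attrs, "NOTSX"))   // true
	fmt.Println(hasAttribute(attrs, "X87"))     // false: only a substring of X87_CONTROL
	fmt.Println(strings.Contains(attrs, "X87")) // true: why a substring test would be wrong
}
```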

func (*Object) Opcode

func (o *Object) Opcode() string

Opcode returns the instruction name, or an empty string if the relevant Object fields are not initialized.

type Operand

type Operand struct {
	// Name is an ID with optional nonterminal name part.
	//
	// Possible values: "REG0=GPRv_B", "REG1", "MEM0", ...
	//
	// If the nonterminal part is present, the name can be split into
	// LHS and RHS with the NameLHS and NameRHS methods.
	Name string

	// Action describes argument types.
	//
	// Possible values: "r", "w", "rw", "cr", "cw", "crw".
	// Optional "c" prefix represents conditional access.
	Action string

	// Width descriptor. It can express simple width like "w" (word, 16bit)
	// or meta-width like "v", which corresponds to {16, 32, 64} bits.
	//
	// Possible values: "", "q", "ds", "dq", ...
	// Optional.
	Width string

	// Xtype holds XED-specific type information.
	//
	// Possible values: "", "f64", "i32", ...
	// Optional.
	Xtype string

	// Attributes serves as container for all other properties.
	//
	// Possible values:
	//   EVEX.b context {
	//     TXT=ZEROSTR  - zeroing
	//     TXT=SAESTR   - suppress all exceptions
	//     TXT=ROUNDC   - rounding
	//     TXT=BCASTSTR - broadcasting
	//   }
	//   MULTISOURCE4 - 4FMA multi-register operand.
	//
	// Optional. For most operands, it's nil.
	Attributes map[string]bool

	// Visibility tells if operand is explicit, implicit or suppressed.
	Visibility OperandVisibility
}

Operand holds data that is encoded inside instruction's "OPERANDS" field.

Use NewOperand function to decode operand fields into Operand object.

Example

This example shows how to handle Inst "OPERANDS" field.

package main

import (
	"fmt"
	"log"
	"strings"

	"golang.org/x/arch/x86/xeddata"
)

func main() {
	const xedPath = "testdata/xedpath"

	input := strings.NewReader(`
{
ICLASS: ADD_N_TIMES # Like IMUL
CPL: 3
CATEGORY: BINARY
EXTENSION: BASE
ISA_SET: I86
FLAGS: MUST [ of-mod sf-u zf-u af-u pf-u cf-mod ]

PATTERN: 0xAA MOD[mm] MOD!=3 REG[0b101] RM[nnn] MODRM()
OPERANDS: MEM0:r:width_v REG0=AX:rw:SUPP REG1=DX:w:SUPP
}`)

	objects, err := xeddata.NewReader(input).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	db, err := xeddata.NewDatabase(xedPath)
	if err != nil {
		log.Fatal(err)
	}

	inst := objects[0].Insts[0] // Single instruction is enough for this example
	for i, rawOperand := range strings.Fields(inst.Operands) {
		operand, err := xeddata.NewOperand(db, rawOperand)
		if err != nil {
			log.Fatalf("parse operand #%d: %+v", i, err)
		}

		visibility := "implicit"
		if operand.IsVisible() {
			visibility = "explicit"
		}
		fmt.Printf("(%s) %s:\n", visibility, rawOperand)

		fmt.Printf("\tname: %q\n", operand.Name)
		if operand.IsVisible() {
			fmt.Printf("\t32/64bit width: %s/%s bytes\n",
				db.WidthSize(operand.Width, xeddata.OpSize32),
				db.WidthSize(operand.Width, xeddata.OpSize64))
		}
	}

}
Output:

(explicit) MEM0:r:width_v:
	name: "MEM0"
	32/64bit width: 4/8 bytes
(implicit) REG0=AX:rw:SUPP:
	name: "REG0=AX"
(implicit) REG1=DX:w:SUPP:
	name: "REG1=DX"

func NewOperand

func NewOperand(db *Database, s string) (*Operand, error)

NewOperand decodes operand string.

See "$XED/pysrc/opnds.py" to learn about fields format and valid combinations.

Requires database with xtypes and widths info.

func (*Operand) IsVisible

func (op *Operand) IsVisible() bool

IsVisible returns true for operands that are usually shown in syntax strings.

func (*Operand) NameLHS

func (op *Operand) NameLHS() string

NameLHS returns left hand side part of the non-terminal name. Example: NameLHS("REG0=GPRv()") => "REG0".

func (*Operand) NameRHS

func (op *Operand) NameRHS() string

NameRHS returns the right-hand side part of the non-terminal name. Example: NameRHS("REG0=GPRv()") => "GPRv()".

func (*Operand) NonterminalName

func (op *Operand) NonterminalName() bool

NonterminalName returns true if op.Name consists of LHS and RHS parts.

RHS is non-terminal name lookup function expression. Example: "REG0=GPRv()" has "GPRv()" name lookup function.
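
The LHS/RHS split of an operand name can be sketched with strings.Cut. The splitName helper is hypothetical, and what it returns for a name without "=" is an assumption of this sketch, not documented behavior of NameLHS or NameRHS.

```go
package main

import (
	"fmt"
	"strings"
)

// splitName splits an operand name at the first "=".
// ok is false when the name has no nonterminal (RHS) part; in that
// case lhs is the whole name and rhs is empty.
func splitName(name string) (lhs, rhs string, ok bool) {
	return strings.Cut(name, "=")
}

func main() {
	for _, name := range []string{"REG0=GPRv()", "MEM0"} {
		lhs, rhs, ok := splitName(name)
		fmt.Printf("%q %q %v\n", lhs, rhs, ok)
	}
	// prints:
	// "REG0" "GPRv()" true
	// "MEM0" "" false
}
```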

type OperandSizeMode

type OperandSizeMode int

OperandSizeMode describes operand size mode (66H prefix).

const (
	OpSize16 OperandSizeMode = iota
	OpSize32
	OpSize64
)

Possible operand size modes. XED calls it OSZ.

func (OperandSizeMode) String

func (osz OperandSizeMode) String() string

String returns osz bit size string. Panics on illegal enumerations.

type OperandVisibility

type OperandVisibility int

OperandVisibility describes operand visibility in XED terms.

const (
	// VisExplicit is a default operand visibility.
	// Explicit operand is "real" kind of operands that
	// is shown in syntax and can be specified by the programmer.
	VisExplicit OperandVisibility = iota

	// VisImplicit is for fixed arg (like EAX); usually shown in syntax.
	VisImplicit

	// VisSuppressed is like VisImplicit, but not shown in syntax.
	// In some very rare exceptions, they are also shown in syntax string.
	VisSuppressed

	// VisEcond is encoder-only conditions. Can be ignored.
	VisEcond
)

type PatternSet

type PatternSet map[string]bool

PatternSet wraps instruction PATTERN properties providing set operations on them.

func NewPatternSet

func NewPatternSet(pattern string) PatternSet

NewPatternSet decodes pattern string into PatternSet.

func (PatternSet) Index

func (pset PatternSet) Index(keys ...string) int

Index returns the index in keys of the first key that pset contains. Returns -1 if pset contains none of the given keys.
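
As a rough illustration of the Index contract, here is a stand-in set type with the same underlying representation (map[string]bool). The lower-case names are hypothetical, and the sketch does a plain lookup without consulting aliases.

```go
package main

import "fmt"

// patternSet is a hypothetical stand-in for PatternSet.
type patternSet map[string]bool

// index returns the position in keys of the first key present in p,
// or -1 if none of them is present.
func (p patternSet) index(keys ...string) int {
	for i, k := range keys {
		if p[k] {
			return i
		}
	}
	return -1
}

func main() {
	p := patternSet{"VEXVALID=1": true, "VL=1": true, "MOD=3": true}
	fmt.Println(p.index("VL=0", "VL=1", "VL=2")) // 1
	fmt.Println(p.index("MAP=1"))                // -1
}
```

Because the keys are tried in argument order, the result can be used directly as an index into a parallel slice of values.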

func (PatternSet) Is

func (pset PatternSet) Is(k string) bool

Is reports whether set contains key k. In contrast with direct pattern set lookup, it does check if PatternAliases[k] is available to be used instead of k in lookup.
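
The alias-aware lookup can be sketched as follows. The aliases table copies the PatternAliases entries listed in this document; the patternSet type and the is method are hypothetical stand-ins, not the package's code.

```go
package main

import "fmt"

// patternSet is a hypothetical stand-in for PatternSet.
type patternSet map[string]bool

// aliases mirrors the PatternAliases table from this document.
var aliases = map[string]string{
	"VEX":     "VEXVALID=1",
	"EVEX":    "VEXVALID=2",
	"XOP":     "VEXVALID=3",
	"MemOnly": "MOD!=3",
	"RegOnly": "MOD=3",
}

// is looks up k, substituting the aliased XED property when k is an alias.
func (p patternSet) is(k string) bool {
	if alias, ok := aliases[k]; ok {
		k = alias
	}
	return p[k]
}

func main() {
	p := patternSet{"VEXVALID=1": true, "MOD=3": true}
	fmt.Println(p.is("VEX"))   // true: resolved through the alias
	fmt.Println(p["VEX"])      // false: a direct lookup knows nothing about aliases
	fmt.Println(p.is("MOD=3")) // true: plain keys still work
	fmt.Println(p.is("EVEX"))  // false: VEXVALID=2 is not in the set
}
```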

func (PatternSet) Match

func (pset PatternSet) Match(keyval ...string) string

Match is like MatchOrDefault("", keyval...).

func (PatternSet) MatchOrDefault

func (pset PatternSet) MatchOrDefault(defaultValue string, keyval ...string) string

MatchOrDefault returns the value associated with the first matching key. Returns defaultValue if no match is found.

Keyval structure can be described as {"k1", "v1", ..., "kN", "vN"}.
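
The flat key/value argument list can be walked two items at a time. The sketch below uses hypothetical lower-case names and a plain set lookup; whether the real method also resolves aliases is not stated here, so that is left out.

```go
package main

import "fmt"

// patternSet is a hypothetical stand-in for PatternSet.
type patternSet map[string]bool

// matchOrDefault scans keyval as {"k1", "v1", ..., "kN", "vN"} and returns
// the value paired with the first key present in p, or def if none is.
// A trailing key without a value is ignored.
func (p patternSet) matchOrDefault(def string, keyval ...string) string {
	for i := 0; i+1 < len(keyval); i += 2 {
		if p[keyval[i]] {
			return keyval[i+1]
		}
	}
	return def
}

func main() {
	p := patternSet{"VL=1": true, "MOD=3": true}
	vlen := p.matchOrDefault("?",
		"VL=0", "128",
		"VL=1", "256",
		"VL=2", "512")
	fmt.Println(vlen)                                 // 256
	fmt.Println(p.matchOrDefault("none", "MAP=1", "0F")) // none
}
```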

func (PatternSet) Replace

func (pset PatternSet) Replace(oldKey, newKey string)

Replace inserts newKey if oldKey is defined. oldKey is removed if insertion is performed.
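
A conditional key swap of this kind might look like the sketch below. The patternSet type and replace method are hypothetical stand-ins that follow the description above.

```go
package main

import "fmt"

// patternSet is a hypothetical stand-in for PatternSet.
type patternSet map[string]bool

// replace swaps oldKey for newKey, but only when oldKey is present.
func (p patternSet) replace(oldKey, newKey string) {
	if p[oldKey] {
		delete(p, oldKey)
		p[newKey] = true
	}
}

func main() {
	p := patternSet{"VEXVALID=1": true}
	p.replace("VEXVALID=1", "VEX")
	p.replace("MOD=3", "RegOnly") // no effect: MOD=3 is not in the set
	fmt.Println(p["VEX"], p["VEXVALID=1"], p["RegOnly"]) // true false false
}
```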

func (PatternSet) String

func (pset PatternSet) String() string

String returns the printed representation of the pattern. All properties are sorted.
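
Sorting matters because Go map iteration order is unspecified; without it, two calls could print the same set differently. The sketch below assumes a single space as the separator, which is a guess made for illustration, and uses a hypothetical stand-in type.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// patternSet is a hypothetical stand-in for PatternSet.
type patternSet map[string]bool

// String joins the keys of p in sorted order so the output is stable.
// The space separator is an assumption of this sketch.
func (p patternSet) String() string {
	keys := make([]string, 0, len(p))
	for k := range p {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return strings.Join(keys, " ")
}

func main() {
	p := patternSet{"VL=0": true, "MOD=3": true, "0x58": true}
	fmt.Println(p) // 0x58 MOD=3 VL=0
}
```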

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader reads enc/dec-instruction objects from a XED datafile.

Example

This example shows how to print raw XED objects using Reader. Objects are called "raw" because some of their fields may require additional transformations like macro (states) expansion.

package main

import (
	"fmt"
	"log"
	"strings"

	"golang.org/x/arch/x86/xeddata"
)

func main() {
	const xedPath = "testdata/xedpath"

	input := strings.NewReader(`
{
ICLASS: VEXADD
EXCEPTIONS: avx-type-zero
CPL: 2000
CATEGORY: AVX-Q
EXTENSION: AVX-Q
ATTRIBUTES: A B C
PATTERN: VV1 0x07 VL128 V66 V0F MOD[mm] MOD!=3 REG[rrr] RM[nnn] MODRM()
OPERANDS: REG0=XMM_R():w:width_dq:fword64 REG1=XMM_N():r:width_dq:fword64 MEM0:r:width_dq:fword64
}

{
ICLASS: COND_MOV_Z
CPL: 210
CATEGORY: MOV_IF_COND_MET
EXTENSION: BASE
ISA_SET: COND_MOV
FLAGS: READONLY [ zf-tst ]

PATTERN: 0x0F 0x4F MOD[mm] MOD!=3 REG[rrr] RM[nnn] MODRM()
OPERANDS: REG0=GPRv_R():cw MEM0:r:width_v
PATTERN: 0x0F 0x4F MOD[0b11] MOD=3 REG[rrr] RM[nnn]
OPERANDS: REG0=GPRv_R():cw REG1=GPRv_B():r
}`)

	objects, err := xeddata.NewReader(input).ReadAll()
	if err != nil {
		log.Fatal(err)
	}

	for _, o := range objects {
		fmt.Printf("%s (%s):\n", o.Opcode(), o.Extension)
		for _, inst := range o.Insts {
			fmt.Printf("\t[%d] %s\n", inst.Index, inst.Operands)
		}
	}

}
Output:

VEXADD (AVX-Q):
	[0] REG0=XMM_R():w:width_dq:fword64 REG1=XMM_N():r:width_dq:fword64 MEM0:r:width_dq:fword64
COND_MOV_Z (BASE):
	[0] REG0=GPRv_R():cw MEM0:r:width_v
	[1] REG0=GPRv_R():cw REG1=GPRv_B():r

func NewReader

func NewReader(r io.Reader) *Reader

NewReader returns a new Reader that reads from r.

func (*Reader) Read

func (r *Reader) Read() (*Object, error)

Read reads a single XED instruction object from the stream backed by the reader.

If there is no data left to be read, returned error is io.EOF.

func (*Reader) ReadAll

func (r *Reader) ReadAll() ([]*Object, error)

ReadAll reads all the remaining objects from r. A successful call returns err == nil, not err == io.EOF, just like csv.Reader.ReadAll().
