Introduction to the Go compiler
cmd/compile contains the main packages that form the Go compiler. The compiler
may be logically split in four phases, which we will briefly describe alongside
the list of packages that contain their code.
You may sometimes hear the terms "front-end" and "back-end" when referring to the compiler. Roughly speaking, these translate to the first two and last two phases we are going to list here. A third term, "middle-end", often refers to much of the work that happens in the second phase.
Note that the
go/* family of packages, such as
have no relation to the compiler. Since the compiler was initially written in C,
go/* packages were developed to enable writing tools working with Go code,
It should be clarified that the name "gc" stands for "Go compiler", and has little to do with uppercase "GC", which stands for garbage collection.
cmd/compile/internal/syntax(lexer, parser, syntax tree)
In the first phase of compilation, source code is tokenized (lexical analysis), parsed (syntax analysis), and a syntax tree is constructed for each source file.
Each syntax tree is an exact representation of the respective source file, with nodes corresponding to the various elements of the source such as expressions, declarations, and statements. The syntax tree also includes position information which is used for error reporting and the creation of debugging information.
2. Type-checking and AST transformations
cmd/compile/internal/gc(create compiler AST, type checking, AST transformations)
The gc package includes an AST definition carried over from when it was written in C. All of its code is written in terms of it, so the first thing that the gc package must do is convert the syntax package's syntax tree to the compiler's AST representation. This extra step may be refactored away in the future.
The AST is then type-checked. The first steps are name resolution and type inference, which determine which object belongs to which identifier, and what type each expression has. Type-checking includes certain extra checks, such as "declared and not used" as well as determining whether or not a function terminates.
Certain transformations are also done on the AST. Some nodes are refined based on type information, such as string additions being split from the arithmetic addition node type. Some other examples are dead code elimination, function call inlining, and escape analysis.
3. Generic SSA
cmd/compile/internal/gc(converting to SSA)
cmd/compile/internal/ssa(SSA passes and rules)
In this phase, the AST is converted into Static Single Assignment (SSA) form, a lower-level intermediate representation with specific properties that make it easier to implement optimizations and to eventually generate machine code from it.
During this conversion, function intrinsics are applied. These are special functions that the compiler has been taught to replace with heavily optimized code on a case-by-case basis.
Certain nodes are also lowered into simpler components during the AST to SSA conversion, so that the rest of the compiler can work with them. For instance, the copy builtin is replaced by memory moves, and range loops are rewritten into for loops. Some of these currently happen before the conversion to SSA due to historical reasons, but the long-term plan is to move all of them here.
Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all
Some examples of these generic passes include dead code elimination, removal of unneeded nil checks, and removal of unused branches. The generic rewrite rules mainly concern expressions, such as replacing some expressions with constant values, and optimizing multiplications and float operations.
4. Generating machine code
cmd/compile/internal/ssa(SSA lowering and arch-specific passes)
cmd/internal/obj(machine code generation)
The machine-dependent phase of the compiler begins with the "lower" pass, which rewrites generic values into their machine-specific variants. For example, on amd64 memory operands are possible, so many load-store operations may be combined.
Note that the lower pass runs all machine-specific rewrite rules, and thus it currently applies lots of optimizations too.
Once the SSA has been "lowered" and is more specific to the target architecture, the final code optimization passes are run. This includes yet another dead code elimination pass, moving values closer to their uses, the removal of local variables that are never read from, and register allocation.
Other important pieces of work done as part of this step include stack frame layout, which assigns stack offsets to local variables, and pointer liveness analysis, which computes which on-stack pointers are live at each GC safe point.
At the end of the SSA generation phase, Go functions have been transformed into
a series of obj.Prog instructions. These are passed to the assembler
cmd/internal/obj), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.
To dig deeper into how the SSA package works, including its passes and rules, head to cmd/compile/internal/ssa/README.md.
Compile, typically invoked as “go tool compile,” compiles a single Go package comprising the files named on the command line. It then writes a single object file named for the basename of the first source file with a .o suffix. The object file can then be combined with other objects into a package archive or passed directly to the linker (“go tool link”). If invoked with -pack, the compiler writes an archive directly, bypassing the intermediate object file.
The generated files contain type information about the symbols exported by the package and about types used by symbols imported by the package from other packages. It is therefore not necessary when compiling client C of package P to read the files of P's dependencies, only the compiled output of P.
Command Line ¶
go tool compile [flags] file...
The specified files must be Go source files and all part of the same package. The same compiler is used for all target operating systems and architectures. The GOOS and GOARCH environment variables set the desired target.
-D path Set relative path for local imports. -I dir1 -I dir2 Search for imported packages in dir1, dir2, etc, after consulting $GOROOT/pkg/$GOOS_$GOARCH. -L Show complete file path in error messages. -N Disable optimizations. -S Print assembly listing to standard output (code only). -S -S Print assembly listing to standard output (code and data). -V Print compiler version and exit. -asmhdr file Write assembly header to file. -blockprofile file Write block profile for the compilation to file. -complete Assume package has no non-Go components. -cpuprofile file Write a CPU profile for the compilation to file. -dynlink Allow references to Go symbols in shared libraries (experimental). -e Remove the limit on the number of errors reported (default limit is 10). -h Halt with a stack trace at the first error detected. -importmap old=new Interpret import "old" as import "new" during compilation. The option may be repeated to add multiple mappings. -installsuffix suffix Look for packages in $GOROOT/pkg/$GOOS_$GOARCH_suffix instead of $GOROOT/pkg/$GOOS_$GOARCH. -l Disable inlining. -lang version Set language version to compile, as in -lang=go1.12. Default is current version. -largemodel Generate code that assumes a large memory model. -linkobj file Write linker-specific object to file and compiler-specific object to usual output file (as specified by -o). Without this flag, the -o output is a combination of both linker and compiler input. -memprofile file Write memory profile for the compilation to file. -memprofilerate rate Set runtime.MemProfileRate for the compilation to rate. -msan Insert calls to C/C++ memory sanitizer. -mutexprofile file Write mutex profile for the compilation to file. -nolocalimports Disallow local (relative) imports. -o file Write object to file (default file.o or, with -pack, file.a). -p path Set expected package import path for the code being compiled, and diagnose imports that would cause a circular dependency. -pack Write a package (archive) file rather than an object file -race Compile with race detector enabled. -trimpath prefix Remove prefix from recorded source file paths.
There are also a number of debugging flags; run the command with no arguments for a usage message.
Compiler Directives ¶
The compiler accepts directives in the form of comments. To distinguish them from non-directive comments, directives require no space between the comment opening and the name of the directive. However, since they are comments, tools unaware of the directive convention or of a particular directive can skip over a directive like any other comment.
Line directives come in several forms:
//line :line //line :line:col //line filename:line //line filename:line:col /*line :line*/ /*line :line:col*/ /*line filename:line*/ /*line filename:line:col*/
In order to be recognized as a line directive, the comment must start with //line or /*line followed by a space, and must contain at least one colon. The //line form must start at the beginning of a line. A line directive specifies the source position for the character immediately following the comment as having come from the specified file, line and column: For a //line comment, this is the first character of the next line, and for a /*line comment this is the character position immediately following the closing */. If no filename is given, the recorded filename is empty if there is also no column number; otherwise it is the most recently recorded filename (actual filename or filename specified by previous line directive). If a line directive doesn't specify a column number, the column is "unknown" until the next directive and the compiler does not report column numbers for that range. The line directive text is interpreted from the back: First the trailing :ddd is peeled off from the directive text if ddd is a valid number > 0. Then the second :ddd is peeled off the same way if it is valid. Anything before that is considered the filename (possibly including blanks and colons). Invalid line or column values are reported as errors.
//line foo.go:10 the filename is foo.go, and the line number is 10 for the next line //line C:foo.go:10 colons are permitted in filenames, here the filename is C:foo.go, and the line is 10 //line a:100 :10 blanks are permitted in filenames, here the filename is " a:100 " (excluding quotes) /*line :10:20*/x the position of x is in the current file with line number 10 and column number 20 /*line foo: 10 */ this comment is recognized as invalid line directive (extra blanks around line number)
Line directives typically appear in machine-generated code, so that compilers and debuggers will report positions in the original input to the generator.
The line directive is an historical special case; all other directives are of the form //go:name and must start at the beginning of a line, indicating that the directive is defined by the Go toolchain.
The //go:noescape directive specifies that the next declaration in the file, which must be a func without a body (meaning that it has an implementation not written in Go) does not allow any of the pointers passed as arguments to escape into the heap or into the values returned from the function. This information can be used during the compiler's escape analysis of Go code calling the function.
The //go:nosplit directive specifies that the next function declared in the file must not include a stack overflow check. This is most commonly used by low-level runtime sources invoked at times when it is unsafe for the calling goroutine to be preempted.
//go:linkname localname importpath.name
The //go:linkname directive instructs the compiler to use “importpath.name” as the object file symbol name for the variable or function declared as “localname” in the source code. Because this directive can subvert the type system and package modularity, it is only enabled in files that have imported "unsafe".