dce

package
v1.19.0-beta2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 19, 2025 License: BSD-2-Clause Imports: 10 Imported by: 0

README

Dead-Code Elimination

Dead-Code Eliminations (DCE) is used to remove code that isn't reachable from a code entry point. Entry points are code like the main method, init functions, and variable initializations with side effects. These entry points are always considered alive. Any dependency of something alive, is also considered alive.

Once all dependencies are taken into consideration we have the set of alive declarations. Anything not considered alive is considered dead and may be safely eliminated, i.e. not outputted to JS.

Idea

The following is the logic behind the DCE mechanism. Not all of the following is used since some conditions are difficult to determine even with a lot of additional information, and because GopherJS stores some additional information making some parts of DCE unnecessary. To ensure that the JS output is fully functional, we bias the DCE towards things being alive. We'd rather keep something we don't need than remove something that is needed.

Package

Package declarations (e.g. package foo) might be able to be removed when only used by dead-code. However, packages may be imported and not used for various reasons including to invoke some initialization or to implement a link. So it is difficult to determine. See Dead Package example.

Currently, we won't remove any packages, but someday the complexity could be added to check for inits, side effects, links, etc then determine if any of those are are alive or affect alive things.

Named Types

Named type definitions (e.g. type Foo int) depend on the underlying type for each definition.

When a named type is alive, all of its exported methods (e.g. func (f Foo) Bar() { }) are also alive, even any unused exported method. Unused exported methods are still important when duck-typing. See Interfaces for more information. See Grandmas and Zombies for an example of what can happen when removing an unused exported method.

Also unused exported methods could be accessed by name via reflect (e.g. reflect.ValueOf(&Foo{}).MethodByName("Bar")). Since the string name may be provided from outside the code, such as the command line, it is impossible to determine which exported methods could be accessed this way. It would be very difficult to determine which types are ever accessed via reflect so by default we simply assume any can be.

Methods that are unexported may be considered dead when unused even when the receiver type is alive. The exception is when an interface in the same package has the same unexported method in it. See Interfaces for more information.

Named Structs

A named struct is a named type that has a struct as its underlying type, e.g. type Foo struct { }. A struct type depends on all of the types in its fields and embedded fields.

If the struct type is alive then all the types of the fields will also be alive. Even unexported fields maybe accessed via reflections, so they all must be alive. Also, the fields are needed for comparisons and serializations (such as encoding/binary).

Interfaces

All the types in the function signatures and embedded interfaces are the dependents of the interface.

Interfaces may contain exported and unexported function signatures. If an interface is alive then all of the functions are alive. Since there are many ways to wrap a type with an interface, any alive type that duck-types to an interface must have all of the matching methods also alive.

In theory the unexported functions are also alive however, for GopherJS there is an exception because duck-typing is handled separately from the method definitions. Those difference are discussed in Dependencies but for this idea we discuss DCE more generally.

Since the exported methods in an alive type will be alive, see Named Types, the only ones here that need to be considered are the unexported methods. An interface with unexported methods may only duck-type to types within the package the interface is defined in. Therefore, if an interface is alive with unexported methods, then all alive types within the same package that duck-type to that interface, will have the matching unexported methods be alive.

Since doing a full types.Implements check between every named types and interfaces in a package is difficult, we simplify this requirement to be any unexported method in an alive named type that matches an unexported method in an alive interface is alive even if the named type doesn't duck-type to the interface. This means that in some rare cases, some unexported methods on named structs that could have been eliminated will not be. For example, given type Foo struct{}; func(f Foo) X(); func (f Foo) y() the Foo.y() method may be alive if types Bar interface { Z(); y() } is alive even though the X() and Z() means that Foo doesn't implement Bar and therefore Foo.y() can not be called via a Bar.y().

We will try to reduce the false positives in alive unexported methods by using the parameter and result types of the methods. Meaning that y(), y(int), y() int, etc won't match just because they are named y. This also helps with a generic type's unexported methods that use type parameters, e.g. Foo.y(T). Since the generic type may be instantiated with int and string, the different instances of the method are Foo.y(int) and Foo.y(string). By using the parameter and result types, it is possible to remove the unused unexported method instantiations even when some instantiations of the same method are used.

Functions

Functions with or without a receiver are dependent on the types used by the parameters, results, and type uses inside the body of the function. They are also dependent on any function invoked or used, and any package level variable that is used.

Unused functions without a receiver, that are exported or not, may be considered dead since they aren't used in duck-typing and cannot be accessed by name via reflections.

Variables

Variables (or constants) depend on their type and anything used during initialization.

The exported or unexported variables are dead unless they are used by something else that is alive or if the initialization has side effects.

If the initialization has side effects the variable will be alive even if unused. The side effect may be simply setting another variable's value that is also unused, however it would be difficult to determine if the side effects are used or not. See Side Effects example.

Generics and Instances

For functions and types with generics, the definitions are split into unique instances. For example, type StringKeys[T any] map[string]T could be used in code as StringKeys[int] and StringKeys[*Cat]. We don't need all possible instances, only the ones which are realized in code. Each instance depends on the realized parameter types (type arguments). In the example the type arguments are int and *Cat.

The instance of the generic type also defines the code with the specific type arguments (e.g. map[string]int and map[string]*Cat). When an instance is depended on by alive code, only that instance is alive, not the entire generic type. This means if StringKey[*Cat] is only used from dead code then it is also dead and can be safely eliminated.

The named generic types may have methods that are also copied for an instance with the parameter types replaced by the type arguments. For example, func (sk StringKeys[T]) values() []T { ... } becomes func (sk StringKeys[int]) values() []int { ... } when the type argument is int. This method in the instance now duck-types to interface { values() []int } and therefore must follow the rules for unexported methods. See Instance Duck-typing example for more information.

Functions and named types may be generic, but methods and unnamed types may not be. This makes somethings simpler. A method with a receiver is used, only the receiver's type arguments are needed. The generic type or function may not be needed since only the instances are written out.

This also means that inside of a generic function or named type there is only one type parameter list being used. Even generic types used inside of the generic function must be specified in terms of the type parameter for the generic and doesn't contribute any type parameters of it's own. For example, inside of func Foo[K comparable, V any]() { ... } every usage of a generic type must specify a concrete type (int, *Cat, Bar[Bar[bool]]) or use the parameter types K and V. This is simpler than languages that allow a method of an object to have it's own type parameters, e.g. class X<T> { void Y<U>() { ... } ... }.

However, generics mean that the same method, receiver, type, etc names will be used with different parameters types caused by different type arguments. The type arguments are being passed into those parameter types for a specific instance. When an interface is alive, the signatures for unexported methods need to be instantiated with type arguments so that we know which instances the interface is duck-typing to. See Interfaces for more detail.

Links use compiler directives (//go:linkname) to alias a var or func with another. For example some code may have func bar_foo() as a function stub that is linked with foo() { ... } as a function with a body, i.e. the target of the link. The links are single directional but allow multiple stubs to link to the same target.

When a link is made, the dependencies for the linked code come from the target. If the target is used by something alive then it is alive. If a stub linked to a target is used by something alive then that stub and the target are both alive.

Since links cross package boundaries in ways that may violate encapsulation and the dependency tree, it may be difficult to determine if a link is alive or not. Therefore, currently all links are considered alive.

Design

The design is created taking all the parts of the above idea together and simplifying the justifications down to a simple set of rules.

Initially alive
  • The main method in the main package
  • The init in every included file
  • Any variable initialization that has a side effect
  • Any linked function or variable
  • Anything not given a DCE named, e.g. packages
Naming

The following specifies what declarations should be named and how the names should look. These names are later used to match (via string comparisons) dependencies with declarations that should be set as alive. Since the names are used to filter out alive code from all the code these names may also be referred to as filters.

Some names will have multiple name parts; an object name and method name. This is kind of like a first name and last name when a first name alone isn't specific enough. This helps with matching multiple dependency requirements for a declaration, i.e. both name parts must be alive before the declaration is considered alive.

Currently, only unexported method declarations will have a method name to support duck-typing with unexported signatures on interfaces. If the unexported method is depended on, then both names will be in the dependencies. If the receiver is alive and an alive interface has the matching unexported signature, then both names will be depended on thus making the unexported method alive. Since the unexported method is only visible in the package in which it is defined, the package path is included in the method name.

To simplify the above for GopherJS, we don't look at the receiver for an unexported method before indicating it is alive. Meaning if there is no interface, only two named objects with identical unexported methods, the use of either will indicate a use of both. This will cause slightly more unexported methods to be alive while reducing the complication of type checking which object or type of object is performing the call.

Declaration exported unexported generic object name method name
variables <package>.<var name>
functions <package>.<func name>
functions <package>.<func name>[<type args>]
named type <package>.<type name>
named type <package>.<type name>[<type args>]
method <package>.<receiver name>
method <package>.<receiver name>[<type args>]
method <package>.<receiver name> <package>.<method name>(<parameter types>)(<result types>)
method <package>.<receiver name>[<type args>] <package>.<method name>(<parameter types>)(<result types>)

For nested types, the nest (the function or method in which the type is declared) needs to be taken into consideration to differentiate types with the same name but different nests.

Nest type generic nest generic object object name
function <package>.<func name>:<type name>
function <package>.<func name>:<type name>[<nest args>;]
function <package>.<func name>:<type name>[<type args>]
function <package>.<func name>:<type name>[<nest args>;<type args>]
method <package>.<receiver name>:<method name>:<type name>
method <package>.<receiver name>:<method name>:<type name>[<nest args>;]
method <package>.<receiver name>:<method name>:<type name>[<type args>]
method <package>.<receiver name>:<method name>:<type name>[<nest args>;<type args>]
Name Specifics

The following are specifics about the different types of names that show up in the above table. This isn't the only way to represent this information. These names can get long but don't have to. The goal is to make the names as unique as possible whilst still ensuring that signatures in interfaces will still match the correct methods. The less unique the more false positives for alive will occur meaning more dead code is kept alive. However, too unique could cause needed alive code to not match and be eliminated causing the application to not run.

<package>.<var name>, <package>.<func name>, <package>.<type name> and <package>.<receiver name> all have the same form. They are the package path followed by a ., if there is a package path, and the object name or receiver name. For example rand.Shuffle will be named math/rand.Shuffle. The builtin error will be named error without a package path.

<package>.<func name>[<type args>], <package>.<type name>[<type args>], and <package>.<receiver name>[<type args>] are the same as above except with comma separated type parameters or type arguments in square brackets. The type parameter names are not used, instead the constraint types are since the names for type parameters may not match even if the constraints match. For example type Foo[T any] struct{}; type Bar[B any] { f Foo[B] } has Foo[B] used in Bar that is identical to Foo[T] even though technically Foo[B] is an instance of Foo[T] with the B type parameter as the type argument.

Command compiles, i.e. compiles with a main entry point, and test builds should not have any type parameters that aren't resolved to concrete types, however to handle partial compiles of packages, there may still be a type parameter, including unions of approximate constraints, i.e. ~int|~string.

Therefore, type arguments need to be reduced to only types. This means something like maps.Keys, i.e. func Keys[Map ~map[K]V, K comparable, V any](m Map) iter.Seq[K], will be named maps.Keys[~map[comparable]any, comparable, any] as a generic. If the instances for Map are map[string]int and map[int][]*cats.Cat, then respectively the names would be maps.Keys[map[string]int, string, int] and maps.Keys[map[int][]*cats.Cat, int, []*cats.Cat]. If this function is used in func Foo[T ~string|~int](data map[string]T) { ... maps.Keys(data) ... } then the instance of maps.Keys that Foo depends on would be named maps.Keys[map[string]~int|~string, string, ~int|~string].

For the method name of unexposed methods, <package>.<method name>(<parameter types>)(<result types>), the prefix, <package>.<method name>, is in the same format as <package>.<func name>. The rest contains the signature, (<parameter types>)(<result types>). The signature is defined with only the types since (v, u int)(ok bool, err error) should match (x, y int)(bool, error). To match both, both will have to be (int, int)(bool, error). Also the parameter types should include the veridic indicator, e.g. sum(...int) int, since that affects how the signature is matched. If there are no results then the results part is left off. Otherwise, the result types only need parenthesis if there are more than one result, e.g. (int, int), (int, int) bool, and (int, int)(bool, error).

In either the object name or method name, if there is a recursive type parameter, e.g. func Foo[T Bar[T]]() the second usage of the type parameter will have it's type parameters as ... to prevent an infinite loop whilst also indicating which object in the type parameter is recursive, e.g. Foo[Bar[Bar[...]]].

Dependencies

The dependencies are specified in an expression. For example a function that invokes another function will be dependent on that invoked function. When a dependency is added it will be added as one or more names to the declaration that depends on it. It follows the naming rules so that the dependencies will match correctly.

In theory, structural dependencies would be needed to be added automatically while the declaration is being named. When an interface is named, it would automatically add all unexported signatures as dependencies via <package path>.<method name>(<parameter type list>)(<result type list>). However, we do not need to do that in GopherJS because we aren't using the existence of realized methods in duck-typing. GopherJS stores full set of method information when describing the type so that, even when things like unexported methods in interfaces are removed, duck-typing will still work correctly. This reduces the size of the code by not keeping a potentially long method body when the signature is all that is needed.

Currently we don't filter unused packages so there is no need to automatically add dependencies on the packages themselves. This is also why the package declarations aren't named and therefore are always alive.

Examples

Dead Package

In this example, a point package defines a Point object. The point package may be used by several repos as shared code so can not have code manually removed from it to reduce its dependencies for specific applications.

For the current example, the Distance method is never used and therefore dead. The Distance method is the only method dependent on the math package. It might be safe to make the whole math package dead too and eliminate it in this case, however, it is possible that some packages aren't used on purpose and their reason for being included is to invoke the initialization functions within the package. If a package has any inits or any variable definitions with side effects, then the package can not be safely removed.

package point

import "math"

type Point struct {
   X float64
   Y float64
}

func (p Point) Sub(other Point) Point {
   p.X -= other.X
   p.Y -= other.Y
   return p
}

func (p Point) ToQuadrant1() Point {
   if p.X < 0.0 {
      p.X = -p.X
   }
   if p.Y < 0.0 {
      p.Y = -p.Y
   }
   return p
}

func (p Point) Manhattan(other Point) float64 {
   a := p.Sub(other).ToQuadrant1()
   return a.X + a.Y
}

func (p Point) Distance(other Point) float64 {
   d := p.Sub(other)
   return math.Sqrt(d.X*d.X + d.Y*d.Y)
}
package main

import "point"

func main() {
   a := point.Point{X: 10.2, Y: 45.3}
   b := point.Point{X: -23.0, Y: 7.7}
   println(`Manhattan a to b:`, a.Manhattan(b))
}
Grandmas and Zombies

In this example, the following code sorts grandmas and zombies by if they are Dangerous. The method EatBrains is never used. If we remove EatBrains from Zombie then both the grandmas and zombies are moved to the safe bunker. If we remove EatBrains from Dangerous then both grandmas and zombies will be moved to the air lock because Dangerous will duck-type to all Person instances. Unused exported methods and signatures must be considered alive if the type is alive.

package main

import "fmt"

type Person interface {
   MoveTo(loc string)
}

type Dangerous interface {
   Person
   EatBrains()
}

type Grandma struct{}

func (g Grandma) MoveTo(loc string) {
   fmt.Println(`grandma was moved to`, loc)
}

type Zombie struct{}

func (z Zombie) MoveTo(loc string) {
   fmt.Println(`zombie was moved to`, loc)
}

func (z Zombie) EatBrains() {}

func main() {
   people := []Person{Grandma{}, Zombie{}, Grandma{}, Zombie{}}
   for _, person := range people {
      if _, ok := person.(Dangerous); ok {
         person.MoveTo(`air lock`)
      } else {
         person.MoveTo(`safe bunker`)
      }
   }
}
Side Effects

In this example unused variables are being initialized with expressions that has side effects. The max value is 8 by the time main is called because each initialization calls count() that increments max. The expression doesn't have to have a function call and can be any combination of operations.

An initialization may have a side effect even if it doesn't set a value. For example, simply printing a message to the console is a side effect that can not be removed even if it is part of an unused initializer.

package main

import "fmt"

func count() int {
   max++
   return max
}

var (
   max  = 0
   _    = count() // a
   b, c = count(), count()
   x    = []int{count(), count(), count()}[0]
   y, z = func() (int, int) { return count(), count() }()
)

func main() {
   fmt.Println(`max count`, max) // Outputs: max count 8
}
Instance Duck-typing

In this example the type StringKeys[T any] is a map that stores any kind of value with string keys. There is an interface IntProvider that StringKeys will duck-type to iff the type argument is int, i.e. StringKeys[int]. This exemplifies how the type arguments used in the type arguments affect the overall signature such that in some cases a generic object may match an interface and in others it may not.

Also notice that the structure was typed with T as the parameter type's name whereas the methods use S. This shows that the name of the type doesn't matter in the instancing. Therefore, outputting a methods name (assuming it is unexported) should use the type argument type, not the type parameter name, e.g. value() []int or value() []any instead of value() []S or value() []T.

package main

import (
   "fmt"
   "sort"
)

type StringKeys[T any] map[string]T

func (sk StringKeys[S]) Keys() []string {
   keys := make([]string, 0, len(sk))
   for key := range sk {
      keys = append(keys, key)
   }
   sort.Strings(keys)
   return keys
}

func (sk StringKeys[S]) Values() []S {
   values := make([]S, len(sk))
   for i, key := range sk.Keys() {
      values[i] = sk[key]
   }
   return values
}

type IntProvider interface {
   Values() []int
}

func Sum(data IntProvider) int {
   sum := 0
   for _, value := range data.Values() {
      sum += value
   }
   return sum
}

func main() {
   sInt := StringKeys[int]{
      `one`:   1,
      `two`:   2,
      `three`: 3,
      `four`:  4,
   }
   fmt.Println(sInt.Keys())   // Outputs: [four one three two]
   fmt.Println(sInt.Values()) // Outputs: [4 1 3 2]
   fmt.Println(Sum(sInt))     // Outputs: 10

   sFp := StringKeys[float64]{
      `one`:   1.1,
      `two`:   2.2,
      `three`: 3.3,
      `four`:  4.4,
   }
   fmt.Println(sFp.Keys())   // Outputs: [four one three two]
   fmt.Println(sFp.Values()) // [4.4 1.1 3.3 2.2]
   //fmt.Println(Sum(sFp))   // Fails with "StringKeys[float64] does not implement IntProvider"
}

Additional Notes

This DCE is different from those found in Muchnick, Steven S.. “Advanced Compiler Design and Implementation.” (1997), Chapter 18 Control-Flow and Low-Level Optimization, Section 10 Dead-Code Elimination. And different from related DCE designs such as Knoop, Rüthing, and Steffen. "Partial dead code elimination." (1994), SIGPLAN Not. 29, 6, 147–158. See DCE wiki for more information.

Those discuss DCE at the block code level where the higher level constructs such as functions and objects have been reduced to a graphs of blocks with variables, procedures, and routines. Since we want to keep the higher level constructs during transpilation, we simply are reducing the higher level constructs not being used.

Any variable internal to the body of a function or method that is unused or only used for computing new values for itself, are left as is. The Go compiler and linters have requirements that attempt to prevent this kind of dead-code in a function body (unless an underscore is used to quite usage warnings, e.g. _ = unusedVar) and prevent unreachable code. Therefore, we aren't going to worry about trying to DCE inside of function bodies or in variable initializers.

GopherJS does not implicitly perform JS Tree Shaking Algorithms, as discussed in How Modern Javascript eliminate dead code (2023) at this time and provides no guarantees about the effectiveness of running such an algorithm on the resulting JS.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Collector

type Collector struct {
	// contains filtered or unexported fields
}

Collector is a tool to collect dependencies for a declaration that'll be used in dead-code elimination (DCE).

func (*Collector) CollectDCEDeps

func (c *Collector) CollectDCEDeps(decl Decl, f func())

CollectDCEDeps captures a list of Go objects (types, functions, etc.) the code translated inside f() depends on. Then sets those objects as dependencies of the given dead-code elimination info.

Only one CollectDCEDeps call can be active at a time.

func (*Collector) DeclareDCEDep

func (c *Collector) DeclareDCEDep(o types.Object, tNest, tArgs []types.Type)

DeclareDCEDep records that the code that is currently being transpiled depends on a given Go object with optional type arguments.

The given optional type arguments are used to when the object is a function with type parameters or anytime the object doesn't carry them. If not given, this attempts to get the type arguments from the object.

type Decl

type Decl interface {
	Dce() *Info
}

Decl is any code declaration that has dead-code elimination (DCE) information attached to it.

type DeclConstraint

type DeclConstraint interface {
	Decl
	comparable
}

DeclConstraint is type constraint for any code declaration that has dead-code elimination (DCE) information attached to it and will be used in a set.

type Info

type Info struct {
	// contains filtered or unexported fields
}

Info contains information used by the dead-code elimination (DCE) logic to determine whether a declaration is alive or dead.

func (*Info) GobDecode

func (id *Info) GobDecode(data []byte) error

func (*Info) GobEncode

func (id *Info) GobEncode() ([]byte, error)

func (*Info) SetAsAlive

func (d *Info) SetAsAlive()

SetAsAlive marks the declaration as alive, meaning it will not be eliminated.

This should be called by an entry point (like main() or init() functions) or a variable initializer which has a side effect, consider it live.

func (*Info) SetName

func (d *Info) SetName(o types.Object, tNest, tArgs []types.Type)

SetName sets the name used by DCE to represent the declaration this DCE info is attached to.

The given optional type arguments are used to when the object is a function with type parameters or anytime the object doesn't carry them. If not given, this attempts to get the type arguments from the object.

func (*Info) String

func (d *Info) String() string

String gets a human-readable representation of the DCE info.

type Selector

type Selector[D DeclConstraint] struct {
	// contains filtered or unexported fields
}

Selector gathers all declarations that are still alive after dead-code elimination.

func (*Selector[D]) AliveDecls

func (s *Selector[D]) AliveDecls() map[D]struct{}

AliveDecls returns a set of declarations that are still alive after dead-code elimination. This should only be called once all declarations have been included.

func (*Selector[D]) Include

func (s *Selector[D]) Include(decl D, implementsLink bool)

Include will add a new declaration to be checked as alive or not.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL