typerefs

package
v0.0.0-...-ec1a81b Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 22, 2024 License: BSD-3-Clause Imports: 11 Imported by: 0

Documentation

Overview

Package typerefs extracts symbol-level reachability information from the syntax of a Go package.

Background

The goal of this analysis is to determine, for each package P, a nearly minimal set of packages that could affect the type checking of P. This set may contain false positives, but the smaller this set the better we can invalidate and prune packages in gopls.

More precisely, for each package P we define the set of "reachable" packages from P as the set of packages that may affect the (deep) export data of the direct dependencies of P. By this definition, the complement of this set cannot affect any information derived from type checking P, such as diagnostics, cross references, or method sets. Therefore we need not invalidate any results for P when a package in the complement of this set changes.

Computing references

For a given declaration D, references are computed based on identifiers or dotted identifiers referenced in the declaration of D, that may affect the type of D. However, these references reflect only local knowledge of the package and its dependency metadata, and do not depend on any analysis of the dependencies themselves. This allows the reference information for a package to be cached independent of all others.

Specifically, if a referring identifier I appears in the declaration, we record an edge from D to each object possibly referenced by I. We search for references within type syntax, but do not actually type-check, so we can't reliably determine whether an expression is a type or a term, or whether a function is a builtin or generic. For example, the type of x in var x = p.F(W) only depends on W if p.F is a builtin or generic function, which we cannot know without type-checking package p. So we may over-approximate in this way.

  • If I is declared in the current package, record a reference to its declaration.
  • Otherwise, if there are any dot imports in the current file and I is exported, record a (possibly dangling) edge to the corresponding declaration in each dot-imported package.

If a dotted identifier q.I appears in the declaration, we perform a similar operation:

  • If q is declared in the current package, we record a reference to that object. It may be a var or const that has a field or method I.
  • Otherwise, if q is a valid import name based on imports in the current file and the provided metadata for dependency package names, record a reference to the object I in that package.
  • Additionally, handle the case where Q is exported, and Q.I may refer to a field or method in a dot-imported package.

That is essentially the entire algorithm, though there is some subtlety to visiting the set of identifiers or dotted identifiers that may affect the declaration type. See the visitDeclOrSpec function for the details of this analysis. Notably, we also skip identifiers that refer to type parameters in generic declarations.

Graph optimizations

The references extracted from the syntax are used to construct edges between nodes representing declarations. Edges are of two kinds: internal references, from one package-level declaration to another; and external references, from a symbol in this package to a symbol imported from a direct dependency.

Once the symbol reference graph is constructed, we find its strongly connected components (SCCs) using Tarjan's algorithm. As we coalesce the nodes of each SCC we compute the union of external references reached by each package-level declaration. The final result is the mapping from each exported package-level declaration to the set of external (imported) declarations that it reaches.

Because it is common for many package members to have the same reachability, the result takes the form of a set of equivalence classes, each mapping a set of package-level declarations to a set of external symbols. We use a hash table to canonicalize sets so that repeated occurrences of the same set (which are common) are only represented once in memory or in the file system. For example, all declarations that ultimately reference only {fmt.Println,strings.Join} would be classed as equivalent.

This approach was inspired by the Hash-Value Numbering (HVN) optimization described by Hardekopf and Lin. See golang.org/x/tools/go/pointer/hvn.go for an implementation. (Like pointer analysis, this problem is fundamentally one of graph reachability.) The HVN algorithm takes the compression a step further by preserving the topology of the SCC DAG, in which edges represent "is a superset of" constraints. Redundant edges that don't increase the solution can be deleted. We could apply the same technique here to further reduce the worst-case size of the result, but the current implementation seems adequate.

API

The main entry point for this analysis is the Encode function, which implements the analysis described above for one package, and encodes the result as a binary message.

The Decode function decodes the message into a usable form: a set of equivalence classes. The decoder uses a shared PackageIndex to enable more compact representations of sets of packages (PackageSet) during the global reacahability computation.

The [BuildPackageGraph] constructor implements a whole-graph analysis similar to that which will be implemented by gopls, but for various reasons the logic for this analysis will eventually live in the golang.org/x/tools/gopls/internal/cache package. Nevertheless, BuildPackageGraph and its test serve to verify the syntactic analysis, and may serve as a proving ground for new optimizations of the whole-graph analysis.

Export data is insufficient

At first it may seem that the simplest way to implement this analysis would be to consider the types.Packages of the dependencies of P, for example during export. After all, it makes sense that the type checked packages themselves could describe their dependencies. However, this does not work as type information does not describe certain syntactic relationships.

For example, the following scenarios cause type information to miss syntactic relationships:

Named type forwarding:

package a; type A b.B
package b; type B int

Aliases:

package a; func A(f b.B)
package b; type B = func()

Initializers:

package a; var A = b.B()
package b; func B() string { return "hi" }

Use of the unsafe package:

package a; type A [unsafe.Sizeof(B{})]int
package b; type B struct { f1, f2, f3 int }

In all of these examples, types do not contain information about the edge between the a.A and b.B declarations.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Encode

func Encode(files []*parsego.File, imports map[metadata.ImportPath]*metadata.Package) []byte

Encode analyzes the Go syntax trees of a package, constructs a reference graph, and uses it to compute, for each exported declaration, the set of exported symbols of directly imported packages that it references, perhaps indirectly.

It returns a serializable index of this information. Use Decode to expand the result.

Types

type Class

type Class struct {
	Decls []string // sorted set of names of exported decls with same reachability
	Refs  []Symbol // set of external symbols, in ascending (PackageID, Name) order
}

A Class is a reachability equivalence class.

It attests that each exported package-level declaration in Decls references (perhaps indirectly) one of the external (imported) symbols in Refs.

Because many Decls reach the same Refs, it is more efficient to group them into classes.

func Decode

func Decode(pkgIndex *PackageIndex, data []byte) []Class

Decode decodes a serializable index of symbol reachability produced by Encode.

Because many declarations reference the exact same set of symbols, the results are grouped into equivalence classes. Classes are sorted by Decls[0], ascending. The class with empty reachability is omitted.

See the package documentation for more details as to what a reference does (and does not) represent.

type IndexID

type IndexID int

An IndexID is a small integer that uniquely identifies a package within a given PackageIndex.

type PackageIndex

type PackageIndex struct {
	// contains filtered or unexported fields
}

PackageIndex stores common data to enable efficient representation of references and package sets.

func NewPackageIndex

func NewPackageIndex() *PackageIndex

NewPackageIndex creates a new PackageIndex instance for use in building reference and package sets.

func (*PackageIndex) DeclaringPackage

func (index *PackageIndex) DeclaringPackage(sym Symbol) metadata.PackageID

DeclaringPackage returns the ID of the symbol's declaring package. The package index must be the one used during decoding.

func (*PackageIndex) IndexID

func (index *PackageIndex) IndexID(id metadata.PackageID) IndexID

IndexID returns the packageIdx referencing id, creating one if id is not yet tracked by the receiver.

func (*PackageIndex) NewSet

func (index *PackageIndex) NewSet() *PackageSet

NewSet creates a new PackageSet bound to this PackageIndex instance.

PackageSets may only be combined with other PackageSets from the same instance.

func (*PackageIndex) PackageID

func (index *PackageIndex) PackageID(idx IndexID) metadata.PackageID

PackageID returns the PackageID for idx.

idx must have been created by this PackageIndex instance.

type PackageSet

type PackageSet struct {
	// contains filtered or unexported fields
}

A PackageSet is a set of metadata.PackageIDs, optimized for inuse memory footprint and efficient union operations.

func (*PackageSet) Add

func (s *PackageSet) Add(idx IndexID)

Add records a new element in the package set. It is the caller's responsibility to ensure that idx was created with the same PackageIndex as the PackageSet.

func (*PackageSet) AddPackage

func (s *PackageSet) AddPackage(id metadata.PackageID)

Add records a new element in the package set, for the provided package ID.

func (*PackageSet) Contains

func (s *PackageSet) Contains(id metadata.PackageID) bool

Contains reports whether id is contained in the receiver set.

func (*PackageSet) Elems

func (s *PackageSet) Elems(f func(IndexID))

Elems calls f for each element of the set in ascending order.

func (*PackageSet) String

func (s *PackageSet) String() string

String returns a human-readable representation of the set: {A, B, ...}.

func (*PackageSet) Union

func (s *PackageSet) Union(other *PackageSet)

Union records all elements from other into the receiver, mutating the receiver set but not the argument set. The receiver must not be nil, but the argument set may be nil.

Precondition: both package sets were created with the same PackageIndex.

type Symbol

type Symbol struct {
	Package IndexID // w.r.t. PackageIndex passed to decoder
	Name    string
}

A Symbol represents an external (imported) symbol referenced by the analyzed package.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL