pointer

package

v0.0.0-...-339fcaa Latest Latest Go to latest Published: Jun 13, 2014 License: BSD-3-Clause Imports: 14 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/t-yuki/go.tools.old

Links

Open Source Insights

Documentation ¶

Overview ¶

Package pointer implements Andersen's analysis, an inclusion-based pointer analysis algorithm first described in (Andersen, 1994).

The implementation is similar to that described in (Pearce et al, PASTE'04). Unlike many algorithms which interleave constraint generation and solving, constructing the callgraph as they go, this implementation for the most part observes a phase ordering (generation before solving), with only simple (copy) constraints being generated during solving. (The exception is reflection, which creates various constraints during solving as new types flow to reflect.Value operations.) This improves the traction of presolver optimisations, but imposes certain restrictions, e.g. potential context sensitivity is limited since all variants must be created a priori.

We intend to add various presolving optimisations such as Pointer and Location Equivalence from (Hardekopf & Lin, SAS '07) and solver optimisations such as Hybrid- and Lazy- Cycle Detection from (Hardekopf & Lin, PLDI'07),

CLASSIFICATION ¶

Our algorithm is INCLUSION-BASED: the points-to sets for x and y will be related by pts(y) ⊇ pts(x) if the program contains the statement y = x.

It is FLOW-INSENSITIVE: it ignores all control flow constructs and the order of statements in a program. It is therefore a "MAY ALIAS" analysis: its facts are of the form "P may/may not point to L", not "P must point to L".

It is FIELD-SENSITIVE: it builds separate points-to sets for distinct fields, such as x and y in struct { x, y *int }.

It is mostly CONTEXT-INSENSITIVE: most functions are analyzed once, so values can flow in at one call to the function and return out at another. Only some smaller functions are analyzed with consideration to their calling context.

It has a CONTEXT-SENSITIVE HEAP: objects are named by both allocation site and context, so the objects returned by two distinct calls to f:

func f() *T { return new(T) }

are distinguished up to the limits of the calling context.

It is a WHOLE PROGRAM analysis: it requires SSA-form IR for the complete Go program and summaries for native code.

See the (Hind, PASTE'01) survey paper for an explanation of these terms.

TERMINOLOGY ¶

We occasionally use C's x->f notation to distinguish the case where x is a struct pointer from x.f where is a struct value.

NODES ¶

Nodes are the key datastructure of the analysis, and have a dual role: they represent both constraint variables (equivalence classes of pointers) and members of points-to sets (things that can be pointed at, i.e. "labels").

Nodes are naturally numbered. The numbering enables compact representations of sets of nodes such as bitvectors or BDDs; and the ordering enables a very cheap way to group related nodes together. (For example, passing n parameters consists of generating n parallel constraints from caller+i to callee+i for 0<=i<n.)

The zero nodeid means "not a pointer". Currently it's only used for struct{} or (). We generate all flow constraints, even for non-pointer types, with the expectations that (a) presolver optimisations will quickly collapse all the non-pointer ones and (b) we may get more precise results by treating uintptr as a possible pointer.

Each node represents a scalar (word-sized) part of a value or object. Aggregate types (structs, tuples, arrays) are recursively flattened out into a sequential list of scalar component types, and all the elements of an array are represented by a single node. (The flattening of a basic type is a list containing a single node.)

Nodes are connected into a graph with various kinds of labelled edges: simple edges (or copy constraints) represent value flow. Complex edges (load, store, etc) trigger the creation of new simple edges during the solving phase.

OBJECTS ¶

Conceptually, an "object" is a contiguous sequence of nodes denoting an addressable location: something that a pointer can point to. The first node of an object has a non-nil obj field containing information about the allocation: its size, context, and ssa.Value.

Objects include:

functions and globals;
variable allocations in the stack frame or heap;
maps, channels and slices created by calls to make();
allocations to construct an interface;
allocations caused by literals and conversions, e.g. []byte("foo"), []byte(str).
arrays allocated by calls to append();

Many objects have no Go types. For example, the func, map and chan type kinds in Go are all varieties of pointers, but the respective objects are actual functions, maps, and channels. Given the way we model interfaces, they too are pointers to tagged objects with no Go type. And an *ssa.Global denotes the address of a global variable, but the object for a Global is the actual data. So, types of objects are usually "off by one indirection".

The individual nodes of an object are sometimes referred to as "labels".

Objects containing no nodes (i.e. just empty structs; tuples may be values but never objects in Go) are padded with an invalid-type node to have at least one node so that there is something to point to. (TODO(adonovan): I think this is unnecessary now that we have identity nodes; check.)

TAGGED OBJECTS ¶

An tagged object has the following layout:

T          -- obj.flags ⊇ {otTagged}
v
...

The T node's typ field is the dynamic type of the "payload", the value v which follows, flattened out. The T node's obj has the otTagged flag.

Tagged objects are needed when generalizing across types: interfaces, reflect.Values, reflect.Types. Each of these three types is modelled as a pointer that exclusively points to tagged objects.

Tagged objects may be indirect (obj.flags ⊇ {otIndirect}) meaning that the value v is not of type T but *T; this is used only for reflect.Values that represent lvalues.

ANALYSIS ABSTRACTION OF EACH TYPE ¶

Variables of the following "scalar" types may be represented by a single node: basic types, pointers, channels, maps, slices, 'func' pointers, interfaces.

Pointers

Nothing to say here.

Basic types (bool, string, numbers, unsafe.Pointer)

Currently all fields in the flattening of a type, including
non-pointer basic types such as int, are represented in objects and
values.  Though non-pointer nodes within values are uninteresting,
non-pointer nodes in objects may be useful (if address-taken)
because they permit the analysis to deduce, in this example,

   var s struct{ ...; x int; ... }
   p := &s.x

that p points to s.x.  If we ignored such object fields, we could only
say that p points somewhere within s.

All other basic types are ignored.  Expressions of these types have
zero nodeid, and fields of these types within aggregate other types
are omitted.

unsafe.Pointer conversions are not yet modelled as pointer
conversions.  Consequently uintptr is always a number and uintptr
nodes do not point to any object.

Channels

An expression of type 'chan T' is a kind of pointer that points
exclusively to channel objects, i.e. objects created by MakeChan (or
reflection).

'chan T' is treated like *T.
*ssa.MakeChan is treated as equivalent to new(T).
*ssa.Send and receive (*ssa.UnOp(ARROW)) and are equivalent to store
 and load.

Maps

An expression of type 'map[K]V' is a kind of pointer that points
exclusively to map objects, i.e. objects created by MakeMap (or
reflection).

map K[V] is treated like *M where M = struct{k K; v V}.
*ssa.MakeMap is equivalent to new(M).
*ssa.MapUpdate is equivalent to *y=x where *y and x have type M.
*ssa.Lookup is equivalent to y=x.v where x has type *M.

Slices

A slice []T, which dynamically resembles a struct{array *T, len, cap int},
is treated as if it were just a *T pointer; the len and cap fields are
ignored.

*ssa.MakeSlice is treated like new([1]T): an allocation of a
 singleton array.
*ssa.Index on a slice is equivalent to a load.
*ssa.IndexAddr on a slice returns the address of the sole element of the
slice, i.e. the same address.
*ssa.Slice is treated as a simple copy.

Functions

An expression of type 'func...' is a kind of pointer that points
exclusively to function objects.

A function object has the following layout:

   identity         -- typ:*types.Signature; obj.flags ⊇ {otFunction}
   params_0         -- (the receiver, if a method)
   ...
   params_n-1
   results_0
   ...
   results_m-1

There may be multiple function objects for the same *ssa.Function
due to context-sensitive treatment of some functions.

The first node is the function's identity node.
Associated with every callsite is a special "targets" variable,
whose pts(·) contains the identity node of each function to which
the call may dispatch.  Identity words are not otherwise used.

The following block of nodes represent the flattened-out types of
the parameters and results of the function object, and are
collectively known as its "P/R block".

The treatment of free variables of closures (*ssa.FreeVar) is like
that of global variables; it is not context-sensitive.
*ssa.MakeClosure instructions create copy edges to Captures.

A Go value of type 'func' (i.e. a pointer to one or more functions)
is a pointer whose pts() contains function objects.  The valueNode()
for an *ssa.Function returns a singleton for that function.

Interfaces

An expression of type 'interface{...}' is a kind of pointer that
points exclusively to tagged objects.  All tagged objects pointed to
by an interface are direct (the otIndirect flag is clear) and
concrete (the tag type T is not itself an interface type).  The
associated ssa.Value for an interface's tagged objects may be an
*ssa.MakeInterface instruction, or nil if the tagged object was
created by an instrinsic (e.g. reflection).

Constructing an interface value causes generation of constraints for
all of the concrete type's methods; we can't tell a priori which
ones may be called.

TypeAssert y = x.(T) is implemented by a dynamic filter triggered by
each tagged object E added to pts(x).  If T is an interface that E.T
implements, E is added to pts(y).  If T is a concrete type then edge
E.v -> pts(y) is added.

ChangeInterface is a simple copy because the representation of
tagged objects is independent of the interface type (in contrast
to the "method tables" approach used by the gc runtime).

y := Invoke x.m(...) is implemented by allocating a contiguous P/R
block for the callsite and adding a dynamic rule triggered by each
tagged object E added to pts(x).  The rule adds param/results copy
edges to/from each discovered concrete method.

(Q. Why do we model an interface as a pointer to a pair of type and
value, rather than as a pair of a pointer to type and a pointer to
value?
A. Control-flow joins would merge interfaces ({T1}, {V1}) and ({T2},
{V2}) to make ({T1,T2}, {V1,V2}), leading to the infeasible and
type-unsafe combination (T1,V2).  Treating the value and its concrete
type as inseparable makes the analysis type-safe.)

reflect.Value

A reflect.Value is modelled very similar to an interface{}, i.e. as
a pointer exclusively to tagged objects, but with two
generalizations.

1) a reflect.Value that represents an lvalue points to an indirect
   (obj.flags ⊇ {otIndirect}) tagged object, which has a similar
   layout to an tagged object except that the value is a pointer to
   the dynamic type.  Indirect tagged objects preserve the correct
   aliasing so that mutations made by (reflect.Value).Set can be
   observed.

   Indirect objects only arise when an lvalue is derived from an
   rvalue by indirection, e.g. the following code:

      type S struct { X T }
      var s S
      var i interface{} = &s    // i points to a *S-tagged object (from MakeInterface)
      v1 := reflect.ValueOf(i)  // v1 points to same *S-tagged object as i
      v2 := v1.Elem()           // v2 points to an indirect S-tagged object, pointing to s
      v3 := v2.FieldByName("X") // v3 points to an indirect int-tagged object, pointing to s.X
      v3.Set(y)                 // pts(s.X) ⊇ pts(y)

   Whether indirect or not, the concrete type of the tagged object
   corresponds to the user-visible dynamic type, and the existence
   of a pointer is an implementation detail.

2) The dynamic type tag of a tagged object pointed to by a
   reflect.Value may be an interface type; it need not be concrete.

   This arises in code such as this:
      tEface := reflect.TypeOf(new(interface{}).Elem() // interface{}
      eface := reflect.Zero(tEface)
   pts(eface) is a singleton containing an interface{}-tagged
   object.  That tagged object's payload is an interface{} value,
   i.e. the pts of the payload contains only concrete-tagged
   objects, although in this example it's the zero interface{} value,
   so its pts is empty.

reflect.Type

Just as in the real "reflect" library, we represent a reflect.Type
as an interface whose sole implementation is the concrete type,
*reflect.rtype.  (This choice is forced on us by go/types: clients
cannot fabricate types with arbitrary method sets.)

rtype instances are canonical: there is at most one per dynamic
type.  (rtypes are in fact large structs but since identity is all
that matters, we represent them by a single node.)

The payload of each *rtype-tagged object is an *rtype pointer that
points to exactly one such canonical rtype object.  We exploit this
by setting the node.typ of the payload to the dynamic type, not
'*rtype'.  This saves us an indirection in each resolution rule.  As
an optimisation, *rtype-tagged objects are canonicalized too.

Aggregate types:

Aggregate types are treated as if all directly contained aggregates are recursively flattened out.

Structs

*ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.

*ssa.FieldAddr y = &x->f requires a dynamic closure rule to create
 simple edges for each struct discovered in pts(x).

The nodes of a struct consist of a special 'identity' node (whose
type is that of the struct itself), followed by the nodes for all
the struct's fields, recursively flattened out.  A pointer to the
struct is a pointer to its identity node.  That node allows us to
distinguish a pointer to a struct from a pointer to its first field.

Field offsets are currently the logical field offsets (plus one for
the identity node), so the sizes of the fields can be ignored by the
analysis.

Sound treatment of unsafe.Pointer conversions (not yet implemented)
would require us to model memory layout using physical field offsets
to ascertain which object field(s) might be aliased by a given
FieldAddr of a different base pointer type.  It would also require
us to dispense with the identity node.

*ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.

*ssa.FieldAddr y = &x->f requires a dynamic closure rule to create
 simple edges for each struct discovered in pts(x).

Arrays

We model an array by an identity node (whose type is that of the
array itself) followed by a node representing all the elements of
the array; the analysis does not distinguish elements with different
indices.  Effectively, an array is treated like struct{elem T}, a
load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the
index i is ignored.

A pointer to an array is pointer to its identity node.  (A slice is
also a pointer to an array's identity node.)  The identity node
allows us to distinguish a pointer to an array from a pointer to one
of its elements, but it is rather costly because it introduces more
offset constraints into the system.  Furthermore, sound treatment of
unsafe.Pointer would require us to dispense with this node.

Arrays may be allocated by Alloc, by make([]T), by calls to append,
and via reflection.

Tuples (T, ...)

Tuples are treated like structs with naturally numbered fields.
*ssa.Extract is analogous to *ssa.Field.

However, tuples have no identity field since by construction, they
cannot be address-taken.

FUNCTION CALLS

There are three kinds of function call:
(1) static "call"-mode calls of functions.
(2) dynamic "call"-mode calls of functions.
(3) dynamic "invoke"-mode calls of interface methods.
Cases 1 and 2 apply equally to methods and standalone functions.

Static calls.
  A static call consists three steps:
  - finding the function object of the callee;
  - creating copy edges from the actual parameter value nodes to the
    params block in the function object (this includes the receiver
    if the callee is a method);
  - creating copy edges from the results block in the function
    object to the value nodes for the result of the call.

  Context sensitivity

    Static calls (alone) may be treated context sensitively,
    i.e. each callsite may cause a distinct re-analysis of the
    callee, improving precision.  Our current context-sensitivity
    policy treats all intrinsics and getter/setter methods in this
    manner since such functions are small and seem like an obvious
    source of spurious confluences, though this has not yet been
    evaluated.

Dynamic function calls

  Dynamic calls work in a similar manner except that the creation of
  copy edges occurs dynamically, in a similar fashion to a pair of
  struct copies:

    *fn->params = callargs
    callresult = *fn->results

  (Recall that the function object's params and results blocks are
  contiguous.)

Interface method invocation

  For invoke-mode calls, we create a params/results block for the
  callsite and attach a dynamic closure rule to the interface.  For
  each new tagged object that flows to the interface, we look up
  the concrete method, find its function object, and connect its P/R
  block to the callsite's P/R block.

Recording call targets

  The analysis notifies its clients of each callsite it encounters,
  passing a CallSite interface.  Among other things, the CallSite
  contains a synthetic constraint variable ("targets") whose
  points-to solution includes the set of all function objects to
  which the call may dispatch.

  It is via this mechanism that the callgraph is made available.
  Clients may also elect to be notified of callgraph edges directly;
  internally this just iterates all "targets" variables' pts(·)s.

SOLVER ¶

The solver is currently a very naive Andersen-style implementation, although it does use difference propagation (Pearce et al, SQC'04). There is much to do.

Index ¶

func CanHaveDynamicTypes(T types.Type) bool
func CanPoint(T types.Type) bool
type Config
- func (c *Config) AddIndirectQuery(v ssa.Value)
- func (c *Config) AddQuery(v ssa.Value)
type Label
type Pointer
type PointsToSet
type Result
- func Analyze(config *Config) (result *Result, err error)
type Warning

Examples ¶

Package

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CanHaveDynamicTypes ¶

func CanHaveDynamicTypes(T types.Type) bool

CanHaveDynamicTypes reports whether the type T can "hold" dynamic types, i.e. is an interface (incl. reflect.Type) or a reflect.Value.

func CanPoint ¶

func CanPoint(T types.Type) bool

CanPoint reports whether the type T is pointerlike, for the purposes of this analysis.

Types ¶

type Config ¶

type Config struct {
	// Mains contains the set of 'main' packages to analyze
	// Clients must provide the analysis with at least one
	// package defining a main() function.
	Mains []*ssa.Package

	// Reflection determines whether to handle reflection
	// operators soundly, which is currently rather slow since it
	// causes constraint to be generated during solving
	// proportional to the number of constraint variables, which
	// has not yet been reduced by presolver optimisation.
	Reflection bool

	// BuildCallGraph determines whether to construct a callgraph.
	// If enabled, the graph will be available in Result.CallGraph.
	BuildCallGraph bool

	// The client populates Queries[v] or IndirectQueries[v]
	// for each ssa.Value v of interest, to request that the
	// points-to sets pts(v) or pts(*v) be computed.  If the
	// client needs both points-to sets, v may appear in both
	// maps.
	//
	// (IndirectQueries is typically used for Values corresponding
	// to source-level lvalues, e.g. an *ssa.Global.)
	//
	// The analysis populates the corresponding
	// Result.{Indirect,}Queries map when it creates the pointer
	// variable for v or *v.  Upon completion the client can
	// inspect that map for the results.
	//
	// TODO(adonovan): this API doesn't scale well for batch tools
	// that want to dump the entire solution.  Perhaps optionally
	// populate a map[*ssa.DebugRef]Pointer in the Result, one
	// entry per source expression.
	//
	Queries         map[ssa.Value]struct{}
	IndirectQueries map[ssa.Value]struct{}

	// If Log is non-nil, log messages are written to it.
	// Logging is extremely verbose.
	Log io.Writer
}

A Config formulates a pointer analysis problem for Analyze().

func (*Config) AddIndirectQuery ¶

func (c *Config) AddIndirectQuery(v ssa.Value)

AddQuery adds v to Config.IndirectQueries. Precondition: CanPoint(v.Type().Underlying().(*types.Pointer).Elem()).

func (*Config) AddQuery ¶

func (c *Config) AddQuery(v ssa.Value)

AddQuery adds v to Config.Queries. Precondition: CanPoint(v.Type()). TODO(adonovan): consider returning a new Pointer for this query, which will be initialized during analysis. That avoids the needs for the corresponding ssa.Value-keyed maps in Config and Result.

type Label ¶

type Label struct {
	// contains filtered or unexported fields
}

A Label is an entity that may be pointed to by a pointer, map, channel, 'func', slice or interface.

Labels include:

functions
globals
tagged objects, representing interfaces and reflect.Values
arrays created by conversions (e.g. []byte("foo"), []byte(s))
stack- and heap-allocated variables (including composite literals)
channels, maps and arrays created by make()
intrinsic or reflective operations that allocate (e.g. append, reflect.New)
intrinsic objects, e.g. the initial array behind os.Args.
and their subelements, e.g. "alloc.y[*].z"

Labels are so varied that they defy good generalizations; some have no value, no callgraph node, or no position. Many objects have types that are inexpressible in Go: maps, channels, functions, tagged objects.

At most one of Value() or ReflectType() may return non-nil.

func (Label) Path ¶

func (l Label) Path() string

Path returns the path to the subelement of the object containing this label. For example, ".x[*].y".

func (Label) Pos ¶

func (l Label) Pos() token.Pos

Pos returns the position of this label, if known, zero otherwise.

func (Label) ReflectType ¶

func (l Label) ReflectType() types.Type

ReflectType returns the type represented by this label if it is an reflect.rtype instance object or *reflect.rtype-tagged object.

func (Label) String ¶

func (l Label) String() string

String returns the printed form of this label.

Examples: Object type:

x                                       (a variable)
(sync.Mutex).Lock                       (a function)
convert                                 (array created by conversion)
makemap                                 (map allocated via make)
makechan                                (channel allocated via make)
makeinterface                           (tagged object allocated by makeinterface)
<alloc in reflect.Zero>                 (allocation in instrinsic)
sync.Mutex                              (a reflect.rtype instance)
<command-line arguments>                (an intrinsic object)

Labels within compound objects have subelement paths:

x.y[*].z                                (a struct variable, x)
append.y[*].z                           (array allocated by append)
makeslice.y[*].z                        (array allocated via make)

TODO(adonovan): expose func LabelString(*types.Package, Label).

func (Label) Value ¶

func (l Label) Value() ssa.Value

Value returns the ssa.Value that allocated this label's object, if any.

type Pointer ¶

type Pointer struct {
	// contains filtered or unexported fields
}

A Pointer is an equivalence class of pointer-like values.

A pointer doesn't have a unique type because pointers of distinct types may alias the same object.

func (Pointer) DynamicTypes ¶

func (p Pointer) DynamicTypes() *typeutil.Map

DynamicTypes returns p.PointsTo().DynamicTypes().

func (Pointer) MayAlias ¶

func (p Pointer) MayAlias(q Pointer) bool

MayAlias reports whether the receiver pointer may alias the argument pointer.

func (Pointer) PointsTo ¶

func (p Pointer) PointsTo() PointsToSet

PointsTo returns the points-to set of this pointer.

func (Pointer) String ¶

func (p Pointer) String() string

type PointsToSet ¶

type PointsToSet struct {
	// contains filtered or unexported fields
}

A PointsToSet is a set of labels (locations or allocations).

func (PointsToSet) DynamicTypes ¶

func (s PointsToSet) DynamicTypes() *typeutil.Map

If this PointsToSet came from a Pointer of interface kind or a reflect.Value, DynamicTypes returns the set of dynamic types that it may contain. (For an interface, they will always be concrete types.)

The result is a mapping whose keys are the dynamic types to which it may point. For each pointer-like key type, the corresponding map value is the PointsToSet for pointers of that type.

The result is empty unless CanHaveDynamicTypes(T).

func (PointsToSet) Intersects ¶

func (x PointsToSet) Intersects(y PointsToSet) bool

Intersects reports whether this points-to set and the argument points-to set contain common members.

func (PointsToSet) Labels ¶

func (s PointsToSet) Labels() []*Label

PointsTo returns the set of labels that this points-to set contains.

func (PointsToSet) String ¶

func (s PointsToSet) String() string

type Result ¶

type Result struct {
	CallGraph       *callgraph.Graph      // discovered call graph
	Queries         map[ssa.Value]Pointer // pts(v) for each v in Config.Queries.
	IndirectQueries map[ssa.Value]Pointer // pts(*v) for each v in Config.IndirectQueries.
	Warnings        []Warning             // warnings of unsoundness
}

A Result contains the results of a pointer analysis.

See Config for how to request the various Result components.

func Analyze ¶

func Analyze(config *Config) (result *Result, err error)

Analyze runs the pointer analysis with the scope and options specified by config, and returns the (synthetic) root of the callgraph.

Pointer analysis of a transitively closed well-typed program should always succeed. An error can occur only due to an internal bug.

type Warning ¶

type Warning struct {
	Pos     token.Pos
	Message string
}

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL