minipipeline

package
v0.27.0
Warning

This package is not in the latest version of its module.

Published: Dec 13, 2023 License: GPL-3.0 Imports: 13 Imported by: 0

Documentation

Overview

Package minipipeline implements a minimal data processing pipeline used to analyze local measurements collected by OONI Probe.

This package mimics ooni/data design.

A user provides as input to the minipipeline an OONI measurement and obtains as an intermediate result one or more observations. In turn, the user can process the observations to obtain a measurement analysis. Observations are an intermediate, flat data format useful to simplify writing analysis algorithms. The measurement analysis contains scalar, vector, and map fields summarizing the measurement. Each experiment should write custom expressions for generating top-level test keys given the analysis.

For *WebMeasurement, *WebObservation is an observation. The *WebObservationsContainer type allows one to create observations from OONI experiment measurements. In the same vein, the *WebAnalysis type contains the analysis for *WebMeasurement.

The IngestWebMeasurement convenience function simplifies transforming a *WebMeasurement into a *WebObservationsContainer. Likewise, the AnalyzeWebObservations function simplifies obtaining a *WebAnalysis.


Constants

const (
	// The last operation is a DNS lookup.
	WebObservationTypeDNSLookup = WebObservationType(iota)

	// The last operation is a TCP connect.
	WebObservationTypeTCPConnect

	// The last operation is a TLS handshake.
	WebObservationTypeTLSHandshake

	// The last operation is an HTTP round trip.
	WebObservationTypeHTTPRoundTrip
)

These are all the valid WebObservationType values.

Variables

var ErrNoTestKeys = errors.New("minipipeline: no test keys")

ErrNoTestKeys indicates that a *WebMeasurement does not contain *WebMeasurementTestKeys.

Functions

func ComputeHTTPDiffBodyProportionFactor

func ComputeHTTPDiffBodyProportionFactor(measurement, control int64) float64

ComputeHTTPDiffBodyProportionFactor computes the body proportion factor.
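This page does not spell out the formula. As a rough sketch, assuming the factor is the ratio between the smaller and the larger body length (the definition used by the Web Connectivity specification), the computation could look as follows. The name bodyProportionFactor and the guard against non-positive lengths are assumptions made for this illustration, not the package's code.

```go
package main

import "fmt"

// bodyProportionFactor is a hypothetical stand-in that divides the smaller
// body length by the larger one, yielding a value in the [0, 1] range.
func bodyProportionFactor(measurement, control int64) float64 {
	// Assumption: treat empty or invalid lengths as "no similarity" to
	// avoid dividing by zero.
	if measurement <= 0 || control <= 0 {
		return 0
	}
	if measurement >= control {
		return float64(control) / float64(measurement)
	}
	return float64(measurement) / float64(control)
}

func main() {
	fmt.Println(bodyProportionFactor(500, 1000)) // 0.5
}
```

Values close to 1 indicate bodies of similar size, while values close to 0 indicate very different sizes.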

func ComputeHTTPDiffStatusCodeMatch

func ComputeHTTPDiffStatusCodeMatch(measurement, control int64) optional.Value[bool]

ComputeHTTPDiffStatusCodeMatch computes whether the status code matches.

func ComputeHTTPDiffTitleDifferentLongWords

func ComputeHTTPDiffTitleDifferentLongWords(measurement, control string) map[string]bool

ComputeHTTPDiffTitleDifferentLongWords computes the different long words in the title (a long word is a word longer than 5 chars).
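The following sketch illustrates one way to compute the long words that appear in exactly one of the two titles. Splitting on whitespace, lowercasing, and the minLen parameter are assumptions of this example; the helper names are hypothetical and the package may tokenize titles differently.

```go
package main

import (
	"fmt"
	"strings"
)

// longWords returns the lowercased words of title that are at least
// minLen bytes long.
func longWords(title string, minLen int) map[string]bool {
	out := make(map[string]bool)
	for _, word := range strings.Fields(strings.ToLower(title)) {
		if len(word) >= minLen {
			out[word] = true
		}
	}
	return out
}

// differentLongWords returns the long words found in exactly one title.
func differentLongWords(measurement, control string, minLen int) map[string]bool {
	m, c := longWords(measurement, minLen), longWords(control, minLen)
	diff := make(map[string]bool)
	for word := range m {
		if !c[word] {
			diff[word] = true
		}
	}
	for word := range c {
		if !m[word] {
			diff[word] = true
		}
	}
	return diff
}

func main() {
	diff := differentLongWords("Example Domain home", "Example Domain blocked", 5)
	fmt.Println(diff) // map[blocked:true]
}
```

An empty result suggests that the two titles share all their long words.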

func ComputeHTTPDiffUncommonHeadersIntersection

func ComputeHTTPDiffUncommonHeadersIntersection(measurement, control map[string]bool) map[string]bool

ComputeHTTPDiffUncommonHeadersIntersection computes the uncommon header intersection.

Types

type Set

type Set[T ~string | ~int64] struct {
	// contains filtered or unexported fields
}

Set is a set containing keys with pretty JSON serialization and deserialization rules and a valid zero value.

func DNSDiffFindCommonASNsIntersection

func DNSDiffFindCommonASNsIntersection(measurement, control Set[string]) Set[int64]

DNSDiffFindCommonASNsIntersection returns the set of ASNs that belong to both the set of ASNs obtained from the measurement and the one obtained from the control.

func DNSDiffFindCommonIPAddressIntersection

func DNSDiffFindCommonIPAddressIntersection(measurement, control Set[string]) Set[string]

DNSDiffFindCommonIPAddressIntersection returns the set of IP addresses that belong to both the measurement and the control sets.
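As a minimal sketch of the intersection described above, the example below uses plain maps as a stand-in for Set[string]. The intersect helper is hypothetical and only illustrates the idea.

```go
package main

import "fmt"

// intersect returns the keys that are present in both input sets.
func intersect(measurement, control map[string]bool) map[string]bool {
	out := make(map[string]bool)
	for key, present := range measurement {
		if present && control[key] {
			out[key] = true
		}
	}
	return out
}

func main() {
	probe := map[string]bool{"93.184.216.34": true, "10.0.0.1": true}
	th := map[string]bool{"93.184.216.34": true}
	fmt.Println(intersect(probe, th)) // map[93.184.216.34:true]
}
```

A non-empty intersection means the probe and the control resolved at least one address in common.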

func NewSet

func NewSet[T ~string | ~int64](keys ...T) Set[T]

NewSet creates a new set containing the given keys.

func (*Set[T]) Add

func (sx *Set[T]) Add(keys ...T)

Add adds the given keys to the set.

func (*Set[T]) Contains

func (sx *Set[T]) Contains(key T) bool

Contains returns whether the set contains a key.

func (Set[T]) Keys

func (sx Set[T]) Keys() []T

Keys returns the keys.

func (Set[T]) Len

func (sx Set[T]) Len() int

Len returns the number of keys inside the set.

func (Set[T]) MarshalJSON

func (sx Set[T]) MarshalJSON() ([]byte, error)

MarshalJSON implements json.Marshaler.

func (Set[T]) Remove

func (sx Set[T]) Remove(keys ...T)

Remove removes the given keys from the set.

func (*Set[T]) UnmarshalJSON

func (sx *Set[T]) UnmarshalJSON(data []byte) error

UnmarshalJSON implements json.Unmarshaler.
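To make the "valid zero value" and JSON properties of Set more concrete, here is a simplified sketch of a generic set. It is not the package's implementation: the lowercase set type, its internal map, and the choice to serialize as a sorted JSON array are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// set is a hypothetical, simplified stand-in for Set[T].
type set[T ~string | ~int64] struct {
	state map[T]bool
}

// Add lazily allocates the map so that the zero value is usable.
func (s *set[T]) Add(keys ...T) {
	if s.state == nil {
		s.state = make(map[T]bool)
	}
	for _, key := range keys {
		s.state[key] = true
	}
}

// Keys returns the keys in ascending order (an assumption of this sketch).
func (s set[T]) Keys() []T {
	keys := make([]T, 0, len(s.state))
	for key := range s.state {
		keys = append(keys, key)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	return keys
}

// MarshalJSON serializes the set as a JSON array of keys.
func (s set[T]) MarshalJSON() ([]byte, error) {
	return json.Marshal(s.Keys())
}

// UnmarshalJSON replaces the set content with the keys in the JSON array.
func (s *set[T]) UnmarshalJSON(data []byte) error {
	var keys []T
	if err := json.Unmarshal(data, &keys); err != nil {
		return err
	}
	s.state = nil
	s.Add(keys...)
	return nil
}

func main() {
	var s set[string] // the zero value is ready to use
	s.Add("8.8.8.8", "1.1.1.1", "8.8.8.8")
	data, _ := json.Marshal(s)
	fmt.Println(string(data)) // ["1.1.1.1","8.8.8.8"]
}
```

Serializing as a sorted array keeps the JSON output stable across runs, which helps when comparing analysis results in tests.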

type WebAnalysis

type WebAnalysis struct {
	// DNSLookupSuccessWithInvalidAddresses contains DNS transactions with invalid IP addresses by
	// taking into account control info, bogons, and TLS handshakes.
	DNSLookupSuccessWithInvalidAddresses Set[int64]

	// DNSLookupSuccessWithValidAddress contains DNS transactions with valid IP addresses.
	DNSLookupSuccessWithValidAddress Set[int64]

	// DNSLookupSuccessWithInvalidAddressesClassic is like DNSLookupSuccessWithInvalidAddresses but the
	// algorithm is more relaxed to be compatible with Web Connectivity v0.4's behavior.
	DNSLookupSuccessWithInvalidAddressesClassic Set[int64]

	// DNSLookupSuccessWithValidAddressClassic contains DNS transactions with valid IP addresses.
	DNSLookupSuccessWithValidAddressClassic Set[int64]

	// DNSLookupUnexpectedFailure contains DNS transactions with unexpected failures.
	DNSLookupUnexpectedFailure Set[int64]

	// DNSExperimentFailure is the first failure experienced by any resolver
	// before hitting redirects (i.e., when TagDepth==0).
	DNSExperimentFailure optional.Value[string]

	// DNSLookupExpectedFailure contains DNS transactions with expected failures.
	DNSLookupExpectedFailure Set[int64]

	// DNSLookupExpectedSuccess contains DNS transactions with expected successes.
	DNSLookupExpectedSuccess Set[int64]

	// TCPConnectUnexpectedFailure contains TCP endpoint transactions with unexpected failures.
	TCPConnectUnexpectedFailure Set[int64]

	// TCPConnectUnexpectedFailureDuringWebFetch contains TCP endpoint transactions with unexpected failures
	// while performing a web fetch, as opposed to checking for connectivity.
	TCPConnectUnexpectedFailureDuringWebFetch Set[int64]

	// TCPConnectUnexpectedFailureDuringConnectivityCheck contains TCP endpoint transactions with unexpected failures
	// while checking for connectivity, as opposed to fetching a webpage.
	TCPConnectUnexpectedFailureDuringConnectivityCheck Set[int64]

	// TCPConnectUnexplainedFailure contains failures occurring during redirects.
	TCPConnectUnexplainedFailure Set[int64]

	// TCPConnectUnexplainedFailureDuringWebFetch contains failures occurring during redirects
	// while performing a web fetch, as opposed to checking for connectivity.
	TCPConnectUnexplainedFailureDuringWebFetch Set[int64]

	// TCPConnectUnexplainedFailureDuringConnectivityCheck contains failures occurring during redirects
	// while checking for connectivity, as opposed to fetching a webpage.
	TCPConnectUnexplainedFailureDuringConnectivityCheck Set[int64]

	// TLSHandshakeUnexpectedFailure contains TLS endpoint transactions with unexpected failures.
	TLSHandshakeUnexpectedFailure Set[int64]

	// TLSHandshakeUnexpectedFailureDuringWebFetch contains TLS endpoint transactions with unexpected failures
	// while performing a web fetch, as opposed to checking for connectivity.
	TLSHandshakeUnexpectedFailureDuringWebFetch Set[int64]

	// TLSHandshakeUnexpectedFailureDuringConnectivityCheck contains TLS endpoint transactions with unexpected failures
	// while checking for connectivity, as opposed to fetching a webpage.
	TLSHandshakeUnexpectedFailureDuringConnectivityCheck Set[int64]

	// TLSHandshakeUnexplainedFailure contains failures occurring during redirects.
	TLSHandshakeUnexplainedFailure Set[int64]

	// TLSHandshakeUnexplainedFailureDuringWebFetch contains failures occurring during redirects
	// while performing a web fetch, as opposed to checking for connectivity.
	TLSHandshakeUnexplainedFailureDuringWebFetch Set[int64]

	// TLSHandshakeUnexplainedFailureDuringConnectivityCheck contains failures occurring during redirects
	// while checking for connectivity, as opposed to fetching a webpage.
	TLSHandshakeUnexplainedFailureDuringConnectivityCheck Set[int64]

	// HTTPRoundTripUnexpectedFailure contains HTTP endpoint transactions with unexpected failures.
	HTTPRoundTripUnexpectedFailure Set[int64]

	// HTTPFinalResponseSuccessTLSWithoutControl contains the ID of the final response
	// transaction when the final response succeeded without control and with TLS.
	HTTPFinalResponseSuccessTLSWithoutControl optional.Value[int64]

	// HTTPFinalResponseSuccessTLSWithControl contains the ID of the final response
	// transaction when the final response succeeded with control and with TLS.
	HTTPFinalResponseSuccessTLSWithControl optional.Value[int64]

	// HTTPFinalResponseSuccessTCPWithoutControl contains the ID of the final response
	// transaction when the final response succeeded without control and with TCP.
	HTTPFinalResponseSuccessTCPWithoutControl optional.Value[int64]

	// HTTPFinalResponseSuccessTCPWithControl contains the ID of the final response
	// transaction when the final response succeeded with control and with TCP.
	HTTPFinalResponseSuccessTCPWithControl optional.Value[int64]

	// HTTPFinalResponseDiffBodyProportionFactor is the body proportion factor.
	HTTPFinalResponseDiffBodyProportionFactor optional.Value[float64]

	// HTTPFinalResponseDiffStatusCodeMatch returns whether the status code matches.
	HTTPFinalResponseDiffStatusCodeMatch optional.Value[bool]

	// HTTPFinalResponseDiffTitleDifferentLongWords contains the words that are 5+ characters long and
	// appear in the probe's "final" response title or in the TH title but not in both.
	HTTPFinalResponseDiffTitleDifferentLongWords optional.Value[map[string]bool]

	// HTTPFinalResponseDiffUncommonHeadersIntersection contains the uncommon headers intersection.
	HTTPFinalResponseDiffUncommonHeadersIntersection optional.Value[map[string]bool]

	// Linear contains the linear analysis.
	Linear []*WebObservation
}

WebAnalysis summarizes the content of *WebObservationsContainer.

The zero value of this struct is ready to use.

func AnalyzeWebObservations

func AnalyzeWebObservations(container *WebObservationsContainer) *WebAnalysis

AnalyzeWebObservations generates a *WebAnalysis from a *WebObservationsContainer.

type WebMeasurement

type WebMeasurement struct {
	// Input contains the input we measured (a URL).
	Input string `json:"input"`

	// TestKeys contains the test-specific measurements.
	TestKeys optional.Value[*WebMeasurementTestKeys] `json:"test_keys"`
}

WebMeasurement is the canonical web measurement structure assumed by minipipeline.

type WebMeasurementTestKeys

type WebMeasurementTestKeys struct {
	// Control contains the OPTIONAL TH response.
	Control optional.Value[*model.THResponse] `json:"control"`

	// NetworkEvents contains I/O events.
	NetworkEvents []*model.ArchivalNetworkEvent `json:"network_events"`

	// Queries contains the DNS queries results.
	Queries []*model.ArchivalDNSLookupResult `json:"queries"`

	// Requests contains HTTP request results.
	Requests []*model.ArchivalHTTPRequestResult `json:"requests"`

	// TCPConnect contains the TCP connect results.
	TCPConnect []*model.ArchivalTCPConnectResult `json:"tcp_connect"`

	// TLSHandshakes contains the TLS handshakes results.
	TLSHandshakes []*model.ArchivalTLSOrQUICHandshakeResult `json:"tls_handshakes"`

	// QUICHandshakes contains the QUIC handshakes results.
	QUICHandshakes []*model.ArchivalTLSOrQUICHandshakeResult `json:"quic_handshakes"`

	// XControlRequest contains the OPTIONAL TH request.
	XControlRequest optional.Value[*model.THRequest] `json:"x_control_request"`
}

WebMeasurementTestKeys is the canonical container for observations generated by most OONI experiments. This is the data format ingested by this package for generating [*WebEndpointObservations].

This structure is designed to support Web Connectivity LTE and possibly other OONI experiments such as telegram, signal, etc.

type WebObservation

type WebObservation struct {

	// TagDepth is the value of the depth=<int64> tag. We use this tag
	// in Web Connectivity LTE to know the current redirect depth. We start
	// from zero for the first set of requests and increment this value
	// every time we follow a redirect. (Because just one transaction
	// is allowed to fetch the body and follow redirects, everything should
	// work as intended and it's possible to use this tag to group related
	// DNS lookups and endpoint operations, which can then be further broken
	// down using the transaction ID to isolate transactions.)
	TagDepth optional.Value[int64]

	// Type is the observation type.
	Type WebObservationType

	// Failure contains the overall failure. For example, if the observation
	// is a WebObservationTypeTLSHandshake, this would be the TLSHandshakeFailure.
	Failure optional.Value[string]

	// TransactionID is the DNS or endpoint TransactionID.
	TransactionID int64

	// TagFetchBody is the value of the fetch_body=<bool> tag. We use this tag
	// in Web Connectivity LTE to indicate that the current transaction will
	// attempt to fetch the webpage body. (Potentially, more than one transaction
	// tries fetching the body and only one will actually do it.)
	TagFetchBody optional.Value[bool]

	// DNSTransactionID contains the ID of the DNS transaction that caused this
	// specific [*WebObservation] to be generated by the minipipeline.
	DNSTransactionID optional.Value[int64]

	// DNSDomain is the domain from which we resolved the IP address. This field
	// is empty when this record wasn't generated by a DNS lookup. This occurs, e.g.,
	// when Web Connectivity LTE discovers new addresses from the TH response.
	DNSDomain optional.Value[string]

	// DNSLookupFailure is the failure that occurred during the DNS lookup. This field will be
	// optional.None if there's no DNS lookup information. Otherwise, it contains a string
	// representing the error, where the empty string means success.
	DNSLookupFailure optional.Value[string]

	// DNSQueryType is the type of the DNS query (e.g., "A").
	DNSQueryType optional.Value[string]

	// DNSEngine is the DNS engine that we're using (e.g., "getaddrinfo").
	DNSEngine optional.Value[string]

	// DNSResolvedAddrs contains the list of DNS-resolved addrs.
	DNSResolvedAddrs optional.Value[Set[string]]

	// IPAddress is the optional IP address that this observation is about. We typically derive
	// this value from a DNS lookup, but sometimes we know it from other means (e.g., from
	// the Web Connectivity test helper response). When DNSLookupFailure contains a nonempty
	// error string, the DNS lookup failed and this field is an optional.None.
	IPAddress optional.Value[string]

	// IPAddressASN is the optional ASN associated to this IP address as discovered by
	// the probe while performing the measurement. When this field is optional.None, it
	// means that the probe failed to discover the IP address ASN.
	IPAddressASN optional.Value[int64]

	// IPAddressBogon is true if IPAddress is a bogon.
	IPAddressBogon optional.Value[bool]

	// EndpointTransactionID is the transaction ID used by this endpoint.
	EndpointTransactionID optional.Value[int64]

	// EndpointProto is either "tcp" or "udp".
	EndpointProto optional.Value[string]

	// EndpointPort is the port used by this endpoint.
	EndpointPort optional.Value[string]

	// EndpointAddress is "${IPAddress}:${EndpointPort}" where "${IPAddress}" is
	// quoted using "[" and "]" when the protocol family is IPv6.
	EndpointAddress optional.Value[string]

	// TCPConnectFailure is the optional TCP connect failure.
	TCPConnectFailure optional.Value[string]

	// TLSHandshakeFailure is the optional TLS handshake failure.
	TLSHandshakeFailure optional.Value[string]

	// TLSServerName is the optional TLS server name used by the TLS handshake.
	TLSServerName optional.Value[string]

	// HTTPRequestURL is the HTTP request URL.
	HTTPRequestURL optional.Value[string]

	// HTTPFailure is the error that occurred during the HTTP round trip.
	HTTPFailure optional.Value[string]

	// HTTPResponseStatusCode is the response status code.
	HTTPResponseStatusCode optional.Value[int64]

	// HTTPResponseBodyLength is the length of the response body.
	HTTPResponseBodyLength optional.Value[int64]

	// HTTPResponseBodyIsTruncated indicates whether the response body was truncated.
	HTTPResponseBodyIsTruncated optional.Value[bool]

	// HTTPResponseHeadersKeys maps response header keys to true.
	HTTPResponseHeadersKeys optional.Value[map[string]bool]

	// HTTPResponseLocation contains the location we're redirected to.
	HTTPResponseLocation optional.Value[string]

	// HTTPResponseTitle contains the response title.
	HTTPResponseTitle optional.Value[string]

	// HTTPResponseIsFinal is true if the status code is 2xx, 4xx, or 5xx.
	HTTPResponseIsFinal optional.Value[bool]

	// ControlDNSDomain is the domain used by the control for its DNS lookup. This field is
	// optional.Some only when the domain used by the control matches the domain used by the
	// probe. So, we won't see this record for redirect endpoints using another domain.
	ControlDNSDomain optional.Value[string]

	// ControlDNSLookupFailure is the corresponding control DNS lookup failure.
	ControlDNSLookupFailure optional.Value[string]

	// ControlDNSResolvedAddrs contains the list of addrs DNS-resolved by the control.
	ControlDNSResolvedAddrs optional.Value[Set[string]]

	// ControlTCPConnectFailure is the control's TCP connect failure.
	ControlTCPConnectFailure optional.Value[string]

	// ControlTLSHandshakeFailure is the control's TLS handshake failure.
	ControlTLSHandshakeFailure optional.Value[string]

	// ControlHTTPFailure is the HTTP failure seen by the control.
	ControlHTTPFailure optional.Value[string]

	// ControlHTTPResponseStatusCode is the status code seen by the control.
	ControlHTTPResponseStatusCode optional.Value[int64]

	// ControlHTTPResponseBodyLength contains the control HTTP response body length.
	ControlHTTPResponseBodyLength optional.Value[int64]

	// ControlHTTPResponseHeadersKeys contains the response headers keys.
	ControlHTTPResponseHeadersKeys optional.Value[map[string]bool]

	// ControlHTTPResponseTitle contains the title seen by the control.
	ControlHTTPResponseTitle optional.Value[string]
}

WebObservation is an observation of the flow that starts with a DNS lookup that either fails or discovers an IP address and proceeds by documenting binding such an address to a port to obtain and use a TCP or UDP endpoint.

A key property of the WebObservation is that there is a single failure mode for the whole WebObservation. If the DNS fails, there is no IP address to construct an endpoint. If TCP connect fails, there is no connection to use for a TLS handshake. Likewise, if QUIC fails, there is also no connection. Finally, if there is no suitable connection, we cannot perform an HTTP round trip.

Most fields are optional.Value fields. When the field contains an optional.None, it means that the related information is not available. We represent failures using flat strings and we use optional.Some("") to indicate the absence of any errors.
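The three states of a failure field (unknown, success, failure) can be sketched as follows. The optionalString type and its helpers are hypothetical stand-ins for optional.Value[string], written only to make the convention concrete.

```go
package main

import "fmt"

// optionalString is a hypothetical, simplified stand-in for
// optional.Value[string].
type optionalString struct {
	value string
	set   bool
}

// some wraps a known value; none represents missing information.
func some(v string) optionalString { return optionalString{value: v, set: true} }
func none() optionalString         { return optionalString{} }

// describe classifies a failure field following the convention above.
func describe(failure optionalString) string {
	switch {
	case !failure.set:
		return "unknown" // the related information is not available
	case failure.value == "":
		return "success" // an empty string means no error occurred
	default:
		return "failure: " + failure.value
	}
}

func main() {
	fmt.Println(describe(none()))                     // unknown
	fmt.Println(describe(some("")))                   // success
	fmt.Println(describe(some("connection_refused"))) // failure: connection_refused
}
```

Keeping "not measured" distinct from "measured and successful" lets analysis code avoid treating missing data as a success.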

We borrow this struct from https://github.com/ooni/data.
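The EndpointAddress format documented in the struct above matches what the standard library's net.JoinHostPort produces, so a sketch of building such a value could be as simple as the following. The endpointAddress helper is hypothetical; whether the package itself uses net.JoinHostPort is an assumption.

```go
package main

import (
	"fmt"
	"net"
)

// endpointAddress joins an IP address and a port, wrapping IPv6
// addresses in square brackets.
func endpointAddress(ipAddress, port string) string {
	return net.JoinHostPort(ipAddress, port)
}

func main() {
	fmt.Println(endpointAddress("93.184.216.34", "443")) // 93.184.216.34:443
	fmt.Println(endpointAddress("2001:db8::1", "443"))   // [2001:db8::1]:443
}
```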

func NewLinearWebAnalysis

func NewLinearWebAnalysis(input *WebObservationsContainer) (output []*WebObservation)

NewLinearWebAnalysis constructs a slice containing all the observations contained inside the given *WebObservationsContainer.

We sort the returned list as follows:

1. by descending TagDepth;

2. with TagDepth being equal, by descending WebObservationType;

3. with WebObservationType being equal, by ascending failure string.

This means that you divide the list in groups like this:

+------------+------------+------------+------------+
| TagDepth=3 | TagDepth=2 | TagDepth=1 | TagDepth=0 |
+------------+------------+------------+------------+

Where TagDepth=3 is the last redirect and TagDepth=0 is the initial request.

Each group is further divided as follows:

+------+-----+-----+-----+
| HTTP | TLS | TCP | DNS |
+------+-----+-----+-----+

Where each group may be empty. The first non-empty group is about the operation that failed for the current TagDepth.

Within each group, successes sort before failures because the empty string has priority over non-empty strings.

So, when walking the list from index 0 to index N, you encounter the latest redirects first, you observe the more complex operations first, and you see successes before failures.
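The sorting rules listed above can be sketched with sort.SliceStable. The observation struct below is a hypothetical, trimmed-down stand-in for *WebObservation that uses plain values rather than optional.Value fields; the numeric types mirror the iota order of the WebObservationType constants.

```go
package main

import (
	"fmt"
	"sort"
)

// observation is a hypothetical, simplified stand-in for *WebObservation.
type observation struct {
	TagDepth int64
	Type     int64  // 0=DNS, 1=TCP, 2=TLS, 3=HTTP
	Failure  string // "" means success
}

// sortLinear orders observations by descending TagDepth, then by
// descending Type, then by ascending Failure.
func sortLinear(obs []observation) {
	sort.SliceStable(obs, func(i, j int) bool {
		a, b := obs[i], obs[j]
		if a.TagDepth != b.TagDepth {
			return a.TagDepth > b.TagDepth
		}
		if a.Type != b.Type {
			return a.Type > b.Type
		}
		return a.Failure < b.Failure
	})
}

func main() {
	obs := []observation{
		{TagDepth: 0, Type: 0, Failure: ""},
		{TagDepth: 1, Type: 1, Failure: "connection_refused"},
		{TagDepth: 0, Type: 3, Failure: ""},
		{TagDepth: 1, Type: 0, Failure: ""},
		{TagDepth: 1, Type: 1, Failure: ""},
	}
	sortLinear(obs)
	for _, o := range obs {
		fmt.Printf("depth=%d type=%d failure=%q\n", o.TagDepth, o.Type, o.Failure)
	}
}
```

In this example the two TCP observations at depth 1 come first (the successful one before the failed one), followed by the DNS observation at depth 1, and then the HTTP and DNS observations at depth 0.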

type WebObservationType

type WebObservationType int64

WebObservationType is the type of a *WebObservation.

type WebObservationsContainer

type WebObservationsContainer struct {
	// DNSLookupFailures contains all the failed DNS lookup transactions.
	//
	// Note that DNSLookupFailures and KnownTCPEndpoints share the same transaction
	// ID space, i.e., you can't see the same transaction ID in both. Transaction IDs
	// are strictly positive unique numbers within the same OONI measurement. Note
	// that the A and AAAA events for the same DNS lookup use the same transaction ID
	// until we fix the https://github.com/ooni/probe/issues/2624 issue. For this
	// reason DNSLookupFailures and DNSLookupSuccesses MUST be slices.
	DNSLookupFailures []*WebObservation

	// DNSLookupSuccesses contains all the successful transactions.
	DNSLookupSuccesses []*WebObservation

	// KnownTCPEndpoints maps transaction IDs to TCP observations.
	KnownTCPEndpoints map[int64]*WebObservation
	// contains filtered or unexported fields
}

WebObservationsContainer contains *WebObservation values.

The zero value of this struct is not ready to use, please use NewWebObservationsContainer.

func ClassicFilter

func ClassicFilter(input *WebObservationsContainer) (output *WebObservationsContainer)

ClassicFilter takes as input a *WebObservationsContainer and returns as output another *WebObservationsContainer where we only keep:

1. DNS lookups using getaddrinfo;

2. IP addresses discovered using getaddrinfo;

3. endpoints using such IP addresses.

We use this filter to produce a backward compatible Web Connectivity analysis when the input *WebObservationsContainer was built using LTE.

The result should approximate what v0.4 would have measured.
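The three filtering rules above can be sketched on simplified data. The dnsLookup and endpoint types and the classicFilter helper are hypothetical stand-ins, not the package's types, and the "doh" engine name is used only as an illustrative non-getaddrinfo value.

```go
package main

import "fmt"

// dnsLookup and endpoint are hypothetical, simplified stand-ins for the
// observations stored inside a *WebObservationsContainer.
type dnsLookup struct {
	Engine string
	Addrs  []string
}

type endpoint struct {
	IPAddress string
	Port      string
}

// classicFilter keeps getaddrinfo lookups, collects the addresses they
// discovered, and keeps only the endpoints using those addresses.
func classicFilter(lookups []dnsLookup, endpoints []endpoint) ([]dnsLookup, []endpoint) {
	var keptLookups []dnsLookup
	known := make(map[string]bool)
	for _, lookup := range lookups {
		if lookup.Engine != "getaddrinfo" {
			continue
		}
		keptLookups = append(keptLookups, lookup)
		for _, addr := range lookup.Addrs {
			known[addr] = true
		}
	}
	var keptEndpoints []endpoint
	for _, epnt := range endpoints {
		if known[epnt.IPAddress] {
			keptEndpoints = append(keptEndpoints, epnt)
		}
	}
	return keptLookups, keptEndpoints
}

func main() {
	lookups := []dnsLookup{
		{Engine: "getaddrinfo", Addrs: []string{"93.184.216.34"}},
		{Engine: "doh", Addrs: []string{"93.184.216.35"}},
	}
	endpoints := []endpoint{
		{IPAddress: "93.184.216.34", Port: "443"},
		{IPAddress: "93.184.216.35", Port: "443"},
	}
	keptLookups, keptEndpoints := classicFilter(lookups, endpoints)
	fmt.Println(len(keptLookups), len(keptEndpoints)) // 1 1
}
```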

func IngestWebMeasurement

func IngestWebMeasurement(meas *WebMeasurement) (*WebObservationsContainer, error)

IngestWebMeasurement loads a *WebMeasurement into a *WebObservationsContainer. To this end, we create a *WebObservationsContainer and fill it with the contents of the input *WebMeasurement. An empty *WebMeasurement will cause this function to produce an empty result. It is safe to pass to this function a *WebMeasurement with empty Control and XControlRequest fields: in such a case, this function will just avoid using the test helper (aka control) information for generating flat *WebObservation. This function returns an error if the *WebMeasurement TestKeys are empty or Input is not a valid URL.

func NewWebObservationsContainer

func NewWebObservationsContainer() *WebObservationsContainer

NewWebObservationsContainer constructs a *WebObservationsContainer.

func (*WebObservationsContainer) IngestControlMessages

func (c *WebObservationsContainer) IngestControlMessages(req *model.THRequest, resp *model.THResponse) error

IngestControlMessages ingests the control request and response. You MUST call this method last, after you've ingested all the other measurement events.

This method fails if req.HTTPRequest is not a valid serialized URL.

func (*WebObservationsContainer) IngestDNSLookupEvents

func (c *WebObservationsContainer) IngestDNSLookupEvents(evs ...*model.ArchivalDNSLookupResult)

IngestDNSLookupEvents ingests DNS lookup events from an OONI measurement. You MUST ingest DNS lookup events before ingesting any other kind of event.

func (*WebObservationsContainer) IngestHTTPRoundTripEvents

func (c *WebObservationsContainer) IngestHTTPRoundTripEvents(evs ...*model.ArchivalHTTPRequestResult)

IngestHTTPRoundTripEvents ingests HTTP round trip events from an OONI measurement. You MUST ingest these events after ingesting TCP connect events.

func (*WebObservationsContainer) IngestTCPConnectEvents

func (c *WebObservationsContainer) IngestTCPConnectEvents(evs ...*model.ArchivalTCPConnectResult)

IngestTCPConnectEvents ingests TCP connect events from an OONI measurement. You MUST ingest these events after DNS events and before any other kind of events.

func (*WebObservationsContainer) IngestTLSHandshakeEvents

func (c *WebObservationsContainer) IngestTLSHandshakeEvents(evs ...*model.ArchivalTLSOrQUICHandshakeResult)

IngestTLSHandshakeEvents ingests TLS handshake events from an OONI measurement. You MUST ingest these events after ingesting TCP connect events.
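Taken together, the ordering requirements documented for the Ingest methods suggest a call sequence like the one sketched below. The ingester interface uses simplified, hypothetical signatures (the real methods take model types as arguments), and the recorder type is a toy that only records the call order. Placing TLS before HTTP is one valid choice, since both only need to follow TCP connect events.

```go
package main

import "fmt"

// ingester mirrors the method names of *WebObservationsContainer using
// simplified, hypothetical signatures.
type ingester interface {
	IngestDNSLookupEvents()
	IngestTCPConnectEvents()
	IngestTLSHandshakeEvents()
	IngestHTTPRoundTripEvents()
	IngestControlMessages() error
}

// ingestInOrder calls the ingestion steps in an order that satisfies the
// documented constraints.
func ingestInOrder(c ingester) error {
	c.IngestDNSLookupEvents()        // MUST come before any other event
	c.IngestTCPConnectEvents()       // after DNS, before the remaining events
	c.IngestTLSHandshakeEvents()     // after TCP connect events
	c.IngestHTTPRoundTripEvents()    // after TCP connect events
	return c.IngestControlMessages() // MUST come last
}

// recorder is a toy ingester that records the order of the calls.
type recorder struct{ calls []string }

func (r *recorder) IngestDNSLookupEvents()     { r.calls = append(r.calls, "dns") }
func (r *recorder) IngestTCPConnectEvents()    { r.calls = append(r.calls, "tcp") }
func (r *recorder) IngestTLSHandshakeEvents()  { r.calls = append(r.calls, "tls") }
func (r *recorder) IngestHTTPRoundTripEvents() { r.calls = append(r.calls, "http") }
func (r *recorder) IngestControlMessages() error {
	r.calls = append(r.calls, "control")
	return nil
}

func main() {
	r := &recorder{}
	if err := ingestInOrder(r); err != nil {
		fmt.Println("ingest failed:", err)
		return
	}
	fmt.Println(r.calls) // [dns tcp tls http control]
}
```

In practice, the IngestWebMeasurement convenience function spares callers from sequencing these steps by hand.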
