Documentation

Overview

    Package dpagg contains differentially private aggregation primitives.

    Index

    Constants

    This section is empty.

    Variables

    View Source
    var LargestRepresentableDelta = 1 - math.Pow(2, -53)

      LargestRepresentableDelta is the largest delta we could support in 64 bit precision, approximately equal to one.

      Functions

      func ClampFloat64

      func ClampFloat64(e, lower, upper float64) (float64, error)

        ClampFloat64 clamps e within lower and upper, such that lower is returned if e < lower, and upper is returned if e > upper. Otherwise, e is returned.

        func ClampInt64

        func ClampInt64(e, lower, upper int64) (int64, error)

          ClampInt64 clamps e within lower and upper. Returns lower if e < lower. Returns upper if e > upper.

          Types

          type BoundedMeanFloat64

          type BoundedMeanFloat64 struct {
          	// contains filtered or unexported fields
          }

            BoundedMeanFloat64 calculates a differentially private mean of a collection of float64 values.

            The mean is computed by dividing a noisy sum of the entries by a noisy count of the entries. To improve utility, all entries are normalized by setting them to the difference between their actual value and the middle of the input range before summation. The original mean is recovered by adding the midpoint in a post-processing step. This idea is taken from Algorithm 2.4 of "Differential Privacy: From Theory to Practice", by Ninghui Li, Min Lyu, Dong Su and Weining Yang (section 2.5.5, page 28). In contrast to Algorithm 2.4, we do not return the midpoint if the noisy count is less or equal to 1. Instead, we set the noisy count to 1. Since this is a mere post-processing step, the DP bounds are preserved. Moreover, for small numbers of entries, this approach will return results that are closer to the actual mean in expectation.

            BoundedMeanFloat64 supports privacy units that contribute to multiple partitions (via the MaxPartitionsContributed parameter) as well as contribute to the same partition multiple times (via the MaxContributionsPerPartition parameter), by scaling the added noise appropriately.

            For general details and key definitions, see https://github.com/google/differential-privacy/blob/main/differential_privacy.md#key-definitions.

            Note: Do not use when your results may cause overflows for int64 or float64 values. This aggregation is not hardened for such applications yet.

            Not thread-safe.

            func NewBoundedMeanFloat64

            func NewBoundedMeanFloat64(opt *BoundedMeanFloat64Options) *BoundedMeanFloat64

              NewBoundedMeanFloat64 returns a new BoundedMeanFloat64.

              func (*BoundedMeanFloat64) Add

              func (bm *BoundedMeanFloat64) Add(e float64)

                Add an entry to a BoundedMeanFloat64. It skips NaN entries and doesn't count them in the final result because introducing even a single NaN entry will result in a NaN mean regardless of other entries, which would break the indistinguishability property required for differential privacy.

                func (*BoundedMeanFloat64) ComputeConfidenceInterval

                func (bm *BoundedMeanFloat64) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)

                  ComputeConfidenceInterval computes a confidence interval that contains the true mean with probability greater than or equal to 1 - alpha. The computation is based exclusively on noised data and the privacy parameters. Thus no privacy budget is consumed by this operation.

                  Result() needs to be called before ComputeConfidenceInterval, otherwise this will return an error.

                  func (*BoundedMeanFloat64) GobDecode

                  func (bm *BoundedMeanFloat64) GobDecode(data []byte) error

                    GobDecode decodes Count.

                    func (*BoundedMeanFloat64) GobEncode

                    func (bm *BoundedMeanFloat64) GobEncode() ([]byte, error)

                      GobEncode encodes Count.

                      func (*BoundedMeanFloat64) Merge

                      func (bm *BoundedMeanFloat64) Merge(bm2 *BoundedMeanFloat64)

                        Merge merges bm2 into bm (i.e., adds to bm all entries that were added to bm2). bm2 is consumed by this operation: bm2 may not be used after it is merged into bm.

                        func (*BoundedMeanFloat64) Result

                        func (bm *BoundedMeanFloat64) Result() float64

                          Result returns a differentially private estimate of the average of bounded elements added so far. The method can be called only once.

                          Note that the returned value is not an unbiased estimate of the raw bounded mean.

                          type BoundedMeanFloat64Options

                          type BoundedMeanFloat64Options struct {
                          	Epsilon                      float64 // Privacy parameter ε. Required.
                          	Delta                        float64 // Privacy parameter δ. Required with Gaussian noise, must be 0 with Laplace noise.
                          	MaxPartitionsContributed     int64   // How many distinct partitions may a single user contribute to? Defaults to 1.
                          	MaxContributionsPerPartition int64   // How many times may a single user contribute to a single partition? Required.
                          	// Lower and Upper bounds for clamping. Default to 0; must be such that Lower < Upper.
                          	Lower, Upper float64
                          	Noise        noise.Noise // Type of noise used in BoundedMean. Defaults to Laplace noise.
                          }

                            BoundedMeanFloat64Options contains the options necessary to initialize a BoundedMeanFloat64.

                            type BoundedSumFloat64

                            type BoundedSumFloat64 struct {
                            	// contains filtered or unexported fields
                            }

                              BoundedSumFloat64 calculates a differentially private sum of a collection of float64 values. It supports privacy units that contribute to multiple partitions (via the MaxPartitionsContributed parameter) by scaling the added noise appropriately. However, it assumes that for each BoundedSumFloat64 instance (partition), each privacy unit contributes at most one value. If a privacy unit contributes more, the contributions should be pre-aggregated before passing them to BoundedSumFloat64.

                              The provided differentially private sum is an unbiased estimate of the raw bounded sum meaning that its expected value is equal to the raw bounded sum.

                              For general details and key definitions, see https://github.com/google/differential-privacy/blob/main/differential_privacy.md#key-definitions,

                              Not thread-safe.

                              func NewBoundedSumFloat64

                              func NewBoundedSumFloat64(opt *BoundedSumFloat64Options) *BoundedSumFloat64

                                NewBoundedSumFloat64 returns a new BoundedSumFloat64, whose sum is initialized at 0.

                                func (*BoundedSumFloat64) Add

                                func (bs *BoundedSumFloat64) Add(e float64)

                                  Add adds a new summand to the BoundedSumFloat64. It ignores NaN summands because introducing even a single NaN summand will result in a NaN sum regardless of other summands, which would break the indistinguishability property required for differential privacy.

                                  func (*BoundedSumFloat64) ComputeConfidenceInterval

                                  func (bs *BoundedSumFloat64) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)

                                    ComputeConfidenceInterval computes a confidence interval that contains the true sum with a probability greater than or equal to 1 - alpha using the noised sum computed by Result(). The computation is based exclusively on the noised sum returned by Result(). Thus no privacy budget is consumed by this operation.

                                    Result() needs to be called before ComputeConfidenceInterval, otherwise this will return an error.

                                    See https://github.com/google/differential-privacy/tree/main/common_docs/confidence_intervals.md.

                                    func (*BoundedSumFloat64) GobDecode

                                    func (bs *BoundedSumFloat64) GobDecode(data []byte) error

                                      GobDecode decodes BoundedSumInt64.

                                      func (*BoundedSumFloat64) GobEncode

                                      func (bs *BoundedSumFloat64) GobEncode() ([]byte, error)

                                        GobEncode encodes BoundedSumInt64.

                                        func (*BoundedSumFloat64) Merge

                                        func (bs *BoundedSumFloat64) Merge(bs2 *BoundedSumFloat64)

                                          Merge merges bs2 into bs (i.e., adds to bs all entries that were added to bs2). bs2 is consumed by this operation: bs2 may not be used after it is merged into bs.

                                          func (*BoundedSumFloat64) Result

                                          func (bs *BoundedSumFloat64) Result() float64

                                            Result returns a differentially private estimate of the sum of bounded elements added so far. The method can be called only once.

                                            The returned value is an unbiased estimate of the raw bounded sum.

                                            The returned value may sometimes be outside the set of possible raw bounded sums, e.g., the differentially private bounded sum may be positive although neither the lower nor the upper bound are positive. This can be corrected by the caller of this method, e.g., by snapping the result to the closest value representing a bounded sum that is possible. Note that such post processing introduces bias to the result.

                                            func (*BoundedSumFloat64) ThresholdedResult

                                            func (bs *BoundedSumFloat64) ThresholdedResult(thresholdDela float64) *float64

                                              ThresholdedResult is similar to Result() but applies thresholding to the result. So, if the result is less than the threshold specified by the noise, mechanism, it returns nil. Otherwise, it returns the result.

                                              type BoundedSumFloat64Options

                                              type BoundedSumFloat64Options struct {
                                              	Epsilon                  float64 // Privacy parameter ε. Required.
                                              	Delta                    float64 // Privacy parameter δ. Required with Gaussian noise, must be 0 with Laplace noise.
                                              	MaxPartitionsContributed int64   // How many distinct partitions may a single privacy unit contribute to? Defaults to 1.
                                              	// Lower and Upper bounds for clamping. Default to 0; must be such that Lower < Upper.
                                              	Lower, Upper float64
                                              	Noise        noise.Noise // Type of noise used in BoundedSum. Defaults to Laplace noise.
                                              	// contains filtered or unexported fields
                                              }

                                                BoundedSumFloat64Options contains the options necessary to initialize a BoundedSumFloat64.

                                                type BoundedSumInt64

                                                type BoundedSumInt64 struct {
                                                	// contains filtered or unexported fields
                                                }

                                                  BoundedSumInt64 calculates a differentially private sum of a collection of int64 values. It supports privacy units that contribute to multiple partitions (via the MaxPartitionsContributed parameter) by scaling the added noise appropriately. However, it assumes that for each BoundedSumInt64 instance (partition), each privacy unit contributes at most one value. If a privacy unit contributes more, the contributions should be pre-aggregated before passing them to BoundedSumInt64.

                                                  The provided differentially private sum is an unbiased estimate of the raw bounded sum in the sense that its expected value is equal to the raw bounded sum.

                                                  For general details and key definitions, see https://github.com/google/differential-privacy/blob/main/differential_privacy.md#key-definitions.

                                                  Note: Do not use when your results may cause overflows for int64 values. This aggregation is not hardened for such applications yet.

                                                  Not thread-safe.

                                                  func NewBoundedSumInt64

                                                  func NewBoundedSumInt64(opt *BoundedSumInt64Options) *BoundedSumInt64

                                                    NewBoundedSumInt64 returns a new BoundedSumInt64, whose sum is initialized at 0.

                                                    func (*BoundedSumInt64) Add

                                                    func (bs *BoundedSumInt64) Add(e int64)

                                                      Add adds a new summand to the BoundedSumInt64.

                                                      func (*BoundedSumInt64) ComputeConfidenceInterval

                                                      func (bs *BoundedSumInt64) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)

                                                        ComputeConfidenceInterval computes a confidence interval with integer bounds that contains the true sum with a probability greater than or equal to 1 - alpha using the noised sum computed by Result(). The computation is based exclusively on the noised sum returned by Result(). Thus no privacy budget is consumed by this operation.

                                                        Result() needs to be called before ComputeConfidenceInterval, otherwise this will return an error.

                                                        See https://github.com/google/differential-privacy/tree/main/common_docs/confidence_intervals.md.

                                                        func (*BoundedSumInt64) GobDecode

                                                        func (bs *BoundedSumInt64) GobDecode(data []byte) error

                                                          GobDecode decodes BoundedSumInt64.

                                                          func (*BoundedSumInt64) GobEncode

                                                          func (bs *BoundedSumInt64) GobEncode() ([]byte, error)

                                                            GobEncode encodes BoundedSumInt64.

                                                            func (*BoundedSumInt64) Merge

                                                            func (bs *BoundedSumInt64) Merge(bs2 *BoundedSumInt64)

                                                              Merge merges bs2 into bs (i.e., adds to bs all entries that were added to bs2). bs2 is consumed by this operation: bs2 may not be used after it is merged into bs.

                                                              func (*BoundedSumInt64) Result

                                                              func (bs *BoundedSumInt64) Result() int64

                                                                Result returns a differentially private estimate of the sum of bounded elements added so far. The method can be called only once.

                                                                The returned value is an unbiased estimate of the raw bounded sum.

                                                                The returned value may sometimes be outside the set of possible raw bounded sums, e.g., the differentially private bounded sum may be positive although neither the lower nor the upper bound are positive. This can be corrected by the caller of this method, e.g., by snapping the result to the closest value representing a bounded sum that is possible. Note that such post processing introduces bias to the result.

                                                                func (*BoundedSumInt64) ThresholdedResult

                                                                func (bs *BoundedSumInt64) ThresholdedResult(thresholdDelta float64) *int64

                                                                  ThresholdedResult is similar to Result() but applies thresholding to the result. So, if the result is less than the threshold specified by the noise mechanism, it returns nil. Otherwise, it returns the result.

                                                                  type BoundedSumInt64Options

                                                                  type BoundedSumInt64Options struct {
                                                                  	Epsilon                  float64 // Privacy parameter ε. Required.
                                                                  	Delta                    float64 // Privacy parameter δ. Required with Gaussian noise, must be 0 with Laplace noise.
                                                                  	MaxPartitionsContributed int64   // How many distinct partitions may a single privacy unit contribute to? Defaults to 1.
                                                                  	// Lower and Upper bounds for clamping. Default to 0; must be such that Lower < Upper.
                                                                  	Lower, Upper int64
                                                                  	Noise        noise.Noise // Type of noise used in BoundedSum. Defaults to Laplace noise.
                                                                  	// contains filtered or unexported fields
                                                                  }

                                                                    BoundedSumInt64Options contains the options necessary to initialize a BoundedSumInt64.

                                                                    type Count

                                                                    type Count struct {
                                                                    	// contains filtered or unexported fields
                                                                    }

                                                                      Count calculates a differentially private count of a collection of values using the Laplace or Gaussian mechanism

                                                                      It supports privacy units that contribute to multiple partitions (via the MaxPartitionsContributed parameter) by scaling the added noise appropriately. However, it does not support multiple contributions to a single partition from the same privacy unit. For that use case, BoundedSumInt64 should be used instead.

                                                                      The provided differentially private count is an unbiased estimate of the raw count meaning that its expected value is equal to the raw count.

                                                                      For general details and key definitions, see https://github.com/google/differential-privacy/blob/main/differential_privacy.md#key-definitions.

                                                                      Note: Do not use when your results may cause overflows for int64 values. This aggregation is not hardened for such applications yet.

                                                                      Not thread-safe.

                                                                      func NewCount

                                                                      func NewCount(opt *CountOptions) *Count

                                                                        NewCount returns a new Count, initialized at 0.

                                                                        func (*Count) ComputeConfidenceInterval

                                                                        func (c *Count) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)

                                                                          ComputeConfidenceInterval computes a confidence interval with integer bounds that contains the true count with a probability greater than or equal to 1 - alpha using the noised count computed by Result(). The computation is based exclusively on the noised count returned by Result(). Thus no privacy budget is consumed by this operation.

                                                                          Result() needs to be called before ComputeConfidenceInterval, otherwise this will return an error.

                                                                          See https://github.com/google/differential-privacy/tree/main/common_docs/confidence_intervals.md.

                                                                          func (*Count) GobDecode

                                                                          func (c *Count) GobDecode(data []byte) error

                                                                            GobDecode decodes Count.

                                                                            func (*Count) GobEncode

                                                                            func (c *Count) GobEncode() ([]byte, error)

                                                                              GobEncode encodes Count.

                                                                              func (*Count) Increment

                                                                              func (c *Count) Increment()

                                                                                Increment increments the count by one.

                                                                                func (*Count) IncrementBy

                                                                                func (c *Count) IncrementBy(count int64)

                                                                                  IncrementBy increments the count by the given value. Note that this shouldn't be used to count multiple contributions to a single partition from the same privacy unit.

                                                                                  func (*Count) Merge

                                                                                  func (c *Count) Merge(c2 *Count)

                                                                                    Merge merges c2 into c (i.e., adds to c all entries that were added to c2). c2 is consumed by this operation: it may not be used after it is merged into c.

                                                                                    func (*Count) Result

                                                                                    func (c *Count) Result() int64

                                                                                      Result returns a differentially private estimate of the current count. The method can be called only once.

                                                                                      The returned value is an unbiased estimate of the raw count.

                                                                                      The returned value may sometimes be negative. This can be corrected by setting negative results to 0. Note that such post processing introduces bias to the result.

                                                                                      func (*Count) ThresholdedResult

                                                                                      func (c *Count) ThresholdedResult(thresholdDelta float64) *int64

                                                                                        ThresholdedResult is similar to Result() but applies thresholding to the result. So, if the result is less than the threshold specified by the noise mechanism, it returns nil. Otherwise, it returns the result.

                                                                                        type CountOptions

                                                                                        type CountOptions struct {
                                                                                        	Epsilon                  float64     // Privacy parameter ε. Required.
                                                                                        	Delta                    float64     // Privacy parameter δ. Required with Gaussian noise, must be 0 with Laplace noise.
                                                                                        	MaxPartitionsContributed int64       // How many distinct partitions may a single privacy unit contribute to? Defaults to 1.
                                                                                        	Noise                    noise.Noise // Type of noise used. Defaults to Laplace noise.
                                                                                        	// contains filtered or unexported fields
                                                                                        }

                                                                                          CountOptions contains the options necessary to initialize a Count.

                                                                                          type PreAggSelectPartition

                                                                                          type PreAggSelectPartition struct {
                                                                                          	// contains filtered or unexported fields
                                                                                          }

                                                                                            PreAggSelectPartition is used to compute an (ε,δ)-differentially private decision of whether to materialize a partition.

                                                                                            Many differential privacy mechanisms work by performing an aggregation and adding noise. They achieve (ε_m,δ_m)-differential privacy under the assumption that partitions are chosen in advance. In other words, they assume that even if no data is associated with a partition, noise is added to the empty aggregation, and the noisy result is materialized. However, when only partitions containing data are materialized, such mechanisms fail to protect privacy for partitions containing data from a single privacy ID (e.g., user). To fix this, partitions with small numbers of privacy IDs must sometimes be dropped in order to maintain privacy. This process of partition selection incurs an additional (ε,δ) differential privacy budget resulting in a total differential privacy budget of (ε+ε_m,δ+δ_m) being used for the aggregation with partition selection.

                                                                                            Depending on the l0sensitivity, the PreAggSelectPartition uses one of two differentially private partition selection algorithms.

                                                                                            When l0sensitivity ≤ 3, the partition selection process is made (ε,δ) differentially private by applying the definition of differential privacy to the count of privacy IDs. Supposing l0Sensitivity bounds the number of partitions a privacy ID may contribute to, we define:

                                                                                            pε := ε/l0Sensitivity
                                                                                            pδ := δ/l0Sensitivity
                                                                                            

                                                                                            to be the per-partition differential privacy losses incurred by the partition selection process. Letting n denote the number of privacy IDs in a partition, the probability of selecting a partition is given by the following recurrence relation:

                                                                                            keepPartitionProbability(n) = min(
                                                                                                    keepPartitionProbability(n-1) * exp(pε) + pδ,              (1)
                                                                                                    1 - exp(-pε) * (1-keepPartitionProbability(n-1)-pδ),       (2)
                                                                                                    1                                                   (3)
                                                                                                )
                                                                                            

                                                                                            with base case keepPartitionProbability(0) = 0. This formula is optimal in terms of maximizing the likelihood of selecting a partition under (ε,δ)-differential privacy, with the caveat that the input values for pε and pδ are lower bound approximations. For efficiency, we use a closed-form solution to this recurrence relation. See [Differentially private partition selection paper] https://arxiv.org/pdf/2006.03684.pdf for details on the underlying mathematics.

                                                                                            When l0sensitivity > 3, the partition selection process is made (ε,δ) differentially private by using the ThresholdedResult() of the Count primitive with Gaussian noise. Count computes a (ε,δ/2) differentially private count of the privacy IDs in a partition by adding Gaussian noise. Then, it computes a threshold T for which the probability that a (ε,δ/2) differentially private count of a single privacy ID can exceed T is δ/2. It keeps the partition iff differentially private count exceeds the threshold.

                                                                                            The reason two different algorithms for deciding whether to keep a partition is used is because the first algorithm ("magic partition selection") is optimal when l0sensitivity ≤ 3 but is outperformed by Gaussian-based thresholding when l0sensitivity > 3.

                                                                                            PreAggSelectPartition is a utility for maintaining the count of IDs in a single partition and then determining whether the partition should be materialized. Use Increment() to increment the count of IDs and ShouldKeepPartition() to decide if the partition should be materialized.

                                                                                            func NewPreAggSelectPartition

                                                                                            func NewPreAggSelectPartition(opt *PreAggSelectPartitionOptions) *PreAggSelectPartition

                                                                                              NewPreAggSelectPartition constructs a new PreAggSelectPartition from opt.

                                                                                              func (*PreAggSelectPartition) GetHardThreshold

                                                                                              func (s *PreAggSelectPartition) GetHardThreshold() int

                                                                                                GetHardThreshold returns a threshold k, where if there are at least k privacy units in a partition, we are guaranteed to keep that partition.

                                                                                                This is the conceptual equivalent of the post-aggregation threshold of the noise.Noise interface with the difference that here there is 0 probability of not keeping the partition if it has at least k privacy units, whereas with the post-aggregation threshold there is a non-zero probability (however small).

                                                                                                func (*PreAggSelectPartition) GobDecode

                                                                                                func (s *PreAggSelectPartition) GobDecode(data []byte) error

                                                                                                  GobDecode decodes PreAggSelectPartition.

                                                                                                  func (*PreAggSelectPartition) GobEncode

                                                                                                  func (s *PreAggSelectPartition) GobEncode() ([]byte, error)

                                                                                                    GobEncode encodes PreAggSelectPartition.

                                                                                                    func (*PreAggSelectPartition) Increment

                                                                                                    func (s *PreAggSelectPartition) Increment()

                                                                                                      Increment increments the ids count by one. The caller must ensure this methods called at most once per privacy ID.

                                                                                                      func (*PreAggSelectPartition) Merge

                                                                                                        Merge merges s2 into s (i.e., add the idCount of s2 to s). This implicitly assumes that s and s2 act on distinct privacy IDs. s2 is consumed by this operation: s2 may not be used after it is merged into s.

                                                                                                        Preconditions: s and s2 must have the same privacy parameters. In addition, ShouldKeepPartition() may not be called yet for either s or s2.

                                                                                                        func (*PreAggSelectPartition) ShouldKeepPartition

                                                                                                        func (s *PreAggSelectPartition) ShouldKeepPartition() bool

                                                                                                          ShouldKeepPartition returns whether the partition should be materialized.

                                                                                                          func (*PreAggSelectPartition) String

                                                                                                          func (s *PreAggSelectPartition) String() string

                                                                                                          type PreAggSelectPartitionOptions

                                                                                                          type PreAggSelectPartitionOptions struct {
                                                                                                          	// Epsilon and Delta specify the (ε,δ)-differential privacy budget used for
                                                                                                          	// partition selection. Required.
                                                                                                          	Epsilon float64
                                                                                                          	Delta   float64
                                                                                                          	// MaxPartitionsContributed is the number of distinct partitions a single
                                                                                                          	// privacy unit can contribute to. Defaults to 1.
                                                                                                          	MaxPartitionsContributed int64
                                                                                                          }

                                                                                                            PreAggSelectPartitionOptions is used to set the privacy parameters when constructing a PreAggSelectPartition.