Documentation
¶
Overview ¶
Package dpagg contains differentially private aggregation primitives.
Index ¶
- Variables
- func ClampFloat64(e, lower, upper float64) (float64, error)
- func ClampInt64(e, lower, upper int64) (int64, error)
- type BoundedMeanFloat64
- func (bm *BoundedMeanFloat64) Add(e float64)
- func (bm *BoundedMeanFloat64) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)
- func (bm *BoundedMeanFloat64) GobDecode(data []byte) error
- func (bm *BoundedMeanFloat64) GobEncode() ([]byte, error)
- func (bm *BoundedMeanFloat64) Merge(bm2 *BoundedMeanFloat64)
- func (bm *BoundedMeanFloat64) Result() float64
- type BoundedMeanFloat64Options
- type BoundedSumFloat64
- func (bs *BoundedSumFloat64) Add(e float64)
- func (bs *BoundedSumFloat64) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)
- func (bs *BoundedSumFloat64) GobDecode(data []byte) error
- func (bs *BoundedSumFloat64) GobEncode() ([]byte, error)
- func (bs *BoundedSumFloat64) Merge(bs2 *BoundedSumFloat64)
- func (bs *BoundedSumFloat64) Result() float64
- func (bs *BoundedSumFloat64) ThresholdedResult(thresholdDela float64) *float64
- type BoundedSumFloat64Options
- type BoundedSumInt64
- func (bs *BoundedSumInt64) Add(e int64)
- func (bs *BoundedSumInt64) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)
- func (bs *BoundedSumInt64) GobDecode(data []byte) error
- func (bs *BoundedSumInt64) GobEncode() ([]byte, error)
- func (bs *BoundedSumInt64) Merge(bs2 *BoundedSumInt64)
- func (bs *BoundedSumInt64) Result() int64
- func (bs *BoundedSumInt64) ThresholdedResult(thresholdDelta float64) *int64
- type BoundedSumInt64Options
- type Count
- func (c *Count) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)
- func (c *Count) GobDecode(data []byte) error
- func (c *Count) GobEncode() ([]byte, error)
- func (c *Count) Increment()
- func (c *Count) IncrementBy(count int64)
- func (c *Count) Merge(c2 *Count)
- func (c *Count) Result() int64
- func (c *Count) ThresholdedResult(thresholdDelta float64) *int64
- type CountOptions
- type PreAggSelectPartition
- func (s *PreAggSelectPartition) GetHardThreshold() int
- func (s *PreAggSelectPartition) GobDecode(data []byte) error
- func (s *PreAggSelectPartition) GobEncode() ([]byte, error)
- func (s *PreAggSelectPartition) Increment()
- func (s *PreAggSelectPartition) Merge(s2 *PreAggSelectPartition)
- func (s *PreAggSelectPartition) ShouldKeepPartition() bool
- func (s *PreAggSelectPartition) String() string
- type PreAggSelectPartitionOptions
Constants ¶
Variables ¶
var LargestRepresentableDelta = 1 - math.Pow(2, -53)
LargestRepresentableDelta is the largest delta we could support in 64 bit precision, approximately equal to one.
Functions ¶
func ClampFloat64 ¶
ClampFloat64 clamps e within lower and upper, such that lower is returned if e < lower, and upper is returned if e > upper. Otherwise, e is returned.
func ClampInt64 ¶
ClampInt64 clamps e within lower and upper. Returns lower if e < lower. Returns upper if e > upper.
Types ¶
type BoundedMeanFloat64 ¶
type BoundedMeanFloat64 struct {
// contains filtered or unexported fields
}
BoundedMeanFloat64 calculates a differentially private mean of a collection of float64 values.
The mean is computed by dividing a noisy sum of the entries by a noisy count of the entries. To improve utility, all entries are normalized by setting them to the difference between their actual value and the middle of the input range before summation. The original mean is recovered by adding the midpoint in a post-processing step. This idea is taken from Algorithm 2.4 of "Differential Privacy: From Theory to Practice", by Ninghui Li, Min Lyu, Dong Su and Weining Yang (section 2.5.5, page 28). In contrast to Algorithm 2.4, we do not return the midpoint if the noisy count is less or equal to 1. Instead, we set the noisy count to 1. Since this is a mere post-processing step, the DP bounds are preserved. Moreover, for small numbers of entries, this approach will return results that are closer to the actual mean in expectation.
BoundedMeanFloat64 supports privacy units that contribute to multiple partitions (via the MaxPartitionsContributed parameter) as well as contribute to the same partition multiple times (via the MaxContributionsPerPartition parameter), by scaling the added noise appropriately.
For general details and key definitions, see https://github.com/google/differential-privacy/blob/main/differential_privacy.md#key-definitions.
Note: Do not use when your results may cause overflows for int64 or float64 values. This aggregation is not hardened for such applications yet.
Not thread-safe.
func NewBoundedMeanFloat64 ¶
func NewBoundedMeanFloat64(opt *BoundedMeanFloat64Options) *BoundedMeanFloat64
NewBoundedMeanFloat64 returns a new BoundedMeanFloat64.
func (*BoundedMeanFloat64) Add ¶
func (bm *BoundedMeanFloat64) Add(e float64)
Add an entry to a BoundedMeanFloat64. It skips NaN entries and doesn't count them in the final result because introducing even a single NaN entry will result in a NaN mean regardless of other entries, which would break the indistinguishability property required for differential privacy.
func (*BoundedMeanFloat64) ComputeConfidenceInterval ¶
func (bm *BoundedMeanFloat64) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)
ComputeConfidenceInterval computes a confidence interval that contains the true mean with probability greater than or equal to 1 - alpha. The computation is based exclusively on noised data and the privacy parameters. Thus no privacy budget is consumed by this operation.
Result() needs to be called before ComputeConfidenceInterval, otherwise this will return an error.
func (*BoundedMeanFloat64) GobDecode ¶
func (bm *BoundedMeanFloat64) GobDecode(data []byte) error
GobDecode decodes Count.
func (*BoundedMeanFloat64) GobEncode ¶
func (bm *BoundedMeanFloat64) GobEncode() ([]byte, error)
GobEncode encodes Count.
func (*BoundedMeanFloat64) Merge ¶
func (bm *BoundedMeanFloat64) Merge(bm2 *BoundedMeanFloat64)
Merge merges bm2 into bm (i.e., adds to bm all entries that were added to bm2). bm2 is consumed by this operation: bm2 may not be used after it is merged into bm.
func (*BoundedMeanFloat64) Result ¶
func (bm *BoundedMeanFloat64) Result() float64
Result returns a differentially private estimate of the average of bounded elements added so far. The method can be called only once.
Note that the returned value is not an unbiased estimate of the raw bounded mean.
type BoundedMeanFloat64Options ¶
type BoundedMeanFloat64Options struct { Epsilon float64 // Privacy parameter ε. Required. Delta float64 // Privacy parameter δ. Required with Gaussian noise, must be 0 with Laplace noise. MaxPartitionsContributed int64 // How many distinct partitions may a single user contribute to? Defaults to 1. MaxContributionsPerPartition int64 // How many times may a single user contribute to a single partition? Required. // Lower and Upper bounds for clamping. Default to 0; must be such that Lower < Upper. Lower, Upper float64 Noise noise.Noise // Type of noise used in BoundedMean. Defaults to Laplace noise. }
BoundedMeanFloat64Options contains the options necessary to initialize a BoundedMeanFloat64.
type BoundedSumFloat64 ¶
type BoundedSumFloat64 struct {
// contains filtered or unexported fields
}
BoundedSumFloat64 calculates a differentially private sum of a collection of float64 values. It supports privacy units that contribute to multiple partitions (via the MaxPartitionsContributed parameter) by scaling the added noise appropriately. However, it assumes that for each BoundedSumFloat64 instance (partition), each privacy unit contributes at most one value. If a privacy unit contributes more, the contributions should be pre-aggregated before passing them to BoundedSumFloat64.
The provided differentially private sum is an unbiased estimate of the raw bounded sum meaning that its expected value is equal to the raw bounded sum.
For general details and key definitions, see https://github.com/google/differential-privacy/blob/main/differential_privacy.md#key-definitions,
Not thread-safe.
func NewBoundedSumFloat64 ¶
func NewBoundedSumFloat64(opt *BoundedSumFloat64Options) *BoundedSumFloat64
NewBoundedSumFloat64 returns a new BoundedSumFloat64, whose sum is initialized at 0.
func (*BoundedSumFloat64) Add ¶
func (bs *BoundedSumFloat64) Add(e float64)
Add adds a new summand to the BoundedSumFloat64. It ignores NaN summands because introducing even a single NaN summand will result in a NaN sum regardless of other summands, which would break the indistinguishability property required for differential privacy.
func (*BoundedSumFloat64) ComputeConfidenceInterval ¶
func (bs *BoundedSumFloat64) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)
ComputeConfidenceInterval computes a confidence interval that contains the true sum with a probability greater than or equal to 1 - alpha using the noised sum computed by Result(). The computation is based exclusively on the noised sum returned by Result(). Thus no privacy budget is consumed by this operation.
Result() needs to be called before ComputeConfidenceInterval, otherwise this will return an error.
See https://github.com/google/differential-privacy/tree/main/common_docs/confidence_intervals.md.
func (*BoundedSumFloat64) GobDecode ¶
func (bs *BoundedSumFloat64) GobDecode(data []byte) error
GobDecode decodes BoundedSumInt64.
func (*BoundedSumFloat64) GobEncode ¶
func (bs *BoundedSumFloat64) GobEncode() ([]byte, error)
GobEncode encodes BoundedSumInt64.
func (*BoundedSumFloat64) Merge ¶
func (bs *BoundedSumFloat64) Merge(bs2 *BoundedSumFloat64)
Merge merges bs2 into bs (i.e., adds to bs all entries that were added to bs2). bs2 is consumed by this operation: bs2 may not be used after it is merged into bs.
func (*BoundedSumFloat64) Result ¶
func (bs *BoundedSumFloat64) Result() float64
Result returns a differentially private estimate of the sum of bounded elements added so far. The method can be called only once.
The returned value is an unbiased estimate of the raw bounded sum.
The returned value may sometimes be outside the set of possible raw bounded sums, e.g., the differentially private bounded sum may be positive although neither the lower nor the upper bound are positive. This can be corrected by the caller of this method, e.g., by snapping the result to the closest value representing a bounded sum that is possible. Note that such post processing introduces bias to the result.
func (*BoundedSumFloat64) ThresholdedResult ¶
func (bs *BoundedSumFloat64) ThresholdedResult(thresholdDela float64) *float64
ThresholdedResult is similar to Result() but applies thresholding to the result. So, if the result is less than the threshold specified by the noise, mechanism, it returns nil. Otherwise, it returns the result.
type BoundedSumFloat64Options ¶
type BoundedSumFloat64Options struct { Epsilon float64 // Privacy parameter ε. Required. Delta float64 // Privacy parameter δ. Required with Gaussian noise, must be 0 with Laplace noise. MaxPartitionsContributed int64 // How many distinct partitions may a single privacy unit contribute to? Defaults to 1. // Lower and Upper bounds for clamping. Default to 0; must be such that Lower < Upper. Lower, Upper float64 Noise noise.Noise // Type of noise used in BoundedSum. Defaults to Laplace noise. // contains filtered or unexported fields }
BoundedSumFloat64Options contains the options necessary to initialize a BoundedSumFloat64.
type BoundedSumInt64 ¶
type BoundedSumInt64 struct {
// contains filtered or unexported fields
}
BoundedSumInt64 calculates a differentially private sum of a collection of int64 values. It supports privacy units that contribute to multiple partitions (via the MaxPartitionsContributed parameter) by scaling the added noise appropriately. However, it assumes that for each BoundedSumInt64 instance (partition), each privacy unit contributes at most one value. If a privacy unit contributes more, the contributions should be pre-aggregated before passing them to BoundedSumInt64.
The provided differentially private sum is an unbiased estimate of the raw bounded sum in the sense that its expected value is equal to the raw bounded sum.
For general details and key definitions, see https://github.com/google/differential-privacy/blob/main/differential_privacy.md#key-definitions.
Note: Do not use when your results may cause overflows for int64 values. This aggregation is not hardened for such applications yet.
Not thread-safe.
func NewBoundedSumInt64 ¶
func NewBoundedSumInt64(opt *BoundedSumInt64Options) *BoundedSumInt64
NewBoundedSumInt64 returns a new BoundedSumInt64, whose sum is initialized at 0.
func (*BoundedSumInt64) Add ¶
func (bs *BoundedSumInt64) Add(e int64)
Add adds a new summand to the BoundedSumInt64.
func (*BoundedSumInt64) ComputeConfidenceInterval ¶
func (bs *BoundedSumInt64) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)
ComputeConfidenceInterval computes a confidence interval with integer bounds that contains the true sum with a probability greater than or equal to 1 - alpha using the noised sum computed by Result(). The computation is based exclusively on the noised sum returned by Result(). Thus no privacy budget is consumed by this operation.
Result() needs to be called before ComputeConfidenceInterval, otherwise this will return an error.
See https://github.com/google/differential-privacy/tree/main/common_docs/confidence_intervals.md.
func (*BoundedSumInt64) GobDecode ¶
func (bs *BoundedSumInt64) GobDecode(data []byte) error
GobDecode decodes BoundedSumInt64.
func (*BoundedSumInt64) GobEncode ¶
func (bs *BoundedSumInt64) GobEncode() ([]byte, error)
GobEncode encodes BoundedSumInt64.
func (*BoundedSumInt64) Merge ¶
func (bs *BoundedSumInt64) Merge(bs2 *BoundedSumInt64)
Merge merges bs2 into bs (i.e., adds to bs all entries that were added to bs2). bs2 is consumed by this operation: bs2 may not be used after it is merged into bs.
func (*BoundedSumInt64) Result ¶
func (bs *BoundedSumInt64) Result() int64
Result returns a differentially private estimate of the sum of bounded elements added so far. The method can be called only once.
The returned value is an unbiased estimate of the raw bounded sum.
The returned value may sometimes be outside the set of possible raw bounded sums, e.g., the differentially private bounded sum may be positive although neither the lower nor the upper bound are positive. This can be corrected by the caller of this method, e.g., by snapping the result to the closest value representing a bounded sum that is possible. Note that such post processing introduces bias to the result.
func (*BoundedSumInt64) ThresholdedResult ¶
func (bs *BoundedSumInt64) ThresholdedResult(thresholdDelta float64) *int64
ThresholdedResult is similar to Result() but applies thresholding to the result. So, if the result is less than the threshold specified by the noise mechanism, it returns nil. Otherwise, it returns the result.
type BoundedSumInt64Options ¶
type BoundedSumInt64Options struct { Epsilon float64 // Privacy parameter ε. Required. Delta float64 // Privacy parameter δ. Required with Gaussian noise, must be 0 with Laplace noise. MaxPartitionsContributed int64 // How many distinct partitions may a single privacy unit contribute to? Defaults to 1. // Lower and Upper bounds for clamping. Default to 0; must be such that Lower < Upper. Lower, Upper int64 Noise noise.Noise // Type of noise used in BoundedSum. Defaults to Laplace noise. // contains filtered or unexported fields }
BoundedSumInt64Options contains the options necessary to initialize a BoundedSumInt64.
type Count ¶
type Count struct {
// contains filtered or unexported fields
}
Count calculates a differentially private count of a collection of values using the Laplace or Gaussian mechanism
It supports privacy units that contribute to multiple partitions (via the MaxPartitionsContributed parameter) by scaling the added noise appropriately. However, it does not support multiple contributions to a single partition from the same privacy unit. For that use case, BoundedSumInt64 should be used instead.
The provided differentially private count is an unbiased estimate of the raw count meaning that its expected value is equal to the raw count.
For general details and key definitions, see https://github.com/google/differential-privacy/blob/main/differential_privacy.md#key-definitions.
Note: Do not use when your results may cause overflows for int64 values. This aggregation is not hardened for such applications yet.
Not thread-safe.
func NewCount ¶
func NewCount(opt *CountOptions) *Count
NewCount returns a new Count, initialized at 0.
func (*Count) ComputeConfidenceInterval ¶
func (c *Count) ComputeConfidenceInterval(alpha float64) (noise.ConfidenceInterval, error)
ComputeConfidenceInterval computes a confidence interval with integer bounds that contains the true count with a probability greater than or equal to 1 - alpha using the noised count computed by Result(). The computation is based exclusively on the noised count returned by Result(). Thus no privacy budget is consumed by this operation.
Result() needs to be called before ComputeConfidenceInterval, otherwise this will return an error.
See https://github.com/google/differential-privacy/tree/main/common_docs/confidence_intervals.md.
func (*Count) IncrementBy ¶
IncrementBy increments the count by the given value. Note that this shouldn't be used to count multiple contributions to a single partition from the same privacy unit.
func (*Count) Merge ¶
Merge merges c2 into c (i.e., adds to c all entries that were added to c2). c2 is consumed by this operation: it may not be used after it is merged into c.
func (*Count) Result ¶
Result returns a differentially private estimate of the current count. The method can be called only once.
The returned value is an unbiased estimate of the raw count.
The returned value may sometimes be negative. This can be corrected by setting negative results to 0. Note that such post processing introduces bias to the result.
func (*Count) ThresholdedResult ¶
ThresholdedResult is similar to Result() but applies thresholding to the result. So, if the result is less than the threshold specified by the noise mechanism, it returns nil. Otherwise, it returns the result.
type CountOptions ¶
type CountOptions struct { Epsilon float64 // Privacy parameter ε. Required. Delta float64 // Privacy parameter δ. Required with Gaussian noise, must be 0 with Laplace noise. MaxPartitionsContributed int64 // How many distinct partitions may a single privacy unit contribute to? Defaults to 1. Noise noise.Noise // Type of noise used. Defaults to Laplace noise. // contains filtered or unexported fields }
CountOptions contains the options necessary to initialize a Count.
type PreAggSelectPartition ¶
type PreAggSelectPartition struct {
// contains filtered or unexported fields
}
PreAggSelectPartition is used to compute an (ε,δ)-differentially private decision of whether to materialize a partition.
Many differential privacy mechanisms work by performing an aggregation and adding noise. They achieve (ε_m,δ_m)-differential privacy under the assumption that partitions are chosen in advance. In other words, they assume that even if no data is associated with a partition, noise is added to the empty aggregation, and the noisy result is materialized. However, when only partitions containing data are materialized, such mechanisms fail to protect privacy for partitions containing data from a single privacy ID (e.g., user). To fix this, partitions with small numbers of privacy IDs must sometimes be dropped in order to maintain privacy. This process of partition selection incurs an additional (ε,δ) differential privacy budget resulting in a total differential privacy budget of (ε+ε_m,δ+δ_m) being used for the aggregation with partition selection.
Depending on the l0sensitivity, the PreAggSelectPartition uses one of two differentially private partition selection algorithms.
When l0sensitivity ≤ 3, the partition selection process is made (ε,δ) differentially private by applying the definition of differential privacy to the count of privacy IDs. Supposing l0Sensitivity bounds the number of partitions a privacy ID may contribute to, we define:
pε := ε/l0Sensitivity pδ := δ/l0Sensitivity
to be the per-partition differential privacy losses incurred by the partition selection process. Letting n denote the number of privacy IDs in a partition, the probability of selecting a partition is given by the following recurrence relation:
keepPartitionProbability(n) = min( keepPartitionProbability(n-1) * exp(pε) + pδ, (1) 1 - exp(-pε) * (1-keepPartitionProbability(n-1)-pδ), (2) 1 (3) )
with base case keepPartitionProbability(0) = 0. This formula is optimal in terms of maximizing the likelihood of selecting a partition under (ε,δ)-differential privacy, with the caveat that the input values for pε and pδ are lower bound approximations. For efficiency, we use a closed-form solution to this recurrence relation. See [Differentially private partition selection paper] https://arxiv.org/pdf/2006.03684.pdf for details on the underlying mathematics.
When l0sensitivity > 3, the partition selection process is made (ε,δ) differentially private by using the ThresholdedResult() of the Count primitive with Gaussian noise. Count computes a (ε,δ/2) differentially private count of the privacy IDs in a partition by adding Gaussian noise. Then, it computes a threshold T for which the probability that a (ε,δ/2) differentially private count of a single privacy ID can exceed T is δ/2. It keeps the partition iff differentially private count exceeds the threshold.
The reason two different algorithms for deciding whether to keep a partition is used is because the first algorithm ("magic partition selection") is optimal when l0sensitivity ≤ 3 but is outperformed by Gaussian-based thresholding when l0sensitivity > 3.
PreAggSelectPartition is a utility for maintaining the count of IDs in a single partition and then determining whether the partition should be materialized. Use Increment() to increment the count of IDs and ShouldKeepPartition() to decide if the partition should be materialized.
func NewPreAggSelectPartition ¶
func NewPreAggSelectPartition(opt *PreAggSelectPartitionOptions) *PreAggSelectPartition
NewPreAggSelectPartition constructs a new PreAggSelectPartition from opt.
func (*PreAggSelectPartition) GetHardThreshold ¶
func (s *PreAggSelectPartition) GetHardThreshold() int
GetHardThreshold returns a threshold k, where if there are at least k privacy units in a partition, we are guaranteed to keep that partition.
This is the conceptual equivalent of the post-aggregation threshold of the noise.Noise interface with the difference that here there is 0 probability of not keeping the partition if it has at least k privacy units, whereas with the post-aggregation threshold there is a non-zero probability (however small).
func (*PreAggSelectPartition) GobDecode ¶
func (s *PreAggSelectPartition) GobDecode(data []byte) error
GobDecode decodes PreAggSelectPartition.
func (*PreAggSelectPartition) GobEncode ¶
func (s *PreAggSelectPartition) GobEncode() ([]byte, error)
GobEncode encodes PreAggSelectPartition.
func (*PreAggSelectPartition) Increment ¶
func (s *PreAggSelectPartition) Increment()
Increment increments the ids count by one. The caller must ensure this methods called at most once per privacy ID.
func (*PreAggSelectPartition) Merge ¶
func (s *PreAggSelectPartition) Merge(s2 *PreAggSelectPartition)
Merge merges s2 into s (i.e., add the idCount of s2 to s). This implicitly assumes that s and s2 act on distinct privacy IDs. s2 is consumed by this operation: s2 may not be used after it is merged into s.
Preconditions: s and s2 must have the same privacy parameters. In addition, ShouldKeepPartition() may not be called yet for either s or s2.
func (*PreAggSelectPartition) ShouldKeepPartition ¶
func (s *PreAggSelectPartition) ShouldKeepPartition() bool
ShouldKeepPartition returns whether the partition should be materialized.
func (*PreAggSelectPartition) String ¶
func (s *PreAggSelectPartition) String() string
type PreAggSelectPartitionOptions ¶
type PreAggSelectPartitionOptions struct { // Epsilon and Delta specify the (ε,δ)-differential privacy budget used for // partition selection. Required. Epsilon float64 Delta float64 // MaxPartitionsContributed is the number of distinct partitions a single // privacy unit can contribute to. Defaults to 1. MaxPartitionsContributed int64 }
PreAggSelectPartitionOptions is used to set the privacy parameters when constructing a PreAggSelectPartition.