Averaged Value

The AveragedValue data structure keeps track of the count and mean of a stream of numbers with a single pass over the data. The mean calculation uses a numerically stable online algorithm suitable for large numbers of records, similar to Chan et. al.’s parallel variance algorithm on Wikipedia. As long as your count doesn’t overflow a Long, the mean calculation won’t overflow.

You can build instances of AveragedValue from any numeric type:

import com.twitter.algebird._
// import com.twitter.algebird._

val longVal = AveragedValue(3L)
// longVal: com.twitter.algebird.AveragedValue = AveragedValue(1,3.0)

val doubleVal = AveragedValue(12.0)
// doubleVal: com.twitter.algebird.AveragedValue = AveragedValue(1,12.0)

val intVal = AveragedValue(15)
// intVal: com.twitter.algebird.AveragedValue = AveragedValue(1,15.0)

Algebraic Properties

Combining instances with + generates a new instance by adding the counts and averaging the values:

longVal + doubleVal
// res0: com.twitter.algebird.AveragedValue = AveragedValue(2,7.5)

longVal + doubleVal + intVal
// res1: com.twitter.algebird.AveragedValue = AveragedValue(3,10.0)

You can also add numbers directly to an AveragedValue instance:

longVal + 12
// res2: com.twitter.algebird.AveragedValue = AveragedValue(2,7.5)

AveragedValue is a commutative group. This means you can add instances in any order:

longVal + doubleVal == doubleVal + doubleVal
// res3: Boolean = false

An AveragedValue with a count and value of 0 act as Monoid.zero:

Monoid.zero[AveragedValue]
// res4: com.twitter.algebird.AveragedValue = AveragedValue(0,0.0)

longVal + Monoid.zero[AveragedValue] == longVal
// res5: Boolean = true

Subtracting AveragedValues is the opposite of addition:

intVal - longVal
// res6: com.twitter.algebird.AveragedValue = AveragedValue(0,0.0)

intVal + doubleVal - doubleVal
// res7: com.twitter.algebird.AveragedValue = AveragedValue(1,15.0)

Stable Average Algorithm

Each AveragedValue instance represents a stream of data. AveragedValue uses Chan et. al.’s parallel algorithm to combine the mean values of each stream. Here’s the calculation:

// big and small are two AveragedValue instances
val deltaOfMeans = big.value - small.value
val newCount = big.count + small.count
val newMean = small.value + deltaOfMeans * (big.count / newCount)

As long as newCount stays within the bounds of Long, this calculation won’t overflow for large value.

The Wikipedia presentation of this algorithm points out that if both streams are close in size, the algorithm is “prone to loss of precision due to catastrophic cancellation”.

AveragedValue guards against this instability by taking a weighted average of the two instances if the smaller of the two has a count greater than 10% of the combined count.

val newCount = big.count + small.count
(big.count * big.value + small.count * small.value) / newCount

Aggregator

AveragedValue.aggregator returns an Aggregator that uses AveragedValue to calculate the mean of all Double values in a stream. For example:

val items = List[Double](1.0, 2.2, 3.3, 4.4, 5.5)
// items: List[Double] = List(1.0, 2.2, 3.3, 4.4, 5.5)

AveragedValue.aggregator(items)
// res8: Double = 3.28

AveragedValue.numericAggregator works the same way for any numeric type:

val items = List[Int](1, 3, 5, 7)
// items: List[Int] = List(1, 3, 5, 7)

AveragedValue.numericAggregator[Int].apply(items)
// res9: Double = 4.0
  • DecayedValue calculates a decayed moving average of a stream; this allows you to approximate the mean of a bounded sliding window of the datastream.
  • In addition to count and mean, Moments allows you to keep track of the standard deviation, skewness and kurtosis of a stream with one pass over the data.

Documentation Help

We’d love your help fleshing out this documentation! You can edit this page in your browser by clicking this link.