Q Tree

A QTree[A: Monoid] is like an approximate Map[Double, A: Monoid] that can store only a fixed number of keys. Instead of storing key-value pairs, it stores range-value pairs, where each range is a small Double interval, and the ranges don’t overlap. When there are too many keys, it combines nearby range-value pairs, combining the ranges, and combining the values using the Monoid[A] operation. Each range-value pair also keeps track of its count, i.e. the number of key-value pairs the range is composed of.

For example, suppose we create a QTree[String] from the following values: (remember that String is a Monoid under concatenation)

- 0.1, "he"
- 0.2, "ll"
- 0.21, "o"
- 0.26, "wo"
- 0.3, "r"
- 0.4, "l"
- 0.49, "d"

This QTree might be represented as

- \[0,0.125), "he", count=1
- \[0.125,0.25), "llo", count=2
- \[0.25, 0.375), "wor", count=2
- \[0.375, 0.5), "ld", count=2

or as

- \[0, 0.25), "hello", count=3
- \[0.25, 0.5), "world", count=4

or as

- \[0,0.5), "helloworld", count=7

How is this useful?

The most common usage of QTree is as QTree[Unit]. (Remember, Unit is a trivial monoid, where () + () = () and () is the identity.)

This is basically a histogram. Why? When two ranges get combined, the counts get added together, so at any point, the count associated with a range R is exactly the number of original Double values in that range.

  • The quantileBounds function gives lower and upper bounds for a given quantile. For instance, qtree.quantileBounds(0.5) gives bounds for the median double value.
  • The rangeCountBounds function gives lower and upper bounds for the count within a certain range. For instance, qtree.rangeCountBounds(0.5, 2.5) gives lower and upper bounds for the number of elements in the range (0.5, 2.5).

What happens if we use a nontrivial monoid A, instead of Unit? Then we get some more useful functions:

  • rangeSumBounds gives lower and upper bounds for the monoid sum of the elements in a certain range.

For example, if we call qtree.rangeSumBounds(0.1,0.2) on the first QTree described above, it would return ("","hello"). Why is it a range, and not an exact value? This is because we only store the ranges and we don’t know exactly where in the range each item is. For example, "he" might exist between 0 and 0.09999 and "llo" might exist between 0.20001 and 0.25, in which case the true sum over (0.1,0.2) would be "". On the other hand, "he" might exist between 0.1 and 0.125, and "llo" might exist between 0.125 and 0.2, in which case the true sum would be "hello". The answer could also be anywhere in between these two results. If we look at the original data points, we see that the correct answer is actually "he", since "he" is the only value in the range [0.1,0.2).

  • totalSum returns the total sum over the entire tree. In the example above, qtree.totalSum would return "helloworld".

REPL Tour

import com.twitter.algebird._
// import com.twitter.algebird._

val data = List(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8)
// data: List[Int] = List(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8)

val seqQTree = data.map { QTree(_) }
// seqQTree: List[com.twitter.algebird.QTree[Long]] = List((1,0,1,1,None,None), (1,0,1,1,None,None), (2,0,1,2,None,None), (2,0,1,2,None,None), (3,0,1,3,None,None), (3,0,1,3,None,None), (4,0,1,4,None,None), (4,0,1,4,None,None), (5,0,1,5,None,None), (5,0,1,5,None,None), (6,0,1,6,None,None), (6,0,1,6,None,None), (7,0,1,7,None,None), (7,0,1,7,None,None), (8,0,1,8,None,None), (8,0,1,8,None,None))

val qtSemigroup = new QTreeSemigroup[Long](6)
// qtSemigroup: com.twitter.algebird.QTreeSemigroup[Long] = com.twitter.algebird.QTreeSemigroup@7a9a2a9b

val sum = qtSemigroup.sumOption(seqQTree).get
// sum: com.twitter.algebird.QTree[Long] = (0,4,16,0,Some((0,3,14,0,Some((0,2,6,0,Some((0,1,2,0,None,Some((1,0,2,2,None,None)))),Some((1,1,4,0,Some((2,0,2,4,None,None)),Some((3,0,2,6,None,None)))))),Some((1,2,8,0,Some((2,1,4,0,Some((4,0,2,8,None,None)),Some((5,0,2,10,None,None)))),Some((3,1,4,0,Some((6,0,2,12,None,None)),Some((7,0,2,14,None,None)))))))),Some((1,3,2,0,Some((2,2,2,0,Some((4,1,2,0,Some((8,0,2,16,None,None)),None)),None)),None)))

val sum2 = seqQTree.reduce{qtSemigroup.plus(_,_)}
// sum2: com.twitter.algebird.QTree[Long] = (0,4,16,0,Some((0,3,14,0,Some((0,2,6,0,Some((0,1,2,0,None,Some((1,0,2,2,None,None)))),Some((1,1,4,0,Some((2,0,2,4,None,None)),Some((3,0,2,6,None,None)))))),Some((1,2,8,0,Some((2,1,4,0,Some((4,0,2,8,None,None)),Some((5,0,2,10,None,None)))),Some((3,1,4,0,Some((6,0,2,12,None,None)),Some((7,0,2,14,None,None)))))))),Some((1,3,2,0,Some((2,2,2,0,Some((4,1,2,0,Some((8,0,2,16,None,None)),None)),None)),None)))

sum.count
// warning: there was one inliner warning; re-run with -Yinline-warnings for details
// res0: Long = 16

sum.upperBound
// res1: Double = 16.0

sum.lowerBound
// res2: Double = 0.0

sum.quantileBounds(.5)
// res3: (Double, Double) = (5.0,6.0)

sum.quantileBounds(.5)
// res4: (Double, Double) = (5.0,6.0)

sum.quantileBounds(.25)
// res5: (Double, Double) = (3.0,4.0)

sum.quantileBounds(.75)
// res6: (Double, Double) = (7.0,8.0)

Documentation Help

We’d love your help fleshing out this documentation! You can edit this page in your browser by clicking this link. These links might be helpful: