Q Tree
A QTree[A: Monoid]
is like an approximate Map[Double, A: Monoid]
that can store only a fixed number of keys. Instead of storing key-value pairs, it stores range-value pairs, where each range is a small Double
interval, and the ranges don’t overlap. When there are too many keys, it combines nearby range-value pairs, combining the ranges, and combining the values using the Monoid[A]
operation. Each range-value pair also keeps track of its count, i.e. the number of key-value pairs the range is composed of.
For example, suppose we create a QTree[String]
from the following values: (remember that String
is a Monoid
under concatenation)
- 0.1, "he"
- 0.2, "ll"
- 0.21, "o"
- 0.26, "wo"
- 0.3, "r"
- 0.4, "l"
- 0.49, "d"
This QTree might be represented as
- \[0,0.125), "he", count=1
- \[0.125,0.25), "llo", count=2
- \[0.25, 0.375), "wor", count=2
- \[0.375, 0.5), "ld", count=2
or as
- \[0, 0.25), "hello", count=3
- \[0.25, 0.5), "world", count=4
or as
- \[0,0.5), "helloworld", count=7
How is this useful?
The most common usage of QTree
is as QTree[Unit]
. (Remember, Unit
is a trivial monoid, where () + () = ()
and ()
is the identity.)
This is basically a histogram. Why? When two ranges get combined, the counts get added together, so at any point, the count associated with a range R
is exactly the number of original Double
values in that range.
- The
quantileBounds
function gives lower and upper bounds for a given quantile. For instance,qtree.quantileBounds(0.5)
gives bounds for the median double value. - The
rangeCountBounds
function gives lower and upper bounds for the count within a certain range. For instance,qtree.rangeCountBounds(0.5, 2.5)
gives lower and upper bounds for the number of elements in the range (0.5, 2.5).
What happens if we use a nontrivial monoid A
, instead of Unit
? Then we get some more useful functions:
rangeSumBounds
gives lower and upper bounds for the monoid sum of the elements in a certain range.
For example, if we call qtree.rangeSumBounds(0.1,0.2)
on the first QTree
described above, it would return ("","hello")
. Why is it a range, and not an exact value? This is because we only store the ranges and we don’t know exactly where in the range each item is. For example, "he"
might exist between 0 and 0.09999 and "llo"
might exist between 0.20001 and 0.25, in which case the true sum over (0.1,0.2) would be ""
. On the other hand, "he"
might exist between 0.1 and 0.125, and "llo"
might exist between 0.125 and 0.2, in which case the true sum would be "hello"
. The answer could also be anywhere in between these two results. If we look at the original data points, we see that the correct answer is actually "he"
, since "he"
is the only value in the range [0.1,0.2).
totalSum
returns the total sum over the entire tree. In the example above,qtree.totalSum
would return"helloworld"
.
REPL Tour
import com.twitter.algebird._
// import com.twitter.algebird._
val data = List(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8)
// data: List[Int] = List(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8)
val seqQTree = data.map { QTree(_) }
// seqQTree: List[com.twitter.algebird.QTree[Long]] = List((1,0,1,1,None,None), (1,0,1,1,None,None), (2,0,1,2,None,None), (2,0,1,2,None,None), (3,0,1,3,None,None), (3,0,1,3,None,None), (4,0,1,4,None,None), (4,0,1,4,None,None), (5,0,1,5,None,None), (5,0,1,5,None,None), (6,0,1,6,None,None), (6,0,1,6,None,None), (7,0,1,7,None,None), (7,0,1,7,None,None), (8,0,1,8,None,None), (8,0,1,8,None,None))
val qtSemigroup = new QTreeSemigroup[Long](6)
// qtSemigroup: com.twitter.algebird.QTreeSemigroup[Long] = com.twitter.algebird.QTreeSemigroup@7a9a2a9b
val sum = qtSemigroup.sumOption(seqQTree).get
// sum: com.twitter.algebird.QTree[Long] = (0,4,16,0,Some((0,3,14,0,Some((0,2,6,0,Some((0,1,2,0,None,Some((1,0,2,2,None,None)))),Some((1,1,4,0,Some((2,0,2,4,None,None)),Some((3,0,2,6,None,None)))))),Some((1,2,8,0,Some((2,1,4,0,Some((4,0,2,8,None,None)),Some((5,0,2,10,None,None)))),Some((3,1,4,0,Some((6,0,2,12,None,None)),Some((7,0,2,14,None,None)))))))),Some((1,3,2,0,Some((2,2,2,0,Some((4,1,2,0,Some((8,0,2,16,None,None)),None)),None)),None)))
val sum2 = seqQTree.reduce{qtSemigroup.plus(_,_)}
// sum2: com.twitter.algebird.QTree[Long] = (0,4,16,0,Some((0,3,14,0,Some((0,2,6,0,Some((0,1,2,0,None,Some((1,0,2,2,None,None)))),Some((1,1,4,0,Some((2,0,2,4,None,None)),Some((3,0,2,6,None,None)))))),Some((1,2,8,0,Some((2,1,4,0,Some((4,0,2,8,None,None)),Some((5,0,2,10,None,None)))),Some((3,1,4,0,Some((6,0,2,12,None,None)),Some((7,0,2,14,None,None)))))))),Some((1,3,2,0,Some((2,2,2,0,Some((4,1,2,0,Some((8,0,2,16,None,None)),None)),None)),None)))
sum.count
// warning: there was one inliner warning; re-run with -Yinline-warnings for details
// res0: Long = 16
sum.upperBound
// res1: Double = 16.0
sum.lowerBound
// res2: Double = 0.0
sum.quantileBounds(.5)
// res3: (Double, Double) = (5.0,6.0)
sum.quantileBounds(.5)
// res4: (Double, Double) = (5.0,6.0)
sum.quantileBounds(.25)
// res5: (Double, Double) = (3.0,4.0)
sum.quantileBounds(.75)
// res6: (Double, Double) = (7.0,8.0)
Documentation Help
We’d love your help fleshing out this documentation! You can edit this page in your browser by clicking this link. These links might be helpful: