Class/Object

com.twitter.algebird

CMS

Related Docs: object CMS | package algebird

Permalink

sealed abstract class CMS[K] extends Serializable with CMSCounting[K, CMS]

A Count-Min sketch data structure that allows for counting and frequency estimation of elements in a data stream.

Tip: If you also need to track heavy hitters ("Top N" problems), take a look at TopCMS.

Usage

This example demonstrates how to count Long elements with CMS, i.e. K=Long.

Note that the actual counting is always performed with a Long, regardless of your choice of K. That is, the counting table behind the scenes is backed by Long values (at least in the current implementation), and thus the returned frequency estimates are always instances of Approximate[Long].

K

The type used to identify the elements to be counted.

Source
CountMinSketch.scala
Example:
  1. // Creates a monoid for a CMS that can count `Long` elements.
    val cmsMonoid: CMSMonoid[Long] = {
      val eps = 0.001
      val delta = 1E-10
      val seed = 1
      CMS.monoid[Long](eps, delta, seed)
    }
    // Creates a CMS instance that has counted the element `1L`.
    val cms: CMS[Long] = cmsMonoid.create(1L)
    // Estimates the frequency of `1L`
    val estimate: Approximate[Long] = cms.frequency(1L)
Linear Supertypes
CMSCounting[K, CMS], Serializable, AnyRef, Any
Known Subclasses
Type Hierarchy
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. CMS
  2. CMSCounting
  3. Serializable
  4. AnyRef
  5. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Abstract Value Members

  1. abstract def +(item: K, count: Long): CMS[K]

    Permalink

    Counts the item count times and returns the result as a new sketch.

    Counts the item count times and returns the result as a new sketch.

    Definition Classes
    CMSCounting
  2. abstract def ++(other: CMS[K]): CMS[K]

    Permalink

    Returns a new sketch that is the combination of this sketch and the other sketch.

    Returns a new sketch that is the combination of this sketch and the other sketch.

    Definition Classes
    CMSCounting
  3. abstract def frequency(item: K): Approximate[Long]

    Permalink

    Returns an estimate of the total number of times this item has been seen in the stream so far.

    Returns an estimate of the total number of times this item has been seen in the stream so far. This estimate is an upper bound.

    It is always true that estimatedFrequency >= trueFrequency. With probability p >= 1 - delta, it also holds that estimatedFrequency <= trueFrequency + eps * totalCount.

    Definition Classes
    CMSCounting
  4. abstract def innerProduct(other: CMS[K]): Approximate[Long]

    Permalink

    Returns an estimate of the inner product against another data stream.

    Returns an estimate of the inner product against another data stream.

    In other words, let a_i denote the number of times element i has been seen in the data stream summarized by this CMS, and let b_i denote the same for the other CMS. Then this returns an estimate of <a, b> = \sum a_i b_i.

    Note: This can also be viewed as the join size between two relations.

    It is always true that actualInnerProduct <= estimatedInnerProduct. With probability p >= 1 - delta, it also holds that estimatedInnerProduct <= actualInnerProduct + eps * thisTotalCount * otherTotalCount.

    Definition Classes
    CMSCounting
  5. abstract def totalCount: Long

    Permalink

    Total number of elements counted (i.e.

    Total number of elements counted (i.e. seen in the data stream) so far.

    Definition Classes
    CMSCounting

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  3. def +(item: K): CMS[K]

    Permalink

    Counts the item and returns the result as a new sketch.

    Counts the item and returns the result as a new sketch.

    Definition Classes
    CMSCounting
  4. final def ==(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0

    Permalink
    Definition Classes
    Any
  6. def clone(): AnyRef

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. val delta: Double

    Permalink

    Returns the bound on the probability that a query estimate does NOT lie within some small interval (an interval that depends on eps) around the truth.

    Returns the bound on the probability that a query estimate does NOT lie within some small interval (an interval that depends on eps) around the truth.

    Definition Classes
    CMSCMSCounting
  8. def depth: Int

    Permalink

    Number of hash functions (also: number of rows in the counting table).

    Number of hash functions (also: number of rows in the counting table). This number is derived from delta.

    Definition Classes
    CMSCounting
  9. val eps: Double

    Permalink

    Returns the one-sided error bound on the error of each point query, i.e.

    Returns the one-sided error bound on the error of each point query, i.e. frequency estimate.

    Definition Classes
    CMSCMSCounting
  10. final def eq(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  12. def f1: Long

    Permalink

    The first frequency moment is the total number of elements in the stream.

    The first frequency moment is the total number of elements in the stream.

    Definition Classes
    CMSCounting
  13. def f2: Approximate[Long]

    Permalink

    The second frequency moment is \sum a_i^2, where a_i is the count of the i-th element.

    The second frequency moment is \sum a_i^2, where a_i is the count of the i-th element.

    Definition Classes
    CMSCMSCounting
  14. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  16. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  17. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  18. def maxExactCount: Int

    Permalink

    Number of exact counts a sparse CMS wants to keep.

    Number of exact counts a sparse CMS wants to keep. This number is derived from maxExactCountOpt.

    Definition Classes
    CMSCounting
  19. val maxExactCountOpt: Option[Int]

    Permalink

    An Option parameter about how many exact counts a sparse CMS wants to keep

    An Option parameter about how many exact counts a sparse CMS wants to keep

    Definition Classes
    CMSCMSCounting
  20. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  21. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  22. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  23. val params: CMSParams[K]

    Permalink
  24. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  25. def toString(): String

    Permalink
    Definition Classes
    AnyRef → Any
  26. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. def width: Int

    Permalink

    Number of counters per hash function (also: number of columns in the counting table).

    Number of counters per hash function (also: number of columns in the counting table). This number is derived from eps.

    Definition Classes
    CMSCounting

Inherited from CMSCounting[K, CMS]

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped