Class

com.twitter.algebird

TopPctCMSMonoid

class TopPctCMSMonoid[K] extends TopCMSMonoid[K]

Monoid for Top-% based TopCMS sketches.

Usage

The type K is the type of items you want to count. You must provide an implicit CMSHasher[K] for K, and Algebird ships with several such implicits for commonly used types such as Long and scala.BigInt.
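As a rough illustration of the paragraph above, the following sketch builds a monoid for K = Long and counts a handful of items. It assumes the TopPctCMS.monoid factory and the CMSHasherImplicits import, neither of which is described on this page, and the parameter values are arbitrary choices for the example rather than recommendations.

```scala
import com.twitter.algebird._
// Assumption: brings the implicit CMSHasher[Long] into scope on versions
// where it is not found automatically.
import com.twitter.algebird.CMSHasherImplicits._

object TopPctCmsBasics {
  // Illustrative parameters only.
  val eps = 0.001            // error bound on the estimates
  val delta = 1e-8           // probability that the error bound is exceeded
  val seed = 1               // seed for the hash functions
  val heavyHittersPct = 0.3  // heavy-hitter threshold as a fraction of totalCount

  val monoid: TopPctCMSMonoid[Long] =
    TopPctCMS.monoid[Long](eps, delta, seed, heavyHittersPct)

  // Build one sketch from several items.
  val sketch: TopCMS[Long] = monoid.create(Seq(1L, 1L, 3L, 4L, 5L))

  def main(args: Array[String]): Unit = {
    println(sketch.totalCount)             // number of items counted so far
    println(sketch.frequency(1L).estimate) // estimated count of item 1L
    println(sketch.heavyHitters)           // items at or above the threshold
  }
}
```

Frequency estimates from a Count-Min sketch can overcount but never undercount, so the estimate for 1L here is at least 2.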

If your type K is not supported out of the box, you have two options: 1) You provide a "translation" function to convert items of your (unsupported) type K to a supported type such as Double, and then use the contramap function of CMSHasher to create the required CMSHasher[K] for your type (see the documentation of CMSHasher for an example); 2) You implement a CMSHasher[K] from scratch, using the existing CMSHasher implementations as a starting point.
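Option 1 might look like the sketch below. UserId is a hypothetical domain type invented for this example, and the TopPctCMS.monoid factory and the CMSHasherImplicits import are assumptions not covered on this page; only contramap is taken from the text above.

```scala
import com.twitter.algebird._
import com.twitter.algebird.CMSHasherImplicits._ // assumed source of CMSHasher[Long]

object ContramapHasher {
  // Hypothetical type with no CMSHasher available out of the box.
  final case class UserId(value: Long)

  // Translate UserId to a supported type (Long) and reuse the existing hasher.
  implicit val userIdHasher: CMSHasher[UserId] =
    implicitly[CMSHasher[Long]].contramap((u: UserId) => u.value)

  // With the implicit in scope, UserId can be used as the key type K.
  val monoid: TopPctCMSMonoid[UserId] =
    TopPctCMS.monoid[UserId](0.001, 1e-8, 1, 0.5)

  val sketch: TopCMS[UserId] =
    monoid.create(Seq(UserId(7L), UserId(7L), UserId(9L)))
}
```

The translation function should map distinct items to distinct values, otherwise their counts are merged.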

Note: Because Arrays in Scala/Java do not have sane equals and hashCode implementations, you cannot safely use types such as Array[Byte]. Extra work is required for Arrays. For example, you may opt to convert Array[T] to a Seq[T] via toSeq, or you can provide appropriate wrapper classes. Algebird provides one such wrapper class, Bytes, to safely wrap an Array[Byte] for use with CMS.
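The difference can be seen in a small sketch like the one below, which compares two arrays with identical contents and then counts them through the Bytes wrapper. It assumes that Bytes is constructed directly from an Array[Byte], that an implicit CMSHasher[Bytes] is available via CMSHasherImplicits, and that the TopPctCMS.monoid factory exists; none of these details are stated on this page.

```scala
import com.twitter.algebird._
import com.twitter.algebird.CMSHasherImplicits._ // assumed source of CMSHasher[Bytes]

object BytesKeys {
  val a: Array[Byte] = "foo".getBytes("UTF-8")
  val b: Array[Byte] = "foo".getBytes("UTF-8")

  // Arrays are compared by reference, so equal contents are not enough.
  val arraysEqual: Boolean = a == b

  // Converting to Seq, or wrapping in Bytes, compares by content instead.
  val seqsEqual: Boolean = a.toSeq == b.toSeq
  val wrappedEqual: Boolean = Bytes(a) == Bytes(b)

  val monoid: TopPctCMSMonoid[Bytes] =
    TopPctCMS.monoid[Bytes](0.001, 1e-8, 1, 0.5)

  // Both wrapped arrays are counted as the same item.
  val sketch: TopCMS[Bytes] = monoid.create(Seq(Bytes(a), Bytes(b)))
}
```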

K

The type used to identify the elements to be counted. For example, if you want to count the occurrences of user names, you could map each username to a unique numeric ID expressed as a Long, and then count the occurrences of those Longs with a CMS of type K=Long. Note that this mapping between the elements of your problem domain and their identifiers used for counting via CMS should be bijective. We require a CMSHasher context bound for K; see CMSHasher for available implicits that can be imported. Which type K should you pick in practice? For domains that have fewer than 2^64 unique elements, you'd typically use Long. For larger domains you can try scala.BigInt, for example.

Source
CountMinSketch.scala
Linear Supertypes
TopCMSMonoid[K], Monoid[TopCMS[K]], Semigroup[TopCMS[K]], Serializable, AnyRef, Any

Instance Constructors

  1. new TopPctCMSMonoid(cms: CMS[K], heavyHittersPct: Double = 0.01)


    cms

    A CMS instance, which is used for the counting and the frequency estimation performed by this class.

    heavyHittersPct

    A threshold for finding heavy hitters, i.e., elements that appear at least (heavyHittersPct * totalCount) times in the stream.
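To make the threshold concrete, here is a minimal sketch that calls the constructor directly. It assumes that a CMS[Long] can be obtained with CMS[Long](eps, delta, seed) and that the Long hasher comes from CMSHasherImplicits; both are assumptions, and the numbers are chosen only so that the arithmetic is easy to follow.

```scala
import com.twitter.algebird._
import com.twitter.algebird.CMSHasherImplicits._ // assumed source of CMSHasher[Long]

object HeavyHitterThreshold {
  // Assumption: CMS.apply(eps, delta, seed) yields an empty CMS[Long].
  val cms: CMS[Long] = CMS[Long](0.001, 1e-8, 1)

  // Items must make up at least 25% of the stream to count as heavy hitters.
  val monoid = new TopPctCMSMonoid[Long](cms, heavyHittersPct = 0.25)

  // 8 items in total, so the threshold is 0.25 * 8 = 2 occurrences.
  val sketch: TopCMS[Long] =
    monoid.create(Seq(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L))

  // 1L (4 occurrences) and 2L (2 occurrences) reach the threshold;
  // 3L and 4L (1 occurrence each) should not, barring hash collisions.
  val hitters: Set[Long] = sketch.heavyHitters
}
```

Because the threshold is a fraction of totalCount, it rises as more items are counted, so an item that qualifies early in a stream may drop out later.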

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def assertNotZero(v: TopCMS[K]): Unit

    Definition Classes
    Monoid
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def create(data: Seq[K]): TopCMS[K]

    Creates a sketch out of multiple items.

    Definition Classes
    TopCMSMonoid
  8. def create(item: K): TopCMS[K]

    Creates a sketch out of a single item.

    Definition Classes
    TopCMSMonoid
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. def isNonZero(v: TopCMS[K]): Boolean

    Definition Classes
    Monoid
  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. def nonZeroOption(v: TopCMS[K]): Option[TopCMS[K]]

    Definition Classes
    Monoid
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. val params: TopCMSParams[K]

    Definition Classes
    TopCMSMonoid
  21. def plus(left: TopCMS[K], right: TopCMS[K]): TopCMS[K]

    Combines the two sketches.

    The sketches must use the same hash functions.

    returns

    The result of combining left and right

    Definition Classes
    TopCMSMonoid → Semigroup
  22. def sum(sketches: TraversableOnce[TopCMS[K]]): TopCMS[K]

    Definition Classes
    TopCMSMonoid → Monoid
  23. def sumOption(sketches: TraversableOnce[TopCMS[K]]): Option[TopCMS[K]]

    Returns an instance of TopCMS[K] calculated by summing all instances in sketches in one pass. Returns None if sketches is empty, else Some[TopCMS[K]].

    returns

    None if sketches is empty, else an option value containing the summed TopCMS[K]

    Definition Classes
    TopCMSMonoid → Semigroup
    Note

    Override if there is a faster way to compute this sum than sketches.reduceLeftOption using plus.

  24. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  25. def toString(): String

    Definition Classes
    AnyRef → Any
  26. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. val zero: TopCMS[K]

    Returns the identity element of TopCMS[K] for plus.

    Definition Classes
    TopCMSMonoid → Monoid
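
The value members create, plus, sum, sumOption and zero can be combined as in the following sketch. All sketches are created by one monoid so that they share the same hash functions, as plus requires. The TopPctCMS.monoid factory and the CMSHasherImplicits import are assumptions not documented on this page.

```scala
import com.twitter.algebird._
import com.twitter.algebird.CMSHasherImplicits._ // assumed source of CMSHasher[Long]

object CombiningSketches {
  // One monoid creates every sketch, so all of them use the same hash functions.
  val monoid: TopPctCMSMonoid[Long] =
    TopPctCMS.monoid[Long](0.001, 1e-8, 1, 0.5)

  val left: TopCMS[Long] = monoid.create(Seq(1L, 1L, 2L)) // from multiple items
  val right: TopCMS[Long] = monoid.create(1L)             // from a single item

  // Pairwise combination.
  val merged: TopCMS[Long] = monoid.plus(left, right)

  // Summing a collection; zero is the identity and does not change the result.
  val summed: TopCMS[Long] = monoid.sum(Seq(left, right, monoid.zero))

  // sumOption distinguishes the empty case instead of returning zero.
  val empty: Option[TopCMS[Long]] = monoid.sumOption(Seq.empty[TopCMS[Long]])
  val nonEmpty: Option[TopCMS[Long]] = monoid.sumOption(Seq(left, right))
}
```

Because the sketches form a monoid, partial sketches built on separate partitions of a stream can be merged in any grouping and yield the same totals.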
