com.twitter.summingbird

KeyedProducer

sealed trait KeyedProducer[P <: Platform[P], K, V] extends Producer[P, (K, V)]

This trait provides the methods on key-value streams. The rule of thumb: if you can easily express your logic on the keys and values independently (for example with mapValues or filterKeys rather than map or filter), do it! This is how you communicate structure to Summingbird, which uses these hints to attempt the most efficient execution of your code.
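
To make the rule of thumb concrete, here is a sketch that uses a toy, in-memory model of a key-value stream. KeyedList, KeyedExample and all of their methods are hypothetical names that only mimic the documented semantics of the KeyedProducer methods on a plain List. They are not Summingbird's API, and they do no planning or optimization. The point is that map and mapValues can produce the same data, but only mapValues states that the keys are untouched.

```scala
// Toy, in-memory stand-in for a key-value stream. NOT the Summingbird API:
// KeyedList is hypothetical and only mirrors the documented semantics.
final case class KeyedList[K, V](items: List[(K, V)]) {
  // Generic methods see the whole (K, V) pair, so a planner cannot tell
  // whether the key was changed.
  def map[K2, V2](fn: ((K, V)) => (K2, V2)): KeyedList[K2, V2] =
    KeyedList(items.map(fn))
  def filter(fn: ((K, V)) => Boolean): KeyedList[K, V] =
    KeyedList(items.filter(fn))

  // Value-only methods: the key of every item is left as it was.
  def mapValues[U](fn: V => U): KeyedList[K, U] =
    KeyedList(items.map { case (k, v) => (k, fn(v)) })
  def flatMapValues[U](fn: V => Iterable[U]): KeyedList[K, U] =
    KeyedList(items.flatMap { case (k, v) => fn(v).map(u => (k, u)) })
  def filterValues(pred: V => Boolean): KeyedList[K, V] =
    KeyedList(items.filter { case (_, v) => pred(v) })

  // Key-only methods: mapKeys and collectKeys may rekey the stream;
  // filterKeys only drops items and never rekeys.
  def mapKeys[K2](fn: K => K2): KeyedList[K2, V] =
    KeyedList(items.map { case (k, v) => (fn(k), v) })
  def filterKeys(pred: K => Boolean): KeyedList[K, V] =
    KeyedList(items.filter { case (k, _) => pred(k) })
  def collectKeys[K2](pf: PartialFunction[K, K2]): KeyedList[K2, V] =
    KeyedList(items.collect { case (k, v) if pf.isDefinedAt(k) => (pf(k), v) })

  def swap: KeyedList[V, K] = KeyedList(items.map(_.swap))
  def keys: List[K] = items.map(_._1)
  def values: List[V] = items.map(_._2)
}

object KeyedExample {
  val clicks = KeyedList(List("home" -> 3, "about" -> 1, "home" -> 2))
  // Same data either way, but only mapValues says "keys are unchanged".
  val viaMap = clicks.map { case (k, v) => (k, v * 10) }
  val viaMapValues = clicks.mapValues(_ * 10)
}
```

In a real job, the second form is the one that lets Summingbird avoid treating the result as a possibly rekeyed stream.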

Source
Producer.scala
Linear Supertypes
Producer[P, (K, V)], AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. def ++[U >: (K, V)](r: Producer[P, U]): Producer[P, U]

    Exactly the same as merge. It is provided by analogy with the Scala collections API.

    Definition Classes
    Producer
  5. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  6. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def collect[U](fn: PartialFunction[(K, V), U]): Producer[P, U]

    Prefer this to flatMap for transforming a subset of items. It is like optionMap, but convenient with Scala's case syntax, e.g. prod.collect { case x if fn(x) => g(x) }

    Definition Classes
    Producer
  10. def collectKeys[K2](pf: PartialFunction[K, K2]): KeyedProducer[P, K2, V]

    Builds a new KeyedProducer by applying a partial function to the keys of the elements of this one on which the function is defined.

  11. def collectValues[V2](pf: PartialFunction[V, V2]): KeyedProducer[P, K, V2]

    Builds a new KeyedProducer by applying a partial function to the values of the elements of this one on which the function is defined.

  12. def either[U](other: Producer[P, U]): Producer[P, Either[(K, V), U]]

    Merge a Producer of a different type into a single stream. Items from this Producer arrive as Left and items from other arrive as Right.

    Definition Classes
    Producer
  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  15. def filter(fn: ((K, V)) ⇒ Boolean): Producer[P, (K, V)]

    Keep only the items that satisfy the fn

    Definition Classes
    Producer
  16. def filterKeys(pred: (K) ⇒ Boolean): KeyedProducer[P, K, V]

    Prefer this to filter or flatMap/flatMapKeys if you are filtering on keys. This may be optimized in the future with an intrinsic node in the Producer graph. It never increases the number of items, and it does not rekey the partition.

  17. def filterValues(pred: (V) ⇒ Boolean): KeyedProducer[P, K, V]

    Prefer this to filter or flatMap/flatMapValues if you are filtering on values. This may be optimized in the future with an intrinsic node in the Producer graph. It never increases the number of items, and it does not rekey the partition.

  18. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. def flatMap[U](fn: ((K, V)) ⇒ TraversableOnce[U]): Producer[P, U]

    Only use this method if you may sometimes return more than one item. Otherwise use collect or optionMap, which can be pushed up the graph.

    Definition Classes
    Producer
  20. def flatMapKeys[K2](fn: (K) ⇒ TraversableOnce[K2]): KeyedProducer[P, K2, V]

    Prefer this to flatMap if you are expanding only keys. It may trigger optimizations that can significantly improve performance.

  21. def flatMapValues[U](fn: (V) ⇒ TraversableOnce[U]): KeyedProducer[P, K, U]

    Prefer this to a raw flatMap, as it may be optimized to avoid a key reshuffle.

  22. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  23. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  24. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  25. def keys: Producer[P, K]

    Return just the keys

  26. def leftJoin[RightV](stream: KeyedProducer[P, K, RightV], buffer: P.Service[K, RightV] with P.Sink[(K, RightV)]): KeyedProducer[P, K, (V, Option[RightV])]

    Do a windowed join on a stream. You need to provide a sink that manages the buffer. Offline, this might be a bounded HDFS partition; online, it might be a cache that evicts entries after a period of time.

  27. def leftJoin[RightV](service: P.Service[K, RightV]): KeyedProducer[P, K, (V, Option[RightV])]

    Do a lookup/join on a service. This is how you trigger asynchronous computation in Summingbird: any remote API call, database lookup, etc. happens here.

  28. def lookup[U >: (K, V), V](service: P.Service[U, V]): KeyedProducer[P, U, Option[V]]

    This is identical to a particular leftJoin: map((_, ())).leftJoin(srv).mapValues { case (_, v) => v }. Useful when you are looking up values for, say, a stream of inputs such as IDs.

    Definition Classes
    Producer
  29. def map[U](fn: ((K, V)) ⇒ U): Producer[P, U]

    Map each item to a new value

    Definition Classes
    Producer
  30. def mapKeys[K2](fn: (K) ⇒ K2): KeyedProducer[P, K2, V]

    Prefer this to flatMap/map if you are mapping only keys. It may trigger optimizations that can significantly improve performance.

  31. def mapValues[U](fn: (V) ⇒ U): KeyedProducer[P, K, U]

    Prefer this to a raw map, as it may be optimized to avoid a key reshuffle.

  32. def merge[U >: (K, V)](r: Producer[P, U]): Producer[P, U]

    Combine the output of this Producer and r into one Producer.

    Definition Classes
    Producer
  33. def name(id: String): Producer[P, (K, V)]

    Naming a node lets you give Options for that node that may change the run-time performance of the job (parameter tuning, etc.).

    Definition Classes
    Producer
  34. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  35. final def notify(): Unit

    Definition Classes
    AnyRef
  36. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  37. def optionMap[U](fn: ((K, V)) ⇒ Option[U]): Producer[P, U]

    Prefer this or collect to flatMap if you always emit 0 or 1 items.

    Definition Classes
    Producer
  38. def sumByKey(store: P.Store[K, V])(implicit semigroup: Semigroup[V]): Summer[P, K, V]

    Emits a KeyedProducer whose value is a pair: the left is the store's value for the key just BEFORE a merge, and the right is the new delta (which, depending on the Platform, Store and Options, may include more than a single aggregated item).

    So, for a given key, the emitted sequence has the property: (v0, vdelta1), (v0 + vdelta1, vdelta2), (v0 + vdelta1 + vdelta2, vdelta3), ...

  39. def swap: KeyedProducer[P, V, K]

    Exchange values for keys

  40. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  41. def toString(): String

    Definition Classes
    AnyRef → Any
  42. def values: Producer[P, V]

    Keep only the values

  43. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  45. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  46. def write[U >: (K, V)](sink: P.Sink[U]): TailProducer[P, (K, V)]

    Cause some side effect on the sink, but pass the values through so they can be consumed downstream.

    Definition Classes
    Producer
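
The guidance in members 9 (collect), 19 (flatMap) and 37 (optionMap) amounts to "pick the narrowest method that fits". The sketch below models the three on a plain List to show that collect and optionMap express the same 0-or-1 transformation, while flatMap is only needed when one input can produce several outputs. NarrowOps and NarrowExample are hypothetical names, not part of Summingbird.

```scala
object NarrowOps {
  // Hypothetical List-based stand-ins for three Producer methods,
  // ordered from most to least specific.

  // At most one output per input.
  def optionMap[T, U](xs: List[T])(fn: T => Option[U]): List[U] =
    xs.flatMap(t => fn(t).toList)

  // Same as optionMap, written with case syntax: inputs on which
  // pf is not defined are dropped.
  def collect[T, U](xs: List[T])(pf: PartialFunction[T, U]): List[U] =
    optionMap(xs)(pf.lift)

  // Any number of outputs per input.
  def flatMap[T, U](xs: List[T])(fn: T => Iterable[U]): List[U] =
    xs.flatMap(fn)
}

object NarrowExample {
  val events = List(("a", 1), ("b", -2), ("c", 3))

  // Keep positive counts and double them: 0 or 1 output per input.
  val viaCollect = NarrowOps.collect(events) { case (k, v) if v > 0 => (k, v * 2) }
  val viaOptionMap = NarrowOps.optionMap(events) { case (k, v) =>
    if (v > 0) Some((k, v * 2)) else None
  }

  // Splitting lines into words can emit several items, so flatMap fits.
  val words = NarrowOps.flatMap(List("a b", "c"))(_.split(" ").toList)
}
```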
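
Members 27 (leftJoin on a service) and 28 (lookup) can be illustrated with a small model in which a service is just a function from a key to an optional value, so a miss becomes None. This is an assumption made for the sketch: JoinModel, JoinExample and the Service alias below are hypothetical and are not the platform's Service type. lookup is built literally from the recipe given in member 28.

```scala
object JoinModel {
  // Hypothetical: a "service" is modelled as a function from a key to an
  // optional value (e.g. a database or remote lookup that may miss).
  type Service[K, V] = K => Option[V]

  // leftJoin against a service: every item is kept, and the looked-up
  // value (or None on a miss) is paired with the original value.
  def leftJoin[K, V, R](items: List[(K, V)])(service: Service[K, R]): List[(K, (V, Option[R]))] =
    items.map { case (k, v) => (k, (v, service(k))) }

  // lookup, built as described in member 28:
  // map((_, ())).leftJoin(srv).mapValues { case (_, v) => v }
  def lookup[T, R](items: List[T])(service: Service[T, R]): List[(T, Option[R])] =
    leftJoin(items.map(t => (t, ())))(service).map { case (k, (_, r)) => (k, r) }
}

object JoinExample {
  val namesById = Map(1 -> "ann", 2 -> "bob")
  val names: JoinModel.Service[Int, String] = id => namesById.get(id)

  val joined = JoinModel.leftJoin(List(1 -> "click", 3 -> "view"))(names)
  val lookedUp = JoinModel.lookup(List(2, 5))(names)
}
```

Because it is a left join, items whose key the service does not know are kept with None rather than dropped.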
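
The output contract of member 38 (sumByKey) can be checked against a small single-process model. This is a sketch under several assumptions: SumByKeyModel and SumByKeyExample are hypothetical, the store is an immutable Map, a plain (V, V) => V function stands in for the implicit Semigroup[V], a key that is absent from the store is reported as None (a modelling choice), and every delta is a single item, whereas a real Platform may aggregate several items into one delta.

```scala
object SumByKeyModel {
  // For each (k, delta), emits the store value for k just BEFORE the merge
  // paired with the delta, then merges the delta into the store.
  // `plus` stands in for the implicit Semigroup[V] of the real method.
  def sumByKey[K, V](store: Map[K, V], deltas: List[(K, V)])(
      plus: (V, V) => V): (List[(K, (Option[V], V))], Map[K, V]) = {
    val (emittedRev, finalStore) =
      deltas.foldLeft((List.empty[(K, (Option[V], V))], store)) {
        case ((out, st), (k, delta)) =>
          val before = st.get(k)
          val after = before.map(prev => plus(prev, delta)).getOrElse(delta)
          ((k, (before, delta)) :: out, st.updated(k, after))
      }
    (emittedRev.reverse, finalStore)
  }
}

object SumByKeyExample {
  // Key "a" starts at v0 = 10 and receives deltas 1 and then 2.
  val (emitted, finalStore) =
    SumByKeyModel.sumByKey(Map("a" -> 10), List("a" -> 1, "b" -> 5, "a" -> 2))(_ + _)
}
```

For key "a" the model emits (10, 1) and then (11, 2), which is the (v0, vdelta1), (v0 + vdelta1, vdelta2) pattern described above.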
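
Members 4 (++), 12 (either) and 32 (merge) differ mainly in their result types. The List-based sketch below mirrors those types: merge accepts a supertype U >: T, as ++ and merge do, and either tags items of this stream as Left and items of the other as Right. MergeModel and MergeExample are hypothetical names, and concatenation is only one valid interleaving; the sketch assumes nothing about how a real Platform orders merged items.

```scala
object MergeModel {
  // Hypothetical List-based stand-ins. A real merged stream promises no
  // ordering between its inputs; concatenation is one valid interleaving.
  def merge[T, U >: T](left: List[T], right: List[U]): List[U] = left ++ right

  // Items of this producer become Left, items of the other become Right,
  // matching the result type Producer[P, Either[(K, V), U]].
  def either[T, U](left: List[T], right: List[U]): List[Either[T, U]] =
    left.map(t => Left(t): Either[T, U]) ++ right.map(u => Right(u): Either[T, U])
}

object MergeExample {
  val counts = List(("a", 1), ("b", 2))
  val more = List(("c", 3))

  val merged = MergeModel.merge(counts, more)
  val tagged = MergeModel.either(counts, List("error: timeout"))
  // Downstream code can split the two kinds of item apart again.
  val errors = tagged.collect { case Right(msg) => msg }
}
```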
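
Member 46 (write) both performs a side effect and passes its input through. A minimal sketch of that pass-through behaviour, assuming a sink can be modelled as any side-effecting function: WriteModel and WriteExample are hypothetical, and a ListBuffer plays the role of the sink.

```scala
import scala.collection.mutable.ListBuffer

object WriteModel {
  // Hypothetical stand-in: a "sink" is any side-effecting function.
  // write applies it to every item, then returns the items unchanged
  // so that later steps can keep consuming them.
  def write[T](items: List[T])(sink: T => Unit): List[T] = {
    items.foreach(sink)
    items
  }
}

object WriteExample {
  val written = ListBuffer.empty[(String, Int)]
  val passedThrough = WriteModel.write(List("a" -> 1, "b" -> 2))(item => written += item)
  // The same items remain available downstream of the write.
  val doubled = passedThrough.map { case (k, v) => (k, v * 2) }
}
```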
