Class

com.twitter.scalding.typed

IdentityReduce

Related Doc: package typed

case class IdentityReduce[K, V1](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with Grouped[K, V1] with Product with Serializable

Source
Grouped.scala
Linear Supertypes
Serializable, Product, Equals, Grouped[K, V1], WithDescription[Grouped[K, V1]], WithReducers[Grouped[K, V1]], Sortable[V1, [+x]SortedGrouped[K, x] with Reversable[SortedGrouped[K, x]]], HashJoinable[K, V1], CoGroupable[K, V1], HasDescription, HasReducers, KeyedListLike[K, V1, UnsortedGrouped], Serializable, ReduceStep[K, V1], KeyedPipe[K], AnyRef, Any

Instance Constructors

  1. new IdentityReduce(keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], reducers: Option[Int], descriptions: Seq[String])

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def aggregate[B, C](agg: Aggregator[V1, B, C]): UnsortedGrouped[K, C]

    Use Algebird Aggregator to do the reduction

    Definition Classes
    KeyedListLike
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def bufferedTake(n: Int): UnsortedGrouped[K, V1]

    This does a partial heap sort followed by a take in memory on the mappers before sending to the reducers. This is a big help if there are relatively few keys and n is relatively small.

    Definition Classes
    IdentityReduce → KeyedListLike
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def cogroup[R1, R2](smaller: CoGroupable[K, R1])(fn: (K, Iterator[V1], Iterable[R1]) ⇒ Iterator[R2]): CoGrouped[K, R2]

    "Smaller" refers to the average number of values per key, not total size (total size does not matter here, though the two are clearly related).

    Note that from the type signature we see that the right side is (or may be) iterated over and over, but the left side is not. That means that you want the side with fewer values per key on the right. If both sides are similar, there is no need to worry. If one side is a one-to-one mapping, that should be the "smaller" side.

    Definition Classes
    CoGroupable
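
    The asymmetry in the `fn` signature can be illustrated with plain Scala collections. This is only a sketch of the shape of such a function, not Scalding's own joiner; the object name `CogroupSketch` and the helper `innerJoin` are hypothetical. The left `Iterator` is consumed once, while the right `Iterable` is re-iterated for every left value, which is why the side with fewer values per key belongs on the right.

    ```scala
    object CogroupSketch {
      // Hypothetical helper with the same shape as the `fn` argument of cogroup:
      // (K, Iterator[V1], Iterable[R1]) => Iterator[R2]
      def innerJoin[K, V, R](key: K, left: Iterator[V], right: Iterable[R]): Iterator[(V, R)] =
        // `left` is traversed exactly once; `right.iterator` is created anew
        // for each left value, so the right side is iterated repeatedly.
        left.flatMap { v => right.iterator.map { r => (v, r) } }

      // Three values on the left, a single value on the right for this key.
      val joined: List[(Int, String)] =
        innerJoin("k", Iterator(1, 2, 3), List("x")).toList
    }
    ```

    With a one-to-one mapping on the right, as here, each left value triggers only a single-element pass over the right side.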
  9. def count(fn: (V1) ⇒ Boolean): UnsortedGrouped[K, Long]

    For each key, count the number of values that satisfy a predicate

    Definition Classes
    KeyedListLike
  10. val descriptions: Seq[String]

    Definition Classes
    IdentityReduce → HasDescription
  11. def distinctSize: UnsortedGrouped[K, Long]

    For each key, give the number of unique values. WARNING: May OOM. This assumes the values for each key can fit in memory.

    Definition Classes
    KeyedListLike
  12. def distinctValues: UnsortedGrouped[K, V1]

    For each key, remove duplicate values. WARNING: May OOM. This assumes the values for each key can fit in memory.

    Definition Classes
    KeyedListLike
  13. def drop(n: Int): UnsortedGrouped[K, V1]

    For each key, selects all elements except the first n.

    Definition Classes
    KeyedListLike
  14. def dropWhile(p: (V1) ⇒ Boolean): UnsortedGrouped[K, V1]

    For each key, drops the longest prefix of elements that satisfy the given predicate.

    Definition Classes
    KeyedListLike
  15. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. def filter(fn: ((K, V1)) ⇒ Boolean): UnsortedGrouped[K, V1]

    .filter(fn).toTypedPipe == .toTypedPipe.filter(fn). It is generally better to avoid going back to a TypedPipe as long as possible: this minimizes the number of times we go in and out of cascading/hadoop types.

    Definition Classes
    KeyedListLike
  17. def filterKeys(fn: (K) ⇒ Boolean): UnsortedIdentityReduce[K, V1]

    Filter keys on a predicate. More efficient than filter if you are only looking at keys.

    Definition Classes
    IdentityReduce → KeyedListLike
  18. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. def flatMapValues[V](fn: (V1) ⇒ TraversableOnce[V]): UnsortedGrouped[K, V]

    Similar to mapValues, but works like flatMap, returning a collection of outputs for each value input.

    Definition Classes
    KeyedListLike
  20. def flattenValues[U](implicit ev: <:<[V1, TraversableOnce[U]]): UnsortedGrouped[K, U]

    Flatten the values. Useful after sortedTake, for instance.

    Definition Classes
    KeyedListLike
  21. def fold[V](f: Fold[V1, V]): UnsortedGrouped[K, V]

    Folds are composable aggregations that make one pass over the data. If you need to do several custom folds over the same data, use Fold.join and this method

    Definition Classes
    KeyedListLike
  22. def foldLeft[B](z: B)(fn: (B, V1) ⇒ B): UnsortedGrouped[K, B]

    For each key, fold the values. See scala.collection.Iterable.foldLeft.

    Definition Classes
    KeyedListLike
  23. def foldWithKey[V](fn: (K) ⇒ Fold[V1, V]): UnsortedGrouped[K, V]

    If the fold depends on the key, use this method to construct the fold for each key

    Definition Classes
    KeyedListLike
  24. def forall(fn: (V1) ⇒ Boolean): UnsortedGrouped[K, Boolean]

    For each key, check to see if a predicate is true for all values

    Definition Classes
    KeyedListLike
  25. def forceToReducers: UnsortedGrouped[K, V1]

    This is just shorthand for mapValueStream(identity); it makes sure the planner sees that you want to force a shuffle. For expert tuning.

    Definition Classes
    KeyedListLike
  26. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  27. def groupOp[V2](gb: (GroupBuilder) ⇒ GroupBuilder): TypedPipe[(K, V2)]

    Attributes
    protected
    Definition Classes
    ReduceStep
  28. def groupOpWithValueSort[V2](valueSort: Option[Ordering[_ >: V1]])(gb: (GroupBuilder) ⇒ GroupBuilder): TypedPipe[(K, V2)]

    Attributes
    protected
    Definition Classes
    ReduceStep
  29. def hashCogroupOn[V1, R](mapside: TypedPipe[(K, V1)])(joiner: (K, V1, Iterable[V1]) ⇒ Iterator[R]): TypedPipe[(K, R)]

    This fully replicates this entire Grouped to the argument: mapside. This means that we never see the case where the key is absent in the pipe, so implementing a right-join (from the pipe) is impossible. Note that there is no reduce phase in this operation. The next issue is that, unlike a cogroup, for a fixed key, each joiner will NOT see all the tuples with that key. This is because the keys on the left are distributed across many machines. See HashJoin: http://docs.cascading.org/cascading/2.0/javadoc/cascading/pipe/HashJoin.html

    Definition Classes
    HashJoinable
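
    The behaviour described above can be modelled with plain Scala collections. This is a rough sketch of the semantics only, not the Scalding or Cascading implementation; `HashCogroupSketch`, `hashCogroup`, `small`, and `big` are hypothetical names. The replicated side is turned into an in-memory lookup table, and the joiner is called once per mapside record, so it never sees all records for a key at once and never sees keys that are absent from the mapside pipe.

    ```scala
    object HashCogroupSketch {
      // Model of a replicated (map-side) join: `replicated` plays the role of
      // the Grouped, `mapside` plays the role of the pipe it is replicated to.
      def hashCogroup[K, V, W, R](replicated: Seq[(K, W)], mapside: Seq[(K, V)])(
          joiner: (K, V, Iterable[W]) => Iterator[R]): Seq[(K, R)] = {
        // Every "mapper" holds the whole replicated side as a lookup table.
        val table: Map[K, Seq[W]] =
          replicated.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2) }
        // The joiner runs once per mapside record; there is no reduce phase.
        mapside.flatMap { case (k, v) =>
          joiner(k, v, table.getOrElse(k, Seq.empty)).map(r => (k, r))
        }
      }

      val small = Seq(("a", "A"), ("c", "C"))
      val big = Seq(("a", 1), ("b", 2), ("a", 3))

      // A left join from the mapside pipe is possible; key "c" exists only on
      // the replicated side, so it can never appear in the output.
      val leftJoined = hashCogroup(small, big) { (_, v, ws) =>
        if (ws.isEmpty) Iterator((v, Option.empty[String]))
        else ws.iterator.map(w => (v, Option(w)))
      }
    }
    ```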
  30. def head: UnsortedGrouped[K, V1]

    Use this to get the first value encountered. Prefer this to take(1).

    Definition Classes
    KeyedListLike
  31. def inputs: List[TypedPipe[(K, Any)]]

    A HashJoinable has a single input into the cogroup

    Definition Classes
    HashJoinable → CoGroupable
  32. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  33. def join[W](smaller: CoGroupable[K, W]): CoGrouped[K, (V1, W)]

    Definition Classes
    CoGroupable
  34. def joinFunction: (Any, Iterator[Tuple], Seq[Iterable[Tuple]]) ⇒ Iterator[V1]

    This is just an identity that casts the result to V1

    Definition Classes
    IdentityReduce → CoGroupable
  35. val keyOrdering: Ordering[K]

    Definition Classes
    IdentityReduce → CoGroupable → KeyedPipe
  36. def keys: TypedPipe[K]

    Convert to a TypedPipe and only keep the keys

    Definition Classes
    KeyedListLike
  37. def leftJoin[W](smaller: CoGroupable[K, W]): CoGrouped[K, (V1, Option[W])]

    Definition Classes
    CoGroupable
  38. def mapGroup[V3](fn: (K, Iterator[V1]) ⇒ Iterator[V3]): IteratorMappedReduce[K, V1, V3]

    Operate on an Iterator[T] of all the values for each key at one time. Prefer this to toList when you can avoid accumulating the whole list in memory. Prefer sum, which is partially executed map-side by default. Use mapValueStream when you don't care about the key for the group.

    The Iterator is always non-empty. Note that any key that has all its values removed will not appear in subsequent .mapGroup/mapValueStream calls.

    Definition Classes
    IdentityReduce → KeyedListLike
  39. def mapValueStream[V](smfn: (Iterator[V1]) ⇒ Iterator[V]): UnsortedGrouped[K, V]

    Use this when you don't care about the key for the group, otherwise use mapGroup

    Definition Classes
    KeyedListLike
  40. def mapValues[V2](fn: (V1) ⇒ V2): UnsortedIdentityReduce[K, V2]

    This is a special case of mapValueStream, but can be optimized because it doesn't need all the values for a given key at once. An unoptimized implementation is: mapValueStream { _.map { fn } }, but for Grouped we can avoid resorting to mapValueStream.

    Definition Classes
    IdentityReduce → KeyedListLike
  41. val mapped: TypedPipe[(K, V1)]

    Note, this satisfies KeyedPipe.mapped: TypedPipe[(K, Any)]

    Definition Classes
    IdentityReduce → ReduceStep → KeyedPipe
  42. def max[B >: V1](implicit cmp: Ordering[B]): UnsortedGrouped[K, V1]

    For each key, give the maximum value

    Definition Classes
    KeyedListLike
  43. def maxBy[B](fn: (V1) ⇒ B)(implicit cmp: Ordering[B]): UnsortedGrouped[K, V1]

    For each key, give the maximum value by some function

    Definition Classes
    KeyedListLike
  44. def min[B >: V1](implicit cmp: Ordering[B]): UnsortedGrouped[K, V1]

    For each key, give the minimum value

    Definition Classes
    KeyedListLike
  45. def minBy[B](fn: (V1) ⇒ B)(implicit cmp: Ordering[B]): UnsortedGrouped[K, V1]

    For each key, give the minimum value by some function

    Definition Classes
    KeyedListLike
  46. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  47. final def notify(): Unit

    Definition Classes
    AnyRef
  48. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  49. def outerJoin[W](smaller: CoGroupable[K, W]): CoGrouped[K, (Option[V1], Option[W])]

    Definition Classes
    CoGroupable
  50. def product[U >: V1](implicit ring: Ring[U]): UnsortedGrouped[K, U]

    For each key, return the product of all the values

    Definition Classes
    KeyedListLike
  51. def reduce[U >: V1](fn: (U, U) ⇒ U): UnsortedGrouped[K, U]

    Reduce with fn, which must be associative and commutative. Like sum, this can be optimized in some Grouped cases. If you don't have a commutative operator, use reduceLeft.

    Definition Classes
    KeyedListLike
  52. def reduceLeft[U >: V1](fn: (U, U) ⇒ U): UnsortedGrouped[K, U]

    Similar to reduce but always on the reduce-side (never optimized to map-side), and named for the Scala function. fn need not be associative and/or commutative. Makes sense when you want to reduce, but in a particular sorted order. The old value comes in on the left.

    Definition Classes
    KeyedListLike
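
    Why reduce requires an associative and commutative fn while reduceLeft does not can be seen with plain Scala collections. The following is a sketch under the assumption that a map-side optimization may combine partial results from arbitrary splits of the values; the object name `ReduceSketch` and the particular split are hypothetical.

    ```scala
    object ReduceSketch {
      val values = List(10, 3, 2)
      // A hypothetical split of the same values across two mappers.
      val splits = List(List(10), List(3, 2))

      // reduceLeft semantics: a single left-to-right pass, (10 - 3) - 2.
      val subLeft: Int = values.reduceLeft(_ - _)

      // Partial reduction per split, then combining the partials: 10 - (3 - 2).
      // Subtraction is neither associative nor commutative, so this differs.
      val subCombined: Int = splits.map(_.reduceLeft(_ - _)).reduceLeft(_ - _)

      // Addition is associative and commutative, so both strategies agree.
      val addLeft: Int = values.reduceLeft(_ + _)
      val addCombined: Int = splits.map(_.reduceLeft(_ + _)).reduceLeft(_ + _)
    }
    ```

    Here `subLeft` is 5 but `subCombined` is 9, while both additions give 15.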
  53. val reducers: Option[Int]

    Definition Classes
    IdentityReduce → HasReducers
  54. def rightJoin[W](smaller: CoGroupable[K, W]): CoGrouped[K, (Option[V1], W)]

    Definition Classes
    CoGroupable
  55. def scanLeft[B](z: B)(fn: (B, V1) ⇒ B): UnsortedGrouped[K, B]

    For each key, scanLeft the values. See scala.collection.Iterable.scanLeft.

    Definition Classes
    KeyedListLike
  56. def size: UnsortedGrouped[K, Long]

    For each key, give the number of values

    Definition Classes
    KeyedListLike
  57. def sortBy[B](fn: (V1) ⇒ B)(implicit arg0: Ordering[B]): SortedGrouped[K, V1] with Reversable[SortedGrouped[K, V1]]

    Definition Classes
    Sortable
  58. def sortWith(lt: (V1, V1) ⇒ Boolean): SortedGrouped[K, V1] with Reversable[SortedGrouped[K, V1]]

    Definition Classes
    Sortable
  59. def sortWithTake[U >: V1](k: Int)(lessThan: (U, U) ⇒ Boolean): UnsortedGrouped[K, Seq[V1]]

    Like sortedTake, but with a less-than operation for the ordering

    Definition Classes
    KeyedListLike
  60. def sorted[B >: V1](implicit ord: Ordering[B]): SortedGrouped[K, V1] with Reversable[SortedGrouped[K, V1]]

    Definition Classes
    Sortable
  61. def sortedReverseTake(k: Int)(implicit ord: Ordering[_ >: V1]): UnsortedGrouped[K, Seq[V1]]

    Take the largest k things according to the implicit ordering. Useful for top-k without having to call ord.reverse

    Definition Classes
    KeyedListLike
  62. def sortedTake(k: Int)(implicit ord: Ordering[_ >: V1]): UnsortedGrouped[K, Seq[V1]]

    This implements bottom-k (smallest k items) on each mapper for each key, then sends those to reducers to get the result. This is faster than using .take if k * (number of Keys) is small enough to fit in memory.

    Definition Classes
    KeyedListLike
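
    The two-stage bottom-k described above can be modelled with plain Scala collections. This is a sketch of the idea, not Scalding's implementation; `SortedTakeSketch`, `bottomK`, and the two mapper inputs are hypothetical. Each mapper keeps only the k smallest values per key, and the reducer merges those partial results and takes the k smallest again.

    ```scala
    object SortedTakeSketch {
      // Bottom-k per key within a single partition of the data.
      def bottomK[K, V](part: Seq[(K, V)], k: Int)(implicit ord: Ordering[V]): Map[K, Seq[V]] =
        part.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).sorted.take(k) }

      // Mapper stage: bottom-k of each partition. Reducer stage: bottom-k of
      // the merged partial results. At most k values per key leave each mapper.
      def sortedTake[K, V](parts: Seq[Seq[(K, V)]], k: Int)(implicit ord: Ordering[V]): Map[K, Seq[V]] = {
        val partials: Seq[(K, V)] = parts.flatMap { p =>
          bottomK(p, k).toSeq.flatMap { case (key, vs) => vs.map(v => (key, v)) }
        }
        bottomK(partials, k)
      }

      val mapper1 = Seq(("a", 5), ("a", 1), ("b", 7))
      val mapper2 = Seq(("a", 3), ("a", 2), ("b", 4))

      // The two smallest values per key across both mappers.
      val result: Map[String, Seq[Int]] = sortedTake(Seq(mapper1, mapper2), 2)
    }
    ```

    For these inputs the result maps "a" to 1, 2 and "b" to 4, 7.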
  63. def sum[U >: V1](implicit sg: Semigroup[U]): UnsortedGrouped[K, U]

    Add all items according to the implicit Semigroup. If there is no sorting, we default to assuming the Semigroup is commutative. If you don't want that, define an ordering on the Values, sort or .forceToReducers.

    Semigroups MAY have a faster implementation of sum for iterators, so prefer using sum/sumLeft to reduce

    Definition Classes
    IdentityReduce → KeyedListLike
  64. def sumLeft[U >: V1](implicit sg: Semigroup[U]): UnsortedGrouped[K, U]

    Semigroups MAY have a faster implementation of sum for iterators, so prefer using sum/sumLeft to reduce/reduceLeft

    Definition Classes
    KeyedListLike
  65. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  66. def take(n: Int): UnsortedGrouped[K, V1]

    For each key, selects the first n elements. Don't use this if n == 1; head is faster in that case.

    Definition Classes
    KeyedListLike
  67. def takeWhile(p: (V1) ⇒ Boolean): UnsortedGrouped[K, V1]

    For each key, takes the longest prefix of elements that satisfy the given predicate.

    Definition Classes
    KeyedListLike
  68. def toList: UnsortedGrouped[K, List[V1]]

    AVOID THIS IF POSSIBLE. For each key, accumulate all the values into a List. WARNING: May OOM. Only use this method if you are sure all the values will fit in memory. You really should ask why you need all the values; if you want to do some custom reduction, do it in mapGroup or mapValueStream.

    Definition Classes
    KeyedListLike
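
    As an illustration of a custom reduction that avoids materializing a List, the function below has the `Iterator[V1] => Iterator[V]` shape that mapValueStream accepts. It is a sketch in plain Scala, shown here applied to an ordinary Iterator rather than inside a Scalding job; `StreamingMeanSketch` and `mean` are hypothetical names.

    ```scala
    object StreamingMeanSketch {
      // Computes the mean in one pass, holding only a running sum and count
      // in memory instead of all the values.
      def mean(values: Iterator[Double]): Iterator[Double] = {
        val (sum, n) = values.foldLeft((0.0, 0L)) { case ((s, c), v) => (s + v, c + 1) }
        if (n == 0) Iterator.empty else Iterator(sum / n)
      }

      val result: List[Double] = mean(Iterator(1.0, 2.0, 3.0)).toList
    }
    ```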
  69. def toSet[U >: V1]: UnsortedGrouped[K, Set[U]]

    AVOID THIS IF POSSIBLE. The same risks apply here as to toList: you may OOM. See toList. Note that toSet needs to be parameterized even though toList does not. This is because List is covariant in its type parameter in the Scala API, but Set is invariant. See: http://stackoverflow.com/questions/676615/why-is-scalas-immutable-set-not-covariant-in-its-type

    Definition Classes
    KeyedListLike
  70. lazy val toTypedPipe: TypedPipe[(K, V1)]

    End of the operations on values. From this point on the keyed structure is lost and another shuffle is generally required to reconstruct it

    Definition Classes
    IdentityReduce → KeyedListLike
  71. def values: TypedPipe[V1]

    Convert to a TypedPipe and only keep the values

    Definition Classes
    KeyedListLike
  72. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  73. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  74. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  75. def withDescription(description: String): IdentityReduce[K, V1]

    Never mutates this; instead returns a new item.

    Definition Classes
    IdentityReduce → WithDescription
  76. def withDescription(descriptionOpt: Option[String]): Grouped[K, V1]

    Definition Classes
    WithDescription
  77. def withReducers(red: Int): IdentityReduce[K, V1]

    Never mutates this; instead returns a new item.

    Definition Classes
    IdentityReduce → WithReducers
  78. def withSortOrdering[U >: V1](so: Ordering[U]): IdentityValueSortedReduce[K, V1]

    Definition Classes
    IdentityReduce → Sortable
