CoGrouped

Abstract Value Members

abstract def descriptions: Seq[String]

Definition Classes
HasDescription
abstract def inputs: List[TypedPipe[(K, Any)]]

This is the list of mapped pipes, just before the (reducing) joinFunction is applied

Definition Classes
CoGroupable
abstract def joinFunction: (K, Iterator[Tuple], Seq[Iterable[Tuple]]) ⇒ Iterator[R]

This function is not type-safe for others to call, but it should never produce an error: by construction, it is never called with incorrect types. Stronger type safety would be preferable here, but it is unclear how to achieve it, and since this is an internal function, it is not clear that making it type-safe would actually help anyone.

Attributes
protected
Definition Classes
CoGroupable
abstract def keyOrdering: Ordering[K]

Definition Classes
CoGroupable
abstract def reducers: Option[Int]

Definition Classes
HasReducers

Concrete Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def aggregate[B, C](agg: Aggregator[R, B, C]): CoGrouped[K, C]

Use an Algebird Aggregator to do the reduction.

Definition Classes
KeyedListLike
final def asInstanceOf[T0]: T0

Definition Classes
Any
def bufferedTake(n: Int): CoGrouped[K, R]

It seems complex to push a take up to the mappers before a general join. For some cases (such as an inner join), we could take at most n from each TypedPipe, but it is not clear how to generalize that to arbitrary cogrouping functions. For now, this just does a normal take.

Definition Classes
CoGrouped → KeyedListLike
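To illustrate why the inner-join case is the easy one, the following sketch models the join for a single key as a nested loop over plain Scala collections. It does not use Scalding; `TakeBeforeJoin` and its methods are hypothetical names chosen for this example. It compares truncating each side to n values before joining with truncating after joining.

```scala
object TakeBeforeJoin {
  // Per-key inner join modeled as a nested loop: left outer, right inner.
  def innerJoin[A, B](left: List[A], right: List[B]): List[(A, B)] =
    for (l <- left; r <- right) yield (l, r)

  // Reference behavior: join everything, then keep the first n results.
  def takeAfter[A, B](n: Int, left: List[A], right: List[B]): List[(A, B)] =
    innerJoin(left, right).take(n)

  // Keep at most n values from each side, join, then keep the first n results.
  def takeBefore[A, B](n: Int, left: List[A], right: List[B]): List[(A, B)] =
    innerJoin(left.take(n), right.take(n)).take(n)
}
```

For this nested-loop join the two give the same result, but a cogrouping function that needs to see every value (for example, one that totals the right side) would not tolerate the early truncation.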
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def cogroup[R1, R2](smaller: CoGroupable[K, R1])(fn: (K, Iterator[R], Iterable[R1]) ⇒ Iterator[R2]): CoGrouped[K, R2]

"Smaller" refers to the average number of values per key, not total size (total size does not matter here, although the two are clearly related).
Note from the type signature that the right side is (or may be) iterated over and over, while the left side is not. You therefore want the side with fewer values per key on the right. If both sides are similar, there is no need to worry. If one side is a one-to-one mapping, that should be the "smaller" side.

Definition Classes
CoGroupable
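As a minimal sketch of what a function of the shape `(K, Iterator[R], Iterable[R1]) ⇒ Iterator[R2]` can look like, the two functions below implement inner-join and left-join logic using only the Scala standard library. The object and method names are hypothetical, and this is not Scalding's own implementation of `join` or `leftJoin`.

```scala
object CogroupFunctions {
  // Inner join: the left Iterator is consumed once, while the right Iterable
  // is traversed again for every left value.
  def innerJoin[K, L, R](key: K, left: Iterator[L], right: Iterable[R]): Iterator[(L, R)] =
    left.flatMap { l => right.iterator.map { r => (l, r) } }

  // Left join: pair each left value with None when the key has no right values.
  def leftJoin[K, L, R](key: K, left: Iterator[L], right: Iterable[R]): Iterator[(L, Option[R])] =
    if (right.isEmpty) left.map { l => (l, None) }
    else left.flatMap { l => right.iterator.map { r => (l, Some(r)) } }
}
```

Because the right side is re-traversed once per left value, keeping it short for each key keeps the repeated work small, which is the reason for the advice above.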
def count(fn: (R) ⇒ Boolean): CoGrouped[K, Long]

For each key, count the number of values that satisfy a predicate

Definition Classes
KeyedListLike
def distinctSize: CoGrouped[K, Long]

For each key, give the number of unique values. WARNING: May OOM. This assumes the values for each key can fit in memory.

Definition Classes
KeyedListLike
def distinctValues: CoGrouped[K, R]

For each key, remove duplicate values. WARNING: May OOM. This assumes the values for each key can fit in memory.

Definition Classes
KeyedListLike
def drop(n: Int): CoGrouped[K, R]

For each key, select all elements except the first n.

Definition Classes
KeyedListLike
def dropWhile(p: (R) ⇒ Boolean): CoGrouped[K, R]

For each key, drop the longest prefix of elements that satisfy the given predicate.

Definition Classes
KeyedListLike
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def filter(fn: ((K, R)) ⇒ Boolean): CoGrouped[K, R]

Filtering here is equivalent to filtering after conversion: .filter(fn).toTypedPipe == .toTypedPipe.filter(fn). It is generally better to avoid going back to a TypedPipe for as long as possible: this minimizes the number of times we go in and out of Cascading/Hadoop types.

Definition Classes
KeyedListLike
def filterKeys(fn: (K) ⇒ Boolean): CoGrouped[K, R]

Filter keys on a predicate. More efficient than filter if you are only looking at keys.

Definition Classes
CoGrouped → KeyedListLike
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def flatMapValues[V](fn: (R) ⇒ TraversableOnce[V]): CoGrouped[K, V]

Similar to mapValues, but works like flatMap, returning a collection of outputs for each value input.

Definition Classes
KeyedListLike
def flattenValues[U](implicit ev: <:<[R, TraversableOnce[U]]): CoGrouped[K, U]

Flatten the values. Useful after sortedTake, for instance.

Definition Classes
KeyedListLike
def fold[V](f: Fold[R, V]): CoGrouped[K, V]

Folds are composable aggregations that make one pass over the data. If you need to do several custom folds over the same data, use Fold.join together with this method.

Definition Classes
KeyedListLike
def foldLeft[B](z: B)(fn: (B, R) ⇒ B): CoGrouped[K, B]

For each key, fold the values. See scala.collection.Iterable.foldLeft.

Definition Classes
KeyedListLike
def foldWithKey[V](fn: (K) ⇒ Fold[R, V]): CoGrouped[K, V]

If the fold depends on the key, use this method to construct the fold for each key

Definition Classes
KeyedListLike
def forall(fn: (R) ⇒ Boolean): CoGrouped[K, Boolean]

For each key, check whether a predicate is true for all values.

Definition Classes
KeyedListLike
def forceToReducers: CoGrouped[K, R]

This is just shorthand for mapValueStream(identity); it makes sure the planner sees that you want to force a shuffle. Intended for expert tuning.

Definition Classes
KeyedListLike
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
def head: CoGrouped[K, R]

Use this to get the first value encountered. Prefer this to take(1).

Definition Classes
KeyedListLike
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def join[W](smaller: CoGroupable[K, W]): CoGrouped[K, (R, W)]

Definition Classes
CoGroupable
def keys: TypedPipe[K]

Convert to a TypedPipe and only keep the keys

Definition Classes
KeyedListLike
def leftJoin[W](smaller: CoGroupable[K, W]): CoGrouped[K, (R, Option[W])]

Definition Classes
CoGroupable
def mapGroup[R1](fn: (K, Iterator[R]) ⇒ Iterator[R1]): CoGrouped[K, R1]

Operate on an Iterator[R] of all the values for each key at one time. Prefer this to toList when you can avoid accumulating the whole list in memory. Prefer sum where it applies, since it is partially executed map-side by default. Use mapValueStream when you don't care about the key for the group.
The Iterator is always non-empty. Note that any key that has all of its values removed will not appear in subsequent .mapGroup/mapValueStream calls.

Definition Classes
CoGrouped → KeyedListLike
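The following sketch shows two functions with the `(K, Iterator[R]) ⇒ Iterator[R1]` shape, written with the Scala standard library only. `MapGroupFunctions`, `mean` and `dropNegativeKeys` are hypothetical names for illustration. The first makes a single pass without building a list; the second returns an empty Iterator for some keys.

```scala
object MapGroupFunctions {
  // One pass over the values, keeping only two accumulators in memory.
  def mean(key: String, values: Iterator[Double]): Iterator[Double] = {
    var sum = 0.0
    var count = 0L
    values.foreach { v => sum += v; count += 1 }
    // The input Iterator is documented as non-empty, so count > 0 here.
    Iterator.single(sum / count)
  }

  // Emits nothing for negative keys; per the note above, a key whose values
  // are all removed will not appear downstream.
  def dropNegativeKeys(key: Int, values: Iterator[String]): Iterator[String] =
    if (key < 0) Iterator.empty else values
}
```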
def mapValueStream[V](smfn: (Iterator[R]) ⇒ Iterator[V]): CoGrouped[K, V]

Use this when you don't care about the key for the group; otherwise use mapGroup.

Definition Classes
KeyedListLike
def mapValues[V](fn: (R) ⇒ V): CoGrouped[K, V]

This is a special case of mapValueStream, but it can be optimized because it doesn't need all the values for a given key at once. An unoptimized implementation is mapValueStream { _.map { fn } }, but for Grouped we can avoid resorting to mapValueStream.

Definition Classes
KeyedListLike
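The relationship between the two methods can be sketched on the values of a single key. The model below uses plain Scala Iterators and hypothetical stand-in functions. It is not the Scalding implementation, only the unoptimized definition quoted above.

```scala
object MapValuesModel {
  // Stand-in for mapValueStream applied to one key's values.
  def mapValueStream[R, V](values: Iterator[R])(smfn: Iterator[R] => Iterator[V]): Iterator[V] =
    smfn(values)

  // mapValues expressed through mapValueStream: each value is transformed
  // independently, so the whole group is never needed at once.
  def mapValues[R, V](values: Iterator[R])(fn: R => V): Iterator[V] =
    mapValueStream(values) { _.map(fn) }
}
```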
def max[B >: R](implicit cmp: Ordering[B]): CoGrouped[K, R]

For each key, give the maximum value

Definition Classes
KeyedListLike
def maxBy[B](fn: (R) ⇒ B)(implicit cmp: Ordering[B]): CoGrouped[K, R]

For each key, give the maximum value by some function

Definition Classes
KeyedListLike
def min[B >: R](implicit cmp: Ordering[B]): CoGrouped[K, R]

For each key, give the minimum value

Definition Classes
KeyedListLike
def minBy[B](fn: (R) ⇒ B)(implicit cmp: Ordering[B]): CoGrouped[K, R]

For each key, give the minimum value by some function

Definition Classes
KeyedListLike
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def outerJoin[W](smaller: CoGroupable[K, W]): CoGrouped[K, (Option[R], Option[W])]

Definition Classes
CoGroupable
def product[U >: R](implicit ring: Ring[U]): CoGrouped[K, U]

For each key, return the product of all the values.

Definition Classes
KeyedListLike
def reduce[U >: R](fn: (U, U) ⇒ U): CoGrouped[K, U]

Reduce with fn, which must be associative and commutative. Like sum, this can be optimized in some Grouped cases. If you don't have a commutative operator, use reduceLeft.

Definition Classes
KeyedListLike
def reduceLeft[U >: R](fn: (U, U) ⇒ U): CoGrouped[K, U]

Similar to reduce, but always executed on the reduce side (never optimized to map-side), and named for the Scala function. fn need not be associative or commutative. This makes sense when you want to reduce in a particular sorted order. The old (accumulated) value comes in on the left.

Definition Classes
KeyedListLike
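Why commutativity matters can be seen with ordinary Scala collections. The sketch below is not Scalding code and `ReduceOrder` is a hypothetical name. It compares an operator whose result does not depend on arrival order with one whose result does.

```scala
object ReduceOrder {
  val inOrder = List("a", "b", "c")
  val shuffled = List("c", "a", "b") // same values, different arrival order

  // Integer addition is associative and commutative: order does not matter.
  val sumsAgree: Boolean =
    List(1, 2, 3).reduce(_ + _) == List(3, 1, 2).reduce(_ + _)

  // String concatenation is associative but not commutative: order matters,
  // and the accumulated value arrives as the left argument.
  val concatInOrder: String = inOrder.reduceLeft((acc, v) => acc + v)
  val concatShuffled: String = shuffled.reduceLeft((acc, v) => acc + v)
}
```

An operator like the concatenation here is the kind of case where a fixed value ordering combined with reduceLeft is the appropriate choice.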
def rightJoin[W](smaller: CoGroupable[K, W]): CoGrouped[K, (Option[R], W)]

Definition Classes
CoGroupable
def scanLeft[B](z: B)(fn: (B, R) ⇒ B): CoGrouped[K, B]

For each key, scanLeft the values. See scala.collection.Iterable.scanLeft.

Definition Classes
KeyedListLike
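Since the description defers to the Scala collections method, a short reminder of how the standard library's scanLeft differs from foldLeft may help. The snippet uses a plain List standing in for one key's values, and `ScanVsFold` is a hypothetical name.

```scala
object ScanVsFold {
  val values = List(1, 2, 3)

  // foldLeft returns only the final accumulator.
  val total: Int = values.foldLeft(0)(_ + _)

  // scanLeft returns every intermediate accumulator, starting with z itself.
  val running: List[Int] = values.scanLeft(0)(_ + _)
}
```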
def size: CoGrouped[K, Long]

For each key, give the number of values

Definition Classes
KeyedListLike
def sortWithTake[U >: R](k: Int)(lessThan: (U, U) ⇒ Boolean): CoGrouped[K, Seq[R]]

Like sortedTake, but with a less-than operation for the ordering.

Definition Classes
KeyedListLike
def sortedReverseTake(k: Int)(implicit ord: Ordering[_ >: R]): CoGrouped[K, Seq[R]]

Take the largest k items according to the implicit ordering. Useful for top-k without having to call ord.reverse.

Definition Classes
KeyedListLike
def sortedTake(k: Int)(implicit ord: Ordering[_ >: R]): CoGrouped[K, Seq[R]]

This implements bottom-k (smallest k items) on each mapper for each key, then sends those to reducers to get the result. This is faster than using .take if k * (number of Keys) is small enough to fit in memory.

Definition Classes
KeyedListLike
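The two-phase idea described above (bottom-k per mapper, then a merge on the reducers) can be modeled for a single key with plain Scala collections. This is a sketch of the idea only, not Scalding's implementation; `BottomK`, `bottomK` and `twoPhase` are hypothetical names.

```scala
object BottomK {
  // Smallest k items of one collection, in ascending order.
  def bottomK[T](k: Int, xs: Seq[T])(implicit ord: Ordering[T]): Seq[T] =
    xs.sorted.take(k)

  // Each inner Seq stands for the values one mapper saw for a single key.
  def twoPhase[T](k: Int, mappers: Seq[Seq[T]])(implicit ord: Ordering[T]): Seq[T] = {
    val partials = mappers.map(m => bottomK(k, m)) // map side: at most k per mapper
    bottomK(k, partials.flatten)                   // reduce side: merge the partials
  }
}
```

Each mapper forwards at most k values per key, so the data shuffled for a key is bounded by k times the number of mappers rather than by the number of values.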
def sum[U >: R](implicit sg: Semigroup[U]): CoGrouped[K, U]

Add all items according to the implicit Semigroup. If there is no sorting, we default to assuming the Semigroup is commutative. If you don't want that, define an ordering on the values, sort, or use .forceToReducers.
Semigroups MAY have a faster implementation of sum for iterators, so prefer using sum/sumLeft to reduce.

Definition Classes
KeyedListLike
def sumLeft[U >: R](implicit sg: Semigroup[U]): CoGrouped[K, U]

Semigroups MAY have a faster implementation of sum for iterators, so prefer using sum/sumLeft to reduce/reduceLeft

Definition Classes
KeyedListLike
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def take(n: Int): CoGrouped[K, R]

For each key, select the first n elements. Don't use this if n == 1; head is faster in that case.

Definition Classes
KeyedListLike
def takeWhile(p: (R) ⇒ Boolean): CoGrouped[K, R]

For each key, take the longest prefix of elements that satisfy the given predicate.

Definition Classes
KeyedListLike
def toList: CoGrouped[K, List[R]]

AVOID THIS IF POSSIBLE. For each key, accumulate all the values into a List. WARNING: May OOM. Only use this method if you are sure all the values will fit in memory. You really should ask why you need all the values; if you want to do some custom reduction, do it in mapGroup or mapValueStream.

Definition Classes
KeyedListLike
def toSet[U >: R]: CoGrouped[K, Set[U]]

AVOID THIS IF POSSIBLE. The same risks apply here as to toList: you may OOM. See toList. Note that toSet needs to be parameterized even though toList does not. This is because List is covariant in its type parameter in the Scala API, but Set is invariant. See: http://stackoverflow.com/questions/676615/why-is-scalas-immutable-set-not-covariant-in-its-type

Definition Classes
KeyedListLike
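The variance difference behind the extra type parameter can be seen with the standard library alone. The snippet below uses plain Scala collections rather than Scalding, and `VarianceDemo` is a hypothetical name.

```scala
object VarianceDemo {
  val ints: List[Int] = List(1, 2, 2)

  // List is covariant, so a List[Int] can be used where a List[Any] is expected.
  val widenedList: List[Any] = ints

  // Set is invariant: `val s: Set[Any] = ints.toSet[Int]` would not compile.
  // Passing the wider element type explicitly builds a Set[Any] directly.
  val widenedSet: Set[Any] = ints.toSet[Any]
}
```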
def toString(): String

Definition Classes
AnyRef → Any
lazy val toTypedPipe: TypedPipe[(K, R)]

End of the operations on values. From this point on, the keyed structure is lost and another shuffle is generally required to reconstruct it.

Definition Classes
CoGrouped → KeyedListLike
def values: TypedPipe[R]

Convert to a TypedPipe and only keep the values

Definition Classes
KeyedListLike
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
def withDescription(description: String): CoGrouped[K, R]

Never mutates this; instead returns a new item.

Definition Classes
CoGrouped → WithDescription
def withDescription(descriptionOpt: Option[String]): CoGrouped[K, R]

Definition Classes
WithDescription
def withReducers(reds: Int): CoGrouped[K, R] { def reducers: Some[Int] }

Never mutates this; instead returns a new item.

Definition Classes
CoGrouped → WithReducers

trait CoGrouped[K, +R] extends KeyedListLike[K, R, CoGrouped] with CoGroupable[K, R] with WithReducers[CoGrouped[K, R]] with WithDescription[CoGrouped[K, R]]
