Trait

com.twitter.scalding

ReduceOperations

trait ReduceOperations[+Self <: ReduceOperations[Self]] extends Serializable

Implements reductions on top of a simple abstraction for the Fields API. This is for associative and commutative operations (Monoids and Semigroups in particular play a big role here).

We use the f-bounded polymorphism trick to return the type called Self in each operation.
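
As a rough illustration of the F-bounded pattern mentioned above, here is a minimal sketch in plain Scala with no Scalding dependency. The names `Ops`, `Counter`, `step` and `twice` are hypothetical and exist only for this example.

```scala
object FBoundedSketch {
  // Each operation returns Self, so chained calls keep the concrete type.
  trait Ops[+Self <: Ops[Self]] {
    def step(n: Int): Self
    // Defined once in the trait, yet still typed as the concrete subclass.
    def twice(n: Int): Self = step(n).step(n)
  }

  final case class Counter(total: Int) extends Ops[Counter] {
    def step(n: Int): Counter = Counter(total + n)
  }

  // The static type is Counter, not Ops[_], so .total stays accessible.
  val result: Counter = Counter(0).twice(3).step(1)
}
```

Without the `Self` parameter, `twice` would have to return the base trait and callers would lose access to subclass members after the first chained call.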

Source
ReduceOperations.scala
Linear Supertypes
Serializable, AnyRef, Any

Abstract Value Members

  1. abstract def mapReduceMap[T, X, U](fieldDef: (Fields, Fields))(mapfn: (T) ⇒ X)(redfn: (X, X) ⇒ X)(mapfn2: (X) ⇒ U)(implicit startConv: TupleConverter[T], middleSetter: TupleSetter[X], middleConv: TupleConverter[X], endSetter: TupleSetter[U]): Self

    Type T is the type of the input field (input to map, T => X).
    Type X is the intermediate type, which your reduce function operates on (reduce is (X,X) => X).
    Type U is the final result type (final map is X => U).

    The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.

    Assumed to be a commutative operation. If you don't want that, use .forceToReducers
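
The roles of T, X and U can be modelled on an ordinary collection. The sketch below is an assumption-laden stand-in, not Scalding's implementation: it treats one group's values as a `Seq[T]` and ignores fields, converters and setters entirely.

```scala
object MapReduceMapSketch {
  // Plain-collection model of one group's values: map, then reduce, then map.
  // reduceLeft keeps the accumulated value on the left, like foldLeft.
  // It throws on an empty Seq; a group is assumed to hold at least one value.
  def mapReduceMap[T, X, U](values: Seq[T])(mapfn: T => X)(redfn: (X, X) => X)(mapfn2: X => U): U =
    mapfn2(values.map(mapfn).reduceLeft(redfn))

  // Average word length: T = String, X = (total length, count), U = Double.
  val avgLen: Double =
    mapReduceMap(Seq("a", "bcd", "ef"))(w => (w.length, 1)) {
      (l, r) => (l._1 + r._1, l._2 + r._2)
    } { case (len, n) => len.toDouble / n }
}
```

The intermediate type X is what makes the reduction associative here: averages cannot be combined directly, but (total, count) pairs can.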

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def aggregate[A, B, C](fieldDef: (Fields, Fields))(ag: Aggregator[A, B, C])(implicit startConv: TupleConverter[A], middleSetter: TupleSetter[B], middleConv: TupleConverter[B], endSetter: TupleSetter[C]): Self

    Pretty much a synonym for mapReduceMap with the methods collected into a trait.

  5. def approximateUniqueCount[T](f: (Fields, Fields), errPercent: Double = 1.0)(implicit arg0: (T) ⇒ Array[Byte], arg1: TupleConverter[T]): Self

    Approximate number of unique values. We use about m = (104/errPercent)^2 bytes of memory per key. Uses .toString.getBytes to serialize the data, so you MUST ensure that .toString is an equivalence on your counted fields (i.e. x.toString == y.toString if and only if x == y).

    For each key:

    10% error ~ 256 bytes
    5% error ~ 1kB
    2% error ~ 4kB
    1% error ~ 16kB
    0.5% error ~ 64kB
    0.25% error ~ 256kB
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def average(f: Symbol): Self

  8. def average(f: (Fields, Fields)): Self

    Uses a more stable online algorithm, which should be suitable for large numbers of records.

    Similar To

    http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
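
One way to picture such an online average is to merge (count, mean) pairs instead of accumulating a raw sum. The sketch below follows the pairwise-merge idea from the linked page; the names `Avg` and `merge` are hypothetical, and this is not claimed to be the code Scalding runs.

```scala
object AverageSketch {
  // Partial result for a chunk of records: how many there are, and their mean.
  final case class Avg(count: Long, mean: Double)

  // Merge two partial averages without forming the full sum,
  // so the running value stays on the scale of the data.
  def merge(a: Avg, b: Avg): Avg = {
    val n = a.count + b.count
    if (n == 0L) Avg(0L, 0.0)
    else Avg(n, a.mean + (b.mean - a.mean) * b.count / n)
  }

  // Assumes xs is non-empty, as a group always has at least one record.
  def average(xs: Seq[Double]): Double =
    xs.map(Avg(1L, _)).reduceLeft(merge).mean
}
```

Because `merge` combines partial results of any size, the same function can serve both as a per-record update and as a combiner of results computed on different machines.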

  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. def count[T](fieldDef: (Fields, Fields))(fn: (T) ⇒ Boolean)(implicit arg0: TupleConverter[T]): Self

    This is count with a predicate: only counts the tuples for which fn(tuple) is true

  11. def dot[T](left: Fields, right: Fields, result: Fields)(implicit ttconv: TupleConverter[(T, T)], ring: Ring[T], tconv: TupleConverter[T], tset: TupleSetter[T]): Self

    First do "times" on each pair, then "plus" them all together.

    Example

    groupBy('x) { _.dot('y,'z, 'ydotz) }
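
The "times then plus" description can be checked on a plain collection. In this sketch the standard library's `Numeric` stands in for `Ring` (an assumption made so the example needs no extra dependency), and one group's rows are modelled as a `Seq` of (y, z) pairs.

```scala
object DotSketch {
  // Multiply each (y, z) pair, then add the products together.
  def dot[T](pairs: Seq[(T, T)])(implicit num: Numeric[T]): T =
    pairs.map { case (y, z) => num.times(y, z) }.foldLeft(num.zero)(num.plus)

  // 1*4 + 2*5 + 3*6
  val ydotz: Long = dot(Seq((1L, 4L), (2L, 5L), (3L, 6L)))
}
```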
  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. def forall[T](fieldDef: (Fields, Fields))(fn: (T) ⇒ Boolean)(implicit arg0: TupleConverter[T]): Self

  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  18. def head(f: Symbol*): Self

  19. def head(fd: (Fields, Fields)): Self

    Returns the first value; probably useful only in the sorted case.

  20. def histogram(f: (Fields, Fields), binWidth: Double = 1.0): Self

  21. def hyperLogLog[T](f: (Fields, Fields), errPercent: Double = 1.0)(implicit arg0: (T) ⇒ Array[Byte], arg1: TupleConverter[T]): Self

  22. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  23. def last(f: Symbol*): Self

  24. def last(fd: (Fields, Fields)): Self

  25. def mapList[T, R](fieldDef: (Fields, Fields))(fn: (List[T]) ⇒ R)(implicit conv: TupleConverter[T], setter: TupleSetter[R]): Self

    Collect all the values into a List[T] and then operate on that list. This fundamentally uses as much memory as it takes to store the list. The list is given to you in the reverse of the order in which the values were encountered (it is built as a stack for efficiency reasons). If you care about order, call .reverse in your fn.

    STRONGLY PREFER TO AVOID THIS. Try reduce or plus and an O(1) memory algorithm.
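
The reverse ordering follows directly from building the list as a stack. A small plain-Scala sketch (the helper name `collect` is hypothetical):

```scala
object MapListSketch {
  // Prepending each value as it arrives is O(1), but leaves the newest value first.
  def collect[T](values: Iterator[T]): List[T] =
    values.foldLeft(List.empty[T])((acc, v) => v :: acc)

  val seen: List[Int] = collect(Iterator(1, 2, 3)) // stack order
  val inOrder: List[Int] = seen.reverse            // encounter order
}
```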

  26. def mapPlusMap[T, X, U](fieldDef: (Fields, Fields))(mapfn: (T) ⇒ X)(mapfn2: (X) ⇒ U)(implicit startConv: TupleConverter[T], middleSetter: TupleSetter[X], middleConv: TupleConverter[X], endSetter: TupleSetter[U], sgX: Semigroup[X]): Self

  27. def max(f: Symbol*): Self

  28. def max(fieldDef: (Fields, Fields)): Self

  29. def min(f: Symbol*): Self

  30. def min(fieldDef: (Fields, Fields)): Self

  31. def mkString(fieldDef: Symbol): Self

  32. def mkString(fieldDef: Symbol, sep: String): Self

  33. def mkString(fieldDef: Symbol, start: String, sep: String, end: String): Self

    These variants are only called if a tuple is not passed, meaning there is just one column.

  34. def mkString(fieldDef: (Fields, Fields)): Self

  35. def mkString(fieldDef: (Fields, Fields), sep: String): Self

  36. def mkString(fieldDef: (Fields, Fields), start: String, sep: String, end: String): Self

    Similar to scala.collection.Iterable.mkString. Takes the source and destination field names, each of which should be a single field. The result will be start, then each item.toString separated by sep, followed by end. For convenience, there are several common variants.

  37. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  38. final def notify(): Unit

    Definition Classes
    AnyRef
  39. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  40. def pivot(fieldDef: (Fields, Fields), defaultVal: Any = null): Self

    Opposite of RichPipe.unpivot. See SQL/Excel for more on this function. Converts a row-wise representation into a column-wise one.

    Example

    pivot(('feature, 'value) -> ('clicks, 'impressions, 'requests))

    This finds the feature named "clicks" and puts its value in the column with the field named clicks.

    Absent fields result in null unless a default value is provided. Unnamed output fields are ignored.

    Note

    Duplicated fields will result in an error.

    Hint

    if you want more precision, first do a

    map('value -> 'value) { x : AnyRef => Option(x) }

    and you will have non-nulls for all present values, and Nones for values that were present but previously null. All nulls in the final output will be those truly missing. Similarly, if you want to check if there are any items present that shouldn't be:

    map('feature -> 'feature) { fname : String =>
      if (!goodFeatures(fname)) { throw new Exception("ohnoes") }
      else fname
    }
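
The row-wise to column-wise conversion can be modelled on plain collections. This sketch assumes one key's rows arrive as (feature, value) pairs and the output columns are given as names. It does not model the duplicate-field error described in the note, and the helper signature is hypothetical.

```scala
object PivotSketch {
  // Look each requested column up among the rows; absent features get defaultVal.
  // Features that match no requested column are ignored.
  def pivot(rows: Seq[(String, Any)], columns: Seq[String], defaultVal: Any = null): Seq[Any] = {
    val byFeature = rows.toMap
    columns.map(c => byFeature.getOrElse(c, defaultVal))
  }

  private val rows: Seq[(String, Any)] = Seq("clicks" -> 3, "requests" -> 10, "other" -> 99)
  private val cols = Seq("clicks", "impressions", "requests")

  val withNull: Seq[Any] = pivot(rows, cols)       // "impressions" is absent, so null
  val withDefault: Seq[Any] = pivot(rows, cols, 0) // "impressions" falls back to 0
}
```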
  41. def reduce[T](fieldDef: Symbol*)(fn: (T, T) ⇒ T)(implicit setter: TupleSetter[T], conv: TupleConverter[T]): Self

  42. def reduce[T](fieldDef: (Fields, Fields))(fn: (T, T) ⇒ T)(implicit setter: TupleSetter[T], conv: TupleConverter[T]): Self

    Apply an associative/commutative operation on the left field.

    Example

    reduce(('mass,'allids)->('totalMass, 'idset)) { (left:(Double,Set[Long]),right:(Double,Set[Long])) =>
      (left._1 + right._1, left._2 ++ right._2)
    }

    Equivalent to a mapReduceMap with trivial (identity) map functions.

    Assumed to be a commutative operation. If you don't want that, use .forceToReducers

    The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
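
For readers without a cluster at hand, the same reduction function can be tried on an ordinary `Seq`. This is only a local model of one group's values; the sample rows are made up for illustration.

```scala
object ReduceSketch {
  type MassIds = (Double, Set[Long])

  // Hypothetical ('mass, 'allids) values for a single group.
  val rows: Seq[MassIds] = Seq((1.5, Set(1L)), (2.0, Set(2L, 3L)), (0.5, Set(1L)))

  // The accumulated value is always the left argument, as with foldLeft.
  val (totalMass, idset) = rows.reduceLeft { (left, right) =>
    (left._1 + right._1, left._2 ++ right._2)
  }
}
```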

  43. def size(thisF: Fields): Self

  44. def size: Self

    How many values are there for this key

  45. def sizeAveStdev(fieldDef: (Fields, Fields)): Self

    Compute the count, average and standard deviation in one pass. Example: g.sizeAveStdev('x -> ('cntx, 'avex, 'stdevx))
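
A single pass is possible because count, mean and the sum of squared deviations can be merged pairwise. The sketch below uses the standard parallel-variance merge and reports the population standard deviation; whether Scalding divides by n or n - 1 is not stated here, so treat that choice, and the names `Moments`, `single` and `merge`, as assumptions.

```scala
object MomentsSketch {
  // count, mean, and m2 = sum of squared deviations from the mean.
  final case class Moments(count: Long, mean: Double, m2: Double) {
    // Population standard deviation (divides by n); an assumption of this sketch.
    def stdev: Double = if (count == 0L) 0.0 else math.sqrt(m2 / count)
  }

  def single(x: Double): Moments = Moments(1L, x, 0.0)

  // Pairwise merge of two partial results.
  def merge(a: Moments, b: Moments): Moments = {
    val n = a.count + b.count
    if (n == 0L) a
    else {
      val delta = b.mean - a.mean
      Moments(n,
        a.mean + delta * b.count / n,
        a.m2 + b.m2 + delta * delta * a.count * b.count / n)
    }
  }

  // Assumes xs is non-empty.
  def sizeAveStdev(xs: Seq[Double]): (Long, Double, Double) = {
    val m = xs.map(single).reduceLeft(merge)
    (m.count, m.mean, m.stdev)
  }
}
```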

  46. def sortWithTake[T](f: (Fields, Fields), k: Int)(lt: (T, T) ⇒ Boolean)(implicit arg0: TupleConverter[T]): Self

    Equivalent to sorting by a comparison function then take-ing k items. This is MUCH more efficient than doing a total sort followed by a take, since these bounded sorts are done on the mapper, so only a sort of size k is needed.

    Example

    sortWithTake( ('clicks, 'tweet) -> 'topClicks, 5) {
      (t0: (Long,Long), t1: (Long,Long)) => t0._1 < t1._1 }

    topClicks will be a List[(Long,Long)]
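
The bounded-sort idea can be sketched on a plain collection: never hold more than k items while scanning the values. This is an illustrative model only; it re-sorts a list of at most k + 1 items on each step and is not claimed to match the data structure Scalding uses.

```scala
object SortWithTakeSketch {
  // Keep at most k items, ordered by lt, while folding over the values.
  def sortWithTake[T](values: Seq[T], k: Int)(lt: (T, T) => Boolean): List[T] =
    values.foldLeft(List.empty[T]) { (kept, v) =>
      (v :: kept).sortWith(lt).take(k)
    }

  // Hypothetical (clicks, tweet) pairs; with `<` the smallest click counts come first.
  val firstTwo: List[(Long, Long)] =
    sortWithTake(Seq((7L, 100L), (2L, 101L), (9L, 102L), (4L, 103L)), 2) {
      (t0, t1) => t0._1 < t1._1
    }
}
```

Since each partial result holds at most k items, partial results from different mappers can be concatenated and trimmed the same way.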

  47. def sortedReverseTake[T](f: (Fields, Fields), k: Int)(implicit conv: TupleConverter[T], ord: Ordering[T]): Self

    Reverse of sortedTake, for when the implicit ordering makes sense.

  48. def sortedTake[T](f: (Fields, Fields), k: Int)(implicit conv: TupleConverter[T], ord: Ordering[T]): Self

    Same as sortWithTake, but useful when the implicit ordering makes sense.

  49. def sum[T](fs: Symbol*)(implicit sg: Semigroup[T], tconv: TupleConverter[T], tset: TupleSetter[T]): Self

    The same as sum(fs -> fs). Assumed to be a commutative operation. If you don't want that, use .forceToReducers

  50. def sum[T](fd: (Fields, Fields))(implicit sg: Semigroup[T], tconv: TupleConverter[T], tset: TupleSetter[T]): Self

    Use Semigroup.plus to compute a sum. Not called sum to avoid conflicting with standard sum. Your Semigroup[T] should be associative and commutative, else this doesn't make sense.

    Assumed to be a commutative operation. If you don't want that, use .forceToReducers
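
To show what "use Semigroup.plus" means in practice, here is a self-contained sketch. The `Semigroup` trait below is a hypothetical stand-in defined locally so the example has no dependencies; it is not the library's own type class, and the instances are illustrative.

```scala
object SumSketch {
  // Hypothetical stand-in for a semigroup type class: just an associative plus.
  trait Semigroup[T] { def plus(l: T, r: T): T }

  implicit val longSemigroup: Semigroup[Long] = new Semigroup[Long] {
    def plus(l: Long, r: Long): Long = l + r
  }

  // Maps combine key by key, using the value type's semigroup on collisions.
  implicit def mapSemigroup[K, V](implicit sg: Semigroup[V]): Semigroup[Map[K, V]] =
    new Semigroup[Map[K, V]] {
      def plus(l: Map[K, V], r: Map[K, V]): Map[K, V] =
        r.foldLeft(l) { case (acc, (k, v)) =>
          acc.updated(k, acc.get(k).map(sg.plus(_, v)).getOrElse(v))
        }
    }

  // Assumes values is non-empty, since a semigroup has no zero element.
  def sum[T](values: Seq[T])(implicit sg: Semigroup[T]): T =
    values.reduceLeft((l, r) => sg.plus(l, r))

  val total: Long = sum(Seq(1L, 2L, 3L))
  val merged: Map[String, Long] = sum(Seq(Map("a" -> 1L), Map("a" -> 2L, "b" -> 5L)))
}
```

The same `sum` call works for any type with an instance in scope, which is why the element type decides what "adding" means.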

  51. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  52. def times[T](fs: Symbol*)(implicit ring: Ring[T], tconv: TupleConverter[T], tset: TupleSetter[T]): Self

    The same as times(fs -> fs)

  53. def times[T](fd: (Fields, Fields))(implicit ring: Ring[T], tconv: TupleConverter[T], tset: TupleSetter[T]): Self

    Returns the product of all the items in this grouping

  54. def toList[T](fieldDef: (Fields, Fields))(implicit conv: TupleConverter[T]): Self

    Convert a subset of fields into a list of Tuples. You need to provide the types of the tuple fields.

  55. def toString(): String

    Definition Classes
    AnyRef → Any
  56. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  57. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  58. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
