Object/Class

com.twitter.scalding

MapsideReduce

Related Docs: class MapsideReduce | package scalding

Permalink

object MapsideReduce extends Serializable

An implementation of map-side combining which is appropriate for associative and commutative functions If a cacheSize is given, it is used, else we query the config for cascading.aggregateby.threshold (standard cascading param for an equivalent case) else we use a default value of 100,000

This keeps a cache of keys up to the cache-size, summing values as keys collide On eviction, or completion of this Operation, the key-value pairs are put into outputCollector.

This NEVER spills to disk and generally never be a performance penalty. If you have poor locality in the keys, you just don't get any benefit but little added cost.

Note this means that you may still have repeated keys in the output even on a single mapper since the key space may be so large that you can't fit all of them in the cache at the same time.

You can use this with the Fields-API by doing:

val msr = new MapsideReduce(Semigroup.from(fn), 'key, 'value, None)
// MUST map onto the same key,value space (may be multiple fields)
val mapSideReduced = pipe.eachTo(('key, 'value) -> ('key, 'value)) { _ => msr }

That said, this is equivalent to AggregateBy, and the only value is that it is much simpler than AggregateBy. AggregateBy assumes several parallel reductions are happening, and thus has many loops, and array lookups to deal with that. Since this does many fewer allocations, and has a smaller code-path it may be faster for the typed-API.

Source
Operations.scala
Linear Supertypes
Serializable, Serializable, AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. MapsideReduce
  2. Serializable
  3. Serializable
  4. AnyRef
  5. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Value Members

  1. final def !=(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  4. val COUNTER_GROUP: String

    Permalink
  5. final def asInstanceOf[T0]: T0

    Permalink
    Definition Classes
    Any
  6. def clone(): AnyRef

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. final def eq(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  11. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  12. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  13. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  14. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  15. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  16. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  17. def toString(): String

    Permalink
    Definition Classes
    AnyRef → Any
  18. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped