Object

com.twitter.scalding.typed

TypedPipeDiff



object TypedPipeDiff

Methods for comparing two typed pipes and finding the differences between them.

Supports the normal case, in which the typed pipes contain objects usable as keys in Scalding (they have an ordering and proper equals and hashCode), as well as some special cases for dealing with Arrays and thrift objects.

See diffByHashCode for comparing typed pipes of objects that have no ordering but a stable hash code (such as Scrooge thrift).

See diffByGroup for comparing typed pipes of objects that have no ordering *and* an unstable hash code.

Source
TypedPipeDiff.scala
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. object Enrichments

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def diff[T](left: TypedPipe[T], right: TypedPipe[T], reducers: Option[Int] = None)(implicit arg0: Ordering[T]): UnsortedGrouped[T, (Long, Long)]

    Returns a mapping from T to a count of the occurrences of T in the left and right pipes, only for cases where the counts are not equal.

    Requires that T have an ordering, and that its hashCode and equals are stable across JVMs (not reference-based). See diffArrayPipes for diffing pipes of arrays, since arrays do not meet these requirements by default.
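
    The result described above can be modelled on plain in-memory collections. The following is a minimal sketch of the semantics only, not the Scalding implementation; DiffModel and diffCounts are hypothetical names introduced for illustration.

```scala
// In-memory model of the documented result; NOT the Scalding implementation.
// DiffModel and diffCounts are hypothetical names used only for this sketch.
object DiffModel {
  // Count how often each element occurs in a sequence.
  private def counts[T](xs: Seq[T]): Map[T, Long] =
    xs.groupBy(identity).map { case (k, vs) => (k, vs.size.toLong) }

  // Map each element to (count in left, count in right),
  // keeping only the elements whose two counts differ.
  def diffCounts[T](left: Seq[T], right: Seq[T]): Map[T, (Long, Long)] = {
    val l = counts(left)
    val r = counts(right)
    (l.keySet ++ r.keySet).iterator
      .map(k => k -> ((l.getOrElse(k, 0L), r.getOrElse(k, 0L))))
      .filter { case (_, (lc, rc)) => lc != rc }
      .toMap
  }
}
```

    With left = Seq("a", "b", "b", "c") and right = Seq("a", "b", "d"), the model reports "b" with counts (2, 1), "c" with (1, 0) and "d" with (0, 1); "a" is omitted because its counts match.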

  8. def diffArrayPipes[T](left: TypedPipe[Array[T]], right: TypedPipe[Array[T]], reducers: Option[Int] = None)(implicit arg0: ClassTag[T]): TypedPipe[(Array[T], (Long, Long))]

    Same as diffByHashCode, but takes care to wrap the Array[T] in a wrapper, which has the correct hashCode and equals needed. This does not involve copying the arrays, just wrapping them, and is specialized for primitive arrays.
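
    To see why arrays need wrapping, the sketch below contrasts a raw array comparison with a content-based wrapper. IntArrayKey is a hypothetical class written for this illustration; the wrapper the library actually uses is not shown in this document and may differ.

```scala
import java.util.Arrays

// Hypothetical wrapper for illustration only; it holds the original array
// (no copy) and delegates equals/hashCode to the array contents.
final class IntArrayKey(val underlying: Array[Int]) {
  override def equals(other: Any): Boolean = other match {
    case that: IntArrayKey => Arrays.equals(underlying, that.underlying)
    case _                 => false
  }
  override def hashCode: Int = Arrays.hashCode(underlying)
}

object ArrayKeyDemo {
  val a = Array(1, 2, 3)
  val b = Array(1, 2, 3)

  // Plain arrays use reference equality, so equal contents do not compare equal.
  val rawEqual: Boolean = a.equals(b)

  // Wrapped arrays compare and hash by content.
  val wrappedEqual: Boolean = new IntArrayKey(a) == new IntArrayKey(b)
  val sameHash: Boolean = new IntArrayKey(a).hashCode == new IntArrayKey(b).hashCode

  // The wrapper references the original array rather than copying it.
  val noCopy: Boolean = new IntArrayKey(a).underlying eq a
}
```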

  9. def diffByGroup[T, K](left: TypedPipe[T], right: TypedPipe[T], reducers: Option[Int] = None)(groupByFn: (T) ⇒ K)(implicit arg0: Ordering[K]): TypedPipe[(T, (Long, Long))]

    NOTE: Prefer diff over this method if you can find or construct an Ordering[T].

    Returns a mapping from T to a count of the occurrences of T in the left and right pipes, only for cases where the counts are not equal.

    This implementation does not require an ordering on T, but does require a function (groupByFn) that extracts a value of type K (which has an ordering) from a record of type T.

    The groupByFn should partition records as evenly as possible, because all unique records that produce the same groupByFn value will be materialized into an in-memory map.

    groupByFn must be a pure function, such that: x == y implies that groupByFn(x) == groupByFn(y)

    T must have a hash code suitable for use in a hash map on a single JVM (it doesn't have to be stable across JVMs). K must have a hash code that *is* stable across JVMs. K must have an ordering.

    Example groupByFns would be x => x.hashCode, assuming x's hashCode is stable across JVMs, or maybe x => x.timestamp, if x's hashCode is not stable, assuming there shouldn't be too many records with the same timestamp.
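
    The grouping strategy described above can be sketched on in-memory collections as follows. This is a model under stated assumptions, not the library's code: Event, DiffByGroupModel and diffByGroupLocal are hypothetical names, and grouping by timestamp assumes few records share a timestamp.

```scala
object DiffByGroupModel {
  // Hypothetical record type; no Ordering[Event] is defined or needed.
  final case class Event(timestamp: Long, payload: String)

  // In-memory model: group both sides by the extracted key K, then count
  // records by full equality inside each group.
  def diffByGroupLocal[T, K: Ordering](left: Seq[T], right: Seq[T])(
      groupByFn: T => K): Seq[(T, (Long, Long))] = {
    // Tag each record with its side (true = left) and its key.
    val tagged: Seq[(K, (T, Boolean))] =
      left.map(t => (groupByFn(t), (t, true))) ++
        right.map(t => (groupByFn(t), (t, false)))

    tagged.groupBy(_._1).toSeq.sortBy(_._1).flatMap { case (_, records) =>
      // All distinct records sharing a key are held in one in-memory hash map,
      // which is why an evenly partitioning groupByFn matters.
      val counts = scala.collection.mutable.HashMap.empty[T, (Long, Long)]
      records.foreach { case (_, (t, isLeft)) =>
        val (l, r) = counts.getOrElse(t, (0L, 0L))
        counts(t) = if (isLeft) (l + 1, r) else (l, r + 1)
      }
      // Keep only the records whose left and right counts differ.
      counts.iterator.filter { case (_, (l, r)) => l != r }.toSeq
    }
  }
}
```

    Because x == y implies groupByFn(x) == groupByFn(y), equal records always land in the same group, so counting within a group is enough to produce an exact diff.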

  10. def diffByHashCode[T](left: TypedPipe[T], right: TypedPipe[T], reducers: Option[Int] = None): TypedPipe[(T, (Long, Long))]

    NOTE: Prefer diff over this method if you can find or construct an Ordering[T].

    Same as diffByGroup, but uses T.hashCode as the groupByFn.

    This method does an exact diff; it does not use the hashCode as a proxy for equality.
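
    The distinction between grouping by hash code and comparing by equality can be illustrated with two strings that collide. The sketch below is an in-memory model, not the library's implementation; HashCollisionDemo and diffWithinHashGroups are hypothetical names.

```scala
object HashCollisionDemo {
  // "Aa" and "BB" are distinct strings that share a hash code on the JVM.
  val sameBucket: Boolean = "Aa".hashCode == "BB".hashCode

  // Group records by hashCode, then count by full equality inside each group,
  // so colliding but unequal records are still reported separately.
  def diffWithinHashGroups[T](left: Seq[T], right: Seq[T]): Map[T, (Long, Long)] = {
    // Tag each record with its side (true = left).
    val tagged = left.map(t => (t, true)) ++ right.map(t => (t, false))
    tagged.groupBy { case (t, _) => t.hashCode }.values.flatMap { group =>
      group.groupBy(_._1).map { case (t, occurrences) =>
        val l = occurrences.count(_._2).toLong
        (t, (l, occurrences.size - l))
      }
    }.filter { case (_, (l, r)) => l != r }.toMap
  }
}
```

    Diffing Seq("Aa") against Seq("BB") in this model reports both strings as differences, even though they fall into the same hash group.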

  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  21. def toString(): String

    Definition Classes
    AnyRef → Any
  22. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
