Returns a mapping from T to a count of the occurrences of T in the left and right pipes, only for cases where the counts are not equal.
Requires that T have an ordering, and hashCode and equals implementations that are stable across JVMs (not reference-based). See diffArrayPipes for diffing pipes of arrays, since arrays do not meet these requirements by default.
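The counting semantics described above can be modeled with ordinary in-memory collections. This is only a sketch: `DiffSketch` and its `diff` are hypothetical names, `Seq` stands in for a typed pipe, and nothing here is the Scalding implementation or its signature.

```scala
// In-memory model of the diff semantics; Seq stands in for a typed pipe.
object DiffSketch {
  // Count occurrences of each value on both sides and keep only the
  // values whose (left, right) counts differ.
  def diff[T](left: Seq[T], right: Seq[T]): Map[T, (Long, Long)] = {
    val l = left.groupBy(identity).map { case (k, v) => (k, v.size.toLong) }
    val r = right.groupBy(identity).map { case (k, v) => (k, v.size.toLong) }
    (l.keySet ++ r.keySet).iterator
      .map(k => (k, (l.getOrElse(k, 0L), r.getOrElse(k, 0L))))
      .filter { case (_, (lc, rc)) => lc != rc }
      .toMap
  }
}
```

With left = Seq("a", "a", "b") and right = Seq("a", "b", "c"), the value "b" is dropped because it occurs once on each side, while "a" and "c" are reported with their (left, right) counts.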
Same as diffByHashCode, but takes care to wrap the Array[T] in a wrapper, which has the correct hashCode and equals needed. This does not involve copying the arrays, just wrapping them, and is specialized for primitive arrays.
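To illustrate why a wrapper is needed, here is a minimal sketch of one for Array[Int]. The class name `IntArrayKey` is hypothetical and is not the wrapper the library uses; it only shows the idea of holding a reference to the array (no copy) while delegating equals and hashCode to content-based helpers from java.util.Arrays.

```scala
import java.util.Arrays

// Hypothetical wrapper: keeps a reference to the array (no copy) and
// delegates equals/hashCode to the content-based java.util.Arrays helpers.
final class IntArrayKey(val underlying: Array[Int]) {
  override def hashCode: Int = Arrays.hashCode(underlying)
  override def equals(other: Any): Boolean = other match {
    case that: IntArrayKey => Arrays.equals(underlying, that.underlying)
    case _                 => false
  }
}
```

Two distinct Array(1, 2, 3) instances are not equal under the default reference-based comparison, but their wrappers compare equal and hash alike, which makes them usable as keys.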
NOTE: Prefer diff over this method if you can find or construct an Ordering[T].
Returns a mapping from T to a count of the occurrences of T in the left and right pipes, only for cases where the counts are not equal.
This implementation does not require an ordering on T, but does require a function (groupByFn) that extracts a value of type K (which has an ordering) from a record of type T.
The groupByFn should partition records as evenly as possible, because all unique records that map to the same groupByFn value will be materialized into an in-memory map.
groupByFn must be a pure function, such that x == y implies groupByFn(x) == groupByFn(y).
T must have a hash code suitable for use in a hash map on a single JVM (it does not have to be stable across JVMs). K must have a hash code that *is* stable across JVMs. K must have an ordering.
Example groupByFns would be x => x.hashCode, assuming x's hashCode is stable across JVMs, or maybe x => x.timestamp if x's hashCode is not stable, assuming there shouldn't be too many records with the same timestamp.
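The grouping strategy can be sketched in plain Scala as follows. The names `DiffByGroupSketch` and `Event` are hypothetical, `Seq` stands in for a typed pipe, and the code models only the semantics described above (bucket by key, then count in an in-memory map per bucket), not the library's implementation.

```scala
import scala.collection.mutable

// Hypothetical record type used only for this illustration.
case class Event(timestamp: Long, name: String)

object DiffByGroupSketch {
  def diffByGroup[T, K](left: Seq[T], right: Seq[T])(groupByFn: T => K): Map[T, (Long, Long)] = {
    // Tag each record with its side, then bucket by the extracted key.
    val tagged = left.map(t => (t, true)) ++ right.map(t => (t, false))
    tagged.groupBy { case (t, _) => groupByFn(t) }.values.flatMap { bucket =>
      // All distinct records of one bucket end up in this in-memory map,
      // which is why an evenly partitioning groupByFn matters.
      val counts = mutable.HashMap.empty[T, (Long, Long)]
      bucket.foreach { case (t, isLeft) =>
        val (l, r) = counts.getOrElse(t, (0L, 0L))
        counts.update(t, if (isLeft) (l + 1, r) else (l, r + 1))
      }
      counts.filter { case (_, (l, r)) => l != r }
    }.toMap
  }
}
```

Calling `DiffByGroupSketch.diffByGroup(left, right)(_.timestamp)` on sequences of `Event` only relies on the timestamp for bucketing; equality of whole records still decides which counts are compared.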
NOTE: Prefer diff over this method if you can find or construct an Ordering[T].
Same as diffByGroup, but uses T.hashCode as the groupByFn.
This method does an exact diff; it does not use the hashCode as a proxy for equality.
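The difference between bucketing by hash code and comparing by equality can be seen with two strings whose hash codes collide. This is a plain-Scala sketch with the hypothetical name `HashBucketSketch`; it models the behavior described above rather than reproducing the library's code.

```scala
object HashBucketSketch {
  // "Aa" and "BB" are a well-known pair of strings with equal hash codes.
  val left: Seq[String] = Seq("Aa", "BB", "BB")
  val right: Seq[String] = Seq("Aa", "Aa", "BB")

  // The hash code only decides which records are compared together...
  val buckets: Map[Int, Seq[(String, Boolean)]] =
    (left.map(s => (s, true)) ++ right.map(s => (s, false))).groupBy(_._1.hashCode)

  // ...while counting inside a bucket uses full equality on the record.
  val result: Map[String, (Long, Long)] = buckets.values.flatMap { bucket =>
    bucket.groupBy(_._1).map { case (t, xs) =>
      (t, (xs.count(_._2).toLong, xs.count(!_._2).toLong))
    }
  }.filter { case (_, (l, r)) => l != r }.toMap
}
```

Both strings fall into a single bucket, yet they are still reported as separate entries with their own (left, right) counts.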
Some methods for comparing two typed pipes and finding out the difference between them.
Supports the normal case where the typed pipes contain objects usable as keys in Scalding (they have an ordering and proper equals and hashCode), as well as some special cases for dealing with Arrays and thrift objects.
See diffByHashCode for comparing typed pipes of objects that have no ordering but a stable hash code (such as Scrooge thrift).
See diffByGroup for comparing typed pipes of objects that have no ordering *and* an unstable hash code.