com.twitter.summingbird.scalding
This is the "infinite history" join: it always joins, regardless of how much time separates the left entry from the right entry.
In this case, the right pipe is fed through a scanLeft doing a Semigroup.plus before being joined to the left.
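The right-side scan can be illustrated with plain Scala collections. This is only a sketch under stated assumptions, not the library's implementation: `RightSummingSketch` and `runningSums` are hypothetical names, the pipe is modelled as an in-memory `Seq`, and the explicit `plus` argument stands in for `Semigroup.plus`.

```scala
object RightSummingSketch {
  // Hypothetical helper: replaces every right-side update with the running
  // sum of all updates for the same key, up to and including its time.
  // `plus` plays the role of Semigroup.plus.
  def runningSums[T: Ordering, K, V](right: Seq[(T, (K, V))])(plus: (V, V) => V): Seq[(T, (K, V))] =
    right
      .groupBy { case (_, (k, _)) => k }
      .toSeq
      .flatMap { case (k, updates) =>
        // Order each key's updates by time before scanning.
        val sorted = updates.sortBy { case (t, _) => t }
        val values = sorted.map { case (_, (_, v)) => v }
        // scanLeft seeded with the first value yields one sum per update.
        val sums = values.tail.scanLeft(values.head)(plus)
        sorted.map(_._1).zip(sums).map { case (t, sum) => (t, (k, sum)) }
      }
}
```

With `Long` values and `_ + _` as `plus`, updates of 1, 2 and 4 for one key become 1, 3 and 7, so a later lookup sees the accumulated total rather than only the last update. The order of keys in the result is unspecified because it comes from `groupBy`.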
This ensures that gate(Tleft, Tright) == true; otherwise None is emitted as the joined value. Useful for bounding the time of the join to a recent window.
This ensures that gate(Tleft, Tright) == true; otherwise None is emitted as the joined value, and sums are only done as long as they come within the gate interval as well.
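As a rough illustration of the gate, the sketch below shows how a predicate over the two times could decide whether the most recent right-side value is kept or replaced by None. The names `GateSketch`, `applyGate` and `within` are hypothetical, and the code covers only the gating of the joined value, not the gated summing.

```scala
object GateSketch {
  // Hypothetical helper: keeps the most recent right-side value only when
  // gate(leftTime, rightTime) holds; otherwise the joined value is None.
  def applyGate[T, JoinedV](leftTime: T, latestRight: Option[(T, JoinedV)])(
      gate: (T, T) => Boolean): Option[JoinedV] =
    latestRight.collect { case (rightTime, v) if gate(leftTime, rightTime) => v }

  // Example gate (an assumption for illustration): accept a right-side
  // update that is at most `width` time units older than the lookup.
  def within(width: Long): (Long, Long) => Boolean =
    (leftTime, rightTime) => leftTime - rightTime <= width
}
```

For example, with `within(10L)` a lookup at time 100 keeps a value written at time 95 but yields None for a value written at time 50.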
lookupJoin simulates the behavior of a realtime system attempting to leftJoin (K, V) pairs against some other value type (JoinedV) by performing realtime lookups on a key-value Store.
An example would join (K, V) pairs of (URL, Username) against a service of (URL, ImpressionCount). The result of this join would be a pipe of (URL, (Username, Option[ImpressionCount])).
To simulate this behavior, lookupJoin accepts pipes of key-value pairs with an explicit time value T attached. T must have a sensible ordering. The semantics are: if one were to hit the right pipe's simulated realtime service at any time between T(tuple) and T(tuple + 1), one would receive Some((K, JoinedV)(tuple)).
The entries in the left pipe's tuples have the following meaning:
T: the time at which the (K, W) lookup occurred.
K: the join key.
W: the current value for the join key.
The right pipe's entries have the following meaning:
T: the time at which the "service" was fed an update.
K: the join key.
V: the value of the key at time T.
Before the time T in the right pipe's very first entry, the simulated "service" will return None. After this time T, the right side will return None only if the key is absent; otherwise, the service will return Some(joinedV).
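The semantics described above can be sketched with in-memory collections. This is a minimal illustration, not the actual implementation: `LookupJoinSketch` and its `lookupJoin` method are hypothetical names, pipes are modelled as `Seq`, and the sketch assumes that a right-side update made at exactly the lookup time is already visible, which may differ from how the real join breaks ties.

```scala
object LookupJoinSketch {
  // Hypothetical in-memory model of the "infinite history" join: each left
  // entry (t, (k, w)) is paired with the latest right-side value for k whose
  // time is <= t (tie handling is an assumption), or None if none exists.
  def lookupJoin[T, K, W, V](
      left: Seq[(T, (K, W))],
      right: Seq[(T, (K, V))]
  )(implicit ord: Ordering[T]): Seq[(T, (K, (W, Option[V])))] = {
    // Index the simulated service: per key, its updates sorted by time.
    val rightByKey: Map[K, Seq[(T, V)]] =
      right.groupBy { case (_, (k, _)) => k }.map { case (k, entries) =>
        k -> entries.map { case (t, (_, v)) => (t, v) }.sortBy(_._1)
      }
    left.map { case (t, (k, w)) =>
      // Updates the service had already received when the lookup happened.
      val seen = rightByKey.getOrElse(k, Seq.empty).takeWhile { case (rt, _) => ord.lteq(rt, t) }
      (t, (k, (w, seen.lastOption.map(_._2))))
    }
  }
}
```

With right-side updates for a URL at times 10 and 20, a lookup at time 5 yields None, a lookup at time 15 yields the value written at time 10, and a lookup at time 30 still yields the value written at time 20, however long ago that was.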